Commit a2f0dd9

gpop63 and zmoog authored
[AWS] Add EMR metrics data stream (#6120)

* add emr_metrics data stream
* bump package version
* fix aws package version
* fix field types
* add unit and metric_types
* add docs
* fix typo in docs
* Update packages/aws/data_stream/emr_metrics/manifest.yml

  Co-authored-by: Maurizio Branca <maurizio.branca@elastic.co>
* add tags_filter and include_linked_accounts
* add beta release

---------

Co-authored-by: Maurizio Branca <maurizio.branca@elastic.co>

1 parent 9495cd7 commit a2f0dd9

13 files changed: 918 additions and 1 deletion

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Amazon EMR (ElasticMapReduce)
+
+The Amazon EMR integration allows you to monitor [Amazon EMR](https://aws.amazon.com/emr/), a fully managed big data processing and analytics service.
+
+Use the Amazon EMR integration to collect metrics related to your EMR instances. Then visualize that data in Kibana, create alerts to notify you if something goes wrong, and reference the metrics when troubleshooting an issue.
+
+For example, you could use this data to track Amazon EMR cluster progress and cluster storage. Then you can alert when utilization for an instance crosses a predefined threshold.
+
+**IMPORTANT: Extra AWS charges on AWS API requests will be generated by this integration. Please refer to the AWS integration for more details.**
+
+## Data streams
+
+The Amazon EMR integration collects one type of data: metrics.
+
+**Metrics** give you insight into the state of Amazon EMR.
+The metrics collected by the Amazon EMR integration include cluster progress, cluster state, cluster or node storage, and more. See more details in the [Metrics reference](#metrics-reference).
+
+## Requirements
+
+You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.
+You can use our hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.
+
+Before using any AWS integration you will need:
+
+* **AWS Credentials** to connect with your AWS account.
+* **AWS Permissions** to make sure the user you're using to connect has permission to share the relevant data.
+
+For more details about these requirements, see the **AWS** integration documentation.
+
+## Setup
+
+Use this integration if you only need to collect data from the Amazon EMR service.
+
+If you want to collect data from two or more AWS services, consider using the **AWS** integration.
+When you configure the AWS integration, you can collect data from as many AWS services as you'd like.
+
+For step-by-step instructions on how to set up an integration, see the
+{{ url "getting-started-observability" "Getting started" }} guide.
+
+## Metrics reference
+
+{{event "emr_metrics"}}
+
+{{fields "emr_metrics"}}
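
The alerting scenario described in the README (notify when utilization crosses a predefined threshold) could be sketched roughly as below. This is an illustration only and not part of the commit: the document shape, the field path `aws.emr.metrics.HDFSUtilization.avg`, and the 80 percent threshold are assumptions made for the example. Only the metric name `HDFSUtilization` appears in the data stream configuration.

```python
from typing import Any, Iterable

# Hypothetical field path; check the integration's field reference for the real one.
UTILIZATION_FIELD = "aws.emr.metrics.HDFSUtilization.avg"


def get_path(doc: dict[str, Any], dotted: str) -> Any:
    """Walk a nested dict along a dotted path; return None if any part is missing."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def over_threshold(
    docs: Iterable[dict[str, Any]], threshold: float = 80.0
) -> list[dict[str, Any]]:
    """Return the documents whose utilization is strictly above the threshold."""
    hits = []
    for doc in docs:
        value = get_path(doc, UTILIZATION_FIELD)
        if isinstance(value, (int, float)) and value > threshold:
            hits.append(doc)
    return hits


# Made-up sample documents, shaped the way the assumed field path implies.
sample_docs = [
    {"aws": {"emr": {"metrics": {"HDFSUtilization": {"avg": 91.5}}}}},
    {"aws": {"emr": {"metrics": {"HDFSUtilization": {"avg": 42.0}}}}},
    {"aws": {"emr": {"metrics": {}}}},  # metric missing for this period
]
alerts = over_threshold(sample_docs)
print(len(alerts))
```

In practice a Kibana alerting rule would do this comparison server-side; the sketch only makes the threshold logic explicit.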

packages/aws/changelog.yml

Lines changed: 5 additions & 0 deletions
@@ -1,4 +1,9 @@
 # newer versions go on top
+- version: "1.45.0"
+  changes:
+    - description: Add AWS EMR metrics data stream.
+      type: enhancement
+      link: https://github.com/elastic/integrations/pull/6120
 - version: "1.44.4"
   changes:
     - description: Migrate AWS Metric Overview dashboard visualizations to lens.

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+metricsets: ["cloudwatch"]
+period: {{period}}
+{{#if data_granularity}}
+data_granularity: {{data_granularity}}
+{{/if}}
+{{#if include_linked_accounts}}
+include_linked_accounts: {{include_linked_accounts}}
+{{/if}}
+{{#if access_key_id}}
+access_key_id: {{access_key_id}}
+{{/if}}
+{{#if secret_access_key}}
+secret_access_key: {{secret_access_key}}
+{{/if}}
+{{#if session_token}}
+session_token: {{session_token}}
+{{/if}}
+{{#if credential_profile_name}}
+credential_profile_name: {{credential_profile_name}}
+{{/if}}
+{{#if shared_credential_file}}
+shared_credential_file: {{shared_credential_file}}
+{{/if}}
+{{#if role_arn}}
+role_arn: {{role_arn}}
+{{/if}}
+{{#if default_region}}
+default_region: {{default_region}}
+{{/if}}
+{{#if regions}}
+regions:
+{{#each regions as |region i|}}
+- {{region}}
+{{/each}}
+{{/if}}
+{{#if latency}}
+latency: {{latency}}
+{{/if}}
+{{#if tags_filter}}
+tags_filter: {{tags_filter}}
+{{/if}}
+{{#if proxy_url }}
+proxy_url: {{proxy_url}}
+{{/if}}
+metrics:
+  - namespace: AWS/ElasticMapReduce
+    resource_type: emr
+    statistic: ["Average"]
+    name:
+      - IsIdle
+      - ContainerPendingRatio
+      - LiveDataNodes
+      - MultiMasterInstanceGroupNodesRunningPercentage
+      - HDFSUtilization
+      - YARNMemoryAvailablePercentage
+      - TotalUnitsRunning
+      - TotalNodesRunning
+      - TotalVCPURunning
+      - CoreUnitsRunning
+      - CoreNodesRunning
+      - CoreVCPURunning
+      - TaskUnitsRunning
+      - TaskNodesRunning
+      - TaskVCPURunning
+      - AutoTerminationIsClusterIdle
+  - namespace: AWS/ElasticMapReduce
+    resource_type: emr
+    statistic: ["Sum"]
+    name:
+      - ContainerAllocated
+      - ContainerReserved
+      - ContainerPending
+      - AppsCompleted
+      - AppsFailed
+      - AppsKilled
+      - AppsPending
+      - AppsRunning
+      - AppsSubmitted
+      - CoreNodesPending
+      - MRTotalNodes
+      - MRActiveNodes
+      - MRLostNodes
+      - MRUnhealthyNodes
+      - MRDecommissionedNodes
+      - MRRebootedNodes
+      - MultiMasterInstanceGroupNodesRunning
+      - MultiMasterInstanceGroupNodesRequested
+      - S3BytesWritten
+      - S3BytesRead
+      - HDFSBytesRead
+      - HDFSBytesWritten
+      - TotalLoad
+      - MemoryTotalMB
+      - MemoryReservedMB
+      - MemoryAvailableMB
+      - MemoryAllocatedMB
+      - PendingDeletionBlocks
+      - UnderReplicatedBlocks
+      - DfsPendingReplicationBlocks
+      - CapacityRemainingGB
+      - TotalNotebookKernels
+  - namespace: AWS/ElasticMapReduce
+    resource_type: emr
+    statistic: ["Maximum"]
+    name:
+      - MissingBlocks
+      - CorruptBlocks
+      - TotalUnitsRequested
+      - TotalNodesRequested
+      - TotalVCPURequested
+      - CoreUnitsRequested
+      - CoreNodesRequested
+      - CoreVCPURequested
+      - TaskUnitsRequested
+      - TaskNodesRequested
+      - TaskVCPURequested
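
To make the effect of the `{{#if ...}}` guards in the template above concrete, here is a small Python sketch that imitates how the optional settings are emitted. It is not how Fleet renders the template. Python truthiness stands in for Handlebars `#if` semantics, which agree for the cases shown (missing values, empty strings, empty lists, and boolean `false`). The function name `render_settings` and the sample values are hypothetical.

```python
from typing import Any

# Optional settings guarded by {{#if ...}} in the template, in template order.
KEYS_BEFORE_REGIONS = [
    "data_granularity", "include_linked_accounts", "access_key_id",
    "secret_access_key", "session_token", "credential_profile_name",
    "shared_credential_file", "role_arn", "default_region",
]
KEYS_AFTER_REGIONS = ["latency", "tags_filter", "proxy_url"]


def fmt(value: Any) -> str:
    """Format a value the way it would appear in YAML (booleans in lowercase)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_settings(variables: dict[str, Any]) -> list[str]:
    """Emit the settings lines, skipping every optional key with a falsy value."""
    lines = ['metricsets: ["cloudwatch"]', f"period: {fmt(variables['period'])}"]
    for key in KEYS_BEFORE_REGIONS:
        if variables.get(key):
            lines.append(f"{key}: {fmt(variables[key])}")
    if variables.get("regions"):
        lines.append("regions:")
        lines.extend(f"- {region}" for region in variables["regions"])
    for key in KEYS_AFTER_REGIONS:
        if variables.get(key):
            lines.append(f"{key}: {fmt(variables[key])}")
    return lines


# Hypothetical policy variables: unset, empty, and false values are all skipped.
rendered = render_settings({
    "period": "5m",
    "data_granularity": "5m",
    "include_linked_accounts": False,
    "regions": ["us-east-1", "eu-west-1"],
    "latency": "",
})
print("\n".join(rendered))
```

One consequence worth noting: if `include_linked_accounts` arrives as a boolean `false`, the guard skips it and the key is not written at all, so the input falls back to whatever default the CloudWatch metricset applies.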
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+- name: cloud
+  title: Cloud
+  group: 2
+  description: Fields related to the cloud or infrastructure the events are coming from.
+  footnote: 'Examples: If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.'
+  type: group
+  fields:
+    - name: image.id
+      type: keyword
+      description: Image ID for the cloud instance.
+- name: host
+  title: Host
+  group: 2
+  description: 'A host is defined as a general computing instance.
+
+    ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.'
+  type: group
+  fields:
+    - name: containerized
+      type: boolean
+      description: >
+        If the host is a container.
+
+    - name: os.build
+      type: keyword
+      example: "18D109"
+      description: >
+        OS build information.
+
+    - name: os.codename
+      type: keyword
+      example: "stretch"
+      description: >
+        OS codename, if any.
+
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+- name: data_stream.type
+  type: constant_keyword
+  description: Data stream type.
+- name: data_stream.dataset
+  type: constant_keyword
+  description: Data stream dataset.
+- name: data_stream.namespace
+  type: constant_keyword
+  description: Data stream namespace.
+- name: '@timestamp'
+  type: date
+  description: Event timestamp.
+- name: event.module
+  type: constant_keyword
+  description: Event module
+  value: aws
+- name: event.dataset
+  type: constant_keyword
+  description: Event dataset
+  value: aws.emr_metrics
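
Because `event.dataset` is declared above as a constant keyword with the value `aws.emr_metrics`, documents from this data stream can be selected with a plain term filter. The sketch below builds such a search body with Elasticsearch Query DSL. The function name, the 15-minute default window, and the idea of sending the body to an index pattern such as `metrics-aws.emr_metrics-*` are assumptions made for the example, not something the commit defines.

```python
import json
from typing import Any


def emr_metrics_query(start: str = "now-15m", end: str = "now") -> dict[str, Any]:
    """Build a search body that selects EMR metrics documents in a time window."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Value taken from the event.dataset field definition.
                    {"term": {"event.dataset": "aws.emr_metrics"}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        # Newest documents first.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


body = emr_metrics_query()
print(json.dumps(body, indent=2))
```

The body is a plain dictionary, so it can be passed to any HTTP or Elasticsearch client of choice.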
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+- external: ecs
+  name: cloud
+- external: ecs
+  name: cloud.account.id
+- external: ecs
+  name: cloud.account.name
+- external: ecs
+  name: cloud.availability_zone
+- external: ecs
+  name: cloud.instance.id
+- external: ecs
+  name: cloud.instance.name
+- external: ecs
+  name: cloud.project.id
+- external: ecs
+  name: cloud.machine.type
+- external: ecs
+  name: cloud.provider
+- external: ecs
+  name: cloud.region
+- external: ecs
+  name: ecs.version
+- external: ecs
+  name: error
+- external: ecs
+  name: error.message
+- external: ecs
+  name: service.type
+- external: ecs
+  name: host.architecture
+- external: ecs
+  name: host.domain
+- external: ecs
+  name: host.hostname
+- external: ecs
+  name: host.id
+- external: ecs
+  name: host.ip
+- external: ecs
+  name: host.mac
+- external: ecs
+  name: host.name
+- external: ecs
+  name: host.os.family
+- external: ecs
+  name: host.os.kernel
+- external: ecs
+  name: host.os.name
+- external: ecs
+  name: host.os.platform
+- external: ecs
+  name: host.os.version
+- external: ecs
+  name: host.type
+- external: ecs
+  name: container.id
+- external: ecs
+  name: container.image.name
+- external: ecs
+  name: container.labels
+- external: ecs
+  name: container.name
