Hadoop metrics
The Hadoop system records a set of metric counters for each job that it runs. elasticsearch-hadoop builds on this by reporting metrics about its own activity for each job run, using the Hadoop Counters infrastructure. During each run, elasticsearch-hadoop sends statistics from each running task instance; the Map/Reduce infrastructure aggregates them and makes them available through the standard Hadoop APIs.
elasticsearch-hadoop provides the following counters, available under the org.elasticsearch.hadoop.mr.Counter enum:
Counter name | Purpose |
---|---|
Data focused | |
BYTES_SENT | Total number of data/communication bytes sent over the network to Elasticsearch |
BYTES_ACCEPTED | Data/Documents accepted by Elasticsearch in bytes |
BYTES_RETRIED | Data/Documents rejected by Elasticsearch in bytes |
BYTES_RECEIVED | Data/Documents received from Elasticsearch in bytes |
Document focused | |
DOCS_SENT | Number of documents sent over the network to Elasticsearch |
DOCS_ACCEPTED | Number of documents sent and accepted by Elasticsearch |
DOCS_RETRIED | Number of documents sent but rejected by Elasticsearch |
DOCS_RECEIVED | Number of documents received from Elasticsearch |
Network focused | |
BULK_TOTAL | Number of bulk requests made to Elasticsearch |
BULK_RETRIES | Number of bulk retries (caused by document rejections) |
SCROLL_TOTAL | Number of scrolls pulled from Elasticsearch |
NODE_RETRIES | Number of node fallbacks (caused by network errors) |
NET_RETRIES | Number of network retries (caused by network errors) |
Time focused | |
NET_TOTAL_TIME_MS | Overall time (in ms) spent over the network |
BULK_TOTAL_TIME_MS | Time (in ms) spent over the network by the bulk requests |
BULK_RETRIES_TOTAL_TIME_MS | Time (in ms) spent over the network retrying bulk requests |
SCROLL_TOTAL_TIME_MS | Time (in ms) spent over the network reading the scroll requests |
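
The raw counters are often easier to interpret when combined into ratios. The following is a minimal sketch, not part of elasticsearch-hadoop: a plain-Java class that takes a snapshot of counter values keyed by the names in the table above and derives a few averages. The class name `CounterSummary`, its methods, and the choice of ratios are illustrative assumptions. In a real job the values would normally be read through the Hadoop Counters API (for example `job.getCounters().findCounter(...)` in the mapreduce API) rather than from a hand-built map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that derives ratios from elasticsearch-hadoop counter values. */
final class CounterSummary {
    private final Map<String, Long> values;

    CounterSummary(Map<String, Long> values) {
        // Defensive copy so later changes to the caller's map do not affect the summary.
        this.values = new HashMap<>(values);
    }

    /** Returns the counter value, or 0 when the counter is absent from the snapshot. */
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    /** Average network time per bulk request, in ms. */
    double avgBulkTimeMs() {
        return ratio(get("BULK_TOTAL_TIME_MS"), get("BULK_TOTAL"));
    }

    /** Average number of documents sent per bulk request. */
    double avgDocsPerBulk() {
        return ratio(get("DOCS_SENT"), get("BULK_TOTAL"));
    }

    /** Fraction of sent documents that were counted as retried. */
    double docRetryRatio() {
        return ratio(get("DOCS_RETRIED"), get("DOCS_SENT"));
    }

    /** Divides two counters, returning 0 when the denominator is 0. */
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Sample values only; a real job would supply its own counters.
        Map<String, Long> snapshot = new HashMap<>();
        snapshot.put("BULK_TOTAL", 20L);
        snapshot.put("BULK_TOTAL_TIME_MS", 518L);
        snapshot.put("DOCS_SENT", 993L);
        snapshot.put("DOCS_RETRIED", 0L);

        CounterSummary summary = new CounterSummary(snapshot);
        System.out.println("avg bulk time (ms): " + summary.avgBulkTimeMs());
        System.out.println("avg docs per bulk:  " + summary.avgDocsPerBulk());
        System.out.println("doc retry ratio:    " + summary.docRetryRatio());
    }
}
```

Treating a missing counter as 0 keeps the sketch usable for read-only jobs, where the bulk counters may never be incremented.
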
The counters can be used programmatically through either the mapred or the mapreduce API, depending on which one the job uses. In either case, elasticsearch-hadoop reports them automatically, without any user intervention: the statistics appear at the end of the job run, for example:

    13:55:08,100 INFO main mapreduce.Job - Job job_local127738678_0013 completed successfully
    13:55:08,101 INFO main mapreduce.Job - Counters: 35
    ...
    Elasticsearch Hadoop Counters
        Bulk Retries=0
        Bulk Retries Total Time(ms)=0
        Bulk Total=20
        Bulk Total Time(ms)=518
        Bytes Accepted=159129
        Bytes Sent=159129
        Bytes Received=79921
        Bytes Retried=0
        Documents Accepted=993
        Documents Sent=993
        Documents Received=0
        Documents Retried=0
        Network Retries=0
        Network Total Time(ms)=937
        Node Retries=0
        Scroll Total=0
        Scroll Total Time(ms)=0
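
When only the job log is available, the reported values can be recovered from the text. The sketch below assumes that each counter is printed on its own line in `Name=Value` form, as in the report above. The class `CounterReportParser` and its `parse` method are hypothetical names introduced for illustration and are not part of elasticsearch-hadoop or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the counter lines of a job report. */
final class CounterReportParser {

    /** Extracts every "Name=Value" line with a numeric value, in report order. */
    static Map<String, Long> parse(String report) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : report.split("\\R")) {
            String trimmed = line.trim();
            int eq = trimmed.lastIndexOf('=');
            if (eq <= 0) {
                continue; // no name before '=', or no '=' at all: not a counter line
            }
            String name = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // The text after '=' is not a number, so this is not a counter line.
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // A shortened copy of the report format shown above.
        String report = String.join("\n",
                "13:55:08,101 INFO main mapreduce.Job - Counters: 35",
                "    Elasticsearch Hadoop Counters",
                "        Bulk Total=20",
                "        Bulk Total Time(ms)=518",
                "        Documents Sent=993");
        parse(report).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Parsing the log is a fallback; where the job object is accessible, reading the counters through the Hadoop APIs avoids depending on the display format.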