Loading

ES|QL and Pandas

The Elasticsearch Query Language (ES|QL) provides a powerful way to filter, transform, and analyze data stored in Elasticsearch. Designed to be easy to learn and use, it is a perfect fit for data scientists familiar with Pandas and other dataframe-based libraries. ES|QL queries produce tables with named columns, which is the definition of dataframes.

This page shows you an example of using ES|QL and Pandas together to work with dataframes.

Use the employees sample data and mapping. The easiest way to load this dataset is to run two Elasticsearch API requests in the Kibana Console.

Use the ES|QL CSV import to convert the employees dataset to a Pandas dataframe object.

from io import StringIO from elasticsearch import Elasticsearch import pandas as pd client = Elasticsearch( "https://[host].elastic-cloud.com", api_key="...", ) response = client.esql.query( query="FROM employees | LIMIT 500", format="csv", ) df = pd.read_csv(StringIO(response.body)) print(df) 

Even though the dataset contains only 100 records, a LIMIT of 500 is specified to suppress ES|QL warnings about potentially missing records. This prints the following dataframe:

 avg_worked_seconds ... salary_change.long still_hired 0 268728049 ... 1 True 1 328922887 ... [-7, 11] True 2 200296405 ... [12, 14] False 3 311267831 ... [0, 1, 3, 13] True 4 244294991 ... [-2, 13] True .. ... ... ... ... 95 204381503 ... NaN False 96 206258084 ... -1 False 97 272392146 ... [-2, 4, 8] False 98 377713748 ... [-8, -3, 10, 14] True 99 223910853 ... [-7, 13] True 

You can now analyze the data with Pandas or you can also continue transforming the data using ES|QL.

In the next example, the STATS … BY command is utilized to count how many employees are speaking a given language. The results are sorted with the languages column using SORT:

response = client.esql.query( query=""" FROM employees | STATS count = COUNT(emp_no) BY languages | SORT languages | LIMIT 500 """, format="csv", ) df = pd.read_csv( StringIO(response.body), dtype={"count": "Int64", "languages": "Int64"}, ) print(df) 

Note that the dtype parameter of pd.read_csv() is useful when the type inferred by Pandas is not enough. The code prints the following response:

 count languages 0 15 1 1 19 2 2 17 3 3 18 4 4 21 5 

Use the built-in parameters support of the ES|QL REST API to pass parameters to a query:

response = client.esql.query( query=""" FROM employees | STATS count = COUNT(emp_no) BY languages | WHERE languages >= (?) | SORT languages | LIMIT 500 """, format="csv", params=[3], ) df = pd.read_csv( StringIO(response.body), dtype={"count": "Int64", "languages": "Int64"}, ) print(df) 

The code above outputs the following:

 count languages 0 17 3 1 18 4 2 21 5 

If you want to learn more about ES|QL, refer to the ES|QL documentation. You can also check out this other Python example using Boston Celtics data.