Understanding states in Elasticsearch scripted_metric Aggregations

Beinset Hounwanou
2 min read · May 16, 2024

Introduction:

When working with Elasticsearch’s aggregation capabilities, it is important to understand the role of states within scripted_metric aggregations. Elasticsearch splits each index into shards, which lets a search run on several shards in parallel. Misinterpreting states can lead to incorrect results and performance issues. Let's look at what states represents and how the reduce_script uses it.

Explanation with an Analogy:

Imagine you have a large library with many bookshelves (shards). Each bookshelf holds several books (documents). You want to find all books related to a specific topic and summarize the information.

  1. Initialization (init_script): You prepare an empty box (state) for each bookshelf to collect relevant books.
  2. Mapping (map_script): You go through each book and place relevant ones in your box (state.docs).
  3. Combining (combine_script): After scanning all books on a bookshelf, you close the box and label it with the collected information.
  4. Reducing (reduce_script): Finally, you gather all labeled boxes (states) and combine their contents to create a comprehensive summary.

Detailed Example:

Let’s look at a detailed example to better understand this:

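{
  "aggs": {
    "example_metric": {
      "scripted_metric": {
        "init_script": "state.docs = [];",
        "map_script": """
          // Skip documents that have no value for the field;
          // reading .value on an empty field throws an error.
          if (doc.containsKey('some_field') && doc['some_field'].size() != 0) {
            state.docs.add(doc['some_field'].value);
          }
        """,
        "combine_script": "return state;",
        "reduce_script": """
          // states holds one entry per shard, not one per document.
          def final_result = 0;
          for (state in states) {
            for (value in state.docs) {
              final_result += value;
            }
          }
          return final_result;
        """
      }
    }
  }
}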

Elasticsearch processes each shard separately and in parallel. For scripted_metric aggregations, each shard produces one intermediate state after executing the init, map, and combine scripts. The coordinating node then receives these intermediate states and passes them to the reduce_script, which produces the final result.
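
To make this concrete, here is a sketch of what states might contain when the reduce_script runs for the example above. The three-shard layout and the values are made up for illustration; the real contents depend on your index and data.

```json
[
  { "docs": [10, 20] },
  { "docs": [5] },
  { "docs": [] }
]
```

With this hypothetical input, the outer loop of the reduce_script runs three times (once per shard-level state, not once per document), and the script returns 10 + 20 + 5 = 35. A shard where no document has a value can still contribute a state with an empty list.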

Steps and Role of states:

  1. init_script: Initializes state.docs as an empty list on each shard.
  2. map_script: Runs once per matching document and adds its value to state.docs.
  3. combine_script: Runs once per shard and returns that shard's intermediate state.
  4. reduce_script:
  • states is a list containing the value returned by combine_script on each shard.
  • for (state in states) iterates over these per-shard states.
  • The inner loop sums the collected values from all shards to calculate the final result.
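
Because only a sum is needed here, each shard does not have to send its whole list to the reduce step. The sketch below is one possible variant that sums in the combine_script, following the pattern used in the scripted_metric example in the Elasticsearch reference documentation. It reuses the placeholder field some_field from the example above, and the variable names sum, total, v and s are only illustrative.

```json
{
  "aggs": {
    "example_metric": {
      "scripted_metric": {
        "init_script": "state.docs = [];",
        "map_script": "if (doc.containsKey('some_field') && doc['some_field'].size() != 0) { state.docs.add(doc['some_field'].value); }",
        "combine_script": "double sum = 0; for (v in state.docs) { sum += v; } return sum;",
        "reduce_script": "double total = 0; for (s in states) { if (s != null) { total += s; } } return total;"
      }
    }
  }
}
```

In this variant states is a list of numbers, one per shard, so the reduce_script only adds up a handful of partial sums. The null check is a defensive assumption in case a shard returns no state.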

In summary, states in the reduce_script holds the intermediate results returned by each shard, not one entry per document. Keeping this in mind helps you write scripted_metric aggregations that are both correct and efficient.
