- Notifications
You must be signed in to change notification settings - Fork 25.7k
ESQL: Add nulls support to Categorize #117655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Hi @ivancea, I've created a changelog YAML for you. |
| Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 suggestions.
Files not reviewed (1)
- x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported
Comments skipped due to low confidence (6)
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:67
- [nitpick] The change from static to final for the CategorizeEvaluator class may be unintended. If this class is intended to be nested within another class without requiring an instance of the outer class, it should remain static.
public final class CategorizeEvaluator implements Releasable { x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:64
- The close method should also handle the seenNull variable if it is used to track state. Ensure that any state variables are reset or handled appropriately when closing resources.
@Override public void close() { categorizer.close(); } x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:126
- Ensure that the eval method is covered by tests, especially for cases where null values are encountered and handled by the process method.
public IntVector eval(int positionCount, BytesRefVector vVector) { x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java:83
- The increment of category IDs by 1 could cause confusion. Ensure this is intentional and document the reasoning.
idMap.put(oldCategoryId + 1, newCategoryId + 1); x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:119
- Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(String s, CategorizationAnalyzer analyzer) { x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:128
- Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(TokenStream ts, int unfilteredStringLen, long numDocs) throws IOException { ...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java Show resolved Hide resolved
...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java Show resolved Hide resolved
...c/main/java/org/elasticsearch/compute/aggregation/blockhash/AbstractCategorizeBlockHash.java Show resolved Hide resolved
...n/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java Outdated Show resolved Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec Show resolved Hide resolved
jan-elastic left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
alex-spies left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I'm late to the party, but LGTM and thanks @ivancea !
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function.
Also, implement
seenGroupIds(), which would fail some queries with nulls otherwise.