Skip to content

Conversation

@ivancea
Copy link
Contributor

@ivancea ivancea commented Nov 27, 2024

Handle nulls and empty strings (Which resolve to null) on Categorize grouping function.

Also, implement seenGroupIds(), which would fail some queries with nulls otherwise.

@ivancea ivancea added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Nov 27, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

@ivancea ivancea requested review from alex-spies, Copilot, costin, jan-elastic and nik9000 and removed request for costin and nik9000 November 27, 2024 17:51
@ivancea ivancea marked this pull request as ready for review November 27, 2024 17:51
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 suggestions.

Files not reviewed (1)
  • x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported
Comments skipped due to low confidence (6)

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:67

  • [nitpick] The change from static to final for the CategorizeEvaluator class may be unintended. If this class is intended to be nested within another class without requiring an instance of the outer class, it should remain static.
public final class CategorizeEvaluator implements Releasable { 

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:64

  • The close method should also handle the seenNull variable if it is used to track state. Ensure that any state variables are reset or handled appropriately when closing resources.
@Override public void close() { categorizer.close(); } 

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:126

  • Ensure that the eval method is covered by tests, especially for cases where null values are encountered and handled by the process method.
public IntVector eval(int positionCount, BytesRefVector vVector) { 

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java:83

  • The increment of category IDs by 1 could cause confusion. Ensure this is intentional and document the reasoning.
idMap.put(oldCategoryId + 1, newCategoryId + 1); 

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:119

  • Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(String s, CategorizationAnalyzer analyzer) { 

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:128

  • Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(TokenStream ts, int unfilteredStringLen, long numDocs) throws IOException { 
Copy link
Contributor

@jan-elastic jan-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ivancea ivancea merged commit 6b94a91 into elastic:main Nov 28, 2024
16 checks passed
@ivancea ivancea deleted the esql-categorize-nulls branch November 28, 2024 15:07
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117655

ivancea added a commit to ivancea/elasticsearch that referenced this pull request Nov 28, 2024
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm late to the party, but LGTM and thanks @ivancea !

elasticsearchmachine pushed a commit that referenced this pull request Nov 28, 2024
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0

4 participants