Skip to content

Conversation

fang-xing-esql
Copy link
Member

@fang-xing-esql fang-xing-esql commented Dec 12, 2024

This PR is to support named arguments for functions - provided in map format. The named arguments are provided as constant/literal values, in comma separated name:value pairs, as described in an example below. The types of the values can be string, numeric, boolean or an array of them, null is not supported as a valid value.

FROM test | EVAL x = function(field, {"stringArg":"value", "intArg":1, "doubleArg":2.0, "boolArg":true, "arrayArg":["a", "b"]}) 

This map format is only supported by function arguments, and it is not available in the other places in the query. This PR does not introduce a Map type in the ES|QL.

Internally the map is represented as a MapExpression that hosts a list of EntryExpressions, it can be a(optional) child of a function.

A couple of test functions(MapCount, LogWithBaseInMap) are included in this PR to validate that MapExpression and EntryExpression work as expected as function arguments, serialization works properly, and the related docs can be generated properly. These two test functions are under snapshot only.

MapCount does not require serialization, it is evaluated at planning time by constant folding at coordinator node, it is not sent to the data nodes.

LogWithBaseInMap requires serialization, and the two related tests - LogWithBaseInMapTests and LogWithBaseInMapSerializationTests will not be merged, as they may trigger the generation of unnecessary docs, they are there to validate serialization and doc generation work properly.

@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql fang-xing-esql added the test-release Trigger CI checks against release build label Dec 12, 2024
@costin costin added :Analytics/ES|QL AKA ESQL and removed :Analytics/EQL EQL querying labels Dec 19, 2024
@fang-xing-esql fang-xing-esql marked this pull request as ready for review December 23, 2024 19:34
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 23, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

builder.startObject();
builder.field("name", arg.name());
if (arg.mapParam()) {
if (arg instanceof EsqlFunctionRegistry.MapArgSignature mp) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - This different behaviour for ArgSignature / MapArgSignature could be included in the specific classes in a method like definition(XcontentBuilder builder) instead of using a instanceof

/**
* Represent a collect of key-value pairs.
*/
public class MapExpression extends Expression implements Map<Expression, Expression> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something planned? I don't think we need to implement the Map interface on this class, but expose the underlying Map

Expression value = expression(entry.constant());
if (value instanceof Literal l) {
if (l.dataType() == NULL) {
throw new ParsingException(source(ctx), "Invalid named function argument [{}], NULL is not supported", l);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the function arg name (key) so the error message is more actionable?

namedArgs.add(new Literal(source(entry.string()), key, KEYWORD));
namedArgs.add(l);
} else {
throw new ParsingException(source(ctx), "Invalid named function argument [{}], only constant value is supported", value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can we add the function arg name?

return map;
}

private TypeResolution validateOptions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed for this PR, but we will need to provide a validation mechanism for map options so they are checked for valid keys, valid types etc

entryExpressions.add(new EntryExpression(key.source(), key, value));
map.put(key, value);
if (key.foldable()) {
this.keyFoldedMap.put(key.fold(), value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check that there are no duplicate key entries, otherwise users could add them by mistake, like

ROW value = 8.0 | EVAL l = round(log_with_base_in_map(value, {"base": 9.0, "base": 3.0} 

which can be difficult to debug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do the validation on duplicated keys in the parser, so that we stop as soon as duplicated keys are found.

Numeric expression. If `null`, the function returns `null`.

`map`::
Input value. The input is a valid constant map expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how to describe the map type to users so it's more informative.

Maybe something like "arguments map" or "named arguments"? "Map expression" sounds a bit weird.

I know, naming... 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know, the naming could be tricky. How about "named arguments in map"? It may look more clear in the docs if we provide them some examples. We can have our documentation team friends review it, perhaps they have a better idea how to describe it in a more meaningful way.

I don't plan to merge this log_with_base_in_map related tests or docs when this PR is ready to merge, I use them to validate the doc generation and serialization to catch some possible issues that the Match function may hit. Do you prefer to merge the log_with_base_in_map as a snapshot function? Or shall I remove it when this is ready to merge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"named arguments in map" sounds better - we can have some iterations on this with the docs team as you're suggesting 👍

I don't plan to merge this log_with_base_in_map related tests or docs when this PR is ready to merge

Got it, I'm thinking on match implementing this once this is merged. We can postpone naming decisions with that.

Do you prefer to merge the log_with_base_in_map as a snapshot function? Or shall I remove it when this is ready to merge?

I think it's a good idea to keep the function as a testing ground, so we don't have to re-test the map param mechanism as part of new functions. 💯

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this @fang-xing-esql !

Expression value = entries.get(i * 2 + 1);
entryExpressions.add(new EntryExpression(key.source(), key, value));
map.put(key, value);
if (key.foldable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can key be non-foldable or be foldable later? (as in optimization rules kicking in and propagating values and making the keys foldable)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the implementation of MapExpression for function arguments, the keys are all Literals(foldable). If MapExpression is used for some other purposes in the future, we could have non-foldable keys, that become foldable during optimization, and this foldable map need to be updated/refreshed, if this happens, one option is to have an optimization rule to refresh foldable map of a MapExpression.

if (key instanceof Expression) {
return map.get(key);
} else {
key = key instanceof String s ? s.toLowerCase(Locale.ROOT) : key;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about the case sensitiveness of the keys?
ES itself, if I'm not mistaken, is case sensitive to how the elements of the queries are specified. For example, query_string query written like

{ "query": { "query_string": { "query": "(new york city) OR (big apple)", "Default_field": "content" } } } 

where Default_field is used will complain that there is no such element. I know it's not the exact same thing (ES|QL and the query DSL) but we have functions - full text search - that may want to expose all sorts of parameters and users who are used to query DSL want to use the same/similar functionality in ES|QL, we might want to have a consistent behavior.
Also, allowing this (not being case sensitive) from the beginning, in case we are required to make it consistent with ES, we might not be doing it so easily (bwc concerns).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, I didn't know that search is case-sensitive, keeping the map key string unchanged makes sense to me. Here are a couple of examples that I tried.

+ curl -u elastic:password -v -X GET 'localhost:9200/case_sensitive/_search?format=txt&pretty' -H 'Content-Type: application/json' '-d { "query": { "match": { "field1": { "Query": "a" } } } } ' { "error" : { "root_cause" : [ { "type" : "parsing_exception", "reason" : "[match] query does not support [Query]", "line" : 6, "col" : 18 } ], "type" : "parsing_exception", "reason" : "[match] query does not support [Query]", "line" : 6, "col" : 18 }, "status" : 400 } + curl -u elastic:password -v -X GET 'localhost:9200/case_sensitive/_search?format=txt&pretty' -H 'Content-Type: application/json' '-d { "query": { "query_string": { "query": "(a) or (A)", "Default_field": "field1" } } } ' { "error" : { "root_cause" : [ { "type" : "parsing_exception", "reason" : "[query_string] query does not support [Default_field]", "line" : 6, "col" : 24 } ], "type" : "parsing_exception", "reason" : "[query_string] query does not support [Default_field]", "line" : 6, "col" : 24 }, "status" : 400 } 
List<String> names = new ArrayList<>(ctx.entryExpression().size());
List<EsqlBaseParser.EntryExpressionContext> kvCtx = ctx.entryExpression();
for (EsqlBaseParser.EntryExpressionContext entry : kvCtx) {
String key = visitString(entry.string()).fold().toString().toLowerCase(Locale.ROOT); // make key case-insensitive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why calling fold here? (folding is a function of the optimizer) This is a string here nevertheless.
Also, same thing I mentioned earlier about case sensitivity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fold() is replaced by getText() and unquote().

namedArgs.add(l);
names.add(key);
} else {
throw new ParsingException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This is only possible when a parameter is provided as a map entry value. We take constants as map entry values, so parameters are valid inputs. And parameters can be for constant values, identifiers and patterns, if users misuse it and provide a parameter for identifiers or patterns, we end up with this code path. I added some tests in StatementParserTests to cover this.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fang-xing-esql
Copy link
Member Author

Thanks for reviewing @ivancea @carlosdelest @astefan !

@fang-xing-esql fang-xing-esql merged commit 11fbc8c into elastic:main Jan 16, 2025
17 checks passed
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Jan 16, 2025
@fang-xing-esql fang-xing-esql added the auto-backport Automatically create backport pull requests when merged label Jan 16, 2025
@fang-xing-esql
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jan 17, 2025
* MapExpression for functions (cherry picked from commit 11fbc8c) # Conflicts: #	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.interp #	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.java #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
svilen-mihaylov-elastic added a commit that referenced this pull request Mar 7, 2025
Adds options to QSTR function. #118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes #120933
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
Adds options to QSTR function. elastic#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes elastic#120933
svilen-mihaylov-elastic added a commit to svilen-mihaylov-elastic/elasticsearch that referenced this pull request Mar 18, 2025
Adds options to QSTR function. elastic#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes elastic#120933 (cherry picked from commit ee4bcac) # Conflicts: #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
svilen-mihaylov-elastic added a commit to svilen-mihaylov-elastic/elasticsearch that referenced this pull request Mar 18, 2025
Adds options to QSTR function. elastic#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes elastic#120933 (cherry picked from commit ee4bcac) # Conflicts: #	x-pack/plugin/esql/qa/testFixtures/src/main/resources/qstr-function.csv-spec #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…5114) * Added optional parameters to QSTR ES|QL function (#121787) Adds options to QSTR function. #118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes #120933 (cherry picked from commit ee4bcac) # Conflicts: #	x-pack/plugin/esql/qa/testFixtures/src/main/resources/qstr-function.csv-spec #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java * [CI] Auto commit changes from spotless * Fix compilation error * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2025
…5112) * Added optional parameters to QSTR ES|QL function (#121787) Adds options to QSTR function. #118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes #120933 (cherry picked from commit ee4bcac) # Conflicts: #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java #	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java * [CI] Auto commit changes from spotless * Fix compilation error * Fix missing import * getFirst() -> get(0) * Make method public * Make asBuilder public in subclasses * Revert "Make asBuilder public in subclasses" This reverts commit f444aa6. * Revert "Make method public" This reverts commit a1d9f56. * .asQueryBuilder() -> .toQueryBuilder() * Remove extraneous change which sneaked in during backport --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-release Trigger CI checks against release build v8.18.0 v9.0.0

6 participants