Skip to content

Conversation

@craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Jun 20, 2023

Recently we added support for line-simplification in geo-lines when in time-series aggregations: #94954. This work did not, however, cover the case of MergedGeoLines, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node.

This PR rectifies that, and adds two fields to the InternalGeoLine:

  • nonOverlapping
    • re-sorting is NOT required because the sort value ranges do not overlap, so we only sort the set of incoming geo_lines (easy/fast) instead of every single point (complex/slow).
  • simplified
    • The approach to memory limiting is by line-simplification instead of truncation.
    • If the data nodes are doing line-simplification, we want the coordinating node to do that as well

For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is:

  • They are not necessarily related concepts
  • We consider supporting line-simplification as a user-requested option in the non-time-series geo_line in future (much nicer than truncation)

Fixes #96983

@craigtaverner craigtaverner added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 20, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner force-pushed the merged_geo_lines_simplifier branch from 87361e0 to 383e605 Compare June 21, 2023 09:33
@craigtaverner craigtaverner changed the title Support time-series (simplify) version of MergedGeoLines Fix time-series geo_line to include reduce phase in MergedGeoLines Jun 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

@craigtaverner craigtaverner force-pushed the merged_geo_lines_simplifier branch from 033baa2 to 4b7333e Compare June 21, 2023 14:16
Copy link
Contributor

@salvatore-campagna salvatore-campagna Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use static comparators? And also naturalOrdering and reverseOrder comparators (they make the code more readable in my opinion).

@salvatore-campagna
Copy link
Contributor

LGTM, I just left a couple of comments about using static comparators...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me how do you test the case where you have the line split among two data nodes.

@craigtaverner craigtaverner force-pushed the merged_geo_lines_simplifier branch from 4b7333e to d59fc5f Compare June 22, 2023 10:30
@craigtaverner craigtaverner added auto-backport-and-merge auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Jun 22, 2023
@elasticsearchmachine elasticsearchmachine merged commit df59704 into elastic:main Jun 22, 2023
@craigtaverner craigtaverner deleted the merged_geo_lines_simplifier branch June 22, 2023 11:13
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.9
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jun 22, 2023
…lastic#96953) Recently we added support for line-simplification in geo-lines when in time-series aggregations: elastic#94954. This work did not, however, cover the case of `MergedGeoLines`, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node. This PR rectifies that, and adds two fields to the InternalGeoLine: * nonOverlapping * re-sorting is *NOT* required because the sort value ranges do not overlap, so we only sort the set of incoming geo_lines (easy/fast) instead of every single point (complex/slow). * simplified * The approach to memory limiting is by line-simplification instead of truncation. * If the data nodes are doing line-simplification, we want the coordinating node to do that as well For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is: * They are not necessarily related concepts * We consider supporting line-simplification as a user-requested option in the non-time-series geo_line in future (much nicer than truncation) Fixes elastic#96983
elasticsearchmachine pushed a commit that referenced this pull request Jun 22, 2023
…96953) (#97005) Recently we added support for line-simplification in geo-lines when in time-series aggregations: #94954. This work did not, however, cover the case of `MergedGeoLines`, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node. This PR rectifies that, and adds two fields to the InternalGeoLine: * nonOverlapping * re-sorting is *NOT* required because the sort value ranges do not overlap, so we only sort the set of incoming geo_lines (easy/fast) instead of every single point (complex/slow). * simplified * The approach to memory limiting is by line-simplification instead of truncation. * If the data nodes are doing line-simplification, we want the coordinating node to do that as well For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is: * They are not necessarily related concepts * We consider supporting line-simplification as a user-requested option in the non-time-series geo_line in future (much nicer than truncation) Fixes #96983
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.9.0 v8.10.0

4 participants