- Notifications
You must be signed in to change notification settings - Fork 25.6k
Fix time-series geo_line to include reduce phase in MergedGeoLines #96953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix time-series geo_line to include reduce phase in MergedGeoLines #96953
Conversation
| Pinging @elastic/es-analytics-geo (Team:Analytics) |
| Hi @craigtaverner, I've created a changelog YAML for you. |
87361e0 to 383e605 Compare | Hi @craigtaverner, I've updated the changelog YAML for you. |
| Hi @craigtaverner, I've updated the changelog YAML for you. |
033baa2 to 4b7333e Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use static comparators? And also naturalOrdering and reverseOrder comparators (they make the code more readable in my opinion).
| LGTM, I just left a couple of comments about using static comparators... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me how do you test the case where you have the line split among two data nodes.
And test sort-order ASC
As well as shuffling the incoming geo_lines to verify that the merge sort re-orders them.
4b7333e to d59fc5f Compare 💚 Backport successful
|
…lastic#96953) Recently we added support for line-simplification in geo-lines when in time-series aggregations: elastic#94954. This work did not, however, cover the case of `MergedGeoLines`, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node. This PR rectifies that, and adds two fields to the InternalGeoLine: * nonOverlapping * re-sorting is *NOT* required because the sort value ranges do not overlap, so we only sort the set of incoming geo_lines (easy/fast) instead of every single point (complex/slow). * simplified * The approach to memory limiting is by line-simplification instead of truncation. * If the data nodes are doing line-simplification, we want the coordinating node to do that as well For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is: * They are not necessarily related concepts * We consider supporting line-simplification as a user-requested option in the non-time-series geo_line in future (much nicer than truncation) Fixes elastic#96983
…96953) (#97005) Recently we added support for line-simplification in geo-lines when in time-series aggregations: #94954. This work did not, however, cover the case of `MergedGeoLines`, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node. This PR rectifies that, and adds two fields to the InternalGeoLine: * nonOverlapping * re-sorting is *NOT* required because the sort value ranges do not overlap, so we only sort the set of incoming geo_lines (easy/fast) instead of every single point (complex/slow). * simplified * The approach to memory limiting is by line-simplification instead of truncation. * If the data nodes are doing line-simplification, we want the coordinating node to do that as well For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is: * They are not necessarily related concepts * We consider supporting line-simplification as a user-requested option in the non-time-series geo_line in future (much nicer than truncation) Fixes #96983
Recently we added support for line-simplification in geo-lines when in time-series aggregations: #94954. This work did not, however, cover the case of
MergedGeoLines, which occurs when the same geo_line covers multiple shards on data nodes, and needs to be merged on the coordinating node. The original work would use line-simplification and time-ordering on the data node, but then revert to re-sorting (unnecessary) and truncation (incorrect) on the coordinating node.This PR rectifies that, and adds two fields to the InternalGeoLine:
For time-series, both of the above are true, and at this point all other cases will have both false. The reason for two booleans is:
Fixes #96983