Skip to content

Conversation

joegallo
Copy link
Contributor

This change is 99% tests, and 1% removing a call to new HashMap(...) that shallowly copied an incoming map into a new map. It's certainly been convenient be able to write tests with Map.of(...) for the source of the document, but in reality the cost of re-hashing at scale at runtime is not zero (though it is small), and the idea that it's protecting us from much of anything to rehash in this way seems mistaken, especially since it's a only shallow re-hashing.

@joegallo joegallo added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v9.0.0 v8.18.0 labels Jan 24, 2025
@joegallo joegallo requested a review from masseyke January 24, 2025 19:40
@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

joegallo and others added 4 commits January 24, 2025 15:03
Honestly, shallowly rehashing the map is misleading at best -- the maps that back an ingest document are typically deeply nested, so a shallow rehash at the top-level doesn't do much for you.
@joegallo joegallo force-pushed the optimize-ingest-ctx-map-construction branch from f3bd49c to 1cde31e Compare January 24, 2025 20:03
Map<String, Object> source
) {
super(new HashMap<>(source), new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
super(source, new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way this can burn us in any existing scripts, is there? I don't think they can ever call this constructor, so it couldn't be a problem. Just trying to think through potential problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the ctx of a script processor, so by the time the script processor has a handle on it, it would have already been duplicated off the once (at the very very beginning new IngestService.newIngestDocument). It's not something you can call from a script processor.

I spent a fair amount of time reading through the various constructors and copy constructors of the IngestDocument and its friends, and I really do think I've covered all the bases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But philosophically, if it's a bug to not rehash, then what about the bug of re-hashing but only one layer deep -- these maps are almost always deeply nested, if this code had a job to do then it never actually did it.

Copy link
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joegallo joegallo merged commit 022b841 into elastic:main Jan 27, 2025
16 checks passed
@joegallo joegallo deleted the optimize-ingest-ctx-map-construction branch January 27, 2025 16:03
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team v8.18.0 v9.0.0

3 participants