joegallo
Contributor

And use it in the remove processor.

Related to #123891. This is also a follow-up to #124322 and #125051, earlier nearby PRs that laid the groundwork for this change.

Prior to this change, we had to traverse the document tree twice in the remove processor for each field that we wanted to remove: once to check whether the field existed (in the hasField call), and then once to actually remove the field (in the removeField call). This was necessary because removeField would throw an exception if the field didn't exist, so the call had to be guarded. By adding an ignoreMissing parameter to removeField we can remove the hasField-guarding and just specify that we don't care if the field doesn't exist (well, assuming ignore_missing has been set to true on the processor itself, which it typically is in the wild).
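A minimal sketch of the before/after call pattern, assuming a simplified map-backed document with dotted field paths. `SimpleDocument`, `parentOf`, `removeGuarded`, `removeSingleTraversal`, and the `traversals` counter are hypothetical names for illustration only. This is not the actual `IngestDocument` code, which handles more cases than this does.

```java
import java.util.Map;

/** Hypothetical stand-in for an ingest document; NOT the real IngestDocument. */
final class SimpleDocument {
    private final Map<String, Object> root;
    int traversals = 0; // counts tree walks, for illustration only

    SimpleDocument(Map<String, Object> root) {
        this.root = root;
    }

    /** Walks to the map holding the last path segment; null if an intermediate segment is missing. */
    @SuppressWarnings("unchecked")
    private Map<String, Object> parentOf(String[] segments) {
        traversals++;
        Map<String, Object> current = root;
        for (int i = 0; i < segments.length - 1; i++) {
            Object next = current.get(segments[i]);
            if (!(next instanceof Map)) {
                return null;
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    boolean hasField(String path) {
        String[] segments = path.split("\\.");
        Map<String, Object> parent = parentOf(segments);
        return parent != null && parent.containsKey(segments[segments.length - 1]);
    }

    /** Old-style removal: throws if the field does not exist. */
    void removeField(String path) {
        removeField(path, false);
    }

    /** Removal with an ignoreMissing flag: a missing field is only an error when the flag is false. */
    void removeField(String path, boolean ignoreMissing) {
        String[] segments = path.split("\\.");
        Map<String, Object> parent = parentOf(segments);
        String leaf = segments[segments.length - 1];
        if (parent != null && parent.containsKey(leaf)) {
            parent.remove(leaf);
        } else if (ignoreMissing == false) {
            throw new IllegalArgumentException("field [" + path + "] not present");
        }
    }

    /** Before: guard with hasField, so an existing field costs two walks. */
    static void removeGuarded(SimpleDocument doc, String path) {
        if (doc.hasField(path)) {
            doc.removeField(path);
        }
    }

    /** After: one walk, and a missing field is silently skipped. */
    static void removeSingleTraversal(SimpleDocument doc, String path) {
        doc.removeField(path, true);
    }
}
```

In this sketch, removing an existing field through `removeGuarded` walks the tree twice, while `removeSingleTraversal` walks it once and still tolerates a missing field.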

I'm labeling this as a >refactoring since there's no user-visible change in behavior; I'm just twiddling the code a bit so that it happens to be faster. On which note, this cuts the time spent in the remove processor by about 30%: I'm seeing 290 microseconds per document rather than 413 on main. (A further note: prior to #120573 it was taking 681 microseconds per document for the same benchmark.)
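For reference, the percentages follow from the per-document timings quoted above. A small sketch of the arithmetic (the class and method names are hypothetical):

```java
/** Hypothetical helper for checking the quoted timings; not part of the PR. */
final class TimingMath {
    /** Percentage by which per-document time dropped going from before to after. */
    static double percentReduction(double beforeMicros, double afterMicros) {
        return (beforeMicros - afterMicros) / beforeMicros * 100.0;
    }
}
```

With the numbers above, `percentReduction(413, 290)` comes out to roughly 29.8, and `percentReduction(681, 290)` to roughly 57.4 for the two changes combined.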

@joegallo joegallo added the :Data Management/Ingest Node, >refactoring, Team:Data Management, v8.19.0, and v9.1.0 labels Mar 19, 2025
@joegallo joegallo requested a review from parkertimmins March 19, 2025 15:12
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo joegallo added the auto-backport label Mar 19, 2025
@joegallo joegallo merged commit e210ea8 into elastic:main Mar 19, 2025
17 checks passed
@joegallo joegallo deleted the ingest-document-remove-field-ignore-missing branch March 19, 2025 20:55
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x
@joegallo
Contributor Author

[Image: Screenshot 2025-03-24 at 9 24 42 AM]

Here's a screenshot from the nightly benchmarks -- there's a very nice decrease in the time spent in remove processors due to #120573, but the additional contribution from this PR also sticks out. Overall we're spending about 60% less time in remove processors during this benchmark as a result of these two PRs. Not bad!
