Conversation

eyalkoren
Contributor

@eyalkoren eyalkoren commented Mar 26, 2025

Namespacing algorithm [EDIT 1]:

  1. start by checking whether the document is OTel or not. A document is considered OTel if:
    • resource exists as a key and the value is a map
    • resource either doesn't contain an attributes field, or contains an attributes field of type map
    • scope is either missing or a map
    • attributes is either missing or a map
    • body is either missing or a map
    • body either doesn't contain a text field, or contains a text field of type String
    • body either doesn't contain a structured field, or contains a structured field that is not of type String
  2. if it is OTel - return as is
  3. if it is not OTel:
    • create a new attributes map
    • create a new resource map with a single entry whose key is attributes and whose value is a new, empty map
    • move the following top level fields (if they exist) to the new attributes map: attributes, resource, span_id, body, severity_text and trace_id
    • add the new attributes and resource maps as top level fields
    • rename special keys (e.g. span.id, log.level) to OTel-compliant names: for each, look for a value first in the nested form and if not found look for a top level dotted field. The first value that is found is used for the renamed field
    • move all remaining top level fields, other than @timestamp, trace_id, span_id, severity_text, body, attributes, resource and scope to the new attributes map
    • flatten all non-array fields in attributes into dotted keys (arrays are kept as they are)
    • move specific attributes that describe resources from attributes to resource.attributes
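
The detection rules in step 1 can be sketched as a small predicate. This is a minimal Python illustration of the listed conditions, not the processor's actual Java code; the function names are hypothetical, and treating a key that is present with a null value as "not a map" is an assumption the description above does not settle.

```python
from typing import Any, Mapping


def _missing_or_map(doc: Mapping[str, Any], key: str) -> bool:
    """True if `key` is absent from `doc` or maps to a map."""
    # Assumption: a key present with a None value counts as "not a map".
    return key not in doc or isinstance(doc[key], Mapping)


def is_otel(doc: Mapping[str, Any]) -> bool:
    """Apply the step 1 checks to a parsed document (hypothetical helper)."""
    # resource must exist and be a map
    resource = doc.get("resource")
    if not isinstance(resource, Mapping):
        return False
    # resource.attributes is either missing or a map
    if not _missing_or_map(resource, "attributes"):
        return False
    # scope, attributes and body are each either missing or a map
    if not all(_missing_or_map(doc, key) for key in ("scope", "attributes", "body")):
        return False
    body = doc.get("body")
    if isinstance(body, Mapping):
        # body.text, if present, must be a string
        if "text" in body and not isinstance(body["text"], str):
            return False
        # body.structured, if present, must not be a string
        if "structured" in body and isinstance(body["structured"], str):
            return False
    return True
```

Under these assumptions a document with only `{"resource": {}}` passes, while one with a top-level attributes map but no resource map does not.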
@eyalkoren eyalkoren self-assigned this Mar 27, 2025
@eyalkoren eyalkoren added the >feature and :Data Management/Ingest Node labels Mar 27, 2025
@eyalkoren eyalkoren marked this pull request as ready for review March 27, 2025 16:06
@elasticsearchmachine
Collaborator

Hi @eyalkoren, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Data Management label Mar 27, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Member

@dakrone dakrone left a comment

Thanks Eyal, I left some initial comments. I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do? For example this doc:

{ "attributes": { "a": "b", "c": [1, 2, 3] }, "log.level": "1234" }

Becomes, after processing:

{ "resource": { "attributes": {} }, "severity_text": "1234", "attributes": { "attributes.a": "b", "attributes.c": [1, 2, 3] } }

Is that the desired behavior for an existing attributes field?
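
For reference, the transformation in this example can be reproduced with a minimal Python sketch of step 3 of the namespacing algorithm. It is an illustration under stated assumptions, not the processor's implementation: the function names are hypothetical, the rename table contains only the two keys mentioned in this conversation, dropping an emptied parent object after a rename is a guess, and `resource_attribute_keys` is a hypothetical parameter because the thread does not list which attributes describe resources.

```python
import copy
from typing import Any, Dict, Iterable, Tuple

# Top-level fields that are never moved in the "remaining fields" pass.
KEEP_TOP_LEVEL = {"@timestamp", "trace_id", "span_id", "severity_text",
                  "body", "attributes", "resource", "scope"}
# Pre-existing top-level fields that are namespaced first.
MOVE_FIRST = ("attributes", "resource", "span_id", "body", "severity_text", "trace_id")
# Illustrative subset of the special keys; only two-segment names are handled.
RENAMES = {"span.id": "span_id", "log.level": "severity_text"}


def _flatten(value: Dict[str, Any], prefix: str, out: Dict[str, Any]) -> None:
    """Flatten nested maps into dotted keys; arrays and scalars are kept as is."""
    for key, val in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            _flatten(val, path, out)
        else:
            out[path] = val


def _pop_nested_or_dotted(doc: Dict[str, Any], dotted: str) -> Tuple[bool, Any]:
    """Look for the nested form first, then for a top-level dotted field."""
    parent, _, leaf = dotted.partition(".")
    nested = doc.get(parent)
    if isinstance(nested, dict) and leaf in nested:
        value = nested.pop(leaf)
        if not nested:  # assumption: drop a parent left empty by the rename
            del doc[parent]
        return True, value
    if dotted in doc:
        return True, doc.pop(dotted)
    return False, None


def namespace_non_otel(doc: Dict[str, Any],
                       resource_attribute_keys: Iterable[str] = ()) -> Dict[str, Any]:
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    attributes: Dict[str, Any] = {}
    resource: Dict[str, Any] = {"attributes": {}}
    for key in MOVE_FIRST:
        if key in doc:
            attributes[key] = doc.pop(key)
    doc["attributes"] = attributes
    doc["resource"] = resource
    for old, new in RENAMES.items():
        found, value = _pop_nested_or_dotted(doc, old)
        if found:
            doc[new] = value
    for key in list(doc):
        if key not in KEEP_TOP_LEVEL:
            attributes[key] = doc.pop(key)
    flat: Dict[str, Any] = {}
    _flatten(attributes, "", flat)
    for key in resource_attribute_keys:
        if key in flat:
            resource["attributes"][key] = flat.pop(key)
    doc["attributes"] = flat
    return doc
```

Calling `namespace_non_otel` on the input document above yields the output document above: the original attributes map is moved under the new attributes map and flattened to `attributes.a` and `attributes.c`, and `log.level` is renamed to `severity_text`.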

@eyalkoren
Contributor Author

eyalkoren commented Mar 30, 2025

Answering the general question:

I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do?

We started off by trying to be more "clever" about this and merge OTel with non-OTel. Then we had to handle lots of corner cases, like:

  • if attributes exists and is not a map - it needs to go into a new attributes map
  • if resource exists and is not a map - it needs to go into a new attributes map
  • if there was an attributes map before that included a resource entry, and the top-level resource is not a map, we need to make sure that the new attributes.resource entry's value becomes an array that includes both values
  • same for resource.attributes
  • if both span.id and span_id exist - we need to do something about it, for example: make the new span_id an array with two values

And so forth.

So last week we decided to change the way we think about it: a document is either sent by an OTel-compliant shipper, or it is not. If it is not, there is no reason to treat the original fields as if they have OTel semantics. So even if it has a field with an OTel name, we can consider that a coincidence and namespace it. Given that, there is no reason to complicate things for the unlikely event where non-OTel documents contain fields with intended OTel semantics.

eyalkoren and others added 2 commits March 30, 2025 09:53
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
eyalkoren and others added 4 commits March 30, 2025 10:06
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@eyalkoren eyalkoren changed the title Adding EcsNamespaceProcessor Adding NormalizeToStreamProcessor May 29, 2025
@dakrone dakrone changed the title Adding NormalizeToStreamProcessor Adding NormalizeForStreamProcessor May 29, 2025
Contributor

@joegallo joegallo left a comment

I opened a ticket for us to track adding the periodic CI job that runs the otel-semver-crawler test bits, but I'm fine with that staying disabled for now on this PR.

I see also that there's a perhaps unfinished conversation about the wording of the documentation. I'm okay if that's either resolved here and now, or if we fuss with the wording of things in a follow up PR.

@joegallo
Contributor

I believe it's the case that this PR is intended to be backported to 8.19.0, so I'm going to add that label, and also add the label for attempting to automatically backport it. Feel free to remove those labels if this is not actually intended for 8.19.0, though.

@joegallo joegallo added the auto-backport and v8.19.0 labels May 30, 2025
@dakrone
Member

dakrone commented May 31, 2025

This should go to 8.19

@eyalkoren
Contributor Author

CI keeps being unhappy with random stuff that I don't think is related to this PR's contents.
Once we get a green build, are we good to merge?

Regarding the followup task of running the test nightly - do you want me to open a GH issue with a summary of what I already discussed with the delivery team?

@joegallo
Contributor

joegallo commented Jun 3, 2025

Once we get a green build, are we good to merge?

For my part, yes. 👍 (And quick, it's green!)

@joegallo
Contributor

joegallo commented Jun 3, 2025

Regarding the followup task of running the test nightly - do you want me to open a GH issue with summary of what I already discussed with the delivery team?

I'll reach out to you offline.

@joegallo joegallo merged commit d3d2d9b into elastic:main Jun 3, 2025
17 checks passed
elasticsearchmachine pushed a commit that referenced this pull request Sep 12, 2025
…or` (#134524)

Fixes field querying and writing logic for NormalizeForStreamProcessor so that it can function on both `classic` and `flexible` ingest pipeline access patterns.

NormalizeForStreamProcessor was added in #125699 with support for the default ingest node field access logic (now known as `classic` mode). We have since added support for the `flexible` access pattern in ingest pipelines, which allows for querying dotted field names and writing dotted field names when parent path elements are missing.

The NormalizeForStreamProcessor was written with the classic access pattern in mind. The processor was designed to look for singular field names and to rely on the classic field writing logic which creates intermediate parent objects when setting a value that is nested in the document. When flexible mode was enabled, the logic did not anticipate dotted field names that could be inconsistently accessible from the source map at certain points in the path notation. Further, the flexible access pattern does not create intermediate parent objects like before. A secondary renaming method was added to take these changes into account.
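
The difference between the two access patterns described in this commit message can be illustrated with a simplified Python sketch. It is not Elasticsearch's implementation: the function names are hypothetical, and the exact rule used here for flexible writes (descend through parents that already exist as maps, then write the remaining path as a single dotted key) is an assumption made for illustration.

```python
from typing import Any, Dict


def set_classic(source: Dict[str, Any], path: str, value: Any) -> None:
    """Classic-style write: create intermediate parent objects along the path."""
    *parents, leaf = path.split(".")
    node = source
    for segment in parents:
        node = node.setdefault(segment, {})
    node[leaf] = value


def set_flexible(source: Dict[str, Any], path: str, value: Any) -> None:
    """Flexible-style write (assumed rule): no intermediate parents are created."""
    segments = path.split(".")
    node = source
    # Descend only through parents that already exist as maps.
    while len(segments) > 1 and isinstance(node.get(segments[0]), dict):
        node = node[segments.pop(0)]
    # Whatever is left of the path is written as one dotted key.
    node[".".join(segments)] = value


def get_flexible(source: Dict[str, Any], path: str) -> Any:
    """Flexible-style read: try every nested/dotted split of the path."""
    if path in source:
        return source[path]
    segments = path.split(".")
    for i in range(1, len(segments)):
        child = source.get(".".join(segments[:i]))
        if isinstance(child, dict):
            try:
                return get_flexible(child, ".".join(segments[i:]))
            except KeyError:
                continue
    raise KeyError(path)
```

With these helpers, writing `resource.attributes.host` into an empty document produces nested objects under `set_classic` but a single dotted top-level key under `set_flexible`, which is the kind of mismatch a processor written only for the classic pattern would not anticipate.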
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Sep 17, 2025
…or` (elastic#134524)
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
…or` (elastic#134524)
Labels

auto-backport: Automatically create backport pull requests when merged
backport pending
:Data Management/Ingest Node: Execution or management of Ingest Pipelines including GeoIP
>feature
serverless-linked: Added by automation, don't add manually
Team:Data Management: Meta label for data/management team
v8.19.0
v9.1.0

5 participants