Skip to content

Conversation

@tedsuo
Copy link
Contributor

@tedsuo tedsuo commented Dec 2, 2024

This OTEP defines a EntityProvider, to enable the updating of resources that have a lifespan shorter than the lifespan of the application instance.

This OTEP is currently a draft. The primary open question is how to handle active spans that are already running when a resource change is made. Please see the OTEP for details.

@tigrannajaryan
Copy link
Member

I support the idea of allowing the Resources to be changed over time. We still need to make sure this complies with our stability guarantees and Resource spec.

We will need to specify how this will interact with the entities. The Entities SIG has a proposal about updating the Resource in a specific way and I want to make sure we don't introduce conflicting ideas that will be impossible to merge in the future.

@tedsuo tedsuo added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Dec 3, 2024
Copy link

@feldentm-SAP feldentm-SAP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I appreciate the document. I left picky comments to allow following my perception when reading it top-down without knowing more details.
Things that need to be changed:

  • structure and links to other documents need to be improved before releasing the document
  • the document should provide a clear context at the top and stay in that context
  • we might have to define elsewhere where which kinds of changes are expected
  • I struggle with the pseudo code syntax and the provided signatures seem to be incomplete; maybe adding explicit void could help
  • unless I got something completely wrong, the change is breaking and we should spend more effort on explaining why it is necessary and why it wasn't avoidable

## Example Implementation

Pseudocode examples for a possible Validator and ResourceProvider implementation. Attention is placed on making the ResourceProvider thread safe, without introducing any locking or synchronization overhead to `GetResource`, which is the only ResourceProvider method on the hot path for OpenTelemetry instrumentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is locking relevant here? it seems to be the current focus topic; I agree that it is important, but it isn't the most important topic, right? Using the API and migration should be the most important topics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is implementation example specific - I think we should alter the example implementation given the changes to the rest of the specification, but debating just using a link to a prototype instead.


```
// Example of a thread-safe ResourceProvider
class ResourceProvider{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not really understand the pseudo code syntax. It seems to be Go with classes and some strange extra syntax that I do not get. Also, explicit this is really uncommon and only used in languages that require it due to bad language design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this comment again - The syntax/language for pseudo-code isn't important here.

Are you able to understand what the goal of the interface is from the description and the example? If so, let's evaluate that, not choice of pseudo-code syntax.

If you have specific things you don't understand, list them so they can be addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add that regardless of the pseudo-code someone chooses to use, we request that the examples to be 100% explicit and not assume that the reader knows any implicit details about a particular programming language.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Dec 5, 2024

We have this wording in a Stable spec doc:

When used with distributed tracing, a resource can be associated with the TracerProvider when the TracerProvider is created. That association cannot be changed later.

How do we reconcile this OTEP with that last assertion? One possibility would be that a Resource directly associated with TracerProvider cannot be changed, but a Resource associated with the ResourceProvider can be replaced and thus the indirect association of the Resource with TracerProvider can change. I would want to confirm that this is an allowed modification of spec wording and we are not breaking our stability guarantees.

@jzwc
Copy link

jzwc commented Dec 7, 2024

Thanks for the comprehensive proposal.

Let me comment here on the whole topic from a narow position of a mobile, especially real user monitoring and analytics area.

The proposal, in its analysis, mixes, in my opinion, two, potentially several, different things together: the truly immutable resources (at least on mobile - like operating system, its version, hardware, etc.) and the mutable states of the device (connection QoS) and application (background-foreground, session, user, and their attributes).

My opinion is that just making resources mutable does not solve the apparent inadequacy of the open-telemetry model for the RUM and analytics realm.

The mutable state requires different handling than resources, on several levels. Not only does the state mutate, but it can also be different for different concurrently running bits of code. E.g., a background thread indeed is subject to the same mutable device state (online/offline), yet it can still run in a context of some state attributes in which it was spawned (e.g., session), although the respective app state on the main thread has already changed.

Thus, there seem to be at least three different “resources” realms:

  • the classic immutable resources as defined in OTel (with potential mutability proposed here)
  • mutable global state that applies to all running code of the application
  • mutable state that is typically global but can define different (typically temporary) local context for concurrently running code

Thus it would be, in my opinion, beneficial to narrow the “let us make resources mutable“ proposal to the use cases where the seemingly immutable (by current definition) resources may mutate.

As for other use-cases of mutable device and applications state, a dedicated solution would serve them better. The dedicated channel to sent the mutable state attributes, mentioned in a comment, seems as a sound solution for me. For the local “sticky” state attributes, they can perhaps be provided in the affected signal attributes to override the global state values.

@jsuereth
Copy link
Contributor

jsuereth commented Dec 9, 2024

Responding to @jzwc

Thus, there seem to be at least three different “resources” realms:

  1. the classic immutable resources as defined in OTel (with potential mutability proposed here)
  2. mutable global state that applies to all running code of the application
  3. mutable state that is typically global but can define different (typically temporary) local context for concurrently running code

I'm not sure if you've seen the OTEP where we split apart Resource, but imagine Resource now as a composable set of Entities (bundles). Resource represents the context in which telemetry was generated, but I think the thing we learned from Mobile is that the context is not the SDK itself but something more dynamic, and we're trying to respond to that here. IMO - This proposal is about your (3) - mutable state that is typically global but can define different (typically temporary) local context for concurrently running code.

Imagine that resource is composed of a set of bundles (entities). 90% of the entities are static for the lifetime of the SDK, e.g. OS, hardware, etc. A few change (like session). With this proposal and the [previous OTEP](mutable state that is typically global but can define different (typically temporary) local context for concurrently running code), you'd just be swapping up the entities that changes wholesale (e.g. Here's a new session, replace the old one and all information about it wholesale).

As for other use-cases of mutable device and applications state, a dedicated solution would serve them better.

This is also a thing with the Entities work, there would be a special "Entities" channel that could fire out state change events.

TL;DR; I agree on the surface that if you just viewed this proposal purely from resource perspective, I think we'd wind up mixing concerns as you suggest. If you layer in the Entities work, I think this should give us the tools we need to solve our problems.

@tedsuo
Copy link
Contributor Author

tedsuo commented Dec 12, 2024

Ok, big update! I've added Entities to the design. It might even make sense to rename this an EntityProvider instead of a ResourceProvider.

Regardless of the name, I'm positive that this design is not correct, as I'm not even sure which Entity document to be referring to at this point (I used OTEP 0256). But as Cunningham's Law states, "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

Please let me know what is missing, I would love to pair with one of the entity sponsors to rewrite this and add further details.

@tedsuo
Copy link
Contributor Author

tedsuo commented Dec 18, 2024

Latest update:

  • Added an AddEntity method to ResourceProvider
  • Clarified that the resource object should be regenerated entirely when an entity changes

The next big bugbear I would like to look at is how to handle the potential for thrashing resources, such as the networking stack on mobile rapidly changing.

One possible solution is to batch changes in a time window. Rather that have separate individual callbacks for EntityState or EntityDelete, there would be a single callback that receives a list of entity events along with the final computed resource. Adjusting the time window length makes a tradeoff between the maximum number of batch flushes that can be triggered and the accuracy of the resources on any particular batch. I don't know if that is a good solution. @bidetofevil what do you actually want here?

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 30, 2025
@tedsuo tedsuo reopened this Jul 31, 2025
@github-actions github-actions bot removed the Stale label Aug 1, 2025
@github-actions
Copy link

github-actions bot commented Aug 9, 2025

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 9, 2025
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Aug 17, 2025
After an entity is deleted, a new resource object MUST be generated by merging
the list of entities together in order.

`Delete Entity` MUST trigger the `On EntityState` operation for all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask why 2 operations were needed if the entire resource is passed to both and then I noticed here you even say that Delete triggers On EntityState and not On EntityDelete. Are both needed? And if so, should this be On EntityDelete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be On EntityDelete.

It's a good question about why you would need to. The assumption is that a "change" may do something different than a "removal".

However, in practice so far, I think just an On EntityState change event is probably enough, but want to see more prototyping and instrumentation to prove that.

cc entities folks for thoughts @tedsuo @dyladan @dmitryax @smith

@tedsuo tedsuo reopened this Aug 25, 2025
@tedsuo tedsuo requested review from a team as code owners August 25, 2025 16:09
@github-actions github-actions bot removed the Stale label Aug 26, 2025
Changes to Resource Provider Specification from prototyping
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 12, 2025
@jsuereth jsuereth removed the Stale label Sep 16, 2025
@jsuereth
Copy link
Contributor

We have this wording in a Stable spec doc:

When used with distributed tracing, a resource can be associated with the TracerProvider when the TracerProvider is created. That association cannot be changed later.

How do we reconcile this OTEP with that last assertion? One possibility would be that a Resource directly associated with TracerProvider cannot be changed, but a Resource associated with the ResourceProvider can be replaced and thus the indirect association of the Resource with TracerProvider can change. I would want to confirm that this is an allowed modification of spec wording and we are not breaking our stability guarantees.

This is now addressed in the OTEP. It no longer states this is non breaking and instead offers a path forward.

@jsuereth jsuereth dismissed feldentm-SAP’s stale review September 16, 2025 18:08

Proposal has been reworded, please re-review the updated proposal

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 24, 2025
@github-actions
Copy link

github-actions bot commented Oct 1, 2025

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog.opentelemetry.io OTEP OpenTelemetry Enhancement Proposal (OTEP) Stale