
bisgaard-itis
Contributor

@bisgaard-itis bisgaard-itis commented Sep 25, 2025

N.B.

I re-requested reviews from everyone because I had to make quite a lot of changes to how we configure tracing.

What do these changes do?

  • Enable a parent-based tracing sampling strategy throughout the simcore services. This is a head-based sampling strategy (see osparc-ops-environments#1090, "Configure our opentelemetry tracing so we can sample it and enable opentelemetry tracing in prod"). This means the first service that a request entering the stack hits decides whether the request's trace is sampled, based on a probability configured via env vars. If the trace is sampled, all subsequent services sample it as well. This is one of the simplest sampling strategies that guarantees every sampled trace is complete (no missing spans).
  • Unit tests have been added here to test the sampling strategy, but once this is merged into master we should also test it there with a lower sampling probability (I suggest 0.1) and eventually propagate this to PROD.
  • While developing the tests, I discovered that the TracerProvider we use from the opentelemetry library is global, which limits what we can do in our tests. Hence, I switched to a local TracerProvider that is passed around explicitly everywhere. The aiohttp server instrumentation lib does not support this (see open-telemetry/opentelemetry-python-contrib#3801, "Allow passing a TracerProvider when instrumenting an aiohttp server"), so I had to fix that lib's middleware.
  • This refactoring gave me the opportunity to disable tracing in many tests.
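
The parent-based head sampling described above can be modelled in a few lines. The sketch below is dependency-free and assumes the decision travels between services in the W3C Trace Context `traceparent` header (bit 0 of the trace-flags means "sampled"). The helper names (`root_should_sample`, `handle`, `simulate`) are hypothetical and are not simcore's code. In the real services this logic lives in the OpenTelemetry SDK, which offers the same combination as `ParentBased(root=TraceIdRatioBased(p))`; that simcore configures exactly that sampler is an assumption here.

```python
from __future__ import annotations

import secrets

# Only the lower 64 bits of the 128-bit trace ID are compared against a bound,
# in the same spirit as OpenTelemetry's TraceIdRatioBased sampler.
TRACE_ID_LIMIT = (1 << 64) - 1
# W3C Trace Context: bit 0 of trace-flags means "sampled".
SAMPLED_FLAG = 0x01


def root_should_sample(trace_id: int, probability: float) -> bool:
    """Head decision, taken only by the first service a request reaches."""
    bound = round(probability * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def make_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Build a version-00 traceparent header value."""
    flags = SAMPLED_FLAG if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str) -> tuple[int, bool]:
    """Return (trace_id, sampled) from a traceparent header value."""
    _version, trace_id_hex, _parent_id_hex, flags_hex = header.split("-")
    return int(trace_id_hex, 16), bool(int(flags_hex, 16) & SAMPLED_FLAG)


def handle(incoming: str | None, probability: float) -> tuple[str, bool]:
    """Parent-based sampling for one service hop (hypothetical helper).

    With a parent, reuse its decision; without one, this service is the
    entry point and makes the probabilistic head decision itself.
    """
    if incoming is None:
        trace_id = secrets.randbits(128) or 1  # all-zero trace IDs are invalid
        sampled = root_should_sample(trace_id, probability)
    else:
        trace_id, sampled = parse_traceparent(incoming)
    span_id = secrets.randbits(64) or 1  # all-zero span IDs are invalid
    return make_traceparent(trace_id, span_id, sampled), sampled


def simulate(n_requests: int, probability: float, hops: int = 3) -> tuple[int, int]:
    """Send requests through a chain of services; count kept vs dropped traces."""
    kept = dropped = 0
    for _ in range(n_requests):
        header: str | None = None
        decisions = []
        for _hop in range(hops):
            header, sampled = handle(header, probability)
            decisions.append(sampled)
        # Every hop agrees with the entry service, so a trace is complete or absent.
        assert all(decisions) or not any(decisions)
        if decisions[0]:
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = simulate(10_000, probability=0.1)
    print(f"kept {kept} complete traces, dropped {dropped}")
```

With probability 0.1 (the value suggested above for master), roughly one request in ten produces a trace, and each kept trace has a span from every hop. The trade-off of head sampling is that the decision is made before anything is known about the request, so failing requests are sampled no more often than successful ones.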

Related issue/s

How to test

  • Unit tests have been added to service-library test suite

Dev-ops

@bisgaard-itis bisgaard-itis self-assigned this Sep 25, 2025
@bisgaard-itis bisgaard-itis added the a:infra+ops maintenance of infrastructure or operations (discussed in retro) label Sep 25, 2025
@bisgaard-itis bisgaard-itis added this to the Cheops milestone Sep 25, 2025
@bisgaard-itis bisgaard-itis marked this pull request as ready for review September 25, 2025 08:36
@bisgaard-itis bisgaard-itis changed the title 1090 implement sampling tracing strategy ✨ Implement sampling tracing strategy Sep 25, 2025

codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 71.38889% with 103 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.63%. Comparing base (98bd68a) to head (009d00c).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8421      +/-   ##
==========================================
+ Coverage   87.32%   87.63%   +0.30%
==========================================
  Files        1877     1999     +122
  Lines       73118    77828    +4710
  Branches     1333     1338       +5
==========================================
+ Hits        63850    68201    +4351
- Misses       8869     9227     +358
- Partials      399      400       +1
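
For reference, the project figures in the coverage diff above can be reproduced from the hit/miss counts. Coverage here is hits / (hits + misses + partials), and the counts add up (63850 + 8869 + 399 = 73118 lines on master). This sketch also assumes Codecov's default of rounding down to two decimals, which would explain why the delta reads +0.30% rather than +0.31%; this repo's codecov.yml is not shown, so treat that as an assumption. The helper names are illustrative.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent; partially covered lines count as not covered."""
    return 100 * hits / (hits + misses + partials)


def round_down(value: float, precision: int = 2) -> float:
    """Round toward negative infinity at `precision` decimals (assumed Codecov default)."""
    factor = 10 ** precision
    return math.floor(value * factor) / factor


base = coverage_pct(hits=63850, misses=8869, partials=399)  # master
head = coverage_pct(hits=68201, misses=9227, partials=400)  # this PR

print(round_down(base), round_down(head), round_down(head - base))
```

The same formula fits the patch figure: assuming the 103 missing lines are misses plus partials, 71.38889% corresponds to 257 of 360 changed lines being hit (257 / 360 = 0.713888...).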
Flag Coverage Δ
integrationtests 64.12% <50.90%> (+0.43%) ⬆️
unittests 86.32% <71.38%> (+0.32%) ⬆️
Components Coverage Δ
pkg_aws_library 93.59% <ø> (ø)
pkg_celery_library 84.27% <ø> (ø)
pkg_dask_task_models_library 79.33% <ø> (ø)
pkg_models_library 93.07% <ø> (ø)
pkg_notifications_library 85.20% <ø> (ø)
pkg_postgres_database 87.95% <ø> (ø)
pkg_service_integration 70.17% <ø> (ø)
pkg_service_library 70.98% <75.75%> (+0.06%) ⬆️
pkg_settings_library 90.20% <100.00%> (+0.01%) ⬆️
pkg_simcore_sdk 84.95% <ø> (ø)
agent 93.10% <45.45%> (-0.44%) ⬇️
api_server 91.88% <70.58%> (∅)
autoscaling 95.72% <81.81%> (+<0.01%) ⬆️
catalog 92.27% <45.45%> (-0.09%) ⬇️
clusters_keeper 99.14% <90.00%> (+<0.01%) ⬆️
dask_sidecar 92.37% <100.00%> (+0.58%) ⬆️
datcore_adapter 97.95% <90.00%> (+0.01%) ⬆️
director 75.72% <50.00%> (-0.18%) ⬇️
director_v2 90.93% <74.07%> (+0.62%) ⬆️
dynamic_scheduler 96.84% <68.75%> (-0.07%) ⬇️
dynamic_sidecar 90.44% <77.77%> (+0.01%) ⬆️
efs_guardian 89.83% <81.81%> (+0.09%) ⬆️
invitations 90.90% <54.54%> (-0.52%) ⬇️
payments 92.80% <76.47%> (+0.08%) ⬆️
resource_usage_tracker 92.22% <87.50%> (+0.17%) ⬆️
storage 86.49% <40.00%> (-0.26%) ⬇️
webclient ∅ <ø> (∅)
webserver 87.28% <68.42%> (-0.04%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98bd68a...009d00c. Read the comment docs.

Member

@sanderegg sanderegg left a comment

you should:

  • add the Red lamp to your PR description since it seems you wanna test stuff
  • create MRs in the osparc-config repos concerning your ENV var
Member

@pcrespov pcrespov left a comment

thx

@bisgaard-itis bisgaard-itis changed the title ✨ Implement sampling tracing strategy 🚨✨ Implement sampling tracing strategy Sep 25, 2025
Contributor

mergify bot commented Sep 25, 2025

🧪 CI Insights

Here's what we observed from your CI run for 009d00c.

🟢 All jobs passed!

But CI Insights is watching 👀

@bisgaard-itis bisgaard-itis changed the title 🚨✨ Implement sampling tracing strategy 🚨✨ Implement tracing sampling strategy Sep 25, 2025
Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

👍

Contributor

@GitHK GitHK left a comment

thanks

Contributor

@GitHK GitHK left a comment

Looking good, just a few things that could be optimised

Member

@pcrespov pcrespov left a comment

thx for the effort!

Member

@mrnicegyu11 mrnicegyu11 left a comment

Very nice, left some comments but most are minor or questions. Thanks a lot for this big contribution 🎉

@mrnicegyu11 mrnicegyu11 changed the title 🚨✨ Implement tracing sampling strategy 🚨✨ Implement tracing sampling strategy (🚧 devops 🚧) Oct 9, 2025
@bisgaard-itis bisgaard-itis enabled auto-merge (squash) October 9, 2025 13:13
@bisgaard-itis
Contributor Author

@Mergifyio queue

@bisgaard-itis bisgaard-itis added the 🤖-automerge marks PR as ready to be merged for Mergify label Oct 9, 2025
Contributor

mergify bot commented Oct 9, 2025

queue

🟠 Waiting for conditions to match

  • -closed [📌 queue requirement]
  • any of: [🔀 queue conditions]
    • all of: [📌 queue conditions of queue default]
      • branch-protection-review-decision = APPROVED [🛑 GitHub branch protection]
      • label=🤖-automerge
      • #approved-reviews-by >= 2 [🛑 GitHub branch protection]
      • #approved-reviews-by>=2
      • #changes-requested-reviews-by = 0 [🛑 GitHub branch protection]
      • #changes-requested-reviews-by=0
      • #review-threads-unresolved = 0 [🛑 GitHub branch protection]
      • #review-threads-unresolved=0
      • -conflict
      • -draft
      • base=master
      • label!=🤖-do-not-merge
      • any of: [🛑 GitHub branch protection]
        • check-skipped = deploy to dockerhub
        • check-neutral = deploy to dockerhub
        • check-success = deploy to dockerhub
      • any of: [🛑 GitHub branch protection]
        • check-success = system-tests
        • check-neutral = system-tests
        • check-skipped = system-tests
      • any of: [🛑 GitHub branch protection]
        • check-success = unit-tests
        • check-neutral = unit-tests
        • check-skipped = unit-tests
      • any of: [🛑 GitHub branch protection]
        • check-success = check OAS' are up to date
        • check-neutral = check OAS' are up to date
        • check-skipped = check OAS' are up to date
      • any of: [🛑 GitHub branch protection]
        • check-success = integration-tests
        • check-neutral = integration-tests
        • check-skipped = integration-tests
      • any of: [🛑 GitHub branch protection]
        • check-success = build-test-images (frontend) / build-test-images
        • check-neutral = build-test-images (frontend) / build-test-images
        • check-skipped = build-test-images (frontend) / build-test-images
      • any of: [🛑 GitHub branch protection]
        • check-success = SonarCloud Code Analysis
        • check-neutral = SonarCloud Code Analysis
        • check-skipped = SonarCloud Code Analysis
  • -conflict [📌 queue requirement]
  • -draft [📌 queue requirement]
  • any of: [📌 queue -> configuration change requirements]
    • -mergify-configuration-changed
    • check-success = Configuration changed
@bisgaard-itis bisgaard-itis removed the 🤖-automerge marks PR as ready to be merged for Mergify label Oct 9, 2025
@bisgaard-itis bisgaard-itis disabled auto-merge October 9, 2025 13:15
@mrnicegyu11 mrnicegyu11 merged commit ff5ca9c into ITISFoundation:master Oct 9, 2025
93 of 95 checks passed
@bisgaard-itis bisgaard-itis deleted the 1090-implement-sampling-tracing-strategy branch October 9, 2025 14:35

Labels

a:infra+ops maintenance of infrastructure or operations (discussed in retro)

9 participants