
@kruskall (Member)

Motivation/summary

The OTel span start/end timestamps are used to compute the event
duration of the APM event, stored as a time.Duration.
The duration is converted to nanoseconds and stored in an int64
field. However, the value is cast to int in the event_duration
pipeline, which overflows whenever the original 64-bit value
does not fit in a 32-bit int.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

Closes #9780

@kruskall kruskall requested a review from a team December 28, 2022 02:09
@kruskall kruskall changed the title from "Fix/event duration overflow" to "fix: cast event.duration to long in event_duration pipeline" Dec 28, 2022
@mergify (Contributor)

mergify bot commented Dec 28, 2022

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Dec 28, 2022
@marclop (Contributor) left a comment

Looks good, thanks for the quick fix. We should backport to 8.6 as well.

@kruskall kruskall added the backport-8.6 Automated backport with mergify label Dec 28, 2022
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Dec 28, 2022
@marclop (Contributor) left a comment

@kruskall can you also add a changelog entry, please?

@ghost commented Dec 28, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-12-28T09:14:19.434+0000

  • Duration: 20 min 9 sec

Test stats 🧪

Test Results
Failed 0
Passed 154
Skipped 0
Total 154

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ghost commented Dec 28, 2022

📚 Go benchmark report

Diff with the main branch

goos: linux, goarch: amd64 (all packages)

name                                          old time/op    new time/op     delta
pkg: github.com/elastic/apm-server/internal/agentcfg
FetchAndAdd/FetchFromCache-12                 41.2ns ± 0%    47.0ns ± 3%     +14.03%  (p=0.016 n=4+5)
FetchAndAdd/FetchAndAddToCache-12             89.4ns ± 2%    97.8ns ± 1%     +9.39%   (p=0.008 n=5+5)
pkg: github.com/elastic/apm-server/internal/beater/request
ContextReset/X-Real-IP_ipv4-12                876ns ± 8%     776ns ±15%      -11.49%  (p=0.032 n=5+5)
ContextReset/Remote_Addr_ipv4-12              432ns ± 4%     589ns ±12%      +36.52%  (p=0.016 n=4+5)
ContextResetContentEncoding/empty-12          110ns ± 1%     123ns ± 0%      +11.82%  (p=0.008 n=5+5)
ContextResetContentEncoding/uncompressed-12   129ns ± 0%     145ns ± 1%      +12.31%  (p=0.008 n=5+5)
pkg: github.com/elastic/apm-server/internal/publish
Publisher-12                                  1.00s ± 0%     1.00s ± 0%      -0.09%   (p=0.032 n=5+5)
pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
WriteTransaction/json_codec-12                4.12µs ± 5%    10.09µs ±72%    +145.28% (p=0.008 n=5+5)
WriteTransaction/json_codec_big_tx-12         4.89µs ± 2%    9.47µs ±52%     +93.78%  (p=0.008 n=5+5)
ReadEvents/json_codec/0_events-12             316ns ± 7%     355ns ± 5%      +12.51%  (p=0.008 n=5+5)
ReadEvents/json_codec/1_events-12             10.3µs ± 4%    10.7µs ± 2%     +3.64%   (p=0.032 n=5+5)
ReadEvents/nop_codec/0_events-12              303ns ± 7%     343ns ± 5%      +13.17%  (p=0.016 n=5+5)
ReadEvents/nop_codec_big_tx/0_events-12       311ns ± 7%     341ns ± 5%      +9.73%   (p=0.016 n=5+5)
IsTraceSampled/sampled-12                     69.9ns ± 3%    77.9ns ± 3%     +11.53%  (p=0.008 n=5+5)
IsTraceSampled/unsampled-12                   70.9ns ± 1%    80.4ns ± 1%     +13.33%  (p=0.008 n=5+5)
IsTraceSampled/unknown-12                     372ns ± 4%     419ns ± 2%      +12.76%  (p=0.008 n=5+5)

name                                          old alloc/op   new alloc/op    delta
pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
WriteTransaction/json_codec-12                3.00kB ± 0%    3.00kB ± 0%     +0.03%   (p=0.029 n=4+4)
WriteTransaction/json_codec_big_tx-12         3.77kB ± 0%    3.78kB ± 0%     +0.04%   (p=0.016 n=4+5)
ReadEvents/json_codec/199_events-12           1.10MB ± 0%    1.10MB ± 0%     +0.06%   (p=0.029 n=4+4)
ReadEvents/nop_codec_big_tx/100_events-12     251kB ± 0%     250kB ± 0%      -0.09%   (p=0.016 n=5+5)

name                                          old allocs/op  new allocs/op   delta
pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
ReadEvents/json_codec/399_events-12           5.90k ± 0%     5.90k ± 0%      +0.02%   (p=0.029 n=4+4)

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

@lahsivjar (Contributor) left a comment

LGTM other than the changelog entry Marc mentioned earlier.

@mergify (Contributor)

mergify bot commented Dec 28, 2022

This pull request is now in conflict. Could you fix it @kruskall? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

    git fetch upstream
    git checkout -b fix/event-duration-overflow upstream/fix/event-duration-overflow
    git merge upstream/main
    git push upstream fix/event-duration-overflow
@kruskall kruskall requested a review from marclop December 28, 2022 09:14
@kruskall (Member, Author) commented Dec 28, 2022

@marclop Added a changelog. Thanks for mentioning that! 🙇

@kruskall kruskall enabled auto-merge (squash) December 28, 2022 09:28
@kruskall kruskall merged commit f6872f4 into elastic:main Dec 28, 2022
@kruskall kruskall deleted the fix/event-duration-overflow branch December 28, 2022 09:35
mergify bot pushed a commit that referenced this pull request Dec 28, 2022
* fix: cast event.duration to long in event_duration pipeline

  The otel span start/end dates are used to compute the event duration of the APM event, stored as a time.Duration. The duration is converted to nanoseconds stored in an int64 field, however the values is casted to int in the event_duration pipeline leading to an overflow if the original 64 bit value cannot be casted safely to the 32-bit int.

* test: update otlp system-test to validate duration overflow fix
* changelog: add changelog entry

(cherry picked from commit f6872f4)

# Conflicts:
#	apmpackage/apm/changelog.yml
kruskall added a commit that referenced this pull request Dec 29, 2022
…ckport #9901) (#9910)

* fix: cast event.duration to long in event_duration pipeline (#9901)
  * fix: cast event.duration to long in event_duration pipeline
    The otel span start/end dates are used to compute the event duration of the APM event, stored as a time.Duration. The duration is converted to nanoseconds stored in an int64 field, however the values is casted to int in the event_duration pipeline leading to an overflow if the original 64 bit value cannot be casted safely to the 32-bit int.
  * test: update otlp system-test to validate duration overflow fix
  * changelog: add changelog entry
  (cherry picked from commit f6872f4)
  # Conflicts:
  #	apmpackage/apm/changelog.yml
* changelog: fix conflict and add 8.6 changelog
* changelog: fix changelog version
  8.6 is not released yet, do not use the explicit version in the changelog file.

Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com>
@SenthilkumarRasan

Do we have the above fix backported to the 8.4 version?

@simitt (Contributor) commented Jan 9, 2023

The bug fix will be part of 8.6.0, there is no further release planned for 8.4.

@SenthilkumarRasan

Is there an ETA for the 8.6 release? We are waiting on this fix to get proper data from OpenTelemetry for Jenkins pipelines.


Labels

backport-8.6 Automated backport with mergify

6 participants