-This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
+This OTEP provides guidance on how to record errors using OpenTelemetry Logs
+focusing on minimizing duplication and providing context to reduce the noise.
+
+In the long term, errors recorded on logs **will replace span events**
+(according to [Event vision OTEP](./0265-event-vision.md)).
+
+> [!NOTE]
+> Throughout the OTEP *exception* and *error* are used in the following way:
+> - *Error* refers to a general concept describing any non-success condition,
+> which may manifest as an exception, non-successful status code, or an invalid
+> response.
+> - *Exception* specifically refers to runtime exceptions and their associated stack traces.

 ## Motivation

-Today OTel supports recording exceptions using span events available through Trace API. Outside of OTel world, exceptions are usually recorded by user apps and libraries using logging libraries and may be recorded as OTel logs via logging bridge.
+Today OTel supports recording *exceptions* using span events available through the Trace API. Outside of the OTel world,
+*errors* are usually recorded by user apps and libraries using logging libraries
+and may be recorded as OTel logs via a logging bridge.

-Exceptions recorded on logs have the following advantages over span events:
+Errors recorded on logs have the following advantages over span events:

 - they can be recorded for operations that don't have any tracing instrumentation
 - they can be sampled along with or separately from spans
-- they can have different severity levels to reflect how critical the exception is
+- they can have different severity levels to reflect how critical the error is
 - they are already reported natively by many frameworks and libraries

-Recording exceptions is essential for troubleshooting, but regardless of how exceptions are recorded, they could be noisy:
+Recording errors is essential for troubleshooting, but regardless of how they are recorded, they can be noisy:

-- distributed applications experience transient errors at the rate proportional to their scale and exceptions in logs could be misleading -
-individual occurrence of transient errors are not necessarily indicative of a problem.
+- distributed applications experience transient errors at a rate proportional to their scale, and
+errors in logs can be misleading: individual occurrences of transient errors
+are not necessarily indicative of a problem.
 - exception stack traces can be huge. Corresponding attribute value can frequently reach several KBs resulting in high costs
-associated with ingesting and storing them. It's also common to log exceptions multiple times while they bubble up
-leading to duplication and aggravating the verbosity problem.
+associated with ingesting and storing them. It's also common to log errors multiple times
+as they bubble up, leading to duplication and aggravating the verbosity problem.
+- severity depends on the context and, in the general case, is not known when the error
+occurs. Errors are frequently handled (suppressed, retried, ignored) by the caller.
+
+In this OTEP, we'll provide guidance around recording errors that minimizes duplication,
+allows reducing noise with configuration, and allows capturing errors in the
+absence of a recorded span.

-In this OTEP, we'll provide guidance around recording exceptions that minimizes duplication, allows reducing noise with configuration, and
-allows capturing exceptions in the absence of a recorded span.
+This guidance applies to general-purpose instrumentations including natively
+instrumented libraries.

-This guidance applies to general-purpose instrumentations including native ones. Application developers should consider following it as a
-starting point, but they are encouraged to adjust it to their needs.
+Application developers should consider following it as a starting point, but
+they are encouraged to adjust it to their needs.

 ## Guidance

 This guidance boils down to the following:

-Instrumentations SHOULD record exception information (along with other context) as a log record with appropriate severity.
-Only unhandled exceptions SHOULD be recorded as `Error` or higher. Instrumentations SHOULD do the best effort to report
-each exception once.
+Instrumentations SHOULD record error information along with relevant context as
+a log record with appropriate severity.

-Instrumentations SHOULD provide the whole exception instance to the OTel SDK so it can
-record it fully or partially based on provided configuration. The default SDK behavior SHOULD
-be to record exception stack traces when logging exceptions at `Error` or higher severity.
+Instrumentations SHOULD set severity to `Error` or higher only when the log describes a
+problem affecting application functionality, availability, performance, security or
+another aspect important for this type of application.

-In the long term, exceptions recorded on logs will replace span events (according to [Event vision OTEP](./0265-event-vision.md)).
+When an instrumentation records an exception, it SHOULD provide
+the whole exception instance to the OTel SDK so the SDK can record it fully or
+partially based on provided configuration. The default SDK behavior SHOULD
+be to record exception stack traces when logging exceptions at `Error` or higher severity.
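
The default behavior described above (stack traces only at `Error` or higher, with a configurable threshold) can be illustrated with a minimal sketch. It uses only Python's standard `logging` module rather than any OTel API; the class name `StackTraceThresholdFilter` and the record attributes `exception_type` and `exception_message` are assumptions made for illustration, not names defined by this OTEP or by OpenTelemetry.

```python
import logging


class StackTraceThresholdFilter(logging.Filter):
    """Hypothetical filter: always keep the exception type and message,
    but keep the stack trace only at or above `threshold`."""

    def __init__(self, threshold: int = logging.ERROR) -> None:
        super().__init__()
        self.threshold = threshold

    def filter(self, record: logging.LogRecord) -> bool:
        exc_info = record.exc_info
        has_exception = isinstance(exc_info, tuple) and exc_info[0] is not None
        if has_exception and record.levelno < self.threshold:
            # Keep a compact description of the exception on the record
            # (attribute names are illustrative only).
            record.exception_type = exc_info[0].__name__
            record.exception_message = str(exc_info[1])
            # Drop the traceback so formatters do not render it.
            record.exc_info = None
            record.exc_text = None
        # Never drop the record itself; only the stack trace is conditional.
        return True
```

Attaching the filter with `logger.addFilter(StackTraceThresholdFilter())` keeps the caller's side unchanged: the whole exception is still passed via `exc_info`, and the decision to record it fully or partially is made by configuration.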

 ### Details

-1. Exceptions SHOULD be recorded as [logs](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md)
-or [log-based events](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/events.md)
+1. Errors SHOULD be recorded on [logs](https://github.com/open-telemetry/semantic-conventions/blob/v1.29.0/docs/exceptions/exceptions-logs.md)
+or as [log-based events](https://github.com/open-telemetry/semantic-conventions/blob/v1.29.0/docs/general/events.md)

-2. Instrumentations for incoming requests, message processing, background job execution, or others that wrap user code and usually
-create local root spans, SHOULD record logs for unhandled exceptions with `Error` severity.
+2. Instrumentations for incoming requests, message processing, background job execution, or others that wrap application code and usually
+create local root spans, SHOULD record logs for unhandled errors with `Error` severity.

 Some runtimes provide global exception handler that can be used to log exceptions.
 Priority should be given to the instrumentation point where the operation context is available.
 Language SIGs are encouraged to give runtime-specific guidance. For example, here is the
-3. Natively instrumented libraries SHOULD record a log describing an exception and the context it happened in
-as soon as the exception is detected (or where the most context is available).
+3. Natively instrumented libraries SHOULD record a log describing an error and the context it happened in
+as soon as the error is detected (or where the most context is available).

-4. It's NOT RECOMMENDED to record the same exception as it propagates through the stack frames, or
+4. It's NOT RECOMMENDED to record the same error as it propagates through the call stack, or
 to attach the same instance of an exception to multiple log records.

-5. An exception (or error) SHOULD be logged with appropriate severity depending on the available context.
+5. An error SHOULD be logged with appropriate severity depending on the available context.

-- Exceptions or errors that don't indicate actual issues SHOULD be recorded with
+- Errors that don't indicate actual issues SHOULD be recorded with
 severity not higher than `Info`.

-Such exceptions can be used to control application logic and have a minor impact, if any,
-on application functionality, availability, or performance.
+Such errors can be used to control application logic and have a minor impact, if any,
+on application functionality, availability, or performance (beyond the performance hit introduced
+if an exception is used to control application logic).

 Examples:

-- exception is thrown when checking optional dependency or resource existence.
-- exception thrown when client disconnects before reading full response from the server
+- error is returned when checking optional dependency or resource existence.
+- exception is thrown on the server when client disconnects before reading
+full response from the server

-- Exceptions or errors that are expected to be retried or handled by the caller or another
-layer of the component SHOULD be recorded with severity not higher than `Warning`.
+- Errors that are expected to be retried or handled by the caller or another
+layer of the component SHOULD be recorded with severity not higher than `Warn`.

-Such exceptions represent transient failures that are common and expected in
+Such errors represent transient failures that are common and expected in
 distributed applications. They typically increase the latency of individual
 operations and have a minor impact on overall application availability.

@@ -108,40 +132,40 @@ In the long term, exceptions recorded on logs will replace span events (accordin
 - remote dependency returned 503 "Service Unavailable" response for 5 times in a row,
 retry attempts are exhausted and the corresponding operation has failed.

-- Unhandled (by the user code) exceptions that don't result in application shutdown SHOULD
-be recorded with severity `Error`
+- Unhandled (by the application code) errors that don't result in application
+shutdown SHOULD be recorded with severity `Error`

-These exceptions are not expected and may indicate a bug in the application logic
+These errors are not expected and may indicate a bug in the application logic
 that this application instance was not able to recover from or a gap in the error
 handling logic.

 Examples:

 - Background job terminates with an exception
-- HTTP framework error handler catches exception thrown by the user code.
+- HTTP framework error handler catches exception thrown by the application code.

 Note: some frameworks use exceptions as a communication mechanism when request fails. For example,
 Spring users can throw [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
 exception to return unsuccessful status code. Such exceptions represent errors already handled by the application code.
-Application code, in this case, is expected to logs error at appropriate severity and
-general-purpose instrumentation SHOULD NOT record such exceptions.
+Application code, in this case, is expected to log this at appropriate severity.
+General-purpose instrumentation MAY record such errors, but at severity not higher than `Warn`.

-- Exceptions or errors that result in application shutdown SHOULD be recorded with severity `Fatal`.
+- Errors that result in application shutdown SHOULD be recorded with severity `Fatal`.

 - The application detects an invalid configuration at startup and shuts down.
 - The application encounters a (presumably) terminal error, such as an out-of-memory condition.

-1. When recording exception on logs, user applications and instrumentations are encouraged to add additional attributes
+6. When recording exceptions on logs, applications and instrumentations are encouraged to add additional attributes
 to describe the context that the exception was thrown in.
 They are also encouraged to define their own error events and enrich them with exception details.

-2. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to
+7. OTel SDK SHOULD record stack traces on exceptions with severity `Error` or higher and SHOULD allow users to
 change the threshold.

 See [logback exception config](https://logback.qos.ch/manual/layouts.html#ex) for an example of configuration that
 records stack trace conditionally.

-3. Instrumentation libraries that record exceptions using span events SHOULD gracefully migrate
+8. Instrumentation libraries that record exceptions using span events SHOULD gracefully migrate
 to log-based exceptions offering it as an opt-in feature first and then switching to log-based exceptions
 in the next major version update.
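
The severity tiers in item 5 can be summarized as a lookup from what the recording code knows about an error to the highest severity it should use. The sketch below is one possible reading, written against Python's standard `logging` module (where `WARNING` and `CRITICAL` stand in for `Warn` and `Fatal`). `ErrorOutcome`, `severity_for`, and `record_error` are hypothetical names introduced for illustration and are not part of any OTel API.

```python
import logging
from enum import Enum
from typing import Optional


class ErrorOutcome(Enum):
    """Hypothetical classification of what is known when the error is recorded."""

    NOT_AN_ISSUE = "not_an_issue"   # e.g. optional resource does not exist
    HANDLED_BY_CALLER = "handled"   # e.g. transient failure that will be retried
    UNHANDLED = "unhandled"         # escaped application code, process keeps running
    SHUTDOWN = "shutdown"           # application is terminating


# Highest severity suggested for each outcome, following item 5 above.
_SEVERITY = {
    ErrorOutcome.NOT_AN_ISSUE: logging.INFO,
    ErrorOutcome.HANDLED_BY_CALLER: logging.WARNING,
    ErrorOutcome.UNHANDLED: logging.ERROR,
    ErrorOutcome.SHUTDOWN: logging.CRITICAL,
}


def severity_for(outcome: ErrorOutcome) -> int:
    """Return the stdlib logging level used for the given outcome."""
    return _SEVERITY[outcome]


def record_error(
    logger: logging.Logger,
    outcome: ErrorOutcome,
    message: str,
    exc: Optional[BaseException] = None,
    **context: object,
) -> None:
    """Record the error once, with the whole exception instance and its context.

    Keys in `context` become attributes on the log record, so they must not
    collide with built-in LogRecord attribute names such as `message`.
    """
    logger.log(severity_for(outcome), message, exc_info=exc, extra=context)
```

A call such as `record_error(log, ErrorOutcome.HANDLED_BY_CALLER, "dependency returned 503", exc=err, attempt=2)` records a retried failure at `WARNING` and leaves it to logging configuration to decide whether the stack trace is kept.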

@@ -163,15 +187,15 @@ OTel Logs API SHOULD provide methods that enrich log record with exception detai
 OTel SDK, based on the log severity and configuration, SHOULD record exception details fully or partially.

 The signature of the method is to be determined by each language
-and can be overloaded as appropriate including ability to collect and customize stack trace
+and can be overloaded as appropriate including ability to customize stack trace
 collection.

-It MUST be possible to efficiently set exception information on a log record without
-using the `setException` method.
+It MUST be possible to efficiently set exception information on a log record based on configuration
+and without using the `setException` method.

 ## Examples

-### Logging exception from client library in a user application
+### Logging errors from client library in a user application
@@ -334,12 +360,13 @@ See [corresponding Java (tracing) instrumentation](https://github.com/open-telem

 ## Trade-offs and mitigations

-1. Breaking change for any component following existing [exception guidance](/specification/trace/exceptions.md) which recommends recording exceptions as span events in every instrumentation that detects them.
+1. Switching from recording exceptions as span events to log records is a breaking change
+for any component following existing [exception guidance](/specification/trace/exceptions.md).

 **Mitigation:**
 - OpenTelemetry API and/or SDK in the future may provide opt-in span events -> log-based events conversion,
-but that's not enough - instrumentations will have to change their behavior to report exception logs
-with appropriate severity (or stop reporting them).
+but that's not enough - instrumentations will have to change their behavior to report errors
+as logs with appropriate severity.
 - We should provide opt-in mechanism for existing instrumentations to switch to logs.

 2. Recording exceptions as log-based events would result in UX degradation for users
@@ -355,12 +382,8 @@ Alternatives:

 1. Deduplicate exception info by marking exception instances as logged.
 This can potentially mitigate the problem for existing application when it logs exceptions extensively.
-We should still provide optimal guidance for the greenfield applications and libraries.
-
-2. Log full exception info only when exception is thrown for the first time.
-This results in at-most-once logging, but even this is known to be problematic since absolute
-majority of exceptions are handled.
-It also relies on the assumption that most libraries will follow this guidance.
+We should still provide optimal guidance for the greenfield applications and libraries,