 * [Logging errors from client library in a user application](#logging-errors-from-client-library-in-a-user-application)
 * [Logging errors inside the natively instrumented Library](#logging-errors-inside-the-natively-instrumented-library)
@@ -19,7 +20,7 @@

 <!-- tocstop -->

-This OTEP provides guidance on how to record errors using OpenTelemetry Logs
+This OTEP provides guidance on how to record errors using OpenTelemetry Logs,
 focusing on minimizing duplication and providing context to reduce the noise.

 In the long term, errors recorded on logs **will replace span events**
@@ -35,7 +36,7 @@ In the long term, errors recorded on logs **will replace span events**

 ## Motivation

-Today OTel supports recording *exceptions* using span events available through the Trace API. Outside the OTel world,
+Today, OTel supports recording *exceptions* using span events available through the Trace API. Outside the OTel world,
 *errors* are usually recorded by user apps and libraries by using logging libraries,
 and may be recorded as OTel logs via a logging bridge.

@@ -48,20 +49,20 @@ Using logs to record errors has the following advantages over using span events:

 Recording errors is essential for troubleshooting, but regardless of how they are recorded, they could be noisy:

-- distributed applications experience transient errors at the rate proportional to their scale and
-errors in logs could be misleading - individual occurrences of transient errors
+- distributed applications experience transient errors at a rate proportional to their scale, and
+errors in logs could be misleading. Individual occurrences of transient errors
 are not necessarily indicative of a problem.
-- exception stack traces can be huge. The corresponding attribute value can frequently reach several KBs resulting in high costs
+- exception stack traces can be huge. The corresponding attribute value can frequently reach several KBs, resulting in high costs
 associated with ingesting and storing them. It's also common to log errors multiple times
-as they bubble up leading to duplication and aggravating the verbosity problem.
+as they bubble up, leading to duplication and aggravating the verbosity problem.
 - severity depends on the context and, in the general case, is not known at the time the error
 occurs since errors are frequently handled (suppressed, retried, ignored) by the caller.

 In this OTEP, we'll provide guidance around recording errors that minimizes duplication,
 allows reducing noise with configuration, and allows capturing errors in the
 absence of a recorded span.

-This guidance applies to general-purpose instrumentations including natively
+This guidance applies to general-purpose instrumentations, including natively
 instrumented libraries.

 Application developers should consider following it as a starting point, but
@@ -75,7 +76,7 @@ Instrumentations SHOULD record error information along with relevant context as
 a log record with appropriate severity.

 Instrumentations SHOULD set severity to `Error` or higher only when the log describes a
-problem affecting application functionality, availability, performance, security or
+problem affecting application functionality, availability, performance, security, or
 another aspect that is important for the given type of application.

 When instrumentation records an exception, it SHOULD provide
@@ -109,14 +110,14 @@ be to record exception stack traces when logging exceptions at `Error` or higher
 severity not higher than `Info`.

 Such errors can be used to control application logic and have a minor impact, if any,
-on application functionality, availability, or performance (beyond performance hit introduced
-if exception is used to control application logic).
+on application functionality, availability, or performance (beyond the performance hit introduced
+if an exception is used to control application logic).

 Examples:

 - an error is returned when checking optional dependency or resource existence.
-- an exception is thrown on the server when client disconnects before reading
-full response from the server
+- an exception is thrown on the server when the client disconnects before reading
+the full response from the server.

 - Errors that are expected to be retried or handled by the caller or another
 layer of the component SHOULD be recorded with severity not higher than `Warn`.
@@ -127,11 +128,11 @@ be to record exception stack traces when logging exceptions at `Error` or higher

 Examples:

-- an attempt to connect to the required remote dependency times out
-- a remote dependency returns 401 "Unauthorized" response code
-- writing data to a file results in an IO exception
-- a remote dependency returned 503 "Service Unavailable" response for 5 times in a row,
-retry attempts are exhausted and the corresponding operation has failed.
+- an attempt to connect to the required remote dependency times out.
+- a remote dependency returns a 401 "Unauthorized" response code.
+- writing data to a file results in an IO exception.
+- a remote dependency returned a 503 "Service Unavailable" response for 5 times in a row,
+retry attempts are exhausted, and the corresponding operation has failed.

 - Unhandled (by the application code) errors that don't result in application
 shutdown SHOULD be recorded with severity `Error`
@@ -148,8 +149,8 @@ be to record exception stack traces when logging exceptions at `Error` or higher
 Note: some frameworks use exceptions as a communication mechanism when a request fails. For example,
 Spring users can throw a [ResponseStatusException](https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/server/ResponseStatusException.html)
 exception to return an unsuccessful status code. Such exceptions represent errors already handled by the application code.
-Application code, in this case, is expected to log this at appropriate severity.
-General-purpose instrumentation MAY record such errors, but at severity not higher than `Warn`.
+Application code, in this case, is expected to log this at the appropriate severity.
+General-purpose instrumentation MAY record such errors, but at a severity not higher than `Warn`.

 - Errors that result in application shutdown SHOULD be recorded with severity `Fatal`.

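Editor's sketch (not part of the diff): the severity guidance in the hunks above can be read as a mapping from how an error ends up being handled to the highest severity an instrumentation should use. The class below is a minimal, self-contained illustration under that reading. `ErrorSeveritySketch`, `Outcome`, and `maxSeverityFor` are hypothetical names invented for this example, and the `Severity` enum is a local stand-in rather than the OpenTelemetry API type.

```java
import java.util.Objects;

/** Hypothetical sketch: maps how an error was handled to the highest allowed log severity. */
final class ErrorSeveritySketch {

    /** Local stand-in for log severities; NOT the OpenTelemetry API type. */
    enum Severity { INFO, WARN, ERROR, FATAL }

    /** Hypothetical classification of an error from the instrumentation's point of view. */
    enum Outcome {
        EXPECTED,           // e.g. optional resource is missing, client disconnected early
        HANDLED_BY_CALLER,  // e.g. a timeout or 503 the caller is expected to retry or handle
        UNHANDLED,          // escaped application code, but the application keeps running
        CAUSES_SHUTDOWN     // the application terminates because of the error
    }

    /** Returns the highest severity the guidance allows for the given outcome. */
    static Severity maxSeverityFor(Outcome outcome) {
        Objects.requireNonNull(outcome, "outcome");
        switch (outcome) {
            case EXPECTED:
                return Severity.INFO;
            case HANDLED_BY_CALLER:
                return Severity.WARN;
            case UNHANDLED:
                return Severity.ERROR;
            case CAUSES_SHUTDOWN:
                return Severity.FATAL;
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }

    public static void main(String[] args) {
        // Print the mapping for every outcome.
        for (Outcome outcome : Outcome.values()) {
            System.out.println(outcome + " -> " + maxSeverityFor(outcome));
        }
    }
}
```

The first two outcomes are upper bounds ("not higher than"), so an instrumentation may still choose a lower severity when the context suggests the error is routine.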
@@ -169,33 +170,53 @@ be to record exception stack traces when logging exceptions at `Error` or higher
 records stack trace conditionally.

 8. Instrumentation libraries that record exceptions using span events SHOULD gracefully migrate
-to log-based exceptions offering it as an opt-in feature first and then switching to log-based exceptions
+to log-based exceptions, offering it as an opt-in feature first and then switching to log-based exceptions
 in the next major version update.

 ## API changes

 > [!NOTE]
 >
-> It should not be an instrumentation concern to decide whether exception stack trace
+> It should not be an instrumentation concern to decide whether an exception stack trace
 > should be recorded or not.
 >
-> A natively instrumented library may write logs providing exception instance
+> A natively instrumented library may write logs providing an exception instance
 > through a log bridge and not be aware of this guidance.
 >
 > It also may be desirable by some vendors/apps to record all exception details at all levels.

-OTel Logs API SHOULD provide methods that enrich log record with exception details such as
-`setException(exception)` and similar to [RecordException](../specification/trace/api.md#record-exception) method on span.
+The OTel Logs API SHOULD provide methods that enrich log records with exception details such as
+`setException(exception)` and similar to the [RecordException](../specification/trace/api.md#record-exception) method on span.

-OTel SDK, based on the log severity and configuration, SHOULD record exception details fully or partially.
+The OTel SDK, based on the log severity and configuration, SHOULD record exception details fully or partially.

 The signature of the method is to be determined by each language
-and can be overloaded as appropriate including ability to customize stack trace
+and can be overloaded as appropriate, including the ability to customize stack trace
 collection.

 It MUST be possible to efficiently set exception information on a log record based on configuration
 and without using the `setException` method.

+## Migrating instrumentations
+
+> [!NOTE]
+> New instrumentations or existing ones that do not record exceptions on span events SHOULD
+> NOT start recording exceptions on span events. They SHOULD NOT implement the migration plan
+> described below.
+>
+> This section covers migration recommendations for existing instrumentations that already
+> report exceptions using span events.
+
+We will define a configuration option to let users choose if they want instrumentations to record exceptions
+on span events or logs.
+
+Specific instrumentation SHOULD default to recording exceptions on span events in its current major version
+and record them on logs only when the user opts-in.
+
+In the next major version, this instrumentation SHOULD stop recording exceptions on span events.
+
+This is a simplified version of [stability opt-in migration](https://github.com/open-telemetry/semantic-conventions/blob/727700406f9e6cc3f4e4680a81c4c28f2eb71569/docs/http/README.md?plain=1#L13-L37) used in semantic conventions.
+
 ## Examples

 ### Logging errors from client library in a user application
@@ -236,7 +257,7 @@ try {
 ### Logging errors inside the natively instrumented Library

 It's a common practice to record errors using logging libraries. Client libraries that are natively instrumented with OpenTelemetry should
-leverage OTel Events/Logs API for their exception logging purposes.
+leverage the OTel Events/Logs API for their exception logging purposes.

 ```java
 public class StorageClient {
@@ -250,9 +271,9 @@ public class StorageClient {
 }

 logger.logRecordBuilder()
-// In general we don't know if it's an error - we expect caller
-// to handle it and decide. So this is warning (at most).
-// If exception thrown below remains unhandled, it'd be logged by the global handler.
+// In general we don't know if it's an error - we expect the caller
+// to handle it and decide. So this is a warning (at most).
+// If the exception thrown below remains unhandled, it'd be logged by the global handler.