
Conversation

prwhelan
Member

Add es.inference.requests.time metric around infer API.

As recommended by the OTel spec, errors are determined by the presence or absence of the error.type attribute on the metric. "error.type" will be the HTTP status code (as a string) if one is available; otherwise it will be the name of the exception (e.g. NullPointerException).

Additional notes:

  • ApmInferenceStats is merged into InferenceStats. Originally we planned to have multiple implementations, but now we're only using APM.
  • Request count is now always recorded, even when there are failures loading the endpoint configuration.
  • Added a hook in streaming for cancel messages, so we can close the metrics when a user cancels the stream.
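For context, here is a minimal, self-contained sketch of how the error.type convention described above could be derived when building the metric attributes. All class, method, and parameter names here are illustrative assumptions, not the actual Elasticsearch implementation:

import java.util.HashMap;
import java.util.Map;

// Sketch: error.type is present only on failure; it is the HTTP status code (as a string)
// when one is available, otherwise the exception's class name.
final class InferenceMetricAttributes {

    static Map<String, Object> attributes(String service, String taskType, String modelId,
                                           Throwable error, Integer statusCode) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("service", service);      // e.g. "openai"
        attrs.put("task_type", taskType);   // e.g. "completion"
        attrs.put("model_id", modelId);     // e.g. "gpt-4o-mini"
        if (statusCode != null) {
            attrs.put("status_code", statusCode);
        }
        if (error != null) {
            // OTel: the presence of error.type marks the request as failed
            attrs.put("error.type", statusCode != null
                ? String.valueOf(statusCode)
                : error.getClass().getSimpleName()); // e.g. "NullPointerException"
        }
        return attrs;
    }
}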

Example document from a local node to APM (heavily redacted):

{ "_index": ".ds-metrics-apm.app.elasticsearch-default-2024.10.25-000001", "data_stream": { "dataset": "apm.app.elasticsearch", "namespace": "default", "type": "metrics" }, "es": { "inference": { "requests": { "time": { "values": [ 6992.5 ], "counts": [ 1 ] } } } }, "labels": { "model_id": "gpt-4o-mini", "otel_instrumentation_scope_name": "elasticsearch", "service": "openai", "task_type": "completion" }, "numeric_labels": { "status_code": 200 }, ... } } 
@prwhelan prwhelan added the >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.0.0 labels on Oct 29, 2024
@elasticsearchmachine
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

@prwhelan
Member Author

@elasticmachine update branch

@prwhelan prwhelan marked this pull request as ready for review October 31, 2024 12:34
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@maxhniebergall maxhniebergall left a comment


LGTM, just a couple of questions

Comment on lines 128 to 129
listener.onFailure(e);
recordMetrics(model, timer, e);
Contributor


I think we need to call listener.onFailure(e) after we recordMetrics. I don't think any code after calling listener.onFailure is guaranteed to run.
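A sketch of the suggested ordering, using the names from the snippet under review: record the metric first, then complete the listener, since nothing after the listener call is guaranteed to execute.

recordMetrics(model, timer, e); // record before notifying, so the metric is not skipped
listener.onFailure(e);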

};
}

protected void onCancel() {}
Contributor


I'm curious why this isn't an abstract method. Do we not want to implement it in all subclasses?

Member Author


Most subclasses don't have any actions to take when the stream is canceled; they're mostly just forwarding the request/cancel and modifying the response payload, so I figured we should just keep onCancel as a no-op for them.
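A minimal sketch of that design, with illustrative class names (assumptions, not the actual subclasses): the hook defaults to a no-op and is overridden only where there is cleanup to do, such as closing the metrics on cancel.

abstract class DelegatingStreamProcessor {
    // Most subclasses only forward the request/cancel, so the default is a no-op.
    protected void onCancel() {}
}

class MetricsRecordingProcessor extends DelegatingStreamProcessor {
    private final Runnable closeMetrics;

    MetricsRecordingProcessor(Runnable closeMetrics) {
        this.closeMetrics = closeMetrics;
    }

    @Override
    protected void onCancel() {
        closeMetrics.run(); // close the inference metrics when the user cancels the stream
    }
}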

Contributor

@maxhniebergall maxhniebergall left a comment


LGTM

@prwhelan
Member Author

prwhelan commented Nov 5, 2024

@elasticmachine update branch

@prwhelan prwhelan merged commit 26870ef into elastic:main Nov 5, 2024
16 checks passed
jozala pushed a commit that referenced this pull request Nov 13, 2024
Add `es.inference.requests.time` metric around `infer` API.
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Add `es.inference.requests.time` metric around `infer` API.
@jonathan-buttner
Contributor

💚 All backports created successfully

Branch: 8.x (backport created successfully)

Questions?

Please refer to the Backport tool documentation

jonathan-buttner pushed a commit to jonathan-buttner/elasticsearch that referenced this pull request Dec 13, 2024
Add `es.inference.requests.time` metric around `infer` API. (cherry picked from commit 26870ef)
maxhniebergall pushed a commit that referenced this pull request Dec 13, 2024
* [ML] Inference duration and error metrics (#115876): Add `es.inference.requests.time` metric around `infer` API. (cherry picked from commit 26870ef)
* fixing switch with class issue
Co-authored-by: Pat Whelan <pat.whelan@elastic.co>
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…stic#118700)
* [ML] Inference duration and error metrics (elastic#115876): Add `es.inference.requests.time` metric around `infer` API. (cherry picked from commit 26870ef)
* fixing switch with class issue
Co-authored-by: Pat Whelan <pat.whelan@elastic.co>
