[Misc] Add metrics for request queue time, forward time, and execute time #9659
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
`model_execute_time` and `time_in_queue`.

@WoosukKwon, could you help me review the code?

@njhill, could you help me review the code? Thanks!

@youkaichao, could you help me review the code? Thanks!
…time (vllm-project#9659) Signed-off-by: Randall Smith <Randall.Smith@amd.com>
…time (vllm-project#9659) Signed-off-by: Loc Huynh <jc1da.3011@gmail.com>
…time (vllm-project#9659) Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
vllm:time_in_queue_requests appears to be an exact duplicate of vllm:request_queue_time_seconds. Both record first_scheduled_time - arrival_time:

```
if seq_group.is_finished():
    time_queue_requests.append(
        seq_group.metrics.first_scheduled_time -
        seq_group.metrics.arrival_time)
```

```
def maybe_set_first_scheduled_time(self, time: float) -> None:
    if self.metrics.first_scheduled_time is None:
        self.metrics.first_scheduled_time = time
        self.metrics.time_in_queue = time - self.metrics.arrival_time
```

vllm:time_in_queue_requests was added by vllm-project#9659, and vllm:request_queue_time_seconds was added later by vllm-project#4464, although neither metric existed when either PR was first opened. The latter seems like the right one to keep, since it is implemented in V1, used in the Grafana dashboard, and has test coverage.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
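To make the duplication concrete, here is a minimal sketch assuming a simplified stand-in for a request's timing fields. The `RequestMetrics` dataclass and the `queue_time_at_finish` helper are hypothetical names for illustration, not vLLM's actual classes. Both quoted code paths reduce to `first_scheduled_time - arrival_time`, and the first-scheduled time is only set once, so the two metrics always agree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestMetrics:
    # Hypothetical, simplified stand-in for a request's timing fields.
    arrival_time: float
    first_scheduled_time: Optional[float] = None
    time_in_queue: Optional[float] = None


def maybe_set_first_scheduled_time(metrics: RequestMetrics, now: float) -> None:
    # Only the first scheduling event counts; later calls are no-ops.
    if metrics.first_scheduled_time is None:
        metrics.first_scheduled_time = now
        metrics.time_in_queue = now - metrics.arrival_time


def queue_time_at_finish(metrics: RequestMetrics) -> float:
    # Mirrors the value appended to time_queue_requests when a request finishes.
    assert metrics.first_scheduled_time is not None
    return metrics.first_scheduled_time - metrics.arrival_time


m = RequestMetrics(arrival_time=10.0)
maybe_set_first_scheduled_time(m, 10.75)
maybe_set_first_scheduled_time(m, 12.0)  # ignored: already scheduled once
print(m.time_in_queue, queue_time_at_finish(m))  # 0.75 0.75
```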
Metrics originally added by vllm-project#9659.

These seem to be of questionable value relative to the existing prefill, decode, and inference time metrics. Since they would also be challenging to implement in V1, and they don't follow the convention of using seconds as units, let's deprecate them.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
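The Prometheus convention is to export durations in the base unit, seconds, with the unit as a name suffix such as `_seconds`. The following is a minimal, stdlib-only sketch of that convention. It uses a hypothetical `SecondsHistogram` that converts millisecond timings to seconds before observing them and exposes cumulative `le` bucket counts. The metric name and bucket bounds are illustrative assumptions, not vLLM's. A real exporter would normally use `prometheus_client.Histogram` instead.

```python
import bisect
from typing import Dict, List


class SecondsHistogram:
    """Minimal Prometheus-style histogram storing observations in seconds.

    Hypothetical helper for illustration; not part of vLLM.
    """

    def __init__(self, name: str, buckets: List[float]) -> None:
        if not name.endswith("_seconds"):
            raise ValueError("base-unit convention: name should end in _seconds")
        self.name = name
        self.buckets = sorted(buckets)         # upper bounds ("le"), in seconds
        self.counts = [0] * len(self.buckets)  # non-cumulative per-bucket counts
        self.count = 0
        self.sum = 0.0

    def observe(self, seconds: float) -> None:
        self.count += 1
        self.sum += seconds
        # First bucket whose upper bound is >= the observation (le semantics).
        i = bisect.bisect_left(self.buckets, seconds)
        if i < len(self.buckets):
            self.counts[i] += 1  # values above every bound only land in +Inf

    def cumulative(self) -> Dict[float, int]:
        # Prometheus exposes cumulative counts per "le" bound, plus +Inf.
        out: Dict[float, int] = {}
        running = 0
        for bound, c in zip(self.buckets, self.counts):
            running += c
            out[bound] = running
        out[float("inf")] = self.count
        return out


def ms_to_seconds(ms: float) -> float:
    return ms / 1000.0


h = SecondsHistogram("example_forward_time_seconds", [0.01, 0.1, 0.5, 1.0])
for forward_ms in (5.0, 250.0, 2000.0):
    h.observe(ms_to_seconds(forward_ms))
print(h.cumulative())
```

Recording in seconds at observation time keeps every duration metric on the same scale. Dashboards can then compare them directly without per-metric unit conversions.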
…time (vllm-project#9659) Signed-off-by: LeiWang1999 <leiwang1999@outlook.com>
`/metrics` API. `collect_model_forward_time` and `collect_model_execute_time` due to the `--otlp-traces-endpoint` flag, so that metrics can also collect information about `model_forward_time` and `model_execute_time`.
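One way to read this change is that detailed timing collection is driven by its own settings rather than only by whether a tracing endpoint is configured. The sketch below assumes a simplified settings object. `ObservabilitySettings`, its fields, and `maybe_time` are hypothetical names for illustration, not vLLM's actual config classes or CLI flags. Durations are recorded in seconds via `time.perf_counter()`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ObservabilitySettings:
    # Hypothetical settings; names are illustrative only.
    otlp_traces_endpoint: Optional[str] = None
    collect_model_forward_time: bool = False
    collect_model_execute_time: bool = False


def maybe_time(enabled: bool, timings: Dict[str, float], key: str,
               fn: Callable[[], T]) -> T:
    """Run fn(); if enabled, record its wall-clock duration in seconds."""
    if not enabled:
        return fn()
    start = time.perf_counter()
    result = fn()
    timings[key] = time.perf_counter() - start
    return result


# No tracing endpoint is configured, yet forward-time collection is on,
# so the timing is still available to the metrics path.
cfg = ObservabilitySettings(collect_model_forward_time=True)
timings: Dict[str, float] = {}
out = maybe_time(cfg.collect_model_forward_time, timings,
                 "model_forward_time", lambda: sum(range(1000)))
skipped = maybe_time(cfg.collect_model_execute_time, timings,
                     "model_execute_time", lambda: 42)
print(out, skipped, sorted(timings))  # 499500 42 ['model_forward_time']
```

Gating on the collect flags alone means the same timings can feed `/metrics` whether or not traces are exported.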