
Commit de54e7f

[TRTLLM-6104] add docs on request_perf_metrics to triton LLMAPI backend (#769)
1 parent 448706a commit de54e7f


1 file changed (+29 -0 lines)


docs/llmapi.md

Lines changed: 29 additions & 0 deletions
@@ -32,6 +32,35 @@ python3 tensorrt_llm/triton_backend/scripts/launch_triton_server.py --model_repo
```bash
curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "The future of AI is", "max_tokens":10}' | jq
```

* Optional: include performance metrics

To retrieve detailed per-request performance metrics, such as KV cache usage, timing breakdowns, and speculative decoding statistics, add `"sampling_param_return_perf_metrics": true` to the request payload:

```bash
curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Please explain to me what is machine learning?", "max_tokens":10, "sampling_param_return_perf_metrics":true}' | jq
```
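For scripted clients, the same request can be sent from Python. This is a minimal sketch using only the standard library (`json`, `urllib.request`); the helper names `build_generate_payload` and `post_generate` are hypothetical and not part of the Triton or TensorRT-LLM client APIs. It assumes the server launched earlier is listening on `localhost:8000` and serving the `tensorrt_llm` model, exactly as in the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host, port, and model name for your deployment.
GENERATE_URL = "http://localhost:8000/v2/models/tensorrt_llm/generate"


def build_generate_payload(prompt: str, max_tokens: int, return_perf_metrics: bool = True) -> dict:
    """Build the JSON body for the generate endpoint (hypothetical helper).

    Field names mirror the curl examples; return_perf_metrics toggles the
    sampling_param_return_perf_metrics flag.
    """
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    if return_perf_metrics:
        # json.dumps serializes Python True as JSON true, matching the curl payload.
        payload["sampling_param_return_perf_metrics"] = True
    return payload


def post_generate(payload: dict, url: str = GENERATE_URL, timeout: float = 60.0) -> dict:
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_generate(build_generate_payload("Please explain to me what is machine learning?", 10))` should return a dict with the same keys as the sample response.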
Sample response with performance metrics:

```json
{
  "acceptance_rate": "0.0",
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "kv_cache_alloc_new_blocks": "1",
  "kv_cache_alloc_total_blocks": "1",
  "kv_cache_hit_rate": "0.0",
  "kv_cache_missed_block": "1",
  "kv_cache_reused_block": "0",
  "last_token_time_ns": "76736545324000",
  "model_name": "tensorrt_llm",
  "model_version": "1",
  "text_output": "Please explain to me what is machine learning? \n\nMachine learning is a field of computer science that involves the development of algorithms and models that can learn from data without being explicitly programmed. It is a",
  "total_accepted_draft_tokens": "0",
  "total_draft_tokens": "0"
}
```
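In the sample response every metric value is a JSON string, so it has to be converted before doing arithmetic. The sketch below derives latencies from the `*_time_ns` fields, assuming they are nanosecond timestamps from the same clock so that their differences are durations. The helper `summarize_perf_metrics` and the derived metric names are hypothetical and inferred from the field names, not from documented definitions.

```python
import json

NS_PER_MS = 1_000_000


def summarize_perf_metrics(response: dict) -> dict:
    """Derive latency figures in milliseconds from one response (hypothetical helper).

    Assumes all *_time_ns fields share one clock; values arrive as strings.
    """
    arrival = int(response["arrival_time_ns"])
    scheduled = int(response["first_scheduled_time_ns"])
    first_token = int(response["first_token_time_ns"])
    last_token = int(response["last_token_time_ns"])
    return {
        # Time between request arrival and first being scheduled.
        "queue_ms": (scheduled - arrival) / NS_PER_MS,
        # Time between request arrival and the first generated token.
        "time_to_first_token_ms": (first_token - arrival) / NS_PER_MS,
        # Span between the first and last generated tokens.
        "first_to_last_token_ms": (last_token - first_token) / NS_PER_MS,
        # Time between request arrival and the last generated token.
        "end_to_end_ms": (last_token - arrival) / NS_PER_MS,
    }


# Timestamp fields copied from the sample response.
sample = json.loads("""{
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "last_token_time_ns": "76736545324000"
}""")
print(summarize_perf_metrics(sample))
```

For the sample response this gives a queue time of 0.538 ms, a time to first token of 126.554 ms, 1171.024 ms between the first and last tokens, and an end-to-end latency of 1297.578 ms.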
`inflight_batcher_llm_client.py` is not supported yet.

* Run test on dataset
