curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "The future of AI is", "max_tokens":10}'| jq
```
* Optional: include performance metrics
To retrieve detailed per-request performance metrics, such as KV cache usage, timing breakdowns, and speculative decoding statistics, add `"sampling_param_return_perf_metrics": true` to your request payload:
```bash
curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Please explain to me what is machine learning?", "max_tokens":10, "sampling_param_return_perf_metrics":true}'| jq
```
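If you prefer to send the same request from Python instead of curl, the following is a minimal sketch using only the standard library. It assumes the server is reachable at `localhost:8000` as in the curl examples; the helper names `build_generate_request` and `generate` and the `GENERATE_URL` constant are hypothetical, and the explicit JSON `Content-Type` header is an assumption rather than something the examples above require.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust host/port for your deployment.
GENERATE_URL = "http://localhost:8000/v2/models/tensorrt_llm/generate"


def build_generate_request(prompt: str, max_tokens: int = 10,
                           return_perf_metrics: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    if return_perf_metrics:
        # Same flag as in the curl example: ask for per-request perf metrics.
        payload["sampling_param_return_perf_metrics"] = True
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed; curl -d omits it
        method="POST",
    )


def generate(prompt: str, max_tokens: int = 10) -> dict:
    """Send the request and return the decoded JSON response body."""
    req = build_generate_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `generate("Please explain to me what is machine learning?")` should return a dictionary shaped like the sample response that follows.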
Sample response with performance metrics:
```json
{
  "acceptance_rate": "0.0",
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "kv_cache_alloc_new_blocks": "1",
  "kv_cache_alloc_total_blocks": "1",
  "kv_cache_hit_rate": "0.0",
  "kv_cache_missed_block": "1",
  "kv_cache_reused_block": "0",
  "last_token_time_ns": "76736545324000",
  "model_name": "tensorrt_llm",
  "model_version": "1",
  "text_output": "Please explain to me what is machine learning? \n\nMachine learning is a field of computer science that involves the development of algorithms and models that can learn from data without being explicitly programmed. It is a",
  "total_accepted_draft_tokens": "0",
  "total_draft_tokens": "0"
}
```
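The metric values are returned as strings, so they must be converted before doing arithmetic. The sketch below derives a rough latency breakdown from the four `*_time_ns` fields of the sample response. It assumes all timestamps come from the same clock and that the intervals mean what their field names suggest (arrival to first scheduling as queueing time, and so on); the function name `latency_breakdown_ms` and the output keys are hypothetical.

```python
import json

# Timing fields copied from the sample response above.
SAMPLE = json.loads("""{
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "last_token_time_ns": "76736545324000"
}""")

NS_PER_MS = 1_000_000


def latency_breakdown_ms(metrics: dict) -> dict:
    """Derive latency intervals in milliseconds from the *_time_ns fields.

    The server returns these values as strings, so convert them to int first.
    """
    arrival = int(metrics["arrival_time_ns"])
    scheduled = int(metrics["first_scheduled_time_ns"])
    first_tok = int(metrics["first_token_time_ns"])
    last_tok = int(metrics["last_token_time_ns"])
    return {
        "queue_ms": (scheduled - arrival) / NS_PER_MS,               # arrival -> first scheduled
        "time_to_first_token_ms": (first_tok - arrival) / NS_PER_MS,  # arrival -> first token
        "generation_ms": (last_tok - first_tok) / NS_PER_MS,          # first token -> last token
        "end_to_end_ms": (last_tok - arrival) / NS_PER_MS,            # arrival -> last token
    }
```

For the sample response this yields about 0.54 ms of queueing, 126.6 ms to the first token, 1171.0 ms from first to last token, and 1297.6 ms end to end.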
`inflight_batcher_llm_client.py` is not supported yet.