
Commit 0edef02

fix: use the correct max tokens parameter name in example (#773)

1 parent 9237442

1 file changed: +2 −2 lines

docs/llmapi.md: 2 additions & 2 deletions
@@ -29,15 +29,15 @@ python3 tensorrt_llm/triton_backend/scripts/launch_triton_server.py --model_repo
 * Send request
 
 ```bash
-curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "The future of AI is", "max_tokens":10}' | jq
+curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "The future of AI is", "sampling_param_max_tokens":10}' | jq
 ```
 
 * Optional: include performance metrics
 
 To retrieve detailed performance metrics per request such as KV cache usage, timing breakdowns, and speculative decoding statistics - add `"sampling_param_return_perf_metrics": true` to your request payload:
 
 ```bash
-curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Please explain to me what is machine learning?", "max_tokens":10, "sampling_param_return_perf_metrics":true}' | jq
+curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Please explain to me what is machine learning?", "sampling_param_max_tokens":10, "sampling_param_return_perf_metrics":true}' | jq
 ```
 
 Sample response with performance metrics
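
Both commands target the same Triton `generate` endpoint, so the corrected payload is easy to script. Below is a minimal sketch, not part of this commit, that wraps the fixed request in a shell function: the `generate` helper name is invented for illustration, and reading the completion from a `text_output` field is an assumption about the response shape that this diff does not confirm.

```bash
# Hypothetical helper around the corrected request (sketch, not from the commit).
# Assumes the server from the launch step above is running on localhost:8000 and
# that the response exposes the completion under "text_output" (unconfirmed here).
generate() {
  local prompt="$1"
  local max_tokens="${2:-10}"
  curl -s -X POST localhost:8000/v2/models/tensorrt_llm/generate \
    -d "{\"text_input\": \"${prompt}\", \"sampling_param_max_tokens\": ${max_tokens}}" \
    | jq -r '.text_output'
}

generate "The future of AI is" 10
```

Double-quoting the payload lets the shell substitute the prompt and token budget while keeping the JSON keys literal; prompts containing double quotes would need escaping before being spliced in.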
