
Conversation

@nv-guomingz (Collaborator)

Fix typos in the `cuda_graph_config` documentation.

@nv-guomingz (Collaborator, Author)

/bot run

@nv-guomingz changed the title from "doc: update docs with right cuda_graph_config" to "doc: update cuda_graph_config usage part in DS R1 docs" on Jul 7, 2025
@tensorrt-cicd (Collaborator)

PR_Github #11154 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11154 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8248 completed with status: 'SUCCESS'

@kaiyux requested a review from Copilot on July 8, 2025 06:51
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR replaces the deprecated CUDA Graph flags (`cuda_graph_padding_enabled` and `cuda_graph_batch_sizes`) with the new nested `cuda_graph_config` structure across tests and documentation.

  • Updated integration tests to use the new `cuda_graph_config` dict in model configs.
  • Revised tech blog and best practice docs to show `cuda_graph_config` usage.
  • Ensured consistency of config examples across YAML snippets (see the migration sketch after this list).
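
A minimal sketch of the migration this PR performs, reconstructed from the flag names above (the batch-size values are illustrative; the exact keys in a given config file may differ):

```yaml
# Before: flat, deprecated flags
cuda_graph_padding_enabled: true
cuda_graph_batch_sizes:
  - 256
  - 512

# After: equivalent nested cuda_graph_config structure
cuda_graph_config:
  padding_enabled: true
  batch_sizes:
    - 256
    - 512
```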

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/integration/defs/perf/pytorch_model_config.py` | Replaced flat CUDA Graph flags with nested `cuda_graph_config`. |
| `docs/source/blogs/tech_blog/blog3_Optimizing_DeepSeek_R1_Throughput_on_NVIDIA_Blackwell_GPUs.md` | Updated the inline code example to reference `cuda_graph_config`. |
| `docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md` | Refactored YAML snippets to use the new `cuda_graph_config` map. |

Comments suppressed due to low confidence (3)

docs/source/blogs/tech_blog/blog3_Optimizing_DeepSeek_R1_Throughput_on_NVIDIA_Blackwell_GPUs.md:154

  • The inline code snippet includes a literal `\n`, which may not render as intended. Consider using a fenced code block or multi-line inline code for `cuda_graph_config` to improve readability and rendering.

> This had a significant **22% E2E performance impact** for throughput scenarios. CUDA Graphs allow capturing a sequence of CUDA operations and launching them as a single unit, drastically reducing kernel launch overheads. This is particularly beneficial for models with many small kernels, and especially on the PyTorch flow, because the Python host code normally executes more slowly than C++. Since a CUDA Graph freezes the kernel launch parameters, which are normally tied to the tensor shapes, it can only be used safely with static shapes, meaning that different CUDA Graphs need to be captured for different batch sizes. Each graph incurs some cost in memory usage and capture time, so we cannot capture a CUDA Graph for every possible batch size. For the non-captured batch sizes, PyTorch eager mode code is executed. TensorRT-LLM has a CUDA Graph padding feature that offers a good trade-off between the number of CUDA Graphs and the CUDA Graph hit ratio: it tries to pad a batch to the nearest size with a captured CUDA Graph. Normally you should enable CUDA Graph padding to increase the hit rate, but the padding itself has some overhead due to wasted token computation. Users can opt out of the CUDA Graph padding feature to see the perf benefits by setting `cuda_graph_config:\n padding_enabled: False`; see the API here: [Pytorch backend config](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/config.py#L41)
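
Rendered as a fenced snippet, per the reviewer's suggestion (the literal `\n` in the inline code above stands for this line break), the opt-out setting would look like:

```yaml
cuda_graph_config:
  padding_enabled: False
```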

docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md:198

  • The YAML example drops the leading hyphens used for mapping keys and may not be valid YAML as shown. Update the snippet to present a proper mapping, e.g.:

```yaml
use_cuda_graph: true
cuda_graph_config:
  padding_enabled: true
  batch_sizes:
    - 896
    - 512
    - 256
```

> cuda_graph_config:

docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md:265

  • Similarly, in the second YAML snippet ensure consistent YAML mapping syntax rather than list-like hyphens before each key. Use a code fence to clearly separate document structure from list items.

> cuda_graph_config:
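
A hypothetical illustration of the issue flagged here (not from the PR itself): a hyphen in front of each key makes YAML parse the lines as a sequence of separate one-key maps rather than a single nested mapping.

```yaml
# List-like hyphens: a sequence of two unrelated one-key maps
- cuda_graph_config:
- padding_enabled: true

# Consistent mapping syntax: one mapping with a nested key
cuda_graph_config:
  padding_enabled: true
```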

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz force-pushed the user/guomingz/update_cuda_graph_config_doc branch from ecebf3d to f1235b4 on July 8, 2025 07:08
@nv-guomingz (Collaborator, Author)

/bot reuse-pipeline

@nv-guomingz enabled auto-merge (squash) on July 8, 2025 07:09
@tensorrt-cicd (Collaborator)

PR_Github #11242 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11242 [ reuse-pipeline ] completed with state SUCCESS
Release Check Pipeline #1411 failed
Reusing PR_Github #11154 for commit f1235b4

…ghput_on_NVIDIA_Blackwell_GPUs.md Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz force-pushed the user/guomingz/update_cuda_graph_config_doc branch from f1235b4 to a077921 on July 8, 2025 07:39
@nv-guomingz (Collaborator, Author)

/bot reuse-pipeline

@tensorrt-cicd (Collaborator)

PR_Github #11250 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11250 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #11154 for commit a077921

@nv-guomingz merged commit c8fa08d into NVIDIA:main on Jul 8, 2025
3 checks passed
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Yuxin <yuxinz@nvidia.com>
@nv-guomingz deleted the user/guomingz/update_cuda_graph_config_doc branch on September 30, 2025 07:57