
Commit 2d7f34c

clean up
Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
1 parent: ebd9e33

2 files changed: +5, -39 lines

docs/source/helper.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -59,7 +59,10 @@ def extract_meta_info(filename: str) -> Optional[DocMeta]:
 
 def generate_examples():
     root_dir = Path(__file__).parent.parent.parent.resolve()
-    ignore_list = {'__init__.py', 'quickstart_example.py'}
+    ignore_list = {
+        '__init__.py', 'quickstart_example.py', 'quickstart_advanced.py',
+        'quickstart_multimodal.py', 'star_attention.py'
+    }
     doc_dir = root_dir / "docs/source/examples"
 
     def collect_script_paths(examples_subdir: str) -> list[Path]:
```
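The body of `collect_script_paths` is not part of this diff, so here is a minimal sketch of how an ignore list like this is typically applied when gathering example scripts. The signature and filtering logic below are assumptions for illustration, not the repository's actual implementation:

```python
# Hypothetical sketch: filter example scripts against an ignore list.
# The function name and signature are illustrative, not helper.py's actual code.
from pathlib import Path

IGNORE_LIST = {
    '__init__.py', 'quickstart_example.py', 'quickstart_advanced.py',
    'quickstart_multimodal.py', 'star_attention.py'
}

def collect_script_paths(examples_dir: Path) -> list[Path]:
    # Keep only .py files whose names are not explicitly ignored,
    # sorted for deterministic doc generation.
    return sorted(
        p for p in examples_dir.glob("*.py") if p.name not in IGNORE_LIST
    )
```

Ignoring the quickstart and star-attention scripts here keeps them out of the auto-generated examples pages while leaving them runnable in the repository.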

docs/source/llm-api/index.md

Lines changed: 1 addition & 38 deletions
```diff
@@ -2,14 +2,10 @@
 
 The LLM API is a high-level Python API designed to streamline LLM inference workflows.
 
-It supports a broad range of use cases, from single-GPU setups to multi-GPU and multi-node deployments, with built-in support for various parallelism strategies and advanced features. The LLM API integrates seamlessly with the broader inference ecosystem, including NVIDIA [Dynamo](https://github.com/ai-dynamo/dynamo) and the [Triton Inference Server](https://github.com/triton-inference-server/server).
+It supports a broad range of use cases, from single-GPU setups to multi-GPU and multi-node deployments, with built-in support for various parallelism strategies and advanced features. The LLM API integrates seamlessly with the broader inference ecosystem, including NVIDIA [Dynamo](https://github.com/ai-dynamo/dynamo).
 
 While the LLM API simplifies inference workflows with a high-level interface, it is also designed with flexibility in mind. Under the hood, it uses a PyTorch-native and modular backend, making it easy to customize, extend, or experiment with the runtime.
 
-## Table of Contents
-- [Quick Start Example](#quick-start-example)
-- [Supported Models](#supported-models)
-- [Tips and Troubleshooting](#tips-and-troubleshooting)
 
 ## Quick Start Example
 A simple inference example with TinyLlama using the LLM API:
```
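The quick start snippet itself sits between these two hunks and is unchanged by the commit. For context, a TinyLlama example with the LLM API looks roughly like this; this is a sketch assuming the `tensorrt_llm` package's `LLM` and `SamplingParams` interfaces, not the file's verbatim snippet:

```python
from tensorrt_llm import LLM, SamplingParams

# Model ID and sampling settings are assumed for illustration.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for output in llm.generate(prompts, sampling_params):
    # Each result carries the prompt and the generated completions.
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```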
```diff
@@ -53,39 +49,6 @@ llm = LLM(model=<local_path_to_model>)
 > **Note:** Some models require accepting specific [license agreements](https://ai.meta.com/resources/models-and-libraries/llama-downloads/). Make sure you have agreed to the terms and authenticated with Hugging Face before downloading.
 
 
-## Supported Models
-
-
-| Models | [Model Class Name](https://github.com/NVIDIA/TensorRT-LLM/tree/main/tensorrt_llm/_torch/models) | HuggingFace Model ID Example | Modality |
-| :--- | :---: | :--- | :---: |
-| BERT-based | `BertForSequenceClassification` | `textattack/bert-base-uncased-yelp-polarity` | L |
-| DeepSeek-V3 | `DeepseekV3ForCausalLM` | `deepseek-ai/DeepSeek-V3` | L |
-| Gemma3 | `Gemma3ForCausalLM` | `google/gemma-3-1b-it` | L |
-| HyperCLOVAX-SEED-Vision | `HCXVisionForCausalLM` | `naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B` | L + V |
-| VILA | `LlavaLlamaModel` | `Efficient-Large-Model/NVILA-8B` | L + V |
-| LLaVA-NeXT | `LlavaNextForConditionalGeneration` | `llava-hf/llava-v1.6-mistral-7b-hf` | L + V |
-| Llama 3 <br> Llama 3.1 <br> Llama 2 <br> LLaMA | `LlamaForCausalLM` | `meta-llama/Meta-Llama-3.1-70B` | L |
-| Llama 4 Scout <br> Llama 4 Maverick | `Llama4ForConditionalGeneration` | `meta-llama/Llama-4-Scout-17B-16E-Instruct` <br> `meta-llama/Llama-4-Maverick-17B-128E-Instruct` | L + V |
-| Mistral | `MistralForCausalLM` | `mistralai/Mistral-7B-v0.1` | L |
-| Mixtral | `MixtralForCausalLM` | `mistralai/Mixtral-8x7B-v0.1` | L |
-| Llama 3.2 | `MllamaForConditionalGeneration` | `meta-llama/Llama-3.2-11B-Vision` | L |
-| Nemotron-3 <br> Nemotron-4 <br> Minitron | `NemotronForCausalLM` | `nvidia/Minitron-8B-Base` | L |
-| Nemotron-H | `NemotronHForCausalLM` | `nvidia/Nemotron-H-8B-Base-8K` <br> `nvidia/Nemotron-H-47B-Base-8K` <br> `nvidia/Nemotron-H-56B-Base-8K` | L |
-| LlamaNemotron <br> LlamaNemotron Super <br> LlamaNemotron Ultra | `NemotronNASForCausalLM` | `nvidia/Llama-3_1-Nemotron-51B-Instruct` <br> `nvidia/Llama-3_3-Nemotron-Super-49B-v1` <br> `nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` | L |
-| QwQ, Qwen2 | `Qwen2ForCausalLM` | `Qwen/Qwen2-7B-Instruct` | L |
-| Qwen2-based | `Qwen2ForProcessRewardModel` | `Qwen/Qwen2.5-Math-PRM-7B` | L |
-| Qwen2-based | `Qwen2ForRewardModel` | `Qwen/Qwen2.5-Math-RM-72B` | L |
-| Qwen2-VL | `Qwen2VLForConditionalGeneration` | `Qwen/Qwen2-VL-7B-Instruct` | L + V |
-| Qwen2.5-VL | `Qwen2_5_VLForConditionalGeneration` | `Qwen/Qwen2.5-VL-7B-Instruct` | L + V |
-
-
-- **L**: Language model only
-- **L + V**: Language and Vision multimodal support
-- Llama 3.2 accepts vision input, but our support is currently limited to text only.
-
-> **Note:** For the most up-to-date list of supported models, you may refer to the [TensorRT-LLM model definitions](https://github.com/NVIDIA/TensorRT-LLM/tree/main/tensorrt_llm/_torch/models).
-
-
 ## Tips and Troubleshooting
 
 The following tips typically assist new LLM API users who are familiar with other APIs that are part of TensorRT-LLM:
```
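Regarding the gated-checkpoint note in the hunk above: one common flow is to authenticate and pre-download the model, then pass the local path to `LLM`, as the `llm = LLM(model=<local_path_to_model>)` context line suggests. A hedged sketch using `huggingface_hub` follows; the model ID is a placeholder, and this flow is illustrative rather than prescribed by the doc:

```python
# Hypothetical flow: authenticate with Hugging Face, pre-download a gated
# checkpoint, then point the LLM API at the resulting local directory.
from huggingface_hub import login, snapshot_download

from tensorrt_llm import LLM

login()  # interactive token prompt; equivalent to `huggingface-cli login`
local_path = snapshot_download("meta-llama/Meta-Llama-3.1-70B")  # placeholder gated model

llm = LLM(model=local_path)
```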
