Conversation


@Shixiaowei02 Shixiaowei02 commented Mar 26, 2024

  • Model Support
    • Support Smaug-72B-v0.1
  • Features
    • Medusa IFB support
    • [Experimental] Support FP8 FMHA; note that performance is not yet optimal, and we will keep optimizing it
    • [BREAKING CHANGE] Support embedding sharing for Gemma
  • API
    • [BREAKING CHANGE] Remove LoRA related parameters from convert checkpoint scripts
    • [BREAKING CHANGE] Simplify Qwen convert checkpoint script
    • Add advanced and multi-GPU examples for Python binding of executor C++ API, see examples/bindings/README.md
    • High-level API, please refer to examples/high-level-api/README.md for a guide
      • [BREAKING CHANGE] Reuse the QuantConfig used in the trtllm-build tool, supporting broader quantization features
      • Add support for TensorRT-LLM checkpoints as model input
      • Refine the SamplingConfig used in the LLM.generate and LLM.generate_async APIs, with support for beam search, a variety of penalties, and more features
      • Add support for the StreamingLLM feature, enabled by setting LLM(streaming_llm=...)
  • Bug fixes
  • Benchmark
    • Add percentile latency report to gptManagerBenchmark
  • Performance
    • Improve custom all-reduce kernel
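Among the penalties the refined SamplingConfig supports is a repetition penalty. As a rough illustration of how the widely used CTRL-style rule acts on logits (a hypothetical sketch for intuition, not TensorRT-LLM's implementation or API):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Discourage already-generated tokens (CTRL-style rule).

    For each token already seen, a positive logit is divided by `penalty`
    and a negative logit is multiplied by it, so any penalty > 1.0 lowers
    the probability of repeating that token.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out
```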
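The percentile latency report added to gptManagerBenchmark summarizes per-request latencies at points such as p50/p90/p99. A nearest-rank percentile, the convention commonly used for such reports, can be sketched as follows (a hypothetical helper, not the benchmark's code):

```python
import math

def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]
```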
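The custom all-reduce kernel implements the collective that sums a tensor across GPUs and leaves the full result on every rank. For readers unfamiliar with the operation, here is a single-process simulation of the classic ring all-reduce (conceptual only; the actual kernel is CUDA and optimized for latency, not this algorithm verbatim):

```python
def ring_allreduce(rank_buffers):
    """Sum-reduce equal-length buffers across simulated ranks.

    Returns one fully reduced copy per rank, leaving inputs untouched.
    """
    n = len(rank_buffers)
    bufs = [list(b) for b in rank_buffers]
    if n == 1:
        return bufs
    length = len(bufs[0])
    assert length % n == 0, "for simplicity, buffer length must divide evenly"
    chunk = length // n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n        # chunk rank r forwards this step
            dst = (r + 1) % n
            for i in span(c):
                bufs[dst][i] += bufs[r][i]

    # Phase 2, all-gather: circulate the summed chunks around the ring so
    # every rank ends up with all of them.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n    # fully summed chunk rank r forwards
            dst = (r + 1) % n
            for i in span(c):
                bufs[dst][i] = bufs[r][i]
    return bufs
```

Each rank sends and receives 2 * (n - 1) chunks of size length / n, which is why the ring variant is bandwidth-optimal for large tensors; custom kernels like the one improved here instead target the small-message latency regime typical of decoder inference.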
@Shixiaowei02 Shixiaowei02 requested a review from kaiyux March 26, 2024 12:20
@Shixiaowei02 Shixiaowei02 merged commit 850b6fa into main Mar 26, 2024
@Shixiaowei02 Shixiaowei02 deleted the kaiyu/update branch March 26, 2024 12:47
wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025
Co-authored-by: Kaiyu <26294424+kaiyux@users.noreply.github.com>
