Posts by Anjali Shah
Agentic AI / Generative AI Mar 18, 2025
NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over... 14 MIN READ
Agentic AI / Generative AI Feb 14, 2025
Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding
Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents,... 7 MIN READ
Agentic AI / Generative AI Jan 16, 2025
Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM
Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the... 7 MIN READ
Agentic AI / Generative AI Dec 17, 2024
Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding
Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only... 8 MIN READ
Agentic AI / Generative AI Dec 11, 2024
NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching
NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes... 4 MIN READ
Agentic AI / Generative AI Dec 02, 2024
TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x
NVIDIA TensorRT-LLM support for speculative decoding now provides a speedup of over 3x in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ