unix1986

Pinned

ATen Public

Forked from zdevito/ATen

ATen: A TENsor library for C++11

C++
flash-attention Public

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python
TensorRT-LLM Public

Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++
zhihu/ZhiLight Public

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 900 stars · 103 forks
zhihu/rucene Public

Rust port of Lucene

Rust · 1k stars · 63 forks
cutlass Public

Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++