liyanboSustech (Yanbo Li) · GitHub

Popular repositories

Diff-cache Public

Forked from xdit-project/xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1

llama.cpp Public

Forked from ggml-org/llama.cpp

LLM inference in C/C++

C++

tensorrtx Public

Forked from wang-xinyu/tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API

C++

InfiniGen Public

Forked from snu-comparch/InfiniGen

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python

H2O Public

Forked from FMInference/H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Python

prompt-cache Public

Forked from yale-sys/prompt-cache

Modular and structured prompt caching for low-latency LLM inference

Python