- Shenzhen Guangdong, China
- 08:14 (UTC +08:00)
Popular repositories
- Diff-cache
Public. Forked from xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Python 1
- tensorrtx
Public. Forked from wang-xinyu/tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
C++
- InfiniGen
Public. Forked from snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Python
- H2O
Public. Forked from FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Python
- prompt-cache
Public. Forked from yale-sys/prompt-cache
Modular and structured prompt caching for low-latency LLM inference
Python