massif-01 / vllm_benchmark_block_fp8 (1 star)
Automated Triton w8a8 block FP8 kernel tuning tool for vLLM. Auto-detects model architecture, supports Qwen3-Coder-30B-A3B-Instruct-FP8/DeepSeek-V3/custom models, multi-GPU parallel tuning, and generates optimized kernel configs for quantization.
Topics: triton, performance-tuning, kernel-tuning, fp8, vllm
Updated Oct 31, 2025. Language: Python

NikitaZelenskis / LLM-Kernel-Tuner (1 star)
A package for automated kernel tuning with LLMs.
Topics: python, framework, gpu, cuda, auto-tuning, kernel-tuning, llm, llm-tools, llm-agents
Updated Oct 24, 2025. Language: Python