Skip to content
#

gpu-optimization

Here are 18 public repositories matching this topic...

The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization. @NVIDIA

  • Updated Jun 29, 2024
  • Python

🤖 Ollama Consumer - A Python-based interactive chat interface for Ollama models with advanced model management, comprehensive benchmarking, vision support, and automatic error recovery. Features dynamic model switching, GPU optimization, and intelligent service monitoring for seamless AI model interactions.

  • Updated Aug 6, 2025
  • Python

A high-performance kernel implementation of multi-head attention using Triton. Focused on minimizing memory overhead and maximizing throughput for large-scale transformer layers. Includes clean-tensor layouts, head-grouping optimisations, and ready-to-benchmark code you can plug into custom models.

  • Updated Aug 12, 2025
  • Python

Improve this page

Add a description, image, and links to the gpu-optimization topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the gpu-optimization topic, visit your repo's landing page and select "manage topics."

Learn more