Nimbus is a self-hosted CI platform built around Firecracker microVMs, org-scoped storage, and end-to-end observability for AI evaluation workloads. It replaces GitHub-hosted runners while keeping execution on infrastructure you control.
- Getting Started
- Configuration Reference
- Operations Guide
- Onboarding Playbook
- GitHub Actions Migration Guide
- ROI Calculator
- Firecracker Security Hardening
- ClickHouse Schema
- Runbook
- Policy-as-Code
 
- Control Plane (`src/nimbus/control_plane`): Validates GitHub `workflow_job` webhooks (HMAC, timestamp, delivery replay fence), enforces distributed per-org rate limits, issues agent/cache tokens, and brokers job leases over Redis + Postgres. It also exposes SAML SSO, SCIM provisioning, service accounts, and compliance export logging.
- Host Agent (`src/nimbus/host_agent`): Runs Firecracker microVMs with snapshot boot and network fencing, plus Docker and GPU executors selected by capability labels. The agent layers in warm pools, resource/performance telemetry, offline egress enforcement, SBOM generation, and supply-chain allow/deny policies.
- Executor System (`src/nimbus/runners`): Pluggable executors share a common `Executor` protocol, pool manager, resource tracker, and watchdogs for timeouts and lease-renewal fencing.
- Cache Proxy (`src/nimbus/cache_proxy`): Multi-tenant cache front end with S3/local backends, org quotas, eviction metrics, and circuit breakers to isolate backend failures.
- Docker Layer Cache (`src/nimbus/docker_cache`): Minimal OCI registry enforcing org-prefixed repositories, blob accounting, and scoped cache tokens for push/pull.
- Logging Pipeline (`src/nimbus/logging_pipeline`): ClickHouse-backed ingestion API with batched writes, scoped query filters, and hardened metrics endpoints.
- Web Dashboard (`web`): React/Vite SPA that surfaces job queues, agent health, logs, and compliance metadata via the public API.
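The control plane's webhook checks (HMAC signature, timestamp freshness, delivery replay fence) can be sketched roughly as follows. This is an illustrative sketch, not Nimbus's actual API — the function name, parameters, and the in-memory replay set are assumptions:

```python
import hashlib
import hmac
import time


def verify_webhook(secret: bytes, body: bytes, signature_header: str,
                   timestamp: float, delivery_id: str,
                   seen_deliveries: set, max_skew: float = 300.0) -> bool:
    """Sketch of layered webhook validation (names are hypothetical)."""
    # 1. HMAC signature check; GitHub sends "sha256=<hex digest>".
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_header):
        return False
    # 2. Timestamp freshness: reject requests outside the allowed skew.
    if abs(time.time() - timestamp) > max_skew:
        return False
    # 3. Replay fence: each x-github-delivery id is accepted at most once.
    if delivery_id in seen_deliveries:
        return False
    seen_deliveries.add(delivery_id)
    return True
```

A production version would persist the delivery-id fence (the source says Nimbus tracks deliveries centrally) rather than keep it in process memory.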
- Lease fencing with rotating fence tokens prevents duplicate job execution across agents.
- Webhook signature + timestamp validation with replay tracking (`x-github-delivery`) blocks tampering and replays.
- Agent/cache/service-account tokens are org-scoped, versioned, and auditable through Postgres-backed ledgers.
- Offline-mode egress enforcement combines metadata endpoint deny-lists, regex policy packs, and explicit registry allow-lists.
- Rootfs attestation, cosign provenance checks, and per-job SBOM generation tighten host supply-chain posture.
- Metrics and admin endpoints require bearer tokens and can be IP-filtered for additional hardening.
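The fence-token idea behind lease fencing can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical — the real implementation brokers leases over Redis + Postgres:

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LeaseTable:
    """Toy lease table: each grant rotates to a strictly newer fence token."""
    _tokens: count = field(default_factory=lambda: count(1))
    _current: dict = field(default_factory=dict)  # job_id -> (agent, fence)

    def acquire(self, job_id: str, agent: str) -> int:
        """Grant (or re-grant) the lease, invalidating all older fences."""
        fence = next(self._tokens)
        self._current[job_id] = (agent, fence)
        return fence

    def submit_result(self, job_id: str, agent: str, fence: int) -> bool:
        """Accept a result only from the holder of the newest fence token."""
        return self._current.get(job_id) == (agent, fence)
```

The point of the rotation: an agent that silently lost its lease (e.g. after a network partition and timeout-driven reassignment) still holds a stale token, so its late writes are rejected instead of producing duplicate job execution.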
 
- Install dependencies and bootstrap the environment (`uv venv`, `uv pip install -e .`).
- Configure the required environment variables and secrets (see Configuration).
- Follow the detailed setup in Getting Started to launch services and supporting infrastructure.
- Run the automated checks with `uv run pytest` to validate control plane, host agent, caching, and executor integrations.
Workflows can target Nimbus runners using capability-based labels:
```yaml
# Secure isolation (default): Firecracker microVMs
runs-on: [nimbus]

# Fast startup for CI/CD (~200ms)
runs-on: [nimbus, docker]

# GPU acceleration for ML/AI (2 GPUs)
runs-on: [nimbus, gpu, pytorch, gpu-count:2]

# Custom configurations
runs-on: [nimbus, docker, image:node:18-alpine]
```

The control plane verifies `workflow_job` signatures, enforces per-org rate limits, and dispatches jobs to agents based on capability matching.
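One plausible sketch of label-based dispatch: plain labels are treated as capabilities an agent must advertise, while `key:value` labels carry per-job configuration. Both the function names and this plain/`key:value` split are assumptions about the dispatch logic, not the actual control-plane code:

```python
def split_labels(labels: list[str]) -> tuple[set[str], dict[str, str]]:
    """Separate plain capability labels from key:value configuration labels."""
    caps, config = set(), {}
    for label in labels:
        if ":" in label:
            key, value = label.split(":", 1)  # split once: "image:node:18-alpine"
            config[key] = value
        else:
            caps.add(label)
    return caps, config


def agent_matches(job_labels: list[str], agent_caps: set[str]) -> bool:
    """An agent is eligible when it advertises every plain capability label."""
    caps, _ = split_labels(job_labels)
    return caps <= agent_caps
```

Under this sketch, `[nimbus, gpu, pytorch, gpu-count:2]` requires an agent advertising `nimbus`, `gpu`, and `pytorch`, and hands `gpu-count=2` to the chosen executor as configuration.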
Nimbus publishes curated container images, such as `nimbus/ai-eval-runner` (Node.js, eval2otel, ollama client) for evaluation workloads. See Getting Started for example usage.
- Day-two procedures, monitoring, and ClickHouse schema live in the Operations Guide.
- Guidance on the Firecracker jailer, seccomp, and capability dropping is documented in Firecracker Security Hardening.
 
- `src/nimbus/control_plane`: FastAPI application, database models, RBAC/SCIM/SAML integrations, compliance tooling.
- `src/nimbus/host_agent`: Firecracker launcher, multi-executor orchestration, warm pools, egress enforcement, and SSH utilities.
- `src/nimbus/runners`: Executor implementations (Firecracker, Docker, GPU), pool manager, resource tracker, performance monitor.
- `src/nimbus/cache_proxy` & `src/nimbus/docker_cache`: Artifact/cache services with metrics, quota enforcement, and S3/OCI backends.
- `src/nimbus/logging_pipeline`: ClickHouse ingestion and querying service for job logs.
- `web`: Vite/React dashboard for operational monitoring.
- `tests`: Extensive pytest suite covering services, executors, security controls, and CLI tooling.
- Complete: Multi-tenant isolation, lease fencing, webhook replay protection, distributed rate limiting, metrics endpoint authentication, tenant analytics dashboard, multi-executor system with Firecracker/Docker/GPU support, warm pools, snapshot boot, comprehensive performance monitoring.
- In Progress: Enhanced GPU scheduling, ARM64 support, advanced resource optimization.
- Planned: Kubernetes executor, Windows containers, auto-scaling warm pools, cost optimization features.
 
Nimbus is ready for production deployments with a mature multi-executor architecture. See the Executor System Guide for comprehensive usage documentation. Contributions welcome in:
- New executor implementations (Kubernetes, ARM64, Windows)
- Advanced GPU scheduling and optimization
- Performance analysis and cost optimization
- Extended warm pool strategies