By the Toki Space Team
Creating a production-grade online code compiler is one of the most complex and demanding projects in distributed systems. It calls for expertise in container orchestration, job queuing, real-time data flow, security sandboxing, and precise resource control. Unlike basic code execution tools, a full-fledged compiler platform must support multiple programming languages, manage parallel executions, stream outputs live, and recover gracefully from errors.
In this tutorial, you’ll learn how to build a fully functional online compiler using Docker for containerization, RabbitMQ for managing execution jobs, Redis for real-time communication, and React for the user interface. We’ll walk through the architecture, implementation, and deployment process—drawing from real-world experience building the code execution system behind Toki Space.
Please forgive any errors in the code samples; this walkthrough is mainly meant to give you an idea of how it all works under the hood. Let's dive in.
Table of Contents
- Architecture Overview
- Docker Container System
- Message Queue Integration
- Real-time Streaming
- Backend Implementation
- Frontend Development
- Security & Isolation
- Performance Optimization
- Production Deployment
- Key Learnings
Architecture Overview
System Components
Our online code compiler consists of five main components working together:
```
┌─────────────────┐    WebSocket     ┌─────────────────┐
│      React      │ ◄──────────────► │     Node.js     │
│    Frontend     │                  │     Backend     │
│                 │    HTTP/REST     │                 │
└─────────────────┘ ◄──────────────► └─────────────────┘
                                              │
                                              │ Jobs
                                              ▼
        ┌─────────────────┐          ┌─────────────────┐
        │      Redis      │          │    RabbitMQ     │
        │   (Streaming)   │          │   (Job Queue)   │
        └─────────────────┘          └─────────────────┘
                ▲                             │
                │ Results                     │ Jobs
                │                             ▼
                └──────────────────── ┌─────────────────┐
                                      │     Runner      │
                                      │     Service     │
                                      │      (Go)       │
                                      └─────────────────┘
                                              │
                                              │ Docker API
                                              ▼
                                      ┌─────────────────┐
                                      │     Docker      │
                                      │   Containers    │
                                      │  (Multi-lang)   │
                                      └─────────────────┘
```
Core Technologies
- Frontend: React with TypeScript for the user interface
- Backend: Node.js with Express for API and WebSocket handling
- Runner Service: Go service for container management and code execution
- Message Queue: RabbitMQ for reliable job distribution
- Streaming: Redis for real-time output streaming
- Containers: Docker for secure code execution isolation
Design Principles
- Language Agnostic: Support for Python, Node.js, Go, Rust, Java, and more
- Secure Isolation: Each execution runs in a separate Docker container
- Real-time Feedback: Stream output as code executes
- Scalable Architecture: Horizontal scaling through message queues
- Fault Tolerance: Graceful handling of failures and timeouts
Docker Container System
The Challenge of Multi-Language Execution
Running user code safely requires solving several complex problems:
- Security Isolation: Prevent malicious code from accessing the host system
- Resource Limits: Control CPU, memory, and execution time
- Environment Setup: Provide language-specific tools and dependencies
- Cleanup: Remove containers and workspaces after execution
Container-Per-Language Architecture
Instead of spinning up new containers for each execution (which is slow), we use persistent containers per language:
```go
type LanguageVM struct {
    Language      string
    ContainerName string
    IsRunning     bool
    WorkspaceDir  string
    mutex         sync.Mutex
}

type Manager struct {
    config   config.FirecrackerConfig
    logger   *logrus.Logger
    vms      map[string]*LanguageVM
    vmsMutex sync.RWMutex
}
```
Benefits of Persistent Containers:
- Fast Execution: No container startup overhead
- Warm Environments: Dependencies already installed
- Resource Efficiency: Reuse container resources
- Consistent State: Predictable execution environment
Container Initialization
Each language gets its own persistent container with pre-installed tools:
```go
func (m *Manager) initializePersistentContainers() error {
    m.logger.Info("Initializing persistent containers for all languages...")

    for language := range m.config.Environments {
        m.logger.Infof("Starting persistent container for language: %s", language)

        vm := &LanguageVM{
            Language:      language,
            ContainerName: fmt.Sprintf("runner-vm-%s", language),
            WorkspaceDir:  filepath.Join(m.config.WorkspaceDir, language),
            IsRunning:     false,
        }

        // Create language-specific workspace directory
        if err := os.MkdirAll(vm.WorkspaceDir, 0755); err != nil {
            return fmt.Errorf("failed to create workspace directory for %s: %w", language, err)
        }

        // Start the container
        if err := m.startPersistentContainer(vm); err != nil {
            m.logger.Errorf("Failed to start container for %s: %v", language, err)
            continue
        }

        m.vms[language] = vm
    }

    return nil
}
```
Language-Specific Execution
Each language requires different setup and execution commands:
```go
func (m *Manager) executeInPersistentContainer(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
    var execCmd []string
    workspacePath := fmt.Sprintf("/tmp/workspaces/%s/%s", vm.Language, req.JobID)

    switch vm.Language {
    case "python":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "main.py"
        }
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && if [ -f requirements.txt ]; then pip install -r requirements.txt; fi && timeout 30 python %s", workspacePath, entryPoint)}

    case "nodejs":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "index.js"
        }
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && if [ -f package.json ]; then npm install; fi && timeout 30 node %s", workspacePath, entryPoint)}

    case "go":
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && go mod tidy 2>/dev/null || true && timeout 30 go run .", workspacePath)}

    case "rust":
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && timeout 30 cargo run", workspacePath)}

    case "java":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "Main.java"
        }
        className := strings.TrimSuffix(entryPoint, ".java")
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && javac %s && timeout 30 java %s", workspacePath, entryPoint, className)}
    }

    // Execute with timeout
    execCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
    output, err := cmd.CombinedOutput()

    result := &ExecutionResult{
        JobID:       req.JobID,
        WorkspaceID: req.WorkspaceID,
        Output:      string(output),
        ExitCode:    0,
    }

    if err != nil {
        result.Error = err.Error()
        result.ExitCode = 1
    }

    return result, nil
}
```
Key Implementation Details:
- Workspace Isolation: Each job gets its own directory within the container (see the workspace lifecycle sketch after this list)
- Dependency Management: Automatic installation of requirements.txt, package.json, etc.
- Timeout Protection: 30-second execution limit prevents infinite loops
- Error Handling: Capture both stdout and stderr for complete output
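To make the workspace isolation point concrete, here is a minimal sketch of a per-job workspace lifecycle, assuming the host workspace directory is bind-mounted into the persistent container at /tmp/workspaces. The helper names (createJobWorkspace, cleanupJobWorkspace) are illustrative, not the exact functions used in the Toki Space runner:

```go
// Minimal sketch: create an isolated directory for one job, write the
// submitted source into it, and remove everything afterwards so the
// persistent container stays clean between executions.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createJobWorkspace creates <baseDir>/<language>/<jobID> and writes the
// submitted source files into it.
func createJobWorkspace(baseDir, language, jobID string, files map[string]string) (string, error) {
	dir := filepath.Join(baseDir, language, jobID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("create workspace: %w", err)
	}
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			return "", fmt.Errorf("write %s: %w", name, err)
		}
	}
	return dir, nil
}

// cleanupJobWorkspace removes the job directory and everything in it.
func cleanupJobWorkspace(dir string) error {
	return os.RemoveAll(dir)
}

func main() {
	dir, err := createJobWorkspace("/tmp/workspaces", "python", "job-123",
		map[string]string{"main.py": `print("Hello, World!")`})
	if err != nil {
		panic(err)
	}
	defer cleanupJobWorkspace(dir) // always clean up, even if execution fails
	fmt.Println("workspace ready at", dir)
}
```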
Message Queue Integration
Why RabbitMQ for Code Execution?
Code execution jobs have specific requirements that make RabbitMQ ideal:
- Reliability: Jobs must not be lost if a worker crashes
- Durability: Job queue survives server restarts
- Fair Distribution: Distribute jobs evenly across worker instances
- Dead Letter Queues: Handle failed jobs gracefully
- Priority Queues: Support urgent job execution
RabbitMQ Topology
Our message queue setup uses a topic exchange for flexible routing:
```yaml
# Exchange Configuration
exchanges:
  jobs:
    name: "code-execution.jobs"
    type: "topic"
    durable: true
    auto_delete: false
  results:
    name: "code-execution.results"
    type: "topic"
    durable: true
    auto_delete: false
  dead_letter:
    name: "code-execution.dead-letter"
    type: "direct"
    durable: true
    auto_delete: false

# Queue Configuration
queues:
  job_prefix: "jobs"
  result_prefix: "results"
  dead_letter_suffix: "dlq"
```
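The dead-letter part of this topology has to be declared by whichever service sets up the queues. As a rough illustration, here is a sketch in Go using github.com/rabbitmq/amqp091-go; the queue names, TTL, and credentials are illustrative rather than the exact production values:

```go
// Sketch: declare a job queue whose rejected or expired messages are routed
// to the dead-letter exchange declared in the topology above.
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://admin:admin123@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Dead-letter exchange and its queue.
	if err := ch.ExchangeDeclare("code-execution.dead-letter", "direct", true, false, false, false, nil); err != nil {
		log.Fatal(err)
	}
	if _, err := ch.QueueDeclare("jobs.python.dlq", true, false, false, false, nil); err != nil {
		log.Fatal(err)
	}
	if err := ch.QueueBind("jobs.python.dlq", "jobs.python", "code-execution.dead-letter", false, nil); err != nil {
		log.Fatal(err)
	}

	// Main job queue: failed deliveries are dead-lettered instead of lost.
	_, err = ch.QueueDeclare("jobs.python", true, false, false, false, amqp.Table{
		"x-dead-letter-exchange":    "code-execution.dead-letter",
		"x-dead-letter-routing-key": "jobs.python",
		"x-message-ttl":             int32(300000), // 5 minutes
	})
	if err != nil {
		log.Fatal(err)
	}
}
```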
Routing Keys Pattern:
- `jobs.python` - Python execution jobs
- `jobs.nodejs` - Node.js execution jobs
- `jobs.go` - Go execution jobs
- `results.python` - Python execution results
- `results.nodejs` - Node.js execution results
Job Message Format
Standardized job messages ensure compatibility across services:
```typescript
interface ExecutionJob {
  job_id: string;          // Unique identifier
  workspace_id: string;    // User workspace
  language: string;        // Target language
  source_code: string;     // Code to execute
  entry_point?: string;    // Main file (optional)
  dependencies?: string[]; // Package dependencies
  timeout?: number;        // Execution timeout
  memory_limit?: number;   // Memory limit in MB
}
```
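Because the producer is Node.js and the consumer is Go, the JSON field names are the real contract. The runner's full ExecutionRequest type isn't reproduced here, but a Go-side counterpart could look like this sketch, with struct tags mirroring the interface above:

```go
// Sketch: decoding the standardized job message on the Go side. The struct
// tags must match the JSON keys emitted by the Node.js producer.
package main

import (
	"encoding/json"
	"fmt"
)

type ExecutionJob struct {
	JobID        string   `json:"job_id"`
	WorkspaceID  string   `json:"workspace_id"`
	Language     string   `json:"language"`
	SourceCode   string   `json:"source_code"`
	EntryPoint   string   `json:"entry_point,omitempty"`
	Dependencies []string `json:"dependencies,omitempty"`
	Timeout      int      `json:"timeout,omitempty"`      // seconds
	MemoryLimit  int      `json:"memory_limit,omitempty"` // MB
}

func main() {
	raw := []byte(`{"job_id":"a1b2","workspace_id":"default","language":"python","source_code":"print(42)","timeout":30}`)

	var job ExecutionJob
	if err := json.Unmarshal(raw, &job); err != nil {
		panic(err)
	}
	fmt.Printf("%s job %s (timeout %ds)\n", job.Language, job.JobID, job.Timeout)
}
```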
Producer Implementation (Node.js Backend)
```javascript
const amqp = require('amqplib');

class JobProducer {
  constructor(rabbitmqUrl) {
    this.rabbitmqUrl = rabbitmqUrl;
    this.connection = null;
    this.channel = null;
  }

  async connect() {
    this.connection = await amqp.connect(this.rabbitmqUrl);
    this.channel = await this.connection.createChannel();

    // Declare exchanges
    await this.channel.assertExchange('code-execution.jobs', 'topic', { durable: true });
    await this.channel.assertExchange('code-execution.results', 'topic', { durable: true });
  }

  async submitJob(job) {
    const routingKey = `jobs.${job.language}`;

    const jobMessage = {
      job_id: job.job_id,
      workspace_id: job.workspace_id,
      language: job.language,
      url: this.createDataURL(job.source_code, job.entry_point),
      entry_point: job.entry_point
    };

    await this.channel.publish(
      'code-execution.jobs',
      routingKey,
      Buffer.from(JSON.stringify(jobMessage)),
      {
        persistent: true,
        messageId: job.job_id,
        timestamp: Date.now()
      }
    );

    console.log(`Job ${job.job_id} submitted for ${job.language}`);
  }

  createDataURL(sourceCode, filename = 'main.py') {
    // Create inline data URL for source code
    return `data:text/plain;filename=${filename},${sourceCode}`;
  }

  async close() {
    if (this.channel) await this.channel.close();
    if (this.connection) await this.connection.close();
  }
}
```
Consumer Implementation (Go Runner Service)
func (m *Manager) consumeRabbitMQMessages(config RabbitMQConfig) error { conn, err := amqp.Dial(config.URL) if err != nil { return fmt.Errorf("failed to connect to RabbitMQ: %w", err) } defer conn.Close() ch, err := conn.Channel() if err != nil { return fmt.Errorf("failed to open channel: %w", err) } defer ch.Close() // Declare unified queue for all languages jobQueue, err := ch.QueueDeclare( "jobs.all-languages", // name true, // durable false, // delete when unused false, // exclusive false, // no-wait nil, // arguments ) if err != nil { return fmt.Errorf("failed to declare job queue: %w", err) } // Bind queue to exchange for each language languages := []string{"python", "nodejs", "go", "rust", "java"} for _, lang := range languages { err = ch.QueueBind( jobQueue.Name, fmt.Sprintf("jobs.%s", lang), "code-execution.jobs", false, nil, ) if err != nil { return fmt.Errorf("failed to bind queue for %s: %w", lang, err) } } // Set QoS for fair distribution err = ch.Qos(1, 0, false) if err != nil { return fmt.Errorf("failed to set QoS: %w", err) } // Start consuming msgs, err := ch.Consume( jobQueue.Name, "", // consumer tag false, // auto-ack false, // exclusive false, // no-local false, // no-wait nil, // args ) if err != nil { return fmt.Errorf("failed to register consumer: %w", err) } for msg := range msgs { go m.processJob(msg) } return nil } func (m *Manager) processJob(msg amqp.Delivery) { var execReq ExecutionRequest if err := json.Unmarshal(msg.Body, &execReq); err != nil { m.logger.WithError(err).Error("Failed to parse job message") msg.Nack(false, false) // Don't requeue invalid messages return } // Execute the job ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second) result, err := m.ExecuteCode(ctx, execReq) cancel() if err != nil { m.logger.WithError(err).Error("Job execution failed") result = &ExecutionResult{ JobID: execReq.JobID, Output: "", Error: err.Error(), ExitCode: 1, } } // Publish result resultBytes, _ := json.Marshal(result) err = msg.Channel.Publish( "code-execution.results", fmt.Sprintf("results.%s", execReq.Language), false, // mandatory false, // immediate amqp.Publishing{ ContentType: "application/json", Body: resultBytes, }) if err != nil { m.logger.WithError(err).Error("Failed to publish result") msg.Nack(false, true) // Requeue on publish failure } else { msg.Ack(false) // Acknowledge successful processing } }
Real-time Streaming
The Challenge of Live Output
Unlike traditional job systems that return results after completion, code execution benefits from real-time output streaming:
- User Experience: Immediate feedback as code runs
- Long-Running Jobs: Show progress for lengthy operations
- Debugging: See output line-by-line for easier debugging
- Interactive Input: Support for programs requiring user input
Redis Pub/Sub for Streaming
Redis provides excellent pub/sub capabilities for real-time streaming:
```javascript
// Redis streaming setup
const redis = require('redis');

class StreamingService {
  constructor(redisUrl) {
    this.publisher = redis.createClient({ url: redisUrl });
    this.subscriber = redis.createClient({ url: redisUrl });
  }

  async connect() {
    await this.publisher.connect();
    await this.subscriber.connect();
  }

  // Publish output line from runner service
  async publishOutput(jobId, line, stream = 'stdout') {
    const message = {
      job_id: jobId,
      line: line,
      stream: stream,
      timestamp: new Date().toISOString()
    };

    await this.publisher.publish(
      `execution:${jobId}`,
      JSON.stringify(message)
    );
  }

  // Subscribe to job output
  async subscribeToJob(jobId, callback) {
    await this.subscriber.subscribe(`execution:${jobId}`, (message) => {
      const data = JSON.parse(message);
      callback(data);
    });
  }

  async unsubscribeFromJob(jobId) {
    await this.subscriber.unsubscribe(`execution:${jobId}`);
  }
}
```
WebSocket Integration
Connect Redis streams to frontend via WebSocket:
// WebSocket server integration const WebSocket = require('ws'); class WebSocketManager { constructor(server, streamingService) { this.wss = new WebSocket.Server({ server }); this.streamingService = streamingService; this.connections = new Map(); // jobId -> Set of WebSocket connections this.setupWebSocketHandlers(); } setupWebSocketHandlers() { this.wss.on('connection', (ws) => { ws.on('message', async (data) => { const message = JSON.parse(data); switch (message.type) { case 'subscribe': await this.subscribeToJobOutput(ws, message.job_id); break; case 'unsubscribe': await this.unsubscribeFromJobOutput(ws, message.job_id); break; } }); ws.on('close', () => { this.cleanupConnection(ws); }); }); } async subscribeToJobOutput(ws, jobId) { // Add connection to job subscription if (!this.connections.has(jobId)) { this.connections.set(jobId, new Set()); // Subscribe to Redis stream for this job await this.streamingService.subscribeToJob(jobId, (data) => { // Broadcast to all subscribed WebSocket connections const connections = this.connections.get(jobId); if (connections) { connections.forEach(conn => { if (conn.readyState === WebSocket.OPEN) { conn.send(JSON.stringify({ type: 'output', data: data })); } }); } }); } this.connections.get(jobId).add(ws); } async unsubscribeFromJobOutput(ws, jobId) { const connections = this.connections.get(jobId); if (connections) { connections.delete(ws); // If no more connections, unsubscribe from Redis if (connections.size === 0) { await this.streamingService.unsubscribeFromJob(jobId); this.connections.delete(jobId); } } } cleanupConnection(ws) { // Remove connection from all job subscriptions for (const [jobId, connections] of this.connections.entries()) { connections.delete(ws); if (connections.size === 0) { this.streamingService.unsubscribeFromJob(jobId); this.connections.delete(jobId); } } } }
Enhanced Runner with Streaming
Modify the runner service to stream output line-by-line:
func (m *Manager) executeWithStreaming(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) { // ... command setup ... cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...) // Create pipes for real-time output capture stdout, err := cmd.StdoutPipe() if err != nil { return nil, fmt.Errorf("failed to create stdout pipe: %w", err) } stderr, err := cmd.StderrPipe() if err != nil { return nil, fmt.Errorf("failed to create stderr pipe: %w", err) } // Start the command if err := cmd.Start(); err != nil { return nil, fmt.Errorf("failed to start command: %w", err) } // Stream output in real-time var outputBuffer strings.Builder var wg sync.WaitGroup wg.Add(2) // Stream stdout go func() { defer wg.Done() scanner := bufio.NewScanner(stdout) for scanner.Scan() { line := scanner.Text() outputBuffer.WriteString(line + "\n") // Publish to Redis for real-time streaming m.publishStreamingOutput(req.JobID, line, "stdout") } }() // Stream stderr go func() { defer wg.Done() scanner := bufio.NewScanner(stderr) for scanner.Scan() { line := scanner.Text() outputBuffer.WriteString(line + "\n") // Publish to Redis for real-time streaming m.publishStreamingOutput(req.JobID, line, "stderr") } }() // Wait for command completion err = cmd.Wait() wg.Wait() // Wait for all output to be processed result := &ExecutionResult{ JobID: req.JobID, WorkspaceID: req.WorkspaceID, Output: outputBuffer.String(), ExitCode: 0, } if err != nil { result.Error = err.Error() result.ExitCode = 1 } return result, nil } func (m *Manager) publishStreamingOutput(jobID, line, stream string) { // This would integrate with your Redis client message := StreamingOutput{ JobID: jobID, Line: line, Stream: stream, Timestamp: time.Now(), } // Publish to Redis (implementation depends on your Redis client) // m.redisClient.Publish(fmt.Sprintf("execution:%s", jobID), message) }
This streaming approach provides real-time feedback to users, making the code execution feel immediate and interactive rather than a black-box operation.
Backend Implementation
Node.js API Server
The backend serves as the orchestration layer between the frontend and the execution infrastructure:
const express = require('express'); const WebSocket = require('ws'); const { v4: uuidv4 } = require('uuid'); const cors = require('cors'); class CodeExecutionAPI { constructor() { this.app = express(); this.server = null; this.jobProducer = null; this.streamingService = null; this.wsManager = null; this.activeJobs = new Map(); // Track running jobs this.setupMiddleware(); this.setupRoutes(); } setupMiddleware() { this.app.use(cors()); this.app.use(express.json({ limit: '10mb' })); this.app.use(express.urlencoded({ extended: true })); // Request logging this.app.use((req, res, next) => { console.log(`${req.method} ${req.path} - ${new Date().toISOString()}`); next(); }); } setupRoutes() { // Health check this.app.get('/health', (req, res) => { res.json({ status: 'ok', timestamp: new Date().toISOString() }); }); // Submit code execution job this.app.post('/api/execute', async (req, res) => { try { const result = await this.handleCodeExecution(req.body); res.json(result); } catch (error) { console.error('Execution error:', error); res.status(500).json({ error: 'Execution failed', message: error.message }); } }); // Get job status this.app.get('/api/jobs/:jobId', (req, res) => { const jobId = req.params.jobId; const job = this.activeJobs.get(jobId); if (!job) { return res.status(404).json({ error: 'Job not found' }); } res.json(job); }); // List active jobs this.app.get('/api/jobs', (req, res) => { const jobs = Array.from(this.activeJobs.values()); res.json({ jobs, count: jobs.length }); }); // Cancel job this.app.delete('/api/jobs/:jobId', async (req, res) => { const jobId = req.params.jobId; await this.cancelJob(jobId); res.json({ message: 'Job cancelled' }); }); } async handleCodeExecution(requestBody) { const { language, source_code, entry_point, workspace_id = 'default', timeout = 30 } = requestBody; // Validate request if (!language || !source_code) { throw new Error('Language and source_code are required'); } const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java']; if (!supportedLanguages.includes(language)) { throw new Error(`Unsupported language: ${language}`); } // Generate unique job ID const jobId = uuidv4(); // Create job record const job = { job_id: jobId, workspace_id, language, source_code, entry_point, timeout, status: 'queued', created_at: new Date().toISOString(), updated_at: new Date().toISOString() }; this.activeJobs.set(jobId, job); // Submit to RabbitMQ await this.jobProducer.submitJob(job); // Update status job.status = 'submitted'; job.updated_at = new Date().toISOString(); return { job_id: jobId, status: 'submitted', message: 'Job submitted for execution', stream_url: `/stream/${jobId}` }; } async cancelJob(jobId) { const job = this.activeJobs.get(jobId); if (job && job.status === 'running') { // Send cancellation signal (implementation depends on your setup) job.status = 'cancelled'; job.updated_at = new Date().toISOString(); } } async start(port = 3001) { // Initialize services await this.initializeServices(); // Start HTTP server this.server = this.app.listen(port, () => { console.log(`Code execution API running on port ${port}`); }); // Setup WebSocket manager this.wsManager = new WebSocketManager(this.server, this.streamingService); // Setup result consumer this.setupResultConsumer(); } async initializeServices() { // Initialize RabbitMQ producer this.jobProducer = new JobProducer(process.env.RABBITMQ_URL); await this.jobProducer.connect(); // Initialize Redis streaming this.streamingService = new StreamingService(process.env.REDIS_URL); await 
this.streamingService.connect(); console.log('All services initialized successfully'); } setupResultConsumer() { // Consumer for job results from RabbitMQ const amqp = require('amqplib'); amqp.connect(process.env.RABBITMQ_URL) .then(conn => conn.createChannel()) .then(ch => { // Declare results queue return ch.assertQueue('results.all', { durable: true }) .then(() => { // Bind to results exchange return ch.bindQueue('results.all', 'code-execution.results', 'results.*'); }) .then(() => { // Consume results return ch.consume('results.all', (msg) => { if (msg) { this.handleJobResult(JSON.parse(msg.content.toString())); ch.ack(msg); } }); }); }) .catch(console.error); } handleJobResult(result) { const job = this.activeJobs.get(result.job_id); if (job) { job.status = result.exit_code === 0 ? 'completed' : 'failed'; job.result = result; job.updated_at = new Date().toISOString(); // Optionally clean up completed jobs after some time setTimeout(() => { this.activeJobs.delete(result.job_id); }, 300000); // 5 minutes } } async stop() { if (this.server) { this.server.close(); } if (this.jobProducer) { await this.jobProducer.close(); } if (this.streamingService) { await this.streamingService.close(); } } } // Environment configuration const config = { port: process.env.PORT || 3001, rabbitmq_url: process.env.RABBITMQ_URL || 'amqp://admin:admin123@localhost:5672', redis_url: process.env.REDIS_URL || 'redis://localhost:6379' }; // Start the server const api = new CodeExecutionAPI(); api.start(config.port); // Graceful shutdown process.on('SIGINT', async () => { console.log('Shutting down gracefully...'); await api.stop(); process.exit(0); });
Express Middleware for Code Validation
Add security and validation middleware:
const rateLimit = require('express-rate-limit'); const validator = require('validator'); // Rate limiting for code execution const executionLimiter = rateLimit({ windowMs: 60 * 1000, // 1 minute max: 10, // Maximum 10 executions per minute per IP message: 'Too many execution requests, please try again later', standardHeaders: true, legacyHeaders: false }); // Code validation middleware const validateCodeRequest = (req, res, next) => { const { language, source_code, entry_point } = req.body; // Check required fields if (!language || !source_code) { return res.status(400).json({ error: 'Missing required fields', required: ['language', 'source_code'] }); } // Validate language const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java']; if (!supportedLanguages.includes(language)) { return res.status(400).json({ error: 'Unsupported language', supported: supportedLanguages }); } // Check code length (prevent abuse) if (source_code.length > 100000) { // 100KB limit return res.status(400).json({ error: 'Source code too large', max_size: '100KB' }); } // Validate entry point if provided if (entry_point && !validator.isAlphanumeric(entry_point.replace(/[._-]/g, ''))) { return res.status(400).json({ error: 'Invalid entry point format' }); } // Basic security checks (can be expanded) const dangerousPatterns = [ /rm\s+-rf/, /sudo/, /passwd/, /\/etc\/passwd/, /mkfs/, /format/, /del\s+\/[a-z]/i ]; for (const pattern of dangerousPatterns) { if (pattern.test(source_code)) { return res.status(400).json({ error: 'Code contains potentially dangerous operations' }); } } next(); }; // Apply middleware to execution endpoint app.post('/api/execute', executionLimiter, validateCodeRequest, async (req, res) => { // ... execution logic } );
Frontend Development
React Code Execution Component
Based on our CodeRunnerDemo.tsx component, here's a production-ready implementation:
import React, { useState, useEffect, useCallback, useRef } from 'react'; import { Button } from '@/components/ui/button'; import { Card, CardContent } from '@/components/ui/card'; import { Badge } from '@/components/ui/badge'; import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs'; import { Play, Square, Copy, Download, Settings } from 'lucide-react'; import CodeMirror from '@uiw/react-codemirror'; import { githubDark } from '@uiw/codemirror-theme-github'; import { javascript } from '@codemirror/lang-javascript'; import { python } from '@codemirror/lang-python'; import { rust } from '@codemirror/lang-rust'; import { go } from '@codemirror/lang-go'; import { java } from '@codemirror/lang-java'; interface ExecutionResult { job_id: string; status: string; output?: string; error?: string; exit_code?: number; duration?: number; } interface OutputLine { line: string; stream: 'stdout' | 'stderr'; timestamp: string; } const LANGUAGES = { python: { ext: python(), icon: '🐍', name: 'Python', starter: 'print("Hello, World!")' }, javascript: { ext: javascript(), icon: '⚡', name: 'JavaScript', starter: 'console.log("Hello, World!");' }, go: { ext: go(), icon: '🔵', name: 'Go', starter: 'package main\n\nimport "fmt"\n\nfunc main() {\n fmt.Println("Hello, World!")\n}' }, rust: { ext: rust(), icon: '🦀', name: 'Rust', starter: 'fn main() {\n println!("Hello, World!");\n}' }, java: { ext: java(), icon: '☕', name: 'Java', starter: 'public class Main {\n public static void main(String[] args) {\n System.out.println("Hello, World!");\n }\n}' } }; export function CodeExecutor() { const [selectedLanguage, setSelectedLanguage] = useState('python'); const [code, setCode] = useState(LANGUAGES.python.starter); const [isExecuting, setIsExecuting] = useState(false); const [output, setOutput] = useState<OutputLine[]>([]); const [executionResult, setExecutionResult] = useState<ExecutionResult | null>(null); const [currentJobId, setCurrentJobId] = useState<string | null>(null); const wsRef = useRef<WebSocket | null>(null); const outputRef = useRef<HTMLDivElement>(null); // WebSocket connection for real-time output const connectWebSocket = useCallback((jobId: string) => { const wsUrl = `ws://localhost:3001/stream/${jobId}`; wsRef.current = new WebSocket(wsUrl); wsRef.current.onopen = () => { console.log('WebSocket connected for job:', jobId); wsRef.current?.send(JSON.stringify({ type: 'subscribe', job_id: jobId })); }; wsRef.current.onmessage = (event) => { const message = JSON.parse(event.data); if (message.type === 'output') { const outputLine: OutputLine = { line: message.data.line, stream: message.data.stream, timestamp: message.data.timestamp }; setOutput(prev => [...prev, outputLine]); // Auto-scroll to bottom setTimeout(() => { if (outputRef.current) { outputRef.current.scrollTop = outputRef.current.scrollHeight; } }, 10); } }; wsRef.current.onclose = () => { console.log('WebSocket disconnected'); }; wsRef.current.onerror = (error) => { console.error('WebSocket error:', error); }; }, []); // Execute code const executeCode = useCallback(async () => { if (isExecuting) return; setIsExecuting(true); setOutput([]); setExecutionResult(null); try { const response = await fetch('http://localhost:3001/api/execute', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ language: selectedLanguage, source_code: code, workspace_id: 'web-editor' }) }); if (!response.ok) { throw new Error(`HTTP ${response.status}: ${response.statusText}`); } const result = await 
response.json(); setCurrentJobId(result.job_id); // Connect WebSocket for streaming output connectWebSocket(result.job_id); // Poll for final result pollJobResult(result.job_id); } catch (error) { console.error('Execution failed:', error); setOutput([{ line: `Error: ${error.message}`, stream: 'stderr', timestamp: new Date().toISOString() }]); setIsExecuting(false); } }, [code, selectedLanguage, isExecuting, connectWebSocket]); // Poll for job completion const pollJobResult = useCallback(async (jobId: string) => { const pollInterval = 1000; // 1 second const maxPolls = 60; // 60 seconds timeout let polls = 0; const poll = async () => { try { const response = await fetch(`http://localhost:3001/api/jobs/${jobId}`); const job = await response.json(); if (job.status === 'completed' || job.status === 'failed') { setExecutionResult(job.result); setIsExecuting(false); // Close WebSocket if (wsRef.current) { wsRef.current.close(); } return; } polls++; if (polls < maxPolls) { setTimeout(poll, pollInterval); } else { // Timeout setIsExecuting(false); setOutput(prev => [...prev, { line: 'Execution timeout - job may still be running', stream: 'stderr', timestamp: new Date().toISOString() }]); } } catch (error) { console.error('Polling error:', error); setIsExecuting(false); } }; setTimeout(poll, pollInterval); }, []); // Stop execution const stopExecution = useCallback(async () => { if (currentJobId) { try { await fetch(`http://localhost:3001/api/jobs/${currentJobId}`, { method: 'DELETE' }); } catch (error) { console.error('Failed to cancel job:', error); } } if (wsRef.current) { wsRef.current.close(); } setIsExecuting(false); setCurrentJobId(null); }, [currentJobId]); // Copy code to clipboard const copyCode = useCallback(() => { navigator.clipboard.writeText(code); }, [code]); // Download output const downloadOutput = useCallback(() => { const outputText = output.map(line => `[${line.timestamp}] ${line.stream}: ${line.line}` ).join('\n'); const blob = new Blob([outputText], { type: 'text/plain' }); const url = URL.createObjectURL(blob); const a = document.createElement('a'); a.href = url; a.download = `execution-output-${Date.now()}.txt`; a.click(); URL.revokeObjectURL(url); }, [output]); // Language change handler useEffect(() => { setCode(LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].starter); }, [selectedLanguage]); // Cleanup WebSocket on unmount useEffect(() => { return () => { if (wsRef.current) { wsRef.current.close(); } }; }, []); return ( <div className="max-w-6xl mx-auto p-4 space-y-4"> {/* Header */} <div className="flex items-center justify-between"> <h1 className="text-2xl font-bold">Online Code Compiler</h1> <Badge variant="outline" className="text-sm"> Real-time execution with Docker containers </Badge> </div> {/* Language Selection */} <Card> <CardContent className="p-4"> <Tabs value={selectedLanguage} onValueChange={setSelectedLanguage}> <TabsList className="grid w-full grid-cols-5"> {Object.entries(LANGUAGES).map(([lang, config]) => ( <TabsTrigger key={lang} value={lang} className="flex items-center gap-2"> <span>{config.icon}</span> <span className="hidden sm:inline">{config.name}</span> </TabsTrigger> ))} </TabsList> </Tabs> </CardContent> </Card> {/* Code Editor */} <Card> <CardContent className="p-0"> <div className="border-b p-4 flex items-center justify-between"> <h3 className="font-semibold">Code Editor</h3> <div className="flex items-center gap-2"> <Button variant="outline" size="sm" onClick={copyCode} > <Copy className="h-4 w-4 mr-2" /> Copy </Button> <Button 
onClick={isExecuting ? stopExecution : executeCode} disabled={!code.trim()} variant={isExecuting ? "destructive" : "default"} > {isExecuting ? ( <> <Square className="h-4 w-4 mr-2" /> Stop </> ) : ( <> <Play className="h-4 w-4 mr-2" /> Run {LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].name} </> )} </Button> </div> </div> <CodeMirror value={code} onChange={(value) => setCode(value)} theme={githubDark} extensions={[LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].ext]} className="text-sm" basicSetup={{ lineNumbers: true, foldGutter: true, dropCursor: false, allowMultipleSelections: false }} /> </CardContent> </Card> {/* Output Panel */} <Card> <CardContent className="p-0"> <div className="border-b p-4 flex items-center justify-between"> <div className="flex items-center gap-2"> <h3 className="font-semibold">Output</h3> {isExecuting && ( <Badge variant="secondary" className="animate-pulse"> Executing... </Badge> )} {executionResult && ( <Badge variant={executionResult.exit_code === 0 ? "default" : "destructive"} > Exit code: {executionResult.exit_code} </Badge> )} </div> <div className="flex items-center gap-2"> {output.length > 0 && ( <Button variant="outline" size="sm" onClick={downloadOutput} > <Download className="h-4 w-4 mr-2" /> Download </Button> )} <Button variant="outline" size="sm" onClick={() => setOutput([])} > Clear </Button> </div> </div> <div ref={outputRef} className="h-96 overflow-y-auto p-4 bg-gray-950 text-gray-100 font-mono text-sm" > {output.length === 0 ? ( <div className="text-gray-500 italic"> Click "Run" to execute your code. Output will appear here in real-time. </div> ) : ( output.map((line, index) => ( <div key={index} className={`whitespace-pre-wrap ${ line.stream === 'stderr' ? 'text-red-400' : 'text-gray-100' }`} > {line.line} </div> )) )} </div> </CardContent> </Card> {/* Execution Statistics */} {executionResult && ( <Card> <CardContent className="p-4"> <h3 className="font-semibold mb-2">Execution Statistics</h3> <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm"> <div> <span className="text-gray-500">Duration:</span> <div className="font-mono">{executionResult.duration || 0}ms</div> </div> <div> <span className="text-gray-500">Exit Code:</span> <div className="font-mono">{executionResult.exit_code}</div> </div> <div> <span className="text-gray-500">Language:</span> <div className="font-mono">{selectedLanguage}</div> </div> <div> <span className="text-gray-500">Job ID:</span> <div className="font-mono text-xs">{executionResult.job_id}</div> </div> </div> </CardContent> </Card> )} </div> ); }
Security & Isolation
The Security Challenge
Running arbitrary user code presents significant security risks:
- Host System Access: Malicious code could access the host filesystem
- Network Attacks: Code could scan internal networks or launch attacks
- Resource Exhaustion: Infinite loops or memory bombs could crash servers
- Data Exfiltration: Code could attempt to steal sensitive information
- Privilege Escalation: Attempts to gain root or admin access
Docker Security Model
Docker provides multiple layers of security isolation:
```yaml
# Docker container security configuration
version: '3.8'
services:
  runner-python:
    image: python:3.11-slim
    security_opt:
      - no-new-privileges:true
      - seccomp:unconfined   # May need custom seccomp profile
    cap_drop:
      - ALL
    cap_add:
      - SETGID
      - SETUID
    read_only: true
    tmpfs:
      - /tmp:exec,size=100m
      - /var/tmp:exec,size=100m
    ulimits:
      nproc: 64          # Limit number of processes
      nofile: 1024       # Limit open files
      memlock: 67108864  # Limit locked memory
    memory: 256m         # Memory limit
    cpus: '0.5'          # CPU limit
    pids_limit: 100      # Process limit
    networks:
      - isolated_network

networks:
  isolated_network:
    driver: bridge
    internal: true  # No external network access
```
Advanced Container Security
Implement additional security measures in the runner service:
func (m *Manager) createSecureContainer(language, jobID string) error { containerConfig := &container.Config{ Image: m.config.Environments[language], Cmd: []string{"tail", "-f", "/dev/null"}, Env: []string{ "HOME=/tmp", "USER=runner", "SHELL=/bin/bash", }, WorkingDir: "/tmp/workspace", User: "1000:1000", // Non-root user // Resource limits Memory: 256 * 1024 * 1024, // 256MB MemorySwap: 256 * 1024 * 1024, // No swap CpuShares: 512, // Half CPU priority // Security options SecurityOpts: []string{ "no-new-privileges:true", "seccomp=unconfined", // or custom profile }, // Network isolation NetworkDisabled: true, } hostConfig := &container.HostConfig{ // Resource constraints Resources: container.Resources{ Memory: 256 * 1024 * 1024, MemorySwap: 256 * 1024 * 1024, CPUShares: 512, PidsLimit: 50, Ulimits: []*units.Ulimit{ {Name: "nproc", Soft: 32, Hard: 32}, {Name: "nofile", Soft: 256, Hard: 256}, }, }, // Security ReadonlyRootfs: true, CapDrop: []string{"ALL"}, CapAdd: []string{"SETGID", "SETUID"}, // Temporary filesystems Tmpfs: map[string]string{ "/tmp": "exec,size=100m", "/var/tmp": "exec,size=100m", }, // No privileged access Privileged: false, // Network isolation NetworkMode: "none", } // Create container resp, err := m.dockerClient.ContainerCreate( context.Background(), containerConfig, hostConfig, nil, // networking config nil, // platform fmt.Sprintf("runner-%s-%s", language, jobID), ) if err != nil { return fmt.Errorf("failed to create container: %w", err) } // Start container if err := m.dockerClient.ContainerStart( context.Background(), resp.ID, types.ContainerStartOptions{}, ); err != nil { return fmt.Errorf("failed to start container: %w", err) } return nil }
Code Sanitization
Implement static analysis for dangerous patterns:
class CodeSanitizer { constructor() { this.dangerousPatterns = { filesystem: [ /\bopen\s*\(\s*['"][\/\\]/, // File system access /\bfile\s*\(\s*['"][\/\\]/, // File operations /\bos\.system/, // OS commands /\bsubprocess/, // Process execution /\beval\s*\(/, // Code evaluation /\bexec\s*\(/, // Code execution ], network: [ /\bsocket\s*\(/, // Network sockets /\burllib/, // URL operations /\brequests\./, // HTTP requests /\bhttplib/, // HTTP library /\bfetch\s*\(/, // Fetch API ], system: [ /\bos\.getenv/, // Environment variables /\bprocess\.env/, // Node.js environment /\b__import__/, // Dynamic imports /\brequire\s*\(\s*['"]child_process['"]/, // Child process ] }; } analyze(code, language) { const issues = []; for (const [category, patterns] of Object.entries(this.dangerousPatterns)) { for (const pattern of patterns) { const matches = code.match(pattern); if (matches) { issues.push({ category, pattern: pattern.toString(), match: matches[0], severity: this.getSeverity(category) }); } } } return { safe: issues.length === 0, issues, score: this.calculateSafetyScore(issues) }; } getSeverity(category) { const severityMap = { filesystem: 'high', network: 'medium', system: 'high' }; return severityMap[category] || 'low'; } calculateSafetyScore(issues) { const weights = { high: 10, medium: 5, low: 1 }; const totalWeight = issues.reduce((sum, issue) => sum + weights[issue.severity], 0); return Math.max(0, 100 - totalWeight); } } // Usage in API app.post('/api/execute', validateCodeRequest, async (req, res) => { const sanitizer = new CodeSanitizer(); const analysis = sanitizer.analyze(req.body.source_code, req.body.language); if (!analysis.safe && analysis.score < 50) { return res.status(400).json({ error: 'Code contains potentially dangerous operations', issues: analysis.issues, safety_score: analysis.score }); } // Proceed with execution... });
Network Isolation
Implement network restrictions at multiple levels:
```bash
# Docker network setup with restrictions
docker network create --driver bridge \
  --subnet=172.20.0.0/16 \
  --opt com.docker.network.bridge.enable_icc=false \
  --opt com.docker.network.bridge.enable_ip_masquerade=false \
  isolated-execution

# Firewall rules for the container network
iptables -I DOCKER-USER -s 172.20.0.0/16 -j DROP
iptables -I DOCKER-USER -s 172.20.0.0/16 -d 172.20.0.0/16 -j ACCEPT
```
Performance Optimization
Container Lifecycle Management
Optimize container startup and cleanup:
type ContainerPool struct { pools map[string]*LanguagePool mutex sync.RWMutex logger *logrus.Logger } type LanguagePool struct { language string containers []*ContainerInstance available chan *ContainerInstance maxSize int currentSize int mutex sync.Mutex } type ContainerInstance struct { ID string Language string CreatedAt time.Time LastUsed time.Time InUse bool } func NewContainerPool(maxSize int, logger *logrus.Logger) *ContainerPool { return &ContainerPool{ pools: make(map[string]*LanguagePool), logger: logger, } } func (cp *ContainerPool) GetContainer(language string) (*ContainerInstance, error) { cp.mutex.RLock() pool, exists := cp.pools[language] cp.mutex.RUnlock() if !exists { cp.mutex.Lock() pool = &LanguagePool{ language: language, available: make(chan *ContainerInstance, 10), maxSize: 10, } cp.pools[language] = pool cp.mutex.Unlock() } // Try to get from available pool select { case container := <-pool.available: container.InUse = true container.LastUsed = time.Now() return container, nil default: // Create new container if under limit return cp.createNewContainer(pool) } } func (cp *ContainerPool) ReturnContainer(container *ContainerInstance) { cp.mutex.RLock() pool := cp.pools[container.Language] cp.mutex.RUnlock() container.InUse = false container.LastUsed = time.Now() // Clean container workspace cp.cleanContainerWorkspace(container) // Return to pool select { case pool.available <- container: // Successfully returned to pool default: // Pool is full, destroy container cp.destroyContainer(container) } } func (cp *ContainerPool) cleanContainerWorkspace(container *ContainerInstance) { // Execute cleanup commands in container cleanupCmd := []string{ "docker", "exec", container.ID, "bash", "-c", "rm -rf /tmp/workspace/* 2>/dev/null || true" } exec.Command(cleanupCmd[0], cleanupCmd[1:]...).Run() }
Memory Management
Implement intelligent memory management:
type MemoryManager struct { totalMemory uint64 usedMemory uint64 containerMem map[string]uint64 mutex sync.RWMutex logger *logrus.Logger } func (mm *MemoryManager) AllocateMemory(containerID string, requested uint64) error { mm.mutex.Lock() defer mm.mutex.Unlock() // Check if allocation would exceed limits if mm.usedMemory + requested > mm.totalMemory * 80 / 100 { // 80% threshold return fmt.Errorf("insufficient memory: %d MB requested, %d MB available", requested/1024/1024, (mm.totalMemory-mm.usedMemory)/1024/1024) } mm.usedMemory += requested mm.containerMem[containerID] = requested mm.logger.Infof("Allocated %d MB to container %s", requested/1024/1024, containerID) return nil } func (mm *MemoryManager) ReleaseMemory(containerID string) { mm.mutex.Lock() defer mm.mutex.Unlock() if allocated, exists := mm.containerMem[containerID]; exists { mm.usedMemory -= allocated delete(mm.containerMem, containerID) mm.logger.Infof("Released %d MB from container %s", allocated/1024/1024, containerID) } } func (mm *MemoryManager) GetMemoryStats() map[string]interface{} { mm.mutex.RLock() defer mm.mutex.RUnlock() return map[string]interface{}{ "total_mb": mm.totalMemory / 1024 / 1024, "used_mb": mm.usedMemory / 1024 / 1024, "available_mb": (mm.totalMemory - mm.usedMemory) / 1024 / 1024, "utilization": float64(mm.usedMemory) / float64(mm.totalMemory) * 100, "active_containers": len(mm.containerMem), } }
Load Balancing
Implement intelligent load balancing:
type LoadBalancer struct { workers []*WorkerNode roundRobin int mutex sync.Mutex healthChecker *HealthChecker } type WorkerNode struct { ID string Address string CPU float64 Memory float64 ActiveJobs int MaxJobs int LastSeen time.Time Healthy bool } func (lb *LoadBalancer) SelectWorker(job *ExecutionJob) (*WorkerNode, error) { lb.mutex.Lock() defer lb.mutex.Unlock() healthyWorkers := lb.getHealthyWorkers() if len(healthyWorkers) == 0 { return nil, fmt.Errorf("no healthy workers available") } // Sort by load (CPU + Memory + Active Jobs) sort.Slice(healthyWorkers, func(i, j int) bool { loadI := lb.calculateLoad(healthyWorkers[i]) loadJ := lb.calculateLoad(healthyWorkers[j]) return loadI < loadJ }) // Select least loaded worker selected := healthyWorkers[0] selected.ActiveJobs++ lb.logger.Infof("Selected worker %s (load: %.2f)", selected.ID, lb.calculateLoad(selected)) return selected, nil } func (lb *LoadBalancer) calculateLoad(worker *WorkerNode) float64 { // Weighted load calculation cpuWeight := 0.3 memoryWeight := 0.3 jobWeight := 0.4 cpuLoad := worker.CPU / 100.0 memoryLoad := worker.Memory / 100.0 jobLoad := float64(worker.ActiveJobs) / float64(worker.MaxJobs) return cpuWeight*cpuLoad + memoryWeight*memoryLoad + jobWeight*jobLoad }
Production Deployment
Docker Compose Production Setup
version: '3.8' services: # RabbitMQ cluster rabbitmq: image: rabbitmq:3.12-management hostname: rabbitmq-main environment: RABBITMQ_ERLANG_COOKIE: ${RABBITMQ_COOKIE} RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER} RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASS} RABBITMQ_DEFAULT_VHOST: / volumes: - rabbitmq_data:/var/lib/rabbitmq - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf networks: - backend deploy: replicas: 1 resources: limits: memory: 1G cpus: '0.5' # Redis cluster redis: image: redis:7-alpine command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru volumes: - redis_data:/data networks: - backend deploy: replicas: 1 resources: limits: memory: 512M cpus: '0.25' # API Backend api-backend: build: context: ./backend dockerfile: Dockerfile.production environment: NODE_ENV: production RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/ REDIS_URL: redis://redis:6379 LOG_LEVEL: info RATE_LIMIT_WINDOW: 60000 RATE_LIMIT_MAX: 10 depends_on: - rabbitmq - redis networks: - backend - frontend deploy: replicas: 2 resources: limits: memory: 512M cpus: '0.5' update_config: order: start-first failure_action: rollback # Runner Service runner-service: build: context: ./runner dockerfile: Dockerfile.production environment: RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/ REDIS_URL: redis://redis:6379 LOG_LEVEL: info MAX_CONCURRENT_JOBS: 5 WORKSPACE_DIR: /tmp/workspaces volumes: - /var/run/docker.sock:/var/run/docker.sock - runner_workspaces:/tmp/workspaces depends_on: - rabbitmq - redis networks: - backend - execution deploy: replicas: 3 resources: limits: memory: 2G cpus: '1.0' placement: constraints: - node.role == worker # Frontend frontend: build: context: ./frontend dockerfile: Dockerfile.production environment: REACT_APP_API_URL: http://api-backend:3001 REACT_APP_WS_URL: ws://api-backend:3001 depends_on: - api-backend networks: - frontend deploy: replicas: 2 resources: limits: memory: 256M cpus: '0.25' # Load Balancer nginx: image: nginx:alpine ports: - "80:80" - "443:443" volumes: - ./nginx.conf:/etc/nginx/nginx.conf - ./ssl:/etc/ssl/certs depends_on: - frontend - api-backend networks: - frontend deploy: replicas: 1 resources: limits: memory: 128M cpus: '0.1' volumes: rabbitmq_data: redis_data: runner_workspaces: networks: frontend: driver: overlay backend: driver: overlay execution: driver: overlay internal: true
Kubernetes Deployment
# namespace.yaml apiVersion: v1 kind: Namespace metadata: name: code-compiler --- # configmap.yaml apiVersion: v1 kind: ConfigMap metadata: name: app-config namespace: code-compiler data: RABBITMQ_URL: "amqp://admin:password@rabbitmq:5672/" REDIS_URL: "redis://redis:6379" LOG_LEVEL: "info" MAX_CONCURRENT_JOBS: "5" --- # runner-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: runner-service namespace: code-compiler spec: replicas: 3 selector: matchLabels: app: runner-service template: metadata: labels: app: runner-service spec: containers: - name: runner image: your-registry/runner-service:latest envFrom: - configMapRef: name: app-config resources: requests: memory: "1Gi" cpu: "500m" limits: memory: "2Gi" cpu: "1000m" volumeMounts: - name: docker-sock mountPath: /var/run/docker.sock - name: workspaces mountPath: /tmp/workspaces securityContext: runAsNonRoot: true runAsUser: 1000 volumes: - name: docker-sock hostPath: path: /var/run/docker.sock - name: workspaces emptyDir: sizeLimit: 10Gi --- # hpa.yaml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: runner-service-hpa namespace: code-compiler spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: runner-service minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 - type: Resource resource: name: memory target: type: Utilization averageUtilization: 80
Monitoring and Observability
# monitoring-stack.yaml version: '3.8' services: # Prometheus prometheus: image: prom/prometheus:latest command: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.path=/prometheus' - '--web.console.libraries=/etc/prometheus/console_libraries' - '--web.console.templates=/etc/prometheus/consoles' - '--storage.tsdb.retention.time=30d' - '--web.enable-lifecycle' volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml - prometheus_data:/prometheus networks: - monitoring # Grafana grafana: image: grafana/grafana:latest environment: GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD} volumes: - grafana_data:/var/lib/grafana - ./grafana/dashboards:/etc/grafana/provisioning/dashboards - ./grafana/datasources:/etc/grafana/provisioning/datasources networks: - monitoring # ELK Stack for Logs elasticsearch: image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0 environment: discovery.type: single-node xpack.security.enabled: false volumes: - elasticsearch_data:/usr/share/elasticsearch/data networks: - logging logstash: image: docker.elastic.co/logstash/logstash:8.11.0 volumes: - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf depends_on: - elasticsearch networks: - logging kibana: image: docker.elastic.co/kibana/kibana:8.11.0 environment: ELASTICSEARCH_HOSTS: http://elasticsearch:9200 depends_on: - elasticsearch networks: - logging volumes: prometheus_data: grafana_data: elasticsearch_data: networks: monitoring: logging:
Key Learnings
1. Container Management is Complex
Key Challenges:
- Cold Start Problem: Container creation takes 2-5 seconds
- Resource Leaks: Containers not properly cleaned up
- State Management: Persistent vs ephemeral container strategies
- Network Isolation: Balancing security with functionality
Solutions Implemented:
- Container pooling with pre-warmed instances
- Automatic cleanup with garbage collection (sketched after this list)
- Persistent containers with workspace isolation
- Network-isolated execution environments
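A minimal sketch of the garbage-collection idea: a background goroutine that periodically removes containers that have sat idle too long. The Reaper type and its method names are illustrative; in practice the destroy step would call the Docker API or `docker rm -f`:

```go
// Sketch: periodically reap containers that have been idle longer than maxIdle.
package main

import (
	"fmt"
	"sync"
	"time"
)

type ContainerInstance struct {
	ID       string
	LastUsed time.Time
	InUse    bool
}

type Reaper struct {
	mu         sync.Mutex
	containers map[string]*ContainerInstance
	maxIdle    time.Duration
}

func (r *Reaper) reapIdle() {
	r.mu.Lock()
	defer r.mu.Unlock()
	for id, c := range r.containers {
		if !c.InUse && time.Since(c.LastUsed) > r.maxIdle {
			fmt.Println("destroying idle container", id) // would remove the container here
			delete(r.containers, id)
		}
	}
}

// Start runs the reaper on a fixed interval until stop is closed.
func (r *Reaper) Start(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			r.reapIdle()
		case <-stop:
			return
		}
	}
}

func main() {
	r := &Reaper{
		containers: map[string]*ContainerInstance{
			"runner-python-old": {ID: "runner-python-old", LastUsed: time.Now().Add(-time.Hour)},
		},
		maxIdle: 30 * time.Minute,
	}
	stop := make(chan struct{})
	go r.Start(time.Minute, stop)
	r.reapIdle() // run once immediately for the example
	close(stop)
}
```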
2. Real-time Streaming Requires Careful Architecture
Technical Insights:
- WebSocket Management: Connection pooling and cleanup crucial
- Message Ordering: Ensure output lines arrive in sequence
- Buffer Management: Handle high-frequency output efficiently
- Connection Recovery: Graceful handling of network issues
Best Practices:
- Use Redis pub/sub for scalable streaming
- Implement connection heartbeats
- Buffer and batch small messages (see the sketch after this list)
- Provide fallback to polling for unreliable connections
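For the buffering point, here is a sketch of batching output lines on the runner side before publishing to Redis, using github.com/redis/go-redis/v9. The flush interval, batch size, and channel naming (execution:&lt;jobID&gt;) follow the conventions above but are otherwise illustrative:

```go
// Sketch: collect output lines briefly and publish them as one message,
// reducing pub/sub overhead for chatty programs.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type batch struct {
	JobID string   `json:"job_id"`
	Lines []string `json:"lines"`
}

// publishBatched flushes at most every 100ms, or sooner once 64 lines accumulate.
func publishBatched(ctx context.Context, rdb *redis.Client, jobID string, lines <-chan string) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	buf := make([]string, 0, 64)
	flush := func() {
		if len(buf) == 0 {
			return
		}
		payload, _ := json.Marshal(batch{JobID: jobID, Lines: buf})
		rdb.Publish(ctx, fmt.Sprintf("execution:%s", jobID), payload)
		buf = buf[:0]
	}

	for {
		select {
		case line, ok := <-lines:
			if !ok {
				flush()
				return
			}
			buf = append(buf, line)
			if len(buf) >= 64 {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	lines := make(chan string)
	go publishBatched(context.Background(), rdb, "job-123", lines)

	for i := 0; i < 5; i++ {
		lines <- fmt.Sprintf("line %d", i)
	}
	close(lines)
	time.Sleep(200 * time.Millisecond) // let the final flush run
}
```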
3. Security Cannot Be an Afterthought
Critical Security Measures:
- Defense in Depth: Multiple security layers
- Principle of Least Privilege: Minimal container permissions
- Resource Limits: Prevent resource exhaustion attacks
- Code Analysis: Static analysis before execution
Security Architecture:
```
┌─────────────────┐
│   Code Input    │
├─────────────────┤
│ Static Analysis │ ← First line of defense
├─────────────────┤
│  Rate Limiting  │ ← Prevent abuse
├─────────────────┤
│ Docker Sandbox  │ ← Isolation layer
├─────────────────┤
│ Resource Limits │ ← Resource protection
├─────────────────┤
│ Network Filter  │ ← Network restrictions
└─────────────────┘
```
4. Performance Optimization is Multi-Faceted
Optimization Areas:
- Container Lifecycle: Pool management and reuse
- Resource Allocation: Dynamic scaling based on load
- Queue Management: Fair distribution and priority handling
- Caching: Language environment and dependency caching
Performance Metrics to Track (an instrumentation sketch follows the list):
- Container startup time
- Execution latency
- Queue depth
- Resource utilization
- Success/failure rates
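As an example of how a few of these metrics could be exposed from the runner service, here is a sketch using github.com/prometheus/client_golang. The metric names and labels are illustrative, not the exact ones Toki Space uses in production:

```go
// Sketch: expose execution latency, success/failure counts, and queue depth
// on a /metrics endpoint for Prometheus to scrape.
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	executionDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "runner_execution_duration_seconds",
		Help:    "Wall-clock time spent executing a job.",
		Buckets: prometheus.DefBuckets,
	}, []string{"language"})

	executionResults = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "runner_executions_total",
		Help: "Executions by language and outcome.",
	}, []string{"language", "status"})

	queueDepth = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "runner_queue_depth",
		Help: "Jobs currently waiting in the local queue.",
	})
)

// observeExecution would be called around each job execution.
func observeExecution(language string, start time.Time, failed bool) {
	executionDuration.WithLabelValues(language).Observe(time.Since(start).Seconds())
	status := "success"
	if failed {
		status = "failure"
	}
	executionResults.WithLabelValues(language, status).Inc()
}

func main() {
	queueDepth.Set(0)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9100", nil) // scraped by Prometheus
}
```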
5. Production Reliability Requires Operational Excellence
Observability Stack:
- Metrics: Prometheus + Grafana for system health
- Logging: ELK stack for centralized log analysis
- Tracing: Distributed tracing for request flows
- Alerting: PagerDuty integration for critical issues
Deployment Strategies:
- Blue-green deployments for zero downtime
- Canary releases for gradual rollouts
- Circuit breakers for fault tolerance (see the sketch after this list)
- Auto-scaling based on queue depth and CPU usage
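To make the circuit-breaker point concrete, here is a minimal hand-rolled sketch applied to job submission: after a few consecutive failures the breaker opens and rejects calls for a cool-down period instead of hammering a failing dependency. Thresholds and names are illustrative; a production system might use a library instead:

```go
// Sketch: a tiny circuit breaker guarding calls to a flaky dependency
// (for example, publishing a job to RabbitMQ).
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type CircuitBreaker struct {
	mu        sync.Mutex
	failures  int
	threshold int
	openUntil time.Time
	cooldown  time.Duration
}

var errOpen = errors.New("circuit open: dependency unavailable")

func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mu.Lock()
	if time.Now().Before(cb.openUntil) {
		cb.mu.Unlock()
		return errOpen // fail fast while the breaker is open
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.threshold {
			cb.openUntil = time.Now().Add(cb.cooldown) // trip the breaker
			cb.failures = 0
		}
		return err
	}
	cb.failures = 0
	return nil
}

func main() {
	cb := &CircuitBreaker{threshold: 3, cooldown: 30 * time.Second}
	err := cb.Call(func() error {
		// e.g. publish a job to RabbitMQ here
		return nil
	})
	fmt.Println("submit result:", err)
}
```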
6. Language-Specific Considerations
Each programming language has unique requirements:
Python:
- Dependency management with pip
- Virtual environment isolation
- Import path security
- Package installation caching
Node.js:
- npm/yarn dependency resolution
- Module loading restrictions
- Event loop management
- Memory garbage collection
Go:
- Module system (go.mod)
- Build caching for faster compilation
- Static binary advantages
- Goroutine resource management
Rust:
- Cargo package management
- Compilation time optimization
- Memory safety guarantees
- Target architecture handling
Java:
- Classpath management
- JVM startup optimization
- Garbage collection tuning
- Security manager configuration
Conclusion
Building a production-ready online code compiler is a journey that touches every aspect of modern distributed systems engineering. From container orchestration to real-time streaming, from security isolation to performance optimization, each component requires careful consideration and robust implementation.
The key to success lies in:
- Robust Architecture: Design for failure and scale from day one
- Security First: Implement security at every layer
- Performance Focus: Optimize for user experience and resource efficiency
- Operational Excellence: Monitor, measure, and continuously improve
- Incremental Development: Start simple and add complexity gradually
The result should be a platform that feels immediate and reliable, allowing developers to focus on code rather than infrastructure. When users can execute code with the same confidence they have in their local development environment, you've achieved the goal of a truly powerful online code compiler.
Learning Resources
Essential Reading
Distributed Systems:
- "Designing Data-Intensive Applications" by Martin Kleppmann - Comprehensive guide to distributed system patterns
- "Building Microservices" by Sam Newman - Microservice architecture and communication patterns
- "Site Reliability Engineering" by Google - Production system reliability practices
Container Technologies:
- "Docker Deep Dive" by Nigel Poulton - Comprehensive Docker guide
- "Kubernetes in Action" by Marko Lukša - Kubernetes orchestration patterns
- "Container Security" by Liz Rice - Security best practices for containers
Real-time Systems:
- "High Performance Browser Networking" by Ilya Grigorik - WebSocket and real-time communication
- "Redis in Action" by Josiah Carlson - Redis patterns for real-time applications
Open Source Projects
Code Execution Platforms:
- Judge0 - Online code execution system
- HackerEarth API - Commercial code execution platform
- Glot.io - Simple code execution service
Container Management:
- Docker - Container runtime
- Podman - Alternative container runtime
- gVisor - Application kernel for containers
Message Queue Solutions:
- RabbitMQ - Feature-rich message broker
- Apache Kafka - High-throughput distributed streaming
- Redis - In-memory data structure store
Tools and Development Environment
Development Tools:
- Docker Desktop - Local container development
- Kubernetes KIND - Local Kubernetes development
- Minikube - Local Kubernetes cluster
Monitoring and Observability:
- Prometheus - Metrics collection and alerting
- Grafana - Metrics visualization and dashboards
- ELK Stack - Centralized logging and analysis
Testing Frameworks:
- Testcontainers - Integration testing with containers
- k6 - Load testing for APIs and WebSockets
- Artillery - Performance testing toolkit
With love from the Toki Space team
This tutorial represents our collective experience building Toki's code execution platform. The architecture and lessons shared here will help you build your own robust online code compiler. For questions or contributions, reach out to our engineering team at hello@tokispace.com