By the Toki Space Team
Creating a production-grade online code compiler is one of the most complex and demanding projects in distributed systems. It calls for expertise in container orchestration, job queuing, real-time data flow, security sandboxing, and precise resource control. Unlike basic code execution tools, a full-fledged compiler platform must support multiple programming languages, manage parallel executions, stream outputs live, and recover gracefully from errors.
In this tutorial, you’ll learn how to build a fully functional online compiler using Docker for containerization, RabbitMQ for managing execution jobs, Redis for real-time communication, and React for the user interface. We’ll walk through the architecture, implementation, and deployment process—drawing from real-world experience building the code execution system behind Toki Space.
Please forgive any errors in the code samples; this walkthrough is mainly meant to give you an idea of how it all works under the hood. Let's dive in.
Table of Contents
- Architecture Overview
- Docker Container System
- Message Queue Integration
- Real-time Streaming
- Backend Implementation
- Frontend Development
- Security & Isolation
- Performance Optimization
- Production Deployment
- Key Learnings
Architecture Overview
System Components
Our online code compiler consists of five main components working together:
```
┌─────────────────┐    WebSocket     ┌─────────────────┐
│      React      │ ◄──────────────► │     Node.js     │
│    Frontend     │                  │     Backend     │
│                 │    HTTP/REST     │                 │
└─────────────────┘ ◄──────────────► └─────────────────┘
                                              │
                                              │ Jobs
                                              ▼
        ┌─────────────────┐          ┌─────────────────┐
        │      Redis      │          │    RabbitMQ     │
        │   (Streaming)   │          │   (Job Queue)   │
        └─────────────────┘          └─────────────────┘
                ▲                             │
                │ Results                     │ Jobs
                │                             ▼
                └──────────────────── ┌─────────────────┐
                                      │     Runner      │
                                      │     Service     │
                                      │      (Go)       │
                                      └─────────────────┘
                                              │
                                              │ Docker API
                                              ▼
                                      ┌─────────────────┐
                                      │     Docker      │
                                      │   Containers    │
                                      │  (Multi-lang)   │
                                      └─────────────────┘
```
Core Technologies
- Frontend: React with TypeScript for the user interface
- Backend: Node.js with Express for API and WebSocket handling
- Runner Service: Go service for container management and code execution
- Message Queue: RabbitMQ for reliable job distribution
- Streaming: Redis for real-time output streaming
- Containers: Docker for secure code execution isolation
Design Principles
- Language Agnostic: Support for Python, Node.js, Go, Rust, Java, and more
- Secure Isolation: Each execution runs in a separate Docker container
- Real-time Feedback: Stream output as code executes
- Scalable Architecture: Horizontal scaling through message queues
- Fault Tolerance: Graceful handling of failures and timeouts
Docker Container System
The Challenge of Multi-Language Execution
Running user code safely requires solving several complex problems:
- Security Isolation: Prevent malicious code from accessing the host system
- Resource Limits: Control CPU, memory, and execution time
- Environment Setup: Provide language-specific tools and dependencies
- Cleanup: Remove containers and workspaces after execution
Container-Per-Language Architecture
Instead of spinning up new containers for each execution (which is slow), we use persistent containers per language:
```go
type LanguageVM struct {
    Language      string
    ContainerName string
    IsRunning     bool
    WorkspaceDir  string
    mutex         sync.Mutex
}

type Manager struct {
    config   config.FirecrackerConfig
    logger   *logrus.Logger
    vms      map[string]*LanguageVM
    vmsMutex sync.RWMutex
}
```
Benefits of Persistent Containers:
- Fast Execution: No container startup overhead
- Warm Environments: Dependencies already installed
- Resource Efficiency: Reuse container resources
- Consistent State: Predictable execution environment
Container Initialization
Each language gets its own persistent container with pre-installed tools:
```go
func (m *Manager) initializePersistentContainers() error {
    m.logger.Info("Initializing persistent containers for all languages...")

    for language := range m.config.Environments {
        m.logger.Infof("Starting persistent container for language: %s", language)

        vm := &LanguageVM{
            Language:      language,
            ContainerName: fmt.Sprintf("runner-vm-%s", language),
            WorkspaceDir:  filepath.Join(m.config.WorkspaceDir, language),
            IsRunning:     false,
        }

        // Create language-specific workspace directory
        if err := os.MkdirAll(vm.WorkspaceDir, 0755); err != nil {
            return fmt.Errorf("failed to create workspace directory for %s: %w", language, err)
        }

        // Start the container
        if err := m.startPersistentContainer(vm); err != nil {
            m.logger.Errorf("Failed to start container for %s: %v", language, err)
            continue
        }

        m.vms[language] = vm
    }

    return nil
}
```
Language-Specific Execution
Each language requires different setup and execution commands:
```go
func (m *Manager) executeInPersistentContainer(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
    var execCmd []string
    workspacePath := fmt.Sprintf("/tmp/workspaces/%s/%s", vm.Language, req.JobID)

    switch vm.Language {
    case "python":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "main.py"
        }
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && if [ -f requirements.txt ]; then pip install -r requirements.txt; fi && timeout 30 python %s", workspacePath, entryPoint)}

    case "nodejs":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "index.js"
        }
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && if [ -f package.json ]; then npm install; fi && timeout 30 node %s", workspacePath, entryPoint)}

    case "go":
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && go mod tidy 2>/dev/null || true && timeout 30 go run .", workspacePath)}

    case "rust":
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && timeout 30 cargo run", workspacePath)}

    case "java":
        entryPoint := req.EntryPoint
        if entryPoint == "" {
            entryPoint = "Main.java"
        }
        className := strings.TrimSuffix(entryPoint, ".java")
        execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
            fmt.Sprintf("cd %s && javac %s && timeout 30 java %s", workspacePath, entryPoint, className)}
    }

    // Execute with timeout
    execCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
    output, err := cmd.CombinedOutput()

    result := &ExecutionResult{
        JobID:       req.JobID,
        WorkspaceID: req.WorkspaceID,
        Output:      string(output),
        ExitCode:    0,
    }

    if err != nil {
        result.Error = err.Error()
        result.ExitCode = 1
    }

    return result, nil
}
```
Key Implementation Details:
- Workspace Isolation: Each job gets its own directory within the container (see the workspace lifecycle sketch after this list)
- Dependency Management: Automatic installation of requirements.txt, package.json, etc.
- Timeout Protection: 30-second execution limit prevents infinite loops
- Error Handling: Capture both stdout and stderr for complete output
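To make the workspace isolation point concrete, here is a minimal sketch of a per-job workspace lifecycle, assuming the host workspace directory is bind-mounted into the persistent container at /tmp/workspaces. The helper names (createJobWorkspace, cleanupJobWorkspace) are illustrative, not the exact functions used in the Toki Space runner:

```go
// Minimal sketch: create an isolated directory for one job, write the
// submitted source into it, and remove everything afterwards so the
// persistent container stays clean between executions.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createJobWorkspace creates <baseDir>/<language>/<jobID> and writes the
// submitted source files into it.
func createJobWorkspace(baseDir, language, jobID string, files map[string]string) (string, error) {
	dir := filepath.Join(baseDir, language, jobID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("create workspace: %w", err)
	}
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			return "", fmt.Errorf("write %s: %w", name, err)
		}
	}
	return dir, nil
}

// cleanupJobWorkspace removes the job directory and everything in it.
func cleanupJobWorkspace(dir string) error {
	return os.RemoveAll(dir)
}

func main() {
	dir, err := createJobWorkspace("/tmp/workspaces", "python", "job-123",
		map[string]string{"main.py": `print("Hello, World!")`})
	if err != nil {
		panic(err)
	}
	defer cleanupJobWorkspace(dir) // always clean up, even if execution fails
	fmt.Println("workspace ready at", dir)
}
```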
Message Queue Integration
Why RabbitMQ for Code Execution?
Code execution jobs have specific requirements that make RabbitMQ ideal:
- Reliability: Jobs must not be lost if a worker crashes
- Durability: Job queue survives server restarts
- Fair Distribution: Distribute jobs evenly across worker instances
- Dead Letter Queues: Handle failed jobs gracefully
- Priority Queues: Support urgent job execution
RabbitMQ Topology
Our message queue setup uses a topic exchange for flexible routing:
```yaml
# Exchange Configuration
exchanges:
  jobs:
    name: "code-execution.jobs"
    type: "topic"
    durable: true
    auto_delete: false
  results:
    name: "code-execution.results"
    type: "topic"
    durable: true
    auto_delete: false
  dead_letter:
    name: "code-execution.dead-letter"
    type: "direct"
    durable: true
    auto_delete: false

# Queue Configuration
queues:
  job_prefix: "jobs"
  result_prefix: "results"
  dead_letter_suffix: "dlq"
```
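The dead-letter part of this topology has to be declared by whichever service sets up the queues. As a rough illustration, here is a sketch in Go using github.com/rabbitmq/amqp091-go; the queue names, TTL, and credentials are illustrative rather than the exact production values:

```go
// Sketch: declare a job queue whose rejected or expired messages are routed
// to the dead-letter exchange declared in the topology above.
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://admin:admin123@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Dead-letter exchange and its queue.
	if err := ch.ExchangeDeclare("code-execution.dead-letter", "direct", true, false, false, false, nil); err != nil {
		log.Fatal(err)
	}
	if _, err := ch.QueueDeclare("jobs.python.dlq", true, false, false, false, nil); err != nil {
		log.Fatal(err)
	}
	if err := ch.QueueBind("jobs.python.dlq", "jobs.python", "code-execution.dead-letter", false, nil); err != nil {
		log.Fatal(err)
	}

	// Main job queue: failed deliveries are dead-lettered instead of lost.
	_, err = ch.QueueDeclare("jobs.python", true, false, false, false, amqp.Table{
		"x-dead-letter-exchange":    "code-execution.dead-letter",
		"x-dead-letter-routing-key": "jobs.python",
		"x-message-ttl":             int32(300000), // 5 minutes
	})
	if err != nil {
		log.Fatal(err)
	}
}
```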
Routing Keys Pattern:
- `jobs.python` - Python execution jobs
- `jobs.nodejs` - Node.js execution jobs
- `jobs.go` - Go execution jobs
- `results.python` - Python execution results
- `results.nodejs` - Node.js execution results
Job Message Format
Standardized job messages ensure compatibility across services:
```typescript
interface ExecutionJob {
  job_id: string;          // Unique identifier
  workspace_id: string;    // User workspace
  language: string;        // Target language
  source_code: string;     // Code to execute
  entry_point?: string;    // Main file (optional)
  dependencies?: string[]; // Package dependencies
  timeout?: number;        // Execution timeout
  memory_limit?: number;   // Memory limit in MB
}
```
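Because the producer is Node.js and the consumer is Go, the JSON field names are the real contract. The runner's full ExecutionRequest type isn't reproduced here, but a Go-side counterpart could look like this sketch, with struct tags mirroring the interface above:

```go
// Sketch: decoding the standardized job message on the Go side. The struct
// tags must match the JSON keys emitted by the Node.js producer.
package main

import (
	"encoding/json"
	"fmt"
)

type ExecutionJob struct {
	JobID        string   `json:"job_id"`
	WorkspaceID  string   `json:"workspace_id"`
	Language     string   `json:"language"`
	SourceCode   string   `json:"source_code"`
	EntryPoint   string   `json:"entry_point,omitempty"`
	Dependencies []string `json:"dependencies,omitempty"`
	Timeout      int      `json:"timeout,omitempty"`      // seconds
	MemoryLimit  int      `json:"memory_limit,omitempty"` // MB
}

func main() {
	raw := []byte(`{"job_id":"a1b2","workspace_id":"default","language":"python","source_code":"print(42)","timeout":30}`)

	var job ExecutionJob
	if err := json.Unmarshal(raw, &job); err != nil {
		panic(err)
	}
	fmt.Printf("%s job %s (timeout %ds)\n", job.Language, job.JobID, job.Timeout)
}
```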
Producer Implementation (Node.js Backend)
```javascript
const amqp = require('amqplib');

class JobProducer {
  constructor(rabbitmqUrl) {
    this.rabbitmqUrl = rabbitmqUrl;
    this.connection = null;
    this.channel = null;
  }

  async connect() {
    this.connection = await amqp.connect(this.rabbitmqUrl);
    this.channel = await this.connection.createChannel();

    // Declare exchanges
    await this.channel.assertExchange('code-execution.jobs', 'topic', { durable: true });
    await this.channel.assertExchange('code-execution.results', 'topic', { durable: true });
  }

  async submitJob(job) {
    const routingKey = `jobs.${job.language}`;

    const jobMessage = {
      job_id: job.job_id,
      workspace_id: job.workspace_id,
      language: job.language,
      url: this.createDataURL(job.source_code, job.entry_point),
      entry_point: job.entry_point
    };

    await this.channel.publish(
      'code-execution.jobs',
      routingKey,
      Buffer.from(JSON.stringify(jobMessage)),
      {
        persistent: true,
        messageId: job.job_id,
        timestamp: Date.now()
      }
    );

    console.log(`Job ${job.job_id} submitted for ${job.language}`);
  }

  createDataURL(sourceCode, filename = 'main.py') {
    // Create inline data URL for source code
    return `data:text/plain;filename=${filename},${sourceCode}`;
  }

  async close() {
    if (this.channel) await this.channel.close();
    if (this.connection) await this.connection.close();
  }
}
```
Consumer Implementation (Go Runner Service)
func (m *Manager) consumeRabbitMQMessages(config RabbitMQConfig) error { conn, err := amqp.Dial(config.URL) if err != nil { return fmt.Errorf("failed to connect to RabbitMQ: %w", err) } defer conn.Close() ch, err := conn.Channel() if err != nil { return fmt.Errorf("failed to open channel: %w", err) } defer ch.Close() // Declare unified queue for all languages jobQueue, err := ch.QueueDeclare( "jobs.all-languages", // name true, // durable false, // delete when unused false, // exclusive false, // no-wait nil, // arguments ) if err != nil { return fmt.Errorf("failed to declare job queue: %w", err) } // Bind queue to exchange for each language languages := []string{"python", "nodejs", "go", "rust", "java"} for _, lang := range languages { err = ch.QueueBind( jobQueue.Name, fmt.Sprintf("jobs.%s", lang), "code-execution.jobs", false, nil, ) if err != nil { return fmt.Errorf("failed to bind queue for %s: %w", lang, err) } } // Set QoS for fair distribution err = ch.Qos(1, 0, false) if err != nil { return fmt.Errorf("failed to set QoS: %w", err) } // Start consuming msgs, err := ch.Consume( jobQueue.Name, "", // consumer tag false, // auto-ack false, // exclusive false, // no-local false, // no-wait nil, // args ) if err != nil { return fmt.Errorf("failed to register consumer: %w", err) } for msg := range msgs { go m.processJob(msg) } return nil } func (m *Manager) processJob(msg amqp.Delivery) { var execReq ExecutionRequest if err := json.Unmarshal(msg.Body, &execReq); err != nil { m.logger.WithError(err).Error("Failed to parse job message") msg.Nack(false, false) // Don't requeue invalid messages return } // Execute the job ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second) result, err := m.ExecuteCode(ctx, execReq) cancel() if err != nil { m.logger.WithError(err).Error("Job execution failed") result = &ExecutionResult{ JobID: execReq.JobID, Output: "", Error: err.Error(), ExitCode: 1, } } // Publish result resultBytes, _ := json.Marshal(result) err = msg.Channel.Publish( "code-execution.results", fmt.Sprintf("results.%s", execReq.Language), false, // mandatory false, // immediate amqp.Publishing{ ContentType: "application/json", Body: resultBytes, }) if err != nil { m.logger.WithError(err).Error("Failed to publish result") msg.Nack(false, true) // Requeue on publish failure } else { msg.Ack(false) // Acknowledge successful processing } }
Real-time Streaming
The Challenge of Live Output
Unlike traditional job systems that return results after completion, code execution benefits from real-time output streaming:
- User Experience: Immediate feedback as code runs
- Long-Running Jobs: Show progress for lengthy operations
- Debugging: See output line-by-line for easier debugging
- Interactive Input: Support for programs requiring user input
Redis Pub/Sub for Streaming
Redis provides excellent pub/sub capabilities for real-time streaming:
```javascript
// Redis streaming setup
const redis = require('redis');

class StreamingService {
  constructor(redisUrl) {
    this.publisher = redis.createClient({ url: redisUrl });
    this.subscriber = redis.createClient({ url: redisUrl });
  }

  async connect() {
    await this.publisher.connect();
    await this.subscriber.connect();
  }

  // Publish output line from runner service
  async publishOutput(jobId, line, stream = 'stdout') {
    const message = {
      job_id: jobId,
      line: line,
      stream: stream,
      timestamp: new Date().toISOString()
    };

    await this.publisher.publish(
      `execution:${jobId}`,
      JSON.stringify(message)
    );
  }

  // Subscribe to job output
  async subscribeToJob(jobId, callback) {
    await this.subscriber.subscribe(`execution:${jobId}`, (message) => {
      const data = JSON.parse(message);
      callback(data);
    });
  }

  async unsubscribeFromJob(jobId) {
    await this.subscriber.unsubscribe(`execution:${jobId}`);
  }
}
```
WebSocket Integration
Connect Redis streams to frontend via WebSocket:
// WebSocket server integration const WebSocket = require('ws'); class WebSocketManager { constructor(server, streamingService) { this.wss = new WebSocket.Server({ server }); this.streamingService = streamingService; this.connections = new Map(); // jobId -> Set of WebSocket connections this.setupWebSocketHandlers(); } setupWebSocketHandlers() { this.wss.on('connection', (ws) => { ws.on('message', async (data) => { const message = JSON.parse(data); switch (message.type) { case 'subscribe': await this.subscribeToJobOutput(ws, message.job_id); break; case 'unsubscribe': await this.unsubscribeFromJobOutput(ws, message.job_id); break; } }); ws.on('close', () => { this.cleanupConnection(ws); }); }); } async subscribeToJobOutput(ws, jobId) { // Add connection to job subscription if (!this.connections.has(jobId)) { this.connections.set(jobId, new Set()); // Subscribe to Redis stream for this job await this.streamingService.subscribeToJob(jobId, (data) => { // Broadcast to all subscribed WebSocket connections const connections = this.connections.get(jobId); if (connections) { connections.forEach(conn => { if (conn.readyState === WebSocket.OPEN) { conn.send(JSON.stringify({ type: 'output', data: data })); } }); } }); } this.connections.get(jobId).add(ws); } async unsubscribeFromJobOutput(ws, jobId) { const connections = this.connections.get(jobId); if (connections) { connections.delete(ws); // If no more connections, unsubscribe from Redis if (connections.size === 0) { await this.streamingService.unsubscribeFromJob(jobId); this.connections.delete(jobId); } } } cleanupConnection(ws) { // Remove connection from all job subscriptions for (const [jobId, connections] of this.connections.entries()) { connections.delete(ws); if (connections.size === 0) { this.streamingService.unsubscribeFromJob(jobId); this.connections.delete(jobId); } } } }
Enhanced Runner with Streaming
Modify the runner service to stream output line-by-line:
func (m *Manager) executeWithStreaming(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) { // ... command setup ... cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...) // Create pipes for real-time output capture stdout, err := cmd.StdoutPipe() if err != nil { return nil, fmt.Errorf("failed to create stdout pipe: %w", err) } stderr, err := cmd.StderrPipe() if err != nil { return nil, fmt.Errorf("failed to create stderr pipe: %w", err) } // Start the command if err := cmd.Start(); err != nil { return nil, fmt.Errorf("failed to start command: %w", err) } // Stream output in real-time var outputBuffer strings.Builder var wg sync.WaitGroup wg.Add(2) // Stream stdout go func() { defer wg.Done() scanner := bufio.NewScanner(stdout) for scanner.Scan() { line := scanner.Text() outputBuffer.WriteString(line + "\n") // Publish to Redis for real-time streaming m.publishStreamingOutput(req.JobID, line, "stdout") } }() // Stream stderr go func() { defer wg.Done() scanner := bufio.NewScanner(stderr) for scanner.Scan() { line := scanner.Text() outputBuffer.WriteString(line + "\n") // Publish to Redis for real-time streaming m.publishStreamingOutput(req.JobID, line, "stderr") } }() // Wait for command completion err = cmd.Wait() wg.Wait() // Wait for all output to be processed result := &ExecutionResult{ JobID: req.JobID, WorkspaceID: req.WorkspaceID, Output: outputBuffer.String(), ExitCode: 0, } if err != nil { result.Error = err.Error() result.ExitCode = 1 } return result, nil } func (m *Manager) publishStreamingOutput(jobID, line, stream string) { // This would integrate with your Redis client message := StreamingOutput{ JobID: jobID, Line: line, Stream: stream, Timestamp: time.Now(), } // Publish to Redis (implementation depends on your Redis client) // m.redisClient.Publish(fmt.Sprintf("execution:%s", jobID), message) }
This streaming approach provides real-time feedback to users, making the code execution feel immediate and interactive rather than a black-box operation.
Backend Implementation
Node.js API Server
The backend serves as the orchestration layer between the frontend and the execution infrastructure:
const express = require('express'); const WebSocket = require('ws'); const { v4: uuidv4 } = require('uuid'); const cors = require('cors'); class CodeExecutionAPI { constructor() { this.app = express(); this.server = null; this.jobProducer = null; this.streamingService = null; this.wsManager = null; this.activeJobs = new Map(); // Track running jobs this.setupMiddleware(); this.setupRoutes(); } setupMiddleware() { this.app.use(cors()); this.app.use(express.json({ limit: '10mb' })); this.app.use(express.urlencoded({ extended: true })); // Request logging this.app.use((req, res, next) => { console.log(`${req.method} ${req.path} - ${new Date().toISOString()}`); next(); }); } setupRoutes() { // Health check this.app.get('/health', (req, res) => { res.json({ status: 'ok', timestamp: new Date().toISOString() }); }); // Submit code execution job this.app.post('/api/execute', async (req, res) => { try { const result = await this.handleCodeExecution(req.body); res.json(result); } catch (error) { console.error('Execution error:', error); res.status(500).json({ error: 'Execution failed', message: error.message }); } }); // Get job status this.app.get('/api/jobs/:jobId', (req, res) => { const jobId = req.params.jobId; const job = this.activeJobs.get(jobId); if (!job) { return res.status(404).json({ error: 'Job not found' }); } res.json(job); }); // List active jobs this.app.get('/api/jobs', (req, res) => { const jobs = Array.from(this.activeJobs.values()); res.json({ jobs, count: jobs.length }); }); // Cancel job this.app.delete('/api/jobs/:jobId', async (req, res) => { const jobId = req.params.jobId; await this.cancelJob(jobId); res.json({ message: 'Job cancelled' }); }); } async handleCodeExecution(requestBody) { const { language, source_code, entry_point, workspace_id = 'default', timeout = 30 } = requestBody; // Validate request if (!language || !source_code) { throw new Error('Language and source_code are required'); } const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java']; if (!supportedLanguages.includes(language)) { throw new Error(`Unsupported language: ${language}`); } // Generate unique job ID const jobId = uuidv4(); // Create job record const job = { job_id: jobId, workspace_id, language, source_code, entry_point, timeout, status: 'queued', created_at: new Date().toISOString(), updated_at: new Date().toISOString() }; this.activeJobs.set(jobId, job); // Submit to RabbitMQ await this.jobProducer.submitJob(job); // Update status job.status = 'submitted'; job.updated_at = new Date().toISOString(); return { job_id: jobId, status: 'submitted', message: 'Job submitted for execution', stream_url: `/stream/${jobId}` }; } async cancelJob(jobId) { const job = this.activeJobs.get(jobId); if (job && job.status === 'running') { // Send cancellation signal (implementation depends on your setup) job.status = 'cancelled'; job.updated_at = new Date().toISOString(); } } async start(port = 3001) { // Initialize services await this.initializeServices(); // Start HTTP server this.server = this.app.listen(port, () => { console.log(`Code execution API running on port ${port}`); }); // Setup WebSocket manager this.wsManager = new WebSocketManager(this.server, this.streamingService); // Setup result consumer this.setupResultConsumer(); } async initializeServices() { // Initialize RabbitMQ producer this.jobProducer = new JobProducer(process.env.RABBITMQ_URL); await this.jobProducer.connect(); // Initialize Redis streaming this.streamingService = new StreamingService(process.env.REDIS_URL); await 
this.streamingService.connect(); console.log('All services initialized successfully'); } setupResultConsumer() { // Consumer for job results from RabbitMQ const amqp = require('amqplib'); amqp.connect(process.env.RABBITMQ_URL) .then(conn => conn.createChannel()) .then(ch => { // Declare results queue return ch.assertQueue('results.all', { durable: true }) .then(() => { // Bind to results exchange return ch.bindQueue('results.all', 'code-execution.results', 'results.*'); }) .then(() => { // Consume results return ch.consume('results.all', (msg) => { if (msg) { this.handleJobResult(JSON.parse(msg.content.toString())); ch.ack(msg); } }); }); }) .catch(console.error); } handleJobResult(result) { const job = this.activeJobs.get(result.job_id); if (job) { job.status = result.exit_code === 0 ? 'completed' : 'failed'; job.result = result; job.updated_at = new Date().toISOString(); // Optionally clean up completed jobs after some time setTimeout(() => { this.activeJobs.delete(result.job_id); }, 300000); // 5 minutes } } async stop() { if (this.server) { this.server.close(); } if (this.jobProducer) { await this.jobProducer.close(); } if (this.streamingService) { await this.streamingService.close(); } } } // Environment configuration const config = { port: process.env.PORT || 3001, rabbitmq_url: process.env.RABBITMQ_URL || 'amqp://admin:admin123@localhost:5672', redis_url: process.env.REDIS_URL || 'redis://localhost:6379' }; // Start the server const api = new CodeExecutionAPI(); api.start(config.port); // Graceful shutdown process.on('SIGINT', async () => { console.log('Shutting down gracefully...'); await api.stop(); process.exit(0); });
Express Middleware for Code Validation
Add security and validation middleware:
const rateLimit = require('express-rate-limit'); const validator = require('validator'); // Rate limiting for code execution const executionLimiter = rateLimit({ windowMs: 60 * 1000, // 1 minute max: 10, // Maximum 10 executions per minute per IP message: 'Too many execution requests, please try again later', standardHeaders: true, legacyHeaders: false }); // Code validation middleware const validateCodeRequest = (req, res, next) => { const { language, source_code, entry_point } = req.body; // Check required fields if (!language || !source_code) { return res.status(400).json({ error: 'Missing required fields', required: ['language', 'source_code'] }); } // Validate language const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java']; if (!supportedLanguages.includes(language)) { return res.status(400).json({ error: 'Unsupported language', supported: supportedLanguages }); } // Check code length (prevent abuse) if (source_code.length > 100000) { // 100KB limit return res.status(400).json({ error: 'Source code too large', max_size: '100KB' }); } // Validate entry point if provided if (entry_point && !validator.isAlphanumeric(entry_point.replace(/[._-]/g, ''))) { return res.status(400).json({ error: 'Invalid entry point format' }); } // Basic security checks (can be expanded) const dangerousPatterns = [ /rm\s+-rf/, /sudo/, /passwd/, /\/etc\/passwd/, /mkfs/, /format/, /del\s+\/[a-z]/i ]; for (const pattern of dangerousPatterns) { if (pattern.test(source_code)) { return res.status(400).json({ error: 'Code contains potentially dangerous operations' }); } } next(); }; // Apply middleware to execution endpoint app.post('/api/execute', executionLimiter, validateCodeRequest, async (req, res) => { // ... execution logic } );
Frontend Development
React Code Execution Component
Based on our CodeRunnerDemo.tsx component, here's a production-ready implementation:
import React, { useState, useEffect, useCallback, useRef } from 'react'; import { Button } from '@/components/ui/button'; import { Card, CardContent } from '@/components/ui/card'; import { Badge } from '@/components/ui/badge'; import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs'; import { Play, Square, Copy, Download, Settings } from 'lucide-react'; import CodeMirror from '@uiw/react-codemirror'; import { githubDark } from '@uiw/codemirror-theme-github'; import { javascript } from '@codemirror/lang-javascript'; import { python } from '@codemirror/lang-python'; import { rust } from '@codemirror/lang-rust'; import { go } from '@codemirror/lang-go'; import { java } from '@codemirror/lang-java'; interface ExecutionResult { job_id: string; status: string; output?: string; error?: string; exit_code?: number; duration?: number; } interface OutputLine { line: string; stream: 'stdout' | 'stderr'; timestamp: string; } const LANGUAGES = { python: { ext: python(), icon: '🐍', name: 'Python', starter: 'print("Hello, World!")' }, javascript: { ext: javascript(), icon: '⚡', name: 'JavaScript', starter: 'console.log("Hello, World!");' }, go: { ext: go(), icon: '🔵', name: 'Go', starter: 'package main\n\nimport "fmt"\n\nfunc main() {\n fmt.Println("Hello, World!")\n}' }, rust: { ext: rust(), icon: '🦀', name: 'Rust', starter: 'fn main() {\n println!("Hello, World!");\n}' }, java: { ext: java(), icon: '☕', name: 'Java', starter: 'public class Main {\n public static void main(String[] args) {\n System.out.println("Hello, World!");\n }\n}' } }; export function CodeExecutor() { const [selectedLanguage, setSelectedLanguage] = useState('python'); const [code, setCode] = useState(LANGUAGES.python.starter); const [isExecuting, setIsExecuting] = useState(false); const [output, setOutput] = useState<OutputLine[]>([]); const [executionResult, setExecutionResult] = useState<ExecutionResult | null>(null); const [currentJobId, setCurrentJobId] = useState<string | null>(null); const wsRef = useRef<WebSocket | null>(null); const outputRef = useRef<HTMLDivElement>(null); // WebSocket connection for real-time output const connectWebSocket = useCallback((jobId: string) => { const wsUrl = `ws://localhost:3001/stream/${jobId}`; wsRef.current = new WebSocket(wsUrl); wsRef.current.onopen = () => { console.log('WebSocket connected for job:', jobId); wsRef.current?.send(JSON.stringify({ type: 'subscribe', job_id: jobId })); }; wsRef.current.onmessage = (event) => { const message = JSON.parse(event.data); if (message.type === 'output') { const outputLine: OutputLine = { line: message.data.line, stream: message.data.stream, timestamp: message.data.timestamp }; setOutput(prev => [...prev, outputLine]); // Auto-scroll to bottom setTimeout(() => { if (outputRef.current) { outputRef.current.scrollTop = outputRef.current.scrollHeight; } }, 10); } }; wsRef.current.onclose = () => { console.log('WebSocket disconnected'); }; wsRef.current.onerror = (error) => { console.error('WebSocket error:', error); }; }, []); // Execute code const executeCode = useCallback(async () => { if (isExecuting) return; setIsExecuting(true); setOutput([]); setExecutionResult(null); try { const response = await fetch('http://localhost:3001/api/execute', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ language: selectedLanguage, source_code: code, workspace_id: 'web-editor' }) }); if (!response.ok) { throw new Error(`HTTP ${response.status}: ${response.statusText}`); } const result = await 
response.json(); setCurrentJobId(result.job_id); // Connect WebSocket for streaming output connectWebSocket(result.job_id); // Poll for final result pollJobResult(result.job_id); } catch (error) { console.error('Execution failed:', error); setOutput([{ line: `Error: ${error.message}`, stream: 'stderr', timestamp: new Date().toISOString() }]); setIsExecuting(false); } }, [code, selectedLanguage, isExecuting, connectWebSocket]); // Poll for job completion const pollJobResult = useCallback(async (jobId: string) => { const pollInterval = 1000; // 1 second const maxPolls = 60; // 60 seconds timeout let polls = 0; const poll = async () => { try { const response = await fetch(`http://localhost:3001/api/jobs/${jobId}`); const job = await response.json(); if (job.status === 'completed' || job.status === 'failed') { setExecutionResult(job.result); setIsExecuting(false); // Close WebSocket if (wsRef.current) { wsRef.current.close(); } return; } polls++; if (polls < maxPolls) { setTimeout(poll, pollInterval); } else { // Timeout setIsExecuting(false); setOutput(prev => [...prev, { line: 'Execution timeout - job may still be running', stream: 'stderr', timestamp: new Date().toISOString() }]); } } catch (error) { console.error('Polling error:', error); setIsExecuting(false); } }; setTimeout(poll, pollInterval); }, []); // Stop execution const stopExecution = useCallback(async () => { if (currentJobId) { try { await fetch(`http://localhost:3001/api/jobs/${currentJobId}`, { method: 'DELETE' }); } catch (error) { console.error('Failed to cancel job:', error); } } if (wsRef.current) { wsRef.current.close(); } setIsExecuting(false); setCurrentJobId(null); }, [currentJobId]); // Copy code to clipboard const copyCode = useCallback(() => { navigator.clipboard.writeText(code); }, [code]); // Download output const downloadOutput = useCallback(() => { const outputText = output.map(line => `[${line.timestamp}] ${line.stream}: ${line.line}` ).join('\n'); const blob = new Blob([outputText], { type: 'text/plain' }); const url = URL.createObjectURL(blob); const a = document.createElement('a'); a.href = url; a.download = `execution-output-${Date.now()}.txt`; a.click(); URL.revokeObjectURL(url); }, [output]); // Language change handler useEffect(() => { setCode(LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].starter); }, [selectedLanguage]); // Cleanup WebSocket on unmount useEffect(() => { return () => { if (wsRef.current) { wsRef.current.close(); } }; }, []); return ( <div className="max-w-6xl mx-auto p-4 space-y-4"> {/* Header */} <div className="flex items-center justify-between"> <h1 className="text-2xl font-bold">Online Code Compiler</h1> <Badge variant="outline" className="text-sm"> Real-time execution with Docker containers </Badge> </div> {/* Language Selection */} <Card> <CardContent className="p-4"> <Tabs value={selectedLanguage} onValueChange={setSelectedLanguage}> <TabsList className="grid w-full grid-cols-5"> {Object.entries(LANGUAGES).map(([lang, config]) => ( <TabsTrigger key={lang} value={lang} className="flex items-center gap-2"> <span>{config.icon}</span> <span className="hidden sm:inline">{config.name}</span> </TabsTrigger> ))} </TabsList> </Tabs> </CardContent> </Card> {/* Code Editor */} <Card> <CardContent className="p-0"> <div className="border-b p-4 flex items-center justify-between"> <h3 className="font-semibold">Code Editor</h3> <div className="flex items-center gap-2"> <Button variant="outline" size="sm" onClick={copyCode} > <Copy className="h-4 w-4 mr-2" /> Copy </Button> <Button 
onClick={isExecuting ? stopExecution : executeCode} disabled={!code.trim()} variant={isExecuting ? "destructive" : "default"} > {isExecuting ? ( <> <Square className="h-4 w-4 mr-2" /> Stop </> ) : ( <> <Play className="h-4 w-4 mr-2" /> Run {LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].name} </> )} </Button> </div> </div> <CodeMirror value={code} onChange={(value) => setCode(value)} theme={githubDark} extensions={[LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].ext]} className="text-sm" basicSetup={{ lineNumbers: true, foldGutter: true, dropCursor: false, allowMultipleSelections: false }} /> </CardContent> </Card> {/* Output Panel */} <Card> <CardContent className="p-0"> <div className="border-b p-4 flex items-center justify-between"> <div className="flex items-center gap-2"> <h3 className="font-semibold">Output</h3> {isExecuting && ( <Badge variant="secondary" className="animate-pulse"> Executing... </Badge> )} {executionResult && ( <Badge variant={executionResult.exit_code === 0 ? "default" : "destructive"} > Exit code: {executionResult.exit_code} </Badge> )} </div> <div className="flex items-center gap-2"> {output.length > 0 && ( <Button variant="outline" size="sm" onClick={downloadOutput} > <Download className="h-4 w-4 mr-2" /> Download </Button> )} <Button variant="outline" size="sm" onClick={() => setOutput([])} > Clear </Button> </div> </div> <div ref={outputRef} className="h-96 overflow-y-auto p-4 bg-gray-950 text-gray-100 font-mono text-sm" > {output.length === 0 ? ( <div className="text-gray-500 italic"> Click "Run" to execute your code. Output will appear here in real-time. </div> ) : ( output.map((line, index) => ( <div key={index} className={`whitespace-pre-wrap ${ line.stream === 'stderr' ? 'text-red-400' : 'text-gray-100' }`} > {line.line} </div> )) )} </div> </CardContent> </Card> {/* Execution Statistics */} {executionResult && ( <Card> <CardContent className="p-4"> <h3 className="font-semibold mb-2">Execution Statistics</h3> <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm"> <div> <span className="text-gray-500">Duration:</span> <div className="font-mono">{executionResult.duration || 0}ms</div> </div> <div> <span className="text-gray-500">Exit Code:</span> <div className="font-mono">{executionResult.exit_code}</div> </div> <div> <span className="text-gray-500">Language:</span> <div className="font-mono">{selectedLanguage}</div> </div> <div> <span className="text-gray-500">Job ID:</span> <div className="font-mono text-xs">{executionResult.job_id}</div> </div> </div> </CardContent> </Card> )} </div> ); }
Security & Isolation
The Security Challenge
Running arbitrary user code presents significant security risks:
- Host System Access: Malicious code could access the host filesystem
- Network Attacks: Code could scan internal networks or launch attacks
- Resource Exhaustion: Infinite loops or memory bombs could crash servers
- Data Exfiltration: Code could attempt to steal sensitive information
- Privilege Escalation: Attempts to gain root or admin access
Docker Security Model
Docker provides multiple layers of security isolation:
```yaml
# Docker container security configuration
version: '3.8'
services:
  runner-python:
    image: python:3.11-slim
    security_opt:
      - no-new-privileges:true
      - seccomp:unconfined   # May need custom seccomp profile
    cap_drop:
      - ALL
    cap_add:
      - SETGID
      - SETUID
    read_only: true
    tmpfs:
      - /tmp:exec,size=100m
      - /var/tmp:exec,size=100m
    ulimits:
      nproc: 64          # Limit number of processes
      nofile: 1024       # Limit open files
      memlock: 67108864  # Limit locked memory
    memory: 256m         # Memory limit
    cpus: '0.5'          # CPU limit
    pids_limit: 100      # Process limit
    networks:
      - isolated_network

networks:
  isolated_network:
    driver: bridge
    internal: true  # No external network access
```
Advanced Container Security
Implement additional security measures in the runner service:
func (m *Manager) createSecureContainer(language, jobID string) error { containerConfig := &container.Config{ Image: m.config.Environments[language], Cmd: []string{"tail", "-f", "/dev/null"}, Env: []string{ "HOME=/tmp", "USER=runner", "SHELL=/bin/bash", }, WorkingDir: "/tmp/workspace", User: "1000:1000", // Non-root user // Resource limits Memory: 256 * 1024 * 1024, // 256MB MemorySwap: 256 * 1024 * 1024, // No swap CpuShares: 512, // Half CPU priority // Security options SecurityOpts: []string{ "no-new-privileges:true", "seccomp=unconfined", // or custom profile }, // Network isolation NetworkDisabled: true, } hostConfig := &container.HostConfig{ // Resource constraints Resources: container.Resources{ Memory: 256 * 1024 * 1024, MemorySwap: 256 * 1024 * 1024, CPUShares: 512, PidsLimit: 50, Ulimits: []*units.Ulimit{ {Name: "nproc", Soft: 32, Hard: 32}, {Name: "nofile", Soft: 256, Hard: 256}, }, }, // Security ReadonlyRootfs: true, CapDrop: []string{"ALL"}, CapAdd: []string{"SETGID", "SETUID"}, // Temporary filesystems Tmpfs: map[string]string{ "/tmp": "exec,size=100m", "/var/tmp": "exec,size=100m", }, // No privileged access Privileged: false, // Network isolation NetworkMode: "none", } // Create container resp, err := m.dockerClient.ContainerCreate( context.Background(), containerConfig, hostConfig, nil, // networking config nil, // platform fmt.Sprintf("runner-%s-%s", language, jobID), ) if err != nil { return fmt.Errorf("failed to create container: %w", err) } // Start container if err := m.dockerClient.ContainerStart( context.Background(), resp.ID, types.ContainerStartOptions{}, ); err != nil { return fmt.Errorf("failed to start container: %w", err) } return nil }
Code Sanitization
Implement static analysis for dangerous patterns:
class CodeSanitizer { constructor() { this.dangerousPatterns = { filesystem: [ /\bopen\s*\(\s*['"][\/\\]/, // File system access /\bfile\s*\(\s*['"][\/\\]/, // File operations /\bos\.system/, // OS commands /\bsubprocess/, // Process execution /\beval\s*\(/, // Code evaluation /\bexec\s*\(/, // Code execution ], network: [ /\bsocket\s*\(/, // Network sockets /\burllib/, // URL operations /\brequests\./, // HTTP requests /\bhttplib/, // HTTP library /\bfetch\s*\(/, // Fetch API ], system: [ /\bos\.getenv/, // Environment variables /\bprocess\.env/, // Node.js environment /\b__import__/, // Dynamic imports /\brequire\s*\(\s*['"]child_process['"]/, // Child process ] }; } analyze(code, language) { const issues = []; for (const [category, patterns] of Object.entries(this.dangerousPatterns)) { for (const pattern of patterns) { const matches = code.match(pattern); if (matches) { issues.push({ category, pattern: pattern.toString(), match: matches[0], severity: this.getSeverity(category) }); } } } return { safe: issues.length === 0, issues, score: this.calculateSafetyScore(issues) }; } getSeverity(category) { const severityMap = { filesystem: 'high', network: 'medium', system: 'high' }; return severityMap[category] || 'low'; } calculateSafetyScore(issues) { const weights = { high: 10, medium: 5, low: 1 }; const totalWeight = issues.reduce((sum, issue) => sum + weights[issue.severity], 0); return Math.max(0, 100 - totalWeight); } } // Usage in API app.post('/api/execute', validateCodeRequest, async (req, res) => { const sanitizer = new CodeSanitizer(); const analysis = sanitizer.analyze(req.body.source_code, req.body.language); if (!analysis.safe && analysis.score < 50) { return res.status(400).json({ error: 'Code contains potentially dangerous operations', issues: analysis.issues, safety_score: analysis.score }); } // Proceed with execution... });
Network Isolation
Implement network restrictions at multiple levels:
```bash
# Docker network setup with restrictions
docker network create --driver bridge \
  --subnet=172.20.0.0/16 \
  --opt com.docker.network.bridge.enable_icc=false \
  --opt com.docker.network.bridge.enable_ip_masquerade=false \
  isolated-execution

# Firewall rules for the container network
iptables -I DOCKER-USER -s 172.20.0.0/16 -j DROP
iptables -I DOCKER-USER -s 172.20.0.0/16 -d 172.20.0.0/16 -j ACCEPT
```
Performance Optimization
Container Lifecycle Management
Optimize container startup and cleanup:
type ContainerPool struct { pools map[string]*LanguagePool mutex sync.RWMutex logger *logrus.Logger } type LanguagePool struct { language string containers []*ContainerInstance available chan *ContainerInstance maxSize int currentSize int mutex sync.Mutex } type ContainerInstance struct { ID string Language string CreatedAt time.Time LastUsed time.Time InUse bool } func NewContainerPool(maxSize int, logger *logrus.Logger) *ContainerPool { return &ContainerPool{ pools: make(map[string]*LanguagePool), logger: logger, } } func (cp *ContainerPool) GetContainer(language string) (*ContainerInstance, error) { cp.mutex.RLock() pool, exists := cp.pools[language] cp.mutex.RUnlock() if !exists { cp.mutex.Lock() pool = &LanguagePool{ language: language, available: make(chan *ContainerInstance, 10), maxSize: 10, } cp.pools[language] = pool cp.mutex.Unlock() } // Try to get from available pool select { case container := <-pool.available: container.InUse = true container.LastUsed = time.Now() return container, nil default: // Create new container if under limit return cp.createNewContainer(pool) } } func (cp *ContainerPool) ReturnContainer(container *ContainerInstance) { cp.mutex.RLock() pool := cp.pools[container.Language] cp.mutex.RUnlock() container.InUse = false container.LastUsed = time.Now() // Clean container workspace cp.cleanContainerWorkspace(container) // Return to pool select { case pool.available <- container: // Successfully returned to pool default: // Pool is full, destroy container cp.destroyContainer(container) } } func (cp *ContainerPool) cleanContainerWorkspace(container *ContainerInstance) { // Execute cleanup commands in container cleanupCmd := []string{ "docker", "exec", container.ID, "bash", "-c", "rm -rf /tmp/workspace/* 2>/dev/null || true" } exec.Command(cleanupCmd[0], cleanupCmd[1:]...).Run() }
Memory Management
Implement intelligent memory management:
type MemoryManager struct { totalMemory uint64 usedMemory uint64 containerMem map[string]uint64 mutex sync.RWMutex logger *logrus.Logger } func (mm *MemoryManager) AllocateMemory(containerID string, requested uint64) error { mm.mutex.Lock() defer mm.mutex.Unlock() // Check if allocation would exceed limits if mm.usedMemory + requested > mm.totalMemory * 80 / 100 { // 80% threshold return fmt.Errorf("insufficient memory: %d MB requested, %d MB available", requested/1024/1024, (mm.totalMemory-mm.usedMemory)/1024/1024) } mm.usedMemory += requested mm.containerMem[containerID] = requested mm.logger.Infof("Allocated %d MB to container %s", requested/1024/1024, containerID) return nil } func (mm *MemoryManager) ReleaseMemory(containerID string) { mm.mutex.Lock() defer mm.mutex.Unlock() if allocated, exists := mm.containerMem[containerID]; exists { mm.usedMemory -= allocated delete(mm.containerMem, containerID) mm.logger.Infof("Released %d MB from container %s", allocated/1024/1024, containerID) } } func (mm *MemoryManager) GetMemoryStats() map[string]interface{} { mm.mutex.RLock() defer mm.mutex.RUnlock() return map[string]interface{}{ "total_mb": mm.totalMemory / 1024 / 1024, "used_mb": mm.usedMemory / 1024 / 1024, "available_mb": (mm.totalMemory - mm.usedMemory) / 1024 / 1024, "utilization": float64(mm.usedMemory) / float64(mm.totalMemory) * 100, "active_containers": len(mm.containerMem), } }
Load Balancing
Implement intelligent load balancing:
type LoadBalancer struct { workers []*WorkerNode roundRobin int mutex sync.Mutex healthChecker *HealthChecker } type WorkerNode struct { ID string Address string CPU float64 Memory float64 ActiveJobs int MaxJobs int LastSeen time.Time Healthy bool } func (lb *LoadBalancer) SelectWorker(job *ExecutionJob) (*WorkerNode, error) { lb.mutex.Lock() defer lb.mutex.Unlock() healthyWorkers := lb.getHealthyWorkers() if len(healthyWorkers) == 0 { return nil, fmt.Errorf("no healthy workers available") } // Sort by load (CPU + Memory + Active Jobs) sort.Slice(healthyWorkers, func(i, j int) bool { loadI := lb.calculateLoad(healthyWorkers[i]) loadJ := lb.calculateLoad(healthyWorkers[j]) return loadI < loadJ }) // Select least loaded worker selected := healthyWorkers[0] selected.ActiveJobs++ lb.logger.Infof("Selected worker %s (load: %.2f)", selected.ID, lb.calculateLoad(selected)) return selected, nil } func (lb *LoadBalancer) calculateLoad(worker *WorkerNode) float64 { // Weighted load calculation cpuWeight := 0.3 memoryWeight := 0.3 jobWeight := 0.4 cpuLoad := worker.CPU / 100.0 memoryLoad := worker.Memory / 100.0 jobLoad := float64(worker.ActiveJobs) / float64(worker.MaxJobs) return cpuWeight*cpuLoad + memoryWeight*memoryLoad + jobWeight*jobLoad }
Production Deployment
Docker Compose Production Setup
version: '3.8' services: # RabbitMQ cluster rabbitmq: image: rabbitmq:3.12-management hostname: rabbitmq-main environment: RABBITMQ_ERLANG_COOKIE: ${RABBITMQ_COOKIE} RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER} RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASS} RABBITMQ_DEFAULT_VHOST: / volumes: - rabbitmq_data:/var/lib/rabbitmq - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf networks: - backend deploy: replicas: 1 resources: limits: memory: 1G cpus: '0.5' # Redis cluster redis: image: redis:7-alpine command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru volumes: - redis_data:/data networks: - backend deploy: replicas: 1 resources: limits: memory: 512M cpus: '0.25' # API Backend api-backend: build: context: ./backend dockerfile: Dockerfile.production environment: NODE_ENV: production RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/ REDIS_URL: redis://redis:6379 LOG_LEVEL: info RATE_LIMIT_WINDOW: 60000 RATE_LIMIT_MAX: 10 depends_on: - rabbitmq - redis networks: - backend - frontend deploy: replicas: 2 resources: limits: memory: 512M cpus: '0.5' update_config: order: start-first failure_action: rollback # Runner Service runner-service: build: context: ./runner dockerfile: Dockerfile.production environment: RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/ REDIS_URL: redis://redis:6379 LOG_LEVEL: info MAX_CONCURRENT_JOBS: 5 WORKSPACE_DIR: /tmp/workspaces volumes: - /var/run/docker.sock:/var/run/docker.sock - runner_workspaces:/tmp/workspaces depends_on: - rabbitmq - redis networks: - backend - execution deploy: replicas: 3 resources: limits: memory: 2G cpus: '1.0' placement: constraints: - node.role == worker # Frontend frontend: build: context: ./frontend dockerfile: Dockerfile.production environment: REACT_APP_API_URL: http://api-backend:3001 REACT_APP_WS_URL: ws://api-backend:3001 depends_on: - api-backend networks: - frontend deploy: replicas: 2 resources: limits: memory: 256M cpus: '0.25' # Load Balancer nginx: image: nginx:alpine ports: - "80:80" - "443:443" volumes: - ./nginx.conf:/etc/nginx/nginx.conf - ./ssl:/etc/ssl/certs depends_on: - frontend - api-backend networks: - frontend deploy: replicas: 1 resources: limits: memory: 128M cpus: '0.1' volumes: rabbitmq_data: redis_data: runner_workspaces: networks: frontend: driver: overlay backend: driver: overlay execution: driver: overlay internal: true
Kubernetes Deployment
# namespace.yaml apiVersion: v1 kind: Namespace metadata: name: code-compiler --- # configmap.yaml apiVersion: v1 kind: ConfigMap metadata: name: app-config namespace: code-compiler data: RABBITMQ_URL: "amqp://admin:password@rabbitmq:5672/" REDIS_URL: "redis://redis:6379" LOG_LEVEL: "info" MAX_CONCURRENT_JOBS: "5" --- # runner-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: runner-service namespace: code-compiler spec: replicas: 3 selector: matchLabels: app: runner-service template: metadata: labels: app: runner-service spec: containers: - name: runner image: your-registry/runner-service:latest envFrom: - configMapRef: name: app-config resources: requests: memory: "1Gi" cpu: "500m" limits: memory: "2Gi" cpu: "1000m" volumeMounts: - name: docker-sock mountPath: /var/run/docker.sock - name: workspaces mountPath: /tmp/workspaces securityContext: runAsNonRoot: true runAsUser: 1000 volumes: - name: docker-sock hostPath: path: /var/run/docker.sock - name: workspaces emptyDir: sizeLimit: 10Gi --- # hpa.yaml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: runner-service-hpa namespace: code-compiler spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: runner-service minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 - type: Resource resource: name: memory target: type: Utilization averageUtilization: 80
Monitoring and Observability
# monitoring-stack.yaml version: '3.8' services: # Prometheus prometheus: image: prom/prometheus:latest command: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.path=/prometheus' - '--web.console.libraries=/etc/prometheus/console_libraries' - '--web.console.templates=/etc/prometheus/consoles' - '--storage.tsdb.retention.time=30d' - '--web.enable-lifecycle' volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml - prometheus_data:/prometheus networks: - monitoring # Grafana grafana: image: grafana/grafana:latest environment: GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD} volumes: - grafana_data:/var/lib/grafana - ./grafana/dashboards:/etc/grafana/provisioning/dashboards - ./grafana/datasources:/etc/grafana/provisioning/datasources networks: - monitoring # ELK Stack for Logs elasticsearch: image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0 environment: discovery.type: single-node xpack.security.enabled: false volumes: - elasticsearch_data:/usr/share/elasticsearch/data networks: - logging logstash: image: docker.elastic.co/logstash/logstash:8.11.0 volumes: - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf depends_on: - elasticsearch networks: - logging kibana: image: docker.elastic.co/kibana/kibana:8.11.0 environment: ELASTICSEARCH_HOSTS: http://elasticsearch:9200 depends_on: - elasticsearch networks: - logging volumes: prometheus_data: grafana_data: elasticsearch_data: networks: monitoring: logging:
Key Learnings
1. Container Management is Complex
Key Challenges:
- Cold Start Problem: Container creation takes 2-5 seconds
- Resource Leaks: Containers not properly cleaned up
- State Management: Persistent vs ephemeral container strategies
- Network Isolation: Balancing security with functionality
Solutions Implemented:
- Container pooling with pre-warmed instances
- Automatic cleanup with garbage collection (sketched after this list)
- Persistent containers with workspace isolation
- Network-isolated execution environments
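A minimal sketch of the garbage-collection idea: a background goroutine that periodically removes containers that have sat idle too long. The Reaper type and its method names are illustrative; in practice the destroy step would call the Docker API or `docker rm -f`:

```go
// Sketch: periodically reap containers that have been idle longer than maxIdle.
package main

import (
	"fmt"
	"sync"
	"time"
)

type ContainerInstance struct {
	ID       string
	LastUsed time.Time
	InUse    bool
}

type Reaper struct {
	mu         sync.Mutex
	containers map[string]*ContainerInstance
	maxIdle    time.Duration
}

func (r *Reaper) reapIdle() {
	r.mu.Lock()
	defer r.mu.Unlock()
	for id, c := range r.containers {
		if !c.InUse && time.Since(c.LastUsed) > r.maxIdle {
			fmt.Println("destroying idle container", id) // would remove the container here
			delete(r.containers, id)
		}
	}
}

// Start runs the reaper on a fixed interval until stop is closed.
func (r *Reaper) Start(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			r.reapIdle()
		case <-stop:
			return
		}
	}
}

func main() {
	r := &Reaper{
		containers: map[string]*ContainerInstance{
			"runner-python-old": {ID: "runner-python-old", LastUsed: time.Now().Add(-time.Hour)},
		},
		maxIdle: 30 * time.Minute,
	}
	stop := make(chan struct{})
	go r.Start(time.Minute, stop)
	r.reapIdle() // run once immediately for the example
	close(stop)
}
```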
2. Real-time Streaming Requires Careful Architecture
Technical Insights:
- WebSocket Management: Connection pooling and cleanup crucial
- Message Ordering: Ensure output lines arrive in sequence
- Buffer Management: Handle high-frequency output efficiently
- Connection Recovery: Graceful handling of network issues
Best Practices:
- Use Redis pub/sub for scalable streaming
- Implement connection heartbeats
- Buffer and batch small messages (see the sketch after this list)
- Provide fallback to polling for unreliable connections
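For the buffering point, here is a sketch of batching output lines on the runner side before publishing to Redis, using github.com/redis/go-redis/v9. The flush interval, batch size, and channel naming (execution:&lt;jobID&gt;) follow the conventions above but are otherwise illustrative:

```go
// Sketch: collect output lines briefly and publish them as one message,
// reducing pub/sub overhead for chatty programs.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type batch struct {
	JobID string   `json:"job_id"`
	Lines []string `json:"lines"`
}

// publishBatched flushes at most every 100ms, or sooner once 64 lines accumulate.
func publishBatched(ctx context.Context, rdb *redis.Client, jobID string, lines <-chan string) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	buf := make([]string, 0, 64)
	flush := func() {
		if len(buf) == 0 {
			return
		}
		payload, _ := json.Marshal(batch{JobID: jobID, Lines: buf})
		rdb.Publish(ctx, fmt.Sprintf("execution:%s", jobID), payload)
		buf = buf[:0]
	}

	for {
		select {
		case line, ok := <-lines:
			if !ok {
				flush()
				return
			}
			buf = append(buf, line)
			if len(buf) >= 64 {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	lines := make(chan string)
	go publishBatched(context.Background(), rdb, "job-123", lines)

	for i := 0; i < 5; i++ {
		lines <- fmt.Sprintf("line %d", i)
	}
	close(lines)
	time.Sleep(200 * time.Millisecond) // let the final flush run
}
```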
3. Security Cannot Be an Afterthought
Critical Security Measures:
- Defense in Depth: Multiple security layers
- Principle of Least Privilege: Minimal container permissions
- Resource Limits: Prevent resource exhaustion attacks
- Code Analysis: Static analysis before execution
Security Architecture:
```
┌─────────────────┐
│   Code Input    │
├─────────────────┤
│ Static Analysis │ ← First line of defense
├─────────────────┤
│  Rate Limiting  │ ← Prevent abuse
├─────────────────┤
│ Docker Sandbox  │ ← Isolation layer
├─────────────────┤
│ Resource Limits │ ← Resource protection
├─────────────────┤
│ Network Filter  │ ← Network restrictions
└─────────────────┘
```
4. Performance Optimization is Multi-Faceted
Optimization Areas:
- Container Lifecycle: Pool management and reuse
- Resource Allocation: Dynamic scaling based on load
- Queue Management: Fair distribution and priority handling
- Caching: Language environment and dependency caching
Performance Metrics to Track (an instrumentation sketch follows the list):
- Container startup time
- Execution latency
- Queue depth
- Resource utilization
- Success/failure rates
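As an example of how a few of these metrics could be exposed from the runner service, here is a sketch using github.com/prometheus/client_golang. The metric names and labels are illustrative, not the exact ones Toki Space uses in production:

```go
// Sketch: expose execution latency, success/failure counts, and queue depth
// on a /metrics endpoint for Prometheus to scrape.
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	executionDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "runner_execution_duration_seconds",
		Help:    "Wall-clock time spent executing a job.",
		Buckets: prometheus.DefBuckets,
	}, []string{"language"})

	executionResults = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "runner_executions_total",
		Help: "Executions by language and outcome.",
	}, []string{"language", "status"})

	queueDepth = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "runner_queue_depth",
		Help: "Jobs currently waiting in the local queue.",
	})
)

// observeExecution would be called around each job execution.
func observeExecution(language string, start time.Time, failed bool) {
	executionDuration.WithLabelValues(language).Observe(time.Since(start).Seconds())
	status := "success"
	if failed {
		status = "failure"
	}
	executionResults.WithLabelValues(language, status).Inc()
}

func main() {
	queueDepth.Set(0)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9100", nil) // scraped by Prometheus
}
```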
5. Production Reliability Requires Operational Excellence
Observability Stack:
- Metrics: Prometheus + Grafana for system health
- Logging: ELK stack for centralized log analysis
- Tracing: Distributed tracing for request flows
- Alerting: PagerDuty integration for critical issues
Deployment Strategies:
- Blue-green deployments for zero downtime
- Canary releases for gradual rollouts
- Circuit breakers for fault tolerance (see the sketch after this list)
- Auto-scaling based on queue depth and CPU usage
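To make the circuit-breaker point concrete, here is a minimal hand-rolled sketch applied to job submission: after a few consecutive failures the breaker opens and rejects calls for a cool-down period instead of hammering a failing dependency. Thresholds and names are illustrative; a production system might use a library instead:

```go
// Sketch: a tiny circuit breaker guarding calls to a flaky dependency
// (for example, publishing a job to RabbitMQ).
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type CircuitBreaker struct {
	mu        sync.Mutex
	failures  int
	threshold int
	openUntil time.Time
	cooldown  time.Duration
}

var errOpen = errors.New("circuit open: dependency unavailable")

func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mu.Lock()
	if time.Now().Before(cb.openUntil) {
		cb.mu.Unlock()
		return errOpen // fail fast while the breaker is open
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.threshold {
			cb.openUntil = time.Now().Add(cb.cooldown) // trip the breaker
			cb.failures = 0
		}
		return err
	}
	cb.failures = 0
	return nil
}

func main() {
	cb := &CircuitBreaker{threshold: 3, cooldown: 30 * time.Second}
	err := cb.Call(func() error {
		// e.g. publish a job to RabbitMQ here
		return nil
	})
	fmt.Println("submit result:", err)
}
```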
6. Language-Specific Considerations
Each programming language has unique requirements:
Python:
- Dependency management with pip
- Virtual environment isolation
- Import path security
- Package installation caching
Node.js:
- npm/yarn dependency resolution
- Module loading restrictions
- Event loop management
- Memory garbage collection
Go:
- Module system (go.mod)
- Build caching for faster compilation
- Static binary advantages
- Goroutine resource management
Rust:
- Cargo package management
- Compilation time optimization
- Memory safety guarantees
- Target architecture handling
Java:
- Classpath management
- JVM startup optimization
- Garbage collection tuning
- Security manager configuration
Conclusion
Building a production-ready online code compiler is a journey that touches every aspect of modern distributed systems engineering. From container orchestration to real-time streaming, from security isolation to performance optimization, each component requires careful consideration and robust implementation.
The key to success lies in:
- Robust Architecture: Design for failure and scale from day one
- Security First: Implement security at every layer
- Performance Focus: Optimize for user experience and resource efficiency
- Operational Excellence: Monitor, measure, and continuously improve
- Incremental Development: Start simple and add complexity gradually
The result should be a platform that feels immediate and reliable, allowing developers to focus on code rather than infrastructure. When users can execute code with the same confidence they have in their local development environment, you've achieved the goal of a truly powerful online code compiler.
Learning Resources
Essential Reading
Distributed Systems:
- "Designing Data-Intensive Applications" by Martin Kleppmann - Comprehensive guide to distributed system patterns
- "Building Microservices" by Sam Newman - Microservice architecture and communication patterns
- "Site Reliability Engineering" by Google - Production system reliability practices
Container Technologies:
- "Docker Deep Dive" by Nigel Poulton - Comprehensive Docker guide
- "Kubernetes in Action" by Marko Lukša - Kubernetes orchestration patterns
- "Container Security" by Liz Rice - Security best practices for containers
Real-time Systems:
- "High Performance Browser Networking" by Ilya Grigorik - WebSocket and real-time communication
- "Redis in Action" by Josiah Carlson - Redis patterns for real-time applications
Open Source Projects
Code Execution Platforms:
- Judge0 - Online code execution system
- HackerEarth API - Commercial code execution platform
- Glot.io - Simple code execution service
Container Management:
- Docker - Container runtime
- Podman - Alternative container runtime
- gVisor - Application kernel for containers
Message Queue Solutions:
- RabbitMQ - Feature-rich message broker
- Apache Kafka - High-throughput distributed streaming
- Redis - In-memory data structure store
Tools and Development Environment
Development Tools:
- Docker Desktop - Local container development
- Kubernetes KIND - Local Kubernetes development
- Minikube - Local Kubernetes cluster
Monitoring and Observability:
- Prometheus - Metrics collection and alerting
- Grafana - Metrics visualization and dashboards
- ELK Stack - Centralized logging and analysis
Testing Frameworks:
- Testcontainers - Integration testing with containers
- k6 - Load testing for APIs and WebSockets
- Artillery - Performance testing toolkit
With love from the Toki Space team
This tutorial represents our collective experience building Toki's code execution platform. The architecture and lessons shared here will help you build your own robust online code compiler. For questions or contributions, reach out to our engineering team at hello@tokispace.com