Extended kernel-images Chromium environment with Browser Operator DevTools and eval server for browser automation, testing, and AI agent evaluation.
This platform provides:
- Browser Operator DevTools - Custom DevTools frontend with AI chat panel
- Eval Server API - HTTP/WebSocket API for browser automation and evaluation
- Headful Chrome with GUI access via WebRTC
- Chrome DevTools Protocol for automation (Playwright, Puppeteer)
- Screen Recording API for session capture
- Local Docker Compose for development
- Google Cloud Run deployment option

Prerequisites for local development:
- Docker and Docker Compose installed
- Make utility
- Git with submodule access
- Python 3 (for running evals)

Prerequisites for Cloud Run deployment:
- Google Cloud Account with billing enabled
- gcloud CLI installed and authenticated

Prerequisites for using both:
- All of the above

Best for: Background services, docker-compose workflows, persistent containers
```bash
# 1. Initialize submodules
make init

# 2. Build Docker images (takes ~30 minutes first time)
make build

# 3. Start all services in background
make compose-up

# 4. Verify everything works
make test
```

Best for: Interactive debugging, seeing live logs, quick testing
```bash
# 1. Initialize submodules
make init

# 2. Build Docker images (takes ~30 minutes first time)
make build

# 3. Start in interactive mode (logs to terminal)
make run

# In another terminal, verify
make test
```

After starting with either make compose-up or make run, access:

| Service | URL | Purpose |
|---|---|---|
| WebRTC Client | http://localhost:8000 | Live browser view with control |
| DevTools UI | http://localhost:8001 | Enhanced DevTools with AI chat |
| Eval Server API | http://localhost:8080 | HTTP REST API for automation |
| WebRTC Neko | http://localhost:8081 | WebRTC control interface |
| Eval Server WS | ws://localhost:8082 | WebSocket JSON-RPC API |
| CDP Endpoint | http://localhost:9222/json | Chrome DevTools Protocol |
| Recording API | http://localhost:444/api | Screen recording controls |
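
To confirm that the services in the table are listening, a small script can probe each port. This is a minimal sketch: the port numbers come from the table above, while `SERVICE_PORTS`, `port_open`, and `check_services` are hypothetical names that are not part of this repository.

```python
import socket

# Ports copied from the service table; the keys are display labels only.
SERVICE_PORTS = {
    "WebRTC Client": 8000,
    "DevTools UI": 8001,
    "Eval Server API": 8080,
    "WebRTC Neko": 8081,
    "Eval Server WS": 8082,
    "CDP Endpoint": 9222,
    "Recording API": 444,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` after `make compose-up` or `make run` returns a dictionary such as `{"DevTools UI": True, ...}`. A successful TCP connection only shows that something is listening, so `make test` remains the fuller check.
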

```bash
make help              # Show all available commands
make init              # Initialize git submodules
make build             # Build images (smart caching)
make rebuild           # Force complete rebuild
make build-devtools    # Build DevTools base (~30 min)
make rebuild-devtools  # Fast rebuild with local changes
make compose-up        # Start in background
make run               # Start in interactive mode
make stop              # Stop all containers
make restart           # Restart containers
make logs              # View container logs
make test              # Run API verification test
make clean             # Clean up everything
```

| Feature | make run | make compose-up |
|---|---|---|
| Log visibility | Live logs in terminal | Background, use make logs |
| Stopping | Ctrl+C or docker stop | make stop or docker-compose down |
| Restarting | Stop and run again | docker-compose restart |
| Use case | Interactive debugging | Background development |
| Startup script | run-local.sh | docker-compose.yml |
| Lock cleanup | Script cleans before start | Container cleans on start |
| Volume mounts | Defined in script | Defined in compose file |
With Docker Compose (make compose-up):
Editing Eval Server Code:
```bash
# 1. Make changes in eval-server/nodejs/
vim eval-server/nodejs/src/api-server.js

# 2. Restart container (no rebuild needed, volume-mounted)
docker-compose restart

# 3. Test changes
make test
```

Editing DevTools:

```bash
# 1. Make changes in browser-operator-core/front_end/
vim browser-operator-core/front_end/panels/ai_chat/...

# 2. Rebuild DevTools only
make rebuild-devtools

# 3. Restart containers
docker-compose down && docker-compose up -d
```

Full Rebuild:

```bash
make rebuild      # Rebuild everything from scratch
make compose-up   # Start containers
```

With Direct Docker Run (make run):
Editing Eval Server Code:
```bash
# 1. Make changes in eval-server/nodejs/
vim eval-server/nodejs/src/api-server.js

# 2. Since eval-server is NOT volume-mounted in run mode, rebuild
make rebuild

# 3. Stop and restart
# Press Ctrl+C in the terminal running 'make run'
make run
```

Editing DevTools:

```bash
# 1. Make changes in browser-operator-core/front_end/
vim browser-operator-core/front_end/panels/ai_chat/...

# 2. Rebuild DevTools only
make rebuild-devtools

# 3. Stop and restart
# Press Ctrl+C in the terminal running 'make run'
make run
```

Full Rebuild:

```bash
make rebuild   # Rebuild everything from scratch
# Press Ctrl+C in the terminal running 'make run'
make run       # Start in interactive mode
```

With make run:
```bash
# Default: ./chromium-data
make run

# Custom location
CHROMIUM_DATA_HOST=/path/to/data make run

# Ephemeral (no persistence)
CHROMIUM_DATA_HOST="" make run
```

With make compose-up:

```bash
# Edit docker-compose.yml to change CHROMIUM_DATA_HOST
# Or set environment variable:
CHROMIUM_DATA_HOST=/path/to/data make compose-up
```

With make run:

```bash
# Open specific URLs when browser starts
URLS="https://google.com https://github.com" make run
```

With make compose-up:

```bash
# Add URLS to docker-compose.yml environment section
```

```bash
# Simple test
make test

# Specific evaluation
cd evals
python3 run.py --path data/web-task-agent/flight-001.yaml --verbose

# All evaluations in a directory
python3 run.py --path data/web-task-agent/ --verbose
```

Container won't start (docker-compose):
```bash
# Check logs
docker logs kernel-browser-extended

# Clean restart
make stop
make clean
make build
make compose-up
```

Container won't start (make run):

```bash
# Stop existing container
docker stop kernel-browser-extended
docker rm kernel-browser-extended

# Clean rebuild
make clean
make rebuild
make run
```

Port conflicts:

```bash
# Remove existing container
docker rm -f kernel-browser-extended

# Then start with your preferred method
make compose-up   # OR make run
```

Lock file errors (should be automatic now): The system automatically cleans lock files on startup. If you still see errors:

With docker-compose:
```bash
docker-compose down
rm -f ./chromium-data/user-data/Singleton*
make compose-up
```

With make run:

```bash
# Press Ctrl+C to stop
rm -f ./chromium-data/user-data/Singleton*
make run
```

Seeing stale code after changes (make run):

```bash
# Eval server code is NOT volume-mounted in run mode
# You must rebuild after code changes
make rebuild
# Press Ctrl+C in terminal running 'make run'
make run
```

Want to see live logs (docker-compose):

```bash
# Option 1: Follow logs
make logs

# Option 2: Switch to interactive mode
make stop
make run
```

```bash
# Set your project ID
export PROJECT_ID="your-gcp-project-id"
gcloud config set project $PROJECT_ID

# Authenticate (if not already done)
gcloud auth login
gcloud auth application-default login
```

```bash
# Automated deployment (recommended)
./deployment/cloudrun/deploy.sh

# Or with custom settings
./deployment/cloudrun/deploy.sh --project your-project-id --region us-central1
```

After deployment, you'll get URLs like:
```
Service Endpoints:
  Main Interface:  https://kernel-browser-xxx-uc.a.run.app
  WebRTC Client:   https://kernel-browser-xxx-uc.a.run.app/
  Chrome DevTools: https://kernel-browser-xxx-uc.a.run.app/ws
  Recording API:   https://kernel-browser-xxx-uc.a.run.app/api
  Health Check:    https://kernel-browser-xxx-uc.a.run.app/health
```

Access the main URL in your browser to get real-time Chrome access:
- Full mouse/keyboard control
- Copy/paste support
- Window resizing
- Audio streaming (experimental)
Connect automation tools to the /ws endpoint:
```javascript
// Playwright (run inside an async function)
const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP('wss://your-service-url/ws');
```

```javascript
// Puppeteer (run inside an async function)
const puppeteer = require('puppeteer');
const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://your-service-url/ws',
});
```

Capture screen recordings via REST API:

```bash
# Start recording
curl -X POST https://your-service-url/api/recording/start -d '{}'

# Stop recording
curl -X POST https://your-service-url/api/recording/stop -d '{}'

# Download recording
curl https://your-service-url/api/recording/download --output recording.mp4
```

Key configuration options in service.yaml:

```yaml
env:
  - name: ENABLE_WEBRTC
    value: "true"               # Enable WebRTC streaming
  - name: WIDTH
    value: "1024"               # Browser width
  - name: HEIGHT
    value: "768"                # Browser height
  - name: CHROMIUM_FLAGS
    value: "--no-sandbox..."    # Chrome launch flags
  - name: NEKO_ICESERVERS
    value: '[{"urls": [...]}]'  # TURN/STUN servers
```

Default Cloud Run settings:
- CPU: 4 cores
- Memory: 8GB
- Timeout: 1 hour
- Concurrency: 1 (one browser per container)
- Min instances: 0 (scales to zero when unused)
- Max instances: 10 (adjustable)
- Cold start: ~30-60 seconds
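
Because an instance scaled to zero needs roughly 30-60 seconds to cold start, a client may want to wait for the /health endpoint before connecting. The following is a minimal sketch under stated assumptions: `http_ok` and `wait_for_health` are hypothetical helpers, the 90-second limit and 5-second interval are arbitrary defaults, and a healthy service is assumed to answer with HTTP 200.

```python
import time
import urllib.request


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def wait_for_health(url, max_wait=90.0, interval=5.0, probe=http_ok, sleep=time.sleep):
    """Poll url until probe(url) is True or the accumulated sleep time exceeds max_wait.

    Returns the number of attempts made on success, or 0 if the service never
    became healthy. probe and sleep are injectable so the loop can be tested
    without a network.
    """
    attempts = 0
    waited = 0.0
    while waited <= max_wait:
        attempts += 1
        if probe(url):
            return attempts
        sleep(interval)
        waited += interval
    return 0
```

A typical call would be `wait_for_health("https://your-service-url/health")` before attaching Playwright or Puppeteer to the /ws endpoint.
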
Edit service.yaml to modify Chrome behavior:
```yaml
- name: CHROMIUM_FLAGS
  value: "--user-data-dir=/home/kernel/user-data --disable-dev-shm-usage --custom-flag"
```

For production WebRTC, configure a TURN server:

```yaml
- name: NEKO_ICESERVERS
  value: '[{"urls": ["turn:turn.example.com:3478"], "username": "user", "credential": "pass"}]'
```

The platform supports running WebArena benchmark evaluations against self-hosted test websites. This is optional and only needed if you're running WebArena tasks.
WebArena is a research benchmark with 812 tasks across 7 self-hosted websites (e-commerce, forums, GitLab, Wikipedia, etc.). To run these evaluations, you need to route specific domains to a custom IP address.
1. Configure environment variables in evals/.env:
```bash
cd evals
cp .env.example .env
vim .env
```

Add:

```bash
# WebArena Infrastructure Configuration
WEBARENA_HOST_IP=172.16.55.59    # IP where WebArena sites are hosted
WEBARENA_NETWORK=172.16.55.0/24  # Network CIDR for routing

# WebArena Site URLs (optional - customize if needed)
SHOPPING=http://onestopmarket.com
SHOPPING_ADMIN=http://onestopmarket.com/admin
REDDIT=http://reddit.com
GITLAB=http://gitlab.com
WIKIPEDIA=http://wikipedia.org
```

2. Start container (configuration is auto-loaded):

```bash
make compose-up   # OR make run
```

3. Verify WebArena routing is enabled:

```bash
docker logs kernel-browser-extended | grep -i webarena
```

You should see:

```
[init] Configuring WebArena DNS mapping to 172.16.55.59...
[init] Adding route to 172.16.55.0/24 via 172.17.0.1...
```

4. Run WebArena evaluations:

```bash
cd evals
python3 run_webarena.py --task-id 1 --verbose
```

When WEBARENA_HOST_IP is set:
- DNS Mapping: Chromium routes WebArena domains (gitlab.com, reddit.com, etc.) to your specified IP
- Network Routing: Container adds route to reach the WebArena network
- Automatic: Configuration happens on container startup via scripts/init-container.sh
Without configuration (default):
- System works normally with standard DNS resolution
- WebArena routing is completely disabled
- No impact on regular browser automation
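
One common way to make Chromium resolve chosen domains to a fixed IP is its `--host-resolver-rules` launch flag. This document does not say whether scripts/init-container.sh uses that flag or another mechanism, so the following is only an illustrative sketch of the DNS mapping idea. `build_host_resolver_rules` and `WEBARENA_DOMAINS` are hypothetical names, and the domain list is taken from the example site URLs in the .env snippet.

```python
def build_host_resolver_rules(domains, host_ip):
    """Build a value for Chromium's --host-resolver-rules flag.

    Each domain is mapped to host_ip with a "MAP <domain> <ip>" rule.
    Returns an empty string when host_ip is empty (routing disabled).
    """
    if not host_ip:
        return ""
    return ", ".join(f"MAP {domain} {host_ip}" for domain in domains)


# Domains from the example site URLs; adjust to your own setup (assumption).
WEBARENA_DOMAINS = ["onestopmarket.com", "reddit.com", "gitlab.com", "wikipedia.org"]

rules = build_host_resolver_rules(WEBARENA_DOMAINS, "172.16.55.59")
flag = f'--host-resolver-rules="{rules}"' if rules else ""
print(flag)
```

With an empty `WEBARENA_HOST_IP` the helper produces no flag at all, which matches the default behavior described above where standard DNS resolution is left untouched.
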
You can use different IP addresses for different environments:
```bash
# Local development
WEBARENA_HOST_IP=172.16.55.59
WEBARENA_NETWORK=172.16.55.0/24

# Cloud deployment
WEBARENA_HOST_IP=34.123.45.67
WEBARENA_NETWORK=34.123.45.0/24

# Disable WebArena (default)
WEBARENA_HOST_IP=
WEBARENA_NETWORK=
```

See CLAUDE.md for detailed WebArena configuration documentation.
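
A mistyped IP or CIDR is easy to miss until routing fails, so it can help to sanity-check evals/.env before starting the container. The sketch below rests on two assumptions drawn from the examples rather than from any stated rule: that `WEBARENA_HOST_IP` should fall inside `WEBARENA_NETWORK`, and that both values are either set together or left empty together. `parse_env` and `webarena_config_ok` are hypothetical helpers, and the parser is deliberately naive (it treats every `#` as the start of a comment).

```python
import ipaddress


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def webarena_config_ok(env):
    """Return True if routing is disabled or the host IP lies inside the network."""
    host = env.get("WEBARENA_HOST_IP", "")
    network = env.get("WEBARENA_NETWORK", "")
    if not host and not network:
        return True  # both empty: WebArena routing is disabled
    if not host or not network:
        return False  # only one of the two is set
    try:
        return ipaddress.ip_address(host) in ipaddress.ip_network(network)
    except ValueError:
        return False  # malformed IP address or CIDR
```

For example, `webarena_config_ok(parse_env(open("evals/.env").read()))` could run as a pre-flight step ahead of `make compose-up`.
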

```
web-agent/
├── browser-operator-core/             # Submodule: DevTools frontend source
├── kernel-images/                     # Submodule: Base browser environment
├── deployment/                        # Deployment configurations
│   ├── cloudrun/                      # Google Cloud Run deployment
│   │   ├── deploy.sh                  # Cloud deployment script
│   │   ├── cloudbuild.yaml            # CI/CD pipeline config
│   │   ├── service.yaml               # Cloud Run service definition
│   │   ├── service-secrets.yaml       # Service with Secret Manager
│   │   ├── cloudrun-wrapper.sh        # Cloud Run entrypoint
│   │   ├── cloudrun-kernel-wrapper.sh # Alternative wrapper
│   │   ├── supervisord-cloudrun.conf  # Supervisor for Cloud Run
│   │   └── nginx.conf                 # Reverse proxy config
│   └── local/                         # Local deployment
│       └── run-local.sh               # Interactive Docker run script
├── nginx/                             # Nginx configurations
│   └── nginx-devtools.conf            # DevTools nginx config
├── scripts/                           # Utility scripts
│   ├── init-container.sh              # Auto-cleanup of lock files
│   └── test-eval-server.sh            # Eval server build test
├── supervisor/services/               # Service configs (overrides)
├── eval-server/
│   └── nodejs/                        # Eval server (use this, NOT submodule)
│       ├── src/                       # API server, evaluation server, lib
│       ├── start.js                   # Server entrypoint
│       └── package.json
├── evals/
│   ├── run.py                         # Python evaluation runner
│   ├── lib/judge.py                   # Judge implementations
│   └── data/                          # Evaluation YAML files
├── Dockerfile.local                   # Main Docker build (local dev)
├── Dockerfile.devtools                # DevTools frontend build
├── Dockerfile.cloudrun                # Cloud Run build
├── docker-compose.yml                 # Local deployment config
├── Makefile                           # Build commands
├── CLAUDE.md                          # Technical documentation
└── README.md                          # This file
```

See the detailed troubleshooting section under Local Docker Compose Deployment above.
Common quick fixes:
```bash
# Clean restart
make stop && make clean && make build && make compose-up

# Check logs
docker logs kernel-browser-extended

# Verify services
docker exec kernel-browser-extended supervisorctl status
```

- Build Timeout

  ```bash
  # Use local build for testing
  ./deploy.sh --local
  ```

- Port Binding Errors
  - Cloud Run requires port 8080
  - nginx proxies internal services
  - Check nginx.conf for port mappings

- Chrome Crashes
  - Ensure the --no-sandbox flag is set
  - Check memory limits (8GB minimum)
  - Verify non-root user execution

```bash
# View service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=kernel-browser" --project=$PROJECT_ID --limit=50

# Check service status
gcloud run services describe kernel-browser --region=us-central1

# Test endpoints
curl https://your-service-url/health
curl https://your-service-url/json/version
```

- Service runs as non-root user
- Chrome uses --no-sandbox (required for containers)
- WebRTC streams are not encrypted by default
- Consider VPC/firewall rules for production
- Use Cloud IAM for API access control
Approximate Cloud Run costs:
- CPU: $0.00002400 per vCPU-second
- Memory: $0.00000250 per GiB-second
- Requests: $0.40 per million requests
Example: 1 hour session ≈ $0.50-1.00
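
The rates above can be turned into a quick estimate. The sketch below assumes the default instance size from this README (4 vCPU, 8GB treated as 8 GiB) and uses a hypothetical helper, `session_cost`. With those inputs the listed compute rates alone come to about $0.42 for one hour, so the quoted $0.50-1.00 range presumably leaves room for charges not itemized here, such as network egress.

```python
# Rates as listed above, in USD.
CPU_RATE = 0.00002400            # per vCPU-second
MEMORY_RATE = 0.00000250         # per GiB-second
REQUEST_RATE = 0.40 / 1_000_000  # per request


def session_cost(seconds, vcpus=4, memory_gib=8, requests=0):
    """Estimate the cost of one instance running for the given duration."""
    cpu = vcpus * seconds * CPU_RATE
    memory = memory_gib * seconds * MEMORY_RATE
    return cpu + memory + requests * REQUEST_RATE


one_hour = session_cost(3600)
print(f"{one_hour:.4f}")  # 0.4176 (0.3456 CPU + 0.0720 memory)
```

Because concurrency is 1, each simultaneous browser session runs in its own instance, so the estimate scales linearly with the number of parallel sessions.
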
The cloudbuild.yaml provides:
- Submodule initialization
- Docker image build with caching
- Container Registry push
- Cloud Run deployment
- Traffic routing

```bash
# Normal build (with cache) - recommended for development
gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml

# Force rebuild without cache - use when dependencies change
gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml --substitutions=_NO_CACHE=true

# Automated deployment with Twilio TURN server setup
./deployment/cloudrun/deploy.sh
```

The build system uses Docker layer caching by default to reduce build times and costs:
- With cache: ~5-10 minutes, lower cost
- Without cache: 30+ minutes, higher cost ($3-5 per build)
Use _NO_CACHE=true only when:
- Dependencies have changed significantly
- Base images need updating
- Debugging build issues
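
If builds are triggered from a script, the cached and uncached variants can share one code path. This is a small sketch using only the two commands shown earlier in this section; `build_submit_command` is a hypothetical helper and not part of the repository.

```python
import shlex

# Path to the Cloud Build config, as used in the commands above.
CONFIG = "deployment/cloudrun/cloudbuild.yaml"


def build_submit_command(no_cache=False):
    """Return the gcloud argv list for a cached or uncached Cloud Build run."""
    args = ["gcloud", "builds", "submit", "--config", CONFIG]
    if no_cache:
        args.append("--substitutions=_NO_CACHE=true")
    return args


# Print the uncached variant as a copy-pasteable shell command.
print(shlex.join(build_submit_command(no_cache=True)))
```

Keeping `no_cache=False` as the default mirrors the recommendation above: the slower, more expensive uncached build has to be requested explicitly. The list can be passed directly to `subprocess.run`.
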
- CLAUDE.md - Detailed technical documentation for Claude Code
- kernel-images Documentation
- Browser Operator DevTools
- Cloud Run Documentation
- WebRTC Documentation
- Chrome DevTools Protocol

```bash
# Execute browser task
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Navigate to google.com and search for puppies",
    "url": "about:blank",
    "wait_timeout": 5000,
    "model": {
      "main_model": {
        "provider": "openai",
        "model": "gpt-4",
        "api_key": "your-api-key"
      }
    }
  }'

# Get page content
curl -X POST http://localhost:8080/page/content \
  -H "Content-Type: application/json" \
  -d '{"clientId": "test", "tabId": "tab-001", "format": "html"}'

# Capture screenshot
curl -X POST http://localhost:8080/page/screenshot \
  -H "Content-Type: application/json" \
  -d '{"clientId": "test", "tabId": "tab-001", "fullPage": false}'
```

```javascript
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:8082');

ws.on('open', () => {
  // Subscribe to evaluations
  ws.send(JSON.stringify({
    jsonrpc: '2.0',
    method: 'subscribe',
    params: { clientId: 'my-client' },
    id: 1
  }));
});

ws.on('message', (data) => {
  const response = JSON.parse(data);
  console.log('Received:', response);
});
```

Need help? Check CLAUDE.md for detailed technical docs or open an issue.