Web Agent - Browser Automation & Evaluation Platform

Extended kernel-images Chromium environment with Browser Operator DevTools and eval server for browser automation, testing, and AI agent evaluation.

🏗️ Architecture

This platform provides:

Browser Operator DevTools - Custom DevTools frontend with AI chat panel
Eval Server API - HTTP/WebSocket API for browser automation and evaluation
Headful Chrome with GUI access via WebRTC
Chrome DevTools Protocol for automation (Playwright, Puppeteer)
Screen Recording API for session capture
Local Docker Compose for development
Google Cloud Run deployment option

📋 Prerequisites

For Local Development

Docker and Docker Compose installed
Make utility
Git with submodule access
Python 3 (for running evals)

For Cloud Run Deployment

Google Cloud Account with billing enabled
gcloud CLI installed and authenticated
All of the above

🚀 Local Development - Two Deployment Options

Option 1: Docker Compose (Recommended for Development)

Best for: Background services, docker-compose workflows, persistent containers

# 1. Initialize submodules make init # 2. Build Docker images (takes ~30 minutes first time) make build # 3. Start all services in background make compose-up # 4. Verify everything works make test

Option 2: Direct Docker Run (Interactive Mode)

Best for: Interactive debugging, seeing live logs, quick testing

# 1. Initialize submodules make init # 2. Build Docker images (takes ~30 minutes first time) make build # 3. Start in interactive mode (logs to terminal) make run # In another terminal, verify make test

Access Points

After starting with either make compose-up or make run, access:

Service	URL	Purpose
WebRTC Client	http://localhost:8000	Live browser view with control
DevTools UI	http://localhost:8001	Enhanced DevTools with AI chat
Eval Server API	http://localhost:8080	HTTP REST API for automation
WebRTC Neko	http://localhost:8081	WebRTC control interface
Eval Server WS	ws://localhost:8082	WebSocket JSON-RPC API
CDP Endpoint	http://localhost:9222/json	Chrome DevTools Protocol
Recording API	http://localhost:444/api	Screen recording controls

Available Make Commands

make help # Show all available commands make init # Initialize git submodules make build # Build images (smart caching) make rebuild # Force complete rebuild make build-devtools # Build DevTools base (~30 min) make rebuild-devtools # Fast rebuild with local changes make compose-up # Start in background make run # Start in interactive mode make stop # Stop all containers make restart # Restart containers make logs # View container logs make test # Run API verification test make clean # Clean up everything

Comparison: `make run` vs `make compose-up`

Feature	`make run`	`make compose-up`
Log visibility	Live logs in terminal	Background, use `make logs`
Stopping	Ctrl+C or `docker stop`	`make stop` or `docker-compose down`
Restarting	Stop and run again	`docker-compose restart`
Use case	Interactive debugging	Background development
Startup script	`run-local.sh`	`docker-compose.yml`
Lock cleanup	Script cleans before start	Container cleans on start
Volume mounts	Defined in script	Defined in compose file

Development Workflow

With Docker Compose (make compose-up):

Editing Eval Server Code:

# 1. Make changes in eval-server/nodejs/ vim eval-server/nodejs/src/api-server.js # 2. Restart container (no rebuild needed, volume-mounted) docker-compose restart # 3. Test changes make test

Editing DevTools:

# 1. Make changes in browser-operator-core/front_end/ vim browser-operator-core/front_end/panels/ai_chat/... # 2. Rebuild DevTools only make rebuild-devtools # 3. Restart containers docker-compose down && docker-compose up -d

Full Rebuild:

make rebuild # Rebuild everything from scratch make compose-up # Start containers

With Direct Docker Run (make run):

Editing Eval Server Code:

# 1. Make changes in eval-server/nodejs/ vim eval-server/nodejs/src/api-server.js # 2. Since eval-server is NOT volume-mounted in run mode, rebuild make rebuild # 3. Stop and restart # Press Ctrl+C in the terminal running 'make run' make run

Editing DevTools:

# 1. Make changes in browser-operator-core/front_end/ vim browser-operator-core/front_end/panels/ai_chat/... # 2. Rebuild DevTools only make rebuild-devtools # 3. Stop and restart # Press Ctrl+C in the terminal running 'make run' make run

Full Rebuild:

make rebuild # Rebuild everything from scratch # Press Ctrl+C in the terminal running 'make run' make run # Start in interactive mode

Customizing Browser Data Location

With make run:

# Default: ./chromium-data make run # Custom location CHROMIUM_DATA_HOST=/path/to/data make run # Ephemeral (no persistence) CHROMIUM_DATA_HOST="" make run

With make compose-up:

# Edit docker-compose.yml to change CHROMIUM_DATA_HOST # Or set environment variable: CHROMIUM_DATA_HOST=/path/to/data make compose-up

Opening URLs on Startup

With make run:

# Open specific URLs when browser starts URLS="https://google.com https://github.com" make run

With make compose-up:

# Add URLS to docker-compose.yml environment section

Running Evaluations

# Simple test make test # Specific evaluation cd evals python3 run.py --path data/web-task-agent/flight-001.yaml --verbose # All evaluations in a directory python3 run.py --path data/web-task-agent/ --verbose

Troubleshooting

Container won't start (docker-compose):

# Check logs docker logs kernel-browser-extended # Clean restart make stop make clean make build make compose-up

Container won't start (make run):

# Stop existing container docker stop kernel-browser-extended docker rm kernel-browser-extended # Clean rebuild make clean make rebuild make run

Port conflicts:

# Remove existing container docker rm -f kernel-browser-extended # Then start with your preferred method make compose-up # OR make run

Lock file errors (should be automatic now): The system now automatically cleans lock files on startup. If you still see errors:

With docker-compose:

docker-compose down rm -f ./chromium-data/user-data/Singleton* make compose-up

With make run:

# Press Ctrl+C to stop rm -f ./chromium-data/user-data/Singleton* make run

Seeing stale code after changes (make run):

# Eval server code is NOT volume-mounted in run mode # You must rebuild after code changes make rebuild # Press Ctrl+C in terminal running 'make run' make run

Want to see live logs (docker-compose):

# Option 1: Follow logs make logs # Option 2: Switch to interactive mode make stop make run

🚀 Google Cloud Run Deployment

Configure Google Cloud

# Set your project ID export PROJECT_ID="your-gcp-project-id" gcloud config set project $PROJECT_ID # Authenticate (if not already done) gcloud auth login gcloud auth application-default login

Deploy to Cloud Run

# Automated deployment (recommended) ./deployment/cloudrun/deploy.sh # Or with custom settings ./deployment/cloudrun/deploy.sh --project your-project-id --region us-central1

Access Cloud Run Service

After deployment, you'll get URLs like:

🌐 Service Endpoints: Main Interface: https://kernel-browser-xxx-uc.a.run.app WebRTC Client: https://kernel-browser-xxx-uc.a.run.app/ Chrome DevTools: https://kernel-browser-xxx-uc.a.run.app/ws Recording API: https://kernel-browser-xxx-uc.a.run.app/api Health Check: https://kernel-browser-xxx-uc.a.run.app/health

📖 Detailed Usage

WebRTC Live View

Access the main URL in your browser to get real-time Chrome access:

Full mouse/keyboard control
Copy/paste support
Window resizing
Audio streaming (experimental)

Chrome DevTools Protocol

Connect automation tools to the /ws endpoint:

// Playwright const browser = await chromium.connectOverCDP('wss://your-service-url/ws'); // Puppeteer  const browser = await puppeteer.connect({ browserWSEndpoint: 'wss://your-service-url/ws', });

Recording API

Capture screen recordings via REST API:

# Start recording curl -X POST https://your-service-url/api/recording/start -d '{}' # Stop recording  curl -X POST https://your-service-url/api/recording/stop -d '{}' # Download recording curl https://your-service-url/api/recording/download --output recording.mp4

⚙️ Configuration

Environment Variables

Key configuration options in service.yaml:

env: - name: ENABLE_WEBRTC value: "true" # Enable WebRTC streaming - name: WIDTH  value: "1024" # Browser width - name: HEIGHT value: "768" # Browser height - name: CHROMIUM_FLAGS value: "--no-sandbox..." # Chrome launch flags - name: NEKO_ICESERVERS value: '[{"urls": [...]}]' # TURN/STUN servers

Resource Limits

Default Cloud Run settings:

CPU: 4 cores
Memory: 8GB
Timeout: 1 hour
Concurrency: 1 (one browser per container)

Scaling

Min instances: 0 (scales to zero when unused)
Max instances: 10 (adjustable)
Cold start: ~30-60 seconds

🔧 Advanced Configuration

Custom Chrome Flags

Edit service.yaml to modify Chrome behavior:

- name: CHROMIUM_FLAGS value: "--user-data-dir=/home/kernel/user-data --disable-dev-shm-usage --custom-flag"

TURN Server for WebRTC

For production WebRTC, configure a TURN server:

- name: NEKO_ICESERVERS value: '[{"urls": ["turn:turn.example.com:3478"], "username": "user", "credential": "pass"}]'

WebArena Configuration (Optional)

The platform supports running WebArena benchmark evaluations against self-hosted test websites. This is completely optional and only needed if you're running WebArena tasks.

What is WebArena?

WebArena is a research benchmark with 812 tasks across 7 self-hosted websites (e-commerce, forums, GitLab, Wikipedia, etc.). To run these evaluations, you need to route specific domains to a custom IP address.

Quick Setup

1. Configure environment variables in evals/.env:

cd evals cp .env.example .env vim .env

Add:

# WebArena Infrastructure Configuration WEBARENA_HOST_IP=172.16.55.59 # IP where WebArena sites are hosted WEBARENA_NETWORK=172.16.55.0/24 # Network CIDR for routing # WebArena Site URLs (optional - customize if needed) SHOPPING=http://onestopmarket.com SHOPPING_ADMIN=http://onestopmarket.com/admin REDDIT=http://reddit.com GITLAB=http://gitlab.com WIKIPEDIA=http://wikipedia.org

2. Start container (configuration is auto-loaded):

make compose-up # OR make run

3. Verify WebArena routing is enabled:

docker logs kernel-browser-extended | grep -i webarena

You should see:

🌐 [init] Configuring WebArena DNS mapping to 172.16.55.59... 🌐 [init] Adding route to 172.16.55.0/24 via 172.17.0.1...

4. Run WebArena evaluations:

cd evals python3 run_webarena.py --task-id 1 --verbose

How It Works

When WEBARENA_HOST_IP is set:

DNS Mapping: Chromium routes WebArena domains (gitlab.com, reddit.com, etc.) to your specified IP
Network Routing: Container adds route to reach the WebArena network
Automatic: Configuration happens on container startup via scripts/init-container.sh

Without configuration (default):

System works normally with standard DNS resolution
WebArena routing is completely disabled
No impact on regular browser automation

Deployment-Specific IPs

You can use different IP addresses for different environments:

# Local development WEBARENA_HOST_IP=172.16.55.59 WEBARENA_NETWORK=172.16.55.0/24 # Cloud deployment WEBARENA_HOST_IP=34.123.45.67 WEBARENA_NETWORK=34.123.45.0/24 # Disable WebArena (default) WEBARENA_HOST_IP= WEBARENA_NETWORK=

See CLAUDE.md for detailed WebArena configuration documentation.

📁 Project Structure

web-agent/ ├── browser-operator-core/ # Submodule: DevTools frontend source ├── kernel-images/ # Submodule: Base browser environment ├── deployment/ # Deployment configurations │ ├── cloudrun/ # Google Cloud Run deployment │ │ ├── deploy.sh # Cloud deployment script │ │ ├── cloudbuild.yaml # CI/CD pipeline config │ │ ├── service.yaml # Cloud Run service definition │ │ ├── service-secrets.yaml # Service with Secret Manager │ │ ├── cloudrun-wrapper.sh # Cloud Run entrypoint │ │ ├── cloudrun-kernel-wrapper.sh # Alternative wrapper │ │ ├── supervisord-cloudrun.conf # Supervisor for Cloud Run │ │ └── nginx.conf # Reverse proxy config │ └── local/ # Local deployment │ └── run-local.sh # Interactive Docker run script ├── nginx/ # Nginx configurations │ └── nginx-devtools.conf # DevTools nginx config ├── scripts/ # Utility scripts │ ├── init-container.sh # Auto-cleanup of lock files │ └── test-eval-server.sh # Eval server build test ├── supervisor/services/ # Service configs (overrides) ├── eval-server/ │ └── nodejs/ # Eval server (use this, NOT submodule) │ ├── src/ # API server, evaluation server, lib │ ├── start.js # Server entrypoint │ └── package.json ├── evals/ │ ├── run.py # Python evaluation runner │ ├── lib/judge.py # Judge implementations │ └── data/ # Evaluation YAML files ├── Dockerfile.local # Main Docker build (local dev) ├── Dockerfile.devtools # DevTools frontend build ├── Dockerfile.cloudrun # Cloud Run build ├── docker-compose.yml # Local deployment config ├── Makefile # Build commands ├── CLAUDE.md # Technical documentation └── README.md # This file

🐛 Troubleshooting

Local Development Issues

See the detailed troubleshooting section under Local Docker Compose Deployment above.

Common quick fixes:

# Clean restart make stop && make clean && make build && make compose-up # Check logs docker logs kernel-browser-extended # Verify services docker exec kernel-browser-extended supervisorctl status

Cloud Run Issues

Build Timeout

# Use local build for testing ./deploy.sh --local

Port Binding Errors
- Cloud Run requires port 8080
- nginx proxies internal services
- Check nginx.conf for port mappings
Chrome Crashes
- Ensure --no-sandbox flag is set
- Check memory limits (8GB minimum)
- Verify non-root user execution

Cloud Run Debug Commands

# View service logs gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=kernel-browser" --project=$PROJECT_ID --limit=50 # Check service status gcloud run services describe kernel-browser --region=us-central1 # Test endpoints curl https://your-service-url/health curl https://your-service-url/json/version

🔒 Security Considerations

Service runs as non-root user
Chrome uses --no-sandbox (required for containers)
WebRTC streams are not encrypted by default
Consider VPC/firewall rules for production
Use Cloud IAM for API access control

💰 Cost Estimation

Approximate Cloud Run costs:

CPU: $0.00002400 per vCPU-second
Memory: $0.00000250 per GiB-second
Requests: $0.40 per million requests

Example: 1 hour session ≈ $0.50-1.00

🔄 CI/CD Pipeline

The cloudbuild.yaml provides:

Submodule initialization
Docker image build with caching
Container Registry push
Cloud Run deployment
Traffic routing

Build Commands

# Normal build (with cache) - recommended for development gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml # Force rebuild without cache - use when dependencies change gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml --substitutions=_NO_CACHE=true # Automated deployment with Twilio TURN server setup ./deployment/cloudrun/deploy.sh

Cache Control

The build system uses Docker layer caching by default to reduce build times and costs:

With cache: ~5-10 minutes, lower cost
Without cache: ~~30+ minutes, higher cost (~~$3-5 per build)

Use _NO_CACHE=true only when:

Dependencies have changed significantly
Base images need updating
Debugging build issues

📚 Additional Resources

CLAUDE.md - Detailed technical documentation for Claude Code
kernel-images Documentation
Browser Operator DevTools
Cloud Run Documentation
WebRTC Documentation
Chrome DevTools Protocol

🎯 API Examples

Eval Server HTTP API

# Execute browser task curl -X POST http://localhost:8080/v1/responses \ -H "Content-Type: application/json" \ -d '{  "input": "Navigate to google.com and search for puppies",  "url": "about:blank",  "wait_timeout": 5000,  "model": {  "main_model": {  "provider": "openai",  "model": "gpt-4",  "api_key": "your-api-key"  }  }  }' # Get page content curl -X POST http://localhost:8080/page/content \ -H "Content-Type: application/json" \ -d '{"clientId": "test", "tabId": "tab-001", "format": "html"}' # Capture screenshot curl -X POST http://localhost:8080/page/screenshot \ -H "Content-Type: application/json" \ -d '{"clientId": "test", "tabId": "tab-001", "fullPage": false}'

WebSocket JSON-RPC API

const WebSocket = require('ws'); const ws = new WebSocket('ws://localhost:8082'); ws.on('open', () => { // Subscribe to evaluations ws.send(JSON.stringify({ jsonrpc: '2.0', method: 'subscribe', params: { clientId: 'my-client' }, id: 1 })); }); ws.on('message', (data) => { const response = JSON.parse(data); console.log('Received:', response); });

Need help? Check CLAUDE.md for detailed technical docs or open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
browser-agent-server		browser-agent-server
deployments		deployments
docs		docs
evals		evals
submodules		submodules
.gcloudignore		.gcloudignore
.gitignore		.gitignore
.gitmodules		.gitmodules
CLAUDE.md		CLAUDE.md
Dockerfile.devtools		Dockerfile.devtools
Dockerfile.kernel-cloud		Dockerfile.kernel-cloud
Readme.md		Readme.md

BrowserOperator/web-agent

Folders and files

Latest commit

History

Repository files navigation