BrowserOperator/web-agent

Web Agent - Browser Automation & Evaluation Platform

An extended kernel-images Chromium environment that bundles Browser Operator DevTools and an eval server for browser automation, testing, and AI agent evaluation.

πŸ—οΈ Architecture

This platform provides:

  • Browser Operator DevTools - Custom DevTools frontend with AI chat panel
  • Eval Server API - HTTP/WebSocket API for browser automation and evaluation
  • Headful Chrome with GUI access via WebRTC
  • Chrome DevTools Protocol for automation (Playwright, Puppeteer)
  • Screen Recording API for session capture
  • Local Docker Compose for development
  • Google Cloud Run deployment option

📋 Prerequisites

For Local Development

  1. Docker and Docker Compose installed
  2. Make utility
  3. Git with submodule access
  4. Python 3 (for running evals)

For Cloud Run Deployment

  1. Google Cloud Account with billing enabled
  2. gcloud CLI installed and authenticated
  3. All local development prerequisites listed above

🚀 Local Development - Two Deployment Options

Option 1: Docker Compose (Recommended for Development)

Best for: Background services, docker-compose workflows, persistent containers

    # 1. Initialize submodules
    make init

    # 2. Build Docker images (takes ~30 minutes first time)
    make build

    # 3. Start all services in background
    make compose-up

    # 4. Verify everything works
    make test

Option 2: Direct Docker Run (Interactive Mode)

Best for: Interactive debugging, seeing live logs, quick testing

    # 1. Initialize submodules
    make init

    # 2. Build Docker images (takes ~30 minutes first time)
    make build

    # 3. Start in interactive mode (logs to terminal)
    make run

    # In another terminal, verify
    make test

Access Points

After starting with either make compose-up or make run, access:

| Service | URL | Purpose |
|---|---|---|
| WebRTC Client | http://localhost:8000 | Live browser view with control |
| DevTools UI | http://localhost:8001 | Enhanced DevTools with AI chat |
| Eval Server API | http://localhost:8080 | HTTP REST API for automation |
| WebRTC Neko | http://localhost:8081 | WebRTC control interface |
| Eval Server WS | ws://localhost:8082 | WebSocket JSON-RPC API |
| CDP Endpoint | http://localhost:9222/json | Chrome DevTools Protocol |
| Recording API | http://localhost:444/api | Screen recording controls |
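To confirm that the ports listed above are actually accepting connections, a small check like the following may help. This is a minimal sketch, not part of the repository: the `SERVICES` mapping simply copies the ports from the table, and `port_is_open` and `check_services` are hypothetical helper names. It only tests TCP reachability and does not replace `make test`.

```python
import socket

# Ports copied from the Access Points table (assumes the default local setup).
SERVICES = {
    "WebRTC Client": 8000,
    "DevTools UI": 8001,
    "Eval Server API": 8080,
    "WebRTC Neko": 8081,
    "Eval Server WS": 8082,
    "CDP Endpoint": 9222,
    "Recording API": 444,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost"):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_services()` after `make compose-up` or `make run` returns a dictionary such as `{"WebRTC Client": True, ...}`; any `False` entry points at a service that has not started or a port that is not published.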

Available Make Commands

    make help              # Show all available commands
    make init              # Initialize git submodules
    make build             # Build images (smart caching)
    make rebuild           # Force complete rebuild
    make build-devtools    # Build DevTools base (~30 min)
    make rebuild-devtools  # Fast rebuild with local changes
    make compose-up        # Start in background
    make run               # Start in interactive mode
    make stop              # Stop all containers
    make restart           # Restart containers
    make logs              # View container logs
    make test              # Run API verification test
    make clean             # Clean up everything

Comparison: make run vs make compose-up

| Feature | make run | make compose-up |
|---|---|---|
| Log visibility | Live logs in terminal | Background, use make logs |
| Stopping | Ctrl+C or docker stop | make stop or docker-compose down |
| Restarting | Stop and run again | docker-compose restart |
| Use case | Interactive debugging | Background development |
| Startup script | run-local.sh | docker-compose.yml |
| Lock cleanup | Script cleans before start | Container cleans on start |
| Volume mounts | Defined in script | Defined in compose file |

Development Workflow

With Docker Compose (make compose-up):

Editing Eval Server Code:

    # 1. Make changes in eval-server/nodejs/
    vim eval-server/nodejs/src/api-server.js

    # 2. Restart container (no rebuild needed, volume-mounted)
    docker-compose restart

    # 3. Test changes
    make test

Editing DevTools:

    # 1. Make changes in browser-operator-core/front_end/
    vim browser-operator-core/front_end/panels/ai_chat/...

    # 2. Rebuild DevTools only
    make rebuild-devtools

    # 3. Restart containers
    docker-compose down && docker-compose up -d

Full Rebuild:

    make rebuild     # Rebuild everything from scratch
    make compose-up  # Start containers

With Direct Docker Run (make run):

Editing Eval Server Code:

    # 1. Make changes in eval-server/nodejs/
    vim eval-server/nodejs/src/api-server.js

    # 2. Since eval-server is NOT volume-mounted in run mode, rebuild
    make rebuild

    # 3. Stop and restart
    # Press Ctrl+C in the terminal running 'make run'
    make run

Editing DevTools:

    # 1. Make changes in browser-operator-core/front_end/
    vim browser-operator-core/front_end/panels/ai_chat/...

    # 2. Rebuild DevTools only
    make rebuild-devtools

    # 3. Stop and restart
    # Press Ctrl+C in the terminal running 'make run'
    make run

Full Rebuild:

    make rebuild  # Rebuild everything from scratch
    # Press Ctrl+C in the terminal running 'make run'
    make run      # Start in interactive mode

Customizing Browser Data Location

With make run:

    # Default: ./chromium-data
    make run

    # Custom location
    CHROMIUM_DATA_HOST=/path/to/data make run

    # Ephemeral (no persistence)
    CHROMIUM_DATA_HOST="" make run

With make compose-up:

    # Edit docker-compose.yml to change CHROMIUM_DATA_HOST
    # Or set environment variable:
    CHROMIUM_DATA_HOST=/path/to/data make compose-up

Opening URLs on Startup

With make run:

    # Open specific URLs when browser starts
    URLS="https://google.com https://github.com" make run

With make compose-up:

# Add URLS to docker-compose.yml environment section

Running Evaluations

    # Simple test
    make test

    # Specific evaluation
    cd evals
    python3 run.py --path data/web-task-agent/flight-001.yaml --verbose

    # All evaluations in a directory
    python3 run.py --path data/web-task-agent/ --verbose

Troubleshooting

Container won't start (docker-compose):

    # Check logs
    docker logs kernel-browser-extended

    # Clean restart
    make stop
    make clean
    make build
    make compose-up

Container won't start (make run):

    # Stop existing container
    docker stop kernel-browser-extended
    docker rm kernel-browser-extended

    # Clean rebuild
    make clean
    make rebuild
    make run

Port conflicts:

    # Remove existing container
    docker rm -f kernel-browser-extended

    # Then start with your preferred method
    make compose-up
    # OR
    make run

Lock file errors: Lock files are cleaned automatically on startup. If you still see errors, remove them manually:

With docker-compose:

    docker-compose down
    rm -f ./chromium-data/user-data/Singleton*
    make compose-up

With make run:

    # Press Ctrl+C to stop
    rm -f ./chromium-data/user-data/Singleton*
    make run

Seeing stale code after changes (make run):

    # Eval server code is NOT volume-mounted in run mode
    # You must rebuild after code changes
    make rebuild
    # Press Ctrl+C in terminal running 'make run'
    make run

Want to see live logs (docker-compose):

    # Option 1: Follow logs
    make logs

    # Option 2: Switch to interactive mode
    make stop
    make run

🚀 Google Cloud Run Deployment

Configure Google Cloud

    # Set your project ID
    export PROJECT_ID="your-gcp-project-id"
    gcloud config set project $PROJECT_ID

    # Authenticate (if not already done)
    gcloud auth login
    gcloud auth application-default login

Deploy to Cloud Run

    # Automated deployment (recommended)
    ./deployment/cloudrun/deploy.sh

    # Or with custom settings
    ./deployment/cloudrun/deploy.sh --project your-project-id --region us-central1

Access Cloud Run Service

After deployment, you'll get URLs like:

    🌐 Service Endpoints:
       Main Interface:  https://kernel-browser-xxx-uc.a.run.app
       WebRTC Client:   https://kernel-browser-xxx-uc.a.run.app/
       Chrome DevTools: https://kernel-browser-xxx-uc.a.run.app/ws
       Recording API:   https://kernel-browser-xxx-uc.a.run.app/api
       Health Check:    https://kernel-browser-xxx-uc.a.run.app/health

📖 Detailed Usage

WebRTC Live View

Access the main URL in your browser to get real-time Chrome access:

  • Full mouse/keyboard control
  • Copy/paste support
  • Window resizing
  • Audio streaming (experimental)

Chrome DevTools Protocol

Connect automation tools to the /ws endpoint:

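    // Playwright (run inside an async function or an ES module)
    const { chromium } = require('playwright');
    const browser = await chromium.connectOverCDP('wss://your-service-url/ws');

    // Puppeteer (alternative to the Playwright snippet; use one or the other)
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.connect({
      browserWSEndpoint: 'wss://your-service-url/ws',
    });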
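For local development, the CDP endpoint at http://localhost:9222/json lists the open targets, each with its own WebSocket debugger URL. The sketch below shows one way to discover those URLs from Python using only the standard library. The function names `list_targets` and `page_ws_urls` are hypothetical, and the default address assumes the local port from the Access Points table.

```python
import json
import urllib.request


def list_targets(cdp_http="http://localhost:9222"):
    """Fetch the target list from the CDP HTTP endpoint (GET /json)."""
    with urllib.request.urlopen(f"{cdp_http}/json", timeout=5) as resp:
        return json.load(resp)


def page_ws_urls(targets):
    """Return the WebSocket debugger URLs of all targets of type 'page'."""
    return [
        t["webSocketDebuggerUrl"]
        for t in targets
        if t.get("type") == "page" and "webSocketDebuggerUrl" in t
    ]
```

With the container running, `page_ws_urls(list_targets())` returns the per-tab WebSocket URLs that a CDP client can attach to.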

Recording API

Capture screen recordings via REST API:

    # Start recording
    curl -X POST https://your-service-url/api/recording/start -d '{}'

    # Stop recording
    curl -X POST https://your-service-url/api/recording/stop -d '{}'

    # Download recording
    curl https://your-service-url/api/recording/download --output recording.mp4
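The same three calls can be scripted from Python. This is a minimal sketch using only the standard library; `recording_request` and `record_session` are hypothetical helper names, and the endpoint paths are taken from the curl commands above. Response bodies are not documented here, so the sketch ignores them for start and stop.

```python
import shutil
import time
import urllib.request

ACTIONS = ("start", "stop", "download")


def recording_request(base_url, action):
    """Build the HTTP request for one recording action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{base_url.rstrip('/')}/api/recording/{action}"
    if action == "download":
        return urllib.request.Request(url, method="GET")
    # Like `curl -d '{}'`, send an empty JSON object without an explicit Content-Type.
    return urllib.request.Request(url, data=b"{}", method="POST")


def record_session(base_url, out_path, seconds):
    """Start a recording, wait, stop it, and save the file to out_path."""
    urllib.request.urlopen(recording_request(base_url, "start")).close()
    time.sleep(seconds)
    urllib.request.urlopen(recording_request(base_url, "stop")).close()
    with urllib.request.urlopen(recording_request(base_url, "download")) as resp:
        with open(out_path, "wb") as out:
            shutil.copyfileobj(resp, out)
```

For example, `record_session("https://your-service-url", "recording.mp4", 10)` would capture roughly ten seconds of the session.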

βš™οΈ Configuration

Environment Variables

Key configuration options in service.yaml:

    env:
      - name: ENABLE_WEBRTC
        value: "true"                # Enable WebRTC streaming
      - name: WIDTH
        value: "1024"                # Browser width
      - name: HEIGHT
        value: "768"                 # Browser height
      - name: CHROMIUM_FLAGS
        value: "--no-sandbox..."     # Chrome launch flags
      - name: NEKO_ICESERVERS
        value: '[{"urls": [...]}]'   # TURN/STUN servers

Resource Limits

Default Cloud Run settings:

  • CPU: 4 cores
  • Memory: 8GB
  • Timeout: 1 hour
  • Concurrency: 1 (one browser per container)

Scaling

  • Min instances: 0 (scales to zero when unused)
  • Max instances: 10 (adjustable)
  • Cold start: ~30-60 seconds

🔧 Advanced Configuration

Custom Chrome Flags

Edit service.yaml to modify Chrome behavior:

    - name: CHROMIUM_FLAGS
      value: "--user-data-dir=/home/kernel/user-data --disable-dev-shm-usage --custom-flag"

TURN Server for WebRTC

For production WebRTC, configure a TURN server:

    - name: NEKO_ICESERVERS
      value: '[{"urls": ["turn:turn.example.com:3478"], "username": "user", "credential": "pass"}]'
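Because NEKO_ICESERVERS is a JSON array embedded in a YAML string, quoting mistakes are easy to make when editing it by hand. One option is to generate the value programmatically. The helper below is a hypothetical sketch, not part of the repository; it produces the same shape as the example value above.

```python
import json


def ice_servers_value(urls, username=None, credential=None):
    """Serialize one ICE server entry as the JSON string NEKO_ICESERVERS expects."""
    server = {"urls": list(urls)}
    if username is not None:
        server["username"] = username
    if credential is not None:
        server["credential"] = credential
    return json.dumps([server])


# Reproduces the example value from service.yaml shown above.
print(ice_servers_value(["turn:turn.example.com:3478"], "user", "pass"))
```

Paste the printed string between single quotes in service.yaml so that the inner double quotes survive YAML parsing.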

WebArena Configuration (Optional)

The platform supports running WebArena benchmark evaluations against self-hosted test websites. This is completely optional and only needed if you're running WebArena tasks.

What is WebArena?

WebArena is a research benchmark with 812 tasks across 7 self-hosted websites (e-commerce, forums, GitLab, Wikipedia, etc.). To run these evaluations, you need to route specific domains to a custom IP address.

Quick Setup

1. Configure environment variables in evals/.env:

    cd evals
    cp .env.example .env
    vim .env

Add:

    # WebArena Infrastructure Configuration
    WEBARENA_HOST_IP=172.16.55.59      # IP where WebArena sites are hosted
    WEBARENA_NETWORK=172.16.55.0/24    # Network CIDR for routing

    # WebArena Site URLs (optional - customize if needed)
    SHOPPING=http://onestopmarket.com
    SHOPPING_ADMIN=http://onestopmarket.com/admin
    REDDIT=http://reddit.com
    GITLAB=http://gitlab.com
    WIKIPEDIA=http://wikipedia.org

2. Start container (configuration is auto-loaded):

    make compose-up
    # OR
    make run

3. Verify WebArena routing is enabled:

    docker logs kernel-browser-extended | grep -i webarena

You should see:

    🌐 [init] Configuring WebArena DNS mapping to 172.16.55.59...
    🌐 [init] Adding route to 172.16.55.0/24 via 172.17.0.1...

4. Run WebArena evaluations:

    cd evals
    python3 run_webarena.py --task-id 1 --verbose

How It Works

When WEBARENA_HOST_IP is set:

  • DNS Mapping: Chromium resolves WebArena domains (gitlab.com, reddit.com, etc.) to your specified IP
  • Network Routing: Container adds route to reach the WebArena network
  • Automatic: Configuration happens on container startup via scripts/init-container.sh

Without configuration (default):

  • System works normally with standard DNS resolution
  • WebArena routing is completely disabled
  • No impact on regular browser automation

Deployment-Specific IPs

You can use different IP addresses for different environments:

    # Local development
    WEBARENA_HOST_IP=172.16.55.59
    WEBARENA_NETWORK=172.16.55.0/24

    # Cloud deployment
    WEBARENA_HOST_IP=34.123.45.67
    WEBARENA_NETWORK=34.123.45.0/24

    # Disable WebArena (default)
    WEBARENA_HOST_IP=
    WEBARENA_NETWORK=
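Before starting the container, it can be useful to sanity-check a WEBARENA_HOST_IP / WEBARENA_NETWORK pair. The sketch below assumes, based on the examples above rather than on any documented rule, that the host IP should fall inside the given network and that both values are either set together or left empty together. `webarena_config_ok` is a hypothetical helper name.

```python
import ipaddress


def webarena_config_ok(host_ip, network):
    """Check a WEBARENA_HOST_IP / WEBARENA_NETWORK pair for consistency."""
    if not host_ip and not network:
        return True  # Both empty: WebArena routing is disabled (the default).
    if not host_ip or not network:
        return False  # Assumption: one value without the other is a mistake.
    try:
        # ip_network rejects CIDRs with host bits set, e.g. 172.16.55.59/24.
        return ipaddress.ip_address(host_ip) in ipaddress.ip_network(network)
    except ValueError:
        return False  # Malformed address or CIDR.
```

Both example pairs above pass this check, while a pair such as `172.16.56.1` with `172.16.55.0/24` does not.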

See CLAUDE.md for detailed WebArena configuration documentation.

πŸ“ Project Structure

    web-agent/
    ├── browser-operator-core/              # Submodule: DevTools frontend source
    ├── kernel-images/                      # Submodule: Base browser environment
    ├── deployment/                         # Deployment configurations
    │   ├── cloudrun/                       # Google Cloud Run deployment
    │   │   ├── deploy.sh                   # Cloud deployment script
    │   │   ├── cloudbuild.yaml             # CI/CD pipeline config
    │   │   ├── service.yaml                # Cloud Run service definition
    │   │   ├── service-secrets.yaml        # Service with Secret Manager
    │   │   ├── cloudrun-wrapper.sh         # Cloud Run entrypoint
    │   │   ├── cloudrun-kernel-wrapper.sh  # Alternative wrapper
    │   │   ├── supervisord-cloudrun.conf   # Supervisor for Cloud Run
    │   │   └── nginx.conf                  # Reverse proxy config
    │   └── local/                          # Local deployment
    │       └── run-local.sh                # Interactive Docker run script
    ├── nginx/                              # Nginx configurations
    │   └── nginx-devtools.conf             # DevTools nginx config
    ├── scripts/                            # Utility scripts
    │   ├── init-container.sh               # Auto-cleanup of lock files
    │   └── test-eval-server.sh             # Eval server build test
    ├── supervisor/services/                # Service configs (overrides)
    ├── eval-server/
    │   └── nodejs/                         # Eval server (use this, NOT submodule)
    │       ├── src/                        # API server, evaluation server, lib
    │       ├── start.js                    # Server entrypoint
    │       └── package.json
    ├── evals/
    │   ├── run.py                          # Python evaluation runner
    │   ├── lib/judge.py                    # Judge implementations
    │   └── data/                           # Evaluation YAML files
    ├── Dockerfile.local                    # Main Docker build (local dev)
    ├── Dockerfile.devtools                 # DevTools frontend build
    ├── Dockerfile.cloudrun                 # Cloud Run build
    ├── docker-compose.yml                  # Local deployment config
    ├── Makefile                            # Build commands
    ├── CLAUDE.md                           # Technical documentation
    └── README.md                           # This file

πŸ› Troubleshooting

Local Development Issues

See the detailed Troubleshooting section under Local Development above.

Common quick fixes:

    # Clean restart
    make stop && make clean && make build && make compose-up

    # Check logs
    docker logs kernel-browser-extended

    # Verify services
    docker exec kernel-browser-extended supervisorctl status

Cloud Run Issues

  1. Build Timeout

    # Use local build for testing
    ./deploy.sh --local
  2. Port Binding Errors

    • Cloud Run requires port 8080
    • nginx proxies internal services
    • Check nginx.conf for port mappings
  3. Chrome Crashes

    • Ensure --no-sandbox flag is set
    • Check memory limits (8GB minimum)
    • Verify non-root user execution

Cloud Run Debug Commands

    # View service logs
    gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=kernel-browser" --project=$PROJECT_ID --limit=50

    # Check service status
    gcloud run services describe kernel-browser --region=us-central1

    # Test endpoints
    curl https://your-service-url/health
    curl https://your-service-url/json/version

🔒 Security Considerations

  • Service runs as non-root user
  • Chrome uses --no-sandbox (required for containers)
  • WebRTC media is encrypted in transit (DTLS-SRTP), but the local HTTP and WebSocket endpoints are served without TLS by default
  • Consider VPC/firewall rules for production
  • Use Cloud IAM for API access control

💰 Cost Estimation

Approximate Cloud Run costs:

  • CPU: $0.00002400 per vCPU-second
  • Memory: $0.00000250 per GiB-second
  • Requests: $0.40 per million requests

Example: 1 hour session ≈ $0.50-1.00
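The estimate can be checked against the listed rates. The sketch below multiplies the CPU and memory rates by the default resource limits (4 vCPUs, 8 GiB) for a given session length. It covers compute charges only; per-request fees, network egress, and any free tier are not modeled, and `session_cost` is a hypothetical helper name.

```python
CPU_RATE = 0.00002400  # USD per vCPU-second (from the list above)
MEM_RATE = 0.00000250  # USD per GiB-second (from the list above)


def session_cost(seconds, vcpus=4, memory_gib=8):
    """Compute-only cost in USD for one container running for `seconds`."""
    return seconds * (vcpus * CPU_RATE + memory_gib * MEM_RATE)


# One hour at the default limits: 3600 * (4 * 0.000024 + 8 * 0.0000025)
print(f"${session_cost(3600):.2f}")  # prints "$0.42"
```

The compute-only figure of about $0.42 per hour sits slightly below the quoted range, which presumably leaves headroom for the charges not modeled here.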

🔄 CI/CD Pipeline

The cloudbuild.yaml provides:

  1. Submodule initialization
  2. Docker image build with caching
  3. Container Registry push
  4. Cloud Run deployment
  5. Traffic routing

Build Commands

    # Normal build (with cache) - recommended for development
    gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml

    # Force rebuild without cache - use when dependencies change
    gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml --substitutions=_NO_CACHE=true

    # Automated deployment with Twilio TURN server setup
    ./deployment/cloudrun/deploy.sh

Cache Control

The build system uses Docker layer caching by default to reduce build times and costs:

  • With cache: ~5-10 minutes, lower cost
  • Without cache: 30+ minutes, higher cost ($3-5 per build)

Use _NO_CACHE=true only when:

  • Dependencies have changed significantly
  • Base images need updating
  • Debugging build issues

📚 Additional Resources

🎯 API Examples

Eval Server HTTP API

    # Execute browser task
    curl -X POST http://localhost:8080/v1/responses \
      -H "Content-Type: application/json" \
      -d '{
        "input": "Navigate to google.com and search for puppies",
        "url": "about:blank",
        "wait_timeout": 5000,
        "model": {
          "main_model": {
            "provider": "openai",
            "model": "gpt-4",
            "api_key": "your-api-key"
          }
        }
      }'

    # Get page content
    curl -X POST http://localhost:8080/page/content \
      -H "Content-Type: application/json" \
      -d '{"clientId": "test", "tabId": "tab-001", "format": "html"}'

    # Capture screenshot
    curl -X POST http://localhost:8080/page/screenshot \
      -H "Content-Type: application/json" \
      -d '{"clientId": "test", "tabId": "tab-001", "fullPage": false}'
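The first request can also be issued from Python, which fits alongside the evaluation runner in evals/. This is a minimal sketch using only the standard library. The payload fields mirror the curl example above; `build_task_request` and `run_task` are hypothetical helper names, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request


def build_task_request(task, api_key, base_url="http://localhost:8080",
                       start_url="about:blank", wait_timeout=5000,
                       provider="openai", model="gpt-4"):
    """Build the POST /v1/responses request for a browser task."""
    payload = {
        "input": task,
        "url": start_url,
        "wait_timeout": wait_timeout,
        "model": {
            "main_model": {
                "provider": provider,
                "model": model,
                "api_key": api_key,
            }
        },
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task, api_key, timeout=300, **kwargs):
    """Send the task and decode the reply (assumed to be JSON)."""
    request = build_task_request(task, api_key, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_task("Navigate to google.com and search for puppies", "your-api-key")` sends the same body as the curl command. The generous default timeout is a guess, since browser tasks may take a while to complete.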

WebSocket JSON-RPC API

    const WebSocket = require('ws');

    const ws = new WebSocket('ws://localhost:8082');

    ws.on('open', () => {
      // Subscribe to evaluations
      ws.send(JSON.stringify({
        jsonrpc: '2.0',
        method: 'subscribe',
        params: { clientId: 'my-client' },
        id: 1
      }));
    });

    ws.on('message', (data) => {
      const response = JSON.parse(data);
      console.log('Received:', response);
    });

Need help? Check CLAUDE.md for detailed technical docs or open an issue.