BrowserOperator/web-agent

Web Agent - Browser Automation & Evaluation Platform

Extended kernel-images Chromium environment with Browser Operator DevTools and eval server for browser automation, testing, and AI agent evaluation.

🏗️ Architecture

This platform provides:

  • Browser Operator DevTools - Custom DevTools frontend with AI chat panel
  • Eval Server API - HTTP/WebSocket API for browser automation and evaluation
  • Headful Chrome with GUI access via WebRTC
  • Chrome DevTools Protocol for automation (Playwright, Puppeteer)
  • Screen Recording API for session capture
  • Local Docker Compose for development
  • Google Cloud Run deployment option

📋 Prerequisites

For Local Development

  1. Docker and Docker Compose installed
  2. Make utility
  3. Git with submodule access
  4. Python 3 (for running evals)

For Cloud Run Deployment

  1. Google Cloud Account with billing enabled
  2. gcloud CLI installed and authenticated
  3. Everything listed under For Local Development

🚀 Local Development - Two Deployment Options

Option 1: Docker Compose (Recommended for Development)

Best for: Background services, docker-compose workflows, persistent containers

```bash
# 1. Initialize submodules
make init

# 2. Build Docker images (takes ~30 minutes the first time)
make build

# 3. Start all services in the background
make compose-up

# 4. Verify everything works
make test
```

Option 2: Direct Docker Run (Interactive Mode)

Best for: Interactive debugging, seeing live logs, quick testing

```bash
# 1. Initialize submodules
make init

# 2. Build Docker images (takes ~30 minutes the first time)
make build

# 3. Start in interactive mode (logs to terminal)
make run

# In another terminal, verify
make test
```

Access Points

After starting with either make compose-up or make run, access:

| Service | URL | Purpose |
|---|---|---|
| WebRTC Client | http://localhost:8000 | Live browser view with control |
| DevTools UI | http://localhost:8001 | Enhanced DevTools with AI chat |
| Eval Server API | http://localhost:8080 | HTTP REST API for automation |
| WebRTC Neko | http://localhost:8081 | WebRTC control interface |
| Eval Server WS | ws://localhost:8082 | WebSocket JSON-RPC API |
| CDP Endpoint | http://localhost:9222/json | Chrome DevTools Protocol |
| Recording API | http://localhost:444/api | Screen recording controls |
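Before running `make test`, it can help to check which of these ports actually accept connections. The following is a minimal stdlib-only sketch and is not a script that ships with the repository. The port numbers are copied from the table above. A successful TCP connect only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Host-side ports from the Access Points table.
SERVICES: Dict[str, int] = {
    "WebRTC Client": 8000,
    "DevTools UI": 8001,
    "Eval Server API": 8080,
    "WebRTC Neko": 8081,
    "Eval Server WS": 8082,
    "CDP Endpoint": 9222,
    "Recording API": 444,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(host: str = "localhost",
                   services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a TCP connection."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{'up  ' if ok else 'DOWN'}  {name}")
```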

Available Make Commands

```bash
make help              # Show all available commands
make init              # Initialize git submodules
make build             # Build images (smart caching)
make rebuild           # Force complete rebuild
make build-devtools    # Build DevTools base (~30 min)
make rebuild-devtools  # Fast rebuild with local changes
make compose-up        # Start in background
make run               # Start in interactive mode
make stop              # Stop all containers
make restart           # Restart containers
make logs              # View container logs
make test              # Run API verification test
make clean             # Clean up everything
```

Comparison: make run vs make compose-up

| Feature | make run | make compose-up |
|---|---|---|
| Log visibility | Live logs in terminal | Background; use make logs |
| Stopping | Ctrl+C or docker stop | make stop or docker-compose down |
| Restarting | Stop and run again | docker-compose restart |
| Use case | Interactive debugging | Background development |
| Startup script | run-local.sh | docker-compose.yml |
| Lock cleanup | Script cleans before start | Container cleans on start |
| Volume mounts | Defined in script | Defined in compose file |

Development Workflow

With Docker Compose (make compose-up):

Editing Eval Server Code:

```bash
# 1. Make changes in eval-server/nodejs/
vim eval-server/nodejs/src/api-server.js

# 2. Restart the container (no rebuild needed; the code is volume-mounted)
docker-compose restart

# 3. Test changes
make test
```

Editing DevTools:

```bash
# 1. Make changes in browser-operator-core/front_end/
vim browser-operator-core/front_end/panels/ai_chat/...

# 2. Rebuild DevTools only
make rebuild-devtools

# 3. Restart containers
docker-compose down && docker-compose up -d
```

Full Rebuild:

```bash
make rebuild      # Rebuild everything from scratch
make compose-up   # Start containers
```

With Direct Docker Run (make run):

Editing Eval Server Code:

```bash
# 1. Make changes in eval-server/nodejs/
vim eval-server/nodejs/src/api-server.js

# 2. Since eval-server is NOT volume-mounted in run mode, rebuild
make rebuild

# 3. Stop and restart: press Ctrl+C in the terminal running 'make run', then
make run
```

Editing DevTools:

```bash
# 1. Make changes in browser-operator-core/front_end/
vim browser-operator-core/front_end/panels/ai_chat/...

# 2. Rebuild DevTools only
make rebuild-devtools

# 3. Stop and restart: press Ctrl+C in the terminal running 'make run', then
make run
```

Full Rebuild:

```bash
make rebuild   # Rebuild everything from scratch
# Press Ctrl+C in the terminal running 'make run', then:
make run       # Start in interactive mode
```

Customizing Browser Data Location

With make run:

```bash
# Default: ./chromium-data
make run

# Custom location
CHROMIUM_DATA_HOST=/path/to/data make run

# Ephemeral (no persistence)
CHROMIUM_DATA_HOST="" make run
```

With make compose-up:

```bash
# Edit docker-compose.yml to change CHROMIUM_DATA_HOST,
# or set the environment variable:
CHROMIUM_DATA_HOST=/path/to/data make compose-up
```
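The three cases above (unset, custom path, empty string) behave differently, and the unset-versus-empty distinction is easy to lose in wrapper scripts. Below is a minimal Python sketch of the documented rules. `resolve_chromium_data_host` is a hypothetical helper that mirrors the behavior described in this section. It is not code taken from run-local.sh or docker-compose.yml.

```python
from typing import Mapping, Optional

DEFAULT_CHROMIUM_DATA_HOST = "./chromium-data"  # documented default


def resolve_chromium_data_host(env: Mapping[str, str]) -> Optional[str]:
    """Apply the documented CHROMIUM_DATA_HOST rules.

    - variable unset          -> the default ./chromium-data directory
    - variable set to ""      -> None, meaning ephemeral (no persistence)
    - variable set to a path  -> that path
    """
    if "CHROMIUM_DATA_HOST" not in env:
        return DEFAULT_CHROMIUM_DATA_HOST
    return env["CHROMIUM_DATA_HOST"] or None
```

Call it as `resolve_chromium_data_host(os.environ)`. In a POSIX shell script, the same distinction is `${CHROMIUM_DATA_HOST-./chromium-data}`, which applies the default only when the variable is unset. The form `${CHROMIUM_DATA_HOST:-./chromium-data}` would also replace an empty value and so would silently disable ephemeral mode.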

Opening URLs on Startup

With make run:

```bash
# Open specific URLs when the browser starts
URLS="https://google.com https://github.com" make run
```

With make compose-up:

Add URLS to the environment section of docker-compose.yml.

Running Evaluations

```bash
# Simple test
make test

# Specific evaluation
cd evals
python3 run.py --path data/web-task-agent/flight-001.yaml --verbose

# All evaluations in a directory
python3 run.py --path data/web-task-agent/ --verbose
```
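When a directory run fails partway through, it can help to run each YAML file on its own so one failure does not hide the others. The sketch below relies only on the documented `run.py --path ... --verbose` interface. `collect_eval_files`, `eval_command` and `run_each` are hypothetical helpers. The assumption that a directory means every `*.yaml`/`*.yml` file beneath it (recursively) is ours and is not taken from run.py.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def collect_eval_files(path: str) -> List[Path]:
    """Return [path] for a single file, or every *.yaml/*.yml file under a directory (sorted)."""
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        return sorted(f for f in p.rglob("*")
                      if f.is_file() and f.suffix in (".yaml", ".yml"))
    raise FileNotFoundError(path)


def eval_command(yaml_path: Path, verbose: bool = True) -> List[str]:
    """Build the documented run.py command line for one evaluation file."""
    cmd = ["python3", "run.py", "--path", str(yaml_path)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_each(directory: str, evals_dir: str = "evals") -> Dict[str, int]:
    """Run every evaluation under evals_dir/directory separately; map file -> exit code."""
    results: Dict[str, int] = {}
    for f in collect_eval_files(str(Path(evals_dir) / directory)):
        rel = f.relative_to(evals_dir)  # run.py is invoked from inside evals/
        results[str(rel)] = subprocess.run(eval_command(rel), cwd=evals_dir).returncode
    return results
```

Example: `run_each("data/web-task-agent")` from the repository root.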

Troubleshooting

Container won't start (docker-compose):

```bash
# Check logs
docker logs kernel-browser-extended

# Clean restart
make stop
make clean
make build
make compose-up
```

Container won't start (make run):

```bash
# Stop and remove the existing container
docker stop kernel-browser-extended
docker rm kernel-browser-extended

# Clean rebuild
make clean
make rebuild
make run
```

Port conflicts:

```bash
# Remove the existing container
docker rm -f kernel-browser-extended

# Then start with your preferred method
make compose-up   # OR: make run
```

Lock file errors: Stale Chromium lock files are removed automatically on startup. If you still see lock errors:

With docker-compose:

```bash
docker-compose down
rm -f ./chromium-data/user-data/Singleton*
make compose-up
```

With make run:

```bash
# Press Ctrl+C to stop, then:
rm -f ./chromium-data/user-data/Singleton*
make run
```
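Both variants above delete Chromium's profile lock entries (`SingletonLock`, `SingletonSocket`, `SingletonCookie`). These can be left behind when a container stops uncleanly. Below is a stdlib-only Python sketch of the equivalent of `rm -f ./chromium-data/user-data/Singleton*`. It assumes the default data location and should only be run while the browser is stopped. `remove_singleton_locks` is a hypothetical helper, not part of the repository.

```python
import glob
import os
from typing import List


def remove_singleton_locks(user_data_dir: str = "./chromium-data/user-data") -> List[str]:
    """Delete Singleton* entries in a Chromium profile directory; return the removed paths."""
    removed: List[str] = []
    for path in sorted(glob.glob(os.path.join(user_data_dir, "Singleton*"))):
        # These entries are often symlinks whose targets no longer exist.
        # lexists() is True for dangling links, and unlink() removes the link itself.
        if os.path.lexists(path):
            os.unlink(path)
            removed.append(path)
    return removed
```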

Seeing stale code after changes (make run):

```bash
# Eval server code is NOT volume-mounted in run mode,
# so you must rebuild after code changes
make rebuild
# Press Ctrl+C in the terminal running 'make run', then:
make run
```

Want to see live logs (docker-compose):

```bash
# Option 1: Follow logs
make logs

# Option 2: Switch to interactive mode
make stop
make run
```

🚀 Google Cloud Run Deployment

Configure Google Cloud

```bash
# Set your project ID
export PROJECT_ID="your-gcp-project-id"
gcloud config set project $PROJECT_ID

# Authenticate (if not already done)
gcloud auth login
gcloud auth application-default login
```

Deploy to Cloud Run

```bash
# Automated deployment (recommended)
./deployment/cloudrun/deploy.sh

# Or with custom settings
./deployment/cloudrun/deploy.sh --project your-project-id --region us-central1
```

Access Cloud Run Service

After deployment, you'll get URLs like:

```text
🌐 Service Endpoints:
  Main Interface:   https://kernel-browser-xxx-uc.a.run.app
  WebRTC Client:    https://kernel-browser-xxx-uc.a.run.app/
  Chrome DevTools:  https://kernel-browser-xxx-uc.a.run.app/ws
  Recording API:    https://kernel-browser-xxx-uc.a.run.app/api
  Health Check:     https://kernel-browser-xxx-uc.a.run.app/health
```

📖 Detailed Usage

WebRTC Live View

Access the main URL in your browser to get real-time Chrome access:

  • Full mouse/keyboard control
  • Copy/paste support
  • Window resizing
  • Audio streaming (experimental)

Chrome DevTools Protocol

Connect automation tools to the /ws endpoint:

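```javascript
// Run as an ES module (e.g. connect.mjs) so top-level await is allowed.
// Use whichever library you need; both are shown for comparison.
import { chromium } from 'playwright';
import puppeteer from 'puppeteer';

// Playwright
const pwBrowser = await chromium.connectOverCDP('wss://your-service-url/ws');

// Puppeteer
const ppBrowser = await puppeteer.connect({
  browserWSEndpoint: 'wss://your-service-url/ws',
});
```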
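Locally, the CDP endpoint is published at http://localhost:9222 (see Access Points). Chrome's standard `/json/version` discovery endpoint returns the browser-level WebSocket URL in its `webSocketDebuggerUrl` field, and that is the kind of URL `browserWSEndpoint` expects. Below is a minimal stdlib sketch for looking it up, assuming the local port mapping above. The helper names are illustrative.

```python
import json
import urllib.request


def ws_url_from_version(payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def discover_ws_url(base: str = "http://localhost:9222", timeout: float = 5.0) -> str:
    """GET <base>/json/version from the CDP endpoint and return its WebSocket URL."""
    with urllib.request.urlopen(f"{base}/json/version", timeout=timeout) as resp:
        return ws_url_from_version(resp.read().decode("utf-8"))
```

Behind a proxy or port mapping, the returned URL may name a host or port that is only valid inside the container. In that case, substitute the host-side address before connecting.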

Recording API

Capture screen recordings via REST API:

```bash
# Start recording
curl -X POST https://your-service-url/api/recording/start \
  -H "Content-Type: application/json" -d '{}'

# Stop recording
curl -X POST https://your-service-url/api/recording/stop \
  -H "Content-Type: application/json" -d '{}'

# Download recording
curl https://your-service-url/api/recording/download --output recording.mp4
```
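The same three calls can be scripted with Python's standard library. In this sketch the paths come from the curl examples, and the local base URL `http://localhost:444` comes from the Access Points table. The helper names are hypothetical, and the response bodies of start/stop are not documented here, so only the status code is returned.

```python
import shutil
import urllib.request

LOCAL_BASE_URL = "http://localhost:444"  # Recording API port from Access Points


def recording_request(base_url: str, action: str) -> urllib.request.Request:
    """Build the HTTP request for one Recording API action.

    'start' and 'stop' are POSTs with an empty JSON object, as in the curl
    examples; 'download' is a plain GET."""
    if action in ("start", "stop"):
        return urllib.request.Request(
            f"{base_url}/api/recording/{action}",
            data=b"{}",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
    if action == "download":
        return urllib.request.Request(f"{base_url}/api/recording/download", method="GET")
    raise ValueError(f"unknown recording action: {action!r}")


def send(base_url: str, action: str, timeout: float = 30.0) -> int:
    """Perform 'start' or 'stop' and return the HTTP status code."""
    with urllib.request.urlopen(recording_request(base_url, action), timeout=timeout) as resp:
        return resp.status


def download_recording(base_url: str, out_path: str = "recording.mp4",
                       timeout: float = 120.0) -> None:
    """Stream the recording to out_path without loading it all into memory."""
    req = recording_request(base_url, "download")
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(out_path, "wb") as f:
        shutil.copyfileobj(resp, f)
```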

⚙️ Configuration

Environment Variables

Key configuration options in service.yaml:

```yaml
env:
  - name: ENABLE_WEBRTC
    value: "true"               # Enable WebRTC streaming
  - name: WIDTH
    value: "1024"               # Browser width
  - name: HEIGHT
    value: "768"                # Browser height
  - name: CHROMIUM_FLAGS
    value: "--no-sandbox..."    # Chrome launch flags
  - name: NEKO_ICESERVERS
    value: '[{"urls": [...]}]'  # TURN/STUN servers
```

Resource Limits

Default Cloud Run settings:

  • CPU: 4 cores
  • Memory: 8GB
  • Timeout: 1 hour
  • Concurrency: 1 (one browser per container)

Scaling

  • Min instances: 0 (scales to zero when unused)
  • Max instances: 10 (adjustable)
  • Cold start: ~30-60 seconds

🔧 Advanced Configuration

Custom Chrome Flags

Edit service.yaml to modify Chrome behavior:

```yaml
- name: CHROMIUM_FLAGS
  value: "--user-data-dir=/home/kernel/user-data --disable-dev-shm-usage --custom-flag"
```

TURN Server for WebRTC

For production WebRTC, configure a TURN server:

```yaml
- name: NEKO_ICESERVERS
  value: '[{"urls": ["turn:turn.example.com:3478"], "username": "user", "credential": "pass"}]'
```
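Hand-writing the `NEKO_ICESERVERS` JSON inside YAML quotes is error-prone. A small Python helper can generate the value instead. It emits the same `urls`/`username`/`credential` fields as the example above, which are the standard WebRTC `RTCIceServer` fields. The helper name and the optional STUN entry are illustrative assumptions.

```python
import json
from typing import Dict, List, Optional, Union


def ice_servers_json(turn_urls: List[str],
                     username: Optional[str] = None,
                     credential: Optional[str] = None,
                     stun_urls: Optional[List[str]] = None) -> str:
    """Return a JSON array of ICE servers suitable for NEKO_ICESERVERS."""
    servers: List[Dict[str, Union[str, List[str]]]] = []
    if stun_urls:
        servers.append({"urls": stun_urls})  # STUN needs no credentials
    turn: Dict[str, Union[str, List[str]]] = {"urls": turn_urls}
    if username is not None and credential is not None:
        turn["username"] = username
        turn["credential"] = credential
    servers.append(turn)
    return json.dumps(servers)


if __name__ == "__main__":
    print(ice_servers_json(["turn:turn.example.com:3478"], "user", "pass"))
```

Run as a script, it prints `[{"urls": ["turn:turn.example.com:3478"], "username": "user", "credential": "pass"}]`, which is the same value as the YAML example.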

WebArena Configuration (Optional)

The platform supports running WebArena benchmark evaluations against self-hosted test websites. This is completely optional and only needed if you're running WebArena tasks.

What is WebArena?

WebArena is a research benchmark with 812 tasks across 7 self-hosted websites (e-commerce, forums, GitLab, Wikipedia, etc.). To run these evaluations, you need to route specific domains to a custom IP address.

Quick Setup

1. Configure environment variables in evals/.env:

```bash
cd evals
cp .env.example .env
vim .env
```

Add:

```bash
# WebArena Infrastructure Configuration
WEBARENA_HOST_IP=172.16.55.59      # IP where WebArena sites are hosted
WEBARENA_NETWORK=172.16.55.0/24    # Network CIDR for routing

# WebArena Site URLs (optional - customize if needed)
SHOPPING=http://onestopmarket.com
SHOPPING_ADMIN=http://onestopmarket.com/admin
REDDIT=http://reddit.com
GITLAB=http://gitlab.com
WIKIPEDIA=http://wikipedia.org
```

2. Start container (configuration is auto-loaded):

```bash
make compose-up   # OR: make run
```

3. Verify WebArena routing is enabled:

```bash
docker logs kernel-browser-extended | grep -i webarena
```

You should see:

```text
🌐 [init] Configuring WebArena DNS mapping to 172.16.55.59...
🌐 [init] Adding route to 172.16.55.0/24 via 172.17.0.1...
```

4. Run WebArena evaluations:

```bash
cd evals
python3 run_webarena.py --task-id 1 --verbose
```

How It Works

When WEBARENA_HOST_IP is set:

  • DNS Mapping: Chromium routes WebArena domains (gitlab.com, reddit.com, etc.) to your specified IP
  • Network Routing: Container adds route to reach the WebArena network
  • Automatic: Configuration happens on container startup via scripts/init-container.sh

Without configuration (default):

  • System works normally with standard DNS resolution
  • WebArena routing is completely disabled
  • No impact on regular browser automation
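The README does not show how scripts/init-container.sh implements the DNS mapping. One standard Chromium mechanism for pinning hostnames to an IP is the `--host-resolver-rules` switch, whose rules look like `MAP gitlab.com 172.16.55.59` and are separated by commas. The sketch below derives such a rule string from the site URLs in evals/.env. It illustrates the idea only and is not the script's actual logic. `host_resolver_rules` and `SITE_KEYS` are hypothetical names.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Site variables from the evals/.env example above.
SITE_KEYS = ("SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "WIKIPEDIA")


def host_resolver_rules(env: Mapping[str, str]) -> str:
    """Build a Chromium --host-resolver-rules value mapping each WebArena host to
    WEBARENA_HOST_IP. Returns "" when WebArena is disabled (normal DNS)."""
    ip = env.get("WEBARENA_HOST_IP", "").strip()
    if not ip:
        return ""
    hosts: List[str] = []
    for key in SITE_KEYS:
        host = urlparse(env.get(key, "")).hostname  # None for unset/empty values
        if host and host not in hosts:              # e.g. SHOPPING and SHOPPING_ADMIN share a host
            hosts.append(host)
    return ", ".join(f"MAP {h} {ip}" for h in hosts)
```

With the example .env, the result is passed to Chromium as `--host-resolver-rules="MAP onestopmarket.com 172.16.55.59, MAP reddit.com 172.16.55.59, ..."`.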

Deployment-Specific IPs

You can use different IP addresses for different environments:

```bash
# Local development
WEBARENA_HOST_IP=172.16.55.59
WEBARENA_NETWORK=172.16.55.0/24

# Cloud deployment
WEBARENA_HOST_IP=34.123.45.67
WEBARENA_NETWORK=34.123.45.0/24

# Disable WebArena (default)
WEBARENA_HOST_IP=
WEBARENA_NETWORK=
```
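With several environments it is easy to pair a host IP with the wrong network. The sketch below reads a .env file like the ones above and checks the values with Python's `ipaddress` module. It assumes, as in every example here, that `WEBARENA_HOST_IP` should lie inside `WEBARENA_NETWORK`; that is our sanity check, not a documented rule. `parse_env` is a deliberately minimal, hypothetical parser and not a full dotenv implementation.

```python
import ipaddress
from typing import Dict, Mapping


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and full-line comments,
    drops trailing ' # ...' comments, strips matching surrounding quotes."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def webarena_config(env: Mapping[str, str]):
    """Return (host_ip, network) when WebArena is enabled, or None when disabled.

    Raises ValueError for malformed values or a host IP outside the network."""
    host = env.get("WEBARENA_HOST_IP", "").strip()
    if not host:
        return None  # default: WebArena routing disabled
    ip = ipaddress.ip_address(host)
    net = ipaddress.ip_network(env.get("WEBARENA_NETWORK", "").strip())
    if ip not in net:
        raise ValueError(f"WEBARENA_HOST_IP {ip} is not inside WEBARENA_NETWORK {net}")
    return ip, net
```

Example: `webarena_config(parse_env(open("evals/.env").read()))`.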

See CLAUDE.md for detailed WebArena configuration documentation.

📁 Project Structure

```text
web-agent/
├── browser-operator-core/          # Submodule: DevTools frontend source
├── kernel-images/                  # Submodule: Base browser environment
├── deployment/                     # Deployment configurations
│   ├── cloudrun/                   # Google Cloud Run deployment
│   │   ├── deploy.sh               # Cloud deployment script
│   │   ├── cloudbuild.yaml         # CI/CD pipeline config
│   │   ├── service.yaml            # Cloud Run service definition
│   │   ├── service-secrets.yaml    # Service with Secret Manager
│   │   ├── cloudrun-wrapper.sh     # Cloud Run entrypoint
│   │   ├── cloudrun-kernel-wrapper.sh  # Alternative wrapper
│   │   ├── supervisord-cloudrun.conf   # Supervisor for Cloud Run
│   │   └── nginx.conf              # Reverse proxy config
│   └── local/                      # Local deployment
│       └── run-local.sh            # Interactive Docker run script
├── nginx/                          # Nginx configurations
│   └── nginx-devtools.conf         # DevTools nginx config
├── scripts/                        # Utility scripts
│   ├── init-container.sh           # Auto-cleanup of lock files
│   └── test-eval-server.sh         # Eval server build test
├── supervisor/services/            # Service configs (overrides)
├── eval-server/
│   └── nodejs/                     # Eval server (use this, NOT submodule)
│       ├── src/                    # API server, evaluation server, lib
│       ├── start.js                # Server entrypoint
│       └── package.json
├── evals/
│   ├── run.py                      # Python evaluation runner
│   ├── lib/judge.py                # Judge implementations
│   └── data/                       # Evaluation YAML files
├── Dockerfile.local                # Main Docker build (local dev)
├── Dockerfile.devtools             # DevTools frontend build
├── Dockerfile.cloudrun             # Cloud Run build
├── docker-compose.yml              # Local deployment config
├── Makefile                        # Build commands
├── CLAUDE.md                       # Technical documentation
└── README.md                       # This file
```

🐛 Troubleshooting

Local Development Issues

See the Troubleshooting section under Local Development above for details.

Common quick fixes:

```bash
# Clean restart
make stop && make clean && make build && make compose-up

# Check logs
docker logs kernel-browser-extended

# Verify services
docker exec kernel-browser-extended supervisorctl status
```
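`supervisorctl status` prints one line per managed program, with the program name followed by its state (RUNNING, STARTING, BACKOFF, FATAL, ...). For scripted checks, a small parser can list the programs that are not running. This is a sketch: the program names in any output depend on the container's supervisor configuration, which this README does not list. Newer supervisor releases may also exit non-zero when a program is not running, so the sketch does not treat a non-zero exit as an error.

```python
import subprocess
from typing import Dict, List


def parse_supervisor_status(output: str) -> Dict[str, str]:
    """Map program name -> state from `supervisorctl status` output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def not_running(states: Dict[str, str]) -> List[str]:
    """Names of programs whose state is anything other than RUNNING."""
    return sorted(name for name, state in states.items() if state != "RUNNING")


def container_states(container: str = "kernel-browser-extended") -> Dict[str, str]:
    """Run `supervisorctl status` inside the container and parse the result."""
    # No check=True: supervisorctl may exit non-zero when a program is not RUNNING.
    result = subprocess.run(
        ["docker", "exec", container, "supervisorctl", "status"],
        capture_output=True, text=True,
    )
    return parse_supervisor_status(result.stdout)
```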

Cloud Run Issues

  1. Build Timeout

    ```bash
    # Use local build for testing
    ./deploy.sh --local
    ```
  2. Port Binding Errors

    • Cloud Run sends traffic to port 8080 by default (the PORT environment variable)
    • nginx proxies internal services
    • Check nginx.conf for port mappings
  3. Chrome Crashes

    • Ensure --no-sandbox flag is set
    • Check memory limits (8GB minimum)
    • Verify non-root user execution

Cloud Run Debug Commands

```bash
# View service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=kernel-browser" \
  --project=$PROJECT_ID --limit=50

# Check service status
gcloud run services describe kernel-browser --region=us-central1

# Test endpoints
curl https://your-service-url/health
curl https://your-service-url/json/version
```
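Because instances scale to zero, the first request after idle time can take about 30-60 seconds (see Scaling). A single `curl` may therefore fail even though the deployment is fine. Below is a stdlib sketch that polls the health endpoint until it succeeds or a deadline passes. The 90-second default is our choice to cover the quoted cold-start range, and the helper names are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if GET url returns a 2xx status; HTTP errors and network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):  # URLError/HTTPError subclass OSError
        return False


def wait_until_healthy(check: Callable[[], bool], deadline_s: float = 90.0,
                       interval_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `check` every interval_s seconds until it succeeds or deadline_s elapses."""
    end = clock() + deadline_s
    while True:
        if check():
            return True
        if clock() >= end:
            return False
        sleep(interval_s)
```

Example: `wait_until_healthy(lambda: is_healthy("https://your-service-url/health"))`. The injectable `sleep` and `clock` parameters exist so the retry logic can be tested without real waiting.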

🔒 Security Considerations

  • Service runs as non-root user
  • Chrome uses --no-sandbox (required for containers)
  • WebRTC media is encrypted in transit (DTLS-SRTP), but the local endpoints listed under Access Points use plain HTTP/WS by default
  • Consider VPC/firewall rules for production
  • Use Cloud IAM for API access control

💰 Cost Estimation

Approximate Cloud Run costs:

  • CPU: $0.00002400 per vCPU-second
  • Memory: $0.00000250 per GiB-second
  • Requests: $0.40 per million requests

Example: 1 hour session ≈ $0.50-1.00
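The listed rates can be turned into a per-session estimate. Assume the default 4 vCPU / 8 GiB instance from Resource Limits is billed for the whole session. One hour then costs 4 × 3600 × $0.000024 + 8 × 3600 × $0.0000025 ≈ $0.35 + $0.07 ≈ $0.42 for CPU and memory, before request charges. The sketch below performs this arithmetic. It ignores free-tier allowances, network egress, and build or storage costs, so treat it as a rough compute-only figure.

```python
# Rates as listed in this section (USD).
CPU_PER_VCPU_SECOND = 0.000024
MEMORY_PER_GIB_SECOND = 0.0000025
PER_MILLION_REQUESTS = 0.40


def session_cost(vcpus: float, memory_gib: float, seconds: float, requests: int = 0) -> float:
    """Estimated cost of one instance billed for `seconds`, plus request charges."""
    cpu = vcpus * seconds * CPU_PER_VCPU_SECOND
    memory = memory_gib * seconds * MEMORY_PER_GIB_SECOND
    reqs = requests / 1_000_000 * PER_MILLION_REQUESTS
    return cpu + memory + reqs


if __name__ == "__main__":
    print(f"${session_cost(4, 8, 3600):.2f}")  # default 4 vCPU / 8 GiB for one hour
```

Run as a script, it prints `$0.42`.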

🔄 CI/CD Pipeline

The cloudbuild.yaml provides:

  1. Submodule initialization
  2. Docker image build with caching
  3. Container Registry push
  4. Cloud Run deployment
  5. Traffic routing

Build Commands

```bash
# Normal build (with cache) - recommended for development
gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml

# Force rebuild without cache - use when dependencies change
gcloud builds submit --config deployment/cloudrun/cloudbuild.yaml --substitutions=_NO_CACHE=true

# Automated deployment with Twilio TURN server setup
./deployment/cloudrun/deploy.sh
```

Cache Control

The build system uses Docker layer caching by default to reduce build times and costs:

  • With cache: ~5-10 minutes, lower cost
  • Without cache: 30+ minutes, higher cost ($3-5 per build)

Use _NO_CACHE=true only when:

  • Dependencies have changed significantly
  • Base images need updating
  • Debugging build issues

📚 Additional Resources

🎯 API Examples

Eval Server HTTP API

```bash
# Execute a browser task
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Navigate to google.com and search for puppies",
    "url": "about:blank",
    "wait_timeout": 5000,
    "model": {
      "main_model": {
        "provider": "openai",
        "model": "gpt-4",
        "api_key": "your-api-key"
      }
    }
  }'

# Get page content
curl -X POST http://localhost:8080/page/content \
  -H "Content-Type: application/json" \
  -d '{"clientId": "test", "tabId": "tab-001", "format": "html"}'

# Capture a screenshot
curl -X POST http://localhost:8080/page/screenshot \
  -H "Content-Type: application/json" \
  -d '{"clientId": "test", "tabId": "tab-001", "fullPage": false}'
```
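The same `/v1/responses` call can be made from Python, for example inside an evaluation script. This stdlib sketch builds exactly the body shape of the curl example. The response schema is not documented in this README, so `run_task` simply returns the decoded JSON. The helper names and the 300-second timeout (browser tasks can run long) are our assumptions.

```python
import json
import urllib.request
from typing import Any


def build_responses_request(task: str, api_key: str,
                            base_url: str = "http://localhost:8080",
                            url: str = "about:blank",
                            wait_timeout: int = 5000,
                            provider: str = "openai",
                            model: str = "gpt-4") -> urllib.request.Request:
    """Build a POST /v1/responses request with the same body as the curl example."""
    payload = {
        "input": task,
        "url": url,
        "wait_timeout": wait_timeout,
        "model": {
            "main_model": {"provider": provider, "model": model, "api_key": api_key},
        },
    }
    return urllib.request.Request(
        f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task: str, api_key: str, timeout: float = 300.0, **options: Any) -> Any:
    """Send the task to the eval server and return the decoded JSON response."""
    req = build_responses_request(task, api_key, **options)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Pass the API key from an environment variable (for example `os.environ["OPENAI_API_KEY"]`, a name chosen here for illustration) rather than hard-coding it.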

WebSocket JSON-RPC API

```javascript
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:8082');

ws.on('open', () => {
  // Subscribe to evaluations
  ws.send(JSON.stringify({
    jsonrpc: '2.0',
    method: 'subscribe',
    params: { clientId: 'my-client' },
    id: 1,
  }));
});

ws.on('message', (data) => {
  const response = JSON.parse(data);
  console.log('Received:', response);
});
```
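The Node.js client above sends one request and logs every frame. A client that sends several requests has to match each response to its request by `id`. Under JSON-RPC 2.0, frames that carry a `method` are requests (with an `id`) or notifications (without one) from the other side. The Python sketch below models only that bookkeeping, independent of any WebSocket library. Apart from `subscribe`, any method names are hypothetical. A real client would pass the strings it produces to a WebSocket connection, for example one from the third-party `websockets` package.

```python
import itertools
import json
from typing import Any, Dict, Optional, Tuple


class JsonRpcSession:
    """Bookkeeping for a JSON-RPC 2.0 client: numbered requests, id-matched responses."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: Dict[int, str] = {}  # request id -> method name

    def request(self, method: str, params: Optional[Dict[str, Any]] = None) -> str:
        """Serialize a request and remember its id until the response arrives."""
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method, "id": msg_id}
        if params is not None:
            msg["params"] = params
        return json.dumps(msg)

    def handle(self, raw: str) -> Tuple[str, Optional[str], Any]:
        """Classify an incoming frame as (kind, method, payload)."""
        msg = json.loads(raw)
        msg_id = msg.get("id")
        if msg_id in self.pending and ("result" in msg or "error" in msg):
            method = self.pending.pop(msg_id)
            if "error" in msg:
                return ("error", method, msg["error"])
            return ("result", method, msg["result"])
        if "method" in msg:
            # A frame with an id expects a reply; one without is a notification.
            kind = "request" if "id" in msg else "notification"
            return (kind, msg["method"], msg.get("params"))
        return ("unknown", None, msg)
```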

Need help? Check CLAUDE.md for detailed technical docs or open an issue.
