- 🔧 Tool Router Mode: Composio's intelligent tool routing for accessing Gmail, Slack, GitHub, and 500+ integrations
- ◉ Browser Tools Mode: Gemini 2.5 Computer Use for visual browser automation with screenshots, clicks, typing, scrolling, and navigation
- Sidebar Chat Interface: Clean, modern React-based chat UI accessible from any tab
- Direct Browser Automation: No backend required - all API calls made directly from extension
- Visual Feedback: Blue click indicators and element highlighting during automation
- Smart Coordinate Scaling: Automatically scales Gemini's 1000x1000 coordinate system to actual viewport dimensions
- Safety Features: Confirmation dialogs for sensitive actions (checkout, payment, etc.)
- Node.js 18+ and npm
- Chrome or Edge browser (Manifest V3 support)
- Google API key for Gemini (required)
- Composio API key (optional, for Tool Router mode)
- Clone this repository
- Install dependencies: `npm install`
- Build the extension: `npm run build`
- Load the extension in Chrome:
  - Open Chrome and navigate to `chrome://extensions/`
  - Enable "Developer mode" in the top right
  - Click "Load unpacked"
  - Select the `dist` folder
- Open Settings (⚙️ icon) to configure your API keys
- Google API Key (Required)
  - Get your key from Google AI Studio
  - Add it in Settings under "Google API Key"
  - Supports: Gemini 2.5 Pro, Flash, and Flash Lite
- Composio API Key (Optional - for Tool Router mode)
  - Get your key from Composio Dashboard
  - Add it in Settings under "Composio API Key"
  - Enables access to 500+ app integrations
- Enable Browser Tools by clicking the ◉ button in the chat header
- The extension automatically uses Gemini 2.5 Computer Use Preview
- Provide natural language instructions to control the browser
Example prompts:
- "Navigate to reddit.com and scroll down"
- "Click on the search box and type 'puppies'"
- "Take a screenshot of this page"
- "Click the first image on the page"
- Add your Composio API key in Settings
- Click ◉ to disable Browser Tools (or keep it off)
- Chat normally - the AI will automatically use Composio tools when needed
Example prompts:
- "Check my Gmail for unread messages"
- "Create a GitHub issue titled 'Bug in login flow'"
- "Send a Slack message to #general with 'Hello team!'"
Run with hot reload:

    npm run dev

Then reload the extension in Chrome after each change.
- FAQ - Frequently asked questions and quick troubleshooting
- TROUBLESHOOTING.md - Detailed troubleshooting guide for common issues
Atlas consists of three main components that work together:

    ┌─────────────────┐
    │  sidepanel.tsx  │ ← React UI with chat interface
    │   (React UI)    │
    └────────┬────────┘
             │
             ↓ Messages
    ┌─────────────────┐
    │  background.ts  │ ← Service worker (screenshots, navigation)
    │   (Service      │
    │    Worker)      │
    └────────┬────────┘
             │
             ↓ Execute actions
    ┌─────────────────┐
    │   content.ts    │ ← Content script (DOM manipulation)
    │   (Injected     │
    │   on all tabs)  │
    └─────────────────┘

The main React component (sidepanel.tsx) handles:
Browser Tools Mode (streamWithGeminiComputerUse):
- Takes initial screenshot of current page
- Sends screenshot + conversation history to Gemini 2.5 Computer Use
- Receives function calls (click, type, navigate, scroll, etc.)
- Executes actions via `executeBrowserAction()`
- Re-takes screenshot after each action
- Scales coordinates from Gemini's 1000x1000 grid to actual viewport
- Supports up to 30 turns of action-response loops
- Shows visual feedback in UI
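
The action-response loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's actual code: `LoopDeps`, `runActionLoop`, and the turn shapes are hypothetical names, and the model call is injected so the sketch stays independent of the real Gemini request format. Only the 30-turn cap and the screenshot-after-each-action pattern come from the list above.

```typescript
// Hypothetical shapes for illustration; the real Gemini payloads differ.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelTurn {
  text?: string;
  functionCall?: FunctionCall;
}

interface LoopDeps {
  // Sends the latest screenshot plus prior results to the model.
  askModel(screenshot: string, log: string[]): Promise<ModelTurn>;
  // Performs one browser action and reports a short result string.
  executeAction(call: FunctionCall): Promise<string>;
  // Returns a data URL of the visible tab.
  takeScreenshot(): Promise<string>;
}

const MAX_TURNS = 30; // turn cap stated in the list above

async function runActionLoop(
  deps: LoopDeps,
  userMessage: string
): Promise<{ turns: number; finalText: string }> {
  const log: string[] = [`user: ${userMessage}`];
  let screenshot = await deps.takeScreenshot();

  for (let turn = 1; turn <= MAX_TURNS; turn++) {
    const reply = await deps.askModel(screenshot, log);
    if (!reply.functionCall) {
      // No function call: treat the model's text as the final answer.
      return { turns: turn, finalText: reply.text ?? "" };
    }
    const result = await deps.executeAction(reply.functionCall);
    log.push(`${reply.functionCall.name}: ${result}`);
    // Re-take the screenshot so the next turn sees the updated page.
    screenshot = await deps.takeScreenshot();
  }
  return { turns: MAX_TURNS, finalText: "Stopped after reaching the turn limit." };
}
```

Injecting the three dependencies keeps the loop testable without a browser or an API key.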
Tool Router Mode (streamWithAISDKAndMCP):
- Connects to Composio's MCP (Model Context Protocol) server
- Uses Vercel AI SDK for streaming responses
- Auto-discovers and calls tools via MCP
- Manages MCP client lifecycle and session persistence
Key Functions:
- `executeTool()` - Sends message to background script to perform browser actions
- `scaleCoordinates()` - Converts Gemini coordinates to viewport coordinates
- `executeBrowserAction()` - Maps Gemini function names to actual browser actions
- `requiresUserConfirmation()` - Safety checks for sensitive actions
- `loadSettings()` - Manages Composio session initialization
The service worker (background.ts) provides:
Screenshot Functionality:
- Captures visible tab using `chrome.tabs.captureVisibleTab()`
- Auto-handles restricted pages (chrome://, about:, etc.) by navigating to Google.com
- Filters for actual visible tabs (not devtools or hidden windows)
- Returns data URL for screenshot
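
A minimal sketch of the restricted-page check might look like this. The helper names (`isRestrictedUrl`, `resolveCaptureUrl`) and the fallback constant are hypothetical. Only `chrome://` and `about:` are named above; the other prefixes in the list are assumptions added for illustration.

```typescript
// Only chrome:// and about: come from the description above;
// edge:// and chrome-extension:// are assumptions for illustration.
const RESTRICTED_PREFIXES = ["chrome://", "about:", "edge://", "chrome-extension://"];

// Hypothetical fallback, mirroring the "navigate to Google.com" behavior.
const FALLBACK_URL = "https://www.google.com";

function isRestrictedUrl(url: string | undefined): boolean {
  // Assumption: a tab without a URL is treated as restricted too.
  if (!url) return true;
  const lower = url.toLowerCase();
  return RESTRICTED_PREFIXES.some((prefix) => lower.startsWith(prefix));
}

// Decides which URL the tab should show before a screenshot is attempted.
function resolveCaptureUrl(url: string | undefined): string {
  return isRestrictedUrl(url) ? FALLBACK_URL : (url as string);
}
```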
Action Execution:
- Relays commands from sidepanel to content script
- Ensures content script is injected before execution
- Handles tab targeting and message passing
Browser APIs:
- Gets browser history
- Manages bookmarks
- Navigates tabs
- Tracks recent pages for memory
Key Message Types:
- `TAKE_SCREENSHOT` - Capture current tab
- `GET_PAGE_CONTEXT` - Extract page metadata
- `EXECUTE_ACTION` - Run browser action (click, type, etc.)
- `NAVIGATE` - Change page URL
- `GET_HISTORY` - Fetch browsing history
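
One way to model these messages is a discriminated union, so the compiler flags any message type that is not handled. The payload fields (`action`, `params`, `url`, `query`) and the names `AtlasMessage` and `describeMessage` are assumptions for illustration; only the five type names come from the list above.

```typescript
// Hypothetical payload shapes; only the `type` values are from the list above.
type AtlasMessage =
  | { type: "TAKE_SCREENSHOT" }
  | { type: "GET_PAGE_CONTEXT" }
  | { type: "EXECUTE_ACTION"; action: string; params?: Record<string, unknown> }
  | { type: "NAVIGATE"; url: string }
  | { type: "GET_HISTORY"; query?: string };

// Returns a short description of what the service worker would do.
// A real handler would call Chrome APIs here instead of returning strings.
function describeMessage(message: AtlasMessage): string {
  switch (message.type) {
    case "TAKE_SCREENSHOT":
      return "capture the visible tab";
    case "GET_PAGE_CONTEXT":
      return "ask the content script for page metadata";
    case "EXECUTE_ACTION":
      return `run browser action "${message.action}"`;
    case "NAVIGATE":
      return `navigate to ${message.url}`;
    case "GET_HISTORY":
      return `search history for "${message.query ?? ""}"`;
    default: {
      // Exhaustiveness check: this stops type-checking if a new
      // message type is added to the union without a case above.
      const unreachable: never = message;
      return unreachable;
    }
  }
}
```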
The content script (content.ts) is injected into every webpage to:
Extract Page Context:
- URL, title, text content
- Links, images, forms
- Viewport dimensions (width, height, scroll position)
- Metadata (description, keywords, author)
Execute DOM Actions:
Click (highlightElement() + executePageAction('click')):
- Can click by CSS selector OR by coordinates
- Dispatches full mouse event sequence (mousedown, mouseup, click)
- Shows blue pulsing animation at click location
- Highlights element with blue outline
- Returns element info for debugging
Type (keyboard_type action):
- Types character by character to simulate real keyboard input
- Works with regular inputs, textareas, and contenteditable elements
- Dispatches input/change events for React/Vue/Angular compatibility
- For `type_text_at`: clicks coordinates, waits for focus, clears existing text, then types
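
To make the character-by-character behavior concrete, the sketch below plans the events for a string without touching the DOM. `planKeystrokes` and `KeyStep` are hypothetical names, and the exact ordering (keydown, keypress, input, keyup per character, then one change event at the end) is an assumption modeled on typical browser behavior rather than taken from the extension's source.

```typescript
type KeyStepKind = "keydown" | "keypress" | "input" | "keyup" | "change";

interface KeyStep {
  kind: KeyStepKind;
  key?: string;       // the character being typed, absent for "change"
  valueAfter: string; // field value once this step has been applied
}

// Builds the ordered list of events a content script could dispatch.
function planKeystrokes(text: string, initialValue = ""): KeyStep[] {
  const steps: KeyStep[] = [];
  let value = initialValue;
  // Array.from splits by code point, so emoji are not cut in half.
  for (const key of Array.from(text)) {
    steps.push({ kind: "keydown", key, valueAfter: value });
    steps.push({ kind: "keypress", key, valueAfter: value });
    value += key; // the value changes between keypress and input
    steps.push({ kind: "input", key, valueAfter: value });
    steps.push({ kind: "keyup", key, valueAfter: value });
  }
  // A single change event after the last character.
  steps.push({ kind: "change", valueAfter: value });
  return steps;
}
```

Frameworks such as React listen for the input event, which is why it is planned for every character rather than once at the end.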
Scroll:
- Supports up/down by pixels
- Can scroll to top/bottom
- Can scroll element into view
Navigate:
- Uses `chrome.tabs.update()` to change URLs
Special Actions:
- `hover` - Mouse over at coordinates
- `drag_drop` - Drag from one point to another
- `key_combination` - Press keyboard shortcuts
- `clear_input` - Clear focused field
Visual Feedback:
- Blue outline on clicked elements
- Blue pulsing circle at click coordinates
- Animation automatically cleans up after 600ms
When you enable Browser Tools (◉ button):

    1. User sends message: "Navigate to reddit.com and scroll"
       ↓
    2. sidepanel.tsx → streamWithGeminiComputerUse()
       ↓
    3. Take initial screenshot via executeTool('screenshot')
       ↓
    4. Send to Gemini 2.5 Computer Use with:
       - Screenshot as inline_data (base64 PNG)
       - Conversation history
       - System instruction with available functions
       ↓
    5. Gemini responds with function call:
       navigate({url: "https://reddit.com"})
       ↓
    6. executeBrowserAction() maps to executeTool('navigate')
       ↓
    7. background.ts receives EXECUTE_ACTION message
       ↓
    8. chrome.tabs.update() navigates to URL
       ↓
    9. Wait 2.5 seconds for page to load
       ↓
    10. Take new screenshot
       ↓
    11. Send function_response back to Gemini:
        { url: "https://reddit.com", success: true, [screenshot] }
       ↓
    12. Gemini calls scroll_down()
       ↓
    13. Execute via content.ts → window.scrollBy()
       ↓
    14. Take another screenshot
       ↓
    15. Continue loop (up to 30 turns) until task complete

Coordinate Scaling Explained:
Gemini Computer Use uses normalized coordinates (0-1000 on both axes). Atlas automatically scales them:

    // Gemini returns: x=500, y=300 (in 1000x1000 space)
    // Actual viewport: 1920x1080
    const scaledX = (500 / 1000) * 1920; // 960
    const scaledY = (300 / 1000) * 1080; // 324

This ensures clicks land in the right place regardless of screen size.
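
Generalized into a function, the same arithmetic might look like the sketch below. The name `scaleCoordinates` matches the key function listed earlier, but this body is an assumption, not the extension's implementation. Clamping out-of-range input and rounding to whole pixels are additions made for illustration.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

const GEMINI_GRID = 1000; // Gemini's normalized coordinate range on both axes

function scaleCoordinates(
  x: number,
  y: number,
  viewport: ViewportSize
): { x: number; y: number } {
  // Assumption: out-of-range model output is clamped rather than rejected.
  const clamp = (value: number) => Math.min(Math.max(value, 0), GEMINI_GRID);
  return {
    // Assumption: results are rounded to whole pixels.
    x: Math.round((clamp(x) / GEMINI_GRID) * viewport.width),
    y: Math.round((clamp(y) / GEMINI_GRID) * viewport.height),
  };
}

// The worked example above: a 1920x1080 viewport.
const point = scaleCoordinates(500, 300, { width: 1920, height: 1080 });
// point is { x: 960, y: 324 }
```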
The extension has built-in safety checks:
Always Requires Confirmation:
- Keyboard combinations (Ctrl+A, Alt+Tab, etc.)
Context-Aware Confirmation:
- Sensitive Pages: Checkout, payment, login, admin pages
- Sensitive Data: Detecting passwords or credit cards being typed
- Form Submissions: When typing with `press_enter: true`
Confirmation appears as a browser dialog before executing the action.
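
The rules above could be expressed as a single predicate. The sketch below is an approximation under stated assumptions: the `PendingAction` shape, the keyword list, and both detection patterns are hypothetical stand-ins, and the extension's real `requiresUserConfirmation()` may apply different heuristics.

```typescript
// Hypothetical description of an action about to be executed.
interface PendingAction {
  name: string;         // e.g. "key_combination" or "type_text_at"
  pageUrl: string;      // URL of the page the action targets
  text?: string;        // text about to be typed, if any
  pressEnter?: boolean; // mirrors press_enter: true
}

// Illustrative keyword list and pattern, not the extension's actual rules.
const SENSITIVE_URL_WORDS = ["checkout", "payment", "login", "admin"];
const CARD_NUMBER_PATTERN = /\b(?:\d[ -]?){13,16}\b/;

function requiresUserConfirmation(action: PendingAction): boolean {
  // Keyboard combinations always require confirmation.
  if (action.name === "key_combination") return true;

  // Sensitive pages: match keywords anywhere in the URL.
  const url = action.pageUrl.toLowerCase();
  if (SENSITIVE_URL_WORDS.some((word) => url.includes(word))) return true;

  if (action.text !== undefined) {
    // Typing followed by Enter is treated as a form submission.
    if (action.pressEnter) return true;
    // Rough heuristics for card numbers and password-like text.
    if (CARD_NUMBER_PATTERN.test(action.text)) return true;
    if (/password/i.test(action.text)) return true;
  }
  return false;
}
```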
When Composio API key is provided:

    1. initializeComposioToolRouter() creates session
       ↓
    2. Gets MCP URLs (chat_session_mcp_url, tool_router_mcp_url)
       ↓
    3. Connects to MCP via StreamableHTTPClientTransport
       ↓
    4. Queries available tools: mcpClient.tools()
       ↓
    5. Merges MCP tools with local tools (getBrowserHistory)
       ↓
    6. Passes all tools to AI SDK: streamText({ tools: allTools })
       ↓
    7. AI SDK orchestrates tool calls via MCP
       ↓
    8. Composio executes integration actions
       ↓
    9. Results streamed back to user

Available Tools in Tool Router Mode:
- Composio Tools - 500+ integrations (Gmail, Slack, GitHub, etc.)
- getBrowserHistory - Built-in browser history search
The MCP client persists across messages but is recreated on "New Chat" to refresh available tools.
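
The persist-then-recreate lifecycle can be sketched with a small holder object. `McpClientHolder` and `ClosableClient` are hypothetical names, and the client factory is injected, so nothing here depends on the real MCP client API.

```typescript
// Hypothetical minimal client shape; a real MCP client exposes more.
interface ClosableClient {
  close(): Promise<void>;
}

class McpClientHolder<T extends ClosableClient> {
  private pending: Promise<T> | null = null;
  private readonly create: () => Promise<T>;

  constructor(create: () => Promise<T>) {
    this.create = create;
  }

  // Reuses one client across messages. Caching the promise means
  // concurrent callers share a single connection attempt.
  get(): Promise<T> {
    if (this.pending === null) {
      this.pending = this.create().catch((error) => {
        this.pending = null; // allow a retry after a failed connection
        throw error;
      });
    }
    return this.pending;
  }

  // Called on "New Chat": closes the old client so the next get()
  // reconnects and picks up a fresh tool list.
  async reset(): Promise<void> {
    const old = this.pending;
    this.pending = null;
    if (old === null) return;
    const client = await old.catch(() => null);
    if (client !== null) await client.close();
  }
}
```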
Chrome Storage (chrome.storage.local):
- `atlasSettings` - API keys, model selection
- `composioSessionId` - Active Composio session
- `composioChatMcpUrl` - Chat MCP endpoint
- `composioToolRouterMcpUrl` - Tool Router MCP endpoint
- `extensionUserId` - Unique user ID for rate limiting
- `browserMemory` - Recent pages, preferences
Session Management:
- Composio sessions expire after 24 hours
- New chat resets session to refresh tools
- Session automatically recreated if expired
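
A minimal sketch of the expiry check follows. The 24-hour lifetime comes from the list above. The `createdAt` timestamp, the `StoredSession` shape, and both function names are assumptions, since the storage keys listed earlier do not show how session age is tracked.

```typescript
const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, per the list above

// Hypothetical stored shape; `createdAt` is an assumed field (epoch ms).
interface StoredSession {
  sessionId: string;
  createdAt: number;
}

function isSessionExpired(session: StoredSession, now: number = Date.now()): boolean {
  return now - session.createdAt >= SESSION_TTL_MS;
}

// Returns the stored session if it is still valid, otherwise creates a new one.
async function ensureSession(
  stored: StoredSession | undefined,
  createSession: () => Promise<StoredSession>,
  now: number = Date.now()
): Promise<StoredSession> {
  if (stored && !isSessionExpired(stored, now)) return stored;
  return createSession();
}
```

Passing `now` as a parameter keeps the check deterministic in tests.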
When using Tool Router mode (with Composio API key), the agent has access to a built-in getBrowserHistory tool that allows it to search through your browsing history.
Tool Features:
- Search by keyword - Filter history by page title or URL
- Time range - Default searches last 7 days, configurable
- Result limit - Default returns 20 results, adjustable
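
These defaults map naturally onto the query object that `chrome.history.search()` accepts (`text`, `startTime`, `maxResults`). The sketch below only builds that object. The option names `keyword` and `daysBack` and the helper `buildHistoryQuery` are hypothetical, and the tool's real parameter names may differ.

```typescript
// Hypothetical option names for illustration.
interface HistorySearchOptions {
  keyword?: string;
  daysBack?: number;
  maxResults?: number;
}

// Mirrors the fields chrome.history.search() accepts in its query object.
interface HistoryQuery {
  text: string;
  startTime: number; // epoch milliseconds
  maxResults: number;
}

const DEFAULT_DAYS_BACK = 7;    // default time range from the list above
const DEFAULT_MAX_RESULTS = 20; // default result limit from the list above
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function buildHistoryQuery(
  options: HistorySearchOptions = {},
  now: number = Date.now()
): HistoryQuery {
  const daysBack = options.daysBack ?? DEFAULT_DAYS_BACK;
  return {
    text: options.keyword ?? "", // an empty string retrieves all pages
    startTime: now - daysBack * MS_PER_DAY,
    maxResults: options.maxResults ?? DEFAULT_MAX_RESULTS,
  };
}
```

In the extension, the resulting object would be passed to `chrome.history.search()`, which requires the `history` permission in the manifest.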
Example Usage:
- "What GitHub repositories did I visit this week?"
- "Find the Reddit post I looked at yesterday"
- "Show me recent news articles I read"
- "What programming tutorials did I visit last month?"
The tool respects Chrome's history permissions and only accesses data you've already stored in your browser history.

    atlas/
    ├── sidepanel.tsx          # Main React component (1426 lines)
    │   ├── Browser Tools mode with Gemini Computer Use
    │   ├── Tool Router mode with MCP
    │   ├── Message parsing with React Markdown
    │   └── Coordinate scaling and safety checks
    │
    ├── content.ts             # Content script (714 lines)
    │   ├── Page context extraction
    │   ├── DOM manipulation (click, type, scroll)
    │   ├── Visual feedback (blue click indicators)
    │   └── Keyboard simulation
    │
    ├── background.ts          # Service worker (302 lines)
    │   ├── Screenshot capture
    │   ├── Tab navigation
    │   ├── Browser history
    │   └── Message routing
    │
    ├── settings.tsx           # Settings page (163 lines)
    │   └── API key configuration UI
    │
    ├── tools.ts               # Composio integration (68 lines)
    │   └── MCP session management
    │
    ├── types.ts               # TypeScript definitions (271 lines)
    │   ├── Zod schemas for validation
    │   └── Interface definitions
    │
    ├── manifest.json          # Extension manifest (Manifest V3)
    │   ├── Permissions (tabs, history, bookmarks, etc.)
    │   └── Content script injection
    │
    └── vite.config.ts         # Build configuration

- React 18 - UI framework with hooks
- TypeScript - Type safety
- Vite - Build tool and bundler
- Vercel AI SDK - Streaming AI responses
- React Markdown - Markdown rendering
- Zod - Runtime validation
- Chrome Extension APIs - Manifest V3
- Google Gemini API - AI models (2.5 Pro/Flash/Lite/Computer Use)
- Composio MCP - Tool Router integration
- StreamableHTTPClientTransport - MCP transport
When clicking on a page:
- Blue outline appears around element (3px solid #007AFF)
- Light blue background highlight (rgba(0, 122, 255, 0.1))
- Pulsing circle animation at click coordinates
- All effects auto-remove after 600ms
The keyboard_type action:
- Types character-by-character
- Dispatches keydown, keypress, keyup for each char
- Triggers input/change events for React compatibility
- Works with INPUT, TEXTAREA, and contenteditable elements
- Screenshots have retry logic (3 attempts with 1.5s delays)
- Connection errors automatically retry
- Graceful fallbacks for missing elements
- Detailed error messages in UI
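
The screenshot retry behavior (3 attempts with 1.5s delays) can be sketched as a generic helper. `withRetry` and its options are hypothetical names. The defaults mirror the numbers above, and the `sleep` function is injectable so the delay can be skipped in tests.

```typescript
interface RetryOptions {
  attempts?: number; // total tries, default 3
  delayMs?: number;  // pause between tries, default 1500
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const attempts = Math.max(1, options.attempts ?? 3);
  const delayMs = options.delayMs ?? 1500;
  const sleep = options.sleep ?? defaultSleep;

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next try, but not after the final failure.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

A screenshot call could then be wrapped as `withRetry(() => takeScreenshot())`, where `takeScreenshot` stands in for whatever function performs the capture.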
Contributions welcome! Please:
- Open an issue first to discuss changes
- Fork the repository
- Create a feature branch
- Submit a pull request
- Composio Platform - Intelligent tool routing for AI agents
- Composio Tool Router Documentation - Learn how to use Tool Router to route tool calls across 500+ integrations
- Composio GitHub - Python and TS SDK
- ChatGPT Atlas - OpenAI's browser automation AI agent
- Gemini Computer Use Model - Google's AI model for browser automation
- Gemini API Documentation - Official documentation for Gemini Computer Use
MIT
