
Network Monitoring

Network monitoring in Pydoll allows you to observe and analyze HTTP requests, responses, and other network activity during browser automation. This is essential for debugging, performance analysis, API testing, and understanding how web applications communicate with servers.

Network vs Fetch Domain

The Network domain is for passive monitoring (observing traffic). The Fetch domain is for active interception (modifying requests and responses). This guide covers monitoring only; for request interception, see the advanced documentation.

Enabling Network Events

Before you can monitor network activity, you must enable the Network domain:

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def main():
        async with Chrome() as browser:
            tab = await browser.start()

            # Enable network monitoring
            await tab.enable_network_events()

            # Now navigate
            await tab.go_to('https://api.github.com')

            # Don't forget to disable when done (optional but recommended)
            await tab.disable_network_events()

    asyncio.run(main())

Enable Before Navigation

Always enable network events before navigating to capture all requests. Requests made before enabling won't be captured.

Getting Network Logs

Pydoll automatically stores network logs when network events are enabled. You can retrieve them using get_network_logs():

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def analyze_requests():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            # Navigate to a page
            await tab.go_to('https://httpbin.org/json')

            # Wait for page to fully load
            await asyncio.sleep(2)

            # Get all network logs
            logs = await tab.get_network_logs()

            print(f"Total requests captured: {len(logs)}")

            for log in logs:
                request = log['params']['request']
                print(f"→ {request['method']} {request['url']}")

    asyncio.run(analyze_requests())

Production-Ready Waiting

The examples above use asyncio.sleep(2) for simplicity. In production code, consider using more explicit waiting strategies:

  • Wait for specific elements to appear
  • Use the Event System to detect when all resources have loaded
  • Implement network idle detection (see Real-Time Network Monitoring section)

This ensures your automation waits exactly as long as needed, no more, no less.
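As a rough illustration of the network idle idea from the list above, the sketch below tracks in-flight requests by `requestId` and reports idle once nothing has been pending for a short quiet period. `NetworkIdleTracker`, `on_done`, and `wait_for_idle` are hypothetical names, not part of Pydoll. The class only consumes event dictionaries shaped like the ones shown in this guide, so it is an assumption-laden sketch rather than a built-in feature.

```python
import asyncio
import time


class NetworkIdleTracker:
    """Tracks in-flight requests from CDP-style network event dicts (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending = set()  # requestIds that have started but not ended
        self._last_activity = time.monotonic()

    async def on_request(self, event: dict) -> None:
        # Network.requestWillBeSent: a request has started
        self._pending.add(event['params']['requestId'])
        self._last_activity = time.monotonic()

    async def on_done(self, event: dict) -> None:
        # Network.loadingFinished or Network.loadingFailed: a request has ended
        self._pending.discard(event['params']['requestId'])
        self._last_activity = time.monotonic()

    @property
    def pending(self) -> int:
        return len(self._pending)

    async def wait_for_idle(self, quiet: float = 0.5, timeout: float = 30.0) -> bool:
        """Return True once nothing is in flight for `quiet` seconds, False on timeout."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            idle_for = time.monotonic() - self._last_activity
            if not self._pending and idle_for >= quiet:
                return True
            await asyncio.sleep(0.05)
        return False
```

To use it with Pydoll, you would presumably register `tracker.on_request` for `NetworkEvent.REQUEST_WILL_BE_SENT` and `tracker.on_done` for `NetworkEvent.LOADING_FAILED` and a loading-finished event (assumed here to be exposed as `NetworkEvent.LOADING_FINISHED`), then `await tracker.wait_for_idle()` after navigating instead of sleeping for a fixed time.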

Filtering Network Logs

You can filter logs by URL pattern:

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def filter_logs_example():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()
            await tab.go_to('https://example.com')
            await asyncio.sleep(2)

            # Get all logs
            all_logs = await tab.get_network_logs()

            # Get logs for specific domain
            api_logs = await tab.get_network_logs(filter='api.example.com')

            # Get logs for specific endpoint
            user_logs = await tab.get_network_logs(filter='/api/users')

    asyncio.run(filter_logs_example())

Understanding Network Event Structure

Network logs contain detailed information about each request. Here's the structure:

RequestWillBeSentEvent

This event is fired when a request is about to be sent:

    {
        'method': 'Network.requestWillBeSent',
        'params': {
            'requestId': 'unique-request-id',
            'loaderId': 'loader-id',
            'documentURL': 'https://example.com',
            'request': {
                'url': 'https://api.example.com/data',
                'method': 'GET',  # or 'POST', 'PUT', 'DELETE', etc.
                'headers': {
                    'User-Agent': 'Chrome/...',
                    'Accept': 'application/json',
                    ...
                },
                'postData': '...',  # Only present for POST/PUT requests
                'initialPriority': 'High',
                'referrerPolicy': 'strict-origin-when-cross-origin'
            },
            'timestamp': 1234567890.123,
            'wallTime': 1234567890.123,
            'initiator': {
                'type': 'script',  # or 'parser', 'other'
                'stack': {...}  # Call stack if initiated from script
            },
            'type': 'XHR',  # Resource type: Document, Script, Image, XHR, etc.
            'frameId': 'frame-id',
            'hasUserGesture': False
        }
    }

Key Fields Reference

Field     | Location               | Type  | Description
requestId | params.requestId       | str   | Unique identifier for this request
url       | params.request.url     | str   | Complete request URL
method    | params.request.method  | str   | HTTP method (GET, POST, etc.)
headers   | params.request.headers | dict  | Request headers
postData  | params.request.postData| str   | Request body (POST/PUT)
timestamp | params.timestamp       | float | Monotonic time when request started
type      | params.type            | str   | Resource type (Document, XHR, Image, etc.)
initiator | params.initiator       | dict  | What triggered this request
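The fields above can be pulled into a small typed record so the rest of your code does not have to index nested dictionaries. This is only a sketch: `RequestSummary` and `summarize` are hypothetical names, and the defaults chosen for missing fields are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSummary:
    request_id: str
    url: str
    method: str
    resource_type: str
    has_body: bool
    timestamp: Optional[float]


def summarize(log: dict) -> RequestSummary:
    """Flatten one requestWillBeSent log entry, tolerating missing fields."""
    params = log.get('params', {})
    request = params.get('request', {})
    return RequestSummary(
        request_id=params.get('requestId', ''),
        url=request.get('url', ''),
        method=request.get('method', ''),
        resource_type=params.get('type', 'Other'),
        has_body='postData' in request,  # postData is absent when there is no body
        timestamp=params.get('timestamp'),
    )
```

With logs from `get_network_logs()`, something like `[summarize(log) for log in logs]` would give a list that is easy to sort or filter by attribute.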

Getting Response Bodies

To get the actual response content, use get_network_response_body():

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def fetch_api_response():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            # Navigate to API endpoint
            await tab.go_to('https://httpbin.org/json')
            await asyncio.sleep(2)

            # Get all requests
            logs = await tab.get_network_logs()

            for log in logs:
                request_id = log['params']['requestId']
                url = log['params']['request']['url']

                # Only get response for JSON endpoint
                if 'httpbin.org/json' in url:
                    try:
                        # Get response body
                        response_body = await tab.get_network_response_body(request_id)
                        print(f"Response from {url}:")
                        print(response_body)
                    except Exception as e:
                        print(f"Could not get response body: {e}")

    asyncio.run(fetch_api_response())

Response Body Availability

Response bodies are only available for requests that have completed. Also, some response types (like images or redirects) may not have accessible bodies.
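Because a body may be empty or may not be JSON at all, it can help to parse it defensively. The helper below is a minimal sketch that assumes the body arrives as a string; `parse_json_body` is a hypothetical name, not a Pydoll function.

```python
import json
from typing import Any, Optional


def parse_json_body(body: Optional[str]) -> Any:
    """Return decoded JSON, or None when the body is empty or not valid JSON."""
    if not body:
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

Returning `None` instead of raising keeps a loop over many requests going when one response turns out to be HTML or an empty redirect.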

Practical Use Cases

1. API Testing and Validation

Monitor API calls to verify correct requests are being made:

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def validate_api_calls():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            # Navigate to your app
            await tab.go_to('https://your-app.com')

            # Trigger some action that makes API calls
            button = await tab.find(id='load-data-button')
            await button.click()
            await asyncio.sleep(2)

            # Get API logs
            api_logs = await tab.get_network_logs(filter='/api/')

            print(f"\n📊 API Calls Summary:")
            print(f"Total API calls: {len(api_logs)}")

            for log in api_logs:
                request = log['params']['request']
                method = request['method']
                url = request['url']

                # Check if correct auth header is present
                headers = request.get('headers', {})
                has_auth = 'Authorization' in headers or 'authorization' in headers

                print(f"\n{method} {url}")
                print(f"  ✓ Has Authorization: {has_auth}")

                # Validate POST data if applicable
                if method == 'POST' and 'postData' in request:
                    print(f"  📤 Body: {request['postData'][:100]}...")

    asyncio.run(validate_api_calls())

2. Performance Analysis

Analyze request timing and identify slow resources:

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def analyze_performance():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            await tab.go_to('https://example.com')
            await asyncio.sleep(5)

            logs = await tab.get_network_logs()

            # Store timing data
            timings = []

            for log in logs:
                params = log['params']
                url = params['request']['url']
                resource_type = params.get('type', 'Other')

                timings.append({
                    'url': url,
                    'type': resource_type,
                    'timestamp': params['timestamp']
                })

            # Sort by timestamp
            timings.sort(key=lambda x: x['timestamp'])

            print("\n⏱️ Request Timeline:")
            start_time = timings[0]['timestamp'] if timings else 0

            for timing in timings[:20]:  # Show first 20
                elapsed = (timing['timestamp'] - start_time) * 1000  # Convert to ms
                print(f"{elapsed:7.0f}ms | {timing['type']:12} | {timing['url'][:80]}")

    asyncio.run(analyze_performance())
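The timeline above shows when each request started, not how long it took. To rank slow resources, one option is to pair each `Network.requestWillBeSent` event with the `Network.loadingFinished` event that carries the same `requestId` and subtract the two monotonic timestamps. The sketch below assumes you have already collected both event lists (for example through callbacks); `request_durations` is a hypothetical helper.

```python
def request_durations(started: list[dict], finished: list[dict]) -> list[tuple[str, float]]:
    """Pair requestWillBeSent and loadingFinished events by requestId.

    Returns (url, duration_ms) tuples, slowest first.
    """
    end_times = {e['params']['requestId']: e['params']['timestamp'] for e in finished}
    durations = []
    for event in started:
        params = event['params']
        end = end_times.get(params['requestId'])
        if end is None:
            continue  # still in flight, failed, or never seen finishing
        # CDP timestamps are in seconds, so convert the difference to ms
        durations.append((params['request']['url'], (end - params['timestamp']) * 1000))
    return sorted(durations, key=lambda item: item[1], reverse=True)
```

Requests without a matching finish event are skipped rather than reported with a misleading duration.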

3. Detecting External Resources

Find all external domains your page connects to:

    import asyncio
    from urllib.parse import urlparse
    from collections import Counter
    from pydoll.browser.chromium import Chrome

    async def analyze_domains():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            await tab.go_to('https://news.ycombinator.com')
            await asyncio.sleep(5)

            logs = await tab.get_network_logs()

            # Count requests per domain
            domains = Counter()

            for log in logs:
                url = log['params']['request']['url']
                try:
                    domain = urlparse(url).netloc
                    if domain:
                        domains[domain] += 1
                except ValueError:
                    # urlparse raises ValueError for malformed URLs; skip them
                    pass

            print("\n🌐 External Domains:")
            for domain, count in domains.most_common(10):
                print(f"  {count:3} requests | {domain}")

    asyncio.run(analyze_domains())
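The example above counts every domain, including the page's own. If you only want third-party hosts, you can compare each request's hostname with the page's hostname. The sketch below does that with the standard library; `external_domains` is a hypothetical helper, and it treats any different hostname (including subdomains) as external, which may be stricter than you need.

```python
from collections import Counter
from urllib.parse import urlparse


def external_domains(page_url: str, request_urls: list[str]) -> Counter:
    """Count requests per hostname, skipping the page's own hostname."""
    page_host = urlparse(page_url).hostname
    counts = Counter()
    for url in request_urls:
        host = urlparse(url).hostname  # None for data: and blob: URLs
        if host and host != page_host:
            counts[host] += 1
    return counts
```

You could feed it the URLs from the logs, for instance `[log['params']['request']['url'] for log in logs]`.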

4. Monitoring Specific Resource Types

Track specific types of resources like images or scripts:

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def track_resource_types():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            await tab.go_to('https://example.com')
            await asyncio.sleep(3)

            logs = await tab.get_network_logs()

            # Group by resource type
            by_type = {}

            for log in logs:
                params = log['params']
                resource_type = params.get('type', 'Other')
                url = params['request']['url']

                if resource_type not in by_type:
                    by_type[resource_type] = []

                by_type[resource_type].append(url)

            print("\n📦 Resources by Type:")
            for rtype in sorted(by_type.keys()):
                urls = by_type[rtype]
                print(f"\n{rtype}: {len(urls)} resource(s)")
                for url in urls[:3]:  # Show first 3
                    print(f"  • {url}")
                if len(urls) > 3:
                    print(f"  ... and {len(urls) - 3} more")

    asyncio.run(track_resource_types())

Real-Time Network Monitoring

For real-time monitoring, use event callbacks instead of polling get_network_logs():

Understanding Events

Real-time monitoring uses Pydoll's event system to react to network activity as it happens. For a deep dive into how events work, see Event System.

    import asyncio
    from pydoll.browser.chromium import Chrome
    from pydoll.protocol.network.events import (
        NetworkEvent,
        RequestWillBeSentEvent,
        ResponseReceivedEvent,
        LoadingFailedEvent
    )

    async def real_time_monitoring():
        async with Chrome() as browser:
            tab = await browser.start()

            # Statistics
            stats = {
                'requests': 0,
                'responses': 0,
                'failed': 0
            }

            # Request callback
            async def on_request(event: RequestWillBeSentEvent):
                stats['requests'] += 1
                url = event['params']['request']['url']
                method = event['params']['request']['method']
                print(f"→ {method:6} | {url}")

            # Response callback
            async def on_response(event: ResponseReceivedEvent):
                stats['responses'] += 1
                response = event['params']['response']
                status = response['status']
                url = response['url']

                # Color code by status
                if 200 <= status < 300:
                    color = '\033[92m'  # Green
                elif 300 <= status < 400:
                    color = '\033[93m'  # Yellow
                else:
                    color = '\033[91m'  # Red
                reset = '\033[0m'

                print(f"← {color}{status}{reset} | {url}")

            # Failed callback
            async def on_failed(event: LoadingFailedEvent):
                stats['failed'] += 1
                error = event['params']['errorText']
                print(f"✗ FAILED: {error}")

            # Enable and register callbacks
            await tab.enable_network_events()
            await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, on_request)
            await tab.on(NetworkEvent.RESPONSE_RECEIVED, on_response)
            await tab.on(NetworkEvent.LOADING_FAILED, on_failed)

            # Navigate
            await tab.go_to('https://example.com')
            await asyncio.sleep(5)

            print(f"\n📊 Summary:")
            print(f"  Requests: {stats['requests']}")
            print(f"  Responses: {stats['responses']}")
            print(f"  Failed: {stats['failed']}")

    asyncio.run(real_time_monitoring())

Resource Types Reference

Pydoll captures the following resource types:

Type        | Description           | Examples
Document    | Main HTML documents   | Page loads, iframe sources
Stylesheet  | CSS files             | External .css, inline styles
Image       | Image resources       | .jpg, .png, .gif, .webp, .svg
Media       | Audio/video files     | .mp4, .webm, .mp3, .ogg
Font        | Web fonts             | .woff, .woff2, .ttf, .otf
Script      | JavaScript files      | .js files, inline scripts
TextTrack   | Subtitle files        | .vtt, .srt
XHR         | XMLHttpRequest        | AJAX requests, legacy API calls
Fetch       | Fetch API requests    | Modern API calls
EventSource | Server-Sent Events    | Real-time streams
WebSocket   | WebSocket connections | Bidirectional communication
Manifest    | Web app manifests     | PWA configuration
Other       | Other resource types  | Miscellaneous

Advanced: Extracting Response Timing

Network events include detailed timing information:

    import asyncio
    from pydoll.browser.chromium import Chrome
    from pydoll.protocol.network.events import NetworkEvent, ResponseReceivedEvent

    async def analyze_timing():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()

            # Custom callback to capture timing
            timing_data = []

            async def on_response(event: ResponseReceivedEvent):
                response = event['params']['response']
                timing = response.get('timing')

                if timing:
                    # Calculate different phases
                    dns_time = timing.get('dnsEnd', 0) - timing.get('dnsStart', 0)
                    connect_time = timing.get('connectEnd', 0) - timing.get('connectStart', 0)
                    ssl_time = timing.get('sslEnd', 0) - timing.get('sslStart', 0)
                    send_time = timing.get('sendEnd', 0) - timing.get('sendStart', 0)
                    wait_time = timing.get('receiveHeadersStart', 0) - timing.get('sendEnd', 0)
                    receive_time = timing.get('receiveHeadersEnd', 0) - timing.get('receiveHeadersStart', 0)

                    timing_data.append({
                        'url': response['url'][:50],
                        'dns': dns_time if dns_time > 0 else 0,
                        'connect': connect_time if connect_time > 0 else 0,
                        'ssl': ssl_time if ssl_time > 0 else 0,
                        'send': send_time,
                        'wait': wait_time,
                        'receive': receive_time,
                        'total': receive_time + wait_time + send_time
                    })

            await tab.on(NetworkEvent.RESPONSE_RECEIVED, on_response)
            await tab.go_to('https://github.com')
            await asyncio.sleep(5)

            # Print timing breakdown
            print("\n⏱️ Request Timing Breakdown (ms):")
            print(f"{'URL':<50} | {'DNS':>6} | {'Connect':>8} | {'SSL':>6} | {'Send':>6} | {'Wait':>6} | {'Receive':>8} | {'Total':>7}")
            print("-" * 120)

            for data in sorted(timing_data, key=lambda x: x['total'], reverse=True)[:10]:
                print(f"{data['url']:<50} | {data['dns']:6.1f} | {data['connect']:8.1f} | {data['ssl']:6.1f} | "
                      f"{data['send']:6.1f} | {data['wait']:6.1f} | {data['receive']:8.1f} | {data['total']:7.1f}")

    asyncio.run(analyze_timing())

Timing Fields Explanation

Phase   | Fields                                 | Description
DNS     | dnsStart, dnsEnd                       | DNS lookup time
Connect | connectStart, connectEnd               | TCP connection establishment
SSL     | sslStart, sslEnd                       | SSL/TLS handshake
Send    | sendStart, sendEnd                     | Time to send request
Wait    | sendEnd, receiveHeadersStart           | Waiting for server response (TTFB)
Receive | receiveHeadersStart, receiveHeadersEnd | Time to receive response headers

Time to First Byte (TTFB)

TTFB is the "Wait" phase - the time between sending the request and receiving the first byte of the response. This is crucial for performance analysis.
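The phase arithmetic from the table can be wrapped in one function that works on a plain timing dictionary. The sketch below assumes the timing values are in milliseconds and that a phase that never happened (for example DNS on a reused connection) is reported as a negative value such as -1, which is what Chrome typically does. `timing_phases` is a hypothetical name.

```python
def timing_phases(timing: dict) -> dict:
    """Compute phase durations in ms from a ResourceTiming-style dict."""

    def span(start_key: str, end_key: str) -> float:
        start = timing.get(start_key, -1)
        end = timing.get(end_key, -1)
        # A negative value means the phase did not happen (e.g. a reused connection)
        if start < 0 or end < 0:
            return 0.0
        return max(end - start, 0.0)

    return {
        'dns': span('dnsStart', 'dnsEnd'),
        'connect': span('connectStart', 'connectEnd'),
        'ssl': span('sslStart', 'sslEnd'),
        'send': span('sendStart', 'sendEnd'),
        'wait': span('sendEnd', 'receiveHeadersStart'),  # TTFB
        'receive': span('receiveHeadersStart', 'receiveHeadersEnd'),
    }
```

Inside a response callback you could call `timing_phases(response['timing'])` when the `timing` key is present, and read `['wait']` for the TTFB.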

Best Practices

1. Enable Network Events Only When Needed

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def best_practice_enable():
        async with Chrome() as browser:
            tab = await browser.start()

            # ✅ Good: Enable before navigation, disable after
            await tab.enable_network_events()
            await tab.go_to('https://example.com')
            await asyncio.sleep(2)
            logs = await tab.get_network_logs()
            await tab.disable_network_events()

            # ❌ Bad: Leaving it enabled throughout entire session
            # await tab.enable_network_events()
            # ... long automation session ...

2. Filter Logs to Reduce Memory Usage

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def best_practice_filter():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()
            await tab.go_to('https://example.com')
            await asyncio.sleep(2)

            # ✅ Good: Filter for specific requests
            api_logs = await tab.get_network_logs(filter='/api/')

            # ❌ Bad: Getting all logs when you only need specific ones
            all_logs = await tab.get_network_logs()
            filtered = [log for log in all_logs if '/api/' in log['params']['request']['url']]

3. Handle Missing Fields Safely

    import asyncio
    from pydoll.browser.chromium import Chrome

    async def best_practice_safe_access():
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.enable_network_events()
            await tab.go_to('https://example.com')
            await asyncio.sleep(2)

            logs = await tab.get_network_logs()

            # ✅ Good: Safe access with .get()
            for log in logs:
                params = log.get('params', {})
                request = params.get('request', {})
                url = request.get('url', 'Unknown')
                post_data = request.get('postData')  # May be None

                if post_data:
                    print(f"POST data: {post_data}")

            # ❌ Bad: Direct access can raise KeyError
            # url = log['params']['request']['url']
            # post_data = log['params']['request']['postData']  # May not exist!

4. Use Event Callbacks for Real-Time Needs

    import asyncio
    from pydoll.protocol.network.events import NetworkEvent, RequestWillBeSentEvent

    # Fragment: assumes `tab` is an existing tab with network events enabled,
    # and that this code runs inside an async function.

    # ✅ Good: Real-time monitoring with callbacks
    async def on_request(event: RequestWillBeSentEvent):
        print(f"New request: {event['params']['request']['url']}")

    await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, on_request)

    # ❌ Bad: Polling logs repeatedly (inefficient)
    while True:
        logs = await tab.get_network_logs()
        # Process logs...
        await asyncio.sleep(0.5)  # Wasteful!

See Also