Distributed Tracing with OpenTelemetry

This document covers the distributed tracing capabilities in the Selenium Grid Docker deployment, implemented using OpenTelemetry (OTEL) instrumentation. It explains how to enable tracing across Grid components, configure OTLP exporters, and integrate with tracing backends like Jaeger for observability and debugging of test execution flows.

For information about monitoring and metrics collection, see Observability and Monitoring. For video recording and debugging capabilities, see Remote Access and Debugging.

Overview of Distributed Tracing Architecture

Selenium Grid implements distributed tracing through OpenTelemetry instrumentation built into the Selenium server JAR. When tracing is enabled, each Grid component emits trace spans that follow a request across the distributed system, giving visibility into session creation, node selection, and test execution.

[Diagram: Distributed Tracing Flow]

Sources: docker-compose-v3-tracing.yml:1-58, docker-compose-v3-full-grid-tracing.yml:1-108, charts/selenium-grid/templates/_helpers.tpl:181-183

Environment Variable Configuration

Tracing is controlled through a set of environment variables that must be configured consistently across all Grid components. The primary configuration variables enable tracing and specify the export destination.

| Variable | Purpose | Default | Example |
| --- | --- | --- | --- |
| SE_ENABLE_TRACING | Enable/disable tracing | false | true |
| SE_OTEL_TRACES_EXPORTER | Trace exporter type | jaeger | otlp |
| SE_OTEL_EXPORTER_ENDPOINT | OTLP endpoint URL | - | http://jaeger:4317 |
| SE_OTEL_SERVICE_NAME | Service name in traces | Component name | selenium-hub |

The tracing configuration is applied uniformly across all components to ensure trace continuity. Additional OpenTelemetry configuration can be provided through standard OTEL environment variables.
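
As a minimal sketch of how these variables fit together, the fragment below sets them on one Compose service using the example values from the table. The service name `selenium-hub` and the `jaeger` hostname are illustrative and assume a tracing backend is reachable under that name on the same network.

```yaml
# Fragment of a Compose service definition, not a complete file.
services:
  selenium-hub:
    environment:
      - SE_ENABLE_TRACING=true                        # turn tracing on
      - SE_OTEL_TRACES_EXPORTER=otlp                  # export over OTLP
      - SE_OTEL_EXPORTER_ENDPOINT=http://jaeger:4317  # assumed backend address
      - SE_OTEL_SERVICE_NAME=selenium-hub             # name shown in the trace UI
```

Repeating the first three lines on every other Grid component keeps the export destination consistent, which is what the paragraph above asks for.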

Sources: docker-compose-v3-tracing.yml:18-20, docker-compose-v3-tracing.yml:55-57, charts/selenium-grid/templates/_helpers.tpl:181-183

Docker Compose Tracing Setup

The project provides pre-configured Docker Compose files that demonstrate tracing integration with Jaeger as the tracing backend. These configurations show both Hub-Node and full distributed Grid deployments with tracing enabled.

Basic Hub-Node Tracing Configuration

[Diagram: Hub-Node Tracing Architecture]

The basic tracing setup uses docker-compose-v3-tracing.yml which includes:

  • Jaeger all-in-one container exposing its UI on port 16686 and its OTLP receiver on port 4317
  • Hub component with tracing enabled and OTLP endpoint configured
  • All browser nodes configured to send traces to the same Jaeger instance
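
The three pieces listed above can be sketched as a single Compose file. This is an illustrative reduction and not a copy of docker-compose-v3-tracing.yml: the `latest` image tags, the single Chrome node, and the event bus ports are assumptions, and the project file pins its own versions and includes more browsers.

```yaml
# Sketch of a Hub-Node deployment with tracing; image tags are assumptions.
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP receiver
    # Older Jaeger releases may also need COLLECTOR_OTLP_ENABLED=true.

  selenium-hub:
    image: selenium/hub:latest
    depends_on:
      - jaeger
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    environment:
      - SE_ENABLE_TRACING=true
      - SE_OTEL_TRACES_EXPORTER=otlp
      - SE_OTEL_EXPORTER_ENDPOINT=http://jaeger:4317

  chrome:
    image: selenium/node-chrome:latest
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      # Same tracing settings as the Hub so spans land in one backend.
      - SE_ENABLE_TRACING=true
      - SE_OTEL_TRACES_EXPORTER=otlp
      - SE_OTEL_EXPORTER_ENDPOINT=http://jaeger:4317
```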

Sources: docker-compose-v3-tracing.yml:5-21, docker-compose-v3-tracing.yml:45-57

Full Distributed Grid Tracing

The distributed Grid tracing configuration in docker-compose-v3-full-grid-tracing.yml enables tracing across all microservice components:

[Diagram: Distributed Grid Tracing Architecture]

Each component in the distributed Grid is configured with identical tracing settings, ensuring complete trace coverage across the request flow from Router through Distributor to the executing Node.
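
One way to keep the settings identical across many services is a YAML anchor in a Compose extension field. The sketch below illustrates that pattern only; it is not claimed to be how docker-compose-v3-full-grid-tracing.yml is written. The `x-tracing-env` name and the `latest` tags are hypothetical, and the component wiring (event bus, session map, and queue hosts and ports) is left out.

```yaml
# Sketch only: shared tracing settings defined once and merged per service.
x-tracing-env: &tracing-env
  SE_ENABLE_TRACING: "true"
  SE_OTEL_TRACES_EXPORTER: otlp
  SE_OTEL_EXPORTER_ENDPOINT: http://jaeger:4317

services:
  selenium-router:
    image: selenium/router:latest
    environment:
      <<: *tracing-env
      # Component-specific variables would be added here.

  selenium-distributor:
    image: selenium/distributor:latest
    environment:
      <<: *tracing-env

  chrome:
    image: selenium/node-chrome:latest
    environment:
      <<: *tracing-env
```

With this layout, changing the exporter endpoint in one place updates every component, which reduces the risk of a service silently exporting to a different backend.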

Sources: docker-compose-v3-full-grid-tracing.yml:10-22, docker-compose-v3-full-grid-tracing.yml:44-59, docker-compose-v3-full-grid-tracing.yml:73-75

Kubernetes Helm Chart Tracing Configuration

The Helm chart configures tracing through the tracing section in values.yaml. Tracing can be enabled either against an external endpoint or by deploying Jaeger as a chart dependency.

Tracing Configuration Structure

[Diagram: Helm Chart Tracing Configuration Flow]

The chart uses the seleniumGrid.enableTracing template helper to decide whether tracing is enabled. The helper checks tracing.enabled and tracing.enabledWithExistingEndpoint, so tracing is active when either value is true.

Sources: charts/selenium-grid/templates/_helpers.tpl:181-183, charts/selenium-grid/Chart.yaml:17-20

Jaeger Integration

The Helm chart can deploy Jaeger automatically through a chart dependency when tracing.enabled is true. The dependency is conditional, so it is installed only when the tracing configuration enables it.
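
A conditional Helm dependency of this kind typically looks like the sketch below. It is a reconstruction under assumptions, not a copy of the chart's Chart.yaml: the repository URL is the public Jaeger Helm repository, and the version is deliberately left as a placeholder because the pinned value is not stated here.

```yaml
# Sketch of a conditional dependency entry in Chart.yaml.
dependencies:
  - name: jaeger
    repository: https://jaegertracing.github.io/helm-charts  # assumed repository
    version: "<pinned-version>"                              # placeholder, see the chart
    condition: tracing.enabled                               # installed only when true
```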

When using an existing Jaeger deployment, set tracing.enabledWithExistingEndpoint: true and configure the appropriate endpoint URL.
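
A values override for that case might look like the following sketch. The `exporter` and `exporterEndpoint` key names and the collector address are assumptions; confirm both against the chart's values.yaml and your own backend before use.

```yaml
# Sketch of a values override for an existing tracing backend.
tracing:
  enabled: false                     # do not deploy the bundled Jaeger
  enabledWithExistingEndpoint: true  # still turn tracing on in the Grid
  exporter: otlp                     # assumed key name
  exporterEndpoint: http://jaeger-collector.observability:4317  # hypothetical address
```

Setting tracing.enabled to true instead would switch to the bundled Jaeger deployment described above.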

Sources: charts/selenium-grid/Chart.yaml:17-20

Custom Labels and Resource Attributes

The Helm chart supports custom labels, configured through the customLabels setting. The labels are automatically converted to OpenTelemetry resource attributes and appear in trace metadata, giving each trace additional context.
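
As an illustration, a values override with custom labels could look like this. The sketch assumes customLabels is a top-level map of string keys to string values, and the label names and values shown are hypothetical.

```yaml
# Sketch of a values override; label keys and values are hypothetical.
customLabels:
  team: qa-platform
  environment: staging
```

Labels like these make it possible to filter traces by team or environment in the tracing backend, provided the backend indexes resource attributes.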

Sources: charts/selenium-grid/CHANGELOG.md:130

Trace Visualization and Analysis

Once tracing is enabled and configured, traces can be viewed through the Jaeger UI or other OTLP-compatible backends. The traces provide detailed visibility into the Selenium Grid request lifecycle.

Typical Trace Structure

[Diagram: Selenium Grid Trace Span Structure]

Traces capture the complete request flow from initial client request through final browser command execution, with each component contributing spans that include relevant metadata and timing information.

Accessing Jaeger UI

When using the provided Docker Compose configurations, the Jaeger UI is accessible at http://localhost:16686. The UI provides:

  • Trace search and filtering capabilities
  • Service dependency graphs
  • Performance analysis and latency breakdown
  • Error tracking and debugging information

Sources: docker-compose-v3-tracing.yml:8, docker-compose-v3-full-grid-tracing.yml:8

Troubleshooting Tracing Issues

Common tracing configuration issues and their solutions:

Missing Traces

  • Verify SE_ENABLE_TRACING=true is set on all components
  • Check that SE_OTEL_EXPORTER_ENDPOINT points to the correct backend
  • Ensure the tracing backend is accessible from Grid components

Incomplete Trace Spans

  • Confirm all Grid components have consistent tracing configuration
  • Verify network connectivity between components and tracing backend
  • Check for component startup order dependencies

Performance Impact

  • Tracing adds minimal overhead, but it can be disabled in production if needed by setting SE_ENABLE_TRACING=false
  • Use sampling configuration to reduce trace volume in high-traffic environments
  • Monitor backend storage capacity for trace data retention
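
For the sampling point above, one option is the standard OpenTelemetry SDK sampler variables. The sketch assumes that the Grid's embedded OpenTelemetry SDK reads these variables from the container environment; the source only states that standard OTEL variables can be supplied, so verify the effect in your backend. The 10% ratio is an arbitrary example.

```yaml
# Sketch: ratio-based sampling on one component (repeat on the others).
services:
  selenium-hub:
    environment:
      - SE_ENABLE_TRACING=true
      # Standard OpenTelemetry SDK variables, assumed to be honoured by the Grid.
      - OTEL_TRACES_SAMPLER=parentbased_traceidratio
      - OTEL_TRACES_SAMPLER_ARG=0.1  # sample roughly 10% of new traces
```

A parent-based sampler keeps the decision made at the start of a trace, so a sampled request should stay sampled as it passes through downstream components.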

Sources: tests/charts/templates/test.py:306-314