Muhammad Syukur Abadi

Application Tracing With Golang, OpenTelemetry, and Grafana Tempo

Trace

A trace is a record, or history, of a request. It gives us the big picture of what happens from the moment an application accepts a request until it returns a response. A trace is built from spans.

Span

A span is a unit of work that records a particular operation within a certain time window. A span contains the span name, timestamps for when the operation starts and ends, and other information in the form of span attributes such as the HTTP method, URL, and route. Below is an example of a hello span in JSON format.

{ "name": "hello", "context": { "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2", "span_id": "051581bf3cb55c13" }, "parent_id": null, "start_time": "2022-04-29T18:52:58.114201Z", "end_time": "2022-04-29T18:52:58.114687Z", "attributes": { "http.route": "some_route1" }, "events": [ { "name": "Guten Tag!", "timestamp": "2022-04-29T18:52:58.114561Z", "attributes": { "event_attributes": 1 } } ] } 

The span above contains the following information:

  • a name, hello
  • the span's lifetime, given by the start_time and end_time fields
  • an http.route span attribute that records route information
  • an events field that tells us what happened during the span's lifetime

The hello span is a root span, because the value of its parent_id field is null. Now pay attention to the hello-greetings span below.

{ "name": "hello-greetings", "context": { "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2", "span_id": "5fb397be34d26b51" }, "parent_id": "051581bf3cb55c13", "start_time": "2022-04-29T18:52:58.114304Z", "end_time": "2022-04-29T22:52:58.114561Z", "attributes": { "http.route": "some_route2" }, "events": [ { "name": "hey there!", "timestamp": "2022-04-29T18:52:58.114561Z", "attributes": { "event_attributes": 1 } }, { "name": "bye now!", "timestamp": "2022-04-29T18:52:58.114585Z", "attributes": { "event_attributes": 1 } } ] } 

The value of the parent_id field on the hello-greetings span is the same as the span_id of the hello span, so we call hello-greetings a child span of hello. Below is the hello-salutations span, whose parent_id is also equal to hello's span_id.

{ "name": "hello-salutations", "context": { "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2", "span_id": "93564f51e1abe1c2" }, "parent_id": "051581bf3cb55c13", "start_time": "2022-04-29T18:52:58.114492Z", "end_time": "2022-04-29T18:52:58.114631Z", "attributes": { "http.route": "some_route3" }, "events": [ { "name": "hey there!", "timestamp": "2022-04-29T18:52:58.114561Z", "attributes": { "event_attributes": 1 } } ] } 
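For Go readers, here is a minimal sketch of how such a parent/child relationship is produced with the OpenTelemetry SDK (the function and the tracer label "demo" are illustrative, not from the article's repository): starting a span from a context that already carries an active span makes the new span its child.

package example

import (
	"context"

	"go.opentelemetry.io/otel"
)

func hello(ctx context.Context) {
	tracer := otel.Tracer("demo")

	// "hello" becomes the root span because ctx carries no active span yet.
	ctx, parent := tracer.Start(ctx, "hello")
	defer parent.End()

	// "hello-greetings" is started from the context returned above,
	// so its parent_id is set to hello's span_id.
	_, child := tracer.Start(ctx, "hello-greetings")
	defer child.End()
}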

OpenTelemetry Collector

A collector is a component that receives, processes, and exports telemetry data. The OpenTelemetry Collector can receive, process, and export telemetry data in various formats, integrating with tools such as Jaeger, Prometheus, or Fluent Bit. The OpenTelemetry Collector is built from several components:

  1. Receiver

A receiver focuses on collecting data from various sources and formats. Here is an example receiver configuration.

receivers:
  # Data sources: logs
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Data sources: metrics
  hostmetrics:
    scrapers:
      cpu:
      disk:
      filesystem:
      load:
      memory:
      network:
      process:
      processes:
      paging:

  # Data sources: traces
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      thrift_binary:
      thrift_compact:
      thrift_http:

  # Data sources: traces, metrics, logs
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics
  opencensus:

  # Data sources: traces, metrics, logs
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: cert.pem
          key_file: cert-key.pem
      http:
        endpoint: 0.0.0.0:4318

  # Data sources: metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 5s
          static_configs:
            - targets: [localhost:8888]

  # Data sources: traces
  zipkin:
  2. Processor

A processor is used to process data coming from receivers. Processing can consist of filtering, renaming, or recalculating telemetry data. We can define not just one processor but several, depending on our needs. The resulting telemetry data depends on the order in which the processors are applied (see the ordering sketch after the example below). Here is an example processor definition.

processors:
  # Data sources: traces
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
      - key: db.statement
        action: delete
      - key: email
        action: hash

  # Data sources: traces, metrics, logs
  batch:

  # Data sources: traces, metrics, logs
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["container.name"] == "app_container_1"'
        - 'resource.attributes["host.name"] == "localhost"'
        - 'name == "app_3"'
      spanevent:
        - 'attributes["grpc"] == true'
        - 'IsMatch(name, ".*grpc.*")'
    metrics:
      metric:
        - 'name == "my.metric" and resource.attributes["my_label"] == "abc123"'
        - 'type == METRIC_DATA_TYPE_HISTOGRAM'
      datapoint:
        - 'metric.type == METRIC_DATA_TYPE_SUMMARY'
        - 'resource.attributes["service.name"] == "my_service_name"'
    logs:
      log_record:
        - 'IsMatch(body, ".*password.*")'
        - 'severity_number < SEVERITY_NUMBER_WARN'

  # Data sources: traces, metrics, logs
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000
    spike_limit_mib: 500

  # Data sources: traces
  resource:
    attributes:
      - key: cloud.zone
        value: zone-1
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
      - key: redundant-attribute
        action: delete

  # Data sources: traces
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 15

  # Data sources: traces
  span:
    name:
      to_attributes:
        rules:
          - ^\/api\/v1\/document\/(?P<documentId>.*)\/update$
      from_attributes: [db.svc, operation]
      separator: '::'
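Because the ordering matters, the place where it is actually declared is the pipeline definition in the service section. A hedged sketch (the processor names are taken from the example above; which ones you enable is up to you):

service:
  pipelines:
    traces:
      receivers: [otlp]
      # processors run in the order listed here:
      # memory_limiter first, then attribute edits, then batching
      processors: [memory_limiter, attributes, batch]
      exporters: [otlp]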
  3. Exporter

An exporter is a component that sends processed telemetry data to backends such as Jaeger, Zipkin, Prometheus, or Kafka. Each key under exporters uses a type/name format, where type defines the kind of exporter (for example Jaeger, Kafka, or OTLP) and name is an optional suffix that separates exporters of the same type. Here is an example exporter configuration.

exporters:
  # Data sources: traces, metrics, logs
  file:
    path: ./filename.json

  # Data sources: traces
  otlp/jaeger:
    endpoint: jaeger-server:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics, logs
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics, logs
  # NOTE: Prior to v0.86.0 use `logging` instead of `debug`
  debug:
    verbosity: detailed

  # Data sources: traces, metrics
  opencensus:
    endpoint: otelcol2:55678

  # Data sources: traces, metrics, logs
  otlp:
    endpoint: otelcol2:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics
  otlphttp:
    endpoint: https://otlp.example.com:4318

  # Data sources: metrics
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: default

  # Data sources: metrics
  prometheusremotewrite:
    endpoint: http://prometheus.example.com:9411/api/prom/push
    # When using the official Prometheus (running via Docker)
    # endpoint: 'http://prometheus:9090/api/v1/write', add:
    # tls:
    #   insecure: true

  # Data sources: traces
  zipkin:
    endpoint: http://zipkin.example.com:9411/api/v2/spans
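To make the type/name convention concrete, here is a small hedged sketch with two exporters of the same otlp type, distinguished only by their names (the endpoints are illustrative):

exporters:
  otlp/tempo:
    endpoint: tempo:4317
  otlp/jaeger:
    endpoint: jaeger-server:4317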

Instrumenting Go Application With OpenTelemetry Tracing

Spin Up Collector

Trace and span data emitted by the server will be collected, processed, and exported by the OpenTelemetry Collector. The collector's behavior is defined by a configuration file, as shown below.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlphttp/trace:
    endpoint: http://tempo:4318
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/trace]

The OpenTelemetry Collector will collect all emitted traces and spans on the address given in the receivers.otlp.protocols.http.endpoint field. Its value is 0.0.0.0:4318, where 0.0.0.0 means the collector listens on all IPv4 addresses.

After traces and spans have been collected by the receiver, they are handled by the processors. In this example, the data is processed by the batch processor.
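The bare batch: entry uses the processor's defaults. A hedged sketch of how it could be tuned (these option names come from the collector's batch processor; the values here are arbitrary):

processors:
  batch:
    timeout: 5s            # flush at least every 5 seconds
    send_batch_size: 1024  # or as soon as 1024 items are queued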

The processed data is then exported to a trace backend such as Grafana Tempo. Exporters are defined in the exporters section.

Finally, we wire receiving, processing, and exporting of trace data together in the service section.

Defining Exporter

At the code level, the exporter's job is to connect the application with the collector. The NewTracer function below creates the exporter and registers a tracer provider that sends trace data to the collector.

package trace

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	traceSdk "go.opentelemetry.io/otel/sdk/trace"
)

var globalServiceName string

func NewTracer(
	ctx context.Context,
	serviceName string,
) error {
	propagator := propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	)
	otel.SetTextMapPropagator(propagator)

	// Create OpenTelemetry tracing client that exports
	// trace data to the OpenTelemetry HTTP receiver
	traceClientHTTP := otlptracehttp.NewClient(
		otlptracehttp.WithInsecure(),
	)
	traceExporter, err := otlptrace.New(
		ctx,
		traceClientHTTP,
	)
	if err != nil {
		return err
	}

	res, err := resource.New(
		ctx,
		resource.WithContainer(),
		resource.WithFromEnv(),
	)
	if err != nil {
		return err
	}

	tracerProvider := traceSdk.NewTracerProvider(
		traceSdk.WithBatcher(
			traceExporter,
			traceSdk.WithBatchTimeout(time.Second),
		),
		traceSdk.WithResource(
			res,
		),
	)

	globalServiceName = serviceName
	otel.SetTracerProvider(tracerProvider)

	return nil
}
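NewTracer does not expose a way to shut the provider down, which means spans still buffered by the batcher can be lost when the process exits. A hedged sketch of a flush-on-exit helper (ShutdownTracer is hypothetical and assumes NewTracer is adjusted to also return the *traceSdk.TracerProvider):

package trace

import (
	"context"
	"log"
	"time"

	traceSdk "go.opentelemetry.io/otel/sdk/trace"
)

// ShutdownTracer flushes any spans still buffered by the batcher and
// releases the provider's resources. Call it before the process exits.
func ShutdownTracer(ctx context.Context, tp *traceSdk.TracerProvider) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	if err := tp.Shutdown(ctx); err != nil {
		log.Printf("tracer shutdown: %v", err)
	}
}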

Define Tracing Components

In this article, we are going to record events at the HTTP request level and at the business logic/service level by generating a span in each component. HTTPLevelSpan traces the HTTP handler, while ServiceLevelSpan traces the service layer.

package trace

import (
	"context"
	"fmt"
	"net/http"
	"runtime"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

func HTTPLevelSpan(
	ctx context.Context,
	r *http.Request,
) (context.Context, trace.Span) {
	spanName := fmt.Sprintf("%s %s", r.Method, r.URL.String())
	ctx, span := otel.Tracer(globalServiceName).Start(ctx, spanName)
	if span.IsRecording() {
		span.SetAttributes(attribute.String("http.request.method", r.Method))
		span.SetAttributes(attribute.String("server.address", r.RemoteAddr))
		span.SetAttributes(attribute.String("url.full", r.URL.String()))
	}

	return ctx, span
}

func ServiceLevelSpan(
	ctx context.Context,
) (context.Context, trace.Span) {
	pc := make([]uintptr, 10)
	runtime.Callers(2, pc)
	f := runtime.FuncForPC(pc[0])

	ctx, span := otel.Tracer(globalServiceName).Start(ctx, f.Name())
	if span.IsRecording() {
		span.SetAttributes(attribute.String("code.function.name", f.Name()))
	}

	return ctx, span
}

When tracing any aspect of an application, such as HTTP, service, gRPC, or database calls, it is recommended to follow the Semantic Conventions (semconv). Semantic conventions give generated traces and spans meaningful, consistent attribute names. In the code above, code.function.name, http.request.method, server.address, and url.full are examples of semantic convention attributes.

Instrumenting Application

The code below demonstrates the usage of HTTPLevelSpan and ServiceLevelSpan.

package server

import (
	"log"
	"net/http"
)

func RunServer() {
	http.HandleFunc("/", home())

	if err := http.ListenAndServe(":9320", nil); err != nil {
		log.Fatalf("could not start server: %v\n", err)
	}
}
package server

import (
	"net/http"
	"strconv"

	"go-trace-demo/service"
	utilsTrace "go-trace-demo/utils/trace"
)

func home() http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Start the HTTP-level span for this request.
		ctx := r.Context()
		ctx, span := utilsTrace.HTTPLevelSpan(ctx, r)
		defer span.End()

		// Optional failure branch, triggered by the ?fail query parameter.
		failQuery := r.URL.Query().Get("fail")
		if failQuery != "" {
			isFail, err := strconv.ParseBool(failQuery)
			if err != nil {
				w.WriteHeader(http.StatusInternalServerError)
				w.Write([]byte("error parse query"))
				return
			}

			if isFail {
				w.WriteHeader(http.StatusInternalServerError)
				w.Write([]byte("encounter fail"))
				return
			}
		}

		// Call the traced service functions with the span's context.
		err := service.GetUserByID(ctx)
		if err != nil {
			return
		}

		err = service.ListStores(ctx)
		if err != nil {
			return
		}

		w.Write([]byte("Hello, World!"))
	}
}
package service

import (
	"context"

	utilsTrace "go-trace-demo/utils/trace"
)

func GetUserByID(
	ctx context.Context,
) error {
	ctx, span := utilsTrace.ServiceLevelSpan(ctx)
	defer span.End()

	return nil
}

func ListStores(
	ctx context.Context,
) error {
	ctx, span := utilsTrace.ServiceLevelSpan(ctx)
	defer span.End()

	return nil
}
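The service functions above only start and end spans. If you also want the events shown in the hello span JSON earlier, the span API lets you attach them. A small hedged sketch (GetUserWithEvent is a hypothetical variant, not part of the demo repository):

package service

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"

	utilsTrace "go-trace-demo/utils/trace"
)

// GetUserWithEvent records a span event, similar to the "Guten Tag!"
// event shown in the hello span JSON above.
func GetUserWithEvent(ctx context.Context) error {
	_, span := utilsTrace.ServiceLevelSpan(ctx)
	defer span.End()

	span.AddEvent("user fetched",
		trace.WithAttributes(attribute.Int("event_attributes", 1)),
	)

	return nil
}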

Below is the main function.

package main

import (
	"context"
	"log"
	"os"

	"go-trace-demo/server"
	utilsTrace "go-trace-demo/utils/trace"
)

func main() {
	ctx := context.Background()

	serviceName := os.Getenv("OTEL_SERVICE_NAME")
	err := utilsTrace.NewTracer(ctx, serviceName)
	if err != nil {
		log.Fatalln("fail setup tracer")
	}

	server.RunServer()
}

Glue Everything

We'll run the application, the OpenTelemetry Collector, Grafana Tempo, and Grafana using the Docker Compose file below.

version: "3" networks: observability: services: tempo: image: grafana/tempo:main-c5323bf command: [ "-config.file=/etc/tempo.yaml" ] volumes: - ./config/tempo-config.yaml:/etc/tempo.yaml - ./config/tempo-data:/var/tempo ports: - "3200:3200" # tempo - "4317" # open this port for OpenTelemetry gRPC exporter - "4318" # open this port for OpenTelemetry HTTP exporter networks: - observability otel-collector: image: otel/opentelemetry-collector:latest volumes: - ./config/otel-config.yaml:/etc/otel/config.yaml command: - '--config=/etc/otel/config.yaml' ports: - "4317:4317" #grpc - "4318:4318" #http depends_on: - loki networks: - observability app: container_name: app build: context: ./ dockerfile: ./Dockerfile ports: - 9320:9320 networks: - observability depends_on: - otel-collector environment: - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 - OTEL_SERVICE_NAME=go-loki-app-demo grafana: environment: - GF_PATHS_PROVISIONING=/etc/grafana/provisioning - GF_AUTH_ANONYMOUS_ENABLED=true - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin entrypoint: - sh - -euc - | mkdir -p /etc/grafana/provisioning/datasources cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml apiVersion: 1 datasources: - name: Loki type: loki access: proxy  orgId: 1 url: http://loki:3100 basicAuth: false isDefault: true version: 1 editable: false - name: Tempo type: tempo access: proxy  orgId: 1 url: http://tempo:3200 basicAuth: false version: 1 editable: false EOF /run.sh image: grafana/grafana:latest ports: - "3000:3000" networks: - observability 

Our application is defined in the services.app section. Note that we define the OTEL_EXPORTER_OTLP_ENDPOINT environment variable for the application. The reason is that when we defined the OpenTelemetry HTTP exporter in code, we did not specify the OpenTelemetry Collector address; OpenTelemetry looks for the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to know where the application should send trace data. If you want to pin a specific endpoint in code, you can pass the WithEndpointURL option to otlptracehttp.NewClient. Additionally, we define the OTEL_SERVICE_NAME environment variable so OpenTelemetry can identify which service the trace data comes from. If OTEL_SERVICE_NAME is not provided, OpenTelemetry falls back to an unknown_service value.
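For completeness, a hedged sketch of pinning the endpoint in code instead of relying on OTEL_EXPORTER_OTLP_ENDPOINT (it assumes an otlptracehttp version that provides WithEndpointURL; the helper name and URL are illustrative):

package trace

import (
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
)

// newClientWithFixedEndpoint is a hypothetical variant of the client setup in
// NewTracer: it pins the collector address in code instead of reading
// OTEL_EXPORTER_OTLP_ENDPOINT.
func newClientWithFixedEndpoint() otlptrace.Client {
	return otlptracehttp.NewClient(
		otlptracehttp.WithEndpointURL("http://otel-collector:4318/v1/traces"),
	)
}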

Visualizing Traces Using Grafana Tempo

After the containers have been created with Docker Compose, let's generate trace data by accessing localhost:9320/ (add ?fail=true if you want to exercise the failure branch of the handler). Next, open Grafana at localhost:3000, then head to the Explore menu via the dropdown in the top-left corner.

Switch the query type to Search instead of TraceQL, then click the Service Name dropdown. It will show the service name we provided in OTEL_SERVICE_NAME. After clicking Run Query, we'll get a list of matching traces.
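If you prefer to stay on the TraceQL tab, a roughly equivalent query (hedged sketch; the service name matches the OTEL_SERVICE_NAME value from the compose file) is:

{ resource.service.name = "go-loki-app-demo" }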

Explore the trace data by clicking one of the rows in the Trace column. We will be redirected to a trace detail view.

Additionally, explore the span attributes by clicking one of the spans. We will see the span attributes and the resource attributes we defined when creating the OpenTelemetry tracer.
