Monitoring application logs is a crucial aspect of the software development and deployment lifecycle. In this post, we'll delve into the process of observing logs generated by Docker container applications operating within HashiCorp Nomad. With the aid of Grafana, Vector, and Loki, we'll explore effective strategies for log analysis and visualization, enhancing visibility and troubleshooting capabilities within your Nomad environment.
Steps:
- Install Nomad in Linux
- Install Docker
- Run Nomad in both server and client mode
- Run Loki in Nomad
- Deploy Logging app
- Deploy Vector in Nomad
- Deploy Grafana
- Observe logs in Grafana using Loki Datasource
Application structure
Install Nomad locally (Linux)
Nomad, much like Kubernetes, serves as a powerful container orchestration tool, facilitating seamless application deployment and management. In this guide, we'll walk through the installation process on a Linux machine, specifically Ubuntu 22.04 LTS. Let's dive in:
Install required packages
$ sudo apt-get update && sudo apt-get install wget gpg coreutils
Add the HashiCorp GPG key
$ wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
Add the official HashiCorp Linux repository
$ echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
Update and Install
$ sudo apt-get update && sudo apt-get install nomad
This command installs the latest Nomad binary. Confirm the installation by checking the Nomad version:
$ nomad -v
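If the installation succeeded, this prints the installed version, for example:
Nomad v1.7.7
The exact version and any extra build details will vary with your installation.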
Install Docker
Docker is essential for deploying containers and can be installed with the following steps. Nomad detects the Docker daemon on the system through its docker driver and uses it to deploy containers.
Set up Docker's apt repository
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
Install Docker
$ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Restart Docker and update permissions (replace ubuntu with your username if it differs)
$ sudo systemctl restart docker
$ sudo usermod -aG docker ubuntu
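Note that the group change only takes effect after you log out and back in, or run newgrp docker in the current shell. You can then verify the installation with a test container:
$ docker run hello-world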
Run Nomad in both server and client mode
The Nomad binary can run in both server and client mode. The server manages the state of the cluster, while clients are used to deploy applications. In production, multiple Nomad servers and clients running on separate machines are interconnected to form a cluster. Here, we will use a single machine to run both server and client.
Nomad requires a config file to run in server and client mode. The config file is also where you add more capabilities to Nomad, such as enabling the Docker driver, telemetry, autopilot, and so on.
Here is the Nomad configuration file which can be used to run Nomad; it also contains the settings required to run Docker containers. Create a file nomad.hcl and copy the content below into it.
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"
log_level  = "INFO"

server {
  enabled          = true
  bootstrap_expect = 1

  search {
    fuzzy_enabled   = true
    limit_query     = 200
    limit_results   = 1000
    min_term_length = 5
  }
}

# Enable the client
client {
  enabled = true

  options {
    "driver.raw_exec.enable"    = "1"
    "docker.privileged.enabled" = "true"
  }

  server_join {
    retry_join = ["127.0.0.1"]
  }
}

plugin "docker" {
  config {
    endpoint = "unix:///var/run/docker.sock"

    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]

    volumes {
      enabled      = true
      selinuxlabel = "z"
    }

    allow_privileged = true
  }
}

telemetry {
  collection_interval        = "15s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
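Before starting the agent, you can sanity-check this file with Nomad's built-in validator (available in recent Nomad versions):
$ nomad config validate nomad.hcl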
Run the following command in the terminal where the above file is located to start Nomad.
$ sudo nomad agent -config=nomad.hcl
This will start the Nomad agent.
==> Loaded configuration from nomad.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 192.168.1.32:4646; RPC: 192.168.1.32:4647; Serf: 192.168.1.32:4648
            Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: true
             Log Level: INFO
               Node Id: 2921dae9-99dc-a65d-1a1f-25d9822c1500
                Region: global (DC: dc1)
                Server: true
               Version: 1.7.7

==> Nomad agent started! Log data will stream in below:
The Nomad UI will be available at the advertise address, here 192.168.1.32:4646. This address is based on the network interface you are connected to, so the value will differ between machines. Going to this IP and port in a web browser opens the Nomad UI as shown below.
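You can also verify both modes from the terminal: nomad server members should list one alive server, and nomad node status should list one ready client.
$ nomad server members
$ nomad node status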
You have installed both Nomad and Docker on the system and are ready to deploy the observability applications!
Run Loki in Nomad
Loki is a scalable log aggregation system designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream. Here it is configured so that logs are persisted to the file system; it can also be configured to store data in various datastores such as AWS S3, MinIO, and so on.
The Nomad job file used to run Loki is given below.
job "loki" {
  datacenters = ["dc1"]
  type        = "service"

  group "loki" {
    count = 1

    network {
      mode = "host"
      port "loki" {
        to     = 3100
        static = 3100
      }
    }

    service {
      name     = "loki"
      port     = "loki"
      provider = "nomad"
    }

    task "loki" {
      driver = "docker"
      user   = "root"

      config {
        image = "grafana/loki:2.9.7"
        args = [
          "-config.file",
          "local/config.yml",
        ]
        volumes = ["/loki_data:/loki"]
        ports   = ["loki"]
      }

      template {
        data = <<EOH
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  # Any chunk not receiving new logs in this time will be flushed
  chunk_idle_period: 1h
  # All chunks will be flushed when they hit this age, default is 1h
  max_chunk_age: 1h
  # Loki will attempt to build chunks up to 1.5MB, flushing if chunk_idle_period or max_chunk_age is reached first
  chunk_target_size: 1048576
  # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  chunk_retain_period: 30s
  max_transfer_retries: 0 # Chunk transfers disabled
  wal:
    enabled: true
    dir: "/loki/wal"

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
EOH
        destination = "local/config.yml"
        change_mode = "restart"
      }

      resources {
        cpu    = 1000 # MHz
        memory = 1000 # MB
      }
    }
  }
}
Copy the above job specification and run it from the Nomad UI (Jobs → Run Job).
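If you prefer the CLI, the same specification can be saved to a file (the name below is just an example) and submitted with nomad job run. Once the allocation is running, Loki's readiness endpoint should answer with ready:
$ nomad job run loki.nomad.hcl
$ curl http://localhost:3100/ready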
Deploy Logging app
The logging app is a simple Docker container which emits logs at random intervals. It sends four log levels, INFO, ERROR, WARNING, and DEBUG, to stdout. The job file is given below.
job "logger" {
  datacenters = ["dc1"]
  type        = "service"

  group "logger" {
    count = 1

    task "logger" {
      driver = "docker"

      config {
        image = "chentex/random-logger:latest"
      }

      resources {
        cpu    = 100 # 100 MHz
        memory = 100 # 100 MB
      }
    }
  }
}
Copy the above configuration and deploy another job from the Nomad UI.
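You can sanity-check the emitted logs directly through Nomad before wiring up the pipeline: nomad job status shows the allocation ID, and nomad alloc logs tails the task's stdout. Replace the placeholder below with your allocation ID.
$ nomad job status logger
$ nomad alloc logs <alloc-id>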
Deploy Vector in Nomad
Vector is a log collection agent which supports many sources and destinations for log ingestion and export. Here, Vector will collect the log data using the docker_logs source and send it to Loki. The job file is given below.
job "vector" {
  datacenters = ["dc1"]
  # system job, runs on all nodes
  type = "system"

  group "vector" {
    count = 1

    network {
      port "api" {
        to = 8686
      }
    }

    ephemeral_disk {
      size   = 500
      sticky = true
    }

    task "vector" {
      driver = "docker"

      config {
        image   = "timberio/vector:0.30.0-debian"
        ports   = ["api"]
        volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
      }

      env {
        VECTOR_CONFIG          = "local/vector.toml"
        VECTOR_REQUIRE_HEALTHY = "false"
      }

      resources {
        cpu    = 100 # 100 MHz
        memory = 100 # 100 MB
      }

      # template with Vector's configuration
      template {
        destination   = "local/vector.toml"
        change_mode   = "signal"
        change_signal = "SIGHUP"
        # overriding the delimiters to [[ ]] to avoid conflicts with
        # Vector's native templating, which also uses {{ }}
        left_delimiter  = "[["
        right_delimiter = "]]"

        data = <<EOH
data_dir = "alloc/data/vector/"

[api]
enabled = true
address = "0.0.0.0:8686"
playground = true

[sources.logs]
type = "docker_logs"

[sinks.out]
type = "console"
inputs = ["logs"]
encoding.codec = "json"
target = "stdout"

[sinks.loki]
type = "loki"
compression = "snappy"
encoding.codec = "json"
inputs = ["logs"]
endpoint = "http://[[ range nomadService "loki" ]][[ .Address ]]:[[ .Port ]][[ end ]]"
healthcheck.enabled = true
out_of_order_action = "drop"
# remove fields that have been converted to labels to avoid having the field twice
remove_label_fields = true

[sinks.loki.labels]
# See https://vector.dev/docs/reference/vrl/expressions/#path-example-nested-path
job = "{{ label.\"com.hashicorp.nomad.job_name\" }}"
task = "{{ label.\"com.hashicorp.nomad.task_name\" }}"
group = "{{ label.\"com.hashicorp.nomad.task_group_name\" }}"
namespace = "{{ label.\"com.hashicorp.nomad.namespace\" }}"
node = "{{ label.\"com.hashicorp.nomad.node_name\" }}"
EOH
      }

      kill_timeout = "30s"
    }
  }
}
Copy the above configuration and deploy the job from the Nomad UI. Once Vector is deployed, it will start collecting logs from the Docker daemon and sending them to Loki.
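To verify the pipeline, ask Loki which labels it has received so far; the job, task, group, namespace, and node labels configured in the Vector sink should appear in the response:
$ curl http://localhost:3100/loki/api/v1/labels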
Deploy Grafana
Grafana is a popular tool to visualize logs, metrics, and traces. Here we will use Grafana and its Loki data source to view and explore the log data sent by the Docker applications running in Nomad.
The Grafana Nomad job file is given below.
job "grafana" {
  datacenters = ["dc1"]
  type        = "service"

  group "grafana" {
    count = 1

    network {
      mode = "host"
      port "grafana" {
        to     = 3000
        static = 3000
      }
    }

    task "grafana" {
      driver = "docker"

      env {
        GF_LOG_LEVEL  = "ERROR"
        GF_LOG_MODE   = "console"
        GF_PATHS_DATA = "/var/lib/grafana"
      }

      user = "root"

      config {
        image   = "grafana/grafana:10.4.2"
        ports   = ["grafana"]
        volumes = ["/grafana_volume:/var/lib/grafana"]
      }

      resources {
        cpu    = 2000 # MHz
        memory = 2000 # MB
      }
    }
  }
}
Deploy this job using the Nomad UI as described in the previous sections.
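Grafana may take a few seconds to initialize its database on first start; its health endpoint reports when it is ready:
$ curl http://localhost:3000/api/health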
After deploying all the jobs, the Nomad UI looks like this.
Connecting to Loki Data source in Grafana
After confirming all jobs are running as expected, open the Grafana UI on port 3000. Here the address is the host machine's private IP, http://192.168.1.32:3000. Log in to the dashboard using the default username admin and password admin.
- Select the Data sources option and press the button to add a new data source
- Search for loki and select it
- Update the Loki connection URL. Based on the Loki Nomad job file, it is listening on static port 3100, so the connection URL has the form http://(machine-ip):3100.
- Keep the other options at their defaults and click Save & Test
- If the connection is successful, it shows Data source successfully connected.
Visualising Loki Logs in Grafana
- Select the Explore option in Grafana
- Select Loki
- Select the job label and choose the logger Nomad job.
- Click Run Query to view the logs
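The same LogQL query that Explore builds can also be issued against Loki's HTTP API directly, which is handy for scripting. A minimal sketch: {job="logger"} selects the logger job's stream, and a line filter such as |= "ERROR" narrows it to error lines.
$ curl -G http://localhost:3100/loki/api/v1/query_range \
    --data-urlencode 'query={job="logger"} |= "ERROR"' \
    --data-urlencode 'limit=10'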
You have successfully installed Nomad and Docker, run applications on top of them, and queried the logs emitted by those applications in Grafana.