DEV Community

Jhon Thomas Ticona Chambi

Observability Practices: A Complete Guide with Node.js Implementation

Introduction

In today's distributed systems landscape, observability has become critical for understanding complex applications. This article demonstrates comprehensive observability practices using a real-world Node.js API integrated with Prometheus and Grafana.

The Three Pillars of Observability

1. Metrics

Numerical measurements providing quantitative insights:

  • Business Metrics: User registrations, transactions, revenue
  • Application Metrics: Response times, error rates, throughput
  • Infrastructure Metrics: CPU usage, memory consumption, disk I/O

2. Logs

Time-stamped records of discrete events:

  • Structured Logging: JSON format for better parsing
  • Contextual Information: Request IDs, user context, transaction details
  • Different Log Levels: DEBUG, INFO, WARN, ERROR, FATAL
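The three logging points above can be sketched with nothing but the Node.js standard library. This is a minimal illustration, not the logger used in the repository: `createLogger`, its `minLevel` and `write` options, and the numeric level ranks are all hypothetical names chosen for this example.

```javascript
const { randomUUID } = require('node:crypto');

// Numeric ranks for the log levels listed above, lowest severity first (assumed values).
const LEVELS = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40, FATAL: 50 };

// Hypothetical helper: builds a logger that writes one JSON object per line.
function createLogger({ minLevel = 'INFO', write = (line) => process.stdout.write(line + '\n') } = {}) {
  function log(level, message, context = {}) {
    if (LEVELS[level] < LEVELS[minLevel]) return null; // below the threshold: drop
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      ...context, // e.g. requestId, userId, transaction details
    };
    write(JSON.stringify(entry));
    return entry;
  }
  return { log };
}

// Usage: attach a request ID so all lines for one request can be correlated.
const logger = createLogger({ minLevel: 'INFO' });
logger.log('INFO', 'user lookup', { requestId: randomUUID(), userId: 42, route: '/users/:id' });
logger.log('DEBUG', 'cache miss'); // filtered out because DEBUG ranks below INFO
```

Writing one JSON object per line keeps each event machine-parseable, and a shared `requestId` field is what lets a log search group every line belonging to a single request.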

3. Traces

Track requests across multiple services:

  • Distributed Tracing: Follow requests through microservices
  • Performance Analysis: Identify slow components
  • Dependency Mapping: Understand service relationships
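To make "following a request through microservices" concrete, here is a hand-rolled sketch of trace-context propagation. It assumes the W3C Trace Context `traceparent` header layout (version `00`, a 32-hex-character trace ID, a 16-hex-character parent ID, and 2-hex-character flags). It is not the Jaeger or OpenTelemetry API, and `parseTraceparent` and `startSpan` are hypothetical helper names.

```javascript
const { randomBytes } = require('node:crypto');

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse an incoming traceparent header; returns null when absent or malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header || '');
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  return { traceId, parentId, flags };
}

// Start a span: reuse the caller's trace ID if present, otherwise start a new trace.
function startSpan(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  const traceId = parent ? parent.traceId : randomBytes(16).toString('hex');
  const spanId = randomBytes(8).toString('hex');
  const flags = parent ? parent.flags : '01'; // 01 marks the trace as sampled
  return {
    traceId,
    spanId,
    parentId: parent ? parent.parentId : null,
    // Header to send on outgoing calls so the next service joins the same trace.
    traceparent: `00-${traceId}-${spanId}-${flags}`,
  };
}

// Usage: service A starts a trace, service B continues it.
const spanA = startSpan(undefined);
const spanB = startSpan(spanA.traceparent);
console.log(spanA.traceId === spanB.traceId); // both services share one trace ID
console.log(spanB.parentId === spanA.spanId); // B records A's span as its parent
```

The shared trace ID is what lets a tracing backend stitch spans into one request timeline, and the parent IDs are what produce the service dependency map.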

Real-World Implementation with Node.js

Our demonstration project implements a RESTful API with comprehensive observability features.

Architecture Overview

Node.js API (Port 3000) → Prometheus (Port 9090) → Grafana (Port 3000)
        ↓
Traffic Generator (Test Script)

Core Metrics Implementation

HTTP Request Duration Histogram

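const promClient = require('prom-client');

// Histogram of request durations in milliseconds
const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status'],
  buckets: [1, 5, 15, 50, 100, 500, 1000] // bucket upper bounds in ms
});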

Request Counter

const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

Active Connections Gauge

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Active connections'
});
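A gauge only becomes useful once something moves it up and down. The article does not show how `active_connections` is updated, so the following is one possible sketch: `trackConnections` is a hypothetical helper, and the fake server and fake gauge are stand-ins so the example runs without Express or prom-client installed.

```javascript
const { EventEmitter } = require('node:events');

// Increment on every new socket, decrement when that socket closes.
// `gauge` can be any object with inc() and dec(), such as a prom-client Gauge.
function trackConnections(server, gauge) {
  server.on('connection', (socket) => {
    gauge.inc();
    socket.once('close', () => gauge.dec());
  });
  return server;
}

// Stand-ins for illustration only.
const fakeGauge = { value: 0, inc() { this.value += 1; }, dec() { this.value -= 1; } };
const fakeServer = new EventEmitter();
trackConnections(fakeServer, fakeGauge);

const socket = new EventEmitter();
fakeServer.emit('connection', socket); // a client connects
console.log(fakeGauge.value); // 1
socket.emit('close'); // the client disconnects
console.log(fakeGauge.value); // 0
```

In a real Express app, the `http.Server` returned by `app.listen(...)` emits the same `connection` event, so it could be passed in together with the `activeConnections` gauge.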

API Endpoints for Testing

  1. GET / - Basic health endpoint
  2. GET /users - List users with variable latency
  3. GET /users/:id - Specific user lookup with error cases
  4. GET /slow - Intentionally slow endpoint (2-5s response time)
  5. GET /error - Random error generation for testing
  6. GET /metrics - Prometheus metrics endpoint
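The `/metrics` endpoint is the one Prometheus scrapes, and its response is plain text in the Prometheus exposition format. In the real application prom-client generates that text. The hand-rolled sketch below only illustrates what a scrape of a counter looks like; `renderCounter` and `escapeLabelValue` are hypothetical helpers and the sample values are made up.

```javascript
// Escape label values as the text format requires (backslash, double quote, newline).
function escapeLabelValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render one counter and its labelled samples as exposition text.
function renderCounter({ name, help, samples }) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelText = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
      .join(',');
    lines.push(`${name}{${labelText}} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Made-up sample values for illustration only.
console.log(renderCounter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  samples: [
    { labels: { method: 'GET', route: '/users', status: 200 }, value: 12 },
    { labels: { method: 'GET', route: '/error', status: 500 }, value: 3 },
  ],
}));
```

Each distinct combination of label values becomes its own line, and therefore its own time series in Prometheus, which is why label choice matters so much for cardinality.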

Middleware Implementation

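const responseTime = require('response-time');

// Assumption: httpErrors is not defined in the article; it is declared here
// to match the http_errors_total metric used in the queries
const httpErrors = new promClient.Counter({
  name: 'http_errors_total',
  help: 'Total HTTP error responses',
  labelNames: ['method', 'route', 'status']
});

// Response time tracking
app.use(responseTime((req, res, time) => {
  // Prefer the matched route pattern (e.g. /users/:id) over the raw path
  const route = req.route ? req.route.path : req.path;
  httpDuration.labels(req.method, route, res.statusCode).observe(time); // time is in ms
  httpRequests.labels(req.method, route, res.statusCode).inc();
  if (res.statusCode >= 400) {
    httpErrors.labels(req.method, route, res.statusCode).inc();
  }
}));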

Essential Prometheus Queries

Request Rate (RPS)

rate(http_requests_total[1m]) 

Error Rate Percentage

sum(rate(http_errors_total[1m])) / sum(rate(http_requests_total[1m])) * 100

95th Percentile Response Time

histogram_quantile(0.95, rate(http_request_duration_ms_bucket[1m])) 
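`histogram_quantile` does not see individual request times. It estimates the quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified version of that idea, not Prometheus's actual implementation: `estimateQuantile` is a hypothetical helper, the bucket counts are made up, and it assumes `q` is greater than 0 and that at least one finite bucket exists.

```javascript
// buckets: cumulative counts sorted by upper bound `le`, ending with +Infinity,
// mirroring the *_bucket series of a Prometheus histogram.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls beyond the highest finite bound: report that bound.
  if (buckets[i].le === Infinity) return buckets[i - 1].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Linear interpolation inside the bucket that contains the target rank.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Made-up cumulative counts using the bucket bounds from the histogram above.
const sampleBuckets = [
  { le: 1, count: 0 },
  { le: 5, count: 10 },
  { le: 15, count: 40 },
  { le: 50, count: 80 },
  { le: 100, count: 90 },
  { le: 500, count: 100 },
  { le: 1000, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(estimateQuantile(0.5, sampleBuckets)); // 23.75 (ms), inside the 15-50 bucket
console.log(estimateQuantile(0.95, sampleBuckets)); // about 300 (ms), inside the 100-500 bucket
```

Because the result is interpolated, its accuracy depends on bucket boundaries: with the wide 100 to 500 ms bucket here, the reported p95 could be far from the true value, so buckets are best placed around the latencies you care about.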

Best Practices

1. Metric Design Principles

  • Use Standard Suffixes: _total, _duration_seconds, _bytes
  • Consistent Labeling: Standardize label names across services
  • Avoid High Cardinality: Limit unique label combinations
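High cardinality is a practical risk in the middleware shown earlier: when no route matches, it falls back to `req.path`, so every distinct URL becomes a new label value. One possible mitigation is to normalize paths before using them as labels. The helper below is a hypothetical sketch, and its rules (numeric segments and UUIDs become `:id`) are assumptions that would need adapting to a real API.

```javascript
// Replace path segments that look like identifiers with a fixed placeholder
// so the `route` label takes a bounded set of values.
const NUMERIC_ID = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalizeRoute(path) {
  const pathname = path.split('?')[0]; // drop any query string
  return pathname
    .split('/')
    .map((segment) => (NUMERIC_ID.test(segment) || UUID.test(segment) ? ':id' : segment))
    .join('/');
}

console.log(normalizeRoute('/users/42')); // /users/:id
console.log(normalizeRoute('/users/1337?x=1')); // /users/:id
console.log(normalizeRoute('/slow')); // /slow
```

With this in place, a million different user IDs still produce a single `route="/users/:id"` series per method and status instead of a million series.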

2. Golden Signals Implementation

  • Latency: Time to process requests
  • Traffic: Demand on your system
  • Errors: Rate of failed requests
  • Saturation: Resource utilization

3. Effective Alerting

  • Alert on Symptoms: Focus on user impact, not causes
  • Meaningful Thresholds: Avoid alert fatigue
  • Runbook Integration: Provide clear remediation steps

GitHub Repository

The complete implementation with automated setup scripts is available:

🔗 GitHub Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git

Repository Features

  • ✅ Complete source code with all implementation files
  • ✅ Automated setup scripts for Windows/Mac/Linux
  • ✅ Comprehensive documentation and troubleshooting guides
  • ✅ CI/CD ready with GitHub Actions integration
  • ✅ Traffic generator for realistic testing scenarios

Conclusion

Observability transforms raw metrics into actionable insights that drive better system reliability and user experience. This implementation demonstrates:

  1. Holistic Approach: Metrics implemented here, framed alongside logs and traces
  2. Practical Implementation: Real-world Node.js example
  3. Automation First: Scripted setup reduces barriers
  4. Best Practices: Following established patterns

Next Steps

  1. Extend metrics with business-specific measurements
  2. Implement meaningful alerting
  3. Add distributed tracing with Jaeger
  4. Apply patterns to production systems

Author: Jhon Ticona Chambi

Technologies: Node.js, Prometheus, Grafana, Express.js

Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git
