Introduction
Once your web application hits production, the most critical question becomes: how is it performing right now? Logs tell you what happened, but you want to spot problems before users start complaining.
In this article, I'll share how I built a complete monitoring system for Peakline — a FastAPI application for Strava data analysis that processes thousands of requests daily from athletes worldwide.
What's Inside:
- Metrics architecture (HTTP, API, business metrics)
- Prometheus + Grafana setup from scratch
- 50+ production-ready metrics
- Advanced PromQL queries
- Reactive dashboards
- Best practices and pitfalls
Architecture: Three Monitoring Levels
Modern monitoring isn't just "set up Grafana and look at graphs." It's a well-thought-out architecture with several layers:
```
┌─────────────────────────────────────────────────┐
│ FastAPI Application                             │
│  ├── HTTP Middleware (auto-collect metrics)     │
│  ├── Business Logic (business metrics)          │
│  └── /metrics endpoint (Prometheus format)      │
└──────────────────┬──────────────────────────────┘
                   │ scrape every 5s
┌──────────────────▼──────────────────────────────┐
│ Prometheus                                      │
│  ├── Time Series Database (TSDB)                │
│  ├── Storage retention: 200h                    │
│  └── PromQL Engine                              │
└──────────────────┬──────────────────────────────┘
                   │ query data
┌──────────────────▼──────────────────────────────┐
│ Grafana                                         │
│  ├── Dashboards                                 │
│  ├── Alerting                                   │
│  └── Visualization                              │
└─────────────────────────────────────────────────┘
```

Why This Stack?
Prometheus — the de-facto standard for metrics. Pull model, powerful PromQL query language, excellent Kubernetes integration.
Grafana — the best visualization tool. Beautiful dashboards, alerting, templating, rich UI.
FastAPI — async Python framework with native metrics support via prometheus_client.
Basic Infrastructure Setup
Docker Compose: 5-Minute Quick Start
First, let's spin up Prometheus and Grafana in Docker:
```yaml
# docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=200h'  # 8+ days of history
      - '--web.enable-lifecycle'
    networks:
      - monitoring
    extra_hosts:
      - "host.docker.internal:host-gateway"  # Access host machine

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}  # Use .env!
      - GF_SERVER_ROOT_URL=/grafana  # For nginx reverse proxy
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - prometheus
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
```

Key Points:
- `storage.tsdb.retention.time=200h` — keep metrics for 8+ days (for weekly analysis)
- `extra_hosts: host.docker.internal` — allows Prometheus to reach the app running on the host
- Named volumes for data persistence
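The `${GRAFANA_PASSWORD}` referenced in the compose file comes from an environment file. A minimal sketch, assuming the file is named `.env` and sits next to `docker-compose.yml` (keep it out of version control):

```bash
# .env
GRAFANA_PASSWORD=change-me-to-something-strong
```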
Prometheus Configuration
```yaml
# monitoring/prometheus.yml
global:
  scrape_interval: 15s      # How often to collect metrics
  evaluation_interval: 15s  # How often to evaluate alert rules

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'webapp'
    static_configs:
      - targets: ['host.docker.internal:8000']  # Your app port
    scrape_interval: 5s  # More frequent for web apps
    metrics_path: /metrics
```

Important: a 5s scrape_interval for web apps is a balance between data freshness and system load. In production, 15-30s is typical.
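Because the compose file passes `--web.enable-lifecycle`, you can apply changes to this config without restarting the container:

```bash
# Ask Prometheus to re-read prometheus.yml (no restart needed)
curl -X POST http://localhost:9090/-/reload
```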
Grafana Datasource Provisioning
To avoid manual Prometheus setup in Grafana, use provisioning:
```yaml
# monitoring/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
```

Now Grafana automatically connects to Prometheus on startup.
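Dashboards can be provisioned the same way. A sketch of a dashboard provider file, assuming you keep dashboard JSON files inside the mounted provisioning directory (the folder layout here is an assumption, not from the original setup):

```yaml
# monitoring/grafana/provisioning/dashboards/default.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    options:
      # JSON dashboards placed in this directory are loaded automatically
      path: /etc/grafana/provisioning/dashboards
```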
```bash
docker-compose up -d
```

Level 1: HTTP Metrics
The most basic but critically important layer is HTTP request monitoring: a middleware automatically collects metrics for every HTTP request.
Metrics Initialization
```python
# webapp/main.py
from prometheus_client import Counter, Histogram, CollectorRegistry, generate_latest, CONTENT_TYPE_LATEST
from fastapi import FastAPI, Request
from fastapi.responses import PlainTextResponse
import time

app = FastAPI(title="Peakline", version="2.0.0")

# Create a separate registry for metrics isolation
registry = CollectorRegistry()

# Counter: monotonically increasing value (request count)
http_requests_total = Counter(
    'http_requests_total',
    'Total number of HTTP requests',
    ['method', 'endpoint', 'status_code'],  # Labels for grouping
    registry=registry
)

# Histogram: distribution of values (execution time)
http_request_duration_seconds = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint'],
    registry=registry
)

# API call counters
api_calls_total = Counter(
    'api_calls_total',
    'Total number of API calls by type',
    ['api_type'],
    registry=registry
)

# Separate error counters
http_errors_4xx_total = Counter(
    'http_errors_4xx_total',
    'Total number of 4xx HTTP errors',
    ['endpoint', 'status_code'],
    registry=registry
)

http_errors_5xx_total = Counter(
    'http_errors_5xx_total',
    'Total number of 5xx HTTP errors',
    ['endpoint', 'status_code'],
    registry=registry
)
```

Middleware for Automatic Collection
The magic happens in middleware — it wraps every request:
```python
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start_time = time.time()

    # Execute request
    response = await call_next(request)
    duration = time.time() - start_time

    # Path normalization: /api/activities/12345 → /api/activities/{id}
    path = request.url.path
    if path.startswith('/api/'):
        parts = path.split('/')
        if len(parts) > 3 and parts[3].isdigit():
            parts[3] = '{id}'
            path = '/'.join(parts)

    # Record metrics
    http_requests_total.labels(
        method=request.method,
        endpoint=path,
        status_code=str(response.status_code)
    ).inc()

    http_request_duration_seconds.labels(
        method=request.method,
        endpoint=path
    ).observe(duration)

    # Track API calls
    if path.startswith('/api/'):
        api_type = path.split('/')[2] if len(path.split('/')) > 2 else 'unknown'
        api_calls_total.labels(api_type=api_type).inc()

    # Track errors separately
    status_code = response.status_code
    if 400 <= status_code < 500:
        http_errors_4xx_total.labels(endpoint=path, status_code=str(status_code)).inc()
    elif status_code >= 500:
        http_errors_5xx_total.labels(endpoint=path, status_code=str(status_code)).inc()

    return response
```

Key Techniques:
- Path normalization — critically important. Without it you'd get thousands of unique series for `/api/activities/1`, `/api/activities/2`, and so on (a reusable helper is sketched below)
- Labels — allow filtering and grouping metrics in PromQL
- Separate error counters — simplify alert writing
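The middleware above normalizes paths inline; a reusable helper keeps the same idea in one place. A minimal sketch (the UUID handling is my addition; adjust the patterns to your own URL scheme):

```python
import re

_ID_SEGMENT = re.compile(r"^\d+$")
_UUID_SEGMENT = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def normalize_path(path: str) -> str:
    """Replace numeric and UUID path segments with placeholders to keep label cardinality low."""
    parts = []
    for segment in path.split('/'):
        if _ID_SEGMENT.match(segment):
            parts.append('{id}')
        elif _UUID_SEGMENT.match(segment):
            parts.append('{uuid}')
        else:
            parts.append(segment)
    return '/'.join(parts)

# normalize_path("/api/activities/12345") -> "/api/activities/{id}"
```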
Metrics Endpoint
```python
@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint"""
    return PlainTextResponse(
        generate_latest(registry),
        media_type=CONTENT_TYPE_LATEST
    )
```

Now Prometheus can scrape metrics from http://localhost:8000/metrics.
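If you'd rather not hand-write the endpoint, prometheus_client also ships an ASGI app you can mount instead. A sketch (the mount replaces the route above rather than being used alongside it):

```python
from prometheus_client import make_asgi_app

# Serve the same registry through the library's built-in ASGI app
app.mount("/metrics", make_asgi_app(registry=registry))
```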
What We Get in Prometheus
```
# Metrics format in the /metrics endpoint:
http_requests_total{method="GET",endpoint="/api/activities",status_code="200"} 1543
http_requests_total{method="POST",endpoint="/api/activities",status_code="201"} 89
http_request_duration_seconds_bucket{method="GET",endpoint="/api/activities",le="0.1"} 1234
```

Level 2: External API Metrics
Web applications often integrate with external APIs (Stripe, AWS, etc.). It's important to track not only your own requests but also dependencies.
External API Metrics
```python
# External API metrics
external_api_calls_total = Counter(
    'external_api_calls_total',
    'Total number of external API calls by endpoint type',
    ['endpoint_type'],
    registry=registry
)

external_api_errors_total = Counter(
    'external_api_errors_total',
    'Total number of external API errors by endpoint type',
    ['endpoint_type'],
    registry=registry
)

external_api_latency_seconds = Histogram(
    'external_api_latency_seconds',
    'External API call latency in seconds',
    ['endpoint_type'],
    registry=registry
)
```

API Call Tracking Helper
Instead of duplicating code everywhere you call the API, create a universal wrapper:
```python
async def track_external_api_call(endpoint_type: str, api_call_func, *args, **kwargs):
    """
    Universal wrapper for tracking API calls

    Usage:
        result = await track_external_api_call(
            'athlete_activities',
            client.get_athlete_activities,
            athlete_id=123
        )
    """
    start_time = time.time()

    try:
        # Increment call counter
        external_api_calls_total.labels(endpoint_type=endpoint_type).inc()

        # Execute API call
        result = await api_call_func(*args, **kwargs)

        # Record latency
        duration = time.time() - start_time
        external_api_latency_seconds.labels(endpoint_type=endpoint_type).observe(duration)

        # Check for API errors (status >= 400)
        if isinstance(result, Exception) or (hasattr(result, 'status') and result.status >= 400):
            external_api_errors_total.labels(endpoint_type=endpoint_type).inc()

        return result

    except Exception:
        # Record latency and error
        duration = time.time() - start_time
        external_api_latency_seconds.labels(endpoint_type=endpoint_type).observe(duration)
        external_api_errors_total.labels(endpoint_type=endpoint_type).inc()
        raise
```

Usage in Code
```python
@app.get("/api/activities")
async def get_activities(athlete_id: int):
    # Instead of a direct API call:
    # activities = await external_client.get_athlete_activities(athlete_id)

    # Use the wrapper with tracking:
    activities = await track_external_api_call(
        'athlete_activities',
        external_client.get_athlete_activities,
        athlete_id=athlete_id
    )

    return activities
```

Now we can see:
- How many calls to each external API endpoint
- How many returned errors
- Latency for each call type
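If you prefer decorators over wrapping each call explicitly, the same helper can back a small decorator. A sketch (the `tracked` name is mine, not from the project):

```python
from functools import wraps

def tracked(endpoint_type: str):
    """Decorator form of track_external_api_call."""
    def decorator(api_call_func):
        @wraps(api_call_func)
        async def wrapper(*args, **kwargs):
            return await track_external_api_call(endpoint_type, api_call_func, *args, **kwargs)
        return wrapper
    return decorator

# Usage: decorate a thin wrapper around the client method
@tracked('athlete_activities')
async def fetch_athlete_activities(athlete_id: int):
    return await external_client.get_athlete_activities(athlete_id)
```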
Level 3: Business Metrics
This is the most valuable part of monitoring — metrics that reflect actual application usage.
Business Metrics Types
```python
# === Authentication ===
user_logins_total = Counter(
    'user_logins_total',
    'Total number of user logins',
    registry=registry
)

user_registrations_total = Counter(
    'user_registrations_total',
    'Total number of new user registrations',
    registry=registry
)

user_deletions_total = Counter(
    'user_deletions_total',
    'Total number of user deletions',
    registry=registry
)

# === File Operations ===
fit_downloads_total = Counter(
    'fit_downloads_total',
    'Total number of FIT file downloads',
    registry=registry
)

gpx_downloads_total = Counter(
    'gpx_downloads_total',
    'Total number of GPX file downloads',
    registry=registry
)

gpx_uploads_total = Counter(
    'gpx_uploads_total',
    'Total number of GPX file uploads',
    registry=registry
)

# === User Actions ===
settings_updates_total = Counter(
    'settings_updates_total',
    'Total number of user settings updates',
    registry=registry
)

feature_requests_total = Counter(
    'feature_requests_total',
    'Total number of feature requests',
    registry=registry
)

feature_votes_total = Counter(
    'feature_votes_total',
    'Total number of votes for features',
    registry=registry
)

# === Reports ===
manual_reports_total = Counter(
    'manual_reports_total',
    'Total number of manually created reports',
    registry=registry
)

auto_reports_total = Counter(
    'auto_reports_total',
    'Total number of automatically created reports',
    registry=registry
)

failed_reports_total = Counter(
    'failed_reports_total',
    'Total number of failed report creation attempts',
    registry=registry
)
```

Incrementing in Code
```python
@app.post("/api/auth/login")
async def login(credentials: LoginCredentials):
    user = await authenticate_user(credentials)
    if user:
        # Increment successful login counter
        user_logins_total.inc()
        return {"token": generate_token(user)}
    return {"error": "Invalid credentials"}


@app.post("/api/activities/report")
async def create_report(activity_id: int, is_auto: bool = False):
    try:
        report = await generate_activity_report(activity_id)

        # Different counters for manual and automatic reports
        if is_auto:
            auto_reports_total.inc()
        else:
            manual_reports_total.inc()

        return report
    except Exception:
        failed_reports_total.inc()
        raise
```
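Counters cover events; for point-in-time values that can go down as well as up, a Gauge is the right type. A sketch with a hypothetical metric that is not part of the original setup:

```python
from prometheus_client import Gauge

# `active_sessions` is a hypothetical metric, not from the original setup
active_sessions = Gauge(
    'active_sessions',
    'Number of currently active user sessions',
    registry=registry
)

# In session handling code (assumed hooks):
# active_sessions.inc()   # on login
# active_sessions.dec()   # on logout or session expiry
```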
Level 4: Performance and Caching
Cache Metrics
The cache is a critical part of performance, so you need to track its hit rate:
```python
cache_hits_total = Counter(
    'cache_hits_total',
    'Total number of cache hits',
    ['cache_type'],
    registry=registry
)

cache_misses_total = Counter(
    'cache_misses_total',
    'Total number of cache misses',
    ['cache_type'],
    registry=registry
)


# In caching code:
async def get_from_cache(key: str, cache_type: str = 'generic'):
    value = await cache.get(key)

    if value is not None:
        cache_hits_total.labels(cache_type=cache_type).inc()
        return value
    else:
        cache_misses_total.labels(cache_type=cache_type).inc()
        return None
```

Background Task Metrics
If you have background tasks (Celery, APScheduler), track them:
```python
background_task_duration_seconds = Histogram(
    'background_task_duration_seconds',
    'Background task execution time',
    ['task_type'],
    registry=registry
)


async def run_background_task(task_type: str, task_func, *args, **kwargs):
    start_time = time.time()
    try:
        result = await task_func(*args, **kwargs)
        return result
    finally:
        duration = time.time() - start_time
        background_task_duration_seconds.labels(task_type=task_type).observe(duration)
```
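If you also want to count failed runs (not just duration), the wrapper can be extended with an error counter, mirroring the HTTP error counters above. A sketch; `background_task_failures_total` is a name I'm introducing:

```python
background_task_failures_total = Counter(
    'background_task_failures_total',
    'Total number of failed background task runs',
    ['task_type'],
    registry=registry
)

async def run_background_task(task_type: str, task_func, *args, **kwargs):
    start_time = time.time()
    try:
        return await task_func(*args, **kwargs)
    except Exception:
        background_task_failures_total.labels(task_type=task_type).inc()
        raise
    finally:
        duration = time.time() - start_time
        background_task_duration_seconds.labels(task_type=task_type).observe(duration)
```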
PromQL: Metrics Query Language
Prometheus uses its own query language — PromQL. It's not SQL, but it's very powerful.
Basic Queries
```
# 1. Just get a metric (instant vector)
http_requests_total

# 2. Filter by labels
http_requests_total{method="GET"}
http_requests_total{status_code="200"}
http_requests_total{method="GET", endpoint="/api/activities"}

# 3. Regular expressions in labels
http_requests_total{status_code=~"5.."}     # All 5xx errors
http_requests_total{endpoint=~"/api/.*"}    # All API endpoints

# 4. Time interval (range vector)
http_requests_total[5m]                     # Data for the last 5 minutes
```

Rate and irate: Rate of Change
A Counter only ever grows, but what we usually need is its rate of change — RPS (requests per second):
```
# rate — average rate over the interval
rate(http_requests_total[5m])

# irate — instantaneous rate (between the last two points)
irate(http_requests_total[5m])
```

When to use what:
- `rate()` — for alerts and trend graphs (smooths spikes)
- `irate()` — for detailed analysis (shows peaks)
Aggregation with sum, avg, max
```
# Total app RPS
sum(rate(http_requests_total[5m]))

# RPS by method
sum(rate(http_requests_total[5m])) by (method)

# RPS by endpoint, sorted
sort_desc(sum(rate(http_requests_total[5m])) by (endpoint))

# Average latency
avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
```

Histogram and Percentiles
For Histogram metrics (latency, duration) use histogram_quantile:
```
# P50 (median) latency
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))

# P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# P99 latency (99% of requests are faster than this)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# P95 per endpoint (aggregate buckets by le + endpoint first)
histogram_quantile(0.95, sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m])))
```

Complex Queries
1. Success Rate (percentage of successful requests)
```
(
  sum(rate(http_requests_total{status_code=~"2.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100
```

2. Error Rate (percentage of errors)
```
(
  sum(rate(http_requests_total{status_code=~"4..|5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100
```

3. Cache Hit Rate
```
(
  sum(rate(cache_hits_total[5m]))
  /
  (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
) * 100
```

4. Top-5 Slowest Endpoints
```
topk(5,
  histogram_quantile(0.95,
    sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
  )
)
```

5. API Health Score (0-100)
```
(
  (
    sum(rate(external_api_calls_total[5m]))
    -
    sum(rate(external_api_errors_total[5m]))
  )
  /
  sum(rate(external_api_calls_total[5m]))
) * 100
```

Grafana Dashboards: Visualization
Now the fun part — turning raw metrics into beautiful and informative dashboards.
Dashboard 1: HTTP & Performance
Panel 1: Request Rate
```
sum(rate(http_requests_total[5m]))
```

- Type: Time series
- Color: Blue gradient
- Unit: requests/sec
- Legend: Total RPS
Panel 2: Success Rate
```
(
  sum(rate(http_requests_total{status_code=~"2.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100
```

- Type: Stat
- Color: green above 95%, yellow between 90% and 95%, red below 90%
- Unit: percent (0-100)
- Value: Current (last)
Panel 3: Response Time (P50, P95, P99)
```
# P50
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))

# P95
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# P99
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
```

- Type: Time series
- Unit: seconds (s)
- Legend: P50, P95, P99
Panel 4: Errors by Type
```
sum(rate(http_requests_total{status_code=~"4.."}[5m])) by (status_code)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (status_code)
```

- Type: Bar chart
- Colors: Yellow (4xx), Red (5xx)
Panel 5: Request Rate by Endpoint
```
sort_desc(sum(rate(http_requests_total[5m])) by (endpoint))
```

- Type: Bar chart
- Limit: Top 10
Dashboard 2: Business Metrics
This dashboard shows real product usage — what users do and how often.
Panel 1: User Activity (24h)
```
# Logins
increase(user_logins_total[24h])

# Registrations
increase(user_registrations_total[24h])

# Deletions
increase(user_deletions_total[24h])
```

- Type: Stat
- Layout: Horizontal
Panel 2: Downloads by Type
```
sum(rate({__name__=~".*_downloads_total"}[5m])) by (__name__)
```

- Type: Pie chart
- Legend: Right side
Panel 3: Feature Usage Timeline
```
rate(gpx_fixer_usage_total[5m])
rate(search_usage_total[5m])
rate(manual_reports_total[5m])
```

- Type: Time series
- Stacking: Normal
Dashboard 3: External API
It's critical to monitor dependencies on external services — they can become bottlenecks.
Panel 1: API Health Score
```
(
  sum(rate(external_api_calls_total[5m]))
  -
  sum(rate(external_api_errors_total[5m]))
)
/
sum(rate(external_api_calls_total[5m])) * 100
```

- Type: Gauge
- Min: 0, Max: 100
- Thresholds: 95 (green), 90 (yellow), 0 (red)
Panel 2: API Latency by Endpoint
```
histogram_quantile(0.95, sum by (le, endpoint_type) (rate(external_api_latency_seconds_bucket[5m])))
```

- Type: Bar chart
- Sort: Descending
Panel 3: Error Rate by Endpoint
```
sum(rate(external_api_errors_total[5m])) by (endpoint_type)
```

- Type: Bar chart
- Color: Red
Variables: Dynamic Dashboards
Grafana supports variables for interactive dashboards:
Creating a Variable
- Dashboard Settings → Variables → Add variable
- Name: `endpoint`
- Type: Query
- Query: `label_values(http_requests_total, endpoint)`

Using in Panels
```
# Filter by the selected endpoint
sum(rate(http_requests_total{endpoint="$endpoint"}[5m]))

# Multi-select
sum(rate(http_requests_total{endpoint=~"$endpoint"}[5m])) by (endpoint)
```

Useful Variables
```
# Time interval
Variable: interval
Type: Interval
Values: 1m,5m,10m,30m,1h

# HTTP method
Variable: method
Query: label_values(http_requests_total, method)

# Status code
Variable: status_code
Query: label_values(http_requests_total, status_code)
```

Alerting: System Reactivity
Monitoring without alerts is like a car without brakes. Let's set up smart alerts.
Grafana Alerting
Alert 1: High Error Rate
```
(
  sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100 > 1
```

- Condition: `> 1` (more than 1% of requests return 5xx)
- For: 5m (the condition must hold for 5 minutes)
- Severity: Critical
- Notification: Slack, Email, Telegram
Alert 2: High Latency
```
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
```

- Condition: P95 > 2 seconds
- For: 10m
- Severity: Warning
Alert 3: External API Down
```
sum(rate(external_api_errors_total[5m]))
/
sum(rate(external_api_calls_total[5m])) > 0.5
```

- Condition: More than 50% of API calls are errors
- For: 2m
- Severity: Critical
Alert 4: No Data
```
absent_over_time(http_requests_total[10m])
```

- Condition: No metrics for 10 minutes
- Severity: Critical
- Means: app crashed or Prometheus can't collect metrics
Best Practices: Battle-Tested Experience
1. Labels: Don't Overdo It
❌ Bad:
```python
# Too-detailed labels = cardinality explosion
http_requests_total.labels(
    method=request.method,
    endpoint=request.url.path,      # Every unique URL!
    user_id=str(user.id),           # Thousands of users!
    timestamp=str(time.time())      # Infinite values!
).inc()
```

✅ Good:
```python
# Normalized endpoints + a limited label set
http_requests_total.labels(
    method=request.method,
    endpoint=normalize_path(request.url.path),  # /api/users/{id}
    status_code=str(response.status_code)
).inc()
```

Rule: high-cardinality data (user_id, timestamps, unique IDs) should NOT be labels.
2. Naming Convention
Follow Prometheus naming conventions:
```
# Good names:
http_requests_total            # <namespace>_<name>_<unit>
external_api_latency_seconds   # Unit in the name
cache_hits_total               # Clearly a Counter

# Bad names:
RequestCount                   # Don't use CamelCase
api-latency                    # Don't use dashes
request_time                   # Unit not specified
```

3. Rate() Interval
The rate() window should be at least 4× the scrape_interval:
```
# If scrape_interval = 15s
rate(http_requests_total[1m])    # 4x = 60s ✅
rate(http_requests_total[30s])   # 2x = poor accuracy ❌
```

4. Histogram Buckets
Proper buckets are critical for accurate percentiles:
```python
# Default buckets (a poor fit for web latency):
Histogram('latency_seconds', 'Latency')
# [.005, .01, .025, .05, .1, ...]

# Custom buckets for web latency:
Histogram(
    'http_request_duration_seconds',
    'Request latency',
    buckets=[.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)
```

Principle: buckets should cover the typical range of observed values.
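Applied to the external API histogram defined earlier, the definition might look like this (the bucket values are my assumptions for a typical third-party API, not measured ones):

```python
external_api_latency_seconds = Histogram(
    'external_api_latency_seconds',
    'External API call latency in seconds',
    ['endpoint_type'],
    # Wider buckets than for local handlers: third-party calls are slower
    buckets=[.05, .1, .25, .5, 1, 2, 5, 10, 30],
    registry=registry
)
```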
5. Metrics Cost
Every metric costs memory. Let's calculate:
```
Memory = series count × (~3KB per series)
Series = metric × label combinations
```

Example:
```
# 1 metric × 5 methods × 20 endpoints × 15 status codes = 1,500 series
http_requests_total{method, endpoint, status_code}

# 1,500 × 3KB = ~4.5MB for a single metric!
```

Tip: regularly check cardinality:
```
# Top metrics by cardinality
topk(10, count by (__name__)({__name__=~".+"}))
```

Production Checklist
Before launching in production, check:
- [ ] Retention policy configured (`storage.tsdb.retention.time`)
- [ ] Disk space monitored (Prometheus can take a lot of space)
- [ ] Backups configured for Grafana dashboards
- [ ] Alerts tested (create artificial error)
- [ ] Notification channels work (send test alert)
- [ ] Access control configured (don't leave Grafana with admin/admin!)
- [ ] HTTPS configured for Grafana (via nginx reverse proxy)
- [ ] Cardinality checked (`topk(10, count by (__name__)({__name__=~".+"}))`)
- [ ] Documentation created (what each metric is responsible for)
- [ ] On-call process defined (who gets alerts and what to do)
Real Case: Finding a Problem
Imagine: users complain about slow performance. Here's how monitoring helped find and fix the problem in minutes.
Step 1: Open Grafana → HTTP Performance Dashboard
```
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

We see: P95 latency jumped from 0.2s to 3s.
Step 2: Check latency by endpoint
```
topk(5, histogram_quantile(0.95, sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))))
```

Found: /api/activities — 5 seconds!
Step 3: Check external APIs
```
histogram_quantile(0.95, sum by (le, endpoint_type) (rate(external_api_latency_seconds_bucket[5m])))
```

The external API's athlete_activities endpoint — 4.8 seconds. There's the problem!
Step 4: Check error rate
```
rate(external_api_errors_total{endpoint_type="athlete_activities"}[5m])
```

No errors, just slow. So the problem isn't on our side — the external service is lagging.
Solution:
- Add aggressive caching for external API (TTL 5 minutes)
- Set up alert for latency > 2s
- Add a timeout to the external requests (see the sketch after this list)
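A minimal sketch of the fix, combining a cache-first lookup with a hard timeout on the upstream call (`cache.set` and the 2-second timeout are assumptions; `get_from_cache`, `track_external_api_call`, and `external_client` are the helpers from earlier sections):

```python
import asyncio

CACHE_TTL_SECONDS = 300  # 5-minute TTL, as decided above

async def get_activities_cached(athlete_id: int):
    cache_key = f"activities:{athlete_id}"

    cached = await get_from_cache(cache_key, cache_type='external_api')
    if cached is not None:
        return cached

    activities = await asyncio.wait_for(
        track_external_api_call(
            'athlete_activities',
            external_client.get_athlete_activities,
            athlete_id=athlete_id,
        ),
        timeout=2.0,  # fail fast instead of hanging on a slow upstream
    )

    await cache.set(cache_key, activities, ttl=CACHE_TTL_SECONDS)  # assumed cache API
    return activities
```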
Step 5: After deploy, verify
```
# Cache hit rate
(cache_hits_total / (cache_hits_total + cache_misses_total)) * 100
```

Hit rate is 85% → latency dropped to 0.3s. Victory! 🎉
What's Next?
You've built a production-ready monitoring system. But this is just the beginning:
Next Steps:
- Distributed Tracing — add Jaeger/Tempo for request tracing
- Logging — integrate Loki for centralized logs
- Custom Dashboards — create dashboards for business (not just DevOps)
- SLO/SLI — define Service Level Objectives
- Anomaly Detection — use machine learning for anomaly detection
- Cost Monitoring — add cost metrics (AWS CloudWatch, etc.)
Conclusion
A monitoring system isn't "set it and forget it." It's a living organism that needs to evolve with your application. But the basic architecture we've built scales from startup to enterprise.
Key Takeaways:
- Three metric levels: HTTP (infrastructure) → API (dependencies) → Business (product)
- Middleware automates basic metrics collection
- PromQL is powerful — learn gradually
- Labels matter — but don't overdo cardinality
- Alerts are critical — monitoring without alerts is useless
- Document — in six months you'll forget what `foo_bar_total` means
Monitoring is a culture, not a tool. Start simple, iterate, improve. And your application will run stably, while you sleep peacefully 😴
About Peakline
This monitoring system was built for Peakline — a web application for Strava activity analysis. Peakline provides athletes with:
- Detailed segment analysis with interactive maps
- Historical weather data for every activity
- Advanced FIT file generation for virtual races
- Automatic GPX track error correction
- Route planner
All these features require reliable monitoring to ensure quality user experience.
Questions? Leave them in the comments!
P.S. If you found this helpful — share with colleagues who might benefit!
About the Author
Solo developer building Peakline — tools for athletes. An athlete and enthusiast myself, I believe in automation, observability, and quality code. I'm continuing to develop the project and share my experience with the community in 2025.
Connect
- 🌐 Peakline Website
- 💬 Share your monitoring setup in comments
- 📧 Questions? Drop a comment below!
Tags: #prometheus #grafana #monitoring #python #fastapi #devops #observability #sre #metrics #production



