Nginx Monitoring and Alerting on Debian: Implementation Guide
Nginx's stub_status module exposes real-time connection and request counters and is the basis for lightweight monitoring.
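stub_status comes from `ngx_http_stub_status_module`, which is only present when Nginx was built with `--with-http_stub_status_module`. Debian's packages include it. A custom build may not, and then `stub_status on;` fails the configuration test. A small sketch for checking this before editing the configuration (the function name `has_stub_status` and the `NGINX_BIN` variable are illustrative):

```shell
#!/bin/bash
# Sketch: check whether the nginx binary was built with the stub_status module.
# NGINX_BIN defaults to Debian's install path; override it for other layouts.
NGINX_BIN="${NGINX_BIN:-/usr/sbin/nginx}"

# has_stub_status: reads `nginx -V` output on stdin and succeeds (exit 0)
# if the stub_status build flag appears in it.
has_stub_status() {
    grep -q 'with-http_stub_status_module'
}

# Run the live check only when the binary exists.
# `nginx -V` prints its configure arguments on stderr, hence 2>&1.
if [ -x "$NGINX_BIN" ]; then
    if "$NGINX_BIN" -V 2>&1 | has_stub_status; then
        echo "stub_status: available"
    else
        echo "stub_status: not compiled in" >&2
    fi
fi
```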
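Edit the Nginx configuration (`/etc/nginx/nginx.conf` or a site configuration file) and add the following:

```nginx
server {
    # Bind to loopback so this block, not Debian's default site,
    # handles requests sent to http://127.0.0.1/.
    listen 127.0.0.1:80;
    server_name localhost;

    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;   # allow local access only
        deny all;
    }
}
```

A `server` block belongs inside the `http` block; on Debian, files under `/etc/nginx/conf.d/` and `/etc/nginx/sites-enabled/` are already included there. Because `localhost` may resolve to `::1` and reach the default site instead, this block binds to, and is queried at, `127.0.0.1`. Check the syntax and apply the change with `sudo nginx -t && sudo systemctl reload nginx`.

Open `http://127.0.0.1/nginx_status` (e.g. `curl http://127.0.0.1/nginx_status`). The output contains:

- Active connections: current client connections, including those in the Reading, Writing and Waiting states;
- server accepts handled requests: total accepted connections / total handled connections / total requests (handled normally equals accepts unless a resource limit such as `worker_connections` was reached);
- Reading / Writing / Waiting: connections currently reading request headers, writing responses, and idle keep-alive connections.

A simple alert script for the active connection count:

```bash
#!/bin/bash
# Mail an alert when active connections exceed a threshold.
STATUS_URL="http://127.0.0.1/nginx_status"
MAX_CONN=500   # maximum active connections threshold

STATUS=$(curl -sf --max-time 5 "$STATUS_URL")
ACTIVE=$(echo "$STATUS" | awk '/^Active connections/ {print $3}')

# An empty value means the status page could not be fetched or parsed.
if [ -z "$ACTIVE" ]; then
    echo "Cannot read $STATUS_URL" | mail -s "Nginx Alert" admin@example.com
elif [ "$ACTIVE" -gt "$MAX_CONN" ]; then
    echo "High active connections: $ACTIVE" | mail -s "Nginx Alert" admin@example.com
fi
```

Make the script executable and add it to cron (e.g. every 5 minutes): `*/5 * * * * /path/to/script.sh`.

For distributed or large-scale environments, **Prometheus (metrics collection) + Grafana (visualization and alerting)** is the widely used standard combination.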
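The Exporter used in this setup essentially reads the stub_status page and republishes its counters in Prometheus's text exposition format. To make that mapping concrete, here is a minimal sketch that performs the same translation with awk. The metric names are modeled on those nginx-prometheus-exporter documents for open-source Nginx, the function name `stub_status_to_prom` is hypothetical, and this is an illustration rather than a substitute for the Exporter.

```shell
#!/bin/bash
# Sketch: convert Nginx stub_status text (on stdin) into Prometheus text
# exposition format. Metric names are modeled on nginx-prometheus-exporter;
# this is an illustration, not a replacement for the exporter.
stub_status_to_prom() {
    awk '
        # emit: print a TYPE comment followed by the sample line.
        function emit(name, type, value) {
            print "# TYPE " name " " type
            print name " " value
        }
        /^Active connections:/ { active = $3 }
        # The three counters sit on the line after this header.
        /^server accepts handled requests/ {
            getline
            accepted = $1; handled = $2; requests = $3
        }
        /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
        END {
            if (active == "" || requests == "") {
                print "stub_status_to_prom: unexpected input" > "/dev/stderr"
                exit 1
            }
            emit("nginx_connections_active",   "gauge",   active)
            emit("nginx_connections_accepted", "counter", accepted)
            emit("nginx_connections_handled",  "counter", handled)
            emit("nginx_http_requests_total",  "counter", requests)
            emit("nginx_connections_reading",  "gauge",   reading)
            emit("nginx_connections_writing",  "gauge",   writing)
            emit("nginx_connections_waiting",  "gauge",   waiting)
        }'
}
```

It can be tried by piping the status page into it (`curl -s <status URL> | stub_status_to_prom`). The counters are kept as the original strings rather than reformatted with `printf "%d"`, so large values are not mangled by awk's number formatting.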
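Install nginx-prometheus-exporter (check the project's GitHub releases page for the current version and the exact archive name):

```bash
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter-0.11.0.linux-amd64.tar.gz
sudo tar -zxvf nginx-prometheus-exporter-*.tar.gz -C /usr/local/bin
sudo chmod +x /usr/local/bin/nginx-prometheus-exporter
```

Expose a stub_status endpoint for the Exporter at `/metrics`, inside the same server block as the status page (the path must match the Exporter's scrape URI):

```nginx
location /metrics {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}
```

Reload Nginx, then start the Exporter pointing at that endpoint (it listens on port 9113 by default):

```bash
nohup /usr/local/bin/nginx-prometheus-exporter --nginx.scrape-uri=http://127.0.0.1/metrics > /dev/null 2>&1 &
```

A process started with `nohup` does not survive a reboot; running the Exporter as a systemd service is more robust.

Add an Nginx job under the existing `scrape_configs` section of the Prometheus configuration (`/etc/prometheus/prometheus.yml`):

```yaml
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']   # Exporter default port
```

Restart Prometheus: `sudo systemctl restart prometheus`.

Configure Grafana (menu names vary between Grafana versions):

- Add the data source: open Grafana (`http://localhost:3000`), go to Configuration > Data Sources, select Prometheus and set the URL to `http://localhost:9090`;
- Import a dashboard: choose + > Import and enter a published dashboard ID (e.g. 12708) to view request rates and connection states. stub_status only provides connection and request counters, so response times and error rates are not available from this Exporter; the log-based tools below cover those;
- Create an alert: go to Alerting > New alert rule, choose a metric (e.g. `nginx_http_requests_total`), set a condition such as `rate(nginx_http_requests_total[5m]) > 1000` (more than 1000 requests per second, averaged over 5 minutes), and attach a notification channel (e.g. Email, Slack).

Nginx logs (`access.log` / `error.log`) are key to troubleshooting, and the following tools provide real-time monitoring and alerting on them.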
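The log one-liners that follow rely on whitespace-separated field positions in the combined-style layout that the `main` format keeps: `$1` is the client address, `$7` the request path and `$9` the status code. A minimal sketch that uses the same convention to summarize responses by status class (the function name `status_classes` is hypothetical; a line whose request string contains unexpected spaces shifts its fields and is skipped):

```shell
#!/bin/bash
# Sketch: count access-log lines (on stdin) by HTTP status class.
# Assumes the combined-style field layout kept by `log_format main`,
# where whitespace-separated field 9 is the status code.
status_classes() {
    # Skip lines whose 9th field is not a three-digit status code.
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { count[substr($9, 1, 1) "xx"]++ }
         END { for (c in count) print c, count[c] }' | sort
}
```

For example, `tail -n 10000 /var/log/nginx/access.log | status_classes` prints one line per status class with its count, which makes a sudden rise in `5xx` easy to spot.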
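Define a log format that also records the request processing time (in the `http` block of `/etc/nginx/nginx.conf`):

```nginx
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$request_time"';

access_log /var/log/nginx/access.log main;
error_log  /var/log/nginx/error.log;
```

Reload Nginx for the configuration to take effect.

Command-line monitoring:

- Follow 5xx responses in real time (status codes are recorded in the access log, not the error log): `tail -f /var/log/nginx/access.log | grep -E '" 5[0-9]{2} '`;
- Top 10 most requested URLs: `awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10`;
- Top client IPs among the most recent 10,000 requests: `tail -n 10000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10` (`sort` only prints after its input ends, so it cannot be fed from `tail -f`).

Visual reports with GoAccess:

```bash
sudo apt install goaccess
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED
```

Then open `http://server_ip/report.html` in a browser. The report is served by the web server itself, so restrict access to it on a public host.

Alerting on 5xx errors:

```bash
#!/bin/bash
# Mail an alert when recent requests contain too many 5xx responses.
LOG=/var/log/nginx/access.log
LINES=10000   # examine only the most recent requests
MAX_ERR=5     # 5xx error threshold

# Field $9 is the status code in the combined/main log format.
ERR_COUNT=$(tail -n "$LINES" "$LOG" | awk '$9 ~ /^5/ {n++} END {print n+0}')

if [ "$ERR_COUNT" -gt "$MAX_ERR" ]; then
    echo "High 5xx errors: $ERR_COUNT in last $LINES requests" | mail -s "Nginx Error Alert" admin@example.com
fi
```

Add it to cron (e.g. hourly): `0 * * * * /path/to/script.sh`.

Alerting is the final stage of monitoring, and thresholds should be set according to business requirements. Common alerting methods: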
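- Email: sent with the `mail` command (requires an MTA such as Postfix or Sendmail; on Debian the command is provided by packages such as `mailutils`), as in `mail -s "Nginx Alert" admin@example.com` in the scripts above;
- Slack and other channels: configured in Grafana under Alerting > Notification channels (called Contact points in newer Grafana versions);
- Automatic recovery: a script, run from root's crontab, that restarts Nginx when it is not running:

```bash
#!/bin/bash
# Restart Nginx if no process named exactly "nginx" exists.
# -x matters: a plain `pgrep nginx` would also match nginx-prometheus-exporter.
if ! pgrep -x nginx > /dev/null; then
    systemctl restart nginx
    echo "Nginx restarted at $(date)" >> /var/log/nginx_monitor.log
fi
```

- Log rotation: use `logrotate` to compress and delete old logs periodically so they do not fill the disk (configuration: `/etc/logrotate.d/nginx`, installed by Debian's nginx package).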
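The cron-based scripts in this guide send a mail on every run while a threshold stays exceeded, so a lasting incident produces a mail every few minutes. A common remedy is a cooldown per alert type. A minimal sketch, where the state directory, the default 30-minute window and the function name `should_alert` are all hypothetical choices:

```shell
#!/bin/bash
# Sketch: suppress repeated alerts of the same kind within a cooldown window.
# STATE_DIR and the default window are illustrative; override via environment.
STATE_DIR="${STATE_DIR:-/var/tmp/nginx-alerts}"
COOLDOWN_SECONDS="${COOLDOWN_SECONDS:-1800}"   # 30 minutes

# should_alert KEY: succeeds (exit 0) if no alert for KEY was recorded within
# the last COOLDOWN_SECONDS, and records the current time; otherwise fails.
should_alert() {
    local key="$1" stamp now last
    # If state cannot be kept, fail open rather than silence alerts.
    mkdir -p "$STATE_DIR" || return 0
    stamp="$STATE_DIR/$key.last"
    now=$(date +%s)
    last=$(cat "$stamp" 2>/dev/null)
    last=${last:-0}
    if [ $((now - last)) -lt "$COOLDOWN_SECONDS" ]; then
        return 1   # an alert for KEY was sent recently; stay quiet
    fi
    echo "$now" > "$stamp"
    return 0
}
```

It can guard an existing mail call, e.g. `if [ "$ACTIVE" -gt "$MAX_CONN" ] && should_alert high_connections; then ...; fi`. Using a separate key per condition keeps a connection alert from muting a 5xx alert.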