| 1 | +This is the monitoring setup on testnet |
| 2 | + |
| 3 | +## Monitoring |
| 4 | +Monitoring Tools Overview |
| 5 | + |
| 6 | +1. Netdata: Provides real-time performance monitoring and alerting. It’s easy to set up and offers detailed visualizations of hardware metrics. |
| 7 | +2. Prometheus: Collects and stores time-series data from Netdata, and can be queried by Grafana. |
| 8 | +3. Grafana: Used for creating dashboards and visualizing data collected by Prometheus or other data sources. |
| 9 | + |
| 10 | +Considerations for using CloudWatch |
| 11 | + |
| 12 | +Using CloudWatch: |
| 13 | + |
| 14 | +Pros: |
| 15 | + |
| 16 | +• Centralized monitoring and logging. |
| 17 | +• Integration with other AWS services. |
| 18 | +• Scalable and managed by AWS. |
| 19 | + |
| 20 | +Cons: |
| 21 | +• Cost: CloudWatch charges based on data ingested, stored, and retrieved. |
| 22 | +• Potential additional cost for high-volume data and metrics. |
| 23 | + |
| 24 | +Storing on EC2 Machine: |
| 25 | + |
| 26 | +Pros: |
| 27 | + |
| 28 | +• Cost-effective: Avoids CloudWatch costs. |
| 29 | +• Complete control over data storage and access. |
| 30 | + |
| 31 | +Cons: |
| 32 | +• Management overhead: Need to handle log rotation, storage limits, and backups. |
| 33 | +• Less integration with AWS monitoring tools. |
| 34 | + |
| 35 | +For long-term logging, store logs on the EC2 machine and rotate them with logrotate; this avoids CloudWatch costs. |
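
A minimal sketch of what the logrotate setup could look like. The helper name `logrotate_stanza`, the 14-day retention, and the example log path are assumptions, not part of the original setup; `copytruncate` is chosen here on the assumption that the node keeps its log file open while running.

```shell
#!/bin/bash
# Print a logrotate stanza for a single log file.
# The path is supplied by the caller; it should match wherever
# the node's --log-output option points (assumption).
logrotate_stanza() {
    local log_path="$1"
    cat <<EOF
$log_path {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

For example, `logrotate_stanza /home/ubuntu/tezos.log | sudo tee /etc/logrotate.d/octez-node` would install the stanza, and `sudo logrotate -d /etc/logrotate.d/octez-node` performs a dry run without rotating anything.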
| 36 | + |
| 37 | +### Set up Netdata |
| 38 | + |
| 39 | +1. Install |
| 40 | + |
| 41 | +wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh |
| 42 | + |
| 43 | +2. Add shell script for collection. Copy code from [Step 2 in this tutorial](https://opentezos.com/node-baking/deploy-a-node/monitor-a-node/) |
| 44 | + |
| 45 | +3. Add the metrics option to the node run command (depending on your installation: in the CLI, in Docker, or in the system service): |
| 46 | + |
| 47 | + octez-node run --rpc-addr 127.0.0.1:8732 --log-output tezos.log --metrics-addr=:9091 |
| 48 | + |
| 49 | +4. Check if netdata is running |
| 50 | + sudo systemctl status netdata |
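
After these steps it can help to confirm that the node really exposes metrics on the address passed to `--metrics-addr`. The sketch below assumes the endpoint serves Prometheus text format at `/metrics`; the helper name `metric_value` is hypothetical, and it only handles metrics without labels.

```shell
#!/bin/bash
# Read Prometheus text-format metrics on stdin and print the value of one
# unlabelled metric. Comment lines (# HELP / # TYPE) never match because
# their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

A possible invocation is `curl -s http://127.0.0.1:9091/metrics | metric_value octez_p2p_connections_active`. The metric name is taken from the collector script in this document; the names actually exposed may differ between Octez versions.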
| 51 | + |
| 52 | +### Add custom metrics |
| 53 | +0. Install jq |
| 54 | + |
| 55 | + sudo apt-get install jq |
| 56 | + |
| 57 | +1. Add custom script |
| 58 | + |
| 59 | + sudo nano /usr/libexec/netdata/charts.d/octez.sh |
| 60 | + |
| 61 | +2. Add this |
| 62 | + |
| 63 | +``` |
| 64 | +#!/bin/bash |
| 65 | + |
| 66 | +octez_update_every=10 |
| 67 | +octez_priority=90000 |
| 68 | + |
| 69 | +octez_check() { |
| 70 | + command -v octez-client >/dev/null 2>&1 || return 1 |
| 71 | + return 0 |
| 72 | +} |
| 73 | + |
| 74 | +octez_get() { |
| 75 | + local data=$(curl -s http://localhost:8732/monitor/metrics) |
| 76 | + echo "octez_version:$data" |
| 77 | + echo "octez_validator_chain_is_bootstrapped:$(echo "$data" | jq .is_bootstrapped)" |
| 78 | + echo "octez_p2p_connections_outgoing:$(echo "$data" | jq .connections.outgoing)" |
| 79 | + echo "octez_validator_chain_last_finished_request_completion_timestamp:$(echo "$data" | jq .last_finished_request_completion_timestamp)" |
| 80 | + echo "octez_p2p_peers_accepted:$(echo "$data" | jq .peers.accepted)" |
| 81 | + echo "octez_p2p_connections_active:$(echo "$data" | jq .connections.active)" |
| 82 | + echo "octez_store_invalid_blocks:$(echo "$data" | jq .invalid_blocks)" |
| 83 | + echo "octez_validator_chain_head_round:$(echo "$data" | jq .head.round)" |
| 84 | + echo "ocaml_gc_allocated_bytes:$(echo "$data" | jq .gc.allocated_bytes)" |
| 85 | + echo "octez_mempool_pending_applied:$(echo "$data" | jq .mempool.pending_applied)" |
| 86 | + |
| 87 | + # Collect average CPU usage |
| 88 | + local cpu_usage=$(top -bn1 | grep "Cpu(s)" | \ |
| 89 | + sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \ |
| 90 | + awk '{print 100 - $1}') |
| 91 | + echo "average_cpu_usage:$cpu_usage" |
| 92 | + |
| 93 | + # Collect average RAM usage |
| 94 | + local ram_usage=$(free -m | awk 'NR==2{printf "%.2f", $3*100/$2 }') |
| 95 | + echo "average_ram_usage:$ram_usage" |
| 96 | + |
| 97 | + # Collect network inbound and outbound traffic (assumes the interface is eth0) |
| 98 | + local net_dev=$(awk '/eth0/ {print $2 " " $10}' /proc/net/dev) |
| 99 | + local net_in=$(echo "$net_dev" | awk '{print $1}') |
| 100 | + local net_out=$(echo "$net_dev" | awk '{print $2}') |
| 101 | + echo "network_inbound:$net_in" |
| 102 | + echo "network_outbound:$net_out" |
| 103 | +} |
| 104 | + |
| 105 | +case "$1" in |
| 106 | + get) |
| 107 | + octez_get |
| 108 | + ;; |
| 109 | + check) |
| 110 | + octez_check |
| 111 | + ;; |
| 112 | + *) |
| 113 | + echo "Usage: $0 {get|check}" |
| 114 | + exit 1 |
| 115 | + ;; |
| 116 | +esac |
| 117 | +``` |
| 118 | + |
| 119 | +3. Make script executable |
| 120 | + |
| 121 | + sudo chmod +x /usr/libexec/netdata/charts.d/octez.sh |
| 122 | + |
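| 123 | +4. Edit the Netdata config to enable the script. From the Netdata config directory (usually `/etc/netdata`), run the command below and add the following settings: |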
| 124 | + |
| 125 | + sudo ./edit-config netdata.conf |
| 126 | + |
| 127 | +``` |
| 128 | +[plugins] |
| 129 | + charts.d = yes |
| 130 | + |
| 131 | +[plugin:charts.d] |
| 132 | + # Load the custom script for Octez |
| 133 | + update every = 10 |
| 134 | + command options = octez |
| 135 | +``` |
| 136 | + |
| 137 | +5. Add this to charts.d.conf: |
| 138 | + |
| 139 | + sudo nano /etc/netdata/charts.d.conf |
| 140 | + |
| 141 | +``` |
| 142 | +octez="yes" |
| 143 | +``` |
| 144 | + |
| 145 | +6. Edit tezos-ghostnet/config.json |
| 146 | + |
| 147 | +``` |
| 148 | +{ "data-dir": "/home/ubuntu/tezos-ghostnet", |
| 149 | + "p2p": |
| 150 | + { "bootstrap-peers": |
| 151 | + [ "ghostnet.teztnets.com", "ghostnet.tzinit.org", |
| 152 | + "ghostnet.tzboot.net", "ghostnet.boot.ecadinfra.com", |
| 153 | + "ghostnet.stakenow.de:9733" ], "listen-addr": "[::]:9732" }, |
| 154 | + "shell": { "history_mode": "rolling" }, "network": "ghostnet", |
| 155 | + "metrics_addr": [ "127.0.0.1:9091" ] } |
| 156 | +``` |
| 157 | + |
| 158 | +**Note**: Normally the RPC settings would be expected here, but `config init` doesn't add them, and adding them manually breaks the setup: |
| 159 | +``` |
| 160 | +"rpc": { |
| 161 | + "listen-addrs": ["127.0.0.1:8732"], |
| 162 | + "acl": [ |
| 163 | + { |
| 164 | + "address": "127.0.0.1", |
| 165 | + "blacklist": [] |
| 166 | + }, |
| 167 | + { |
| 168 | + "address": "::1", |
| 169 | + "blacklist": [] |
| 170 | + } |
| 171 | + ] |
| 172 | + }, |
| 173 | +``` |
| 174 | + |
| 175 | +7. Restart Netdata and check its logs |
| 176 | + |
| 177 | + sudo systemctl restart netdata |
| 178 | + sudo service netdata restart |
| 179 | + sudo journalctl -u netdata |
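
Once Netdata has restarted, the collector script can also be sanity-checked by hand. The sketch below assumes the script prints `name:value` lines as written above; the helper name `find_bad_values` is hypothetical. It lists every metric whose value is empty or `null`, which is what `jq` prints when a field is missing from the response.

```shell
#!/bin/bash
# Read "name:value" lines on stdin and print the names whose value
# is empty or the literal string "null".
find_bad_values() {
    awk -F: '$2 == "" || $2 == "null" { print $1 }'
}
```

For example: `bash /usr/libexec/netdata/charts.d/octez.sh get | find_bad_values`. No output means every metric carried a value.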
| 180 | + |
| 181 | + |
| 182 | +### Prometheus and Grafana |
| 183 | + |
| 184 | +For a more comprehensive setup using Prometheus and Grafana, you can start with the following: |
| 185 | + |
| 186 | +1. **Install Prometheus**: |
| 187 | + ```sh |
| 188 | + wget https://github.com/prometheus/prometheus/releases/download/v2.32.1/prometheus-2.32.1.linux-amd64.tar.gz |
| 189 | + tar xvfz prometheus-2.32.1.linux-amd64.tar.gz |
| 190 | + cd prometheus-2.32.1.linux-amd64 |
| 191 | + ./prometheus --config.file=prometheus.yml |
| 192 | + ``` |
| 193 | + |
| 194 | +2. **Install Grafana**: |
| 195 | + ```sh |
| 196 | + sudo apt-get install -y adduser libfontconfig1 |
| 197 | + wget https://dl.grafana.com/oss/release/grafana_8.3.3_amd64.deb |
| 198 | + sudo dpkg -i grafana_8.3.3_amd64.deb |
| 199 | + sudo systemctl start grafana-server |
| 200 | + sudo systemctl enable grafana-server |
| 201 | + ``` |
| 202 | + |
| 203 | +3. **Configure Prometheus to scrape metrics**: |
| 204 | + Add your targets in `prometheus.yml`: |
| 205 | + ```yaml |
| 206 | +   scrape_configs: |
| 207 | +     - job_name: 'node' |
| 208 | +       static_configs: |
| 209 | +         - targets: ['localhost:9091']  # Octez node metrics port set via --metrics-addr |
| 210 | + ``` |
| 211 | + |
| 212 | +4. **Access Grafana**: |
| 213 | + Open your browser and navigate to `http://<your-ec2-instance-ip>:3000`, then configure Grafana to use Prometheus as a data source. |
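
Before building dashboards, a quick reachability check of both services can save some debugging. This is a sketch under the assumption that Prometheus and Grafana run on their default ports (9090 and 3000) on the same machine; the helper name `check_endpoint` is hypothetical.

```shell
#!/bin/bash
# check_endpoint NAME URL
# Print "NAME ok" if the URL answers with an HTTP success status,
# otherwise print "NAME unreachable" and return non-zero.
check_endpoint() {
    local name="$1" url="$2"
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
        echo "$name ok"
    else
        echo "$name unreachable"
        return 1
    fi
}
```

Possible calls are `check_endpoint prometheus http://localhost:9090/-/healthy` and `check_endpoint grafana http://localhost:3000/api/health`, which use the health endpoints that Prometheus and Grafana provide.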
| 214 | + |