Commit 71740fa

author
Marvin Ottersberg
committed
docs: added complete docs
1 parent 967bb7e commit 71740fa

17 files changed

+7130
-179
lines changed

README.md

Lines changed: 1058 additions & 179 deletions
Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
This is the monitoring setup on testnet.

## Monitoring

Monitoring Tools Overview

1. Netdata: Provides real-time performance monitoring and alerting. It’s easy to set up and offers detailed visualizations of hardware metrics.
2. Prometheus: Collects and stores time-series data from Netdata, and can be queried by Grafana.
3. Grafana: Used for creating dashboards and visualizing data collected by Prometheus or other data sources.

Considerations for using CloudWatch

Using CloudWatch:

Pros:

- Centralized monitoring and logging.
- Integration with other AWS services.
- Scalable and managed by AWS.

Cons:

- Cost: CloudWatch charges based on data ingested, stored, and retrieved.
- Potential additional cost for high-volume data and metrics.

Storing on EC2 Machine:

Pros:

- Cost-effective: Avoids CloudWatch costs.
- Complete control over data storage and access.

Cons:

- Management overhead: Need to handle log rotation, storage limits, and backups.
- Less integration with AWS monitoring tools.

Decision: for long-term logging, store logs on the EC2 machine and rotate them with logrotate. This avoids CloudWatch costs.

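A minimal sketch of what the logrotate setup could look like. The log path, the 14-day retention, and the output file name are assumptions and not part of this setup; adjust `LOG_FILE` to wherever `--log-output` writes. `copytruncate` is used here on the assumption that the node keeps its log file open.

```shell
#!/bin/sh
# Assumed paths: adjust LOG_FILE to the file passed to --log-output.
LOG_FILE="${LOG_FILE:-/home/ubuntu/tezos.log}"
CONF_OUT="${CONF_OUT:-./octez-logrotate.conf}"

# Write a logrotate rule: rotate daily, keep 14 compressed copies,
# and truncate in place so the node can keep writing to the same file.
cat > "$CONF_OUT" <<EOF
$LOG_FILE {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF

echo "wrote $CONF_OUT"
```

After reviewing the generated file, it can be installed with `sudo cp ./octez-logrotate.conf /etc/logrotate.d/octez` and dry-run with `sudo logrotate -d /etc/logrotate.d/octez`.
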
### Setup netdata

1. Install

    wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh

2. Add a shell script for collection. Copy the code from [Step 2 in this tutorial](https://opentezos.com/node-baking/deploy-a-node/monitor-a-node/)

3. Enable metrics in the node run command (depending on your installation, this is the CLI invocation, the Docker command, or the system service)

    octez-node run --rpc-addr 127.0.0.1:8732 --log-output tezos.log --metrics-addr=:9091

4. Check that netdata is running

    sudo systemctl status netdata

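Once the node runs with `--metrics-addr=:9091`, a quick way to confirm that metrics are actually served is sketched below. `check_metrics_endpoint` is a hypothetical helper, and the `/metrics` path and the `octez_` metric prefix are assumptions based on the metric names used later in this document.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given URL returns at least one
# line that starts with "octez_" (assumed metric name prefix).
check_metrics_endpoint() {
    url="${1:-http://127.0.0.1:9091/metrics}"
    if curl -fsS --max-time 5 "$url" | grep -q '^octez_'; then
        echo "ok: $url is serving octez metrics"
    else
        echo "fail: no octez metrics at $url" >&2
        return 1
    fi
}

# Usage (commented out so the file can be sourced safely):
#   check_metrics_endpoint http://127.0.0.1:9091/metrics
```
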
### Add custom metrics

0. Install jq

    sudo apt-get install jq

1. Create the custom script

    sudo nano /usr/libexec/netdata/charts.d/octez.sh

2. Add the following content

```
#!/bin/bash

# Collection interval in seconds and chart priority
octez_update_every=10
octez_priority=90000

# Succeed only if the tools this script relies on are installed
octez_check() {
    command -v octez-client >/dev/null 2>&1 || return 1
    command -v curl >/dev/null 2>&1 || return 1
    command -v jq >/dev/null 2>&1 || return 1
    return 0
}

octez_get() {
    # Fetch once and reuse; the jq filters below assume a JSON response
    local data
    data=$(curl -s http://localhost:8732/monitor/metrics)
    # Raw response; narrow this to a version field if the endpoint exposes one
    echo "octez_version:$data"
    echo "octez_validator_chain_is_bootstrapped:$(echo "$data" | jq .is_bootstrapped)"
    echo "octez_p2p_connections_outgoing:$(echo "$data" | jq .connections.outgoing)"
    echo "octez_validator_chain_last_finished_request_completion_timestamp:$(echo "$data" | jq .last_finished_request_completion_timestamp)"
    echo "octez_p2p_peers_accepted:$(echo "$data" | jq .peers.accepted)"
    echo "octez_p2p_connections_active:$(echo "$data" | jq .connections.active)"
    echo "octez_store_invalid_blocks:$(echo "$data" | jq .invalid_blocks)"
    echo "octez_validator_chain_head_round:$(echo "$data" | jq .head.round)"
    echo "ocaml_gc_allocated_bytes:$(echo "$data" | jq .gc.allocated_bytes)"
    echo "octez_mempool_pending_applied:$(echo "$data" | jq .mempool.pending_applied)"

    # Collect average CPU usage (100 minus the idle percentage reported by top)
    local cpu_usage
    cpu_usage=$(top -bn1 | grep "Cpu(s)" | \
        sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \
        awk '{print 100 - $1}')
    echo "average_cpu_usage:$cpu_usage"

    # Collect average RAM usage (used memory as a percentage of total)
    local ram_usage
    ram_usage=$(free -m | awk 'NR==2{printf "%.2f", $3*100/$2 }')
    echo "average_ram_usage:$ram_usage"

    # Collect network inbound and outbound traffic (cumulative bytes on eth0;
    # change the interface name if the machine uses a different one)
    local net_dev net_in net_out
    net_dev=$(grep 'eth0' /proc/net/dev | awk '{print $2 " " $10}')
    net_in=$(echo "$net_dev" | awk '{print $1}')
    net_out=$(echo "$net_dev" | awk '{print $2}')
    echo "network_inbound:$net_in"
    echo "network_outbound:$net_out"
}

case "$1" in
    get)
        octez_get
        ;;
    check)
        octez_check
        ;;
    *)
        echo "Usage: $0 {get|check}"
        exit 1
        ;;
esac
```

3. Make the script executable

    sudo chmod +x /usr/libexec/netdata/charts.d/octez.sh

4. Edit the netdata config so that it uses the script (run `edit-config` from the netdata config directory, usually `/etc/netdata`) and add the following

    sudo ./edit-config netdata.conf

```
[plugins]
    charts.d = yes

[plugin:charts.d]
    # Load the custom script for Octez
    update every = 10
    command options = octez
```

5. Add this to charts.d.conf:

    sudo nano /etc/netdata/charts.d.conf

```
octez="yes"
```

6. Edit tezos-ghostnet/config.json

```
{
  "data-dir": "/home/ubuntu/tezos-ghostnet",
  "p2p": {
    "bootstrap-peers": [
      "ghostnet.teztnets.com", "ghostnet.tzinit.org",
      "ghostnet.tzboot.net", "ghostnet.boot.ecadinfra.com",
      "ghostnet.stakenow.de:9733"
    ],
    "listen-addr": "[::]:9732"
  },
  "shell": { "history_mode": "rolling" },
  "network": "ghostnet",
  "metrics_addr": [ "127.0.0.1:9091" ]
}
```

**Note**: Normally the RPC settings would be expected here, but `config init` doesn't add them, and when they are added manually the configuration no longer works:
```
"rpc": {
  "listen-addrs": ["127.0.0.1:8732"],
  "acl": [
    {
      "address": "127.0.0.1",
      "blacklist": []
    },
    {
      "address": "::1",
      "blacklist": []
    }
  ]
},
```

7. Restart netdata (either of the first two commands works), then check the logs

    sudo systemctl restart netdata
    sudo service netdata restart
    sudo journalctl -u netdata

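The custom script in step 2 pipes the response through `jq`, which only works for JSON. The metric names it echoes look like the ones the node publishes on its `--metrics-addr` port, and that endpoint is assumed here to serve Prometheus text format at `/metrics`. If the `jq` filters return `null`, a metric can be read from that text format instead. The sketch below uses a hypothetical helper, `metric_value`, and assumes sample lines carry no trailing timestamp.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one metric from Prometheus
# text exposition format read on stdin. $1 is the metric name.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                    # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)     # drop a label set such as {kind="x"}
            # Value is taken as the last field (assumes no timestamp)
            if (metric == name) { print $NF; exit }
        }'
}

# Usage (commented out; assumes the node runs with --metrics-addr=:9091):
#   curl -s http://localhost:9091/metrics | metric_value octez_p2p_connections_active
```

Inside `octez_get`, a line such as `echo "octez_p2p_connections_active:$(echo "$data" | metric_value octez_p2p_connections_active)"` could then replace the corresponding `jq` call, with `data` fetched from the metrics port.
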
### Prometheus and Grafana

For a more comprehensive setup using Prometheus and Grafana, you can start with the following:

1. **Install Prometheus**:
   ```sh
   wget https://github.com/prometheus/prometheus/releases/download/v2.32.1/prometheus-2.32.1.linux-amd64.tar.gz
   tar xvfz prometheus-2.32.1.linux-amd64.tar.gz
   cd prometheus-2.32.1.linux-amd64
   ./prometheus --config.file=prometheus.yml
   ```

2. **Install Grafana**:
   ```sh
   sudo apt-get install -y adduser libfontconfig1
   wget https://dl.grafana.com/oss/release/grafana_8.3.3_amd64.deb
   sudo dpkg -i grafana_8.3.3_amd64.deb
   sudo systemctl start grafana-server
   sudo systemctl enable grafana-server
   ```

3. **Configure Prometheus to scrape metrics**:
   Add your targets in `prometheus.yml`. The target below is the node's metrics port (9091, set with `--metrics-addr`); port 9090 is Prometheus itself:
   ```yaml
   scrape_configs:
     - job_name: 'node'
       static_configs:
         - targets: ['localhost:9091']
   ```

4. **Access Grafana**:
   Open your browser and navigate to `http://<your-ec2-instance-ip>:3000`, then configure Grafana to use Prometheus as a data source.

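Steps 3 and 4 can also be checked and scripted from the shell. The sketch below is one possible approach, not part of the original setup: `summarize_target_health` and `datasource_payload` are hypothetical helpers, the first reading the JSON from Prometheus' `/api/v1/targets` endpoint and the second producing a body for Grafana's `/api/datasources` endpoint. The URLs and the `admin:admin` credentials (Grafana's defaults on a fresh install) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: count scrape targets by health ("up"/"down")
# from the JSON returned by Prometheus' /api/v1/targets, read on stdin.
summarize_target_health() {
    grep -o '"health":"[a-z]*"' | sort | uniq -c
}

# Hypothetical helper: print a JSON body that registers Prometheus as
# a Grafana data source. $1 is the Prometheus URL (assumed default below).
datasource_payload() {
    prom_url="${1:-http://localhost:9090}"
    printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$prom_url"
}

# Usage (commented out; both services must already be running):
#   curl -s http://localhost:9090/api/v1/targets | summarize_target_health
#   curl -s -u admin:admin -H 'Content-Type: application/json' \
#        -d "$(datasource_payload)" http://localhost:3000/api/datasources
```

A target reported as `down` usually means the address in `prometheus.yml` does not match the port the node exposes.
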