Solar Delegate Node monitoring script that runs periodic health checks and reports the results via Discord. The following facilities are monitored:
Host status
- last boot time and pending restart
- cpu load
- memory usage
- swap usage
- disk usage
Node processes
- SW Version
- Solar Relay
- Solar Forger
- True Block Weight forked from original by Solar Delegate Goose @galperins4
Network status
- Relay sync status and lag
- Forger missed blocks
- Forger rank
- Delegate voters
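As a rough illustration of the host status probes listed above, the sketch below reads the CPU load and disk usage with the Python standard library only. It is not taken from the project's source: the function names are hypothetical, and `os.getloadavg()` is available on Unix-like systems only.

```python
import os
import shutil


def cpu_load_percent() -> float:
    """1-minute load average as a percentage of the available CPU cores."""
    load_1min, _, _ = os.getloadavg()  # Unix-only call
    cores = os.cpu_count() or 1
    return 100.0 * load_1min / cores


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    print(f"cpu load : {cpu_load_percent():.1f}%")
    print(f"disk used: {disk_usage_percent():.1f}%")
```

Memory and swap usage could be read in a similar way, for example from `/proc/meminfo` on Linux.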
The Discord embed side color indicates the alert status, and the probes causing an alert are displayed in bold, underlined code style.
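A minimal sketch of how such a message could be assembled is shown below. The `content`, `embeds`, `title`, `description` and `color` fields belong to the Discord webhook payload format; the function names, the embed title and the two color values are assumptions made for illustration and are not taken from the project.

```python
import json

# Hypothetical color choices; Discord expects the color as an integer.
COLOR_OK = 0x2ECC71     # green
COLOR_ALERT = 0xE74C3C  # red


def format_probe(name: str, value: str, alerting: bool) -> str:
    """Render one probe line; alerting values become bold, underlined inline code."""
    shown = f"__**`{value}`**__" if alerting else value
    return f"{name}: {shown}"


def build_payload(probes, mention: str = "") -> dict:
    """Build a webhook JSON body from (name, value, alerting) tuples."""
    any_alert = any(alerting for _, _, alerting in probes)
    lines = [format_probe(n, v, a) for n, v, a in probes]
    return {
        # The user is mentioned only while an alert is active.
        "content": mention if any_alert else "",
        "embeds": [{
            "title": "Node status",  # hypothetical title
            "description": "\n".join(lines),
            "color": COLOR_ALERT if any_alert else COLOR_OK,
        }],
    }


if __name__ == "__main__":
    sample = [("cpu load", "12%", False), ("disk usage", "93%", True)]
    print(json.dumps(build_payload(sample, "<@your_userid>"), indent=2))
```

The resulting dictionary would be sent as the JSON body of an HTTP POST to the webhook URL.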
The project and most of the probes were inspired by Solar Delegate @mtaylan's Solar Node Monitoring scripts.
- Python3
- Python virtual environment
- Process Manager 2 (pm2)
- Webhook url associated with a Discord server & channel
pm2 stop lazy-delegate
cd ~/lazy-delegate/ && git pull
. .venv/bin/activate
pip3 install -r requirements.txt && deactivate
add the following to ~/lazy-delegate/src/config/config
DEBUG=0
DISCORD_USER='<@your_userid_not_bots>'
start and check logs
pm2 start lazy-delegate && pm2 logs lazy-delegate
Replace SUDO_USER with a username that has sudo rights (i.e. is a member of the sudo group) and run the command below
cd && bash <(curl -s https://raw.githubusercontent.com/osrn/lazy-delegate/main/install.sh) SUDO_USER
move on to the configuration
Creating a Discord channel webhook is well documented elsewhere and is not covered here.
cd ~/lazy-delegate && cp src/config/config.sample src/config/config
PM2='path-to-pm2-executable'
Path to pm2 executable
CHK_FORGER=1
Enable(1)/disable(0) monitoring PM2 Forger process. Relay process is always checked.
CHK_TBW=1
Enable(1)/disable(0) monitoring PM2 TBW-tbw and TBW-pay processes
CHK_POOL=1
Enable(1)/disable(0) monitoring PM2 TBW-pool process
NODE_IP=xx.xx.xx.xx
IP address of the forger node to be monitored - as registered in PEER LIST
DELEGATE_NAME='xxxx'
Registered delegate name for the forger node
RANKLIMIT=52
Alert will be produced when rank > RANKLIMIT
LOCAL_API='http://127.0.0.1:6003/api'
API endpoint queried for node data. Defaults to the local node, but can be set to any relay node with a public API.
NET_API='https://sxp.mainnet.sh/api'
Best pointed at the public API for the network, though any relay node with a public API, or even localhost, will work. Remember to update this value when moving to Mainnet.
PRERELEASE=0
Set to 0 for Mainnet and 1 for Testnet
PROBE_CYCLE = 120
Probe execution (health check) interval in seconds. Note that a value < 60 may run into GitHub API rate limiting, which shows up as a 403 Forbidden response.
DEBUG = 0
Set to 1 for verbose logging
HEARTBEAT_CYCLE = 3600
Interval in seconds between heartbeat messages sent to Discord.
DISCORD_HOOK='https://discord.com/api/webhooks/xxxxx/yyyyyyyyyy'
Webhook URL of the Discord channel that receives the reports
DISCORD_USER='your_userid_not_bots'
User ID of the Discord user to notify with an @mention when an alert is raised. The user is not mentioned when there is no alert or when an alert has cleared.
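To make the file format above concrete, here is a minimal sketch of reading such KEY=VALUE settings and running a couple of sanity checks. It is an illustration only and does not reflect how the app actually loads its config; `parse_config`, `check_config` and the `REQUIRED` list are hypothetical names and choices.

```python
# Keys assumed to be mandatory for this illustration only.
REQUIRED = ("NODE_IP", "DELEGATE_NAME", "DISCORD_HOOK")


def parse_config(text: str) -> dict:
    """Parse KEY=VALUE lines, tolerating spaces around '=' and quoted values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def check_config(settings: dict) -> list:
    """Return a list of human-readable problems found in the settings."""
    problems = [f"missing {key}" for key in REQUIRED if not settings.get(key)]
    if int(settings.get("PROBE_CYCLE", 120)) < 60:
        problems.append("PROBE_CYCLE < 60 may hit GitHub API rate limits")
    return problems


if __name__ == "__main__":
    sample = "DELEGATE_NAME='xxxx'\nPROBE_CYCLE = 30\n"
    cfg = parse_config(sample)
    print(cfg)
    print(check_config(cfg))
```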
start the app and monitor logs
cd ~/lazy-delegate && pm2 start package.json && pm2 logs lazy-delegate
to start the app at boot with pm2
cd && pm2 save
to start pm2 at boot:
Option 1) Have sudo privileges? Run pm2 startup and follow the instructions.
Option 2) No sudo privileges, like the solar user? Run:
(crontab -l; echo "@reboot /bin/bash -lc \"source /home/solar/.solarrc; pm2 resurrect\"") | sort -u - | crontab -
to stop|start|restart the process on-the-fly
pm2 stop|start|restart lazy-delegate
Whenever the config file changes, the app needs to be restarted
pm2 restart lazy-delegate
to remove the process for whatever reason:
pm2 stop lazy-delegate
pm2 delete lazy-delegate
# optionally, remove logs
rm ~/.pm2/logs/lazy-delegate*
The node is probed periodically for health checks, and any issues raised or cleared during the rest period are reported to Discord instantly. Each issue is reported only once, when it is first detected.
The Probe class is responsible for keeping track of the values and governing the alarm raising and clearing logic.
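The raise-once, clear-once behaviour described here can be sketched as a small state machine. The class below is a simplified illustration under that assumption and not the project's actual Probe implementation; the constructor arguments and the `update` method are hypothetical.

```python
class Probe:
    """Tracks one monitored value and reports only alert state transitions."""

    def __init__(self, name, is_alert):
        self.name = name
        self.is_alert = is_alert  # predicate: value -> bool
        self.value = None
        self.alerting = False

    def update(self, value):
        """Store the new value; return 'raised', 'cleared', or None."""
        self.value = value
        now_alerting = self.is_alert(value)
        event = None
        if now_alerting and not self.alerting:
            event = "raised"    # first time the condition is seen
        elif self.alerting and not now_alerting:
            event = "cleared"   # condition has gone away
        self.alerting = now_alerting
        return event


if __name__ == "__main__":
    RANKLIMIT = 52
    rank = Probe("rank", lambda r: r > RANKLIMIT)
    for observed in (40, 55, 57, 50):
        print(observed, rank.update(observed))
```

Because `update` returns an event only on a transition, a condition that persists across several probe cycles produces a single notification.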
A heartbeat status report is sent at regular intervals. A missing report indicates a problem with the host, the node or the lazy-delegate app itself.
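One way to decide when a heartbeat is due is to compare the elapsed time against HEARTBEAT_CYCLE on every probe cycle. The sketch below assumes that approach; `HeartbeatTimer` is a hypothetical helper, not part of the app, and the injectable clock exists only to make the logic easy to exercise.

```python
import time


class HeartbeatTimer:
    """Signals when at least `heartbeat_cycle` seconds have passed since the last heartbeat."""

    def __init__(self, heartbeat_cycle=3600, clock=time.monotonic):
        self.heartbeat_cycle = heartbeat_cycle
        self.clock = clock
        self.last_heartbeat = clock()

    def due(self) -> bool:
        """Return True once per elapsed interval and restart the interval."""
        now = self.clock()
        if now - self.last_heartbeat >= self.heartbeat_cycle:
            self.last_heartbeat = now
            return True
        return False


if __name__ == "__main__":
    timer = HeartbeatTimer(heartbeat_cycle=3600)
    # Called right after creation, so no heartbeat is due yet.
    print(timer.due())
```

Since the check runs once per probe cycle, the heartbeat can be late by up to PROBE_CYCLE seconds in this scheme.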
v0.62b
Solar Core 3.3.0-next.3 API compatibility
- Adaptation for Solar Core 3.3.0-next.3 API change for use of block id in delegate attribute
v0.61b
fix: testnet release version
- added config option PRERELEASE to specify which core version to check against: release (0) and prerelease (1) branches
v0.6b
script start/stop handler
- added handler for SIGINT and SIGTERM for cleanup and service status notification
- added rank alert limit. To set, add RANKLIMIT=xx in the config. Default value is 52.
v0.56b
notification and alert improvements
- notification when rank changes
- notification when voter count changes
- voter count change is not an alert reason in heartbeat anymore
- notification now includes a footer with timestamp
v0.55b
better notification for alert conditions
- mention user in alert condition message to receive notification
- better visibility for probes causing alert condition in heartbeat
- turn on/off verbose logging via config. colored output for debug messages
v0.54b
- fix: error in last block produced check before epoch
v0.53b
- fix: rank, voters and missed blocks should not be reported if CHK_FORGER=0
- minor doc changes
v0.52b
- PM2 executable path is now read from the config (solar core 3.2.0-next2 does not export alias to user's shell)
- TBW-pool process probe now can be enabled/disabled independent of TBW-tbw & TBW-pay probes
- fixed forger process alert condition test
v0.51b
install.sh
- non-sudo user friendly installation for required apt packages
- rewrites CPATH to prevent python package compilation errors (CPATH is restored back afterwards)
- Added missing python3-pip APT package to the installation
- Stop jobs before complete reinstall
documentation
- how to start pm2 at boot
v0.5b
- Values for probes with an active alert are now shown as codeblock in heartbeat status message
- An info message will be sent to the discord channel if delegate gained any voters during the rest period
- Add probe for Lazy Delegate version