The Monitoring module injects user-defined metrics and monitors the process itself. It supports multiple backends, protocols and data formats.
- Installation
- Getting started
- Features and additional information
- Code snippets
- System monitoring and server-side backends installation and configuration
Click here if you don't have the alisw repo configured
- Install CERN-CA-certs package (required by the alisw repo) (as root)
yum -y install CERN-CA-certs
- Add the alisw repo (as root)
cat > /etc/yum.repos.d/alisw-el7.repo <<EOF
[alisw-el7]
name=ALICE Software - EL7
baseurl=https://ali-ci.cern.ch/repo/RPMS/el7.x86_64/
enabled=1
gpgcheck=0
EOF
- Install Monitoring RPM package (as root)
yum -y install alisw-Monitoring+v1.5.0-1.x86_64
- Configure Modules
export MODULEPATH=/opt/alisw/el7/modulefiles:$MODULEPATH
- Load the environment
eval `modulecmd bash load Monitoring/v1.5.0-1`
The installation directory is: /opt/alisw/el7/Monitoring/v1.5.0-1
Click here if you don't have aliBuild installed
- Enable Software Collections in order to install devtoolset-6, which includes gcc 6.2.0 (as root)
yum install -y centos-release-scl
yum-config-manager --enable rhel-server-rhscl-7-rpms
- Install packages required by aliBuild (as root)
yum install -y curl curl-devel python python-devel devtoolset-6 \
python-pip bzip2-devel autoconf automake texinfo git \
gettext-devel libtool freetype freetype-devel libpng \
libpng-devel sqlite sqlite-devel ncurses-devel gettext \
mesa-libGLU-devel libX11-devel libXpm-devel libXext-devel \
libXft-devel libxml2 libxml2-devel environment-modules
- Upgrade pip and setuptools, then install aliBuild and missing Python packages (as root)
pip install --upgrade pip
pip install -U setuptools
pip install alibuild matplotlib numpy certifi ipython==5.4.1 ipywidgets \
ipykernel notebook metakernel pyyaml
- Enable devtoolset-6. It's recommended to add this line to ~/.bashrc or a similar file
source /opt/rh/devtoolset-6/enable
- Set the alienv root directory and load the environment
mkdir alice; cd alice
ALICE_WORK_DIR=`pwd`/sw; eval "`alienv shell-helper`"
- Compile Monitoring and its dependencies via aliBuild
aliBuild init Monitoring@master
aliBuild build Monitoring --defaults o2-daq
- Load the environment for Monitoring (in the alice directory)
alienv load Monitoring/latest
In case of an issue with aliBuild refer to the official instructions.
Manual installation of the O2 Monitoring module.
- C++ compiler with C++14 support, e.g. the gcc-c++ package from devtoolset-6 on CentOS 7, or clang++ on Mac OS
- Boost >= 1.56
- libcurl
- ApMon (optional)
git clone https://github.com/AliceO2Group/Monitoring.git
cd Monitoring; mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<installdir>
make -j
make install
The recommended way of getting a monitoring instance (as a unique_ptr) is from MonitoringFactory, by passing one or more URIs as a parameter (comma-separated if more than one):
AliceO2::Monitoring::MonitoringFactory::Get("backend[-protocol]://host:port[?query]");
See the table below to find out how to create a URI for each backend:
| Backend name | Transport | URI backend[-protocol] | URI query |
|---|---|---|---|
| InfluxDB | HTTP | influxdb-http | /write?db=&lt;db&gt; |
| InfluxDB | UDP | influxdb-udp | - |
| ApMon | UDP | apmon | - |
| InfoLogger | - | infologger | - |
| Flume | UDP | flume | - |
Simplified send method:
send(T value, std::string name)
Or a more advanced overload of send:
send(Metric&& metric)
For example:
collector->send(10, "myMetricInt");
collector->send({10, "myMetricInt"});
Two additional methods can be chained to send(Metric&& metric) in order to insert custom tags or set a custom timestamp:
addTags(std::vector<Tag>&& tags)
setTimestamp(std::chrono::time_point<std::chrono::system_clock>& timestamp)
For example:
collector->send(Metric{10, "myMetric"}.addTags({{"tag1", "value1"}, {"tag2", "value2"}}));
collector->send(Metric{10, "myCrazyMetric"}.setTimestamp(timestamp));
It's also possible to send multiple values in a single metric (only InfluxDB is currently supported; other backends fall back to sending metrics one by one):
void send(std::string name, std::vector<Metric>&& metrics)
For example:
collector->send("measurementName", {{20, "myMetricIntMultiple"}, {20.30, "myMetricFloatMultple"}});
Metrics consist of 4 parameters: name, value, timestamp and tags.
| Parameter name | Type | Required | Default |
|---|---|---|---|
| name | string | yes | - |
| value | int / double / string / uint64_t | yes | - |
| timestamp | chrono::time_point<std::chrono::system_clock> | no | current timestamp |
| tags | vector | no | -** |
**Default tag set is process specific and included in each metric:
- hostname
- PID
- process name
The module can calculate derived metrics. To do so, use addDerivedMetric(std::string name, DerivedMetricMode mode) with one of two available modes:
DerivedMetricMode::RATE - rate between two following metrics;
DerivedMetricMode::AVERAGE - average value of all metrics stored in the cache.
Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.
Currently process monitoring is supported only on Linux. To enable it, use:
enableProcessMonitoring([interval in seconds]);
The following metrics are generated every interval:
- etime - elapsed time since the process was started, in the form [[DD-]hh:]mm:ss
- pcpu - cpu utilization of the process in "##.#" format. Currently, it is the CPU time used divided by the time the process has been running (cputime/realtime ratio), expressed as a percentage. It will not add up to 100% unless you are lucky
- pmem - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage
- bytesReceived - the total number of bytes of data received by the process (per interface)
- bytesTransmitted - the total number of bytes of data transmitted by the process (per interface).
Code snippets are available in the examples directory.
- Sending a metric - examples/1-Basic.cxx
- Sending a metric with custom tags - examples/2-TaggedMetrics.cxx
- Sending a metric with a user-defined timestamp - examples/3-UserDefinedTimestamp.cxx
- Calculating derived metrics - examples/4-RateDerivedMetric.cxx
- Sending multiple values in a single metric - examples/8-Multiple.cxx
This guide explains manual installation. For Ansible deployment see the AliceO2Group/system-configuration GitLab repo.
- Install collectd package (as root)
yum -y install collectd
- Edit the configuration file /etc/collectd.conf (as root)
Interval 10
Include "/etc/collectd.d"
- Configure the network write plugin (/etc/collectd.d/network.conf) in order to push metrics to the InfluxDB instance; replace <influxdb-host> with the InfluxDB hostname (as root)
LoadPlugin network
<Plugin network>
Server "<influxdb-host>" "25826"
</Plugin>
- Configure the cpu plugin: /etc/collectd.d/cpu.conf (as root)
LoadPlugin cpu
<Plugin cpu>
ReportByCpu true
ReportByState true
ValuesPercentage true
</Plugin>
- Configure the disk plugin: /etc/collectd.d/disk.conf (as root)
LoadPlugin disk
<Plugin disk>
Disk "/[hs]d[a-f][0-9]?$/"
IgnoreSelected false
UseBSDName false
UdevNameAttr "DEVNAME"
</Plugin>
- Configure the interface plugin: /etc/collectd.d/interface.conf (as root)
LoadPlugin interface
- Configure the load plugin: /etc/collectd.d/load.conf (as root)
LoadPlugin load
- Configure the memory plugin: /etc/collectd.d/memory.conf (as root)
LoadPlugin memory
- Configure the uptime plugin: /etc/collectd.d/uptime.conf (as root)
LoadPlugin uptime
- Start collectd (as root)
systemctl start collectd.service
systemctl enable collectd.service
- Add the influxdb repo (as root)
cat > /etc/yum.repos.d/influxdb.repo <<EOF
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
- Install InfluxDB package (as root)
yum -y install influxdb
- Add a UDP endpoint for application-related metrics by editing the configuration file /etc/influxdb/influxdb.conf, with database name test and UDP port 8088 (as root)
[[udp]]
enabled = true
bind-address = ":8088"
database = "test"
batch-size = 5000
batch-timeout = "1s"
batch-pending = 100
read-buffer = 8388608
- Add an endpoint for
collectd(as root)
[[collectd]]
enabled = true
bind-address = ":25826"
database = "system-monitoring"
typesdb = "/usr/share/collectd/types.db"
- Open UDP ports 25826 and 8088 (as root)
firewall-cmd --zone=public --permanent --add-port=8088/udp
firewall-cmd --zone=public --permanent --add-port=25826/udp
firewall-cmd --reload
- Start InfluxDB (as root)
systemctl start influxdb
- Create the test and system-monitoring databases
influx
influx> create database test
influx> create database system-monitoring
More details available at InfluxDB page.
- Install Java (as root)
yum -y install java
- Download the latest release of Apache Flume
- Unpack the file
tar -xvzf apache-flume-1.7.0-bin.tar.gz
- Install a custom source and/or sink from the MonitoringCustomComponents repo. Adjust the configuration file according to the source/sink instructions. A sample configuration file is available in conf/flume-conf.properties.template.
- Launch Flume using the following command:
$ bin/flume-ng agent -n <agent-name> -c conf -f conf/<flume-config>
Set the correct <agent-name> and <flume-config> name.
See Flume User Guide documentation for more details.
- Add Grafana repo (as root)
curl -s https://packagecloud.io/install/repositories/grafana/stable/script.rpm.sh | bash
- Install Grafana package (as root)
yum -y install grafana
- Open port 3000 (as root)
firewall-cmd --zone=public --add-port 3000/tcp --permanent
firewall-cmd --reload
- Change the default admin_user and admin_password in /etc/grafana/grafana.ini (as root)
See more regarding configuration file in the official documentation: http://docs.grafana.org/installation/configuration/
- (Optional) Enable SSL
  - Set protocol to https and ssl_mode to skip-verify in the configuration file
  - Generate a private key and certificate via the CERN Certification Authority
  - Set the cert_file and cert_key values in the configuration file
- (Optional) Configure LDAP-based login: /etc/grafana/ldap.toml. See the official documentation at the Grafana webpage: http://docs.grafana.org/installation/ldap/
- Start Grafana (as root)
systemctl start grafana-server
Follow official MonALISA Service Installation Guide.