Title: Lightweight data monitoring using RRDtool
       Author: Solène
       Date: 16 February 2023
       Tags: monitoring nocloud
       Description: In this article, I will introduce you to RRDtool, a robust
       piece of software to keep track of data and render graphs from it
       
       # Introduction
       
       I like my servers to run as little code as possible, and as few
       services as possible in general: this eases maintenance and leaves
       room for other things to run.  I recently wrote about monitoring
       software to gather metrics and render them, but they are all overkill
       if you just want to keep track of a single value over time and graph
       it for visualization.
       
       Fortunately, there is an old and robust tool that does the job fine
       and is perfectly documented: RRDtool.
       
 (HTM) RRDtool official website
       
       RRDtool stands for "Round Robin Database Tool": it's a set of programs
       and a specific file format to gather metrics.  The trick with RRD files
       is that they have a fixed size: when you create one, you need to define
       how many values you want to store in it, at which frequency, and for
       how long.  This can hardly be changed after the file is created.
       
       In addition, RRD files allow you to create derived time series to
       keep track of computed values over a longer timespan, but with a lower
       resolution.  Think of the following use case: you want to record your
       home temperature every 10 minutes for the past 48 hours, but you also
       want to keep some information for the past year.  You can tell RRD to
       compute the hourly average temperature for a week, the average over
       four hours for a month, and the daily average for a year.  All of this
       will be fixed size.
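       The temperature example above could be written as a single `rrdtool
       create` call with one archive (RRA) per resolution.  This is only a
       sketch: it assumes a 10 minutes step, a 20 minutes heartbeat, no
       min/max bounds (`U:U`), a "month" of 30 days, and two hypothetical
       helpers, `rra_spec` and `create_temperature_rrd`.

       ```shell
       #!/bin/sh
       STEP=600      # one measurement (PDP) every 10 minutes
       HOUR=3600
       DAY=86400

       # Hypothetical helper: "rra_spec RESOLUTION RETENTION" (both in seconds)
       # prints an AVERAGE archive keeping RETENTION seconds of data, one row
       # per RESOLUTION seconds: pdp_per_row = RESOLUTION / STEP and
       # rows = RETENTION / RESOLUTION.
       rra_spec() {
           echo "RRA:AVERAGE:0.5:$(($1 / STEP)):$(($2 / $1))"
       }

       # Hypothetical helper: create the temperature file in "$1".
       # Archives: every 10 min for 48 h, hourly for a week,
       # every 4 h for 30 days, daily for a year.  Heartbeat = 2 steps.
       create_temperature_rrd() {
           rrdtool create "$1" --step "$STEP" \
               "DS:temperature:GAUGE:$((2 * STEP)):U:U" \
               "$(rra_spec "$STEP" $((2 * DAY)))" \
               "$(rra_spec "$HOUR" $((7 * DAY)))" \
               "$(rra_spec $((4 * HOUR)) $((30 * DAY)))" \
               "$(rra_spec "$DAY" $((365 * DAY)))"
       }
       ```

       Calling `create_temperature_rrd temperature.rrd` should create a file
       with four archives: `RRA:AVERAGE:0.5:1:288`, `RRA:AVERAGE:0.5:6:168`,
       `RRA:AVERAGE:0.5:24:180` and `RRA:AVERAGE:0.5:144:365`.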
       
       # Anatomy of an RRD file
       
       RRD files can be dumped as XML, which gives a glimpse of their content
       and makes this special file format easier to understand.
       
       Let's create a file to monitor the battery level of your computer every
       10 seconds, keeping the last 5 values; don't focus on understanding the
       whole command line now:
       
       ```shell
       rrdtool create test.rrd --step 10 DS:battery:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:5
       ```
       
       If we dump the created file using `rrdtool dump test.rrd`, we get this
       result (stripped a bit to make it fit better):
       
       ```xml
       <!-- Round Robin Database Dump -->
       <rrd>
               <version>0003</version>
               <step>10</step> <!-- Seconds -->
               <lastupdate>1676569107</lastupdate> <!-- 2023-02-16 18:38:27 CET -->
       
               <ds>
                       <name> battery </name>
                       <type> GAUGE </type>
                       <minimal_heartbeat>20</minimal_heartbeat>
                       <min>0.0000000000e+00</min>
                       <max>1.0000000000e+02</max>
                       <!-- PDP Status -->
                       <last_ds>U</last_ds> <value>NaN</value> <unknown_sec> 7 </unknown_sec>
               </ds>
       
               <!-- Round Robin Archives -->
               <rra>
                       <cf>AVERAGE</cf>
                       <pdp_per_row>1</pdp_per_row> <!-- 10 seconds -->
       
                       <params> <xff>5.0000000000e-01</xff> </params>
                       <cdp_prep>
                               <ds>
                               <primary_value>0.0000000000e+00</primary_value>
                               <secondary_value>0.0000000000e+00</secondary_value>
                               <value>NaN</value>
                               <unknown_datapoints>0</unknown_datapoints>
                               </ds>
                       </cdp_prep>
                       <database>
                               <!-- 2023-02-16 18:37:40 CET / 1676569060 --> <row><v>NaN</v></row>
                               <!-- 2023-02-16 18:37:50 CET / 1676569070 --> <row><v>NaN</v></row>
                               <!-- 2023-02-16 18:38:00 CET / 1676569080 --> <row><v>NaN</v></row>
                               <!-- 2023-02-16 18:38:10 CET / 1676569090 --> <row><v>NaN</v></row>
                               <!-- 2023-02-16 18:38:20 CET / 1676569100 --> <row><v>NaN</v></row>
                       </database>
               </rra>
       </rrd>
       ```
       
       The most important thing to understand here is that we have a "ds"
       (data source) named battery, of type GAUGE, with no last value (I never
       updated it), but also an "RRA" (Round Robin Archive) for our average
       value, which contains timestamps with no value (NaN) associated to
       them.  You can see that internally, our 5 slots already exist, each
       holding an unknown value.  When I update the file, the oldest value
       will disappear, and a new record will be added at the end with the
       actual value.
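       To watch the rotation happen, you can replay a few readings with
       explicit timestamps instead of `N`.  In this sketch, `demo_rotation` is
       a hypothetical helper, and the start time and battery levels are made
       up: it recreates the battery file from above, feeds it 8 readings 10
       seconds apart, then fetches everything since the creation.

       ```shell
       #!/bin/sh
       # Hypothetical helper: rebuild the battery RRD from above in file "$1",
       # replay 8 made-up readings 10 s apart, then fetch all rows since creation.
       demo_rotation() {
           rrd=$1
           start=1676569000   # arbitrary fixed epoch, aligned on the 10 s step
           rrdtool create "$rrd" --start "$start" --step 10 \
               DS:battery:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:5 || return 1
           t=$start
           level=100
           for _ in 1 2 3 4 5 6 7 8; do
               t=$((t + 10))
               level=$((level - 1))
               # explicit "timestamp:value" instead of "N:value"
               rrdtool update "$rrd" "${t}:${level}" || return 1
           done
           # The archive only has 5 rows, so the 3 oldest readings are gone.
           rrdtool fetch "$rrd" AVERAGE --start "$start" --end "$t"
       }
       ```

       With `demo_rotation /tmp/test.rrd`, the rows of the first three
       readings should come back as NaN, while the last 5 rows hold the
       levels 96 down to 92.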
       
       # Monitoring a value
       
       In this guide, I would like to share my experience using rrdtool to
       monitor my solar panel power output over the last few hours, which can
       easily be displayed on my local dashboard.  The data are also
       collected and sent to a Grafana server, but it's not local, and
       querying it just to see the latest values wastes resources and
       bandwidth.
       
       First, you need `rrdtool` installed; you don't need anything else to
       work with RRD files.
       
       ## Create the RRD file
       
       Creating the RRD file is the trickiest part, because it can hardly be
       changed afterward.
       
       I want to collect a value every 5 minutes (300 seconds); it's an
       absolute value between 0 and 4000, so we define a step of 300 seconds
       to tell the file it must receive a value every 300 seconds.  The type
       of the value will be GAUGE, because it's just a value that doesn't
       depend on the previous one.  If we were monitoring power change over
       time, we would use DERIVE, because it stores the rate of change
       between two consecutive values (their difference divided by the
       elapsed seconds).
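       To make the difference concrete: for two consecutive updates, a DERIVE
       data source stores (new value - old value) / (elapsed seconds).  The
       sketch below only reproduces that arithmetic with `awk`; `derive_rate`
       is a hypothetical helper, the energy counter in Wh is a made-up
       example, and RRDtool's normalization of updates to step boundaries is
       ignored.

       ```shell
       #!/bin/sh
       # derive_rate T1 V1 T2 V2: per-second rate a DERIVE data source would
       # compute between reading V1 at time T1 and reading V2 at time T2.
       # Hypothetical helper: arithmetic only, rrdtool is not involved.
       derive_rate() {
           awk -v t1="$1" -v v1="$2" -v t2="$3" -v v2="$4" \
               'BEGIN { printf "%.4f\n", (v2 - v1) / (t2 - t1) }'
       }

       # An energy counter going from 1500 Wh to 1600 Wh in 300 seconds:
       derive_rate 0 1500 300 1600    # 0.3333 Wh per second, i.e. about 1200 W
       ```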
       
       Furthermore, we configure a heartbeat of 600 seconds: if no update
       arrives within 600 seconds, the corresponding value slot is marked as
       unknown.
       
       Finally, we want to be able to graph each measurement.  This can be
       done by adding an AVERAGE archive to the file with a resolution of 1
       value and 240 rows.  This means that each time we add a value to the
       RRD file, the AVERAGE field is calculated with only the last value as
       input, and we keep 240 of them, allowing us to graph up to
       240 * 5 minutes (20 hours) of data back in time.
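       The time covered by an archive is the step, multiplied by the number
       of measurements per row, multiplied by the number of rows.  A quick
       shell check with the values used here; the variable names are only
       illustrative.

       ```shell
       #!/bin/sh
       STEP=300          # --step 300: one measurement every 5 minutes
       PDP_PER_ROW=1     # RRA:AVERAGE:0.5:1:240 -> one measurement per row...
       ROWS=240          # ...and 240 rows

       SPAN=$((STEP * PDP_PER_ROW * ROWS))
       echo "${SPAN} seconds = $((SPAN / 3600)) hours"   # 72000 seconds = 20 hours

       # Reverse question: how many rows would 48 hours require?
       echo "$((48 * 3600 / (STEP * PDP_PER_ROW))) rows"  # 576 rows
       ```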
       
       ```shell
       rrdtool create solar-power.rrd --step 300 DS:value:GAUGE:600:0:4000     RRA:AVERAGE:0.5:1:240
                                                           ^     ^     ^   ^ ^            ^       ^   ^ ^
                                                           |     |     |   | | max value  |       |   | | number of rows to keep
                                                           |     |     |   | min value    |       |   | measurements per row: 1 means each row holds a single value
                                                           |     |     | heartbeat (s)    |       | xff (xfiles factor): max fraction of unknown values allowed in a row
                                                           |     | data source type       | consolidation function: AVERAGE, MIN, MAX or LAST
                                                           | data source name
       ```
       
       The `solar-power.rrd` file is now created.  You can inspect it with
       `rrdtool info solar-power.rrd` or dump its content with
       `rrdtool dump solar-power.rrd`.
       
 (HTM) RRDtool create documentation
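       Since the layout can hardly be changed later, it's worth checking it
       right after creation.  `rrdtool info` prints one `key = value` line per
       property; this sketch, with a hypothetical `show_layout` helper, keeps
       only the lines describing the step, the data sources and the archives.

       ```shell
       #!/bin/sh
       # show_layout FILE: print the layout-related lines of "rrdtool info"
       # (step, data source definitions, archive definitions) and skip the
       # rest, such as last values and consolidation state.
       show_layout() {
           rrdtool info "$1" |
               grep -E '^(step|ds\[[^]]+\]\.(type|minimal_heartbeat|min|max)|rra\[[0-9]+\]\.(cf|rows|pdp_per_row|xff)) = '
       }
       ```

       For the file created above, `show_layout solar-power.rrd` should
       include lines such as `step = 300`, `ds[value].minimal_heartbeat = 600`
       and `rra[0].rows = 240`.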
       
       ## Add values to the RRD file
       
       Now that we have prepared the file to receive data, we need to populate
       it with something useful.  This can be done using the command `rrdtool
       update`.
       
       ```shell
       CURRENT_POWER=$(some-command-returning-a-value)
       rrdtool update solar-power.rrd "N:${CURRENT_POWER}"
                                              ^ ^
                                              | | value for the first data source of the RRD file (we created only one)
                                              | when the value was measured: N means now
       ```
       
 (HTM) RRDtool update documentation
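       In practice, this update runs every 5 minutes from a scheduler such as
       cron.  Here is a sketch of such a script, assuming a hypothetical
       `read-solar-power` command printing the current power in watts: if it
       fails or prints something that isn't a number, the script records `U`
       (unknown) rather than passing an empty value that `rrdtool update`
       would reject.

       ```shell
       #!/bin/sh
       RRD=/var/lib/rrdtool/solar-power.rrd

       # Print the argument if it looks like a non-negative decimal number,
       # else "U", which rrdtool update accepts as "unknown".
       to_rrd_value() {
           case "$1" in
               '' | . | *[!0-9.]* | *.*.*) echo U ;;
               *) echo "$1" ;;
           esac
       }

       record_power() {
           # read-solar-power is a hypothetical command, replace it with yours
           value=$(read-solar-power 2>/dev/null)
           rrdtool update "$RRD" "N:$(to_rrd_value "$value")"
       }
       ```

       Calling `record_power` at the end of that script, and adding a crontab
       entry such as `*/5 * * * * /usr/local/bin/solar-update.sh` (a
       hypothetical path), would feed the file on schedule.  Out of range
       values don't need checking here: RRDtool already stores values outside
       the 0 to 4000 range given at creation as unknown.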
       
       ## Graph the content of the RRD file
       
       Generating a usable graph from the data is also tricky, but less
       problematic: the operation is not destructive as it doesn't modify the
       file, so we can experiment as much as we want without affecting its
       content.
       
       We will generate something simple like the picture below.  Of course,
       you can add a lot more information, colors, axes, legends, etc., but I
       need my dashboard to stay simple and clean.
       
 (DIR) A diagram displaying solar power over time (on a cloudy day)
       
       ```shell
       rrdtool graph --end now -l 0 --start end-14000s --width 600 --height 300 \
           /var/www/htdocs/dashboard/solar.svg -a SVG \
           DEF:ds0=/var/lib/rrdtool/solar-power.rrd:value:AVERAGE \
           "LINE1:ds0#0000FF:power" \
           "GPRINT:ds0:LAST:current value %2.1lf"
       ```
       
       I think most flags are explicit (`-l 0` sets the lower limit of the
       vertical axis to 0, `-a SVG` selects the image format); if not, you
       can look at the documentation.  What interests us here are the last
       three lines.
       
       The `DEF` line binds the AVERAGE RRA of the data source `value` in the
       file `/var/lib/rrdtool/solar-power.rrd` to the name `ds0`, which is
       used later on the command line.
       
       The `LINE1` line draws `ds0` as a 1 pixel wide line, with a color
       (`#0000FF`, blue) and a legend (`power`).
       
       The `GPRINT` line adds text to the legend: here we use the last value
       of `ds0`, formatted with the printf-style string `current value
       %2.1lf`.
       
 (HTM) RRDtool graph documentation
 (HTM) RRDtool graph examples
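       The `GPRINT:vname:CF:format` form used above still works, but the
       RRDtool graph documentation describes it as deprecated in favor of a
       `VDEF`, which reduces a series to a single value that `GPRINT` can
       print.  This sketch renders the same graph that way and adds the
       maximum to the legend; `render_solar_graph` and its two arguments are
       a hypothetical wrapper around the paths used above.

       ```shell
       #!/bin/sh
       # render_solar_graph RRD_FILE OUTPUT_SVG (hypothetical wrapper)
       render_solar_graph() {
           rrdtool graph "$2" -a SVG \
               --end now --start end-14000s -l 0 --width 600 --height 300 \
               "DEF:ds0=$1:value:AVERAGE" \
               "VDEF:ds0last=ds0,LAST" \
               "VDEF:ds0max=ds0,MAXIMUM" \
               "LINE1:ds0#0000FF:power" \
               "GPRINT:ds0last:current value %2.1lf" \
               "GPRINT:ds0max:max %2.1lf"
       }
       ```

       Running `render_solar_graph /var/lib/rrdtool/solar-power.rrd
       /var/www/htdocs/dashboard/solar.svg` would then replace the command
       above.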
       
       # Conclusion
       
       RRDtool is very nice: it's a storage engine used by monitoring
       software such as collectd or munin, but it can also be used on the
       spot with simple scripts.  However, it has drawbacks: when you start to
       create many files, it doesn't scale well, it generates a lot of I/O,
       and it consumes CPU if you need to render hundreds of pictures.  That's
       why a daemon named `rrdcached` was created: it mitigates the load by
       caching updates to many RRD files and writing them in a more
       sequential way.
       
       # Going further
       
       I encourage you to look at the official project website: the other
       commands can be very useful too, and rrdtool can also export data as
       XML or JSON, which is perfect to plug it into other software.
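       For instance, `rrdtool xport` accepts the same `DEF` syntax as
       `rrdtool graph` and prints the data instead of drawing it.  A sketch,
       assuming an rrdtool release recent enough to support `--json`
       (otherwise drop the flag to get XML), with a hypothetical
       `export_solar_json` helper:

       ```shell
       #!/bin/sh
       # export_solar_json RRD_FILE: print the "value" data source over the
       # same time range as the graph above (about 4 hours), as JSON.
       export_solar_json() {
           rrdtool xport --json --start end-14000s --end now \
               "DEF:ds0=$1:value:AVERAGE" \
               "XPORT:ds0:power"
       }
       ```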