dataswamp.org

       Title: Monitor your systems with reed-alert
       Author: Solène
       Date: 17 January 2018
       Tags: unix lisp reed-alert
       Description: 
       
       This article presents my software __reed-alert__, which checks
       user-defined states and sends user-defined notifications. I made it
       really easy to use but still configurable and extensible.
       
       
       ## Description
       
       __reed-alert__ is _not_ a monitoring tool producing graphs or storing
       values. It does a job sysadmins need done and for which there is no
       alternative product (the alternatives are part of very large
       infrastructures like Zabbix, so they are not comparable).
       
       From its configuration file, __reed-alert__ checks various states
       and, if a check fails, triggers a totally user-defined command to
       send a notification.
       
       
       ## Fetch it 
       
       This is free and open-source software released under the MIT license;
       you can install it with the following commands:
       
           # git clone git://bitreich.org/reed-alert
           # cd reed-alert
           # make
           # doas make install
       
       This will install a script `reed-alert` in /usr/local/bin/ with the
       default Makefile variables. It will try to use ecl and then sbcl if
       ecl is not installed.
       
       The __README__ file documents it in more detail, but we will see here
       how to get started quickly.
       
       You will find a few files there. __reed-alert__ is written in Common
       LISP and, for (I hope) good reasons, its configuration file is plain
       Common LISP too.
       
       There is a configuration file that looks like a real-world example,
       named **config.lisp.sample**, and another configuration file I use
       for testing, named **example.lisp**, which contains a lot of cases.
       
       
       ## Let's start
       
       In order to use __reed-alert__ we only need to create a new
       configuration file and then add a cron job.
       
       
       ### Configuration
       
       We are going to see how to configure __reed-alert__. You can find more
       explanations or details in the __README__ file.
       
       
       #### Alerts 
       
       We have to configure two kinds of parameters. First we need to set up
       a way to receive alerts; the easiest way is to send a mail with the
       "mail" command. Alerts are declared with the function **alert**, which
       takes as parameters the alert name and the command to be executed.
       Some variables in the command are replaced with values from the
       probe; they look like %date% or %params%, and the __README__ file
       lists them.
       
       In Common LISP, a function is called by writing an opening
       parenthesis before its name; everything that follows, until the
       matching closing parenthesis, is its parameters.
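       
       As a quick illustration of that syntax, here are two calls in plain
       Common LISP. They are unrelated to __reed-alert__ and the numbers
       are arbitrary:
       
           ;; call the function + with the parameters 1 and 2
           (+ 1 2)            ; returns 3
           ;; a parameter can itself be a function call
           (* 2 (+ 1 2))      ; returns 6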
       
       Example:
       
           (alert mail "echo 'problem on %hostname%' | mail me@example.com")
       
       One should take care about nesting quotes here.
       
       __reed-alert__ will fork a shell to start the command, so pipes and
       redirections work. You can be creative when writing alerts, for
       example:
       
       + use an SMS service
       + run a script that posts on a forum
       + publish a file on a server
       + send text to IRC with the ii client
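       
       As a sketch of such alternatives, the alerts below use the standard
       `logger` command and a plain file instead of mail. The alert names
       `syslog` and `logfile` and the path /tmp/reed-alert.log are made up
       for this illustration:
       
           ;; send the message to syslog with the tag "reed-alert"
           (alert syslog  "logger -t reed-alert 'problem on %hostname%'")
           ;; append a dated line to a log file
           (alert logfile "echo '%date% problem on %hostname%' >> /tmp/reed-alert.log")
       
       The checks described in the next section can then refer to `syslog`
       or `logfile` instead of `mail`.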
       
       
       #### Checks
       
       Now that we have some alerts, we will configure some checks in order
       to make __reed-alert__ useful. It uses *probes*, which are pre-defined
       checks with parameters. A probe could be "has this file not been
       updated for N minutes?" or "is the disk space usage of partition X
       more than Y?"
       
       I chose to name the function that makes a check "=>": it isn't a
       word, and it evokes a list item or something going forward. Both
       previous examples, using our previous mail notifier, would look like:
       
           (=> mail file-updated :path "/program/file.generated" :limit 10)
           (=> mail disk-usage   :limit 90)
       
       It's also possible to use shell commands and check the return code
       using the __command__ probe, allowing the user to define useful
       checks.
       
           (=> mail command :command "echo '/is-this-gopher-server-up?' | nc -w 3 dataswamp.org 70"
                            :desc "dataswamp.org gopher server")
       
       We use echo + netcat to check if a connection to a socket works. The
       **:desc** keyword will give a nicer name in the output instead of just
       "COMMAND".
       
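       Any command that reports its result through its return code can be
       used the same way. The two checks below are only a sketch: the
       process name and the file path are placeholders, and they assume, as
       is usual for shell commands, that a return code of 0 counts as
       success:
       
           ;; fails when no process named exactly "sshd" is running
           (=> mail command :command "pgrep -x sshd > /dev/null"
                            :desc "sshd process")
           ;; fails when the file does not exist
           (=> mail command :command "test -f /var/run/backup.done"
                            :desc "backup marker file")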
       
       #### Garniture
       
       We have written the minimum required to configure __reed-alert__, so
       your **my-config.lisp** file should now look like this:
       
           (alert mail "echo 'problem on %hostname%' | mail me@example.com")
           (=> mail file-updated :path "/program/file.generated" :limit 10)
           (=> mail disk-usage   :limit 90)
       
       Now, you can start it every 5 minutes from a crontab with this:
       
           */5 * * * * ( reed-alert /path/to/my-config.lisp )
       
       The same line works whether the `reed-alert` script was installed
       with ecl or with sbcl.
       
       The time between each run is up to you, depending on what you monitor.
       
       
       #### Important
       
       By default, when a check returns a failure, __reed-alert__ will only
       trigger the associated notifier once it reaches the 3rd failure. It
       will then notify again when the service is back (the variable
       %state% is replaced by start or end to tell whether the problem
       starts or stops).
       
       This prevents reed-alert from sending a notification each time it
       checks, which most users have absolutely no need for.
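       
       Since the same alert is triggered when the problem starts and when
       it ends, it can be useful to put %state% in the message. A minimal
       sketch, reusing the mail alert from above (the wording of the
       message is arbitrary):
       
           ;; %state% is replaced by start or end
           (alert mail "echo '%state% of problem on %hostname%' | mail me@example.com")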
       
       The number of failures before triggering can be modified by using the
       keyword ":try" as in the following example: 
       
           (=> mail disk-usage :limit 90 :try 1)
       
       In this case, you will get notified at the first failure.
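       
       Conversely, a check that fails sporadically can be given more
       attempts. This is a sketch assuming **:try** is accepted by the
       __command__ probe like by the other probes, with the value 5 picked
       arbitrarily:
       
           ;; notify only at the 5th failure of the gopher check
           (=> mail command :command "echo '/is-this-gopher-server-up?' | nc -w 3 dataswamp.org 70"
                            :desc "dataswamp.org gopher server"
                            :try 5)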
       
       The failure count of each check is stored in a file (one per check)
       in the "states/" directory of the reed-alert working directory.