dataswamp.org

       Title: Reed-alert: five years later
       Author: Solène
       Date: 10 February 2022
       Tags: unix reed-alert linux lisp nocloud
       Description: Feedback from years of using the reed-alert program on
       a server
       
       # Introduction
       
       I wrote the program reed-alert five years ago and I've been using it
       since its first days.  Here is some feedback about it.
       
       The software reed-alert is meant to be used by system administrators
       who want to monitor their infrastructures and get alerts when things go
       wrong.  I've gained a lot more experience in the monitoring field over
       time and I want to share some thoughts about this project.
       
 (HTM) reed-alert source code
       
       # Reed-alert
       
       ## The name
       
       The software name is a pun I found in a Star Trek: Enterprise episode.
       
 (HTM) Reed alert pun origins
       
       ## Project finished
       
       The code hasn't received many commits over the last few years.  I
       consider the program feature complete, although new probes could be
       added and bugs could still be fixed.  The core of the software itself
       is perfect to me.
       
       Probes are small pieces of code that monitor a given state, such as an
       HTTP return code, a working ping or a started service.  It's already
       easy to extend reed-alert: any shell command that returns 0 on success
       and a non-zero status on failure can serve as a custom probe.
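       
       As a minimal sketch, such a custom probe can be written with the
       `command` probe that also appears in the configuration later in this
       article.  The fragment is meant to live inside a reed-alert
       configuration file where the `mail` alert is already defined; the
       `pgrep` check and its description are hypothetical and only there for
       illustration.
       
       ```
       ;; custom probe: the check passes when the shell command exits with 0
       ;; and triggers the "mail" alert for any other exit status
       ;; (the pgrep command below is a hypothetical example)
       (=> mail command
           :command "pgrep -x httpd > /dev/null"
           :desc "httpd process is missing")
       ```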
       
       ## Reliability
       
       I don't remember having a single issue with reed-alert since I set it
       up on my server.  It's run by a cron job every 10 minutes, which means
       a Common Lisp interpreter loads the code, evaluates the configuration
       file, runs the check commands and the alert commands if required, and
       then stops.  I chose a serviceless paradigm for reed-alert as it makes
       the code and usage a lot simpler.  A long-running service could fail,
       leak memory, be exploited, and certainly suffer from many other bugs I
       can't think of.
       
       Reed-alert is simple as it only needs a Common Lisp interpreter; the
       most notable ones, sbcl and ecl, are absolutely reliable and change
       very little over time.  Some standard unix commands are required for
       some checks or default alerts, such as ping, service, mail or curl,
       but this defers all the work to well established binaries.
       
       The source code is minimal, with 179 lines for the reed-alert core and
       159 lines for the probes, for a total of 338 lines of code (including
       empty lines and comments).  Hacking on reed-alert is super easy and
       always a lot of fun for me.  For whatever reason, my Common Lisp
       software often works on the first try when I add new features, so it's
       always pleasant to work on.
       
       ## Awesome features
       
       One aspect of reed-alert that may disturb users at first is the choice
       of Common Lisp code as a configuration file.  This may look
       complicated, but a simple configuration doesn't require more Common
       Lisp knowledge than what is explained in the reed-alert documentation.
       It shows its full power when you need to loop over a list of data to
       run checks, which makes reed-alert dynamic instead of requiring you to
       handwrite all the configuration.
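       
       A minimal sketch of such a loop, modeled on the gopher check in the
       configuration later in this article.  It assumes a reed-alert
       configuration file where the `mail` alert is already defined; the host
       names are placeholders.
       
       ```
       ;; ping every host of a list instead of writing one check per host
       ;; (the host names are placeholders)
       (loop for host in '("host1.example.com" "host2.example.com")
             do
             (=> mail ping
                 :host host
                 :desc (concatenate 'string "Ping " host)))
       ```
       
       Adding a machine to the monitoring then only requires adding its name
       to the list.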
       
       The use of Common Lisp as configuration has other advantages: it's
       possible to chain checks so that some of them are skipped when an
       earlier condition fails.  Let me give a few examples:
       
       * if you monitor a web server, you first want to check if it replies to
       ICMP before trying to check and report errors at the HTTP level
       * if you monitor remote servers, you first want to check if you can
       reach the internet and that your local gateway is online
       * if you check a local web server, it would be a good idea to check if
       all the required services are running first
       
       All the previous conditions can be expressed in reed-alert thanks to
       the code-as-configuration choice.
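       
       The first and third cases could be sketched as follows, using the
       standard Common Lisp `and` form the same way the configuration later in
       this article does.  This assumes that a failing check returns a false
       value, so that `and` stops at the first failure, and that the `mail`
       alert is already defined; the host, URLs and service name are
       placeholders.
       
       ```
       ;; remote web server: the HTTP check only runs if the ping succeeds
       (and
           (=> mail ping :host "example.com" :desc "Web server ping")
           (=> mail curl-http-status :url "https://example.com" :desc "Web server HTTP" :timeout 10))
       
       ;; local web server: check the service before querying it over HTTP
       (and
           (=> mail service :name "httpd")
           (=> mail curl-http-status :url "http://localhost" :desc "Local web server" :timeout 10))
       ```
       
       With this layout, an unreachable host produces a single ping alert
       instead of one alert per dependent check.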
       
       ## Scalability
       
       I've been asked a few times if reed-alert could be used in a
       professional context.  My answer depends on what you call a
       professional environment.
       
       Reed-alert is dumb: it needs to be run by a scheduler (such as cron)
       and it runs the checks sequentially.  It doesn't guarantee precise
       timing between checks.
       
       If you need multiple machines to run a set of checks, reed-alert is
       not able to share state between them, so it can't work reliably in a
       high availability environment.
       
       With regard to resource usage, while reed-alert is small, it needs to
       start the Common Lisp interpreter every time.  If you want to run
       reed-alert every minute or multiple times per minute, I'd recommend
       using something else.
       
       # A real life example
       
       Here is a chunk of the configuration I've been running for years.  It
       checks the system itself and some remote servers.
       
       ```
       (=> mail disk-usage  :path "/"     :limit 60 :desc "partition /")
       (=> mail disk-usage  :path "/var"  :limit 70 :desc "partition /var")
       (=> mail disk-usage  :path "/home" :limit 95 :desc "partition /home")
       (=> mail service :name "dovecot")
       (=> mail service :name "spamd")
       (=> mail service :name "dkimproxy_out")
       (=> mail service :name "smtpd")
       (=> mail service :name "ntpd")
       
       (=> mail number-of-processes :limit 140)
       
       ;; check dataswamp server is working
       (=> mail ping :host "dataswamp.org" :desc "Dataswamp")
       
       ;; check webzine related web servers
       (and
           (=> mail ping :host "openports.pl"     :desc "Liaison Grifon.fr")
           (=> mail curl-http-status :url "https://webzine.puffy.cafe" :desc "Webzine Puffy.cafe" :timeout 10)
           (=> mail curl-http-status :url "https://puffy.cafe" :desc "Puffy.cafe" :timeout 10)
           (=> mail ssl-expiration :host "webzine.puffy.cafe" :seconds (* 7 24 60 60))
           (=> mail ssl-expiration :host "puffy.cafe" :seconds (* 7 24 60 60)))
       
       ;; check openports.pl is working
       (and
           (=> mail ping :host "46.23.90.152"  :desc "Openports.pl ping")
           (=> mail curl-http-status :url "http://46.23.90.152" :desc "Packages OpenBSD http" :timeout 10))
       
       ;; check www.openbsd.org website is replying under 10 seconds
       (=> mail curl-http-status :url "https://www.openbsd.org" :desc "OpenBSD.org" :timeout 10)
       
       ;; check if a XML file is created regularly and valid
       (=> mail file-updated :path "/var/www/htdocs/solene/openbsd-current.xml" :limit 1440)
       (=> mail command :command (format nil "xmllint /var/www/htdocs/solene/openbsd-current.xml") :desc "XML openbsd-current.xml is not valid")
       
       
       ;; monitoring multiple gopher servers
       (loop for host in '("grifon.fr" "dataswamp.org" "gopherproject.org")
             do
             (=> mail command
                 :try 6
                 :command (format nil "echo '/is-alive?done-by-solene-at-libera' | nc -w 3 ~a 70" host)
                 :desc (concatenate 'string "Gopher " host)))
       
       (quit)
       ```
       
       # Conclusion
       
       I wrote a simple piece of software using an old programming language
       (ANSI Common Lisp dates from 1994).  The result is reliable over time,
       requires no code maintenance and is fun to hack on.
       
 (HTM) Common Lisp on Wikipedia