Title: OpenBSD: pkg_add performance analysis
       Author: Solène
       Date: 08 July 2021
       Tags: bandwidth openbsd unix
       Description: 
       
       # Introduction
       
       The OpenBSD package manager pkg_add is known to be quite slow and to
       use a lot of bandwidth.  I'm trying to figure out easy ways to improve
       it, and I may have nailed something today by replacing the ftp(1) HTTP
       client with curl.
       
       # Testing protocol
       
       On an OpenBSD -current amd64 system, I ran the command "pkg_add -u -v
       | head -n 70", which checks for updates of the first 70 packages and
       then stops.  The packages tested are always the same, so the test is
       reproducible.
       
       The traditional "ftp" will be tested, but also "curl" and "curl -N".
       
       Bandwidth usage was measured with "pfctl -s labels", using a labeled
       match rule on the mirror IP, and the counters were reset after each
       test.
       
       # What happens when pkg_add runs
       
       Here is a quick overview of what happens in the code when you run
       "pkg_add -u" against an http:// mirror:
       
       * pkg_add downloads the package list from the mirror (which could be
       considered an index.html file), which weighs ~2.5 MB.  If you add
       two packages separately, the index is downloaded twice.
       * pkg_add runs /usr/bin/ftp on the first package to upgrade in order
       to read its first bytes, pipes them to gunzip (done in perl from
       within pkg_add) and then to signify to check the package signature.
       There are 2 signatures, signify and the package's dependencies,
       don't be misled!  The dependency signature is the list of
       dependencies and their versions, which pkg_add uses to know whether
       the package requires an update.  The signify signature of the whole
       package is stored in the gzip header, for the case where the whole
       package is downloaded.
       * if everything is fine, the package is downloaded and the old one is
       replaced.
       * if there is no need to update, the package is skipped.
       * each new package means a new connection with ftp(1) and new pipes
       to set up.
       
       Using the FETCH_CMD variable, it's possible to tell pkg_add to use a
       command other than /usr/bin/ftp, as long as it understands the "-o -"
       parameter and also "-S session" for https:// connections.  Because
       curl doesn't support the "-S session=..." parameter, I used a shell
       wrapper that discards this parameter.
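
       A minimal sketch of such a wrapper, not the exact script used for
       the tests.  It assumes pkg_add passes "-S" and "session=..." as two
       separate arguments; the function name fetch_with_curl and the CURL
       override variable are made up for this example.  The curl flags are
       the ones from the FETCH_CMD quoted in the conclusion.

       ```shell
       # Hypothetical wrapper: drop "-S <value>", pass everything else to curl.
       # CURL can be overridden (e.g. for testing); it defaults to the curl
       # binary installed from packages.
       fetch_with_curl() {
           remaining=$#
           while [ "$remaining" -gt 0 ]; do
               arg=$1
               shift
               remaining=$((remaining - 1))
               if [ "$arg" = "-S" ]; then
                   # Assumption: the session value is the next argument,
                   # so it is dropped as well.
                   if [ "$remaining" -gt 0 ]; then
                       shift
                       remaining=$((remaining - 1))
                   fi
                   continue
               fi
               # Re-append the kept argument so the original order survives.
               set -- "$@" "$arg"
           done
           "${CURL:-/usr/local/bin/curl}" -L -s -q -N "$@"
       }

       # Only fetch when called with arguments, as pkg_add would do.
       if [ "$#" -gt 0 ]; then
           fetch_with_curl "$@"
       fi
       ```

       Saved as an executable script (for instance /usr/local/bin/pkg-curl,
       a path chosen only for illustration), it can then be given to
       pkg_add as FETCH_CMD=/usr/local/bin/pkg-curl.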
       
       # Raw results
       
       I measured the whole execution time and the total bytes downloaded for
       each combination.  I don't show all the results here, but I ran the
       tests multiple times and the standard deviation was close to 0,
       meaning a given test gave the same result on every run.
       
       ```
       operation               time to run     data transferred
       ---------               -----------     ----------------
       ftp http://             39.01           26
       curl -N http://         28.74           12
       curl http://            31.76           14
       ftp https://            76.55           26
       curl -N https://        55.62           15
       curl https://           54.51           15
       ```
       
 (IMG) Charts with results
       
       # Analysis
       
       There are a few surprising facts from the results.
       
       * ftp(1) does not take the same time over http and https, although it
       is supposed to reuse the same TLS socket to avoid a handshake for
       every package.
       * ftp(1) bandwidth usage is drastically higher than curl's, and the
       time seems proportional to the bandwidth difference.
       * curl -N and curl perform almost exactly the same over https.
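
       To put numbers on the ftp versus curl -N comparison, here is a small
       sketch that computes the relative reductions from the table in the
       raw results.  The helper name "reduction" is made up for this
       example, and since the table gives no units, only ratios are
       computed.

       ```shell
       # Percentage by which NEW is smaller than OLD (values from the table).
       reduction() {
           LC_ALL=C awk -v old="$1" -v new="$2" \
               'BEGIN { printf "%.1f", (old - new) / old * 100 }'
       }

       echo "http:  time -$(reduction 39.01 28.74)%  data -$(reduction 26 12)%"
       echo "https: time -$(reduction 76.55 55.62)%  data -$(reduction 26 15)%"
       # http:  time -26.3%  data -53.8%
       # https: time -27.3%  data -42.3%
       ```

       With these figures the run time drops by roughly a quarter while the
       transferred data drops by about half, so the relation between the
       two does not look strictly linear.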
       
       # Conclusion
       
       Using http:// is way faster than https://.  The risk is about privacy,
       because with a man in the middle the downloaded packages will be
       known, but the signify signature will prevent any maliciously modified
       package from being installed.  Using 'FETCH_CMD="/usr/local/bin/curl -L
       -s -q -N"' gave the best results.
       
       However, I can't yet explain the very different behaviors between ftp
       and curl, or between http and https.
       
       # Extra: set a download speed limit to pkg_add operations
       
       By using curl as FETCH_CMD you can use the "--limit-rate 900k"
       parameter to limit the transfer speed to the given rate.
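
       For example, a sketch assuming curl is installed from packages in
       /usr/local/bin and reusing the flags from the conclusion (with curl,
       the "k" suffix means kilobytes per second):

       ```shell
       # Same curl flags as in the conclusion, plus a 900k rate limit.
       FETCH_CMD="/usr/local/bin/curl -L -s -q -N --limit-rate 900k"
       export FETCH_CMD
       echo "$FETCH_CMD"
       ```

       Running "pkg_add -u" from the same shell then picks up the variable.
       For https:// mirrors, the "-S session=..." parameter still has to be
       discarded by a wrapper, as described earlier.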