Title: How the OpenBSD -stable packages are built
       Author: Solène
       Date: 29 October 2020
       Tags: openbsd
       Description: 
       
In this long blog post, I will write about the technical details
of the OpenBSD stable packages building infrastructure. I set up
the infrastructure with the help of Theo de Raadt, who provided
the hardware in summer 2019. Since then, OpenBSD users can upgrade
their packages using `pkg_add -u` to get the critical updates that
have been backported by the contributors. Many thanks to them;
without their work there would be no packages to build. Thanks
also to pea@, who is my backup for operating this infrastructure
in case something happens to me.
       
**The whole infrastructure amounts to around 110 lines of shell.**
       
       
       ## Original design
       
In the original design, the following process ran separately on
each machine (amd64, arm64, i386, sparc64).
       
       
       ### Updating ports
       
The first step is to update the ports tree using `cvs up` from a
cron job and capture its output. **If** there is any output, the
process continues with the next steps; the output itself is only
used as a trigger and is then discarded.
       
With CVS being per-directory and not using a database like git or
svn, it is not possible to "poll" for an update except by checking
every directory for new versions of its files. This check is done
three times a day.
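The trigger can be sketched in a few lines of shell; this is a
hypothetical sketch (paths and function names are mine), only the
`cvs up` idiom comes from the actual process:

```shell
#!/bin/sh
# Hypothetical sketch of the trigger; the real script differs.
PORTSDIR=${PORTSDIR:-/usr/ports}

# Succeeds only if `cvs up` reported at least one changed file.
ports_changed() {
    # -q: quiet, -P: prune empty directories, -d: pull new directories
    [ -n "$(cd "$PORTSDIR" 2>/dev/null && cvs -q up -Pd 2>/dev/null)" ]
}

if ports_changed; then
    # a commit reached the branch: continue with the next steps
    echo "ports tree updated, starting a new batch"
fi
```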
       
       
       ### Make a list of ports to compile
       
This step is the most complicated of the process and accounts for
about a third of the total lines of code.
       
The script uses `cvs rdiff` between the cvs release and stable
branches to show what changed since release, and its output is
passed through a few grep and awk invocations to retrieve only the
"pkgpaths" (the pkgpath of curl is **net/curl**) of the packages
that were updated since the last release.
       
       From this raw output of cvs rdiff:
       
    File ports/net/dhcpcd/Makefile changed from revision 1.80 to 1.80.2.1
    File ports/net/dhcpcd/distinfo changed from revision 1.48 to 1.48.2.1
    File ports/net/dnsdist/Makefile changed from revision 1.19 to 1.19.2.1
    File ports/net/dnsdist/distinfo changed from revision 1.7 to 1.7.2.1
    File ports/net/icinga/core2/Makefile changed from revision 1.104 to 1.104.2.1
    File ports/net/icinga/core2/distinfo changed from revision 1.40 to 1.40.2.1
    File ports/net/synapse/Makefile changed from revision 1.13 to 1.13.2.1
    File ports/net/synapse/distinfo changed from revision 1.11 to 1.11.2.1
    File ports/net/synapse/pkg/PLIST changed from revision 1.10 to 1.10.2.1
       
       The script will produce:
       
           net/dhcpcd
           net/dnsdist
           net/icinga/core2
           net/synapse
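A minimal sketch of that extraction could be a single pipeline,
reading the raw `cvs rdiff` output on stdin (the real script is
more thorough):

```shell
#!/bin/sh
# Sketch of the pkgpath extraction; the real script differs.
# Reads `cvs rdiff` summary lines on stdin, prints unique pkgpaths.
extract_pkgpaths() {
    # the second field of each matching line is the changed file,
    # e.g. ports/net/dhcpcd/Makefile
    awk '/^File ports\// { print $2 }' |
        # drop the leading "ports/", the file name, and a trailing
        # pkg/ or patches/ component
        sed -e 's,^ports/,,' -e 's,/[^/]*$,,' \
            -e 's,/pkg$,,' -e 's,/patches$,,' |
        sort -u
}
```

Feeding it the `cvs rdiff` output above yields exactly the four
pkgpaths listed.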
       
From here, for each pkgpath we sorted out, the sqlports database
is queried to get the full list of pkgpaths of each package; this
includes all the flavors, subpackages and multi-packages.
       
       This is important because an update in `editors/vim` pkgpath will
       trigger this long list of packages:
       
           editors/vim,-lang
           editors/vim,-main
           editors/vim,gtk2
           editors/vim,gtk2,-lang
           [...40 results hidden for readability...]
           editors/vim,no_x11,ruby
           editors/vim,no_x11,ruby,-lang
           editors/vim,no_x11,ruby,-main
       
Once all the pkgpaths to build are gathered and stored in a file,
the next step can start.
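The expansion boils down to a simple query against sqlports; this
is a hypothetical sketch where the database path and the
`paths`/`fullpkgpath`/`pkgpath` names are assumptions to be checked
against the sqlports documentation:

```shell
#!/bin/sh
# Hypothetical sqlports lookup: the database location and the
# table/column names are assumptions, not verified against sqlports.
DB=${DB:-/usr/local/share/sqlports}

# Print every fullpkgpath (flavors, subpackages, ...) of a pkgpath.
expand_pkgpath() {
    sqlite3 "$DB" \
        "SELECT fullpkgpath FROM paths
         WHERE pkgpath = '$1' ORDER BY fullpkgpath;"
}
```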
       
       
       ### Preparing the environment
       
As the compilation is done on the real system (although using
PORTS_PRIVSEP) and not in a chroot, we need to remove all the
installed packages except the minimum required by the build
infrastructure: rsync and sqlports.
       
`dpb(1)` can't be used because it didn't give good results for
building only the delta of packages between release and stable.
       
       The various temporary directories used by the ports infrastructure
       are cleaned to be sure the build starts in a clean environment.
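A cleanup sketch could look like the following; the directories
depend on the local ports configuration (WRKOBJDIR, LOGDIR, ...)
and the flag usage should be checked against pkg_delete(1):

```shell
#!/bin/sh
# Hypothetical cleanup sketch; directory names are examples only.
clean_build_env() {
    # delete every installed package except the two the
    # infrastructure needs (-X: delete all but the listed packages)
    pkg_delete -X rsync sqlports

    # wipe the temporary directories used by the ports tree
    rm -rf /usr/ports/pobj/* /usr/ports/logs/* /usr/ports/plist/*
}
```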
       
       
       ### Compiling and creating the packages
       
This step is really simple. The ports infrastructure is used
to build the package list produced at step 2.
       
           env SUBDIRLIST=package_list BULK=yes make package
       
In the script there is some code to manage the logs of the
previous batch, but nothing more.
       
       Every new run of the process will pass over all the packages which
       received a commit, but the ports infrastructure is smart enough to
       avoid rebuilding ports which already have a package with the correct
       version.
       
       
       ### Transfer the package to the signing team
       
Once the packages are built, only the newly built packages must
be passed to the person who will manually sign them before they
are published and the mirrors sync.

From the package list, the list of package files is generated and
given to rsync so that only the generated packages are copied.
       
           env SUBDIRLIST=package_list show=PKGNAMES make | grep -v "^=" | \
                 grep ^. | tr ' ' '\n' | sed 's,$,\.tgz,' | sort -u
       
**The system keeps all the -release packages in
`${PACKAGE_REPOSITORY}/${MACHINE_ARCH}/all/` (like
`/usr/ports/packages/amd64/all`) to avoid rebuilding all the
dependencies required for building a package update, so we can't
simply copy every package from the directory where the packages
are moved after compilation.**
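A sketch of such a transfer using rsync's `--files-from` option;
the destination host and directories here are made up:

```shell
#!/bin/sh
# Hypothetical transfer sketch; host and paths are invented,
# only the --files-from idea is the point.
transfer_packages() {
    # $1: file listing one package file name per line (foo-1.0.tgz ...)
    # --files-from restricts the copy to exactly the listed files;
    # machine -a prints the architecture on OpenBSD
    rsync -a --files-from="$1" \
        "/usr/ports/packages/$(machine -a)/all/" \
        signer@signing-host:/incoming/
}
```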
       
       
       ### Send a notification
       
The last step is to send an email with the output of rsync,
telling the people who sign the packages which machine built
which packages and that they are now available.
       
As this process is done on each machine, and the machines don't
necessarily build the same packages (no firefox on sparc64) nor
build at the same speed (arm64 is slower), the mails from the four
machines could arrive at very different times, which led to a
small design change.
       
The whole process is automatic, from building to delivering the
packages for signature. The signing step still requires a human
though, but this is the price of security and privilege
separation.
       
       
       ## Current design
       
In the original design, all the servers ran their own cron job,
updating their own cvs ports tree and doing a very long cvs diff.
This worked, but it was not very practical for the people signing,
who received a mail from each machine for each batch.
       
The new design changed only one thing: one machine was chosen to
run the cron job, produce the package list and copy that list to
the other machines, which then update their ports tree and run the
build. Once all the machines have finished building, the initiator
machine gathers their outputs and sends a single mail with a
summary for each machine. This makes it easier to compare the
output of each architecture, and receiving the email means every
machine has finished its job and the signing can be done.
       
Having the summary of all the building machines brought another
improvement: the script can now report that the process was
triggered but absolutely no package was built, which means
something went wrong. From there, I need to check the logs to
understand why the last commit didn't produce a package. This can
be a failure like a **distinfo** file update forgotten in the
commit.
       
This also permitted fixing one issue: as the distfiles are shared
through a common NFS mount point, if multiple machines try to
fetch the same distfile at the same time, they all fail to build.
Now, the initiator machine downloads all the required distfiles
before starting the build on every node.
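The prefetch can reuse the same SUBDIRLIST mechanism as the build,
with the `fetch` target instead of `package`; a sketch under those
assumptions:

```shell
#!/bin/sh
# Hypothetical prefetch run on the initiator before dispatching the
# builds; downloading everything once avoids the race on the shared
# NFS distfiles directory.
prefetch_distfiles() {
    # $1: file with one pkgpath per line, as produced earlier
    ( cd /usr/ports 2>/dev/null &&
        SUBDIRLIST="$1" BULK=yes make fetch )
}
```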
       
       All of the previous scripts were reused, except the one
       sending the email which had to be rewritten.