add xargs article - www.codemadness.org - www.codemadness.org saait content files
 (HTM) git clone git://git.codemadness.org/www.codemadness.org
       ---
 (DIR) commit 32c11fc0471f7a0e7354a089ff663668863701fe
 (DIR) parent 53575c812355488a857c20d86f09ce787a956adc
 (HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
       Date:   Wed, 22 Nov 2023 19:31:51 +0100
       
       add xargs article
       
       Diffstat:
         M config.cfg                          |       2 +-
         M output/atom.xml                     |      14 +++++++++++++-
         M output/atom_content.xml             |     181 ++++++++++++++++++++++++++++++-
         M output/index                        |       1 +
         M output/index.html                   |       1 +
         M output/rss.xml                      |       8 ++++++++
         M output/rss_content.xml              |     174 +++++++++++++++++++++++++++++++
         M output/sitemap.xml                  |       4 ++++
         M output/twtxt.txt                    |       1 +
         M output/urllist.txt                  |       1 +
         A output/xargs.html                   |     218 +++++++++++++++++++++++++++++++
         A output/xargs.md                     |     188 +++++++++++++++++++++++++++++++
         A pages/xargs.cfg                     |       6 ++++++
         A pages/xargs.md                      |     188 +++++++++++++++++++++++++++++++
       
       14 files changed, 984 insertions(+), 3 deletions(-)
       ---
 (DIR) diff --git a/config.cfg b/config.cfg
       @@ -1,5 +1,5 @@
        # last updated the site.
       -siteupdated = 2023-11-20
       +siteupdated = 2023-11-22
        
        sitetitle = Codemadness
        siteurl = https://www.codemadness.org
 (DIR) diff --git a/output/atom.xml b/output/atom.xml
       @@ -2,11 +2,23 @@
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
                <title>Codemadness</title>
                <subtitle>blog with various projects and articles about computer-related things</subtitle>
       -        <updated>2023-11-20T00:00:00Z</updated>
       +        <updated>2023-11-22T00:00:00Z</updated>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org" />
                <id>https://www.codemadness.org/atom.xml</id>
                <link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom.xml" />
        <entry>
       +        <title>xargs: an example for batch jobs</title>
       +        <link rel="alternate" type="text/html" href="https://www.codemadness.org/xargs.html" />
       +        <id>https://www.codemadness.org/xargs.html</id>
       +        <updated>2023-11-22T00:00:00Z</updated>
       +        <published>2023-11-22T00:00:00Z</published>
       +        <author>
       +                <name>Hiltjo</name>
       +                <uri>https://www.codemadness.org</uri>
       +        </author>
       +        <summary>xargs: an example for batch jobs</summary>
       +</entry>
       +<entry>
                <title>Improved Youtube RSS/Atom feed</title>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org/youtube-feed.html" />
                <id>https://www.codemadness.org/youtube-feed.html</id>
 (DIR) diff --git a/output/atom_content.xml b/output/atom_content.xml
       @@ -2,11 +2,190 @@
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
                <title>Codemadness</title>
                <subtitle>blog with various projects and articles about computer-related things</subtitle>
       -        <updated>2023-11-20T00:00:00Z</updated>
       +        <updated>2023-11-22T00:00:00Z</updated>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org" />
                <id>https://www.codemadness.org/atom_content.xml</id>
                <link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom_content.xml" />
        <entry>
       +        <title>xargs: an example for batch jobs</title>
       +        <link rel="alternate" type="text/html" href="https://www.codemadness.org/xargs.html" />
       +        <id>https://www.codemadness.org/xargs.html</id>
       +        <updated>2023-11-22T00:00:00Z</updated>
       +        <published>2023-11-22T00:00:00Z</published>
       +        <author>
       +                <name>Hiltjo</name>
       +                <uri>https://www.codemadness.org</uri>
       +        </author>
       +        <summary>xargs: an example for batch jobs</summary>
       +        <content type="html"><![CDATA[<h1>xargs: an example for batch jobs</h1>
       +        <p><strong>Last modification on </strong> <time>2023-11-22</time></p>
       +        <p>This describes a simple shellscript programming pattern to process a list of
       +jobs in parallel. This script example is contained in one file.</p>
       +<h1>Simple but less optimal example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# process the jobs.
       +j=1
       +for f in 1 2 3 4 5 6 7 8 9 10; do
       +        run "$f" "something" &amp;
       +
       +        jm=$((j % maxjobs)) # shell arithmetic: modulo
       +        test "$jm" = "0" &amp;&amp; wait
       +        j=$((j+1))
       +done
       +wait
       +</code></pre>
       +<h1>Why is this less optimal</h1>
       +<p>This is less optimal because it waits until all jobs in the same batch are finished
       +(each batch contains $maxjobs items).</p>
       +<p>For example with 2 items per batch and 4 total jobs it could be:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 1 is done.</li>
       +<li>Wait: wait on process status of all background processes.</li>
       +<li>Job 3 in new batch is started.</li>
       +</ul>
       +<p>This could be optimized to:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 3 in new batch is started (immediately).</li>
       +<li>Job 1 is done.</li>
       +<li>...</li>
       +</ul>
       +<p>It also does not handle signals such as SIGINT (^C). However the xargs example
       +below does:</p>
       +<h1>Example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# child process job.
       +if test "$CHILD_MODE" = "1"; then
       +        run "$1" "$2"
       +        exit "$?"
       +fi
       +
       +# generate a list of jobs for processing.
       +list() {
       +        for f in 1 2 3 4 5 6 7 8 9 10; do
       +                printf '%s\0%s\0' "$f" "something"
       +        done
       +}
       +
       +# process jobs in parallel.
       +list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
       +</code></pre>
       +<h1>Run and timings</h1>
       +<p>Although the above example is kind of stupid, it already shows that queueing
       +the jobs is more efficient.</p>
       +<p>Script 1:</p>
       +<pre><code>time ./script1.sh
       +[...snip snip...]
       +real    0m22.095s
       +</code></pre>
       +<p>Script 2:</p>
       +<pre><code>time ./script2.sh
       +[...snip snip...]
       +real    0m18.120s
       +</code></pre>
       +<h1>How it works</h1>
       +<p>The parent process:</p>
       +<ul>
       +<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
       +execute as a child process.</li>
       +<li>The list function writes the parameters to stdout. These parameters are
       +separated by the NUL byte separator. The NUL byte separator is used because
       +this character cannot be used in filenames (which can contain spaces or even
       +newlines) and cannot be used in text (the NUL byte terminates the buffer for
       +a string).</li>
       +<li>The -L option must match the number of arguments that are specified for the
       +job. It splits the specified parameters per job.</li>
       +<li>The expression "$(readlink -f "$0")" gets the absolute path to the
       +shellscript itself. This is passed as the executable to run for xargs.</li>
       +<li>xargs calls the script itself with the specified parameters it is being fed.
       +The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</li>
       +</ul>
       +<p>The child process:</p>
       +<ul>
       +<li><p>The command-line arguments are passed by the parent using xargs.</p>
       +</li>
       +<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</p>
       +</li>
       +<li><p>The script itself (run as a child-mode process) only executes the task and
       +signals its status back to xargs and the parent.</p>
       +</li>
       +<li><p>The exit status of the child program is signaled to xargs. This could be
       +handled, for example to stop on the first failure (in this example it is not).
       +For example, if the program is killed or stopped, or its exit status is 255, then
       +xargs also stops running.</p>
       +</li>
       +</ul>
       +<h1>xargs -P and portability</h1>
       +<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
       +<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
       +However, many systems support this useful extension.</p>
       +<h1>Explanation of used xargs options:</h1>
       +<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
       +<pre><code>xargs - construct argument list(s) and execute utility
       +</code></pre>
       +<p>Options explained:</p>
       +<ul>
       +<li>-r: Do not run the command if there are no arguments. Normally the command
       +is executed at least once even if there are no arguments.</li>
       +<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
       +spaces and newlines.</li>
       +<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
       +at once.</li>
       +<li>-L number: Call utility for every number of non-empty lines read. A line
       +ending in unescaped white space and the next non-empty line are considered
       +to form one single line. If EOF is reached and fewer than number lines have
       +been read then utility will be called with the available lines.</li>
       +</ul>
       +<h1>References</h1>
       +<ul>
       +<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
       +<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
       +<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
       +</ul>
       +]]></content>
       +</entry>
       +<entry>
                <title>Improved Youtube RSS/Atom feed</title>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org/youtube-feed.html" />
                <id>https://www.codemadness.org/youtube-feed.html</id>
 (DIR) diff --git a/output/index b/output/index
       @@ -11,6 +11,7 @@ i                codemadness.org        70
        i                codemadness.org        70
        iPhlog posts                codemadness.org        70
        i                codemadness.org        70
       +12023-11-22 xargs: an example for batch jobs        /phlog/xargs        codemadness.org        70
        12023-11-20 Improved Youtube RSS/Atom feed        /phlog/youtube-feed        codemadness.org        70
        12023-10-25 Setup your own mail paste service        /phlog/mailservice        codemadness.org        70
        12022-07-01 A simple TODO application        /phlog/todo        codemadness.org        70
 (DIR) diff --git a/output/index.html b/output/index.html
       @@ -40,6 +40,7 @@
                        <div id="main">
                                <h1>Posts</h1>
                                <table>
       +<tr><td><time>2023-11-22</time></td><td><a href="xargs.html">xargs: an example for batch jobs</a></td></tr>
        <tr><td><time>2023-11-20</time></td><td><a href="youtube-feed.html">Improved Youtube RSS/Atom feed</a></td></tr>
        <tr><td><time>2023-10-25</time></td><td><a href="mailservice.html">Setup your own mail paste service</a></td></tr>
        <tr><td><time>2022-07-01</time></td><td><a href="todo-application.html">A simple TODO application</a></td></tr>
 (DIR) diff --git a/output/rss.xml b/output/rss.xml
       @@ -7,6 +7,14 @@
                <description>blog with various projects and articles about computer-related things</description>
                <link>https://www.codemadness.org</link>
        <item>
       +        <title>xargs: an example for batch jobs</title>
       +        <link>https://www.codemadness.org/xargs.html</link>
       +        <guid>https://www.codemadness.org/xargs.html</guid>
       +        <dc:date>2023-11-22T00:00:00Z</dc:date>
       +        <author>Hiltjo</author>
       +        <description>xargs: an example for batch jobs</description>
       +</item>
       +<item>
                <title>Improved Youtube RSS/Atom feed</title>
                <link>https://www.codemadness.org/youtube-feed.html</link>
                <guid>https://www.codemadness.org/youtube-feed.html</guid>
 (DIR) diff --git a/output/rss_content.xml b/output/rss_content.xml
       @@ -7,6 +7,180 @@
                <description>blog with various projects and articles about computer-related things</description>
                <link>https://www.codemadness.org</link>
        <item>
       +        <title>xargs: an example for batch jobs</title>
       +        <link>https://www.codemadness.org/xargs.html</link>
       +        <guid>https://www.codemadness.org/xargs.html</guid>
       +        <dc:date>2023-11-22T00:00:00Z</dc:date>
       +        <author>Hiltjo</author>
       +        <description><![CDATA[<h1>xargs: an example for batch jobs</h1>
       +        <p><strong>Last modification on </strong> <time>2023-11-22</time></p>
       +        <p>This describes a simple shellscript programming pattern to process a list of
       +jobs in parallel. This script example is contained in one file.</p>
       +<h1>Simple but less optimal example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# process the jobs.
       +j=1
       +for f in 1 2 3 4 5 6 7 8 9 10; do
       +        run "$f" "something" &amp;
       +
       +        jm=$((j % maxjobs)) # shell arithmetic: modulo
       +        test "$jm" = "0" &amp;&amp; wait
       +        j=$((j+1))
       +done
       +wait
       +</code></pre>
       +<h1>Why is this less optimal</h1>
       +<p>This is less optimal because it waits until all jobs in the same batch are finished
       +(each batch contains $maxjobs items).</p>
       +<p>For example with 2 items per batch and 4 total jobs it could be:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 1 is done.</li>
       +<li>Wait: wait on process status of all background processes.</li>
       +<li>Job 3 in new batch is started.</li>
       +</ul>
       +<p>This could be optimized to:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 3 in new batch is started (immediately).</li>
       +<li>Job 1 is done.</li>
       +<li>...</li>
       +</ul>
       +<p>It also does not handle signals such as SIGINT (^C). However the xargs example
       +below does:</p>
       +<h1>Example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# child process job.
       +if test "$CHILD_MODE" = "1"; then
       +        run "$1" "$2"
       +        exit "$?"
       +fi
       +
       +# generate a list of jobs for processing.
       +list() {
       +        for f in 1 2 3 4 5 6 7 8 9 10; do
       +                printf '%s\0%s\0' "$f" "something"
       +        done
       +}
       +
       +# process jobs in parallel.
       +list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
       +</code></pre>
       +<h1>Run and timings</h1>
       +<p>Although the above example is kind of stupid, it already shows that queueing
       +the jobs is more efficient.</p>
       +<p>Script 1:</p>
       +<pre><code>time ./script1.sh
       +[...snip snip...]
       +real    0m22.095s
       +</code></pre>
       +<p>Script 2:</p>
       +<pre><code>time ./script2.sh
       +[...snip snip...]
       +real    0m18.120s
       +</code></pre>
       +<h1>How it works</h1>
       +<p>The parent process:</p>
       +<ul>
       +<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
       +execute as a child process.</li>
       +<li>The list function writes the parameters to stdout. These parameters are
       +separated by the NUL byte separator. The NUL byte separator is used because
       +this character cannot be used in filenames (which can contain spaces or even
       +newlines) and cannot be used in text (the NUL byte terminates the buffer for
       +a string).</li>
       +<li>The -L option must match the number of arguments that are specified for the
       +job. It splits the specified parameters per job.</li>
       +<li>The expression "$(readlink -f "$0")" gets the absolute path to the
       +shellscript itself. This is passed as the executable to run for xargs.</li>
       +<li>xargs calls the script itself with the specified parameters it is being fed.
       +The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</li>
       +</ul>
       +<p>The child process:</p>
       +<ul>
       +<li><p>The command-line arguments are passed by the parent using xargs.</p>
       +</li>
       +<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</p>
       +</li>
       +<li><p>The script itself (run as a child-mode process) only executes the task and
       +signals its status back to xargs and the parent.</p>
       +</li>
       +<li><p>The exit status of the child program is signaled to xargs. This could be
       +handled, for example to stop on the first failure (in this example it is not).
       +For example, if the program is killed or stopped, or its exit status is 255, then
       +xargs also stops running.</p>
       +</li>
       +</ul>
       +<h1>xargs -P and portability</h1>
       +<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
       +<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
       +However, many systems support this useful extension.</p>
       +<h1>Explanation of used xargs options:</h1>
       +<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
       +<pre><code>xargs - construct argument list(s) and execute utility
       +</code></pre>
       +<p>Options explained:</p>
       +<ul>
       +<li>-r: Do not run the command if there are no arguments. Normally the command
       +is executed at least once even if there are no arguments.</li>
       +<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
       +spaces and newlines.</li>
       +<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
       +at once.</li>
       +<li>-L number: Call utility for every number of non-empty lines read. A line
       +ending in unescaped white space and the next non-empty line are considered
       +to form one single line. If EOF is reached and fewer than number lines have
       +been read then utility will be called with the available lines.</li>
       +</ul>
       +<h1>References</h1>
       +<ul>
       +<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
       +<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
       +<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
       +</ul>
       +]]></description>
       +</item>
       +<item>
                <title>Improved Youtube RSS/Atom feed</title>
                <link>https://www.codemadness.org/youtube-feed.html</link>
                <guid>https://www.codemadness.org/youtube-feed.html</guid>
 (DIR) diff --git a/output/sitemap.xml b/output/sitemap.xml
       @@ -1,6 +1,10 @@
        <?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
       +        <loc>https://www.codemadness.org/xargs.html</loc>
       +        <lastmod>2023-11-22</lastmod>
       +</url>
       +<url>
                <loc>https://www.codemadness.org/youtube-feed.html</loc>
                <lastmod>2023-11-20</lastmod>
        </url>
 (DIR) diff --git a/output/twtxt.txt b/output/twtxt.txt
       @@ -1,3 +1,4 @@
       +2023-11-22T00:00:00Z        xargs: an example for batch jobs: https://www.codemadness.org/xargs.html
        2023-11-20T00:00:00Z        Improved Youtube RSS/Atom feed: https://www.codemadness.org/youtube-feed.html
        2023-10-25T00:00:00Z        Setup your own mail paste service: https://www.codemadness.org/mailservice.html
        2022-07-01T00:00:00Z        A simple TODO application: https://www.codemadness.org/todo-application.html
 (DIR) diff --git a/output/urllist.txt b/output/urllist.txt
       @@ -1,3 +1,4 @@
       +https://www.codemadness.org/xargs.html
        https://www.codemadness.org/youtube-feed.html
        https://www.codemadness.org/mailservice.html
        https://www.codemadness.org/todo-application.html
 (DIR) diff --git a/output/xargs.html b/output/xargs.html
       @@ -0,0 +1,218 @@
       +<!DOCTYPE html>
       +<html dir="ltr" lang="en">
       +<head>
       +        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
       +        <meta http-equiv="Content-Language" content="en" />
       +        <meta name="viewport" content="width=device-width" />
       +        <meta name="keywords" content="xargs, wow hyper speed" />
       +        <meta name="description" content="xargs: an example for batch jobs" />
       +        <meta name="author" content="Hiltjo" />
       +        <meta name="generator" content="Static content generated using saait: https://codemadness.org/saait.html" />
       +        <title>xargs: an example for batch jobs - Codemadness</title>
       +        <link rel="stylesheet" href="style.css" type="text/css" media="screen" />
       +        <link rel="stylesheet" href="print.css" type="text/css" media="print" />
       +        <link rel="alternate" href="atom.xml" type="application/atom+xml" title="Codemadness Atom Feed" />
       +        <link rel="alternate" href="atom_content.xml" type="application/atom+xml" title="Codemadness Atom Feed with content" />
       +        <link rel="icon" href="/favicon.png" type="image/png" />
       +</head>
       +<body>
       +        <nav id="menuwrap">
       +                <table id="menu" width="100%" border="0">
       +                <tr>
       +                        <td id="links" align="left">
       +                                <a href="index.html">Blog</a> |
       +                                <a href="/git/" title="Git repository with some of my projects">Git</a> |
       +                                <a href="/releases/">Releases</a> |
       +                                <a href="gopher://codemadness.org">Gopherhole</a>
       +                        </td>
       +                        <td id="links-contact" align="right">
       +                                <span class="hidden"> | </span>
       +                                <a href="/donate/">Donate</a> |
       +                                <a href="feeds.html">Feeds</a> |
       +                                <a href="pgp.asc">PGP</a> |
       +                                <a href="mailto:hiltjo@AT@codemadness.DOT.org">Mail</a>
       +                        </td>
       +                </tr>
       +                </table>
       +        </nav>
       +        <hr class="hidden" />
       +        <main id="mainwrap">
       +                <div id="main">
       +                        <article>
       +<header>
       +        <h1>xargs: an example for batch jobs</h1>
       +        <p>
       +        <strong>Last modification on </strong> <time>2023-11-22</time>
       +        </p>
       +</header>
       +
       +<p>This describes a simple shellscript programming pattern to process a list of
       +jobs in parallel. This script example is contained in one file.</p>
       +<h1>Simple but less optimal example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# process the jobs.
       +j=1
       +for f in 1 2 3 4 5 6 7 8 9 10; do
       +        run "$f" "something" &amp;
       +
       +        jm=$((j % maxjobs)) # shell arithmetic: modulo
       +        test "$jm" = "0" &amp;&amp; wait
       +        j=$((j+1))
       +done
       +wait
       +</code></pre>
       +<h1>Why is this less optimal</h1>
       +<p>This is less optimal because it waits until all jobs in the same batch are finished
       +(each batch contains $maxjobs items).</p>
       +<p>For example with 2 items per batch and 4 total jobs it could be:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 1 is done.</li>
       +<li>Wait: wait on process status of all background processes.</li>
       +<li>Job 3 in new batch is started.</li>
       +</ul>
       +<p>This could be optimized to:</p>
       +<ul>
       +<li>Job 1 is started.</li>
       +<li>Job 2 is started.</li>
       +<li>Job 2 is done.</li>
       +<li>Job 3 in new batch is started (immediately).</li>
       +<li>Job 1 is done.</li>
       +<li>...</li>
       +</ul>
       +<p>It also does not handle signals such as SIGINT (^C). However the xargs example
       +below does:</p>
       +<h1>Example</h1>
       +<pre><code>#!/bin/sh
       +maxjobs=4
       +
       +# fake program for example purposes.
       +someprogram() {
       +        echo "Yep yep, I'm totally a real program!"
       +        sleep "$1"
       +}
       +
       +# run(arg1, arg2)
       +run() {
       +        echo "[$1] $2 started" &gt;&amp;2
       +        someprogram "$1" &gt;/dev/null
       +        status="$?"
       +        echo "[$1] $2 done" &gt;&amp;2
       +        return "$status"
       +}
       +
       +# child process job.
       +if test "$CHILD_MODE" = "1"; then
       +        run "$1" "$2"
       +        exit "$?"
       +fi
       +
       +# generate a list of jobs for processing.
       +list() {
       +        for f in 1 2 3 4 5 6 7 8 9 10; do
       +                printf '%s\0%s\0' "$f" "something"
       +        done
       +}
       +
       +# process jobs in parallel.
       +list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
       +</code></pre>
       +<h1>Run and timings</h1>
       +<p>Although the above example is kind of stupid, it already shows that queueing
       +the jobs is more efficient.</p>
       +<p>Script 1:</p>
       +<pre><code>time ./script1.sh
       +[...snip snip...]
       +real    0m22.095s
       +</code></pre>
       +<p>Script 2:</p>
       +<pre><code>time ./script2.sh
       +[...snip snip...]
       +real    0m18.120s
       +</code></pre>
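       +<p>The two timings can be checked with a little arithmetic. The following is a
       +minimal sketch and not part of the scripts above: it assumes the ten jobs sleep
       +for 1 to 10 seconds, that 4 slots are available and that starting a job costs
       +no time. The function names batch_total and queue_total are made up for this
       +illustration.</p>

```shell
#!/bin/sh
# Estimate the wall-clock seconds for jobs that sleep 1..10 seconds, 4 at a time.

# Batch model: every batch of 4 takes as long as its slowest job.
batch_total() {
        total=0; max=0; j=1
        for f in 1 2 3 4 5 6 7 8 9 10; do
                if test "$f" -gt "$max"; then max=$f; fi
                # batch is full: add its slowest job and start a new batch.
                if test $((j % 4)) -eq 0; then total=$((total + max)); max=0; fi
                j=$((j + 1))
        done
        # add the last, incomplete batch.
        echo $((total + max))
}

# Queue model: every job starts on the slot that becomes free first.
queue_total() {
        s1=0; s2=0; s3=0; s4=0
        for f in 1 2 3 4 5 6 7 8 9 10; do
                # find the slot that is free the earliest.
                min=$s1; slot=1
                if test "$s2" -lt "$min"; then min=$s2; slot=2; fi
                if test "$s3" -lt "$min"; then min=$s3; slot=3; fi
                if test "$s4" -lt "$min"; then min=$s4; slot=4; fi
                eval "s$slot=$((min + f))"
        done
        # the slot that finishes last determines the total time.
        max=$s1
        for s in "$s2" "$s3" "$s4"; do
                if test "$s" -gt "$max"; then max=$s; fi
        done
        echo "$max"
}

echo "batches: $(batch_total)s, queue: $(queue_total)s"
```

       +<p>It prints 22 seconds for the batch model and 18 seconds for the queue model,
       +which is in line with the measured timings above.</p>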
       +<h1>How it works</h1>
       +<p>The parent process:</p>
       +<ul>
       +<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
       +execute as a child process.</li>
       +<li>The list function writes the parameters to stdout. These parameters are
       +separated by the NUL byte separator. The NUL byte separator is used because
       +this character cannot be used in filenames (which can contain spaces or even
       +newlines) and cannot be used in text (the NUL byte terminates the buffer for
       +a string).</li>
       +<li>The -L option must match the number of arguments that are specified for the
       +job. It splits the specified parameters per job.</li>
       +<li>The expression "$(readlink -f "$0")" gets the absolute path to the
       +shellscript itself. This is passed as the executable to run for xargs.</li>
       +<li>xargs calls the script itself with the specified parameters it is being fed.
       +The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</li>
       +</ul>
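       +<p>As a rough illustration of the NUL-separated parameter list and the -L 2
       +grouping described above, the following sketch feeds two jobs of two parameters
       +each to xargs. The parameter values and the inline sh -c helper are assumptions
       +made for this example; the script above calls itself instead.</p>

```shell
#!/bin/sh
# Inline helper (an assumption for this sketch): print every argument in
# brackets, so it is visible where one argument ends and the next begins.
show='for a in "$@"; do printf "[%s]" "$a"; done; echo'

# Write two jobs with two NUL-terminated parameters each and let xargs
# run the helper once per pair. No -P here, so the output order is fixed.
pairs() {
        printf '%s\0%s\0' "file one.txt" "first job" "file two.txt" "second job" |
                xargs -r -0 -L 2 sh -c "$show" sh
}

pairs
```

       +<p>With an xargs that supports these options, each pair should arrive as two
       +separate arguments, with the spaces inside a parameter kept intact.</p>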
       +<p>The child process:</p>
       +<ul>
       +<li><p>The command-line arguments are passed by the parent using xargs.</p>
       +</li>
       +<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
       +it is run as a child process of the script.</p>
       +</li>
       +<li><p>The script itself (run as a child-mode process) only executes the task and
       +signals its status back to xargs and the parent.</p>
       +</li>
       +<li><p>The exit status of the child program is signaled to xargs. This could be
       +handled, for example to stop on the first failure (in this example it is not).
       +For example, if the program is killed or stopped, or its exit status is 255, then
       +xargs also stops running.</p>
       +</li>
       +</ul>
       +<h1>xargs -P and portability</h1>
       +<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
       +<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
       +However many systems support this useful extension.</p>
       +<h1>Explanation of used xargs options:</h1>
       +<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
       +<pre><code>xargs - construct argument list(s) and execute utility
       +</code></pre>
       +<p>Options explained:</p>
       +<ul>
       +<li>-r: Do not run the command if there are no arguments. Normally the command
       +is executed at least once even if there are no arguments.</li>
       +<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
       +spaces and newlines.</li>
       +<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
       +at once.</li>
       +<li>-L number: Call utility for every number of non-empty lines read. A line
       +ending in unescaped white space and the next non-empty line are considered
       +to form one single line. If EOF is reached and fewer than number lines have
       +been read then utility will be called with the available lines.</li>
       +</ul>
       +<h1>References</h1>
       +<ul>
       +<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
       +<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
       +<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
       +</ul>
       +
       +                        </article>
       +                </div>
       +        </main>
       +</body>
       +</html>
 (DIR) diff --git a/output/xargs.md b/output/xargs.md
       @@ -0,0 +1,188 @@
       +This describes a simple shellscript programming pattern to process a list of
       +jobs in parallel. This script example is contained in one file.
       +
       +
       +# Simple but less optimal example
       +
       +        #!/bin/sh
       +        maxjobs=4
       +        
       +        # fake program for example purposes.
       +        someprogram() {
       +                echo "Yep yep, I'm totally a real program!"
       +                sleep "$1"
       +        }
       +        
       +        # run(arg1, arg2)
       +        run() {
       +                echo "[$1] $2 started" >&2
       +                someprogram "$1" >/dev/null
       +                status="$?"
       +                echo "[$1] $2 done" >&2
       +                return "$status"
       +        }
       +        
       +        # process the jobs.
       +        j=1
       +        for f in 1 2 3 4 5 6 7 8 9 10; do
       +                run "$f" "something" &
       +        
       +                jm=$((j % maxjobs)) # shell arithmetic: modulo
       +                test "$jm" = "0" && wait
       +                j=$((j+1))
       +        done
       +        wait
       +
       +
       +# Why is this less optimal
       +
       +This is less optimal because it waits until all jobs in the same batch are finished
       +(each batch contains $maxjobs items).
       +
       +For example with 2 items per batch and 4 total jobs it could be:
       +
       +* Job 1 is started.
       +* Job 2 is started.
       +* Job 2 is done.
       +* Job 1 is done.
       +* Wait: wait on process status of all background processes.
       +* Job 3 in new batch is started.
       +
       +
       +This could be optimized to:
       +
       +* Job 1 is started.
       +* Job 2 is started.
       +* Job 2 is done.
       +* Job 3 in new batch is started (immediately).
       +* Job 1 is done.
       +* ...
       +
       +
       +It also does not handle signals such as SIGINT (^C). However the xargs example
       +below does:
       +
       +
       +# Example
       +
       +        #!/bin/sh
       +        maxjobs=4
       +        
       +        # fake program for example purposes.
       +        someprogram() {
       +                echo "Yep yep, I'm totally a real program!"
       +                sleep "$1"
       +        }
       +        
       +        # run(arg1, arg2)
       +        run() {
       +                echo "[$1] $2 started" >&2
       +                someprogram "$1" >/dev/null
       +                status="$?"
       +                echo "[$1] $2 done" >&2
       +                return "$status"
       +        }
       +        
       +        # child process job.
       +        if test "$CHILD_MODE" = "1"; then
       +                run "$1" "$2"
       +                exit "$?"
       +        fi
       +        
       +        # generate a list of jobs for processing.
       +        list() {
       +                for f in 1 2 3 4 5 6 7 8 9 10; do
       +                        printf '%s\0%s\0' "$f" "something"
       +                done
       +        }
       +        
       +        # process jobs in parallel.
       +        list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
       +
       +
       +# Run and timings
       +
       +Although the above example is kind of stupid, it already shows that the queueing
       +of jobs is more efficient.
       +
       +Script 1:
       +
       +        time ./script1.sh
       +        [...snip snip...]
       +        real    0m22.095s
       +
       +Script 2:
       +
       +        time ./script2.sh
       +        [...snip snip...]
       +        real    0m18.120s
       +
       +
       +# How it works
       +
       +The parent process:
       +
       +* The parent, using xargs, handles the queue of jobs and schedules the jobs to
       +  execute as a child process.
       +* The list function writes the parameters to stdout. These parameters are
       +  separated by the NUL byte separator. The NUL byte separator is used because
       +  this character cannot be used in filenames (which can contain spaces or even
       +  newlines) and cannot be used in text (the NUL byte terminates the buffer for
       +  a string).
       +* The -L option must match the number of arguments that are specified for the
       +  job. It will split the specified parameters per job.
       +* The expression "$(readlink -f "$0")" gets the absolute path to the
       +  shellscript itself. This is passed as the executable to run for xargs.
       +* xargs calls the script itself with the specified parameters it is being fed.
       +  The environment variable $CHILD_MODE is set to indicate to the script itself
       +  it is run as a child process of the script.
       +
       +
       +The child process:
       +
       +* The command-line arguments are passed by the parent using xargs.
       +
       +* The environment variable $CHILD_MODE is set to indicate to the script itself
       +  it is run as a child process of the script.
       +
       +* The script itself (run in child mode) only executes the task and
       +  signals its status back to xargs and the parent.
       +
       +* The exit status of the child program is signaled to xargs. This could be
       +  handled, for example to stop on the first failure (in this example it is not).
       +  For example, if the program is terminated by a signal or exits with status 255,
       +  xargs stops running as well.
       +
       +
       +# xargs -P and portability
       +
       +Note that some of the options, like -P, are non-POSIX as of writing (2023):
       +<https://pubs.opengroup.org/onlinepubs/9699919799/>.
       +However many systems support this useful extension.
       +
       +
       +# Explanation of used xargs options:
       +
       +From the OpenBSD man page: <https://man.openbsd.org/xargs>
       +
       +        xargs - construct argument list(s) and execute utility
       +
       +Options explained:
       +
       +* -r: Do not run the command if there are no arguments. Normally the command
       +  is executed at least once even if there are no arguments.
       +* -0: Change xargs to expect NUL ('\0') characters as separators, instead of
       +  spaces and newlines.
       +* -P maxprocs: Parallel mode: run at most maxprocs invocations of utility
       +  at once.
       +* -L number: Call utility for every number of non-empty lines read. A line
       +  ending in unescaped white space and the next non-empty line are considered
       +  to form one single line. If EOF is reached and fewer than number lines have
       +  been read then utility will be called with the available lines.
       +
       +
       +# References
       +
       +* xargs: <https://man.openbsd.org/xargs>
       +* printf: <https://man.openbsd.org/printf>
       +* wait(2): <https://man.openbsd.org/wait>
 (DIR) diff --git a/pages/xargs.cfg b/pages/xargs.cfg
       @@ -0,0 +1,6 @@
       +title = xargs: an example for batch jobs
       +id = xargs
       +description = xargs: an example for batch jobs
       +keywords = xargs, wow hyper speed
       +created = 2023-11-22
       +updated = 2023-11-22
 (DIR) diff --git a/pages/xargs.md b/pages/xargs.md
       @@ -0,0 +1,188 @@
       +This describes a simple shellscript programming pattern to process a list of
       +jobs in parallel. This script example is contained in one file.
       +
       +
       +# Simple but less optimal example
       +
       +        #!/bin/sh
       +        maxjobs=4
       +        
       +        # fake program for example purposes.
       +        someprogram() {
       +                echo "Yep yep, I'm totally a real program!"
       +                sleep "$1"
       +        }
       +        
       +        # run(arg1, arg2)
       +        run() {
       +                echo "[$1] $2 started" >&2
       +                someprogram "$1" >/dev/null
       +                status="$?"
       +                echo "[$1] $2 done" >&2
       +                return "$status"
       +        }
       +        
       +        # process the jobs.
       +        j=1
       +        for f in 1 2 3 4 5 6 7 8 9 10; do
       +                run "$f" "something" &
       +        
       +                jm=$((j % maxjobs)) # shell arithmetic: modulo
       +                test "$jm" = "0" && wait
       +                j=$((j+1))
       +        done
       +        wait
       +
       +
       +# Why is this less optimal
       +
       +This is less optimal because it waits until all jobs in the same batch are finished
       +(each batch contains $maxjobs items).
       +
       +For example with 2 items per batch and 4 total jobs it could be:
       +
       +* Job 1 is started.
       +* Job 2 is started.
       +* Job 2 is done.
       +* Job 1 is done.
       +* Wait: wait on process status of all background processes.
       +* Job 3 in new batch is started.
       +
       +
       +This could be optimized to:
       +
       +* Job 1 is started.
       +* Job 2 is started.
       +* Job 2 is done.
       +* Job 3 in new batch is started (immediately).
       +* Job 1 is done.
       +* ...
       +
       +
       +It also does not handle signals such as SIGINT (^C). However the xargs example
       +below does:
       +
       +
       +# Example
       +
       +        #!/bin/sh
       +        maxjobs=4
       +        
       +        # fake program for example purposes.
       +        someprogram() {
       +                echo "Yep yep, I'm totally a real program!"
       +                sleep "$1"
       +        }
       +        
       +        # run(arg1, arg2)
       +        run() {
       +                echo "[$1] $2 started" >&2
       +                someprogram "$1" >/dev/null
       +                status="$?"
       +                echo "[$1] $2 done" >&2
       +                return "$status"
       +        }
       +        
       +        # child process job.
       +        if test "$CHILD_MODE" = "1"; then
       +                run "$1" "$2"
       +                exit "$?"
       +        fi
       +        
       +        # generate a list of jobs for processing.
       +        list() {
       +                for f in 1 2 3 4 5 6 7 8 9 10; do
       +                        printf '%s\0%s\0' "$f" "something"
       +                done
       +        }
       +        
       +        # process jobs in parallel.
       +        list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
       +
       +
       +# Run and timings
       +
       +Although the above example is kind of stupid, it already shows that the queueing
       +of jobs is more efficient.
       +
       +Script 1:
       +
       +        time ./script1.sh
       +        [...snip snip...]
       +        real    0m22.095s
       +
       +Script 2:
       +
       +        time ./script2.sh
       +        [...snip snip...]
       +        real    0m18.120s
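       +
       +The two timings match what the scheduling strategies predict. The following is
       +a rough sketch that only does the arithmetic for jobs that sleep 1 to 10
       +seconds with 4 slots; it is not part of the scripts above. The function names
       +batch_total and queue_total are made up for illustration, and the sketch
       +assumes that a free slot is refilled immediately and that starting a process
       +costs no time.
       +
       +```shell
       +#!/bin/sh
       +# Hypothetical helpers, for illustration only.
       +maxjobs=4
       +jobs="1 2 3 4 5 6 7 8 9 10" # sleep time of each job in seconds
       +
       +# batches: the script waits for the whole batch, so every batch costs as
       +# much as its longest job.
       +batch_total() {
       +        total=0; longest=0; j=1
       +        for d in $jobs; do
       +                test "$d" -gt "$longest" && longest="$d"
       +                if test "$((j % maxjobs))" = "0"; then
       +                        total=$((total + longest))
       +                        longest=0
       +                fi
       +                j=$((j + 1))
       +        done
       +        # add the last (possibly incomplete) batch.
       +        echo "$((total + longest))"
       +}
       +
       +# queue: the next job starts in whichever slot becomes free first.
       +queue_total() {
       +        # one finish time per slot, all slots are free at t=0.
       +        slots=""; i=0
       +        while test "$i" -lt "$maxjobs"; do
       +                slots="$slots 0"
       +                i=$((i + 1))
       +        done
       +        for d in $jobs; do
       +                # sort the finish times: $1 is the slot that is free first.
       +                set -- $(printf '%s\n' $slots | sort -n)
       +                first="$1"
       +                shift
       +                slots="$* $((first + d))"
       +        done
       +        # the total time is the latest finish time.
       +        printf '%s\n' $slots | sort -n | tail -n 1
       +}
       +
       +echo "batches: $(batch_total)s"
       +echo "queue: $(queue_total)s"
       +```
       +
       +This gives 22 seconds for the batches (4 + 8 + 10) and 18 seconds for the
       +queue, which is close to the measured times.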
       +
       +
       +# How it works
       +
       +The parent process:
       +
       +* The parent, using xargs, handles the queue of jobs and schedules the jobs to
       +  execute as a child process.
       +* The list function writes the parameters to stdout. These parameters are
       +  separated by the NUL byte separator. The NUL byte separator is used because
       +  this character cannot be used in filenames (which can contain spaces or even
       +  newlines) and cannot be used in text (the NUL byte terminates the buffer for
       +  a string).
       +* The -L option must match the number of arguments that are specified for the
       +  job. It will split the specified parameters per job.
       +* The expression "$(readlink -f "$0")" gets the absolute path to the
       +  shellscript itself. This is passed as the executable to run for xargs.
       +* xargs calls the script itself with the specified parameters it is being fed.
       +  The environment variable $CHILD_MODE is set to indicate to the script itself
       +  it is run as a child process of the script.
       +
       +
       +The child process:
       +
       +* The command-line arguments are passed by the parent using xargs.
       +
       +* The environment variable $CHILD_MODE is set to indicate to the script itself
       +  it is run as a child process of the script.
       +
       +* The script itself (run in child mode) only executes the task and
       +  signals its status back to xargs and the parent.
       +
       +* The exit status of the child program is signaled to xargs. This could be
       +  handled, for example to stop on the first failure (in this example it is not).
       +  For example, if the program is terminated by a signal or exits with status 255,
       +  xargs stops running as well.
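       +
       +The exit status 255 behaviour can be used to stop on the first failure. The
       +following is a minimal sketch and not part of the script above: the inline
       +sh -c job, the newline-separated input and the -n 1 option (one argument per
       +invocation) are choices made only for this illustration. Job 2 fails on
       +purpose and the failure is mapped to exit status 255.
       +
       +```shell
       +#!/bin/sh
       +# Job 2 fails on purpose. Any failure is turned into exit status 255, which
       +# makes xargs stop reading further input. The diagnostic message that xargs
       +# writes to stderr is discarded here.
       +out=$(printf '%s\n' 1 2 3 |
       +        xargs -n 1 sh -c 'echo "job $1 started"; test "$1" != "2" || exit 255' sh 2>/dev/null)
       +echo "$out"
       +```
       +
       +Jobs 1 and 2 are started, job 3 is never started. With -P, invocations that
       +are already running are not affected by this.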
       +
       +
       +# xargs -P and portability
       +
       +Note that some of the options, like -P, are non-POSIX as of writing (2023):
       +<https://pubs.opengroup.org/onlinepubs/9699919799/>.
       +However many systems support this useful extension.
       +
       +
       +# Explanation of used xargs options:
       +
       +From the OpenBSD man page: <https://man.openbsd.org/xargs>
       +
       +        xargs - construct argument list(s) and execute utility
       +
       +Options explained:
       +
       +* -r: Do not run the command if there are no arguments. Normally the command
       +  is executed at least once even if there are no arguments.
       +* -0: Change xargs to expect NUL ('\0') characters as separators, instead of
       +  spaces and newlines.
       +* -P maxprocs: Parallel mode: run at most maxprocs invocations of utility
       +  at once.
       +* -L number: Call utility for every number of non-empty lines read. A line
       +  ending in unescaped white space and the next non-empty line are considered
       +  to form one single line. If EOF is reached and fewer than number lines have
       +  been read then utility will be called with the available lines.
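       +
       +The effect of -0 and -r can be checked with a small test. This is a sketch
       +under assumptions: count_args is a made-up helper, and it uses -n 1 (one
       +argument per invocation) instead of -L to keep the example short. It assumes
       +an xargs that supports -0 and -r, which are extensions on some systems.
       +
       +```shell
       +#!/bin/sh
       +# count_args (hypothetical helper): run one command per argument and count
       +# the number of invocations.
       +count_args() {
       +        xargs -r -0 -n 1 sh -c 'echo "got one argument"' sh | wc -l | tr -d ' '
       +}
       +
       +# two arguments: one contains a space, the other contains a newline.
       +with_input=$(printf 'a b\0c\nd\0' | count_args)
       +# no arguments at all: with -r the command is not run.
       +without_input=$(printf '' | count_args)
       +
       +echo "invocations for two odd names: $with_input"
       +echo "invocations for empty input: $without_input"
       +```
       +
       +The two names are passed as exactly two arguments, and nothing is run for the
       +empty input.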
       +
       +
       +# References
       +
       +* xargs: <https://man.openbsd.org/xargs>
       +* printf: <https://man.openbsd.org/printf>
       +* wait(2): <https://man.openbsd.org/wait>