publish webdump post - www.codemadness.org - www.codemadness.org saait content files
 (HTM) git clone git://git.codemadness.org/www.codemadness.org
       ---
 (DIR) commit a958fcb1d4d7d22302bdb51fb1cdcda3afdd55b8
 (DIR) parent 5138e644ee86e41f39538e43005ba6429e94e27f
 (HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
       Date:   Fri, 28 Jun 2024 10:45:32 +0200
       
       publish webdump post
       
       The draft version was already linked from:
       https://www.bttr-software.de/forum/board_entry.php?id=21923
       
       Diffstat:
         M config.cfg                          |       2 +-
         M output/atom.xml                     |      14 +++++++++++++-
         M output/atom_content.xml             |     121 ++++++++++++++++++++++++++++++-
         M output/index                        |       1 +
         M output/index.html                   |       1 +
         M output/rss.xml                      |       8 ++++++++
         M output/rss_content.xml              |     114 +++++++++++++++++++++++++++++++
         M output/sitemap.xml                  |       4 ++++
         M output/twtxt.txt                    |       1 +
         M output/urllist.txt                  |       1 +
         A pages/webdump.cfg                   |       6 ++++++
         A pages/webdump.md                    |     135 +++++++++++++++++++++++++++++++
       
       12 files changed, 405 insertions(+), 3 deletions(-)
       ---
 (DIR) diff --git a/config.cfg b/config.cfg
       @@ -1,5 +1,5 @@
        # last updated the site.
       -siteupdated = 2024-05-18
       +siteupdated = 2024-06-28
        
        sitetitle = Codemadness
        siteurl = https://www.codemadness.org
 (DIR) diff --git a/output/atom.xml b/output/atom.xml
       @@ -2,7 +2,7 @@
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
                <title>Codemadness</title>
                <subtitle>blog with various projects and articles about computer-related things</subtitle>
       -        <updated>2024-05-18T00:00:00Z</updated>
       +        <updated>2024-06-28T00:00:00Z</updated>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org" />
                <id>https://www.codemadness.org/atom.xml</id>
                <link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom.xml" />
       @@ -43,6 +43,18 @@
                <summary>Improved Youtube Atom feed by adding video duration and filtering away shorts</summary>
        </entry>
        <entry>
       +        <title>webdump HTML to plain-text converter</title>
       +        <link rel="alternate" type="text/html" href="https://www.codemadness.org/webdump.html" />
       +        <id>https://www.codemadness.org/webdump.html</id>
       +        <updated>2023-11-20T00:00:00Z</updated>
       +        <published>2023-11-20T00:00:00Z</published>
       +        <author>
       +                <name>Hiltjo</name>
       +                <uri>https://www.codemadness.org</uri>
       +        </author>
       +        <summary>webdump HTML to plain-text converter</summary>
       +</entry>
       +<entry>
                <title>Setup your own mail paste service</title>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org/mailservice.html" />
                <id>https://www.codemadness.org/mailservice.html</id>
 (DIR) diff --git a/output/atom_content.xml b/output/atom_content.xml
       @@ -2,7 +2,7 @@
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
                <title>Codemadness</title>
                <subtitle>blog with various projects and articles about computer-related things</subtitle>
       -        <updated>2024-05-18T00:00:00Z</updated>
       +        <updated>2024-06-28T00:00:00Z</updated>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org" />
                <id>https://www.codemadness.org/atom_content.xml</id>
                <link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom_content.xml" />
       @@ -512,6 +512,125 @@ feeds() {
        ]]></content>
        </entry>
        <entry>
       +        <title>webdump HTML to plain-text converter</title>
       +        <link rel="alternate" type="text/html" href="https://www.codemadness.org/webdump.html" />
       +        <id>https://www.codemadness.org/webdump.html</id>
       +        <updated>2023-11-20T00:00:00Z</updated>
       +        <published>2023-11-20T00:00:00Z</published>
       +        <author>
       +                <name>Hiltjo</name>
       +                <uri>https://www.codemadness.org</uri>
       +        </author>
       +        <summary>webdump HTML to plain-text converter</summary>
       +        <content type="html"><![CDATA[<h1>webdump HTML to plain-text converter</h1>
       +        <p><strong>Last modification on </strong> <time>2023-11-20</time></p>
       +        <p>webdump is (yet another) HTML to plain-text converter tool.</p>
       +<p>It reads HTML in UTF-8 from stdin and writes plain-text to stdout.</p>
       +<h2>Goals and scope</h2>
       +<p>The main goal of this tool for me is to use it for converting HTML mails to
       +plain-text and to convert HTML content in RSS feeds to plain-text.</p>
       +<p>The tool will only convert HTML to stdout, similarly to links -dump or lynx
       +-dump but simpler and more secure.</p>
       +<ul>
       +<li>HTML and XHTML will be supported.</li>
       +<li>There will be some workarounds and quirks for broken and legacy HTML code.</li>
       +<li>It will be usable and secure for reading HTML from mails and RSS/Atom feeds.</li>
       +<li>No remote resources which are part of the HTML will be downloaded:
       +images, video, audio, etc. But these may be visible as a link reference.</li>
       +<li>Data will be written to stdout. Intended for plain-text or a text terminal.</li>
       +<li>No support for Javascript, CSS, frame rendering or form processing.</li>
       +<li>No HTTP or network protocol handling: HTML data is read from stdin.</li>
       +<li>Listings for references and some options to extract them in a list that is
       +usable for scripting. Some references are: link anchors, images, audio, video,
       +HTML (i)frames, etc.</li>
       +<li>Security: on OpenBSD it uses pledge("stdio", NULL).</li>
       +<li>Keep the code relatively small, simple and hackable.</li>
       +</ul>
       +<h2>Features</h2>
       +<ul>
       +<li>Support for word-wrapping.</li>
       +<li>A mode to enable basic markup: bold, underline, italic and blink ;)</li>
       +<li>Indentation of headers, paragraphs, pre and list items.</li>
       +<li>Basic support to query elements or hide them.</li>
       +<li>Show link references.</li>
       +<li>Show link references and resources such as img, video, audio, subtitles.</li>
       +<li>Export link references and resources to a TAB-separated format.</li>
       +</ul>
       +<h2>Usage examples</h2>
       +<pre><code>url='https://codemadness.org/sfeed.html'
       +
       +curl -s "$url" | webdump -r -b "$url" | less
       +
       +curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
       +
       +curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
       +</code></pre>
       +<p>Yes, all these option flags look ugly; a shell script wrapper could be used :)</p>
       +<h2>Practical examples</h2>
       +<p>To use webdump as an HTML to text filter, for example in the mutt mail client,
       +set this in ~/.mailcap:</p>
       +<pre><code>text/html; webdump -i -l -r &lt; %s; needsterminal; copiousoutput
       +</code></pre>
       +<p>In mutt you should then add:</p>
       +<pre><code>auto_view text/html
       +</code></pre>
       +<p>Using webdump as an HTML to text filter for sfeed_curses (otherwise the default is lynx):</p>
       +<pre><code>SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
       +</code></pre>
       +<h2>Query/selector examples</h2>
       +<p>The query syntax using the -s option is a bit inspired by CSS (but much more limited).</p>
       +<p>To get the title from an HTML page:</p>
       +<pre><code>url='https://codemadness.org/sfeed.html'
       +
       +title=$(curl -s "$url" | webdump -s 'title' "$url")
       +printf '%s\n' "$title"
       +</code></pre>
       +<p>List audio and video-related content from an HTML page, redirecting fd 3 to fd 1 (stdout):</p>
       +<pre><code>url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
       +curl -s "$url" | webdump -x -s 'audio,video' "$url" 3&gt;&amp;1 &gt;/dev/null | cut -f 2
       +</code></pre>
       +<h2>Clone</h2>
       +<pre><code>git clone git://git.codemadness.org/webdump
       +</code></pre>
       +<h2>Browse</h2>
       +<p>You can browse the source-code at:</p>
       +<ul>
       +<li><a href="https://git.codemadness.org/webdump/">https://git.codemadness.org/webdump/</a></li>
       +<li><a href="gopher://codemadness.org/1/git/webdump">gopher://codemadness.org/1/git/webdump</a></li>
       +</ul>
       +<h2>Build and install</h2>
       +<pre><code>$ make
       +# make install
       +</code></pre>
       +<h2>Dependencies</h2>
       +<ul>
       +<li>C compiler.</li>
       +<li>libc + some BSDisms.</li>
       +</ul>
       +<h2>Trade-offs</h2>
       +<p>All software has trade-offs.</p>
       +<p>webdump processes HTML in a single pass. It does not buffer the full DOM tree,
       +although due to the nature of HTML/XML some parts, like attributes, need to be
       +buffered.</p>
       +<p>Rendering tables in webdump is very limited. Twibright Links has really nice
       +table rendering. However, implementing a similar feature in the current design of
       +webdump would make the code much more complex. Twibright Links
       +processes a full DOM tree and processes the tables in multiple passes (to
       +measure the table cells) etc.  Tables can of course also be nested, and HTML tables
       +can be used for creating layouts (mostly on older webpages).</p>
       +<p>These trade-offs and preferences are chosen for now. They may change in the
       +future.  Fortunately there are the usual good suspects for HTML to plain-text
       +conversion, each with their own chosen trade-offs of course:</p>
       +<ul>
       +<li>twibright links: <a href="http://links.twibright.com/">http://links.twibright.com/</a></li>
       +<li>lynx: <a href="https://lynx.invisible-island.net/">https://lynx.invisible-island.net/</a></li>
       +<li>w3m: <a href="https://w3m.sourceforge.net/">https://w3m.sourceforge.net/</a></li>
       +<li>xmllint (part of libxml2): <a href="https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home">https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home</a></li>
       +<li>xmlstarlet: <a href="https://xmlstar.sourceforge.net/">https://xmlstar.sourceforge.net/</a></li>
       +</ul>
       +]]></content>
       +</entry>
       +<entry>
                <title>Setup your own mail paste service</title>
                <link rel="alternate" type="text/html" href="https://www.codemadness.org/mailservice.html" />
                <id>https://www.codemadness.org/mailservice.html</id>
 (DIR) diff --git a/output/index b/output/index
       @@ -14,6 +14,7 @@ i                codemadness.org        70
        12024-02-02 Chess puzzle book generator        /phlog/chess-puzzles        codemadness.org        70
        12023-11-22 xargs: an example for parallel batch jobs        /phlog/xargs        codemadness.org        70
        12023-11-20 Improved Youtube RSS/Atom feed        /phlog/youtube-feed        codemadness.org        70
       +12023-11-20 webdump HTML to plain-text converter        /phlog/webdump        codemadness.org        70
        12023-10-25 Setup your own mail paste service        /phlog/mailservice        codemadness.org        70
        12022-07-01 A simple TODO application        /phlog/todo        codemadness.org        70
        12022-03-23 2FA TOTP without crappy authenticator apps        /phlog/totp        codemadness.org        70
 (DIR) diff --git a/output/index.html b/output/index.html
       @@ -43,6 +43,7 @@
        <tr><td><time>2024-02-02</time></td><td><a href="chess-puzzles.html">Chess puzzle book generator</a></td></tr>
        <tr><td><time>2023-11-22</time></td><td><a href="xargs.html">xargs: an example for parallel batch jobs</a></td></tr>
        <tr><td><time>2023-11-20</time></td><td><a href="youtube-feed.html">Improved Youtube RSS/Atom feed</a></td></tr>
       +<tr><td><time>2023-11-20</time></td><td><a href="webdump.html">webdump HTML to plain-text converter</a></td></tr>
        <tr><td><time>2023-10-25</time></td><td><a href="mailservice.html">Setup your own mail paste service</a></td></tr>
        <tr><td><time>2022-07-01</time></td><td><a href="todo-application.html">A simple TODO application</a></td></tr>
        <tr><td><time>2022-03-23</time></td><td><a href="totp.html">2FA TOTP without crappy authenticator apps</a></td></tr>
 (DIR) diff --git a/output/rss.xml b/output/rss.xml
       @@ -31,6 +31,14 @@
                <description>Improved Youtube Atom feed by adding video duration and filtering away shorts</description>
        </item>
        <item>
       +        <title>webdump HTML to plain-text converter</title>
       +        <link>https://www.codemadness.org/webdump.html</link>
       +        <guid>https://www.codemadness.org/webdump.html</guid>
       +        <dc:date>2023-11-20T00:00:00Z</dc:date>
       +        <author>Hiltjo</author>
       +        <description>webdump HTML to plain-text converter</description>
       +</item>
       +<item>
                <title>Setup your own mail paste service</title>
                <link>https://www.codemadness.org/mailservice.html</link>
                <guid>https://www.codemadness.org/mailservice.html</guid>
 (DIR) diff --git a/output/rss_content.xml b/output/rss_content.xml
       @@ -497,6 +497,120 @@ feeds() {
        ]]></description>
        </item>
        <item>
       +        <title>webdump HTML to plain-text converter</title>
       +        <link>https://www.codemadness.org/webdump.html</link>
       +        <guid>https://www.codemadness.org/webdump.html</guid>
       +        <dc:date>2023-11-20T00:00:00Z</dc:date>
       +        <author>Hiltjo</author>
       +        <description><![CDATA[<h1>webdump HTML to plain-text converter</h1>
       +        <p><strong>Last modification on </strong> <time>2023-11-20</time></p>
       +        <p>webdump is (yet another) HTML to plain-text converter tool.</p>
       +<p>It reads HTML in UTF-8 from stdin and writes plain-text to stdout.</p>
       +<h2>Goals and scope</h2>
       +<p>The main goal of this tool for me is to use it for converting HTML mails to
       +plain-text and to convert HTML content in RSS feeds to plain-text.</p>
       +<p>The tool will only convert HTML to stdout, similarly to links -dump or lynx
       +-dump but simpler and more secure.</p>
       +<ul>
       +<li>HTML and XHTML will be supported.</li>
       +<li>There will be some workarounds and quirks for broken and legacy HTML code.</li>
       +<li>It will be usable and secure for reading HTML from mails and RSS/Atom feeds.</li>
       +<li>No remote resources which are part of the HTML will be downloaded:
       +images, video, audio, etc. But these may be visible as a link reference.</li>
       +<li>Data will be written to stdout. Intended for plain-text or a text terminal.</li>
       +<li>No support for Javascript, CSS, frame rendering or form processing.</li>
       +<li>No HTTP or network protocol handling: HTML data is read from stdin.</li>
       +<li>Listings for references and some options to extract them in a list that is
       +usable for scripting. Some references are: link anchors, images, audio, video,
       +HTML (i)frames, etc.</li>
       +<li>Security: on OpenBSD it uses pledge("stdio", NULL).</li>
       +<li>Keep the code relatively small, simple and hackable.</li>
       +</ul>
       +<h2>Features</h2>
       +<ul>
       +<li>Support for word-wrapping.</li>
       +<li>A mode to enable basic markup: bold, underline, italic and blink ;)</li>
       +<li>Indentation of headers, paragraphs, pre and list items.</li>
       +<li>Basic support to query elements or hide them.</li>
       +<li>Show link references.</li>
       +<li>Show link references and resources such as img, video, audio, subtitles.</li>
       +<li>Export link references and resources to a TAB-separated format.</li>
       +</ul>
       +<h2>Usage examples</h2>
       +<pre><code>url='https://codemadness.org/sfeed.html'
       +
       +curl -s "$url" | webdump -r -b "$url" | less
       +
       +curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
       +
       +curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
       +</code></pre>
       +<p>Yes, all these option flags look ugly; a shell script wrapper could be used :)</p>
       +<h2>Practical examples</h2>
       +<p>To use webdump as an HTML to text filter, for example in the mutt mail client,
       +set this in ~/.mailcap:</p>
       +<pre><code>text/html; webdump -i -l -r &lt; %s; needsterminal; copiousoutput
       +</code></pre>
       +<p>In mutt you should then add:</p>
       +<pre><code>auto_view text/html
       +</code></pre>
       +<p>Using webdump as an HTML to text filter for sfeed_curses (otherwise the default is lynx):</p>
       +<pre><code>SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
       +</code></pre>
       +<h2>Query/selector examples</h2>
       +<p>The query syntax using the -s option is a bit inspired by CSS (but much more limited).</p>
       +<p>To get the title from an HTML page:</p>
       +<pre><code>url='https://codemadness.org/sfeed.html'
       +
       +title=$(curl -s "$url" | webdump -s 'title' "$url")
       +printf '%s\n' "$title"
       +</code></pre>
       +<p>List audio and video-related content from an HTML page, redirecting fd 3 to fd 1 (stdout):</p>
       +<pre><code>url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
       +curl -s "$url" | webdump -x -s 'audio,video' "$url" 3&gt;&amp;1 &gt;/dev/null | cut -f 2
       +</code></pre>
       +<h2>Clone</h2>
       +<pre><code>git clone git://git.codemadness.org/webdump
       +</code></pre>
       +<h2>Browse</h2>
       +<p>You can browse the source-code at:</p>
       +<ul>
       +<li><a href="https://git.codemadness.org/webdump/">https://git.codemadness.org/webdump/</a></li>
       +<li><a href="gopher://codemadness.org/1/git/webdump">gopher://codemadness.org/1/git/webdump</a></li>
       +</ul>
       +<h2>Build and install</h2>
       +<pre><code>$ make
       +# make install
       +</code></pre>
       +<h2>Dependencies</h2>
       +<ul>
       +<li>C compiler.</li>
       +<li>libc + some BSDisms.</li>
       +</ul>
       +<h2>Trade-offs</h2>
       +<p>All software has trade-offs.</p>
       +<p>webdump processes HTML in a single pass. It does not buffer the full DOM tree,
       +although due to the nature of HTML/XML some parts, like attributes, need to be
       +buffered.</p>
       +<p>Rendering tables in webdump is very limited. Twibright Links has really nice
       +table rendering. However, implementing a similar feature in the current design of
       +webdump would make the code much more complex. Twibright Links
       +processes a full DOM tree and processes the tables in multiple passes (to
       +measure the table cells) etc.  Tables can of course also be nested, and HTML tables
       +can be used for creating layouts (mostly on older webpages).</p>
       +<p>These trade-offs and preferences are chosen for now. They may change in the
       +future.  Fortunately there are the usual good suspects for HTML to plain-text
       +conversion, each with their own chosen trade-offs of course:</p>
       +<ul>
       +<li>twibright links: <a href="http://links.twibright.com/">http://links.twibright.com/</a></li>
       +<li>lynx: <a href="https://lynx.invisible-island.net/">https://lynx.invisible-island.net/</a></li>
       +<li>w3m: <a href="https://w3m.sourceforge.net/">https://w3m.sourceforge.net/</a></li>
       +<li>xmllint (part of libxml2): <a href="https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home">https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home</a></li>
       +<li>xmlstarlet: <a href="https://xmlstar.sourceforge.net/">https://xmlstar.sourceforge.net/</a></li>
       +</ul>
       +]]></description>
       +</item>
       +<item>
                <title>Setup your own mail paste service</title>
                <link>https://www.codemadness.org/mailservice.html</link>
                <guid>https://www.codemadness.org/mailservice.html</guid>
 (DIR) diff --git a/output/sitemap.xml b/output/sitemap.xml
       @@ -13,6 +13,10 @@
                <lastmod>2023-11-20</lastmod>
        </url>
        <url>
       +        <loc>https://www.codemadness.org/webdump.html</loc>
       +        <lastmod>2023-11-20</lastmod>
       +</url>
       +<url>
                <loc>https://www.codemadness.org/mailservice.html</loc>
                <lastmod>2024-02-10</lastmod>
        </url>
 (DIR) diff --git a/output/twtxt.txt b/output/twtxt.txt
       @@ -1,6 +1,7 @@
        2024-02-02T00:00:00Z        Chess puzzle book generator: https://www.codemadness.org/chess-puzzles.html
        2023-11-22T00:00:00Z        xargs: an example for parallel batch jobs: https://www.codemadness.org/xargs.html
        2023-11-20T00:00:00Z        Improved Youtube RSS/Atom feed: https://www.codemadness.org/youtube-feed.html
       +2023-11-20T00:00:00Z        webdump HTML to plain-text converter: https://www.codemadness.org/webdump.html
        2023-10-25T00:00:00Z        Setup your own mail paste service: https://www.codemadness.org/mailservice.html
        2022-07-01T00:00:00Z        A simple TODO application: https://www.codemadness.org/todo-application.html
        2022-03-23T00:00:00Z        2FA TOTP without crappy authenticator apps: https://www.codemadness.org/totp.html
 (DIR) diff --git a/output/urllist.txt b/output/urllist.txt
       @@ -1,6 +1,7 @@
        https://www.codemadness.org/chess-puzzles.html
        https://www.codemadness.org/xargs.html
        https://www.codemadness.org/youtube-feed.html
       +https://www.codemadness.org/webdump.html
        https://www.codemadness.org/mailservice.html
        https://www.codemadness.org/todo-application.html
        https://www.codemadness.org/totp.html
 (DIR) diff --git a/pages/webdump.cfg b/pages/webdump.cfg
       @@ -0,0 +1,6 @@
       +title = webdump HTML to plain-text converter
       +id = webdump
       +description = webdump HTML to plain-text converter
       +keywords = webdump, HTML to plain-text, converter, formatter
       +created = 2023-11-20
       +updated = 2023-11-20
 (DIR) diff --git a/pages/webdump.md b/pages/webdump.md
       @@ -0,0 +1,135 @@
       +webdump is (yet another) HTML to plain-text converter tool.
       +
       +It reads HTML in UTF-8 from stdin and writes plain-text to stdout.
       +
       +
       +## Goals and scope
       +
       +The main goal of this tool for me is to use it for converting HTML mails to
       +plain-text and to convert HTML content in RSS feeds to plain-text.
       +
       +The tool will only convert HTML to stdout, similarly to links -dump or lynx
       +-dump but simpler and more secure.
       +
       +* HTML and XHTML will be supported.
       +* There will be some workarounds and quirks for broken and legacy HTML code.
       +* It will be usable and secure for reading HTML from mails and RSS/Atom feeds.
       +* No remote resources which are part of the HTML will be downloaded:
       +  images, video, audio, etc. But these may be visible as a link reference.
       +* Data will be written to stdout. Intended for plain-text or a text terminal.
       +* No support for Javascript, CSS, frame rendering or form processing.
       +* No HTTP or network protocol handling: HTML data is read from stdin.
       +* Listings for references and some options to extract them in a list that is
       +  usable for scripting. Some references are: link anchors, images, audio, video,
       +  HTML (i)frames, etc.
       +* Security: on OpenBSD it uses pledge("stdio", NULL).
       +* Keep the code relatively small, simple and hackable.
       +
       +
       +## Features
       +
       +* Support for word-wrapping.
       +* A mode to enable basic markup: bold, underline, italic and blink ;)
       +* Indentation of headers, paragraphs, pre and list items.
       +* Basic support to query elements or hide them.
       +* Show link references.
       +* Show link references and resources such as img, video, audio, subtitles.
       +* Export link references and resources to a TAB-separated format.
       +
       +
       +## Usage examples
       +
       +        url='https://codemadness.org/sfeed.html'
       +        
       +        curl -s "$url" | webdump -r -b "$url" | less
       +        
       +        curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
       +        
       +        curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
       +
       +Yes, all these option flags look ugly; a shell script wrapper could be used :)
       +
       +
       +## Practical examples
       +
       +To use webdump as an HTML to text filter, for example in the mutt mail client,
       +set this in ~/.mailcap:
       +
       +        text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
       +
       +In mutt you should then add:
       +
       +        auto_view text/html
       +
       +
       +Using webdump as an HTML to text filter for sfeed_curses (otherwise the default is lynx):
       +
       +        SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
       +
       +
       +## Query/selector examples
       +
       +The query syntax using the -s option is a bit inspired by CSS (but much more limited).
       +
       +To get the title from an HTML page:
       +
       +        url='https://codemadness.org/sfeed.html'
       +        
       +        title=$(curl -s "$url" | webdump -s 'title' "$url")
       +        printf '%s\n' "$title"
       +
       +List audio and video-related content from an HTML page, redirecting fd 3 to fd 1 (stdout):
       +
       +        url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
       +        curl -s "$url" | webdump -x -s 'audio,video' "$url" 3>&1 >/dev/null | cut -f 2
       +
       +
       +## Clone
       +
       +        git clone git://git.codemadness.org/webdump
       +
       +
       +## Browse
       +
       +You can browse the source-code at:
       +
       +* <https://git.codemadness.org/webdump/>
       +* <gopher://codemadness.org/1/git/webdump>
       +
       +
       +## Build and install
       +
       +        $ make
       +        # make install
       +
       +
       +## Dependencies
       +
       +* C compiler.
       +* libc + some BSDisms.
       +
       +
       +## Trade-offs
       +
       +All software has trade-offs.
       +
       +webdump processes HTML in a single pass. It does not buffer the full DOM tree,
       +although due to the nature of HTML/XML some parts, like attributes, need to be
       +buffered.
       +
       +Rendering tables in webdump is very limited. Twibright Links has really nice
       +table rendering. However, implementing a similar feature in the current design of
       +webdump would make the code much more complex. Twibright Links
       +processes a full DOM tree and processes the tables in multiple passes (to
       +measure the table cells) etc.  Tables can of course also be nested, and HTML tables
       +can be used for creating layouts (mostly on older webpages).
       +
       +These trade-offs and preferences are chosen for now. They may change in the
       +future.  Fortunately there are the usual good suspects for HTML to plain-text
       +conversion, each with their own chosen trade-offs of course:
       +
       +* twibright links: <http://links.twibright.com/>
       +* lynx: <https://lynx.invisible-island.net/>
       +* w3m: <https://w3m.sourceforge.net/>
       +* xmllint (part of libxml2): <https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home>
       +* xmlstarlet: <https://xmlstar.sourceforge.net/>