webdump.1 - webdump - HTML to plain-text converter for webpages
 (HTM) git clone git://git.codemadness.org/webdump
       ---
       webdump.1 (3230B)
       ---
.Dd October 6, 2023
.Dt WEBDUMP 1
.Os
.Sh NAME
.Nm webdump
.Nd convert HTML to plain-text
.Sh SYNOPSIS
.Nm
.Op Fl 8adiIlrx
.Op Fl b Ar baseurl
.Op Fl s Ar selector
.Op Fl u Ar selector
.Op Fl w Ar termwidth
.Sh DESCRIPTION
.Nm
reads UTF-8 HTML data from stdin.
It converts and writes the output as plain-text to stdout.
A
.Ar baseurl
can be specified if the links in the page are relative URLs.
This must be an absolute URI.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl 8
Use UTF-8 symbols for certain items like bullet items and rulers to make the
output fancier.
.It Fl a
Toggle the use of ANSI escape codes.
By default it is not enabled.
.It Fl b Ar baseurl
Base URL of links.
This is used to make links absolute.
The specified URL is always preferred over the value in a <base/> tag.
.It Fl d
Deduplicate link references.
When a duplicate link reference is found, reuse the same link reference
number.
.It Fl i
Toggle whether link reference numbers are displayed inline.
By default it is not enabled.
.It Fl I
Toggle whether URLs for link references are displayed inline.
By default it is not enabled.
.It Fl l
Toggle whether link references are displayed at the bottom.
By default it is not enabled.
.It Fl r
Toggle line-wrapping mode.
By default it is not enabled.
.It Fl s Ar selector
CSS-like selectors.
This sets a reader mode to show only content matching the selector.
See the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl u Ar selector
CSS-like selectors.
This sets a reader mode to hide content matching the selector.
See the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl w Ar termwidth
The terminal width.
The default is 77 characters.
.It Fl x
Write resources as TAB-separated lines to file descriptor 3.
.El
.Sh SELECTOR SYNTAX
The syntax has some inspiration from CSS, but it is more limited.
Some examples:
.Bl -item
.It
"main" would match on the "main" tags.
.It
"#someid" would match on any tag which has the id attribute set to "someid".
.It
".someclass" would match on any tag which has the class attribute set to
"someclass".
.It
"main#someid" would match on the "main" tag which has the id attribute set to
"someid".
.It
"main.someclass" would match on the "main" tags which have the class
attribute set to "someclass".
.It
"ul li" would match on any "li" tag which also has a parent "ul" tag.
.It
"li@0" would match on any "li" tag which is also the first child element of its
parent container.
Note that this differs from filtering on a collection of "li" elements.
.El
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
url='https://codemadness.org/sfeed.html'

curl -s "$url" | webdump -r -b "$url" | less

curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R

curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
.Ed
.Pp
To use
.Nm
as an HTML to text filter, for example in the mutt mail client, set the
following in ~/.mailcap:
.Bd -literal
text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
.Ed
.Sh SEE ALSO
.Xr curl 1 ,
.Xr xmllint 1 ,
.Xr xmlstarlet 1 ,
.Xr ftp 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
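
The manual page says that -x writes resources to file descriptor 3 and that -u hides content matching a comma-separated selector list, but its EXAMPLES section does not show either. The following is a minimal sketch of how the two might be combined in a POSIX shell. It assumes webdump is installed and in PATH. The file names page.html, page.txt and resources.tsv, the helper dump_page and the selector list 'nav,footer' are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Sketch: hide navigation and footer content with -u and capture the
# resource list that -x writes to file descriptor 3.
# Assumptions: webdump is in PATH; all file names and the selector list
# below are hypothetical.

dump_page() {
	# $1: input HTML file, $2: plain-text output, $3: resource list output.
	# "3>" opens file descriptor 3 for writing, so -x has a destination.
	webdump -x -r -u 'nav,footer' < "$1" > "$2" 3> "$3"
}

# A small test page so the sketch does not depend on network access.
printf '%s\n' '<html><body><nav>menu</nav><main><p>Hello <a href="/a">link</a></p></main><footer>foot</footer></body></html>' > page.html

if command -v webdump >/dev/null 2>&1; then
	dump_page page.html page.txt resources.tsv
	cat page.txt
	# Each line of resources.tsv is TAB-separated. The meaning of the
	# fields is not documented in the manual page, so inspect the file
	# before parsing it.
	cat resources.tsv
else
	echo "webdump not found in PATH" >&2
fi
```

Redirecting descriptor 3 to a file keeps the resource list separate from the converted text on stdout, so both can be processed independently.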