.\" webdump.1 - webdump - HTML to plain-text converter for webpages
.\" git clone git://git.codemadness.org/webdump
.Dd October 6, 2023
.Dt WEBDUMP 1
.Os
.Sh NAME
.Nm webdump
.Nd convert HTML to plain-text
.Sh SYNOPSIS
.Nm
.Op Fl 8adiIlrx
.Op Fl b Ar baseurl
.Op Fl s Ar selector
.Op Fl u Ar selector
.Op Fl w Ar termwidth
.Sh DESCRIPTION
.Nm
reads UTF-8 HTML data from stdin.
It converts the data and writes the output as plain-text to stdout.
A
.Ar baseurl
can be specified if the links in the HTML are relative URLs.
This must be an absolute URI.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl 8
Use UTF-8 symbols for certain items like bullet items and rulers to make the
output fancier.
.It Fl a
Toggle the use of ANSI escape codes.
By default it is not enabled.
.It Fl b Ar baseurl
Base URL of links.
This is used to make links absolute.
The specified URL is always preferred over the value in a <base/> tag.
.It Fl d
Deduplicate link references.
When a duplicate link reference is found, reuse the same link reference number.
.It Fl i
Toggle whether link reference numbers are displayed inline.
By default it is not enabled.
.It Fl I
Toggle whether URLs for link references are displayed inline.
By default it is not enabled.
.It Fl l
Toggle whether link references are displayed at the bottom.
By default it is not enabled.
.It Fl r
Toggle line-wrapping mode.
By default it is not enabled.
.It Fl s Ar selector
CSS-like selectors.
This sets a reader mode that shows only content matching the selector.
See the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl u Ar selector
CSS-like selectors.
This sets a reader mode that hides content matching the selector.
See the section
.Sx SELECTOR SYNTAX
for the syntax.
Multiple selectors can be specified by separating them with a comma.
.It Fl w Ar termwidth
The terminal width.
The default is 77 characters.
.It Fl x
Write resources as TAB-separated lines to file descriptor 3.
.El
.Sh SELECTOR SYNTAX
The syntax is inspired by CSS, but it is more limited.
Some examples:
.Bl -item
.It
"main" would match on the "main" tags.
.It
"#someid" would match on any tag which has the id attribute set to "someid".
.It
".someclass" would match on any tag which has the class attribute set to
"someclass".
.It
"main#someid" would match on the "main" tag which has the id attribute set to
"someid".
.It
"main.someclass" would match on the "main" tags which have the class
attribute set to "someclass".
.It
"ul li" would match on any "li" tag which also has a parent "ul" tag.
.It
"li@0" would match on any "li" tag which is also the first child element of its
parent container.
Note that this differs from filtering on a collection of "li" elements.
.El
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
url='https://codemadness.org/sfeed.html'

curl -s "$url" | webdump -r -b "$url" | less

curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R

curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
.Ed
.Pp
To use
.Nm
as an HTML to text filter, for example in the mutt mail client, set in
~/.mailcap:
.Bd -literal
text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
.Ed
.Sh SEE ALSO
.Xr curl 1 ,
.Xr xmllint 1 ,
.Xr xmlstarlet 1 ,
.Xr ftp 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org