sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

$ make
# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

$ make SFEED_CURSES=""
# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME.  See the themes/
directory for the theme names.

$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

        mkdir -p "$HOME/.sfeed/feeds"
        cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

        $EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

        sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:

        newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

        r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds. This script merges the new items; see sfeed_update(1) for more
information about what it can do:

        sfeed_update

Format feeds:

Plain-text list:

        sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

        cp style.css "$HOME/.sfeed/style.css"
        sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

        mkdir -p "$HOME/.sfeed/frames"
        cp style.css "$HOME/.sfeed/frames/style.css"
        cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To update your feeds periodically and format them the way you like, you can
make a wrapper script and add it as a cronjob, for example as shown below.
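
A minimal sketch of such a wrapper (the paths and chosen output formats are
just examples, adjust them to taste):

        #!/bin/sh
        # update feeds and regenerate the plain-text and HTML views.
        sfeed_update
        sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
        sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

A crontab(5) entry to run it hourly, assuming it is saved as ~/bin/sfeedupdate
and made executable:

        0 * * * * ~/bin/sfeedupdate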

Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
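
For example, to fetch a feed over HTTPS with curl(1) and parse it to the
TAB-separated format (any other download tool would work just as well):

        curl -s "https://codemadness.org/atom.xml" | sfeed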

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs of RSS/Atom feeds in a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function in your sfeedrc(5) config file is called to process the
feeds. Each call of the default feed() function is executed concurrently as a
background job to make updating faster. The variable maxjobs can be changed to
limit or increase the number of concurrent jobs (8 by default).
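
A minimal sketch of a sfeedrc config file (the feed names and URLs are
examples; see sfeedrc.example and the sfeedrc(5) man page for the details):

        # list of feeds to fetch:
        feeds() {
                # feed <name> <feedurl> [basesiteurl] [encoding]
                feed "codemadness" "https://codemadness.org/atom.xml"
                feed "xkcd" "https://xkcd.com/atom.xml"
        }

        # limit the number of concurrent jobs.
        maxjobs=4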


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
               The format is documented in sfeed(5).


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed
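
For example, to print the title (field 2) and link (field 3) of each item in a
feed file in this TAB-separated format:

        awk -F '\t' '{ print $2 ": " $3 }' ~/.sfeed/feeds/xkcd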


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

        url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

        https://codemadness.org/atom.xml        application/atom+xml
        https://codemadness.org/atom_content.xml        application/atom+xml

- - -

Make sure your sfeedrc config file exists; see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

        sfeed_update "configfile"

Format the feeds files:

        # Plain-text list.
        sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
        # HTML view (no frames), copy style.css for a default style.
        sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
        # HTML view with the menu as frames, copy style.css for a default style.
        mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

        $BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

        $EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface.  The interface has a look inspired
by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed, you can run it like
this:

        sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

        sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This limit
can be overridden by setting the environment variable $SFEED_NEW_AGE to the
desired maximum age in seconds. To manage read/unread items in a different way,
a plain-text file with a list of the read URLs can be used. To enable this
behaviour, set the environment variable $SFEED_URL_FILE to the path of this URL
file:

        export SFEED_URL_FILE="$HOME/.sfeed/urls"
        [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
        sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.
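
For example, to mark the items of the last 7 days as new instead (the value is
in seconds):

        SFEED_NEW_AGE=$((86400 * 7)) sfeed_curses ~/.sfeed/feeds/*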

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

        #!/bin/sh
        url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
                sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
        test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

        sfeed_opml_import < opmlfile.xml > "$HOME/.sfeed/sfeedrc"

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

        sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

        # filter fields.
        # filter(name, url)
        filter() {
                case "$1" in
                "tweakers")
                        awk -F '\t' 'BEGIN { OFS = "\t"; }
                        # skip ads.
                        $2 ~ /^ADV:/ {
                                next;
                        }
                        # shorten link.
                        {
                                if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
                                        $3 = substr($3, RSTART, RLENGTH);
                                }
                                print $0;
                        }';;
                "yt BSDNow")
                        # filter only BSD Now from channel.
                        awk -F '\t' '$2 ~ / \| BSD Now/';;
                *)
                        cat;;
                esac | \
                        # replace youtube links with embed links.
                        sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
                        awk -F '\t' 'BEGIN { OFS = "\t"; }
                        function filterlink(s) {
                                # protocol must start with http, https or gopher.
                                if (match(s, /^(http|https|gopher):\/\//) == 0) {
                                        return "";
                                }

                                # shorten feedburner links.
                                if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
                                        s = substr(s, RSTART, RLENGTH);
                                }

                                # strip tracking parameters
                                # urchin, facebook, piwik, webtrekk and generic.
                                gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
                                gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

                                gsub(/\?&/, "?", s);
                                gsub(/[\?&]+$/, "", s);

                                return s;
                        }
                        {
                                $3 = filterlink($3); # link
                                $8 = filterlink($8); # enclosure

                                # try to remove tracking pixels: <img/> tags with 1px width or height.
                                gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

                                print $0;
                        }'
        }

- - -

Aggregate feeds. This filters new entries (maximum one day old), sorts them by
newest first, prefixes the feed name to the title and converts the TSV output
data to an Atom XML feed (again):

        #!/bin/sh
        cd ~/.sfeed/feeds/ || exit 1

        awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        BEGIN { OFS = "\t"; }
        int($1) >= old {
                $2 = "[" FILENAME "] " $2;
                print $0;
        }' * | \
        sort -k1,1rn | \
        sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

        fifo="/tmp/sfeed_fifo"
        mkfifo "$fifo"

On the reading side:

        # This keeps track of unique lines so it might consume a lot of memory.
        # It tries to reopen the $fifo after 1 second if it fails.
        while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

        fifo="/tmp/sfeed_fifo"
        feedsdir="$HOME/.sfeed/feeds/"
        cd "$feedsdir" || exit 1
        test -p "$fifo" || exit 1

        # 1 day is old news, don't write older items.
        awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        BEGIN { OFS = "\t"; }
        int($1) >= old {
                $2 = "[" FILENAME "] " $2;
                print $0;
        }' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

        awk -F '\t' 'BEGIN { latest = 0; }
        length($8) {
                ts = int($1);
                if (ts > latest) {
                        url = $8;
                        latest = ts;
                }
        }
        END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

        awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can trim a feed file down
to the items of (roughly) the last week, keeping a backup of the full file, by
doing for example:

        awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
        mv feed feed.bak
        mv feed.new feed

This could also be run weekly in a crontab to archive the feeds. Like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
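
For example, a script (here assumed to be saved as ~/bin/sfeed_archive) that
applies this to all feed files:

        #!/bin/sh
        # keep only the items of (roughly) the last week per feed.
        cd "$HOME/.sfeed/feeds" || exit 1
        for feed in *; do
                # skip backups from previous runs.
                case "$feed" in *.bak) continue ;; esac
                awk -F '\t' -v "old=$(($(date +'%s') - 604800))" \
                        'int($1) > old' < "$feed" > "$feed.new"
                mv "$feed" "$feed.bak"
                mv "$feed.new" "$feed"
        done

with a weekly crontab(5) entry such as:

        0 3 * * 0 ~/bin/sfeed_archive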

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

        set unmatched-mail keep

        account "sfeed" mbox "%[home]/.sfeed/mbox"
                $cachepath = "%[home]/.sfeed/fdm.cache"
                cache "${cachepath}"
                $maildir = "%[home]/feeds/"

                # Check if message is in the cache by Message-ID.
                match case "^Message-ID: (.*)" in headers
                        action {
                                tag "msgid" value "%1"
                        }
                        continue

                # If it is in the cache, stop.
                match matched and in-cache "${cachepath}" key "%[msgid]"
                        action {
                                keep
                        }

                # Not in the cache, process it and add to cache.
                match case "^X-Feedname: (.*)" in headers
                        action {
                                # Store to local maildir.
                                maildir "${maildir}%1"

                                add-to-cache "${cachepath}" key "%[msgid]"
                                keep
                        }

Now run:

        $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
        $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

        set unmatched-mail keep

        account "sfeed" mbox "%[home]/.sfeed/mbox"
                $cachepath = "%[home]/.sfeed/fdm.cache"
                cache "${cachepath}"

                # Check if message is in the cache by Message-ID.
                match case "^Message-ID: (.*)" in headers
                        action {
                                tag "msgid" value "%1"
                        }
                        continue

                # If it is in the cache, stop.
                match matched and in-cache "${cachepath}" key "%[msgid]"
                        action {
                                keep
                        }

                # Not in the cache, process it and add to cache.
                match case "^X-Feedname: (.*)" in headers
                        action {
                                # Connect to an SMTP server and attempt to
                                # deliver the mail to it.
                                # Of course change the server and e-mail below.
                                smtp server "codemadness.org" to "hiltjo@codemadness.org"

                                add-to-cache "${cachepath}" key "%[msgid]"
                                keep
                        }

Now run:

        $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
        $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

        maildir="$HOME/feeds"
        feedsdir="$HOME/.sfeed/feeds"
        procmailconfig="$HOME/.sfeed/procmailrc"

        # message-id cache to prevent duplicates.
        mkdir -p "${maildir}/.cache"

        if ! test -r "${procmailconfig}"; then
                printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
                echo "See procmailrc.example for an example." >&2
                exit 1
        fi

        find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
                name=$(basename "${d}")
                mkdir -p "${maildir}/${name}/cur"
                mkdir -p "${maildir}/${name}/new"
                mkdir -p "${maildir}/${name}/tmp"
                printf 'Mailbox %s\n' "${name}"
                sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
        done

Procmailrc(5) file:

        # Example for use with sfeed_mbox(1).
        # The header X-Feedname is used to split into separate maildirs. It is
        # assumed this name is sane.

        MAILDIR="$HOME/feeds/"

        :0
        * ^X-Feedname: \/.*
        {
                FEED="$MATCH"

                :0 Wh: "msgid_$FEED.lock"
                | formail -D 1024000 ".cache/msgid_$FEED.cache"

                :0
                "$FEED"/
        }

Now run:

        $ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) client for sfeed_update with any other client to
fetch the RSS/Atom data, or changing the default curl options:

        # fetch a feed via HTTP/HTTPS etc.
        # fetch(name, url, feedfile)
        fetch() {
                hurl -m 1048576 -t 15 "$2" 2>/dev/null
        }

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

        mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

        # fetch(name, url, feedfile)
        fetch() {
                etag="$HOME/.sfeed/etags/$(basename "$3")"
                curl \
                        -L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
                        --compressed \
                        --etag-save "${etag}" --etag-compare "${etag}" \
                        -z "${etag}" \
                        "$2" 2>/dev/null
        }

These options can come at the cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons.  Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

        curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
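
In practice this would go inside a fetch() override in the sfeedrc file, for
example (a sketch based on the default curl options shown elsewhere in this
README):

        # fetch(name, url, feedfile)
        fetch() {
                curl -L --max-redirs 0 -f -s -m 15 \
                        -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
                        "$2" 2>/dev/null
        }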

- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error.

This prevents, for example, hijacking of an unencrypted http:// to https://
redirect, and avoids the added latency of an unnecessary page redirect on each
request.  It is encouraged to use the final redirected URL in the sfeedrc
config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
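
For example, a sketch of a fetch() override that allows a few redirects (the
maximum of 3 is arbitrary):

        # fetch(name, url, feedfile)
        fetch() {
                curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
                        "$2" 2>/dev/null
        }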

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs: for example to download podcasts
or webcomics, to download and convert webpages, or to mirror videos. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

        #!/bin/sh
        # sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
        # Dependencies: awk, curl, flock, xargs (-P), yt-dlp.

        cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
        jobs="${SFEED_JOBS:-4}"
        lockfile="${HOME}/.sfeed/sfeed_download.lock"

        # log(feedname, s, status)
        log() {
                if [ "$1" != "-" ]; then
                        s="[$1] $2"
                else
                        s="$2"
                fi
                printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
        }

        # fetch(url, feedname)
        fetch() {
                case "$1" in
                *youtube.com*)
                        yt-dlp "$1";;
                *.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
                        # allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
                        curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
                esac
        }

        # downloader(url, title, feedname)
        downloader() {
                url="$1"
                title="$2"
                feedname="${3##*/}"

                msg="${title}: ${url}"

                # download directory.
                if [ "${feedname}" != "-" ]; then
                        mkdir -p "${feedname}"
                        if ! cd "${feedname}"; then
                                log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
                                return 1
                        fi
                fi

                log "${feedname}" "${msg}" "START"
                if fetch "${url}" "${feedname}"; then
                        log "${feedname}" "${msg}" "OK"

                        # append it safely in parallel to the cachefile on a
                        # successful download.
                        (flock 9 || exit 1
                        printf '%s\n' "${url}" >> "${cachefile}"
                        ) 9>"${lockfile}"
                else
                        log "${feedname}" "${msg}" "FAIL" >&2
                        return 1
                fi
                return 0
        }

        if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
                # Downloader helper for parallel downloading.
                # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
                # It should write the URI to the cachefile if it is successful.
                downloader "$1" "$2" "$3"
                exit $?
        fi

        # ...else parent mode:

        tmp="$(mktemp)" || exit 1
        trap "rm -f ${tmp}" EXIT

        [ -f "${cachefile}" ] || touch "${cachefile}"
        cat "${cachefile}" > "${tmp}"
        echo >> "${tmp}" # force it to have one line for awk.

        LC_ALL=C awk -F '\t' '
        # fast prefilter what to download or not.
        function filter(url, field, feedname) {
                u = tolower(url);
                return (match(u, "youtube\\.com") ||
                        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
        }
        function download(url, field, title, filename) {
                if (!length(url) || urls[url] || !filter(url, field, filename))
                        return;
                # NUL-separated for xargs -0.
                printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
                urls[url] = 1; # print once
        }
        {
                FILENR += (FNR == 1);
        }
        # lookup table from cachefile which contains downloaded URLs.
        FILENR == 1 {
                urls[$0] = 1;
        }
        # feed file(s).
        FILENR != 1 {
                download($3, 3, $2, FILENAME); # link
                download($8, 8, $2, FILENAME); # enclosure
        }
        ' "${tmp}" "${@:--}" | \
        SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

        #!/bin/sh
        # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
        # The data is split per file per feed with the name of the newsboat title/url.
        # It writes the URLs of the read items line by line to a "urls" file.
        #
        # Dependencies: sqlite3, awk.
        #
        # Usage: create some directory to store the feeds then run this script.

        # newsboat cache.db file.
        cachefile="$HOME/.newsboat/cache.db"
        test -n "$1" && cachefile="$1"

        # dump data.
        # .mode ascii: Columns/rows delimited by 0x1F and 0x1E.
        # get the first fields in the order of the sfeed(5) format.
        sqlite3 "$cachefile" <<!EOF |
        .headers off
        .mode ascii
        .output
        SELECT
                i.pubDate, i.title, i.url, i.content, i.content_mime_type,
                i.guid, i.author, i.enclosure_url,
                f.rssurl AS rssurl, f.title AS feedtitle, i.unread
                -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
        FROM rss_feed f
        INNER JOIN rss_item i ON i.feedurl = f.rssurl
        ORDER BY
                i.feedurl ASC, i.pubDate DESC;
        .quit
        !EOF
        # convert to sfeed(5) TSV format.
        LC_ALL=C awk '
        BEGIN {
                FS = "\x1f";
                RS = "\x1e";
        }
        # normal non-content fields.
        function field(s) {
                gsub("^[[:space:]]*", "", s);
                gsub("[[:space:]]*$", "", s);
                gsub("[[:space:]]", " ", s);
                gsub("[[:cntrl:]]", "", s);
                return s;
        }
        # content field.
        function content(s) {
                gsub("^[[:space:]]*", "", s);
                gsub("[[:space:]]*$", "", s);
                # escape chars in content field: "\" becomes "\\".
                gsub("\\\\", "\\\\\\\\", s);
                gsub("\n", "\\n", s);
                gsub("\t", "\\t", s);
                return s;
        }
        function feedname(feedurl, feedtitle) {
                if (feedtitle == "") {
                        gsub("/", "_", feedurl);
                        return feedurl;
                }
                gsub("/", "_", feedtitle);
                return feedtitle;
        }
        {
                fname = feedname($9, $10);
                if (!feed[fname]++) {
                        print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
                }

                contenttype = field($5);
                if (contenttype == "")
                        contenttype = "html";
                else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
                        contenttype = "html";
                else
                        contenttype = "plain";

                print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
                        contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
                        > fname;

                # write URLs of the read items to a file line by line.
                if ($11 == "0") {
                        print $3 > "urls";
                }
        }'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config.  It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

        #!/bin/sh
        # Progress indicator script.

        # Pass lines as input to stdin and write progress status to stderr.
        # progress(totallines)
        progress() {
                total="$(($1 + 0))" # must be a number, no divide by zero.
                test "${total}" -le 0 -o "$1" != "${total}" && return
        LC_ALL=C awk -v "total=${total}" '
        {
                counter++;
                percent = (counter * 100) / total;
                printf("\033[K") > "/dev/stderr"; # clear EOL
                print $0;
                printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
                fflush(); # flush all buffers per line.
        }
        END {
                printf("\033[K") > "/dev/stderr";
        }'
        }

        # Counts the feeds from the sfeedrc config.
        countfeeds() {
                count=0
        . "$1"
        feed() {
                count=$((count + 1))
        }
                feeds
                echo "${count}"
        }

        config="${1:-$HOME/.sfeed/sfeedrc}"
        total=$(countfeeds "${config}")
        sfeed_update "${config}" 2>&1 | progress "${total}"


- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread and total items, for example in
a windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

        #!/bin/sh
        # Count the new items of the last day.
        LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
        {
                total++;
        }
        int($1) >= old {
                totalnew++;
        }
        END {
                print "New:   " (totalnew + 0);
                print "Total: " (total + 0);
        }' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

        #!/bin/sh
        # Count the unread and total items from feeds using the URL file.
        LC_ALL=C awk -F '\t' '
        # URL file: number of fields is 1.
        NF == 1 {
                u[$0] = 1; # lookup table of URLs.
                next;
        }
        # feed file: check by URL or id.
        {
                total++;
                if (length($3)) {
                        if (u[$3])
                                read++;
                } else if (length($6)) {
                        if (u[$6])
                                read++;
                }
        }
        END {
                print "Unread: " (total - read);
                print "Total:  " total;
        }' ~/.sfeed/urls ~/.sfeed/feeds/*


- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number.  This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field, or map it to
  an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field, the tag with the
  highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag just uses the inner data of the XML tag, then this
  definition is enough. If it, for example, has to parse a certain attribute,
  you have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
         RSSTagGuidPermalinkTrue,
         /* must be defined after GUID, because it can be a link (isPermaLink) */
         RSSTagLink,
-        RSSTagEnclosure,
+        RSSTagMediaContent, RSSTagEnclosure,
         RSSTagAuthor, RSSTagDccreator,
         RSSTagCategory,
         /* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
         FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
         FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-        FeedFieldLast
+        FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
         { STRP("enclosure"),         RSSTagEnclosure         },
         { STRP("guid"),              RSSTagGuid              },
         { STRP("link"),              RSSTagLink              },
+        { STRP("media:content"),     RSSTagMediaContent      },
         { STRP("media:description"), RSSTagMediaDescription  },
         { STRP("pubdate"),           RSSTagPubdate           },
         { STRP("title"),             RSSTagTitle             }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
         [RSSTagGuidPermalinkFalse] = FeedFieldId,
         [RSSTagGuidPermalinkTrue]  = FeedFieldId, /* special-case: both a link and an id */
         [RSSTagLink]               = FeedFieldLink,
+        [RSSTagMediaContent]       = FeedFieldMediaContent,
         [RSSTagEnclosure]          = FeedFieldEnclosure,
         [RSSTagAuthor]             = FeedFieldAuthor,
         [RSSTagDccreator]          = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
         string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
         putchar(FieldSeparator);
         string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+        putchar(FieldSeparator);
+        string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
         putchar('\n');
 
         if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
         }
 
         if (ctx.feedtype == FeedTypeRSS) {
-                if (ctx.tag.id == RSSTagEnclosure &&
+                if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
                     isattr(n, nl, STRP("url"))) {
                         string_append(&tmpstr, v, vl);
                 } else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

        case 'M':
                forkexec((char *[]) { "markallread.sh", NULL }, 0);
                break;

or

        case 'S':
                forkexec((char *[]) { "syncnews.sh", NULL }, 1);
                break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

        #!/bin/sh
        # mark all items/URLs as read.
        tmp="$(mktemp)" || exit 1
        (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
        awk '!x[$0]++' > "$tmp" &&
        mv "$tmp" ~/.sfeed/urls &&
        pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

        #!/bin/sh
        sfeed_update
        pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses.  When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available, the following wrapper
command can be used to run the program in a new session, for a plumb program:

        setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().
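
A sketch of what that could look like in C (forkexec_detached() is a
hypothetical helper, not the actual sfeed_curses code):

        #include <unistd.h>

        /* fork and execute a program in its own session. */
        static void
        forkexec_detached(char *argv[])
        {
                switch (fork()) {
                case 0: /* child */
                        if (setsid() == -1)
                                _exit(1);
                        execvp(argv[0], argv);
                        _exit(1); /* only reached if execvp() fails. */
                }
        }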


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

        SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

        SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals that were found
while testing sfeed_curses.  Some of them might be fixed already upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) in some
  terminals is unsupported or maps to the same button: for example side-buttons
  7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>