CROSSBOW(7)          Miscellaneous Information Manual (urm)          CROSSBOW(7)

NAME
     crossbow-cookbook - cookbookish examples of crossbow(1) usage

DESCRIPTION
     This manual page contains short recipes demonstrating how to use the
     crossbow feed aggregator.

   Table of contents
     1.   Simple local mail notification
     2.   Incremental files collection
     3.   Download the full article
     4.   One mail per entry
     5.   Maintain a local multimedia collection

EXAMPLES
   Simple local mail notification
     We want a periodic notification, via local mail, of the availability of
     new stories on a site.

     The configuration in crossbow.conf(5) would look like this:

           feed debian_micro
               url https://micronews.debian.org/feeds/feed.rss
               format %ft: %l\n

     The invocation of crossbow(1) will emit on stdout(3) a line like the
     following for each new item:

           Debian micronews: https://micronews.debian.org/....html

     By placing the following line in a crontab(5), a check for updates will
     be run automatically every two hours:

           0 0-23/2 * * * crossbow

     Assuming that local mail delivery is enabled, and since the output of a
     cron job is mailed to the owner of the crontab(5), the user will receive
     a mail with one line for each entry that appeared in the last two hours.

   Incremental files collection
     Let's consider a feed whose XML reports the whole article for each
     entry.  We want to store each article in a separate file, under a
     specific directory on the filesystem.

     The configuration in crossbow.conf(5) would look like this:

           feed cosmic.voyage
               url gopher://cosmic.voyage:70/0/atom.xml
               handler pipe
               command sed -n w%n.txt
               chdir ~/scifi_stories/cosmic.voyage/

     The invocation of crossbow(1) will spawn one sed(1) process for each new
     entry.  The content, corresponding to the %d placeholder, will be piped
     to the subprocess.  This in turn will write it on the specified file (w
     command), but not on stdout(3) (-n flag).
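     The effect of the pipe handler on a single entry can be reproduced by
     hand.  What follows is a minimal sketch, not what crossbow(1) runs
     internally: the entry text, the counter value and the temporary
     directory are made-up stand-ins for the %d content, the %n expansion and
     the chdir directory respectively.

```shell
#!/bin/sh
# Sketch only: emulates one invocation of "sed -n w%n.txt" by the pipe handler.
set -e

workdir="$(mktemp -d)"    # hypothetical stand-in for the chdir directory
cd "$workdir"

entry_content='Log entry: all systems nominal.'   # hypothetical %d content
n=000000                                          # hypothetical %n expansion

# -n suppresses the default printing on stdout; the w command writes every
# input line to the named file instead.
printf '%s\n' "$entry_content" | sed -n "w${n}.txt"

# Show what ended up in the file.
cat "${n}.txt"
```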
     As a result, the ~/scifi_stories/cosmic.voyage directory will be
     populated with files named 000000.txt, 000001.txt, 000002.txt, and so
     on, since %n is expanded with an incremental numeric value.  See
     crossbow-format(5).

     Security remark: unless the feed is trusted, it is strongly discouraged
     to name filesystem paths after entry properties other than %n.  Consider
     for example the case where %t is used as a file name, and the title of a
     post is something like ../../.profile.  %n is safe to use, since its
     value does not depend on the feed content.

   Download the full article
     This scenario is similar to the previous one, but it tackles the
     situation where the feed entry does not contain the full content, while
     the entry's link field contains a valid URL, which is intended to be
     reached by means of a web browser.

     In this case we can leverage curl(1) to do the retrieval:

           feed debian_micro
               url https://micronews.debian.org/feeds/feed.rss
               handler exec
               command curl -o %n.html %l
               chdir ~/debian_micronews/

     The "%n" and "%l" placeholders do not need to be quoted: they are
     handled safely even when their expansions contain whitespace.  See
     crossbow-format(5).

     It is of course possible to use any non-interactive download manager in
     place of curl(1), or a specialized script that fetches the entry link
     and scrapes the content out of it.

   One mail per entry
     We want to turn individual feed entries into plain (HTML-free) text
     messages, and deliver them via email.

     Our goal can be achieved by means of a generic shell script like the
     following:

           #!/bin/sh

           set -e

           feed_title="$1"
           post_title="$2"
           link="$3"

           # Without a link, lynx reads the HTML from standard input
           lynx "${link:--stdin}" -dump -force_html |
               sed "s/^~/~~/" |  # Escape dangerous tilde expressions
               mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"

     The script can be installed in the PATH, e.g. as
     /usr/local/bin/crossbow-to-mail, and then integrated in crossbow(1) as
     follows:

     •   If the tracked feed encloses the whole content in the XML:

               feed debian_micro
                   url https://micronews.debian.org/feeds/feed.rss
                   handler pipe
                   command crossbow-to-mail %ft %t

     •   If the feed entries only relay the link to the article:

               feed lobsters.c
                   url https://lobste.rs/t/c.rss
                   handler exec
                   command crossbow-to-mail %ft %t %l

     Note: The crossbow-to-mail script leverages lynx(1) to download and
     parse the HTML into textual form.  Any other tool that renders HTML as
     plain text can be used in its place.

     Security remark: The "s/^~/~~/" sed(1) regex prevents accidental or
     malicious tilde escapes from being interpreted by the mail(1) program.
     The mutt(1) mail user agent, if available, can be used as a safer
     drop-in replacement.

   Maintain a local multimedia collection
     Many sites specialized in multimedia delivery can be scraped using tools
     such as youtube-dl(1).  If the web site offers a feed to subscribe to,
     crossbow(1) can be combined with these tools to incrementally maintain a
     local collection of files.

     For example, YouTube provides feeds for users, channels and playlists.
     Each of these entities is assigned a unique identifier, which can easily
     be found by looking at the web URL.

     •   Given a user identifier UID, the feed is
         https://youtube.com/feeds/videos.xml?user=UID

     •   Given a channel identifier CID, the feed is
         https://youtube.com/feeds/videos.xml?channel_id=CID

     •   Given a playlist identifier PID, the feed is
         https://youtube.com/feeds/videos.xml?playlist_id=PID

     What follows is a convenient wrapper script that ensures proper file
     naming (although it is always wiser to use %n, as explained above):

           #!/bin/sh

           link="${1:?missing link}"
           incremental_id="${2:?missing incremental id}"
           format="$3"

           # Transform a title into a reasonably safe 'slug'
           slugify() {
               tr -d '\n' |                    # explicitly drop new-lines
               tr '/[:punct:][:space:]' . |    # turn all sly chars into dots
               tr -cs '[:alnum:]'              # squeeze repetitions
           }

           # Ask youtube-dl for the file name it would use, without downloading
           fname="$(
               youtube-dl \
                   --get-filename \
                   -o "%(id)s_%(title)s.%(ext)s" \
                   "$link"
           )" || exit 1

           # Download; the -f option is passed only if a format was given
           youtube-dl \
               ${format:+-f "$format"} \
               -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
               --no-progress \
               "$link"

     Once again, the script can be installed in the PATH, e.g. as
     /usr/local/bin/crossbow-ytdl, and then integrated in crossbow(1) as
     follows:

     •   To save each published video:

               feed computerophile
                   url https://youtube.com/feeds/videos.xml?user=Computerphile
                   handler exec
                   command crossbow-ytdl %l %n

     •   To save only the audio of each published video, pass a format
         selector such as bestaudio as the third argument:

               feed nodumb
                   url https://youtube.com/feeds/videos.xml?channel_id=UCVnIvJuTZqM5nnwGFpA57_Q
                   handler exec
                   command crossbow-ytdl %l %n bestaudio

SEE ALSO
     crossbow(1), lynx(1), sed(1), youtube-dl(1), crontab(5), cron(8)

AUTHORS
     Giovanni Simoni

                                October 9, 2021