CROSSBOW(7)          Miscellaneous Information Manual (urm)          CROSSBOW(7)

NAME
     crossbow-cookbook - examples of handling feeds

SYNOPSIS
     crossbow set [...]

DESCRIPTION
     This manual page contains short recipes describing common usage patterns
     for the crossbow feed aggregator.

     In all the following examples we will assume that the $ID environment
     variable is defined as an arbitrary feed identifier, and that the $URL
     environment variable is defined as the feed URL.

EXAMPLES
   Simple local mail notification
     We want a periodic bulk notification about the availability of updates.
     The following feed setup can be used for this purpose:

           crossbow set -i "$ID" -u "$URL" \
               -o pretty \
               -f "updates from $ID:\n title: %t\n link: %l\n"

     The invocation of crossbow-fetch(1) will emit on stdout(3) a "record"
     like the following for each new item:

           updates from foobar:
            title: Today is a good day
            link: http://example.com/today-is-a-good-day

     The user can schedule on cron(8) a periodic invocation of
     crossbow-fetch(1). Assuming that local mail delivery is enabled, and
     since any output of a cronjob is emailed to the owner of the crontab(5),
     the user will receive an email whose body is the concatenation of the
     records.

   Keep a local HTML file collection
     Let's consider the case of a feed for which the item's description field
     reports the whole article in HTML format. Each article needs to be
     stored in a separate HTML file under a certain directory on the
     filesystem.

     The following feed setup can be used for this purpose:

           crossbow set -i "$ID" -u "$URL" \
               -o pipe \
               -f "sed -n w%n.html" \
               -C /some/destination/path/

     The invocation of crossbow-fetch(1) will spawn one sed(1) process for
     each new item. The item description will be piped to sed(1), which in
     turn will write it to a file (w command).

     The output files will be named 000000.html, 000001.html, 000002.html
     ..., since %n is expanded with an incremental numeric value. See
     crossbow-outfmt(5).
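     The effect of the pipe handler above can be reproduced by hand, without
     running crossbow at all. The following is only a sketch: the two item
     descriptions are made-up sample data, the temporary directory stands in
     for the -C destination, and the counter imitates the %n expansion as
     described above.

```shell
#!/bin/sh
# Sketch: imitate what the recipe does for each new item.
# Nothing here calls crossbow; all values are hypothetical sample data.
set -e

# Stand-in for the -C destination directory (assumption: -C selects the
# directory in which the sub-command runs, as the recipe suggests).
workdir="$(mktemp -d)"
cd "$workdir"

n=0
for description in '<p>first article</p>' '<p>second article</p>'; do
    # Imitate the %n placeholder: a zero-padded incremental number.
    fname="$(printf '%06d.html' "$n")"

    # The description arrives on stdin; -n suppresses normal output and
    # the w command writes every input line to the named file.
    printf '%s\n' "$description" | sed -n "w$fname"

    n=$((n + 1))
done

# List the files that were created (left in place for inspection).
ls "$workdir"
```

     No shell redirection is involved in writing the files: sed(1) itself
     opens the output file, which is why the same command works when it is
     spawned directly rather than through a shell interpreter.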
     Security remark: Unless the feed is trusted, it is strongly discouraged
     to use anything but %n to name files. Consider for example the case
     where %t is used instead of %n, and the title of a post is

           ../../../../home/user/public_html/index

     Security remark: We are using the w command of sed(1) to write to a
     file. It is not possible to use shell redirection, since sub-commands
     are never executed through a shell interpreter. Invoking a shell
     interpreter from a command template is strongly discouraged: the
     placeholders would be mixed directly into the shell script, and proper
     shell escaping of untrusted input is really hard, if not impossible. It
     is, on the other hand, safe to invoke a shell script whose code lives in
     a file and pass parameters to it. See crossbow-outfmt(5).

   Download the full article
     This scenario is similar to the previous one, except that the item
     description contains only part of the content, or nothing at all. The
     link field contains a valid URL, which is intended to be reached by
     means of a browser.

     In this case we can leverage curl(1) to do the retrieval:

           crossbow set -i "$ID" -u "$URL" \
               -o subproc \
               -f "curl -o %n.html %l" \
               -C /some/destination/path/

     Remark: Placeholders such as %n and %l do not need to be quoted: they
     are handled safely even when their expansions contain whitespace.

   One mail per item
     We want to turn individual feed items into plain (HTML-free) text
     messages delivered via email.

     Our goal can be achieved by means of a generic shell script like the
     following:

           #!/bin/sh

           set -e

           feed_title="$1"
           post_title="$2"
           link="$3"

           lynx "${link:--stdin}" -dump -force_html |
               sed "s/^~/~~/" |    # Escape dangerous tilde expressions
               mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"

     The script can be installed in the PATH, e.g. as
     /usr/local/bin/crossbow-to-mail, and then integrated in crossbow(1) as
     follows:

     •   If the feed provides the whole content as item description:

               crossbow set -i "$ID" -u "$URL" \
                   -o pipe \
                   -f "crossbow-to-mail %ft %t"

     •   If the feed provides only the URL of the article as item link:

               crossbow set -i "$ID" -u "$URL" \
                   -o subproc \
                   -f "crossbow-to-mail %ft %t %l"

     Remark: The crossbow-to-mail script depends on the excellent lynx(1)
     browser to download the HTML and render it as plain text.

     Security remark: The "s/^~/~~/" sed(1) command prevents tilde escapes
     from being honored by unsafe implementations of mail(1). The mutt(1)
     mail user agent, if available, can be used as a safer drop-in
     replacement.

   Follow YouTube user, channel or playlist
     The YouTube site provides feeds for users, channels and playlists. Each
     of these entities is assigned a unique identifier, which can be easily
     obtained by looking at the web URL.

     Once the user, channel or playlist identifier is known, it is trivial to
     obtain the corresponding feeds:

     •   https://youtube.com/feeds/videos.xml?user=user
     •   https://youtube.com/feeds/videos.xml?channel_id=channel
     •   https://youtube.com/feeds/videos.xml?playlist_id=playlist

     It is possible to combine crossbow(1) with the youtube-dl(1) tool to
     keep a local collection of video or audio files up to date.

     What follows is a convenient wrapper script that ensures proper file
     naming:

           #!/bin/sh

           link="${1:?mandatory argument missing: link}"
           incremental_id="${2:?mandatory argument missing: incremental id}"
           format="$3"

           # Transform a title into a reasonably safe 'slug'.
           # The character classes are quoted so that the shell never
           # treats them as glob patterns.
           slugify() {
               tr -d \\n |                     # explicitly drop new-lines
               tr '/[:punct:][:space:]' . |    # turn all sneaky chars into dots
               tr -cs '[:alnum:]'              # squeeze ugly repetitions
           }

           fname="$(
               youtube-dl \
                   --get-filename \
                   -o "%(id)s_%(title)s.%(ext)s" \
                   "$link"
           )" || exit 1

           youtube-dl \
               ${format:+-f "$format"} \
               -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
               --no-progress \
               "$link"

     Once again, the script can be installed in the PATH, e.g. as
     /usr/local/bin/crossbow-ytdl, and then integrated in crossbow(1) as
     follows:

     •   To save each published item:

               crossbow set -i "$ID" -u "$URL" \
                   -o subproc \
                   -f "crossbow-ytdl %l %n" \
                   -C /some/destination/path

     •   To save each published item as audio:

               crossbow set -i "$ID" -u "$URL" \
                   -o subproc \
                   -f "crossbow-ytdl %l %n bestaudio" \
                   -C /some/destination/path

SEE ALSO
     crossbow-fetch(1), crossbow-set(1), lynx(1), sed(1), youtube-dl(1),
     crontab(5), cron(8)

AUTHORS
     Giovanni Simoni

                                 July 11, 2020