BBC NEWS... WITHOUT THE CRAP
       
       2024-03-09
       
       Did I mention recently that I love RSS? That it brings me great joy? That I
       start and finish almost every day in my feed reader? Probably.
       
       I used to have a single minor niggle with the BBC News RSS feed: that it
       included sports news, which I didn't care about. So I wrote a script that
       downloaded it, stripped sports news, and re-exported the feed for me to
       subscribe to. Magic.
       
 (IMG) RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.
       
       But lately - presumably as a result of technical changes at the Beeb's side -
       this feed has found two fresh ways to annoy me:
       * The feed now re-publishes a story if it gets re-promoted to the front
       page... but with a different <guid> (it appears to get a #0 after it when
       first published, a #1 the second time, and so on). In a typical day the feed
       reader might scoop up new stories about once an hour, and by the time I get to
       reading them the same exact story might appear in my reader multiple times.
       Ugh.
       * They've started adding iPlayer and BBC Sounds content to the BBC News feed.
       I don't follow BBC News in my feed reader because I want to watch or listen to
       things. If you do, that's fine, but I don't, and I'd rather filter this
       content out.
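       
       To make the first of those concrete, here's a minimal sketch of why the
       suffixes cause duplicates. The GUIDs in it are made up for illustration
       (they only follow the "#0, then #1" shape described above), and it assumes a
       feed reader that compares GUIDs verbatim:

```ruby
# Hypothetical GUIDs, shaped like the ones described above: the same story
# gets a different "#n" suffix each time it is re-promoted.
guids = [
  'https://www.bbc.co.uk/news/example-12345678#0',
  'https://www.bbc.co.uk/news/example-12345678#1',
  'https://www.bbc.co.uk/news/example-87654321#0',
]

# A reader that compares GUIDs verbatim sees three distinct entries...
distinct_as_published = guids.uniq.length

# ...but there are only two once everything from the '#' onwards is dropped.
distinct_stories = guids.map { |guid| guid.sub(/#.*\z/, '') }.uniq.length

puts "as published: #{distinct_as_published}, actual stories: #{distinct_stories}"
```

       In other words, the suffix is the only thing stopping a reader from
       recognising the repeat.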
       
       Luckily, I already have a recipe for improving this feed, thanks to my prior
       work. Let's look at my newly-revised script (also available on GitHub):
       
       #!/usr/bin/env ruby
       require 'bundler/inline'
       
       # Sample crontab:
       # Every 20 minutes, run the script and log the results
       # */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1
       
       # Dependencies:
       # * open-uri - load remote URL content easily
       # * nokogiri - parse/filter XML
       
       gemfile do
         source 'https://rubygems.org'
         gem 'nokogiri'
       end
       require 'open-uri'
       
       # Regular expression describing the GUIDs to reject from the resulting RSS feed
       # We want to drop everything from the "sport" section of the website, also any
       # iPlayer/Sounds links
       REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//
       
       # Load and filter the original RSS
       # (URI.open is used because plain open() no longer fetches URLs in Ruby 3.0+)
       rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
       rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)
       
       # Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids
       # with #0, #1, #2 etc, which results in duplicates in feed readers
       rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}
       
       File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
       
       It's amazing what you can do with Nokogiri and a half dozen lines of Ruby.
       
       That revised script removes from the feed anything whose <guid> suggests it's
       sports news or from BBC Sounds or iPlayer, and also strips any "anchor" part
       of the <guid> before re-exporting the feed. Much better. (Strictly speaking,
       this can result in a technically-invalid feed by introducing duplicates, but
       your feed reader oughta be smart enough to compensate for and ignore that:
       mine certainly is!)
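       
       If your reader isn't that forgiving, one option is to drop the repeats
       before writing the file. This is only a sketch and not part of my script:
       repeated_items is a hypothetical helper, and the Struct below is stand-in
       data with made-up GUIDs so that the example runs without Nokogiri.

```ruby
require 'set'

# Returns the items whose GUID has already appeared earlier in the list.
# `items` can be any enumerable; the block extracts each item's GUID.
# (Hypothetical helper, not part of the original script.)
def repeated_items(items)
  seen = Set.new
  # Set#add? returns nil when the value is already present, so reject
  # discards first sightings and leaves only the repeats.
  items.reject { |item| seen.add?(yield(item)) }
end

# With the script's Nokogiri document, after the anchors have been stripped,
# usage might look like:
#   repeated_items(rss.css('item')) { |item| item.css('guid').text }.each(&:unlink)

# Stand-in data with made-up GUIDs:
Item = Struct.new(:guid, :title)
items = [
  Item.new('https://www.bbc.co.uk/news/example-1', 'First copy'),
  Item.new('https://www.bbc.co.uk/news/example-2', 'Another story'),
  Item.new('https://www.bbc.co.uk/news/example-1', 'Second copy'),
]

repeats = repeated_items(items) { |item| item.guid }
puts repeats.map(&:title).inspect
```

       Keeping the first copy and unlinking the later ones is an arbitrary choice;
       you could just as easily keep the most recent.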
       
       You're free to take and adapt the script to your own needs, or - if you don't
       mind being tied to my opinions about what should be in BBC News' RSS feed -
       just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml
       
       LINKS
       
 (DIR) My very recent blog post about how RSS is better than ActivityPub.
 (DIR) My blog post about using RSS for joy, and not pursuing "RSS Zero".
 (HTM) My 2021 blog note about starting and ending my days in FreshRSS.
 (HTM) My blog post about scripting-out sport from BBC News' RSS feed.
 (HTM) My Ruby script for filtering out the kinds of BBC News content I don't want to see right out of their RSS feed.
 (HTM) FreshRSS: my favourite RSS reader
 (HTM) https://fox.q-t-a.uk/bbc-news-no-sport.xml