Parse JSON instead of HTML to retrieve the title and URL of the latest articles. - gophercgis - Collection of gopher CGI/DCGI for geomyidae
 (HTM) hg clone https://bitbucket.org/iamleot/gophercgis
 (DIR) Log
 (DIR) Files
 (DIR) Refs
 (DIR) README
 (DIR) LICENSE
       ---
 (DIR) changeset ea8197da47392d16091d275ab2e16cebd8d71a16
 (DIR) parent 621411a3bc8c7ae7e520d594581d6a16f4ff6360
 (HTM) Author: Leonardo Taccari <iamleot@gmail.com>
       Date:   Sun, 26 Aug 2018 14:08:34 +0200
       
       Parse JSON instead of HTML to retrieve the title and URL of the latest articles.
       
       XXX: Only the recent articles section is implemented at the moment; the URL
       XXX: does not seem valid for the other sections.
       XXX: Pagination is still not implemented, but it should be easier to
       XXX: implement now.
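       
       For reference, a minimal sketch of the new approach (assuming, as the
       diff below shows, that the endpoint returns an object with an "items"
       array whose entries carry "title" and "url" fields, and that the
       trailing timestamp in the URL is simply the current time):
       
           url="https://data.internazionale.it/stream_data/items/ultimi-articoli/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json"
           # Fetch the JSON feed and print one "title <TAB> url" line per article.
           /usr/bin/ftp -V -o - "${url}" |
           /usr/pkg/bin/jq -r '.items[] | "\(.title)\t\(.url)"'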
       
       Diffstat:
        internazionale/sections.dcgi |  46 ++++++++++++++-----------------------------
        1 files changed, 15 insertions(+), 31 deletions(-)
       ---
       diff -r 621411a3bc8c -r ea8197da4739 internazionale/sections.dcgi
       --- a/internazionale/sections.dcgi      Sun Aug 26 11:45:36 2018 +0200
       +++ b/internazionale/sections.dcgi      Sun Aug 26 14:08:34 2018 +0200
       @@ -1,24 +1,17 @@
        #!/bin/sh
        
       -#
       -# It seems that in order to enable pagination the following HTTP GET requests
       -# are done:
       -# 
       -#  <https://data.internazionale.it/stream_data/items/ultimi-articoli/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json>
       -# 
       -# Instead of scraping the HTML page only for the last articles this can be
       -# reused in order to get more data to build the DCGI and to enable
       -# pagination.
       -#
       -
        
        ARTICLE_CGI="/cgi/internazionale/article.cgi"
        
        
        section="$2"
        case "${section}" in
       -       ultimi-articoli | i-piu-letti | reportage | opinioni | savagelove )
       -               url="https://www.internazionale.it/${section}"
       +       ultimi-articoli)
       +               url="https://data.internazionale.it/stream_data/items/${section}/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json"
       +               ;;
       +       i-piu-letti | reportage | opinioni | savagelove )
       +               # TODO
       +               exit 1
                       ;;
               *)
                       exit 1
       @@ -29,24 +22,15 @@
        echo "Internazionale"
        echo ""
        
       -/usr/pkg/bin/curl -sgL "${url}" |
       -awk '
       -/class="box-article-title"/ {
       -       if (!match($0, /href="[^"]*"/)) {
       -               next
       -       }
       -       url = substr($0, RSTART + 6, RLENGTH - 7)
       -       url = "https://www.internazionale.it" url
       -
       -       title = $0
       -       sub(/^ *<a href="[^"]*" class="box-article-title">/, "", title)
       -       sub(/<\/a>.*$/, "", title)
       -
       -       gsub("\\|", "\\|", url)
       -       gsub("\\|", "\\|", title)
       -
       -       printf("[0|%s|'"${ARTICLE_CGI}?"'%s|server|port]\n", title, url)
       -}
       +/usr/bin/ftp -V -o - "${url}" |
       +/usr/pkg/bin/jq -r '
       +.items[] | (
       +"[0|" +
       +    "\(.title | gsub("\\|"; "\\|") )" + "|" +
       +    "'"${ARTICLE_CGI}?"'" + "https://www.internazionale.it" +
       +        "\(.url | gsub("\\|"; "\\|") )" + "|" +
       +    "server|port]"
       +)
        '
        
        echo ""