[HN Gopher] Show HN: Instantly create a GitHub repository to tak...
       ___________________________________________________________________
        
       Show HN: Instantly create a GitHub repository to take screenshots
       of a web page
        
       I built a GitHub repository template which automates the process of
       configuring a new repository to take web page screenshots using
       GitHub Actions.
        
       You can try this out at
       https://github.com/simonw/shot-scraper-template
        
       Use the https://github.com/simonw/shot-scraper-template/generate
       interface to create a new repository using that template, and paste
       the URL that you want to take screenshots of into the "description"
       field. The new repository will then configure itself using GitHub
       Actions, take the screenshot and save it back to the repo!
        
       Author : simonw
       Score  : 106 points
       Date   : 2022-03-14 17:34 UTC (5 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | moharoune wrote:
       | Great idea to use GitHub for this. I've been working on
       | https://app.trackwebpage.com/ which also tracks changes on web
       | pages and sends email notifications when they happen (if you
       | want them). It's totally free right now: you can just sign up
       | and track as many web pages as you want.
        
       | simonw wrote:
       | Here's a GitHub code search which shows repos that people have
       | created using my template: https://github.com/search?q=shot-
       | scraper-template+-user%3Asi...
       | 
       | My favourite so far is this one, which is taking screenshots of a
       | variety of French news websites:
       | https://github.com/ggtr1138/UneJournaux
        
       | endisneigh wrote:
       | I wish there were an easy way to use an iPhone or Android phone
       | as a spare computer (no app). I keep mine charged all day, so a
       | way to send some JavaScript to it to accomplish things like
       | this in a "serverless" fashion would be neat.
        
         | crickcreek wrote:
         | You can do that easily on Android: Termux, LineageOS,
         | postmarketOS.
        
       | simonw wrote:
       | Related to this: you can also use my shot-scraper tool to scrape
       | web pages from the command line using JavaScript:
       | 
       |     % pip install shot-scraper
       |     % shot-scraper install
       |     % shot-scraper javascript https://datasette.io/ "({
       |         title: document.title,
       |         tagline: document.querySelector('.tagline').innerText
       |     })"
       |     {
       |         "title": "Datasette: An open source multi-tool for exploring and publishing data",
       |         "tagline": "An open source multi-tool for exploring and publishing data"
       |     }
       | 
       | More here, including an example of using it to scrape data from
       | Hacker News that's not available in the API:
       | https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...
       | 
       | HN post about this from yesterday (which failed to get any
       | traction): https://news.ycombinator.com/item?id=30667588
        
         | marginalia_nu wrote:
         | That's fantastic!
         | 
         | I'll definitely investigate using this. I implemented my own
         | MacGyver version of this basic functionality off selenium to
         | grab screenshots for search.marginalia.nu/explore/random -- but
         | that script is super sketchy and held together with bubble gum
         | and duct tape. Yours looks a lot better.
         | 
         | By the way, is there a way to extract favicons as well?
        
           | simonw wrote:
           | No, I've not thought about favicons. That's a really
           | interesting challenge.
           | 
           | I wonder if there's a way to detect favicons just using
           | JavaScript that runs against a page? Not sure if it's easy to
           | detect /favicon.ico vs. the various meta tags.
        
             | simonw wrote:
             | Would be kind of fun to write JavaScript that runs against
             | the page that first looks for the meta tags, then tries to
             | fetch("/favicon.ico") and returns either the URL or a
             | base64 encoded copy of the image (since the "shot-scraper
             | javascript" command requires you to return JSON).
        
             | marginalia_nu wrote:
             | There are a lot of weird edge cases for favicons. Most
             | browsers fall back to just looking for /favicon.ico if you
             | don't explicitly specify one in the meta tags, and if you
             | do, there are sometimes different versions.
             | 
             | Yeah, maybe it's a pipe dream :-/ But even without them, it
             | looks really useful!
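
The lookup order discussed above (declared icons first, then the /favicon.ico fallback) could be sketched roughly as follows. This is only an illustration, not part of shot-scraper: `pickFaviconUrl` and the shape of its `links` argument are hypothetical, and it assumes the icons are declared with `<link rel="icon">` elements, which is how HTML pages normally declare them. When several versions are declared it simply prefers the largest `sizes` value, which is one possible policy among many. It returns only a URL; the base64 option mentioned above is not covered.

```javascript
// Hypothetical helper: choose a favicon URL from a page's <link> elements,
// falling back to /favicon.ico at the site root when none is declared.
// In a browser the `links` array could be built with:
//   Array.from(document.querySelectorAll("link[rel~='icon']"), (el) => ({
//     rel: el.getAttribute("rel"),
//     href: el.getAttribute("href"),
//     sizes: el.getAttribute("sizes"),
//   }))
function pickFaviconUrl(links, pageUrl) {
  // Keep links whose rel tokens include "icon" (covers "icon" and "shortcut icon").
  const icons = links.filter(
    (link) =>
      Boolean(link.href) &&
      (link.rel || "").toLowerCase().split(/\s+/).includes("icon")
  );
  if (icons.length === 0) {
    // Nothing declared: use the conventional location at the site root.
    return new URL("/favicon.ico", pageUrl).href;
  }
  // Several versions may be declared; prefer the largest declared size.
  // Links without a parsable sizes attribute count as area 0.
  const area = (link) => {
    const match = /(\d+)x(\d+)/i.exec(link.sizes || "");
    return match ? Number(match[1]) * Number(match[2]) : 0;
  };
  const best = icons.reduce((a, b) => (area(b) > area(a) ? b : a));
  // Resolve relative hrefs against the page URL.
  return new URL(best.href, pageUrl).href;
}
```

Wrapped in an object such as `({ favicon: pickFaviconUrl(links, location.href) })`, the result would be JSON-serialisable, which matches the requirement mentioned for the "shot-scraper javascript" command.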
        
       | [deleted]
        
       | cancan wrote:
       | This is really cool and I love the idea of using SVGs to even add
       | annotations, as you mentioned in your tweets. I might be
       | "borrowing" that idea soon for some of our own work, and will try
       | to credit you!
        
       | 0des wrote:
       | Love the project. Do you ever worry about Microsoft tightening
       | the purse strings on these types of off-label uses for GitHub?
        
         | simonw wrote:
         | I do. I'm happy to pay for Actions minutes on private repos,
         | but I do worry that they'll change their policy with regard to
         | free minutes for public repos at some point.
         | 
         | I felt a lot better about my git scraping work
         | (https://simonwillison.net/2020/Oct/9/git-scraping/) after
         | GitHub released https://next.github.com/projects/flat-data/
         | which was inspired by that work, as it feels like it's now
         | acknowledged as an OK use of their platform.
         | 
         | I'm hoping people don't abuse shot-scraper too much in terms of
         | saving huge binary files to free repositories - that's why I
         | haven't yet included tips on running scheduled scrapers in the
         | shot-scraper-template documentation.
        
           | beardicus wrote:
           | I saw you were poking at some image diff tools recently as
           | well, and I'm sure you've thought about this already, but I'd
           | just like to explicitly state it: it'd be great if you could
           | scrape a screenshot periodically and only commit it if the
           | new one is significantly different.
        
             | simonw wrote:
             | Yeah that's the idea behind
             | https://github.com/simonw/image-diff but it's not quite fit
             | for purpose yet.
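
The "only commit it if the new one is significantly different" idea could look something like the sketch below. It is not how image-diff works (the thread does not say); `changedFraction`, `shouldCommit`, the per-channel `tolerance` and the 1% `threshold` are all hypothetical names and values chosen for illustration. It assumes both screenshots have already been decoded into flat RGBA byte arrays of the same dimensions; decoding the PNG files is not shown.

```javascript
// Hypothetical helper: fraction of pixels that differ between two decoded
// screenshots, each given as a flat RGBA byte array (4 bytes per pixel).
function changedFraction(before, after, tolerance = 16) {
  // Buffers of different length mean different dimensions: treat as fully changed.
  if (before.length !== after.length) return 1;
  const pixels = before.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let i = 0; i < before.length; i += 4) {
    // A pixel counts as changed if any of its R, G, B or A channels
    // moved by more than the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(before[i + c] - after[i + c]) > tolerance) {
        changed++;
        break;
      }
    }
  }
  return changed / pixels;
}

// Hypothetical policy: commit only when more than `threshold` of the pixels changed.
function shouldCommit(before, after, threshold = 0.01) {
  return changedFraction(before, after) > threshold;
}
```

The tolerance is there so that small rendering noise such as anti-aliasing differences does not count as a change; both numbers would need tuning against real screenshots.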
        
       ___________________________________________________________________
       (page generated 2022-03-14 23:00 UTC)