[HN Gopher] Show HN: Checksum.sh verify every install script
       ___________________________________________________________________
        
       Show HN: Checksum.sh verify every install script
        
       The pattern of downloading and executing installation scripts
       without verifying them has bothered me for a while.  I started
       messing around with a way to verify the checksum of scripts before
       I execute them. I've found it a really useful tool for installing
       things like Rust or Deno.  It's written entirely as a shell script,
       and it's easy to read and understand what's happening.  I hope it
       may be useful to someone else!
        
       Author : gavinuhma
       Score  : 65 points
       Date   : 2022-10-28 18:38 UTC (4 hours ago)
        
 (HTM) web link (checksum.sh)
 (TXT) w3m dump (checksum.sh)
        
       | dontbenebby wrote:
       | >The pattern of downloading and executing installation scripts
       | without verifying them has bothered me for a while.
       | 
       | Thanks for sharing this work OP! I didn't see a license mentioned
       | -- did you intend this to go into the public domain? I like how
       | you set up a cool domain name and did some sick graphics, but I'm
       | not sure how I can _legally_ use your code in the future.
       | 
       | That being said, I appreciate the work you put into this project.
       | 
       | I'm not going to list off specific examples, but MANY open source
       | projects serve either PGP keys or hashes in the clear. Or they
       | serve just hashes over HTTPS and now you have a trust issue.
       | 
       | Or, in one case, my favorite -- they had lovingly listed out the
       | MD5 sum for the program... but they served both that checksum,
       | and the code itself... over HTTPS.
       | 
       | Now, to be fair, HTTPS _does_ provide an integrity check, so
       | there's a benefit beyond privacy or whatever but... this is a
       | RAMPANT problem in the open source community.
       | 
       | I ran into it mostly when trying to find esoteric security tools
       | when I was attempting OSCP and interviewing around for
       | penetration testing roles.
       | 
       | I got the sense rapidly shifting from "I was so scared of the
       | CFAA I did an entire master's thesis on the design of censorship
       | circumvention tools" to "Oh gee, I used to be such a narcissist,
       | demanding a highfalutin salary when I couldn't even fire up
       | Metasploit to wipe a server."
       | 
       | (The implication being that some folks abused their access when
       | my powers were weak, and now, in time for spooky season, it's
       | time to lean in to letting people take whatever drug they want if
       | they feel scared -- reality scares me too some days.)
        
         | gavinuhma wrote:
         | Good catch. Let me add a license
        
           | dontbenebby wrote:
           | Thanks, it wasn't meant in a gotcha way.
        
             | gavinuhma wrote:
             | I totally just forgot to add one. Added MIT just now.
             | Appreciate it!
        
       | orf wrote:
       | I feel like bash/sh should have this built in
        
       | dundarious wrote:
       | There are two big problems with the use of `echo $s` in
       | bash/POSIX sh:
       | 
       | 1. Never use echo to output untrusted content as the first
       | argument
       | 
       | Let's say `s='-e 1\n2'`, then `echo $s` will output:
       | 
       | > 1
       | 
       | > 2
       | 
       | Instead of:
       | 
       | > -e 1\n2
       | 
       | Always use printf if you want to start output with untrusted
       | content, e.g., `printf %s\\\n "$s"`.
       | 
       | 2. Never use unquoted variable expansion when trying to exactly
       | reproduce contents of the variable
       | 
       | Similarly, unquoted variable expansion re-tokenizes the contents
       | and will not preserve spaces appropriately. Say
       | `s='"a<space><space>b"'` (where each <space> is a literal ' ', HN
       | seems to be collapsing 2 spaces down to 1), then `echo $s` will
       | output:
       | 
       | > "a<space>b"
       | 
       | Instead of:
       | 
       | > "a<space><space>b"
       | 
       | You can get the latter with `echo "$s"` but use `printf %s\\\n
       | "$s"` to fix both issues.
       | 
       | PS: If you fail to use quoted expansion with printf, for example
       | like so, `printf %s\\\n $s`, then you'll notice the problem right
       | away, as it will effectively turn that into `for i in $s ; do
       | printf %s\\\n "$i" ; done`. That's actually a very useful feature
       | of printf if you know to use it.
       | 
       | Edit: These problems exist for bash/POSIX sh at least. Perhaps
       | you're using a shell that works differently, like zsh, because
       | otherwise issue 2 would probably have led to some checksum fails
       | for you already.
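A minimal sketch of the two pitfalls described above, assuming a POSIX-style shell. The helper name `print_exact` and the sample values are made up for illustration; `echo` output is deliberately not shown because it differs between shells.

```shell
#!/bin/sh
# Illustrative values only: one looks like an echo option, one contains a double space.
opt='-e 1\n2'
spaced='"a  b"'

# print_exact (hypothetical helper): print the argument unchanged, plus a newline.
# %s never interprets backslashes or leading dashes in the data.
print_exact() {
  printf '%s\n' "$1"
}

print_exact "$opt"      # literal text: -e 1\n2
print_exact "$spaced"   # both spaces (and the quote characters) survive

# Unquoted expansion is split into words first, and printf reuses its
# format once per word, so this prints one word per line.
# shellcheck disable=SC2086
printf '%s\n' $spaced
```

Quoting the expansion and letting `printf` own the format string covers both issues at once, which is the combination the comment recommends.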
        
         | googlryas wrote:
         | Great post, you are wise in the ways of the shell. Minutiae
         | like this are exactly why I stop writing shell scripts the
         | moment I start, and reach for python or some other sane
         | language. But, I can't help but respect when I see masters of
         | sh work their magic.
        
           | dundarious wrote:
           | Honestly, 90% of problems with scripts are people forgetting
           | to put double quotes around stuff. The other stuff doesn't
           | come up that much, and once you write a few decent scripts,
           | the other stuff is as easy as noticing someone wrote `open =
           | True` in Python, not realizing they've redefined a builtin
           | function, and the fix is just to do `is_open = True`.
           | 
           | So just put double quotes around all your variable expansions
           | unless you know you shouldn't -- 90% of scripts would be
           | "fixed" with just that. And don't bother putting curly braces
           | into the variable expansion unless you know you need to.
           | People tend to think `echo ${s}` is somehow better than `echo
           | $s` when it's exactly the same -- the curly braces are just a
           | way to allow you to, e.g., write `"${s}_"` as distinct from
           | `"${s_}"`. AFAIK in fish `${s}` is identical to `"$s"`, but
           | that's a different kettle of sh.
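A short sketch of the one case where the curly braces matter, assuming a POSIX-style shell. The variable values are invented for illustration.

```shell
#!/bin/sh
# Two distinct variables: "s" and "s_" (illustrative values).
s='file'
s_='other'

printf '%s\n' "${s}_"   # variable s followed by a literal underscore
printf '%s\n' "${s_}"   # the separate variable s_
printf '%s\n' "$s_"     # same as the previous line: the name runs through the underscore
```

Without the braces the shell reads the longest valid name, so `"$s_"` never means "s, then an underscore".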
        
         | rnhmjoj wrote:
         | For more caveats like this one I recommend reading:
         | https://www.etalabs.net/sh_tricks.html
        
         | gavinuhma wrote:
         | This is awesome. Thank you! I've been through so many
         | iterations but it's been fun to improve
        
         | gavinuhma wrote:
         | Like this? https://github.com/gavinuhma/checksum.sh/pull/2
        
           | dundarious wrote:
           | Missed the other `echo $s` piped into shasum. But I echo the
           | sentiment of another commenter that I'd rather rely on
           | `shasum --check` to give the OK or not.
        
             | gavinuhma wrote:
             | Got it. Thanks.
             | 
             | Re --check, I suppose the way to do that would be to
             | download the file to disk, which --check requires as far
             | as I can tell. So I could download the file to disk,
             | --check, and then remove it. I think most of these install
             | scripts are trying not to leave any artifacts around from
             | install, other than the resulting binary.
        
               | dundarious wrote:
               | You only need to create a temp file for the checksum
               | file, not the downloaded contents. In the below example,
               | no file exists on disk with the contents of `$s`.
               | 
               | > $ s='1<space><space>2'
               | 
               | > $ printf %s\\\n "$s" | shasum -a 256 > tmp.sum
               | 
               | > $ printf %s\\\n "$s" | shasum --check tmp.sum
               | 
               | > -: OK
               | 
               | So you can just `printf '%s<space><space>-\n' "$c" >
               | tmp.sum` and check with `printf %s\\\n "$s" | shasum
               | --check --status tmp.sum || { echo "checksum failed" >&2
               | ; exit 1 ; }`
               | 
               | Having to create temp files is a wrinkle (could probably
               | avoid it by using process substitution if you want to
               | give up on POSIX sh), but so is writing bash scripts in
               | general.
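The temp-file approach above can be sketched as a small function, assuming either `sha256sum` or `shasum` is installed and that `mktemp` is available (it is common but not part of POSIX). The names `sha256_tool` and `verify_stdin` are hypothetical, and the short `-c` flag with redirected output is used here in place of `--check --status`.

```shell
#!/bin/sh
# Use whichever SHA-256 tool is present; both accept -c to check a checksum file.
if command -v sha256sum >/dev/null 2>&1; then
  sha256_tool() { sha256sum "$@"; }
else
  sha256_tool() { shasum -a 256 "$@"; }
fi

# verify_stdin EXPECTED_HEX (hypothetical helper):
# succeeds only if the data on standard input hashes to EXPECTED_HEX.
verify_stdin() {
  sumfile=$(mktemp) || return 1
  # Checksum-file line: hash, two spaces, file name; "-" means standard input.
  printf '%s  -\n' "$1" > "$sumfile"
  sha256_tool -c "$sumfile" >/dev/null 2>&1
  rc=$?
  rm -f "$sumfile"
  return "$rc"
}

# Usage: only the one-line checksum file touches the disk, not the content.
s='1  2'
c=$(printf '%s\n' "$s" | sha256_tool | cut -d ' ' -f 1)
if printf '%s\n' "$s" | verify_stdin "$c"; then
  echo "checksum ok"
else
  echo "checksum failed" >&2
fi
```

The tool itself does the comparison, so the script never has to parse or compare hash strings on its own.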
        
               | gavinuhma wrote:
               | Solid! I couldn't figure this out, which is why I stopped
               | using "--check". I'll take a look
        
         | yjftsjthsd-h wrote:
         | If I may pile on with a general suggestion for people writing
         | shell scripts: Use shellcheck. Always. It will catch these
         | things automatically for you:)
        
       | throwawaaarrgh wrote:
       | If we kept a mirrored or distributed decentralized network of
       | just cryptographic hashes, that might solve a huge number of
       | problems around distributing files securely.
        
       | ithkuil wrote:
       | Awesome. I made something similar in
       | https://github.com/mkmik/runck
       | 
       | But I didn't buy a fancy domain name :-)
        
         | gavinuhma wrote:
         | Haha thanks! Honestly when I saw the domain was available it
         | motivated me to finish the project and share it
        
       | thewataccount wrote:
       | Serious question - What is the benefit of verifying a hash? Are
       | we really worried about file integrity? Why don't people use GPG?
       | 
       | The hash only verifies file integrity, and that the content of
       | the url doesn't switch the script later. But keep in mind in most
       | scenarios, an attacker would also just change the hash listed
       | too (they're usually on the same website). This only mitigates
       | one very specific attack.
       | 
       | Why don't we use GPG here? That way we can verify ownership and
       | file integrity with at minimum TOFU, plus optional manual
       | verification? If we're going through the work of adding a wrapper
       | and all that, we may as well, no?
       | 
       | This has the benefit that you only need to import the owner's
       | cert once, all future changes have the same cert. Where hashes
       | are obviously different every time, you have to trust the source
       | of the hash every time it changes. With GPG at the very least you
       | have TOFU with certs - and very best can have better assurance of
       | the initial download too.
       | 
       | EDIT: Just want to clarify - I'm openly asking why the "developer
       | community" is going the direction of hashes for script
       | verification vs GPG signatures.
       | 
       | I don't mean to diminish your project, your project looks fun,
       | and does make verifying hashes easier :)
        
         | tomrod wrote:
         | I'm not terribly deep in this space. What is the conceptual
         | difference of hash vs GPG sig?
        
           | atoav wrote:
           | A hash is the same when the values of the content are the
           | same. But when you get a new (maliciously hacked) install
           | script chances are that you won't have an old hash lying
           | around to check whether the script changed. Any attacker who
           | could swap the script could also swap the hash, unless it is
           | published on a different channel.
           | 
           | With GPG the developer has a key pair (one private, one
           | public). They can then sign all their scripts with their
           | private key and publish the public one wherever. You can then
           | take that public key and verify that the script has
           | indeed been signed with the developer's private key.
        
           | thewataccount wrote:
           | Admittedly, the complexity is likely the main reason GPG isn't
           | more commonplace.
           | 
           | This is the overview:
           | 
           | Developer generates a private/public key they use for all of
           | their projects.
           | 
           | You import their public key once - you can verify this from
           | their github, twitter, etc but that's optional.
           | 
           | They can sign a file with their key. You can check this
           | signature against their public key. This will guarantee the
           | file was signed by using that key and is unmodified.
           | 
           | If someone hijacks the website after this point and signs the
           | new downloads with their own key - then you will be able to
           | see it's invalid.
           | 
           | If you manually verify the key then you'll know your initial
           | download is valid - if you trust on first use then you at
           | least know all future files signed from that developer with
           | that cert are valid.
           | 
           | They also are effectively a hash for file integrity.
           | 
           | tl;dr - hashes tell you if a file is changed. Signatures tell
           | you if the file is changed, and who the person that made the
           | file is.
        
           | Jarwain wrote:
           | Hash essentially proves that the file you downloaded is the
           | same as the file that was uploaded. It tells you nothing
           | about Who uploaded the file. An attacker could make you
           | download their own file, but then the hash of the file won't
           | match what's published (unless the attacker changes the
           | published hash).
           | 
           | A GPG sig proves that the file was signed & uploaded by the
           | author, which de facto doubles as proof that it's the same
           | file. The idea here is that the author uploads their public
           | key, signs the package with their private key, and now
           | there's an association between the package and the author. An
           | attacker would have to obtain the author's private key, or
           | replace the public key with their own. Changing the public
           | key, however, is a big red flag.
        
         | pvg wrote:
         | Because for all of its problems, Web PKI is a working,
         | practical, large scale system of verification and GPG isn't -
         | you don't get much by trying to replicate what your web browser
         | and CAs do for you but clunkier.
        
         | XCSme wrote:
         | > would also just change the hash listed too
         | 
         | In my project I "host" the hash on a different medium, so in
         | order to compromise the file download the attacker would have
         | to compromise both the file hosting server and the hash hosting
         | medium (which in my case is GitHub).
         | 
         | I also don't really display the hashes, as the download only
         | happens when the script is updated, so your current version of
         | the script will check the hash on GitHub vs the hash of the
         | file download from the file hosting server.
         | 
         | EDIT: To be clear, this doesn't solve the problem with the
         | initial install and it is also not related to the Checksum.sh
         | script.
        
           | thewataccount wrote:
           | Interesting idea,
           | 
           | Does the script get the new version url&expected hash from
           | the website alone? Or does it get the expected hash from the
           | website, then calculate the URL from github?
           | 
           | Basically I'm wondering if that prevents just needing to
           | attack the website - if the url to download the update and
           | the expected hash are in the same place then it's still a
           | single point of failure.
        
             | XCSme wrote:
             | The latest file download URL is always the same /latest,
             | hosted on my server.
             | 
             | The version number and latest file hash are also fixed
             | URLs, stored on GitHub.
             | 
             | So for an update, the script checks GitHub for latest
             | version number, if newer it downloads the latest version
             | from my server, computes the hash and compares it to the
             | hash stored on the fixed GitHub URL before proceeding.
             | 
             | I think there's no way to replace the file with a malicious
             | one that will be distributed to the users unless you get
             | access to both my server and the GitHub repository.
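The comparison step of that update flow might look roughly like this. It is a sketch under assumptions: the thread does not say which hash function or tools the project uses, so SHA-256 is assumed, `hash_matches` and `sha256_tool` are hypothetical names, and local paths stand in for the two downloads.

```shell
#!/bin/sh
# Use whichever SHA-256 tool is present (SHA-256 itself is an assumption).
if command -v sha256sum >/dev/null 2>&1; then
  sha256_tool() { sha256sum "$@"; }
else
  sha256_tool() { shasum -a 256 "$@"; }
fi

# hash_matches FILE PUBLISHED_HASH_FILE (hypothetical helper):
# FILE is the update fetched from one host; PUBLISHED_HASH_FILE holds the
# expected hash fetched from a different host.
hash_matches() {
  actual=$(sha256_tool "$1" 2>/dev/null | cut -d ' ' -f 1)
  expected=''
  # Read the first field of the published file; tolerate a missing final newline.
  [ -r "$2" ] && { read -r expected _ < "$2" || :; }
  # An empty expected hash never matches, so a failed hash download fails closed.
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
}
```

In real use the two arguments would come from separate downloads, so an attacker who controls only one of the hosts cannot make the check pass.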
        
               | thewataccount wrote:
               | Yeah I think that should work.
               | 
               | It does have the downside still that changes to the
               | website/github might break future updates in a way that
               | isn't (easily) verifiable.
               | 
               | While this is a solution personally I still like the idea
               | of GPG more since it'll work for any new files, works for
               | your new projects automagically, etc.
               | 
               | But I think you did at least fix the future update
               | problem with auto-updates, which is a lot more work than
               | most people put into it so thank you for addressing the
               | issue!
        
       | koolba wrote:
       | Just remember that any script that fetches anything else remotely
       | would still pass the checksum as only the initial script is
       | checked.
        
         | ChadNauseam wrote:
         | Yep. As an example, rustup happens to be in this category as
         | the checksums for rustc, cargo, etc. aren't checked.
        
           | gavinuhma wrote:
           | It's really interesting. There should be a massive ledger of
           | checksums for software
        
             | jandrese wrote:
             | It's called apt. Or dnf. Or most any package manager.
             | Having a gigantic general list runs into the problem of how
             | do you update it and how do you verify the updates?
        
               | yjftsjthsd-h wrote:
               | You use GPG and trust the people publishing things, who
               | sign the artifact that you actually download. Which is
               | how every package manager I've seen works internally,
               | anyways.
        
         | jandrese wrote:
         | It's the age old root of trust problem. In practice the good
         | enough answer is that if it passes SSL/TLS authentication on the
         | official domain then we wouldn't be able to stop an injection
         | attack either way. Validating against the source is no good if
         | it is the source that is compromised.
         | 
         | That's also kind of the issue with a lot of these shell
         | injection attacks. Sure someone could insert environment
         | variables or other shenanigans to take over your machine, but
         | if they have that much control over your shell there are
         | countless other ways they could also do it. Guarding against
         | this one particular case doesn't buy you much.
        
         | gavinuhma wrote:
         | Definitely. Important to note. There is a long long supply
         | chain
        
       | neeh0 wrote:
       | I wrote hundreds of those checks in scripts, makefiles, CI and
       | whatever else. After I found Nix (and NixOS) it's ridiculous not
       | to use it. Use it.
        
         | gavinuhma wrote:
         | I hadn't heard of NixOS. Super cool
        
       | NovemberWhiskey wrote:
       | I don't know; what's the threat model here?
       | 
       | If the script is deliberately malicious as originally published,
       | then the publisher will provide a valid checksum; so it doesn't
       | help.
       | 
       | If the script source is subverted by an attacker, then it only
       | helps if the attacker doesn't also have the means to change the
       | published checksum too.
       | 
       | If an attacker can modify the site which publishes the URL for
       | the script and the checksum, they can modify both at the same
       | time.
        
       | nerdponx wrote:
       | Why not use the -c option? Especially if you're using Bash or Zsh
       | which has "here-strings":
       | 
       |     checksum() {
       |       hash="$1"
       |       file="$2"
       |       sha256sum -c <<< "${hash}  ${file}"
       |     }
       | 
       | Or if you need to use a POSIX-ish shell:
       | 
       |     checksum() {
       |       hash="$1"
       |       file="$2"
       |       printf '%s  %s' "$hash" "$file" | sha256sum -c
       |     }
       | 
       | Of course you can add a `--binary` option (uses '%s *%s' instead
       | of '%s  %s'), options to use different hash functions, etc.
       | 
       | I also think it's weird to use `alias` inside a function, instead
       | of just using a parameter to store the name of the program to
       | execute.
        
         | gavinuhma wrote:
         | Great point on alias, thanks. I think that was a relic of an
         | older iteration.
         | 
         | I'll work through these suggestions. Appreciate it. Feel free
         | to send a PR if you want.
         | 
         | For the here string I think that won't work because the file
         | isn't being saved locally, it's just being piped (so $2 is a
         | URL). I can't do the usual `shasum -c <<<
         | "132e320edb0027470bfd836af8dadf174e4fee00  install.sh"`, which
         | takes a local filename but not the file content. As far as I
         | could tell anyway. I'll try it some more
        
       ___________________________________________________________________
       (page generated 2022-10-28 23:01 UTC)