[HN Gopher] How CDNs Generate Certificates
       ___________________________________________________________________
        
       How CDNs Generate Certificates
        
       Author : ordiblah
       Score  : 77 points
       Date   : 2020-06-25 20:04 UTC (2 hours ago)
        
 (HTM) web link (fly.io)
 (TXT) w3m dump (fly.io)
        
       | ancarda wrote:
       | Is anyone else feeling quite sad reading this article? ALPN being
       | used because only 80/443 are realistic these days, middleboxes
       | causing the TLS handshake to have padding so it's not
       | misinterpreted with an ancient protocol (SSLv2).
       | 
       | It feels like the Internet is so fragile.
        
         | profmonocle wrote:
         | ALPN would make sense for something like HTTP2 even if you
         | didn't have the problem of ports being blocked. If HTTP2 had
         | its own port clients would have to make multiple TCP connection
         | attempts for each host they connect to.
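
A rough sketch of the negotiation described above, assuming the common server-side rule of "first server-preferred protocol that the client also offered". The function name select_alpn is hypothetical and exists only for illustration; real TLS stacks do this inside the handshake.

```python
from typing import Optional, Sequence


def select_alpn(client_offered: Sequence[str],
                server_preferred: Sequence[str]) -> Optional[str]:
    """Return the first server-preferred protocol the client also offered."""
    offered = set(client_offered)
    for proto in server_preferred:
        if proto in offered:
            return proto
    # No overlap: a real server would either continue without ALPN or abort.
    return None


# A client offering HTTP/2 and HTTP/1.1 reaches an HTTP/2-capable server.
print(select_alpn(["h2", "http/1.1"], ["h2", "http/1.1"]))
# An older server that only lists HTTP/1.1 still works on the same port.
print(select_alpn(["h2", "http/1.1"], ["http/1.1"]))
```

Both cases use a single TCP connection to port 443, which is the point being made: no second connection attempt is needed to discover HTTP/2 support. In Python's standard library the client side of this is configured with ssl.SSLContext.set_alpn_protocols.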
        
         | mrkurt wrote:
         | I have the opposite feeling: the clever "hacks" people use to
         | build very useful stuff that bypasses most problems with legacy
         | infrastructure are pretty exciting. It's very much like
         | watching a complex organism evolve into something you never
         | really could have imagined 8000 iterations ago.
        
         | SahAssar wrote:
         | Most of this could have been avoided by using DoH and SRV
         | records for HTTP/HTTPS. I still don't understand why SRV
         | records are not supported for HTTP/HTTPS in browsers.
        
           | ancarda wrote:
           | I remember looking into why A/AAAA is still used over SRV,
           | and it would seem performance is one of the big concerns;
           | browsers do not want to make more DNS lookups than necessary.
           | 
           | I think they'd end up with 4 lookups; A, AAAA, SRV
           | (_http2._tls), and SRV (_http._tls).
           | 
           | Though perhaps you are suggesting DoH could mean the resolver
           | also returns SRV records if you request A or AAAA? i.e.
           | proactively point out there's an HTTP server?
        
             | SahAssar wrote:
             | IIRC Chrome is now racing QUIC and HTTP(2) connections
             | instead of doing negotiation/upgrade to detect QUIC
             | support. So this argument (if true) has fallen apart just a
             | few years later.
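
To illustrate what "racing" connections means, here is a minimal sketch that starts two attempts at once and keeps whichever finishes first. The connection attempts are simulated with sleeps; fake_connect, race, and the delays are hypothetical and say nothing about how Chrome actually implements this.

```python
import asyncio
from typing import Dict


async def fake_connect(name: str, delay: float) -> str:
    """Stand-in for a real connection attempt; only sleeps, then reports."""
    await asyncio.sleep(delay)
    return name


async def race(attempts: Dict[str, float]) -> str:
    """Start all attempts concurrently and return the first one to finish."""
    tasks = [asyncio.create_task(fake_connect(name, delay))
             for name, delay in attempts.items()]
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    # Cancel the losers and wait for the cancellations to be processed.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


print(asyncio.run(race({"quic": 0.01, "h2": 0.05})))
```

The cost of this approach is an extra connection attempt per host, which is the same kind of overhead that was cited against extra DNS lookups.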
        
             | jiggawatts wrote:
             | This debate comes up a lot, and it's hilarious how
             | misguided it was.
             | 
             | I regularly work with load-balancers such as Citrix ADC
             | (NetScaler) or F5 BIG-IP. These do DNS-based load balancing,
             | dynamically returning "A" records to the browsers so that
             | they can get the "single working IP address" they're
             | expecting. The browsers don't try very
             | hard to fail over to secondary IPs because this is the
             | established standard architecture, but they don't need to
             | because of this common setup.
             | 
             | Sounds like an optimal solution, right? It does at first
             | glance anyway, as long as you ignore the eye-watering price
             | tag on those load balancer boxes.
             | 
             | The subtle but critical issue is that by returning "A"
             | records, the load balancers have to use a short time to
             | live (TTL)! This is because there's a trade-off: You can
             | have fast failover, OR long-lived DNS caching. _With A
             | records you can't have both!_
             | 
             | Typical response TTL times are 5-30 seconds, 5 minutes tops
             | if you hate your users. This means that many browsers will
             | be forced to repeatedly re-query the DNS servers on _every
             | page load_ for typical end-user workflows. It also means
             | that for all but the biggest, most popular sites, the ISP
             | DNS cache does practically nothing for these records.
             | 
             | Meanwhile with SRV records the TTL times can be much
             | higher, hours even. This is how Active Directory works, for
             | example, all of the Domain Controllers add themselves to
             | various SRV records so that if you query
             | "_ldap._tcp.dc._msdcs.test.com" you get back all the DCs.
             | These records include priorities and weightings, so you can
             | pull tricks like incrementally demote a DC or prioritise
             | the shiny new one.
             | 
             | If you watch the AD connection traffic in Wireshark, it's
             | incredible. It very quickly steps through alternate
             | services and then reorders the successful hits in front of
             | the failures so that subsequent queries are lightning fast.
             | It is astonishingly tolerant of partial networking
             | failures, yet still fast to connect despite that!
             | 
             | The key mistake made by the original DNS design working
             | groups was having SRV records return a list of host names;
             | they should have returned a list of IP addresses instead.
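
The priority and weight fields mentioned above can be sketched as follows. This is a simplified take on the ordering described in RFC 2782 (lowest priority value first, weighted random choice within a priority); it is not Active Directory's actual client logic, and the host names in the example are made up.

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower values are tried first
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def order_srv(records: List[SrvRecord],
              rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return records in the order a client would try them."""
    rng = rng or random.Random()
    ordered: List[SrvRecord] = []
    by_priority = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        pool = list(group)
        while pool:
            weights = [r.weight for r in pool]
            if sum(weights) == 0:
                # All weights zero: fall back to a uniform pick.
                pick = rng.choice(pool)
            else:
                pick = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            ordered.append(pick)
    return ordered


# Hypothetical records: two primary servers and one being phased out.
records = [
    SrvRecord(priority=0, weight=70, port=389, target="dc-new.example.com"),
    SrvRecord(priority=0, weight=30, port=389, target="dc-old.example.com"),
    SrvRecord(priority=10, weight=0, port=389, target="dc-retiring.example.com"),
]
for record in order_srv(records):
    print(record.priority, record.weight, record.target)
```

Demoting a server is then a matter of raising its priority value or lowering its weight, and because clients hold the full list they can fail over without waiting for a short TTL to expire.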
        
           | mrkurt wrote:
           | We actually run into problems that are similar to what you'd
           | have with SRV records.
           | 
           | Fly.io apps can define different service "handlers" (like TLS
           | and HTTP). If you want to, you can accept TCP connections and
           | bypass our logic. Which is great and flexible.
           | 
           | The problem is, when someone is deploying a new version of
           | their app where they _change_ one of those things, we have to
           | be really careful about how we (a) load balance and (b) decide
           | to do things like TLS. If we're not careful we can end up
           | sending the wrong type of connection to a new VM that's
           | expecting something else.
           | 
           | SRV records sound like they'd have it worse. If you do a DNS
           | lookup to detect something like http2, the IP you connect to
           | _can't_ do anything else. It's much simpler / safer to
           | negotiate stuff like that at connection time.
        
             | SahAssar wrote:
             | That all assumes that you use the same port for all those
             | things, right? Which is one of the points of SRV, to not
             | have to squeeze everything into one canonical port. With a
             | SRV record you could route https to whatever port suited
             | you and rotate that out when rolling out changes.
        
       | lomkju wrote:
       | Can you tell me why I should choose Fly instead of AWS?
       | 
       | micro-2x shared 512MB $0.000003044 $8
       | VS
       | t3a.nano 2 Variable 0.5 GiB EBS Only $0.0031 per Hour
       | 
       | Am I missing something? Because seeing the pricing, I still feel
       | AWS is cheaper.
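
For a like-for-like comparison, both quoted prices can be converted to a monthly figure. This sketch assumes the Fly number is a per-second rate (the comment does not state the unit, though it lines up with the quoted $8) and a 30-day month; both are assumptions, not published pricing terms.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month

fly_per_second = 0.000003044  # from the comment; per-second unit is assumed
aws_per_hour = 0.0031         # t3a.nano figure as quoted in the comment

fly_monthly = fly_per_second * SECONDS_PER_HOUR * HOURS_PER_MONTH
aws_monthly = aws_per_hour * HOURS_PER_MONTH

print(f"Fly micro-2x: ${fly_monthly:.2f}/month")
print(f"AWS t3a.nano: ${aws_monthly:.2f}/month")
```

Under those assumptions the Fly figure comes to roughly $7.89 a month against roughly $2.23 for the quoted AWS rate, which matches the commenter's impression that the raw instance price is lower on AWS.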
        
         | mrkurt wrote:
         | It's probably better to compare Fly with Lambda or Fargate.
         | It's not really meant to be cheaper than AWS, though, the real
         | value is being able to run app servers all over the world
         | without spending time maintaining servers or wrangling AWS.
        
           | lomkju wrote:
           | Makes sense. Compared with AWS Lambda pricing, fly.io is
           | way cheaper. Will give it a try :)
        
       | mholt wrote:
       | Anyone looking to automate certificate management at any sort of
       | scale should read this: https://docs.https.dev/acme-ops
       | 
       | ... and use Caddy to do the heavy lifting. (I'm biased, yes. But
       | the linked doc is multi-authored and applies to every sysadmin or
       | developer who needs to manage certs, regardless of your software
       | choice.)
        
       | awinter-py wrote:
       | woo, hadn't heard about firecracker
        
         | mrkurt wrote:
         | It's seriously the bomb.
        
         | tptacek wrote:
         | Firecracker is f'ing awesome. I have a lot of notes to write up
         | about it. I know this isn't how products actually succeed in
         | the real world, but I'll be honest and say that Kurt had me at
         | Fly with "WireGuard and Firecracker".
         | 
         | (For the unfamiliar reader: Firecracker is a micro-vm system
         | that sits sort of in between a fully virtualized host, like an
         | EC2 instance, and a container like Docker; you get the security
         | isolation of a hypervisor but the speed/simplicity of Docker.
         | It's the engine that powers AWS Lambda and Fargate. The Usenix
         | paper is a pretty great read, and the code [it's all in Rust]
         | is simple and easy to follow.)
         | 
         | https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
        
           | AlphaSite wrote:
           | It's fairly similar in concept to:
           | https://vmware.github.io/vic/ for vSphere
           | 
           | Disclaimer: interned with the team
        
             | tptacek wrote:
             | Say more, if you can! I'm not at all familiar with that
             | project. Thanks!
        
       | tialaramex wrote:
       | It would be interesting to see stats from the CAs about which of
       | the Blessed Methods is most popular. (This article is about Let's
       | Encrypt using tls-alpn-01 which is an implementation of
       | 3.2.2.4.10 "TLS Using a Random Number"). Doubtless Fly aren't the
       | only people doing tls-alpn-01 in bulk but we don't have a good
       | overview as far as I'm aware.
       | 
       | In principle they can all generate those statistics because they
       | (are supposed to) log enough information to identify what went
       | wrong when, inevitably, something is misissued. Logically that
       | also includes at least which method was used to verify domain
       | authorization or control.
       | 
       | One of the things wrong at Symantec is that it turns out some of
       | the records were notionally kept at CrossCert, a separate Korean
       | company. CrossCert simply did not keep any records (or if it did
       | they were in such disarray that it seemed less likely to attract
       | retribution by refusing to disclose them) and Symantec had
       | seemingly never checked.
       | 
       | Knowing which methods are popular with Subscribers, and whether
       | that varies considerably between CAs would be valuable in trying
       | to figure out how more of the worst Blessed Methods can be
       | deprecated or improved, and who we need to be talking to about
       | that.
       | 
       | For example, if Let's Encrypt is doing almost all of the
       | 3.2.2.4.19 ("Agreed Upon Change to Website - ACME") validations,
       | then there's no point ragging on other CAs for the shortcomings
       | of relying on plaintext HTTP in this method. Or maybe DigiCert
       | are doing a lot of 3.2.2.4.15 ("Phone Contact with Domain
       | Contact"), so they are the people to talk through any proposed
       | improvements around stuff like leaving a voicemail.
        
       | tptacek wrote:
       | Part of the last few weeks involved me learning Rust and using it
       | in anger (if hooking nfqueue up to tokio counts as "in anger") so
       | if you'd like to irritate the hell out of 'pcwalton, feel free to
       | ask me Rust questions.
        
         | dchest wrote:
         | Can you really read Rust code after learning it or does it
         | still look like a bunch of squiggles?
        
         | NetOpWibby wrote:
         | > Obviously, to do stuff like this, you need to generate
         | certificates. The reasonable way to do that in 2020 is with
         | LetsEncrypt. We do that for our users automatically, but "it
         | just works" makes for a pretty boring writeup, so let's see how
         | complicated and meandering I can make this.
         | 
         | This delighted me.
        
         | dochtman wrote:
         | Exciting! Are you doing this in your role as Latacora helping
         | out startups with security challenges? (Update: apparently not
         | https://twitter.com/tqbf/status/1276212163582070785)
         | 
         | How is the Fly proxy implemented? Are you using rustls and/or
         | any of the available ACME crates?
         | 
         | I've been wanting to implement tls-alpn-01 support for rustls
         | (although it might be possible to do this just by mutating the
         | ServerConfig over time).
         | 
         | Also interested to hear your general impressions of Rust so far
         | (I think I read some Twitter grumbling...).
        
           | tptacek wrote:
           | I'm full-time at Fly. I'll let Jerome answer the fly-proxy
           | question, since it's his code and I wouldn't want to
           | inadvertently take credit.
           | 
           | I think I came across as grumbling about Rust when my real
           | perspective was much more subtle. My take on Rust so far is
           | that it has been, for me, a vindication of a lot of decisions
           | the Go team made, because I've been directly exposed to some
           | of the downsides of the opposite decisions. But, while that
           | sounds like a critique of Rust, it's not! Rust is the way it
           | is for real reasons: zero-cost abstractions and no runtime
           | GC, which are, right now, requirements for some application
           | domains.
           | 
           | For me, right now, writing in Rust feels almost identical to
           | how writing in C++ felt 15 years ago. But I'll keep writing
           | in it, and it'll get faster for me. We're a Rust-on-the-data-
           | plane shop!
        
             | JoshTriplett wrote:
             | If you run into issues in Rust that you believe might be
             | signs of a need for language improvements, please feel free
             | to raise them. I'm happy to help.
        
           | dochtman wrote:
           | What I perceived as grumbling:
           | 
           | "I absolutely understand what y'all like so much about Rust,
           | but I have to say that as an auditor, my blood pressure drops
           | and my shoulders relax the moment I switch from reading a
           | Rust project to reading a Go project."
           | 
           | https://twitter.com/tqbf/status/1260678152084480008
        
             | tptacek wrote:
             | As long as we're clear that I'm not saying "Rust is less
             | secure than Go", which is not _at all_ what I meant. I just
             | meant that it's much easier for me to read Go code.
             | 
             | (I will however miss match expressions when I return to my
             | home planet.)
        
               | dochtman wrote:
               | I'd be very curious to hear if there are specific bits
               | about the Rust language that you think make it harder to
               | audit or that (so far) it's just the lack of experience.
        
           | jeromegn wrote:
           | Hey there, Fly co-founder here!
           | 
           | Fly's proxy uses a mix of tokio, hyper and rustls. We don't
           | need to use a crate that handles ACME because we're
           | processing all the validation and certificate authorizations
           | from a centralized, boring, Rails application.
           | 
           | We had to submit a PR to the rustls project a few months
           | ago to handle different ALPNs. Instead of resolving a
           | certificate only from an SNI, the crate now provides the full
           | ClientHello which contains negotiable ALPNs. With that
           | information you can respond to the tls-alpn-01 challenge.
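
The resolution step described above might look roughly like this. It is a language-neutral sketch in Python, not the rustls API or Fly's proxy code: ClientHello, CertResolver, and the string stand-ins for certificates are all hypothetical. The "acme-tls/1" protocol ID and the SHA-256 digest of the key authorization come from the tls-alpn-01 specification (RFC 8737).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

ACME_TLS_ALPN = "acme-tls/1"  # ALPN protocol ID used by tls-alpn-01


@dataclass
class ClientHello:
    server_name: Optional[str]   # SNI value, if the client sent one
    alpn_protocols: List[str]    # protocols the client offered


class CertResolver:
    """Picks a certificate from the whole ClientHello, not just the SNI."""

    def __init__(self) -> None:
        # Strings stand in for real certificate objects in this sketch.
        self.regular: Dict[str, str] = {}
        self.challenge: Dict[str, str] = {}

    def resolve(self, hello: ClientHello) -> Optional[str]:
        if hello.server_name is None:
            return None
        # Validation connections offer exactly the ACME protocol ID.
        if hello.alpn_protocols == [ACME_TLS_ALPN]:
            return self.challenge.get(hello.server_name)
        return self.regular.get(hello.server_name)


def acme_identifier_digest(key_authorization: str) -> bytes:
    """Digest that the challenge certificate's acmeIdentifier must carry."""
    return hashlib.sha256(key_authorization.encode("ascii")).digest()


resolver = CertResolver()
resolver.regular["example.com"] = "regular-cert"
resolver.challenge["example.com"] = "challenge-cert"
print(resolver.resolve(ClientHello("example.com", ["h2", "http/1.1"])))
print(resolver.resolve(ClientHello("example.com", [ACME_TLS_ALPN])))
```

With only the SNI available, both connections above would be indistinguishable, which is why access to the offered ALPN list matters for answering the challenge on the same port as normal traffic.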
        
       ___________________________________________________________________
       (page generated 2020-06-25 23:00 UTC)