[HN Gopher] Revoking certain certificates on March 4
       ___________________________________________________________________
        
       Revoking certain certificates on March 4
        
       Author : teddyh
       Score  : 311 points
       Date   : 2020-03-03 10:20 UTC (12 hours ago)
        
 (HTM) web link (community.letsencrypt.org)
 (TXT) w3m dump (community.letsencrypt.org)
        
       | shakna wrote:
       | Thankfully it's fairly painless to see if you're affected:
       | curl -XPOST -d 'fqdn=example.com'
       | https://unboundtest.com/caaproblem/checkhost
       | 
       | Replace example.com with your Fully Qualified Domain.
        
         | MrStonedOne wrote:
          | That can only test port 443, and not, say, SSTP on port 8384,
          | SOAP/HTTPS on port 3443, or other random ports used by various
          | internal HTTPS-layered applications.
        
           | [deleted]
        
           | rswail wrote:
           | It can only test hosts that it has access to. Otherwise you
           | can download the file and check your serial number against
           | the list.
        
             | zimpenfish wrote:
              | It can't test hosts that aren't on 443 though, regardless
              | of access to the relevant port, e.g. (obvs. elided mydomain
              | here.)
              | 
              |     $ curl -XPOST -d 'fqdn=pop3.mydomain:995' https://unboundtest.com/caaproblem/checkhost
              |     invalid name pop3.mydomain:995
              |     404 page not found
              |     $ openssl s_client -connect pop3.mydomain:995 -showcerts </dev/null 2>/dev/null | openssl x509 -text -noout | grep -A 1 Serial\ Number | tr -d :
              |     Serial Number
              |     0325f31485b9c0f393e27b00e4678e881e3c
        
         | Ayesh wrote:
         | All my recent certificates are affected. I think it is the same
         | case for the majority of us.
        
           | tialaramex wrote:
            | The most likely cause will be that you issued similar
            | certificates a few days (or at least more than eight hours)
            | earlier with the same Let's Encrypt account.
           | 
           | Suppose on Wednesday you get a cert for example.com and
           | www.example.com, and then on Thursday you realise you also
           | need images.example.com - you use the same ACME account (if
           | you run Certbot this will happen by default if you use the
           | same machine and user account, it silently makes you a free
           | account if you don't have one already) and so Let's Encrypt
           | can see that this account showed proof-of-control for two of
           | these names on Wednesday, so only fresh proof-of-control of
           | images.example.com is needed. Unfortunately this bug means
           | Let's Encrypt forgot to re-check CAA for the old names, and
           | so there's a risk they technically were no longer authorised
           | to issue for these names and shouldn't have given you the
           | Thursday certificate.
           | 
           | Rather than try to argue about whether it's appropriate to
           | disregard this check, Let's Encrypt decided to revoke all ~3
           | million affected certificates. That's maybe 2-3 days worth of
           | normal issuance in the last 90 days, so lots but hardly "the
           | majority".
        
           | [deleted]
        
           | vesinisa wrote:
           | I use DNS validation to obtain one certificate with both
           | domain.tld and *.domain.tld wildcard. My certificates seem
           | not to be affected. My certs last auto-renewed on 2020-02-20.
        
       | owenmarshall wrote:
       | > We confirmed the bug at 2020-02-29 03:08 UTC, and halted
       | issuance at 03:10. We deployed a fix at 05:22 UTC and then re-
       | enabled issuance.
       | 
       | Wow. That is a truly impressive way to handle a security bug and
       | I know it's not the first time Let's Encrypt has responded
       | extremely quickly.
       | 
       | I would love to hear how their engineering practices make this
       | possible.
        
         | d33 wrote:
          | And I'm curious about the work culture part. They're a small
          | organisation in terms of workforce, but somehow managed to
          | respond within minutes on a Saturday. How does that work? Do
          | they have shifts, or are the workers somehow so devoid of a
          | private life that they respond in the early morning on a
          | weekend?
        
           | jaas wrote:
           | Head of Let's Encrypt here.
           | 
           | We have an on-call rotation and a system for getting others
           | notified and online quickly when necessary. We make sure not
           | to bring too many people online so that some people are fresh
           | and can rotate in later if the incident lasts longer.
           | 
           | It's not often that staff have to put in time at night or on
           | weekends, and when it happens we work hard to make sure the
           | problem doesn't happen again.
        
           | dboreham wrote:
           | Out of hours response to a critical problem is standard. You
           | can achieve it in various ways but they all boil down to
           | people who know what they're doing having a professional
           | ethic. Typically it isn't possible to have shifts of
           | engineers with deep understanding of the code on call so
            | ultimately you need to wake someone up. So remember to keep
            | an up-to-date note of key staff members' home phone numbers
            | and addresses.
        
           | michaelt wrote:
           | The blog post only says they halted issuance within minutes
           | of confirming the bug - not that they confirmed the bug
           | within minutes of receiving a bug report.
        
             | pgporada wrote:
             | It takes time to correctly determine if a bug report is
             | real and to determine the possible scope of the bug.
        
           | icebraining wrote:
           | In California, it was Friday evening still.
        
           | m_eiman wrote:
           | I assume they probably have 24h support, but also consider
            | time zones: 3 AM Saturday in UTC is Friday evening in San
            | Francisco:
           | 
           | https://duckduckgo.com/?q=03%3A00+utc+to+pst&ia=answer
        
           | lidHanteyk wrote:
           | They don't have many incidents or outages. As a result, it's
           | much easier to respond to the incidents that do occur.
        
           | pgporada wrote:
           | Speaking for myself here.
           | 
           | The workforce is spread across the states. When you're drawn
           | to the mission like I am, late nights here and there don't
           | matter at all. I communicated with my wife and I'm sure
           | others informed their significant others what was going on
            | etc and why Friday, Saturday, Sunday, Monday, and Tuesday
            | would be thrown out of whack. Members of the team put in many
            | more hours than I did and that is truly impressive. It takes
           | all of us with our different specialties to make an accurate
           | and effective response.
           | 
            | Some of the things we do are internal post-mortems, where we
            | find ways to prevent the issue from happening again by
            | improving alerting/monitoring, writing a runbook, fixing
            | code, or fixing misconceptions about a part of the entire
            | system. We do weekly readings of various RFCs, the Baseline
            | Requirements, and other CAs' CP and CPS documents to again
            | better understand our system and Web PKI as a whole. This is
           | an understatement, but we heavily rely on automation. From
           | the moment the call was made to stop issuance, an SRE was
           | ready to run the code that disables the issuance pipeline.
           | 
           | The biggest takeaway is that communication and leadership
           | makes all the difference.
           | 
           | I have to go, there's work to be done.
        
           | tialaramex wrote:
           | For me at least 24/7 incident response is completely
           | acceptable in a properly compensated role so long as it's
           | accompanied by the culture that says _preventing_ such
           | incidents in the first place is Job #1
           | 
           | That is, I'm OK with being woken at 0200 to try to understand
            | and, if appropriate, fix or recover from a disaster _only_ so
            | long as, had I suspected this might happen, the people
            | expecting me to be awake at 0200 would have given me the
            | resources (money, people, whatever) to fix it. If I feel like
           | I don't have that support, I'll only start looking at your
           | disaster during my working day.
           | 
           | My impression is that ISRG pays a lot of attention to
           | preventing disasters, so if I worked for ISRG (not very
           | practical since they're based on the US West Coast and I live
           | in England) I'd be comfortable taking a call in the middle of
           | the night to fix things.
        
             | bcrosby95 wrote:
             | Yeah, as long as it doesn't happen often. I'm technically
             | always on call but we haven't had an on call incident in
             | close to a year.
             | 
             | Basically I keep a phone and laptop on me at all times.
             | 
              | This is in comparison to a friend who works somewhere with
              | daily on-call incidents that are not actually problems 95%
              | of the time. That would piss me off even if I
             | weren't always on call.
        
             | ithkuil wrote:
              | Operations based on the US West Coast could definitely
              | benefit from a few people on the other side of the world to
              | achieve 24/7 coverage while keeping a good work-life
              | balance.
        
               | myself248 wrote:
                | Local nerds can be nocturnal too. Letting people pick
               | their preferred shift is just as important as
               | accommodating other kinds of physiological diversity.
        
               | pgporada wrote:
               | Can confirm.
        
               | folmar wrote:
                | You'd normally want the 24/7 people to be part of the
                | day-to-day operations, otherwise they will quickly stop
                | being up to date and fall out of the knowledge loop, so
                | selecting a reasonable timezone set is not trivial.
        
         | [deleted]
        
       | talkingtab wrote:
        | So let's see, the deal is that I get free certificates on any and
       | all of my domains. I get an easy way to install and update my
       | certificates that works with my nginx services. I can move the
       | service to a new address and instantly get a new certificate. The
       | certificates are universally trusted. I get notified by email
       | when there is a problem along with a way to detect and fix the
       | problem available to me.
       | 
       | I'm speechless. I used to pay real money to get certs without
       | half the service I get now for free.
       | 
       | Thanks letsencrypt.
        
       | Tomte wrote:
       | I haven't got a mail (I think), and I don't see that on their web
       | site or on their blog.
       | 
       | Is wading through Discourse threads now the new minimum
       | requirement for using services?
        
         | gindely wrote:
         | No, you can get your information from Hacker News.
         | 
         | (Do check your domain even if you didn't get an email, since
         | they have not delivered emails to everyone who is affected.)
        
       | MrStonedOne wrote:
       | >CAA
       | 
        | What is a CAA? Letsencrypt: please don't use initialisms in
        | customer-facing blog posts without using the FULL name at the
        | first use. Makes things more learnable and googleable.
        
         | anderskaseorg wrote:
         | https://en.wikipedia.org/wiki/DNS_Certification_Authority_Au...
        
         | tyingq wrote:
         | Confused me too. They even already have a page they could have
         | linked to. https://letsencrypt.org/docs/caa/
        
         | Ayesh wrote:
         | Certificate Authority Authorization.
         | 
         | It's a fairly common initialism in CA/TLS world (heh). The DNS
         | record is also named "CAA".
        
         | rswail wrote:
         | It's a free service. I understand your frustration, but CAA is
         | a well known DNS record type. If you read the description of
         | the problem, it explains it.
        
           | djsumdog wrote:
           | I've been doing development and devops for years and have
           | never heard of a CAA. Then again, I use http-01 and not dns
           | validation, so that's probably why. Before LetsEncrypt I'd
           | buy certs and it involved a lot of copying/pasting CSRs into
           | web interfaces.
        
             | pfg wrote:
             | To be clear, CAA is relevant even if you're using http-01.
             | CAs need to check whether the CAA records of a given domain
             | allow/forbid issuance in addition to any of the methods
             | used to demonstrate domain ownership to the CA.
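              | 
              | For reference, a CAA record is an ordinary DNS record; an
              | illustrative zone entry (not from any real zone) that only
              | permits Let's Encrypt to issue would look like:
              | 
              |     example.com.  IN  CAA  0 issue "letsencrypt.org"
              | 
              | With that record present, CAs other than Let's Encrypt must
              | refuse to issue for the name, whatever challenge type was
              | used.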
        
         | lvh wrote:
         | CAA is a term of art for a standard encoded in DNS. It stands
         | for "Certification Authority Authorization", but most people
         | who know what a CAA record is probably do not recognize it
         | written as words. (I know I would need to read it several times
         | to know that's what they meant, and I do this for a living.)
        
           | tialaramex wrote:
           | Expanding other DNS record names isn't very helpful either. I
           | know what a Canonical Name for something is, but CNAME seems
           | clearer, Mail Exchanger definitely isn't a helpful way to
           | think about what MX records are for...
           | 
            | PTR and TXT aren't initials; they're just short for
            | "pointer" and "text", neither of which is much help in
            | divining what they're actually used for, and presumably AAAA
            | doesn't actually stand for anything at all (?) other than
            | it's four times bigger than the A record.
        
             | captncraig wrote:
             | 4 A's for ipv6 of course, and 1 A for ipv4. How could that
             | possibly be confusing?
             | 
             | Had I been given a vote we would be using AAAA and AAAAAA
             | records.
        
       | 32gbsd wrote:
       | So it starts
        
         | GuyPostington wrote:
         | So what starts?
        
       | terom wrote:
       | For context in terms of what caused this, here's the PR which I
       | assume fixed the bug in question:
       | https://github.com/letsencrypt/boulder/pull/4690
       | 
       | It looks like a nasty and subtle pass-by-reference of a for-range
       | local variable, although I'm having trouble figuring out where
       | the reference is stored:
       | https://github.com/letsencrypt/boulder/blob/542cb6d2e06e756a...
       | 
       | I've spent plenty of time hunting down similar bizarre bugs in Go
       | code as well, where the called function ~implicitly~ takes a
       | pointer to the iteration variable and stores it somewhere. Each
       | iteration of the for loop updates the stack-local in-place, and
       | later reads of the stored reference will not read the original
       | value. It's hard to spot from the actual call site :/
       | 
       | EDIT: This was an explicitly taken `&v` reference, but the same
       | thing can also happen implicitly, if you call a `func (x *T) ...`
       | method on the variable.
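        | 
        | For anyone who hasn't hit this before, here's a minimal
        | standalone sketch of the pattern (illustrative only, not the
        | Boulder code): under the Go semantics of the time, `for _, v :=
        | range ...` reuses a single loop variable, so every stored `&v`
        | ends up pointing at whatever was iterated last. The usual fix is
        | a shadowing copy inside the loop body.
        | 
        |     package main
        |     
        |     import "fmt"
        |     
        |     func main() {
        |         names := []string{"a.example", "b.example", "c.example"}
        |     
        |         var buggy []*string
        |         for _, v := range names {
        |             buggy = append(buggy, &v) // all three pointers alias the one loop variable
        |         }
        |     
        |         var fixed []*string
        |         for _, v := range names {
        |             v := v // per-iteration copy
        |             fixed = append(fixed, &v)
        |         }
        |     
        |         for i := range names {
        |             fmt.Println(*buggy[i], *fixed[i]) // buggy side prints "c.example" three times
        |         }
        |     }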
        
         | heavenlyblue wrote:
         | People say Rust's borrowing rules are only useful for multi-
         | threaded environments. This is one of those issues that they
         | are supposed to solve.
        
           | fortran77 wrote:
           | "People" say that? Really? Now you build up straw men so you
           | can slip "Rust" into every discussion.
        
             | steveklabnik wrote:
             | An example from less than a day ago:
             | https://news.ycombinator.com/item?id=22466354
        
               | tedunangst wrote:
               | I'm not super confident I understand the bug, but it
               | looks like sequential access to the reference. If I'm not
               | mistaken, a mutable borrow in rust would end up with the
               | same bug.
        
               | fortran77 wrote:
               | I think you're right, too.
        
               | steveklabnik wrote:
               | I have not dug into the details enough to say if this is
               | a bug Rust would prevent or not, I am only responding to
               | the claim that people do not sometimes suggest that
               | Rust's complexity only matters in the multi-threaded
               | case.
        
               | tedunangst wrote:
               | Gotcha.
        
               | m-n wrote:
               | If I understand correctly, each `authzPB` collected in
               | the iteration stores references to fields of an
               | `authzModel`. Before the patch, these were identical,
               | referring to the fields of the loop variable v. Each
               | iteration of the loop, v is set, and all those stored
               | references pointed to the new value.
               | 
               | Rust does give a compilation error for that.
        
             | [deleted]
        
             | webappguy wrote:
             | Do you take issue with Rust?
        
           | [deleted]
        
           | Thaxll wrote:
            | I'm not sure I understand; it's a business logic bug, it
            | would have happened in any language.
        
             | SolarNet wrote:
              | Except that the reference in question would have caused a
              | lifetime error in Rust, which would have required the
              | developers to explicitly acknowledge the choice they were
              | making, likely by changing a bunch of types.
              | 
              | Yes, you could still do it in Rust, but any reviewer of the
              | code would say "why in the world are you doing it this way"
              | because it would be forced into a complex cross-call
              | monstrosity.
        
             | chc wrote:
             | How do you figure this is a business logic bug? It looks
             | like a pretty clear-cut implementation bug to me. Rust
             | would 100% have caught this bug, and in fact I'm pretty
             | sure it would have caught the bug at least two different
             | ways:
             | 
             | 1. The reference outlives the original value.
             | 
             | 2. You can't have multiple mutable references at the same
             | time.
        
               | cesarb wrote:
               | > 2. You can't have multiple mutable references at the
               | same time.
               | 
               | If I understood the issue correctly, only one of the
               | references would be a mutable one, so the way Rust could
               | have caught the bug would instead be the related rule:
               | "you can't have an immutable reference and a mutable
               | reference at the same time".
        
         | rcaught wrote:
         | Is it just me or is the PR and the associated linking really
          | lacking? The PR doesn't have a description, and neither it nor
          | the commits link back to the original communication (or vice
          | versa).
        
           | terom wrote:
           | For even more context, this seems to have been on a Friday
           | night (assuming US West coast) with production down: https://
           | letsencrypt.status.io/pages/incident/55957a99e800baa...
           | 
           | I'll cut the LE team some slack on this one :) the PR does
           | have tests
        
         | gwd wrote:
          | What's particularly unfortunate about this is the comment just
          | above the call:
          | 
          |     // Make a copy of k because it will be reassigned with each loop.
          | 
          | But v is reassigned with each loop too.
         | 
         | The real question is why there's so much pass-by-reference in
         | the first place. K looks to be a domain name string -- it's
         | almost certainly faster to copy it than to dereference it
         | everywhere.
        
           | thenewnewguy wrote:
           | > The real question is why there's so much pass-by-reference
           | in the first place. K looks to be a domain name string --
           | it's almost certainly faster to copy it than to dereference
           | it everywhere.
           | 
           | I don't program in rust, so my knowledge here is limited to
           | what these words mean in C/C++ - however shouldn't making a
           | copy still require dereferencing the copy?
        
             | gwd wrote:
             | This is actually in Go, but the issue is the same.
             | 
              | Suppose you have a struct like this:
              | 
              |     struct foo {
              |         struct bar elem;
              |     } s;
              | 
              | If you know the address of `s`, you just calculate the
              | address of 'elem' from it and read the contents; a single
              | memory read, all the data together cache-wise. Suppose on
              | the other hand you have a struct like this:
              | 
              |     struct foo {
              |         struct bar *elemptr;
              |     } s;
             | 
             | If you know the address of `s`, you have to first read
             | `elemptr`, and only then read the value of `elem`. That's
             | an extra memory fetch, and probably from a different part
             | of the memory than `elem` is from. Copying on modern
             | processors is very fast, and the resulting copy will be
             | "hot" in your cache. So conventional wisdom I've heard is
             | that unless `struct bar` is quite large (I've heard people
             | say hundreds of bytes), it's probably faster to just copy
             | the whole structure around than to copy the pointer to it
             | around and dereference it.
             | 
             | Caveat: I haven't run the numbers myself, but I've heard it
             | from several independent sources; including, for instance,
             | Apple's book on Swift.
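              | 
              | Since the code in question is Go, the same two layouts
              | there would look roughly like this (illustrative types
              | only):
              | 
              |     type bar struct{ n int }
              |     
              |     // Value field: bar is stored inline in foo, so reading
              |     // Elem.n from a *fooValue is a single dereference.
              |     type fooValue struct {
              |         Elem bar
              |     }
              |     
              |     // Pointer field: bar lives in a separate allocation, so
              |     // reading Elem.n costs an extra pointer chase.
              |     type fooPointer struct {
              |         Elem *bar
              |     }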
        
               | jandrese wrote:
               | Won't the pointer also be hot in cache in this case? I
               | only ask because it seems to me like excessive data
               | copying (and cache eviction) is a major source of
               | slowness in modern programs. People are churning their
               | cache to pieces by copying the world for every function
               | call.
               | 
               | It's fine as long as your entire program fits neatly in
               | cache, but once you exceed the cache size performance
               | goes to hell because you force loads of misses of
               | slightly-older data by constantly copying your working
               | data.
        
         | terom wrote:
         | LE just posted their own (excellent!) incident report of this
         | on the mozilla bugtracker, including the discovery timeline,
         | analysis of the bug, and follow-up steps:
         | https://bugzilla.mozilla.org/show_bug.cgi?id=1619047#c1
         | 
         | The original bug report, which was initially diagnosed as only
         | affecting the error messages, not the actual CAA re-checking:
         | https://community.letsencrypt.org/t/rechecking-caa-fails-wit...
         | 
         | Brief discussion on revocation exemption requests:
         | https://bugzilla.mozilla.org/show_bug.cgi?id=1619179
         | 
         | Tomorrow will tell if granting a revocation exemption might
         | have been a good idea in hindsight.
        
       | geocrasher wrote:
       | Here's a bash one-liner to check all the domains you have:
       | 
        | for domain in $(cat list-of-domains.txt); do curl -s -X POST -F
        | "fqdn=$domain" https://unboundtest.com/caaproblem/checkhost; done |
        | sed '/is OK./d'
        
       | tialaramex wrote:
       | I suppose this is one way to answer my question
       | 
       | I asked (on m.d.s.policy) on the 29th how many issuances were
       | affected, Jacob replied saying they intended to spend that day
       | figuring out the answer, but then there was nothing further from
       | him. The incident doesn't seem drastic enough to prompt urgent
       | answers so I intended to revisit later this week if I heard
       | nothing further.
       | 
        | Now we have a complete list of affected certificates instead
        | (the answer to my original question is about 3 million).
       | 
       | I was sort of hoping the answer was going to be like five
       | thousand or something manageable. Alas. In hindsight I guess this
       | was to be expected.
        
         | thenewnewguy wrote:
         | Well, from my understanding, they have no idea of whether or
         | not a given certificate was misissued as long as it meets the
         | baseline criteria for triggering the bug (which seems to be
         | just "issued after X date with more than 1 domain").
         | 
         | So while the number of certificates that should not have been
         | issued due to a blocking CAA record is likely small (or
         | possibly even 0), they have to revoke every cert that could
         | have triggered the bug, as they have no way to travel back in
         | time and find out what the CAA records they didn't check would
         | have been.
        
           | tialaramex wrote:
           | The criteria also require that the challenge answers used to
           | validate control were old (more than eight hours old).
           | 
           | If all proof-of-controls were fresh the CAA checks are also
           | fresh for those proof-of-controls so there's no bug. That's
           | why the big list is "only" about three million certificates.
           | 
           | Suppose you own example.com, example.org and example.net and
           | all you do is every 60 days or so you spin up Certbot once to
           | get a certificate for six names, the three domains and the
           | associated www FQDNs - that won't trigger this bug because
           | each time your old proof of control have expired and new
           | fresh ones will be used, triggering fresh CAA checks.
           | 
           | You're right though that it's likely the number of truly mis-
           | issued certificates may be zero because the most common way
           | to have a CAA record deliberately changed to forbid Let's
           | Encrypt after having successfully done a proof-of-control
           | (the scenario that would trigger their bug) is researchers
           | looking for bugs in CAA checking, and of course such
           | researchers would have reported this to Let's Encrypt
           | triggering exactly the same incident but probably at a more
           | friendly time like a Monday morning.
        
       | deweller wrote:
        | Will a nightly certbot invocation replace these revoked
        | certificates without manual intervention?
        
         | londons_explore wrote:
         | I don't believe so.
         | 
         | I hope they add support for that soon.
        
         | nitely wrote:
         | Run `certbot renew --force-renewal`. That's what it says in the
         | email they sent me. But if you didn't get an email, then your
         | domains should not be affected.
        
           | Tomte wrote:
           | My cert is affected, and I didn't get an email.
           | 
           | Check your domain using the linked online tool.
        
           | gindely wrote:
           | I didn't get the email, and my domains were affected. (Two of
           | them are in the list, one of them current.)
           | 
           | I do get "you didn't renew your certificate" messages on a
           | semi regular basis (domains that have passed out of my
           | control) so I know they have my details.
        
         | simias wrote:
          | Apparently you have to manually use --force-renewal for certbot
          | to regenerate new certificates (I did it just in case, even
          | though I'm 99% sure that I'm not affected).
         | 
         | I assume that by default certbot only checks the expiration
         | date of local certificates against the system clock, it doesn't
         | ping any external resources so it can't be aware that the
         | certificate might have been revoked even though it hasn't
         | expired.
         | 
         | I agree that it would be nice if there was such an option,
         | although I assume that it would increase the server load
         | significantly if certbot connected to letsencrypt's servers at
         | every invocation so maybe that's why they didn't do it.
        
           | lightswitch05 wrote:
           | > I assume that by default certbot only checks the expiration
           | date of local certificates against the system clock, it
           | doesn't ping any external resources so it can't be aware that
           | the certificate might have been revoked even though it hasn't
           | expired.
           | 
           | I think the actual issue here is that the certificates have
           | not been revoked yet. We know that they will be revoked,
           | which is why we have to run with --force-renewal, but there
           | is no process for certbot to know that a certificate,
            | although not revoked, will soon become revoked. I would
            | expect certbot to automatically renew the next time it's run
            | post-revocation.
        
           | tehlike wrote:
            | Actually, I think it would not be a significant issue, at
            | least for a range of situations. In case of an issue, a
            | static file with timestamp ranges could be served resource-
            | efficiently to signal clients.
        
       | qXlihgad7n wrote:
       | Is there an RSS feed that you can subscribe to for alerts like
       | this? I can't see one.
        
         | thenewnewguy wrote:
         | https://community.letsencrypt.org/c/incidents.rss perhaps?
         | 
         | (Pro tip: you can append .rss to many pages on discourse to get
         | an RSS feed)
        
       | ge0rg wrote:
       | There is a list of all affected certificates posted under
       | https://letsencrypt.org/caaproblem/ - and it looks like they are
       | also leaking the account IDs from the list, so now you can map
       | different domains/certificates to the account that got them
       | issued.
        
         | tialaramex wrote:
          | Yeah, it does seem like it'd have been sensible _not_ to list
          | the account ID in this file. It's convenient if you know your
          | account ID and want to pull out just your certs, but for most
          | people this associates all their certificates together.
         | 
         | If you own both https://www.happy-rainbow-nursery.example/ and
         | https://hardcore.bdsm-videos.example/ you probably go to some
         | lengths to avoid visitors realising the connection. Nothing
         | you're doing is illegal or even unethical - but it's obviously
         | going to cause uncomfortable conversations so why not avoid
         | that altogether. Let's Encrypt aren't doing you a favour if
         | tomorrow a mom at nursery says now she knows why you sound so
         | much like Masked Mistress Martha...
        
           | djsumdog wrote:
           | Yea, this is really bad. I've done some searching of the
           | data. Sometimes it doesn't matter. It looks like whoever is
           | currently running gab.com is probably a big consulting
           | company with like 100 other clients, so there's no big
           | relation there. But if you run a small personal blog and use
           | the same e-mail address for maybe more controversial sites
           | that are hosted on different IPs, now you could get doxed.
           | 
           | I'm guessing customer IDs are associated with e-mail
            | addresses? This seems like a good case for using different
            | e-mails for every cert. There are open source tools like
            | anonaddy.com you can host yourself or buy from them (they
           | have a decent free tier).
           | 
           | I feel like this list seriously needs to be pulled. There is
           | some serious lack of oversight here.
        
             | pfg wrote:
             | > I'm guessing customer IDs are associated with e-mail
             | addresses?
             | 
             | They are (on Let's Encrypt's end), if an email address was
             | provided.
             | 
             | It's a 1:n relation, the same email may be used for any
             | number of ACME accounts. Roughly speaking, for most
             | clients, the ACME account maps to a specific ACME client on
             | a specific host. If you run three servers with separate
             | ACME clients, you're probably using three ACME accounts
             | (even if you're using the same email and issuing
             | certificates for the same domain).
             | 
             | Large or custom implementations may reuse the same ACME
             | account across many servers and domains. (Issuance would
             | typically be centralized and operated as a separate system
             | in these scenarios.)
        
       | low_key wrote:
        | PSA: The unboundtest.com checker doesn't seem to work if you have
        | certificates issued for both ECC and RSA keys. For some of mine,
       | it passes the check with status "OK" and shows the serial number
       | of the certificate for the ECC key. The certificate that is going
       | to be revoked is not shown.
        
         | tialaramex wrote:
         | If you have more than one certificate in use, regardless of
         | what flavour, they only see one and assess that. Maybe the
         | checker should emphasise that. For small users they probably
         | only have one certificate in use, so this avoids some problems.
         | 
         | The issue would probably also affect people who have
         | geographically separate certificates e.g. if you have two
         | servers in different regions and decided rather than make
         | things more complicated for key distribution you'll just have
         | them each get their own certificates for the same name - that's
         | totally fine with Let's Encrypt (it doesn't scale, but if you
         | had 500 servers not 2 you'd probably redesign everything) but
         | obviously this test only sees one of those servers and won't
         | check the other certificate.
         | 
         | There's no way to know, given that two (or more) valid
         | certificates exist for a name, and seeing one of them, whether
         | the others are still actively used anywhere.
         | 
          | It would obviously be pretty easy to build a web form where you
          | can type in an FQDN and get told if any certificates matching
          | that name will be revoked, but then you get false positives: it
          | says yes, this certificate for some.name.example will be
          | revoked, and you rush to replace your certificate for
          | some.name.example, when actually the one that will be revoked
          | is from 20 December 2019 and you already got a newer one in
          | February which was unaffected.
        
           | namibj wrote:
           | I wish they'd issue short-term scoped CAs under the same
           | criteria as they currently use for wildcards.
           | 
           | No significant load on their infrastructure, and you'd not
           | have to break the "private keys don't move over _any_
           | network" rule.
        
       | IceWreck wrote:
        | Yes, I just got their mail. Four certs out of six, all issued at
        | the same time, were affected. The other two were not.
        
       | paulfurtado wrote:
        | Does anyone know of a generic way to detect that letsencrypt will
        | revoke a certificate soon?
       | 
       | The goal would be to have our automation automatically rotate the
       | certificates when similar issues occur in the future.
        
         | pfg wrote:
         | To my knowledge, there's no such mechanism in any of the
         | relevant protocols (i.e. ACME and OCSP).
        
       | kn100 wrote:
       | Yeah my domain was affected. Renewed, that would have sucked if I
       | hadn't seen this! Also, I didn't get an email and I'm pretty sure
       | my certs were generated with my email!
        
       | appleflaxen wrote:
        | There are several links in this thread, but the following page
        | allows you to enter your hostname and check online (no SSH,
        | terminal access, etc). It's from the letsencrypt team (linked in
        | the blog post).
       | 
       | https://unboundtest.com/caaproblem.html
        
         | zimpenfish wrote:
         | Alas, only works for things on port 443 which is a bit of a
         | problem for most of my certificates...
        
       | rswail wrote:
       | This bug only affects you if you got domain validation (eg by
       | dns-01) but didn't immediately issue a certificate.
       | 
       | Letsencrypt validates the domain ownership for 30 days, so the
       | bug allows you to issue a certificate within that window, even if
       | you added a CAA record after validation that says "don't allow
       | issue by letsencrypt, or only allow issue by MyCA.example.com".
       | 
       | But if you have everything automated, you're checking for renewal
       | and issuing every day and probably validating as part of that, so
       | unlikely to encounter the bug, unless you validate in one step
       | and then sometime 8h+ later, issue a certificate.
        
         | kn100 wrote:
         | I don't think this is true - I don't recall using domain
         | validation. I think it's more related to multi domain certs.
        
         | JeanMarcS wrote:
          | Thank you. I was just about to run tests on all my clients'
          | certs, but as I've never done that, it seems it's not useful.
          | 
          | (Will still have a look, but less stressfully)
        
       | mholt wrote:
       | FWIW, I believe any Caddy sites will not be affected by this
       | since caddy does not manage multi SAN certificates. Even if it
       | did, Caddy will immediately replace a certificate when it sees a
       | Revoked OCSP response. So, if you're using Caddy, there's
       | probably nothing you need to do. But if you are a caddy user and
       | are impacted, let me know.
        
       | gindely wrote:
        | If like me you have several hundred certificates to check, please
        | do something like this:
        | 
        | cd somewhere-nice
        | 
        | wget https://d4twhgtvn0ff5.cloudfront.net/caa-rechecking-incident...
        | 
        | gunzip caa-rechecking-incident-affected-serials.txt.gz
        | 
        | mkdir -p serials
        | 
        | for i in $(cat domains); do (openssl s_client -connect $i:443
        | -showcerts < /dev/null 2> /dev/null | openssl x509 -text -noout |
        | grep -A 1 Serial\ Number | tr -d : | tail -n1) | tee serials/$i;
        | done
        | 
        | cat serials/* | tr -d " " | sort | uniq > serials.collated
        | 
        | grep $( cat serials.collated | head -c-1 | tr "\n" "|" | sed -e
        | 's/|/\\\|/g' ) caa-rechecking-incident-affected-serials.txt
       | 
       | It will take a moment and then it may tell you that letsencrypt
       | misspoke when they said they sent emails to everyone whose
       | contact details they have.
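        | 
        | If shell pipelines aren't your thing, roughly the same check can
        | be sketched in Go (illustrative only; it assumes the hosts listen
        | on 443 and that, as the pipeline above assumes, serials appear in
        | the downloaded list as hex without colons):
        | 
        |     package main
        |     
        |     import (
        |         "crypto/tls"
        |         "fmt"
        |         "os"
        |         "strings"
        |     )
        |     
        |     func main() {
        |         // Load the whole affected-serials list; a substring match mirrors
        |         // what the grep above does (and tolerates openssl's leading zero).
        |         data, err := os.ReadFile("caa-rechecking-incident-affected-serials.txt")
        |         if err != nil {
        |             panic(err)
        |         }
        |         list := strings.ToLower(string(data))
        |     
        |         for _, host := range os.Args[1:] {
        |             conn, err := tls.Dial("tcp", host+":443", nil)
        |             if err != nil {
        |                 fmt.Printf("%s: %v\n", host, err)
        |                 continue
        |             }
        |             serial := fmt.Sprintf("%x", conn.ConnectionState().PeerCertificates[0].SerialNumber)
        |             conn.Close()
        |             if strings.Contains(list, serial) {
        |                 fmt.Printf("%s: serial %s IS on the revocation list\n", host, serial)
        |             } else {
        |                 fmt.Printf("%s: serial %s not found\n", host, serial)
        |             }
        |         }
        |     }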
        
         | Operyl wrote:
         | I thought I was in the minority there! We have 45 certificates
         | (of many more) that were affected, and our account id was
         | listed, and it has an email contact associated. I got no email
         | whatsoever, but I'm glad I had the foresight to check anyway.
        
           | gindely wrote:
           | I just noticed I got an email at 1949 UTC. I guess they're
           | still sending them out. Presumably some people will receive
           | their emails after the revocation.
        
             | Operyl wrote:
             | I spoke to someone from the team, they've got another 10%
             | to go (presumably much less now). I finally got mine as
             | well, and they're still coordinating to figure out the
             | timeline to revoke. Presumably they'll wait for the emails
             | first.
        
       | b4d wrote:
       | If anybody needs a bulk check:
       | 
       | for domain in $(cat domains.txt); do printf "$domain :" && curl
       | -XPOST -d "fqdn=$domain"
       | https://unboundtest.com/caaproblem/checkhost; done
        
       | wbond wrote:
       | Emailing users and giving them only 24 hours before their certs
       | are revoked seems very unreasonable. Say you are down and out
       | with a stomach bug or on holiday for a day or two.
       | 
       | My understanding is that the 90 day lifetime is largely because
       | revocation can be thwarted. Thus the practical difference between
       | 24 hours and one week is meaningful for server admins, but
       | inconsequential if someone is staging an attack.
        
         | thdrdt wrote:
         | Well it is very inconvenient, but isn't this the strength of
         | certificates?
         | 
         | When something is wrong you can revoke them immediately.
         | 
         | Why leave a potential vulnerability open for more than 24
         | hours?
        
           | wbond wrote:
           | Because revoking them will cause interruption to legitimate
           | users, but doesn't stop an attack.
           | 
           | I'm just starting to think LE is more aimed at large
           | organizations than people running smaller configurations.
           | Which is fine, thankfully we still have traditional CAs. I
           | just hope we don't devolve into a monoculture of ACME-only
           | SSL.
        
             | devrand wrote:
             | LE is actually following the rules outlined by the Baseline
             | Requirements. "Traditional CAs" have a tendency to just
             | ignore them when convenient.
             | 
              | For example, Sectigo has misissued nearly every certificate
              | since 2002, including ~11 million unexpired ones (as of
              | December), and decided to just ignore their duty to revoke
              | misissued certificates [1].
             | 
             | Should the rules be changed? Maybe. However, when you're
             | giving an immense responsibility to CAs then public trust
             | is paramount. Ignoring agreed upon rules whenever you find
             | it convenient does not inspire much confidence.
             | 
             | [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1593776
        
             | M2Ys4U wrote:
             | ACME-only isn't the problem, it's a Let's Encrypt mono-
             | culture I'm concerned about.
             | 
             | We could do with another LE-style service (or two) operated
             | independently (both organisationally and geopolitically).
        
               | ff317 wrote:
               | Yeah I'd love to see one or more additional free ACME
               | issuers that are largely functionally-equivalent to LE,
               | but in a different jurisdiction and under different
               | management, with separate infrastructure, etc.
               | 
               | One of the less-obvious reasons: for "serious" usage
               | where you're also stapling OCSP responses, there's a
               | dependency on the cert vendor's OCSP service. You can
               | cache the OCSP outputs to get through short windows of
               | unavailability, but if the vendor's OCSP goes offline for
               | days or suffers some serious incident, it pays to have
               | multiple vendors on-hand. There was such an incident with
               | GlobalSign back in October 2016 (who's otherwise a pretty
               | decent vendor!), so it is a legitimate concern.
               | 
               | For "serious" use-cases, you basically need redundant
               | live certs from redundant vendors, and not having a
               | second LE-like option means one of those is still a
               | legacy CA for now...
        
               | sigio wrote:
                | There's buypass.no / buypass.com ... they are a Norwegian
                | CA that also implements ACME. I have only used them for
                | some testing certificates so far, which have not been
                | deployed in the wild, but their server works, the certs
                | are valid in all browsers, and they do up to 6-month
                | valid certs IIRC.
               | 
               | link: https://community.buypass.com/
        
               | folmar wrote:
                | I'm using it in production with no trouble at all - I
                | have one place where it's only possible to add SSL certs
                | through a GUI, so the longer validity is a must.
        
               | wbond wrote:
                | My point there was more that automation in the cert
                | space could lead to traditional CAs leaving the space, in
                | which case small operators like myself (handful of minor
                | servers) would be forced down the automation route, which
                | isn't necessarily a net positive.
        
               | wbl wrote:
               | Absence of automation is why the CA death penalty is
               | applied so late due to the consequent disruption.
        
               | Avamander wrote:
               | Blame other CAs for resting on their laurels and allowing
               | LE to steal their marketshare.
        
             | michaelbuckbee wrote:
             | LetsEncrypt has made the strongest headway in large
             | organizations with thousands of domains like Shopify,
             | Heroku, website builders, etc. as it hits a really sweet
             | spot of usability (controlling the host lets them approve
             | issuance), cost (free) and control (they can trigger mass
             | refreshes).
        
             | Santosh83 wrote:
             | I think it is more aimed at technically competent users,
             | regardless of organisation size. It is not, as it stands,
             | suitable for direct use by non-technical people who can
             | nevertheless follow step-by-step instructions to purchase
             | and install a certificate from the traditional CAs. Similar
             | 'hold my hand' tooling isn't there yet for LE. Nothing
             | about the protocol itself mandates such short validity
             | periods though I presume?
             | 
             | Nevertheless technical people bemoan average users
             | clustering towards centralised web-hosts but forget the
             | reality that hosting a website from your own desktop or a
             | VPS is far from trivial even in 2020!
        
               | [deleted]
        
               | namibj wrote:
               | >Nothing about the protocol itself mandates such short
               | validity periods though I presume?
               | 
               | Actually, revocation is broken. Which is a large part of
               | why LE uses 90 days.
        
         | pfg wrote:
         | The Baseline Requirements for publicly-trusted CAs (section
         | 4.9.1.1) require timely revocation of mis-issued certificates -
         | either 24 hours or 5 days depending on the reason. I'm not
         | entirely certain which is applicable here, but I'd assume Let's
         | Encrypt's hands are tied in this case.
        
           | wbond wrote:
           | That is a very useful bit of info. I guess if the mis-
            | issuance happened on Friday evening PT, then five days is
            | March 4th.
        
             | thenewnewguy wrote:
              | The misissuances have happened over the last several months
              | (since at least December 2019), but it does seem that it
              | was _discovered_ on Friday.
        
           | djsumdog wrote:
           | I'm glad they decided on the 24 hours, unlike CAs like Comodo
           | which really shouldn't still be a CA after all their fuckups.
        
         | tialaramex wrote:
         | Not thwarted exactly, but the problem is that the question "Is
         | this certificate still good?" has three possible answers:
         | 
         | 1. "Yes, it's still good"
         | 
         | 2. "No, it's revoked"
         | 
         | 3. "There was a network problem so I'm not sure"
         | 
          | Of course bad guys who know you'd get answer 2 can most likely
          | ensure you have answer 3 instead. So the only _safe_ thing to
          | do is treat 2 and 3 the same. If we're not sure this
          | certificate is fine then it's not fine. But in practice answer
         | 3 is common anyway. For some users it may happen essentially
         | all the time. So browser vendors don't like to treat 2 and 3
         | the same, even though that's the only safe option and _that_
         | can thwart the effectiveness of revocation.
         | 
         | There's definitely further opportunity for improved tooling
         | here. Perhaps this incident will drive it (Let's Encrypt's
         | sheer volume can help in this way).
        
           | willglynn wrote:
           | OCSP is a request/response protocol intended to answer
           | certificate validity questions. It works as you describe, and
           | failures cannot be treated as errors. An attacker who stole a
           | certificate can use it even after revocation by blocking
           | access the relevant OCSP responder.
           | 
           | https://tools.ietf.org/html/rfc2560
           | 
           | OCSP stapling is a mechanism by which a TLS server can make
           | OCSP requests ahead of time and serve the response in-band.
           | TLS clients get a certificate signed by the CA as usual, as
           | well as a recent OCSP response signed by the CA attesting to
           | its continued validity. OCSP stapling allows TLS clients like
           | browsers to know a certificate's revocation status without
           | having to make an extra request, but it changes nothing for
           | an attacker who stole a certificate since they can simply not
           | use it.
           | 
           | https://tools.ietf.org/html/rfc6066#section-8
           | 
           | OCSP Must Staple is an option that can be included on a
           | certificate stating "I promise to use OCSP stapling". An
           | attacker who stole a "must staple" certificate can either
           | include an OCSP response indicating the certificate is
           | revoked, or they can omit an OCSP response which the TLS
           | client will treat as a hard error.
           | 
           | https://tools.ietf.org/html/rfc7633
           | 
           | In short, RFC 7633 makes certificate revocation work. Web
           | browsers and web servers support this today. If you use Let's
           | Encrypt's `certbot`, pass it `--must-staple`.
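            | 
            | For the curious, here is a small Go sketch (illustrative,
            | using golang.org/x/crypto/ocsp) that connects to a server and
            | reports whether it stapled an OCSP response and what status
            | that response carries:
            | 
            |     package main
            |     
            |     import (
            |         "crypto/tls"
            |         "fmt"
            |         "os"
            |     
            |         "golang.org/x/crypto/ocsp"
            |     )
            |     
            |     func main() {
            |         host := os.Args[1] // e.g. "example.com"
            |         conn, err := tls.Dial("tcp", host+":443", &tls.Config{ServerName: host})
            |         if err != nil {
            |             panic(err)
            |         }
            |         defer conn.Close()
            |     
            |         state := conn.ConnectionState()
            |         if len(state.OCSPResponse) == 0 {
            |             fmt.Println("no stapled OCSP response")
            |             return
            |         }
            |     
            |         // Assumes the server sent its chain, so [1] is the issuer.
            |         leaf, issuer := state.PeerCertificates[0], state.PeerCertificates[1]
            |         resp, err := ocsp.ParseResponseForCert(state.OCSPResponse, leaf, issuer)
            |         if err != nil {
            |             panic(err)
            |         }
            |         switch resp.Status {
            |         case ocsp.Good:
            |             fmt.Println("stapled OCSP status: good, next update", resp.NextUpdate)
            |         case ocsp.Revoked:
            |             fmt.Println("stapled OCSP status: REVOKED at", resp.RevokedAt)
            |         default:
            |             fmt.Println("stapled OCSP status: unknown")
            |         }
            |     }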
        
             | wbond wrote:
             | So in the case that revocation now works, why is there a
             | continued push to shorten certificate lifetimes?
        
               | pfg wrote:
               | Multiple reasons:
               | 
               | 1. Firefox remains the only mainstream browser to support
               | OCSP Must Staple.
               | 
               | 2. OCSP Must Staple does not cover all threat models: if
               | an attacker gains the ability to temporarily issue
               | certificates for the victim's domain (rather than
               | obtaining the private key of an existing certificate),
               | they can request a certificate without the OCSP Must
               | Staple extension. A more effective method would be
               | something like the Expect-Staple header[1] (in enforce
               | mode).
               | 
               | 3. It allows the ecosystem to move significantly faster.
               | In a world where all certificates expire after 3 months,
               | phasing out insecure hash algorithms (in certificates)
               | would no longer take many years.
               | 
               | 4. It encourages regular key rotation (even if it's not
               | enforced)
               | 
               | [1]: https://scotthelme.co.uk/designing-a-new-security-
               | header-exp...
        
               | wbond wrote:
               | Items 3 and 4 seem like weak arguments. We are still
               | dealing with operating systems from 3+ years ago, so
               | moving below a 1 year certificate length wouldn't buy
               | much agility in terms of new algorithms.
        
               | pfg wrote:
               | Hash algorithms may not have been the best examples as
               | they require client support.
               | 
               | A better example would be something like Certificate
               | Transparency. Currently, browsers may require Certificate
               | Transparency for certificates issued after a certain
               | date. A malicious or compromised CA may work around this
               | by backdating certificates. This would be less of an
               | issue with shorter certificate lifetimes.
        
             | tialaramex wrote:
             | If you have Must Staple but don't have monitoring in place
             | to detect that your OCSP responses are growing stale before
             | they expire (or worse, you use Apache HTTPD which will
             | happily replace a GOOD OCSP response with a newer BAD one)
             | then you'd still be screwed here when Let's Encrypt revokes
             | certificates.
             | 
             | You need _at least_ effective monitoring and a good OCSP
             | stapling implementation (IIS is supposedly pretty good at
             | this) or else stapling is sadly going to make life worse
             | for you not better.
        
       | bobmaxup wrote:
        | Only 13 out of the ~3,500 certificates I manage required renewal.
        
         | vermontdevil wrote:
         | 3,500? wow. What are you managing if I may ask?
        
           | bobmaxup wrote:
           | A CMS platform with customer provided domains.
        
       | low_key wrote:
       | Looks like LE will be adding to the billion certs they've issued!
       | 
       | https://news.ycombinator.com/item?id=22434466
        
       | rswail wrote:
        | It also only affects you if you are issuing the certificate for
        | more than one domain name, if I'm reading it right.
       | 
        | What's supposed to happen:
        | 
        |     For each fqdn in the request
        |       if challenge succeeds (eg dns-01)
        |         check whether caa record exists
        |         if (it doesn't) or (it does and allows issue)
        |           issue certificate
        | 
        | In the step on "check whether caa record exists", instead of
        | using the domain name that is being issued in this loop, it uses
        | the first one it found (or one of them, it's unclear which one).
        | So theoretically, if you wanted a cert for:
        | 
        |     domain1.example.com, domain2.example.com
       | 
       | and you had a CAA record for domain1 that allowed letsencrypt but
       | then a different CAA record was added between the CAA check on
       | domain1 and the CAA check on domain2 (which wouldn't happen
       | because of the bug) you could get a cert for domain2 that the CAA
       | record said not to issue.
        
         | captncraig wrote:
         | Kinda frustrating having certs revoked when there have never
         | been CAA records for any names involved. I know they can't know
         | that to be true historically, but I wish they could do some
         | additional filtering.
        
       | ck2 wrote:
       | That thread enlightened me to a great trick to force cert renewal
       | even if it's been done too recently: add a second (sub)domain and
       | make a new cert with both
        
       | donatj wrote:
       | Can we get a "(some)" for the title?
        
       | rb808 wrote:
        | Certificates are such a huge maintenance problem. So many mines
        | waiting to blow up. We really need something better.
        
       | karimmaassen wrote:
       | Yes, let's.
       | https://www.reddit.com/r/ProgrammerHumor/comments/7x2ugb/let...
        
       | terom wrote:
       | Here's some quick&dirty stats from the list of revoked
       | certificates:
       | https://gist.github.com/SpComb/6338facd12e020ec4fe561ca91f32...
       | 
       | There's 3M "missing CAA checking results" in total, of which 2M
       | are dated from 2020 and 1M from last month. FWIW the only certs
       | of mine affected were old certs from 2019-12 which had since
       | already been renewed in Feb, and the renewed certs are not
       | affected?
       | 
       | The largest account has 445k certs revoked, and the most revoked
       | certs from last month (most likely to still be in active use?) is
       | 43k for a single account. I hope your rate-limits are in order if
       | you're going to start reissuing all of those before midnight :/
       | 
       | BTW account number 131 at the top of the file seems to mostly be
       | akamaiedge.net sites :)
        
       ___________________________________________________________________
       (page generated 2020-03-03 23:00 UTC)