[HN Gopher] Revoking certain certificates on March 4
___________________________________________________________________
Revoking certain certificates on March 4
Author : teddyh
Score  : 311 points
Date   : 2020-03-03 10:20 UTC (12 hours ago)
web link: community.letsencrypt.org
| shakna wrote:
| Thankfully it's fairly painless to see if you're affected:
|     curl -XPOST -d 'fqdn=example.com' https://unboundtest.com/caaproblem/checkhost
|
| Replace example.com with your Fully Qualified Domain Name.
| MrStonedOne wrote:
| That can only test port 443, and not, say, SSTP on port 8384, or SOAP/HTTPS on port 3443, or other rando ports for various internal HTTPS-layered applications.
| [deleted]
| rswail wrote:
| It can only test hosts that it has access to. Otherwise you can download the file and check your serial number against the list.
| zimpenfish wrote:
| It can't test hosts that aren't on 443 though, regardless of access to the relevant port, e.g. (obvs. elided mydomain here.)
|     $ curl -XPOST -d 'fqdn=pop3.mydomain:995' https://unboundtest.com/caaproblem/checkhost
|     invalid name pop3.mydomain:995
|     404 page not found
|     $ openssl s_client -connect pop3.mydomain:995 -showcerts </dev/null 2>/dev/null | openssl x509 -text -noout | grep -A 1 Serial\ Number | tr -d :
|     Serial Number
|         0325f31485b9c0f393e27b00e4678e881e3c
| Ayesh wrote:
| All my recent certificates are affected. I think it is the same case for the majority of us.
| tialaramex wrote:
| The most likely cause will be that you issued similar certificates a few days (more than eight hours) earlier with the same Let's Encrypt account.
|
| Suppose on Wednesday you get a cert for example.com and www.example.com, and then on Thursday you realise you also need images.example.com - you use the same ACME account (if you run Certbot this will happen by default if you use the same machine and user account; it silently makes you a free account if you don't have one already) and so Let's Encrypt can see that this account showed proof-of-control for two of these names on Wednesday, so only fresh proof-of-control of images.example.com is needed. Unfortunately this bug means Let's Encrypt forgot to re-check CAA for the old names, and so there's a risk they technically were no longer authorised to issue for these names and shouldn't have given you the Thursday certificate.
|
| Rather than try to argue about whether it's appropriate to disregard this check, Let's Encrypt decided to revoke all ~3 million affected certificates. That's maybe 2-3 days' worth of normal issuance in the last 90 days, so lots but hardly "the majority".
| [deleted]
| vesinisa wrote:
| I use DNS validation to obtain one certificate with both domain.tld and the *.domain.tld wildcard. My certificates seem not to be affected. My certs last auto-renewed on 2020-02-20.
| owenmarshall wrote:
| > We confirmed the bug at 2020-02-29 03:08 UTC, and halted issuance at 03:10. We deployed a fix at 05:22 UTC and then re-enabled issuance.
|
| Wow. That is a truly impressive way to handle a security bug, and I know it's not the first time Let's Encrypt has responded extremely quickly.
|
| I would love to hear how their engineering practices make this possible.
| d33 wrote:
| And I'm curious about the work culture part. They're a small organisation in terms of workforce, but somehow managed to respond within minutes on a Saturday. How does that work? Do they have shifts, or are the workers so devoid of a private life that they have to respond in the early morning on a weekend?
| jaas wrote:
| Head of Let's Encrypt here.
|
| We have an on-call rotation and a system for getting others notified and online quickly when necessary. We make sure not to bring too many people online so that some people are fresh and can rotate in later if the incident lasts longer.
|
| It's not often that staff have to put in time at night or on weekends, and when it happens we work hard to make sure the problem doesn't happen again.
| dboreham wrote:
| Out-of-hours response to a critical problem is standard. You can achieve it in various ways, but they all boil down to people who know what they're doing having a professional ethic. Typically it isn't possible to have shifts of engineers with deep understanding of the code on call, so ultimately you need to wake someone up. So remember to keep an up-to-date note of key staff's home phone numbers and home addresses.
| michaelt wrote:
| The blog post only says they halted issuance within minutes of confirming the bug - not that they confirmed the bug within minutes of receiving a bug report.
| pgporada wrote:
| It takes time to correctly determine if a bug report is real and to determine the possible scope of the bug.
| icebraining wrote:
| In California, it was still Friday evening.
| m_eiman wrote:
| I assume they probably have 24h support, but also consider time zones: 3 AM Saturday in UTC is Friday evening in San Francisco:
|
| https://duckduckgo.com/?q=03%3A00+utc+to+pst&ia=answer
| lidHanteyk wrote:
| They don't have many incidents or outages. As a result, it's much easier to respond to the incidents that do occur.
| pgporada wrote:
| Speaking for myself here.
|
| The workforce is spread across the states. When you're drawn to the mission like I am, late nights here and there don't matter at all.
| I communicated with my wife, and I'm sure others informed their significant others what was going on and why Friday, Saturday, Sunday, Monday, and Tuesday would be thrown out of whack. Members of the team put in many more hours than I did, and that is truly impressive. It takes all of us with our different specialties to make an accurate and effective response.
|
| Some of the things we do are internal post-mortems, where we find ways to prevent the issue from happening again by improving alerting/monitoring, writing a runbook, fixing code, or fixing misconceptions about a part of the entire system. We do weekly readings of various RFCs, the Baseline Requirements, and other CAs' CP and CPS documents to again better understand our system and Web PKI as a whole. This is an understatement, but we heavily rely on automation. From the moment the call was made to stop issuance, an SRE was ready to run the code that disables the issuance pipeline.
|
| The biggest takeaway is that communication and leadership make all the difference.
|
| I have to go, there's work to be done.
| tialaramex wrote:
| For me at least, 24/7 incident response is completely acceptable in a properly compensated role so long as it's accompanied by the culture that says _preventing_ such incidents in the first place is Job #1.
|
| That is, I'm OK with being woken at 0200 to try to understand and, if appropriate, fix or recover from a disaster _only_ so long as, if I'd suspected this might happen, the people expecting me to be awake at 0200 would have given me the resources (money, people, whatever) to fix it. If I feel like I don't have that support, I'll only start looking at your disaster during my working day.
|
| My impression is that ISRG pays a lot of attention to preventing disasters, so if I worked for ISRG (not very practical since they're based on the US West Coast and I live in England) I'd be comfortable taking a call in the middle of the night to fix things.
| bcrosby95 wrote:
| Yeah, as long as it doesn't happen often. I'm technically always on call, but we haven't had an on-call incident in close to a year.
|
| Basically I keep a phone and laptop on me at all times.
|
| This is in comparison to a friend who works somewhere that always has daily on-call incidents that are not actually problems 95% of the time. That would piss me off even if I weren't always on call.
| ithkuil wrote:
| Operations based on the US West Coast could definitely benefit from a few people on the other side of the world to achieve 24/7 coverage while keeping a good work-life balance.
| myself248 wrote:
| Local nerds can be nocturnal too. Letting people pick their preferred shift is just as important as accommodating other kinds of physiological diversity.
| pgporada wrote:
| Can confirm.
| folmar wrote:
| You'd normally want the 24/7 people to be part of the day-to-day operations, otherwise they will quickly stop being up to date and fall out of the knowledge loop, so selecting a reasonable timezone set is not trivial.
| [deleted]
| talkingtab wrote:
| So let's see, the deal is that I get free certificates on any and all of my domains. I get an easy way to install and update my certificates that works with my nginx services. I can move the service to a new address and instantly get a new certificate. The certificates are universally trusted. I get notified by email when there is a problem, along with a way to detect and fix the problem available to me.
|
| I'm speechless. I used to pay real money to get certs without half the service I get now for free.
|
| Thanks letsencrypt.
| Tomte wrote:
| I haven't got an email (I think), and I don't see it on their web site or on their blog.
|
| Is wading through Discourse threads now the new minimum requirement for using services?
| gindely wrote:
| No, you can get your information from Hacker News.
|
| (Do check your domain even if you didn't get an email, since they have not delivered emails to everyone who is affected.)
| MrStonedOne wrote:
| > CAA
|
| What is a CAA? Letsencrypt: please don't use initialisms in customer-facing blog posts without using the FULL name at first use. It makes things more learnable and googleable.
| anderskaseorg wrote:
| https://en.wikipedia.org/wiki/DNS_Certification_Authority_Au...
| tyingq wrote:
| Confused me too. They even already have a page they could have linked to: https://letsencrypt.org/docs/caa/
| Ayesh wrote:
| Certification Authority Authorization.
|
| It's a fairly common initialism in the CA/TLS world (heh). The DNS record is also named "CAA".
| rswail wrote:
| It's a free service. I understand your frustration, but CAA is a well-known DNS record type. If you read the description of the problem, it explains it.
| djsumdog wrote:
| I've been doing development and devops for years and have never heard of a CAA. Then again, I use http-01 and not dns validation, so that's probably why. Before LetsEncrypt I'd buy certs and it involved a lot of copying/pasting CSRs into web interfaces.
| pfg wrote:
| To be clear, CAA is relevant even if you're using http-01. CAs need to check whether the CAA records of a given domain allow/forbid issuance in addition to any of the methods used to demonstrate domain ownership to the CA.
| lvh wrote:
| CAA is a term of art for a standard encoded in DNS. It stands for "Certification Authority Authorization", but most people who know what a CAA record is probably do not recognize it written as words.
| (I know I would need to read it several times to know that's what they meant, and I do this for a living.)
| tialaramex wrote:
| Expanding other DNS record names isn't very helpful either. I know what a Canonical Name for something is, but CNAME seems clearer, and Mail Exchanger definitely isn't a helpful way to think about what MX records are for...
|
| PTR and TXT aren't initials; they're just short for "pointer" and "text", neither of which is much help divining what they're actually used for, and presumably AAAA doesn't actually stand for anything at all (?) other than it's four times bigger than the A record.
| captncraig wrote:
| 4 A's for IPv6 of course, and 1 A for IPv4. How could that possibly be confusing?
|
| Had I been given a vote, we would be using AAAA and AAAAAA records.
| 32gbsd wrote:
| So it starts
| GuyPostington wrote:
| So what starts?
| terom wrote:
| For context in terms of what caused this, here's the PR which I assume fixed the bug in question: https://github.com/letsencrypt/boulder/pull/4690
|
| It looks like a nasty and subtle pass-by-reference of a for-range local variable, although I'm having trouble figuring out where the reference is stored: https://github.com/letsencrypt/boulder/blob/542cb6d2e06e756a...
|
| I've spent plenty of time hunting down similar bizarre bugs in Go code as well, where the called function ~implicitly~ takes a pointer to the iteration variable and stores it somewhere. Each iteration of the for loop updates the stack-local in-place, and later reads of the stored reference will not see the original value. It's hard to spot from the actual call site :/
|
| EDIT: This was an explicitly taken `&v` reference, but the same thing can also happen implicitly, if you call a `func (x *T) ...` method on the variable.
| heavenlyblue wrote:
| People say Rust's borrowing rules are only useful for multi-threaded environments. This is one of those issues that they are supposed to solve.
| fortran77 wrote:
| "People" say that? Really? Now you build up straw men so you can slip "Rust" into every discussion.
| steveklabnik wrote:
| An example from less than a day ago: https://news.ycombinator.com/item?id=22466354
| tedunangst wrote:
| I'm not super confident I understand the bug, but it looks like sequential access to the reference. If I'm not mistaken, a mutable borrow in Rust would end up with the same bug.
| fortran77 wrote:
| I think you're right, too.
| steveklabnik wrote:
| I have not dug into the details enough to say if this is a bug Rust would prevent or not; I am only responding to the claim that people do not sometimes suggest that Rust's complexity only matters in the multi-threaded case.
| tedunangst wrote:
| Gotcha.
| m-n wrote:
| If I understand correctly, each `authzPB` collected in the iteration stores references to fields of an `authzModel`. Before the patch, these were identical, referring to the fields of the loop variable v. Each iteration of the loop, v is set, and all those stored references point to the new value.
|
| Rust does give a compilation error for that.
| [deleted]
| webappguy wrote:
| Do you take issue with Rust?
| [deleted]
| Thaxll wrote:
| I'm not sure I understand; it's a business logic bug, it would have happened in any language.
| SolarNet wrote:
| Except that the reference in question would have caused a lifetime error in Rust, which would have required the developers to explicitly acknowledge the choice they were making, likely by changing a bunch of types.
|
| Yes, you could still do it in Rust, but any reviewer of the code would say "why in the world are you doing it this way", because it would be forced into a complex cross-call monstrosity.
| chc wrote:
| How do you figure this is a business logic bug? It looks like a pretty clear-cut implementation bug to me.
| Rust would 100% have caught this bug, and in fact I'm pretty sure it would have caught the bug at least two different ways:
|
| 1. The reference outlives the original value.
|
| 2. You can't have multiple mutable references at the same time.
| cesarb wrote:
| > 2. You can't have multiple mutable references at the same time.
|
| If I understood the issue correctly, only one of the references would be a mutable one, so the way Rust could have caught the bug would instead be the related rule: "you can't have an immutable reference and a mutable reference at the same time".
| rcaught wrote:
| Is it just me, or are the PR and the associated linking really lacking? The PR doesn't have a description, and neither it nor the commits link back to the original communication (or vice versa).
| terom wrote:
| For even more context, this seems to have been on a Friday night (assuming US West Coast) with production down: https://letsencrypt.status.io/pages/incident/55957a99e800baa...
|
| I'll cut the LE team some slack on this one :) the PR does have tests
| gwd wrote:
| What's particularly unfortunate about this is the comment just above the call:
|     // Make a copy of k because it will be reassigned with each loop.
|
| But v is reassigned with each loop too.
|
| The real question is why there's so much pass-by-reference in the first place. k looks to be a domain name string -- it's almost certainly faster to copy it than to dereference it everywhere.
| thenewnewguy wrote:
| > The real question is why there's so much pass-by-reference in the first place. k looks to be a domain name string -- it's almost certainly faster to copy it than to dereference it everywhere.
|
| I don't program in Rust, so my knowledge here is limited to what these words mean in C/C++ - however, shouldn't making a copy still require dereferencing the copy?
| gwd wrote:
| This is actually in Go, but the issue is the same.
|
| Suppose you have a struct like this:
|     struct foo { struct bar elem; } s;
|
| If you know the address of `s`, you just calculate the address of `elem` from it and read the contents; a single memory read, with all the data together cache-wise. Suppose on the other hand you have a struct like this:
|     struct foo { struct bar *elemptr; } s;
|
| If you know the address of `s`, you have to first read `elemptr`, and only then read the value it points to. That's an extra memory fetch, and probably from a different part of memory than `s` itself. Copying on modern processors is very fast, and the resulting copy will be "hot" in your cache. So the conventional wisdom I've heard is that unless `struct bar` is quite large (I've heard people say hundreds of bytes), it's probably faster to just copy the whole structure around than to copy the pointer to it around and dereference it.
|
| Caveat: I haven't run the numbers myself, but I've heard it from several independent sources; including, for instance, Apple's book on Swift.
| jandrese wrote:
| Won't the pointer also be hot in cache in this case? I only ask because it seems to me like excessive data copying (and cache eviction) is a major source of slowness in modern programs. People are churning their cache to pieces by copying the world for every function call.
|
| It's fine as long as your entire program fits neatly in cache, but once you exceed the cache size, performance goes to hell because you force loads of misses of slightly-older data by constantly copying your working data.
| terom wrote:
| LE just posted their own (excellent!)
| incident report of this on the Mozilla bugtracker, including the discovery timeline, analysis of the bug, and follow-up steps: https://bugzilla.mozilla.org/show_bug.cgi?id=1619047#c1
|
| The original bug report, which was initially diagnosed as only affecting the error messages, not the actual CAA re-checking: https://community.letsencrypt.org/t/rechecking-caa-fails-wit...
|
| Brief discussion on revocation exemption requests: https://bugzilla.mozilla.org/show_bug.cgi?id=1619179
|
| Tomorrow will tell if granting a revocation exemption might have been a good idea in hindsight.
| geocrasher wrote:
| Here's a bash one-liner to check all the domains you have:
|     for domain in $(cat list-of-domains.txt); do curl -s -X POST -F "fqdn=$domain" https://unboundtest.com/caaproblem/checkhost; done | sed '/is OK./d'
| tialaramex wrote:
| I suppose this is one way to answer my question
|
| I asked (on m.d.s.policy) on the 29th how many issuances were affected; Jacob replied saying they intended to spend that day figuring out the answer, but then there was nothing further from him. The incident doesn't seem drastic enough to prompt urgent answers, so I intended to revisit later this week if I heard nothing further.
|
| Now we have a complete list of affected certificates instead (the answer to my original question is about 3 million).
|
| I was sort of hoping the answer was going to be like five thousand or something manageable. Alas. In hindsight I guess this was to be expected.
| thenewnewguy wrote:
| Well, from my understanding, they have no idea of whether or not a given certificate was misissued as long as it meets the baseline criteria for triggering the bug (which seems to be just "issued after X date with more than 1 domain").
|
| So while the number of certificates that should not have been issued due to a blocking CAA record is likely small (or possibly even 0), they have to revoke every cert that could have triggered the bug, as they have no way to travel back in time and find out what the CAA records they didn't check would have been.
| tialaramex wrote:
| The criteria also require that the challenge answers used to validate control were old (more than eight hours old).
|
| If all proofs-of-control were fresh, the CAA checks are also fresh for those proofs-of-control, so there's no bug. That's why the big list is "only" about three million certificates.
|
| Suppose you own example.com, example.org and example.net, and all you do is every 60 days or so spin up Certbot once to get a certificate for six names: the three domains and the associated www FQDNs. That won't trigger this bug, because each time your old proofs of control have expired and fresh ones will be used, triggering fresh CAA checks.
|
| You're right, though, that the number of truly mis-issued certificates is likely zero, because the most common way to have a CAA record deliberately changed to forbid Let's Encrypt after having successfully done a proof-of-control (the scenario that would trigger their bug) is researchers looking for bugs in CAA checking - and of course such researchers would have reported this to Let's Encrypt, triggering exactly the same incident but probably at a more friendly time, like a Monday morning.
| deweller wrote:
| Will a nightly certbot invocation replace these revoked certificates without manual intervention?
| londons_explore wrote:
| I don't believe so.
|
| I hope they add support for that soon.
| nitely wrote:
| Run `certbot renew --force-renewal`. That's what it says in the email they sent me. But if you didn't get an email, then your domains should not be affected.
| Tomte wrote:
| My cert is affected, and I didn't get an email.
|
| Check your domain using the linked online tool.
| gindely wrote:
| I didn't get the email, and my domains were affected. (Two of them are in the list, one of them current.)
|
| I do get "you didn't renew your certificate" messages on a semi-regular basis (domains that have passed out of my control), so I know they have my details.
| simias wrote:
| Apparently you have to manually use --force-renewal for certbot to regenerate new certificates (I just did it just in case, even though I'm 99% sure that I'm not affected).
|
| I assume that by default certbot only checks the expiration date of local certificates against the system clock; it doesn't ping any external resources, so it can't be aware that the certificate might have been revoked even though it hasn't expired.
|
| I agree that it would be nice if there were such an option, although I assume that it would increase the server load significantly if certbot connected to letsencrypt's servers at every invocation, so maybe that's why they didn't do it.
| lightswitch05 wrote:
| > I assume that by default certbot only checks the expiration date of local certificates against the system clock; it doesn't ping any external resources, so it can't be aware that the certificate might have been revoked even though it hasn't expired.
|
| I think the actual issue here is that the certificates have not been revoked yet. We know that they will be revoked, which is why we have to run with --force-renewal, but there is no process for certbot to know that a certificate, although not yet revoked, will soon become revoked. I would expect certbot to automatically renew the next time it's run post-revocation.
| tehlike wrote:
| Actually, I think it would not be a significant issue. At least for a range of situations.
| A static file with timestamp ranges could, in case of an issue, be served resource-efficiently to signal clients.
| qXlihgad7n wrote:
| Is there an RSS feed that you can subscribe to for alerts like this? I can't see one.
| thenewnewguy wrote:
| https://community.letsencrypt.org/c/incidents.rss perhaps?
|
| (Pro tip: you can append .rss to many pages on Discourse to get an RSS feed)
| ge0rg wrote:
| There is a list of all affected certificates posted under https://letsencrypt.org/caaproblem/ - and it looks like they are also leaking the account IDs in the list, so now you can map different domains/certificates to the account that got them issued.
| tialaramex wrote:
| Yeah, it does seem like it'd have been sensible _not_ to list the account ID in this file. It's convenient if you know your account ID and want to pull out just your certs, but for most people this associates all their certificates together.
|
| If you own both https://www.happy-rainbow-nursery.example/ and https://hardcore.bdsm-videos.example/ you probably go to some lengths to avoid visitors realising the connection. Nothing you're doing is illegal or even unethical - but it's obviously going to cause uncomfortable conversations, so why not avoid that altogether. Let's Encrypt aren't doing you a favour if tomorrow a mom at nursery says now she knows why you sound so much like Masked Mistress Martha...
| djsumdog wrote:
| Yea, this is really bad. I've done some searching of the data. Sometimes it doesn't matter. It looks like whoever is currently running gab.com is probably a big consulting company with like 100 other clients, so there's no big relation there. But if you run a small personal blog and use the same e-mail address for maybe more controversial sites that are hosted on different IPs, now you could get doxed.
|
| I'm guessing customer IDs are associated with e-mail addresses?
| This seems like a good case for using different e-mails for every cert. There are open source tools like anonaddy.com that you can host yourself or buy from them (they have a decent free tier).
|
| I feel like this list seriously needs to be pulled. There is some serious lack of oversight here.
| pfg wrote:
| > I'm guessing customer IDs are associated with e-mail addresses?
|
| They are (on Let's Encrypt's end), if an email address was provided.
|
| It's a 1:n relation; the same email may be used for any number of ACME accounts. Roughly speaking, for most clients, the ACME account maps to a specific ACME client on a specific host. If you run three servers with separate ACME clients, you're probably using three ACME accounts (even if you're using the same email and issuing certificates for the same domain).
|
| Large or custom implementations may reuse the same ACME account across many servers and domains. (Issuance would typically be centralized and operated as a separate system in these scenarios.)
| low_key wrote:
| PSA: The unboundtest checker doesn't seem to work if you have certificates issued for both ECC and RSA keys. For some of mine, it passes the check with status "OK" and shows the serial number of the certificate for the ECC key. The certificate that is going to be revoked is not shown.
| tialaramex wrote:
| If you have more than one certificate in use, regardless of what flavour, they only see one and assess that. Maybe the checker should emphasise that. For small users they probably only have one certificate in use, so this avoids some problems.
|
| The issue would probably also affect people who have geographically separate certificates, e.g.
| if you have two servers in different regions and decided, rather than make things more complicated for key distribution, you'll just have them each get their own certificates for the same name - that's totally fine with Let's Encrypt (it doesn't scale, but if you had 500 servers, not 2, you'd probably redesign everything) but obviously this test only sees one of those servers and won't check the other certificate.
|
| There's no way to know, given that two (or more) valid certificates exist for a name, and seeing one of them, whether the others are still actively used anywhere.
|
| It would obviously be pretty easy to build a web form where you can type in an FQDN and get told if any certificates matching that name will be revoked, but then you get false positives: it says yes, this certificate for some.name.example will be revoked, so you rush to replace your certificate for some.name.example, but maybe the one that will actually be revoked is from 20 December 2019, and you already got a newer one in February which was unaffected.
| namibj wrote:
| I wish they'd issue short-term scoped CAs under the same criteria as they currently use for wildcards.
|
| No significant load on their infrastructure, and you'd not have to break the "private keys don't move over _any_ network" rule.
| IceWreck wrote:
| Yes, I just got their mail. Four certs out of six, all issued at the same time, were affected. The other two were not.
| paulfurtado wrote:
| Does anyone know of a generic way to detect that letsencrypt will revoke a certificate soon?
|
| The goal would be to have our automation automatically rotate the certificates when similar issues occur in the future.
| pfg wrote:
| To my knowledge, there's no such mechanism in any of the relevant protocols (i.e. ACME and OCSP).
| kn100 wrote:
| Yeah, my domain was affected. Renewed; that would have sucked if I hadn't seen this!
| Also, I didn't get an email, and I'm pretty sure my certs were generated with my email!
| appleflaxen wrote:
| There are several links in this thread, but the following page allows you to enter your hostname and check online (no SSH, terminal access, etc.), and it's from the letsencrypt team (linked in the blog post):
|
| https://unboundtest.com/caaproblem.html
| zimpenfish wrote:
| Alas, it only works for things on port 443, which is a bit of a problem for most of my certificates...
| rswail wrote:
| This bug only affects you if you got domain validation (e.g. by dns-01) but didn't immediately issue a certificate.
|
| Letsencrypt validates the domain ownership for 30 days, so the bug allows you to issue a certificate within that window, even if you added a CAA record after validation that says "don't allow issue by letsencrypt" or "only allow issue by MyCA.example.com".
|
| But if you have everything automated, you're checking for renewal and issuing every day, and probably validating as part of that, so you're unlikely to encounter the bug - unless you validate in one step and then, sometime 8h+ later, issue a certificate.
| kn100 wrote:
| I don't think this is true - I don't recall using domain validation. I think it's more related to multi-domain certs.
| JeanMarcS wrote:
| Thank you. I was just about to run tests on all my clients' certs, but as I've never done that, it seems it's not useful.
|
| (Will still have a look, but less stressfully)
| mholt wrote:
| FWIW, I believe any Caddy sites will not be affected by this, since Caddy does not manage multi-SAN certificates. Even if it did, Caddy will immediately replace a certificate when it sees a Revoked OCSP response. So, if you're using Caddy, there's probably nothing you need to do. But if you are a Caddy user and are impacted, let me know.
| gindely wrote:
| If like me you have several hundred certificates to check, please do something like this:
|     cd somewhere-nice
|     wget https://d4twhgtvn0ff5.cloudfront.net/caa-rechecking-incident...
|     gunzip caa-rechecking-incident-affected-serials.txt.gz
|     for i in $(cat domains); do (openssl s_client -connect $i:443 -showcerts < /dev/null 2> /dev/null | openssl x509 -text -noout | grep -A 1 Serial\ Number | tr -d : | tail -n1) | tee serials/$i; done
|     cat serials/* | tr -d " " | sort | uniq > serials.collated
|     grep $( cat serials.collated | head -c-1 | tr "\n" "|" | sed -e 's/|/\\\|/g' ) ../caa-rechecking-incident-affected-serials.txt
|
| It will take a moment, and then it may tell you that letsencrypt misspoke when they said they sent emails to everyone whose contact details they have.
| Operyl wrote:
| I thought I was in the minority there! We have 45 certificates (of many more) that were affected, and our account id was listed, and it has an email contact associated. I got no email whatsoever, but I'm glad I had the foresight to check anyway.
| gindely wrote:
| I just noticed I got an email at 1949 UTC. I guess they're still sending them out. Presumably some people will receive their emails after the revocation.
| Operyl wrote:
| I spoke to someone from the team; they've got another 10% to go (presumably much less now). I finally got mine as well, and they're still coordinating to figure out the timeline to revoke. Presumably they'll wait for the emails first.
| b4d wrote:
| If anybody needs a bulk check:
|     for domain in $(cat domains.txt); do printf "$domain :" && curl -XPOST -d "fqdn=$domain" https://unboundtest.com/caaproblem/checkhost; done
| wbond wrote:
| Emailing users and giving them only 24 hours before their certs are revoked seems very unreasonable. Say you are down and out with a stomach bug or on holiday for a day or two.
| | My understanding is that the 90 day lifetime is largely because | revocation can be thwarted. Thus the practical difference between | 24 hours and one week is meaningful for server admins, but | inconsequential if someone is staging an attack. | thdrdt wrote: | Well it is very inconvenient, but isn't this the strength of | certificates? | | When something is wrong you can revoke them immediately. | | Why leave a potential vulnerability open for more than 24 | hours? | wbond wrote: | Because revoking them will cause interruption to legitimate | users, but doesn't stop an attack. | | I'm just starting to think LE is more aimed at large | organizations than people running smaller configurations. | Which is fine, thankfully we still have traditional CAs. I | just hope we don't devolve into a monoculture of ACME-only | SSL. | devrand wrote: | LE is actually following the rules outlined by the Baseline | Requirements. "Traditional CAs" have a tendency to just | ignore them when convenient. | | For example, Sectigo has misissued nearly every certificate | since 2002, including ~11 million unexpired ones (as of | December) and decided to just ignore their duty to revoke | misissued certificates [1]. | | Should the rules be changed? Maybe. However, when you're | giving an immense responsibility to CAs then public trust | is paramount. Ignoring agreed-upon rules whenever you find | it convenient does not inspire much confidence. | | [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1593776 | M2Ys4U wrote: | ACME-only isn't the problem, it's a Let's Encrypt | monoculture I'm concerned about. | | We could do with another LE-style service (or two) operated | independently (both organisationally and geopolitically). | ff317 wrote: | Yeah I'd love to see one or more additional free ACME | issuers that are largely functionally equivalent to LE, | but in a different jurisdiction and under different | management, with separate infrastructure, etc. 
| | One of the less-obvious reasons: for "serious" usage | where you're also stapling OCSP responses, there's a | dependency on the cert vendor's OCSP service. You can | cache the OCSP outputs to get through short windows of | unavailability, but if the vendor's OCSP goes offline for | days or suffers some serious incident, it pays to have | multiple vendors on hand. There was such an incident with | GlobalSign back in October 2016 (who's otherwise a pretty | decent vendor!), so it is a legitimate concern. | | For "serious" use-cases, you basically need redundant | live certs from redundant vendors, and not having a | second LE-like option means one of those is still a | legacy CA for now... | sigio wrote: | There's buypass.no / buypass.com ... they are a Norwegian | CA that also implements ACME. I have only used them for | some testing certificates so far, that have not been | deployed in the wild, but their server works, the certs | are valid in all browsers, and they do up to 6-month valid | certs iirc. | | link: https://community.buypass.com/ | folmar wrote: | I'm using it in production with no trouble at all - I | have one place where it's only possible to add SSL certs | through a GUI, so the shorter validity would be a dealbreaker. | wbond wrote: | My point there was more that automation in the cert | space could lead to traditional CAs leaving the space, in | which case small operators like myself (handful of minor | servers) would be forced down the automation route, which | isn't necessarily a net positive. | wbl wrote: | Absence of automation is why the CA death penalty is | applied so late, due to the consequent disruption. | Avamander wrote: | Blame other CAs for resting on their laurels and allowing | LE to steal their market share. | michaelbuckbee wrote: | LetsEncrypt has made the strongest headway in large | organizations with thousands of domains like Shopify, | Heroku, website builders, etc. 
as it hits a really sweet | spot of usability (controlling the host lets them approve | issuance), cost (free) and control (they can trigger mass | refreshes). | Santosh83 wrote: | I think it is more aimed at technically competent users, | regardless of organisation size. It is not, as it stands, | suitable for direct use by non-technical people who can | nevertheless follow step-by-step instructions to purchase | and install a certificate from the traditional CAs. Similar | 'hold my hand' tooling isn't there yet for LE. Nothing | about the protocol itself mandates such short validity | periods though I presume? | | Nevertheless technical people bemoan average users | clustering towards centralised web-hosts but forget the | reality that hosting a website from your own desktop or a | VPS is far from trivial even in 2020! | [deleted] | namibj wrote: | >Nothing about the protocol itself mandates such short | validity periods though I presume? | | Actually, revocation is broken. Which is a large part of | why LE uses 90 days. | pfg wrote: | The Baseline Requirements for publicly-trusted CAs (section | 4.9.1.1) require timely revocation of mis-issued certificates - | either 24 hours or 5 days depending on the reason. I'm not | entirely certain which is applicable here, but I'd assume Let's | Encrypt's hands are tied in this case. | wbond wrote: | That is a very useful bit of info. I guess if the | mis-issuance happened on Friday evening PT, then five days is | March 4th. | thenewnewguy wrote: | The misissuances have happened over the last several months | (since at least December 2019), but it does seem that it | was _discovered_ on Friday. | djsumdog wrote: | I'm glad they decided on the 24 hours, unlike CAs like Comodo | which really shouldn't still be a CA after all their fuckups. | tialaramex wrote: | Not thwarted exactly, but the problem is that the question "Is | this certificate still good?" has three possible answers: | | 1. "Yes, it's still good" | | 2. 
"No, it's revoked" | | 3. "There was a network problem so I'm not sure" | | Of course bad guys who know you'd get answer 2 can most likely | ensure you have answer 3 instead. So the only _safe_ thing to | do is treat 2 and 3 the same. If we 're not sure this | certificate is fine then it's not fine. But in practice answer | 3 is common anyway. For some users it may happen essentially | all the time. So browser vendors don't like to treat 2 and 3 | the same, even though that's the only safe option and _that_ | can thwart the effectiveness of revocation. | | There's definitely further opportunity for improved tooling | here. Perhaps this incident will drive it (Let's Encrypt's | sheer volume can help in this way). | willglynn wrote: | OCSP is a request/response protocol intended to answer | certificate validity questions. It works as you describe, and | failures cannot be treated as errors. An attacker who stole a | certificate can use it even after revocation by blocking | access the relevant OCSP responder. | | https://tools.ietf.org/html/rfc2560 | | OCSP stapling is a mechanism by which a TLS server can make | OCSP requests ahead of time and serve the response in-band. | TLS clients get a certificate signed by the CA as usual, as | well as a recent OCSP response signed by the CA attesting to | its continued validity. OCSP stapling allows TLS clients like | browsers to know a certificate's revocation status without | having to make an extra request, but it changes nothing for | an attacker who stole a certificate since they can simply not | use it. | | https://tools.ietf.org/html/rfc6066#section-8 | | OCSP Must Staple is an option that can be included on a | certificate stating "I promise to use OCSP stapling". An | attacker who stole a "must staple" certificate can either | include an OCSP response indicating the certificate is | revoked, or they can omit an OCSP response which the TLS | client will treat as a hard error. 
| | https://tools.ietf.org/html/rfc7633 | | In short, RFC 7633 makes certificate revocation work. Web | browsers and web servers support this today. If you use Let's | Encrypt's `certbot`, pass it `--must-staple`. | wbond wrote: | So in the case that revocation now works, why is there a | continued push to shorten certificate lifetimes? | pfg wrote: | Multiple reasons: | | 1. Firefox remains the only mainstream browser to support | OCSP Must Staple. | | 2. OCSP Must Staple does not cover all threat models: if | an attacker gains the ability to temporarily issue | certificates for the victim's domain (rather than | obtaining the private key of an existing certificate), | they can request a certificate without the OCSP Must | Staple extension. A more effective method would be | something like the Expect-Staple header[1] (in enforce | mode). | | 3. It allows the ecosystem to move significantly faster. | In a world where all certificates expire after 3 months, | phasing out insecure hash algorithms (in certificates) | would no longer take many years. | | 4. It encourages regular key rotation (even if it's not | enforced) | | [1]: https://scotthelme.co.uk/designing-a-new-security- | header-exp... | wbond wrote: | Items 3 and 4 seem like weak arguments. We are still | dealing with operating systems from 3+ years ago, so | moving below a 1 year certificate length wouldn't buy | much agility in terms of new algorithms. | pfg wrote: | Hash algorithms may not have been the best examples as | they require client support. | | A better example would be something like Certificate | Transparency. Currently, browsers may require Certificate | Transparency for certificates issued after a certain | date. A malicious or compromised CA may work around this | by backdating certificates. This would be less of an | issue with shorter certificate lifetimes. 
| tialaramex wrote: | If you have Must Staple but don't have monitoring in place | to detect that your OCSP responses are growing stale before | they expire (or worse, you use Apache HTTPD, which will | happily replace a GOOD OCSP response with a newer BAD one) | then you'd still be screwed here when Let's Encrypt revokes | certificates. | | You need _at least_ effective monitoring and a good OCSP | stapling implementation (IIS is supposedly pretty good at | this) or else stapling is sadly going to make life worse | for you, not better. | bobmaxup wrote: | Only 13 of the ~3,500 certificates I manage required renewal | vermontdevil wrote: | 3,500? wow. What are you managing if I may ask? | bobmaxup wrote: | A CMS platform with customer-provided domains. | low_key wrote: | Looks like LE will be adding to the billion certs they've issued! | | https://news.ycombinator.com/item?id=22434466 | rswail wrote: | It also only affects you if you are issuing the certificate for | more than one domain name, if I'm reading it right. | | What's supposed to happen: for each fqdn in the request: if the | challenge succeeds (eg dns-01), check whether a CAA record exists | for that fqdn; if it doesn't, or it does and allows issuance, | issue the certificate. | | In the step on "check whether caa record exists", instead of | using the domain name that is being issued in this loop, it uses | the first one it found (or one of them, it's unclear which one). | So theoretically, if you wanted a cert for: | domain1.example.com, domain2.example.com | | and you had a CAA record for domain1 that allowed letsencrypt but | then a different CAA record was added between the CAA check on | domain1 and the CAA check on domain2 (which wouldn't happen | because of the bug) you could get a cert for domain2 that the CAA | record said not to issue. | captncraig wrote: | Kinda frustrating having certs revoked when there have never | been CAA records for any names involved. 
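The per-name flow rswail describes can be sketched with a stub standing in for the real CAA lookup. Everything here (the `check_caa` function, the hostnames) is made up for illustration; a real implementation would perform a DNS CAA query for each exact fqdn:

```shell
# Sketch of the intended flow: re-check CAA for *each* name being issued,
# not just the first. check_caa is a stub; a real check would do a DNS CAA
# lookup for the given fqdn.
check_caa() {
  # Stub: pretend a CAA record forbids issuance for one particular name.
  [ "$1" != "blocked.example.com" ]
}

for fqdn in domain1.example.com blocked.example.com; do
  if check_caa "$fqdn"; then
    echo "issue for $fqdn"
  else
    echo "refuse $fqdn"
  fi
done
# -> issue for domain1.example.com
# -> refuse blocked.example.com
```

In these terms, the bug amounted to calling the check once and reusing that one answer for every name in the loop, so a name whose CAA record forbade issuance could slip through.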
I know they can't know | that to be true historically, but I wish they could do some | additional filtering. | ck2 wrote: | That thread enlightened me to a great trick to force cert renewal | even if it's been done too recently: add a second (sub)domain and | make a new cert with both. | donatj wrote: | Can we get a "(some)" for the title? | rb808 wrote: | Certificates are such a huge maintenance problem. So many mines | waiting to blow up. We really need something better. | karimmaassen wrote: | Yes, let's. | https://www.reddit.com/r/ProgrammerHumor/comments/7x2ugb/let... | terom wrote: | Here's some quick&dirty stats from the list of revoked | certificates: | https://gist.github.com/SpComb/6338facd12e020ec4fe561ca91f32... | | There are 3M "missing CAA checking results" in total, of which 2M | are dated from 2020 and 1M from last month. FWIW the only certs | of mine affected were old certs from 2019-12 which had since | already been renewed in Feb, and the renewed certs are not | affected? | | The largest account has 445k certs revoked, and the most revoked | certs from last month (most likely to still be in active use?) is | 43k for a single account. I hope your rate-limits are in order if | you're going to start reissuing all of those before midnight :/ | | BTW account number 131 at the top of the file seems to mostly be | akamaiedge.net sites :) ___________________________________________________________________ (page generated 2020-03-03 23:00 UTC)