[HN Gopher] Last week's Let's Encrypt downtime
___________________________________________________________________
Last week's Let's Encrypt downtime

Author : agwa
Score  : 150 points
Date   : 2023-06-22 14:42 UTC (8 hours ago)

(HTM) web link (www.agwa.name)
(TXT) w3m dump (www.agwa.name)

| AdamJacobMuller wrote:
| Did we kill crt.sh?
|
|     FATAL: terminating connection due to conflict with recovery
|     DETAIL: User query might have needed to see row versions that
|       must be removed.
|     CONTEXT: SQL statement "SELECT c.ID,
|         x509_print(c.CERTIFICATE, NULL, 196608),
|         ca.ID, cac.CA_ID,
|         digest(c.CERTIFICATE, 'sha1'::text),
|         digest(c.CERTIFICATE, 'sha256'::text),
|         x509_serialNumber(c.CERTIFICATE),
|         digest(x509_publicKey(c.CERTIFICATE), 'sha256'::text),
|         x509_rsamodulus(c.CERTIFICATE),
|         x509_hasROCAFingerprint(c.CERTIFICATE),
|         x509_hasClosePrimes(c.CERTIFICATE), c.CERTIFICATE
|       FROM certificate c
|         LEFT OUTER JOIN ca ON (c.ISSUER_CA_ID = ca.ID)
|         LEFT OUTER JOIN ca_certificate cac ON (c.ID = cac.CERTIFICATE_ID)
|       WHERE digest(c.CERTIFICATE, 'sha256') = t_bytea"
|     PL/pgSQL function web_apis(text,text[],text[]) line 1757 at SQL statement
|     ERROR: server conn crashed?
|     server closed the connection unexpectedly
|       This probably means the server terminated abnormally
|       before or while processing the request.
|
| and now it's just a 502 error!
|
| agwa wrote:
| Unfortunately, crt.sh is chronically overloaded.
|
| AdamJacobMuller wrote:
| I've never seen it happen before, but, you would know better!
|
| mjw1007 wrote:
| > these certificates were already being rejected by Chrome and
| Safari for having invalid SCTs
|
| What's a good way to make an equivalent check from a script, if I
| want (in future) to be able to check whether I have a website
| whose certificate has such a problem?
|
| agwa wrote:
| Excellent question! The sctcheck command from
| https://github.com/google/certificate-transparency-go/ can be
| used to check the signatures of the embedded SCTs in a
| certificate.
|
| I've also got an online tool which you can use to test a site
| for CT policy compliance:
| https://sslmate.com/labs/ct_policy_analyzer/
|
| Example of a working site:
| https://sslmate.com/labs/ct_policy_analyzer/?sslmate.com
|
| Example of one of the sites affected by the Let's Encrypt
| incident:
| https://sslmate.com/labs/ct_policy_analyzer/?thecandyshake.c...
|
| jimmyl02 wrote:
| This is a great writeup and intro to certificate transparency
| overall. Glad to see that certificate authorities are being held
| accountable, and to learn more about how it's done!
|
| jrpelkonen wrote:
| > I find it alarming that a week after the incident, 40% of the
| affected certificates are still in use, despite being rejected by
| the most popular browsers and despite affected subscribers being
| emailed by Let's Encrypt.
|
| This is perhaps a consequence of how well-oiled a machine LE
| typically is: people stop paying attention to it.
|
| tredre3 wrote:
| That is true but how come certbot had no awareness of
| revoked/withdrawn certificates before now? It seems like one of
| the things a CA is supposed to solve for you, and the fact that
| it doesn't is a bit alarming in itself.
|
| Though, as the following sentence points out, they were already
| working on it before the outage, so clearly they knew it was
| needed.
|
| 1. https://datatracker.ietf.org/doc/draft-ietf-acme-ari/
|
| 411111111111111 wrote:
| The CA can't solve it for you.
|
| The certificate authority signs certificate requests,
| creating certificates. The revocation process is necessary as
| well, but the CA doesn't have the ability to change the
| already issued certificate, thus it cannot take action.
|
| Software like certbot can solve it for you, but that's not
| affiliated with your CA.
|
| agwa wrote:
| The CA is part of the solution by using ARI to inform ACME
| clients to replace impacted certificates.
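
To make the ARI idea concrete, here is a minimal sketch of how an ACME client might act on a renewal-info response. It assumes the response shape described in the ARI draft linked above (a `suggestedWindow` object with RFC 3339 `start` and `end` timestamps) and the draft's suggestion of picking a random instant inside that window. The function names and the sample timestamps are hypothetical, and fetching the response from the CA is left out.

```python
import random
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def should_renew_now(renewal_info: dict, now: datetime, rng: random.Random) -> bool:
    """Return True if the client should renew immediately (hypothetical helper)."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    # Pick a uniformly random instant inside the window so that many clients
    # polling the same CA do not all renew at the same moment.
    chosen = start + (end - start) * rng.random()
    # A window that lies entirely in the past always yields True, which is
    # how a CA can signal "replace this certificate now".
    return chosen <= now


# Made-up example data: a window that has already passed.
incident = {"suggestedWindow": {"start": "2023-06-01T00:00:00Z",
                                "end": "2023-06-01T01:00:00Z"}}
now = datetime(2023, 6, 16, tzinfo=timezone.utc)
print(should_renew_now(incident, now, random.Random()))
```

A client would typically poll the renewal-info endpoint periodically and run a check like this on each response, which is what lets the CA pull renewals forward without revoking first.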
|
| mcpherrinm wrote:
| Even before ARI, some integrated ACME/Web servers use
| OCSP as a way of knowing to renew if a cert was revoked.
| Plus if you're doing that you can pin the OCSP response
| while you're at it.
|
| 411111111111111 wrote:
| My point was that the CA can't solve it for you, they can
| only give you APIs and processes with which you can solve
| it yourself.
|
| If your webserver supports checking the certificate
| validity then it's not solved by the CA, it's been solved
| by the developers of that software and by you installing
| it.
|
| tialaramex wrote:
| I haven't looked at a list of revoked certificates, because I
| was busy, (and I no longer operate my own CT auditing software,
| so I'd have to poke around in crt.sh which is not much fun) but
| let's suppose these are a random sample of Let's Encrypt's ~2
| million issuances per day.
|
| What %age of the world's HTTPS web sites are "parked" and so
| there is nobody who expects them to actually work?
| BrandFromATVShow.example? TeenDanceISawOnTikTok.example?
| SomeShortEnglishWord.example? Nobody cares, if they do visit,
| and there's a certificate failure, they realise that's not
| where they meant to go and leave.
|
| Then what %age are somebody's fever dream / retirement plan /
| abandoned start-up idea and so although the owner may notice
| _eventually_ that it's broken, that might not happen before
| automatic renewal "fixes" the problem anyway if ever.
| MyTownOlympicSwimmingPool.example JimAndBethsCakeShop.example
| and LikeAWSForDogsSomehow.example
|
| And then how about all the outfits which folded weeks, months,
| even in some cases years ago, but the ISP bill was paid, so,
| the web site continues to exist until somebody removes it, but
| of course nobody cares?
| BoughtByGoogle.example and
| YetAnotherBayAreaCryptoStartup.example together with
| DefinitelyViableProduct.example and
| OopsWalmartAlreadySellsThatForLessMoney.example
|
| If it was 95% I'd be more worried, at 40% I'd need to actually
| check at least a decent sample and see for myself. In the time
| I was writing this post I checked one, it wasn't replaced...
| exactly, because the actual web site uses a certificate issued
| five days earlier. Chances are they've got a bunch of duplicate
| certificates, so the fact that some they don't use are broken
| has never come up - that's just rude (wastes other people's
| resources) but it works fine technically.
|
| tedunangst wrote:
| Renewing a cert without immediately deploying it seems like a
| reasonable practice in the face of CAs that will misissue
| through no fault of your own.
|
| schoen wrote:
| When we wrote Certbot, we thought (by analogy with prior
| practice) that many sysadmins would want to manually
| inspect certificates before deploying them! That's one
| reason that we kept old certificates around and used a
| symlink-updating system.
|
| As it turned out, misissued and invalid certs account for
| an incredibly small fraction of Let's Encrypt's issuance
| volume (I'm going to say < 1/10^8 offhand?) and manual
| inspection kind of gets in the way of automation, so the
| idea of separating these steps has come to seem kind of
| quaint, for me at least. I've also helped thousands of
| people on the Let's Encrypt forum and I think at most 2
| have said they were interested in looking at their new
| certs' contents before starting to use them.
|
| tedunangst wrote:
| I may not inspect it myself (which wouldn't even catch
| this issue), but letting it simmer for a week isn't hard.
|
| agwa wrote:
| That's a pretty good idea, and would also mitigate
| clients with slow clocks rejecting a certificate for not
| being valid yet.
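
The "let it simmer" approach boils down to an age check before switching certificates. A minimal sketch, assuming the deploy script already knows the new certificate's notBefore and the old certificate's notAfter; the function name, the seven-day soak period, and the two-day safety margin are illustrative choices, not behaviour of Certbot or any other client.

```python
from datetime import datetime, timedelta, timezone


def pick_certificate(old_not_after: datetime,
                     new_not_before: datetime,
                     now: datetime,
                     soak: timedelta = timedelta(days=7),
                     margin: timedelta = timedelta(days=2)) -> str:
    """Return "new" or "old" depending on which certificate to serve."""
    # The renewed certificate has aged long enough that early problem
    # reports (and slow client clocks) are no longer a concern.
    soaked = now - new_not_before >= soak
    # Never let soaking cause an outage: switch anyway when the old
    # certificate is about to expire.
    old_nearly_expired = old_not_after - now <= margin
    return "new" if soaked or old_nearly_expired else "old"


# Made-up example dates.
now = datetime(2023, 6, 22, tzinfo=timezone.utc)
print(pick_certificate(old_not_after=datetime(2023, 7, 20, tzinfo=timezone.utc),
                       new_not_before=datetime(2023, 6, 20, tzinfo=timezone.utc),
                       now=now))
```

The trade-off is that renewal has to start at least one soak period earlier than it otherwise would.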
|
| NovemberWhiskey wrote:
| Based on my experience, the capability model for certificate
| management usually went like:
|
| 1) Chaos: certificates requested and installed manually, either
| in response to incidents caused by expiration or calendar
| reminders
|
| 2) Monitoring: certificates requested and installed manually,
| in response to noisy alerting by probers looking for
| indications of pending expiration or other ill-health
|
| 3) Automation: continuous certificate provisioning,
| distribution and enablement either through platform or
| integration
|
| The Let's Encrypt revolution has taken a lot of people from
| stage 1 to stage 3 without stage 2 in between.
|
| hinkley wrote:
| Vernor Vinge has dominated the Singularity space in science
| fiction pretty much from the beginning of the concept.
|
| Rainbow's End plays around in a time frame right around where we
| are now, just a bit before the sorts of doglegs we predict
| would presage a Singularity in your lifetime.
|
| At one point the protagonists need to attack a bad actor, and
| to make it work they need chaos on the internet. I don't recall
| exactly how this plays out, but the way they decide to achieve
| it is that one of the collaborators believes that they can
| reject a CA cert that affects 10% of all certificates in the
| wild, and the resulting pandemonium will give them
| approximately the sort of chaos they need.
|
| Sounds to me like maybe that is either no longer true, or never
| was.
|
| tialaramex wrote:
| [Spoilers]
|
| They don't need Chaos. They want to disable Rabbit, and they
| know Rabbit's certificates mostly tie back to a single CA,
| Credit Suisse. So they "revoke" Credit Suisse and accept the
| consequences, which (they acknowledge) are career ending for
| the Europeans. This is mostly a plot convenience because
| Rabbit is much too powerful to allow what Vinge wants to
| happen next.
|
| No, you can't actually "revoke" a root CA, the decision to
| trust (or not) a root is local.
| So this part of the novel is a fantasy. But even if you assume
| it means that the European authorities can somehow reach into
| Credit Suisse and cause it to revoke all the intermediates
| (which _maybe_ is a plausible reading) and so on down to end
| entity certificates, that doesn't really work either. Not on
| the time scale Vinge needs for the novel.
|
| Hours are conceivable but unlikely. Days maybe. A week. But
| the novel needs it to be seconds.
|
| There are two big obstacles to even the revocation which does
| really exist. Firstly humans are _much_ more enthusiastic
| about seeing Dancing Pigs than they are about safety, because
| safety is a very abstract idea, whereas seeing dancing pigs
| is an immediate reward. This is the Dancing Pigs problem, and
| we've put some effort in, it's _less_ likely a random Chrome
| user would get their face ripped to pieces because they
| wanted Dancing Pigs and so they bypassed the security checks
| that would have protected them than, say, fifteen years ago,
| but only somewhat.
|
| Secondly though, there's not a great enthusiasm technically
| for this sort of counter-measure. It's so rarely beneficial
| in practice. Most of the time those humans were right, we
| were just denying them Dancing Pigs. Their face _might_ get
| ripped to pieces, but to be honest it's as likely to be
| because they deliberately went to "Rip My Face To
| Pieces.example" as through anything we could have prevented.
| This is only barely a technical problem. So, when there are
| things we could do to get closer to what's in the novel, why
| would we?
|
| Building the PKI which exists in Vinge's novel is probably a
| bad expenditure of resources.
|
| francislavoie wrote:
| FWIW, if those websites used Caddy as their ACME client, then it
| would have detected the certificate being revoked as soon as
| possible via OCSP stapling and would have had the certificate
| renewed.
| It's a shame that other ACME clients aren't as robust to
| problems like this. (Disclaimer: I work on Caddy as a volunteer)
|
| agwa wrote:
| Note that the certificates were not revoked until 2023-06-19 at
| 18:00. In contrast, ARI was updated on 2023-06-15 at 22:43 to
| tell ARI-supporting clients (such as lego) to renew
| immediately. That means Caddy served broken certificates for
| almost 4 days longer than necessary.
|
| Are there plans for Caddy to support ARI?
|
| mholt wrote:
| > That means Caddy served broken certificates for almost 4
| days longer than necessary.
|
| This would be news to me. Do you have a source for Caddy
| serving any of the affected certificates? I'd like as much
| info as possible.
|
| > Are there plans for Caddy to support ARI?
|
| If ARI can be made into an effective mechanism, then yes.
| ACMEz already supports the current draft.
|
| I know Francis linked to a forum category, here's some more
| specific links for background:
|
| - https://community.letsencrypt.org/t/can-ari-conforming-clien...
|
| - https://community.letsencrypt.org/t/thoughts-from-starting-t...
|
| agwa wrote:
| _> This would be news to me. Do you have a source for Caddy
| serving any of the affected certificates? I'd like as much
| info as possible._
|
| That's news to you? I informed you last week that Caddy
| would serve broken certificates in this situation:
| https://news.ycombinator.com/item?id=36344549
|
| I omitted "would" from my previous comment, but I think
| it's pretty clear from Francis' comment that we're
| discussing a hypothetical situation, and neither of us knows
| if any of the 645 affected certificates were requested by
| Caddy or not.
|
| I skimmed the forum links (it would be productive if you
| could send an email summarizing your thoughts to the IETF
| ACME WG) and it seems like your complaints could also be
| said of OCSP so it's hard to figure out why OCSP is OK for
| Caddy but ARI isn't.
|
| FWIW, there's currently a ballot in the CABF which would
| make OCSP optional for CAs, so OCSP may be on the way out
| in the WebPKI.
|
| mholt wrote:
| You said:
|
| > Caddy served broken certificates
|
| So yes, that would be news to me. I'm asking for more
| information. If Caddy did not serve broken certificates,
| then I would appreciate clarification there so I know
| where to spend my energy.
|
| > (it would be productive if you could send a email
| summarizing your thoughts to the IETF ACME WG)
|
| I did this once and it was like talking into a black
| hole. All the responses I got to the issue I brought up
| were laced with complacency.
|
| > I skimmed the forum links and it seems like your
| complaints could also be said of OCSP so it's hard to
| figure out why OCSP is OK for Caddy but ARI isn't.
|
| Because OCSP does what it's intended to do. ARI does not.
|
| > FWIW, there's currently a ballot in the CABF which
| would make OCSP optional for CAs, so OCSP may be on the
| way out in the WebPKI.
|
| I am tracking that proposal and get daily notifications.
| It is only for short-lived certs. I would be thrilled if
| we could replace revocation -- and OCSP -- with
| short-lived certs.
|
| agwa wrote:
| _> So yes, that would be news to me. I'm asking for more
| information. If Caddy did not serve broken certificates,
| then I would appreciate clarification there so I know
| where to spend my energy._
|
| This is not engaging in good faith.
|
| _> I am tracking that proposal and get daily
| notifications. It is only for short-lived certs._
|
| It would make OCSP optional for all certificates. CRLs
| would be optional only for short-lived certs.
|
| mholt wrote:
| > This is not engaging in good faith.
|
| Sorry, come again? Why so combative?
|
| francislavoie wrote:
| > Note that the certificates were not revoked until
| 2023-06-19 at 18:00.
|
| Ah okay, I missed that.
|
| > Are there plans for Caddy to support ARI?
|
| It's... complicated.
| Matt argues that ARI does not make sense for a variety of
| reasons. You can find the complex and deep discussions about
| it on the LE forums. Do a Ctrl+F for ARI in
| https://community.letsencrypt.org/c/client-dev/14 to find
| them, there's a lot.
|
| ElongatedMusket wrote:
| Thanks for following through on this writeup! I knew LE certs
| were publicly logged but didn't know the logs were decentralized
| or how they hold the CA accountable. Appreciate the layman
| explanation.
|
| fruitreunion1 wrote:
| Will non-browser clients like curl/requests ever support checking
| CT logs? It's great that some browsers have it, but browsers are
| not the only clients using TLS with CAs. Also doesn't help that a
| lot of software can't use CA root stores with much granularity:
| https://news.ycombinator.com/item?id=33876949
|
| agwa wrote:
| Hopefully, although there are challenges to overcome. CT is a
| fast-moving ecosystem, with logs coming and going, and policies
| changing regularly. This requires CT-enforcing clients to be
| very on-the-ball with updates, both in the sense that the
| developers need to pay attention and update their code in time,
| and any users of the apps need to upgrade frequently. Browser
| makers can handle this because they are competently-staffed and
| well-resourced. The authors of non-browser apps need to know
| what they're getting into.
|
| A cautionary tale: there is a library for adding CT enforcement
| to Android apps. Earlier this year, every app using this
| library was suddenly unable to establish any TLS connections
| because Google stopped publishing a JSON file which the library
| should never have been consuming in the first place. There was
| plenty of warning that this would happen, but the author of the
| library was not on-the-ball.
| https://groups.google.com/g/certificate-transparency/c/38Lr9...
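
One way a non-browser client could limit the damage from a stale log list is to stop enforcing CT once its log data is too old, instead of failing every connection. The sketch below is only an illustration under stated assumptions: the 70-day cutoff, the `Sct` record, the `log_operators` mapping, and the "SCTs from at least two distinct operators" rule are placeholders, not any browser's or library's actual policy, and SCT signature verification is assumed to have been done elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Sct:
    log_id: str         # ID of the log that issued this SCT
    signature_ok: bool  # outcome of a signature check done elsewhere


def ct_decision(scts, log_operators, list_timestamp, now,
                max_list_age=timedelta(days=70), min_operators=2):
    """Return "accept", "reject", or "skip" (hypothetical policy)."""
    # Fail open when the log list is stale: logs come and go, so an old
    # list says little about which SCTs should count today.
    if now - list_timestamp > max_list_age:
        return "skip"
    # Count distinct operators among valid SCTs from logs we know about.
    operators = {
        log_operators[s.log_id]
        for s in scts
        if s.signature_ok and s.log_id in log_operators
    }
    return "accept" if len(operators) >= min_operators else "reject"


# Made-up example data.
logs = {"log-a": "Operator A", "log-b": "Operator B"}
now = datetime(2023, 6, 22, tzinfo=timezone.utc)
print(ct_decision([Sct("log-a", True), Sct("log-b", True)],
                  logs, now - timedelta(days=3), now))
```

Failing open trades security for availability, so whether it is acceptable depends on the application; the point is that the stale-data case needs an explicit decision either way.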
|
| NovemberWhiskey wrote:
| The elephant in the room is that TLS implementations for
| browsers and those in the libraries of common programming
| languages have diverged really substantially: Web PKI is
| massively more restrictive and depends on a bunch of technology
| that's not in the baseline PKI.
___________________________________________________________________
(page generated 2023-06-22 23:00 UTC)