[HN Gopher] 1.1.1.1 lookup failures on October 4th, 2023
___________________________________________________________________
Author : todsacerdoti
Score  : 64 points
Date   : 2023-10-04 19:41 UTC (3 hours ago)
Link   : blog.cloudflare.com

| homero wrote:
| This got me. I spent an hour trying to figure out why my Internet seemingly went down but not fully.
|
| throwaway67743 wrote:
| [flagged]
|
| jarym wrote:
| Up until a few months ago the HN crowd loved Cloudflare. How sentiment has changed in such a short period.
|
| My guess would be their weird 'site protection' stuff is burning too many people and negatively impacting their reputation.
|
| kkielhofner wrote:
| > My guess would be their weird 'site protection' stuff is burning too many people and negatively impacting their reputation.
|
| What's always been interesting to me about this take is that it's not as though Cloudflare is randomly inserting itself into internet traffic.
|
| Cloudflare customers have choice in the marketplace and they chose Cloudflare for whatever reasons. If end-users take issue with accessing the site of a Cloudflare customer, they should take it up with the owners of the site that chose Cloudflare. Theoretically the Cloudflare customer would take it up with them if it becomes problematic. Cloudflare has no obligation to the site's end-users other than meeting the needs of their customer, who does have an obligation to their end-users (theoretically).
|
| Cloudflare is, ostensibly, providing a solution for their customers. How that impacts their customers' end-users is between Cloudflare and the customer.
|
| reaperman wrote:
| In general, I always seem to find that comments along these lines are very easy to thoroughly disprove.
| There has been consistent criticism of Cloudflare for many years, ever since the majority of web traffic started going through their anti-DDOS and anti-bot gateways.
|
| Here's an HN post with lots of very critical comments[0] from 7 years ago, including a fairly scathing one from 'tptacek. Even way back then, you'd get the same comments you hear today, like:
|
| > So rather than demand fixes for the fundamental issues that enable ddos attacks (preventing IP spoofing, allowing infected computers to remain connected, etc), we just continue down this path of massive centralization of services into a few big players that can afford the arms race against botnets. Using services like Cloudflare as a 'fix' is wrecking the decentralized principles of the Internet. At that point we might as well just write all apps as Facebook widgets.
|
| 0: https://news.ycombinator.com/item?id=13718947
|
| throwaway67743 wrote:
| I've never loved cloudflare - as someone doing this long before they existed, I see through their wordy blog posts about rookie mistakes. It's embarrassing really.
|
| Eduard wrote:
| Maybe to compensate for Cloudflare's success blog posts, where they usually represent themselves as the saviors of the world.
|
| throwaway67743 wrote:
| Quite. Nobody else can do what they do! (Brb doing the same thing before Prince was even born)
|
| kkielhofner wrote:
| This is peak HN comment.
|
| 300 PoPs around the world delivering 210 Tbps of capacity, mitigation of some of the largest DDoS attacks in history, 20% of internet traffic. Workers, Pages, R2, D1, Zero Trust, Stream, Images, Warp, 1.1.1.1, etc, etc, etc - all at incredible scale.
|
| But yes, of course you have been doing the exact same thing since before Prince was born.
|
| throwaway67743 wrote:
| People had global networks of the same scale long before; they just didn't offer the same features because they had different products.
|
| Zambyte wrote:
| I would rather they be open about their failures than deceptive about them. Of course simply not failing would be ideal, but we don't live in a perfect world. If a single, external point of failure causes your system to crumble, that's a design problem, not a dependency problem.
|
| reaperman wrote:
| To your point, Cloudflare leadership are pretty active on HN. They generally do a pretty good job of providing detailed explanations to good-faith questions here and providing decent post-mortems of major incidents to the HN community.
|
| They do take care to avoid engaging with people who are opposed to their dominance on ideological levels ("no one should be the gatekeeper for that much of the internet", etc), and there is a small handful of questions they seem to avoid (e.g. direct feature-to-feature comparisons between Warp and Mullvad).
|
| throwaway67743 wrote:
| They use transparency as a cover for rookie mistakes; it's not the same as actual transparency. Especially as these are really bad examples of doing it wrong.
|
| aftbit wrote:
| They're practicing "just culture" (as in justice), which rewards explaining and root-causing your failures, and rejects the concept that "someone sucks" in favor of "systems can always be improved".
|
| LeoPanthera wrote:
| Did 1.0.0.1 also go down? The article doesn't say.
|
| homero wrote:
| Of course it did; it's the same service.
|
| toast0 wrote:
| A highly reliable service might run one partition on a completely separate serving stack. It's worth asking.
|
| morugam wrote:
| We noticed this through our own homegrown scripts that check for this, having been screwed by an outage a few years ago. I'm happy they so quickly acknowledge and explain these issues. Good work!
|
| suprjami wrote:
| Strangely I noticed this because some parts of eBay stopped loading.
| I spent a while troubleshooting my privacy/adblock nonsense because _surely CloudFlare couldn't be down_, but that's the only conclusion I could come to.
|
| tedunangst wrote:
| > Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.
|
| Ironic.
|
| denysvitali wrote:
| My only concern:
|
| 7:57 UTC: first reports coming in
|
| I noticed this issue quite quickly ("reported" at 7:54 UTC [1]), and I noticed I wasn't alone thanks to Twitter / X. I tried to get in touch with Cloudflare to report this issue, but I haven't found any meaningful contact other than Twitter.
|
| For such an important service, I'm impressed there is no contact email / form where you can get in touch with the engineers responsible for keeping the service up and running.
|
| Other than that, kudos for the well-written blog post - as always!
|
| [1]: https://nitter.net/DenysVitali/status/1709476961523835246
|
| araes wrote:
| I like how it's a 42 joke.
|
| 4(0b10) 7:00 ends at 11:02 (4 hr 2 min) on a 4 sum 2x2. And refs to 1.1.1.1 vs 1.0.0.1
|
| robhlt wrote:
| The lack of additional alerts in the Remediation section is a little bit concerning. Adding an alert for serving stale root zone data is great, but I think a few more would be very useful too:
|
| - There's a clear uptick in SERVFAIL responses at 7:00 UTC, but they don't start their response until an hour later, after receiving external reports. This uptick should have automatically triggered an alert. It can't have been within the normal range, because they got customer reports about it.
|
| - The resolver failed to load the root zone data on startup and resorted to a fallback path. Even if this isn't an error for the resolver, it should still be an alert for the static_zone service, because its only client is failing to consume its data.
|
| - The static_zone service should also alert when some percentage of instances fail to parse the root zone data, to get ahead of potential problems before the existing data becomes stale.
|
| ChrisArchitect wrote:
| Earlier discussion while outage was active: https://news.ycombinator.com/item?id=37763143
___________________________________________________________________
(page generated 2023-10-04 23:00 UTC)
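robhlt's alert suggestions translate fairly directly into threshold rules. The Python sketch below is a hypothetical illustration, not Cloudflare's monitoring: the `ResolverSnapshot` and `StaticZoneFleet` types, the `evaluate_alerts` function, and every threshold (SERVFAIL ratio at 3x a baseline, more than 10% of static_zone instances failing to parse, a two-day staleness budget for the root zone copy) are assumptions chosen only to show the shape of each rule.

```python
# Sketch of robhlt's proposed alerts as simple threshold rules.
# Hypothetical throughout: metric names, types, and thresholds are
# assumptions, not Cloudflare's real monitoring.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ResolverSnapshot:
    total_queries: int             # queries answered in the measurement window
    servfail_responses: int        # how many of those returned SERVFAIL
    used_root_zone_fallback: bool  # resolver could not load the static root zone and fell back
    root_zone_loaded_at: datetime  # when the resolver's root zone copy was produced


@dataclass
class StaticZoneFleet:
    instances: int       # static_zone instances reporting
    parse_failures: int  # instances that failed to parse the latest root zone


def servfail_ratio(snap: ResolverSnapshot) -> float:
    """Fraction of queries answered with SERVFAIL (0.0 when there was no traffic)."""
    if snap.total_queries == 0:
        return 0.0
    return snap.servfail_responses / snap.total_queries


def evaluate_alerts(
    snap: ResolverSnapshot,
    fleet: StaticZoneFleet,
    baseline_servfail_ratio: float,
    now: datetime,
    spike_factor: float = 3.0,                    # assumed: alert at 3x baseline
    max_parse_failure_pct: float = 10.0,          # assumed tolerance
    max_zone_age: timedelta = timedelta(days=2),  # assumed staleness budget
) -> list[str]:
    """Return the names of the alerts that should fire, in a fixed order."""
    alerts: list[str] = []

    # 1. SERVFAIL uptick well above the normal range.
    if servfail_ratio(snap) > spike_factor * baseline_servfail_ratio:
        alerts.append("servfail_spike")

    # 2. The static_zone service's only client failed to consume its data.
    if snap.used_root_zone_fallback:
        alerts.append("root_zone_fallback")

    # 3. Too many static_zone instances failing to parse the root zone.
    if fleet.instances > 0:
        failure_pct = 100.0 * fleet.parse_failures / fleet.instances
        if failure_pct > max_parse_failure_pct:
            alerts.append("static_zone_parse_failures")

    # The alert the post-mortem already proposes: stale root zone data.
    if now - snap.root_zone_loaded_at > max_zone_age:
        alerts.append("stale_root_zone")

    return alerts


if __name__ == "__main__":
    # Made-up numbers for illustration only, not figures from the incident.
    now = datetime(2023, 10, 4, 8, 0, tzinfo=timezone.utc)
    snap = ResolverSnapshot(10_000, 900, True, now - timedelta(days=3))
    fleet = StaticZoneFleet(instances=20, parse_failures=20)
    print(evaluate_alerts(snap, fleet, baseline_servfail_ratio=0.01, now=now))
```

A fixed multiple of a baseline is the simplest possible spike rule; a production setup would more plausibly compare against a rolling window per location, but the point robhlt makes holds either way: each of these conditions is cheap to detect before customers start reporting it.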