[HN Gopher] Graceful behavior at capacity
___________________________________________________________________

  Graceful behavior at capacity

  Author : ingve
  Score  : 57 points
  Date   : 2023-08-08 06:41 UTC (1 day ago)

  (HTM) web link (blog.nelhage.com)
  (TXT) w3m dump (blog.nelhage.com)

  | ChrisMarshallNY wrote:
  | I'm writing an app that is likely to have, at most, a thousand
  | users for the next couple of years.
  |
  | I'm testing it with 12,000 fake users.
  |
  | It works great.

  | rdoherty wrote:
  | I learned most of this the hard way as an SRE. How systems
  | behave at and over their limits is far more important than how
  | they behave under them. A system that is "forgiving" (aka
  | resilient) is worth its weight in gold. Otherwise you get into
  | downward spirals with systems that can't recover unless they
  | are rebooted. Great read!

  | thewakalix wrote:
  | From my armchair, I'm not sure that "random drop" actually does
  | decrease latency. Most clients will just repeat the request,
  | resulting in an "effective latency" of however many times it
  | gets randomly dropped. The queue is now implicit, and I'd guess
  | that it's less efficient to carry out several request/drop
  | cycles than to just leave the client in a straightforward
  | queue.

  | notacoward wrote:
  | FWIW, the single paragraph about "fair allocation" could be its
  | own thesis. This gets into quality of service, active queue
  | management, leaky buckets, deficit round robin, and so on _ad
  | infinitum_. I did quite a bit of work on this on multiple
  | projects at multiple companies, and it's one of the very few
  | algorithmic areas that I still think about in retirement. I
  | highly recommend following up on some of the terms above for
  | some interesting explorations.

  | tra3 wrote:
  | Most of the microservice code I see is
  |
  |     response = fetch(url, payload)
  |     if (response.error) ...
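The snippet above handles the error case but never bounds how long the call can take. A minimal sketch of the fix, assuming Python and using a thread-pool future as a stand-in for an HTTP client's timeout support (real clients such as requests accept a `timeout` argument directly; the names and durations here are invented for illustration):

```python
import concurrent.futures
import time

def fetch_with_timeout(fetch, timeout_s):
    """Run `fetch` but give up after `timeout_s` seconds.

    The failure mode described above: a call that *succeeds* after
    10 seconds still ties up a worker and cascades upstream, even
    though `response.error` never fires. Bounding the wait turns a
    slow dependency into a fast, explicit failure.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # Fail fast; the caller can retry, degrade, or shed load.
            return None

# A fast dependency succeeds; a slow one is cut off.
fast = fetch_with_timeout(lambda: "ok", timeout_s=1.0)
slow = fetch_with_timeout(lambda: time.sleep(0.5) or "late", timeout_s=0.1)
```

Note that cutting off the wait does not cancel the underlying work; in a real service the timeout belongs in the HTTP client itself so the connection is actually torn down.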
  | But 99% of the folks I ask what happens when the fetch does
  | NOT error out, but instead takes 10 seconds, look at me like
  | I'm speaking gibberish.
  |
  | This is the single biggest cause of cascading failures I see.
  |
  | Netflix dealt with it via their Hystrix library (open source).
  | These days a proxy like Consul seems to be the way to go: it
  | encapsulates all of the fancy logic (like circuit breakers and
  | flow control) so your service doesn't have to.

  | jiggawatts wrote:
  | As an ardent fan of monoliths and how they generally avoid such
  | tar pits, I have to acknowledge that service-oriented
  | architectures have their uses.
  |
  | So do we all have to keep reinventing these wheels, but only
  | after a production outage?
  |
  | Or is it time someone started work on a distributed operating
  | system? Vaguely like Kubernetes, but full-featured?
  |
  | I keep seeing the same patterns being re-engineered over and
  | over. Maybe it's time to refactor these out...

  | klooney wrote:
  | It's more work, is the simple answer.
___________________________________________________________________
(page generated 2023-08-09 23:00 UTC)
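One of the terms notacoward lists, deficit round robin, is compact enough to sketch. This is a minimal, hedged illustration of the core idea only, not any particular implementation; the packet sizes, quantum, and queue contents are invented for the example:

```python
from collections import deque

def deficit_round_robin(queues, quantum, rounds):
    """One scheduler pass over per-client queues using Deficit Round
    Robin: each round, a backlogged queue earns `quantum` bytes of
    credit and may send packets while its deficit counter covers the
    next packet's size. A client sending large packets therefore
    cannot starve a client sending small ones -- the "fair
    allocation" property.
    """
    deficits = [0] * len(queues)
    sent = []  # (queue_index, packet_size) in transmission order
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0  # idle queues don't bank credit
                continue
            deficits[i] += quantum
            while q and q[0] <= deficits[i]:
                size = q.popleft()
                deficits[i] -= size
                sent.append((i, size))
    return sent

# Two clients: one sends big packets, one small; quantum of 300/round.
big = deque([600, 600])
small = deque([100, 100, 100])
order = deficit_round_robin([big, small], quantum=300, rounds=2)
```

In the example, the small-packet client drains all three of its packets in the first round while the big-packet client banks credit, then sends one 600-byte packet in round two: neither side monopolizes the link.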