[HN Gopher] Graceful behavior at capacity
       ___________________________________________________________________
        
       Graceful behavior at capacity
        
       Author : ingve
       Score  : 57 points
       Date   : 2023-08-08 06:41 UTC (1 day ago)
        
 (HTM) web link (blog.nelhage.com)
 (TXT) w3m dump (blog.nelhage.com)
        
       | ChrisMarshallNY wrote:
       | I'm writing an app that is likely to have, at most, a thousand
       | users, for the next couple of years.
       | 
       | I'm testing it with 12,000 fake users.
       | 
       | It works great.
        
       | rdoherty wrote:
       | I learned most of this the hard way as an SRE. How systems behave
       | at and over their limits is far more important than how they
       | behave under them. A system that is 'forgiving' (aka resilient)
       | is worth its weight in gold. Otherwise you get into downward
       | spirals with systems that can't recover unless they are rebooted.
       | Great read!
        
       | thewakalix wrote:
       | From my armchair, I'm not sure that "random drop" actually does
       | decrease latency. Most clients will just repeat the request,
       | resulting in an "effective latency" of however many times it gets
       | randomly dropped. The queue is now implicit, and I'd guess that
       | it's less efficient to carry out several request/drop cycles than
       | to just leave the client in a straightforward queue.
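
The "effective latency" point above can be put in rough numbers. The sketch below assumes each attempt is dropped independently with a fixed probability and that the client retries until it is accepted; neither assumption comes from the article or the comment, and the function names are hypothetical. Under those assumptions the number of attempts is geometrically distributed, so its mean is 1 / (1 - p).

```python
import random


def expected_attempts(drop_probability: float) -> float:
    """Mean attempts until one is accepted, assuming independent drops."""
    if not 0.0 <= drop_probability < 1.0:
        raise ValueError("drop_probability must be in [0, 1)")
    return 1.0 / (1.0 - drop_probability)


def simulate_attempts(drop_probability: float, trials: int, seed: int = 0) -> float:
    """Estimate the same mean by simulating clients that retry until accepted."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        # Each attempt is dropped with the given probability; retry on drop.
        while rng.random() < drop_probability:
            attempts += 1
        total += attempts
    return total / trials


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.75):
        print(p, expected_attempts(p), simulate_attempts(p, trials=20_000))
```

This only counts attempts. It does not model the cost of each request/drop cycle or the wait in a server-side queue, so it does not settle which approach is faster in practice.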
        
       | notacoward wrote:
       | FWIW, the single paragraph about "fair allocation" could be its
       | own thesis. This gets into quality of service, active queue
       | management, leaky buckets, deficit round robin, and so on _ad
       | infinitum_. I did quite a bit of work on this on multiple
       | projects at multiple companies, and it's one of the very few
       | algorithmic areas that I still think about in retirement. I
       | highly recommend following up on some of the terms above for some
       | interesting explorations.
        
       | tra3 wrote:
       | Most of the microservice code I see is
       | 
       |     response = fetch(url, payload)
       |     if (response.error) ...
       | 
       | But when I ask what happens if the fetch does NOT error out and
       | instead takes 10 seconds, 99% of folks look at me like I'm
       | speaking gibberish.
       | 
       | This is the single biggest reason for cascading failures I see.
       | 
       | Netflix has dealt with it via their Hystrix library (open
       | source). These days it seems like a proxy like Consul is the way
       | to go. It encapsulates all of the fancy logic (like circuit
       | breakers and flow control) so your service doesn't have to.
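
For readers who haven't seen the two ideas in this comment, here is a minimal Python sketch of a call with an explicit timeout plus a very small circuit breaker. It is an illustration under stated assumptions, not how Hystrix or Consul implement it: `fetch_with_timeout`, `CircuitBreaker`, and `CircuitOpenError` are hypothetical names, and the thresholds are arbitrary. The only library call used is the standard `urllib.request.urlopen` with its `timeout` argument, which bounds each blocking socket operation rather than the total request time.

```python
import time
import urllib.request
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_timeout(url: str, payload: bytes, timeout_seconds: float = 2.0) -> bytes:
    """POST payload to url; a stalled socket raises instead of hanging forever."""
    with urllib.request.urlopen(url, data=payload, timeout=timeout_seconds) as response:
        return response.read()


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical minimal breaker: open after N consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast so callers don't pile up behind a slow dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap the request, for example `breaker.call(lambda: fetch_with_timeout(url, payload))`, and treat `CircuitOpenError` as an immediate failure instead of waiting out another slow request.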
        
         | jiggawatts wrote:
         | As an ardent fan of monoliths and how they generally avoid such
         | tar pits, I have to acknowledge that service-oriented
         | architectures have their uses.
         | 
         | So do we all have to keep reinventing these wheels, but only
         | after a production outage?
         | 
         | Or is it time someone started work on a distributed operating
         | system? Vaguely like Kubernetes but full-featured?
         | 
         | I keep seeing the same patterns being re-engineered over and
         | over. Maybe it's time to refactor these out...
        
           | klooney wrote:
           | It's more work, is the simple answer.
        
       ___________________________________________________________________
       (page generated 2023-08-09 23:00 UTC)