[HN Gopher] Metastable Failures in Distributed Systems [pdf]
       ___________________________________________________________________
        
       Metastable Failures in Distributed Systems [pdf]
        
       Author : zekrioca
       Score  : 70 points
       Date   : 2021-10-04 17:52 UTC (5 hours ago)
        
 (HTM) web link (sigops.org)
 (TXT) w3m dump (sigops.org)
        
       | ctlachance wrote:
       | This paper introduced me to a new concept in system architecture.
       | Thanks for posting it!
        
       | mjb wrote:
       | I think this paper is super important, and anybody who designs or
       | runs big systems should read it and take the core point to heart.
       | As system designers, we're very used to thinking about systems as
       | 'stable' and 'unstable', where stability is good and instability
       | is bad. What this paper points out is that many kinds of
       | distributed systems have multiple 'stable' modes, and in some of
       | them the system is stable (in a control theory sense) but does no
       | useful work from the client's perspective. This is dangerous,
       | because the system won't kick itself out of this 'stable but
       | down' mode without something changing: human input, a control
       | plane taking action, etc.
       | 
       | I don't think this paper covers anything particularly new, but
       | writing it down in this form, with the evidence they present, is
       | very valuable. Hopefully this paper will deepen the conversation
       | about applying control theory to distributed systems design and
       | control problems, and encourage a more theoretical approach to
       | designing these systems so that they avoid common causes of
       | instability and bistability.
       | 
       | One of the authors has a great summary of the paper on his blog:
       | http://charap.co/metastable-failures-in-distributed-systems/
       | 
       | I wrote a summary and discussion too:
       | https://brooker.co.za/blog/2021/05/24/metastable.html
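
Editorial illustration (not part of the comment above): the 'stable but down' mode can be sketched with a toy retry-storm simulation. Everything here is an assumption made for illustration and is not taken from the paper: the function name `simulate`, the parameters `capacity`, `base_load`, `max_attempts`, `spike` and `drop_retries_at`, and in particular the overload model, where the chance that a request succeeds falls off as `(capacity / offered) ** 2` once offered load exceeds capacity (a crude stand-in for servers wasting work on requests that have already timed out).

```python
def simulate(ticks=200, capacity=100.0, base_load=90.0, max_attempts=3,
             spike=None, drop_retries_at=None):
    """Toy model: return goodput (successful requests) for each tick.

    spike is an optional (start, end, extra) tuple that adds `extra` new
    requests per tick for start <= t < end. drop_retries_at is an optional
    tick at which all pending retries are discarded, standing in for an
    operator or control plane intervening.
    """
    # pending[k] holds requests waiting to make attempt number k + 2.
    pending = [0.0] * (max_attempts - 1)
    goodput = []
    for t in range(ticks):
        if drop_retries_at is not None and t == drop_retries_at:
            pending = [0.0] * (max_attempts - 1)

        new = base_load
        if spike is not None:
            start, end, extra = spike
            if start <= t < end:
                new += extra

        attempts = [new] + pending
        offered = sum(attempts)

        # Assumed overload model: below capacity everything succeeds;
        # above it, the success probability collapses quadratically.
        if offered <= capacity:
            p_ok = 1.0
        else:
            p_ok = (capacity / offered) ** 2
        goodput.append(offered * p_ok)

        # Failed requests move on to their next attempt; requests that
        # fail their final attempt are abandoned.
        pending = [n * (1.0 - p_ok) for n in attempts[:-1]]
    return goodput


if __name__ == "__main__":
    healthy = simulate()
    stuck = simulate(spike=(20, 25, 200.0))
    recovered = simulate(spike=(20, 25, 200.0), drop_retries_at=100)
    print("no spike, final goodput:       ", healthy[-1])
    print("after spike, final goodput:    ", stuck[-1])
    print("after intervention, final:     ", recovered[-1])
```

Under these assumptions the same system, at the same base load, has two steady states. Without a spike it serves every request. After a five-tick spike, retries keep offered load above capacity long after the trigger is gone, so goodput stays well below `base_load`. Discarding pending retries once is enough to return it to the healthy state, which is the "something changing" the comment refers to.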
        
       | dang wrote:
       | Discussed 4 months ago:
       | 
       |  _Metastable Failures in Distributed Systems_ -
       | https://news.ycombinator.com/item?id=27506167 - June 2021 (11
       | comments)
       | 
       | ...but on a day like today we dare not mark it as a dupe.
        
       ___________________________________________________________________
       (page generated 2021-10-04 23:00 UTC)