[HN Gopher] How do you reason about a probabilistic distributed ... ___________________________________________________________________ How do you reason about a probabilistic distributed system? Author : ahelwer Score : 65 points Date : 2020-09-11 16:10 UTC (6 hours ago) (HTM) web link (ahelwer.ca) (TXT) w3m dump (ahelwer.ca) | hinkley wrote: | Statistics was one of the first classes I really struggled with | in school. | | When something grabs your attention, you start to suffer from | Frequency Bias (like 'all of a sudden' everyone has the same car | you bought or are going to buy). But in this case, everyone | _really_ does suck at statistics. Which probably explains why | when I learned what an actuary was and that they made about the | same as an anesthesiologist, I wasn 't that surprised. | | The ironic part for me is that I kept thinking dependent | variables were independent, whereas most of the people I observe | think independent variables are dependent (gambler's fallacy, | etc), and so vastly underestimate the potential for a 'bad run' | of events. | | When I still played WoW they posted an article about how they | tweaked probabilities a bit to avoid frustrating people. Much of | the questing in WoW is based on succeeding at an action a set | number of times that has a probability of failure, which turns it | into a normal distribution. Your buddy might have gotten lucky | and finished a quest in two minutes. Meanwhile, there's a 1 in a | million chance that you could still be working on that quest 5 | hours later. And there are 10 million users... | | So to prevent rage quitting, they made the success rate | dependent. Which they can do because the scenario is fictitious. | You can't do the same thing in the real world. | | I'm not sure if there's a way through this or around it (except | maybe "always hire a professional"). More likely in 10 years we | will discuss how we should have just backed away slowly from | these probabilistic situations and done something else. | duutfhhh wrote: | That's a worthy idea to learn. I've actually found it quite | trivial, which means I'm either too smart or too dumb to | understand it. That MDP diagram reminded me of the Feynman's path | integral model of electrons: the double slit experiment looks as | if electrons take all possible, within constraints, paths and the | interference image is the sum of them. Except that in the MDP | diagram, parameters of the two slits can vary, within some | boundaries, and that varying changes the interference image, so | the question that MDP answers is "how the interference image can | change if we vary the slits parameters within these boundaries"? | hwayne wrote: | > That's a worthy idea to learn. I've actually found it quite | trivial, which means I'm either too smart or too dumb to | understand it. | | _Or_ that Andrew is very good at explaining things :) | mjb wrote: | Very cool post. I like this point particularly: | | > In this case we can't assume, in general, that the thread | scheduler will assign each thread processor time with uniform | probability. That assumption makes even less sense in a fully | distributed system with processes running on separate computers | connected by a network. | | I would love to see tools in this space mature some more. TLA+ | has had a huge influence on how I approach distributed and | concurrency problems, and a large influence on the industry in | general. Modelling non-determinism (beyond the 'try all paths' | approach), and assigning probabilities to outcomes, seems like | the large missing piece needed to model real-world systems and | protocols. | | In systems with limited redundancy, for example, there's always a | "transition to a failure state" path which tends to get ignored | in modelling with tools like TLA+. That's because these tools can | tell you that the whole thing _might_ fail (which is useful), but | not the probability that the whole thing would fail (which is | significantly more useful). | hinkley wrote: | One of the interesting things about Raft that doesn't get focused | on much is that strictly speaking Raft does not require that all | nodes be able to see each other, just that some nodes can see all | of the other nodes. | | One of the reasons we use consensus protocols is because with | enough hardware involved, the machines cannot even agree about | which machines are up and responding to traffic. | | In Raft, a leader has to be able to communicate with more than | half of the cluster. But the followers don't strictly have to be | able to see each other (and if I understand the election protocol | correctly, not even during election). The leader is the hub of a | wheel, not a member of a star-connected graph, and as long as the | 'spokes' are all working, there is no partition. Even if | individual links keep disappearing. | | Is that true of the snowflake protocols mentioned here? Or do | they require a star topology to mostly be intact? | mjb wrote: | > One of the interesting things about Raft that doesn't get | focused on much is that strictly speaking Raft does not require | that all nodes be able to see each other, just that some nodes | can see all of the other nodes. | | The same is true of Paxos. If you have a common view of | participant health, then consensus is much easier than in the | general async model, so Raft and Paxos (and Viewstamped | Replication, etc) are like this for a reason. | | I don't know about SnowFlake, but there is a very interesting | tradeoff here between graph connectedness and resilience to the | failure of masters (e.g. if the topology is really a star, then | you have no real redundancy and you may as well have a single | master rather than using consensus). | stack_underflow wrote: | Couldn't message you on linkedin for some reason (looks like | they've locked the messaging feature to premium now??) - but just | FYI, looks like the twitter account linked on your blog in the | footer is owned by someone else. | ahelwer wrote: | Oh thanks! Looks like another Helwer reclaimed it. I think you | can message people on linkedin as part of a connect request | maybe? | hwayne wrote: | One thing I've started doing with my PRISM models that helps a | lot is using formulas as a poor man's strings. Like instead of | // Not-yet-flipped: 0, heads: 1, tails: 2 | | I'd do formula unflipped = 0; formula | heads = 1; formula tails = 2; | | Another thing that helps a lot is writing templates and then | writing scripts that write out the spec for you. I'm doing this | with a queueing problem and it's godsent. ___________________________________________________________________ (page generated 2020-09-11 23:01 UTC)