[HN Gopher] How large is that number in the Law of Large Numbers?
       ___________________________________________________________________
        
       How large is that number in the Law of Large Numbers?
        
       Author : sebg
       Score  : 110 points
       Date   : 2023-09-12 13:06 UTC (9 hours ago)
        
 (HTM) web link (thepalindrome.org)
 (TXT) w3m dump (thepalindrome.org)
        
       | wodenokoto wrote:
        | I really like how the plots and graphics look. Is it made
        | with the library by 3blue1brown? (manim, I think it's
        | called?)
        
       | yafbum wrote:
        | Stats class rule of thumb: if you need to estimate the
        | relative probability of two outcomes, you can get to within
        | about 10% once you have 100 samples of each outcome (so you
        | need more samples overall if the distribution is skewed).
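        | 
        | A quick way to sanity-check that rule (a minimal Python
        | sketch; the p=0.2 bias and the trial count are arbitrary
        | assumptions, not from the article):
        | 
        |   import random
        | 
        |   def mean_relative_error(p=0.2, trials=2000):
        |       errors = []
        |       for _ in range(trials):
        |           wins = losses = 0
        |           # draw until the rarer outcome has 100 samples
        |           while min(wins, losses) < 100:
        |               if random.random() < p:
        |                   wins += 1
        |               else:
        |                   losses += 1
        |           est = wins / (wins + losses)
        |           errors.append(abs(est - p) / p)
        |       return sum(errors) / len(errors)
        | 
        |   print(mean_relative_error())  # typically ~5-10%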
        
         | gear54rus wrote:
          | It's interesting that even in this thread the two answers
          | differ by an order of magnitude lol
        
           | gipp wrote:
            | Eh, it's really the same rule, just applied with a
            | different threshold.
        
             | marcosdumay wrote:
             | The problem is that the sensitivity to the number growth is
             | supposed to be exponential. So if you need 100 samples for
             | "within 10% of the value", then 10 samples should give you
             | almost completely random behavior.
             | 
             | In reality, it depends on your actual distribution, but the
             | OP from this thread here is unreasonably conservative for
             | something described as a "rule of thumb". Almost always, if
             | you have at least 10 of every category, you can already
             | discover every interesting thing that a rule of thumb will
             | allow. And you probably could go with less. But if you want
             | precision, you can't get it with rules of thumb.
        
               | CaptainNegative wrote:
               | The dependence on sample size is not exponential, it's
               | sublinear. The heuristic rate of convergence to keep in
               | mind is the square root of the sample size, i.e. getting
               | 10x more samples shrinks the margin of error (in a
                | multiplicative sense) by sqrt(10) ≈ 3ish.
               | 
               | The exponential bit applies to the probability densities
               | as a function of the bounds themselves, i.e. how likely
               | you are to fall x units away from the mean typically
               | decreases exponentially with (some polynomial in) x.
               | 
               | Of course, this is all assuming a whole bunch of standard
               | conditions on the data you're looking at (independence,
               | identically distributed, bounded variance, etc.) and may
               | not hold if these are violated.
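                | 
                | A small sketch of that sqrt(n) rule (fair-coin
                | assumption; the sample sizes are arbitrary):
                | 
                |   import random, statistics
                | 
                |   def spread_of_mean(n, reps=1000):
                |       means = [sum(random.random() < 0.5
                |                    for _ in range(n)) / n
                |                for _ in range(reps)]
                |       return statistics.stdev(means)
                | 
                |   for n in (100, 1000, 10000):
                |       print(n, round(spread_of_mean(n), 4))
                |   # each 10x in n shrinks the spread ~3.2x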
        
           | jcranmer wrote:
           | FWIW, the threshold I learned was 20 in each bucket, so now
           | you have 3 answers.
        
           | koolba wrote:
           | Just apply it recursively. Let's get 100 samples of comments
           | suggesting the number of samples to use. Then average those.
        
       | dragontamer wrote:
       | My statistics class at high school level taught the following:
       | 
       | The number of samples you need is very difficult to calculate
       | correctly, requiring deep analysis of standard deviations and
       | variances.
       | 
       | But surprisingly, you can simply know you've reached large number
       | status when over 10 items exist in each category.
       | 
       | ---------
       | 
       | Ex: when doing a heads vs tails coin flip experiment, you likely
       | have a large number once you have over 10 heads and over 10
       | tails. No matter how biased the coin is.
       | 
       | Or in this 'Lotto ticket' example, you have a large number of
       | samples after gathering enough data to find over 10 Jackpot
       | winners.
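        | 
        | A tiny simulation of that heuristic (the 0.3 bias is an
        | arbitrary assumption, just to make the coin unfair):
        | 
        |   import random
        | 
        |   def estimate_at_rule(p=0.3):
        |       heads = tails = 0
        |       # flip until both categories exceed 10 samples
        |       while min(heads, tails) <= 10:
        |           if random.random() < p:
        |               heads += 1
        |           else:
        |               tails += 1
        |       return heads / (heads + tails)
        | 
        |   runs = [estimate_at_rule() for _ in range(5000)]
        |   print(sum(runs) / len(runs))  # near 0.3, wide spread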
        
         | jmount wrote:
         | Very cool rule.
         | 
         | I think you can justify it by approximating each category as an
          | independent Poisson distribution. Then for each such process
         | the variance equals the mean. So once you have 10 successes in
         | a bin, you have evidence of a probably good estimate for the
         | arrival rate of that category. The book "The Probabilistic
         | Method" calls a related idea "the Poisson paradigm."
         | 
          | (10 is a nice round number where the standard deviation is
          | well below the mean.)
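          | 
          | The arithmetic behind that, as a sketch (the lambda values
          | are just examples): for a Poisson(lam) count,
          | sd/mean = 1/sqrt(lam), which drops to roughly a third of
          | the mean around lam = 10.
          | 
          |   import math
          | 
          |   for lam in (1, 5, 10, 100):
          |       print(lam, round(1 / math.sqrt(lam), 2))
          |   # 1.0, 0.45, 0.32, 0.1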
        
           | jmount wrote:
           | Small proviso: this is only true for a reasonable number of
           | categories (or you run into repeated experiment problems).
        
         | NelsonMinar wrote:
         | That's a neat rule of thumb; is there a simple statistical
         | argument for why 10 is the (not very large) number?
        
         | tgv wrote:
          | For heads or tails, that leaves a very large margin. In
          | approximately 1 in 20 runs of 30 flips, you'll end up with a
          | 10-20 split or one more extreme.
        
           | dragontamer wrote:
            | Yeah, 95% confidence (or approximately two standard
            | deviations) is pretty standard with regards to statistical
            | tests.
            | 
            | You gotta draw the line somewhere. At the high-school
            | statistics level, it's basically universally drawn at the
            | 95% confidence level. If you wanna draw new lines
            | elsewhere, you gotta make new rules yourself and
            | recalculate all the rules of thumb.
        
             | User23 wrote:
             | I remember my high school AP Psychology teacher mocking
             | p=0.05 as practically meaningless. In retrospect it's funny
             | for a psychologist to say that, but I guess it was because
             | he was from the more empirically minded behaviorist
             | cognitive school and from time to time they have done
             | actual rigorous experiments[1] (in rodents).
             | 
             | [1] For example as described by Feynman in Cargo Cult
             | Science.
        
               | tgv wrote:
                | The observation above is simply true. If you toss a
                | coin 30 times, there's about a 5% chance that you'll
                | end up with a 10-20 split or one more extreme.
               | 
                | NHST (null hypothesis significance testing) inverts
                | the probability logic, makes the 5% holy, and skims
                | over the high probability of finding
               | something that is not equal to a specific value. That
               | procedure is then used for theory confirmation, while it
               | was (in another form) meant for falsification. Everything
               | is wrong about it, even if the experimental method is
               | flawless. Hence the reproducibility crisis.
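                | 
                | The ~5% figure in the first paragraph is easy to
                | verify exactly (one-sided lower tail of a
                | Binomial(30, 1/2); a quick sketch):
                | 
                |   from math import comb
                | 
                |   tail = sum(comb(30, k)
                |              for k in range(11)) / 2**30
                |   print(tail)  # ~0.049, about 1 in 20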
        
               | lisper wrote:
               | The problem is two-fold:
               | 
               | 1. p=0.05 means that one result in 20 is going to be the
               | result of chance.
               | 
               | 2. It's generally pretty easy (especially in psychology)
               | to do 20 experiments, cherry-pick -- and publish! -- the
               | p=0.05 result, and throw away the others.
               | 
               | The result is that _published_ p=0.05 results are much
               | _more_ likely than 1 in 20 to be the result of chance.
        
               | dragontamer wrote:
               | So run a meta-study upon the results published by a set
               | of authors and double-check to make sure that their
               | results are normally distributed across the p-values
               | associated with their studies.
               | 
               | These problems are solved problems in the scientific
               | community. Just announce that regular meta-studies will
               | be done, expectations for authors to be normally
               | distributed is published, and publicly show off the meta-
               | study.
               | 
               | -------------
               | 
               | In any case, the discussion point you're making is well
               | beyond the high-school level needed for a general
               | education. If someone needs to run their own experiment
               | (A/B testing upon their website) and cannot afford a
               | proper set of tests/statistics, they should instead rely
               | upon high-school level heuristics to design their
               | personal studies.
               | 
               | This isn't a level of study about analyzing other
               | people's results and finding flaws in other people's
               | (possibly maliciously seeded) results. This is a
               | heuristic about how to run your own experiments and how
               | to prove something to yourself at a 95% confidence level.
               | If you want to get published in the scientific community,
               | the level of rigor is much higher of course, but no one
               | tries to publish a scientific paper on just a high school
                | education (which is the level my original comment was
                | aimed at).
        
               | User23 wrote:
               | There's a professor of Human Evolutionary Biology at
               | Harvard who only has a high school diploma[1]. Needless
               | to say he's been published and cited many times over.
               | 
               | [1] https://theconversation.com/profiles/louis-
               | liebenberg-122680...
        
               | withinboredom wrote:
               | I don't know whether you're mocking them or being
               | supportive of them or just stating a fact. Either way,
               | education level has no bearing on subject knowledge. I
               | know more about how computers, compilers, and software
               | algorithms work than most post-docs and professors that
               | I've run into in those subjects.
               | 
               | Am I smarter than them? Nope. Do I know as many fancy big
               | words as them? Nope. Do I care about results and
               | communicating complex topics to normal people? Yep. Do I
               | care more about making the company money than chasing
               | some bug-bear to go on my resume? Yep.
               | 
               | I fucking hate school and have no desire to ever go back.
               | I can't put up with the bullshit, so I dropped out; I
               | just never stopped studying and I don't need a piece of
               | paper to affirm that fact.
        
               | lisper wrote:
               | First, I was specifically responding to this:
               | 
               | > I remember my high school AP Psychology teacher mocking
               | p=0.05 as practically meaningless.
               | 
               | and trying to explain why the OP's teacher was probably
               | right.
               | 
               | Second:
               | 
               | > So run a meta-study upon the results published by a set
               | of authors and double-check to make sure that their
               | results are normally distributed across the p-values
               | associated with their studies.
               | 
               | That won't work, especially if you only run the meta-
               | study on published results because it is all but
               | impossible to get negative results published. Authors
               | don't need to cherry-pick, the peer-review system does it
               | for them.
               | 
               | > These problems are solved problems in the scientific
               | community.
               | 
               | No, they aren't. These are social and political problems,
               | not mathematical ones. And the scientific community is
               | pretty bad at solving those.
               | 
               | > the discussion point you're making is well beyond the
               | high-school level needed for a general education
               | 
               | I strongly disagree. I think everyone needs to understand
               | this so they can approach scientific claims with an
               | appropriate level of skepticism. Understanding how the
               | sausage is made is essential to understanding science.
               | 
               | And BTW, I am not some crazy anti-vaxxer climate-change
               | denialist flat-earther. I was an academic researcher for
               | 15 years -- in a STEM field, not psychology, and even
               | _that_ was sufficiently screwed up to make me change my
               | career. I have advocated for science and the scientific
                | method for decades. It's not science that's broken, it's
               | the academic peer-review system, which is essentially
               | unchanged since it was invented in the 19th century.
               | _That_ is what needs to change. And that has nothing to
               | do with math and everything to do with politics and
               | economics.
        
               | Viliam1234 wrote:
               | > p=0.05 means that one result in 20 is going to be the
               | result of chance.
               | 
                | You made the same mistake most people make here: you
                | reversed the arrow of implication. It is not
                | "successful experiment implies chance (probability 5%)"
                | but "chance implies successful experiment (probability
                | 5%)".
               | 
               | What does that mean in practice? Imagine a hypothetical
               | scientist that is fundamentally confused about something
               | important, so _all_ hypotheses they generate are false.
               | Yet, using p=0.05, 5% of those hypotheses will be
               | "confirmed experimentally". In that case, it is not 5% of
               | the "experimentally confirmed" hypotheses that are wrong
               | -- it is full 100%. Even without any cherry-picking.
               | 
               | The problem is not that p=0.05 is too high. The problem
               | is, it doesn't actually mean what most people believe it
               | means.
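                | 
                | A minimal sketch of that scenario (null p-values
                | modeled as uniform, the standard assumption):
                | 
                |   import random
                | 
                |   tested = 100_000  # every hypothesis is false
                |   confirmed = sum(random.random() < 0.05
                |                   for _ in range(tested))
                |   print(confirmed / tested)  # ~0.05 "confirmed"
                |   # ...and by construction 100% of those are wrong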
        
               | lisper wrote:
               | I think we're actually in violent agreement here, but I
               | just wasn't precise enough. Let me try again:
               | p=0.05 means that one POSITIVE result in 20 is going to
               | be the result of chance and not causality
               | 
               | In other words: if I have some kind of intervention or
               | treatment, and that intervention or treatment produces
               | some result in a test group relative to a control group
               | with p=0.05, then the odds of getting that result simply
               | by chance and not because the treatment or intervention
               | actually had an effect are 5%.
               | 
               | The practical effect of this is that there are two
               | different ways of getting a p=0.05 result:
               | 
               | 1. Find a treatment or intervention that actually works
               | or
               | 
               | 2. Test ~20 different (useless) interventions. Or test
               | one useless intervention ~20 times.
               | 
               | A single p=0.05 result in isolation is useless because
               | there is no way to know which of the two methods produced
               | it.
               | 
                | This is why replication is so important. The odds of
                | getting a p=0.05 result by chance are 5%. But the odds
                | of getting TWO of them in sequential trials are 0.25%,
                | and the odds of a positive result being the result of
                | pure chance decrease exponentially with each subsequent
                | replication.
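                | 
                | The replication arithmetic as a quick sketch (a
                | useless intervention modeled as a uniform p-value):
                | 
                |   import random
                | 
                |   def fluke():  # a null effect "passing"
                |       return random.random() < 0.05
                | 
                |   runs = 100_000
                |   one = sum(fluke() for _ in range(runs)) / runs
                |   two = sum(fluke() and fluke()
                |             for _ in range(runs)) / runs
                |   print(one, two)  # ~0.05 vs ~0.0025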
        
       | jameshart wrote:
       | Curious how people are 'applying' the Law of Large Numbers in a
       | way that needs this advice to be tacked on?
       | 
       | > Always keep the speed of convergence in mind when applying the
       | law of large numbers.
       | 
        | Any 'application' of the LLN basically amounts to replacing
        | some probabilistic number derived from a bunch of random
        | samples with the _expected value_ of that number... and
        | tacking on 'for sufficiently large _n_' as a caveat to your
        | subsequent conclusions.
       | 
       | Figuring out whether, in practical cases, you will have a
       | sufficiently large _n_ that the conclusion is valid is a
       | necessary step in the analysis.
        
         | LudwigNagasena wrote:
         | > Figuring out whether, in practical cases, you will have a
         | sufficiently large n that the conclusion is valid is a
         | necessary step in the analysis.
         | 
          | The econometrics textbook I studied contains the word
          | "asymptotic" more times than it has pages. Oftentimes it's
          | impractical or even theoretically intractable to derive
          | finite sample properties (and thus to answer when n is
          | _really_ large enough).
        
       | gloryless wrote:
        | This kind of intuition is why a high school level statistics
        | or probability class seems so, so valuable. I know not
        | everyone will use the math per se, but the concepts apply to
        | everyday life and are really hard to grasp without having
        | been taught them at some point.
        
         | zodmaner wrote:
          | The sad thing is, having a mandatory high school level
          | statistics & probability class alone is not enough; you also
          | need a good curriculum and a competent teacher to go along
          | with it. Otherwise it won't work: a bad curriculum taught
          | badly by an unmotivated or unqualified teacher will almost
          | always fail to teach the intuition or, even worse, alienate
          | students from the material.
        
       | Dylan16807 wrote:
       | > This means that on average, we'll need a fifty million times
       | larger sample for the sample average to be as close to the true
       | average as in the case of dice rolls.
       | 
       | This is "as close" in an absolute sense, right?
       | 
       | If I take into account that the lottery value is 20x larger, and
       | I'm targeting relative accuracy, then I need 2.5 million times as
       | many samples?
        
       | causality0 wrote:
       | Am I the only one unreasonably annoyed that his graphs don't
       | match the description of his rolls?
        
       | alexb_ wrote:
       | If you had a gambling game that was simply "heads or tails, even
       | money", you would expect over a Large Number of trials that you
        | would get 0. But once you observe exactly one trial, the
        | expected value becomes +1 or -1 unit. We know this is always
        | going to
       | happen one way or the other. Why then, does the bell curve of
       | "expected value" for this game not have two peaks, at 1 and -1?
       | Why does it peak at 0 instead?
       | 
       | What I'm asking about, I know I'm wrong about - I just want to
       | know how I can derive that for myself.
        
         | ineptech wrote:
         | "The expected value of a random variable with a finite number
         | of outcomes is a weighted average of all possible outcomes." --
         | https://en.wikipedia.org/wiki/Expected_value
        
           | alexb_ wrote:
           | That makes sense, I was always thinking of it as "Given an
           | infinite number of trials..."
        
             | ineptech wrote:
              | Whether/when it's better to think in terms of "X has a 37%
             | chance of happening in a single trial" vs "If you ran a lot
             | of trials, X would happen in 37% of them" is kind of a
             | fraught topic that I can't say much about, but you might
             | find this interesting:
             | https://en.wikipedia.org/wiki/Probability_interpretations
        
         | munchbunny wrote:
         | The intuitive explanation is that the effect of a single sample
         | on the average diminishes as you take more samples. So, hand-
         | waving a bit, let's assume it's true that over a large number
         | of trials you would expect the average to converge to 0. You
         | just tossed a coin and got heads, so you're at +1. The average
         | of (1 + 0*n)/(n+1) still goes to 0 as n grows bigger and
         | bigger.
         | 
          | That skips over the distinction between "average" and
          | "probability distribution", but those nuances are probably
          | better left for a proof of the central limit theorem.
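          | 
          | A minimal sketch of that wash-out effect (fair +1/-1
          | flips, starting from one observed +1):
          | 
          |   import random
          | 
          |   total = 1  # the first toss came up heads: +1
          |   for n in range(1, 100_001):
          |       total += random.choice((1, -1))
          |       if n % 20_000 == 0:
          |           # running average drifts toward 0
          |           print(n, total / (n + 1))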
        
       | crdrost wrote:
       | If the blog author is reading, some notes for improvement:
       | 
        | - Your odds calculation is likely wrong. You assumed from the
        | word "odds" that "odds ratio" was meant (Odds=3 meaning "odds
        | 3:1 against", corresponding to p=25%), but the phrase is
        | "approximate odds 1 in X" (Odds=3 meaning "odds of 1 in 3 to
        | win", i.e. 33%), and recalculating gives the remarkably exact
        | expected value of $80, which seems intentional?
       | 
        | - You phrase things in terms of variances, but people will
        | think more in terms of standard deviations. So 3.5 +- 1.7 vs
        | $80 +- $12,526.
       | 
        | - Note that you try to make a direct comparison between those
        | two, but the two are in fact incomparable. The most direct
        | comparison might be to subtract 1 from the die roll and
        | multiply by $32, so that you have a 1/6 chance of winning $0,
        | 1/6 of winning $32, ... 1/6 of winning $160. So then we have
        | $80 +- $55 vs $80 +- $12,526. Then instead of saying you'd
        | need 50 _million_ more lottery tickets you'd actually say you
        | need about 50 _thousand_ more. This is closer to the "right
        | ballpark" where you can tell that the whole lottery is
        | expected to sell about 10,200,000 tickets on a good day
        | (numbers checked in the sketch at the end of this comment).
       | 
       | - But where an article like this should really go is, "what are
       | you using the numbers for?". In the case of the Texas lottery
       | this is actually a strong constraint, they have to make sure that
       | they make a "profit" (like, it's not a real profit, it probably
       | goes to schools or something) on most lotteries, so you're
       | actually trying to ensure that 5 sigma or so is less than the
        | bias. So you've got a competition between $20 * _n_ and 5 *
        | $12,526 * √( _n_ ), or √( _n_ ) = 12526/4, _n_ = 9.8 million.
        | So that's what the Texas Lottery is targeting, right? So then
        | we would calculate that the equivalent number of people that
        | should play in the "roll a die linear lottery" we've
        | constructed is 187, call it an even 200; if 200 people pay
        | $100 for a lottery ticket on the linear lottery then we can
        | pretty much always pay out even on a really bad day (again,
        | see the sketch below).
       | 
       | - So the 50,000x number that is actually correct is basically
       | just saying that we can run a much smaller lottery, 50,000 times
       | smaller, with that payoff structure. And there's something nice
       | about phrasing it this way.
       | 
       | - To really get "law of large numbers" we should _actually_
       | probably be looking at how much these distributions deviate from
       | Gaussian, rather than complaining that the Gaussian is too wide?
       | You can account for a wide Gaussian in a number of ways. But
        | probably we want to take the cube root of the 3rd cumulant,
        | for example, and try to argue when it "vanishes"? Except given
        | the symmetry the 3rd cumulant for the die is probably 0, so
        | you might
       | need to go out to the 4th cumulant for the die -- and this might
       | give a better explanation for the die converging more rapidly in
       | "shape" to the mean, it doesn't just come close faster, it also
       | becomes a Gaussian significantly faster because the payoff
       | structure is symmetric about the mean.
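        | 
        | A quick sketch checking the numbers in the bullets above
        | (linear-lottery mean and sd, then the 5-sigma break-even
        | sizes; the $20 margin per $100 ticket is from the bullets):
        | 
        |   import statistics
        | 
        |   payouts = [32 * k for k in range(6)]  # $0 .. $160
        |   print(sum(payouts) / 6)               # 80.0
        |   print(round(statistics.pstdev(payouts)))  # 55
        | 
        |   for sd in (12526, 55):  # real lottery vs linear toy
        |       root_n = 5 * sd / 20
        |       print(sd, round(root_n ** 2))  # ~9.8e6 and ~189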
        
       | nathell wrote:
       | Tangential: syntax-highlighting math! This is the first time I've
       | seen it. Not yet sure what I think about it, but I can definitely
       | see the allure.
        
         | rendaw wrote:
         | Pedant-man on the scene: this is just highlighting since the
         | highlighting isn't derived from syntax.
        
         | Tachyooon wrote:
         | It's easy on the eyes and it can make reading lots of equations
         | less awkward if done correctly. I remember finding out this was
          | possible while I was working on an assignment in LaTeX - it
         | looked amazing.
         | 
          | It takes a little bit of work to colour in equations, but I
          | hope more people start doing it (including me; I'd forgotten
          | about it for a while).
        
         | tetha wrote:
          | Yeah, I like it. I used this as a tutor in more finicky
          | exercises where it becomes really important to keep 2-3 very
          | similar, but different, things apart. It takes a bit of
          | dexterity, but you can switch fluently between 3 different
          | whiteboard markers held in one hand while writing, haha.
         | 
          | I am kind of wondering if semantic highlighting makes sense
          | as well. You often end up with some implicit assignment of
          | lowercase Latin, uppercase Latin, lowercase Greek letters
          | and such to certain meanings. Kinematics - xyzt for position
          | in time, T_i(I_i) for the quaternion or transformation
          | representing a certain joint of a robot.
        
       | derbOac wrote:
       | The biggest problem for real processes is knowing whether in fact
       | x ~ i.i.d., with regard to time as well as individual
       | observations.
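        | 
        | A sketch of how that failure mode looks (AR(1) dependence
        | with rho = 0.99, an arbitrary choice, vs independent draws;
        | the spread of the sample mean across runs blows up):
        | 
        |   import random, statistics
        | 
        |   def run_mean(rho, n=10_000):
        |       x = total = 0.0
        |       for _ in range(n):
        |           x = rho * x + random.gauss(0, 1)
        |           total += x
        |       return total / n
        | 
        |   for rho in (0.0, 0.99):
        |       means = [run_mean(rho) for _ in range(200)]
        |       print(rho, round(statistics.stdev(means), 3))
        |   # ~0.01 when independent, ~1.0 when rho = 0.99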
        
       ___________________________________________________________________
       (page generated 2023-09-12 23:01 UTC)