[HN Gopher] How large is that number in the Law of Large Numbers?
___________________________________________________________________

How large is that number in the Law of Large Numbers?

Author : sebg
Score  : 110 points
Date   : 2023-09-12 13:06 UTC (9 hours ago)

(HTM) web link (thepalindrome.org)
(TXT) w3m dump (thepalindrome.org)

| wodenokoto wrote:
| I really like how the plots and graphics look. Is it the library by 3blue1brown? (Is it manim, it's called?)
| bannedbybros wrote:
| [dead]
| yafbum wrote:
| Stats class rule of thumb: if you need to calculate the relative probability of two outcomes, you can get to within about 10% once you get 100 samples of each outcome (so, need more samples overall if the distribution is skewed).
| gear54rus wrote:
| It's interesting that even in this thread the 2 answers differ by an order of magnitude lol
| gipp wrote:
| Eh it's really the same rule, just applying a different threshold.
| marcosdumay wrote:
| The problem is that the sensitivity to the number growth is supposed to be exponential. So if you need 100 samples for "within 10% of the value", then 10 samples should give you almost completely random behavior.
|
| In reality, it depends on your actual distribution, but the OP from this thread here is unreasonably conservative for something described as a "rule of thumb". Almost always, if you have at least 10 of every category, you can already discover every interesting thing that a rule of thumb will allow. And you probably could go with less. But if you want precision, you can't get it with rules of thumb.
| CaptainNegative wrote:
| The dependence on sample size is not exponential, it's sublinear. The heuristic rate of convergence to keep in mind is the square root of the sample size, i.e. getting 10x more samples shrinks the margin of error (in a multiplicative sense) by sqrt(10) ≈ 3ish.
|
| The exponential bit applies to the probability densities as a function of the bounds themselves, i.e. how likely you are to fall x units away from the mean typically decreases exponentially with (some polynomial in) x.
|
| Of course, this is all assuming a whole bunch of standard conditions on the data you're looking at (independence, identically distributed, bounded variance, etc.) and may not hold if these are violated.
| [deleted]
| jcranmer wrote:
| FWIW, the threshold I learned was 20 in each bucket, so now you have 3 answers.
| koolba wrote:
| Just apply it recursively. Let's get 100 samples of comments suggesting the number of samples to use. Then average those.
| dragontamer wrote:
| My statistics class at high school level taught the following:
|
| The number of samples you need is very difficult to calculate correctly, requiring deep analysis of standard deviations and variances.
|
| But surprisingly, you can simply know you've reached large number status when over 10 items exist in each category.
|
| ---------
|
| Ex: when doing a heads vs tails coin flip experiment, you likely have a large number once you have over 10 heads and over 10 tails. No matter how biased the coin is.
|
| Or in this 'Lotto ticket' example, you have a large number of samples after gathering enough data to find over 10 Jackpot winners.
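A quick way to sanity-check the competing thresholds above (10, 20, or 100 per bucket) is to simulate them. The sketch below is not from the article; the function name, the trial count, and the choice of biases are illustrative assumptions. It flips a coin with a known bias until every outcome has been seen k times, then measures how far the estimate typically lands from the truth.

    import random

    def typical_error(p, k, trials=2_000):
        """Flip a coin with P(heads) = p until heads and tails have each
        appeared at least k times, then return the average relative error
        of the estimated heads probability over many repetitions."""
        total_err = 0.0
        for _ in range(trials):
            heads = tails = 0
            while heads < k or tails < k:
                if random.random() < p:
                    heads += 1
                else:
                    tails += 1
            estimate = heads / (heads + tails)
            total_err += abs(estimate - p) / p
        return total_err / trials

    for k in (10, 20, 100):            # the three thresholds proposed in the thread
        for p in (0.5, 0.1):           # a fair coin and a heavily biased one
            print(f"k={k:>3}  p={p}  typical relative error ~ {typical_error(p, k):.0%}")

In a few runs of this kind, around k = 10 the estimate is usually in the right ballpark, and around k = 100 it is typically within roughly ten percent of the true probability, which lines up with the "100 samples of each" rule quoted above.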
| jmount wrote:
| Very cool rule.
|
| I think you can justify it by approximating each category as an independent Poisson distribution. Then for each such process the variance equals the mean. So once you have 10 successes in a bin, you have evidence of a probably good estimate for the arrival rate of that category. The book "The Probabilistic Method" calls a related idea "the Poisson paradigm."
|
| (10 is a nice round number where the standard deviation is below the mean)
| jmount wrote:
| Small proviso: this is only true for a reasonable number of categories (or you run into repeated experiment problems).
| NelsonMinar wrote:
| That's a neat rule of thumb; is there a simple statistical argument for why 10 is the (not very large) number?
| [deleted]
| tgv wrote:
| For heads or tails, that leaves a very large margin. In approx. 1 in 20 trials, you'll end up with a 10-20 split.
| dragontamer wrote:
| Yeah, a 95% confidence level (or approximately two standard deviations) is pretty standard with regards to statistical tests.
|
| You gotta draw the line somewhere. At high-school statistics level, it's basically universally drawn at the 95% confidence level. If you wanna draw new lines elsewhere, you gotta make new rules yourself and recalculate all the rules of thumb.
| User23 wrote:
| I remember my high school AP Psychology teacher mocking p=0.05 as practically meaningless. In retrospect it's funny for a psychologist to say that, but I guess it was because he was from the more empirically minded behaviorist cognitive school and from time to time they have done actual rigorous experiments[1] (in rodents).
|
| [1] For example as described by Feynman in Cargo Cult Science.
| tgv wrote:
| The observation above is simply true. If you toss a coin 30 times, there's about a 5% chance that you'll end up with a 10-20 ratio or one more extreme.
|
| NHST testing inverts the probability logic, makes the 5% holy, and skims over the high probability of finding something that is not equal to a specific value. That procedure is then used for theory confirmation, while it was (in another form) meant for falsification. Everything is wrong about it, even if the experimental method is flawless. Hence the reproducibility crisis.
| lisper wrote:
| The problem is two-fold:
|
| 1. p=0.05 means that one result in 20 is going to be the result of chance.
|
| 2. It's generally pretty easy (especially in psychology) to do 20 experiments, cherry-pick -- and publish! -- the p=0.05 result, and throw away the others.
|
| The result is that _published_ p=0.05 results are much _more_ likely than 1 in 20 to be the result of chance.
| dragontamer wrote:
| So run a meta-study upon the results published by a set of authors and double-check to make sure that their results are normally distributed across the p-values associated with their studies.
|
| These problems are solved problems in the scientific community. Just announce that regular meta-studies will be done, expectations for authors to be normally distributed is published, and publicly show off the meta-study.
|
| -------------
|
| In any case, the discussion point you're making is well beyond the high-school level needed for a general education. If someone needs to run their own experiment (A/B testing upon their website) and cannot afford a proper set of tests/statistics, they should instead rely upon high-school level heuristics to design their personal studies.
|
| This isn't a level of study about analyzing other people's results and finding flaws in other people's (possibly maliciously seeded) results. This is a heuristic about how to run your own experiments and how to prove something to yourself at a 95% confidence level. If you want to get published in the scientific community, the level of rigor is much higher of course, but no one tries to publish a scientific paper on just a high school education (which is the level I was aiming my original comment at).
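tgv's "about 5%" figure a few comments up is easy to check exactly. This is a small verification sketch, not anything from the article, using only the binomial distribution for 30 fair coin tosses:

    from math import comb

    def binom_pmf(n, k, p=0.5):
        """Probability of exactly k heads in n tosses of a coin with bias p."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    n = 30
    # Probability of 10 or fewer heads (a 10-20 split or more extreme, one-sided).
    lower_tail = sum(binom_pmf(n, k) for k in range(0, 11))
    # Probability of a split at least that lopsided in either direction.
    two_sided = lower_tail + sum(binom_pmf(n, k) for k in range(20, 31))
    print(f"P(<= 10 heads in 30 tosses)    = {lower_tail:.3f}")
    print(f"P(10-20 split, either side)    = {two_sided:.3f}")

The one-sided tail comes out just under 5% (tgv's "about 5%"), and the symmetric two-sided version, where either side ends up with 10 or fewer, is roughly twice that.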
| User23 wrote:
| There's a professor of Human Evolutionary Biology at Harvard who only has a high school diploma[1]. Needless to say he's been published and cited many times over.
|
| [1] https://theconversation.com/profiles/louis-liebenberg-122680...
| withinboredom wrote:
| I don't know whether you're mocking them or being supportive of them or just stating a fact. Either way, education level has no bearing on subject knowledge. I know more about how computers, compilers, and software algorithms work than most post-docs and professors that I've run into in those subjects.
|
| Am I smarter than them? Nope. Do I know as many fancy big words as them? Nope. Do I care about results and communicating complex topics to normal people? Yep. Do I care more about making the company money than chasing some bug-bear to go on my resume? Yep.
|
| I fucking hate school and have no desire to ever go back. I can't put up with the bullshit, so I dropped out; I just never stopped studying and I don't need a piece of paper to affirm that fact.
| lisper wrote:
| First, I was specifically responding to this:
|
| > I remember my high school AP Psychology teacher mocking p=0.05 as practically meaningless.
|
| and trying to explain why the OP's teacher was probably right.
|
| Second:
|
| > So run a meta-study upon the results published by a set of authors and double-check to make sure that their results are normally distributed across the p-values associated with their studies.
|
| That won't work, especially if you only run the meta-study on published results, because it is all but impossible to get negative results published. Authors don't need to cherry-pick, the peer-review system does it for them.
|
| > These problems are solved problems in the scientific community.
|
| No, they aren't. These are social and political problems, not mathematical ones. And the scientific community is pretty bad at solving those.
|
| > the discussion point you're making is well beyond the high-school level needed for a general education
|
| I strongly disagree. I think everyone needs to understand this so they can approach scientific claims with an appropriate level of skepticism. Understanding how the sausage is made is essential to understanding science.
|
| And BTW, I am not some crazy anti-vaxxer climate-change denialist flat-earther. I was an academic researcher for 15 years -- in a STEM field, not psychology, and even _that_ was sufficiently screwed up to make me change my career. I have advocated for science and the scientific method for decades. It's not science that's broken, it's the academic peer-review system, which is essentially unchanged since it was invented in the 19th century. _That_ is what needs to change. And that has nothing to do with math and everything to do with politics and economics.
| Viliam1234 wrote:
| > p=0.05 means that one result in 20 is going to be the result of chance.
|
| You made the same mistake most people make here: you turned the arrow of the implication. It is not "successful experiment implies chance (probability 5%)" but "chance implies successful experiment (probability 5%)".
|
| What does that mean in practice? Imagine a hypothetical scientist who is fundamentally confused about something important, so _all_ hypotheses they generate are false. Yet, using p=0.05, 5% of those hypotheses will be "confirmed experimentally". In that case, it is not 5% of the "experimentally confirmed" hypotheses that are wrong -- it is a full 100%. Even without any cherry-picking.
|
| The problem is not that p=0.05 is too high. The problem is, it doesn't actually mean what most people believe it means.
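Viliam1234's "direction of the arrow" point can be made concrete with a few lines of Bayes-style arithmetic. The sketch below is mine, not from the thread; the statistical power and the assumed fractions of true hypotheses are made-up inputs, there only to show how the share of wrong "confirmed" results depends on them.

    def false_discovery_rate(prior_true, power=0.8, alpha=0.05):
        """Among hypotheses that pass a p < alpha test, what fraction are
        actually false?  prior_true is the fraction of tested hypotheses
        that are true; power is the chance a true effect is detected."""
        true_positives = prior_true * power
        false_positives = (1 - prior_true) * alpha
        return false_positives / (true_positives + false_positives)

    for prior in (0.0, 0.1, 0.5):      # prior = 0.0 is the "confused scientist" above
        print(f"fraction of true hypotheses = {prior:.0%} -> "
              f"{false_discovery_rate(prior):.0%} of 'confirmed' results are wrong")

With no true hypotheses at all, every "significant" result is wrong, exactly as in the comment above; the 5% only describes how often chance alone clears the bar, not how trustworthy a positive result is.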
| lisper wrote:
| I think we're actually in violent agreement here, but I just wasn't precise enough. Let me try again:
|
|     p=0.05 means that one POSITIVE result in 20 is going to be the result of chance and not causality
|
| In other words: if I have some kind of intervention or treatment, and that intervention or treatment produces some result in a test group relative to a control group with p=0.05, then the odds of getting that result simply by chance and not because the treatment or intervention actually had an effect are 5%.
|
| The practical effect of this is that there are two different ways of getting a p=0.05 result:
|
| 1. Find a treatment or intervention that actually works, or
|
| 2. Test ~20 different (useless) interventions. Or test one useless intervention ~20 times.
|
| A single p=0.05 result in isolation is useless because there is no way to know which of the two methods produced it.
|
| This is why replication is so important. The odds of getting a p=0.05 result by chance is 5%. But the odds of getting TWO of them in sequential trials is 0.25%, and the odds of a positive result being the result of pure chance decrease exponentially with each subsequent replication.
| [deleted]
| jameshart wrote:
| Curious how people are 'applying' the Law of Large Numbers in a way that needs this advice to be tacked on?
|
| > Always keep the speed of convergence in mind when applying the law of large numbers.
|
| Any 'application' of the LLN basically amounts to replacing some probabilistic number derived from a bunch of random samples with the _expected value_ of that number... and tacking on 'for sufficiently large _n_' as a caveat to your subsequent conclusions.
|
| Figuring out whether, in practical cases, you will have a sufficiently large _n_ that the conclusion is valid is a necessary step in the analysis.
| LudwigNagasena wrote:
| > Figuring out whether, in practical cases, you will have a sufficiently large n that the conclusion is valid is a necessary step in the analysis.
|
| The econometrics textbook I studied has the word "asymptotic" in it more times than it has pages. Oftentimes it's impractical or even theoretically intractable to derive finite sample properties (and thus to answer when n is _really_ large enough).
| gloryless wrote:
| This kind of intuition is why a high school level statistics or probability class seems so, so valuable. I know not everyone will use the math per se, but the concepts apply to everyday life and are really hard to just grasp without having been taught it at some point.
| zodmaner wrote:
| The sad thing is, having a mandatory high school level statistics & probability class alone is not enough; you'll also need a good curriculum and a competent teacher to go along with it. Otherwise, it wouldn't work: a bad curriculum taught badly by an unmotivated or unqualified teacher will almost always fail to teach the intuition, or, even worse, alienate students from the material.
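Going back to lisper's two routes to a p = 0.05 result a few comments up, the arithmetic is short enough to check directly. Under a true null hypothesis the p-value is uniformly distributed, so a small simulation (mine, not from the thread; the trial count is arbitrary) reproduces both the cherry-picking risk and the value of replication:

    import random

    TRIALS = 100_000
    ALPHA = 0.05

    # Route 2: run 20 experiments on useless interventions and keep the best one.
    cherry_picked = sum(
        1 for _ in range(TRIALS)
        if min(random.random() for _ in range(20)) < ALPHA   # null p-values are uniform
    ) / TRIALS

    print(f"P(at least one 'significant' result in 20 null experiments) "
          f"~ {cherry_picked:.2f}  (exact: {1 - (1 - ALPHA)**20:.2f})")

    # Replication: k independent null experiments all coming up significant by chance.
    for k in (1, 2, 3):
        print(f"P({k} chance success(es) in a row) = {ALPHA**k:.6f}")

Roughly two thirds of the time the best of 20 null experiments clears p < 0.05, while two consecutive chance successes drop to 0.25%, which is lisper's point about why an isolated p = 0.05 result is weak and replications are strong.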
| Dylan16807 wrote:
| > This means that on average, we'll need a fifty million times larger sample for the sample average to be as close to the true average as in the case of dice rolls.
|
| This is "as close" in an absolute sense, right?
|
| If I take into account that the lottery value is 20x larger, and I'm targeting relative accuracy, then I need 2.5 million times as many samples?
| causality0 wrote:
| Am I the only one unreasonably annoyed that his graphs don't match the description of his rolls?
| alexb_ wrote:
| If you had a gambling game that was simply "heads or tails, even money", you would expect over a Large Number of trials that you would get 0. But once you observe exactly one trial, the expected value becomes +1 or -1 unit. We know this is always going to happen one way or the other. Why then, does the bell curve of "expected value" for this game not have two peaks, at 1 and -1? Why does it peak at 0 instead?
|
| What I'm asking about, I know I'm wrong about - I just want to know how I can derive that for myself.
| ineptech wrote:
| "The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes." -- https://en.wikipedia.org/wiki/Expected_value
| alexb_ wrote:
| That makes sense, I was always thinking of it as "Given an infinite number of trials..."
| ineptech wrote:
| Whether/when it's better to think in terms of "X has a 37% chance of happening in a single trial" vs "If you ran a lot of trials, X would happen in 37% of them" is kind of a fraught topic that I can't say much about, but you might find this interesting: https://en.wikipedia.org/wiki/Probability_interpretations
| munchbunny wrote:
| The intuitive explanation is that the effect of a single sample on the average diminishes as you take more samples. So, hand-waving a bit, let's assume it's true that over a large number of trials you would expect the average to converge to 0. You just tossed a coin and got heads, so you're at +1. The average of (1 + 0*n)/(n+1) still goes to 0 as n grows bigger and bigger.
|
| That skips over the distinction between "average" and "probability distribution", but those nuances are probably better left for a proof of the central limit theorem.
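The diminishing effect described above is easy to watch happen. A minimal sketch (mine, not from the article) plays the "heads or tails, even money" game and prints a few sample values of the average winnings per flip for increasing numbers of flips:

    import random

    def average_winnings(n_flips):
        """Play "heads or tails, even money" n_flips times and return the
        average winnings per flip (each flip pays +1 or -1)."""
        total = sum(random.choice((-1, 1)) for _ in range(n_flips))
        return total / n_flips

    # After one flip the average is always +1 or -1, but as n grows the
    # running average piles up around 0, even though the total keeps wandering.
    for n in (1, 10, 100, 10_000):
        samples = [average_winnings(n) for _ in range(5)]
        print(f"n={n:>6}: " + ", ".join(f"{s:+.3f}" for s in samples))

At n = 1 the only possible averages are +1 and -1 (the two peaks in the question above); averaging more flips fills in the values in between and concentrates the distribution of the average around the expected value of 0, which is the law of large numbers at work.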
| crdrost wrote:
| If the blog author is reading, some notes for improvement:
|
| - Your odds calculation is likely wrong. You assumed from the word "odds" that "odds ratio" was meant (Odds=3 meaning "odds 3:1 against", corresponding to p=25%), but the phrase is "approximate odds 1 in X" (Odds=3 meaning "odds of 1 in 3 to win", i.e. 33%), and recalculating results in the remarkably exact expected value of $80, which seems intentional?
|
| - You phrase things in terms of variances; people will think more in terms of standard deviations. So 3.5 +- 1.7 vs $80 +- $12,526.
|
| - Note that you try to make a direct comparison between those two, but the two are in fact incomparable. The most direct comparison might be to subtract 1 from the die roll and multiply by $32, so that you have a 1/6 chance of winning $0, 1/6 of winning $32, ... 1/6 of winning $160. So then we have $80 +- $55 vs $80 +- $12,526. Then instead of saying you'd need 50 _million_ more lottery tickets you'd actually say you need about 50 _thousand_ more. This is closer to the "right ballpark" where you can tell that the whole lottery is expected to sell about 10,200,000 tickets on a good day.
|
| - But where an article like this should really go is, "what are you using the numbers for?". In the case of the Texas lottery this is actually a strong constraint: they have to make sure that they make a "profit" (like, it's not a real profit, it probably goes to schools or something) on most lotteries, so you're actually trying to ensure that 5 sigma or so is less than the bias. So you've got a competition between $20 * _n_ and 5 * $12,526 * sqrt(_n_), or sqrt(_n_) = 12526/4, _n_ = 9.8 million. So that's what the Texas Lottery is targeting, right? So then we would calculate that the equivalent number of people that should play in the "roll a die linear lottery" we've constructed is 187, call it an even 200; if 200 people pay $100 for a lottery ticket on the linear lottery then we can pretty much always pay out even on a really bad day.
|
| - So the 50,000x number that is actually correct is basically just saying that we can run a much smaller lottery, 50,000 times smaller, with that payoff structure. And there's something nice about phrasing it this way.
|
| - To really get "law of large numbers" we should _actually_ probably be looking at how much these distributions deviate from Gaussian, rather than complaining that the Gaussian is too wide? You can account for a wide Gaussian in a number of ways. But probably we want to take the cube root of the 3rd cumulant, for example, and try to argue when it "vanishes"? Except given the symmetry the 3rd cumulant for the die is probably 0, so you might need to go out to the 4th cumulant for the die -- and this might give a better explanation for the die converging more rapidly in "shape" to the mean: it doesn't just come close faster, it also becomes a Gaussian significantly faster because the payoff structure is symmetric about the mean.
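The arithmetic in the comment above can be reproduced in a few lines. This is a sketch of my own: the $12,526 lottery standard deviation and the $20 per-ticket edge are taken on faith from the comment (the underlying prize table isn't given), and the function name is just for illustration.

    from math import sqrt

    # The "linear lottery" proposed above: subtract 1 from a die roll, multiply by $32.
    payouts = [32 * (k - 1) for k in range(1, 7)]            # $0, $32, ..., $160
    mean = sum(payouts) / 6
    sd = sqrt(sum((x - mean) ** 2 for x in payouts) / 6)
    print(f"linear lottery per ticket: ${mean:.0f} +- ${sd:.0f}")   # about $80 +- $55

    # Size at which the organizer's edge dominates 5 standard deviations of noise:
    # bias_per_ticket * n >= 5 * sd_per_ticket * sqrt(n)  =>  n >= (5 * sd / bias)^2
    def breakeven_n(sd_per_ticket, bias_per_ticket=20):      # $20 edge, as in the comment
        return (5 * sd_per_ticket / bias_per_ticket) ** 2

    print(f"linear lottery needs n ~ {breakeven_n(sd):,.0f} players")
    print(f"Texas lottery needs  n ~ {breakeven_n(12526):,.0f} players")  # sd from the comment

The two sizes come out around 187 players and roughly 9.8 million tickets, matching the figures above, and their ratio is the ~50,000x factor the comment lands on.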
| nathell wrote:
| Tangential: syntax-highlighting math! This is the first time I've seen it. Not yet sure what I think about it, but I can definitely see the allure.
| rendaw wrote:
| Pedant-man on the scene: this is just highlighting, since the highlighting isn't derived from syntax.
| Tachyooon wrote:
| It's easy on the eyes and it can make reading lots of equations less awkward if done correctly. I remember finding out this was possible while I was working on an assignment in LaTeX - it looked amazing.
|
| It takes a little bit of work to colour in equations, but I hope more people start doing it (including me, I'd forgotten about it for a while).
| tetha wrote:
| Yeah, I like it. I used this as a tutor in more finicky exercises when it becomes really important to keep 2-3 very similar, but different, things apart. It takes a bit of dexterity, but you can switch fluently between 3 different whiteboard markers held in one hand while writing, haha.
|
| I am kind of wondering if semantic highlighting makes sense as well. You often end up with some implicit assignment of lowercase Latin, uppercase Latin, lowercase Greek letters and such to certain meanings. Kinematics: xyzt for position in time, T_i(I_i) for the quaternion or transformation representing a certain joint of a robot.
| derbOac wrote:
| The biggest problem for real processes is knowing whether in fact x ~ i.i.d., with regard to time as well as individual observations.
___________________________________________________________________
(page generated 2023-09-12 23:01 UTC)