[HN Gopher] Introduction to Modern Statistics
___________________________________________________________________
Introduction to Modern Statistics
Author : noelwelsh
Score  : 526 points
Date   : 2023-10-12 08:45 UTC (12 hours ago)
(HTM) web link (openintro-ims2.netlify.app)
(TXT) w3m dump (openintro-ims2.netlify.app)
| noelwelsh wrote:
| Statistics education is undergoing a bit of a revolution, driven by the accessibility of computers. For example, hypothesis testing is introduced by randomization[1], using a randomized permutation test[2]. I find this really easy to understand, compared to how I learned statistics using a more traditional approach. The traditional approach taught me a cookbook of hypothesis tests to use: use the t-test in this situation, use the chi-squared in this situation, and so on. I never gained any understanding of why I should use these different tests, or where they came from, from the cookbook approach.
|
| For the same approach in a slightly different context see [3].
|
| [1]: https://openintro-ims2.netlify.app/11-foundations-randomizat...
|
| [2]: https://en.wikipedia.org/wiki/Permutation_test
|
| [3]: https://inferentialthinking.com/chapters/11/1/Assessing_a_Mo...
| iTokio wrote:
| There is also Brilliant, which has a very polished interactive course:
|
| https://brilliant.org/courses/statistics/
| usgroup wrote:
| These things are great if they add value for you, but I would be very skeptical of any non-mathematical approach to statistics. I think statistics is only made clear by mathematics, much the same as physics. And one cannot grasp statistics without being able to understand the maths.
|
| I think that still the best way to understand statistics is to start with the mathematical theory and to grind 1000+ textbook problems.
| mkl wrote:
| > I think that still the best way to understand statistics is to start with the mathematical theory and to grind 1000+ textbook problems.
|
| Are there any books you'd recommend for this approach?
| usgroup wrote:
| My grind was "Mathematical Statistics with Applications" by Wackerly et al. There are PDF versions if you Google for it. I can't say it was quick, easy or intuitive; but it works.
|
| I also liked "In All Likelihood" by Pawitan for a "likelihoodist" foundational approach.
| usgroup wrote:
| I've had similar thoughts, but I think it's more to do with what is in your head at the time you hear about it. I found permutation tests satisfying to learn about because they somehow helped consolidate what I knew from distribution theory. If I didn't know any distribution theory prior, I'm not sure they could have that effect.
|
| If you study mathematical statistics, it is not taught as a cookbook. At the elementary level you learn probability theory and distribution theory; all the different distributions, hypothesis tests, regression, ANOVA and so on proceed from there. Meanwhile, I think research scientists are often taught statistics as a set of recipes because it's usually a short course for a specific discipline, e.g. statistics for biologists.
| ImaCake wrote:
| I think those short courses would be more effective if they didn't bother with ANOVA and instead taught intro probability and distributions and then jumped straight to regression. ANOVA is just a really specific way of doing a regression.
|
| In R, and python::statsmodels you get the answer to (essentially) an ANOVA any time you run an LM or GLM; it's the F-statistic for your whole model.
|
| I know there is more nuance to this, but teaching students that they can use regression for most of the problems they would have used seemingly arcane tests for is going to be much more useful for the students.
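A minimal sketch of the "ANOVA is a specific kind of regression" point, restricted to the simplest case of two groups so that it needs only the Python standard library. The data values and the function names `one_way_anova_f` and `regression_f` are illustrative assumptions, not taken from the thread or from any library. The first function computes the classic ANOVA F ratio; the second fits `y = a + b*x` by least squares, with `x` a 0/1 group indicator, and computes the F statistic of that model. The two agree up to floating-point rounding.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Classic one-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def regression_f(x, y):
    """F statistic for the least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    ss_model = sum((fi - my) ** 2 for fi in fitted)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # One model degree of freedom (the slope), n - 2 residual degrees of freedom.
    return (ss_model / 1) / (ss_resid / (n - 2))


# Made-up measurements for two groups (hypothetical data, for illustration only).
control = [4.1, 5.0, 4.7, 5.3, 4.4]
treated = [5.9, 6.2, 5.1, 6.8, 6.0]

# Regression view: stack the outcomes and code group membership as 0/1.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

f_anova = one_way_anova_f([control, treated])
f_regression = regression_f(x, y)
print(f_anova, f_regression)
```

With more than two groups the same idea holds, but the regression needs one indicator column per extra group, which is where a library such as R's `lm` or statsmodels becomes the practical choice.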
|
| Here is a lovely page demonstrating how to do this in R: https://lindeloev.github.io/tests-as-linear/
| usgroup wrote:
| I agree with the sentiment although I'm not sure there is the time for all of it. At least when I took them, probability theory and distribution theory were separate semester-long courses, and the former was a prerequisite for the latter.
| gpderetta wrote:
| Statsmodels and that GitHub page are the only reason I have some understanding of statistical tests.
| bunderbunder wrote:
| _Principles of Statistics_ by M.G. Bulmer is a nice introduction to the mathematical side of things. It's part of Dover's classic textbook series, so it's inexpensive compared to newer textbooks, and also concise and well-written.
|
| It does assume you already have a solid understanding of calculus and combinatorics, though. Which I think is fair. Discrete statistics is arguably just applied combinatorics, and continuous statistics applied calculus, so if you have a strong foundation in those two subjects then you're already 90% of the way there. (And, if you don't, stop the cart and let the horse catch up.)
| dr_dshiv wrote:
| Do you know of any validation studies with Advanced Data Analysis (formerly Code Interpreter) in ChatGPT? I think it can be excellent as a teaching tool.
| wespiser_2018 wrote:
| The difficulty of teaching statistics is that the maths you need to prove things are right and gain an intuitive understanding of the methods is far more advanced than what is presented in a basic stats course. Gosset came up with the t-test and proved to the world it made sense, yet we teach students to apply it in a black box way without a fundamental understanding of why it's right. That's not great pedagogy.
|
| IMO, this is where Bayesian Statistics is far superior.
| There's a Curry-Howard isomorphism to logic which runs extremely deep, and it's possible to introduce it using conjugate distributions with nice closed form analytical solutions. Anything more complex, well, that's what computers are for, and there are great ways (Stan) to run complex distributions that are far more intricate than frequentist methods.
| zozbot234 wrote:
| Maximum likelihood (which underpins many frequentist methods) basically amounts to Bayesian statistics with a uniform prior on your parameters. And the "shape" of your prior actually depends on the chosen parametrization, so in principle you can account for non-flat priors as well.
| nextos wrote:
| IMHO, the discussion should not be so much whether to teach Bayesian or maximum likelihood. But instead, whether to teach generative models or to keep going with hypothesis tests, which are generally presented to students as a bag of tricks.
|
| Generative models (implemented in e.g. Stan, PyMC, Pyro, Turing, etc.) split models from inference. So one can switch from maximum likelihood to variational inference or MCMC quite easily.
|
| Generative models, beginning from regression, make a lot more sense to students and yield much more robust inference. Most people I know who publish research articles on a frequent basis do not know p-values are not a measure of effect sizes. This demonstrates current education has failed.
| eutectic wrote:
| Maximum Likelihood corresponds to Bayesian statistics with MAP estimation, which is not the typical way to use the posterior.
| thefringthing wrote:
| > There's a Curry-Howard isomorphism [between] logic [and Bayesian statistical inference].
|
| This is an odd way of putting it.
| I think it's better to say that, given some mostly uncontroversial assumptions, if one is willing to assign real number degrees of belief to uncertain claims, then Bayesian statistical inference is the only way of reasoning about those claims that's compatible with classical propositional logic.
| jna_sh wrote:
| Very excited to see Mine Cetinkaya-Rundel is an author here! Many might be familiar with "R for Data Science" (https://r4ds.had.co.nz/), to which she is a contributor, but she's also published a lot of great papers around teaching data science.
| ayhanfuat wrote:
| She also has some online courses on Coursera (https://www.coursera.org/instructor/minecetinkayarundel). Hands down one of the best instructors I have seen.
| zvmaz wrote:
| What is a good book on statistics that one can use for self-learning?
| noelwelsh wrote:
| Depends where you are starting from and what you want to learn. The linked book is a first year introduction, and does a good job of that. If you want to go further there are many other options:
|
| * Statistical Inference by Casella and Berger. This book has a very good reputation for building statistics from first principles. I won't link to them, but you can find full PDF scans online with a simple search. Amazon reviews: https://www.amazon.com/Statistical-Inference-Roger-Berger/dp...
|
| * Statistics by Freedman, Pisani, and Purves has similarly very good reviews and can be easily found online. Amazon reviews: https://www.amazon.com/Statistics-Fourth-David-Freedman-eboo...
|
| * The majority of the Berkeley data science core curriculum books are online. This is not purely statistics but 1) is taught in a modern style that makes use of computation and randomization and 2) uses tools that may be useful to learn about.
|
| 1. https://inferentialthinking.com/chapters/intro.html (Data 8)
|
| 2. https://learningds.org/intro.html (Data 100)
|
| 3. http://prob140.org/textbook/content/README.html (Data 140)
|
| 4. https://data102.org/fa23/resources/#textbooks-from-previous-... (Data 102; this gets into machine learning and pure statistics)
|
| The Berkeley curriculum is not the only one; there are tens, possibly hundreds, of online courses. The Berkeley curriculum is just 1) quite extensive and 2) the one I happened to read the most about when I was recently researching how data science is currently taught.
| sudoankit wrote:
| I particularly like Statistical Inference by George Casella and Roger Lee Berger.
|
| You could also look at Introduction to Probability by Joseph K. Blitzstein and Jessica Hwang (available for free here: http://probabilitybook.net (redirects to drive)).
| laichzeit0 wrote:
| Should be noted that Casella's book is... well... really great if you thought Spivak's calculus and Rudin's analysis to be fun books, especially the exercises.
|
| Casella's exercises are absolutely brutal.
| dan-robertson wrote:
| I like _Statistical Rethinking_. It's targeted at science PhD students so the focus is "how can you use statistics for testing your scientific hypotheses and trying to tease out causation". It doesn't go deep into the mathematics of things (though expects readers to be decently numerate and comfortable analysing data without statistics). It only really talks about Bayesian models and how to fit them by computer, so won't cover much of the frequentist side of things at all.
| verbify wrote:
| ISLR/ISLP is free, was used in my masters and is excellent (and has an accompanying video series)
|
| https://www.statlearning.com/
| dtjohnnyb wrote:
| A couple of more introductory books that come at it from the point of view of "someone who can code" are:
| - https://greenteapress.com/wp/think-stats-2e/ (and the similar Think Bayes if you enjoy this one)
| - https://nostarch.com/learnbayes
|
| Can second Statistical Rethinking though if you have the basics of stats and want to learn it again from a very different, more causal/Bayesian point of view.
| begemotz wrote:
| What is your background and what field will you be applying your knowledge to?
|
| There can be a rather wide gap between a theoretical approach that you might encounter as taught by a statistician and an applied approach you might encounter in a business statistics or social science statistics course.
|
| Depending on your math background and the area of intended application, in my opinion, it would sway recommendations for a first 'book' on statistics for self-learning.
| photochemsyn wrote:
| Good video lecture series:
|
| https://www.thegreatcourses.com/courses/learning-statistics-...
|
| Might be available for free via your local library, too.
| ricksunny wrote:
| I'm looking for help with distilling 'truth' from folk belief systems by formalizing them under a Bayesian network framework, in case anyone is looking for a project through which to sharpen their statistical saw.
| d00mer wrote:
| They should remove "modern" from the title, because who the hell uses the "R programming language" these days anymore?
| Onawa wrote:
| Everyone in my branch of Toxicology? Tons of people in biological sciences. Just because you have bias against the tool and don't run in the same circles doesn't mean that R isn't used and loved by a subset of devs.
| noelwelsh wrote:
| Statisticians do.
| The Berkeley curriculum, which I've linked to in another comment, uses Python.
| adr1an wrote:
| Everyone but you. Check any statistics journal. Only a few people developing methods switched to Python or Julia.
| i_love_limes wrote:
| A lot of people... in fact a huge portion of statisticians, epidemiologists and econometricians use it as their primary language.
|
| I do genetic epidemiology (which is considerably more compute intensive than regular epidemiology), and R is still the most common language, with the most libraries and packages being used for it, compared to Python for example.
|
| I think maybe you should consider being less forthcoming with your opinions on topics which you are not well informed on.
| wespiser_2018 wrote:
| I worked in data science for a few start ups, and even though I know Python (it's my LeetCode language of choice), R just dominates when it comes to accessing academic methods and computational analysis. If you are going to push the boundaries of what you can and can't analyze for statistical effects and leverage academic learnings, it's R.
| dereify wrote:
| FYI many state-of-the-art statistical libraries exist (or are properly maintained) in R only
| ImaCake wrote:
| I find it depends on what you want. There is no canonical GAM (generalized additive model) library in Python but there are a few options, which are not easy to use. The statsmodels GAM implementation appears to be broken. R, of course, has a stupid easy to use GAM library that is pretty fast.
|
| On the other hand, R has _too many_ obscure options for what I can find in scipy or sklearn. So I find it easier to just jump into sklearn, use the very nice unified interface "pipelines" to churn through a whole bunch of different estimators without having to do any munging on my data.
|
| So I think it just depends on your field. But R seems to stick more with academia.
| nomilk wrote:
| Before I knew command line, I tried to install Python and spent the next 3 days resolving an installation issue with 'wheel'.
|
| By contrast, from first downloading R to running my first R script took about 1 hour (the most difficult part was opening the 'script' pane in RStudio IDE, which doesn't open by default on new installations, for some reason).
|
| There's huge demand out there for statistical software that's accessible to people whose primary pursuit is not programming/CS, but genetics, bioinformatics, economics, ecology and other disciplines that necessitate tooling much more powerful than Excel, but with barriers to entry not much greater than Excel. R is a fairly amazing fit for those folks.
| perrygeo wrote:
| R and CRAN really get package management right. Even as a very infrequent R user, there are no surprises, it "just works". Compare that to my daily Python usage where I am continually flummoxed by dependency issues.
| _Wintermute wrote:
| Strong disagree, there's a reason RStudio/Posit are spending so much time trying to develop 3rd party alternatives to install.packages() and CRAN.
|
| Try installing an older version of a package without it pulling in the most recent incompatible dependencies, it's a whole adventure.
| MilStdJunkie wrote:
| Respectfully, I'm going to ask, "what what?". I can't swing a cat without hitting dplyr. It's probably industry dependent though - I could see a dataset that's 99% text having absolutely no reason to even look at R at all.
| f6v wrote:
| Most people in bioinformatics.
| epgui wrote:
| Probably most people who do statistics.
|
| R sucks as a language but it excels at that specific application, just because of its tremendous ecosystem (putting even Python to shame in some niche areas).
| wespiser_2018 wrote:
| R is fine, it's no more absurd than other non-typed languages like JavaScript.
| Most languages are very good at one or two things, then not so good or appropriate for other tasks. For R, that's statistics, modeling, and exploratory analysis, which it absolutely crushes at due to ecosystem effects.
| dleeftink wrote:
| Anyone looking to apply and compare frequentist and Bayesian methods within a unified GUI (which is essentially an elegant wrapper to R and selected/custom statistical packages), should check out _JASP_ developed by the University of Amsterdam [0]. It's free to use, and the graphs + captions generated during each step are publication quality right out of the box.
|
| Using it truly feels like a 'fresh way' to do statistics. Its main website provides ample use cases, guides and tutorials, and I often return to the blog for the well documented deepdives into how traditional frequentist methods and their Bayesian counterparts compare (the animated explainers are especially helpful, and I appreciate the devs reflecting on each release and future directions).
|
| [0]: https://jasp-stats.org/
| NeutralForest wrote:
| There was an interview with one of the JASP people (creator or maintainer, can't remember) in the "Learn Bayesian Stats" podcast; it was very interesting.
| rdhyee wrote:
| I think the referenced episode is https://learnbayesstats.com/episode/61-why-we-still-use-non-... Thanks for pointing it out!
| dleeftink wrote:
| To me, it's academic software _done right_, both in terms of accessibility and maintenance. I'd love to hear more about their governance and funding structure and how this might be applied elsewhere, and learn about academic software of similar grade and utility.
| mindcrime wrote:
| Even better than just being "free to use" it's F/OSS (under the AGPL):
|
| https://github.com/jasp-stats/jasp-desktop
| 3abiton wrote:
| How does this compare to other stat libraries?
| begemotz wrote:
| I like the inclusion of randomization and bootstrapping.
| It's unfortunate that the hypothesis framework is still NHST -- I wouldn't consider that 'modern' by any means.
| noelwelsh wrote:
| I don't see widespread agreement in the statistics community as to what should replace NHST. If you go Bayesian you need to completely rewrite the course. I've seen confidence intervals suggested as an alternative, but there are arguments against. I've also seen arguments that hypothesis tests shouldn't be used at all. Given that NHST is still widely used and there isn't a clear alternative I think it's a disservice to students to not introduce them.
| begemotz wrote:
| I probably should have been more clear. I didn't say hypothesis testing, I said NHST (the binary null/alt hypothesis approach) - which is an approach to hypothesis testing particularly prevalent in certain disciplines such as Psychology.
|
| And in that context, there is a lot of agreement that this approach is fundamentally flawed and outdated. If you are interested, I can provide references when I get to the office. But off the top of my head consider Gigerenzer and Cumming.
| noelwelsh wrote:
| For those following along at home Gigerenzer is, I think, "Mindless Statistics"[1] and Cumming is "The New Statistics"[2].
|
| [1]: https://pure.mpg.de/rest/items/item_2101336/component/file_2...
| [2]: Sample at https://tandfbis.s3.amazonaws.com/rt-media/pp/common/sample-...
| begemotz wrote:
| Yes, those are appropriate (although Gigerenzer and Cumming both have other relevant publications on the topic).
|
| As for an undergraduate text that 'teaches the difference', you can look at 'An Introduction to Statistics' by Carlson & Winquist.
| RedShift1 wrote:
| Can I download this as a PDF? I'd like to read it offline.
| noelwelsh wrote:
| Here: https://www.openintro.org/book/ims/
| RedShift1 wrote:
| This is the first version, not the 2nd?
| noelwelsh wrote:
| Hmmm ... must be because the 2nd edition is still in progress.
| Best option might be to follow the immortal words of Obi-Wan Kenobi and "use the source": https://github.com/OpenIntroStat/ims
|
| Otherwise you can try building a PDF from the very similar Data 8 book[1] using [2]
|
| [1]: https://github.com/data-8/textbook
|
| [2]: https://jupyterbook.org/en/stable/advanced/pdf.html
| usgroup wrote:
| I think Ronald Fisher may not have used bootstrap to calculate confidence intervals; but it looks to me like he invented most of the rest of the syllabus .. in the early 1900s :-)
| mjburgess wrote:
| What's often missing from these introductions is when statistics will not work; and what it even means when it "works". The amount of data needed to tell between two normal distributions is about 30 data points -- between two power-law distributions, >trillion. (And this basically scuppers the central limit theorem, on which a lot of cargo-cult stats is justified).
|
| Stats, imv, should be taught simulation-first: code up your hypotheses and see if they're even testable. Many many projects would immediately fail at the research stage.
|
| Next, know that predictions are almost never a good goal. Almost everything is practically unpredictable -- with a near infinite number of relevant causes, uncontrollable.
|
| At best, in ideal cases, you can use stats to model a distribution of predictions _and then_ determine a risk/value across that range. I.e., the goal isn't to predict anything but to prescribe some action (or inference) according to a risk tolerance (risk of error, or financial risk, etc.).
|
| It seems a generation of people have half-learned bits of stats, glued them together, and created widespread 'statistical cargo-cultism'.
|
| The lesson of stats isn't hypothesis testing, but how almost no hypotheses are testable -- _and then_ what do you do?
| Ensorceled wrote:
| It's ironic that this ... rant? ... is basically unreadable without knowledge of basic statistical methods.
|
| How do you teach any of this to someone who hasn't already taken introductory statistics? How do you learn anything if you first have to learn the myriad ways something you don't even have a basic working knowledge of can fail before you learn it?
| mjburgess wrote:
| The comment is addressed to the informed reader who is the only one with a hope of being persuaded on this point.
|
| To teach this, from scratch, I think is fairly easy -- but there are few with any incentive to do it. Many in academia wouldn't know how, and if they did, would discover that much of their research can be shown _a priori_ to not be worthwhile (rather than after a decade of 'debate').
|
| All you really need is to start with establishing an intuitive understanding of randomness, how apparently highly patterned it is, and so on. Then ask: how easy is it to reproduce an observed pattern with (simulated) randomness?
|
| That question alone, properly supported via basic programming simulations, will take you extremely far. Indeed, the answer to it is often obvious -- a trivial program.
|
| That few ever write such programs shows how the whole edifice of stats education is geared towards confirmation bias.
|
| Before computers, stats was either an extremely mathematical discipline seeking (empirically useless) formulas for toy models; or using heuristic empirical formulas that rarely applied.
|
| Computers basically obviate all of that. Stats is mostly about counting things and making comparisons -- perfect tasks for machines. With only a few high-school mathematical formulas most could derive most useful statistical techniques as simple computer programs.
| noelwelsh wrote:
| The modern approach, of which this textbook is an example, does start with simulation. In fact there is very little classical statistics (distributions, analytic tests) in the book. The Berkeley Data 8 book, which I link to in another comment, takes the same approach.
| I imagine there is still too much classical material for your tastes, but there is definitely change happening.
| 2devnull wrote:
| "that much of their research can be shown a priori to not be worthwhile"
|
| Bingo. Cargo cult stats all the way down. It's not just personal interest, it's the entire field, it's their colleagues, mentors, and students. Good luck getting somebody to see the light when not just their own income depends on not seeing it, their whole world depends on the "stat recipes" handed down from granny.
| brutusborn wrote:
| I think the egotistical aspect is the most powerful: many researchers have built an identity based on the fact that they "know" something, so to propose better alternatives to their pet theories is tantamount to proposing their life is a lie. To change their mind they need to admit they didn't "know".
|
| The better the alternatives, the more fierce the passion with which they will be rejected by the mainstream.
| 2devnull wrote:
| I now think it's best explained by simple economics. Academia and academics are the product of economic forces by and large. It's not quirky personalities or uniquely talented minds that make up academia today. It's droves of conscientious (Big Five sense) conformists, with either high IQ or mere socio-economic privilege, who have been trained by our society to feel that financial security means college, and even more financial security means even more college. Credentials are like alpha .05, they solve a scale problem in a way that alters the quality/quantity ratio. If you want more researchers/research/science output, credentials and alpha .05 cargo cult stats are your levers to get more quantity at lower quality.
| Retric wrote:
| It seems like a reasonable critique. The suggestion is to include such ideas as people are taking introductory statistics which isn't inappropriate.
| I wouldn't suggest forcing students to code up their own simulations from scratch, but creating a framework where students can plug in various formulas for each population, attach a statistical test, and then run various simulations could do quite a bit. However, what kinds of formulas students are told to plug in is important.
|
| If every formula is producing bell curves then that's a failure to educate people. 50d6 vs 50d6 + 1 is easy enough; you can include 1d2 * 50 + 50d6 for a 2 tailed distribution, but also significantly different distributions which then fail various tests etc.
|
| I've seen people correctly remember the formula for statistical tests from memory and then wildly misapply them. That seems like focusing on the wrong things in an age when such information is at everyone's fingertips, but understanding of what that information means isn't.
| taeric wrote:
| Model building, at large, is the thing I regret being bad at. Model your problem and then throw inputs at it and see what you can see.
|
| Sucks, as we seem to have taught everyone that statistical models are somehow unique models that can only be made to get a prediction. To the point that we seem to have hard delineations between "predictive" models and other "models".
|
| I suspect there are some decent ontologies there. But, at large, I regret that so many won't try to build a model.
| srean wrote:
| I work in applied ML and stats. Whenever a client gets pushy about getting a prediction and would not care about quantifying the uncertainty around it, I take it as a signal to disengage and look for better pastures. It is really not worth the time, more so if you value integrity.
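Retric's suggestion above (a framework where students plug in a population, attach a test, and run simulations) could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the sample size of 30, the 300 trials, and the choice of a large-sample z-test on the difference of means. The dice populations are the ones named in the comment: 50d6, 50d6 + 1, and 1d2 * 50 + 50d6.

```python
import math
import random
from statistics import NormalDist, mean, variance

D6 = range(1, 7)


def roll_50d6(rng):
    """Sum of fifty six-sided dice: close to a bell curve."""
    return sum(rng.choices(D6, k=50))


def roll_50d6_plus_1(rng):
    """The same population shifted up by one point."""
    return roll_50d6(rng) + 1


def roll_1d2_times_50_plus_50d6(rng):
    """A two-peaked population: a coin flip adds either 50 or 100."""
    return rng.randint(1, 2) * 50 + roll_50d6(rng)


def z_test_rejects(a, b, alpha=0.05):
    """Two-sided large-sample z-test on the difference of sample means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return False
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(pop_a, pop_b, test, n=30, trials=300, seed=0):
    """Fraction of simulated experiments in which `test` rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [pop_a(rng) for _ in range(n)]
        b = [pop_b(rng) for _ in range(n)]
        if test(a, b):
            rejections += 1
    return rejections / trials


# Same population on both sides: rejections here are false positives.
null_rate = rejection_rate(roll_50d6, roll_50d6, z_test_rejects)
# A real but small shift of +1.
shift_rate = rejection_rate(roll_50d6, roll_50d6_plus_1, z_test_rejects)
# A population with a very different shape and mean.
bimodal_rate = rejection_rate(roll_50d6, roll_1d2_times_50_plus_50d6, z_test_rejects)

print(null_rate, shift_rate, bimodal_rate)
```

The standard deviation of 50d6 is about 12, so a shift of 1 is small relative to the noise and a test on 30 samples per group rarely detects it. Swapping in other population functions or another `test` callable is the point of the design: the simulation loop stays the same while the assumptions change.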
|
| Competent stakeholders and decision makers use the uncertainty around predictions, the chances of an outcome that is different from the point-predicted outcome, to come to a decision and the plan includes what the course of action should be should the outcome differ from the prediction.
| 0xDEAFBEAD wrote:
| >The amount of data needed to tell between two normal is about 30 data points
|
| What are you trying to say here? If there are two normal distributions, both with variance one, one having mean 0 and the other having mean 100, and I get a single sample from one of the distributions, I can guess which distribution it came from with very high confidence. Where did the number 30 come from?
| sndean wrote:
| > Where did the number 30 come from?
|
| Yeah, I've also heard 30 for normal distributions over and over in ~7 stats courses that I've taken.
|
| This SE stats answer sounds reasonable enough: https://stats.stackexchange.com/a/2542
| juunpp wrote:
| I am a noob and I've always got stuck on comparing two independent means. Assumption: normality. Yeah, data is never normal in my bakery.
| haberman wrote:
| This really resonates with me. I've attempted self-study about statistics many times, each time wanting to understand the fundamental assumptions that underlie popular statistical methods. When I read the result of a poll or a scientific study, how rigorous are the claimed results, and what false assumptions could undermine them?
|
| I want to build intuitions for how these statistical methods even work, at a high level, before getting drowned in math about all the details. And like you say, I want to understand the boundaries: "when statistics will not work; and what it even means when it 'works'".
|
| I imagine that different methodologies exist on a spectrum, where some give more reliable results, and others are more likely to be noise.
| I want to understand how to roughly tell the good from the bad, and how to spot common problems.
| wespiser_2018 wrote:
| "Simulation first" is how I did things when I worked in data science and bioinformatics. Define the simulation that represents "random", then see how far off the actual data is using either information theory or just a visual examination of the data and summary statistic checks. That's a fast and easy way to gut check any observation to see if there is an underlying effect, which you can then "prove" using a more sophisticated analysis.
|
| Raw hypothesis testing is just too easy to juke by overwhelming it with trials. Lots of research papers have "statistically significant" results, but give no mention of how many experiments it took to get them, or any indication of negative results. Eventually, there will always be the analysis where you incorrectly reject the null hypothesis given enough effort.
| RSMDZ wrote:
| >> between two power-law distributions, >trillion
|
| Do you have anywhere I can read more about this? I would have assumed that a trillion data points would be sufficient to compare any two real-world distributions
| bigbillheck wrote:
| > The amount of data needed to tell between ... two power-law distributions, >trillion.
|
| I don't agree with this as a statement of fact (except in the obvious case of two power-law distributions with extremely close parameters). Supposing it was true, that would mean that you would almost never have to actually worry about the parameter, because unless your dataset is that large one power law is about as good as any other for describing your data.
| elashri wrote:
| Thanks to the author for the book and making it open access. I always admire these efforts.
| growingkittens wrote:
| Is there a "pre-statistics" book that teaches the thinking skills and concepts needed to understand statistics?
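One way to make the "simulation first" gut check from wespiser_2018's comment concrete is a permutation test, the same device the linked book uses to introduce hypothesis testing. The sketch below is a minimal version under stated assumptions: the measurements are made up, the function name `permutation_p_value` is hypothetical, and "random" is taken to mean "group labels carry no information", which is simulated by shuffling the pooled data.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Estimate how often shuffled group labels give a mean difference
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(a)]
        shuffled_b = pooled[len(a):]
        diff = abs(mean(shuffled_a) - mean(shuffled_b))
        # Small tolerance so float rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction: the observed labelling counts as one arrangement.
    return (hits + 1) / (n_shuffles + 1)


# Made-up measurements (hypothetical data, for illustration only).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.3]
group_b = [6.4, 6.8, 6.1, 7.0, 6.6, 6.3, 6.9, 6.5]

p_separated = permutation_p_value(group_a, group_b)
p_identical = permutation_p_value(group_a, group_a)
print(p_separated, p_identical)
```

Comparing a group with itself gives an observed difference of zero, so every shuffle is at least as extreme and the estimate is exactly 1. The comment's warning about repeated trials still applies: running many such tests and reporting only the smallest p-value would defeat the check.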
| ndr wrote: | This book seems to start where you need it to start. | | You don't need much beyond basic calculus. Most suffer from | some mental block they got installed at a young age, akin to | those who say "I'm bad at math" because their teacher sucked. | Dive in and you won't regret it. | obscurette wrote: | I have been a math teacher and although I can't guarantee | that I didn't suck, I can say that most kids don't develop | this attitude because of teachers, but because of their | parents. "My mum says that she sucked at math/music/whatever | as well, so do I!" is far too common. As a teacher I just | didn't have resources to influence this attitude either. | ndr wrote: | Yes, parents can be horrible too. Unfortunately it's | somehow socially acceptable and even worthy of pride in | some circles, to be "bad at math". It seems very rare for | someone to openly say "I'm bad at [my native language]" or | "writing". | | I feel stats has a somewhat similar effect even among | those with math education. Several friends who have a | degree in math recoil at the first mention of stats | concepts. | obscurette wrote: | > It seems very rare for someone to openly say "I'm bad | at [my native language]" or "writing". | | It is actually even fashionable in non-English countries. | Declaring "I'm bad at [my native language], I only use | English anyway" makes you a better person somehow. And | it's not rare in other areas either - in a post-truth | world it's trendy not to know things. | Novosell wrote: | In non-English countries? All of them? Source? I, as a | person from one of said non-English countries, disagree. | growingkittens wrote: | My mental block is a brain injury that went undiagnosed until | I was 30. I can't really hold more than two numbers in my | head at a time. I struggled through math in school because it | was lecture based, and the books were written to accompany a | lecture. 
| | I can learn math fairly well if I have the right written | material and the right direction. However, I do not retain | math skills: without active practice, I revert back to "how | do fractions work?" | | For example, I did extremely well in a college algebra course | that was partially online (combined with Khan Academy to | catch up). I could do my tests perfectly in pen, much to the | amusement of the assistants. I could make connections and see | the implications and applications of the math. Roughly three | to six months later, I was back to forgetting fractions. | | I can't learn these things over time, but I can learn them | all at once. I'm collecting resources for my next math | adventure. | armcat wrote: | One of my favourite books on statistics and probability is | "Regression and Other Stories", by Andrew Gelman, Jennifer Hill | and Aki Vehtari. You can access the book for free here: | https://users.aalto.fi/~ave/ROS.pdf | epgui wrote: | +1, this is a great textbook, and not just for social sciences | as the second header would suggest. | epgui wrote: | As much as I appreciate and love all pedagogical endeavours in | the field, especially in the form of open texts, I really, | really, really dislike this overall approach to teaching | introductory statistics. | | I'm hoping to see, over time, a shift away from ad-hoc null | hypothesis testing in favour of linear models (yes, in | introductory courses, from the start-- see link below) and | Bayesian-by-default approaches. | | https://lindeloev.github.io/tests-as-linear/#:~:text=Most%20.... | bschne wrote: | I am partway through McElreath's "Statistical Rethinking" and I | fully agree with this. | epgui wrote: | That's a great textbook! | TheAlchemist wrote: | It's been recommended on this topic several times, so I'm | looking at it. Quite expensive! I see there is a series of | lectures, which seems identical to the book. Is it the same? | Or is it still worth buying the book? 
| noelwelsh wrote: | The lectures are good, and I've been told the book can be | found online by the intrepid. I guess that Anna's Archive | or Library Genesis has it. | TheAlchemist wrote: | I've found the book indeed - although it seems to be the | first edition. | | It's here: | https://civil.colorado.edu/~balajir/CVEN6833/bayes- | resources... | begemotz wrote: | I agree about teaching from a unified GLM basis. The 'bayesian- | by-default' approach seems to be going out on a more tenuous | limb, imo. | JHonaker wrote: | It only appears tenuous because the subjective choices you | have to make when using frequentist methods are made for you | by the developer of the method. | | It's less comfortable to use Bayesian methods because you | have to be explicit about your assumptions _as the user_, | which opens your assumptions up for easier inspection. | There's also way less specific information implied by priors | than most people think. Informative priors should try to make | distinctions between something that's reasonable-ish and | something that's essentially infinity (take pharmacokinetics | for example, the diffusion velocity of a molecule in your | blood stream shouldn't be near the speed of light in a | vacuum, should it?). They should not be forcing | your model to achieve a particular result. Luckily, because | of the need to explicitly state them in a Bayesian analysis, | it's much easier to determine if they were properly set. | | Prior specification is essentially problem domain-informed | regularization where you can actually hope to understand if | the hyperparameter is going to work or not. | fallat wrote: | > I'm hoping to see, over time, a shift away from ad-hoc null | hypothesis testing in favour of linear models (yes, in | introductory courses, from the start-- see link below) and | Bayesian-by-default approaches. | | Is there anything where I can start today, as a guinea pig? My | statistics education is basically zero. 
| bschne wrote: | See my sibling comment, can recommend this: | https://xcelab.net/rm/statistical-rethinking/ | noelwelsh wrote: | There are other comments here that suggest a number of books | at varying levels. "Introduction to Modern Statistics" is | very approachable in its presentation. | willsmith72 wrote: | The epub is apparently too big to send to a kindle, but I can't | see the option to download it, only the pdf. Any ideas? | tea-coffee wrote: | This looks to be the 2nd edition. Can anyone comment on how the | 1st edition was? | mavam wrote: | For studying statistics, I put together a comprehensive cheat | sheet: https://github.com/mavam/stat-cookbook ___________________________________________________________________ (page generated 2023-10-12 21:00 UTC)