[HN Gopher] Probability and Statistics Cookbook (2011) [pdf]
       ___________________________________________________________________
        
       Probability and Statistics Cookbook (2011) [pdf]
        
       Author : cpp_frog
       Score  : 174 points
       Date   : 2022-06-06 13:34 UTC (9 hours ago)
        
 (HTM) web link (pages.cs.wisc.edu)
 (TXT) w3m dump (pages.cs.wisc.edu)
        
       | kiernanmcgowan wrote:
       | Another set of notes that I refer to often are from ECE 830, also
       | at UW Madison[0]. It was a great class that really represented
       | the culmination of all the probability theory and signals classes
       | I had taken over the years.
       | 
       | [0] https://nowak.ece.wisc.edu/ece830/index.html
        
       | cjohnson318 wrote:
       | This is a nice collection of definitions and key results, but
       | it's not a cookbook. I think of a cookbook as a collection of
       | useful, focused examples, demonstrating best practices, and
       | listing caveats.
        
       | jenny91 wrote:
       | This actually seems pretty good and has great coverage!
        
       | snicker7 wrote:
       | There is a book titled "All of Statistics" if you'd like a
       | whirlwind tour.
        
         | conformist wrote:
         | https://www.stat.cmu.edu/~larry/all-of-statistics/index.html
        
       | buzzdenver wrote:
       | I would call this a cheat-sheet rather than a cookbook.
        
         | jmt_ wrote:
          | Agreed, I typed up something similar (but less detailed) as a
          | reference during my undergrad stats major. I'd expect a cookbook
          | to have worked-out examples of applications of these topics. But
          | it still looks very useful as a reference.
        
       | mturmon wrote:
       | Actually quite good.
       | 
       | I've TA'd this class but it's surprising how many of these little
       | facts can be helpful if you recall them at the right time. I was
        | just reminded of:
        | 
        |     Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]
       | 
       | and it unstuck me from a little puzzle.
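        | 
        | A quick simulation makes it easy to convince yourself of the
        | identity. A minimal NumPy sketch (the toy distribution here is
        | just an illustrative assumption):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        | 
        |     # Toy model: X ~ Uniform{0,1,2}, Y | X ~ Normal(2*X, 1 + X)
        |     x = rng.integers(0, 3, size=1_000_000)
        |     y = rng.normal(loc=2 * x, scale=1 + x)
        | 
        |     # Condition on each value of X to build the two pieces
        |     w = np.array([(x == k).mean() for k in range(3)])   # P(X=k)
        |     m = np.array([y[x == k].mean() for k in range(3)])  # E[Y|X=k]
        |     v = np.array([y[x == k].var() for k in range(3)])   # Var[Y|X=k]
        | 
        |     within = (w * v).sum()                     # E[Var[Y|X]]
        |     between = (w * (m - y.mean()) ** 2).sum()  # Var[E[Y|X]]
        | 
        |     print(y.var(), within + between)  # equal up to float roundoff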
       | 
       | More recent version at: http://statistics.zone
        
         | jarenmf wrote:
          | Similar to the law of total expectation; it's easier for me to
          | think of it as partitioning a weighted mean :)
        
         | willdearden wrote:
         | There is a professor who was at Wisconsin, Charles Manski, who
         | developed partial identification, which uses tons of these
         | decompositions.
         | 
          | The idea: let's say you have a binary survey question where 80%
          | respond and 90% of them respond "yes". What can we say about the
          | population "yes" rate (assume the sample size is huge, for
          | simplicity)?
         | 
          |     P(Yes) = P(Yes | response) * P(response)
          |                + P(Yes | no response) * P(no response)
          |            = 0.9 * 0.8 + P(Yes | no response) * 0.2
          |            = 0.72 + P(Yes | no response) * 0.2
          | 
          | Then 0 <= P(Yes | no response) <= 1, so 0.72 <= P(Yes) <= 0.92.
         | This example is somewhat trivial but it's a useful technique
         | for showing exactly how your assumptions map to inferences.
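          | 
          | As a tiny sketch of the bookkeeping (plain Python, with the
          | numbers above hard-coded; the function name is just made up for
          | illustration):
          | 
          |     def partial_id_bounds(p_resp, p_yes_given_resp):
          |         # What the respondents pin down
          |         known = p_yes_given_resp * p_resp
          |         # Non-respondents could be anywhere from all "no" (0)
          |         # to all "yes" (1)
          |         return known, known + (1 - p_resp)
          | 
          |     print(partial_id_bounds(0.8, 0.9))  # ~(0.72, 0.92)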
        
         | evandwight wrote:
         | For those who are wondering how this equation is true:
         | 
         | https://en.wikipedia.org/wiki/Law_of_total_variance
        
       | mdp2021 wrote:
        | Note that the linked PDF is the 2011 version (as the title says),
        | but the most recent version (0.2.7) is dated 2021 and is available
        | at https://github.com/mavam/stat-cookbook/releases/download/0.2...
        | 
        | The release notes listing the differences between versions are at
        | https://github.com/mavam/stat-cookbook/releases
        
         | Bo0kerDeWitt wrote:
          | Nice, the original LaTeX source is there too.
        
       | russellbeattie wrote:
       | I don't know math at all. I'd love a programmer version of this,
       | with all the algorithms in code. Probably already exists in NumPy
       | or something.
        
         | time_to_smile wrote:
         | If you're interested in probability and statistics it's well
         | worth your time to get more comfortable with the math.
         | 
          | There's a common misconception among programmers that there's a
          | one-to-one mapping between math and code, and that mathematical
          | notation is just annoyingly terse shorthand.
         | 
          | As someone who spends a lot of time turning mathematical ideas
          | into code, I can tell you this is not remotely true. Mathematics
          | deals with a level of abstraction and a way of thinking that is
          | fundamentally distinct from the computational implementation of
          | those ideas.
         | 
          | A clear example of this is the Gamma function, which appears all
          | over those notes. It's an essential function for working deeply
          | with statistics; you'll find it shows up just about everywhere
          | if you look carefully enough, and you can manipulate it
          | mathematically to solve a range of problems.
         | 
          | However, if you want to implement this from scratch in code,
          | that is, to understand how to _compute_ the Gamma function,
          | you're going to have to spend a lot of time studying numerical
          | methods if you want to do more than robotically copy it from
          | _Numerical Recipes_.
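          | 
          | To make that concrete, here is roughly what "from scratch" looks
          | like with one standard technique, the Lanczos approximation (a
          | sketch using the commonly published g = 7 coefficients; real
          | implementations also worry about poles, overflow, and complex
          | arguments):
          | 
          |     import math
          | 
          |     G = 7
          |     COEFFS = [
          |         0.99999999999980993, 676.5203681218851,
          |         -1259.1392167224028, 771.32342877765313,
          |         -176.61502916214059, 12.507343278686905,
          |         -0.13857109526572012, 9.9843695780195716e-6,
          |         1.5056327351493116e-7,
          |     ]
          | 
          |     def gamma(z):
          |         if z < 0.5:
          |             # Reflection formula covers the left half-line
          |             return math.pi / (math.sin(math.pi * z) * gamma(1 - z))
          |         z -= 1
          |         x = COEFFS[0]
          |         for i in range(1, len(COEFFS)):
          |             x += COEFFS[i] / (z + i)
          |         t = z + G + 0.5
          |         return (math.sqrt(2 * math.pi)
          |                 * t ** (z + 0.5) * math.exp(-t) * x)
          | 
          |     print(gamma(5), math.gamma(5))          # both ~24.0
          |     print(gamma(0.5), math.sqrt(math.pi))   # both ~1.7725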
         | 
          | Similarly, many of the integrals used in statistics can be quite
          | difficult to compute, but that difficulty doesn't affect their
          | ease of use in a mathematical context. This is a common theme
          | when working with applied math: you can do quite a lot of
          | mathematical work on problems that you don't yet know how to
          | compute. Once you solve your problem mathematically, you can
          | then move on to working out how to actually compute the answer.
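          | 
          | A tiny example of that split: the normal CDF has no elementary
          | antiderivative, so on paper you just write Phi(x) and manipulate
          | it, and only later decide how to evaluate it (a crude
          | midpoint-rule sketch, purely for illustration):
          | 
          |     import math
          | 
          |     def normal_cdf(x, steps=100_000, lo=-10.0):
          |         # Integrate the standard normal pdf from lo to x
          |         h = (x - lo) / steps
          |         pdf = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
          |         return h * sum(pdf(lo + (i + 0.5) * h) for i in range(steps))
          | 
          |     print(normal_cdf(1.96))                           # ~0.975
          |     print(0.5 * (1 + math.erf(1.96 / math.sqrt(2))))  # via erf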
        
           | jbay808 wrote:
           | To add to this comment, it's hard to make any useful program
           | if you don't have at least a clear conceptual understanding
           | of what you're trying to do.
           | 
           | For example, perhaps you are trying to calculate a variance.
           | But do you have a set of raw data from which you will
           | estimate the variance? Or some summary statistics? Or do you
           | already have a probability distribution from which you will
           | compute the variance? How is it represented? Is that the
           | probability distribution over the particular variable you
           | want the variance for, or is it a related variable that needs
           | to be transformed first somehow?
           | 
           | You don't necessarily need to know how to handle all the math
           | by hand, but there's no avoiding the need for at least a
           | clear idea of what you're doing and what the sticking points
           | might be.
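            | 
            | Concretely, "calculate a variance" already covers at least
            | three different little programs (a sketch; the numbers and the
            | discrete distribution are just made-up illustrations):
            | 
            |     import numpy as np
            | 
            |     data = np.array([2.1, 3.4, 1.9, 4.2, 3.3])
            | 
            |     # 1. From raw data: sample variance (ddof=1 gives the
            |     #    unbiased estimator -- a classic sticking point)
            |     var_from_data = data.var(ddof=1)
            | 
            |     # 2. From summary statistics only: n, sum(x), sum(x^2)
            |     n, s, ss = len(data), data.sum(), (data ** 2).sum()
            |     var_from_summaries = (ss - s ** 2 / n) / (n - 1)
            | 
            |     # 3. From a known discrete distribution (values, probs)
            |     vals = np.array([0.0, 1.0, 2.0])
            |     probs = np.array([0.2, 0.5, 0.3])
            |     mean = (vals * probs).sum()
            |     var_from_dist = ((vals - mean) ** 2 * probs).sum()
            | 
            |     print(var_from_data, var_from_summaries, var_from_dist)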
        
       ___________________________________________________________________
       (page generated 2022-06-06 23:01 UTC)