[HN Gopher] Introduction to Probability for Data Science
       ___________________________________________________________________
        
       Introduction to Probability for Data Science
        
       Author : mariuz
       Score  : 105 points
       Date   : 2022-01-24 17:23 UTC (5 hours ago)
        
 (HTM) web link (probability4datascience.com)
 (TXT) w3m dump (probability4datascience.com)
        
       | tacoluv wrote:
       | Does anyone know of a way to send some money to the author? I
       | know he says "free" a lot but this is so awesome I want to treat
       | them to something.
        
       | LittlePeter wrote:
        | In the second paragraph of Chapter 2 - Probability:
       | 
       | > No matter whether you prefer the frequentist's view or the
       | Bayesian's view...
       | 
       | I don't think the intended audience reading this chapter has this
       | preference at all...
       | 
       | Then the set notation uses square brackets instead of curly
       | braces? I cannot get over this for some reason.
        
         | hervature wrote:
          | You are misrepresenting that quote. It comes after a
          | fairly generic overview of both views, from which someone
          | could form an opinion. One does not need to know the
          | peculiarities of Bayesian reasoning to hold the opinion
          | "you should incorporate prior knowledge". Also, the set
          | notation does use curly braces.
        
           | LittlePeter wrote:
            | In my mind you cannot be frequentist or Bayesian after
            | reading just the first paragraph of Chapter 2. But fair
            | enough, I am a bit too critical here.
            | 
            | Also, you are right, the set notation does use curly
            | braces; I am relieved :-). I was confused by the
            | A = [-1, 1-1/n] (interval notation) on page 8, which I
            | misread as [-1, 1, 1/n]...
        
             | ska wrote:
             | > In my mind you cannot be frequentist or Bayesian after
             | reading just the first paragraph of Chapter 2.
             | 
             | I don't think the author is asking you to, at all. They are
             | pointing out that there are two "camps" and you will see
             | these terms bandied about (e.g. if you google stuff). But
              | then they claim (rightly, I think, for an intro like this)
             | that it doesn't really matter for the material to
             | (immediately) follow and you are better off focusing on
             | more fundamental ideas of probability.
        
       | heresie-dabord wrote:
       | > Some people ask how much money I can make from this book. The
       | answer is ZERO. There is not a single penny that goes to my
       | pocket. Why do I do that? Textbooks today are just ridiculously
       | expensive. [...] Education should be accessible to as many people
        | as possible, especially to those underprivileged families.
       | 
       | B r a v o ! A free, quality education is the foundation for
       | social progress and economic prosperity.
        
       | dwrodri wrote:
       | This looks like a fantastic resource. Thanks for sharing!
       | 
       | I really enjoy the Bayesian side of ML, but it's definitely not
        | the most accessible. Erik Bernhardsson cites latent Dirichlet
        | allocation as a big inspiration behind the music recommendation
       | system he originally designed for Spotify, which is apparently
       | still in use today[1]. I still struggle with grokking latent
       | factor models, but it can be so rewarding to build your own and
       | watch it work (even with only moderate success!).
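        | 
        | If it helps anyone, here is a toy latent factor model in
        | numpy: plain matrix factorization fit by SGD on a made-up
        | ratings matrix. Just a sketch of the core idea, nothing like
        | Spotify's actual system (and LDA proper is a different
        | model):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        | 
        |     # Made-up user-item ratings; 0 marks unobserved entries.
        |     R = np.array([[5, 3, 0, 1],
        |                   [4, 0, 0, 1],
        |                   [1, 1, 0, 5],
        |                   [0, 1, 5, 4]], dtype=float)
        |     mask = R > 0
        | 
        |     k = 2  # number of latent factors
        |     U = 0.1 * rng.standard_normal((R.shape[0], k))
        |     V = 0.1 * rng.standard_normal((R.shape[1], k))
        | 
        |     lr, reg = 0.01, 0.1
        |     for _ in range(2000):
        |         # SGD over the observed entries only
        |         for i, j in zip(*np.nonzero(mask)):
        |             err = R[i, j] - U[i] @ V[j]
        |             U[i] += lr * (err * V[j] - reg * U[i])
        |             V[j] += lr * (err * U[i] - reg * V[j])
        | 
        |     # Unobserved entries now hold predicted ratings.
        |     print(np.round(U @ V.T, 1))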
       | 
       | Kevin Murphy has been working on a new edition of MLaPP that is
       | now two volumes, with the last volume on advanced topics slated
       | for release next year. However, both the old edition and the
       | drafts for the new edition are available on his website here[2].
       | 
        | The University of Tübingen has a course on probabilistic ML
        | with one of the most thorough walkthroughs of a latent
        | factor model I've found on the Internet. You can find the full
       | playlist of lectures for free here on YouTube[3].
       | 
        | In terms of other resources for deep study on fascinating topics
       | which require some command over stats and probability:
       | 
        | - David Silver's lectures on reinforcement learning are
       | fantastic [4]
       | 
       | - The Machine Learning Summer School lectures are often quite
        | good, with exceptionally talented researchers / practitioners
       | being invited to provide multi-hour lectures on their domain of
       | expertise with the intended audience being a bunch of graduate
       | students with intermediate backgrounds in general ML topics. [5]
       | 
        | 1: https://www.slideshare.net/erikbern/music-recommendations-ml...
        | 2: https://probml.github.io/pml-book/
        | 3: https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m...
        | 4: https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPe...
        | 5: http://mlss.cc
        
       | graycat wrote:
       | "A random process is a function indexed by a random key."
       | 
       | Not just wrong, wildly bad nonsense.
       | 
       | Go get some data. Now you have the value of a _random variable_.
       | 
        | We never get clear on just what _random_ means, and with a
        | _random variable_ we do not assume any element of not knowing.
        | In particular, _truly random_ is nonsense.
       | 
        | Suppose we have a non-empty set I and, for each i in I, a
        | random variable X_i (using TeX notation for a subscript). Then
        | I together with the set of all the X_i is a _random process_
        | or a _stochastic process_. We might write (X_i, I) or some
        | such notation.
       | 
        | Commonly the set I is an interval subset of the real line and
        | denotes time: half of the real line, all of it, or just some
        | interval, e.g., [0,1].
       | 
       | The set I might be just the numbers
       | 
       | [1, 2, 3, 4, 5, 6}
       | 
       | for, say, playing with dice with the usual six sides.
       | 
       | I might be the integers in [1, 52] for considering card games.
       | 
       | But the set I might be all the points on the surface of a sphere
       | for considering, say, the weather, maybe the oceans, etc.
       | 
        | The set I might be all 4-tuples (t, x, y, z) where t is a real
       | number denoting time and the other three are coordinates in
       | ordinary 3-space.
       | 
        | A random variable can also be considered a function whose
        | domain is a _probability space_ O. So for a random variable Y
        | and each w in O, Y(w) is the value of Y at _sample_ w. Right,
        | the usual notation has capital Greek omega for O and lower case
        | Greek omega for w.
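        | 
        | In code, for the six-sided die above, a quick sketch (my
        | notation, not from any book):
        | 
        |     from fractions import Fraction
        | 
        |     # Probability space O: six equally likely samples w.
        |     O = [1, 2, 3, 4, 5, 6]
        |     P = {w: Fraction(1, 6) for w in O}
        | 
        |     # A random variable Y is just a function on O,
        |     # e.g., the parity of the roll.
        |     Y = lambda w: w % 2
        | 
        |     # E[Y] = sum over w in O of Y(w) P(w).
        |     print(sum(Y(w) * P[w] for w in O))  # 1/2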
       | 
       | Then for a particular w and stochastic process X with index set
       | I, all the X_t(w) as t varies is a _sample path_ of the process
        | X. E.g., a plot of the stock market index DJI for yesterday
        | is part of such a sample path. So, with stochastic processes,
        | what we observe are sample paths.
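        | 
        | A crude simulation of that, with (X_t, I) a simple random
        | walk and I = {0, 1, ..., 99} standing in for time (a toy
        | stand-in, not the DJI):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     I = np.arange(100)  # the index set, read as time
        | 
        |     # Each w fixes one sample path t -> X_t(w); here a
        |     # path is the running sum of one draw of +/-1 steps.
        |     def sample_path():
        |         steps = rng.choice([-1, 1], size=len(I))
        |         return np.cumsum(steps)
        | 
        |     # Three different w's give three different paths over
        |     # the same index set I.
        |     for _ in range(3):
        |         print(sample_path()[:10])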
       | 
        | That's a start on stochastic processes. Going deep into the
        | field gets difficult quickly. For a quick survey, look for the
        | names Kolmogorov, Dynkin, Doob, Ito, Shiryaev, Skorokhod,
        | Rockafellar, Cinlar, Stroock, Varadhan, McKean, Blumenthal,
        | Getoor, Fleming, Bertsekas, Karatzas, Shreve, Neveu, Tulcea(s).
       | 
       | For some of the _flavor_ of probability theory and stochastic
       | processes, see the article on _liftings_ at
       | 
       | https://en.wikipedia.org/wiki/Lifting_theory
       | 
        | I had the main book on liftings, which I'd gotten for $1 at a
        | used book store (not a big seller), but I lost it in a recent
        | move.
        
       ___________________________________________________________________
       (page generated 2022-01-24 23:03 UTC)