[HN Gopher] Introduction to Probability for Data Science
___________________________________________________________________

Introduction to Probability for Data Science

Author : mariuz
Score  : 105 points
Date   : 2022-01-24 17:23 UTC (5 hours ago)

(HTM) web link (probability4datascience.com)
(TXT) w3m dump (probability4datascience.com)

| tacoluv wrote:
| Does anyone know of a way to send some money to the author? I
| know he says "free" a lot, but this is so awesome I want to
| treat them to something.
|
| LittlePeter wrote:
| In the second paragraph of Chapter 2 - Probability:
|
| > No matter whether you prefer the frequentist's view or the
| Bayesian's view...
|
| I don't think the intended audience reading this chapter has
| this preference at all...
|
| Then the set notation uses square brackets instead of curly
| braces? I cannot get over this for some reason.
|
| hervature wrote:
| You are misrepresenting that quote. It comes after a fairly
| generic overview of both views, from which someone could form
| an opinion. One does not need to know the peculiarities of
| Bayesian reasoning to hold the opinion "you should incorporate
| prior knowledge". Also, the set notation does use curly braces.
|
| LittlePeter wrote:
| In my mind you cannot be a frequentist or a Bayesian after
| reading just the first paragraph of Chapter 2. But fair enough,
| I am a bit too critical here.
|
| Also, you are right, the set notation does use curly braces; I
| am relieved :-). I was confused by the interval notation A =
| [-1, 1-1/n] on page 8, which I misread as the three-element set
| [-1, 1, 1/n]...
|
| ska wrote:
| > In my mind you cannot be a frequentist or a Bayesian after
| reading just the first paragraph of Chapter 2.
|
| I don't think the author is asking you to, at all. They are
| pointing out that there are two "camps" and that you will see
| these terms bandied about (e.g., if you google stuff). But then
| they claim (rightly, I think, for an intro like this) that it
| doesn't really matter for the material to (immediately) follow,
| and that you are better off focusing on more fundamental ideas
| of probability.
|
| heresie-dabord wrote:
| > Some people ask how much money I can make from this book. The
| answer is ZERO. There is not a single penny that goes to my
| pocket. Why do I do that? Textbooks today are just ridiculously
| expensive. [...] Education should be accessible to as many
| people as possible, especially to those underprivileged
| families.
|
| Bravo! A free, quality education is the foundation for social
| progress and economic prosperity.
|
| dwrodri wrote:
| This looks like a fantastic resource. Thanks for sharing!
|
| I really enjoy the Bayesian side of ML, but it's definitely not
| the most accessible. Erik Bernhardsson cites latent Dirichlet
| allocation as a big inspiration behind the music recommendation
| system he originally designed for Spotify, which is apparently
| still in use today[1]. I still struggle with grokking latent
| factor models, but it can be so rewarding to build your own and
| watch it work (even with only moderate success!); a toy sketch
| follows below.
|
| Kevin Murphy has been working on a new edition of MLaPP that is
| now two volumes, with the last volume on advanced topics slated
| for release next year. However, both the old edition and the
| drafts for the new edition are available on his website
| here[2].
|
| The University of Tübingen has a course on probabilistic ML
| which probably has one of the most thorough walkthroughs of a
| latent factor model I've found on the Internet. You can find
| the full playlist of lectures for free here on YouTube[3].
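|
| A minimal sketch of a latent factor model in that spirit, using
| scikit-learn's LatentDirichletAllocation on made-up listening
| histories. This is my own toy illustration of the technique,
| not the actual Spotify system:
|
|     # Toy LDA: treat each "user" as a document whose "words"
|     # are the items they played, then recover latent tastes.
|     # Hypothetical data; illustrative only.
|     from sklearn.decomposition import LatentDirichletAllocation
|     from sklearn.feature_extraction.text import CountVectorizer
|
|     histories = [
|         "jazz bebop swing jazz jazz",
|         "techno house techno ambient",
|         "swing bebop house jazz",
|     ]
|
|     counts = CountVectorizer().fit_transform(histories)
|     lda = LatentDirichletAllocation(n_components=2,
|                                     random_state=0)
|     user_factors = lda.fit_transform(counts)
|     # Each row is one user's mixture over the two latent
|     # "tastes"; similar rows suggest similar users.
|     print(user_factors)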
|
| In terms of other resources for deep study on fascinating
| topics which require some command of stats and probability:
|
| - David Silver's lectures on reinforcement learning are
| fantastic [4]
|
| - The Machine Learning Summer School lectures are often quite
| good, with exceptionally talented researchers / practitioners
| invited to give multi-hour lectures on their domains of
| expertise, with the intended audience being graduate students
| with intermediate backgrounds in general ML topics. [5]
|
| 1: https://www.slideshare.net/erikbern/music-recommendations-ml...
| 2: https://probml.github.io/pml-book/
| 3: https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m...
| 4: https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPe...
| 5: http://mlss.cc
|
| graycat wrote:
| "A random process is a function indexed by a random key."
|
| Not just wrong: wildly bad nonsense.
|
| Go get some data. Now you have the value of a _random
| variable_.
|
| We never get clear on just what _random_ means, and in _random
| variable_ we do not assume some element of not knowing. In
| particular, _truly random_ is nonsense.
|
| Suppose we have a non-empty set I, and for each i in I we have
| a random variable X_i (using TeX notation for a subscript).
| Then I together with the family of all the X_i is a _random
| process_ or _stochastic process_. We might write (X_i, I) or
| some such notation.
|
| Commonly the set I is an interval subset of the real line and
| denotes time. The set I might be half of the real line, all of
| it, or just some interval, e.g., [0,1].
|
| The set I might be just the numbers
|
| {1, 2, 3, 4, 5, 6}
|
| for, say, playing with dice with the usual six sides.
|
| I might be the integers in [1, 52] for considering card games.
|
| But the set I might be all the points on the surface of a
| sphere for considering, say, the weather, maybe the oceans,
| etc.
|
| The set I might be all tuples (t, x, y, z) where t is a real
| number denoting time and the other three are coordinates in
| ordinary 3-space.
|
| A random variable can also be considered a function with domain
| a _probability space_ O. So for a random variable Y, for each w
| in O, Y(w) is the value of the random variable Y at _sample_ w.
| Right, the usual notation has capital Greek omega for O and
| lower case Greek omega for w.
|
| Then for a particular w and a stochastic process X with index
| set I, the collection of all the X_t(w) as t varies is a
| _sample path_ of the process X. E.g., a plot of the stock
| market index DJI for yesterday is part of such a sample path.
| So, with stochastic processes, what we observe are sample paths
| (a small simulation sketch follows at the end of this comment).
|
| That's a start on stochastic processes. Going deep into the
| field gets difficult quickly. Just quickly, look for the names
| Kolmogorov, Dynkin, Doob, Ito, Shiryaev, Skorokhod,
| Rockafellar, Cinlar, Stroock, Varadhan, McKean, Blumenthal,
| Getoor, Fleming, Bertsekas, Karatzas, Shreve, Neveu,
| Tulcea(s).
|
| For some of the _flavor_ of probability theory and stochastic
| processes, see the article on _liftings_ at
|
| https://en.wikipedia.org/wiki/Lifting_theory
|
| I had the main book on liftings, which I'd gotten for $1 at a
| used book store (not a big seller), but lost it in a recent
| move.
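|
| To make those objects concrete, here is a small simulation
| sketch (my own illustration, not from the book): a simple
| random walk, where the index set I = {0, 1, ..., 100} is time,
| each X_t is a random variable, and fixing one draw w gives one
| sample path t -> X_t(w).
|
|     # A simple random walk as a stochastic process (X_t, t in I).
|     # Each column of `paths` is one sample path for a fixed w.
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     n_steps, n_paths = 100, 3
|     # +/-1 increments, independent across time and paths
|     steps = rng.choice([-1, 1], size=(n_steps, n_paths))
|     paths = np.vstack([np.zeros(n_paths),
|                        steps.cumsum(axis=0)])
|     # paths[t, k] is X_t(w_k): the value of the process at time
|     # t along the k-th sample path; paths[:, 0] is a full path.
|     print(paths[:5, 0])
___________________________________________________________________
(page generated 2022-01-24 23:03 UTC)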