[HN Gopher] Ask HN: What are the foundational texts for learning...
       ___________________________________________________________________
        
       Ask HN: What are the foundational texts for learning about
       AI/ML/NN?
        
        I've picked up the following, just wondering what everyone's
        thoughts are on the best books for a strong foundation:
         
        Pattern Recognition and Machine Learning - Bishop
        Deep Learning - Goodfellow, Bengio, Courville
        Neural Smithing - Reed, Marks
        Neural Networks - Haykin
        Artificial Intelligence - Haugeland
        
       Author : mfrieswyk
       Score  : 205 points
       Date   : 2023-01-09 16:34 UTC (6 hours ago)
        
       | robg wrote:
        | Coming from cognitive neuroscience, I'm surprised that
        | _Explorations in Parallel Distributed Processing_ by McClelland
        | and Rumelhart doesn't get more attention as a classic bridging
        | old-school AI approaches with the modern paradigm.
       | 
       | https://psycnet.apa.org/record/1988-97441-000
        
       | junkerm wrote:
        | I read parts of Murphy's "Probabilistic Machine Learning" (vol
        | 1), which is an update of his earlier ML book. It covers a broad
        | range of topics, including very recent developments, and also
        | includes foundational topics such as probability, linear
        | algebra, and optimization. It is also quite aligned with the
        | Goodfellow book. I found it quite challenging at certain points.
        | What helped a lot was reading a book on Bayesian statistics; I
        | used Think Bayes by Allen Downey for that
        | (http://allendowney.github.io/ThinkBayes2/index.html).
        
       | zffr wrote:
       | You may also want to consider reading through some of the
       | important (or highly cited) academic papers in AI/ML/NN. From
       | these papers you may get a sense of the techniques researchers
       | are using, and which topics are most important to learn.
       | 
       | I have not applied this technique to AI/ML/NN specifically, but
       | it has been useful for me when trying to learn other topics.
        
       | raz32dust wrote:
        | I personally consider linear algebra to be foundational in
        | AI/ML: Introduction to Linear Algebra by Gilbert Strang. And his
        | free course on MIT OCW is fantastic too.
       | 
       | While having strong mathematical foundation is useful, I think
       | developing intuition is even more important. For this, I
       | recommend Andrew Ng's coursera courses first before you dive too
       | deep.
        
         | mfrieswyk wrote:
          | I never took anything beyond precalculus in school, thanks for
          | the tip!
        
           | p1esk wrote:
            | Oh, most recommendations here assume STEM college math
           | knowledge. You should become comfortable with calculus,
           | linear algebra, and probability/stats - those are the
           | foundations of ML.
        
           | NationalPark wrote:
           | Many of the suggestions so far are assuming you have taken
           | undergraduate linear algebra and calculus. I'd start with
           | those two subjects, you really can't build a foundational
           | understanding of modern AI techniques without them.
        
             | mythhouse wrote:
              | I did linear algebra and calculus using the Strang and
              | Spivak textbooks. Those were the classes I enjoyed the
              | most. But most of that stuff has atrophied from my brain
              | over the years. Do you recommend redoing those courses
              | quickly, or can I learn it as needed, on an on-demand
              | basis?
        
               | viscanti wrote:
               | You can try a refresher on Jacobians. If you're following
               | everything there well enough, you probably have what you
               | need to move forward (and pick up the rusty parts that
               | you need as you go). If you're completely lost then you
               | probably want to go back for a quick refresher.
        
               | jimbokun wrote:
                | Review on an on-demand basis.
               | 
               | The main concepts are matrix multiplication and
               | derivatives and their significance. Then you can dig into
               | the specifics and review or expand your knowledge as
               | needed.
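                | 
                | For example, here's a minimal numpy sketch (my own toy
                | illustration, not from any particular textbook) of both
                | ideas at once: a linear layer is just a matrix
                | multiplication, and one training step needs its
                | derivative.
                | 
                |   import numpy as np
                | 
                |   rng = np.random.default_rng(0)
                |   x = rng.normal(size=(5, 3))  # 5 samples, 3 features
                |   W = rng.normal(size=(3, 2))  # linear layer weights
                |   t = rng.normal(size=(5, 2))  # targets
                | 
                |   y = x @ W                    # forward: matrix multiply
                |   loss = 0.5 * np.sum((y - t) ** 2)
                | 
                |   # derivative of the loss w.r.t. W (chain rule)
                |   grad_W = x.T @ (y - t)
                | 
                |   W -= 0.1 * grad_W            # one gradient step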
        
         | viscanti wrote:
          | Strang is great, but he covers a lot of things that don't have
          | much carryover to AI/ML and doesn't really cover things like
          | Jacobians, which do. Maybe there's something more useful than
          | what Strang teaches for someone who is only learning calculus
          | and linear algebra for AI/ML.
        
         | mindcrime wrote:
         | Another interesting resource for Linear Algebra is the "Coding
         | the Matrix" course.
         | 
         | http://codingthematrix.com/
         | 
         | https://www.youtube.com/playlist?list=PLEhMEyM9jSinRHXJgRCOL...
        
       | pkoird wrote:
        | AIMA by Russell and Norvig is a must-read IMO.
        
       | dmarcos wrote:
       | I remember Carmack mentioning in a podcast a list of seminal
       | papers that Ilya Sutskever (@ilyasut) gave to him to learn AI
       | foundations. I would love to see that list.
        
       | davidhunter wrote:
       | The Quest for Artificial Intelligence: A History of Ideas and
        | Achievements, by Nils J. Nilsson
       | 
       | This is a good overview of the history of the field (up to SVMs
       | and before deep NNs). I found this useful for putting all the
       | different approaches into context.
        
       | softwaredoug wrote:
       | "Introduction to Statistical Learning" -
       | https://www.statlearning.com/
       | 
       | (there's also "Elements of Statistical Learning" which is a more
       | advanced version)
       | 
       | AI: A Modern Approach - https://aima.cs.berkeley.edu/
        
         | kevinskii wrote:
          | I agree. I read the first edition of Intro to Statistical
          | Learning and it went into just the right level of mathematical
          | depth. The authors also have YouTube lectures that accompany
         | the chapters, and these are a great reinforcement of the
         | material.
        
         | rg111 wrote:
          | ISL is a legit good book. It has the right amount and balance
          | of rigor and application.
         | 
         | The explanation, examples, projects, math- all are crisp.
         | 
          | As the name suggests, it is only an introduction (unlike CLRS).
          | And it does serve as a great beginners' book, giving you a
          | proper foundation for the things that you will learn and apply
          | in the future.
         | 
         | One thing people complain about is it being written in R, but
         | no serious hacker should fear R, as it can be picked up in 30
         | minutes, and you can implement the ideas in Python.
         | 
         | As someone with industry experience in Deep Learning, I will
         | recommend this book.
         | 
         | The ML course by Andrew Ng has no parallel, though. One must
         | try and do that course. Not sure about the current iteration,
          | but the classic one (w/ Octave/MATLAB) was really great.
        
         | bjornsing wrote:
         | The Elements of Statistical Learning, by Jerome H. Friedman,
         | Robert Tibshirani, and Trevor Hastie. I've seen it referenced
         | quite a few times and the TOC looks good.
        
           | jtmcmc wrote:
           | This was one of the first books my advisor told me to read
            | when I started my ML PhD a... long time ago. The fundamentals
           | of machine learning haven't changed and it's a great book.
        
       | master_yoda_1 wrote:
        | This book is all you need:
        | https://probml.github.io/pml-book/book1.html
        
       | stevenbedrick wrote:
       | To add to the great recommendations on this thread, I really like
       | Moritz Hardt and Benjamin Recht's "Patterns, Predictions, and
       | Actions". It's published by Princeton University Press here:
       | https://press.princeton.edu/books/hardcover/9780691233734/pa...
       | 
        | But it is also available online as a preprint here:
       | https://mlstory.org/
        
       | 5cott0 wrote:
       | https://www.manning.com/books/deep-learning-with-python-seco...
        
       | digitalsushi wrote:
       | Are there obvious paths into these spaces for someone stuck over
       | in devops/infrastructure/platform engineering? Or is it too far a
       | hop to really find a direct path in?
       | 
       | Let me ask a slightly different way - can someone like me get
       | into a job like these, without needing some more college?
       | 
       | My day job is wrapping up OS templates for people with ML
       | software and I always wonder what they get to go do with them
       | once they turn into a compute instance.
        
         | jtmcmc wrote:
          | If you're already doing a job at a company that does this
          | stuff, can you talk to people about wanting to change teams and
          | learn?
        
         | friendlyHornet wrote:
         | I would like to know this, as well.
        
         | zmgsabst wrote:
         | Why not ask them?
         | 
         | Call it cross functional training to increase your domain
         | knowledge, tell your manager you need it to ensure you're
         | providing the best service possible, and get your coworkers to
         | help you learn the framework they use...?
        
       | ipnon wrote:
        | I'd posit we don't understand AI/ML well enough to know its
        | foundations with much certainty. Take, for example, the discovery
       | of emergent zero-shot properties in the latest LLMs. My
       | recommendation to a beginner would be to grok gradient descent,
       | matrix multiplication, and the universal approximation theorem,
       | then get on to engineering like the rest of us. You can't go
       | wrong with Jeremy Howard's FastAI course and his "Deep Learning
       | for Coders."
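        | 
        | If it helps, gradient descent itself fits in a few lines of
        | numpy. This toy sketch (mine, not from the FastAI materials)
        | fits a line by repeatedly stepping against the gradient of the
        | squared error:
        | 
        |   import numpy as np
        | 
        |   rng = np.random.default_rng(0)
        |   x = rng.uniform(-1, 1, size=100)
        |   y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)
        | 
        |   w, b, lr = 0.0, 0.0, 0.1
        |   for _ in range(500):
        |       err = (w * x + b) - y
        |       w -= lr * np.mean(err * x)  # dL/dw for L = mean(err^2)/2
        |       b -= lr * np.mean(err)      # dL/db
        | 
        |   print(w, b)  # should land close to 3.0 and 1.0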
        
       | dceddia wrote:
       | I'm a big fan of learning through practice vs learning all the
       | theory up front, and for anyone else who feels the same, the Fast
       | AI course and book are very good: https://fast.ai
       | 
       | The authors are working on a new course that'll dive deep into
       | the modern Stable Diffusion stuff too, which I'm looking forward
       | to.
        
       | rg111 wrote:
       | Do you have Linear Algebra knowledge, and Stats 101 knowledge?
       | 
       | Then start with ISLR.
       | 
        | Then go and watch Andrew Ng's Machine Learning course on Coursera
       | (a new version was added in 2022 that uses Python).
       | 
       | Then read the sklearn book from its maintainers/core devs. It's
       | from O'Reilly.
       | 
       | Then go do the Deep Learning Specialization from deeplearning.ai.
       | 
       | Then do fast.ai course.
       | 
        | If interested in Deep RL, watch David Silver's lectures, then
        | read Deep RL in Action by Zai and Brown. Then do the HF course on
        | Deep RL.
       | 
       | This is how you get started. Choose your books based on your
       | personality, needs, and contents covered.
       | 
        | And among MOOCs, I highly suggest the one by Canziani and LeCun
        | from NYU. (I loved the 2020 version.)
       | 
        | The one taught by Fei-Fei Li and Andrej Karpathy is nice.
       | 
        | These two MOOCs can substitute for the classic books in terms of
        | quality.
       | 
        | I have never read any of the famous books cover to cover. I read
        | a lot from them, sticking to specific subjects.
       | 
        | Get to reading papers and finding implementations. Ng + ISLR will
        | give you good grounding. Fast.ai + deeplearning.ai will give you
        | the capability to solve real problems. NYU + Tubingen + Stanford
        | + UMich (Justin Johnson) courses will bring you to the edge.
       | 
        | You need a lot of practical skills that aren't taught anywhere.
        | So, get your hands dirty early. Learn to use
       | frameworks, cloud platforms, etc.
       | 
       | Then start reading papers.
       | 
        | A crystal-clear grasp of the math foundations is a must. Get it
        | if you don't have it already.
        
       | TaupeRanger wrote:
       | There are none anymore. We now know that throwing a bunch of bits
       | into the linear algebra meat grinder gets you endless high
       | quality art and decent linguistic functionality. The architecture
       | of these systems takes maybe a week to deeply understand, or
       | maybe a month for a beginner. That's really it. Everything else
       | is obsolete or no longer applicable unless you're interested in
       | theoretical research on alternatives to the current paradigm.
        
         | jtmcmc wrote:
          | This is definitely a take that, on the one hand, ignores the
          | massive amount of utility ML has outside of generative images
          | and NLP, and on the other vastly misrepresents the time it
          | takes to understand a model, assuming one does not already
          | have a background in CS, linear algebra (in particular matrix
          | calculus), probability, stats, etc...
        
         | rg111 wrote:
          | You are plain exaggerating. You can't do all of them in a few
          | weeks.
          | 
          | Algorithms: Lin Reg -> Log Reg -> NN -> CNN + RNN -> GANs +
          | Transformers -> ViT -> Multimodal AI + LLMs + Diffusion +
          | Autoencoders. SVM, PCA, kNN, k-means clustering, etc.
          | LightGBM, XGBoost, CatBoost, etc. Optimization and optimizers.
          | 
          | Application-wise: Classification, Semantic Segmentation, Pose
          | Estimation, Text Generation, Summarization, NER, Image
          | Generation, Captioning, Sequence Generation (like
          | music/speech), text to speech, speech to text, recommender
          | systems, sentiment analysis, tabular data, etc.
          | 
          | Frameworks: pandas, sklearn, PyTorch, Jax -> training,
          | inference, data loading.
          | 
          | Platforms: AWS + GCP + Azure. And a lot of GPU shenanigans +
          | framework/platform-specific quirks.
         | 
         | All these will take you ~2 years or 1.5 years at least,
         | 
         |  _given that:_
         | 
         | - you already know Python/any programming language properly
         | 
          | - you already know college-level math (many people say you
          | don't need it, but I _haven't met a single soul_ in ML
          | research/modelling without college-level math)
         | 
         | - you know Stats 101 matching a good uni curriculum and ability
         | to learn beyond
         | 
         | - you know git, docker, cli, etc.
         | 
          | Every influencer and their mother promising to teach you Data
          | Science in 30 days is plain lying.
         | 
         | Edit: I see that I left out Deep RL. Let's keep it that way for
         | now.
         | 
          | Edit2: Added tree-based methods. These are very important.
          | XGBoost outperforms NNs _every time_ on tabular data. I also
          | once used an RF head appended to a DNN for the final
          | prediction. Added optimizers.
        
           | jimbokun wrote:
           | > SVM, PCA, kNN, k-means clustering
           | 
           | Are these still relevant in the age of Deep Neural Networks?
        
             | PeterisP wrote:
              | Yes. There are all kinds of tasks where the appropriate
              | solution is to use a DNN for much of the learning (either
              | directly learning the correlations, or as transfer learning
              | from some large-data self-supervised task) and then, once
              | you have the results of that DNN inference, to work with
              | these methods: apply PCA to interpret the resulting vector,
              | or to separate out specific dimensions and expose them for
              | adjustment in some generative task; or perhaps the best way
              | to make the final decision is a kNN on top of the DNN
              | output, etc.
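              | 
              | A minimal sklearn sketch of that last pattern (the
              | "embeddings" here are just random stand-ins for real DNN
              | outputs):
              | 
              |   import numpy as np
              |   from sklearn.decomposition import PCA
              |   from sklearn.neighbors import KNeighborsClassifier
              | 
              |   rng = np.random.default_rng(0)
              |   emb = rng.normal(size=(200, 512))      # stand-in embeddings
              |   labels = rng.integers(0, 3, size=200)  # stand-in classes
              | 
              |   # compress the 512-d vectors for inspection / adjustment
              |   emb_2d = PCA(n_components=2).fit_transform(emb)
              | 
              |   # kNN as the final decision layer on top of the embeddings
              |   knn = KNeighborsClassifier(n_neighbors=5).fit(emb, labels)
              |   print(knn.predict(emb[:5]))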
        
             | popinman322 wrote:
             | PCA is a foundational dimension reduction technique, and
             | kNN can be used in conjunction with embeddings.
             | 
             | k-means is still great when you have prior/domain knowledge
             | about the number of groups.
        
             | jeffreyrogers wrote:
              | It's not in your list, but decision trees still outperform
              | DNNs on many tabular problems and can be trained faster.
        
             | rg111 wrote:
             | Yes.
             | 
             | Different problems require different solutions.
             | 
             | Sometimes, an NN would be overkill.
             | 
              | And stakeholders in many situations would like insight into
              | why the prediction is what it is. NNs are miles behind
              | LogReg in terms of interpretability.
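              | 
              | A quick sklearn sketch of the difference (toy data, just
              | to show the idea): every feature in a LogReg gets one
              | coefficient you can read off and explain to a stakeholder.
              | 
              |   import numpy as np
              |   from sklearn.linear_model import LogisticRegression
              | 
              |   rng = np.random.default_rng(0)
              |   X = rng.normal(size=(500, 3))
              |   # label depends strongly on feature 0, weakly on feature 2
              |   y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=500) > 0
              | 
              |   clf = LogisticRegression().fit(X, y)
              |   print(clf.coef_)  # per-feature log-odds contributions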
        
         | cyber_kinetist wrote:
         | You still need to understand some basic theory/math about
         | probabilistic inference (along with some knowledge of linear
         | algebra), or else you'll get a bit overwhelmed by some of the
         | equations and not understand what the papers are talking about.
         | PRML by Bishop is probably more than enough to start reading ML
         | papers comfortably though. (This would probably be too easy for
         | a competent math major, but not all of us are trained that way
         | from the beginning...)
        
           | jeffreyrogers wrote:
           | I'm not sure why you're getting downvoted. I find it hard to
           | believe that someone without a decently strong math
           | background could make sense of a modern paper on deep
           | learning. I have a math minor from a good school and had to
           | brush up on some topics before papers started making sense to
           | me.
        
         | moneywoes wrote:
          | What resources are there for understanding it in a month?
        
         | sillysaurusx wrote:
         | A month to deeply understand?
         | 
         | I've been doing it since early 2019 and there are still
         | subtleties that catch me off guard. Get back to me when you're
         | not surprised that you can get rid of biases from many layers
         | without harming training.
         | 
         | I broadly agree with you, but the timeline was just a little
         | too aggressive. By about 10x. :)
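          | 
          | (For anyone wondering what that looks like in code: frameworks
          | let you drop the bias term per layer. A common PyTorch pattern
          | - not necessarily the exact subtlety meant above - is omitting
          | biases on layers that feed a normalization layer, since the
          | norm's own learnable shift makes them redundant.)
          | 
          |   import torch
          |   import torch.nn as nn
          | 
          |   x = torch.randn(8, 3, 32, 32)
          |   conv = nn.Conv2d(3, 64, 3, padding=1, bias=False)
          |   bn = nn.BatchNorm2d(64)   # its beta replaces the conv bias
          |   h = torch.relu(bn(conv(x)))           # (8, 64, 32, 32)
          | 
          |   head = nn.Linear(64, 10, bias=False)  # bias-free head trains fine
          |   logits = head(h.mean(dim=(2, 3)))     # global avg pool -> (8, 10)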
        
           | topspin wrote:
           | > I've been doing it since early 2019 and there are still
           | subtleties that catch me off guard.
           | 
           | That's true of every non-trivial discipline. I often learn
           | subtleties about programming languages and hobbies I've been
           | dealing with for decades.
        
           | hooande wrote:
           | This is separate from understanding how a language model or
           | transformer works. You could read the major papers behind
           | those ideas and read every line of code involved several
           | times over in a month. I'd recommend it, if you're super
           | curious.
           | 
           | You can figure out the bias thing after about a month (or so)
            | of hands-on practice. Do one Kaggle competition seriously and
            | it'll become pretty clear, pretty quickly.
        
       | ly3xqhl8g9 wrote:
       | Not sure if foundational (quite a tall order in such a fast-
       | moving field), but for sure a nice introduction into neural
       | networks, and even mathematics in general (for a teenager,
       | because it's nice to see numbers in action beyond school-level
       | algebra):
       | 
       | - Harrison Kinsley, Daniel Kukiela, _Neural Networks from
       | Scratch_ , https://nnfs.io,
       | https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0Qu...
       | 
        | Somewhat foundational, if not in actuality then in its intention
        | to actually build a theory (as in a theory of gravitation),
        | although not necessarily an introductory text:
       | 
       | - Daniel A. Roberts, Sho Yaida, _The Principles of Deep Learning
       | Theory_ , https://arxiv.org/abs/2106.10165
        
       | bilsbie wrote:
        | If anyone is just starting out and wants to do a study group,
        | let me know.
       | 
        | I'm having trouble keeping my motivation up, but I really want to
        | get up to speed on how LLMs work and someday make a career
        | switch.
        
         | moneywoes wrote:
          | I'm down.
        
       | adg001 wrote:
        | I have not yet seen the following book mentioned in this thread,
        | and I can't recommend it highly enough:
        | 
        | Understanding Machine Learning: From Theory to Algorithms - Shai
        | Shalev-Shwartz, Shai Ben-David
        
       | dezzeus wrote:
       | You may want to also consider this one:
       | 
        | Artificial Intelligence: A Modern Approach - Stuart Russell,
        | Peter Norvig
        
         | apu wrote:
         | The big book of stuff that doesn't work.
        
           | rzzzt wrote:
           | Prop it up with a small stick and put some cracked walnuts
           | below to catch mice with it.
        
         | mindcrime wrote:
          | Can't recommend this highly enough, if for no other reason than
          | to provide some context to keep the OP from getting trapped in
          | the "deep learning is all you need" echo chamber. Sure, ANNs
          | and DL are great and do amazing things, but until it's proven
         | that they really are the "be all, end all" (something I suspect
         | we're far from) then it makes sense to dedicate at least _some_
         | cycles to considering other paradigms.
        
       | bjornsing wrote:
       | It's probably a bit off the beaten path, but I can highly
        | recommend Probability Theory: The Logic of Science, by E. T.
        | Jaynes.
       | 
       | In the opening chapter Jaynes describes a hypothetical system he
       | calls "The Robot". He then lays out the mathematics of the "The
       | Robot's" thinking in detail: essentially Bayesian probability
       | theory. This is the best summary of an ideal ML/AI system I've
       | come across. It's also very philosophically enlightening.
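        | 
        | For a flavor of what "The Robot" does, here is a tiny sketch of
        | its core operation (not from the book, just Bayes' rule applied
        | to a biased-coin example):
        | 
        |   import numpy as np
        | 
        |   # hypotheses: possible heads-probabilities of a coin
        |   theta = np.linspace(0.01, 0.99, 99)
        |   prior = np.ones_like(theta) / theta.size  # start indifferent
        | 
        |   # observe 7 heads out of 10 flips
        |   likelihood = theta**7 * (1 - theta)**3
        |   posterior = prior * likelihood
        |   posterior /= posterior.sum()              # renormalize
        | 
        |   print(theta[np.argmax(posterior)])        # ~0.7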
        
         | misiti3780 wrote:
          | Seconded! It's a great book.
        
         | sillysaurusx wrote:
         | I'm so sad the editor chose not to publish Jaynes' C snippets
         | because "they were too cryptic." They would've helped clarify
         | the ideas greatly.
         | 
         | It's a good book, but I don't know how it's related to ML. My
         | own answer would be "Just do it." Find an ML project you like
         | and start tinkering around. But everyone learns differently, so
         | maybe there's a book that can replace experience.
        
           | bjornsing wrote:
           | How is Jaynes (2003) related to ML? I guess in the same way
           | probability theory is related to ML: it underpins just about
           | every meaningful step forward in ML/AI research, as I see it.
        
       | IanCal wrote:
       | I think a good start is to think about what you want to do. "Back
       | in my day" ai was mostly academic and had more classic
       | foundational parts with newer flashy bits. It wasn't, broadly,
        | applicable to the real world. Some parts were, but not a huge
        | amount.
       | 
        | Now I think you've got key parts: there's how to _use_ recent
        | production-ready models/systems, how to _train_ them, and how to
        | _make_ them. Is it in a research or business context?
       | 
        | The field is also broad enough that any one section (text,
        | images, probably symbols) and subsection (time series, bulk, fast
        | online work) has a significant body of work behind it. My splits
        | here will not be the best currently, so I'm happy for any
        | corrections on a useful hierarchy, by the way.
       | 
       | Perhaps you're interested in the history and what's led up to
       | today's work? That's more of a "brief history of time" style
       | coverage, but illuminating.
       | 
       | I'm aware I've not helpfully answered, but I think the same
       | question could have very different valid goals and wanted to
       | bring that to the fore.
        
       | alphabetting wrote:
       | For a less technical history of the field and major players I'd
       | recommend Genius Makers.
        
       | crosen99 wrote:
       | "Neural Networks and Deep Learning", by Michael Nielsen
       | http://neuralnetworksanddeeplearning.com (full text)
       | 
       | The first chapter walks through a neural network that recognizes
       | handwritten digits implemented in a little over 70 lines of
       | Python and leaves you with a very satisfying basic understanding
       | of how neural networks operate and how they are trained.
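        | 
        | In the same spirit (this is not the book's code, just a toy
        | example), here is a two-layer network learning XOR with plain
        | numpy backprop:
        | 
        |   import numpy as np
        | 
        |   X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        |   y = np.array([[0], [1], [1], [0]], dtype=float)
        | 
        |   rng = np.random.default_rng(0)
        |   W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
        |   W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
        |   sigmoid = lambda z: 1 / (1 + np.exp(-z))
        | 
        |   for _ in range(5000):
        |       h = sigmoid(X @ W1 + b1)              # hidden layer
        |       out = sigmoid(h @ W2 + b2)            # output layer
        |       d_out = (out - y) * out * (1 - out)   # backprop (MSE loss)
        |       d_h = (d_out @ W2.T) * h * (1 - h)
        |       W2 -= h.T @ d_out
        |       b2 -= d_out.sum(axis=0, keepdims=True)
        |       W1 -= X.T @ d_h
        |       b1 -= d_h.sum(axis=0, keepdims=True)
        | 
        |   print(out.round().ravel())  # should come out as [0. 1. 1. 0.]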
        
         | martythemaniak wrote:
          | This is the thing that made NNs "click" for me; I think it was
          | very good. Before this I did Andrew Ng's old ML course on
          | Coursera, which I thought was a good intro to old ML approaches
          | and common terms/techniques, and it flowed nicely into NNs.
         | 
          | But they're both kinda old now, so there must be something
          | newer that'll give you an equally good intro to transformers,
          | etc.
        
       | gaspb wrote:
       | If you're more inclined to theory, I would suggest "Learning
       | Theory from First Principles" by F. Bach:
       | https://www.di.ens.fr/~fbach/ltfp_book.pdf
       | 
        | The book assumes limited prior knowledge (similar to what is
        | required for Pattern Recognition, I would say) and gives good
        | intuition for the foundational principles of machine learning
        | (the bias/variance tradeoff) before delving into more recent
        | research problems. Part I is great if you simply want to know
        | what the core tenets of learning theory are!
        
       | PartiallyTyped wrote:
       | I recommend against DL by Goodfellow. At this point it is pretty
       | much outdated. Actually, anything specific to NNs is already
       | outdated by release.
       | 
       | You'd need the following background:
       | 
       | - Linear Algebra
       | 
       | - Multivariate Calculus
       | 
       | - Probability theory && Statistics
       | 
        | Then you need a decent ML book to get the foundations of ML. You
        | can't go wrong with any of these:
       | 
       | - Bishop's Pattern Recognition
       | 
       | - Murphy's Probabilistic ML
       | 
       | - Elements of statistical learning
       | 
       | - Learning from data
       | 
        | You can supplement Murphy's with the advanced book. Elements is a
        | pretty tough book; consider going through "Introduction to
        | Statistical Learning"[1] first. Bishop and Murphy include
        | foundational topics in mathematics.
       | 
       | LfD is a great introductory book and covers one of the most
       | important aspects of ML, that is, model complexity and families
       | of models. It can be supplemented with any of the other books.
       | 
       | I'd also recommend doing some abstract algebra, but it's not a
       | prerequisite.
       | 
       | If you would like a top-down approach, I recommend getting the
       | book "Mathematics of Machine Learning" and learning as needed.
       | 
       | For NN methods, some recommendations:
       | 
       | - https://paperswithcode.com/methods/category/regularization
       | 
       | - https://paperswithcode.com/methods/category/stochastic-optim...
       | 
       | - https://paperswithcode.com/methods/category/attention-mechan...
       | 
       | - https://paperswithcode.com/paper/auto-encoding-variational-b...
       | 
        | For something a little bit different but worth reading, given
        | that you have the prerequisite mathematical maturity:
       | 
       | - Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and
       | Gauges | https://arxiv.org/abs/2104.13478
       | 
       | [1] https://www.statlearning.com/
       | 
       | Many thanks to the user "mindcrime" for catching my error with
       | Introduction to statistical learning.
        
         | mindcrime wrote:
         | _consider going through "Introductions to Elements of
         | statistical learning"_
         | 
         | Was that supposed to be _An Introduction to Statistical
         | Learning_ [1] or maybe _Introduction to Statistical Relational
         | Learning_ [2]? I don't think there is a book titled
         | _Introduction to Elements of Statistical Learning_?
         | 
         | [1]: https://www.statlearning.com/
         | 
         | [2]: https://www.cs.umd.edu/srl-book/
        
           | PartiallyTyped wrote:
            | I referred to [1]. Thanks, I have corrected the GP.
        
         | sillysaurusx wrote:
         | (I can't wait until the myth that you need linear algebra and
         | calculus to do ML finally dies. It's like saying that you need
         | to understand assembly to do programming. It helps, but it's
         | far from a requirement.)
        
           | 6gvONxR4sf7o wrote:
           | I disagree strongly. In your analogy, if the compiler broke
           | down all the time, you would probably need to understand
           | assembly to do programming. ML is amazing today, but still
           | kinda sucks. In general you'll have a bunch of failures on
           | the way to a successful novel application, so it's more
           | critical to understand what's going on under the hood in ML
           | than in your programming analogy.
           | 
           | If you just want to apply well known things to well known
           | things, sure you're right. But as soon as things go wrong, I
           | couldn't imagine how much more inefficient my iteration
           | cycles would be trying to do novel work without understanding
           | linear algebra (for some kinds of novel work) or calc (for
           | other kinds of novel work). I think you kinda get at this
           | when you say it's not necessary but it helps. It's not
           | necessary, but it helps _a lot_ with anything off the beaten
           | track.
        
             | sillysaurusx wrote:
             | We agree, I think!
             | 
             | And certainly, if you're one of those people who can pull
             | it off, studying ML from first principles is probably an
             | advantage. I just wince every time since I wouldn't have
             | gotten into ML in the first place if I had to start with a
             | big Calculus tome. There are probably a lot of people like
             | me out there.
        
               | PartiallyTyped wrote:
               | OP asked for foundational, and I provided _foundational_.
               | In my opinion, everyone should start from some sound
               | foundations in LinAlg and Calculus.
               | 
               | Here are a couple of errors that stem from a single
               | foundational problem:
               | 
                | - a linear regressor cannot have more parameters than the
                | number of datapoints
               | 
               | - dimensionality reduction when you have NxM with M > N
               | is bogus and you need a bigger dataset to do anything
               | meaningful other than clustering
               | 
                | - the input dimension of the output layer being larger
                | than the number of samples
               | 
                | The underlying issue in all of these is the rank-nullity
                | theorem, which is pretty foundational for ML, and yet many
               | practitioners don't know about it or haven't made the
               | connection.
               | 
                | I am not saying that you should have gone through Spivak
                | or built everything up from the bottom. There are books
                | like the mathematics-of-ML ones that condense everything
                | you need, giving you a decent enough foundation for what
                | you will need.
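                | 
                | A tiny numpy illustration of the rank-nullity point (my
                | own, just to make it concrete): with more features than
                | samples the design matrix cannot have full column rank,
                | so infinitely many weight vectors fit the data exactly.
                | 
                |   import numpy as np
                | 
                |   rng = np.random.default_rng(0)
                |   X = rng.normal(size=(10, 50))  # 10 samples, 50 features
                |   y = rng.normal(size=10)
                | 
                |   print(np.linalg.matrix_rank(X))  # 10, never 50
                |   w, *_ = np.linalg.lstsq(X, y, rcond=None)
                |   print(np.allclose(X @ w, y))     # True: zero train error
                | 
                |   # adding any vector from the 40-dimensional null space
                |   # of X to w leaves the fit unchanged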
        
           | antegamisou wrote:
           | > I can't wait until the myth that you need linear algebra
           | and calculus to do ML finally dies.
           | 
            | This is such a dangerously absurd claim... but then, it speaks
            | volumes about the abysmal state the non-research-heavy AI/ML
            | field has fallen into.
        
         | antegamisou wrote:
         | As always on HN, the right answer is at the bottom.
        
       | KRAKRISMOTT wrote:
        | Haugeland is GOFAI/cognitive science, not directly relevant to
        | the modern machine learning variety of models unless you are
        | doing reinforcement learning or tree stuff (hey, poker/chess/Go
        | bots are pretty cool!). Russell and Norvig is the typical
        | introductory textbook for those. Marks and Haykin are both
        | severely out of date (they have solid content, but they don't
        | deal with the same _scale_ as modern deep learning, which has
        | many emergent properties).
       | 
       | You are approaching this like an established natural sciences
       | field where old classics = good. This is not true for ML. ML is
       | developing and evolving quickly.
       | 
       | I suggest taking a look at Kevin Murphy's series for the
       | foundational knowledge. Sutton and Barto for reinforcement
        | learning. MacKay's _Information Theory, Inference, and Learning
        | Algorithms_ is also excellent.
       | 
        | Kochenderfer's ML series is also excellent if you like control
        | theory and cybernetics:
       | 
       | https://algorithmsbook.com/
       | https://mitpress.mit.edu/9780262039420/algorithms-for-optimi...
       | https://mitpress.mit.edu/9780262029254/decision-making-under...
       | 
       | For applied deep learning texts beyond the basics, I recommend
       | picking up some books/review papers on LLMs, Transformers, GANs.
       | For classic NLP, Jurafsky is the go-to.
       | 
       | Seminal deep learning papers:
       | https://github.com/anubhavshrimal/Machine-Learning-Research-...
       | 
       | Data engineering/science: https://github.com/eugeneyan/applied-ml
       | 
       | For speculation: https://en.m.wikipedia.org/wiki/Possible_Minds
        
         | ipnon wrote:
          | To your second point, I have a sneaking suspicion that whatever
          | is recommended in this very thread will suddenly jump in
          | estimation as a "classic." History is made up as it goes along!
        
           | KRAKRISMOTT wrote:
           | Well, GP's _Neural Smithing_ is a solid example. There is
            | nothing wrong with it; it is surprisingly well written and
            | correct for something published before the millennium.
           | 
            | https://books.google.com/books/about/Neural_Smithing.html?id...
           | 
           | Take a look at the Google Books preview (click view sample).
            | The basics are all there: an intro to the biological history
            | of neural networks, backpropagation, gradient descent,
            | partial derivatives, etc. It even hints at teacher-student
           | methods!
           | 
           | The only issue is that it missed out on two decades of
           | hardware development (and a bag of other optimization
            | tricks). Modern deep learning implementations require
            | machine sympathy at scale. It also doesn't have any
           | literature on autoregressive networks like RNNs or image
           | processing tricks like CNNs.
        
         | mfrieswyk wrote:
          | Appreciate the comment very much. I feel like I need to build a
          | foundation of context in order to appreciate the significance of
         | the latest developments, but I agree that most of what I posted
         | doesn't represent the state of the art.
        
         | starwind wrote:
          | Does the order matter for Kochenderfer? Does any one of those
          | put more emphasis on controls than the others?
        
         | mtlmtlmtlmtl wrote:
         | A quick point about the "tree stuff" and Norvig&Russell:
         | 
          | While it does cover minimax trees, alpha-beta, etc., it only
          | really provides a very brief overview. The book is more of an
          | overview of the AI/ML fields as a whole. Game-playing AI is
          | dense with various game-specific heuristics that the book
          | scarcely mentions.
         | 
         | Not sure about books, but the best resource I've found on at
         | least chess AI is chessprogramming.org, then just ingesting the
         | papers from the field.
        
       | cscurmudgeon wrote:
       | Get a strong grasp on Linear Algebra and everything else falls
        | into place more easily.
       | 
       | https://math.mit.edu/~gs/learningfromdata/
        
       | gerash wrote:
       | I'd suggest these two by Kevin Murphy:
       | 
       | Probabilistic Machine Learning: An Introduction
       | 
       | https://probml.github.io/pml-book/book1.html
       | 
       | Probabilistic Machine Learning: Advanced Topics
       | 
       | https://probml.github.io/pml-book/book2.html
        
         | pablo24602 wrote:
          | Working through these right now - definitely recommend them.
        
       | 6gvONxR4sf7o wrote:
       | Kevin Murphy's books (especially the new ones) are what I'd point
       | anyone towards for ML.
        
       | epgui wrote:
        | The foundations of AI/ML are really linear algebra and
        | statistics. But not the kinds of stats most people learn in
        | undergrad: focus on linear models (there are tons of great books
        | on just that; also look up "common statistical tests are linear
        | models" for a great intro to what I'd call useful stats),
        | Bayesian stats, ANOVA/MANOVA/PERMANOVA, etc.
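        | 
        | A small illustration of the "tests are linear models" point (toy
        | data; scipy and statsmodels assumed available): a two-sample
        | t-test and a regression on a 0/1 group indicator give the same t
        | statistic.
        | 
        |   import numpy as np
        |   import statsmodels.api as sm
        |   from scipy import stats
        | 
        |   rng = np.random.default_rng(0)
        |   a = rng.normal(0.0, 1.0, size=50)
        |   b = rng.normal(0.5, 1.0, size=50)
        | 
        |   # classic two-sample t-test (equal variances)
        |   print(stats.ttest_ind(a, b).statistic)
        | 
        |   # the same thing as a linear model: y ~ 1 + group
        |   y = np.concatenate([a, b])
        |   group = np.concatenate([np.zeros(50), np.ones(50)])
        |   fit = sm.OLS(y, sm.add_constant(group)).fit()
        |   print(fit.tvalues[1])  # matches the t-test above (up to sign)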
        
       | avipeltz wrote:
        | - _AIMA by Russell and Norvig_ is a classic, but I would say it
        | is more of an overview of the field and for most topic areas
        | isn't quite deep enough imo.
       | 
       | - For deep learning specifically, a more applied text that is
       | beautifully written and chock full of examples is Francois
        | Chollet's _Deep Learning with Python_ (there's a new second
        | edition out with up-to-date examples using modern versions of
        | TensorFlow). The first 3 chapters I would give as required
       | reading for anyone interested in understanding some deep learning
       | fundamentals.
       | 
        | - _Deep Learning - Goodfellow and Bengio_ - seems like it would
        | be hard to get through without a reading group; not exactly an
        | APUE or K&R type reading experience, but I haven't spent enough
        | time with it.
       | 
        | If you haven't taken a Linear Algebra or Differential Equations
        | class, it's useful stuff to know for ML/DL theory but not fully
        | necessary for applied work with modern high-level libraries;
        | definitely, though, having a strong understanding of basic matrix
        | math is useful.
       | 
        | If you have an interest in natural language processing, there
        | are a couple of good books:
       | 
        | - _Natural Language Processing with Python - Bird, Klein, Loper_
        | is a great intro to NLP concepts and to working with NLTK, which
        | may be a bit dated to some but I would definitely recommend, and
        | it's online for free. Great examples. (https://www.nltk.org/book/)
       | 
       | - _Speech and Language Processing - Dan Jurafsky and James H.
        | Martin_ - is good, though I have only spent time with the
        | pre-print.
       | 
        | And then there are a lot of papers that are good reads. Let me
        | know if you have any questions or want a list of good papers.
       | 
        | If you just want to get off the ground and start playing with
        | stuff and building things, I'd recommend fast.ai's free online
        | course - it's pretty high level and a lot is abstracted away, but
        | it's a great start and can enable you to build lots of cool
        | things pretty rapidly. Andrew Ng's online course is also quite
        | reputable and will probably give you a bit more background and
        | fundamentals.
       | 
        | If I were to choose one book from the bunch it would be Chollet:
        | it gives you pretty much all the building blocks you need to be
        | able to read some papers and try to implement things yourself.
        | I find building things a much more satisfying way to learn than
        | sitting down and writing proofs or just taking notes, but that's
        | just my preference.
        
         | rg111 wrote:
          | Norvig-Russell has many chapters spanning hundreds of pages
          | that are way out of date and not used anywhere.
         | 
          | And the newer things it covers are covered in a better manner
          | and in better depth in other sources.
         | 
         | I read this book like a novel. Good for a basic overview, but
         | the RoI is very low.
        
       | daturkel wrote:
       | I maintain a list of well-known or foundational papers in ML in a
       | github repo that may be of interest to readers of this thread
       | 
       | https://github.com/daturkel/learning-papers
        
       | bradreaves2 wrote:
       | This is off the beaten path, but consider Abu-Mostafa et al.'s
       | "Learning from Data". https://www.amazon.com/Learning-Data-Yaser-
       | S-Abu-Mostafa/dp/...
       | 
        | I adore PRML, but its scope and depth are overwhelming. LfD
       | encapsulates a number of really core principles in a simple text.
       | The companion course is outstanding and available on EdX.
       | 
        | The tradeoff is that LfD doesn't cover a lot of breadth in terms
       | of looking at specific algorithms, but your other texts will do a
       | better job there.
       | 
       | My second recommendation is to read the documentation for
        | scikit-learn. It's amazingly instructive and a practical guide to
       | doing ML in practice.
        
         | PartiallyTyped wrote:
         | LfD is a great book to get people to think about complexity
         | classes and model families. We used that in my grad course and
         | I can recommend it.
        
         | vowelless wrote:
         | I strongly second this. Abu Mostafa has videos and homework for
          | this course too. This course was the one that made a LOT of
          | fundamental things "click" - like why learning even works, and
          | what some broad expectations are about what we can and cannot
          | learn.
        
       ___________________________________________________________________
       (page generated 2023-01-09 23:01 UTC)