[HN Gopher] Neural Networks from Scratch
___________________________________________________________________

Neural Networks from Scratch

Author : bane
Score : 297 points
Date : 2021-10-09 03:14 UTC (2 days ago)

(HTM) web link (aegeorge42.github.io)
(TXT) w3m dump (aegeorge42.github.io)

| synergy20 wrote:
| very cool, nice UI, simplest tutorial but it grasps the gist, perfect for starters to get the big picture before diving into the details.

| spoonsearch wrote:
| Very nice, the color combination and the UI are so pleasing. The explanation is cool :)

| robomartin wrote:
| For those curious about the _nabla_ (∇) or gradient symbol (not a Greek letter):
| https://en.wikipedia.org/wiki/Nabla_symbol

| mLuby wrote:
| While most of the random starting weights converged quickly, this one got stuck with a fairly incorrect worldview, so to speak:
| ![data=blueberries in center ringed by strawberries; model=top and bottom third red, middle third blue instead of the expected outer ring red, inner circle blue.](https://imgur.com/a/N2w69Mp)
| Is it overfitting to say the same is true for humans, where a brain's starting weights and early experiences may make it much more difficult to achieve an accurate model?

| sarathyweb wrote:
| The text is too small to read on my phone. I cannot zoom in either :(

| bnegreve wrote:
| I don't think it is a good idea to describe neural networks as a large graph of neurons interacting with each other. It is not really helpful for understanding what is going on inside.
| It is more useful to understand them as a series of transforms that bend and fold the input space in order to place pairs of similar items close to each other. I would like to see people trying to illustrate that instead.
| It also has the benefit of making the connection with linear algebra much easier to understand.

| mjburgess wrote:
| A Neural Network _isn't_ a graph in any case, and isn't based on the brain.
| As you said, it's a sequence of transformations.
| NB. If it's a graph: write out the edge list (etc.).
| NNs are diagrammed as graphs, but this is highly misleading.

| Retric wrote:
| Neural Networks are a graph, more specifically a Weighted Directed Graph.
| They are also very much modeled after the brain; more specifically, they originate from a 1943 paper by neurophysiologist Warren McCulloch and mathematician Walter Pitts, who described how neurons in the brain might work by modeling a simple neural network.
| Of course it's not an accurate model, but it very much is based on early understanding of biological neurons.

| jonnycomputer wrote:
| Yeah, I very much don't understand OP's argument. And it's trivial to write out the nodes and edges (at least for trivially sized neural networks).

| zwaps wrote:
| I think what op means is this: a graph is mathematically a set of vertices (nodes) and a set of ordered or unordered tuples giving the edges (ties). Now, sometimes you might have a weight on these edges, for example by specifying some function on the edge set.
| However, it is difficult to see how a neural network that includes operations like sum, multiply and tanh might be modeled this way. How do you describe a dropout as a graph?
| I think the argument is that a graph is not sufficient to describe a NN, so technically speaking a NN is not a graph. It is more. It has edges between x and f(x), but we also need to specify what f(x) is. The mathematical definition of a graph doesn't do that.
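A concrete version of the "write out the edge list" view discussed above, as a minimal sketch: a 2-3-1 dense network written as an explicit weighted, directed edge list (complete bipartite between adjacent layers). The layer sizes and weights here are invented for illustration; note that the edge list records only connectivity and weights, not what each node computes, which is the gap zwaps points out.

    import itertools
    import random

    layer_sizes = [2, 3, 1]

    # Edges: every node (l, i) in layer l connects to every node (l+1, j)
    # in the next layer, each edge carrying a weight.
    edges = {}
    for l in range(len(layer_sizes) - 1):
        for i, j in itertools.product(range(layer_sizes[l]),
                                      range(layer_sizes[l + 1])):
            edges[((l, i), (l + 1, j))] = random.uniform(-1, 1)

    for (src, dst), w in edges.items():
        print(f"{src} -> {dst}  weight={w:+.2f}")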
| Retric wrote:
| A weighted graph can include weights for both the vertices and edges. For example, a network latency diagram may include the physical wires as separate from the router, as the router latency may depend on network load. Similarly, routers themselves have internal bandwidth limitations etc.
| As to the rest, separating the NN from everything needed to generate it is a useful distinction. You're not going to generate a different f(x) by slightly changing the training set etc. It's however a somewhat arbitrary distinction.

| [deleted]

| laGrenouille wrote:
| > NB. If it's a graph: write out the edge list (etc.).
| I don't understand what issue you are referring to.
| For a dense network, each pair of adjacent layers forms a complete bipartite graph. In other words, the edges are all pairs with one node in layer N and another in layer N+1.
| CNNs and RNNs take a little more work, but it is still easy to describe the graph structure.

| zwaps wrote:
| I think op means that a graph is not sufficient to describe a NN. If a layer is Y=XB, then you draw that as a set of nodes Y with the individual weights b_ij as edge weights from X. Right.
| But can you describe things like concat, max-pooling, attention etc. without changing the meaning of the edges? Or do you have to annotate edges to now mean "apply function here"? If so, op probably wants to say that you are describing more than a graph. There's a graph there, but you need more: you need elaborate descriptions of what the edges do. In that case, op could be correct to say that, technically, NNs are not graphs.
| Or perhaps NNs can generally be represented by vertices and edge lists. It certainly isn't the usual way to draw them, though.

| farresito wrote:
| Totally agree with you. The article that opened my eyes was this[0] one. This[1] video is also very good.
| [0] https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
| [1] https://www.youtube.com/watch?v=e5xKayCBOeU

| joefourier wrote:
| You really find large n-dimensional transforms easier to reason about and visualise, as opposed to layers of neurons with connections? You don't find it much more intuitive to see it as a graph once you start adding recurrence, convolutions, sparsity, dropout, connections across multiple layers, etc., let alone coming up with new concepts?
| I think it's useful to understand it in both ways, but our intuitions about transforms are largely useless when the number of dimensions is high enough.

| nerdponx wrote:
| It's good to have both perspectives. Ideally you learn the layers-of-transforms version alongside the styled graph-of-neurons version. If you had to pick only one, which one you learn would depend a lot on what kind of student you are and what your goals are. I think the layers-of-transforms version is "less wrong" in general, but probably harder to understand, so it's maybe the better one if you had to learn just one.

| ravi-delia wrote:
| I think understanding how neural networks work is easiest if you think of them as networks. Reasoning about _why_ they work is a lot easier thinking about them as transformations. It's not like you're actually picturing all the parameters of a nontrivial network one way or the other.

| farresito wrote:
| Not the person you are replying to, but I think it's all about the level of abstraction you want to reason at. I didn't grok neural networks until I visualized the transformations that were happening in a very simple network. Once that made sense, I could start thinking in terms of layers.
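To make the "series of transforms that bend and fold the input space" picture concrete, a small sketch with invented weights: each layer of a 2-3-2 network is just an affine map followed by a pointwise nonlinearity, applied here to a grid of 2D points.

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented weights for a tiny 2 -> 3 -> 2 network.
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
    W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

    def transform(points):
        # Layer 1: affine map (rotate/stretch/shift) then tanh (fold/squash).
        h = np.tanh(points @ W1 + b1)
        # Layer 2: another affine map of the already-bent space.
        return h @ W2 + b2

    # A grid of 2D input points; the network maps the whole grid to a new,
    # bent configuration in the output space.
    grid = np.stack(np.meshgrid(np.linspace(-1, 1, 5),
                                np.linspace(-1, 1, 5)), axis=-1).reshape(-1, 2)
    print(transform(grid).shape)  # (25, 2)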
| shimonabi wrote:
| If anyone is interested, here is a simple symbol recognizer using backpropagation I wrote in Python a while ago with the help of the book "Make Your Own Neural Network" by Tariq Rashid. Numpy is a great help with matrix calculations.
| https://www.youtube.com/watch?v=IAQyVmTDz0A

| andreyk wrote:
| Pleasantly surprised by this, not yet another blog post on this but rather a nice interactive lesson. Well done!

| aeg42x wrote:
| Hi everyone! I made this thing! I'm glad you all like it :) This is actually my first time using javascript, so if there are any issues please let me know and I'll do my best to fix them.

| Pensacola wrote:
| Hi, nice site! But since you asked, here's an issue: the little "Click to increase or decrease weights" feature doesn't work in Firefox.

| windsignaling wrote:
| "first time using javascript"
| Impressive. I think the first time I used Javascript I made a button.

| mdp2021 wrote:
| On mine, the textboxes are broken - overlapping other areas and rendered with a heavy blur.

| aeg42x wrote:
| Are you using mobile or a browser? And could you please post a screenshot? I'll see what I can do! Thank you!

| pplanel wrote:
| Can't start in Android's Firefox.

| informationslob wrote:
| I can.

| moffkalast wrote:
| Probably can't start in Netscape Navigator either, the audacity.

| kebsup wrote:
| Very nice. I created a very similar thing a few years ago, but yours is nicer. :D https://nnplayground.com

| minihat wrote:
| Each time I teach neural nets to an engineer, there's only a 50% chance they can write down the chain rule. Colah's blog on backprop used to be my favorite resource to leave them with (https://colah.github.io/posts/2015-08-Backprop).
| The explanation of the calculus in this tool is equally fantastic. And the art is very cute.
| There are many ways to skin a cat, of course, but this is as good a tutorial as I've seen for getting you through backprop as fast as possible.

| jhgb wrote:
| > there's only a 50% chance they can write down the chain rule
| I blame the common mathematical notation for that.

| friebetill wrote:
| I found this 13 min explanation very helpful in understanding backpropagation (https://youtu.be/c36lUUr864M?t=2520).
| First he explains the necessary concepts:
| 1) Chain Rule
| 2) Computational Graph
| Then he explains backpropagation in these three steps (first in general and then with examples):
| 1) Forward pass: Compute loss
| 2) Compute local gradients
| 3) Backward pass: Compute dLoss/dWeights using the Chain Rule

| shaan7 wrote:
| Any recommendations for a 101 book for neural nets for someone who is "just a programmer"? OP's tutorial is quite nice, but I love to read books and find it easier to learn from them.

| aeg42x wrote:
| I highly recommend http://neuralnetworksanddeeplearning.com/ It's an online book that has some great code examples built in.

| carom wrote:
| There is a book called Neural Networks from Scratch at https://nnfs.io.

| wesleywt wrote:
| Fastai has a course: practical deep learning for programmers.

| carom wrote:
| There are also coursera specializations from Andrew Ng at https://deeplearning.ai.

| matsemann wrote:
| Ng's course is bottom up: start with the basic math, expand upon it, until you arrive at ML and neural nets.
| Fastai is top down: learn to use practical ML with abstractions, and then dig deeper and explain as needed.
| I preferred fastai's approach, even though I enjoyed both. Ng's could be a bit too low level and fundamental for what I wanted to learn.
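The three backprop steps friebetill lists a few comments up map directly onto a few lines of numpy; a minimal sketch, with an invented 2-3-1 network and made-up data (all names here are illustrative only).

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data and an invented 2 -> 3 -> 1 network.
    X = rng.normal(size=(8, 2))          # 8 samples, 2 features
    y = rng.normal(size=(8, 1))          # regression targets
    W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
    W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))

    # 1) Forward pass: compute the loss.
    z1 = X @ W1 + b1
    h = np.tanh(z1)
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # 2) Local gradients and 3) backward pass: chain them from the loss
    #    back to each weight (dLoss/dW = upstream gradient times local gradient).
    dpred = 2 * (pred - y) / len(X)      # dLoss/dpred
    dW2 = h.T @ dpred                    # dLoss/dW2
    db2 = dpred.sum(axis=0, keepdims=True)
    dh = dpred @ W2.T                    # push the gradient back through layer 2
    dz1 = dh * (1 - np.tanh(z1) ** 2)    # local gradient of tanh
    dW1 = X.T @ dz1                      # dLoss/dW1
    db1 = dz1.sum(axis=0, keepdims=True)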
| carom wrote:
| This is a valuable take. Fastai was very frustrating for me because I wanted to understand the internals. I ended up not finishing it, so take my opinion with a grain of salt.

| windsignaling wrote:
| I much prefer Andrew Ng's courses as well.
| I tried Fast AI, but it seems to be trying too hard to take out the math, which oddly for me (as a STEM grad) makes it much more difficult to understand.
| Had to stop when I saw him using Excel spreadsheets to explain convolution.

| jacobcmarshall wrote:
| _Deep Learning with Python_ by Chollet is an excellent beginner resource if you are a hands-on learner.
| It starts off with some tutorials using the Keras library, and then gets into the math later on.
| By the end of the book, you create multiple different types of neural networks for identifying images, text, and more! I highly recommend it.

| baron_harkonnen wrote:
| Given the current state of automatic differentiation, I'm not so sure it's even necessary or particularly useful to focus on backpropagation any more.
| While backprop has major historic significance, in the end it's essentially just a pure calculation which no longer needs to be done by hand.
| Don't get me wrong, I still believe that understanding the gradient is hugely important, and conceptually it will always be essential to understand that one is optimizing a neural network by taking the derivative of the loss function, but backprop is neither necessary nor particularly useful for modern neural networks (nobody is computing gradients by hand for transformers).
| IMHO a better approach is to focus on a tool like JAX where taking a derivative is abstracted away cleanly enough, but at the same time you remain fully aware of all the calculus that is being done.
| Especially for programmers, it's better to look at Neural Networks as just a specific application of Differentiable Programming. This makes them both easier to understand and also enables the learner to open up a much broader class of problems they can solve with the same tools.

| medo-bear wrote:
| Backpropagation is a particular implementation of reverse-mode auto-differentiation, and it is the basis for all implementations of DL models. It is very strange for me to read this as though it is very obvious and commonly accepted fact, which I don't think it is.

| baron_harkonnen wrote:
| > to read this as though it is very obvious and commonly accepted fact
| I'm not entirely sure what you're referring to by "this", but assuming you mean my comment, I think what I'm saying is very much up for debate and not an "obvious and commonly accepted fact". Karpathy has a very reasonable argument that directly disagrees with what I'm suggesting [0]. Of course he also agrees that in practice nobody will ever use backprop directly.
| Whether it's JAX, TF, PyTorch, etc., the chain rule will be applied for you. I'm arguing that I think it's helpful to not have to worry about the details of how your derivative is being computed, and rather build an intuition about using derivatives as an abstraction. To be fair, I think Karpathy is correct for people who are going to be learning to explicitly be experts in Neural Networks.
| My point is more that given how powerful our tools today are for computing derivatives (I think JAX/Autograd have improved since Karpathy wrote that article), it's better to teach programmers to think of derivatives, gradients, Hessians etc. as high-level abstractions: worrying less about how to compute them and more about how to use them. In this way thinking about modeling doesn't need to be restricted strictly to NNs, but rather use NNs as an example and then demonstrate to the student that they are free to build any model by defining how the model predicts, scoring the prediction, and using the tools of calculus to answer other common questions you might have.
| edit: a good analogy is logic programming and backtracking/unification. The entire point of logic programming is to abstract away backtracking. Sure, experts in Prolog do need to understand backtracking, but it's more helpful to get beginners understanding how Prolog behaves than the details of backtracking.
| [0] https://karpathy.medium.com/yes-you-should-understand-backpr...
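Roughly what the workflow baron_harkonnen describes looks like in JAX, as a sketch (the model, data, and names are invented for illustration): the forward pass and loss are ordinary Python functions, and jax.grad supplies the gradient without a hand-written backward pass.

    import jax
    import jax.numpy as jnp

    def predict(params, x):
        # A tiny 2 -> 3 -> 1 network, written as a plain function of its parameters.
        h = jnp.tanh(x @ params["W1"] + params["b1"])
        return h @ params["W2"] + params["b2"]

    def loss(params, x, y):
        return jnp.mean((predict(params, x) - y) ** 2)

    key = jax.random.PRNGKey(0)
    params = {
        "W1": jax.random.normal(key, (2, 3)), "b1": jnp.zeros(3),
        "W2": jax.random.normal(key, (3, 1)), "b2": jnp.zeros(1),
    }
    x, y = jnp.ones((8, 2)), jnp.ones((8, 1))

    grads = jax.grad(loss)(params, x, y)                    # same structure as params
    params = {k: params[k] - 0.1 * grads[k] for k in params}  # one gradient step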
| medo-bear wrote:
| but with backprop you do not worry about computing derivatives by hand. backprop and AD in general mean you do not have to do that. maybe one of us is misunderstanding the other.
| i am saying that if you want to work with ML algorithms on a deeper level you must learn backprop.
| if you want to implement some models, on the other hand, you can just follow a recipe approach.

| matsemann wrote:
| > _there's only a 50% chance they can write down the chain rule_
| Why should I, though? I remember the concept from calculus. I know pytorch keeps track of the various stuff I do to a vector and calculates a gradient based on it. What more do I need to know when all I want to do is to play with applications, not implement backprop myself?

| medo-bear wrote:
| if you don't understand the chain rule then you don't understand backprop, which means you do not really understand how deep learning works. at most you can follow recipes, cookbook style. it is kind of how one can make a website without a deep understanding of networking.

| baron_harkonnen wrote:
| > at most you can follow recipes, cookbook style.
| Here I disagree with you pretty strongly. Once someone is comfortable with differentiable programming it's much more obvious how to build and optimize any type of model.
| People should be more concerned about when to use derivatives, gradients, Hessians, Laplace approximation etc. rather than worry about the implementation details of these tools.
| Abstraction can also aid depth of understanding. I know plenty of people who can implement backprop, but then don't understand how to estimate parameter uncertainty from the Hessian. The latter is much more important for general model building.

| medo-bear wrote:
| i am not sure what you are disagreeing with. chain rule is basic calculus that precedes understanding hessians. my argument is, if you cannot understand what the chain rule is, you will not understand more complicated mathematics in ML. do you think i am wrong?
| EDIT: also, uncertainty estimation is the stuff of the probabilistic approach to ML. i would say that people who do probabilistic ML are quite mathematically capable (at least in my experience)
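A rough sketch of the "parameter uncertainty from the Hessian" idea mentioned above (the Laplace approximation), for an invented Gaussian linear model with a known noise scale; this is illustrative only, not anyone's actual workflow: the inverse Hessian of the negative log-likelihood at the fitted parameters approximates the parameter covariance.

    import jax
    import jax.numpy as jnp
    import numpy as np

    rng = np.random.default_rng(0)
    X = jnp.asarray(rng.normal(size=(100, 2)))
    true_beta = jnp.array([1.5, -0.7])
    y = X @ true_beta + jnp.asarray(rng.normal(scale=0.3, size=100))

    sigma = 0.3  # noise scale assumed known, to keep the sketch short

    def nll(beta):
        # Negative log-likelihood of the Gaussian linear model (up to a constant).
        return jnp.sum((y - X @ beta) ** 2) / (2 * sigma**2)

    beta_hat = jnp.linalg.solve(X.T @ X, X.T @ y)  # maximum-likelihood fit
    H = jax.hessian(nll)(beta_hat)                 # curvature at the optimum
    cov = jnp.linalg.inv(H)                        # Laplace approximation to the covariance
    print(beta_hat, jnp.sqrt(jnp.diag(cov)))       # estimates and their standard errors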
| baron_harkonnen wrote:
| > chain rule is basic calculus that precedes understanding hessians.
| It doesn't have to be that way. The Hessian is an abstract idea; the chain rule, and more specifically backpropagation, are methods of computing the results for that abstract idea. When I want the Hessian I want a matrix of second-order partial derivatives, I'm not interested in how those are computed.
| For a more concrete example, would you say that using the quantile function for the normal distribution requires you to be able to implement it from scratch?
| There are many very smart, very knowledgeable people that correctly use the normal quantile function (inverse CDF) every day for essential quantitative computation yet have absolutely no idea how to implement the inverse error function (an essential part of the normal quantile). Would you say that you don't really know statistics if you can't do this? That a beginner must understand the implementation details of the inverse error function before making any claims about normal quantiles? I myself would absolutely need to pull up a copy of Numerical Recipes to do this. It would be, in my opinion, ludicrous to say that anyone wanting to write statistical code should understand and be able to implement the normal quantile function. Maybe in 1970 that was true, but we have software to abstract that out for us.
| The same is becoming true of backprop. I can simply call jax.grad on my implementation of the loss of the forward pass of the NN I'm interested in and get the gradient of that function, the same way I can call scipy.stats.norm.ppf to get that quantile for a normal. All that is important is that you understand what the quantile function of the normal distribution means in order to use it correctly, and again I suspect there are many practicing statisticians that don't know how to implement this.
| And to give you a bit of context, my view on this has developed from working with many people who can pass a calculus exam and perform the necessary steps to compute a derivative, but yet have almost no intuition about what a derivative _means_ and how to use it and reason about it. Calculus historically focused on computation over intuition because that was what was needed to do practical work with calculus. Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.

| tchalla wrote:
| > my argument is, if you cannot understand what the chain rule is, you will not understand more complicated mathematics in ML.
| Are you sure about this?

| medo-bear wrote:
| yes. in europe, admission into an ML-type masters degree lists all three standard levels of mathematical analysis as a bare minimum for application.

| tchalla wrote:
| If by "understand" you mean understand and not regurgitate it when asked as a trivia question, I agree with you. However, there are different interpretations of the chain rule.

| Imnimo wrote:
| Certainly there's a lot you can do without understanding backprop: you can train pre-made architectures, you can put pre-made layers together to build your own architecture, you can tweak hyperparameters and improve your model's accuracy, and so on. But I also think you will eventually run into a problem that would be much easier to debug if you understand backprop. If your model isn't learning, and your tensorboard graphs show your gradient magnitude is through the roof, it'll be much easier to track that down if you have a strong conceptual model of how gradients are calculated and how they flow backwards through the network.
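For the kind of debugging Imnimo describes, the usual first step is to look at per-parameter gradient norms after a backward pass; a minimal PyTorch sketch, with an invented toy model and batch just to show the inspection pattern:

    import torch
    import torch.nn as nn

    # Invented toy model and data, purely for illustration.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    x, y = torch.randn(16, 10), torch.randn(16, 1)

    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # autograd runs backprop for us

    # Per-parameter gradient norms: huge or near-zero values point at the
    # layer where training is going wrong.
    for name, p in model.named_parameters():
        print(f"{name:12s} grad norm = {p.grad.norm():.4f}")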
___________________________________________________________________
(page generated 2021-10-11 23:00 UTC)