[HN Gopher] Neural Networks from Scratch
       ___________________________________________________________________
        
       Neural Networks from Scratch
        
       Author : bane
       Score  : 297 points
       Date   : 2021-10-09 03:14 UTC (2 days ago)
        
 (HTM) web link (aegeorge42.github.io)
 (TXT) w3m dump (aegeorge42.github.io)
        
       | synergy20 wrote:
        | Very cool, nice UI. It's the simplest of tutorials but it gets
        | the gist across, perfect for starters to get the big picture
        | before diving into the details.
        
       | spoonsearch wrote:
        | Very nice, the color combination and the UI are so pleasing. The
       | explanation is cool :)
        
       | robomartin wrote:
        | For those curious about the _nabla_ (∇) or gradient symbol (not
       | a Greek letter):
       | 
       | https://en.wikipedia.org/wiki/Nabla_symbol
        
       | mLuby wrote:
       | While most of the random starting weights converged quickly, this
       | one got stuck with a fairly incorrect worldview, so to speak:
       | 
       | ![data=blueberries in center ringed by strawberries; model=top
       | and bottom third red, middle third blue instead of the expected
       | outer ring red inner circle blue.](https://imgur.com/a/N2w69Mp)
       | 
       | Is it overfitting to say the same is true for humans, where a
       | brain's starting weights and early experiences may make it much
       | more difficult to achieve an accurate model?
        
       | sarathyweb wrote:
       | The text is too small to read on my phone. I cannot zoom in
       | either :(
        
       | bnegreve wrote:
       | I don't think it is a good idea to describe neural networks as a
        | large graph of neurons interacting with each other. It is not
        | really helpful for understanding what is going on inside.
       | 
       | It is more useful to understand them as a series of transforms
       | that bend and fold the input space, in order to place pairs of
       | similar items close to each other. I would like to see people
       | trying to illustrate that instead.
       | 
       | It also has the benefit of making the connection with linear
       | algebra much easier to understand.
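        | 
        | As an illustration of that view, each dense layer is just an
        | affine map followed by a pointwise nonlinearity, and stacking
        | them is what bends and folds the input space. A minimal numpy
        | sketch (names and shapes are made up):
        | 
        |     import numpy as np
        | 
        |     # one "layer": affine transform (rotate/scale/shift) plus a
        |     # pointwise tanh that squashes and folds the result
        |     def layer(x, W, b):
        |         return np.tanh(x @ W + b)
        | 
        |     rng = np.random.default_rng(0)
        |     x = rng.normal(size=(5, 2))        # 5 points in 2-D space
        |     h = layer(x, rng.normal(size=(2, 2)), rng.normal(size=2))
        |     y = layer(h, rng.normal(size=(2, 2)), rng.normal(size=2))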
        
         | mjburgess wrote:
          | A Neural Network _isn't_ a graph in any case, and isn't based
         | on the brain.
         | 
         | As you said, it's a sequence of transformations.
         | 
          | NB: if it's a graph, write out the edge list (etc.).
         | 
         | NNs are diagrammed as graphs, but this is highly misleading.
        
           | Retric wrote:
            | Neural Networks are a graph, more specifically a Weighted
           | Directed Graph.
           | 
           | They are also very much modeled after the brain, more
           | specifically they originate from a 1943 paper by
           | neurophysiologist Warren McCulloch and mathematician Walter
           | Pitts who described how neurons in the brain might work by
           | modeling a simple neural network.
           | 
           | Of course it's not an accurate model, but it very much is
           | based on early understanding of biological neurons.
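            | 
            | For context, the McCulloch-Pitts unit is usually presented
            | as a thresholded weighted sum of binary inputs; a tiny
            | illustrative sketch (weights and threshold made up):
            | 
            |     def mp_neuron(inputs, weights, threshold):
            |         # fires iff the weighted sum reaches the threshold
            |         s = sum(w * x for w, x in zip(weights, inputs))
            |         return int(s >= threshold)
            | 
            |     mp_neuron([1, 1], [1, 1], threshold=2)   # AND -> 1
            |     mp_neuron([1, 0], [1, 1], threshold=2)   # AND -> 0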
        
             | jonnycomputer wrote:
              | Yeah, I very much don't understand OP's argument. And it's
             | trivial to write out the nodes and edges (at least for
             | trivially sized neural networks).
        
               | zwaps wrote:
               | I think what op means is this: A graph is mathematically
               | a set of vertices (nodes) and a set of ordered or
               | unordered tuples giving the edges (ties). Now, sometimes
               | you might have a weight on these edges, for example by
               | specifying some function on the edge set.
               | 
               | However, it is difficult to see how a neural network that
               | includes operations like sum, multiply and tanh, might be
               | modeled this way. How do you describe a dropout as a
               | graph?
               | 
               | I think the argument is that a graph is not sufficient to
               | describe a NN, so technically speaking a NN is not a
               | graph. It is more. It has edges between x and f(x), but
               | we also need to specify what f(x) is. The mathematical
               | definition of a graph doesn't do that.
        
               | Retric wrote:
               | A weighted graph can include weights for both the
                | vertices and edges. For example, a network latency diagram
                | may model the physical wires separately from the routers,
                | since router latency may depend on network load.
               | Similarly routers themselves have internal bandwidth
               | limitations etc.
               | 
                | As to the rest, separating the NN from everything needed
               | to generate it is a useful distinction. You're not going
               | to generate a different f(x) by slightly changing the
               | training set etc. It's however a somewhat arbitrary
               | distinction.
        
               | [deleted]
        
           | laGrenouille wrote:
           | > NB. If it's a graph: write out the edge list (etc.) .
           | 
           | I don't understand what issue you are referring to.
           | 
           | For a dense network, each pair of adjacent layers forms a
           | complete bipartite graph. In other words, edges are all pairs
           | with one node in layer N and another in layer N+1.
           | 
            | CNNs and RNNs take a little more work, but it is still easy
            | to describe their graph structure.
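            | 
            | Concretely, the edge list for a small dense network can be
            | written out in a few lines (layer sizes here are made up):
            | 
            |     sizes = [3, 4, 2]   # input, hidden, output units
            |     edges = [((n, i), (n + 1, j))   # node = (layer, unit)
            |              for n in range(len(sizes) - 1)
            |              for i in range(sizes[n])
            |              for j in range(sizes[n + 1])]
            |     # 3*4 + 4*2 = 20 directed edges; attaching weight
            |     # w[n][j][i] to edge ((n, i), (n+1, j)) gives the
            |     # weighted digraph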
        
             | zwaps wrote:
             | I think op means that a graph is not sufficient to describe
              | a NN. If a layer is Y=XB, then you draw that as a set of
              | nodes Y with the individual weights b_ij as edge-weights
              | from X.
             | Right.
             | 
             | But can you describe things like concat, max-pooling,
             | attention etc. without changing the meaning of the edges?
             | Or do you have to annotate edges to now mean "apply
             | function here"? If so, op probably wants to say that you
             | are describing more than a graph. There's a graph there,
             | but you need more, you need elaborate descriptions of what
             | edges do. In that case, op could be correct to say that
             | technically, NN are not graphs.
             | 
              | Or, perhaps NNs can generally be represented by vertices and
             | edge lists. It certainly isn't the usual way to draw them,
             | though.
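              | 
              | One common way around that is to switch from the
              | "neurons and weighted edges" picture to a computation
              | graph, where the operation lives on the vertex and
              | edges only carry data flow. A toy sketch of the idea
              | (not any particular framework's API):
              | 
              |     import math
              | 
              |     class Node:
              |         # the op is annotated on the vertex, not the edge
              |         def __init__(self, op, inputs=()):
              |             self.op, self.inputs = op, inputs
              |         def eval(self):
              |             args = [n.eval() for n in self.inputs]
              |             return self.op(*args)
              | 
              |     x = Node(lambda: 0.5)            # constant leaf
              |     w = Node(lambda: 2.0)
              |     y = Node(lambda a, b: math.tanh(a * b), [x, w])
              |     y.eval()   # tanh(1.0)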
        
         | farresito wrote:
         | Totally agree with you. The article that opened my eyes was
         | this[0] one. This[1] video is also very good.
         | 
         | [0] https://colah.github.io/posts/2014-03-NN-Manifolds-
         | Topology/
         | 
         | [1] https://www.youtube.com/watch?v=e5xKayCBOeU
        
         | joefourier wrote:
         | You really find large n-dimensional transforms easier to reason
         | about and visualise, as opposed to layers of neurons with
         | connections? You don't find it much more intuitive to see it as
         | a graph once you start adding recurrence, convolutions,
         | sparsity, dropout, connections across multiple layers, etc.,
         | let alone coming up with new concepts?
         | 
         | I think it's useful to understand it in both ways, but our
         | intuitions about transforms are largely useless when the number
         | of dimensions is high enough.
        
           | nerdponx wrote:
           | It's good to have both perspectives. Ideally you learn the
           | layers-of-transforms version alongside the styled graph-of-
           | neurons version. If you had to only pick one, which one you
           | learn would depend a lot on what kind of student you are and
           | what your goals are. I think the layers-of-transforms version
           | is "less wrong" in general, but probably harder to
           | understand, so it's maybe better if you had to learn just
           | one.
        
           | ravi-delia wrote:
           | I think understanding how neural networks work is easiest if
           | you think of them as networks. Reasoning about _why_ they
           | work is a lot easier thinking about them as transformations.
            | It's not like you're actually picturing all the parameters
           | of a nontrivial network one way or the other.
        
           | farresito wrote:
           | Not the person you are answering to, but I think it's all
           | about the level of abstraction you want to reason at. I
           | didn't grok neural networks until I visualized the
           | transformations that were happening in a very simple network.
           | Once that made sense, I could start thinking in terms of
           | layers.
        
       | shimonabi wrote:
       | If anyone is interested, here is a simple symbol recognizer using
       | backpropagation I wrote in Python a while ago with the help of
        | the book "Make Your Own Neural Network" by Tariq Rashid. Numpy
        | is a great help with matrix calculations.
       | 
       | https://www.youtube.com/watch?v=IAQyVmTDz0A
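        | 
        | For anyone curious what "numpy doing the matrix calculations"
        | looks like in that style of network (a rough sketch, not the
        | video's actual code):
        | 
        |     import numpy as np
        | 
        |     def sigmoid(x):
        |         return 1.0 / (1.0 + np.exp(-x))
        | 
        |     # forward pass: input -> hidden -> output as two
        |     # matrix-vector products plus the activation
        |     def query(x, w_input_hidden, w_hidden_output):
        |         hidden = sigmoid(w_input_hidden @ x)
        |         return sigmoid(w_hidden_output @ hidden)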
        
       | andreyk wrote:
       | Pleasantly surprised by this, not yet another blog post on this
       | but rather a nice interactive lesson. Well done!
        
       | aeg42x wrote:
       | Hi everyone! I made this thing! I'm glad you all like it :) This
        | is actually my first time using javascript, so if there are any
        | issues please let me know and I'll do my best to fix them.
        
         | Pensacola wrote:
         | Hi, nice site! But since you asked, here's an issue: the little
         | "Click to increase or decrease weights" feature doesn't work in
         | Firefox.
        
         | windsignaling wrote:
         | "first time using javascript"
         | 
         | Impressive. I think the first time I used Javascript I made a
         | button.
        
       | mdp2021 wrote:
       | On mine, the textboxes are broken - overlapping other areas and
       | rendered with a heavy blur.
        
         | aeg42x wrote:
         | Are you using mobile or a browser? And could you please post a
         | screenshot? I'll see what I can do! Thank you!
        
       | pplanel wrote:
       | Can't start in Android's Firefox.
        
         | informationslob wrote:
         | I can.
        
         | moffkalast wrote:
         | Probably can't start in Netscape Navigator either, the
         | audacity.
        
       | kebsup wrote:
        | Very nice. I created a very similar thing a few years ago,
       | but yours is nicer. :D https://nnplayground.com
        
       | minihat wrote:
       | Each time I teach neural nets to an engineer, there's only a 50%
       | chance they can write down the chain rule. Colah's blog on
       | backprop used to be my favorite resource to leave them with
       | (https://colah.github.io/posts/2015-08-Backprop).
       | 
       | The explanation of the calculus in this tool is equally
       | fantastic. And the art is very cute.
       | 
       | There are many ways to skin a cat, of course, but this is as good
       | a tutorial as I've seen for getting you through backprop as fast
       | as possible.
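        | 
        | For reference, the rule in question is d/dx f(g(x)) =
        | f'(g(x)) * g'(x). A short numerical sanity check (the
        | functions here are chosen arbitrarily):
        | 
        |     import math
        | 
        |     f, fp = math.sin, math.cos       # f and f'
        |     g, gp = (lambda x: x * x), (lambda x: 2 * x)
        |     x = 1.3
        |     analytic = fp(g(x)) * gp(x)      # chain rule
        |     numeric = (f(g(x + 1e-6)) - f(g(x - 1e-6))) / 2e-6
        |     # analytic and numeric agree to ~1e-9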
        
         | jhgb wrote:
         | > there's only a 50% chance they can write down the chain rule
         | 
         | I blame the common mathematical notation for that.
        
         | friebetill wrote:
         | I found this 13 min explanation very helpful in understanding
         | backpropagation (https://youtu.be/c36lUUr864M?t=2520).
         | 
         | First he explains the necessary concepts:
         | 
         | 1) Chain Rule
         | 
         | 2) Computational Graph
         | 
         | Then he explains backpropagation in these three steps (first in
         | general and then with examples):
         | 
         | 1) Forward pass: Compute loss
         | 
         | 2) Compute local gradients
         | 
         | 3) Backward pass: Compute dLoss/dWeights using the Chain Rule
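          | 
          | Those three steps, written out for a one-hidden-layer
          | network (a minimal numpy sketch of the recipe, not the
          | video's code; shapes are made up):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     x, target = rng.normal(size=3), 1.0
          |     W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)
          | 
          |     # 1) forward pass: compute the loss
          |     h_pre = W1 @ x
          |     h = np.tanh(h_pre)
          |     y = W2 @ h
          |     loss = 0.5 * (y - target) ** 2
          | 
          |     # 2) local gradients at each node
          |     dloss_dy = y - target
          |     dy_dW2, dy_dh = h, W2
          |     dh_dhpre = 1.0 - np.tanh(h_pre) ** 2
          | 
          |     # 3) backward pass: chain them into dLoss/dWeights
          |     dloss_dW2 = dloss_dy * dy_dW2
          |     dloss_dh = dloss_dy * dy_dh
          |     dloss_dW1 = np.outer(dloss_dh * dh_dhpre, x)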
        
         | shaan7 wrote:
         | Any recommendations for a 101 book for neural nets for someone
         | who is "just a programmer"? OP's tutorial is quite nice, but I
         | love to read books and find it easier to learn from them.
        
           | aeg42x wrote:
           | I highly recommend http://neuralnetworksanddeeplearning.com/
           | it's an online book that has some great code examples built
           | in.
        
           | carom wrote:
           | There is a book called neural networks from scratch at
           | https://nnfs.io.
        
           | wesleywt wrote:
            | Fastai has a course: Practical Deep Learning for Coders.
        
             | carom wrote:
             | There are also coursera specializations from Andrew Ng at
             | https://deeplearning.ai.
        
               | matsemann wrote:
               | Ng's course is bottom up: Start with the basic math,
               | expand upon it, until you arrive at ML and neural nets.
               | 
               | Fastai is top down: learn to use practical ML with
               | abstractions, and then dig deeper and explain as needed.
               | 
               | I preferred fastai's approach, even though I enjoyed
               | both. Ng's could be a bit too low level and fundamental
               | for what I wanted to learn.
        
               | carom wrote:
               | This is a valuable take. Fastai was very frustrating for
               | me because I wanted to understand the internals. I ended
               | up not finishing it, so take my opinion with a grain of
               | salt.
        
               | windsignaling wrote:
               | I much prefer Andrew Ng's courses as well.
               | 
               | I tried Fast AI, but it seems to be trying too hard to
               | take out the math, which oddly for me (as a STEM grad)
               | makes it much more difficult to understand.
               | 
               | Had to stop when I saw him using Excel spreadsheets to
               | explain convolution.
        
           | jacobcmarshall wrote:
           | _Deep Learning with Python_ by Chollet is an excellent
           | beginner resource if you are a hands-on learner.
           | 
           | It starts off with some tutorials using the Keras library,
           | and then gets into the math later on.
           | 
           | By the end of the book, you create multiple different types
           | of neural networks for identifying images, text, and more! I
           | highly recommend it.
        
         | baron_harkonnen wrote:
         | Given the current state of automatic differentiation I'm not so
         | sure it's even necessary or particularly useful to focus on
         | backpropagation any more.
         | 
         | While backprop has major historic significance, in the end it's
         | essentially just a pure calculation which no longer needs to be
         | done by hand.
         | 
         | Don't get me wrong, I still believe that understanding the
         | gradient is hugely important, and conceptually it will always
         | be essential to understand that one is optimizing a neural
         | network by taking the derivative of the loss function, but
         | backprop is not necessary nor is it particularly useful for
         | modern neural networks (nobody is computing gradients by hand
         | for transformers).
         | 
         | IMHO a better approach is to focus on a tool like JAX where
         | taking a derivative is abstracted away cleanly enough, but at
         | the same time you remain fully aware of all the calculus that
         | is being done.
         | 
         | Especially for programmers, it's better to look at Neural
         | Networks as just a specific application of Differentiable
          | Programming. This makes them both easier to understand and
          | also opens up a much broader class of problems the learner
          | can solve with the same tools.
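          | 
          | For example, something along these lines (a minimal JAX
          | sketch; the model, loss and shapes are made up):
          | 
          |     import jax
          |     import jax.numpy as jnp
          | 
          |     def predict(params, x):
          |         W1, b1, W2, b2 = params
          |         h = jnp.tanh(x @ W1 + b1)
          |         return h @ W2 + b2
          | 
          |     def loss(params, x, y):
          |         return jnp.mean((predict(params, x) - y) ** 2)
          | 
          |     # the gradient is just another function you can call;
          |     # no hand-written backward pass anywhere
          |     grad_loss = jax.grad(loss)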
        
           | medo-bear wrote:
           | Backpropagation is a particular implementation of reverse
           | mode auto-differentiation, and it is the basis for all
            | implementations of DL models. It is very strange for me to
            | read this as though it were an obvious and commonly accepted
            | fact, which I don't think it is.
        
             | baron_harkonnen wrote:
             | > to read this as though it is very obvious and commonly
             | accepted fact
             | 
             | I'm not entirely sure what you're referring to by "this"
             | but assuming you mean my comment, I think what I'm saying
             | is very much up for debate and not an "obvious and commonly
              | accepted fact". Karpathy has a very reasonable argument
              | that directly disagrees with what I'm suggesting [0]. Of
              | course he also agrees that in practice nobody will ever
              | use backprop directly.
             | 
             | Whether it's JAX, TF, PyTorch, etc the chain rule will be
             | applied for you. I'm arguing that I think it's helpful to
             | not have to worry about the details of how your derivative
             | is being computed, and rather build an intuition about
             | using derivatives as an abstraction. To be fair I think
              | Karpathy is correct for people who are explicitly going to
              | become experts in Neural Networks.
             | 
             | My point is more that given how powerful our tools today
             | are for computing derivatives (I think JAX/Autograd have
             | improved since Karpathy wrote that article), it's better to
              | teach programmers to think of derivatives, gradients,
              | hessians etc. as high-level abstractions, worrying less
              | about how to compute them and more about how to use them.
              | In this way thinking about modeling doesn't need to be
              | restricted strictly to NNs; rather, use NNs as an example
              | and then demonstrate to the student that they are free to
              | build any model by defining how the model predicts, scoring
              | the prediction and using the tools of calculus to answer
              | other common questions they might have.
             | 
             | edit: a good analogy is logic programming and
             | backtracking/unification. The entire point of logic
             | programming is to abstract away backtracking. Sure experts
              | in Prolog do need to understand backtracking, but it's more
              | helpful to get beginners to understand how Prolog behaves
              | than to understand the details of backtracking.
             | 
             | [0] https://karpathy.medium.com/yes-you-should-understand-
             | backpr...
        
               | medo-bear wrote:
               | but with backprop you do not worry about computing
                | derivatives by hand. backprop and AD in general mean you
               | do not have to do that. maybe one of us is
               | misunderstanding the other
               | 
               | i am saying that if you want to work with ML algorithms
                | on a deeper level you must learn backprop
               | 
               | if you want to implement some models on the other hand,
               | you can just follow a recipe approach
        
         | matsemann wrote:
         | > _there 's only a 50% chance they can write down the chain
         | rule_
         | 
         | Why should I, though? I remember the concept from calculus. I
         | know pytorch keeps track of the various stuff I do to a vector
         | and calculates a gradient based on it. What more do I need to
         | know when all I want to do is to play with applications, not
         | implement backprop myself?
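          | 
          | i.e. the usual pattern, roughly (illustrative values):
          | 
          |     import torch
          | 
          |     w = torch.randn(3, requires_grad=True)  # ops recorded
          |     x = torch.tensor([1.0, 2.0, 3.0])
          |     loss = ((w * x).sum() - 1.0) ** 2
          |     loss.backward()   # autograd applies the chain rule
          |     w.grad            # dloss/dw, no manual backprop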
        
           | medo-bear wrote:
            | if you don't understand the chain rule then you don't
            | understand backprop, which means you do not really
            | understand how deep learning works. at most you can follow
            | recipes, cookbook style. it is kind of like how one can
            | make a website without a deep understanding of networking
        
             | baron_harkonnen wrote:
             | > at most you can follow recipes cook book style.
             | 
             | Here I disagree with you pretty strongly. Once someone is
             | comfortable with differentiable programming it's much more
             | obvious how to build and optimize any type of model.
             | 
             | People should be more concerned about when to use
             | derivatives, gradients, hessians, Laplace approximation etc
             | rather than worry about the implementation details of these
             | tools.
             | 
             | Abstraction can also aid depth of understanding. I know
             | plenty of people who can implement backprop, but then don't
             | understand how to estimate parameter uncertainty from the
             | Hessian. The latter is much more important for general
             | model building.
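              | 
              | For instance, the Hessian-based uncertainty estimate
              | can be treated the same way (a rough sketch; assumes
              | a negative log-likelihood `nll` written with jax):
              | 
              |     import jax
              |     import jax.numpy as jnp
              | 
              |     # Laplace approximation: covariance of the fitted
              |     # parameters ~ inverse Hessian of nll at theta_hat
              |     def laplace_stderr(nll, theta_hat, *data):
              |         H = jax.hessian(nll)(theta_hat, *data)
              |         cov = jnp.linalg.inv(H)
              |         return jnp.sqrt(jnp.diag(cov))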
        
               | medo-bear wrote:
               | i am not sure what you are disagreeing with. chain rule
               | is basic calculus that precedes understanding hessians.
                | my argument is, if you cannot understand what the chain
                | rule is, you will not understand more complicated
                | mathematics in ML. do you think i am wrong?
               | 
                | EDIT: also, uncertainty estimation is the stuff of the
                | probabilistic approach to ML. i would say that people who
                | do probabilistic ML are quite mathematically capable (at
                | least in my experience)
        
               | baron_harkonnen wrote:
               | > chain rule is basic calculus that precedes
               | understanding hessians.
               | 
               | It doesn't have to be that way. The hessian is an
               | abstract idea and the chain rule and more specifically
               | backpropagation are methods of computing the results for
               | an abstract idea. When I want the hessian I want a matrix
               | of second order partial derivatives, I'm not interested
               | in how those are computed.
               | 
               | For a more concrete example, would you say that using the
               | quantile function for the normal distribution requires
               | you to be able to implement it from scratch?
               | 
                | There are many very smart, very knowledgeable people who
                | correctly use the normal quantile function (inverse CDF)
                | every day for essential quantitative computation, yet
                | have absolutely no idea how to implement the inverse
                | error function (an essential part of the normal
                | quantile). Would you say that you don't really know
               | statistics if you can't do this? That a beginner must
               | understand the implementation details of the inverse
               | error function before making any claims about normal
               | quantiles? I myself would absolutely need to pull up a
               | copy of Numerical Recipes to do this. It would be, in my
               | opinion, ludicrous to say that anyone wanting to write
               | statistical code should understand and be able to
               | implement the normal quantile function. Maybe in 1970
               | that was true, but we have software to abstract that out
               | for us.
               | 
               | The same is becoming true of backprop. I can simply call
               | jax.grad on my implementation of loss of the forward pass
               | of the NN I'm interested in and get the gradient of that
               | function, the same way I can call scipy.stats.norm.ppf to
               | get that quantile for a normal. All that is important is
               | that you understand what the quantile function of the
               | normal distribution means for you to use it correctly,
               | and again I suspect there are many practicing
               | statisticians that don't know how to implement this.
               | 
               | And to give you a bit of context, my view on this has
               | developed from working with many people who can pass a
                | calculus exam and perform the necessary steps to
                | compute a derivative, yet have almost no intuition
               | about what a derivative _means_ and how to use it and
               | reason about it. Calculus historically focused on
               | computation over intuition because that was what was
               | needed to do practical work with calculus. Today the
               | computation can take second place to the intuition
               | because we have powerful tools that can take care of all
               | the computation for you.
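                | 
                | i.e. the quantile gets used as a black box every
                | day, e.g. (a throwaway example):
                | 
                |     from scipy.stats import norm
                | 
                |     # 95% CI for a mean; norm.ppf is the normal
                |     # quantile, no inverse erf written by hand
                |     mean, se = 2.4, 0.3
                |     z = norm.ppf(0.975)        # ~1.96
                |     ci = (mean - z * se, mean + z * se)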
        
               | tchalla wrote:
               | > my argument is, if you can not understand what the
               | chain rule is, you will not understand more complicated
               | mathematics in ML.
               | 
               | Are you sure about this?
        
               | medo-bear wrote:
                | yes. in europe, admission into an ML-type master's degree
               | lists all three standard levels of mathematical analysis
               | as a bare minimum for application
        
               | tchalla wrote:
               | If by understand, you mean understand and not regurgitate
               | it when asked as a trivia question - I agree with you.
               | However, there are different interpretations of the chain
               | rule.
        
           | Imnimo wrote:
           | Certainly there's a lot you can do without understanding
           | backprop - you can train pre-made architectures, you can put
           | pre-made layers together to build your own architecture, you
           | can tweak hyperparameters and improve your model's accuracy,
           | and so on. But I also think you will eventually run into a
           | problem that would be much easier to debug if you understand
           | backprop. If your model isn't learning, and your tensorboard
           | graphs show your gradient magnitude is through the roof,
           | it'll be much easier to track that down if you have a strong
           | conceptual model of how gradients are calculated and how they
           | flow backwards through the network.
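            | 
            | e.g. a quick way to eyeball that in PyTorch (a toy
            | model, just for illustration):
            | 
            |     import torch
            |     import torch.nn as nn
            | 
            |     model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(),
            |                           nn.Linear(8, 1))
            |     loss = model(torch.randn(16, 4)).pow(2).mean()
            |     loss.backward()
            | 
            |     # huge norms here point at exploding gradients
            |     for name, p in model.named_parameters():
            |         print(name, p.grad.norm().item())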
        
       ___________________________________________________________________
       (page generated 2021-10-11 23:00 UTC)