[HN Gopher] Dive into Deep Learning
       ___________________________________________________________________
        
       Dive into Deep Learning
        
       Author : soohyung
       Score  : 196 points
       Date   : 2020-01-03 18:19 UTC (4 hours ago)
        
 (HTM) web link (d2l.ai)
 (TXT) w3m dump (d2l.ai)
        
       | whoisnnamdi wrote:
        | Great guide - though, unless I missed it, I think this is
        | missing the latest advancements around Transformers, BERT, ELMo,
        | etc.
        | 
        | This stuff is pretty fresh, so it's understandable, but the NLP
        | chapter would be greatly enhanced by covering these newer topics.
        
         | enitihas wrote:
         | Is there any book which has more than a passing mention of
         | BERT?
        
       | fareesh wrote:
       | As an engineer I find myself in this type of situation quite
       | often - if anyone can point me to some good resources or has any
       | advice, I'd be quite grateful:
       | 
        | - Some non-technical stakeholder comes to me and says "can we
        | solve this problem with Machine Learning?" Usually it's something
        | like "there need to be two supervisors on the factory floor at
        | all times, and I want an email alert every time there are fewer
        | than 2 supervisors for more than 20 minutes."
        | 
        | - I ask for some sample footage to build a prototype and get a
        | few very poor-quality videos, of a very different standard from
        | what I see in most of these tutorials.
       | 
        | - I find some pre-trained model that can do people detection or
        | face detection and return bounding rectangles, and I download
        | it in whatever form it comes in.
       | 
       | - After about 30 minutes of fiddling and googling errors, I run
       | it against the sample footage
       | 
       | - I get about 60% accuracy - this is no good. Where do I go from
       | here? Keep trying different models? There are all sorts of models
        | like YOLO and SSD and RetinaNet and YOLOv2 and YOLOv3.
       | 
        | - At some point I try a bunch of models and all of them are at
        | best 75% accurate. At this point I figure I should train with
        | my own dataset, so I guess I need to arrange to have this stuff
        | labelled. In my experience stakeholders are usually willing to
        | appoint someone to do it, but they want to know how much footage
        | they need to label, whether their team will need special
        | training to do the labelling, and whether, after it's all done,
        | this is even going to work.
       | 
       | What are some effective / opinionated workflows for this part of
       | the overall process that have worked well for you? What's a
       | labelling tool that non-technical users can use intuitively? How
       | good are tools/services like Mechanical Turk and Ground Truth?
       | 
        | This part of the process costs time and money - stakeholders,
        | particularly non-technical managers, tend to want an answer
        | beforehand: "If we spend all this time and money labelling
        | footage, how well is this going to work? How much footage do we
        | need to label?" How do you handle these kinds of conversations?
       | 
       | I find this space fairly well-populated with ML tutorials and
       | resources but haven't been able to find content that is focused
       | on this part of the process.
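        | 
        | For concreteness, the "run a pre-trained people detector over
        | the footage" step might look roughly like the sketch below
        | (assuming a COCO-pretrained torchvision detector and OpenCV for
        | reading frames; model choice, confidence threshold and frame
        | skipping are all placeholders, not what I actually used):
        | 
        |     # Rough prototype: run a COCO-pretrained detector over
        |     # sample footage and count "person" detections per frame.
        |     import cv2
        |     import torch
        |     import torchvision
        | 
        |     model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        |         pretrained=True)
        |     model.eval()
        | 
        |     PERSON_CLASS = 1    # COCO label id for "person"
        |     SCORE_THRESH = 0.6  # arbitrary confidence cutoff
        | 
        |     def people_per_frame(video_path, every_nth=30):
        |         cap = cv2.VideoCapture(video_path)
        |         counts, i = [], 0
        |         while True:
        |             ok, frame = cap.read()
        |             if not ok:
        |                 break
        |             if i % every_nth == 0:
        |                 rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        |                 t = torch.from_numpy(rgb).permute(2, 0, 1)
        |                 t = t.float() / 255.0
        |                 with torch.no_grad():
        |                     out = model([t])[0]
        |                 keep = ((out["labels"] == PERSON_CLASS)
        |                         & (out["scores"] > SCORE_THRESH))
        |                 counts.append(int(keep.sum()))
        |             i += 1
        |         cap.release()
        |         return counts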
        
         | fxtentacle wrote:
         | Most AI stuff is just horribly over-hyped, so the sad truth
         | might be that what you are seeing is the state of the art and
         | nobody else has found a better way yet.
         | 
          | As a practical example, take figuring out where a given pixel
          | moves from one video frame to the next: on real-world videos,
          | the best known algorithms get about 50% of the pixels
          | correct. With clever filtering, you can maybe bump that to 60
          | or 70%, but in any case you will be left with a 30%+ error
          | rate.
         | 
          | NVIDIA / Google / Microsoft / Amazon will tell you that you
          | need to buy or rent more GPUs or cloud GPU servers and do more
          | training with more data. And there are plenty of companies in
          | cheap-labor countries offering to do your data annotation at a
          | very reasonable rate. But both of them are just trying to sell
          | you something. They don't care whether it will solve your
          | problem, as long as you're feeling hopeful enough to buy their
          | stuff.
         | 
          | Judging from the bad results that even Google / Facebook /
          | NVIDIA show on benchmarks, having a near-unlimited budget is
          | still not enough to make ML work nicely.
          | 
          | Oh, and image recognition networks like YOLO have their own
          | flavor of problems:
         | https://www.inverse.com/article/56914-a-google-algorithm-was...
        
           | throwlaplace wrote:
           | >As a practical example, figuring out where a given pixel
           | moves from one video frame to the next one, when working on
           | real-world videos, the best known algorithms get about 50% of
           | the pixels correct. With clever filtering, you can maybe bump
           | that to 60 or 70%, but in any case you will be left with a
           | 30%+ error rate.
           | 
           | what do you mean by this? optical flow isn't really a
           | learning problem? it's a classical problem with very good
           | classical algorithms
           | 
           | https://www.mia.uni-saarland.de/Publications/brox-
           | eccv04-of....
           | 
           | https://people.csail.mit.edu/celiu/OpticalFlow/
           | 
           | https://github.com/pathak22/pyflow
        
             | fxtentacle wrote:
             | It used to be. Then the AI fanboys arrived and started
             | treating it like a learning problem.
             | 
             | https://arxiv.org/abs/1612.01925
             | 
             | https://arxiv.org/abs/1709.02371
             | 
             | https://arxiv.org/abs/1904.09117
             | 
              | BTW, the classical algorithms also deal very badly with
              | noise and repetitive textures, e.g. a video of a forest in
              | the afternoon.
        
         | newfeatureok wrote:
         | I'm somewhat surprised at the responses for this.
         | 
          | I believe your issue can be easily solved - have supervisors
          | wear a color that distinguishes them from non-supervisors. For
          | example, let's say it's yellow.
          | 
          | OK, so now you have yellow-wearing supervisors and everyone
          | else. To resolve the issue you have described, acquire a month
          | or so of footage, with labels per minute describing how many
          | yellow-wearing supervisors and how many people (in total) there
          | are.
         | 
         | So the data you have is:
         | 
          | 1. Number of yellow-wearing supervisors
          | 
          | 2. Total number of workers on the floor
         | 
         | Then with this data you can train a network to do what you're
         | describing pretty easily. Assuming there are a lot of workers
         | on the floor, trying to do person detection or face detection
         | would require too much data. Just have a uniform enforced and
         | train on the colors/presence.
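          | 
          | (If the uniform is enforced you may not even need a network to
          | get a first signal. A minimal sketch, assuming plain OpenCV
          | HSV thresholding; the color range and blob-size cutoff are
          | guesses that would need tuning to the actual footage:)
          | 
          |     # Count yellow-vest-sized blobs per frame, no training.
          |     import cv2
          |     import numpy as np
          | 
          |     LOWER = np.array([20, 100, 100])   # rough HSV lower bound
          |     UPPER = np.array([35, 255, 255])   # rough HSV upper bound
          |     MIN_AREA = 500                     # ignore small specks
          | 
          |     def count_yellow_vests(frame_bgr):
          |         hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
          |         mask = cv2.inRange(hsv, LOWER, UPPER)
          |         kernel = np.ones((5, 5), np.uint8)
          |         mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
          |         # OpenCV 4.x returns (contours, hierarchy)
          |         contours, _ = cv2.findContours(
          |             mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
          |         return sum(1 for c in contours
          |                    if cv2.contourArea(c) > MIN_AREA)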
        
           | fareesh wrote:
           | Sorry but it was a scenario I imagined and not something that
           | happened in reality. I can't talk about some of the real-
           | world scenarios that I am asked to consult on, so I made up a
           | rather poorly thought-out one.
        
           | mrspeaker wrote:
           | "Easily solved - just have them wear special clothes."
           | Everything is easy if you can arbitrarily change the
           | requirements!
        
             | 1MoreThing wrote:
             | This is good problem-solving. Why spend tens (if not
             | hundreds) of thousands of dollars building technology to do
             | a complicated task if you can cut that effort in half or
              | more by having somebody wear a funny vest?
             | 
             | Remember, the problem is "I need to know when I don't have
             | two managers on the floor," not "how do I use machine
             | learning to know when I don't have two managers on the
             | floor."
        
               | mrspeaker wrote:
               | This particular problem is "I need to know when I don't
               | have two managers on the floor, and they aren't always
               | wearing funny vests just because the computer guys are
               | bad at deep learning".
               | 
               | If we can make up arbitrary rules then just have them jot
               | down on a piece of paper when they come and go, and if
               | they are the last to leave then they have to send an
               | email.
        
             | newfeatureok wrote:
              | The requirements were not changed. Supervisors in almost
              | every working-class job already wear different clothes to
              | begin with. Heck, even doctors dress differently from
              | nurses, teachers from students, coaches from athletes, etc.
              | 
              | The general point is to capitalize on preexisting
              | information rather than chase the "true" solution, which is
              | error-prone and which even a human might not perform with
              | 100% accuracy, because in certain settings (such as this
              | hypothetical) a perfect solution cannot be achieved without
              | adding constraints.
        
           | SubiculumCode wrote:
           | This is not bad, but once in this territory, why not just add
           | some tracking beacon to a badge?
        
         | sickcodebruh wrote:
         | Have you tried fast.ai's Practical Deep Learning For Coders?
         | https://course.fast.ai/ I think it's great for answering many
         | of the exact questions you have.
         | 
         | I was able to answer my own versions of many of those questions
         | after the first few video lessons. It demonstrated to us that
         | our data is a great fit for machine learning. I didn't feel
         | comfortable turning my experiments into something production-
         | worthy but I feel confident enough to at least have
         | conversations about it and sketch out a possible plan for what
         | a contractor could work on this year.
        
           | fareesh wrote:
            | There seem to be a lot of courses in this space - I'll give
            | this one a try since you're recommending it. Most of them
            | seem to focus more on the theoretical / math aspects, which
            | is interesting in its own right, but I find it more
            | rewarding to implement these things and solve real-world
            | problems.
        
             | voodootrucker wrote:
              | FastAI has you detecting dog breeds in lesson one :)
        
               | corporateslave5 wrote:
               | More advertising from fast.ai
        
               | barbecue_sauce wrote:
               | People always bring up fast.ai because it's a good course
               | and it's free. As someone who has gone through it, I can
               | attest to its quality.
        
         | salty_biscuits wrote:
          | You need to start from what sort of accuracy the task needs
          | from a business perspective (including what is acceptable in
          | terms of false positives and false negatives). Just back-of-
          | the-envelope stuff. You have a rough idea of the "I copied
          | stuff other people have done" rate and the "I spent a few
          | days mucking about" rate. This stuff always follows a
          | logistic curve with time, starting at your first rate and
          | asymptotically approaching the high 90s. Use this to get a
          | ballpark estimate of how long it will take / cost. If the
          | accuracy required is close to 100% you can probably give up
          | straight away. For things like this that I have done in the
          | past, a good mental model has been: if it isn't worth
          | "manually automating" the task (i.e. paying someone somewhere
          | to watch a webcam and send the email, so you always have the
          | end product and you eventually get labeled data as a
          | byproduct), it might not be worth trying to automate it.
        
         | carbocation wrote:
         | 60% on a per-frame basis might be enough if all you need to do
         | is identify the condition "two supervisors are not on the
         | floor" for at least 20 minutes.
         | 
          | As in, if you compute your per-frame score and aggregate it
          | over bigger chunks of time, is it sufficiently different when
          | 2 supervisors are on the floor versus when they are not?
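          | 
          | A minimal sketch of that aggregation idea (counts, fps and the
          | 20-minute window are placeholders):
          | 
          |     # Flag a minute as under-staffed if the median per-frame
          |     # supervisor count is below 2; alert after 20 such minutes
          |     # in a row. Smooths over per-frame detection noise.
          |     import statistics
          | 
          |     def understaffed_minutes(counts, fps=30, required=2):
          |         per_min = fps * 60
          |         flags = []
          |         for start in range(0, len(counts), per_min):
          |             chunk = counts[start:start + per_min]
          |             if chunk:
          |                 low = statistics.median(chunk) < required
          |                 flags.append(low)
          |         return flags
          | 
          |     def should_alert(flags, window=20):
          |         run = 0
          |         for bad in flags:
          |             run = run + 1 if bad else 0
          |             if run >= window:
          |                 return True
          |         return False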
        
           | fareesh wrote:
           | I wrote random numbers for the sake of narrating a scenario
           | but yeah I suppose you could do Supervisor Present Y/N for
           | 180 frame chunks @30fps and pick up the value per minute
        
         | proc0 wrote:
         | Isn't this a human learning problem? Just tell your supervisors
         | to be aware of their counterpart on the floor, at all times?
        
         | zmmmmm wrote:
         | The shocking thing that at least I ran into is the sheer
         | quantity of training data you really need. The large companies
         | doing this successfully are using utterly gigantic libraries of
         | training data that are beyond anything others could ever come
          | up with. It really brought home to me what a blunt instrument
          | deep learning really is.
        
           | fareesh wrote:
           | Is there some kind of rule of thumb for a minimum of how much
           | data is needed for various types of problems?
        
             | TACIXAT wrote:
              | Retraining an existing model does not need many images
              | (the fast.ai lesson 1 example retrains a net to
              | distinguish cricketers from baseball players with 30
              | images). For a full net, it's on the order of thousands
              | per category.
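              | 
              | A minimal plain-PyTorch sketch of that kind of
              | retraining (not the fastai API the lesson uses; the
              | folder layout, model and hyperparameters are
              | placeholders):
              | 
              |     # Freeze an ImageNet-pretrained backbone and retrain
              |     # only the final layer on a small labeled folder.
              |     import torch
              |     import torch.nn as nn
              |     import torchvision
              |     from torchvision import datasets, transforms
              | 
              |     tfm = transforms.Compose([
              |         transforms.Resize((224, 224)),
              |         transforms.ToTensor(),
              |     ])
              |     data = datasets.ImageFolder("labeled/",
              |                                 transform=tfm)
              |     loader = torch.utils.data.DataLoader(
              |         data, batch_size=8, shuffle=True)
              | 
              |     model = torchvision.models.resnet34(pretrained=True)
              |     for p in model.parameters():
              |         p.requires_grad = False  # freeze backbone
              |     model.fc = nn.Linear(model.fc.in_features,
              |                          len(data.classes))
              | 
              |     opt = torch.optim.Adam(model.fc.parameters(),
              |                            lr=1e-3)
              |     loss_fn = nn.CrossEntropyLoss()
              |     for epoch in range(4):
              |         for x, y in loader:
              |             opt.zero_grad()
              |             loss = loss_fn(model(x), y)
              |             loss.backward()
              |             opt.step()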
        
         | voodootrucker wrote:
         | - We use GCP for labeling [1]
         | 
          | - YOLOv3 is state of the art for speed. I think RetinaNet does
          | better if you have the horsepower.
         | 
         | - I can't recommend FastAI [2] enough for learning things to
         | try.
         | 
          | - 60% on a frame-by-frame basis might be enough, as long as
          | the false-positive rate stays low. Combine with OpenCV mean
          | shift if you need real time (rough sketch below, after the
          | links).
         | 
         | - Start small. Show success with pre-trained models, then move
         | on to transfer learning. Start with a small dataset. Agree on a
         | metric beforehand.
         | 
         | - Use a notebook. [3] Play around, don't let it run for days
         | then look at the result.
         | 
         | [1] https://cloud.google.com/ai-platform/data-labeling/docs/
         | 
         | [2] https://course.fast.ai/
         | 
         | [3] https://github.com/Mersive-
         | Technologies/yolov3/blob/master/f...
         | 
         | Edit: formatting
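          | 
          | Rough sketch of the mean-shift point above (detect once to get
          | a box, then follow it on later frames; box format, channels
          | and termination criteria are placeholders):
          | 
          |     # Seed a tracking window from one detection, then track
          |     # it via histogram back-projection + cv2.meanShift.
          |     import cv2
          | 
          |     def track_from_box(frames, box):
          |         x, y, w, h = box  # box from the detector
          |         roi = frames[0][y:y + h, x:x + w]
          |         hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
          |         hist = cv2.calcHist([hsv_roi], [0], None,
          |                             [180], [0, 180])
          |         cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
          |         crit = (cv2.TERM_CRITERIA_EPS
          |                 | cv2.TERM_CRITERIA_COUNT, 10, 1)
          |         window, tracks = (x, y, w, h), []
          |         for frame in frames[1:]:
          |             hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
          |             back = cv2.calcBackProject([hsv], [0], hist,
          |                                        [0, 180], 1)
          |             _, window = cv2.meanShift(back, window, crit)
          |             tracks.append(window)
          |         return tracks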
        
           | fareesh wrote:
           | Thanks I will check out these resources
        
         | dna_polymerase wrote:
         | For annotation, check out Prodigy [0].
         | 
          | Generally speaking, since classification systems themselves
          | are pretty dumb, there isn't really a way to know what
          | architecture will work best for your task other than trial
          | and error. Of course you can optimize parameters in a less
          | chaotic way (grid search or AutoML). In my experience it
          | mostly boils down to data. Try augmentation methods,
          | acquiring more data, or transfer learning with varying
          | degrees of layer relearning.
         | 
         | [0]: https://prodi.gy/
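          | 
          | On the augmentation point, a minimal torchvision sketch (which
          | transforms actually help depends entirely on the footage, so
          | treat these as examples rather than a recipe):
          | 
          |     # Basic augmentation pipeline for training images.
          |     from torchvision import transforms
          | 
          |     augment = transforms.Compose([
          |         transforms.RandomHorizontalFlip(),
          |         transforms.ColorJitter(brightness=0.3, contrast=0.3),
          |         transforms.RandomRotation(degrees=10),
          |         transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
          |         transforms.ToTensor(),
          |     ])
          |     # then pass transform=augment to the training Dataset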
        
           | fareesh wrote:
           | Looks great, will give it a try. I'm assuming I can just host
           | this somewhere and send users the link.
        
         | throwlaplace wrote:
         | >What's a labelling tool that non-technical users can use
         | intuitively?
         | 
         | i haven't used it but microsoft has this
         | 
         | https://github.com/microsoft/VoTT
         | 
         | >"If we spend all this time and money labelling footage, how
          | well is this going to work?"
         | 
         | "not well at all because we don't have facebook/google scale
         | training data. let's try to figure out a conventional way to do
         | it". for the supervisors problem i would recommend bluetooth
         | beacons.
        
         | jointpdf wrote:
         | I would recommend CVAT for annotation (for images/video):
         | https://github.com/opencv/cvat
         | 
         | In general, annotating data for object detection or
         | segmentation tends to be _very_ hard to do effectively--expect
         | low quality and inconsistent labels.
        
         | tel wrote:
         | There are a load of questions here.
         | 
         | > Where do I go from here? Keep trying different models?
         | 
         | > ...after [the labeling is] all done is this even going to
         | work?
         | 
         | > [How to label]
         | 
         | > If we spend all this time and money labelling footage, how
         | well is this going to work? How much footage do we need to
         | label?
         | 
         | Generally, you're discussing the space of model improvement and
         | refinement. This is the costliest and most dangerous part of
          | any ML pipeline. Without good evaluation, stakeholder support,
          | and a real reason to believe that the algorithm can be
          | improved, this is just a hole to throw money into.
         | 
         | The short answer to most questions is that you don't really
         | know. Generally speaking, more data will improve ML algorithm
         | performance, especially if that data is more specific to your
         | problem. That said, more data may not actually substantially
         | improve performance.
         | 
         | You will get much more leverage by using existing systems,
         | accepting whatever error rate you receive, and building systems
         | and processes around these tools to play to their strengths.
         | People have suggested asking the floor managers to wear a
         | certain color. You could also use the probabilistic bounds
         | implied by the accuracies you're seeing to build a system which
         | doesn't _replace_ manual monitoring, but augments it.
         | 
         | Perhaps you can emit a warning when there's a likelihood
         | exceeding some threshold that there aren't enough people on the
         | floor. This makes it easier for the person monitoring manually,
         | catches the worst case scenarios, and helps improve the
         | accuracy of the entire monitoring system.
         | 
         | Not only can these systems be implemented more cheaply, they
         | will provide early wins for your stakeholders and provide
         | groundwork for a case to invest in the actual ML. They might
         | also reduce the problem space that you're working in to a place
         | where you can judge accuracy better and build theories about
         | why the models might be underperforming. This will support
         | experiments to try out new models, augment the system with
         | other models, or even try to fine-tune or improve the models
         | themselves for your particular situation.
         | 
         | In terms of software development lifecycles, it's relatively
         | late in the game when you can afford the often nearly
         | bottomless investment of "machine learning research". Early
         | stages should just implement existing, simple models with
         | minimal variation and work on refining the problem such that
         | bigger tools can be supported down the line if the value is
         | there.
        
           | mpfundstein wrote:
           | Gee, you just described my practice :-)
        
             | tel wrote:
             | Ha, that's good to hear. Would love to chat with you about
             | it if you're interested.
        
           | fareesh wrote:
           | Thanks - this validates many of the assumptions I had about
           | this part of the process.
           | 
            | It has been challenging communicating many of these realities
            | to non-technical folks, who seem to be quite misguided about
            | implementing these types of systems as opposed to "non-ML"
            | systems, where there is a clearer and more predictable idea
            | of what's possible, how well it will work, and how much
            | effort is required to pull it off.
        
             | tel wrote:
             | In my opinion, there's space for a "ML Product Manager" as
             | a specialization for someone who understands the technical
             | aspects of both software and ML systems, but also can
             | design roadmaps, build stakeholder buy-in, and generally
             | shepherd the project. That feels like a big open space
             | right now.
        
         | ssivark wrote:
         | Awesome summary. Welcome to some lessons/truths (circa 2019
         | state of technology):
         | 
          | 1. Deep learning (by itself) is often a shitty solution. It
          | takes a lot of fiddling -- not just with the models, but also
          | with the training data -- to get anything useful. Often the
          | data generation team/effort becomes larger than the model-
          | building effort.
         | 
          | 2. It is hopeless to use neural networks as an end-to-end
          | solution. This example will involve studying whether detections
          | are correlated/independent in neighboring frames... whether
          | information can be pooled across frames... whether you can use
          | that to build a robust real-time picture of the scene of
          | interest, etc. That will involve lots of judicious software
          | system design using broader ideas from ML / statistical
          | reasoning.
         | 
         | This is why I find it hopelessly misleading to tell people to
         | just find tutorials with TensorFlow/Pytorch and get started.
         | You really need to understand what's going on to be able to
         | build useful systems.
         | 
         | That's apart from all the thorny ethical questions raised by
         | monitoring humans.
        
         | Eridrus wrote:
         | > "If we spend all this time and money labelling footage, how
         | well is this going to work? How much footage do we need to
         | label?"
         | 
         | Start by labeling some data yourself. If you need to scale
         | things up, you're going to need very clear rubrics for how
         | things should be labeled and you're not going to be able to
         | make them without having labeled some data yourself.
         | 
         | Definitely think about what the easiest form of your task is.
          | Labeling bounding boxes is time-intensive; labeling whether
          | there are 2 or more supervisors on the floor should be a lot
          | easier, and you can label a bunch of frames all at once.
         | 
          | You're going to need to figure out what tooling you will need
          | for labeling: is it available out of the box, or will you need
          | something custom?
         | 
         | Label X data points yourself and do some transfer learning.
         | Label another X data points and see how much better things get.
         | 
          | The rough rule of thumb is that performance increases
          | logarithmically with data [1]. Once you have a few points
          | showing how much better things get with more data, fit a
          | logarithmic curve and predict how much data you will need,
          | though be prepared that you might be off by a factor of 10.
         | 
         | As others have mentioned, it's worth thinking about false
         | positive/negative tradeoffs and how much you care about either.
         | 
         | If the numbers you're extrapolating to aren't satisfactory,
         | then yeah, you need to keep messing around with your training
         | until you bend the curve enough that it seems like you'll get
         | there with labeled data.
         | 
         | [1] https://ai.googleblog.com/2017/07/revisiting-unreasonable-
         | ef...
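          | 
          | A minimal sketch of that extrapolation (accuracy numbers and
          | the target are made up; scipy's curve_fit does the fitting):
          | 
          |     # Fit acc ~ a + b*log(n) to a few measured points, then
          |     # solve for the n that would hit a target accuracy.
          |     import numpy as np
          |     from scipy.optimize import curve_fit
          | 
          |     sizes = np.array([250, 500, 1000, 2000])   # examples
          |     accs = np.array([0.61, 0.66, 0.70, 0.74])  # accuracy
          | 
          |     def log_curve(n, a, b):
          |         return a + b * np.log(n)
          | 
          |     (a, b), _ = curve_fit(log_curve, sizes, accs)
          |     target = 0.85
          |     needed = np.exp((target - a) / b)
          |     print(f"~{int(needed)} examples to reach {target:.0%}")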
        
         | meritt wrote:
         | Why is this a machine learning problem? Does your factory not
         | have keycard access? Or just require your supervisors to carry
         | some sort of RFID/BLE tracking device. These are well-solved
         | problems.
        
           | fareesh wrote:
           | I had to lie about specifics in the example because I post
           | under my real name and there are things I can't talk about :)
           | 
            | Apologies - I figured the primary intent of my comment, i.e.
            | the questions at the end, would be the focus of most
            | responses.
        
       | dang wrote:
       | Discussed a year ago:
       | https://news.ycombinator.com/item?id=18838808
        
       | sanxiyn wrote:
       | See also Dive into Deep Learning Compiler from the same team:
       | http://tvm.d2l.ai/
        
       | throwlaplace wrote:
        | this looks pretty good. certainly much better than goodfellow's
        | deep learning book. the diagrams and code are much appreciated,
        | but i'm curious why mxnet over pytorch?
        
         | bensaccount wrote:
         | all the authors look to be Amazon employees and I think MXNet
         | is Amazon's "chosen" DL framework.
        
           | throwlaplace wrote:
            | ah that makes sense. should've googled the authors' names. i
            | just assumed they were academics because of the large number
            | of unis using the book.
        
             | [deleted]
        
         | sanjose321 wrote:
          | I find this comment amusing - have you read Goodfellow's book?
          | That book is amazing.
        
           | throwlaplace wrote:
           | i read the first half of it very closely and skimmed the
           | second.
        
       | dragandj wrote:
        | I'll chip in with my book, which is written with programmers in
        | mind, implements everything from scratch, and works on CPU and
        | GPU at great speed. It directly links theory to implementation,
        | and you can use it along with Goodfellow's Deep Learning book.
        | It also discusses all the steps and does not skip gradients by
        | relying on autograd.
       | 
       | Deep Learning for Programmers: An Interactive Tutorial with CUDA,
       | OpenCL, DNNL, Java, and Clojure.
       | 
       | https://aiprobook.com/deep-learning-for-programmers/
        
         | mpfundstein wrote:
          | Is there a print version (or one planned)? I usually don't buy
          | ebooks.
        
           | dragandj wrote:
           | Only a limited hand-crafted hardcover edition is planned.
           | That being said, you can print a dead tree version from the
           | PDF at your local printing shop (or at home) if you care
           | about the text, and not that much about binding.
        
             | mpfundstein wrote:
             | Yes, that would be an option.
        
         | vga805 wrote:
         | And, what makes me want to dive into this the most, there's
         | some Clojure! Will definitely have to take a look a this one.
         | Thanks.
        
           | dragandj wrote:
            | There's lots of Clojure! (In relative terms, that is. In
            | absolute terms there's not much of it, because Clojure is so
            | concise and powerful that everything is implemented with very
            | little code. :)
        
       ___________________________________________________________________
       (page generated 2020-01-03 23:00 UTC)