[HN Gopher] Dive into Deep Learning
___________________________________________________________________
Dive into Deep Learning

Author : soohyung
Score  : 196 points
Date   : 2020-01-03 18:19 UTC (4 hours ago)

(HTM) web link (d2l.ai)
(TXT) w3m dump (d2l.ai)

| whoisnnamdi wrote:
| Great guide - though unless I missed it, I think this is missing the latest advancements around Transformers, BERT, ELMo, etc.
|
| This stuff is pretty fresh, so it's understandable, but the NLP chapter would be greatly enhanced by covering these newer topics.

| enitihas wrote:
| Is there any book which has more than a passing mention of BERT?

| fareesh wrote:
| As an engineer I find myself in this type of situation quite often - if anyone can point me to some good resources or has any advice, I'd be quite grateful:
|
| - Some non-technical stakeholder comes to me and says "can we solve this problem with Machine Learning?" Usually it's something like "there need to be two supervisors on the factory floor at all times, and I want an email alert every time there are fewer than 2 supervisors for more than 20 minutes".
|
| - I ask for some sample footage to build a prototype and get a few very poor-quality videos, at a very different standard from what I see in most of these tutorials.
|
| - I find some pre-trained model that is able to do people detection or face detection and return bounding rectangles, and download it in whatever form.
|
| - After about 30 minutes of fiddling and googling errors, I run it against the sample footage (a sketch of what such a prototype looks like is below).
|
| - I get about 60% accuracy - this is no good. Where do I go from here? Keep trying different models? There are all sorts of models like YOLO and SSD and RetinaNet and YOLO2 and YOLO3.
|
| - At some point I try a bunch of models and all of them are at best 75% good. At this point I figure I should train it with my own dataset, and so I guess I need to arrange to have this stuff labelled. In my experience stakeholders are usually willing to appoint someone to do it, but they want to know how much footage they need to label, whether their team will need special training to do the labelling, and, after it's all done, whether this is even going to work.
|
| What are some effective / opinionated workflows for this part of the overall process that have worked well for you? What's a labelling tool that non-technical users can use intuitively? How good are tools/services like Mechanical Turk and Ground Truth?
|
| This part of the process costs time and money - stakeholders, particularly managers who are non-technical, tend to want an answer beforehand: "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?". How do you handle these kinds of conversations?
|
| I find this space fairly well-populated with ML tutorials and resources but haven't been able to find content that is focused on this part of the process.
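| (For concreteness, the half-hour prototype I have in mind is roughly the sketch below, using a pretrained torchvision detector; the model choice and the 0.5 score threshold are arbitrary, not a recommendation:)
|
|     import torch, torchvision
|     from torchvision import transforms
|
|     # Pretrained COCO detector; class 1 is "person" in the COCO label map.
|     model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
|     model.eval()
|
|     def count_people(frame_rgb, score_thresh=0.5):
|         # frame_rgb: HxWx3 uint8 array for one video frame
|         x = transforms.ToTensor()(frame_rgb)
|         with torch.no_grad():
|             out = model([x])[0]
|         keep = (out["labels"] == 1) & (out["scores"] > score_thresh)
|         return int(keep.sum())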
| fxtentacle wrote:
| Most AI stuff is just horribly over-hyped, so the sad truth might be that what you are seeing is the state of the art and nobody else has found a better way yet.
|
| As a practical example: for figuring out where a given pixel moves from one video frame to the next, on real-world videos the best known algorithms get about 50% of the pixels correct. With clever filtering you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.
|
| NVIDIA / Google / Microsoft / Amazon will tell you that you need to buy or rent more GPUs or Cloud GPU servers and do more training with more data. And there are plenty of companies in cheap-labor countries offering to do your data annotation at a very reasonable rate. But both of them are just trying to sell you something. They don't care if it will solve your problem, as long as you're feeling hopeful enough to buy their stuff.
|
| Judging from the bad results that even Google / Facebook / NVIDIA show on benchmarks, having a near-unlimited budget is still not enough to make ML work nicely.
|
| Oh, and these object detection networks like YOLO have their own flavor of problems: https://www.inverse.com/article/56914-a-google-algorithm-was...

| throwlaplace wrote:
| > As a practical example: for figuring out where a given pixel moves from one video frame to the next, on real-world videos the best known algorithms get about 50% of the pixels correct. With clever filtering you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.
|
| what do you mean by this? optical flow isn't really a learning problem? it's a classical problem with very good classical algorithms
|
| https://www.mia.uni-saarland.de/Publications/brox-eccv04-of....
|
| https://people.csail.mit.edu/celiu/OpticalFlow/
|
| https://github.com/pathak22/pyflow

| fxtentacle wrote:
| It used to be. Then the AI fanboys arrived and started treating it like a learning problem.
|
| https://arxiv.org/abs/1612.01925
|
| https://arxiv.org/abs/1709.02371
|
| https://arxiv.org/abs/1904.09117
|
| BTW, the classical algorithms also deal very badly with noise and repetitive textures, e.g. a video of a forest in the afternoon.

| newfeatureok wrote:
| I'm somewhat surprised at the responses to this.
|
| I believe your issue can be easily solved - have supervisors wear a color that distinguishes them from non-supervisors. For example, let's say it's yellow.
|
| OK, so now you have yellow-wearing supervisors and everyone else. To resolve the issue you have described, acquire a month or so of footage, with per-minute labels describing how many yellow-wearing supervisors and how many people (in total) there are.
|
| So the data you have is:
|
| 1. Yellow-wearing supervisors
|
| 2. Total number of workers on the floor
|
| Then with this data you can train a network to do what you're describing pretty easily. Assuming there are a lot of workers on the floor, trying to do person detection or face detection would require too much data. Just have a uniform enforced and train on the colors/presence.
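| (And with an enforced color you can sanity-check the idea before training anything - a rough classical baseline along these lines; the HSV range for "yellow" and the blob-size cutoff are guesses you'd tune per camera:)
|
|     import cv2
|     import numpy as np
|
|     def count_yellow_vests(frame_bgr, min_area=500):
|         # Threshold the frame in HSV space for the vest color.
|         hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
|         mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
|         # [-2] picks the contour list on both OpenCV 3.x and 4.x.
|         contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
|         # Count blobs large enough to plausibly be a vest.
|         return sum(1 for c in contours if cv2.contourArea(c) > min_area)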
| fareesh wrote:
| Sorry, but it was a scenario I imagined and not something that happened in reality. I can't talk about some of the real-world scenarios that I am asked to consult on, so I made up a rather poorly-thought-out one.

| mrspeaker wrote:
| "Easily solved - just have them wear special clothes." Everything is easy if you can arbitrarily change the requirements!

| 1MoreThing wrote:
| This is good problem-solving. Why spend tens (if not hundreds) of thousands of dollars building technology to do a complicated task if you can cut that effort in half or more by having somebody wear a funny vest?
|
| Remember, the problem is "I need to know when I don't have two managers on the floor," not "how do I use machine learning to know when I don't have two managers on the floor."

| mrspeaker wrote:
| This particular problem is "I need to know when I don't have two managers on the floor, and they aren't always wearing funny vests just because the computer guys are bad at deep learning".
|
| If we can make up arbitrary rules then just have them jot down on a piece of paper when they come and go, and if they are the last to leave then they have to send an email.

| newfeatureok wrote:
| The requirements were not changed. Supervisors in almost every working-class position already wear different clothes to begin with. Heck, even doctors wear different clothing than nurses, teachers than students, coaches than athletes, etc.
|
| The general point is to capitalize on preexisting information rather than to attempt the "true" solution, which is error-prone and which even a human might not perform with 100% accuracy, because in certain settings (such as this hypothetical) a perfect solution cannot be accomplished without adding constraints.

| SubiculumCode wrote:
| This is not bad, but once in this territory, why not just add a tracking beacon to a badge?

| sickcodebruh wrote:
| Have you tried fast.ai's Practical Deep Learning For Coders? https://course.fast.ai/ I think it's great for answering many of the exact questions you have.
|
| I was able to answer my own versions of many of those questions after the first few video lessons. It demonstrated to us that our data is a great fit for machine learning. I didn't feel comfortable turning my experiments into something production-worthy, but I feel confident enough to at least have conversations about it and sketch out a possible plan for what a contractor could work on this year.

| fareesh wrote:
| There seem to be a lot of courses in this space - I'll give this one a try since you're recommending it. Most of them seem to focus more on the theoretical / math aspects, which is quite interesting, but I'd rather implement these things and solve real-world problems.

| voodootrucker wrote:
| FastAI has you detecting dog breeds in lesson one :)

| corporateslave5 wrote:
| More advertising from fast.ai

| barbecue_sauce wrote:
| People always bring up fast.ai because it's a good course and it's free. As someone who has gone through it, I can attest to its quality.

| salty_biscuits wrote:
| You need to start from what sort of accuracy you need for the task from a business perspective (including what is acceptable in terms of false positives and false negatives). Just back-of-the-envelope stuff. You have a rough idea of the "I copied stuff other people have done" rate and the "I spent a few days mucking about" rate. This stuff always follows a logistic curve with time, starting at your first rate and asymptotically approaching the high 90s. Use this to get a ballpark estimate of how long it will take / cost. If the accuracy required is close to 100% you can probably give up straight away. For things like this that I have done in the past, a good mental model has been: if it isn't worth "manually automating" the task (i.e. paying someone somewhere to watch a webcam and send the email, so you always have the end product and you eventually get labeled data as a byproduct), it might not be worth trying to automate it.

| carbocation wrote:
| 60% on a per-frame basis might be enough if all you need to do is identify the condition "two supervisors are not on the floor" for at least 20 minutes.
|
| As in, if you compute your per-frame score and compare it over bigger chunks of time, is it sufficiently different when 2 are on the floor versus when they are not?

| fareesh wrote:
| I wrote random numbers for the sake of narrating a scenario, but yeah, I suppose you could do Supervisor Present Y/N for 180-frame chunks @ 30fps and pick up the value per minute.
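| (Something like this rolling check, say - noisy per-frame counts in, alert out; the window sizes here are made up:)
|
|     from collections import deque
|
|     FPS = 30
|     WINDOW = 180           # 6-second chunks, per the numbers above
|     ALERT_MINUTES = 20
|
|     recent = deque(maxlen=WINDOW)
|     bad_chunks = 0         # consecutive chunks with < 2 supervisors
|
|     def on_frame(supervisor_count):
|         """Feed one noisy per-frame count; True means the alert should fire."""
|         global bad_chunks
|         recent.append(supervisor_count >= 2)
|         if len(recent) == WINDOW:
|             # Majority vote over the chunk absorbs per-frame detector noise.
|             ok = sum(recent) > WINDOW // 2
|             bad_chunks = 0 if ok else bad_chunks + 1
|             recent.clear()
|         return bad_chunks * WINDOW / FPS >= ALERT_MINUTES * 60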
| proc0 wrote:
| Isn't this a human learning problem? Just tell your supervisors to be aware of their counterpart on the floor at all times?

| zmmmmm wrote:
| The shocking thing, at least the one I ran into, is the sheer quantity of training data you really need. The large companies doing this successfully are using utterly gigantic libraries of training data that are beyond anything others could ever come up with. It really brought home to me what a blunt instrument deep learning really is.

| fareesh wrote:
| Is there some kind of rule of thumb for the minimum amount of data needed for various types of problems?

| TACIXAT wrote:
| Retraining an existing model does not need many (the fast.ai lesson 1 example is retraining a net to distinguish cricketers and baseball players with 30 images). For a full net, it's on the order of thousands per category.

| voodootrucker wrote:
| - We use GCP for labeling [1]
|
| - YOLOv3 is state of the art for speed. I think RetinaNet does better if you have the horsepower.
|
| - I can't recommend FastAI [2] enough for learning things to try.
|
| - 60% on a frame-by-frame basis might be enough, as long as you have a low false-positive rate. Combine with OpenCV mean shift if you need real time.
|
| - Start small. Show success with pre-trained models, then move on to transfer learning. Start with a small dataset. Agree on a metric beforehand.
|
| - Use a notebook. [3] Play around; don't let it run for days and then look at the result.
|
| [1] https://cloud.google.com/ai-platform/data-labeling/docs/
|
| [2] https://course.fast.ai/
|
| [3] https://github.com/Mersive-Technologies/yolov3/blob/master/f...
|
| Edit: formatting

| fareesh wrote:
| Thanks, I will check out these resources.

| dna_polymerase wrote:
| For annotation, check out Prodigy [0].
|
| Generally speaking, since classification systems themselves are pretty dumb, there isn't really a way to know what architecture will work best for your task other than trial and error. Of course, you can optimize parameters in a less chaotic way (grid search or AutoML). In my experience it mostly boils down to data. Try augmentation methods, acquiring more data, or transfer learning with varying degrees of layer relearning.
|
| [0]: https://prodi.gy/

| fareesh wrote:
| Looks great, will give it a try. I'm assuming I can just host this somewhere and send users the link.

| throwlaplace wrote:
| > What's a labelling tool that non-technical users can use intuitively?
|
| i haven't used it but microsoft has this
|
| https://github.com/microsoft/VoTT
|
| > "If we spend all this time and money labelling footage, how well is this going to work?"
|
| "not well at all because we don't have facebook/google scale training data. let's try to figure out a conventional way to do it". for the supervisors problem i would recommend bluetooth beacons.

| jointpdf wrote:
| I would recommend CVAT for annotation (for images/video): https://github.com/opencv/cvat
|
| In general, annotating data for object detection or segmentation tends to be _very_ hard to do effectively - expect low quality and inconsistent labels.
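| A cheap sanity check before scaling up: have two people label the same small sample and measure how well their boxes agree. A sketch (boxes as (x1, y1, x2, y2) tuples; the 0.5 IoU cutoff is conventional but arbitrary):
|
|     def iou(a, b):
|         """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
|         ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
|         ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
|         inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
|         area_a = (a[2] - a[0]) * (a[3] - a[1])
|         area_b = (b[2] - b[0]) * (b[3] - b[1])
|         return inter / (area_a + area_b - inter)
|
|     def agreement(boxes_ann1, boxes_ann2, thresh=0.5):
|         """Fraction of annotator 1's boxes that annotator 2 roughly reproduced."""
|         matched = sum(1 for a in boxes_ann1
|                       if any(iou(a, b) >= thresh for b in boxes_ann2))
|         return matched / max(len(boxes_ann1), 1)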
| tel wrote:
| There are a load of questions here.
|
| > Where do I go from here? Keep trying different models?
|
| > ...after [the labeling is] all done is this even going to work?
|
| > [How to label]
|
| > If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?
|
| Generally, you're discussing the space of model improvement and refinement. This is the costliest and most dangerous part of any ML pipeline. Without good evaluation, stakeholder support, and a real reason to believe that the algorithm can be improved, this is just a hole to throw money into.
|
| The short answer to most questions is that you don't really know. Generally speaking, more data will improve ML algorithm performance, especially if that data is more specific to your problem. That said, more data may not actually substantially improve performance.
|
| You will get much more leverage by using existing systems, accepting whatever error rate you receive, and building systems and processes around these tools to play to their strengths. People have suggested asking the floor managers to wear a certain color. You could also use the probabilistic bounds implied by the accuracies you're seeing to build a system which doesn't _replace_ manual monitoring, but augments it.
|
| Perhaps you can emit a warning when there's a likelihood exceeding some threshold that there aren't enough people on the floor. This makes it easier for the person monitoring manually, catches the worst-case scenarios, and helps improve the accuracy of the entire monitoring system.
|
| Not only can these systems be implemented more cheaply, they will provide early wins for your stakeholders and lay the groundwork for a case to invest in the actual ML. They might also reduce the problem space that you're working in to a place where you can judge accuracy better and build theories about why the models might be underperforming. This will support experiments to try out new models, to augment the system with other models, or even to fine-tune or improve the models themselves for your particular situation.
|
| In terms of software development lifecycles, it's relatively late in the game when you can afford the often nearly bottomless investment of "machine learning research". Early stages should just implement existing, simple models with minimal variation and work on refining the problem such that bigger tools can be supported down the line if the value is there.

| mpfundstein wrote:
| Gee, you just described my practice :-)

| tel wrote:
| Ha, that's good to hear. Would love to chat with you about it if you're interested.

| fareesh wrote:
| Thanks - this validates many of the assumptions I had about this part of the process.
|
| It has been challenging communicating many of these realities to non-technical folks, who seem to be quite misguided about implementing these types of systems, as opposed to "non-ML" systems where there is a clearer and more predictable idea of what's possible, how well it will work, and how much effort is required to pull it off.

| tel wrote:
| In my opinion, there's space for an "ML Product Manager" as a specialization: someone who understands the technical aspects of both software and ML systems, but can also design roadmaps, build stakeholder buy-in, and generally shepherd the project. That feels like a big open space right now.
| ssivark wrote:
| Awesome summary. Welcome to some lessons/truths (circa the 2019 state of technology):
|
| 1. Deep learning (by itself) is often a shitty solution. It takes a lot of fiddling, not just with the models but also with the training data, to get anything useful. Often the data-generation team/effort becomes larger than the model-building effort.
|
| 2. It is hopeless to use neural networks as an end-to-end solution. This example will involve studying whether detections are correlated/independent in neighboring frames... whether information can be pooled across frames... whether you can use that to build a robust real-time picture of the scene of interest, etc. That will involve lots of judicious software system design using broader ideas from ML / statistical reasoning.
|
| This is why I find it hopelessly misleading to tell people to just find tutorials with TensorFlow/PyTorch and get started. You really need to understand what's going on to be able to build useful systems.
|
| That's apart from all the thorny ethical questions raised by monitoring humans.

| Eridrus wrote:
| > "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?"
|
| Start by labeling some data yourself. If you need to scale things up, you're going to need very clear rubrics for how things should be labeled, and you're not going to be able to make them without having labeled some data yourself.
|
| Definitely think about what the easiest form of your task is. Labeling bounding boxes is time-intensive; labeling whether there are 2 or more supervisors on the floor should be a lot easier, and you can easily label a bunch of frames all at once.
|
| You're going to need to figure out what tooling you will need for labeling: is it available out of the box, or will you need something custom?
|
| Label X data points yourself and do some transfer learning. Label another X data points and see how much better things get.
|
| The rough rule of thumb is that performance increases logarithmically with data [1]. Once you have a few points on the curve of how much better things get with more data, fit a logarithmic curve and make a prediction of how much data you will need (see the sketch below), though be prepared that you might be off by a factor of 10.
|
| As others have mentioned, it's worth thinking about false positive/negative tradeoffs and how much you care about either.
|
| If the numbers you're extrapolating to aren't satisfactory, then yeah, you need to keep messing around with your training until you bend the curve enough that it seems like you'll get there with labeled data.
|
| [1] https://ai.googleblog.com/2017/07/revisiting-unreasonable-ef...
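| The extrapolation itself is only a few lines - a sketch with scipy, using invented accuracy numbers:
|
|     import numpy as np
|     from scipy.optimize import curve_fit
|
|     # Accuracy measured after training on increasing label counts (invented).
|     n_labels = np.array([250, 500, 1000, 2000])
|     accuracy = np.array([0.61, 0.68, 0.74, 0.79])
|
|     def log_curve(n, a, b):
|         return a + b * np.log(n)
|
|     (a, b), _ = curve_fit(log_curve, n_labels, accuracy)
|
|     # Labels needed to hit a 90% target, per the fitted curve.
|     print(np.exp((0.90 - a) / b))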
| meritt wrote:
| Why is this a machine learning problem? Does your factory not have keycard access? Or just require your supervisors to carry some sort of RFID/BLE tracking device. These are well-solved problems.

| fareesh wrote:
| I had to lie about specifics in the example because I post under my real name and there are things I can't talk about :)
|
| Apologies - I figured the primary intent of my comment, i.e. the questions at the end, would be the focus of most responses.

| dang wrote:
| Discussed a year ago: https://news.ycombinator.com/item?id=18838808

| sanxiyn wrote:
| See also Dive into Deep Learning Compiler from the same team: http://tvm.d2l.ai/

| throwlaplace wrote:
| this looks pretty good. certainly much better than goodfellow's deep learning book. the diagrams and code are definitely much appreciated, but i'm curious why mxnet over pytorch?

| bensaccount wrote:
| all the authors look to be Amazon employees and I think MXNet is Amazon's "chosen" DL framework.

| throwlaplace wrote:
| ah that makes sense. should've googled the authors' names. i just assumed they were academic because of the large number of unis using the book.

| [deleted]

| sanjose321 wrote:
| I find this comment amusing. Have you read Goodfellow's book? That book is amazing.

| throwlaplace wrote:
| i read the first half of it very closely and skimmed the second.

| dragandj wrote:
| I'll chip in with my book, which is written with programmers in mind, implements everything from scratch, and works on CPU and GPU at great speed. It directly links theory to implementation, and you can use it alongside Goodfellow's Deep Learning book. It also discusses all the steps, and does not skip over gradients by relying on autograd.
|
| Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, DNNL, Java, and Clojure.
|
| https://aiprobook.com/deep-learning-for-programmers/

| mpfundstein wrote:
| Is there a print version (or one planned)? I usually don't buy ebooks.

| dragandj wrote:
| Only a limited hand-crafted hardcover edition is planned. That being said, you can print a dead-tree version from the PDF at your local printing shop (or at home) if you care about the text and not so much about the binding.

| mpfundstein wrote:
| Yes, that would be an option.

| vga805 wrote:
| And, what makes me want to dive into this the most, there's some Clojure! Will definitely have to take a look at this one. Thanks.

| dragandj wrote:
| There's lots of Clojure! (In relative terms; in absolute terms there's not much of it, because Clojure is so concise and powerful that everything is implemented with very little code. :)

___________________________________________________________________
(page generated 2020-01-03 23:00 UTC)