[HN Gopher] Show HN: Fast Deep Reinforcement Learning Course
       ___________________________________________________________________
        
       Show HN: Fast Deep Reinforcement Learning Course
        
        I worked on this applied Deep Reinforcement Learning course for the
        better part of 2021. I made a Datacamp course [0] before, and this
        served as my inspiration to make an applied Deep RL series.

        Normally, Deep RL courses teach a lot of mathematically involved
        theory. You get the practical applications near the end (if at
        all). I have tried to turn that on its head. In the top-down
        approach, you learn practical skills first, then go deeper later.
        This is much more fun.

        This course (the first in a planned multi-part series) shows how
        to use the Deep Reinforcement Learning framework RLlib to solve
        OpenAI Gym environments. I provide a big-picture overview of RL
        and show how to use the tools to get the job done. This approach
        is similar to learning Deep Learning by building and training
        various deep networks using a high-level framework, e.g. Keras.

        In the next course in the series (open for pre-enrollment), we
        move on to solving real-world Deep RL problems using custom
        environments and various tricks that make the algorithms work
        better [1].

        The main advantage of this sequence is that these practical
        skills can be picked up fast and used in real life immediately.
        The involved mathematical bits can be picked up later. RLlib is
        the industry standard, so you won't need to change tools as you
        progress.

        This is the first time that I have made a course on my own. I
        learned flip-chart drawing to illustrate the slides and
        notebooks. That was fun, considering how much I suck at drawing.
        I am using Teachable as the LMS, LaTeX (Beamer) for the slides,
        Sketchbook for illustrations, a Blue Yeti for audio recording,
        OBS Studio for screencasting, and Filmora for video editing. The
        captions are first auto-generated on YouTube and then hand-edited
        to fix errors and improve formatting. I do the majority of the
        production on Linux and then switch to Windows for video editing.

        I released the course last month, and the makers of RLlib got in
        touch to show their approval. That's the best thing to happen so
        far.

        Please feel free to try it and ask any questions. I am around and
        will do my best to answer them.

        [0] https://www.datacamp.com/courses/unit-testing-for-data-scien...
        [1] https://courses.dibya.online/p/realdeeprl
        
       Author : gh1
       Score  : 110 points
       Date   : 2022-06-03 15:00 UTC (8 hours ago)
        
 (HTM) web link (courses.dibya.online)
 (TXT) w3m dump (courses.dibya.online)
        
       | cyber_kinetist wrote:
        | To be honest though, the practical side of RL can be hit-and-miss
        | in terms of "fun" depending on the person. It requires a lot of
        | manual hand tuning, reward shaping, hyperparameter tuning, and
        | general trial-and-error to make an agent do a seemingly simple
        | task, and these tricks are applied more heuristically and
        | haphazardly than what you would expect from more "conventional"
        | programming. It is fun for the right people (who love tinkering
        | with stuff and have the perseverance to continually run RL
        | experiments that can last hours or even days), but I would
        | imagine many getting bored by the whole experience. (Pssst.... I
        | was one of them; I switched to doing something else in the middle
        | of grad school.)
       | 
        | By the way, RLlib is good if you want to try out simple
        | experiments with well-established RL algorithms, but it's
        | _really_ awful to use when you want to modify the algorithm even
        | just a little bit. So it's not bad for beginner-level tutorials,
        | but once you have the basics down it might get very frustrating
        | later on. I would recommend simpler frameworks like Stable
        | Baselines 3 (https://stable-baselines3.readthedocs.io/en/master/)
        | for a much more stable experience, if you have a fair bit of
        | Python/ML programming skill and don't have trouble reading
        | well-maintained library code.
        
         | gh1 wrote:
         | My experience matches yours. Recently, I was trying to solve an
         | optimization problem using Deep RL. As usual, I had to run many
         | experiments over several days using various tricks and
          | hyperparameters. Finally, it turned out that something related
          | to the symmetry of the action space made a huge difference in
          | learning.
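          | 
          | For illustration only (my actual problem and trick were
          | different), the kind of action-space adjustment I mean can be
          | as simple as wrapping the environment so that the agent sees a
          | normalized, symmetric range:
          | 
          | # Illustrative sketch, not the actual problem: rescale a
          | # continuous action space to a symmetric [-1, 1] range.
          | import gym
          | env = gym.make("Pendulum-v1")
          | env = gym.wrappers.RescaleAction(env, -1.0, 1.0)
          | print(env.action_space)  # Box(-1.0, 1.0, (1,), float32)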
         | 
         | Anyhow, the experimentation stage requires a certain discipline
         | and feels tedious at times. But the moment when learning takes
         | off, it feels great, and for me personally, compensates for the
         | tedious phase before.
         | 
         | It's certainly not fun for everyone, but I guess it could be
         | fun for the target audience of the course (ML engineers/Data
         | Scientists).
         | 
         | Regarding frameworks, my experience has been different. I find
         | RLlib to be more modular and adaptable than SB3. But the
         | learning curve is certainly steeper. The biggest
         | differentiating factor for me is production readiness. Assuming
         | that we are learning something in order to actually use it, I
         | would recommend RLlib over SB3. The equation for researchers
         | may be different though.
        
           | InefficientRed wrote:
            | Have you ever encountered a situation where RL solved an (IRL
            | "people paid me non-research-grant money for this") problem
            | for you faster than classical controls engineering and/or
            | planning? I have not.
        
             | gh1 wrote:
              | Depends on what you mean by faster. Do you mean "time to
              | solution" or "time to inference"? I think there are also
              | more factors to take into account when judging the merit of
              | a method, e.g. performance, robustness, ability to handle
              | non-linearity, ability to solve the full online problem,
              | etc.
             | 
             | When all these factors are taken into account, I have
             | encountered situations where Deep RL performed better.
             | 
             | There are also very public examples of this e.g. Google's
             | data center cooling [0] and competitive sailing [1].
             | 
             | [0]
             | https://www.technologyreview.com/2018/08/17/140987/google-
             | ju... [1] https://www.mckinsey.com/business-
             | functions/mckinsey-digital...
        
         | avna98 wrote:
          | RLlib maintainer here -- we've been in the process of making
          | many API changes over the past couple of months to make it easy
          | to modify or implement custom algorithms. The full set of
          | changes and updated docs will be released along with Ray 2.0 in
          | August!
        
           | cyber_kinetist wrote:
            | Ah, good to meet here. I used RLlib while doing research back
            | in grad school (which eventually became a SIGGRAPH conference
            | paper this year!), and I've even sent some small pull
            | requests before (with a different ID). Sorry if this is a bit
            | of an off-topic comment, but I want to share some
            | inconveniences I experienced while using RLlib:
           | 
            | - The framework seems to be built mainly on the assumption
            | that it is going to run on a cloud machine like AWS/Azure.
            | However, many researchers use HPC-type cluster machines,
            | which are far different from these cloud setups, and I found
            | support for them to be lackluster in RLlib. (In our case we
            | had 4 16-core Xeon CPUs and 1 V100 GPU per node, with
            | multiple nodes connected via Infiniband, CentOS 7 / OpenHPC
            | installed, and job control done via SLURM.) It was quite
            | disappointing to find out that the framework didn't support
            | Infiniband communication at all, since these interconnects
            | are really costly (for good reason!). I also found allocating
            | workers based on lower-level details like affinity/NUMA to be
            | very cumbersome, since the API assumes you want to
            | "auto-assign" your workers automatically instead of "pinning"
            | them manually for the highest performance. (The last time I
            | used RLlib I looked at placement groups to do this but found
            | them too confusing; a rough sketch is at the end of this
            | comment.) Running your environments NUMA-aware can be crucial
            | for the best performance when you're running heavy
            | custom-made environments in C++. I did some experiments and
            | found that parallelizing the environment on the C++ side (via
            | threading) on each NUMA node was much faster than blindly
            | running one process per physical CPU core, which is what
            | RLlib defaults to. You can hack a bit and write your VecEnv
            | on the C++ side, but this messes up lots of assumptions RLlib
            | makes and creates a whole lot of other issues in the code.
            | Seeing promising solutions like Envpool
            | (https://github.com/sail-sg/envpool) coming up, I think these
            | issues with parallelizing environments can be improved.
           | 
            | - As I've said before, the framework makes it very easy to do
            | simple and established things, but becomes very hard to work
            | with when you try to do anything custom, like modifying RL
            | algorithms to fit your research. What I needed to do was
            | simply modify the PPO algorithm to run a custom learning step
            | inside each epoch, and I still found it surprisingly hard.
            | Using the whole declarative "Observable-like" API approach to
            | write RL code in Python was incredibly painful, since you
            | have no way to debug any of your code and no idea whether it
            | is correct until you run your whole RL pipeline, and then 30
            | minutes into training you get a strange TypeError. (I got
            | some horror flashbacks from when I was using modern JS and
            | Angular, but in a much worse form.) I get the feeling that
            | the overall codebase is incredibly complex, uses too many
            | weird dark Python metaprogramming tricks, and is a pain to
            | navigate and extend, compared to other much cleaner solutions
            | like Stable Baselines 3 (they aren't as "general" a solution
            | as RLlib, but can be more easily modified towards one's
            | needs). Maybe my needs were a bit special, so it might have
            | been much better if I had hand-rolled my PPO implementation
            | with torch.distributed... (if I just had more time...)
           | 
            | But still, your framework did help tremendously in our
            | research; we wouldn't have finished the paper without it.
            | These were just some lamentations from a former grad student
            | who was struggling with these issues some years ago. (I'm not
            | doing any reinforcement learning nowadays, but many people
            | would certainly benefit from these improvements.)
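            | 
            | For context, this is roughly the placement-group shape I was
            | trying to express. The resource numbers are illustrative, and
            | this only reserves _logical_ resources - it does not pin
            | processes to cores or NUMA domains:
            | 
            | import ray
            | from ray.util.placement_group import placement_group
            | 
            | # Attach to the Ray cluster launched by the SLURM job script.
            | ray.init(address="auto")
            | 
            | # One whole-node bundle per node (4 nodes, 64 cores + 1 GPU
            | # each); STRICT_SPREAD forces each bundle onto its own node.
            | pg = placement_group([{"CPU": 64, "GPU": 1}] * 4,
            |                      strategy="STRICT_SPREAD")
            | ray.get(pg.ready())  # block until the reservation is granted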
        
       | sydthrowaway wrote:
       | Gamechanging!
        
       | Fletch137 wrote:
       | I love the illustrations in the slides. How long did it take you
       | to learn flip chart drawing and how did you do the overlays in
       | LaTeX?
        
         | gh1 wrote:
         | Thanks. I learned it from an Udemy course [0]. Took just a
         | couple of weeks to pick up. Regarding overlays, Sketchbook
         | supports the idea of layers. I simply put different elements in
         | the illustration in different layers. Sketchbook gives me PSD
         | files that can be imported in GIMP. I then export many PNG
         | files by progressively selecting more layers in GIMP. These PNG
         | files go into Beamer like this:
         | \begin{figure}
         | \includegraphics<2>[width=0.35\textwidth]{images/shop/1.png}%
         | \includegraphics<3>[width=0.35\textwidth]{images/shop/2.png}%
         | \end{figure}
         | 
          | The % sign is important: it suppresses the end-of-line space,
          | which keeps the images correctly positioned.
         | 
         | [0] https://www.udemy.com/course/drawing-for-trainers-leaders-
         | an...
        
       | robinson_k wrote:
       | Enrolled! Went through the detailed lesson plan and you have done
       | a great job structuring the course. I am looking forward to doing
       | it over the weekend.
       | 
       | One suggestion: Instead of naming all the Jupyter notebooks
       | "coding_exercise.ipynb", maybe name them differently? That way,
       | they won't overwrite the previous download.
        
         | gh1 wrote:
         | Good catch. I can imagine that this is annoying. I have put it
         | in my todo.
         | 
         | I hope you enjoy the course over the weekend.
        
           | pciexpgpu wrote:
           | Thank you for doing this!
           | 
           | I haven't looked deeply enough, but does this course use a
           | higher-level 'package' such as OpenAI Gym or teach at a
           | lower-level? (Is lower-level stuff even possible...)
        
             | gh1 wrote:
              | I think the levels (high, low, etc.) are relevant for the
              | Deep RL algorithm, not the environment. The lower-level
              | version of OpenAI Gym's canned environments would be custom
              | Gym environments. I don't see much reason to go any lower
              | than that.
              | 
              | The situation looks different for the Deep RL algorithm.
              | You can implement it from scratch yourself using TensorFlow
              | or any other similar library. Otherwise, you can just use a
              | higher-level library like RLlib, which implements the
              | algorithms using modular components and exposes
              | hyperparameters as configuration parameters.
              | 
              | In many real-world use cases, all one needs to do is use
              | RLlib's implementation and then tune the hyperparameters.
              | In that way, RLlib is to Deep RL what Keras is to Deep
              | Learning.
             | 
             | This course uses RLlib. Does that answer your question?
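              | 
              | To make the two levels concrete, here is a minimal sketch
              | (not from the course; the environment and numbers are made
              | up, and the API shown is the pre-2.0 RLlib style) of a
              | custom Gym environment plugged into RLlib's PPO:
              | 
              | import gym
              | import numpy as np
              | import ray
              | from gym import spaces
              | from ray.rllib.agents.ppo import PPOTrainer
              | 
              | class CoinFlipEnv(gym.Env):
              |     """Toy env: repeat the observed bit."""
              |     def __init__(self, env_config=None):
              |         self.observation_space = spaces.Discrete(2)
              |         self.action_space = spaces.Discrete(2)
              |         self.state = 0
              |     def reset(self):
              |         self.state = np.random.randint(2)
              |         return self.state
              |     def step(self, action):
              |         reward = 1.0 if action == self.state else -1.0
              |         done = True  # one guess per episode
              |         return self.state, reward, done, {}
              | 
              | ray.init()
              | trainer = PPOTrainer(config={"env": CoinFlipEnv,
              |                              "framework": "torch"})
              | for _ in range(5):
              |     print(trainer.train()["episode_reward_mean"])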
        
               | pciexpgpu wrote:
               | Great, yep, that is good to know.
        
       | Asafp wrote:
       | Before jumping into Deep Reinforcement Learning I highly
       | recommend doing the Reinforcement Learning course by David Silver
       | [1].
       | 
       | [1] https://www.deepmind.com/learning-resources/introduction-
       | to-...
        
         | rg111 wrote:
         | Nice resource, but still 10+ hours of video and nothing else.
         | 
         | No code, coding assignments, math problems or coding problems.
         | 
          | Very little ROI.
         | 
         | I watched them all from start to finish. I had a superficial,
         | shallow "understanding" but no real knowledge.
         | 
          | The best (and very short) book to learn Deep RL is the one by
          | Zai and Brown from Manning.
         | 
         | And keep the classic Sutton, Barto near. That's it.
         | 
         | If you want a video course that closely follows the book with
         | quizzes and assignments, check out UofAlberta's MOOC on
         | Coursera.
         | 
         | (Hugging Face also has a new Deep RL course taught by Simonini.
         | You could check that out, but I haven't seen it.)
        
           | axpy906 wrote:
            | HF covers Decision Transformers.
           | 
           | Sutton and Barto is the best start for foundations. Start
           | there.
        
         | pciexpgpu wrote:
          | Second this. His talks are very elaborate and have great
          | pointers to reading material/coursework - as if you were
          | sitting alongside the students at UCL. Very involved, though,
          | if you have a 'day job'.
        
         | [deleted]
        
       | superfreek wrote:
       | Congrats on the launch. I have seen your Deep RL tutorials
       | circulating on YouTube. I like your presentation style: crisp and
       | precise.
        
       | mrfusion wrote:
       | Would this teach transformers? Or is that something else?
       | 
       | Also any tips for finding a study group for learning the large
       | language models? I can't seem to self motivate.
        
         | gh1 wrote:
          | Maybe this would help you differentiate: GPT-3, DALL-E 2, etc.
          | use transformers, while AlphaGo, OpenAI Five, etc. use Deep
          | Reinforcement Learning. They are not mutually exclusive, just
          | different things.
        
           | rg111 wrote:
           | Yrrk.
           | 
            | Transformers have been used in Deep RL for months at least.
           | 
           | Try these: https://scholar.google.com/scholar?q=transformer+d
           | eep+reinfo...
        
       | [deleted]
        
       | abidlabs wrote:
       | May also be of interest: https://github.com/huggingface/deep-rl-
       | class
        
       | Baopab wrote:
       | This looks great! Thank you for all the thought and effort you
       | have put into it.
       | 
       | I am currently working on a project where I need to use RLlib for
       | a capacity planning problem. Looks like I will learn a thing or
       | two over the weekend.
       | 
       | I will eventually need to use a custom environment, so it's great
        | to see it's included in your roadmap. Most courses I have seen
        | totally ignore that. Fancy Atari envs are great for practice and
       | have wow factor, but you need a custom environment to do anything
       | resembling real work.
       | 
       | Would I need a beefy GPU for the coding challenges?
        
         | gh1 wrote:
         | I am glad you like it. The coding exercises don't require a
         | GPU. Thankfully, most RL problems (and certainly the ones used
         | in the course) require small neural nets which can be trained
         | in reasonable time using a CPU.
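          | 
          | For a sense of scale, here is an illustrative (not from the
          | course) RLlib config with a deliberately small policy network
          | that trains comfortably on a laptop CPU:
          | 
          | # Hypothetical config fragment: a small two-layer MLP policy.
          | config = {
          |     "env": "CartPole-v1",
          |     "framework": "torch",
          |     "num_workers": 2,
          |     "model": {"fcnet_hiddens": [64, 64]},
          | }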
        
       | Buttons840 wrote:
        | My suggestions for learning deep RL are the book Grokking Deep RL
        | and the Spinning Up website. These are reading-focused,
        | obviously. Then, when your implementations don't work, compare
        | them to minimalRL. I don't intend to detract from this course;
        | I'm just adding some of my own suggestions on the topic.
       | 
       | https://www.manning.com/books/grokking-deep-reinforcement-le...
       | 
       | https://spinningup.openai.com/en/latest/
       | 
       | https://github.com/seungeunrho/minimalRL/blob/master/sac.py
        
         | [deleted]
        
       | thrill wrote:
       | When I visit the site using Edge, even with Adblockers disabled,
       | I'm unable to view the courses listed as "preview", such as
       | https://courses.dibya.online/courses/fastdeeprl/lectures/383...,
       | and instead get a notice that "this page has been blocked by
       | Microsoft Edge".
        
         | gh1 wrote:
         | I am sorry about that. Unfortunately, the same thing happens in
         | Firefox when the tracking protection is set to "strict".
         | 
          | This apparently started happening after Teachable updated their
          | video player. Earlier, they used Wistia; now they use Hotmart.
         | 
         | I have informed Teachable about this issue. They said they will
         | look into it.
         | 
         | The current workaround would be to use Chrome or Firefox (with
         | tracking protection set to a level below "strict").
        
       ___________________________________________________________________
       (page generated 2022-06-03 23:00 UTC)