[HN Gopher] Dopamine and Temporal Difference Learning
       ___________________________________________________________________
        
       Dopamine and Temporal Difference Learning
        
       Author : magoghm
       Score  : 92 points
       Date   : 2020-02-09 13:35 UTC (9 hours ago)
        
 (HTM) web link (deepmind.com)
 (TXT) w3m dump (deepmind.com)
        
       | afarrell wrote:
       | I can partly understand some of this based on EE101-level control
       | theory, a high-school-level model of how neurons work, and
       | 3blue1brown's intro to neural nets[1]. However, I have a strong
       | personal interest in developing a much deeper understanding of
       | dopamine neurons and their role in Executive Function.
       | 
       | Can anyone recommend a good curriculum which can take a random
       | web developer from "Knows what a myelinated axon, a sigmoid
       | function, and a feedback loop are" to having a solid enough
       | background to dive into the research on this?
       | 
       | [1] https://www.youtube.com/watch?v=aircAruvnKk
        
         | scribu wrote:
         | Well, there's a major called Behavioral Neuroscience [1], which
         | sounds like what you're after.
         | 
         | Or you could do a whole undergraduate program just on
         | neuroanatomy. [2]
         | 
         | There's also a subfield called Cognitive Neuroscience. [3]
         | 
         | (I'm not an expert either.)
         | 
         | [1] https://psychology.nova.edu/undergraduate/behavioral-
         | neurosc...
         | 
         | [2] https://neuro.ucr.edu/courses/nrsc200a
         | 
         | [3] https://en.wikipedia.org/wiki/Cognitive_neuroscience
        
       | afarrell wrote:
       | So far, this seems to address only learning in a context where
       | the reward appears shortly after the behavior which caused it.
       | That is valuable to understand, but it does not yet seem to
       | explain how Executive Functions work.
       | 
       | > What happens if an individual's brain "listens" selectively to
       | optimistic versus pessimistic dopamine neurons? Does this give
       | rise to impulsivity, or depression?
       | 
       | My intuition is that impulsivity would arise as a result of
       | giving much greater weight to the signals of a very recently-
       | trained network than to a less-recently trained network.
       | 
       | This all raises a few questions for me:
       | 
       | 1) How does a brain _recognize_ reward in order to fire the
       | signal which trains a dopamine network? It seems straightforward
       | for the taste of food, a hug from a fellow-tribesman, or a bell
       | from playing a video game for 5 hours straight (in a simulated
       | Atari environment).
       | 
       | How does a brain recognize reward while it is (for example)
       | writing a teacher-assigned essay for an unclear audience or a
       | python program without Test-Driven Development?
       | 
       | What does the brain use as its leading-KPIs?
       | 
       | 2) How does a brain select how much it listens to different
       | networks which predict rewards from different actions so it is
       | "robust to changes in the environment or changes in the policy"?
       | How does the brain adjust its attention in response to changing
       | context?
        
         | pizza wrote:
         | Mere speculation but maybe there are multiplexed temporal-
         | difference-ish networks that correlate different reward
         | frequencies per basis (like the different x[k] for the Fourier
         | transform of a signal x[n])
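         | 
         | For readers unfamiliar with the notation, here is a minimal
         | sketch of what the x[k] are: each one measures how strongly
         | the signal x[n] correlates with one basis frequency. The
         | function name and the test signal are hypothetical, and this
         | only illustrates the analogy, not any claim about how
         | dopamine networks actually work.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform: one coefficient x[k] per basis frequency."""
    n_samples = len(x)
    coefficients = []
    for k in range(n_samples):
        # Correlate the signal with the k-th complex sinusoid.
        total = sum(
            x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        coefficients.append(total)
    return coefficients


# Hypothetical "reward signal": a pure oscillation at 2 cycles per 8 samples.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]
```

         | In this toy signal the energy concentrates in the
         | coefficients for the matching frequency and its mirror
         | image, while the other coefficients stay near zero.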
        
       | bradknowles wrote:
       | Is anyone else having problems with just getting a blank page
       | when trying to load that site?
       | 
       | Or is it just me on iOS?
        
         | riwsky wrote:
         | I hit this, and the solution for me was to turn off my content
         | blocker.
        
       | cs702 wrote:
       | Great blog post on great research. Worth reading in its entirety.
       | 
       | Summarizing at a very high level of abstraction: This work compares
       | a mechanism used for learning probability distributions of
       | expected rewards in deep reinforcement learning systems to the
       | dopamine reward mechanism in mice brains.
       | 
       | This passage near the end, in particular, caught my eye:
       | 
       |  _> ...our final question was if we could decode the reward
       | distribution from the firing rates of dopamine cells [in mice
       | brains]. As shown in Figure 5, we found that it was indeed
       | possible, using only the firing rates of dopamine cells, to
       | reconstruct a reward distribution (blue trace) which was a very
       | close match to the actual distribution of rewards (grey area) in
       | the task that the mice were engaged in. This reconstruction
       | relied on interpreting the firing rates of dopamine cells as the
       | reward prediction errors of a distributional TD model, and
       | performing inference to determine what distribution that model
       | had learned about._
       | 
       | In other words, mice brains seem to be using the same mechanism,
       | and it appears we can decode the probability distribution of
       | expected rewards learned by those brains by measuring only the
       | firing rate of dopamine cells.
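       | 
       | A minimal sketch of the learning side of this idea, not the
       | paper's actual model or its decoding step: several value
       | predictors see the same rewards but scale positive and
       | negative prediction errors differently. The function name,
       | the reward values, and the learning rates are all
       | hypothetical choices made for illustration.

```python
import random


def learn_asymmetric_values(rewards, taus, lr=0.01, steps=20000, seed=0):
    """Train one value estimate per tau using asymmetric prediction-error scaling.

    tau near 1 weights positive errors more ("optimistic" unit);
    tau near 0 weights negative errors more ("pessimistic" unit).
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(steps):
        reward = rng.choice(rewards)  # sample one reward from the task
        for i, tau in enumerate(taus):
            delta = reward - values[i]  # reward prediction error for this unit
            # Asymmetric learning rate: depends on the sign of the error.
            alpha = lr * tau if delta > 0 else lr * (1.0 - tau)
            values[i] += alpha * delta
    return values


# Hypothetical task with three equally likely reward sizes.
estimates = learn_asymmetric_values([1.0, 5.0, 10.0], [0.1, 0.5, 0.9])
pessimistic, neutral, optimistic = estimates
```

       | Under these assumptions the pessimistic unit settles below
       | the mean reward, the balanced unit settles near it, and the
       | optimistic unit settles above it, so the set of estimates
       | together carries information about the spread of rewards
       | rather than just their average.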
       | 
       | Very exciting!
        
         | keenmaster wrote:
         | How far are we from being able to use brain scanners to A/B
         | test online lectures to perfection?
         | 
         | For example, you can show the 20 top Calculus 2 courses to
         | groups of 50 people each, all donning brain scanners, and
         | create a "brain activation map" for each class from each
         | professor. Among students of the top 10% of professors (as
         | measured by exam results and brain activation), we can analyze
         | the most engaging moments in each course, and hybridize them
         | into a master course. Furthermore, we can analyze differential
         | learning outcomes in males, females, students of different
         | races, and K-clustered psychographic profiles (based on DMN
         | activity and other neurological measures taken before the
         | course).
         | 
         | If the learning outcomes are significantly different, then it
         | may be more appropriate to create several different master
         | courses for the people that showed different learning outcomes.
         | A class can be recommended, Netflix style, based on your
         | demographics and neural activation patterns.
         | 
         | Everyone would have the best Calculus 2 class, in perpetuity,
         | after one cycle of experimentation and master class creation.
         | The same can be done for all canonical coursework, from
         | Kindergarten through the top undergrad majors to Med/Law/MBAs.
         | The aforementioned learning outcome differentials would be
         | lessened by the "no child left behind" effect of each kid
         | getting a solid grasp of math and science early on with
         | neurologically tailored coursework.
        
           | sjg007 wrote:
           | You could probably just scan facial or eye movement
           | reactions.
        
       ___________________________________________________________________
       (page generated 2020-02-09 23:00 UTC)