[HN Gopher] Dopamine and Temporal Difference Learning ___________________________________________________________________ Author: magoghm Score: 92 points Date: 2020-02-09 13:35 UTC (9 hours ago) Web link: deepmind.com | afarrell wrote: | I can partly understand some of this based on EE101-level control | theory, a high-school-level model of how neurons work, and | 3blue1brown's intro to neural nets[1]. However, I have a strong | personal interest in developing a much deeper understanding of | dopamine neurons and their role in Executive Function. | | Can anyone recommend a good curriculum which can take a random | web developer from "Knows what a myelinated axon, a sigmoid | function, and a feedback loop are" to having a solid enough | background to dive into the research on this? | | [1] https://www.youtube.com/watch?v=aircAruvnKk | scribu wrote: | Well, there's a major called Behavioral Neuroscience [1], which | sounds like what you're after. | | Or you could do a whole undergraduate program just on | neuroanatomy. [2] | | There's also a subfield called Cognitive Neuroscience. [3] | | (I'm not an expert either.) | | [1] https://psychology.nova.edu/undergraduate/behavioral- | neurosc... | | [2] https://neuro.ucr.edu/courses/nrsc200a | | [3] https://en.wikipedia.org/wiki/Cognitive_neuroscience | afarrell wrote: | So far, this seems to address only learning in a context where the | reward appears shortly after the behavior which caused it. That | is valuable to understand, but it does not yet seem to explain | how Executive Functions work. | | > What happens if an individual's brain "listens" selectively to | optimistic versus pessimistic dopamine neurons? Does this give | rise to impulsivity, or depression?
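| | For readers coming at this from the "feedback loop" side: the "temporal difference" in the title is a reward prediction error, delta = r + gamma * V(s') - V(s). Below is a minimal tabular TD(0) sketch of the delayed-reward case, under illustrative assumptions that are not from the DeepMind post: a hypothetical chain of four states (a cue, two delay steps, then reward), discount gamma = 0.9, learning rate alpha = 0.1. Over repeated episodes, value flows back from the rewarded state to the cue that predicts it.

```python
# Minimal tabular TD(0) on a hypothetical chain: cue -> delay -> delay -> reward.
# Assumptions (illustrative only): 4 states, reward 1.0 received when leaving
# the last state, discount gamma = 0.9, learning rate alpha = 0.1.

def td0_chain(n_states=4, reward=1.0, gamma=0.9, alpha=0.1, episodes=500):
    values = [0.0] * n_states  # V(s) for each state in the chain
    for _ in range(episodes):
        for s in range(n_states):
            is_last = s == n_states - 1
            r = reward if is_last else 0.0
            # The terminal state after the last one has value 0.
            next_v = 0.0 if is_last else values[s + 1]
            delta = r + gamma * next_v - values[s]  # reward prediction error
            values[s] += alpha * delta
    return values


if __name__ == "__main__":
    print([round(v, 3) for v in td0_chain()])
```

| | After training, the values settle near gamma^k for a state k steps before the reward (about 0.729, 0.81, 0.9 and 1.0), so the cue itself comes to predict the delayed reward.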
| | My intuition is that impulsivity would arise as a result of | giving much greater weight to the signals of a very recently | trained network than to a less recently trained one. | | This all raises a few questions for me: | | 1) How does a brain _recognize_ reward in order to fire the | signal which trains a dopamine network? It seems straightforward | for the taste of food, a hug from a fellow tribesman, or a bell | from playing a video game for 5 hours straight (in a simulated | Atari environment). | | How does a brain recognize reward while it is (for example) | writing a teacher-assigned essay for an unclear audience or a | Python program without Test-Driven Development? | | What does the brain use as its leading KPIs? | | 2) How does a brain select how much it listens to different | networks which predict rewards from different actions so that it is | "robust to changes in the environment or changes in the policy"? | How does the brain adjust its attention in response to changing | context? | pizza wrote: | Mere speculation, but maybe there are multiplexed temporal- | difference-ish networks that correlate different reward | frequencies per basis (like the different x[k] for the Fourier | transform of a signal x[n]). | bradknowles wrote: | Is anyone else having problems with just getting a blank page | when trying to load that site? | | Or is it just me on iOS? | riwsky wrote: | I hit this, and the solution for me was to turn off my content | blocker. | cs702 wrote: | Great blog post on great research. Worth reading in its entirety. | | Summarizing at a very high level of abstraction: This work compares | a mechanism used for learning probability distributions of | expected rewards in deep reinforcement learning systems to the | dopamine reward mechanism in mouse brains.
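| | A minimal sketch of the distributional idea, assuming the asymmetric-learning-rate form of distributional TD: each unit scales positive and negative prediction errors differently, with tau = alpha_pos / (alpha_pos + alpha_neg). Units with tau above 0.5 are "optimistic" and units below 0.5 are "pessimistic." To keep it short, it is simplified to a single cue followed immediately by reward, so the TD target is just the sampled reward. The two-point reward distribution, learning rate, and unit count are made up for illustration.

```python
import random

# Hypothetical reward distribution (an assumption for illustration): after the
# cue, the reward is 1.0 or 9.0 with equal probability, so the mean is 5.0.
def sample_reward(rng):
    return rng.choice([1.0, 9.0])


def train_population(taus, lr=0.004, steps=50_000, seed=0):
    """Train one value estimate per unit with asymmetric learning rates.

    tau = alpha_pos / (alpha_pos + alpha_neg): tau > 0.5 is "optimistic",
    tau < 0.5 is "pessimistic". With no successor state, the TD target is
    just the sampled reward.
    """
    rng = random.Random(seed)
    values = [0.0] * len(taus)
    for _ in range(steps):
        r = sample_reward(rng)  # one shared reward per trial for all units
        for i, tau in enumerate(taus):
            delta = r - values[i]  # this unit's reward prediction error
            # Positive errors are scaled by tau, negative ones by (1 - tau).
            alpha = lr * tau if delta > 0 else lr * (1.0 - tau)
            values[i] += alpha * delta
    return values


if __name__ == "__main__":
    taus = [0.1, 0.3, 0.5, 0.7, 0.9]
    for tau, v in zip(taus, train_population(taus)):
        # For this two-point distribution, the tau-expectile is 1 + 8 * tau.
        print(f"tau={tau:.1f}  learned={v:.2f}  expectile={1 + 8 * tau:.2f}")
```

| | Each unit settles near the tau-expectile of the reward distribution. The tau = 0.5 unit recovers the mean, and the population as a whole spans the spread between 1 and 9 instead of collapsing onto a single average.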
| | This passage near the end, in particular, caught my eye: | | _> ...our final question was if we could decode the reward | distribution from the firing rates of dopamine cells [in mouse | brains]. As shown in Figure 5, we found that it was indeed | possible, using only the firing rates of dopamine cells, to | reconstruct a reward distribution (blue trace) which was a very | close match to the actual distribution of rewards (grey area) in | the task that the mice were engaged in. This reconstruction | relied on interpreting the firing rates of dopamine cells as the | reward prediction errors of a distributional TD model, and | performing inference to determine what distribution that model | had learned about._ | | In other words, mouse brains seem to be using the same mechanism, | and it appears we can decode the probability distribution of | expected rewards learned by those brains by measuring only the | firing rate of dopamine cells. | | Very exciting! | keenmaster wrote: | How far are we from being able to use brain scanners to A/B | test online lectures to perfection? | | For example, you can show the top 20 Calculus 2 courses to | groups of 50 people each, all donning brain scanners, and | create a "brain activation map" for each class from each | professor. Among students of the top 10% of professors (as | measured by exam results and brain activation), we can analyze | the most engaging moments in each course, and hybridize them | into a master course. Furthermore, we can analyze differential | learning outcomes in males, females, students of different | races, and K-clustered psychographic profiles (based on DMN | activity and other neurological measures taken before the | course). | | If the learning outcomes are significantly different, then it | may be more appropriate to create several different master | courses for the people who showed different learning outcomes.
| A class can be recommended, Netflix-style, based on your | demographics and neural activation patterns. | | Everyone would have the best Calculus 2 class, in perpetuity, | after one cycle of experimentation and master class creation. | The same can be done for all canonical coursework, from | kindergarten through the top undergrad majors to Med/Law/MBAs. | The aforementioned learning outcome differentials would be | lessened by the "no child left behind" effect of each kid | getting a solid grasp of math and science early on with | neurologically tailored coursework. | sjg007 wrote: | You could probably just scan facial or eye movement | reactions. ___________________________________________________________________ (page generated 2020-02-09 23:00 UTC)