Psych/Cogst 4650/6650, Spring 2012: Reinforcement Learning: Computational and Brain Aspects

Instructors: Shimon Edelman, Barbara Finlay
Time: Wednesdays 10:10-12:35. Place: 369 Uris Hall.
Blackboard site: HERE; use it to post questions, suggestions for additional readings, etc.
Readings: a zipfile with all the PDFs is available on Blackboard. For a week-by-week list, see below; for some annotations, see here.

I. Introduction: the idea of reinforcement learning (RL) in machine learning and neuroscience

Jan 25:

Overview of reinforcement learning contrasted with other approaches (Edelman).
General motor command structure in the brain (Finlay).
Review topic assignments for the later dates.

Feb 1:

Deeper into the basal ganglia (Finlay).
Computational issues in action learning (Edelman).
Choose presentation subjects and dates.

Readings for Jan. 25 — Feb 1 introductory material (required):

Woergoetter, F., and B. Porr (2008). Reinforcement learning. Scholarpedia, 3(3):1448.
Parent, A. and L.-N. Hazrati (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews 20:91-127.
Wolpert, D. M., J. Diedrichsen, J. R. Flanagan (2011). Principles of sensorimotor learning. Nature Reviews Neuroscience 12:739-752.
Atallah, H. E., Frank, M. J., & O'Reilly, R. C. (2004). Hippocampus, cortex, and basal ganglia: Insights from computational models of complementary learning systems. Neurobiology of Learning and Memory, 82(3), 253-267.
Chater, N. (2009). Rational and mechanistic perspectives on reinforcement learning. Cognition 113:350-364.
Sutton, R. and A. G. Barto (1998). Reinforcement Learning (BOOK).

Background reading for basic brain structure review:

Purves et al. (2008). The Human Nervous System. Chapter 1 in Principles of Cognitive Neuroscience, Sinauer. Clear overview of basic structure and terminology, including the assumed background on neurons and action potentials.
Solari, S. V. H., & Stone, R. A. (2011). Cognitive consilience: primate non-primary circuits underlying cognition. Frontiers in Neuroanatomy, 5, 65. doi: 10.3389/fnana.2011.00065

II. Brain Mechanisms and Models of Reinforcement Learning

Feb 8:

Basic basal ganglia circuitry and models

Aldridge, J. W., & Berridge, K. C. (1998). Coding of serial order by neostriatal neurons: A "natural action" approach to movement sequence. Journal of Neuroscience, 18(7), 2777-2787. Also, Aldridge and Berridge review.
Frank, M. J. (2011). Computational models of motivated action selection in corticostriatal circuits. Current Opinion in Neurobiology, 21(3), 381-386.
Graybiel, A. M. (2008). Habits, rituals, and the evaluative brain. Annual Review of Neuroscience, 31(1), 359-387.
Redgrave, P., Vautrelle, N., & Reynolds, J. N. J. (2011). Functional properties of the basal ganglia's re-entrant loop architecture: selection and reinforcement. Neuroscience, 198, 138-151.

Additional material, not required (note: most of these cover similar ground, each framed in the academic perspective its title indicates; the exception is Cohen and Frank, which is a different version of Frank 2011):

Ashby, F. G., Turner, B. O., & Horvitz, J. C. (2010). Cortical and basal ganglia contributions to habit learning and automaticity. Trends in Cognitive Sciences, 14(5), 208-215.
Bar-Gad, I., Morris, G., & Bergman, H. (2003). Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Progress in Neurobiology, 71(6), 439-473.
Balleine, B. W., Liljeholm, M., & Ostlund, S. B. (2009). The integrative function of the basal ganglia in instrumental conditioning. Behavioural Brain Research, 199:43-52.
Botvinick, M. M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3), 262-280.
Chakravarthy, V. S., D. Joseph, R. S. Bapi (2010). What do the basal ganglia do? A modeling perspective. Biol Cybern 103:237-253.
Cohen, M. X., & Frank, M. J. (2009). Neurocomputational models of basal ganglia function in learning, memory and choice. Behavioural Brain Research, 199:141-156.

Feb 15:

The process of learning and unlearning as insight into models

Jin, X., & Costa, R. M. (2010). Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature, 466(7305), 457-462.
Thorn CA, Atallah H, Howe M, Graybiel AM (2010). Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66:781-795.
Howe, M. W., Atallah, H. E., McCool, A., Gibson, D. J., & Graybiel, A. M. (2011). Habit learning is associated with major shifts in frequencies of oscillatory activity and synchronized spike firing in striatum. Proceedings of the National Academy of Sciences, 108(40), 16801-16806.

Additional:

Jin, D. Z., Fujii, N., & Graybiel, A. M. (2009). Neural representation of time in cortico-basal ganglia circuits. Proceedings of the National Academy of Sciences, 106(45), 19156-19161.

Feb 22:

Chaining, embedding, interrupting and unlearning (1)

Charlesworth, J. D., Tumer, E. C., Warren, T. L., & Brainard, M. S. (2011). Learning the microstructure of successful behavior. Nature Neuroscience, 14(3), 373-380. doi: 10.1038/nn.2748
Bornstein, A. M., & Daw, N. D. (2011). Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Current Opinion in Neurobiology, 21(3), 374-380.

Amemori, K.-i., Gibb, L. G., & Graybiel, A. M. (2011).

Additional:

Ito, M., & Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21(3), 368-373.

Feb 29:

Chaining, embedding, interrupting and unlearning (2)

Ding, J. B., Guzman, J. N., Peterson, J. D., Goldberg, J. A., & Surmeier, D. J. (2010). Thalamic gating of corticostriatal signaling by cholinergic interneurons. Neuron, 67(2), 294-307.
Isoda, M., & Hikosaka, O. (2011). Cortico-basal ganglia mechanisms for overcoming innate, habitual and motivational behaviors. European Journal of Neuroscience, 33(11), 2058-2069. doi: 10.1111/j.1460-9568.2011.07698.x
Special topic: Michael Anderson, Colloquium Speaker in Psychology this week. Anderson, M. (2010). Neural re-use as a fundamental organizational principle of the brain. Behavioral and Brain Sciences.

Mar 7:

Individual differences in performance of models and subjects, disorders

Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72-80.
Redgrave, P., Rodriguez, M., Smith, Y., Rodriguez-Oroz, M. C., Lehericy, S., Bergman, H., . . . Obeso, J. A. (2010). Goal-directed and habitual control in the basal ganglia: implications for Parkinson's disease. Nature Reviews Neuroscience, 11(11), 760-772. doi: 10.1038/nrn2915
Neiman, T. and Y. Loewenstein (2011). Reinforcement learning in professional basketball players. Nature Communications 2:569.

Mar 14:

Assigning agency

Review Wolpert (under Jan. 25 readings).
Gallagher, S. (2012). Multiple aspects in the sense of agency. New Ideas in Psychology 30:15-31.
Wegner D. (2004). Precis of The illusion of conscious will. Behavioral and Brain Sciences, 27:649-659 (not commentaries). Whoever signs up for this should get the book and present some of the experiments reviewed, with an eye to the statistical assignment of agency, not the consciousness aspect.
Whitham, E. M., Fitzgibbon, S. P., Lewis, T. W., Pope, K. J., DeLosAngeles, D., Clark, C. R., . . . Willoughby, J. O. (2011). Visual experiences during paralysis. Frontiers in Human Neuroscience, 5. doi: 10.3389/fnhum.2011.00160

Mar 21: SPRING BREAK

Mar 28:

Multiple types of reinforcers and gates: Opiates and oxytocin

Humphries, M. D., & Prescott, T. J. (2010). The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Progress in Neurobiology, 90(4), 385-417.
Ross, H. E., & Young, L. J. (2009). Oxytocin and the neural mechanisms regulating social cognition and affiliative behavior. Frontiers in Neuroendocrinology, 30(4), 534-547.
Depue, R. A., & Collins, P. F. (1999). Neurobiology of the structure of personality: Dopamine, facilitation of incentive motivation, and extraversion. Behavioral and Brain Sciences, 22, 491-569.

An example, but not necessarily a "model", of research in this area using primates and neuroimaging:

Chang, S. W. C., Barter, J. W., Ebitz, R. B., Watson, K. K., & Platt, M. L. (2012). Inhaled oxytocin amplifies both vicarious reinforcement and self reinforcement in rhesus macaques (Macaca mulatta). Proceedings of the National Academy of Sciences, 109:959-964.

April 4:

Hierarchical and other control architectures: computation

Parr, R. and S. Russell (1997). Reinforcement Learning with Hierarchies of Machines. Proc. NIPS.
Botvinick, M., Y. Niv, A. G. Barto (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition 113:262-280.
Ribas-Fernandes, J. J. F., A. Solway, C. Diuk, J. T. McGuire, A. Barto, Y. Niv, and M. Botvinick (2011). A Neural Signature of Hierarchical Reinforcement Learning. Neuron 71:370-379.
Barto, A. G. and S. Mahadevan (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems 14:41-77.
Vigorito, C. M. and A. G. Barto (2010). Intrinsically Motivated Hierarchical Skill Learning in Structured Environments. IEEE Transactions on Autonomous Mental Development 2:132-144.

April 11:

Negative reinforcement and punishment

Matsumoto M, Hikosaka O (2009). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459:837-841.
LeDoux, J. E. (2003). The emotional brain, fear and the amygdala. Cellular and Molecular Neurobiology, 23.

Additional:

Leknes, S., & Tracey, I. (2008). A common neurobiology for pain and pleasure. Nature Reviews Neuroscience, 9:314-320.

April 18:

The special case of anxiety: resource allocation and control

Egner, T., Etkin, A., Gale, S., & Hirsch, J. (2008). Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cerebral Cortex, 18(6), 1475-1484. (Also Egner commentary).

Additional:

Duncan, J. (2010). The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends in Cognitive Sciences, 14:172-179.
Hurley, M. M., Dennett, D. and Adams, R. (2011). Inside Jokes: Using humor to reverse-engineer the mind. MIT Press. Chapters 2 and 5 for a sketch of the argument and an interesting twist on what a "reinforcement" is.

April 25:

Integrating basal ganglia function into memory and decision-making

O'Reilly, R. C. and M. J. Frank (2006). Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia. Neural Computation 18:283-328.
van der Meer MA, Johnson A, Schmitzer-Torbert NC, Redish AD (2010). Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67:25-32.
McNab, F. and T. Klingberg (2008). Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience 11:103-108.
Ullman, M. T. (2006). Is Broca's area part of a basal ganglia thalamocortical circuit? Cortex 42:480-485.

May 2:

Overview

Shmuelof, L., & Krakauer, J. W. (2011). Are we ready for a natural history of motor learning? Neuron, 72(3), 469-476.

Shimon Edelman <se37 at cornell.edu>

Last modified on Mon Feb 27 14:50:27 2012