Wednesday, November 9, 2011

Adaptive Coding of Reward Value by Dopamine Neurons

Authors: Philippe N. Tobler, Christopher Fiorillo, Wolfram Schultz
Summary: Midbrain DA neurons adapt to the information provided by reward-predicting stimuli. Their responses shifted with the expected reward value, and their gain adapted to the variance of the possible rewards.
  • Background:
    • Expected Value (i.e., the 'average')
$$E[X]=\int_{-\infty}^{\infty}x\,p(x)\,\mathrm{d}x$$
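
For the discrete reward schedules in this experiment the integral reduces to a sum:
$$E[X]=\sum_i x_i\,p(x_i)$$
Here is a minimal Python sketch, assuming each stimulus predicts a single volume v with probability p (and nothing otherwise). Under that assumption the expected value is simply p · v, which bears on the question below about what Tobler means by "expected reward value." The helper names and the 0.5 ml volume are hypothetical illustrations, not values taken from the paper.

```python
from fractions import Fraction  # exact arithmetic avoids float rounding noise

def expected_value(outcomes):
    """Discrete expected value: sum of x * p(x) over (value, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    if total_p != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in outcomes)

def gamble(volume, prob):
    """Hypothetical stimulus: deliver `volume` ml with probability `prob`, else 0 ml."""
    return [(volume, prob), (Fraction(0), 1 - prob)]

# Hypothetical 0.5 ml reward predicted at several probabilities.
volume = Fraction(1, 2)
for prob in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    ev = expected_value(gamble(volume, prob))
    assert ev == prob * volume  # single-magnitude gamble: E[X] = p * v
    print(f"p = {prob}: E[X] = {ev} ml")
```

With more than one nonzero outcome (e.g., a 50/50 gamble between two volumes), the sum no longer collapses to a single product, so "probability times magnitude" holds only in the single-magnitude case.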

    • "In order to select the action associated with the largest reward, it is critical that the neural representation of reward has minimal uncertainty."
      • I'm not sure this is in line with the basic tenets of information theory.
      • Setting aside this statement's inconsistency with information theory, what does it even mean? How can a 'representation' have an associated uncertainty?
    • "... the representational capacity of the brain is limited, as exemplified by its finite number of neurons and the limited number of possible spike outputs of each neuron."
      • This statement is also fundamentally inconsistent with what we know about computing machines.
        • You don't need an infinite number of neurons to represent infinitely many things; you need procedures that can take an arbitrary input and produce an output from a possibly infinite set. What is finite is the encoding scheme used by a computing machine.
          • An example is the operation of addition as implemented in a modern computer.
          • Computers are not infinite in any sense of the word, yet somehow they can do arbitrary-precision arithmetic.
      • As for the second statement, it's a little more plausible, although I'm not sure anyone knows what the functional significance of spike outputs is. Spike outputs are simply what we observe when we record neuronal activity from the brain in response to a stimulus.
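
The addition example can be made concrete. The Python sketch below uses hypothetical helpers (`to_digits`, `add_digits`, `from_digits`) to add two non-negative integers of any length with only a finite alphabet of base-10 digits and a one-digit carry. A fixed, finite procedure handles unboundedly many inputs. This is ordinary schoolbook addition, offered to illustrate the argument above, not as a model of neurons.

```python
def to_digits(n):
    """Little-endian list of base-10 digits for a non-negative int."""
    if n < 0:
        raise ValueError("non-negative integers only")
    digits = []
    while True:
        n, d = divmod(n, 10)
        digits.append(d)
        if n == 0:
            return digits

def add_digits(a, b):
    """Schoolbook addition of two little-endian digit lists.

    Each step only ever sees two digits (0-9) and a carry (0 or 1),
    so the per-step machinery is finite no matter how long the inputs are.
    """
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        result.append(digit)
    if carry:
        result.append(carry)
    return result

def from_digits(digits):
    """Inverse of to_digits: rebuild the integer from its digits."""
    return sum(d * 10**i for i, d in enumerate(digits))

x = 10**50 + 7
y = 99_999_999_999_999_999_999
print(from_digits(add_digits(to_digits(x), to_digits(y))) == x + y)  # True
```

The only thing that grows with the input is the length of the digit list, not the set of symbols or rules. That is the distinction drawn above between a finite encoding scheme and an infinite set of representable values.
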
  • Experiment(s):
    • Five stimuli
      • Each indicated the probability that a specific volume would be delivered 2 seconds after stimulus onset.
      • Monkeys started to lick once they learned that the visual stimulus predicted a reward (A).
      • Transient activation of DA neurons increased monotonically with the expected volume associated with each stimulus (B and C).
    • Are individual neurons sensitive to probability and/or magnitude?
      • Measured both magnitude and probability independently and found a correlation between the two (spikes / ml).
      • When Tobler says 'expected reward value', does he really mean just the product of the probability and the magnitude?
    • What is the extent to which DA neurons discriminate between different volumes of unpredicted liquid?
    • How does DA neuron activity scale with the difference between actual and expected reward?
      • Look at DA responses at the time of the reward in the experiment shown in Figure 1.
      • Figure 1A shows that the animals can discriminate between the stimuli.
      • The larger of the two volumes always elicited an increase in activity at the time of the reward, and the smaller a decrease.
        • The magnitude of activation or suppression appeared to be identical in each case.
      • DA neurons do not scale according to the absolute difference between actual and expected reward.
        • The gain of the neural responses appeared to adapt according to the discrepancy in volume between the two potential outcomes.
      • Figure 4C shows the median neural responses as a function of liquid volume.
        • A large 'difference' (or 'variance') between expected and actual reward magnitude elicits relatively less activation; a small one elicits relatively more.
        • It doesn't matter what the absolute value of the difference between the smaller and larger rewards is, as long as they have an equal probability of occurrence.
        • The larger of the two rewards always elicited the same increase and the smaller the same decrease regardless of absolute magnitude.
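
The adaptation described above can be summarized with a toy model. Take the reward prediction error (actual minus expected volume) and divide it by the standard deviation of the two possible outcomes. This normalization is an illustrative assumption chosen to match the qualitative description, not the authors' fitted model. The function name and the volumes below are hypothetical. For two equiprobable outcomes the standard deviation is half their difference, so the larger outcome always maps to +1 and the smaller to -1, whatever the absolute volumes.

```python
import math

def normalized_prediction_errors(small, large):
    """Toy adaptive-gain model for a 50/50 gamble between two volumes (ml).

    Returns (response_to_small, response_to_large), where each response is
    (actual - expected) / standard deviation of the outcome distribution.
    """
    if not large > small:
        raise ValueError("large must exceed small")
    expected = 0.5 * small + 0.5 * large
    variance = 0.5 * (small - expected) ** 2 + 0.5 * (large - expected) ** 2
    sd = math.sqrt(variance)  # equals (large - small) / 2 for two equiprobable outcomes
    return ((small - expected) / sd, (large - expected) / sd)

# Hypothetical gambles with very different absolute volumes and ranges.
for small, large in [(0.0, 0.05), (0.0, 0.5), (0.15, 0.5)]:
    expected = (small + large) / 2
    norm = normalized_prediction_errors(small, large)
    print(f"{small}-{large} ml: raw errors {small - expected:+.3f}, {large - expected:+.3f}; "
          f"normalized {norm[0]:+.2f}, {norm[1]:+.2f}")
```

The raw errors differ tenfold across these gambles, while the normalized ones stay at -1 and +1. Dividing by the range (large minus small) instead of the standard deviation would give -0.5 and +0.5. The qualitative point, identical responses across magnitudes, is unchanged.
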
  • Conclusions
    • The authors suggest, then, that activity in DA neurons carries information on the magnitude of reward.
    • The intuitive notion is something like: "Adjust the animal's behavior via brain activity such that the reward outcomes that are most probable elicit the least variable response(s), regardless of the absolute size of the reward."
