New research suggests that when a task gets harder, a reward’s influence on us goes down, while punishment’s goes up.
This newly found relationship between conflict and reinforcement learning suggests that the circuits in the frontal cortex that calculate the degree of conflict, effort, and difficulty of actions are integrated with the dopamine-driven circuits that govern perceptions of reward and punishment in another part of the brain, the striatum.
In two sets of experiments reported in Nature Communications, scientists gathered evidence for the link using EEG scans, genetic tests, manipulation with a low dose of a dopamine-related drug, and even tracking eye blinks.
“The signals in the cortex that respond to conflict act to induce an aversive learning signal in your basic reinforcement learning systems,” says Brown University cognitive scientist Michael Frank, coauthor of the study led by former student James Cavanagh, now an assistant professor at the University of New Mexico.
Conflict ‘alarm bells’
The conflict in the learning experiment was merely a matter of having to use the left hand to select a stimulus on the right side of a screen, or vice versa. This simple case of spatial conflict is well established in cognitive psychology.
In this study it slowed responses by only about 12 milliseconds but elicited reliable EEG brain signals typically associated with a conflict-induced “alarm bell.”
Here’s how the task worked: In a learning phase the 83 volunteer adults simply had to press the left button on a game pad when they saw a blue shape or the right button when they saw a yellow one. There were four shapes in all (call them A, B, C, and D) that could appear on either side of the screen.
Each shape had a different probability of providing a one-point reward when learners pressed the correct button. A was always rewarded, and D was seldom rewarded. B and C were each rewarded 50 percent of the time overall, but B never provided a point when it appeared on the side opposite the button, while C provided a point only when it appeared on the side opposite the button.
In this way, punishment (no points) for B became associated with the opposite-side conflict as did C’s reward (one point).
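The reward schedule is easier to see when simulated. The sketch below is only an illustration of the contingencies described above, not the researchers' task code. It assumes that each shape appears on the conflict side on half of all trials (which implies B was rewarded on every same-side trial and C on every opposite-side trial), that conflict does not affect the reward odds for A and D, and that "seldom" for D means 20 percent. The names REWARD_PROB, run_trial, and summarize are hypothetical.

```python
import random

# Reward probability by shape and trial type.
# A, B, and C follow the article's description; D's 0.2 is an assumed
# stand-in for "seldom rewarded".
REWARD_PROB = {
    "A": {"congruent": 1.0, "conflict": 1.0},
    "B": {"congruent": 1.0, "conflict": 0.0},  # never rewarded under conflict
    "C": {"congruent": 0.0, "conflict": 1.0},  # rewarded only under conflict
    "D": {"congruent": 0.2, "conflict": 0.2},
}


def run_trial(shape, rng):
    """Simulate one correct button press and return (trial_type, points)."""
    # Assumption: the shape lands on the side opposite the button half the time.
    trial_type = "conflict" if rng.random() < 0.5 else "congruent"
    rewarded = rng.random() < REWARD_PROB[shape][trial_type]
    return trial_type, int(rewarded)


def summarize(n_trials=10000, seed=0):
    """Return observed reward rates per shape: overall and by trial type."""
    rng = random.Random(seed)
    summary = {}
    for shape in REWARD_PROB:
        stats = {"congruent": [0, 0], "conflict": [0, 0]}  # [points, trials]
        for _ in range(n_trials):
            trial_type, points = run_trial(shape, rng)
            stats[trial_type][0] += points
            stats[trial_type][1] += 1
        total_points = sum(v[0] for v in stats.values())
        summary[shape] = {
            "overall": total_points / n_trials,
            "congruent": stats["congruent"][0] / stats["congruent"][1],
            "conflict": stats["conflict"][0] / stats["conflict"][1],
        }
    return summary


if __name__ == "__main__":
    for shape, rates in summarize().items():
        print(shape, {k: round(v, 2) for k, v in rates.items()})
```

Under these assumptions B and C pay out at about the same overall rate, yet every conflict trial ends in no points for B and in one point for C, which is the pairing the study relied on.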
After the conflict-infused learning phase, participants moved on to a second phase in which they were shown pairs of the previously observed shapes and had to indicate which one they thought was more rewarding.
Everyone learned that A was rewarding and D was not, but learned perceptions of B and C were skewed in one of two ways for each participant. For those who learn better from reward, conflict acted to reduce experienced reward value, leading to a preference for B over C.
For those who learn better from avoiding punishment, conflict acted to enhance experienced punishment value, leading to greater avoidance of B. In essence the latter effect is like “adding insult to injury,” where conflict made gaining no points even more aversive.
The researchers weren’t just relying on behavioral observation to inform their study. The EEG sensors monitored the midcingulate cortex, which previous research identified as the site where the brain determines the costs of effort, difficulty, and conflict in action.
The sensors measured the strength of theta and delta frequency brainwaves while people carried out the phases of the task.
“These findings suggest that conflict acted to both diminish reward value and to boost punishment avoidance within cortical systems associated with interpreting the salience of feedback,” write the authors.
So how does the cortical conflict signal actually change learning about reward values? The researchers looked to the volunteers’ genes, specifically one called DARPP-32, which governs how dopamine is processed in downstream areas of the brain.
That’s because research has shown that people with some variants of the gene are more sensitive to reward learning, while people with other variants are more sensitive to punishment avoidance learning, consistent with how this gene affects dopamine function in neurons sensitive to rewards and punishments in the striatum.
The genotyping confirmed that whether people became biased in favor of B or C was related to their genetic predisposition to learn more from reward or from avoiding punishment.
In a second set of experiments with 30 volunteers, Cavanagh, Frank, and their coauthors actively manipulated dopamine function in this downstream area (i.e., the striatum). They gave subjects safe, low doses of the drug cabergoline, which temporarily reduces receptivity to dopamine.
Prior work had shown that this subtle effect causes people to learn more from punishment avoidance than from reward. Sure enough, that is what happened.
Without the drug (on placebo), volunteers overall slightly favored B over C, but with the drug, that flipped to a significantly greater bias for C over B, consistent with learning from punishment avoidance.
They even observed that the degree to which the drug affected conflict value learning was related to its effects on eye blink rate, which has been linked to dopamine activity.
Cavanagh says he hopes to apply the knowledge to better understand learning in people with obsessive-compulsive disorder and other anxiety disorders who have enhanced theta band signals of conflict.
“Does it make them learn more from ‘punishments,’ does it make them learn less from reward?” he says. “What does the consequence of this well-known alteration in anxiety have to do with the way they learn from the world?”
The National Science Foundation supported the study.
Source: Brown University