小柯机器人

Study establishes a distributional code for value in dopamine-based reinforcement learning
2020-01-16 16:58

Will Dabney's research group at DeepMind (United Kingdom) has proposed a distributional code for value in dopamine-based reinforcement learning. The findings were published online in Nature on 15 January 2020.

They propose an account of dopamine-based reinforcement learning inspired by recent artificial-intelligence work on distributional reinforcement learning. The hypothesis is that the brain represents possible future rewards not as a single mean but as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which the authors tested using single-unit recordings from the mouse ventral tegmental area. Their findings provide strong evidence for a neural realization of distributional reinforcement learning.
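As a rough illustration of this hypothesis, the sketch below implements a distributional TD-style update in the spirit of the paper's proposal: a small population of value predictors, each weighting positive and negative prediction errors with different learning rates, settles on different expectiles of a stochastic reward rather than on a single mean. The function name, learning rates, and reward distribution here are illustrative assumptions, not details taken from the paper.

```python
import random

def distributional_td_step(values, reward, alphas_pos, alphas_neg):
    """One asymmetric TD-like update on a set of value predictors.

    Each predictor scales positive and negative prediction errors with
    different learning rates; this asymmetry drives each estimate toward
    a different expectile of the reward distribution, so the population
    as a whole encodes the distribution rather than only its mean."""
    new_values = []
    for v, a_pos, a_neg in zip(values, alphas_pos, alphas_neg):
        delta = reward - v                      # prediction error for this predictor
        alpha = a_pos if delta > 0 else a_neg   # asymmetric scaling of the error
        new_values.append(v + alpha * delta)
    return new_values

# Three predictors: pessimistic, balanced, optimistic.
alphas_pos = [0.02, 0.05, 0.08]
alphas_neg = [0.08, 0.05, 0.02]
values = [0.0, 0.0, 0.0]
for _ in range(20_000):
    reward = random.choice([0.0, 1.0])          # stochastic 50/50 outcome
    values = distributional_td_step(values, reward, alphas_pos, alphas_neg)
print([round(v, 2) for v in values])            # settles near ~[0.2, 0.5, 0.8] on average
```

Because each predictor's fixed point depends only on the ratio of its positive and negative learning rates, the population jointly spans the reward distribution even though every unit runs the same simple update rule.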

By way of background, since its introduction the reward prediction error theory of dopamine has explained a wealth of empirical phenomena and provided a unifying framework for understanding how reward and value are represented in the brain. According to the now-canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes.
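For contrast, here is a minimal sketch of that canonical single-scalar account, assuming a simple tabular TD(0) learner; the names and parameters are illustrative only. A single estimate is nudged by the reward prediction error and converges to the mean of the stochastic outcome, discarding any information about its spread.

```python
import random

def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """Classic TD(0) update: a single scalar prediction tracks the mean
    of stochastic future rewards via the reward prediction error."""
    rpe = reward + gamma * next_value - value   # reward prediction error (delta)
    return value + alpha * rpe

# One state with a stochastic reward: the scalar estimate converges toward
# the mean reward, losing all information about the outcome distribution.
value = 0.0
for _ in range(10_000):
    reward = random.choice([0.0, 1.0])          # stochastic 50/50 outcome
    value = td_update(value, reward, next_value=0.0)
print(round(value, 2))                           # fluctuates around 0.5, the mean
```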

Appendix: original English abstract

Title: A distributional code for value in dopamine-based reinforcement learning

Author: Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick

Issue&Volume: 2020-01-15

Abstract: Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1,2,3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4,5,6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.

DOI: 10.1038/s41586-019-1924-6

Source: https://www.nature.com/articles/s41586-019-1924-6

Nature: founded in 1869; published by Springer Nature. Latest IF: 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html


Article in this issue: Nature, published online
