Blog: 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2016.11] Deep Learning Methods for Reinforcement Learning

1540 views | 2020-7-20 17:02 | Category: Research Notes | Source: Repost

This post presents the master's thesis of Daniel Luis Simões Marta (Technical University of Lisbon, Portugal), 95 pages in total.

 

This thesis focuses on the challenge of decoupling state perception and function approximation when applying Deep Learning methods within Reinforcement Learning. As a starting point, high-dimensional states were considered, since they are the fundamental limitation when applying Reinforcement Learning to real-world tasks. To address the Curse of Dimensionality, we propose reducing the dimensionality of the data in order to obtain succinct codes (internal representations of the environment), to be used as alternative states in a Reinforcement Learning framework. Different approaches have been taken over the last few decades, including Kernel Machines with hand-crafted features, where the choice of appropriate filters was task dependent and consumed a considerable amount of research. In this work, various Deep Learning methods with unsupervised learning mechanisms were considered.

Another key theme relates to estimating Q-values for large state spaces, where tabular approaches are no longer feasible. As a means to perform Q-function approximation, we search for supervised learning methods within Deep Learning. The objectives of this thesis include a detailed exploration and understanding of the proposed methods, together with the implementation of a neural controller. Several simulations were performed, taking into account a variety of optimization procedures and increased parameters, to draw several conclusions. Several architectures were used as Q-value function approximators. To infer better approaches and hint at larger-scale applications, a trial between two similar types of Q-networks was conducted. Implementations of state-of-the-art techniques were tested on classic control problems.
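To make the idea of Q-function approximation a little more concrete, here is a minimal sketch. It is not taken from the thesis: the thesis works with deep networks, while this sketch swaps in a linear approximator so that it fits in a few lines of plain Python. The corridor task, the two-element feature map, and the values of `GAMMA`, `ALPHA`, and `epsilon` are all illustrative assumptions. The point is only that Q(s, a) is computed from a small weight vector instead of being stored in a table with one cell per state.

```python
import random

N_STATES = 10      # corridor positions 0..9; position 9 is the goal (assumed toy task)
ACTIONS = (0, 1)   # 0 = step left, 1 = step right
GAMMA = 0.9        # discount factor (illustrative value)
ALPHA = 0.1        # learning rate (illustrative value)


def features(state):
    """Hypothetical feature map: a bias term plus the position scaled to [0, 1]."""
    return [1.0, state / (N_STATES - 1)]


def q_value(weights, state, action):
    """Linear approximation Q(s, a) = w_a . phi(s), one weight vector per action."""
    return sum(w * f for w, f in zip(weights[action], features(state)))


def step(state, action):
    """Deterministic corridor dynamics: reward 1 on reaching the goal, else 0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def td_update(weights, state, action, reward, nxt, done):
    """One semi-gradient Q-learning step; returns the TD error before the update."""
    best_next = 0.0 if done else max(q_value(weights, nxt, a) for a in ACTIONS)
    delta = reward + GAMMA * best_next - q_value(weights, state, action)
    phi = features(state)
    for i in range(len(phi)):
        weights[action][i] += ALPHA * delta * phi[i]
    return delta


def train(episodes=200, epsilon=0.2, seed=0, max_steps=500):
    """Epsilon-greedy training loop; ties between actions are broken at random."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in ACTIONS]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            qs = [q_value(weights, state, a) for a in ACTIONS]
            if rng.random() < epsilon or qs[0] == qs[1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if qs[0] > qs[1] else 1
            nxt, reward, done = step(state, action)
            td_update(weights, state, action, reward, nxt, done)
            state = nxt
            if done:
                break
    return weights
```

Calling `train()` returns two weight vectors, one per action, and `q_value(weights, s, a)` can then be compared across actions to read off a greedy policy. Replacing `features` and `q_value` with a neural network is, roughly, the step from this sketch toward the Q-networks the abstract mentions.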

 

 

1. Introduction

2. Deep Learning Concepts

3. Reinforcement Learning

4. Experimental Architecture

5. Experimental Results

6. Conclusions






https://wap.sciencenet.cn/blog-69686-1242829.html
