Blog: 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2013] Training Recurrent Neural Networks

867 views | 2021-8-25 20:20 | Category: Research notes | Source: Repost



This is a doctoral thesis from the University of Toronto, Canada (author: Ilya Sutskever), 101 pages in total.

 


 

Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
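The abstract mentions gradient descent with momentum without spelling out the update rule. As a rough illustration only, here is a minimal sketch of the classical momentum update on a toy two-parameter quadratic. The function names, the toy objective, and the learning rate and momentum values are all assumptions made for this example; it is not an RNN and it does not reproduce the initialization scheme or experiments of the thesis.

```python
def momentum_descent(grad, theta, lr=0.01, mu=0.9, steps=200):
    """Classical momentum: v <- mu * v - lr * grad(theta); theta <- theta + v.

    Setting mu=0.0 reduces this to plain gradient descent.
    """
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        # Accumulate a decaying sum of past gradients in the velocity.
        velocity = [mu * v - lr * gi for v, gi in zip(velocity, g)]
        # Move the parameters along the velocity.
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta


def toy_grad(theta):
    """Gradient of the hypothetical objective f(x, y) = 0.5 * (x**2 + 100 * y**2)."""
    x, y = theta
    return [x, 100.0 * y]


# Same start point, step size, and step count; only the momentum coefficient differs.
plain = momentum_descent(toy_grad, [1.0, 1.0], mu=0.0)
with_momentum = momentum_descent(toy_grad, [1.0, 1.0], mu=0.9)
```

In this toy setup the step size is limited by the high-curvature direction `y`, so plain gradient descent moves slowly along the low-curvature direction `x`, while the momentum run ends much closer to the minimum along `x` after the same number of steps.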

 

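1. Introduction

2. Background

3. The Recurrent Temporal Restricted Boltzmann Machine

4. Training RNNs with Hessian-Free Optimization

5. Language Modelling with RNNs

6. Learning Control Laws with RNNs

7. Momentum Methods for Well-Initialized RNNs

8. Conclusions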






https://wap.sciencenet.cn/blog-69686-1301420.html

