[Paper Reading] 084: Sequence-to-Sequence Learning Based on Neural Networks

Sequence to Sequence Learning with Neural Networks


Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
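The source-reversal trick at the end of the abstract is easy to picture as a data-preparation step. Below is a minimal sketch, not the authors' code: the function name `make_training_pair`, the whitespace tokenization, and the `<EOS>` marker string are all assumptions made for illustration. Only the source side is reversed; the target keeps its natural order.

```python
from typing import List, Tuple

# Hypothetical end-of-sentence marker; the name is an assumption for this sketch.
EOS = "<EOS>"


def make_training_pair(source: str, target: str) -> Tuple[List[str], List[str]]:
    """Build one (source, target) training pair.

    The source tokens are reversed, the target tokens are left in order,
    and an end-of-sentence marker is appended to the target.
    Whitespace tokenization is a simplification for illustration.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()
    return src_tokens[::-1], tgt_tokens + [EOS]


src, tgt = make_training_pair("the cat sat", "le chat s'est assis")
print(src)  # ['sat', 'cat', 'the']
print(tgt)  # ['le', 'chat', "s'est", 'assis', '<EOS>']
```

After reversal, the first source word ("the") is the last token the encoder reads, so it sits right next to the first target word ("le") that the decoder has to produce. That is the kind of short-term dependency the abstract credits with making optimization easier.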



