
Neural Machine Translation with Attention Mechanism

How Does A Machine Translation Model Know Where To Look?

Naoki
6 min read · Sep 28, 2021


This article reviews the paper Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio.

In 2014, machine translation using neural networks emerged. Researchers adapted encoder-decoder (or sequence-to-sequence) architectures that encode a sentence in one language into a fixed-length vector and then decode it into another language.

However, this approach forces the encoder to compress all the necessary information into a fixed-length vector, no matter how long the source sentence is, which makes it difficult for the model to handle long sentences. The performance of such an encoder-decoder model degrades sharply as the length of the input sentence increases.

The paper proposed an extension that overcomes this limitation of the encoder-decoder architecture by letting the decoder access all of the encoder's hidden states, not just the final one. Moreover, the authors introduced an attention mechanism so that the decoder can learn to use the appropriate context when translating the source sentence into the target language.

This approach frees the encoder from having to compress all the required information into a fixed-length vector, so sentence length is no longer a significant issue.
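To make the idea concrete, here is a minimal sketch of the additive ("Bahdanau-style") attention score in PyTorch. The class and parameter names (AdditiveAttention, W_enc, W_dec, v) and the dimensions are illustrative assumptions, not the paper's actual code; they only show how the decoder can weigh all encoder hidden states instead of relying on a single fixed-length vector.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Sketch of additive attention: scores every encoder state against the decoder state."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # projects the decoder state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # turns each position into a score

    def forward(self, dec_state, enc_states):
        # dec_state:  (batch, dec_dim)          previous decoder hidden state
        # enc_states: (batch, src_len, enc_dim) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # alignment weights over source positions
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights                  # context vector fed to the decoder step
```

The key point is that the context vector is recomputed at every decoding step, so the model effectively "looks" at different parts of the source sentence as it generates each target word.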

This article discusses the following topics:

  • Encoder-Decoder Bottleneck
  • Attention Mechanism
  • Experimental Results

Encoder-Decoder Bottleneck

In an RNN encoder-decoder architecture, an RNN encoder processes an input sentence (a sequence of word vectors) to produce a fixed-length vector representing the entire sentence. An RNN decoder then consumes that vector to produce a translation in the target language.

In general, an RNN encoder-decoder architecture looks like the following:

[Figure: RNN encoder-decoder architecture]
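Below is a minimal sketch of such an encoder-decoder in PyTorch, illustrating where the bottleneck sits: only the encoder's final hidden state is handed to the decoder. The class name, GRU choice, and hyperparameters are illustrative assumptions for this sketch, not the exact model from the paper.

```python
import torch
import torch.nn as nn


class EncoderDecoder(nn.Module):
    """Sketch of a plain RNN encoder-decoder with a fixed-length bottleneck."""

    def __init__(self, src_vocab: int, tgt_vocab: int, emb_dim: int = 256, hid_dim: int = 512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: per-word hidden states are computed, but only the final state survives.
        _, final_state = self.encoder(self.src_emb(src_ids))   # (1, batch, hid_dim)
        # Decode: the whole source sentence is represented by that single vector.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), final_state)
        return self.out(dec_out)                               # (batch, tgt_len, tgt_vocab)
```

Because the decoder only ever sees `final_state`, every word of the source sentence must be squeezed into that one vector, which is exactly the limitation the attention mechanism removes.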