Attention Mechanism¶
The attention mechanism is mostly used with RNNs for NLP, but SAGAN applies self-attention to CNNs.
Align and Translate¶
Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015)
In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
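Below is a minimal PyTorch sketch of the additive (Bahdanau) scoring function behind the soft alignment: score(s, h_j) = vᵀ tanh(W_s s + W_h h_j), softmaxed into alignment weights. Module and dimension names are my own, not from the paper's code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (a sketch; names are assumptions)."""
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)   # project decoder state
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)   # project encoder outputs
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (B, dec_dim); enc_outputs: (B, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)
        )).squeeze(-1)                                   # (B, src_len)
        alpha = torch.softmax(scores, dim=-1)            # soft alignment weights
        context = torch.bmm(alpha.unsqueeze(1), enc_outputs).squeeze(1)
        return context, alpha                            # context: (B, enc_dim)
```

The context vector is recomputed at every decoding step, so each target word can attend to different parts of the source sentence instead of a single fixed-length encoding.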
Show, Attend and Tell¶
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015)
Generates a caption word by word, attending to relevant regions of the image at each step.
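A sketch of one decoding step with soft visual attention, assuming flattened conv features (e.g. a 14×14 grid giving L=196 annotation vectors); class and parameter names are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn

class CaptionStep(nn.Module):
    """One soft-attention captioning step (sketch; dims are assumptions)."""
    def __init__(self, feat_dim, embed_dim, hid_dim, vocab_size, attn_dim=256):
        super().__init__()
        self.attn_feat = nn.Linear(feat_dim, attn_dim, bias=False)
        self.attn_hid = nn.Linear(hid_dim, attn_dim, bias=False)
        self.attn_v = nn.Linear(attn_dim, 1, bias=False)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, word_emb, feats, h, c):
        # feats: (B, L, feat_dim); attention weights over the L image locations
        e = self.attn_v(torch.tanh(
            self.attn_feat(feats) + self.attn_hid(h).unsqueeze(1)
        )).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                 # (B, L)
        z = (alpha.unsqueeze(-1) * feats).sum(dim=1)     # expected context vector
        h, c = self.lstm(torch.cat([word_emb, z], dim=-1), (h, c))
        return self.out(h), h, c, alpha                  # logits for the next word
```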
Self-Attention¶
Non-local Neural Networks¶
Non-local Neural Networks (CVPR 2018)
Caffe code
Spacetime non-local block
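A PyTorch sketch of the embedded-Gaussian non-local block, reduced to 2D spatial inputs for brevity (the paper's block also attends over the time dimension): y_i = softmax_j(θ(x_i)ᵀφ(x_j)) g(x_j), followed by a residual connection z = W_z y + x.

```python
import torch
import torch.nn as nn

class NonLocalBlock2D(nn.Module):
    """Embedded-Gaussian non-local block (2D sketch of the spacetime version)."""
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2                     # bottleneck, as in the paper
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.w_z = nn.Conv2d(inter, channels, 1)
        # Zero-init so the residual branch starts at zero; the paper achieves
        # the same effect with a zero-initialised BN after W_z.
        nn.init.zeros_(self.w_z.weight)
        nn.init.zeros_(self.w_z.bias)

    def forward(self, x):
        b, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        phi = self.phi(x).flatten(2)                       # (B, C', HW)
        g = self.g(x).flatten(2).transpose(1, 2)           # (B, HW, C')
        attn = torch.softmax(theta @ phi, dim=-1)          # pairwise affinities
        y = (attn @ g).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.w_z(y)                             # residual connection
```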
CBAM¶
CBAM: Convolutional Block Attention Module (ECCV 2018)
Channel attention module (CAM)¶
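CAM squeezes spatial information with both global average pooling and global max pooling, passes each descriptor through a shared MLP, sums them, and gates the channels with a sigmoid. A minimal sketch:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: shared MLP over avg/max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # (B, C) from global avg pool
        mx = self.mlp(x.amax(dim=(2, 3)))         # (B, C) from global max pool
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale                          # rescale each channel
```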
Spatial Attention Module (SAM)¶
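SAM pools along the channel axis (average and max), concatenates the two maps, and produces a spatial gate with a 7×7 convolution and a sigmoid:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM spatial attention: channel-wise avg & max maps -> 7x7 conv -> sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)         # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)          # (B, 1, H, W)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale                          # rescale each spatial location
```

CBAM applies the two modules sequentially, channel first then spatial, inside each residual block: `out = SpatialAttention()(ChannelAttention(c)(feat))`.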
An Empirical Study of Spatial Attention Mechanisms in Deep Networks¶
An Empirical Study of Spatial Attention Mechanisms in Deep Networks (ICCV 2019) from MSRA