In 2017, Vaswani et al. published a paper titled “Attention Is All You Need” for the NeurIPS conference. The transformer architecture does not use any recurrence or convolution. It solely relies on attention mechanisms.
In this article, we discuss the attention mechanisms in the transformer: