THE TRANSFORMER’S

Self-Attention

Why Is Attention All You Need?

Naoki
10 min read · Nov 14, 2021


In 2017, Vaswani et al. published a paper titled “Attention Is All You Need” at the NeurIPS conference. The transformer architecture it introduced dispenses with recurrence and convolution entirely, relying solely on attention mechanisms.
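As a quick preview of what “relying solely on attention” means, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the paper, softmax(QKᵀ/√d_k)V. The function name and the toy input sizes are illustrative choices, not anything prescribed by the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

# Toy example: 3 tokens, model dimension 4 (arbitrary sizes for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
# In self-attention, Q, K, and V are all derived from the same input sequence.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

In the full transformer, Q, K, and V are linear projections of the input rather than the raw input itself; this sketch skips those projections to keep the core computation visible.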

In this article, we discuss the attention mechanisms used in the transformer.
