Longformer (2020)

The Long-Document Transformer

Naoki
6 min read · Nov 27, 2022


In 2020, researchers at Allen Institute for Artificial Intelligence (AI2) published “Longformer: The Long-Document Transformer”.

AI2 is a non-profit research organization that hosts the Semantic Scholar website, which provides AI-driven search and discovery tools for research publications. The researchers crafted a new attention mechanism for processing long documents, overcoming the original transformer architecture's difficulty in scaling to long-form texts.

Longformer is an important work that influenced later models that must process long sequences, such as streams of video frames.

This article explains Longformer’s attention mechanism.

Problem with Long Sequences

The transformer is well known for its self-attention mechanism, in which each token in the input sequence attends to every token in the same sequence via dot-product operations. As a result, the computational complexity in both time and memory becomes O(n²), where n is the number of tokens.

So, the computational cost increases quadratically with the sequence length, which makes it very expensive to process long sequences. We could divide a long sequence into smaller chunks, but then we would lose the global context across the sequence, reducing the quality of the results.
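To make this concrete, here is a minimal sketch of vanilla full self-attention in PyTorch. This is illustrative code of my own, not the Longformer implementation, and the shapes and sizes are arbitrary. The point is that the score matrix q @ kᵀ has shape (n, n), so both the compute and the memory for that matrix grow quadratically with the sequence length n.

```python
import torch

def full_self_attention(x, w_q, w_k, w_v):
    # x: (n, d) token embeddings; w_q/w_k/w_v: (d, d) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # each (n, d)
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5              # (n, n): the quadratic part
    attn = torch.softmax(scores, dim=-1)     # every token weights every other token
    return attn @ v                          # (n, d)

n, d = 4096, 64
x = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = full_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4096, 64]); the scores tensor held 4096 * 4096 entries
```

Doubling n from 4,096 to 8,192 quadruples that (n, n) score matrix, which is why full attention quickly becomes impractical for long documents.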
