Large-Scale Pre-Trained Language Models

BERT

How and Why Does It Use The Transformer Architecture?

BERT stands for Bidirectional Encoder Representations from Transformers. As the name suggests, it generates its representations using only the encoder from Vaswani et al.’s Transformer architecture, discarding the decoder entirely. However, there are notable differences between BERT and the original Transformer…
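
To make that concrete, here is a minimal sketch of pulling contextual representations out of BERT’s encoder. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, which are illustrative choices on my part rather than anything this article prescribes:

import torch
from transformers import AutoModel, AutoTokenizer

# bert-base-uncased is one public BERT checkpoint (an assumption for this sketch).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence; the tokenizer adds BERT's special [CLS] and [SEP] tokens.
inputs = tokenizer(
    "The Transformer encoder sees the whole sentence at once.",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token: shape (batch, sequence_length, hidden_size).
# Because the encoder's self-attention is unmasked, each vector is conditioned
# on the tokens to its left and to its right, hence "bidirectional" in BERT's name.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 12, 768])

Note that the model returns a representation for every input token, not a single next-token prediction; that per-token, whole-sentence view is exactly what the encoder-only design buys you.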
