THE TRANSFORMER SERIES

Transformer’s Training Details

Optimizer, Scheduler, Loss Function

Naoki
6 min read · Jan 26, 2022

The previous article discussed the implementation of a data loader for training a model based on the transformer architecture from "Attention Is All You Need" by Ashish Vaswani et al.

This article discusses the following requirements for training a transformer…
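The subtitle names a scheduler as one of these requirements. As a point of reference, and not as this article's own implementation (which is not shown here), the original paper (Section 5.3) trains with Adam (β1 = 0.9, β2 = 0.98, ε = 1e-9) and varies the learning rate as d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5). The rate rises linearly for the first warmup_steps steps and then decays with the inverse square root of the step number. The following is a minimal pure-Python sketch of that formula. The function name `transformer_lr` is hypothetical, and the defaults are the paper's base-model values (d_model = 512, warmup_steps = 4000).

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning-rate schedule from Vaswani et al. (2017), Section 5.3.

    `transformer_lr` is a hypothetical helper name; steps are counted from 1.
    """
    if step < 1:
        raise ValueError("step must be >= 1 (the formula is undefined at step 0)")
    # Linear warmup term: step * warmup_steps^-1.5
    # Inverse-square-root decay term: step^-0.5
    # The two terms are equal when step == warmup_steps, which is the peak.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for s in (1, 1000, 4000, 10000, 100000):
        print(s, transformer_lr(s))
```

In PyTorch, one option is to wrap this in `torch.optim.lr_scheduler.LambdaLR`, setting the optimizer's base learning rate to 1.0 so that the lambda's value becomes the actual rate. Because `LambdaLR` passes a step counter that starts at 0, the lambda would need to shift it by one, for example `lambda s: transformer_lr(s + 1)`.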
