THE TRANSFORMER SERIES
Transformer’s Training Details
The previous article discussed the implementation of a data loader for training a model based on the transformer architecture from “Attention Is All You Need” by Ashish Vaswani et al.
This article discusses the following requirements for training a transformer…