Long Short Term Memory

How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients

12 min readSep 26, 2021

In theory, RNNs (Recurrent Neural Networks) should extract features (hidden states) from long sequential data. In reality, researchers had a hard time training the basic RNNs using BPTT (Back-Propagation Through Time).

Long Short Term Memory

How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients

Written by Naoki