Long Short Term Memory

How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients

Naoki

--

In theory, RNNs (Recurrent Neural Networks) should extract features (hidden states) from long sequential data. In reality, researchers had a hard time training the basic RNNs using BPTT (Back-Propagation Through Time).

--

--