Long Short-Term Memory
How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients
In theory, RNNs (Recurrent Neural Networks) should be able to extract features (hidden states) from long sequential data. In practice, researchers had a hard time training basic RNNs with BPTT (Back-Propagation Through Time).
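To get a feel for why BPTT struggles on long sequences, here is a minimal sketch of a scalar basic RNN, h_t = tanh(w * h_{t-1} + u * x_t), together with the chain-rule product that BPTT has to carry back from the last hidden state to the first. The scalar weights `w` and `u`, the constant input sequence, and the function names are illustrative assumptions, not taken from any particular paper or library.

```python
import math

def rnn_forward(xs, w, u, h0=0.0):
    """Run a scalar basic RNN: h_t = tanh(w * h_{t-1} + u * x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def grad_last_wrt_first(hs, w):
    """Chain rule through time: d h_T / d h_0 = prod over t of w * (1 - h_t^2)."""
    grad = 1.0
    for h in hs[1:]:
        # Each step contributes the local derivative of tanh times the weight.
        grad *= w * (1.0 - h * h)
    return grad

# Hypothetical setup: 50 time steps of a constant input, recurrent weight 0.9.
xs = [0.5] * 50
hs = rnn_forward(xs, w=0.9, u=1.0)
print(len(hs) - 1, abs(grad_last_wrt_first(hs, w=0.9)))
```

Because every factor in the product has magnitude at most |w| (the tanh derivative never exceeds 1), a recurrent weight below 1 in magnitude makes the gradient shrink at least geometrically with sequence length. In this scalar toy, a product that grows without bound would need |w| above 1 and hidden states that stay out of tanh's saturated region.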