Long Short-Term Memory

How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients

Naoki
11 min read · Sep 26, 2021


In theory, RNNs (Recurrent Neural Networks) should be able to extract features (hidden states) from long sequential data. In practice, researchers had a hard time training basic RNNs with BPTT (Back-Propagation Through Time).

The main reasons are the vanishing and exploding gradient problems, which LSTM (Long Short-Term Memory) mitigated enough to make such networks trainable but did not entirely solve. So, what issues remain with LSTM?

To understand the issue, we need to know how BPTT works. Then, it will be clearer how the vanishing and exploding gradients occur. After that, we can appreciate why LSTM works better than the basic RNN, especially for long sequential data. Finally, we will understand why LSTM does not completely solve the problems.

In this article, we discuss the following topics:

  • BPTT (Back-Propagation Through Time)
  • Vanishing and Exploding Gradients
  • LSTM (Long Short-Term Memory)
  • BPTT Through LSTM Cells

BPTT (Back-Propagation Through Time)

Let’s use a basic RNN to discuss how BPTT works. Suppose the RNN uses the final…
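To make the setup concrete before walking through BPTT, here is a minimal sketch of a basic (Elman-style) RNN forward pass over a sequence, assuming a plain tanh recurrence whose final hidden state would feed an output layer. The NumPy code and names such as `rnn_forward`, `W_x`, and `W_h` are illustrative assumptions, not taken from the original.

```python
import numpy as np

# Minimal basic (Elman-style) RNN forward pass over a sequence.
# Names and shapes (W_x, W_h, b, h0) are illustrative assumptions.

def rnn_forward(xs, W_x, W_h, b, h0):
    """Run a tanh RNN over a list of input vectors; return all hidden states."""
    h = h0
    hs = []
    for x in xs:  # one iteration per time step t
        h = np.tanh(W_x @ x + W_h @ h + b)  # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        hs.append(h)
    return hs

# Toy usage: 10 time steps, 3-dim inputs, 5-dim hidden state.
rng = np.random.default_rng(0)
xs  = [rng.normal(size=3) for _ in range(10)]
W_x = rng.normal(size=(5, 3)) * 0.1
W_h = rng.normal(size=(5, 5)) * 0.1
b   = np.zeros(5)
h0  = np.zeros(5)

hs = rnn_forward(xs, W_x, W_h, b, h0)
print(hs[-1])  # final hidden state, e.g. fed to an output layer to compute the loss
```

BPTT then works backward through this very loop: the gradient of the loss at the final step is propagated step by step through each `tanh` and each multiplication by `W_h`, which is where the vanishing and exploding behavior discussed below comes from.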
