Long Short-Term Memory

How LSTM Mitigated the Vanishing Gradients But Not the Exploding Gradients

In theory, RNNs (Recurrent Neural Networks) should be able to extract features (hidden states) from long sequential data. In practice, researchers had a hard time training basic RNNs with BPTT (Back-Propagation Through Time), because the gradients propagated back through many time steps tend to vanish or explode.
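To see the problem concretely, here is a minimal sketch (a hypothetical toy setup, not code from the original article) that feeds a long random sequence through PyTorch's built-in nn.RNN and nn.LSTM, makes the loss depend only on the last time step, and measures how much gradient survives all the way back to the first input. With a plain RNN the gradient at the earliest step is typically close to zero, while the LSTM still passes a usable signal back; exact numbers depend on initialization.

```python
import torch
import torch.nn as nn

# Toy comparison: how much gradient reaches the first time step
# after back-propagating through a long sequence?
torch.manual_seed(0)
seq_len, batch, dim = 200, 1, 16

for name, cell in [("RNN", nn.RNN(dim, dim)), ("LSTM", nn.LSTM(dim, dim))]:
    x = torch.randn(seq_len, batch, dim, requires_grad=True)
    out, _ = cell(x)              # forward pass over the whole sequence
    out[-1].sum().backward()      # loss depends only on the last time step
    # Gradient that flowed back to the very first input:
    print(name, "gradient norm at t=0:", x.grad[0].norm().item())
```

The plain RNN's gradient shrinks roughly geometrically with the number of time steps, which is the vanishing-gradient problem the LSTM's gating mechanism was designed to mitigate.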
