Natural Language Processing
Word Embedding Lookup
How does an embedding layer solve the curse of dimensionality problem?
This article reviews A Neural Probabilistic Language Model (2003) by Yoshua Bengio et al. In the paper, the authors proposed to train a neural language model end-to-end, including a learnable word embedding layer.
Their neural language model significantly outperformed the best n-gram models of that time, thanks to the embedding layer solving the curse of dimensionality problem.
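As a quick preview of the core idea, an embedding layer is essentially a lookup table: a learnable matrix with one dense vector per word, indexed by the word's integer id. Below is a minimal sketch in NumPy; the vocabulary size, vector dimension, and word ids are illustrative, not taken from the paper.

```python
import numpy as np

vocab_size = 10_000  # illustrative vocabulary size
embed_dim = 64       # illustrative size of each word vector

# The embedding layer is a learnable matrix: one row per word in the vocabulary.
embedding = np.random.randn(vocab_size, embed_dim).astype(np.float32)

# "Looking up" a word is just selecting its row by integer id.
word_ids = np.array([42, 7, 1337])   # illustrative token ids for three words
word_vectors = embedding[word_ids]   # shape: (3, embed_dim)
print(word_vectors.shape)            # (3, 64)
```

During training, gradients flow into the selected rows, so the vectors themselves are learned end-to-end along with the rest of the model.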
We discuss the following topics:
- A Probabilistic Language Model
- Curse of Dimensionality
- Distributed Representation
- Embedding Lookup
- Word Similarity and Probability
- Neural Language Model
A Probabilistic Language Model
A probabilistic language model predicts the next word given the sequence of words that precede it.
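Formally, the model estimates the conditional probability of the next word given its history, and by the chain rule these conditionals multiply out to the probability of the whole sequence:

$$
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
$$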
For example, given the following sequence of words, what would be the next word?