ICL: Why Can GPT Learn In-Context? (2022)
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
GPT-3 has shown a surprising In-Context Learning (ICL) ability, which the paper Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers explains as a kind of implicit fine-tuning.
With ICL, GPT-3 can learn from a few demonstrations (input-label pairs) and predict the labels for unseen inputs, all without any additional parameter updates.
But how does it do that?
The paper hypothesizes that:
- GPT first produces meta-gradients according to the demonstration examples.
- Then, it applies the meta-gradients to the original GPT to build an ICL model (see the sketch after this list).
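To make the meta-gradient idea concrete, here is a minimal NumPy sketch of the dual form the paper builds on: linear attention over the demonstration tokens is mathematically identical to applying an implicit weight update ΔW = VKᵀ to the query. This is my own illustration; the variable names and dimensions are toy choices, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden dimension (toy value)
n_demo = 4     # number of demonstration tokens (toy value)

# Keys and values produced from the demonstration examples
K = rng.normal(size=(d, n_demo))   # one key column per demo token
V = rng.normal(size=(d, n_demo))   # one value column per demo token
q = rng.normal(size=(d,))          # query vector for the test input

# View 1: linear attention over the demonstrations
attn_out = V @ (K.T @ q)

# View 2: the same computation, read as an implicit weight update
# delta_W = V K^T applied to the query (the "meta-gradient" view)
delta_W = V @ K.T
update_out = delta_W @ q

print(np.allclose(attn_out, update_out))  # True
```

The update ΔW is never back-propagated; it falls out of the forward pass over the demonstrations, which is exactly what makes it a "meta-gradient" rather than an ordinary gradient.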
So, let’s dive into the paper to see how GPT learns in-context.
Meta-Gradients
The paper explains that ICL and explicit fine-tuning both amount to gradient descent. The difference is where the update comes from: explicit fine-tuning computes gradients by back-propagation, while ICL produces meta-gradients in the forward pass.
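As a hedged sketch of that difference (a toy squared-error setup of my own, not the paper's training objective), one explicit fine-tuning step on a linear layer also produces an outer-product update, just derived via back-propagation:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_demo = 8, 4

W0 = rng.normal(size=(d, d))       # pretrained weights of a linear layer
X = rng.normal(size=(d, n_demo))   # demonstration inputs (one per column)
Y = rng.normal(size=(d, n_demo))   # demonstration labels
lr = 0.1                           # learning rate (arbitrary toy value)

# One gradient step on the loss 0.5 * ||Y - W X||^2:
# the gradient w.r.t. W is -(Y - W X) X^T, so the update is an
# outer product of error signals and inputs.
E = Y - W0 @ X                     # error signals on the demonstrations
delta_W_ft = lr * E @ X.T          # explicit, back-propagated update
W_ft = W0 + delta_W_ft

# The ICL "meta-gradient" update from the earlier sketch, V @ K.T,
# has the same outer-product form, but attention produces it in the
# forward pass, with no back-propagation and no change to stored weights.
```

Seen side by side, both updates are sums of outer products over the demonstrations; that structural match is what lets the paper treat ICL as implicit fine-tuning.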
The figure below shows both the difference and the similarity.