
ICL: Why Can GPT Learn In-Context? (2022)

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

11 min read · Apr 30, 2023


GPT-3 has shown a surprising In-Context Learning (ICL) ability, which the paper Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers explains as a kind of implicit fine-tuning.

With ICL, GPT-3 can learn from a few demonstrations (input-label pairs) and predict the labels for unseen inputs. It can do so without additional parameter updates.
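To make the setup concrete, here is a minimal sketch of how such a demonstration prompt is assembled. The sentiment task, examples, and labels are hypothetical illustrations, not from the paper:

```python
# Demonstrations are input-label pairs; the query is an unseen input
# whose label the model must predict from context alone (no parameter updates).
demonstrations = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
    ("A delightful surprise.", "positive"),
]
query = "The plot was dull and predictable."

# Concatenate demonstrations, then append the query with its label left blank.
prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in demonstrations)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
```

The model then continues the prompt with a label token, which is how ICL predictions are read out.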

But how does it do that?

The authors hypothesize that:

  • GPT first produces meta-gradients according to the demonstration examples.
  • Then, it applies the meta-gradients to the original GPT to build an ICL model.

So, let’s dive into the paper to see how GPT learns in-context.

Meta-Gradients

The paper argues that ICL and explicit fine-tuning are both forms of gradient descent. The difference is where the updates come from: explicit fine-tuning updates the model with gradients computed by back-propagation, while ICL applies meta-gradients produced by the forward pass over the demonstrations.
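The intuition behind meta-gradients can be seen with (relaxed) linear attention: attending over the demonstration tokens is algebraically the same as first forming a weight update from them and then applying the updated weights to the query. The sketch below uses random vectors as stand-in keys and values, it is an illustration of the duality, not the paper's full derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
K_demo = rng.normal(size=(3, d))  # keys from the demonstration tokens (hypothetical)
V_demo = rng.normal(size=(3, d))  # values from the demonstration tokens (hypothetical)
q = rng.normal(size=d)            # query vector for the unseen input

# View 1 - linear attention over the demonstrations: sum_i v_i * (k_i . q)
attn_out = sum(v * (k @ q) for k, v in zip(K_demo, V_demo))

# View 2 - "meta-gradient" view: accumulate an implicit weight update
# delta_W = sum_i outer(v_i, k_i), then apply it to the query.
delta_W = sum(np.outer(v, k) for k, v in zip(K_demo, V_demo))
update_out = delta_W @ q

# The two views produce the same output vector.
assert np.allclose(attn_out, update_out)
```

This is why the paper can describe the demonstrations as inducing an update to the model "weights" even though no parameter is ever actually modified.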

The figure below shows the difference and the similarity.
