Open in app

Sign In

Write

Sign In

Mastodon
Naoki
Naoki

2.5K Followers

Home

About

Apr 30

ICL: Why Can GPT Learn In-Context? (2022)

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers — The paper Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers provides insights into how GPT-3 can learn from a few demonstrations and predict labels for unseen inputs. This ability is known as In-Context Learning (ICL). The paper explains how GPT-3 performs ICL as follows:

Artificial Intelligence

2 min read

ICL: Why Can GPT Learn In-Context? (2022)
ICL: Why Can GPT Learn In-Context? (2022)
Artificial Intelligence

2 min read


Jan 12

Hands-on Deep Learning with PyTorch

A Series of Video Lectures on YouTube — I’ve created a short course on deep learning with PyTorch. It contains a series of videos to cover various topics. I keep each video short and straight to the point. I hope you enjoy it! An overview of the lecture series Prerequisites to studying deep learning PyTorch Installation Tensors NumPy arrays (hands-on) PyTorch tensors (hands-on) Torchvision datasets (hands-on)

Pytorch

1 min read

Hands-on Deep Learning with PyTorch
Hands-on Deep Learning with PyTorch
Pytorch

1 min read


Jan 4

GPT-3: In-Context Few-Shot Learner (2020)

Language Models are Few-Shot Learners — #GPT #Transformer In 2020, OpenAI announced GPT-3, a generative language model with 175 billion parameters, 10x more than any previous language model, and published its performance on NLP benchmarks. However, it wasn’t just another size upgrade. GPT-3 showed the improved capability to handle tasks purely via text interaction. Those tasks…

Artificial Intelligence

7 min read

GPT-3: In-Context Few-Shot Learner (2020)
GPT-3: In-Context Few-Shot Learner (2020)
Artificial Intelligence

7 min read


Dec 30, 2022

GPT-2: Too Dangerous To Release (2019)

Language Models are Unsupervised Multitask Learners — #GPT #Transformer GPT-2 is a direct scale-up of GPT-1, with more parameters and trained on more data. However, it was deemed to be too dangerous to release by OpenAI: Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in…

Artificial Intelligence

4 min read

GPT-2: Too Dangerous To Release (2019)
GPT-2: Too Dangerous To Release (2019)
Artificial Intelligence

4 min read


Dec 26, 2022

GPT (2018)

Generative Pre-Trained Transformer — #GPT #Transformer In 2018, OpenAI released the first version of GPT (Generative Pre-Trained Transformer) for generating texts as if humans wrote. The architecture of GPT is based on the original transformer’s decoder. They trained GPT in two stages: Unsupervised Pre-training pre-trains GPT on unlabeled text, which taps into abundant text…

Artificial Intelligence

6 min read

GPT (2018)
GPT (2018)
Artificial Intelligence

6 min read


Nov 27, 2022

Longformer (2020)

The Long-Document Transformer — In 2020, researchers at Allen Institute for Artificial Intelligence (AI2) published “Longformer: The Long-Document Transformer”. AI2 is a non-profit research organization that hosts the Semantic Scholar website providing AI-driven search and discovery tools for research publications. …

Artificial Intelligence

6 min read

Longformer(2020)
Longformer(2020)
Artificial Intelligence

6 min read


Nov 4, 2022

Swin Transformer (2021)

Hierarchical Vision Transformer using Shifted Windows — In 2021, Microsoft announced a new Vision Transformer called Swin Transformer, which can act as a backbone for computer vision tasks like image classification, object detection, and semantic segmentation. The word Swin stands for Shifted windows that provide the Transformer with hierarchical vision, which is the main topic of this…

Artificial Intelligence

8 min read

Swin Transformer (2021)
Swin Transformer (2021)
Artificial Intelligence

8 min read


Nov 2, 2022

ViT: Vision Transformer (2020)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — In 2020, the Google Brain team developed Vision Transformer (ViT), an image classification model without a CNN (convolutional neural network). ViT directly applies a Transformer Encoder to sequences of image patches for classification. This article explains how ViT works. Vision Transformer Architecture Vision Transformer Overview

Artificialintelligenceai

5 min read

ViT: Vision Transformer (2020)
ViT: Vision Transformer (2020)
Artificialintelligenceai

5 min read


Oct 30, 2022

DETR (2020)

Object Detection with Transformers — In 2020, Meta (Facebook) AI built a new object detection model using the Transformer’s encoder and decoder architecture. They named it DETR ( DEtection TRansformer). Unlike YOLO or Faster R-CNN, it does not require box priors ( anchors) and post-processing ( NMS). …

Artificial Intelligence

10 min read

DETR (2020)
DETR (2020)
Artificial Intelligence

10 min read


Oct 24, 2022

YOLOv3 (2018)

An Incremental Improvement — YOLOv3 is the last version of YOLO that Joseph Redmon worked on. After this, he abandoned not only YOLO but also computer vision research altogether. We can read some of his reasoning for his decision in this paper. This article visits each section of the YOLOv3 paper and summarizes what…

Artificial Intelligence

10 min read

YOLOv3 (2018)
YOLOv3 (2018)
Artificial Intelligence

10 min read

Naoki

Naoki

2.5K Followers

Solopreneur @ kikaben.com

Following
  • Synced

    Synced

  • Jesus Rodriguez

    Jesus Rodriguez

  • Sik-Ho Tsang

    Sik-Ho Tsang

  • Ms Aerin

    Ms Aerin

  • Michael Bronstein

    Michael Bronstein

See all (8)

Help

Status

Writers

Blog

Careers

Privacy

Terms

About

Text to speech

Teams