Learning Transferable Visual Models From Natural Language Supervision (2021) · Bridging the Gap Between Vision and Language — A Look at OpenAI’s CLIP Model · Aug 13, 2023
ICL: Why Can GPT Learn In-Context? (2022) · Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers · Apr 30, 2023
GPT-3: In-Context Few-Shot Learner (2020) · Language Models are Few-Shot Learners · Jan 4, 2023
GPT-2: Too Dangerous To Release (2019) · Language Models are Unsupervised Multitask Learners · Dec 30, 2022
DistilBERT — distilled version of BERT · How can we compress BERT while keeping 97% of the performance? · Mar 6, 2022
RoBERTa — Robustly optimized BERT approach · How did RoBERTa outperform XLNet with no architectural changes to the original BERT? · Feb 20, 2022
BERT — Bidirectional Encoder Representation from Transformers · How and Why Does It Use The Transformer Architecture? · Feb 6, 2022
Transformer’s Positional Encoding · How Does It Know Word Positions Without Recurrence? · Oct 30, 2021
BLEU (Bi-Lingual Evaluation Understudy) · How do we evaluate a machine translation with reference sentences? · Oct 19, 2021
Beam Search for Machine Translation · How Greedy, Exhaustive and Beam Search Algorithms Work · Oct 17, 2021
Word Embedding Lookup · How does an embedding layer solve the curse of dimensionality problem? · Oct 11, 2021
Neural Machine Translation with Attention Mechanism · How Does A Machine Translation Model Know Where To Look? · Sep 28, 2021
Long Short Term Memory · How LSTM Mitigated the Vanishing Gradients but not the Exploding Gradients · Sep 26, 2021