- learn: transformers
- books
- Jan 2013: Efficient Estimation of Word Representations in Vector Space - word2vec
- Sep 2014: Neural Machine Translation by Jointly Learning to Align and Translate - Attention
- Sep 2014: Sequence to Sequence Learning with Neural Networks - seq2seq
- Jun 2015: Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books - BookCorpus
- Apr 2017: Learning to Generate Reviews and Discovering Sentiment - sentiment neuron
- Jun 2017: Attention Is All You Need - Transformer (see the attention sketch after this list)
- Jan 2018: Universal Language Model Fine-tuning for Text Classification - ULMFiT
- Feb 2018: Deep contextualized word representations - ELMo
- Jun 2018: Improving Language Understanding by Generative Pre-Training - GPT
- Oct 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Oct 2018: CARER: Contextualized Affect Representations for Emotion Recognition - emotion dataset
- Oct 2019: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- May 2015: The Unreasonable Effectiveness of Recurrent Neural Networks - Karpathy blog post
- Jun 2018: The Illustrated Transformer - Alammar blog post
- Dec 2021: A Mathematical Framework for Transformer Circuits - Anthropic
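
The core operation behind the Transformer entry above ("Attention Is All You Need") is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of it; the equation is the paper's, but the function names, toy shapes, and the `causal` flag (which turns the bidirectional attention of BERT-style encoders into the left-to-right form of GPT-style decoders) are illustrative assumptions, not code from any of the works listed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Shift by the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    if causal:
        # GPT-style left-to-right mask: token i may not attend to tokens > i.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V                  # (n_q, d_v) weighted average of values

# Toy self-attention (Q = K = V) over 4 token vectors of width 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)               # (4, 8) - bidirectional, as in BERT
print(attention(x, x, x, causal=True).shape)  # (4, 8) - masked, as in GPT
```

The 1/sqrt(d_k) scaling is the paper's: it keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishingly small gradients.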