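Table 1: Summary of PFMs in text. The pretraining tasks include language modeling (LM), masked LM (MLM), permuted LM (PLM), denoising autoencoder (DAE), knowledge graphs (KG), and knowledge embedding (KE). Other abbreviations: causal LM (CLM), translation LM (TLM), dialogue LM (DLM), and whole word masking (WWM). In the BART row, En and De denote the encoder and decoder.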
| Year | Venue | Model | Architecture | Embedding | Training method | Code |
|---|---|---|---|---|---|---|
| 2013 | NeurIPS | Skip-Gram [66] | Word2Vec | Probabilistic | - | https://github.com/.../models |
| 2014 | EMNLP | GloVe [67] | Word2Vec | Probabilistic | - | - |
| 2015 | NeurIPS | LM-LSTM [68] | LSTM | Probabilistic | LM | https://github.com/.../GloVe |
| 2016 | IJCAI | Shared LSTM [69] | LSTM | Probabilistic | LM | https://github.com/.../adversarial_text |
| 2017 | TACL | FastText [70] | Word2Vec | Probabilistic | - | https://github.com/.../fastText |
| 2017 | NeurIPS | CoVe [71] | LSTM+Seq2Seq | Probabilistic | - | https://github.com/.../cove |
| 2018 | NAACL-HLT | ELMo [51] | LSTM | Contextual | LM | https://allennlp.org/elmo |
| 2018 | NAACL-HLT | BERT [13] | Transformer Encoder | Contextual | MLM | https://github.com/.../bert |
| 2018 | OpenAI | GPT [48] | Transformer Decoder | Autoregressive | LM | https://github.com/...transformer-lm |
| 2019 | ACL | ERNIE(THU) | Transformer Encoder | Contextual | MLM | https://github.com/.../ERNIE |
| 2019 | ACL | Transformer-XL [72] | Transformer-XL | Contextual | - | https://github.com/.../transformer-xl |
| 2019 | ICLR | InfoWord [73] | Transformer Encoder | Contextual | MLM | - |
| 2019 | ICLR | StructBERT [74] | Transformer Encoder | Contextual | MLM | - |
| 2019 | ICLR | ALBERT [45] | Transformer Encoder | Contextual | MLM | https://github.com/.../ALBERT |
| 2019 | ICLR | WKLM [75] | Transformer Encoder | Contextual | MLM | - |
| 2019 | ICML | MASS [57] | Transformer | Contextual | MLM(Seq2Seq) | https://github.com/.../MASS |
| 2019 | EMNLP-IJCNLP | KnowBERT [76] | Transformer Encoder | Contextual | MLM | https://github.com/.../kb |
| 2019 | EMNLP-IJCNLP | Unicoder [77] | Transformer Encoder | Contextual | MLM+TLM | - |
| 2019 | EMNLP-IJCNLP | MultiFiT [78] | QRNN | Probabilistic | LM | https://github.com/.../multifit |
| 2019 | EMNLP-IJCNLP | SciBERT [79] | Transformer Encoder | Contextual | MLM | https://github.com/.../scibert |
| 2019 | EMNLP-IJCNLP | BERT-PKD [80] | Transformer Encoder | Contextual | MLM | https://github.com/...Compression |
| 2019 | NeurIPS | XLNet [14] | Transformer-XL Encoder | Permutation | PLM | https://github.com/.../xlnet |
| 2019 | NeurIPS | UNILM [58] | Transformer | Contextual | LM+MLM | https://github.com/.../unilm |
| 2019 | NeurIPS | XLM [81] | Transformer Encoder | Contextual | MLM+CLM+TLM | https://github.com/.../XLM |
| 2019 | OpenAI Blog | GPT-2 [49] | Transformer Decoder | Autoregressive | LM | https://github.com/.../gpt-2 |
| 2019 | arXiv | RoBERTa [53] | Transformer Encoder | Contextual | MLM | https://github.com/.../fairseq |
| 2019 | arXiv | ERNIE(Baidu) [59] | Transformer Encoder | Contextual | MLM+DLM | https://github.com/.../ERNIE |
| 2019 | EMC2@NeurIPS | Q8BERT [82] | Transformer Encoder | Contextual | MLM | https://github.com/.../quantized_bert.py |
| 2019 | arXiv | DistilBERT [83] | Transformer Encoder | Contextual | MLM | https://github.com/.../distillation |
| 2020 | ACL | FastBERT [84] | Transformer Encoder | Contextual | MLM | https://github.com/.../FastBERT |
| 2020 | ACL | SpanBERT [42] | Transformer Encoder | Contextual | MLM | https://github.com/.../SpanBERT |
| 2020 | ACL | BART [43] | Transformer | En: Contextual; De: Autoregressive | DAE | https://github.com/.../transformers |
| 2020 | ACL | CamemBERT [85] | Transformer Encoder | Contextual | MLM(WWM) | https://camembert-model.fr |
| 2020 | ACL | XLM-R [86] | Transformer Encoder | Contextual | MLM | https://github.com/.../XLM |
| 2020 | ICLR | Reformer [87] | Reformer | Permutation | - | https://github.com/.../reformer |
| 2020 | ICLR | ELECTRA [44] | Transformer Encoder | Contextual | MLM | https://github.com/.../electra |
| 2020 | AAAI | Q-BERT [88] | Transformer Encoder | Contextual | MLM | - |
| 2020 | AAAI | XNLG [89] | Transformer | Contextual | MLM+DAE | https://github.com/.../xnlg |
| 2020 | AAAI | K-BERT [90] | Transformer Encoder | Contextual | MLM | https://github.com/.../K-BERT |
| 2020 | AAAI | ERNIE 2.0 [60] | Transformer Encoder | Contextual | MLM | https://github.com/.../ERNIE |
| 2020 | NeurIPS | GPT-3 [20] | Transformer Decoder | Autoregressive | LM | https://github.com/.../gpt-3 |
| 2020 | NeurIPS | MPNet [55] | Transformer Encoder | Permutation | MLM+PLM | https://github.com/.../MPNet |
| 2020 | NeurIPS | ConvBERT [91] | Mixed Attention | Contextual | - | https://github.com/.../ConvBert |
| 2020 | NeurIPS | MiniLM [92] | Transformer Encoder | Contextual | MLM | https://github.com/.../minilm |
| 2020 | TACL | mBART [93] | Transformer | Contextual | DAE | https://github.com/.../mbart |
| 2020 | COLING | CoLAKE [94] | Transformer Encoder | Contextual | MLM+KE | https://github.com/.../CoLAKE |
| 2020 | LREC | FlauBERT [95] | Transformer Encoder | Contextual | MLM | https://github.com/.../Flaubert |
| 2020 | EMNLP | GLM [96] | Transformer Encoder | Contextual | MLM+KG | https://github.com/.../GLM |
| 2020 | EMNLP (Findings) | TinyBERT [97] | Transformer | Contextual | MLM | https://github.com/.../TinyBERT |
| 2020 | EMNLP (Findings) | RobBERT [98] | Transformer Encoder | Contextual | MLM | https://github.com/.../RobBERT |
| 2020 | EMNLP (Findings) | ZEN [62] | Transformer Encoder | Contextual | MLM | https://github.com/.../ZEN |
| 2020 | EMNLP (Findings) | BERT-MK [99] | KG-Transformer Encoder | Contextual | MLM | - |
| 2020 | RepL4NLP@ACL | CompressingBERT [33] | Transformer Encoder | Contextual | MLM(Pruning) | https://github.com/.../bert-prune |
| 2020 | JMLR | T5 [100] | Transformer | Contextual | MLM(Seq2Seq) | https://github.com/...transformer |
| 2021 | T-ASL | BERT-wwm-Chinese [61] | Transformer Encoder | Contextual | MLM | https://github.com/...BERT-wwm |
| 2021 | EACL | PET [101] | Transformer Encoder | Contextual | MLM | https://github.com/.../pet |
| 2021 | TACL | KEPLER [102] | Transformer Encoder | Contextual | MLM+KE | https://github.com/.../KEPLER |
| 2021 | EMNLP | SimCSE [103] | Transformer Encoder | Contextual | MLM+KE | https://github.com/.../SimCSE |
| 2021 | ICML | GLaM [104] | Transformer | Autoregressive | LM | - |
| 2021 | arXiv | XLM-E [105] | Transformer | Contextual | MLM | |
| 2021 | arXiv | T0 [106] | Transformer | Contextual | MLM | https://github.com/.../T0 |
| 2021 | arXiv | Gopher [107] | Transformer | Autoregressive | LM | - |
| 2022 | arXiv | MT-NLG [108] | Transformer | Contextual | MLM | - |
| 2022 | arXiv | LaMDA [65] | Transformer Decoder | Autoregressive | LM | https://github.com/.../LaMDA |
| 2022 | arXiv | Chinchilla [109] | Transformer | Autoregressive | LM | - |
| 2022 | arXiv | PaLM [41] | Transformer | Autoregressive | LM | https://github.com/.../PaLM |
| 2022 | arXiv | OPT [110] | Transformer Decoder | Autoregressive | LM | https://github.com/.../MetaSeq |
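To make the training-method column of Table 1 more concrete, the following Python sketch shows how four of the listed objectives (LM, MLM, PLM, and DAE) can turn a single tokenized sentence into training examples. It is a minimal illustration under simplifying assumptions, not the recipe of any model in the table: the whitespace tokenization, the `[MASK]` placeholder, the 15% masking rate, the single-span corruption, and all function names (`lm_examples`, `mlm_example`, `plm_examples`, `dae_example`) are hypothetical choices made here. For example, BERT additionally replaces some selected tokens with random or unchanged tokens instead of always masking them, and XLNet realizes the permutation through attention masks rather than by enumerating contexts explicitly.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder mask token (assumption); real tokenizers define their own.
MASK = "[MASK]"


def lm_examples(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """LM: each token is predicted from its left context only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_example(
    tokens: List[str], mask_prob: float = 0.15, rng: Optional[random.Random] = None
) -> Tuple[List[str], Dict[int, str]]:
    """MLM: mask a random subset of positions; the model predicts the original
    tokens there using context on both sides.
    Simplification (assumption): every selected token becomes MASK."""
    rng = rng or random.Random(0)
    n_mask = max(1, round(mask_prob * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK
    return corrupted, {p: tokens[p] for p in positions}


def plm_examples(
    tokens: List[str], rng: Optional[random.Random] = None
) -> List[Tuple[Dict[int, str], int, str]]:
    """PLM: sample one factorization order; each token is predicted from the
    tokens (with their positions) that precede it in that order."""
    rng = rng or random.Random(0)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return [
        ({p: tokens[p] for p in order[:k]}, pos, tokens[pos])
        for k, pos in enumerate(order)
    ]


def dae_example(
    tokens: List[str], span_len: int = 2, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[str]]:
    """DAE: corrupt the input, then reconstruct the full original sequence.
    Assumed noise: one span of span_len tokens collapsed into a single MASK."""
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    rng = rng or random.Random(0)
    start = rng.randrange(0, len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [MASK] + tokens[start + span_len:]
    return corrupted, list(tokens)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    print(lm_examples(tokens)[0])  # (['the'], 'cat')
    print(mlm_example(tokens))
    print(plm_examples(tokens)[:2])
    print(dae_example(tokens))
```

Returning explicit (context, target) pairs favors readability over efficiency. Practical implementations typically compute all predictions for a sequence in one forward pass, using attention masks to control which context each position may see.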