BERT (Bidirectional Encoder Representations from Transformers)
- Contextualized word representations
- Two phases:
- Pre-training (semi-supervised) BERT
- Masked Language Model: predicts randomly masked words (a form of language modeling), so the learned representations are bidirectional, whereas a standard LM uses only left or right context (see the masking sketch after this list).
- Next Sentence Prediction: models the relationship between sentences (does sentence B actually follow sentence A?).
- Fine-tuning (supervised) for target tasks (see the fine-tuning sketch below)
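As a concrete illustration of the masked language model objective, below is a minimal sketch of BERT-style input corruption. The 15% masking rate and the 80/10/10 replacement scheme follow the original BERT paper; the token ids, mask_id, and vocab_size are toy values chosen only for this example.

```python
# Minimal sketch of BERT-style MLM input corruption (toy vocabulary, illustrative ids).
import random

def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15):
    """Return (corrupted_ids, labels) using the 80/10/10 masking scheme."""
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)        # -100: position ignored by the loss (PyTorch default ignore_index)
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:     # select ~15% of positions
            labels[i] = tok                 # the model must predict the original token here
            r = random.random()
            if r < 0.8:                     # 80%: replace with [MASK]
                corrupted[i] = mask_id
            elif r < 0.9:                   # 10%: replace with a random token
                corrupted[i] = random.randrange(vocab_size)
            # remaining 10%: keep the original token unchanged
    return corrupted, labels

# Toy usage: ids 0-9, with 9 standing in for [MASK]
ids, labels = mask_tokens([3, 1, 4, 1, 5, 9, 2, 6], mask_id=9, vocab_size=10)
print(ids, labels)
```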
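For the fine-tuning phase, the sketch below shows supervised training of a classification head on top of pre-trained BERT. The use of the Hugging Face transformers library, the bert-base-uncased checkpoint, and the toy sentiment labels are assumptions of this example, not part of the notes.

```python
# Minimal sketch of fine-tuning BERT for a target classification task.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled batch for a binary classification target task
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # cross-entropy loss over the classification head
outputs.loss.backward()
optimizer.step()
```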
BERT Family
Encoder-Based Pre-Trained Models:
- XLNet
- RoBERTa
- SpanBERT
- XLM
- Multilingual BERT