BERT (Bidirectional Encoder Representations from Transformers)

  • Contextualized word representations
  • Two phases:
    • Pre-training BERT (semi-supervised)
      • Masked Language Model: predicts randomly masked tokens from their surrounding context, so the learned representations are bidirectional, whereas a standard language model conditions only on the left (or only on the right) context (see the sketch after this list).
      • Next Sentence Prediction: models the relationship between a pair of sentences (does the second sentence actually follow the first?).
    • Fine-tuning (supervised) for target tasks (see the classification sketch below)
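
A minimal sketch of the masked-language-model idea at inference time, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are assumptions, not part of these notes): the model fills in a masked token using context from both sides.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Assumed checkpoint name; any pre-trained BERT checkpoint would do.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask one token; BERT predicts it from both its left and right context.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```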

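For the second phase, a sketch of supervised fine-tuning on a target task, again assuming Hugging Face transformers; the sentiment-style labels and the learning rate are illustrative assumptions rather than values from these notes.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Start from the pre-trained encoder and add a task-specific classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled batch (labels are hypothetical, for illustration only).
texts = ["a delightful film", "a complete waste of time"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One supervised gradient step; real fine-tuning iterates over a labeled dataset for a few epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # returns a cross-entropy loss when labels are given
outputs.loss.backward()
optimizer.step()
```
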
BERT Family

Encoder-Based Pre-Trained Models (a loading sketch follows this list):

  • XLNet
  • RoBERTa
  • SpanBERT
  • XLM
  • Multilingual BERT
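
In practice these encoder variants can be loaded through a common interface; the sketch below assumes Hugging Face transformers and the usual hub checkpoint names for RoBERTa and Multilingual BERT (both names are assumptions, not part of these notes).

```python
from transformers import AutoModel, AutoTokenizer

# Assumed hub identifiers for two of the models listed above.
for name in ["roberta-base", "bert-base-multilingual-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    encoded = tokenizer("BERT-family encoders share a common interface.", return_tensors="pt")
    hidden = model(**encoded).last_hidden_state  # (batch, seq_len, hidden_size)
    print(name, tuple(hidden.shape))
```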
