
11.6.1 Pretrained Models Roadmap: BERT, GPT, T5

Pretrained models move NLP from training one model per task to a reusable foundation: pretrain once on a large text corpus, then transfer to downstream tasks.

BERT GPT T5 comparison chart

Learning order diagram for the pretrained language models chapter

Pretraining transfer finetune map

BERT emphasizes understanding, GPT emphasizes generation, and T5 rewrites many tasks into text-to-text form.

# Describe what the task needs from the model.
task = {
    "needs_generation": True,
    "needs_sentence_label": False,
    "needs_text_to_text": True,
}

# Text-to-text is checked first, so it wins when several needs are set.
if task["needs_text_to_text"]:
    family = "T5-style text-to-text"
elif task["needs_generation"]:
    family = "GPT-style autoregressive"
else:
    family = "BERT-style understanding"

print("family:", family)
print("reason:", "match model objective to task output")

Expected output:

family: T5-style text-to-text
reason: match model objective to task output

Do not choose by model name alone. Match tokenizer, objective, output format, cost, and deployment constraints.
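The checklist above can be sketched as a small mismatch report. This is only an illustration: the `Candidate` class, the `mismatches` function, the option names, and every number are hypothetical placeholders, not specifications of any real model, and the sketch covers just four of the constraints (objective, output format, token limit, cost).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of a model option; not a real model spec."""
    name: str
    objective: str       # "masked", "causal", or "text-to-text"
    output_format: str   # "label" or "text"
    max_tokens: int      # longest input the tokenizer and model accept
    relative_cost: int   # made-up unit, higher means more expensive


def mismatches(candidate, needs):
    """Return the names of the constraints the candidate fails."""
    problems = []
    if candidate.objective != needs["objective"]:
        problems.append("objective")
    if candidate.output_format != needs["output_format"]:
        problems.append("output format")
    if candidate.max_tokens < needs["input_tokens"]:
        problems.append("token limit")
    if candidate.relative_cost > needs["cost_budget"]:
        problems.append("cost")
    return problems


# Illustrative task requirements (assumed values).
needs = {"objective": "text-to-text", "output_format": "text",
         "input_tokens": 800, "cost_budget": 3}

candidates = [
    Candidate("encoder-option", "masked", "label", 512, 1),
    Candidate("seq2seq-option", "text-to-text", "text", 1024, 2),
]

for c in candidates:
    print(c.name, "->", mismatches(c, needs) or "no mismatch found")
```

Listing the failed constraints, rather than returning a single score, keeps the choice rationale visible: a cheap option that fails on output format is rejected for a stated reason.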

| Step | Read | Practice Output |
| --- | --- | --- |
| 1 | Pretraining paradigm | Explain pretrain → transfer → fine-tune/infer |
| 2 | BERT | Understand mask prediction and bidirectional representations |
| 3 | GPT | Understand next-token generation and context window |
| 4 | T5 | Rewrite tasks into text-to-text form |
| 5 | Transformers practice | Connect tokenizer, model, pipeline, input, output |
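To make steps 2 to 4 concrete, the sketch below builds the input/target pairs each family works with, using one toy sentence. It is a simplification under stated assumptions: whitespace-split words stand in for real subword tokens, one position is masked by hand instead of randomly, and the text-to-text pair shows the task interface only, not T5's actual pretraining objective. All function names are made up for this example.

```python
tokens = "the cat sat on the mat".split()


def masked_lm_example(tokens, mask_index, mask_token="[MASK]"):
    """BERT-style: hide one token; the model sees context on both sides."""
    inputs = list(tokens)
    target = inputs[mask_index]
    inputs[mask_index] = mask_token
    return inputs, target


def causal_lm_examples(tokens):
    """GPT-style: predict each token from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def text_to_text_example(task_prefix, source, target):
    """T5-style interface: both the input and the output are plain strings."""
    return f"{task_prefix}: {source}", target


print(masked_lm_example(tokens, 2))
print(causal_lm_examples(tokens)[:2])
print(text_to_text_example("summarize", " ".join(tokens), "cat on mat"))
```

The masked example keeps the words after the gap as context, while each causal example sees only the prefix. That difference in visible context is what the "bidirectional" and "next-token" labels in the table refer to.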

Keep this page’s proof of learning as a small evidence card:

Model Choice: BERT, GPT, T5, Transformers pipeline, or other pretrained baseline
Tokenizer Output: ids, masks, decoded text, or batch shape
Task Result: classification, generation, extraction, or text-to-text output
Failure Check: wrong model family, token limit, domain mismatch, cost, or latency
Expected Output: model call result plus a short choice rationale
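For the tokenizer-output part of the card, the following sketch shows what ids, attention masks, decoded text, and batch shape look like. It assumes a toy whitespace tokenizer written for this example (`build_vocab`, `encode_batch`, and `decode` are hypothetical helpers); real pretrained tokenizers use subword vocabularies and add special tokens, so actual ids will differ.

```python
def build_vocab(texts):
    """Assign an id to each word; ids 0 and 1 are reserved for padding/unknown."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_batch(texts, vocab):
    """Turn texts into equal-length id rows plus a mask marking real tokens."""
    rows = [[vocab.get(w, vocab["[UNK]"]) for w in t.lower().split()]
            for t in texts]
    width = max(len(r) for r in rows)
    input_ids = [r + [vocab["[PAD]"]] * (width - len(r)) for r in rows]
    attention_mask = [[1] * len(r) + [0] * (width - len(r)) for r in rows]
    return input_ids, attention_mask


def decode(ids, vocab):
    """Map ids back to words, dropping padding."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids if i != vocab["[PAD]"])


texts = ["the cat sat", "the dog"]
vocab = build_vocab(texts)
input_ids, attention_mask = encode_batch(texts, vocab)

print("ids:", input_ids)
print("mask:", attention_mask)
print("batch shape:", (len(input_ids), len(input_ids[0])))
print("decoded:", decode(input_ids[1], vocab))
```

Decoding the ids back to text is a cheap way to confirm that the model received the input you intended before you interpret the task result.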

You pass this chapter when you can explain why different objectives create different strengths, and run or design one small pretrained-model comparison experiment.

Check reasoning and explanation
  1. A passing answer starts from the text unit and output type: token, span, sentence label, sequence, embedding, or generated text.
  2. The evidence should include a small dataset example, model or pipeline choice, metric, and at least one inspected error case.
  3. A good self-check distinguishes preprocessing issues from model issues, such as tokenization mistakes, label ambiguity, data imbalance, or hallucinated generation.
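The evidence described in item 2 (a small dataset, a model or pipeline choice, a metric, and an inspected error case) can be sketched as below. The four examples are invented toy data, and `keyword_baseline` is a hypothetical stand-in for whatever pretrained model or pipeline you actually call; `evaluate` is likewise an assumed helper name.

```python
def keyword_baseline(text):
    """Stand-in predictor: 'pos' if a positive keyword appears, else 'neg'."""
    positive_words = {"great", "good"}
    return "pos" if positive_words & set(text.lower().split()) else "neg"


def evaluate(examples, predict):
    """Return accuracy and the list of (text, gold, predicted) errors."""
    errors = []
    correct = 0
    for text, gold in examples:
        predicted = predict(text)
        if predicted == gold:
            correct += 1
        else:
            errors.append((text, gold, predicted))
    return correct / len(examples), errors


# Toy examples made up for illustration, not a real benchmark.
examples = [
    ("great movie", "pos"),
    ("terrible plot", "neg"),
    ("not good at all", "neg"),
    ("good fun", "pos"),
]

accuracy, errors = evaluate(examples, keyword_baseline)
print("accuracy:", accuracy)
for text, gold, predicted in errors:
    print(f"error: {text!r} gold={gold} predicted={predicted}")
```

Swapping `keyword_baseline` for a pretrained model keeps the same evaluation loop. Here the single error comes from negation, which points at the predictor rather than at tokenization or labeling, the kind of distinction item 3 asks for.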