11.6.1 Pretrained Models Roadmap: BERT, GPT, T5

Pretrained models move NLP from training each task from scratch to a reusable foundation: pretrain once on large text corpora, then transfer the result to downstream tasks.

See the Paradigm Map First

[Figure: BERT / GPT / T5 comparison chart]

[Figure: learning-order diagram for this chapter]

[Figure: pretrain → transfer → fine-tune map]

BERT emphasizes understanding, GPT emphasizes generation, and T5 rewrites many tasks into text-to-text form.
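
To make those objective differences concrete, here is a minimal, library-free sketch that frames one sentence as a training example under each objective. The strings are illustrative, not real tokenizer output; [MASK] and <extra_id_0> follow BERT's and T5's marker conventions.

# Illustrative only: the same sentence framed by each family's
# pretraining objective (plain strings, not real tokenizer output).
tokens = "the cat sat on the mat".split()

# BERT-style masked language modeling: hide a token, predict it
# from context on BOTH sides.
masked = tokens.copy()
masked[2] = "[MASK]"
print("BERT input:", " ".join(masked), "| target:", tokens[2])

# GPT-style autoregressive modeling: predict the next token from
# the LEFT context only.
print("GPT input: ", " ".join(tokens[:3]), "| target:", tokens[3])

# T5-style text-to-text: input and output are both text; corrupted
# spans are replaced by sentinel tokens such as <extra_id_0>.
print("T5 input:  ", "the cat <extra_id_0> on the mat",
      "| target:", "<extra_id_0> sat")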

Run a Model Family Choice Check

# Describe the task by its required output, not by a model name.
task = {
    "needs_generation": True,
    "needs_sentence_label": False,
    "needs_text_to_text": True,
}

# Pick the family whose pretraining objective matches that output.
if task["needs_text_to_text"]:
    family = "T5-style text-to-text"
elif task["needs_generation"]:
    family = "GPT-style autoregressive"
else:
    family = "BERT-style understanding"

print("family:", family)
print("reason:", "match model objective to task output")

Expected output:

family: T5-style text-to-text
reason: match model objective to task output

Do not choose by model name alone. Match the tokenizer, training objective, output format, cost, and deployment constraints to your task.
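
As a sketch of that matching step: the checkpoint names below are real and the parameter counts approximate, but the latency numbers are made-up placeholders, not benchmarks.

# Sketch: filter candidate checkpoints by deployment constraints.
# Latency values are PLACEHOLDERS, not measured benchmarks.
constraints = {"max_params_m": 350, "max_latency_ms": 50}

candidates = [
    {"name": "bert-base-uncased", "params_m": 110, "latency_ms": 20},
    {"name": "gpt2-medium",       "params_m": 355, "latency_ms": 60},
    {"name": "t5-base",           "params_m": 220, "latency_ms": 45},
]

feasible = [
    c["name"] for c in candidates
    if c["params_m"] <= constraints["max_params_m"]
    and c["latency_ms"] <= constraints["max_latency_ms"]
]
print("feasible:", feasible)  # objective fit is still checked separately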

Learn in This Order

Step | Read | Practice Output
1 | Pretraining paradigm | Explain pretrain → transfer → fine-tune/infer
2 | BERT | Understand mask prediction and bidirectional representations
3 | GPT | Understand next-token generation and the context window
4 | T5 | Rewrite tasks into text-to-text form
5 | Transformers practice | Connect tokenizer, model, pipeline, input, output (see the sketch below)
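
For step 5, the following sketch wires a tokenizer, a model, an input, and an output by hand. It assumes the Hugging Face transformers library and PyTorch are installed; the checkpoint downloads on first run.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Tokenizer and model must come from the SAME checkpoint so that
# token ids, special tokens, and vocabulary agree.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Input: text -> token ids (a batch of size 1).
inputs = tok("The capital of France is [MASK].", return_tensors="pt")

# Model: token ids -> per-position vocabulary logits.
with torch.no_grad():
    logits = model(**inputs).logits

# Output: read the prediction at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero().item()
pred_id = logits[0, mask_pos].argmax().item()
print("prediction:", tok.decode([pred_id]))  # likely "paris"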

Pass Check

You pass this chapter when you can explain why different pretraining objectives create different strengths, and when you can run or design one small pretrained-model comparison experiment, such as the sketch below.
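
One possible version of that comparison experiment, again assuming transformers is installed: hand the same fact to two families, with the prompt shaped to each objective, and compare the outputs.

from transformers import pipeline

# Same fact, two objectives: cloze-style for BERT, continuation for GPT-2.
fill = pipeline("fill-mask", model="bert-base-uncased")
gen = pipeline("text-generation", model="gpt2")

bert_top = fill("Paris is the capital of [MASK].")[0]["token_str"]
gpt_text = gen("Paris is the capital of",
               max_new_tokens=2, do_sample=False)[0]["generated_text"]

print("BERT (bidirectional fill):", bert_top)
print("GPT-2 (left-to-right continuation):", gpt_text)

Both models should recover the fact, but through different mechanisms: BERT fills a gap using context from both sides, while GPT-2 continues the text from the left context alone. That difference in mechanism is exactly what the chapter asks you to explain.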