
A.3 AI Development History: 15 Stages and Key Papers

AI 15-stage development history map

This appendix is optional. Read it when you want a historical “where did this come from?” map, not as a list of papers to memorize on your first pass.

Use it in this order:

  1. Look at the 15-stage picture.
  2. Scan the stage table.
  3. Pick only the stage that matches the chapter you are studying.
  4. Come back later when a paper or algorithm name appears again.
| Stage | Beginner meaning | Course anchor |
| --- | --- | --- |
| 1. The AI question | Can a machine show intelligent behavior? | Intro |
| 2. Symbolic AI | Humans write rules; machines reason with rules | Background |
| 3. Expert systems | Domain knowledge becomes rule-based software | System thinking |
| 4. Probability and statistics | Use uncertainty and evidence, not only fixed rules | Chapter 4 |
| 5. Classic machine learning | Learn patterns from data and features | Chapter 5 |
| 6. Early neural networks | A model learns simple decision boundaries | Chapters 5-6 |
| 7. Backpropagation | Multi-layer networks become trainable | Chapter 6 |
| 8. Kernel and ensemble era | SVM, trees, forests, and boosting make ML practical | Chapter 5 |
| 9. Deep learning breakthrough | Data + GPUs + deep networks unlock vision and speech | Chapters 6 and 10 |
| 10. Embeddings and sequence models | Text becomes vectors; sequences become learnable | Chapter 11 |
| 11. Transformer and pretraining | Attention makes large-scale language models practical | Chapters 6-7 |
| 12. LLM and alignment | Models become instruction-following assistants | Chapter 7 |
| 13. RAG | Models connect to external knowledge and citations | Chapter 8 |
| 14. Agent and tool use | Models plan, call tools, and leave traces | Chapter 9 |
| 15. Multimodal and AIGC | AI works across text, image, speech, video, and generation | Chapter 12 |

The pattern is simple: each stage eases a bottleneck left by the previous stage, then exposes a new engineering problem.

AI Main Line Relay Map

AI history is easier to remember as a relay than as a list of names:

| Relay handoff | What changed |
| --- | --- |
| Rules -> probability | Systems moved from fixed logic to uncertain evidence |
| Probability -> ML | Models started learning patterns from data |
| ML -> deep learning | Features became learned, not fully hand-designed |
| Deep learning -> Transformer | Sequence modeling became easier to scale |
| LLM -> RAG / Agent | Models connected to knowledge, tools, and workflows |
| Text -> multimodal | AI started understanding and generating multiple media types |

AI History Turning Points Comic Strip

| Turning point | Why beginners should care |
| --- | --- |
| Perceptron | An early sign that machines could learn a decision rule from data |
| XOR limitation | A single-layer perceptron cannot separate XOR, a reminder that simple linear models are not enough |
| Backpropagation | Multi-layer neural networks became trainable in practice |
| AlexNet | Large labeled datasets, GPUs, and deep CNNs set off the deep learning boom |
| Transformer | Attention replaced recurrence as the main approach to sequence modeling |
| RAG / Agent | Models moved from answering with text alone to using knowledge and tools |
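The Perceptron and XOR rows can be checked in a few lines. The sketch below is a minimal illustration, not a historical reimplementation: it trains a single-layer perceptron with the classic mistake-driven update rule on AND and on XOR. The function names, learning rate, and epoch count are illustrative choices.

```python
from typing import List, Tuple

Example = Tuple[Tuple[int, int], int]

def predict(w: List[float], b: float, x1: int, x2: int) -> int:
    # A single linear threshold unit: one straight decision boundary.
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(data: List[Example], epochs: int = 20, lr: float = 1.0):
    """Classic perceptron rule: change the weights only when a prediction is wrong."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            error = target - predict(w, b, x1, x2)  # -1, 0, or +1
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def accuracy(data: List[Example], w: List[float], b: float) -> float:
    correct = sum(predict(w, b, x1, x2) == target for (x1, x2), target in data)
    return correct / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))
```

AND is linearly separable, so the perceptron reaches accuracy 1.0. No single line separates the XOR points, so no choice of weights, and no number of extra epochs, can bring XOR to 1.0.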

Do not start by memorizing years. Remember the shape: hope, setback, repair, scale, and engineering.

AI Paper Problem-Solution-Impact Chain

For any paper or algorithm, start with just four questions. Keep each answer short enough to fit on a note card:

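| Question | Beginner answer pattern |
| --- | --- |
| Old bottleneck | Name the old limitation. For Transformer: RNNs process tokens one step at a time, so training was hard to parallelize and information between distant tokens had to pass through many steps. |
| New method | Name the mechanism. For Transformer: self-attention, which lets every position look at every other position directly. |
| New capability | Name what became easier. For Transformer: large-scale sequence modeling became practical. |
| Projects changed | Name downstream systems. For Transformer: LLM, RAG, Agent, and multimodal projects all inherit this shift. |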

This is enough for beginner-level historical understanding. Formula details can wait until the relevant chapter.

AI Timeline Map from the Project Perspective

| Course line | Key nodes to recognize first | Why they matter |
| --- | --- | --- |
| Math foundations | Bayes, Shannon, maximum likelihood, EM | Probability, information, and loss functions |
| Classic ML | CART, SVM, Random Forest, AdaBoost, XGBoost | Strong baselines and tabular-data engineering |
| Neural networks | Perceptron, XOR, Backpropagation, LSTM, AlexNet, ResNet | Why depth, gradients, data, and compute matter |
| NLP and LLM | Word2Vec, Seq2Seq, Transformer, BERT, GPT, InstructGPT | The path from word vectors to assistants |
| RAG and Agent | RAG, Chain-of-Thought, ReAct, Toolformer | External knowledge, reasoning traces, and tool use |
| Multimodal | CLIP, DDPM, Latent Diffusion, Whisper, SAM | Text, image, speech, video, and generation pipelines |

Some entries are landmark papers. Some are algorithm families or historical turning points. That is fine. The useful question is always: what problem did this node make easier?

Use the branch maps below only when you are studying the related chapter. Read each branch as a short answer to two questions: what bottleneck changed, and which course chapter should I revisit?

Timeline of Three Neural Network Waves and Two Valleys

Start with the neural-network waves when Chapters 6 and 7 mention perceptron, backpropagation, CNN, or Transformer. Notice where enthusiasm rises, falls, and returns with data and compute.

Classic Machine Learning Branch Map

Use the classic ML branch when Chapter 5 names SVM, trees, forests, boosting, or XGBoost. Compare which branch is about decision boundaries, which is about ensembles, and which is about strong tabular baselines.

NLP to LLM Lineage Map

Use the NLP branch when Chapter 11 introduces tokenization, embeddings, Seq2Seq, BERT, or GPT. Read it as the path from word meaning to instruction-following assistants.

Alignment, Agent, and Systems Main Line Map

Use this systems branch when Chapters 7-9 discuss instruction tuning, RLHF, tool use, traces, or deployment. The important pattern is that model quality and system control improve together.

LLM to Agent Engineering Evolution Timeline

Use the engineering timeline when you are deciding whether a project needs only prompting, retrieval, tools, or a full Agent loop. Do not skip the trace and evaluation checkpoints.

Multimodal and AIGC Lineage Map

Use the multimodal branch with Chapter 12. Follow how text, image, speech, and segmentation models become reusable parts of a generation pipeline.

| If you see this name | Go back to |
| --- | --- |
| Bayes, MLE, entropy, EM | Chapter 4 math foundations |
| SVM, Random Forest, XGBoost | Chapter 5 machine learning |
| Perceptron, backpropagation, CNN, LSTM, Transformer | Chapter 6 deep learning |
| GPT, RLHF, LoRA, instruction tuning | Chapter 7 LLM principles |
| RAG, vector retrieval, citations | Chapter 8 RAG |
| Chain-of-Thought, ReAct, Toolformer, tool use | Chapter 9 Agent |
| AlexNet, ResNet, YOLO, SAM | Chapter 10 computer vision |
| Word2Vec, Seq2Seq, BERT, GPT | Chapter 11 NLP |
| CLIP, diffusion, Whisper, multimodal generation | Chapter 12 multimodal |

Pick any 3 nodes and rewrite them in project language:

Example project card:

  • Node: Attention Is All You Need
  • Old bottleneck: RNNs process tokens sequentially, which limited parallel training and made long-range dependencies hard to learn.
  • New method: self-attention replaced recurrence as the core of sequence modeling.
  • Projects affected: LLMs, RAG, Agent systems, multimodal models.
  • Course chapter to revisit: Chapters 6, 7, 8, and 9.
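To make the card's self-attention line concrete, here is a minimal sketch of scaled dot-product self-attention, softmax(QKᵀ/√d)V, in plain Python. It assumes queries, keys, and values are already given as lists of vectors; real Transformers compute them with learned projections and use multiple heads, both omitted here. The function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries: List[Vector], keys: List[Vector],
                   values: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """softmax(Q K^T / sqrt(d)) V, computed one query row at a time."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Each query scores every key directly; no recurrence over positions.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each query's row of scores depends only on the inputs, not on an earlier output, so the rows could be computed in parallel. That is the contrast with step-by-step RNN processing that the card describes.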

The goal is not to recite history. The goal is to connect a historical node to a real capability you may build later.

Project reference and review notes

One acceptable answer could use these three nodes:

Backpropagation

  • Old bottleneck: multilayer neural networks were hard to train effectively.
  • New method: the chain rule let gradients of the loss be propagated backward, layer by layer, to every weight.
  • Projects affected: image classifiers, language models, and nearly all deep learning systems.
  • Course chapter to revisit: Chapter 6.
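The backpropagation card's method line can be made concrete with a tiny network: one input, two tanh hidden units, one output, squared-error loss. This sketch applies the chain rule from the loss back to the first layer, then checks each gradient against a central finite difference. The network shape, weights, and helper names are arbitrary assumptions chosen for illustration.

```python
import math

def forward(params, x):
    w1, b1, w2, b2 = params["w1"], params["b1"], params["w2"], params["b2"]
    z = [w1[i] * x + b1[i] for i in range(len(w1))]   # hidden pre-activations
    h = [math.tanh(zi) for zi in z]                    # hidden activations
    y = sum(w2[i] * h[i] for i in range(len(h))) + b2  # scalar output
    return z, h, y

def loss_fn(params, x, t):
    _, _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Chain rule applied from the loss back to the first layer."""
    _, h, y = forward(params, x)
    dy = y - t                                        # dL/dy
    grads = {"b2": dy, "w2": [dy * hi for hi in h]}   # output layer
    dh = [dy * w for w in params["w2"]]               # dL/dh, passed back one layer
    dz = [dhi * (1 - hi ** 2) for dhi, hi in zip(dh, h)]  # tanh'(z) = 1 - tanh(z)^2
    grads["w1"] = [dzi * x for dzi in dz]             # hidden layer
    grads["b1"] = dz
    return grads

def numerical_grad(params, x, t, name, index=None, eps=1e-6):
    """Central difference: nudge one parameter up and down, compare losses."""
    def shifted(delta):
        p = {k: (list(v) if isinstance(v, list) else v) for k, v in params.items()}
        if index is None:
            p[name] += delta
        else:
            p[name][index] += delta
        return loss_fn(p, x, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

params = {"w1": [0.5, -0.3], "b1": [0.1, 0.2], "w2": [0.7, -1.2], "b2": 0.05}
x, t = 1.5, 0.4
grads = backprop(params, x, t)
for name in ("w1", "b1", "w2"):
    for i in range(2):
        print(name, i, round(grads[name][i], 6), round(numerical_grad(params, x, t, name, i), 6))
print("b2", round(grads["b2"], 6), round(numerical_grad(params, x, t, "b2"), 6))
```

The two printed columns should agree closely. One backward pass reuses the forward values (`h`) to get every gradient at once, while the finite-difference check needs two extra forward passes per parameter; that efficiency is what made training multi-layer networks practical.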

RAG

  • Old bottleneck: language models could answer fluently without grounded external evidence.
  • New method: retrieval adds relevant documents to the model's input before generation.
  • Projects affected: knowledge assistants, policy Q&A, citation-aware research tools.
  • Course chapter to revisit: Chapter 8.
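The RAG card's retrieval step can be sketched without any model at all. This example assumes a toy in-memory corpus and uses bag-of-words cosine similarity as a stand-in for a real embedding model. `retrieve` and `build_prompt` are hypothetical helpers; in a project, the final prompt would be passed to whatever language model you use.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Rank documents by similarity to the query and keep the top k."""
    q = bag_of_words(query)
    ranked = sorted(corpus.items(), key=lambda item: cosine(q, bag_of_words(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: List[Tuple[str, str]]) -> str:
    # Retrieved evidence goes before the question, each tagged with an id
    # so the answer can cite its source.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    return f"Answer using only the sources below and cite them by id.\n{context}\n\nQuestion: {query}"

corpus = {
    "policy-7": "Employees may work remotely up to three days per week.",
    "policy-12": "Travel expenses must be submitted within 30 days.",
    "faq-3": "The cafeteria opens at 8 am on weekdays.",
}
question = "How many days per week can employees work remotely?"
docs = retrieve(question, corpus, k=1)
print(build_prompt(question, docs))
```

Swapping bag-of-words for a learned embedding model changes only `bag_of_words` and `cosine`; the retrieve-then-generate shape stays the same.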

CLIP

  • Old bottleneck: image and text models were often trained in separate spaces.
  • New method: contrastive training aligned images and text in a shared embedding space.
  • Projects affected: image search, multimodal retrieval, image generation guidance.
  • Course chapter to revisit: Chapter 12.
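The CLIP card's contrastive training refers to a loss that pulls matching image-text pairs together and pushes mismatched pairs apart. Below is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of precomputed embeddings, in plain Python. The embeddings are made-up numbers, the temperature of 0.07 is an assumption, and the image and text encoders that would produce these vectors are omitted.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy(logits: Vector, target: int) -> float:
    """-log softmax(logits)[target], computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(logit - m) for logit in logits))
    return log_sum - logits[target]

def contrastive_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric cross-entropy: image i should match text i, and text i should match image i."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

images = [[1.0, 0.1], [0.1, 1.0]]
aligned_texts = [[0.9, 0.2], [0.2, 0.9]]
swapped_texts = [[0.2, 0.9], [0.9, 0.2]]
print(contrastive_loss(images, aligned_texts) < contrastive_loss(images, swapped_texts))  # True
```

Training an image encoder and a text encoder to lower this loss is what places both modalities in one shared space, where a text query can be compared directly with image embeddings.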

The answer is strong when each node names a real bottleneck, a method shift, an affected project type, and a chapter to revisit. It is weak if it only lists famous names without explaining what became easier.
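If you keep your cards in a file, the strong/weak rule above can become a quick self-check. This is a hypothetical helper, not part of any course tooling: the field names mirror the card, and a non-empty field is only a rough proxy for "names a real bottleneck". It cannot judge whether the content is correct.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class ProjectCard:
    node: str
    old_bottleneck: str
    new_method: str
    projects_affected: str
    chapter_to_revisit: str

def missing_parts(card: ProjectCard) -> List[str]:
    """Return the names of card fields that are still blank."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]

strong = ProjectCard(
    node="RAG",
    old_bottleneck="Answers were fluent but not grounded in external evidence.",
    new_method="Retrieve relevant documents and add them before generation.",
    projects_affected="Knowledge assistants, policy Q&A, citation-aware research tools.",
    chapter_to_revisit="Chapter 8",
)
# A "famous name only" card: the weak pattern described above.
weak = ProjectCard(node="AlexNet", old_bottleneck="", new_method="",
                   projects_affected="", chapter_to_revisit="")

print(missing_parts(strong))  # []
print(missing_parts(weak))    # ['old_bottleneck', 'new_method', 'projects_affected', 'chapter_to_revisit']
```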

Keep this page’s proof of learning as a small evidence card:

  • Timeline Anchor: stage, key idea, representative paper or system, and why it mattered
  • Chapter Link: which course chapter this milestone helps explain
  • Memory Hook: a diagram, comic panel, or one-sentence historical turn
  • Failure Check: memorizing names without understanding the problem each milestone solved
  • Expected Output: a short timeline note connected to at least one project decision