
A.3 AI Development History: 15 Stages and Key Papers

AI 15-stage development history map

This appendix is optional. Read it when you want a historical “where did this come from?” map, not when you need to memorize papers for the first pass.

Use it in this order:

  1. Look at the 15-stage picture.
  2. Scan the stage table.
  3. Pick only the stage that matches the chapter you are studying.
  4. Come back later when a paper or algorithm name appears again.

The 15-stage map

| Stage | Beginner meaning | Course anchor |
| --- | --- | --- |
| 1. The AI question | Can a machine show intelligent behavior? | Intro |
| 2. Symbolic AI | Humans write rules; machines reason with rules | Background |
| 3. Expert systems | Domain knowledge becomes rule-based software | System thinking |
| 4. Probability and statistics | Use uncertainty and evidence, not only fixed rules | Chapter 4 |
| 5. Classic machine learning | Learn patterns from data and features | Chapter 5 |
| 6. Early neural networks | A model learns simple decision boundaries | Chapters 5-6 |
| 7. Backpropagation | Multi-layer networks become trainable | Chapter 6 |
| 8. Kernel and ensemble era | SVM, trees, forests, and boosting make ML practical | Chapter 5 |
| 9. Deep learning breakthrough | Data + GPUs + deep networks unlock vision and speech | Chapters 6 and 10 |
| 10. Embeddings and sequence models | Text becomes vectors; sequences become learnable | Chapter 11 |
| 11. Transformer and pretraining | Attention makes large-scale language models practical | Chapters 6-7 |
| 12. LLM and alignment | Models become instruction-following assistants | Chapter 7 |
| 13. RAG | Models connect to external knowledge and citations | Chapter 8 |
| 14. Agent and tool use | Models plan, call tools, and leave traces | Chapter 9 |
| 15. Multimodal and AIGC | AI works across text, image, speech, video, and generation | Chapter 12 |

The pattern is simple: every stage solves a bottleneck from the previous stage, then creates a new engineering problem.

Read the main storyline as a relay

AI Main Line Relay Map

AI history is easier to remember as a relay than as a list of names:

| Relay handoff | What changed |
| --- | --- |
| Rules -> probability | Systems moved from fixed logic to uncertain evidence |
| Probability -> ML | Models started learning patterns from data |
| ML -> deep learning | Features became learned, not fully hand-designed |
| Deep learning -> Transformer | Sequence modeling became easier to scale |
| LLM -> RAG / Agent | Models connected to knowledge, tools, and workflows |
| Text -> multimodal | AI started understanding and generating multiple media types |

Six turning points worth remembering

AI History Turning Points Comic Strip

| Turning point | Why beginners should care |
| --- | --- |
| Perceptron | The first strong feeling that machines might learn from data |
| XOR limitation | A reminder that simple linear models are not enough (see the sketch after this table) |
| Backpropagation | Multi-layer neural networks became trainable in practice |
| AlexNet | Data, GPUs, and deep CNNs made deep learning explode |
| Transformer | Attention replaced the old sequence-modeling main line |
| RAG / Agent | Models moved from answering text to using knowledge and tools |
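
To make the XOR row concrete, here is a minimal, illustrative sketch of the classic perceptron learning rule (a standard textbook formulation, not code from any chapter). The same single linear unit that masters AND can never reach 100% on XOR, because no straight line separates XOR's classes:

```python
import numpy as np

def perceptron_accuracy(X, y, epochs=50, lr=0.1):
    """Train one linear threshold unit with the perceptron rule and return its accuracy."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi      # classic perceptron update
            b += lr * (yi - pred)
    return ((X @ w + b > 0).astype(int) == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print("AND:", perceptron_accuracy(X, np.array([0, 0, 0, 1])))  # linearly separable -> reaches 1.0
print("XOR:", perceptron_accuracy(X, np.array([0, 1, 1, 0])))  # not separable -> stuck below 1.0
```

Fixing exactly this limitation is what multi-layer networks plus backpropagation, the next turning point, made practical.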

Do not memorize every year first. Remember the shape: hope, setback, repair, scale, and engineering.

How to read a paper node

AI Paper Problem-Solution-Impact Chain

For any paper or algorithm, ask only four questions first:

| Question | Example: Attention Is All You Need |
| --- | --- |
| What old bottleneck existed? | RNNs were hard to parallelize and struggled with long-range dependencies |
| What new method appeared? | Self-attention, multi-head attention, positional encoding |
| What new capability opened? | Scalable sequence modeling and, later, large language models |
| Which projects changed? | LLMs, RAG, Agent systems, multimodal models |

This is enough for beginner-level historical understanding. Formula details can wait until the relevant chapter.
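
If you want one concrete anchor before then, here is a minimal, illustrative NumPy sketch of single-head scaled dot-product self-attention. The random projection matrices stand in for learned weights, and multi-head attention and positional encoding are omitted; it only shows the data flow that made the Transformer easy to parallelize:

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    d = X.shape[1]
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))  # stand-ins for learned weights

    Q, K, V = X @ W_q, X @ W_k, X @ W_v             # project every token to query/key/value vectors
    scores = Q @ K.T / np.sqrt(d)                   # each token scores every other token at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                              # each output is a weighted mix of value vectors

tokens = np.random.default_rng(1).standard_normal((4, 8))  # 4 tokens, 8-dim embeddings
print(self_attention(tokens).shape)  # (4, 8): one updated vector per token
```

The contrast with an RNN is visible in the shapes: all four tokens are processed in one batch of matrix multiplications instead of one step at a time, which is what made large-scale pretraining practical.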

Key nodes by course line

AI Timeline Map from the Project Perspective

| Course line | Key nodes to recognize first | Why they matter |
| --- | --- | --- |
| Math foundations | Bayes, Shannon, maximum likelihood, EM | Probability, information, and loss functions |
| Classic ML | CART, SVM, Random Forest, AdaBoost, XGBoost | Strong baselines and tabular-data engineering |
| Neural networks | Perceptron, XOR, backpropagation, LSTM, AlexNet, ResNet | Why depth, gradients, data, and compute matter |
| NLP and LLM | Word2Vec, Seq2Seq, Transformer, BERT, GPT, InstructGPT | The path from word vectors to assistants |
| RAG and Agent | RAG, Chain-of-Thought, ReAct, Toolformer | External knowledge, reasoning traces, and tool use |
| Multimodal | CLIP, DDPM, Latent Diffusion, Whisper, SAM | Text, image, speech, video, and generation pipelines |

Some entries are landmark papers. Some are algorithm families or historical turning points. That is fine. The useful question is always: what problem did this node make easier?

Optional visual branches

Use these only when you are studying the related chapter.

Timeline of Three Neural Network Waves and Two Valleys

Classic Machine Learning Branch Map

NLP to LLM Lineage Map

Alignment, Agent, and Systems Main Line Map

LLM to Agent Engineering Evolution Timeline

Multimodal and AIGC Lineage Map

Fast chapter lookup

| If you see this name | Go back to |
| --- | --- |
| Bayes, MLE, entropy, EM | Chapter 4 (math foundations) |
| SVM, Random Forest, XGBoost | Chapter 5 (machine learning) |
| Perceptron, backpropagation, CNN, LSTM, Transformer | Chapter 6 (deep learning) |
| GPT, RLHF, LoRA, instruction tuning | Chapter 7 (LLM principles) |
| RAG, vector retrieval, citations | Chapter 8 (RAG) |
| Chain-of-Thought, ReAct, Toolformer, tool use | Chapter 9 (Agent) |
| AlexNet, ResNet, YOLO, SAM | Chapter 10 (computer vision) |
| Word2Vec, Seq2Seq, BERT, GPT | Chapter 11 (NLP) |
| CLIP, diffusion, Whisper, multimodal generation | Chapter 12 (multimodal) |

Mini exercise

Pick any 3 nodes and rewrite them in project language:

Node: Attention Is All You Need
Old bottleneck: RNNs were not ideal for long sequences or parallel training.
New method: self-attention became the main line of sequence modeling.
Projects affected: LLMs, RAG, Agent systems, multimodal models.
Course chapter to revisit: Chapters 6, 7, 8, and 9.

The goal is not to recite history. The goal is to connect a historical node to a real capability you may build later.