9.4.3 Short-Term Memory

Learning Objectives
Section titled “Learning Objectives”- Understand the difference between short-term memory and long-term memory
- Understand why you cannot keep stuffing the entire history into the model forever
- Master three common short-term memory approaches: conversation windows, runtime state, and summary memory
- Read a simple short-term memory manager
- Know the most common ways short-term memory fails
What Exactly Is Short-Term Memory?
Section titled “What Exactly Is Short-Term Memory?”A One-Sentence Definition
Section titled “A One-Sentence Definition”You can first think of short-term memory as:
The context and intermediate state that a system temporarily keeps in order to complete the current task.
It usually includes:
- The most recent few turns of conversation
- The current task goal
- The steps already executed
- Temporary intermediate results
How Is It Different from Long-Term Memory?
Section titled “How Is It Different from Long-Term Memory?”| Type | What it focuses on |
|---|---|
| Short-term memory | Information needed for the current task |
| Long-term memory | Information that remains valuable across tasks and sessions |
For example:
- “The user said they want to check the refund policy” -> short-term memory
- “This user likes concise answers” -> more like long-term memory
Why Can’t We Just Keep Feeding the Model All the History?
Section titled “Why Can’t We Just Keep Feeding the Model All the History?”Because the Context Window Is Not Infinite
Section titled “Because the Context Window Is Not Infinite”The model can only see a limited amount of context. If you keep stuffing all the history into it, you will run into:
- Higher and higher token cost
- Slower and slower responses
- Important information getting buried
More Information Is Not Always Better
Section titled “More Information Is Not Always Better”Many beginners think:
“If we give the model a bit more history, that should never hurt, right?”
Not necessarily.
If the context contains too much unrelated content, the model is more likely to:
- Focus on the wrong thing
- Repeat old information
- Forget what it is actually supposed to do right now
So the real job of short-term memory is not “the more the better,” but:
Keep the most useful information within a limited budget.
The Three Most Common Forms of Short-Term Memory
Section titled “The Three Most Common Forms of Short-Term Memory”Conversation Window (Sliding Window)
Section titled “Conversation Window (Sliding Window)”The simplest approach is:
- Keep only the most recent N turns of messages
Advantages:
- Simple
- Low implementation cost
Disadvantages:
- Important information from too long ago gets pushed out
Runtime State (Task State)
Section titled “Runtime State (Task State)”Instead of only remembering chat text, explicitly keep track of:
- The current task goal
- What has already been checked
- What the next step should be
This kind of state is especially important for Agents.
Summary Memory
Section titled “Summary Memory”When the history gets too long, don’t discard it entirely—compress it into a summary first.
For example:
- Keep the most recent 4 turns in full
- Compress older content into a short summary
This is a very common trade-off.
The Simplest Short-Term Memory: A Sliding Window
Section titled “The Simplest Short-Term Memory: A Sliding Window”Runnable Example
Section titled “Runnable Example”messages = [ {"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hello, what can I help you with?"}, {"role": "user", "content": "I want to understand the refund policy"}, {"role": "assistant", "content": "Are you asking about the time limit or the specific conditions?"}, {"role": "user", "content": "Mainly the time limit"},]
window_size = 3short_term_memory = messages[-window_size:]
for msg in short_term_memory: print(msg)Expected output:
{'role': 'user', 'content': 'I want to understand the refund policy'}{'role': 'assistant', 'content': 'Are you asking about the time limit or the specific conditions?'}{'role': 'user', 'content': 'Mainly the time limit'}This Code Is Simple, but Still Very Important
Section titled “This Code Is Simple, but Still Very Important”It teaches you something essential:
Short-term memory is first and foremost a question of “which messages should be kept.”
Not every piece of history is worth carrying forward.
But a Message Window Alone Is Not Enough
Section titled “But a Message Window Alone Is Not Enough”Why Not?
Section titled “Why Not?”Look at this conversation:
- The user says, “I want to check the refund policy”
- Then they ask several other details in a row
- On the 10th turn, they ask, “Can I get a refund in my situation?”
If you only keep the most recent 3 turns, the system may have already forgotten:
- That the whole task was actually about “refunds”
So an Agent Also Needs Structured State
Section titled “So an Agent Also Needs Structured State”For example:
task_state = { "goal": "Help the user determine refund eligibility", "last_tool": "search_policy", "latest_policy_result": "Refunds are available within 7 days of purchase and if learning progress is below 20%"}
print(task_state)Expected output:
{'goal': 'Help the user determine refund eligibility', 'last_tool': 'search_policy', 'latest_policy_result': 'Refunds are available within 7 days of purchase and if learning progress is below 20%'}This kind of state is different from raw chat logs. It is more like:
The workspace for what the system is currently doing.
A More Teaching-Friendly Short-Term Memory Manager
Section titled “A More Teaching-Friendly Short-Term Memory Manager”The example below manages both:
- The most recent few messages
- The current task state
class ShortTermMemory: def __init__(self, max_messages=4): self.max_messages = max_messages self.messages = [] self.state = {}
def add_message(self, role, content): self.messages.append({"role": role, "content": content}) self.messages = self.messages[-self.max_messages:]
def update_state(self, **kwargs): self.state.update(kwargs)
def snapshot(self): return { "messages": self.messages, "state": self.state }
memory = ShortTermMemory(max_messages=3)memory.add_message("user", "I want to check the refund policy")memory.add_message("assistant", "Are you more concerned about the time limit or the conditions?")memory.add_message("user", "First, let’s look at the time limit")memory.update_state(goal="Determine refund eligibility", topic="refund policy")
print(memory.snapshot())Expected output:
{'messages': [{'role': 'user', 'content': 'I want to check the refund policy'}, {'role': 'assistant', 'content': 'Are you more concerned about the time limit or the conditions?'}, {'role': 'user', 'content': 'First, let’s look at the time limit'}], 'state': {'goal': 'Determine refund eligibility', 'topic': 'refund policy'}}
What Makes This Example Better Than “Just Storing Message History”?
Section titled “What Makes This Example Better Than “Just Storing Message History”?”Because it splits short-term memory into two layers:
- Text context
- Structured state
This is very important in Agent systems.
Evidence to Keep
Section titled “Evidence to Keep”Keep this page’s proof of learning as a small evidence card:
- Memory Type
- short-term, long-term, episodic, or procedural
- Write Rule
- when memory is created or updated
- Retrieve Rule
- query, relevance, recency, and permission check
- Failure Check
- stale memory, privacy leak, contradiction, or over-retrieval
- Cleanup Action
- summarize, merge, expire, delete, or ask for confirmation
Summary Memory: What Should We Do When Messages Keep Growing?
Section titled “Summary Memory: What Should We Do When Messages Keep Growing?”A Common Strategy
Section titled “A Common Strategy”In real systems, this is a very common approach:
- Keep the most recent few turns as-is
- Compress older history into a summary
A Simplified Example
Section titled “A Simplified Example”old_messages = [ "The user first asked about the refund policy", "Then they asked about certificate requirements", "Finally they returned to the refund conditions"]
summary = "The user’s main goal in this session is to determine whether they meet the refund conditions, and they also asked about certificates along the way."
recent_messages = [ {"role": "user", "content": "Can I still get a refund if my learning progress is 30%?"}]
memory_package = { "summary": summary, "recent_messages": recent_messages}
print(memory_package)Expected output:
{'summary': 'The user’s main goal in this session is to determine whether they meet the refund conditions, and they also asked about certificates along the way.', 'recent_messages': [{'role': 'user', 'content': 'Can I still get a refund if my learning progress is 30%?'}]}This is the most basic “summary + recent window” idea.
What Does Short-Term Memory Actually Solve in an Agent?
Section titled “What Does Short-Term Memory Actually Solve in an Agent?”It mainly solves three things:
Keeping the Current Task Coherent
Section titled “Keeping the Current Task Coherent”The system should not restart from scratch at every step as if it were seeing the user for the first time.
Preserving State Across Multi-Step Execution
Section titled “Preserving State Across Multi-Step Execution”For example:
- Which tool has already been called
- What has already been found
- What step is still missing
Controlling Context Cost
Section titled “Controlling Context Cost”Short-term memory is not only about “remembering.” It is also about:
- Avoiding unnecessary content
- Reducing token cost
- Improving response stability
The Most Common Ways Short-Term Memory Fails
Section titled “The Most Common Ways Short-Term Memory Fails”Remembering Too Little
Section titled “Remembering Too Little”Symptoms:
- The system suddenly forgets what it was just talking about
Remembering Too Much
Section titled “Remembering Too Much”Symptoms:
- The context becomes long and messy
- Answers drift off track
- Cost goes up
Storing Only Messages, Not State
Section titled “Storing Only Messages, Not State”Symptoms:
- Multi-step tasks easily break down
- The connection between tool calls before and after becomes weak
Storing Only State, Not the Original Dialogue
Section titled “Storing Only State, Not the Original Dialogue”Symptoms:
- The user’s original wording gets lost easily
- Tone, constraints, and details disappear
So short-term memory is usually not “choose just one,” but rather a combined design.
Common Pitfalls for Beginners
Section titled “Common Pitfalls for Beginners”Mixing Up Short-Term Memory and Long-Term Memory
Section titled “Mixing Up Short-Term Memory and Long-Term Memory”Short-term memory is for the current task, not for a complete user profile.
Thinking a Bigger Message Window Is Always Better
Section titled “Thinking a Bigger Message Window Is Always Better”A window that is too large also brings noise and cost.
Ignoring Structured State
Section titled “Ignoring Structured State”This can make an Agent start drifting as soon as the task becomes multi-step.
Summary
Section titled “Summary”The most important thing in this section is not to memorize the words “window” or “summary,” but to grasp this main idea:
The goal of short-term memory is not to preserve history forever, but to maintain coherence for the current task within a limited context.
Well-designed short-term memory usually includes both recent messages and task state, and sometimes an additional layer of summary compression.
Exercises
Section titled “Exercises”- Extend the
ShortTermMemoryexample in this section to support asummaryfield. - Change the maximum message window from 3 to 5 and observe how the
snapshot()output changes. - Think about this: if an Agent often forgets “which tool it has already called,” would you first expand the message window or add structured state?
- Explain in your own words: why do we say short-term memory solves “current task coherence” rather than “long-term user profiling”?
Reference implementation and walkthrough
- A
summaryfield can compress older turns into a short current-task note while the raw message window keeps the latest details. - Changing the window from 3 to 5 should keep more recent messages in
snapshot(), which may improve coherence but also adds noise and tokens. - If the Agent forgets which tools it already called, add structured state first. Enlarging the message window is a weaker and more expensive fix.
- Short-term memory keeps the current task coherent: goal, constraints, recent corrections, tool results, and next action. It is not meant to become a permanent user profile.