9.4.3 Short-Term Memory

Short-term memory context window and runtime state

Learning Objectives

Understand the difference between short-term memory and long-term memory
Understand why you cannot keep stuffing the entire history into the model forever
Master three common short-term memory approaches: conversation windows, runtime state, and summary memory
Read a simple short-term memory manager
Know the most common ways short-term memory fails

What Exactly Is Short-Term Memory?

A One-Sentence Definition

You can first think of short-term memory as:

The context and intermediate state that a system temporarily keeps in order to complete the current task.

It usually includes:

The most recent few turns of conversation
The current task goal
The steps already executed
Temporary intermediate results

How Is It Different from Long-Term Memory?

Type	What it focuses on
Short-term memory	Information needed for the current task
Long-term memory	Information that remains valuable across tasks and sessions

For example:

“The user said they want to check the refund policy” -> short-term memory
“This user likes concise answers” -> more like long-term memory

Why Can’t We Just Keep Feeding the Model All the History?

Because the Context Window Is Not Infinite

The model can only see a limited amount of context. If you keep stuffing all the history into it, you will run into:

Higher and higher token cost
Slower and slower responses
Important information getting buried

More Information Is Not Always Better

Many beginners think:

“If we give the model a bit more history, that should never hurt, right?”

Not necessarily.

If the context contains too much unrelated content, the model is more likely to:

Focus on the wrong thing
Repeat old information
Forget what it is actually supposed to do right now

So the real job of short-term memory is not “the more the better,” but:

Keep the most useful information within a limited budget.

The Three Most Common Forms of Short-Term Memory

Conversation Window (Sliding Window)

The simplest approach is:

Keep only the most recent N turns of messages

Advantages:

Simple
Low implementation cost

Disadvantages:

Important information from too long ago gets pushed out

Runtime State (Task State)

Instead of only remembering chat text, explicitly keep track of:

The current task goal
What has already been checked
What the next step should be

This kind of state is especially important for Agents.

Summary Memory

When the history gets too long, don’t discard it entirely—compress it into a summary first.

For example:

Keep the most recent 4 turns in full
Compress older content into a short summary

This is a very common trade-off.

The Simplest Short-Term Memory: A Sliding Window

Runnable Example

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, what can I help you with?"},
    {"role": "user", "content": "I want to understand the refund policy"},
    {"role": "assistant", "content": "Are you asking about the time limit or the specific conditions?"},
    {"role": "user", "content": "Mainly the time limit"},
]

window_size = 3
short_term_memory = messages[-window_size:]

for msg in short_term_memory:
    print(msg)

Expected output:

{'role': 'user', 'content': 'I want to understand the refund policy'}
{'role': 'assistant', 'content': 'Are you asking about the time limit or the specific conditions?'}
{'role': 'user', 'content': 'Mainly the time limit'}

This Code Is Simple, but Still Very Important

It teaches you something essential:

Short-term memory is first and foremost a question of “which messages should be kept.”

Not every piece of history is worth carrying forward.

But a Message Window Alone Is Not Enough

Why Not?

Look at this conversation:

The user says, “I want to check the refund policy”
Then they ask several other details in a row
On the 10th turn, they ask, “Can I get a refund in my situation?”

If you only keep the most recent 3 turns, the system may have already forgotten:

That the whole task was actually about “refunds”

So an Agent Also Needs Structured State

For example:

task_state = {
    "goal": "Help the user determine refund eligibility",
    "last_tool": "search_policy",
    "latest_policy_result": "Refunds are available within 7 days of purchase and if learning progress is below 20%"
}

print(task_state)

Expected output:

{'goal': 'Help the user determine refund eligibility', 'last_tool': 'search_policy', 'latest_policy_result': 'Refunds are available within 7 days of purchase and if learning progress is below 20%'}

This kind of state is different from raw chat logs. It is more like:

The workspace for what the system is currently doing.

A More Teaching-Friendly Short-Term Memory Manager

The example below manages both:

The most recent few messages
The current task state

class ShortTermMemory:
    def __init__(self, max_messages=4):
        self.max_messages = max_messages
        self.messages = []
        self.state = {}

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        self.messages = self.messages[-self.max_messages:]

    def update_state(self, **kwargs):
        self.state.update(kwargs)

    def snapshot(self):
        return {
            "messages": self.messages,
            "state": self.state
        }

memory = ShortTermMemory(max_messages=3)
memory.add_message("user", "I want to check the refund policy")
memory.add_message("assistant", "Are you more concerned about the time limit or the conditions?")
memory.add_message("user", "First, let’s look at the time limit")
memory.update_state(goal="Determine refund eligibility", topic="refund policy")

print(memory.snapshot())

Expected output:

{'messages': [{'role': 'user', 'content': 'I want to check the refund policy'}, {'role': 'assistant', 'content': 'Are you more concerned about the time limit or the conditions?'}, {'role': 'user', 'content': 'First, let’s look at the time limit'}], 'state': {'goal': 'Determine refund eligibility', 'topic': 'refund policy'}}

Short-term memory snapshot result map

What Makes This Example Better Than “Just Storing Message History”?

Because it splits short-term memory into two layers:

Text context
Structured state

This is very important in Agent systems.

Evidence to Keep

Keep this page’s proof of learning as a small evidence card:

Memory Type: short-term, long-term, episodic, or procedural
Write Rule: when memory is created or updated
Retrieve Rule: query, relevance, recency, and permission check
Failure Check: stale memory, privacy leak, contradiction, or over-retrieval
Cleanup Action: summarize, merge, expire, delete, or ask for confirmation

Summary Memory: What Should We Do When Messages Keep Growing?

A Common Strategy

In real systems, this is a very common approach:

Keep the most recent few turns as-is
Compress older history into a summary

A Simplified Example

old_messages = [
    "The user first asked about the refund policy",
    "Then they asked about certificate requirements",
    "Finally they returned to the refund conditions"
]

summary = "The user’s main goal in this session is to determine whether they meet the refund conditions, and they also asked about certificates along the way."

recent_messages = [
    {"role": "user", "content": "Can I still get a refund if my learning progress is 30%?"}
]

memory_package = {
    "summary": summary,
    "recent_messages": recent_messages
}

print(memory_package)

Expected output:

{'summary': 'The user’s main goal in this session is to determine whether they meet the refund conditions, and they also asked about certificates along the way.', 'recent_messages': [{'role': 'user', 'content': 'Can I still get a refund if my learning progress is 30%?'}]}

This is the most basic “summary + recent window” idea.

What Does Short-Term Memory Actually Solve in an Agent?

It mainly solves three things:

Keeping the Current Task Coherent

The system should not restart from scratch at every step as if it were seeing the user for the first time.

Preserving State Across Multi-Step Execution

For example:

Which tool has already been called
What has already been found
What step is still missing

Controlling Context Cost

Short-term memory is not only about “remembering.” It is also about:

Avoiding unnecessary content
Reducing token cost
Improving response stability

The Most Common Ways Short-Term Memory Fails

Remembering Too Little

Symptoms:

The system suddenly forgets what it was just talking about

Remembering Too Much

Symptoms:

The context becomes long and messy
Answers drift off track
Cost goes up

Storing Only Messages, Not State

Symptoms:

Multi-step tasks easily break down
The connection between tool calls before and after becomes weak

Storing Only State, Not the Original Dialogue

Symptoms:

The user’s original wording gets lost easily
Tone, constraints, and details disappear

So short-term memory is usually not “choose just one,” but rather a combined design.

Common Pitfalls for Beginners

Mixing Up Short-Term Memory and Long-Term Memory

Short-term memory is for the current task, not for a complete user profile.

Thinking a Bigger Message Window Is Always Better

A window that is too large also brings noise and cost.

Ignoring Structured State

This can make an Agent start drifting as soon as the task becomes multi-step.

Summary

The most important thing in this section is not to memorize the words “window” or “summary,” but to grasp this main idea:

The goal of short-term memory is not to preserve history forever, but to maintain coherence for the current task within a limited context.

Well-designed short-term memory usually includes both recent messages and task state, and sometimes an additional layer of summary compression.

Exercises

Extend the ShortTermMemory example in this section to support a summary field.
Change the maximum message window from 3 to 5 and observe how the snapshot() output changes.
Think about this: if an Agent often forgets “which tool it has already called,” would you first expand the message window or add structured state?
Explain in your own words: why do we say short-term memory solves “current task coherence” rather than “long-term user profiling”?

Reference implementation and walkthrough

A summary field can compress older turns into a short current-task note while the raw message window keeps the latest details.
Changing the window from 3 to 5 should keep more recent messages in snapshot(), which may improve coherence but also adds noise and tokens.
If the Agent forgets which tools it already called, add structured state first. Enlarging the message window is a weaker and more expensive fix.
Short-term memory keeps the current task coherent: goal, constraints, recent corrections, tool results, and next action. It is not meant to become a permanent user profile.