Every agent operates within a finite context window, and this constraint is one of the most important architectural considerations for production agent systems. When an agent runs out of context space, it starts losing information: earlier conversation turns, tool results, and instructions get truncated or dropped. The agent does not know what it has forgotten, so it continues operating with an incomplete picture and can produce subtly wrong results.
The Claude Agent SDK provides the execution framework, but context management is your responsibility. You decide what goes into the context window, how long it stays, and what happens when space runs low. For single-turn agents, this is trivial. For multi-turn agents with tools, handoffs, and long-running tasks, context management determines whether your system works reliably or degrades unpredictably.
This guide covers the strategies for keeping agents effective across long interactions: conversation summarization, external memory stores, context window budgeting, and state management patterns that work across agent boundaries.
Context management becomes critical when an agent uses tools, operates in multi-turn conversations, or participates in multi-agent workflows. Any agent in one of these situations needs explicit context management. Simple, short-lived agents (1-3 turns, no tools) rarely hit context limits.
Monitor how much context your agents consume to identify when management strategies are needed.
import { Agent } from "@anthropic-ai/agent-sdk";

interface ContextMetrics {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  turnCount: number;
  toolCallCount: number;
  estimatedCapacityUsed: number;
}

async function runWithContextTracking(
  agent: Agent,
  input: string,
  maxContextTokens: number = 200000
): Promise<{ output: string; metrics: ContextMetrics }> {
  let totalInput = 0;
  let totalOutput = 0;
  let lastTurnTokens = 0;
  let toolCalls = 0;
  let turns = 0;

  const result = await agent.run(input, {
    maxTurns: 15,
    onTurnComplete: (turnResult) => {
      totalInput += turnResult.usage.inputTokens;
      totalOutput += turnResult.usage.outputTokens;
      // Assumption: each turn's inputTokens already counts the whole prompt
      // sent for that turn, so the latest turn approximates window usage.
      lastTurnTokens =
        turnResult.usage.inputTokens + turnResult.usage.outputTokens;
      toolCalls += turnResult.toolCalls?.length ?? 0;
      turns++;
    },
  });

  const metrics: ContextMetrics = {
    // Cumulative totals measure cost across the run, not window occupancy.
    inputTokens: totalInput,
    outputTokens: totalOutput,
    totalTokens: totalInput + totalOutput,
    turnCount: turns,
    toolCallCount: toolCalls,
    // Summing input tokens over turns would count earlier messages repeatedly,
    // so capacity is estimated from the most recent turn only.
    estimatedCapacityUsed: lastTurnTokens / maxContextTokens,
  };

  if (metrics.estimatedCapacityUsed > 0.8) {
    console.warn(
      `Context usage at ${(metrics.estimatedCapacityUsed * 100).toFixed(1)}%; ` +
        `consider summarization or context trimming`
    );
  }

  return { output: result.output, metrics };
}
When context grows too large, summarize earlier turns to free space while preserving essential information.
const summarizerAgent = new Agent({
  name: "context-summarizer",
  model: "claude-sonnet-4-20250514",
  instructions: `Summarize the conversation history into a concise context block.
PRESERVE:
- Key decisions made and their reasoning
- Specific data points and numbers referenced
- Current task status and next steps
- User preferences and constraints mentioned
DISCARD:
- Pleasantries and filler conversation
- Failed tool calls and their error details (keep only the outcome)
- Intermediate reasoning that led to final conclusions
Format as a structured summary under 500 words.`,
});

interface ManagedConversation {
  summary: string | null;
  recentMessages: Array<{ role: string; content: string }>;
  maxRecentMessages: number;
}

async function addMessageWithContextManagement(
  conversation: ManagedConversation,
  message: { role: string; content: string }
): Promise<ManagedConversation> {
  conversation.recentMessages.push(message);

  if (conversation.recentMessages.length > conversation.maxRecentMessages) {
    // Summarize older messages, keeping the newest half of the window verbatim
    const messagesToSummarize = conversation.recentMessages.slice(
      0,
      conversation.recentMessages.length - Math.floor(conversation.maxRecentMessages / 2)
    );
    const textToSummarize = messagesToSummarize
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n\n");
    const previousContext = conversation.summary
      ? `Previous summary:\n${conversation.summary}\n\nNew messages to incorporate:\n`
      : "";

    const summaryResult = await summarizerAgent.run(
      `${previousContext}${textToSummarize}`,
      { maxTurns: 1 }
    );

    conversation.summary = summaryResult.output;
    conversation.recentMessages = conversation.recentMessages.slice(
      messagesToSummarize.length
    );
  }

  return conversation;
}
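The managed conversation above still has to be turned back into input for the next agent run. One possible way to do that is sketched below. The helper name `renderConversationContext` and the "Summary of earlier conversation" label are assumptions made for illustration, not part of the SDK, and the interface is repeated so the snippet stands alone.

```typescript
interface ConversationMessage {
  role: string;
  content: string;
}

interface ManagedConversation {
  summary: string | null;
  recentMessages: ConversationMessage[];
  maxRecentMessages: number;
}

// Hypothetical helper: places the running summary first, then the recent
// messages verbatim, so the agent sees condensed history before fresh turns.
function renderConversationContext(conversation: ManagedConversation): string {
  const parts: string[] = [];
  if (conversation.summary) {
    parts.push(`Summary of earlier conversation:\n${conversation.summary}`);
  }
  for (const message of conversation.recentMessages) {
    parts.push(`${message.role}: ${message.content}`);
  }
  return parts.join("\n\n");
}
```

Putting the summary ahead of the recent messages keeps the chronological order intact, which tends to make the combined prompt easier for the model to follow.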
For information that must persist across sessions or exceed context window limits, use an external memory store.
import { Tool } from "@anthropic-ai/agent-sdk";
import { z } from "zod";

interface MemoryEntry {
  key: string;
  content: string;
  category: string;
  timestamp: number;
  accessCount: number;
}

class AgentMemoryStore {
  private entries = new Map<string, MemoryEntry>();

  store(key: string, content: string, category: string): void {
    this.entries.set(key, {
      key,
      content,
      category,
      timestamp: Date.now(),
      accessCount: 0,
    });
  }

  retrieve(key: string): MemoryEntry | undefined {
    const entry = this.entries.get(key);
    if (entry) entry.accessCount++;
    return entry;
  }

  search(query: string, category?: string): MemoryEntry[] {
    const results: MemoryEntry[] = [];
    for (const entry of this.entries.values()) {
      if (category && entry.category !== category) continue;
      if (entry.content.toLowerCase().includes(query.toLowerCase())) {
        results.push(entry);
      }
    }
    // Newest first, capped at five entries to keep the tool result small
    return results.sort((a, b) => b.timestamp - a.timestamp).slice(0, 5);
  }
}

const memoryStore = new AgentMemoryStore();

const storeMemoryTool = new Tool({
  name: "store_memory",
  description: "Save important information for future reference across conversations",
  inputSchema: z.object({
    key: z.string().describe("Short identifier for this memory"),
    content: z.string().describe("The information to remember"),
    category: z.enum(["user_preference", "decision", "fact", "task_state"]),
  }),
  async execute({ key, content, category }) {
    memoryStore.store(key, content, category);
    return { stored: true, key };
  },
});

const recallMemoryTool = new Tool({
  name: "recall_memory",
  description: "Search stored memories for relevant information",
  inputSchema: z.object({
    query: z.string().describe("What to search for"),
    category: z.enum(["user_preference", "decision", "fact", "task_state"]).optional(),
  }),
  async execute({ query, category }) {
    const results = memoryStore.search(query, category);
    if (results.length === 0) {
      return { found: false, message: "No matching memories found" };
    }
    return {
      found: true,
      memories: results.map((r) => ({
        key: r.key,
        content: r.content,
        category: r.category,
        storedAt: new Date(r.timestamp).toISOString(),
      })),
    };
  },
});
When agents hand off to each other, allocate context budgets to prevent downstream agents from running out of space.
function buildContextBudgetedHandoff(
  upstreamOutput: string,
  maxHandoffTokens: number = 4000
): string {
  // Rough heuristic: about 4 characters per token for English text
  const estimatedTokens = Math.ceil(upstreamOutput.length / 4);
  if (estimatedTokens <= maxHandoffTokens) {
    return upstreamOutput;
  }

  // Truncate to budget and add a notice. Only the start of the output is
  // kept, so upstream agents should put their key findings first.
  const truncatedLength = maxHandoffTokens * 4;
  const truncated = upstreamOutput.slice(0, truncatedLength);
  return `${truncated}\n\n[Note: Previous agent output was truncated from ` +
    `~${estimatedTokens} tokens to ~${maxHandoffTokens} tokens to preserve ` +
    `context budget. Key findings should be in the content above.]`;
}

// researchAgent, analysisAgent, and reportAgent are assumed to be Agent
// instances defined elsewhere in the application.
async function budgetedPipeline(query: string) {
  const research = await researchAgent.run(query, { maxTurns: 10 });
  const budgetedResearch = buildContextBudgetedHandoff(research.output, 5000);

  const analysis = await analysisAgent.run(budgetedResearch, { maxTurns: 10 });
  const budgetedAnalysis = buildContextBudgetedHandoff(analysis.output, 3000);

  return reportAgent.run(budgetedAnalysis, { maxTurns: 5 });
}
Context management is fundamentally a team-level concern. Each agent in a pipeline or parallel configuration consumes from its own context window, but the information that flows between agents determines how much context each one needs.
In Sequential Pipelines, context accumulates with each stage. The third agent receives the first agent's output (possibly summarized), the second agent's output, plus its own instructions. Without explicit context budgets, the final agent often operates with a nearly full context window and degraded performance.
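The accumulation described above can be estimated with simple arithmetic before any agent runs. The sketch below is illustrative only: the `Stage` shape, the `startingContext` helper, and every token figure are assumptions, and it models the worst case in which each stage forwards all earlier outputs.

```typescript
interface Stage {
  name: string;
  instructionTokens: number; // size of the stage's own instructions
  outputTokens: number; // expected size of what the stage produces
}

interface StageLoad {
  name: string;
  tokens: number;
}

// Estimates the context each stage starts with when every earlier output is
// forwarded downstream, optionally capping each handoff at handoffCap tokens.
function startingContext(stages: Stage[], handoffCap: number = Infinity): StageLoad[] {
  let forwarded = 0;
  return stages.map((stage) => {
    const tokens = stage.instructionTokens + forwarded;
    forwarded += Math.min(stage.outputTokens, handoffCap);
    return { name: stage.name, tokens };
  });
}

// Illustrative numbers, not measurements.
const pipeline: Stage[] = [
  { name: "research", instructionTokens: 1000, outputTokens: 10000 },
  { name: "analysis", instructionTokens: 1000, outputTokens: 6000 },
  { name: "report", instructionTokens: 1000, outputTokens: 2000 },
];

console.log(startingContext(pipeline)); // report starts at 17000 tokens
console.log(startingContext(pipeline, 3000)); // report starts at 7000 tokens
```

With these figures, capping each handoff at 3,000 tokens cuts the report stage's starting context from 17,000 tokens to 7,000, before it has made a single tool call.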
In Parallel Workers, each agent gets a fresh context window, which is one of the key advantages of this pattern. The synthesis agent that merges parallel results is the bottleneck — it receives output from all parallel agents and must have enough context budget to process everything.
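One way to protect the synthesis agent is to give each worker an equal share of its input budget. The following is a minimal sketch under that assumption. `fitWorkerOutputs` is a hypothetical helper, and the 4-characters-per-token estimate is the same rough heuristic used in the handoff example earlier.

```typescript
// Rough heuristic: about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical helper: returns the worker outputs unchanged when they fit the
// synthesis budget together; otherwise trims any output that exceeds an equal
// per-worker share.
function fitWorkerOutputs(outputs: string[], synthesisBudgetTokens: number): string[] {
  const total = outputs.reduce((sum, output) => sum + estimateTokens(output), 0);
  if (outputs.length === 0 || total <= synthesisBudgetTokens) {
    return outputs;
  }
  const share = Math.floor(synthesisBudgetTokens / outputs.length);
  return outputs.map((output) =>
    estimateTokens(output) <= share ? output : output.slice(0, share * 4)
  );
}
```

An equal split is simple but leaves budget unused when some workers return short results. Redistributing the unused share, or summarizing instead of truncating, would be reasonable refinements.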
In Long-Running Agent patterns (agents that persist across user sessions), external memory stores are essential. The agent cannot keep all historical context in its window, so it must selectively load relevant memories for each new interaction.
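Selective loading can be as simple as ranking stored memories against the new request and taking as many as fit a token budget. The sketch below assumes keyword matching and a 4-characters-per-token estimate. `selectMemories` and the `StoredMemory` shape are hypothetical, and a production system would more likely rank with embeddings or a search index.

```typescript
interface StoredMemory {
  key: string;
  content: string;
  timestamp: number;
}

// Rough heuristic: about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Ranks memories by how many query terms they contain (newest first on ties),
// then greedily picks entries until the token budget is used up.
function selectMemories(
  memories: StoredMemory[],
  query: string,
  budgetTokens: number
): StoredMemory[] {
  const terms = query.toLowerCase().split(/\s+/).filter((term) => term.length > 0);
  const scored = memories
    .map((memory) => {
      const text = memory.content.toLowerCase();
      const hits = terms.filter((term) => text.includes(term)).length;
      return { memory, hits };
    })
    .filter((entry) => entry.hits > 0)
    .sort((a, b) => b.hits - a.hits || b.memory.timestamp - a.memory.timestamp);

  const selected: StoredMemory[] = [];
  let used = 0;
  for (const { memory } of scored) {
    const cost = estimateTokens(memory.content);
    if (used + cost > budgetTokens) continue; // skip entries that do not fit
    selected.push(memory);
    used += cost;
  }
  return selected;
}
```

The selected entries can then be prepended to the new session's input, so the agent starts with only the history that is relevant to the current request.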
Monitor context usage before it becomes a problem. Add token tracking to your agent runs from the start. By the time you notice degraded output quality, context exhaustion may have been affecting results for a while.
Summarize aggressively between pipeline stages. A 10,000-token research report can usually be summarized to 2,000 tokens without losing the information the analysis agent needs. Build summarization into every handoff rather than passing raw output.
Store structured data externally, not in context. If an agent accumulates a list of items, database records, or search results across multiple tool calls, write them to an external store and give the agent a retrieval tool. Context windows are expensive storage.
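A minimal sketch of this idea follows: records go into a store outside the context window, and the agent keeps only a short reference that it can use to page through them. `ResultStore` and its in-memory `Map` are assumptions made for illustration; a real deployment would probably back this with a database and expose `page` through a retrieval tool.

```typescript
class ResultStore<T> {
  private batches = new Map<string, T[]>();
  private nextId = 1;

  // Saves records outside the context window and returns a compact reference
  // that is cheap to keep in the conversation.
  save(records: T[]): { id: string; count: number } {
    const id = `batch-${this.nextId++}`;
    this.batches.set(id, records);
    return { id, count: records.length };
  }

  // Returns one page of a saved batch so the agent loads only what it needs.
  // Unknown ids yield an empty page.
  page(id: string, offset: number, limit: number): T[] {
    const records = this.batches.get(id) ?? [];
    return records.slice(offset, offset + limit);
  }
}

// Usage: a tool that fetched 50 search results stores them and reports only
// the reference back to the model.
const store = new ResultStore<{ title: string }>();
const reference = store.save(
  Array.from({ length: 50 }, (_, i) => ({ title: `Result ${i}` }))
);
console.log(reference); // { id: "batch-1", count: 50 }
```

The reference costs a handful of tokens no matter how many records sit behind it, which is the saving this practice is after.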
Test with maximum-length conversations. Run your agents through scenarios that generate 20+ turns with multiple tool calls. Many context management bugs only appear when the window is nearly full.
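A long-conversation test does not need a live model to be useful. The sketch below replays a synthetic conversation through a trimming strategy and records the largest history it ever produces. Everything here is an assumption for illustration: the drop-oldest `trimToBudget` strategy stands in for whatever context management the system actually uses, and token counts use the rough 4-characters-per-token estimate.

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const totalTokens = (messages: Message[]): number =>
  messages.reduce((sum, message) => sum + estimateTokens(message.content), 0);

// Strategy under test: drop the oldest messages until the history fits.
function trimToBudget(messages: Message[], budgetTokens: number): Message[] {
  const kept = [...messages];
  while (kept.length > 1 && totalTokens(kept) > budgetTokens) {
    kept.shift();
  }
  return kept;
}

// Builds a synthetic conversation: each turn has a user message, a large tool
// result, and an assistant reply.
function syntheticConversation(turns: number, toolResultChars: number): Message[] {
  const messages: Message[] = [];
  for (let i = 0; i < turns; i++) {
    messages.push({ role: "user", content: `Question ${i}` });
    messages.push({ role: "tool", content: "x".repeat(toolResultChars) });
    messages.push({ role: "assistant", content: `Answer ${i}` });
  }
  return messages;
}

// Replays the conversation one message at a time and returns the peak history
// size observed after trimming.
function peakTokensAfterTrimming(messages: Message[], budgetTokens: number): number {
  let history: Message[] = [];
  let peak = 0;
  for (const message of messages) {
    history = trimToBudget([...history, message], budgetTokens);
    peak = Math.max(peak, totalTokens(history));
  }
  return peak;
}
```

A test suite can then assert that `peakTokensAfterTrimming(syntheticConversation(25, 2000), 4000)` never exceeds the budget, and repeat the check with larger tool results to find the point where the strategy stops coping.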
Give agents awareness of their context constraints. Include a line in agent instructions like "If you have accumulated significant context from tool calls, summarize your key findings before proceeding." This prompts the model to self-manage context in long interactions.