Claude Code: Optimizing Agent Team Prompts


Overview

In a single-agent system, a weak prompt degrades one output. In a multi-agent system, it degrades every output downstream of it: a vaguely prompted research agent produces mediocre findings, which a downstream analysis agent interprets incorrectly, which a report agent then presents with false confidence. Prompt failures compound across agent boundaries.

Optimizing prompts for agent teams is different from optimizing prompts for standalone interactions. You are not just writing instructions for one model call — you are defining contracts between agents. Each agent's prompt must specify what it does, what it does not do, what input it expects, and what output format it produces. These boundaries prevent the overlapping responsibilities, inconsistent output formats, and unclear handoff conditions that cause most multi-agent failures.

This guide covers the specific prompt patterns that improve multi-agent reliability: role boundaries, output schemas, coordination instructions, and systematic testing approaches.

When to use it

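Invest in prompt optimization when your agents overlap in responsibility, return inconsistent output formats, or hand off work under unclear conditions.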

Prompt optimization is less impactful when your agents are fundamentally misconfigured — wrong tools, wrong coordination pattern, or wrong task decomposition. Fix the architecture first, then optimize prompts.

Getting started

Define explicit role boundaries

Every agent prompt should clearly state both what the agent does and what it does not do. Negative boundaries prevent agents from overstepping their role.

import { Agent } from "@anthropic-ai/agent-sdk";

// Poor prompt — vague role, no boundaries
const vagueAgent = new Agent({
  name: "researcher",
  model: "claude-sonnet-4-20250514",
  instructions: "Help with research tasks.",
});

// Optimized prompt — specific role, clear boundaries, defined output
const optimizedAgent = new Agent({
  name: "market-researcher",
  model: "claude-sonnet-4-20250514",
  instructions: `You are a market research specialist focused on competitive
    intelligence gathering.

    YOUR RESPONSIBILITIES:
    - Search for competitor product information, pricing, and positioning
    - Identify market trends and emerging competitors
    - Gather quantitative market data (market size, growth rates, share)
    - Compile source-attributed findings

    YOU DO NOT:
    - Make strategic recommendations (the strategy agent handles this)
    - Analyze or interpret the data you gather (the analysis agent handles this)
    - Write final reports or presentations (the report agent handles this)

    OUTPUT FORMAT:
    Return your findings as a structured list with these fields for each finding:
    - finding: One-sentence summary
    - source: Where you found this information
    - confidence: high / medium / low
    - dataPoint: Specific number or fact, if applicable
    - dateRelevance: When this information was current`,
});
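One way to keep the three sections (responsibilities, negative boundaries, output format) consistent across a team is to generate each prompt from a structured spec. The sketch below is a minimal illustration, not part of any SDK: `RoleSpec` and `buildRolePrompt` are hypothetical names introduced here, and the check that rejects a spec without negative boundaries is an assumption about how strict you want to be.

```typescript
// Hypothetical helper (not part of any SDK): assembles a role prompt from a
// structured spec so every agent's prompt carries the same three sections.
interface RoleSpec {
  role: string;
  responsibilities: string[];
  // Each exclusion names the task and the agent that owns it instead.
  exclusions: { task: string; owner: string }[];
  outputFields: Record<string, string>;
}

function buildRolePrompt(spec: RoleSpec): string {
  // Assumption: a prompt without negative boundaries is treated as incomplete.
  if (spec.exclusions.length === 0) {
    throw new Error("Role spec needs at least one negative boundary");
  }
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");

  return [
    `You are ${spec.role}.`,
    "",
    "YOUR RESPONSIBILITIES:",
    bullets(spec.responsibilities),
    "",
    "YOU DO NOT:",
    bullets(spec.exclusions.map((e) => `${e.task} (the ${e.owner} handles this)`)),
    "",
    "OUTPUT FORMAT:",
    "Return your findings as a structured list with these fields for each finding:",
    bullets(Object.entries(spec.outputFields).map(([name, desc]) => `${name}: ${desc}`)),
  ].join("\n");
}

const researcherPrompt = buildRolePrompt({
  role: "a market research specialist focused on competitive intelligence gathering",
  responsibilities: [
    "Search for competitor product information, pricing, and positioning",
    "Compile source-attributed findings",
  ],
  exclusions: [
    { task: "Make strategic recommendations", owner: "strategy agent" },
    { task: "Analyze or interpret the data you gather", owner: "analysis agent" },
  ],
  outputFields: {
    finding: "One-sentence summary",
    confidence: "high / medium / low",
  },
});
```

The resulting string can be passed as the agent's instructions. Keeping the exclusions as data also makes it easy to check that every excluded task is owned by some other agent on the team.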

Specify output schemas in agent instructions

When agents produce output that other agents consume, define the expected format precisely.

const analysisAgent = new Agent({
  name: "competitive-analyst",
  model: "claude-sonnet-4-20250514",
  instructions: `You analyze competitive research findings and produce
    structured assessments.

    INPUT: You receive a list of research findings, each with a source,
    confidence level, and data point.

    ANALYSIS PROCESS:
    1. Group findings by competitor
    2. Assess each competitor's strategic position
    3. Identify gaps in the research (missing data points)
    4. Rate competitive threat level for each competitor

    OUTPUT: Respond with a JSON object matching this structure:
    {
      "competitors": [
        {
          "name": "string",
          "threatLevel": "high" | "medium" | "low",
          "strengths": ["string"],
          "weaknesses": ["string"],
          "keyMetrics": { "metricName": "value" },
          "dataGaps": ["string"]
        }
      ],
      "marketOverview": {
        "totalAddressableMarket": "string",
        "growthRate": "string",
        "dominantTrend": "string"
      },
      "confidenceAssessment": "string"
    }

    If research findings are insufficient for confident analysis, say so
    explicitly in the confidenceAssessment field rather than speculating.`,
});
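A schema written into a prompt is only a request, so it helps to check the output before the next agent consumes it. The following is a minimal hand-written validator for the JSON structure above, assuming the agent returns the JSON as a plain string. The type names and `parseAnalysis` are illustrative, not part of any library; a schema-validation library would serve the same purpose.

```typescript
type ThreatLevel = "high" | "medium" | "low";

interface CompetitorAssessment {
  name: string;
  threatLevel: ThreatLevel;
  strengths: string[];
  weaknesses: string[];
  keyMetrics: Record<string, string>;
  dataGaps: string[];
}

interface CompetitiveAnalysis {
  competitors: CompetitorAssessment[];
  marketOverview: {
    totalAddressableMarket: string;
    growthRate: string;
    dominantTrend: string;
  };
  confidenceAssessment: string;
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

function isCompetitor(v: unknown): v is CompetitorAssessment {
  if (!isRecord(v)) return false;
  const metrics = v.keyMetrics;
  const level = v.threatLevel;
  return (
    typeof v.name === "string" &&
    (level === "high" || level === "medium" || level === "low") &&
    isStringArray(v.strengths) &&
    isStringArray(v.weaknesses) &&
    isRecord(metrics) &&
    Object.values(metrics).every((x) => typeof x === "string") &&
    isStringArray(v.dataGaps)
  );
}

// Parses the agent's raw output and throws a descriptive error on the first
// part of the structure that does not match the schema in the prompt.
function parseAnalysis(raw: string): CompetitiveAnalysis {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Analysis output is not valid JSON");
  }
  if (!isRecord(data)) throw new Error("Analysis output is not a JSON object");

  const competitors = data.competitors;
  if (!Array.isArray(competitors) || !competitors.every(isCompetitor)) {
    throw new Error("Invalid or missing 'competitors'");
  }
  const overview = data.marketOverview;
  if (
    !isRecord(overview) ||
    typeof overview.totalAddressableMarket !== "string" ||
    typeof overview.growthRate !== "string" ||
    typeof overview.dominantTrend !== "string"
  ) {
    throw new Error("Invalid or missing 'marketOverview'");
  }
  if (typeof data.confidenceAssessment !== "string") {
    throw new Error("Invalid or missing 'confidenceAssessment'");
  }
  return data as unknown as CompetitiveAnalysis;
}
```

Calling `parseAnalysis` between stages turns a silent format drift into an immediate, attributable error at the agent that caused it.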

Write coordination-aware prompts

In multi-agent systems, agents need to understand their position in the workflow: which agents feed them, what they receive, and where their output goes next.

const qualityReviewAgent = new Agent({
  name: "quality-reviewer",
  model: "claude-sonnet-4-20250514",
  instructions: `You are the final quality gate in a content pipeline.

    UPSTREAM AGENTS:
    - research-agent: Provides source material and data points
    - draft-agent: Produces the initial content draft

    YOUR POSITION: You receive the draft and the original research.
    Your output goes directly to the user with no further processing.

    REVIEW CRITERIA:
    1. Factual accuracy: Does the draft match the research findings?
    2. Completeness: Are all key research findings represented?
    3. Clarity: Is the content understandable to the target audience?
    4. Source attribution: Are claims supported by cited sources?

    OUTPUT ACTIONS:
    - If the draft passes all criteria: Return the draft with minor edits
    - If the draft has fixable issues: Return a corrected version
    - If the draft has fundamental problems: Return a detailed list of
      issues that need the draft-agent to regenerate

    Never fabricate information that was not in the research findings.
    Never remove source attributions to make text read more smoothly.`,
});
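Position descriptions drift when the pipeline changes but the prompts do not. One possible safeguard is to derive the UPSTREAM AGENTS and YOUR POSITION sections from the same pipeline definition that wires the agents together. This is a sketch under that assumption; `Stage` and `describePosition` are hypothetical names, and a strictly linear pipeline is assumed.

```typescript
// Hypothetical description of one pipeline stage: the agent's name and a
// one-line summary of what it hands to later stages.
interface Stage {
  name: string;
  provides: string;
}

// Builds the position section of a prompt for one agent in a linear pipeline.
function describePosition(pipeline: Stage[], agentName: string): string {
  const index = pipeline.findIndex((stage) => stage.name === agentName);
  if (index === -1) throw new Error(`Unknown agent: ${agentName}`);

  const upstream = pipeline.slice(0, index);
  const next = index + 1 < pipeline.length ? pipeline[index + 1] : undefined;

  const upstreamLines =
    upstream.length > 0
      ? upstream.map((stage) => `- ${stage.name}: ${stage.provides}`)
      : ["- none (you receive the user's request directly)"];

  const positionLine = next
    ? `YOUR POSITION: Your output goes to ${next.name}.`
    : "YOUR POSITION: Your output goes directly to the user with no further processing.";

  return ["UPSTREAM AGENTS:", ...upstreamLines, "", positionLine].join("\n");
}

const contentPipeline: Stage[] = [
  { name: "research-agent", provides: "Provides source material and data points" },
  { name: "draft-agent", provides: "Produces the initial content draft" },
  { name: "quality-reviewer", provides: "Returns the reviewed draft" },
];

const reviewerPosition = describePosition(contentPipeline, "quality-reviewer");
```

Prepending `reviewerPosition` to the review criteria keeps the prompt in step with the wiring: adding or reordering a stage updates every agent's view of the workflow at once.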

Systematic prompt testing with evaluation sets

Build a test harness that runs the same inputs through your agent team and scores the outputs.

interface PromptTestCase {
  name: string;
  input: string;
  expectedBehaviors: string[];
  forbiddenBehaviors: string[];
}

const testCases: PromptTestCase[] = [
  {
    name: "standard-competitor-query",
    input: "Analyze the CRM market competitive landscape for mid-market B2B companies",
    expectedBehaviors: [
      "mentions at least 3 specific competitors",
      "includes quantitative market data",
      "provides threat level assessment",
    ],
    forbiddenBehaviors: [
      "makes up specific revenue numbers without sources",
      "recommends strategic actions",
    ],
  },
  {
    name: "ambiguous-query",
    input: "Tell me about the competition",
    expectedBehaviors: [
      "asks clarifying questions or uses available context",
      "does not produce a generic response",
    ],
    forbiddenBehaviors: [
      "produces a detailed analysis without knowing the market",
    ],
  },
];

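// evaluateBehavior(output, behavior) is a helper you supply. It should return
// true when the lowercased output exhibits the described behavior, whether it
// decides by pattern matching or by asking a model to grade the output.
async function runPromptTests(agent: Agent, tests: PromptTestCase[]) {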
  const results = [];

  for (const test of tests) {
    const result = await agent.run(test.input, { maxTurns: 10 });
    const output = result.output.toLowerCase();

    const passedExpected = test.expectedBehaviors.map((behavior) => ({
      behavior,
      passed: evaluateBehavior(output, behavior),
    }));

    const passedForbidden = test.forbiddenBehaviors.map((behavior) => ({
      behavior,
      passed: !evaluateBehavior(output, behavior),
    }));

    results.push({
      testName: test.name,
      expected: passedExpected,
      forbidden: passedForbidden,
      allPassed:
        passedExpected.every((r) => r.passed) &&
        passedForbidden.every((r) => r.passed),
    });
  }

  return results;
}
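The harness depends on an `evaluateBehavior` function. A minimal sketch of one way to write it is a registry that maps each behavior string to a deterministic check. The regular expressions below are illustrative assumptions, not a tested rubric, and behaviors that need judgment (such as "mentions at least 3 specific competitors") would more realistically be delegated to a model-based grader.

```typescript
type BehaviorCheck = (output: string) => boolean;

// Assumed checks, keyed by the exact behavior strings used in the test cases.
// The harness lowercases the output, so patterns are written in lowercase.
const behaviorChecks: Record<string, BehaviorCheck> = {
  "includes quantitative market data": (output) =>
    /\d+(\.\d+)?\s*(%|percent|billion|million)/.test(output),
  "provides threat level assessment": (output) => /threat level/.test(output),
  "recommends strategic actions": (output) =>
    /\b(we recommend|you should|should consider)\b/.test(output),
};

function evaluateBehavior(output: string, behavior: string): boolean {
  const check = behaviorChecks[behavior];
  // Fail loudly rather than silently passing a behavior nobody can verify.
  if (!check) {
    throw new Error(`No check registered for behavior: "${behavior}"`);
  }
  return check(output);
}
```

Throwing on an unregistered behavior is a deliberate choice in this sketch: a test case whose behavior cannot be checked should fail the run, not count as a pass.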

Integration with agent teams

Prompt optimization has different priorities depending on the coordination pattern:

Sequential Pipelines: Focus on output format consistency. Each agent's output is the next agent's input. If the analysis agent expects JSON but the research agent returns free-form text, the pipeline breaks. Define explicit output schemas and validate them between stages.

Parallel Workers: Focus on differentiation. When three agents research the same topic in parallel, their prompts must guide them to genuinely different perspectives. Otherwise you get three copies of the same generic analysis. Specify the lens each agent should use: "academic sources only," "industry practitioner perspective," "emerging trend focus."

Router Patterns: Focus on triage prompt precision. The router's classification accuracy determines the entire system's routing quality. Test the triage prompt with at least 50 representative inputs, including edge cases that span multiple categories.
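For the router case, a labeled set of inputs can be scored with a few lines of code. The sketch below assumes you have already run the triage prompt over each input and recorded the category it chose; `TriageResult` and `scoreTriage` are hypothetical names, and the category labels are made up for illustration.

```typescript
// One labeled example plus the category the router actually chose.
interface TriageResult {
  input: string;
  expected: string;
  actual: string;
}

interface TriageScore {
  accuracy: number;
  // Per expected category: how many inputs there were and how many routed correctly.
  perCategory: Map<string, { total: number; correct: number }>;
  misroutes: TriageResult[];
}

function scoreTriage(results: TriageResult[]): TriageScore {
  const perCategory = new Map<string, { total: number; correct: number }>();
  const misroutes: TriageResult[] = [];

  for (const result of results) {
    const stats = perCategory.get(result.expected) ?? { total: 0, correct: 0 };
    stats.total += 1;
    if (result.actual === result.expected) {
      stats.correct += 1;
    } else {
      misroutes.push(result);
    }
    perCategory.set(result.expected, stats);
  }

  const correct = results.length - misroutes.length;
  const accuracy = results.length > 0 ? correct / results.length : 0;
  return { accuracy, perCategory, misroutes };
}

// Illustrative data only: four labeled inputs, one of them misrouted.
const triageScore = scoreTriage([
  { input: "I was charged twice", expected: "billing", actual: "billing" },
  { input: "Refund my last invoice", expected: "billing", actual: "billing" },
  { input: "The app crashes on login", expected: "technical", actual: "technical" },
  { input: "Crash after I updated my card", expected: "technical", actual: "billing" },
]);
```

The per-category breakdown matters more than the headline accuracy: a router can score well overall while consistently misrouting one small category, and the `misroutes` list shows which edge cases to add to the triage prompt as examples.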

Best practices and common pitfalls

  1. Optimize the weakest agent first. In a multi-agent team, overall quality is constrained by the worst-performing agent. Identify which agent produces the lowest quality output and focus prompt optimization there before tuning well-performing agents.

  2. Use concrete examples in prompts, not abstract rules. Showing the agent one example of good output teaches format and quality expectations more effectively than three paragraphs of abstract instructions.

  3. Test prompts in the team context, not isolation. An agent that performs perfectly in isolation may struggle when receiving real upstream output. Always test with actual outputs from preceding agents in the pipeline.

  4. Version your prompts. Track prompt changes alongside code changes. When agent quality degrades, you need to identify which prompt change caused the regression. Store prompts in version-controlled files, not inline strings.

  5. Measure token efficiency alongside quality. A prompt that produces better output but costs 3x the tokens may not be the right tradeoff. Track output quality and token usage together to find the efficient frontier.
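The efficient frontier in the last practice can be computed directly once each prompt variant has a quality score and a token count. This is a minimal sketch with made-up numbers: `PromptVariant` is a hypothetical record type, and it assumes quality is a single score where higher is better.

```typescript
interface PromptVariant {
  id: string;
  quality: number; // higher is better, e.g. fraction of test cases passed
  tokens: number; // average tokens consumed per run, lower is better
}

// Returns the variants that no other variant beats on both quality and cost,
// sorted from cheapest to most expensive.
function efficientFrontier(variants: PromptVariant[]): PromptVariant[] {
  const isDominated = (v: PromptVariant): boolean =>
    variants.some(
      (other) =>
        other !== v &&
        other.quality >= v.quality &&
        other.tokens <= v.tokens &&
        (other.quality > v.quality || other.tokens < v.tokens),
    );

  return variants
    .filter((v) => !isDominated(v))
    .sort((a, b) => a.tokens - b.tokens);
}

// Illustrative measurements only.
const frontier = efficientFrontier([
  { id: "v1", quality: 0.7, tokens: 1000 },
  { id: "v2", quality: 0.85, tokens: 1400 },
  { id: "v3", quality: 0.86, tokens: 4200 },
  { id: "v4", quality: 0.8, tokens: 2000 },
]);
// frontier contains v1, v2, v3; v4 is dropped because v2 is both better and cheaper.
```

In this example v3 stays on the frontier even though it costs three times as much as v2 for a one-point quality gain. The frontier only removes variants that are strictly worse; choosing among the remaining ones is still a judgment about how much quality is worth per token.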
