How to Iterate on Agent Team Prompts: A 3-Run Framework


Why Your First Run Is Never Your Best Run

Agent teams usually generate useful output on the first try. But "useful" and "excellent" are different things. The first run reveals what the team can do. Iteration reveals what it should do.

The problem is that most people either accept the first output as-is or tweak prompts at random, hoping something improves. Neither approach is efficient.

This framework gives you a structured path from first run to polished output in exactly three runs. Three runs is the sweet spot: enough to reach high quality, few enough to avoid diminishing returns.

The 4-Dimension Evaluation Framework

Before you iterate, you need to know what to evaluate. Score every agent team output on these four dimensions:

Rate each dimension as strong, adequate, or weak. The weakest dimension is your target for Run 2.
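As a rough sketch of that selection step, the three-level ratings can be treated as an ordered scale and the lowest one picked automatically. The function name and the sample scores are illustrative assumptions, and the dimension names in the example are only the ones this article mentions by name.

```python
from typing import Dict

# Ordinal ranking for the three-level scale (weak < adequate < strong).
RATING_ORDER = {"weak": 0, "adequate": 1, "strong": 2}


def weakest_dimension(scores: Dict[str, str]) -> str:
    """Return the lowest-rated dimension; ties go to the first one listed."""
    if not scores:
        raise ValueError("scores must not be empty")
    for name, rating in scores.items():
        if rating not in RATING_ORDER:
            raise ValueError(f"unknown rating {rating!r} for {name!r}")
    return min(scores, key=lambda name: RATING_ORDER[scores[name]])


# Hypothetical baseline scores, for illustration only.
baseline = {"breadth": "strong", "depth": "adequate", "actionability": "weak"}
target = weakest_dimension(baseline)
print(f"Run 2 target: {target}")
```

Breaking ties by listing order is a design choice: put the dimension you care about most first, and it wins when two dimensions share the lowest rating.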

Run 1: Baseline

What to Do

Run the agent team exactly as configured. Don't modify anything. This is your baseline.

How to Evaluate

Read the full output and score each dimension. Be specific about what's weak and why.

Example baseline evaluation for a competitive analysis team:

Weakest dimension: Actionability

What You Learn

The baseline tells you what the team naturally produces well and where it falls short. Most teams are strong on breadth and weak on either depth or actionability. That's because default prompts tell agents what to analyze but rarely specify how specific and actionable the output should be.

Run 2: Targeted Fix

What to Do

Modify the prompts to directly address the weakest dimension identified in Run 1. Change only what's needed to fix the weak area — don't overhaul everything.

Before and After Prompt Example

Weakest dimension: Actionability

Before (baseline prompt for Strategy Synthesizer):

Synthesize the competitive analysis from all agents into a strategic recommendation. Identify key themes and suggest how we should respond.

After (targeted fix):

Synthesize the competitive analysis from all agents into a strategic recommendation. For each recommendation, specify: (1) the exact action to take, (2) which team owns it, (3) the expected timeline, and (4) how to measure success. Prioritize recommendations by impact. Generic advice like "differentiate" or "innovate" is not acceptable — every recommendation must be specific enough that a team could start executing it tomorrow.
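If the team's prompts live in a plain dictionary, a targeted fix can be applied to one agent while every other prompt stays untouched. This is a minimal sketch under that assumption: the agent keys, the `apply_targeted_fix` helper, and the dictionary layout are hypothetical, not part of any specific agent framework.

```python
from typing import Dict

# Requirements taken from the "after" prompt above.
ACTIONABILITY_REQUIREMENTS = (
    "For each recommendation, specify: (1) the exact action to take, "
    "(2) which team owns it, (3) the expected timeline, and "
    "(4) how to measure success. Prioritize recommendations by impact."
)


def apply_targeted_fix(prompts: Dict[str, str], agent: str, addition: str) -> Dict[str, str]:
    """Return a copy of `prompts` with `addition` appended to one agent's prompt."""
    if agent not in prompts:
        raise KeyError(f"no prompt configured for agent {agent!r}")
    updated = dict(prompts)  # shallow copy so the baseline stays intact
    updated[agent] = f"{prompts[agent].rstrip()} {addition}"
    return updated


# Hypothetical Run 1 configuration.
run1_prompts = {
    "strategy_synthesizer": (
        "Synthesize the competitive analysis from all agents "
        "into a strategic recommendation."
    ),
    "pricing_analyst": "Analyze each competitor's pricing strategy.",
}

run2_prompts = apply_targeted_fix(
    run1_prompts, "strategy_synthesizer", ACTIONABILITY_REQUIREMENTS
)
```

Keeping `run1_prompts` unmodified makes it easy to compare the two configurations later and to confirm that only the targeted agent changed.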

How to Evaluate

Re-run the team with the modified prompts. Score all four dimensions again and compare to baseline.

Example Run 2 evaluation:

What You Learn

Targeted fixes usually produce dramatic improvement in the weak dimension without degrading others. If fixing one dimension weakens another (e.g., pushing for actionability reduces depth), note it — you'll address it in Run 3.
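One way to catch such trade-offs is to compare the two scorecards dimension by dimension. The sketch below assumes scores are stored as dictionaries of strong/adequate/weak ratings. The `compare_runs` helper and the sample scores are illustrative, not real results.

```python
from typing import Dict

RATING_ORDER = {"weak": 0, "adequate": 1, "strong": 2}


def compare_runs(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Label each dimension as 'improved', 'regressed', or 'unchanged'."""
    result = {}
    for dimension, old_rating in before.items():
        delta = RATING_ORDER[after[dimension]] - RATING_ORDER[old_rating]
        if delta > 0:
            result[dimension] = "improved"
        elif delta < 0:
            result[dimension] = "regressed"
        else:
            result[dimension] = "unchanged"
    return result


# Hypothetical scores: the fix helped actionability but cost some depth.
run1_scores = {"depth": "strong", "actionability": "weak"}
run2_scores = {"depth": "adequate", "actionability": "strong"}

changes = compare_runs(run1_scores, run2_scores)
regressions = [dim for dim, status in changes.items() if status == "regressed"]
print("Carry into Run 3:", regressions)
```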

Run 3: Polish

What to Do

Fine-tune output format, add quality criteria, and optimize the synthesis prompt. This is where you go from "strong" to "excellent."

Three things to adjust in the polish run:

1. Output Format Specifications

Tell agents exactly how to structure their output. Tables, bullet points, headers, and specific section requirements eliminate ambiguity.

Before:

Analyze each competitor's pricing strategy.

After:

For each competitor, produce a pricing analysis in this format:

  • Pricing model: (per-seat, usage-based, flat-rate, hybrid)
  • Entry price: (lowest published tier)
  • Enterprise price: (highest tier or custom pricing indicators)
  • Key differentiator: (what makes their pricing strategy distinctive)
  • Vulnerability: (where their pricing creates an opening for us)
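A format like this can also be kept as data, so the same field list both generates the prompt text and checks whether an agent's output followed it. The sketch below assumes agents answer with "Label: value" lines. The helper names and the sample output are hypothetical.

```python
from typing import List, Tuple

# Field labels and hints copied from the format specification above.
PRICING_FIELDS: List[Tuple[str, str]] = [
    ("Pricing model", "per-seat, usage-based, flat-rate, hybrid"),
    ("Entry price", "lowest published tier"),
    ("Enterprise price", "highest tier or custom pricing indicators"),
    ("Key differentiator", "what makes their pricing strategy distinctive"),
    ("Vulnerability", "where their pricing creates an opening for us"),
]


def format_spec(intro: str, fields: List[Tuple[str, str]]) -> str:
    """Render the intro line followed by one bullet per required field."""
    lines = [intro, ""]
    lines += [f"  • {label}: ({hint})" for label, hint in fields]
    return "\n".join(lines)


def missing_fields(output: str, fields: List[Tuple[str, str]]) -> List[str]:
    """Return the labels that never appear as 'Label:' in the agent output."""
    return [label for label, _ in fields if f"{label}:" not in output]


spec = format_spec(
    "For each competitor, produce a pricing analysis in this format:",
    PRICING_FIELDS,
)

# Made-up agent output that covers only two of the five fields.
sample_output = "Pricing model: per-seat\nEntry price: $10 per user per month\n"
gaps = missing_fields(sample_output, PRICING_FIELDS)
```

The substring check is deliberately simple. It confirms a field is present, not that its value is meaningful.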

2. Quality Criteria in Prompts

Add explicit standards that agents must meet. This prevents regression on dimensions you've already fixed.

Example addition to any agent prompt:

Quality standards: Every claim must include specific evidence. Every recommendation must include a concrete next step. Do not use vague quantifiers like "significant" or "many" — use numbers or ranges.
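The vague-quantifier rule is easy to spot-check mechanically after a run. This is a small sketch using Python's standard `re` module. The term list contains only the two words named in the quality standard, and the `find_vague_quantifiers` helper is an illustrative assumption that you would likely extend.

```python
import re
from typing import List, Sequence

# Only the two terms named in the quality standard; extend as needed.
VAGUE_QUANTIFIERS = ("significant", "many")


def find_vague_quantifiers(
    text: str, terms: Sequence[str] = VAGUE_QUANTIFIERS
) -> List[str]:
    """Return each whole-word, case-insensitive match of a vague term, in order."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return [match.group(0).lower() for match in pattern.finditer(text)]


flagged = find_vague_quantifiers("Many customers saw significant gains.")
clean = find_vague_quantifiers("Churn fell from 8% to 5% over two quarters.")
```

Because the pattern matches whole words only, related forms such as "significantly" are not flagged unless you add them to the term list.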

3. Synthesis Prompt Optimization

The synthesis prompt has the biggest impact on final output quality. For Run 3, make it comprehensive.

Before:

Combine the agent outputs into a final report.

After:

Produce the final competitive analysis report. Structure: Executive Summary (5 bullet points max), Competitor Profiles (one section per competitor using the standardized format), Strategic Recommendations (top 5, prioritized, each with specific action/owner/timeline/metric), and Open Questions (what we need to investigate further). Before writing, identify any contradictions between agent outputs and resolve them explicitly. The report should be usable in a leadership meeting without additional context.
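If the report structure is likely to change between projects, the synthesis prompt can be assembled from a list of sections instead of edited by hand. The sketch below rebuilds part of the prompt above from such a list. The `build_synthesis_prompt` helper is a hypothetical convenience, not a required part of the framework.

```python
from typing import List, Tuple

# Section names and rules taken from the synthesis prompt above.
REPORT_SECTIONS: List[Tuple[str, str]] = [
    ("Executive Summary", "5 bullet points max"),
    ("Competitor Profiles", "one section per competitor using the standardized format"),
    ("Strategic Recommendations",
     "top 5, prioritized, each with specific action/owner/timeline/metric"),
    ("Open Questions", "what we need to investigate further"),
]


def build_synthesis_prompt(sections: List[Tuple[str, str]]) -> str:
    """Assemble a synthesis prompt that lists each section with its rule."""
    structure = ", ".join(f"{name} ({rule})" for name, rule in sections)
    return (
        "Produce the final competitive analysis report. "
        f"Structure: {structure}. "
        "Before writing, identify any contradictions between agent outputs "
        "and resolve them explicitly."
    )


synthesis_prompt = build_synthesis_prompt(REPORT_SECTIONS)
```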

How to Evaluate

Score all four dimensions one final time. Compare against both Run 1 (baseline) and Run 2 (targeted fix).

Why 3 Runs Is the Sweet Spot

The improvement curve for prompt iteration follows a predictable pattern:

After three runs, you have a well-tuned team configuration that you can reuse. Save those prompts. The next time you need a competitive analysis (or whatever the team does), you start from Run 3 quality, not Run 1.
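Saving the tuned prompts can be as simple as writing them to a JSON file. The sketch below assumes the configuration is a dictionary mapping agent names to prompt strings. The file name, agent keys, and helper functions are illustrative.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_team_config(prompts: Dict[str, str], path: Path) -> None:
    """Write the prompt configuration to disk as readable JSON."""
    path.write_text(json.dumps(prompts, indent=2, ensure_ascii=False), encoding="utf-8")


def load_team_config(path: Path) -> Dict[str, str]:
    """Read a previously saved prompt configuration."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical Run 3 prompts, shortened for the example.
run3_prompts = {
    "pricing_analyst": "For each competitor, produce a pricing analysis in this format: ...",
    "strategy_synthesizer": "Produce the final competitive analysis report. ...",
}

# A temporary directory keeps the demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "competitive_analysis_run3.json"
    save_team_config(run3_prompts, config_path)
    restored = load_team_config(config_path)
```

In practice you would write to a stable location, ideally under version control, so later changes to the prompts can be compared against the Run 3 version.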

The Framework in Practice

| Run | Focus | Time | Expected Improvement |
|-----|-------|------|----------------------|
| 1 | Baseline: run as-is, evaluate | 5 min | Establishes starting point |
| 2 | Targeted fix: address weakest dimension | 10 min | 30-50% quality improvement |
| 3 | Polish: format, quality criteria, synthesis | 15 min | 15-25% additional improvement |

Total time: 30 minutes to go from a default team to a polished, reusable configuration.

That's a small investment for a team you might run dozens of times. And every run after the third benefits from the prompt improvements you've already made.
