Multi-Agent Design Patterns
About 10 minutes
A multi-agent system is an architecture in which multiple AI agents collaborate to handle tasks that a single agent struggles to solve alone. This page explains four representative design patterns implementable with the Claude Agent SDK, along with the selection criteria for each.
What Is a Multi-Agent System?
A single AI agent has three structural limitations. The first is the context length ceiling: reviewing an entire large codebase or analyzing hundreds of pages of documentation may not fit within a single context window. The second is the limits of specialization: in domains such as law, medicine, and finance, an agent specialized in a particular domain tends to be more accurate and consistent than a general-purpose agent. The third is the absence of parallelism: a single agent processes tasks sequentially, so even when multiple independent subtasks exist, they cannot be executed simultaneously.
Multi-agent systems address all three problems. They split tasks to work around context length constraints, deploy specialized agents for each role, and execute independent subtasks in parallel to improve overall throughput.
graph TD
SINGLE[Single Agent] --> LIMIT1[Context Length Ceiling]
SINGLE --> LIMIT2[Limits of Specialization]
SINGLE --> LIMIT3[No Parallelism]
MULTI[Multi-Agent System] --> SOL1[Task splitting avoids context constraints]
MULTI --> SOL2[Specialized agents improve accuracy]
MULTI --> SOL3[Parallel execution improves throughput]
Four Fundamental Patterns
Pattern 1: Orchestrator-Worker
The Orchestrator-Worker pattern is a structure in which one orchestrator agent manages and directs multiple worker agents. The orchestrator is responsible for planning the overall task, decomposing it into subtasks, issuing instructions to workers, and integrating the results. Each worker executes its assigned subtask and returns the result to the orchestrator.
sequenceDiagram
participant User as User
participant Orch as Orchestrator
participant W1 as Worker A
participant W2 as Worker B
participant W3 as Worker C
User->>Orch: Task request (e.g., competitive analysis report)
Orch->>W1: Subtask A (product analysis of Competitor A)
Orch->>W2: Subtask B (pricing research on Competitor B)
Orch->>W3: Subtask C (market trend collection)
W1-->>Orch: Result A
W2-->>Orch: Result B
W3-->>Orch: Result C
Orch->>User: Integrated report
Suitable use cases:
- Complex coding tasks (feature implementation, test creation, and documentation generation running in parallel)
- Competitive research and market analysis reports (simultaneous collection and analysis from multiple sources)
- Large-scale data processing pipelines (parallel aggregation and integration of multiple datasets)
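Implementation example (Python sketch using the anthropic client library; parse_subtasks and synthesize_results are simplified placeholders):
from concurrent.futures import ThreadPoolExecutor

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def parse_subtasks(plan_text: str) -> list[str]:
    """Placeholder parser: assumes the plan lists one subtask per line."""
    lines = (line.strip(" -*\t") for line in plan_text.splitlines())
    return [line for line in lines if line]

def synthesize_results(results: list[str]) -> str:
    """Placeholder integration: joins worker outputs (a real system might ask the model to merge them)."""
    return "\n\n".join(results)

def orchestrator_agent(task: str) -> str:
    """Orchestrator: decomposes the task and delegates to workers"""
    # The orchestrator plans the subtasks
    plan_response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="You are an expert in task decomposition. Break the input task into subtasks, one per line.",
        messages=[{"role": "user", "content": f"Decompose the following task into parallel subtasks: {task}"}],
    )
    subtasks = parse_subtasks(plan_response.content[0].text)
    # Delegate each subtask to a worker; a thread pool lets the API calls run in parallel
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(worker_agent, subtasks))
    # Integrate results
    return synthesize_results(results)

def worker_agent(subtask: str) -> str:
    """Worker: executes the assigned subtask"""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        system="You are a specialized agent that executes assigned subtasks accurately.",
        messages=[{"role": "user", "content": subtask}],
    )
    return response.content[0].text
Pattern 2: Pipeline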
The Pipeline pattern is a sequential structure in which each agent’s output becomes the input of the next agent, so the result is refined step by step. Each stage is responsible for a clearly defined transformation.
graph LR
INPUT[Input] --> A1[Agent 1\nDraft creation]
A1 --> A2[Agent 2\nFact-check & proofreading]
A2 --> A3[Agent 3\nStyle normalization]
A3 --> A4[Agent 4\nTranslation]
A4 --> OUTPUT[Final output]
Suitable use cases:
- Content production pipelines (drafting → fact-checking → proofreading → style normalization → translation)
- Data transformation and ETL processing (extract → cleanse → transform → validate → load)
- Code review pipelines (generation → static analysis → test creation → documentation generation)
Approach to defining agent roles at each stage:
Apply the Single Responsibility Principle to each stage’s agent. Limit each agent to one type of transformation, and define input and output formats clearly. This allows independent quality evaluation and replacement of each stage. The system prompt for each stage should explicitly state the flow: “receive the upstream output, perform the following processing, and produce output,” maintaining contextual continuity.
def run_pipeline(input_text: str) -> str:
    """Execute each pipeline stage sequentially"""
    # Each stage is a (name, system prompt) pair; client is the Anthropic client created in the Pattern 1 example
    stages = [
        ("Drafting Agent", "Create a draft on the given topic."),
        ("Proofreading Agent", "Proofread the following text and correct any factual errors."),
        ("Translation Agent", "Translate the following English text to Japanese."),
    ]
    current_content = input_text
    for stage_name, system_prompt in stages:
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": current_content}],
        )
        # The output of this stage becomes the input of the next
        current_content = response.content[0].text
    return current_content
Pattern 3: Specialist Pool
The Specialist Pool pattern is a structure in which a router agent analyzes incoming input and routes processing to the most appropriate specialized agent. Each specialist agent is optimized for a specific domain or processing type and is expected to deliver higher accuracy in that domain than a general-purpose agent.
graph TD
INPUT[Input] --> ROUTER[Router Agent]
ROUTER --> |Technical question| SPEC_TECH[Technical Support Agent]
ROUTER --> |Billing & payment| SPEC_BILL[Billing Support Agent]
ROUTER --> |English| SPEC_EN[English-Language Agent]
ROUTER --> |French| SPEC_FR[French-Language Agent]
ROUTER --> |Other| SPEC_GEN[General-Purpose Agent]
SPEC_TECH --> OUTPUT[Response]
SPEC_BILL --> OUTPUT
SPEC_EN --> OUTPUT
SPEC_FR --> OUTPUT
SPEC_GEN --> OUTPUT
Suitable use cases:
- Customer support systems (technical questions, billing, complaints, and general inquiries handled by specialized agents)
- Multilingual processing (detect the language and route to the appropriate native-language agent)
- Legal, medical, and financial document analysis (place agents with domain expertise for each document type)
Routing logic design approaches:
There are three approaches to designing a router agent. The first is rule-based routing, which dispatches based on keywords or regular expressions. It is fast and predictable but struggles with complex inputs. The second is LLM-based routing, in which the router agent itself uses an LLM to understand the intent of the input and decide on the routing destination. It handles complex inputs flexibly but increases latency and cost. The third is hybrid routing, which first attempts rule-based routing at high speed and falls back to the LLM only when the result is uncertain. This approach is effective in most production environments.
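The hybrid approach described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the keyword rules, the RULES table, and the function names are all hypothetical, and the LLM router is passed in as a plain callable so the sketch runs without an API key. Here "uncertain" is assumed to mean that zero rules or more than one rule matched.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules; a real system would tune these to its own traffic.
RULES = {
    "billing": re.compile(r"\b(invoice|refund|charge|payment)\b", re.IGNORECASE),
    "technical": re.compile(r"\b(error|crash|bug|timeout)\b", re.IGNORECASE),
}


def rule_based_route(text: str) -> Optional[str]:
    """Return a category only when exactly one rule matches; otherwise None."""
    matches = [name for name, pattern in RULES.items() if pattern.search(text)]
    return matches[0] if len(matches) == 1 else None


def hybrid_route(text: str, llm_route: Callable[[str], str]) -> str:
    """Try the cheap rules first; call the LLM router only when they are inconclusive."""
    category = rule_based_route(text)
    if category is not None:
        return category
    return llm_route(text)
```

Passing the LLM router as an argument keeps the rule layer testable on its own, and inputs that trip several rules at once are deliberately handed to the LLM rather than guessed.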
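import json

def parse_category(text: str) -> str:
    """Extract the category from the router's JSON reply; fall back to "general" if it cannot be parsed."""
    try:
        return json.loads(text).get("category", "general")
    except (json.JSONDecodeError, AttributeError):
        return "general"

def router_agent(user_input: str) -> str:
    """Router: analyzes input and routes to the appropriate specialist agent"""
    routing_response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=128,
        system="""Analyze the input and return one appropriate category.
Categories: [technical, billing, general, language_en, language_fr]
Return in JSON format: {"category": "..."}""",
        messages=[{"role": "user", "content": user_input}],
    )
    category = parse_category(routing_response.content[0].text)
    # The specialist functions are assumed to be defined elsewhere; each takes the user input and returns a reply.
    # english_specialist and french_specialist are hypothetical names for the two language agents in the diagram.
    specialist_map = {
        "technical": technical_specialist,
        "billing": billing_specialist,
        "general": general_specialist,
        "language_en": english_specialist,
        "language_fr": french_specialist,
    }
    specialist = specialist_map.get(category, general_specialist)
    return specialist(user_input)
Pattern 4: Swarm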
The Swarm pattern is a structure in which multiple autonomous agents collaborate to solve problems without a centralized orchestrator. Each agent shares state and can hand off processing to other agents as needed. The swarm as a whole achieves adaptive problem-solving.
graph TD
SHARED["Shared State\n(context, progress, artifacts)"]
A1[Agent 1\nCode analysis] <-->|handoff| A2[Agent 2\nTest generation]
A2 <-->|handoff| A3[Agent 3\nDocumentation generation]
A1 <-->|handoff| A3
A1 --> SHARED
A2 --> SHARED
A3 --> SHARED
SHARED --> A1
SHARED --> A2
SHARED --> A3
Suitable use cases:
- Large-scale code refactoring (each agent independently analyzes and improves a module and writes results to the shared state)
- Parallel research (multiple agents investigate different sources simultaneously and aggregate findings in shared state)
- Autonomous software testing (test generation, execution, and debugging handled cooperatively by multiple agents)
Connection to Claude Code Level 10:
At Claude Code Level 10 (the advanced agent autonomy level), the Swarm pattern serves as the underlying architecture. When Claude Code spawns multiple sub-agents to perform large-scale repository refactoring or migration in parallel, each sub-agent holds an independent context while coordinating through a shared file system or state store. Understanding the Swarm pattern helps explain how Claude Code’s advanced autonomous behavior works.
class SwarmAgent:
    """Swarm agent: collaborates with other agents through shared state"""

    def __init__(self, name: str, specialization: str, shared_state: dict):
        self.name = name
        self.specialization = specialization
        self.shared_state = shared_state

    def process(self, task: str) -> str:
        # Retrieve current progress from shared state, including results other agents have written.
        # A snapshot is taken first so that concurrent writes do not disturb the iteration.
        snapshot = dict(self.shared_state)
        context = "\n".join(f"{key}: {value}" for key, value in snapshot.items())
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system=f"You are a specialized agent in {self.specialization}. Reference the shared state when working.",
            messages=[{
                "role": "user",
                "content": f"Shared state: {context}\n\nTask: {task}"
            }],
        )
        result = response.content[0].text
        # Update shared state so that other agents can build on this result
        self.shared_state[f"{self.name}_result"] = result
        return result
Pattern Selection Framework
To choose the right pattern, evaluate the task across four axes: task nature, subtask dependencies, parallelism, and specialization.
graph TD
START[Receive task] --> Q1{Can the task be\ndecomposed into\nsubtasks?}
Q1 -->|No| SINGLE[Process with a single agent]
Q1 -->|Yes| Q2{Are there dependencies\nbetween subtasks?}
Q2 -->|Yes, sequential dependency| PIPELINE[Pipeline pattern]
Q2 -->|No, independent| Q3{Is centralized\nmanagement needed?}
Q3 -->|Yes| Q4{Is specialized\nrouting needed?}
Q3 -->|No| SWARM[Swarm pattern]
Q4 -->|Yes, dispatch-based| SPECIALIST[Specialist Pool pattern]
Q4 -->|No, integration-based| ORCHESTRATOR[Orchestrator-Worker pattern]
Pattern selection criteria table:
| Evaluation axis | Orchestrator-Worker | Pipeline | Specialist Pool | Swarm |
|---|---|---|---|---|
| Task nature | Requires split & integrate | Stepwise transformation | Type-based routing | Large-scale & complex |
| Subtask dependencies | Parallel, independent | Sequential, order-dependent | Independent (type-dependent) | Dynamic, adaptive |
| Parallelism | High (parallel workers) | Low (sequential) | Medium (per type) | High (all agents parallel) |
| Specialization | Medium (generic workers OK) | Medium (per-stage) | High (domain-specific) | Medium to high (adaptive) |
| Implementation complexity | Medium | Low | Low to medium | High |
| Recommended model scale | Opus (Orch) + Sonnet (Worker) | Sonnet for all stages | Haiku (Router) + specialized models | Sonnet for all agents |
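The decision flow above can also be written as a small function, which is sometimes handy for documenting or testing a team's selection rules. This is only a sketch: TaskProfile and its field names are hypothetical, and each field simply records the answer to one question in the flowchart.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Answers to the four flowchart questions (hypothetical field names)."""
    decomposable: bool
    sequential_dependencies: bool
    needs_central_management: bool
    needs_specialized_routing: bool


def select_pattern(profile: TaskProfile) -> str:
    """Walk the decision flow from top to bottom and return the pattern name."""
    if not profile.decomposable:
        return "single agent"
    if profile.sequential_dependencies:
        return "pipeline"
    if not profile.needs_central_management:
        return "swarm"
    if profile.needs_specialized_routing:
        return "specialist pool"
    return "orchestrator-worker"
```

For example, a task that decomposes into independent subtasks, needs central management, and does not need type-based routing maps to the Orchestrator-Worker pattern.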
Implementation Considerations
Context Passing Design
How context is passed between agents directly affects the quality of a multi-agent system. The context provided to each agent should be “minimal yet sufficient.” Passing the entire context inflates token consumption, and irrelevant information degrades judgment accuracy. An effective approach is the summary pattern: extract and summarize only the necessary information from the upstream result before passing it to the next agent. Using schema-defined structured data (JSON) for inter-agent communication also reduces parsing errors and improves reliability.
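One way to apply the schema-defined approach is to give every inter-agent message a fixed shape and validate it on receipt. The sketch below uses only the standard library; the Handoff type and its three fields are hypothetical and stand in for whatever a particular system actually needs to pass along.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Handoff:
    """Hypothetical inter-agent message carrying only what the next agent needs."""
    task_id: str
    summary: str
    open_questions: list


def encode_handoff(handoff: Handoff) -> str:
    """Serialize the handoff to JSON for inclusion in the downstream agent's prompt."""
    return json.dumps(asdict(handoff), ensure_ascii=False)


def decode_handoff(raw: str) -> Handoff:
    """Parse and validate a handoff; raise ValueError if the message is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"handoff is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    expected = {"task_id": str, "summary": str, "open_questions": list}
    if set(data) != set(expected):
        raise ValueError(f"unexpected or missing fields: {sorted(set(data) ^ set(expected))}")
    for key, expected_type in expected.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key!r} must be of type {expected_type.__name__}")
    return Handoff(**data)
```

Rejecting a malformed message at the boundary keeps a formatting slip in one agent from silently corrupting the input of the next.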
Error Handling and Partial Failure Recovery
Design multi-agent systems with the assumption that individual agents may fail. In the Pipeline pattern, a failure in an upstream agent propagates to all downstream stages, making error detection and fallback processing at each stage essential. In the Orchestrator-Worker pattern, graceful degradation is important: tolerate partial worker failures and generate the final output from only the successful workers’ results. Concretely, implement retry logic (with exponential backoff), timeout settings, and handling that treats failed worker results as “not retrieved.”
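The retry and graceful-degradation ideas above might look like the following sketch. The function names and default values are hypothetical, the worker is any callable that takes a subtask and returns text, and per-request timeouts are left out for brevity. A worker that fails on every attempt yields None, which plays the role of “not retrieved.”

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def call_with_retry(
    worker: Callable[[str], str],
    subtask: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Run one worker with exponential backoff; return None if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return worker(subtask)
        except Exception:
            if attempt == max_attempts - 1:
                return None  # treated as "not retrieved"
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * (2 ** attempt))
    return None


def run_workers(
    worker: Callable[[str], str],
    subtasks: List[str],
    **retry_options,
) -> Dict[str, str]:
    """Run all subtasks in parallel and keep only the results that succeeded."""
    with ThreadPoolExecutor() as pool:
        outcomes = list(
            pool.map(lambda s: call_with_retry(worker, s, **retry_options), subtasks)
        )
    return {s: r for s, r in zip(subtasks, outcomes) if r is not None}
```

The orchestrator can then build its final output from whatever run_workers returns and report which subtasks are missing, rather than failing the whole request.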
Cost and Latency Trade-offs
Multi-agent systems increase API call volume, making cost and latency management critical. The foundation of cost optimization is using the right model for each role: a fast, low-cost Claude Haiku model for the router agent, and Claude Opus for agents handling complex tasks. Result caching avoids repeated calls for identical inputs, and parallel execution reduces overall latency; combining the two helps balance cost and speed. In production environments, monitor each agent’s execution time, cost, and accuracy, and optimize continuously.
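Result caching for identical inputs can be sketched with a small in-memory cache. ResultCache is a hypothetical helper, not part of any SDK; the model call is passed in as a callable so the sketch runs on its own, and the hit and miss counters give a starting point for the monitoring mentioned above.

```python
import hashlib
from typing import Callable, Dict


class ResultCache:
    """In-memory cache keyed by model, system prompt, and input (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, system: str, content: str) -> str:
        digest = hashlib.sha256()
        for part in (model, system, content):
            digest.update(part.encode("utf-8"))
            digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
        return digest.hexdigest()

    def get_or_call(
        self,
        model: str,
        system: str,
        content: str,
        call: Callable[[str, str, str], str],
    ) -> str:
        """Return the cached result for identical inputs, or invoke call and store its result."""
        key = self._key(model, system, content)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(model, system, content)
        self._store[key] = result
        return result
```

Because model output can vary between calls, this kind of cache fits best where reusing an earlier answer is acceptable, such as routing decisions or repeated lookups.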
Summary
The four multi-agent patterns are:
- Orchestrator-Worker: Splits a complex task, executes parts in parallel, and the central orchestrator integrates the results. Suitable for research report creation and complex coding tasks.
- Pipeline: Each agent passes its output sequentially to the next, refining the result step by step. Suitable for content production and data transformation flows.
- Specialist Pool: A router dispatches input to specialized agents based on the type of input. Suitable for customer support and multilingual processing.
- Swarm: Autonomous agents collaborate without centralized management. Suitable for large-scale refactoring and parallel research.
In pattern selection, answering four questions in order leads to a suitable pattern: “Can the task be decomposed?”, “Are there sequential dependencies between subtasks?”, “Is centralized management needed?”, and “Is specialized routing needed?”
See the references for the external specifications and background sources used on this page.[1][2]
References
- Anthropic, Claude Code documentation
- Anthropic, Claude API documentation