Multi-Agent Design Patterns

Reading time: About 10 minutes

Audience: Developers and architects designing systems that combine multiple AI agents

A multi-agent system is an architecture in which multiple AI agents collaborate to handle tasks that a single agent struggles to solve alone. This page explains four representative design patterns implementable with the Claude Agent SDK, along with the selection criteria for each.

What Is a Multi-Agent System?

A single AI agent has three structural limitations. The first is the context length ceiling: an entire large codebase or hundreds of pages of documentation may not fit within a single context window. The second is the limit of specialization: in domains such as law, medicine, and finance, an agent specialized in one domain tends to be more accurate and consistent than a general-purpose agent. The third is the absence of parallelism: a single agent processes tasks sequentially, so even independent subtasks cannot run at the same time.

Multi-agent systems address all three problems. They split tasks to work around context length constraints, deploy specialized agents for each role, and execute independent subtasks in parallel to improve overall throughput.

graph TD
  SINGLE[Single Agent] --> LIMIT1[Context Length Ceiling]
  SINGLE --> LIMIT2[Limits of Specialization]
  SINGLE --> LIMIT3[No Parallelism]

  MULTI[Multi-Agent System] --> SOL1[Task splitting avoids context constraints]
  MULTI --> SOL2[Specialized agents improve accuracy]
  MULTI --> SOL3[Parallel execution improves throughput]

Four Fundamental Patterns

Pattern 1: Orchestrator-Worker

The Orchestrator-Worker pattern is a structure in which one orchestrator agent manages and directs multiple worker agents. The orchestrator is responsible for planning the overall task, decomposing it into subtasks, issuing instructions to workers, and integrating the results. Each worker executes its assigned subtask and returns the result to the orchestrator.

sequenceDiagram
  participant User as User
  participant Orch as Orchestrator
  participant W1 as Worker A
  participant W2 as Worker B
  participant W3 as Worker C

  User->>Orch: Task request (e.g., competitive analysis report)
  Orch->>W1: Subtask A (product analysis of Competitor A)
  Orch->>W2: Subtask B (pricing research on Competitor B)
  Orch->>W3: Subtask C (market trend collection)
  W1-->>Orch: Result A
  W2-->>Orch: Result B
  W3-->>Orch: Result C
  Orch->>User: Integrated report

Suitable use cases:

Complex coding tasks (feature implementation, test creation, and documentation generation running in parallel)
Competitive research and market analysis reports (simultaneous collection and analysis from multiple sources)
Large-scale data processing pipelines (parallel aggregation and integration of multiple datasets)

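Implementation example (Python sketch that calls the Messages API through the Anthropic Python SDK):

import json
from concurrent.futures import ThreadPoolExecutor

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def parse_subtasks(plan_text: str) -> list[str]:
    """Assumed helper: the planner is asked to reply with a JSON array of strings."""
    subtasks = json.loads(plan_text)
    if not isinstance(subtasks, list):
        raise ValueError("Planner did not return a JSON array")
    return [str(subtask) for subtask in subtasks]


def worker_agent(subtask: str) -> str:
    """Worker: executes the assigned subtask"""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        system="You are a specialized agent that executes assigned subtasks accurately.",
        messages=[{"role": "user", "content": subtask}]
    )
    return response.content[0].text


def synthesize_results(results: list[str]) -> str:
    """Assumed helper: asks the orchestrator model to merge the worker outputs."""
    joined = "\n\n".join(f"Result {i + 1}:\n{text}" for i, text in enumerate(results))
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="You integrate worker results into a single coherent report.",
        messages=[{"role": "user", "content": joined}]
    )
    return response.content[0].text


def orchestrator_agent(task: str) -> str:
    """Orchestrator: decomposes the task and delegates to workers"""
    # The orchestrator plans the subtasks
    plan_response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="You are an expert in task decomposition. Reply with only a JSON array of subtask strings.",
        messages=[{"role": "user", "content": f"Decompose the following task into parallel subtasks: {task}"}]
    )
    subtasks = parse_subtasks(plan_response.content[0].text)

    # Delegate each subtask to a worker; the calls are I/O-bound, so threads run them in parallel
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(worker_agent, subtasks))

    # Integrate results
    return synthesize_results(results)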

Pattern 2: Pipeline

The Pipeline pattern is a sequential structure in which each agent passes its output as input to the next agent. The result of the upstream agent becomes the input for the downstream agent, and the processing is refined step by step. Each stage is responsible for a clearly defined transformation.

graph LR
  INPUT[Input] --> A1[Agent 1\nDraft creation]
  A1 --> A2[Agent 2\nFact-check & proofreading]
  A2 --> A3[Agent 3\nStyle normalization]
  A3 --> A4[Agent 4\nTranslation]
  A4 --> OUTPUT[Final output]

Suitable use cases:

Content production pipelines (drafting → fact-checking → proofreading → style normalization → translation)
Data transformation and ETL processing (extract → cleanse → transform → validate → load)
Code review pipelines (generation → static analysis → test creation → documentation generation)

Approach to defining agent roles at each stage:

Apply the Single Responsibility Principle to each stage’s agent. Limit each agent to one type of transformation, and define input and output formats clearly. This allows independent quality evaluation and replacement of each stage. The system prompt for each stage should explicitly state the flow: “receive the upstream output, perform the following processing, and produce output,” maintaining contextual continuity.

def run_pipeline(input_text: str) -> str:
    """Execute each pipeline stage sequentially"""
    stages = [
        ("Drafting Agent", "Create a draft on the given topic."),
        ("Proofreading Agent", "Proofread the following text and correct any factual errors."),
        ("Translation Agent", "Translate the following English text to Japanese."),
    ]

    current_content = input_text
    for stage_name, system_prompt in stages:
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": current_content}]
        )
        current_content = response.content[0].text

    return current_content

Pattern 3: Specialist Pool

The Specialist Pool pattern is a structure in which a router agent analyzes incoming input and routes processing to the most appropriate specialized agent. Each specialist agent is optimized for a specific domain or processing type and delivers higher accuracy than a general-purpose agent.

graph TD
  INPUT[Input] --> ROUTER[Router Agent]
  ROUTER --> |Technical question| SPEC_TECH[Technical Support Agent]
  ROUTER --> |Billing & payment| SPEC_BILL[Billing Support Agent]
  ROUTER --> |English| SPEC_EN[English-Language Agent]
  ROUTER --> |French| SPEC_FR[French-Language Agent]
  ROUTER --> |Other| SPEC_GEN[General-Purpose Agent]
  SPEC_TECH --> OUTPUT[Response]
  SPEC_BILL --> OUTPUT
  SPEC_EN --> OUTPUT
  SPEC_FR --> OUTPUT
  SPEC_GEN --> OUTPUT

Suitable use cases:

Customer support systems (technical questions, billing, complaints, and general inquiries handled by specialized agents)
Multilingual processing (detect the language and route to the appropriate native-language agent)
Legal, medical, and financial document analysis (place agents with domain expertise for each document type)

Routing logic design approaches:

There are three approaches to designing a router agent. The first is rule-based routing, which dispatches based on keywords or regular expressions. It is fast and predictable but struggles with complex inputs. The second is LLM-based routing, in which the router agent itself uses an LLM to understand the intent of the input and decide on the routing destination. It handles complex inputs flexibly but increases latency and cost. The third is hybrid routing, which first attempts rule-based routing at high speed and falls back to the LLM only when the result is uncertain. This approach is effective in most production environments.
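The hybrid approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the `KEYWORD_RULES` table, the function names, and the `llm_classify` callback are all hypothetical, and the callback stands in for whatever LLM-based router the system uses.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules; a real system would tune these per product.
KEYWORD_RULES = {
    "billing": re.compile(r"\b(invoice|refund|charge|payment)\b", re.IGNORECASE),
    "technical": re.compile(r"\b(error|crash|bug|install)\b", re.IGNORECASE),
}


def rule_based_route(user_input: str) -> Optional[str]:
    """Return a category only when exactly one rule matches, else None."""
    matches = [cat for cat, pattern in KEYWORD_RULES.items() if pattern.search(user_input)]
    return matches[0] if len(matches) == 1 else None


def hybrid_route(user_input: str, llm_classify: Callable[[str], str]) -> str:
    """Try the fast rules first; call the LLM only when the rules are uncertain."""
    category = rule_based_route(user_input)
    if category is not None:
        return category
    return llm_classify(user_input)
```

Here "uncertain" is taken to mean that zero rules or more than one rule matched. Other definitions, such as a confidence score from a lightweight classifier, would fit the same structure.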

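import json

def parse_category(reply_text: str) -> str:
    """Assumed helper: read "category" from the router's JSON reply."""
    try:
        return str(json.loads(reply_text)["category"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return "general"  # unparseable replies go to the general agent

def make_specialist(system_prompt: str):
    """Assumed helper: build a specialist bound to its own system prompt."""
    def specialist(user_input: str) -> str:
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024, system=system_prompt,
            messages=[{"role": "user", "content": user_input}]
        )
        return response.content[0].text
    return specialist

specialist_map = {
    "technical": make_specialist("You are a technical support specialist."),
    "billing": make_specialist("You are a billing and payment specialist."),
    "language_en": make_specialist("You are a support agent. Reply in English."),
    "language_fr": make_specialist("You are a support agent. Reply in French."),
    "general": make_specialist("You are a general-purpose support agent."),
}

def router_agent(user_input: str) -> str:
    """Router: analyzes input and routes to the appropriate specialist agent"""
    routing_response = client.messages.create(
        model="claude-haiku-4-5", max_tokens=128,
        system="""Analyze the input and return one appropriate category.
        Categories: [technical, billing, general, language_en, language_fr]
        Return in JSON format: {"category": "..."}""",
        messages=[{"role": "user", "content": user_input}]
    )
    category = parse_category(routing_response.content[0].text)
    return specialist_map.get(category, specialist_map["general"])(user_input)  # unknown -> general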

Pattern 4: Swarm

The Swarm pattern is a structure in which multiple autonomous agents collaborate to solve problems without a centralized orchestrator. Each agent shares state and can hand off processing to other agents as needed. The swarm as a whole achieves adaptive problem-solving.

graph TD
  SHARED["Shared State\n(context, progress, artifacts)"]

  A1[Agent 1\nCode analysis] <-->|handoff| A2[Agent 2\nTest generation]
  A2 <-->|handoff| A3[Agent 3\nDocumentation generation]
  A1 <-->|handoff| A3
  A1 --> SHARED
  A2 --> SHARED
  A3 --> SHARED
  SHARED --> A1
  SHARED --> A2
  SHARED --> A3

Suitable use cases:

Large-scale code refactoring (each agent independently analyzes and improves a module and writes results to the shared state)
Parallel research (multiple agents investigate different sources simultaneously and aggregate findings in shared state)
Autonomous software testing (test generation, execution, and debugging handled cooperatively by multiple agents)

Connection to Claude Code Level 10:

At Claude Code Level 10 (the advanced agent autonomy level), the Swarm pattern serves as the underlying architecture. When Claude Code spawns multiple sub-agents to perform large-scale repository refactoring or migration in parallel, each sub-agent holds an independent context while coordinating through a shared file system or state store. Understanding the Swarm pattern helps explain how Claude Code’s advanced autonomous behavior works.

class SwarmAgent:
    """Swarm agent: collaborates with other agents through shared state"""

    def __init__(self, name: str, specialization: str, shared_state: dict):
        self.name = name
        self.specialization = specialization
        self.shared_state = shared_state

    def process(self, task: str) -> str:
        # Snapshot the shared state so this agent sees the initial context and
        # every result other agents have published so far
        snapshot = dict(self.shared_state)
        context = "\n".join(f"{key}: {value}" for key, value in snapshot.items())

        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system=f"You are a specialized agent in {self.specialization}. Reference the shared state when working.",
            messages=[{
                "role": "user",
                "content": f"Shared state:\n{context}\n\nTask: {task}"
            }]
        )

        result = response.content[0].text
        # Publish the result under a per-agent key so agents do not overwrite each other
        self.shared_state[f"{self.name}_result"] = result
        return result
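The handoff side of the pattern, where each agent decides which peer should act next, can be sketched without any model calls. The `AgentStep` type and `run_swarm` function below are hypothetical names, and the small loop is only a transport for handoffs: it does no planning of its own, and a production swarm would more likely exchange handoffs through a queue or state store.

```python
from typing import Callable, Optional

# An agent step reads and updates the shared state, then returns the name of
# the agent to hand off to, or None when no further work is needed.
AgentStep = Callable[[dict], Optional[str]]


def run_swarm(agents: dict[str, AgentStep], start: str, shared_state: dict,
              max_handoffs: int = 10) -> list[str]:
    """Follow handoffs between agents until one stops or the limit is reached."""
    visited: list[str] = []
    current: Optional[str] = start
    while current is not None and len(visited) < max_handoffs:
        visited.append(current)
        current = agents[current](shared_state)
    return visited
```

The `max_handoffs` limit is a guard against two agents handing work back and forth indefinitely.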

Pattern Selection Framework

To choose the right pattern, evaluate the task across four axes: nature, dependencies, parallelism, and specialization.

graph TD
  START[Receive task] --> Q1{Can the task be\ndecomposed into\nsubtasks?}

  Q1 -->|No| SINGLE[Process with a single agent]
  Q1 -->|Yes| Q2{Are there dependencies\nbetween subtasks?}

  Q2 -->|Yes, sequential dependency| PIPELINE[Pipeline pattern]
  Q2 -->|No, independent| Q3{Is centralized\nmanagement needed?}

  Q3 -->|Yes| Q4{Is specialized\nrouting needed?}
  Q3 -->|No| SWARM[Swarm pattern]

  Q4 -->|Yes, dispatch-based| SPECIALIST[Specialist Pool pattern]
  Q4 -->|No, integration-based| ORCHESTRATOR[Orchestrator-Worker pattern]

Pattern selection criteria table:

Evaluation axis	Orchestrator-Worker	Pipeline	Specialist Pool	Swarm
Task nature	Requires split & integrate	Stepwise transformation	Type-based routing	Large-scale & complex
Subtask dependencies	Parallel, independent	Sequential, order-dependent	Independent (type-dependent)	Dynamic, adaptive
Parallelism	High (parallel workers)	Low (sequential)	Medium (per type)	High (all agents parallel)
Specialization	Medium (generic workers OK)	Medium (per-stage)	High (domain-specific)	Medium to high (adaptive)
Implementation complexity	Medium	Low	Low to medium	High
Recommended model scale	Opus (Orch) + Sonnet (Worker)	Sonnet for all stages	Haiku (Router) + specialized models	Sonnet for all agents
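
The decision flow in this section can also be written as a small function. This is a sketch that mirrors the flowchart branch by branch; the `TaskProfile` class and its field names are hypothetical and simply encode the four yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Answers to the four selection questions (hypothetical field names)."""
    decomposable: bool
    sequential_dependencies: bool
    needs_central_management: bool
    needs_specialized_routing: bool


def select_pattern(profile: TaskProfile) -> str:
    """Walk the decision flow top to bottom and return the pattern name."""
    if not profile.decomposable:
        return "Single agent"
    if profile.sequential_dependencies:
        return "Pipeline"
    if not profile.needs_central_management:
        return "Swarm"
    if profile.needs_specialized_routing:
        return "Specialist Pool"
    return "Orchestrator-Worker"
```

For example, a decomposable task with independent subtasks that needs central management but no type-based dispatch maps to Orchestrator-Worker.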

Implementation Considerations

Context Passing Design

How context is passed between agents directly affects the quality of a multi-agent system. The context provided to each agent should be “minimal yet sufficient.” Passing the entire context inflates token consumption, and irrelevant information degrades judgment accuracy. A summary pattern — extracting and summarizing only the necessary information from the upstream result before passing it to the next agent — is effective. Using schema-defined structured data (JSON) for inter-agent communication also prevents parsing errors and improves reliability.
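
One way to combine the summary pattern with schema-defined messages is a small typed handoff object. The sketch below is an assumption for illustration: the `Handoff` class and its three fields are hypothetical, and the validation is done by hand with the standard library rather than with a schema library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Minimal context passed to the next agent (hypothetical schema)."""
    task: str
    summary: str
    key_facts: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, payload: str) -> "Handoff":
        """Parse a message and reject anything that does not match the schema."""
        data = json.loads(payload)
        expected = {"task", "summary", "key_facts"}
        if not isinstance(data, dict) or set(data) != expected:
            raise ValueError("handoff must be an object with task, summary, key_facts")
        if not isinstance(data["task"], str) or not isinstance(data["summary"], str):
            raise ValueError("task and summary must be strings")
        facts = data["key_facts"]
        if not isinstance(facts, list) or not all(isinstance(f, str) for f in facts):
            raise ValueError("key_facts must be a list of strings")
        return cls(**data)
```

The upstream agent fills `summary` and `key_facts` instead of forwarding its full output, and the downstream agent calls `Handoff.from_json` so that a malformed message fails at the boundary rather than inside the next stage.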

Error Handling and Partial Failure Recovery

Design multi-agent systems with the assumption that individual agents may fail. In the Pipeline pattern, a failure in an upstream agent propagates to all downstream stages, making error detection and fallback processing at each stage essential. In the Orchestrator-Worker pattern, graceful degradation — tolerating partial worker failures and generating a final output from only the successful workers’ results — is important. Concretely, implement retry logic (with exponential backoff), timeout settings, and handling that treats failed worker results as “not retrieved.”
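
The retry and graceful-degradation ideas can be sketched as below. The function names and default values are hypothetical, the `worker` argument stands in for any function that calls a model, and timeouts are left out for brevity. Catching every `Exception` is also a simplification; real code would retry only on errors known to be transient.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def call_with_retry(fn: Callable[[], str], max_attempts: int = 3,
                    base_delay: float = 0.5) -> Optional[str]:
    """Run fn with exponential backoff; return None if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                return None  # the caller treats None as "not retrieved"
            time.sleep(base_delay * (2 ** attempt))  # delay doubles each retry
    return None


def run_workers(worker: Callable[[str], str], subtasks: list[str],
                base_delay: float = 0.5) -> dict[str, Optional[str]]:
    """Run all subtasks in parallel and keep going when some of them fail."""
    def attempt(subtask: str) -> Optional[str]:
        return call_with_retry(lambda: worker(subtask), base_delay=base_delay)

    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(attempt, subtasks))
    return dict(zip(subtasks, results))


def successful_results(results: dict[str, Optional[str]]) -> list[str]:
    """Graceful degradation: integrate only the results that were retrieved."""
    return [text for text in results.values() if text is not None]
```

The orchestrator can then pass `successful_results(...)` to its integration step and mention in the final output which subtasks could not be retrieved. Note that the result dictionary is keyed by subtask text, so this sketch assumes the subtasks are distinct.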

Cost and Latency Trade-offs

Multi-agent systems increase API call volume, so cost and latency need active management. The foundation of cost optimization is matching the model to the role: a fast, low-cost Claude Haiku for the router agent, and Claude Opus for agents that handle complex tasks. Result caching avoids recomputation for identical inputs, and parallel execution reduces overall latency; combining the two balances cost and speed. In production environments, monitor each agent’s execution time, cost, and accuracy, and keep optimizing against those measurements.
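
Role-based model selection and result caching can be combined in a thin wrapper. The sketch below is illustrative only: the `CachedAgent` class is hypothetical, the `call_model` callback (assumed signature `(model, prompt) -> text`) stands in for the actual API call, and the role-to-model mapping follows the recommendations on this page.

```python
from typing import Callable

# Role-to-model mapping based on the recommendations on this page.
MODEL_BY_ROLE = {
    "router": "claude-haiku-4-5",
    "worker": "claude-sonnet-4-5",
    "orchestrator": "claude-opus-4-5",
}


class CachedAgent:
    """Wraps a model call and reuses results for identical (role, prompt) inputs."""

    def __init__(self, call_model: Callable[[str, str], str]):
        self._call_model = call_model  # hypothetical: (model, prompt) -> text
        self._cache: dict[tuple[str, str], str] = {}
        self.api_calls = 0  # simple counter for cost monitoring

    def run(self, role: str, prompt: str) -> str:
        key = (role, prompt)
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._call_model(MODEL_BY_ROLE[role], prompt)
        return self._cache[key]
```

An in-memory dictionary only helps within one process and never expires entries, so a shared cache with expiry would be the likely next step in production. Caching also assumes that reusing an earlier answer for the same prompt is acceptable for the task.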

Summary

The four multi-agent patterns are:

Orchestrator-Worker: Splits a complex task, executes parts in parallel, and the central orchestrator integrates the results. Suitable for research report creation and complex coding tasks.
Pipeline: Each agent passes its output sequentially to the next, refining the result step by step. Suitable for content production and data transformation flows.
Specialist Pool: A router dispatches input to specialized agents based on the type of input. Suitable for customer support and multilingual processing.
Swarm: Autonomous agents collaborate without centralized management. Suitable for large-scale refactoring and parallel research.

In pattern selection, answering four questions in order — “Can the task be decomposed?”, “What direction do the dependencies flow?”, “Is centralized management needed?”, and “Is specialization required?” — leads to the optimal pattern.

See the references for the external specifications and background sources used on this page.[1][2]

References

[1] Anthropic, Claude Code documentation
[2] Anthropic, Claude API documentation
