Notice: AgentCogito.com is an independent educational reference. We are not affiliated with any AI vendor, framework maintainer, or research group. This site does not offer products for sale, does not receive vendor sponsorship, and does not publish ranked product recommendations. Framework documentation links are provided for reader reference only.
Last verified April 2026 - 7 sources

Multi-Agent Systems: Orchestration Topologies

When one agent is not enough. Four canonical topologies, the specialisation argument, the context-window economics, and the failure modes that recur in production.

Definition
Multi-agent system

Any system in which two or more agent processes coordinate to accomplish a goal. Each agent has its own LLM context, its own tool set, and (in most production designs) a specialised role: planner, researcher, coder, reviewer, summariser. The agents communicate by message-passing, by reading and writing a shared state, or by some combination.
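The definition above can be made concrete with a minimal sketch. Everything here is illustrative: the `Agent` class, role strings, and tool names are invented for the example, and the `receive` method stands in for a real LLM call. The point is only that each agent holds its own private context and coordinates by message-passing.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A stub agent: its own role, its own tool set, its own private context."""
    name: str
    role: str
    tools: list
    context: list = field(default_factory=list)  # this agent's private LLM context

    def receive(self, sender: str, content: str) -> str:
        # In a real system this would call an LLM with self.context plus the message.
        self.context.append({"from": sender, "content": content})
        return f"{self.name} ({self.role}) handled: {content}"

planner = Agent("planner", "decompose the goal", tools=[])
coder = Agent("coder", "write code", tools=["python_repl"])

# Message-passing: the planner's output becomes the coder's input.
plan = planner.receive("user", "build a CSV report")
result = coder.receive("planner", plan)
```

Note that neither agent can see the other's `context` — that isolation is what distinguishes a multi-agent system from one agent with several personas in a single prompt.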

Four canonical topologies

The space of multi-agent topologies is large, but four shapes recur across the literature and the framework documentation. Each has different communication overhead, different failure profiles, and different debugging characteristics.

[Diagram: supervisor SUP delegating to sub-agents A, B, C]
Supervisor / Hierarchical

One supervisor agent delegates to specialist sub-agents and aggregates results. LangGraph's supervisor pattern and AutoGen's GroupChatManager are the canonical implementations.
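The supervisor shape reduces to a small routing loop. In this sketch, `route_fn` stands in for the supervisor's LLM call that picks the next specialist (or decides to stop), and the specialists are plain functions; none of these names come from any framework.

```python
def supervise(goal, specialists, route_fn, max_hops=5):
    """Delegate to specialists until route_fn returns 'DONE' or the hop limit hits."""
    results = []
    task = goal
    for _ in range(max_hops):               # hop limit guards against loops
        choice = route_fn(task, results)    # stand-in for the supervisor's LLM call
        if choice == "DONE":
            break
        task = specialists[choice](task)    # specialist consumes and returns the task
        results.append((choice, task))
    return results

specialists = {
    "researcher": lambda t: f"notes on: {t}",
    "writer": lambda t: f"draft from: {t}",
}

def route_fn(task, results):
    # A deterministic stand-in: research, then write, then stop.
    order = ["researcher", "writer", "DONE"]
    return order[min(len(results), 2)]

trace = supervise("summarise topic X", specialists, route_fn)
```

The aggregation step lives with the supervisor (here, the `results` list), which is why this topology is the easiest of the four to debug: there is one place to look.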

[Diagram: peer mesh of agents A, B, C, D]
Peer / Mesh

Agents communicate as equals; no single orchestrator. CrewAI Crew with no manager role, AutoGen peer chat. Higher coordination cost; useful when no obvious supervisor structure exists.

[Diagram: pipeline A → B → C → END]
Pipeline / Sequential

Agents arranged in a fixed sequence. Each consumes the previous agent's output. Predictable and easy to audit; loses the flexibility advantages of dynamic delegation.
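The pipeline topology is the simplest to sketch: a fold over a fixed list of stages, where each stage stands in for one agent's LLM call (the stage functions here are invented placeholders).

```python
from functools import reduce

# Each stage is a plain function standing in for one agent's LLM call.
stages = [
    lambda text: text.strip().lower(),   # "cleaner" agent
    lambda text: f"summary({text})",     # "summariser" agent
    lambda text: f"reviewed[{text}]",    # "reviewer" agent
]

def run_pipeline(stages, user_input):
    """Fixed sequence: each agent consumes the previous agent's output."""
    return reduce(lambda acc, stage: stage(acc), stages, user_input)

out = run_pipeline(stages, "  RAW Input ")
# out == "reviewed[summary(raw input)]"
```

Auditability falls out of the structure: to explain any output, replay the stages in order.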

[Diagram: agents A–E reading and writing a shared blackboard]
Blackboard / Shared-Memory

Agents read from and write to a shared data structure rather than messaging directly. The blackboard is the coordination mechanism. Common when no fixed sequence exists.
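A blackboard sketch, with a plain dict as the shared structure and two stub agents that act only when the board gives them something to do (all field names and the stop condition are illustrative):

```python
# Shared state: the blackboard is the only coordination mechanism.
blackboard = {"goal": "plan a launch", "facts": [], "draft": None}

def researcher(bb):
    """Contribute a fact if none exist yet; return True if progress was made."""
    if not bb["facts"]:
        bb["facts"].append("fact: market size is large")  # stub finding
        return True
    return False

def writer(bb):
    """Write a draft once facts are available; return True if progress was made."""
    if bb["facts"] and bb["draft"] is None:
        bb["draft"] = f"Draft using {len(bb['facts'])} fact(s)"
        return True
    return False

agents = [researcher, writer]
# Opportunistic control: keep cycling until no agent can make progress.
while any(agent(blackboard) for agent in agents):
    pass
```

There is no fixed sequence here — each agent fires when the board's state matches its precondition, which is exactly why the pattern suits tasks with no obvious ordering.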

Specialisation vs generality

Splitting a task across multiple specialist agents helps when the specialists genuinely have different capabilities (different tool sets, different system prompts, different model choices), and when the task naturally decomposes into independent sub-tasks. It hurts when the agents largely duplicate each other’s reasoning, when the coordination overhead exceeds the parallelism gain, or when the boundary between agents creates unclear responsibility for end-to-end correctness.

A working test: if you can describe each agent’s role in one sentence and that sentence is meaningfully different from every other agent’s role, the decomposition is probably worth it. If the roles overlap, consolidate.

Coordination mechanics

Three coordination mechanisms appear across implementations. Message-passing is the most common: each agent sees the messages directed to it and emits messages to others. Shared state (a blackboard or a typed state object) is preferred when many agents need access to the same data and direct messaging would be noisy. Turn-taking protocols (round-robin, supervisor-decided next-speaker, voting) determine who acts next when more than one agent is eligible.
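Of the turn-taking protocols, round-robin is the simplest to show. This sketch rotates through a hypothetical speaker list and records turns in a transcript; the speaker names and the "speaks on turn" payload are stand-ins for real agent output.

```python
from itertools import cycle

def round_robin(speakers, transcript, turns):
    """Round-robin turn-taking: each eligible agent acts in fixed rotation."""
    order = cycle(speakers)
    for _ in range(turns):
        speaker = next(order)
        transcript.append((speaker, f"{speaker} speaks on turn {len(transcript)}"))
    return transcript

log = round_robin(["planner", "coder", "reviewer"], [], 4)
# log[3][0] == "planner" — the rotation wraps after the third speaker
```

Supervisor-decided next-speaker selection replaces `next(order)` with an LLM call; voting replaces it with a tally over candidate agents. The transcript structure stays the same.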

Failure handling is harder than in single-agent systems. When one agent stalls, fails, or returns garbage, the others need a way to detect this and either retry, route around, or escalate. Production multi-agent systems typically have an explicit timeout per agent and a supervisor or watchdog process that monitors progress.
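The per-agent timeout can be sketched with a thread pool. The agent functions here are stubs (one fast, one deliberately stalled), and returning `None` on timeout is an assumed convention — a real system would carry a richer failure signal for the watchdog to act on.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def call_with_timeout(agent_fn, task, timeout_s=2.0):
    """Run one agent call under an explicit timeout; None signals a stall."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(agent_fn, task)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return None  # caller retries, routes around, or escalates
    finally:
        pool.shutdown(wait=False)

fast_agent = lambda t: f"done: {t}"
slow_agent = lambda t: time.sleep(0.5) or f"late: {t}"  # stalled stub

ok = call_with_timeout(fast_agent, "task A")           # "done: task A"
failed = call_with_timeout(slow_agent, "task B", 0.1)  # None after 0.1 s
```

The supervisor-or-watchdog pattern is this function wrapped around every delegation, plus a policy table for what to do with each `None`.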

The context-window economics

A pragmatic argument for multi-agent systems is rarely advertised: splitting work across agents is often the cheapest way to keep each LLM call inside the context window where models perform reliably. A single 300,000-token prompt to one agent is more error-prone and more expensive than five 60,000-token prompts to five specialists, even though the total token count is comparable. Models routinely degrade in long-context performance even when the full prompt nominally fits.

This means many production multi-agent designs are not motivated by “different specialists” in any sociological sense. They are motivated by context-window engineering. The sociological framing is downstream of the engineering reality.

Failure modes

  • Infinite delegation loops. Agents pass the same task back and forth without terminating. Mitigation: hop limits, an explicit termination criterion at the supervisor.
  • Context drift. Downstream agents lose earlier agents’ reasoning. The fifth agent in a pipeline does not remember why the first agent made an early decision. Mitigation: structured state objects that propagate key decisions explicitly.
  • The “everyone is polite” problem. Reviewer agents tend to agree with author agents. The literature on LLM-as-judge consistently reports that critique-only roles produce weaker disagreement than expected. Mitigation: explicit instruction to critique adversarially, or use a different model for the critic role.
  • Coordination overhead. Communication tokens between agents can outweigh the per-agent reasoning tokens. Mitigation: structured messages instead of free-form chat, agent-to-agent protocols rather than broadcast.
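The hop-limit mitigation for the first failure mode above can be sketched directly. The envelope-style `hops` counter and the `("escalated", ...)` outcome are invented conventions; the two handlers are deliberately pathological, each believing the other is better suited.

```python
MAX_HOPS = 4

def delegate(task, current, handlers, hops=0):
    """Follow delegations until an agent accepts the task or the hop limit hits."""
    if hops >= MAX_HOPS:
        return ("escalated", task, hops)  # break the A→B→A→B ping-pong
    nxt = handlers[current](task)         # handler names the next agent, or None
    if nxt is None:
        return ("done", task, hops)       # current agent accepted the task
    return delegate(task, nxt, handlers, hops + 1)

# Without the hop limit, these two agents delegate to each other forever.
handlers = {"A": lambda t: "B", "B": lambda t: "A"}
outcome = delegate("ambiguous task", "A", handlers)
# outcome == ("escalated", "ambiguous task", 4)
```

Escalation rather than silent failure matters: the supervisor (or a human) sees that the decomposition itself was ambiguous, which is the real bug.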

Framework implementations

  • LangGraph supervisor pattern - graph-based supervisor that routes to specialist sub-graphs.
  • AutoGen GroupChatManager - conversational multi-agent with configurable next-speaker selection.
  • CrewAI Crew - role-based crew model with sequential or hierarchical processes.
  • OpenAI Swarm / Agents SDK handoffs - handoff-based coordination between specialist agents.
  • Anthropic multi-agent samples - reference implementations published with the Claude Agent SDK.

Frequently asked questions

When does a multi-agent system beat a single-agent ReAct loop?

When the task naturally decomposes into specialist sub-tasks (different tool sets, different system prompts, different context-window budgets), or when sub-tasks can run in parallel. Splitting one large agent into several smaller specialists lets each operate inside a smaller, cleaner context window, which often improves reliability. The cost is coordination overhead: agents have to communicate, and that communication is itself token-spend.

What is the most common topology in production?

Supervisor (hierarchical). One supervisor agent receives the user goal, decides which specialist sub-agent to delegate to, and aggregates the results. LangGraph supervisor patterns and AutoGen GroupChatManager both implement this shape. Peer/mesh topologies appear in research papers but are rarer in production systems because the lack of a single decision-maker makes failure handling harder.

Are multi-agent systems just a way around context-window limits?

Frequently, yes. The pragmatic argument for multi-agent is often economic: splitting a 200,000-token task across four 50,000-token agents costs roughly the same in tokens but is more reliable, since each sub-agent operates inside a context size where models perform best. The architectural framing “specialist agents collaborating” and the engineering reality “context-window economics” arrive at similar designs.

What is an infinite delegation loop?

A failure mode where two agents pass the same task back and forth indefinitely. Agent A delegates to Agent B; Agent B decides Agent A is better suited and delegates back. Mitigations: hop limits on delegation, an explicit termination criterion at the supervisor, or a tie-breaking rule that prevents loops.


Sources and Further Reading

  1. Q. Wu et al., AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, arXiv:2308.08155 (2023).
  2. L. Wang et al., A Survey on LLM-based Autonomous Agents, arXiv:2308.11432 (2023).
  3. Z. Xi et al., The Rise and Potential of LLM-Based Agents, arXiv:2309.07864 (2023).
  4. LangGraph, multi-agent concepts.
  5. Microsoft AutoGen, documentation.
  6. CrewAI, documentation.
  7. OpenAI Agents SDK, repository.