Notice: AgentCogito.com is an independent educational reference. We are not affiliated with any AI vendor, framework maintainer, or research group. This site does not offer products for sale, does not receive vendor sponsorship, and does not publish ranked product recommendations. Framework documentation links are provided for reader reference only.
III.1. Tooling / Orchestration
Last verified April 2026 - 9 sources

Orchestration Frameworks for AI Agents

A category reference. What orchestration frameworks provide, the vocabulary you need to evaluate them, and the major frameworks as of 2026 with documentation links rather than rankings.

What orchestration provides

An orchestration framework supplies the parts of an agent system that are not the language model. Seven capabilities recur across the major frameworks. Different frameworks emphasise different combinations.

  • State management. A typed object that persists across agent steps; the model reads it on each iteration to know what has already happened.
  • Retries and error handling. Tool failures, malformed JSON, transient API errors. The framework retries with backoff or escalates per a configured policy.
  • Observability hooks. Every LLM call, tool call, and sub-agent hop becomes a span in a trace. Without this, debugging a production agent is largely guesswork.
  • Human-in-the-loop interrupts. The ability to pause execution, surface the current state to a human, and resume after the human approves or modifies. Critical for high-stakes actions.
  • Streaming. Token-by-token output from the model, with framework-supported pass-through to the user interface.
  • Parallelism. Concurrent tool calls, parallel sub-agents, fan-out queries.
  • Persistence and checkpointing. The agent’s state is serialised so a restart resumes where it left off rather than starting over.
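The retry capability in the list above is the easiest to illustrate. A minimal sketch of retry-with-exponential-backoff in plain Python; `with_retries` and the simulated flaky tool are hypothetical stand-ins, not any framework's actual API:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.5):
    """Call fn(); on failure, retry with exponential backoff, then re-raise."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # escalate after the final attempt, per policy
            time.sleep(base_delay * (2 ** attempt))  # delay doubles each retry

# Usage: wrap a tool call that fails twice before succeeding.
calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

result = with_retries(flaky_tool, base_delay=0.01)  # succeeds on the third attempt
```

A real framework layers a configurable policy on top of this loop: which exceptions are retryable, how failures are surfaced to traces, and when to escalate to a human instead.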

Why you probably need it

The honest case for hand-rolling: a demo agent fits in a single Python file with a while-loop around an LLM call. The honest case for a framework: a production agent needs to recover after a process restart, report its token cost per task, let a human inspect and intervene at a critical step, and trace every call for debugging. Each of those capabilities is months of work to build well; frameworks ship them as defaults.
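The single-file demo agent mentioned above is roughly this shape. A sketch with a stubbed model; `fake_model`, the action dictionary format, and the tool table are illustrative placeholders for a real LLM client:

```python
def run_agent(model, tools, task, max_steps=10):
    """Minimal ReAct-style loop: ask the model, run the tool it names, repeat."""
    history = [task]
    for _ in range(max_steps):
        action = model(history)          # model decides the next step
        if action["type"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](action["input"])  # execute the tool
        history.append(result)           # feed the observation back in
    raise RuntimeError("step budget exhausted")

# Usage with a stub model that calls one tool, then finishes.
def fake_model(history):
    if len(history) == 1:
        return {"type": "tool", "tool": "double", "input": 21}
    return {"type": "finish", "answer": history[-1]}

answer = run_agent(fake_model, {"double": lambda x: x * 2}, "double 21")
# answer == 42
```

Everything a framework adds — checkpointing, retries, tracing, interrupts — wraps around this loop; the loop itself is the easy part.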

A working heuristic: if the agent will be deployed in front of users, run unattended overnight, or take actions that cannot be cheaply rolled back, use a framework. If it is a tool for the engineer who wrote it, hand-rolling is fine.

Category vocabulary

Terms that recur in framework documentation. Knowing them makes any framework’s docs readable in under an hour.

  • Graph vs sequential vs role-based execution. The shape of the agent runtime: a state graph, a fixed sequence, or a set of roles that exchange messages.
  • Stateful vs stateless workflows. Whether the framework persists state across executions or treats each run as fresh.
  • Checkpointing. Saving state at points where execution can resume. Often paired with a backing store (SQLite, Postgres, Redis).
  • Human-in-the-loop interrupt. A first-class pause point in execution where a human reviews the next action.
  • Typed vs untyped state. Whether the framework enforces a schema on the state object (e.g. Pydantic models) or treats state as a free dictionary.
  • Streaming vs batch. Whether the framework supports token-by-token output to the caller.
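The typed-vs-untyped distinction above can be shown with the standard library alone; a dataclass stands in here for the Pydantic models some frameworks use:

```python
from dataclasses import dataclass, field

# Untyped state: a free dictionary. Nothing stops a typo'd key.
state = {"messages": [], "steps": 0}
state["setps"] = 1  # silently creates a new key; the bug surfaces later

# Typed state: a schema the framework (and a type checker) can enforce.
@dataclass
class AgentState:
    messages: list = field(default_factory=list)
    steps: int = 0

typed = AgentState()
typed.steps += 1
# A typo'd attribute on AgentState would still run in plain Python,
# but a static type checker flags it, and Pydantic-style models
# reject unknown fields at runtime as well.
```

The trade is boilerplate against early failure: the untyped version is quicker to write, the typed version fails at the point of the mistake rather than several steps downstream.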

The major frameworks

A reference list, not a ranking. Each entry is a one-paragraph description of the framework’s execution model with a documentation link. None of these are recommended over the others on this page; selection depends on team fit, language preference, and existing stack.

LangChain ecosystem
LangGraph

Graph-based state-machine model. Each node is a function that reads and writes typed state; edges describe transitions. Strong support for human-in-the-loop interrupts and checkpointing.

documentation
Microsoft Research
AutoGen

Conversational multi-agent model. Agents exchange messages; a GroupChatManager controls turn order. Originated as a Microsoft Research project.

documentation
Independent open source
CrewAI

Role-based crew model. Each agent is configured with a role, a goal, and a backstory; processes are sequential or hierarchical. Python-first.

documentation
OpenAI
OpenAI Agents SDK

Handoff-based coordination. Agents pass control to other agents via explicit handoff calls. Successor to the experimental Swarm project.

documentation
LlamaIndex
LlamaIndex Agent Workflows

Retrieval-first framework with agent support. Workflow primitives suit retrieval-heavy and document-processing tasks.

documentation
Microsoft
Microsoft Semantic Kernel

Planner-and-skills model with first-class .NET, Java, and Python support. Strong fit for enterprise stacks running on Microsoft platforms.

documentation
Pydantic
Pydantic-AI

Type-driven agent framework with Pydantic validation throughout. Static-typing-friendly; the schema is the contract.

documentation
deepset
Haystack Agents (deepset)

Retrieval-centric pipeline model with agent support. Strong fit for RAG-first applications evolving toward agentic execution.

documentation

Choosing a framework (principles, not rankings)

Six questions, asked in this order, get most teams to a defensible choice without the buyer’s remorse common to category-shopping decisions.

  1. Does your team think in graphs or in conversations? Graphs are testable; conversations are flexible. Pick the model that matches your team’s mental model of the system.
  2. Is type-safety important? If you would pay a moderate boilerplate tax for compile-time guarantees on state shape, type-driven frameworks (LangGraph with typed state, Pydantic-AI) are worth the friction.
  3. What stack are you already on? A team running Microsoft .NET will integrate Semantic Kernel more cheaply than they will integrate a Python-first framework. A team on a Python LangChain stack will adopt LangGraph faster.
  4. How much production runtime is the framework targeting? Some frameworks emphasise development ergonomics; others emphasise production runtime characteristics. Check the deployment story before committing.
  5. What is the human-in-the-loop story? If the agent will take destructive actions, the framework’s interrupt model is the most consequential feature it ships.
  6. Is the framework actively maintained? The category is moving fast; an unmaintained framework is a liability. Check release cadence, open-issue triage, and breaking-change policy.
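The interrupt model in question 5 can be sketched in plain Python: pause before a destructive action, surface it to a reviewer, and proceed only on approval. `guarded_execute`, the verb list, and `approve` are hypothetical stand-ins for whatever review channel a real framework exposes:

```python
def guarded_execute(action, approve, destructive=("delete", "send", "pay")):
    """Run the action immediately unless it is destructive; then ask a human first."""
    if action["verb"] in destructive:
        if not approve(action):      # pause point: human sees the pending action
            return {"status": "rejected", "action": action}
    return {"status": "done", "result": action["run"]()}

# Usage: an auto-approving reviewer, for the example's sake.
reviewed = []
def approve(action):
    reviewed.append(action["verb"])  # in practice: persist state, await a decision
    return True

out = guarded_execute({"verb": "delete", "run": lambda: "row removed"}, approve)
```

What a framework adds beyond this gate is the hard part: serialising the paused state, surviving a process restart while waiting, and resuming exactly where execution stopped.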

This site does not publish a ranked recommendation. The category churns too quickly for any ranking to remain accurate.

Frequently asked questions

Do I need an orchestration framework for an agent?

For a demo, no; a few hundred lines of Python wrapping an LLM call is enough to run a basic ReAct agent. For production, a framework usually pays for itself: state checkpointing on long tasks, retry logic on tool failure, observability hooks, human-in-the-loop interrupts, and persistence are painful to hand-roll. Most teams find that the time saved on infrastructure exceeds the time spent learning the framework.
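Of the capabilities listed in that answer, persistence is the one teams most often underestimate. A bare-bones checkpoint store takes only the standard library, as sketched below — the table schema is made up for illustration — but a framework adds versioning, pruning, and a recovery policy on top:

```python
import json
import sqlite3

def save_checkpoint(db, task_id, state):
    """Serialise agent state so a restart can resume instead of starting over."""
    db.execute(
        "INSERT OR REPLACE INTO checkpoints (task_id, state) VALUES (?, ?)",
        (task_id, json.dumps(state)),
    )
    db.commit()

def load_checkpoint(db, task_id):
    """Return the saved state for task_id, or None if no checkpoint exists."""
    row = db.execute(
        "SELECT state FROM checkpoints WHERE task_id = ?", (task_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Usage with an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (task_id TEXT PRIMARY KEY, state TEXT)")
save_checkpoint(db, "task-1", {"step": 3, "messages": ["hi"]})
resumed = load_checkpoint(db, "task-1")
```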

What is the difference between graph-based and conversational frameworks?

Graph-based frameworks (LangGraph) model an agent as a state machine where each node is a function and edges are transitions. The execution path is deterministic given the state. Conversational frameworks (AutoGen) model an agent as a participant in a multi-turn chat; the next speaker is chosen by a manager or a routing rule. Graph-based suits workflows with clear states; conversational suits free-form delegation between specialists.
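The graph-based half of that distinction can be illustrated without any framework: nodes are functions over state, and an edge function picks the successor. A toy executor, not LangGraph's actual API:

```python
def run_graph(nodes, edges, state, entry):
    """Walk a state machine: each node updates state, edges choose the next node."""
    current = entry
    while current is not None:
        state = nodes[current](state)    # node: state -> new state
        current = edges[current](state)  # edge: state -> next node name, or None
    return state

# Usage: a two-node graph that drafts, then revises until the text is long enough.
nodes = {
    "draft":  lambda s: {**s, "text": "hi"},
    "revise": lambda s: {**s, "text": s["text"] + "!"},
}
edges = {
    "draft":  lambda s: "revise",
    "revise": lambda s: None if len(s["text"]) >= 4 else "revise",
}
final = run_graph(nodes, edges, {}, entry="draft")
# final["text"] == "hi!!"
```

Note the determinism the answer describes: given the same state, the edge functions always pick the same path, which is what makes graph-based agents straightforward to test.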

Is graph-based always better than conversational?

No. Graphs are easier to test and audit; conversations are easier to extend. A team that values determinism and inspectability will lean graph-based; a team that values flexibility and rapid iteration will lean conversational. Both are mature in 2026; the choice is a team-fit decision rather than a technical-superiority decision.

Can I switch frameworks later?

Switching is possible but expensive. The system prompts, the tool definitions, the evaluation harness, and most of the code transfer; the orchestration scaffolding does not. If you anticipate switching, keeping the prompts and tools in framework-agnostic form (separate files, not framework objects) makes the migration substantially cheaper.
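Keeping prompts in plain files rather than framework objects can look like the sketch below; the directory and filename are illustrative, and a temporary directory stands in for a prompts/ folder in the repository:

```python
import tempfile
from pathlib import Path

PROMPT_DIR = Path(tempfile.mkdtemp())  # stands in for a prompts/ dir in the repo

def load_prompt(name: str, **vars) -> str:
    """Read a prompt template from disk and fill in variables with str.format."""
    template = (PROMPT_DIR / f"{name}.txt").read_text()
    return template.format(**vars)

# Usage: write a template, then render it. Any framework can consume the string.
(PROMPT_DIR / "planner.txt").write_text("You are a planner. Task: {task}")
rendered = load_prompt("planner", task="summarise the report")
```

Because `load_prompt` returns a plain string with no framework imports, the prompt survives a migration untouched; only the code that feeds the string into the orchestration layer needs rewriting.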


Sources and Further Reading

  1. LangGraph, documentation.
  2. Microsoft AutoGen, documentation.
  3. CrewAI, documentation.
  4. OpenAI Agents SDK, repository.
  5. LlamaIndex, documentation.
  6. Microsoft, Semantic Kernel documentation.
  7. Pydantic-AI, documentation.
  8. deepset Haystack, documentation.
  9. Anthropic, Building effective agents (2024).