ReAct: Reasoning and Acting in AI Agents
One of the most-cited patterns in the LLM-agent literature, introduced in 2022 and now a default building block for general-purpose agents.
A prompting pattern, introduced by Yao et al. in 2022, in which a large language model alternates between Thought steps (reasoning expressed in natural language) and Action steps (structured tool calls). After each action, the model observes the result and continues the loop until the goal is reached.
The key claim of the original paper is that interleaving reasoning with environment interaction outperforms either reasoning-only baselines (chain-of-thought) or acting-only baselines (direct tool use without intermediate reasoning) on multi-hop question answering, fact verification, and interactive decision-making benchmarks [Yao et al. 2022].
The loop
A ReAct agent runs the same three-step cycle until termination: emit a Thought, emit an Action, observe the result. The Thought is a natural-language reasoning trace; the Action is a structured tool call; the Observation is whatever the host application returns from executing the tool.
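The three-step cycle can be sketched in a few lines of Python. This is a minimal illustration, not a framework implementation: `call_model` is a hypothetical stand-in for a real LLM client, and the `TOOLS` registry holds stub tools.

```python
from typing import Callable

def call_model(transcript: str) -> tuple[str, str, str]:
    """Hypothetical LLM call: returns (thought, action_name, action_arg).

    A real implementation would prompt a model with the transcript and
    parse its Thought/Action output; this stub just scripts two turns."""
    if "Observation: 42" in transcript:
        return ("I have the answer.", "finish", "42")
    return ("I should look this up.", "lookup", "meaning of life")

# Stub tool registry for the sketch; real tools would call APIs, run SQL, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda query: "42",
}

def react_loop(task: str, max_iters: int = 8) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_iters):
        # Step 1-2: the model emits a Thought and an Action.
        thought, action, arg = call_model(transcript)
        transcript += f"Thought: {thought}\nAction: {action}({arg!r})\n"
        if action == "finish":
            return arg  # the model signals termination with a finish action
        # Step 3: the host executes the tool and appends the Observation.
        observation = TOOLS[action](arg)
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("iteration cap reached without a finish action")

print(react_loop("What is the meaning of life?"))  # → 42
```

The iteration cap in the signature is not incidental: without it, the loop has no upper bound on cost, which is the root of the infinite-thought trap discussed below.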
Worked example
A research-synthesis agent asked to summarise the open questions in the Reflexion line of work might produce a trace structurally like:
Thought: I need the Reflexion paper and any follow-up work that critiques it.
Action: search_papers("Reflexion language agent verbal reinforcement")
Observation: 6 results, top result is Shinn et al. 2023 (arXiv:2303.11366).
Thought: Read the paper to extract the central claim and the named limitations.
Action: read_paper("arXiv:2303.11366", sections=["abstract","limitations"])
Observation: Limitations section flags reward-signal noise on open-ended tasks.
Thought: Now find a follow-up that engaged with that limitation.
Action: search_citations("arXiv:2303.11366", filter="critique")
Observation: 3 follow-up papers, two within scope.
Thought: I have enough to summarise. Compose the final answer.
Action: finish(summary)
Trace shown in conceptual form. Real framework outputs (LangChain AgentExecutor, LangGraph, AutoGen) use varying serialisation formats.
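For illustration, the conceptual format above can be parsed with a small amount of code. This is a sketch for the notation used in this trace only; each real framework has its own serialisation and parser, and `parse_step` is a hypothetical helper.

```python
import re

# Matches lines of the form: Action: tool_name(arguments)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$")

def parse_step(block: str) -> dict:
    """Split one model turn into its thought and structured action."""
    thought, action_name, action_args = "", None, None
    for line in block.splitlines():
        if line.startswith("Thought:"):
            thought = line[len("Thought:"):].strip()
        elif m := ACTION_RE.match(line):
            action_name, action_args = m.group(1), m.group(2)
    return {"thought": thought, "action": action_name, "args": action_args}

step = parse_step(
    'Thought: Read the paper to extract the central claim.\n'
    'Action: read_paper("arXiv:2303.11366")'
)
# step["action"] == "read_paper"
```

In practice, structured function-calling APIs make this kind of free-text parsing unnecessary: the model returns the action as a typed object rather than a line to be regex-matched.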
When it works
The original paper benchmarked ReAct on HotPotQA (multi-hop QA), FEVER (fact verification), ALFWorld (text-based household tasks), and WebShop (web navigation). On HotPotQA and FEVER it outperformed chain-of-thought and acting-only baselines; on ALFWorld and WebShop it outperformed imitation-learning and reinforcement-learning baselines by absolute margins of 10-34 percentage points, depending on the task [Yao et al. 2022].
The pattern empirically suits tasks that combine three properties: the answer requires multiple steps; intermediate observations meaningfully change the next step; and a verbal reasoning trace materially helps the model select the right action. Multi-hop research, exploratory data analysis, browser-based task execution, and interactive coding agents all share this profile.
When it fails
The infinite-thought trap
The model emits Thought after Thought without ever calling an Action. This often appears when the prompt over-encourages reflection (“think step by step before acting”), when no available tool fits the task, or when the model decides early that it already knows the answer and refuses to verify. Mitigations: hard-cap the number of iterations, and instruct the model in the system prompt to emit an Action after every Thought.
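Both mitigations reduce to simple bookkeeping in the host loop. A sketch, assuming the caller tags each model turn as "thought" or "action" (the constants and helper are illustrative, not from any framework):

```python
MAX_ITERS = 10                 # hard cap on Thought-Action cycles
MAX_CONSECUTIVE_THOUGHTS = 2   # after this many thoughts, demand an action

def should_force_action(history: list[str]) -> bool:
    """True when the last turns were all thoughts and a tool call is overdue.

    `history` is a list of turn tags, e.g. ["thought", "action", "thought"].
    When this returns True, the host can inject a reminder message or
    constrain decoding so the next turn must be a tool call."""
    recent = history[-MAX_CONSECUTIVE_THOUGHTS:]
    return (len(recent) == MAX_CONSECUTIVE_THOUGHTS
            and all(tag == "thought" for tag in recent))
```

The hard cap (`MAX_ITERS`) bounds worst-case cost; the consecutive-thought check catches the trap early, before the budget is exhausted.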
Tool-call hallucination
The model invents a tool that does not exist (calling database_query when the actual tool is sql_query) or calls a real tool with malformed arguments. The reliability of structured tool calling has improved substantially across model generations, but tool-call hallucination remains a live engineering concern. Mitigations: strict schema validation, retry-with-error-feedback, and tool-use unit evals (see the evaluation reference).
Observation bias
The model over-weights its most recent observation and forgets earlier context, particularly on long traces. This is a manifestation of the lost-in-the-middle problem in long-context models. Mitigations include periodic summarisation of the trace and structured state objects that lift important data out of free-form prose.
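One way to combine both mitigations is to compact the trace periodically: collapse older steps into a summary while a structured state object carries key facts forward verbatim. A sketch, where `summarise` is a hypothetical LLM call and the state dict holds whatever the agent has confirmed so far:

```python
KEEP_RECENT = 3  # full detail retained for the last N steps

def summarise(steps: list[str]) -> str:
    """Hypothetical: a real version would ask an LLM to summarise the steps."""
    return f"[summary of {len(steps)} earlier steps]"

def compact_trace(steps: list[str], state: dict) -> list[str]:
    """Replace old steps with a summary plus a line of structured facts."""
    if len(steps) <= KEEP_RECENT:
        return steps
    head = summarise(steps[:-KEEP_RECENT])
    facts = "; ".join(f"{k}={v}" for k, v in state.items())
    return [head, f"Known facts: {facts}", *steps[-KEEP_RECENT:]]

trace = compact_trace(
    [f"step {i}" for i in range(6)],
    {"paper": "arXiv:2303.11366"},
)
# trace keeps the last 3 steps verbatim, preceded by a summary line
# and a structured-facts line
```

Lifting facts into the state dict is the important part: a confirmed identifier like an arXiv number survives compaction exactly, rather than depending on the summariser to repeat it.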
Long-chain degradation
Performance on ReAct traces with many iterations degrades faster than the underlying model’s context-length budget would suggest, even when the full trace fits in context. Tree-of-Thoughts (Yao et al. 2023) and Reflexion (Shinn et al. 2023) were both proposed in part as mitigations for this regime; see the reflection pattern for the latter.
Framework implementations
The major orchestration frameworks all expose ReAct-style execution as a primitive. The list below is for reference, not a ranked recommendation.
- LangGraph (LangChain) - graph-based state-machine model in which a ReAct loop is one type of node configuration.
- LangChain AgentExecutor (legacy as of 2026) - the original ReAct implementation on which much of the literature is benchmarked.
- AutoGen (Microsoft Research) - conversational multi-agent model; ReAct emerges as the inner loop of a single AssistantAgent.
- CrewAI - role-based crew model; each role can be configured with ReAct-style execution.
- OpenAI Agents SDK - handoff-based coordination; the inner loop is ReAct-style with structured function calls.
Relationship to other patterns
ReAct is the foundational pattern. Plan-and-execute is the explicit-planning alternative for tasks where the plan is stable. Reflection wraps an outer critique-and-revise loop around a ReAct (or any other) inner loop. Multi-agent systems often have ReAct as the inner loop of each specialist sub-agent. Tool use is the underlying mechanism that all of these patterns assume; see the tool-use reference.
Frequently asked questions
What does ReAct stand for?
ReAct is a contraction of Reasoning + Acting. The pattern was introduced by Yao et al. in 2022 in the paper “ReAct: Synergizing Reasoning and Acting in Language Models” (arXiv:2210.03629).
How is ReAct different from chain-of-thought prompting?
Chain-of-thought asks the model to reason in natural language before producing an answer; the model never interacts with an environment. ReAct interleaves the same kind of reasoning with action steps that affect the world (calling tools, querying APIs) and observations of the results, grounding subsequent reasoning in real outcomes rather than internal predictions.
When does ReAct beat plan-and-execute?
When the plan is not knowable up front. Multi-hop research, web navigation, and exploratory data analysis all reward step-by-step adaptation. Plan-and-execute commits to a plan in one LLM call; ReAct revises after every observation. The trade-off is cost: ReAct makes more LLM calls per task.
What is the infinite-thought trap?
A failure mode where a ReAct agent keeps emitting Thought traces without ever acting. The model reasons in circles, often when the prompt over-encourages reflection or when the available tools are insufficient for the task. Mitigations are an iteration cap (hard limit on Thought-Action cycles) and a system-prompt instruction that requires an action after each thought.
Related pages
- Patterns Index
- Plan-and-Execute - the explicit-planning alternative
- Reflection / Reflexion - the critique-and-revise wrapper
- Tool Use / Function Calling - the underlying mechanism
- Orchestration tooling - the frameworks that implement ReAct
Sources and Further Reading
- S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan and Y. Cao, ReAct: Synergizing Reasoning and Acting in Language Models, arXiv:2210.03629 (2022).
- N. Shinn et al., Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv:2303.11366 (2023).
- S. Yao et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv:2305.10601 (2023).
- L. Wang et al., A Survey on Large Language Model based Autonomous Agents, arXiv:2308.11432 (2023).
- LangGraph, Agentic concepts documentation.
- Microsoft AutoGen, documentation.
- Anthropic, Building effective agents (2024).