How AI Agents Work: The Four-Stage Loop and Its Components
A reference treatment of the perceive-reason-act-evaluate cycle and the five components every working LLM agent contains, sourced to foundational papers and public framework documentation.
The four-stage loop
Every LLM agent that does useful work runs the same skeleton loop. The naming varies (perceive-think-act, observe-reason-act-reflect, sense-plan-act-evaluate) but the structure is the four-stage cycle described below. The loop terminates when the goal is reached, an iteration cap or token budget is exhausted, or the agent reports failure.
Perceive
The agent reads its inputs: the user goal, the tools available, the conversation history, retrieved context (vector-store hits, knowledge-base entries), and any prior step results. In a chatbot turn, perception is just the new message. In an agent loop, it is the structured state object the orchestration framework hands the language model on each iteration.
Reason
The language model chooses the next step. Depending on the architectural pattern, this might be a free-form Thought trace (ReAct), an explicit plan emitted in JSON (planner-executor), or a structured tool call (function calling). The reasoning step is where the model translates the goal and the current state into a concrete next action.
Act
The host application executes the action the model chose. For a tool call, this means invoking an API, running code in a sandbox, querying a database, or reading a file. For a multi-agent system, the action might be delegating to a sub-agent. The result, whatever it is, gets serialised and added to the agent’s state for the next perception step.
Evaluate
The agent (or the orchestration framework around it) decides whether the goal has been met. In simple loops, this is a string match on the model’s output. In production systems, it can be a separate LLM call (LLM-as-judge), a deterministic validator (the schema check passed, the test suite ran green), or a human review step. If the goal is not yet met, the loop returns to perception with the new state.
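The skeleton below sketches the four stages as a Python loop. The `llm.decide` interface and the `tools` registry are hypothetical placeholders, not any specific framework's API; real implementations differ in how they encode state, actions, and termination.

```python
MAX_ITERATIONS = 10

def run_agent(goal: str, llm, tools: dict) -> str:
    state = {"goal": goal, "history": []}            # working state
    for _ in range(MAX_ITERATIONS):                  # iteration cap
        # Perceive: assemble everything the model sees this turn.
        context = {"goal": state["goal"], "history": state["history"]}

        # Reason: the model chooses the next action, or declares the goal met.
        decision = llm.decide(context)               # hypothetical call

        if decision["type"] == "finish":
            return decision["answer"]

        # Act: the host executes the chosen tool and captures the result.
        tool = tools[decision["tool"]]
        result = tool(**decision["arguments"])

        # Evaluate: in this minimal loop, evaluation is simply recording the
        # observation so the model sees it on the next perception step.
        state["history"].append({"action": decision, "observation": result})

    raise RuntimeError("iteration cap exhausted without reaching the goal")
```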
Five core components
Anthropic’s Building effective agents describes a working agent as a language model in a loop with tools and a way to remember state. The same breakdown appears across LangGraph, AutoGen, and CrewAI documentation, with five recurring components.
1. The reasoning engine (LLM)
The foundation model provides three capabilities the agent depends on: instruction-following (the model takes a system prompt and adheres to it), tool selection (the model picks the right tool for a sub-task), and structured output (the model emits arguments in a parseable form). The major model families practitioners use as of 2026 are Anthropic Claude, OpenAI GPT, Google Gemini, Meta Llama, Mistral, and DeepSeek. They differ on tool-use reliability, context-window size, latency, and cost; each provider publishes its own benchmark results openly, and independent leaderboards track them (see the companion site benchmarkingagents.com).
2. Tools
A tool is any callable function the agent can invoke. Practitioners expose tools to the model with a name, a description, and a parameter schema; the model decides when to call which tool, and the host code executes the call. The two prevailing approaches are structured function-calling (OpenAI-style, the model emits JSON; the host parses and dispatches) and free-form tool use (the model emits a natural-language Action trace; the host parses with regex or constrained decoding). The tool-use pattern reference covers the trade-offs in depth.
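A minimal sketch of the structured approach, assuming an OpenAI-style JSON schema and an illustrative model output format; the exact wire format varies by provider, and the `get_weather` tool here is a stub.

```python
import json

def get_weather(city: str) -> str:
    """The actual callable the host executes (stubbed result)."""
    return f"18°C and overcast in {city}"

# The schema the model sees: name, description, parameter shape.
WEATHER_TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

DISPATCH = {"get_weather": get_weather}

def handle_tool_call(model_output: str) -> str:
    """Parse the JSON the model emitted and run the matching function."""
    call = json.loads(model_output)   # e.g. '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
    fn = DISPATCH[call["name"]]
    return fn(**call["arguments"])
```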
3. Memory
Agents need to retain state across steps. The recurring taxonomy is three-tiered: short-term (the current context window, ephemeral), long-term (persistent records of past interactions, often a structured log), and semantic (retrievable knowledge, typically stored in a vector database). RAG sits in the semantic tier. The memory tooling reference covers each tier and the failure modes specific to it.
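The sketch below illustrates the three tiers in Python. The vector-store interface (`search`) is a hypothetical stand-in for whatever retrieval backend the system actually uses.

```python
from collections import deque

class AgentMemory:
    def __init__(self, vector_store, window_size: int = 20):
        self.short_term = deque(maxlen=window_size)  # context window, ephemeral
        self.long_term = []                          # persistent interaction log
        self.semantic = vector_store                 # retrievable knowledge (RAG)

    def record(self, event: dict) -> None:
        self.short_term.append(event)   # visible on the next model call
        self.long_term.append(event)    # survives across sessions

    def recall(self, query: str, k: int = 3) -> list:
        # Semantic tier: retrieve the k entries most relevant to the query.
        return self.semantic.search(query, top_k=k)  # hypothetical interface
```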
4. Planning
Planning is how the agent decomposes a goal into ordered sub-tasks. ReAct does this implicitly, one step at a time. Planner-executor does it explicitly, emitting a full plan up front and then executing each step. Multi-agent systems often have a supervisor whose only job is planning. The patterns index covers the design space.
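A condensed planner-executor sketch, assuming a hypothetical `llm.complete` call that returns the plan as JSON; production planners add plan validation and re-planning on step failure.

```python
import json

def plan_and_execute(goal: str, llm, tools: dict) -> list:
    # Planner: one model call that decomposes the goal into ordered sub-tasks.
    plan = json.loads(llm.complete(
        f"Decompose this goal into a JSON list of steps, each with a "
        f"'tool' and 'arguments' field: {goal}"
    ))

    # Executor: run each step in order; each result is recorded alongside
    # the step that produced it.
    results = []
    for step in plan:
        output = tools[step["tool"]](**step["arguments"])
        results.append({"step": step, "output": output})
    return results
```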
5. Self-evaluation
The reflection family of patterns gives the agent a structured way to critique its own output and revise. Self-Refine (Madaan et al. 2023) is single-model iterative improvement; Reflexion (Shinn et al. 2023) maintains an explicit memory of past failures. Reflection raises token cost in exchange for higher first-pass quality on reasoning-heavy tasks. The reflection pattern reference covers the family.
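A minimal sketch of single-model iterative refinement in the spirit of Self-Refine, with `llm.complete` as a hypothetical placeholder; the published method uses structured feedback dimensions rather than the bare APPROVED marker shown here.

```python
def self_refine(task: str, llm, max_rounds: int = 3) -> str:
    draft = llm.complete(f"Solve: {task}")
    for _ in range(max_rounds):
        # Critique pass: the same model reviews its own output.
        critique = llm.complete(
            f"Critique this answer to '{task}'. Reply APPROVED if it "
            f"needs no changes.\n\n{draft}"
        )
        if "APPROVED" in critique:
            break
        # Revision pass: incorporate the critique into a new draft.
        draft = llm.complete(
            f"Revise the answer to '{task}' using this critique:\n"
            f"{critique}\n\nOriginal answer:\n{draft}"
        )
    return draft
```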
The orchestration layer
Production agents almost always run inside an orchestration framework. The framework supplies the parts of the system that are not the language model: state management between steps, retry logic on tool failure, observability hooks (every LLM call and tool call becomes a span in a trace), checkpointing for long-running tasks, human-in-the-loop interrupts, and streaming. A demo agent can be a hundred-line Python script. A production agent that survives restart, reports its cost, and lets a human pause execution at a critical step needs an orchestration layer underneath.
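To make one of those concerns concrete, the sketch below hand-rolls retry-with-backoff around a tool call; a framework bundles the equivalent alongside checkpointing, tracing, and interrupts, which is why hand-rolling the full set rarely pays.

```python
import time

def call_with_retry(tool, arguments: dict, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**arguments)
        except Exception as exc:
            if attempt == max_attempts:
                raise                      # surface the failure to the loop
            backoff = 2 ** attempt         # 2s, then 4s
            print(f"tool failed ({exc}); retrying in {backoff}s")
            time.sleep(backoff)
```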
The major frameworks practitioners use as of 2026, with documentation links, are catalogued in the orchestration tooling reference. This site does not rank them.
What changed to make this practical
Three developments matured in parallel and moved agents from research demos to production deployments. Each is verifiable from public sources rather than analyst reports.
The first is reliable structured tool calling. OpenAI shipped function calling in mid-2023; Anthropic followed with native tool use; Google added function calling to Gemini. The reliability of the JSON arguments these APIs return rose sharply across model generations, as documented in each provider’s changelog and benchmarked publicly on tau-bench, BFCL, and similar tool-use evaluations.
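One reason structured arguments matter: the host can reject a malformed call deterministically before executing anything. A sketch using the jsonschema package, with an illustrative schema matching the earlier weather tool:

```python
import json
import jsonschema

ARGUMENT_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
    "additionalProperties": False,
}

def parse_arguments(raw: str) -> dict:
    args = json.loads(raw)                       # raises on malformed JSON
    jsonschema.validate(args, ARGUMENT_SCHEMA)   # raises on the wrong shape
    return args                                  # safe to dispatch
```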
The second is context-window growth. Production-grade agents routinely run with 200,000-token contexts as of 2026; some providers offer million-token contexts. This changed what an agent could keep in working memory without retrieval.
The third is the publication of mature orchestration frameworks. LangGraph (LangChain), Microsoft AutoGen, CrewAI, OpenAI’s Agents SDK, and Anthropic’s Claude Agent SDK each shipped stable releases in 2024-2025. The release histories, including breaking-change notes, are public on each project’s repository.
Where the public literature contains a quantitative claim that cannot be sourced to one of these channels, this site omits it. The Stanford AI Index Report’s technical-performance and cost chapters remain the cleanest publicly accessible reference for capability and cost trends over time.
Frequently asked questions
What is the basic loop an AI agent runs?
Most production AI agents implement a variation of the same four-stage cycle: perceive (read inputs and the current state), reason (decide a next step using the language model), act (call a tool, query an API, write a file), and evaluate (check whether the result meets the goal). The loop repeats until the goal is reached, an iteration cap is hit, or the agent reports failure. The Russell and Norvig textbook treatment of intelligent agents (4th ed., Ch. 2) frames this as the perceive-act cycle; modern LLM agents add the reason and evaluate steps explicitly.
What are the core components of an AI agent?
Five components recur in the literature and in framework documentation: a language model that performs reasoning, a set of callable tools, a memory subsystem, a planning mechanism (explicit or implicit), and an evaluation step. Anthropic’s public guidance on building effective agents and the LangGraph documentation describe the same five-component breakdown using slightly different vocabulary.
Is an agent the same as a workflow that calls an LLM?
No. The distinction Anthropic draws in Building effective agents is between workflows (where the language model is one node in a pre-written sequence) and agents (where the language model itself decides which steps to take and in what order). A pipeline that calls an LLM to summarise a document is a workflow. A system that decides whether to summarise, search, or ask a clarifying question is agentic.
Do agents always need an orchestration framework?
For a demonstration, no. A few hundred lines of Python implementing a while-loop around an LLM call is enough to run a basic ReAct agent. For a production deployment, an orchestration framework usually pays for itself: state checkpointing on long tasks, retry logic, observability hooks, human-in-the-loop interrupts, and persistence are painful to hand-roll. See the orchestration tooling reference for the category vocabulary.
Related pages
- Agent vs Assistant vs RPA vs Chatbot vs Copilot - the distinctions that matter
- Architectural patterns index - five canonical design patterns
- Memory and RAG - the three-tier memory taxonomy
- Orchestration tooling - the framework category
- How to build - architecture-level build guide
Sources and further reading
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson, 2020. Ch. 2, “Intelligent Agents”.
- S. Yao, J. Zhao, D. Yu et al., ReAct: Synergizing Reasoning and Acting in Language Models, arXiv:2210.03629 (2022).
- L. Wang et al., A Survey on Large Language Model based Autonomous Agents, arXiv:2308.11432 (2023).
- A. Madaan et al., Self-Refine: Iterative Refinement with Self-Feedback, arXiv:2303.17651 (2023).
- N. Shinn et al., Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv:2303.11366 (2023).
- Anthropic, Building effective agents (December 2024).
- OpenAI, Function calling guide.
- Anthropic, Tool use documentation.
- Stanford HAI, AI Index Report 2025.
- LangGraph, documentation.
- Microsoft AutoGen, documentation.
- CrewAI, documentation.
- OpenAI Agents SDK, repository.