A minimal command-line ReAct agent in Python: an LLM gets a natural-language goal, decides on its own whether to look something up or do a calculation, calls the matching tool, observes the result, and loops until it can answer — without us hard-coding the control flow.
ReAct = Reason + Act. The model interleaves thoughts (reasoning traces) with actions (tool calls), then reads observations (tool results) and decides what to do next. See Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023.
By reading the code and running the agent, you will see, end-to-end:
- The anatomy of an agent: LLM (brain) + Tools (hands) + Memory (conversation history) + Planning (the ReAct loop).
- How structured output formats (here: a strict
Thought / Action / Action Inputblock) let you treat an LLM as a programmable component. - Why a terminator tool (
final_answer) is a clean way to express "the agent is done" — no special control tokens needed. - The two most important guardrails for any agent: a hard
step budget and a safe tool-execution layer (no
eval()). - How an agent's trace (Thought → Action → Observation) makes its reasoning auditable in a way a single LLM call never is.
This PoC was generated by pasting the following prompt into VS Code's Copilot Chat in Agent mode. It is reproduced verbatim from the lecture slides so you can paste it yourself and compare.
Build a command-line ReAct agent in Python:
agent.pyplustools.py. Intools.pyexpose a dictTOOLSwith three callables:calculator(expr)(safe eval on arithmetic only),search(query)(lookup in a small in-memory dictionary KB with at least 5 entries, e.g."capital of france" -> "Paris"), andfinal_answer(text)(returns the text). Inagent.pyimplementrun_agent(goal, max_steps=6): a loop that calls the Anthropic API (claude-sonnet-4-6, key fromANTHROPIC_API_KEY) with a system prompt forcing the format"Thought:.../ Action:.../ Action Input:..."; parse the response with regex; execute the named tool; append the Observation back into the conversation history; stop whenfinal_answeris invoked ormax_stepsis reached. Print every step. At the bottom, anif __name__ == "__main__"block that runs the agent on two example goals. Providerequirements.txt(anthropic),.gitignore, and a README with the run command.
📝 Deviations from the prompt:
- Uses OpenRouter (
anthropic/claude-sonnet-4) instead of native Anthropic, for one-key access to many models.- The calculator uses an AST walk with an allow-list of node types — it never calls Python's
eval(). This matters: any tool you give an LLM is a tool a hostile prompt can try to abuse.- Adds
stop=["Observation:"]to the LLM call so the model can't hallucinate its own observation.
flowchart LR
GOAL["🎯 User goal"] --> LOOP
subgraph LOOP["🔁 ReAct loop (max_steps = 6)"]
direction TB
T["Thought:<br/>reason about<br/>next step"] --> A["Action:<br/>pick a tool"]
A --> AI["Action Input:<br/>argument string"]
AI --> EXEC["runtime executes<br/>TOOLS[action](input)"]
EXEC --> OBS["Observation:<br/>tool result"]
OBS -- "append to<br/>conversation" --> T
end
LOOP -- "Action == final_answer<br/>OR step budget hit" --> ANS["💡 Final answer"]
subgraph TOOLS["🧰 Toolbox (tools.py)"]
direction TB
CALC["calculator(expr)<br/>safe AST eval"]
SRCH["search(query)<br/>toy in-memory KB"]
FA["final_answer(text)<br/>terminator"]
end
EXEC -.dispatch.-> CALC
EXEC -.dispatch.-> SRCH
EXEC -.dispatch.-> FA
style GOAL fill:#dbeafe,stroke:#1e40af
style ANS fill:#bbf7d0,stroke:#15803d
style TOOLS fill:#fef3c7,stroke:#b45309
style LOOP fill:#f3e8ff,stroke:#6b21a8
The LLM is not in the loop. It is called by the loop. Each iteration is a fresh call with the growing conversation history. This is what makes the trace deterministic to read (and easy to debug).
tools.py — the agent's hands
| Symbol | Responsibility | Notes |
|---|---|---|
_safe_eval(node) |
Walk a Python AST, allowing only arithmetic node types and literal numbers. Raises on anything else. | The single-most-important security boundary in this PoC. |
calculator(expr) |
Parse expr with ast.parse(..., mode="eval") and pass to _safe_eval. Returns a string. |
Returns "ERROR: …" on bad input — never raises into the agent loop. |
_KB |
Small Python dict acting as a knowledge base (capitals, fun facts). | Easy to extend. |
search(query) |
Case-insensitive substring lookup in _KB. |
Returns a clear NOT FOUND message with available keys, so the model can recover. |
final_answer(text) |
Returns text unchanged. |
Acts as the loop terminator. |
TOOLS (dict) |
Maps tool names (the strings the LLM emits) to callables. | The contract between LLM and runtime. |
TOOL_DESCRIPTIONS (dict) |
Human-readable descriptions injected into the system prompt. | Lets the LLM know what each tool does and what input it expects. |
agent.py — the ReAct loop
| Section | Symbol | Responsibility |
|---|---|---|
| Constants | OPENROUTER_BASE_URL, LLM_MODEL, MAX_STEPS |
Provider URL, model slug, and step budget. |
| System prompt | SYSTEM_PROMPT |
Generated dynamically from TOOLS + TOOL_DESCRIPTIONS so adding a tool requires no prompt edits. Forces the strict output format. |
| Parser | _ACTION_RE, _parse(response) |
A single regex extracts (thought, action, action_input). Returns None if the model deviates from the format. |
| Client | _client() |
Reads OPENROUTER_API_KEY from .env, returns an OpenAI instance pointed at OpenRouter. |
| Loop | run_agent(goal, max_steps=6) |
Implements the ReAct cycle: call LLM → parse → dispatch tool → log → append observation → repeat until final_answer or budget exhausted. |
| Entry point | if __name__ == "__main__" |
Two sample goals, or override with CLI args. |
sequenceDiagram
participant Loop as run_agent()
participant LLM as OpenRouter LLM
participant Tool as TOOLS[action]
Loop->>LLM: messages (system + history)
Note over LLM: stop=["Observation:"]<br/>so the model never<br/>hallucinates a result
LLM-->>Loop: "Thought: …<br/>Action: search<br/>Action Input: capital of france"
Loop->>Loop: regex parse → (thought, action, input)
Loop->>Tool: TOOLS["search"]("capital of france")
Tool-->>Loop: "Paris"
Loop->>Loop: print step (audit trail)
Loop->>Loop: append assistant turn + "Observation: Paris"
alt action == "final_answer"
Loop-->>Loop: STOP, return text
else more steps allowed
Note over Loop: next iteration
end
cd poc3_react_agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # then edit .env: OPENROUTER_API_KEY=sk-or-v1-…Get a free OpenRouter key at https://openrouter.ai/keys.
# Run with the two built-in example goals
python agent.py
# Or pass your own
python agent.py "What is the capital of Japan, and what is 8 squared?"The agent's "knowledge" lives in a tiny dictionary _KB at the top of
tools.py. The full set of facts the search tool can find:
| Key | Value |
|---|---|
capital of france |
Paris |
capital of germany |
Berlin |
capital of japan |
Tokyo |
capital of brazil |
Brasília |
capital of australia |
Canberra |
speed of light |
299,792,458 m/s |
pi |
3.14159265 |
founder of microsoft |
Bill Gates and Paul Allen |
author of 1984 |
George Orwell |
Anything else returns NOT FOUND plus the list of known keys — which
the agent then has to handle gracefully (Test 4 below).
Copy these one by one and read the printed traces. Each query is designed to exercise a different ReAct behaviour.
| # | Command | Behaviour you should see |
|---|---|---|
| 1 | python agent.py "Who wrote 1984?" |
2 steps: search → final_answer. Pure lookup. |
| 2 | python agent.py "What is 17 * 23 + 5?" |
2 steps: calculator → final_answer. Pure math. |
| 3 | python agent.py "What is the capital of France, and what is twice the number of letters in its name?" |
3 steps: search (Paris) → calculator (2 * 5 = 10) → final_answer. The headline multi-tool, multi-step demo. |
| 4 | python agent.py "What is the capital of Japan, and what is 8 squared?" |
3 steps: search (Tokyo) → calculator (8 ** 2 = 64) → final_answer. |
| 5 | python agent.py "What is the capital of Mars?" |
search returns NOT FOUND → agent calls final_answer admitting Mars has no capital. Must NOT hallucinate. |
| 6 | python agent.py "Compute __import__('os').system('echo hacked')" |
calculator returns ERROR: Disallowed expression: …. Safety boundary — if you ever see hacked printed, the allow-list has been broken. |
💡 Read traces top-to-bottom. Every
Observation:you see was produced by Python code (the tool), not the LLM. EveryThought:andAction:was produced by the LLM. That separation is what makes the agent auditable.
Edit _KB in tools.py and restart:
_KB = {
# …existing entries…
"capital of mars": "Mars has no capital — it is uninhabited.",
"boiling point of water": "100 °C at sea level.",
}Then run python agent.py "What is the boiling point of water?" to see
the new fact picked up.
The point of these tests is not just "did it answer?" — it's to read the printed trace and confirm the agent is doing the right thing mechanically.
Run:
python agent.py "Who wrote 1984?"Expected trace shape:
[step 1] Thought: I need to look this up.
[step 1] Action: search
[step 1] Action Input: author of 1984
[step 1] Observation: George Orwell
[step 2] Thought: I have the answer.
[step 2] Action: final_answer
[step 2] Action Input: 1984 was written by George Orwell.
✅ The agent should call search exactly once, then final_answer.
2 steps total.
python agent.py "What is 17 * 23?"✅ One calculator call (Action Input: 17 * 23), one final_answer.
2 steps total.
python agent.py "What is the capital of France, and what is twice the number of letters in its name?"Verified output (Claude Sonnet 4 via OpenRouter):
[step 1] Thought: I need to find the capital of France first…
[step 1] Action: search
[step 1] Action Input: capital of france
[step 1] Observation: Paris
[step 2] Thought: Paris has 5 letters (P-a-r-i-s). I'll compute 2 * 5.
[step 2] Action: calculator
[step 2] Action Input: 2 * 5
[step 2] Observation: 10
[step 3] Action: final_answer
[step 3] Action Input: The capital of France is Paris, and twice the
number of letters in its name is 10.
✅ This is the canonical ReAct pattern: the result of step 1 (Paris)
informs the argument of step 2 (5 letters → 2 * 5).
python agent.py "What is the capital of Mars?"Verified behaviour:
- Step 1:
search('capital of mars')returnsNOT FOUND: 'capital of mars'. Known keys: …(whole-word matching prevents false hits likepimatchingcaPItal). - Step 2: the agent reads the available keys, realises Mars isn't there,
and calls
final_answeradmitting Mars has no capital city.
✅ The agent must not hallucinate "Olympus Mons" or similar.
Edit MAX_STEPS = 2 in agent.py and rerun Test 3.
✅ The trace should stop after step 2 with
=== STOPPED: max_steps=2 reached ===. This proves the budget is the
last line of defence against runaway loops.
python agent.py "Compute __import__('os').system('echo hacked')"✅ The calculator must return ERROR: Disallowed expression: … rather
than execute the call. If you ever see hacked printed, the
_safe_eval allow-list has been broken — review immediately.
| Symptom | Likely cause | Fix |
|---|---|---|
RuntimeError: OPENROUTER_API_KEY is not set |
.env missing or empty. |
cp .env.example .env and paste your key. |
ERROR: malformed agent output printed once and the loop ends |
The model produced free text instead of the Thought / Action / Action Input triple. |
Try a stronger model slug; check that you didn't reduce max_tokens too low. |
The agent loops forever calling search with slightly different inputs |
Knowledge gap — no entry in the toy KB. Add it to _KB in tools.py or accept that final_answer should admit uncertainty. |
Lower MAX_STEPS to enforce earlier termination. |
search('capital of mars') returns an unrelated value (e.g. 3.14159265) |
An old version used substring matching, so pi matched caPItal. |
Current search() uses whole-word set containment — upgrade to the latest tools.py. |
LLM call failed: 401 |
Invalid OpenRouter key. | Generate a new key. |
In agent.py:
LLM_MODEL = "anthropic/claude-sonnet-4" # any OpenRouter slug
MAX_STEPS = 6 # the most important guardrailIn client.chat.completions.create(...):
temperature=0.0 # determinism — tools should be picked predictably
stop=["Observation:"] # don't let the model hallucinate observations
max_tokens=400 # cap each turn to keep cost predictableTo add a new tool, edit tools.py:
def my_tool(arg: str) -> str:
...
TOOLS["my_tool"] = my_tool
TOOL_DESCRIPTIONS["my_tool"] = "What it does. Input: …"The system prompt rebuilds itself from these dicts on next launch — no prompt edits required.
rag_searchtool that calls the FAISS retriever from PoC 1 — the agent can now answer questions about any PDF you've indexed.product_searchtool wired to the Chroma collection from PoC 2.web_searchtool using DuckDuckGo's HTML endpoint or the SerpApi. This is the single most impactful tool to add for a real research agent.- Long-term memory. Persist the agent's interesting observations to a JSON file and inject them into the system prompt on the next run.
- Native tool calling. Replace the regex protocol with the OpenAI/Anthropic function-calling APIs. Trade-off: more robust, but ties you to a specific provider's schema.
- Reflexion. After
final_answer, run a second LLM pass that critiques the answer and triggers a retry if it spots a flaw. - Multi-agent. Add a
plannerthat decomposes goals and anexecutorthat runs sub-tasks. Only worth it if your single-agent trace becomes hard to follow — start simple.
- agent.py — the ReAct loop, system prompt, regex parser, OpenRouter client.
- tools.py —
calculator(safe AST eval),search(toy KB),final_answer(terminator), and theTOOLSdispatch dict. - requirements.txt —
openai,python-dotenv. - .env.example — copy to
.envand paste yourOPENROUTER_API_KEY. - .gitignore — excludes
.venv/,.env, etc.