Name	Name	Last commit message	Last commit date
parent directory ..
.env.example	.env.example
.gitignore	.gitignore
README.md	README.md
agent.py	agent.py
requirements.txt	requirements.txt
tools.py	tools.py

PoC 3 — A ReAct Agent with Tools

A minimal command-line ReAct agent in Python: an LLM gets a natural-language goal, decides on its own whether to look something up or do a calculation, calls the matching tool, observes the result, and loops until it can answer — without us hard-coding the control flow.

ReAct = Reason + Act. The model interleaves thoughts (reasoning traces) with actions (tool calls), then reads observations (tool results) and decides what to do next. See Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023.

🎯 What you will learn

By reading the code and running the agent, you will see, end-to-end:

The anatomy of an agent: LLM (brain) + Tools (hands) + Memory (conversation history) + Planning (the ReAct loop).
How structured output formats (here: a strict Thought / Action / Action Input block) let you treat an LLM as a programmable component.
Why a terminator tool (final_answer) is a clean way to express "the agent is done" — no special control tokens needed.
The two most important guardrails for any agent: a hard step budget and a safe tool-execution layer (no eval()).
How an agent's trace (Thought → Action → Observation) makes its reasoning auditable in a way a single LLM call never is.

📋 The exact Copilot Agent prompt

This PoC was generated by pasting the following prompt into VS Code's Copilot Chat in Agent mode. It is reproduced verbatim from the lecture slides so you can paste it yourself and compare.

Build a command-line ReAct agent in Python: agent.py plus tools.py. In tools.py expose a dict TOOLS with three callables: calculator(expr) (safe eval on arithmetic only), search(query) (lookup in a small in-memory dictionary KB with at least 5 entries, e.g. "capital of france" -> "Paris"), and final_answer(text) (returns the text). In agent.py implement run_agent(goal, max_steps=6): a loop that calls the Anthropic API (claude-sonnet-4-6, key from ANTHROPIC_API_KEY) with a system prompt forcing the format "Thought:.../ Action:.../ Action Input:..."; parse the response with regex; execute the named tool; append the Observation back into the conversation history; stop when final_answer is invoked or max_steps is reached. Print every step. At the bottom, an if __name__ == "__main__" block that runs the agent on two example goals. Provide requirements.txt (anthropic), .gitignore, and a README with the run command.

📝 Deviations from the prompt:

Uses OpenRouter (anthropic/claude-sonnet-4) instead of native Anthropic, for one-key access to many models.

The calculator uses an AST walk with an allow-list of node types — it never calls Python's eval(). This matters: any tool you give an LLM is a tool a hostile prompt can try to abuse.

Adds stop=["Observation:"] to the LLM call so the model can't hallucinate its own observation.

🏗️ Architecture

flowchart LR
    GOAL["🎯 User goal"] --> LOOP

    subgraph LOOP["🔁 ReAct loop (max_steps = 6)"]
        direction TB
        T["Thought:<br/>reason about<br/>next step"] --> A["Action:<br/>pick a tool"]
        A --> AI["Action Input:<br/>argument string"]
        AI --> EXEC["runtime executes<br/>TOOLS[action](input)"]
        EXEC --> OBS["Observation:<br/>tool result"]
        OBS -- "append to<br/>conversation" --> T
    end

    LOOP -- "Action == final_answer<br/>OR step budget hit" --> ANS["💡 Final answer"]

    subgraph TOOLS["🧰 Toolbox (tools.py)"]
        direction TB
        CALC["calculator(expr)<br/>safe AST eval"]
        SRCH["search(query)<br/>toy in-memory KB"]
        FA["final_answer(text)<br/>terminator"]
    end

    EXEC -.dispatch.-> CALC
    EXEC -.dispatch.-> SRCH
    EXEC -.dispatch.-> FA

    style GOAL fill:#dbeafe,stroke:#1e40af
    style ANS fill:#bbf7d0,stroke:#15803d
    style TOOLS fill:#fef3c7,stroke:#b45309
    style LOOP fill:#f3e8ff,stroke:#6b21a8

The LLM is not in the loop. It is called by the loop. Each iteration is a fresh call with the growing conversation history. This is what makes the trace deterministic to read (and easy to debug).

🧩 Components — file by file

`tools.py` — the agent's hands

Symbol	Responsibility	Notes
`_safe_eval(node)`	Walk a Python AST, allowing only arithmetic node types and literal numbers. Raises on anything else.	The single-most-important security boundary in this PoC.
`calculator(expr)`	Parse `expr` with `ast.parse(..., mode="eval")` and pass to `_safe_eval`. Returns a string.	Returns `"ERROR: …"` on bad input — never raises into the agent loop.
`_KB`	Small Python dict acting as a knowledge base (capitals, fun facts).	Easy to extend.
`search(query)`	Case-insensitive substring lookup in `_KB`.	Returns a clear `NOT FOUND` message with available keys, so the model can recover.
`final_answer(text)`	Returns `text` unchanged.	Acts as the loop terminator.
`TOOLS` (dict)	Maps tool names (the strings the LLM emits) to callables.	The contract between LLM and runtime.
`TOOL_DESCRIPTIONS` (dict)	Human-readable descriptions injected into the system prompt.	Lets the LLM know what each tool does and what input it expects.

`agent.py` — the ReAct loop

Section	Symbol	Responsibility
Constants	`OPENROUTER_BASE_URL`, `LLM_MODEL`, `MAX_STEPS`	Provider URL, model slug, and step budget.
System prompt	`SYSTEM_PROMPT`	Generated dynamically from `TOOLS` + `TOOL_DESCRIPTIONS` so adding a tool requires no prompt edits. Forces the strict output format.
Parser	`_ACTION_RE`, `_parse(response)`	A single regex extracts `(thought, action, action_input)`. Returns `None` if the model deviates from the format.
Client	`_client()`	Reads `OPENROUTER_API_KEY` from `.env`, returns an `OpenAI` instance pointed at OpenRouter.
Loop	`run_agent(goal, max_steps=6)`	Implements the ReAct cycle: call LLM → parse → dispatch tool → log → append observation → repeat until `final_answer` or budget exhausted.
Entry point	`if __name__ == "__main__"`	Two sample goals, or override with CLI args.

One iteration of the loop, in detail

sequenceDiagram
    participant Loop as run_agent()
    participant LLM as OpenRouter LLM
    participant Tool as TOOLS[action]

    Loop->>LLM: messages (system + history)
    Note over LLM: stop=["Observation:"]<br/>so the model never<br/>hallucinates a result
    LLM-->>Loop: "Thought: …<br/>Action: search<br/>Action Input: capital of france"
    Loop->>Loop: regex parse → (thought, action, input)
    Loop->>Tool: TOOLS["search"]("capital of france")
    Tool-->>Loop: "Paris"
    Loop->>Loop: print step (audit trail)
    Loop->>Loop: append assistant turn + "Observation: Paris"
    alt action == "final_answer"
        Loop-->>Loop: STOP, return text
    else more steps allowed
        Note over Loop: next iteration
    end

⚙️ Setup

cd poc3_react_agent
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env       # then edit .env: OPENROUTER_API_KEY=sk-or-v1-…

Get a free OpenRouter key at https://openrouter.ai/keys.

▶️ Run

# Run with the two built-in example goals
python agent.py

# Or pass your own
python agent.py "What is the capital of Japan, and what is 8 squared?"

📚 Example data & query cheatsheet

The agent's "knowledge" lives in a tiny dictionary _KB at the top of tools.py. The full set of facts the search tool can find:

Key	Value
`capital of france`	Paris
`capital of germany`	Berlin
`capital of japan`	Tokyo
`capital of brazil`	Brasília
`capital of australia`	Canberra
`speed of light`	299,792,458 m/s
`pi`	3.14159265
`founder of microsoft`	Bill Gates and Paul Allen
`author of 1984`	George Orwell

Anything else returns NOT FOUND plus the list of known keys — which the agent then has to handle gracefully (Test 4 below).

Curated example queries

Copy these one by one and read the printed traces. Each query is designed to exercise a different ReAct behaviour.

#	Command	Behaviour you should see
1	`python agent.py "Who wrote 1984?"`	2 steps: `search` → `final_answer`. Pure lookup.
2	`python agent.py "What is 17 * 23 + 5?"`	2 steps: `calculator` → `final_answer`. Pure math.
3	`python agent.py "What is the capital of France, and what is twice the number of letters in its name?"`	3 steps: `search` (Paris) → `calculator` (`2 * 5 = 10`) → `final_answer`. The headline multi-tool, multi-step demo.
4	`python agent.py "What is the capital of Japan, and what is 8 squared?"`	3 steps: `search` (Tokyo) → `calculator` (`8 ** 2 = 64`) → `final_answer`.
5	`python agent.py "What is the capital of Mars?"`	`search` returns `NOT FOUND` → agent calls `final_answer` admitting Mars has no capital. Must NOT hallucinate.
6	`python agent.py "Compute __import__('os').system('echo hacked')"`	`calculator` returns `ERROR: Disallowed expression: …`. Safety boundary — if you ever see `hacked` printed, the allow-list has been broken.

💡 Read traces top-to-bottom. Every Observation: you see was produced by Python code (the tool), not the LLM. Every Thought: and Action: was produced by the LLM. That separation is what makes the agent auditable.

Adding your own facts

Edit _KB in tools.py and restart:

_KB = {
    # …existing entries…
    "capital of mars": "Mars has no capital — it is uninhabited.",
    "boiling point of water": "100 °C at sea level.",
}

Then run python agent.py "What is the boiling point of water?" to see the new fact picked up.

🧪 Test plan

The point of these tests is not just "did it answer?" — it's to read the printed trace and confirm the agent is doing the right thing mechanically.

Test 1 — A pure search task

Run:

python agent.py "Who wrote 1984?"

Expected trace shape:

[step 1] Thought: I need to look this up.
[step 1] Action: search
[step 1] Action Input: author of 1984
[step 1] Observation: George Orwell

[step 2] Thought: I have the answer.
[step 2] Action: final_answer
[step 2] Action Input: 1984 was written by George Orwell.

✅ The agent should call search exactly once, then final_answer. 2 steps total.

Test 2 — A pure calculation task

python agent.py "What is 17 * 23?"

✅ One calculator call (Action Input: 17 * 23), one final_answer. 2 steps total.

Test 3 — Multi-step (the headline ReAct demo)

python agent.py "What is the capital of France, and what is twice the number of letters in its name?"

Verified output (Claude Sonnet 4 via OpenRouter):

[step 1] Thought: I need to find the capital of France first…
[step 1] Action: search
[step 1] Action Input: capital of france
[step 1] Observation: Paris

[step 2] Thought: Paris has 5 letters (P-a-r-i-s). I'll compute 2 * 5.
[step 2] Action: calculator
[step 2] Action Input: 2 * 5
[step 2] Observation: 10

[step 3] Action: final_answer
[step 3] Action Input: The capital of France is Paris, and twice the
                       number of letters in its name is 10.

✅ This is the canonical ReAct pattern: the result of step 1 (Paris) informs the argument of step 2 (5 letters → 2 * 5).

Test 4 — Graceful failure when knowledge is missing

python agent.py "What is the capital of Mars?"

Verified behaviour:

Step 1: search('capital of mars') returns NOT FOUND: 'capital of mars'. Known keys: … (whole-word matching prevents false hits like pi matching caPItal).
Step 2: the agent reads the available keys, realises Mars isn't there, and calls final_answer admitting Mars has no capital city.

✅ The agent must not hallucinate "Olympus Mons" or similar.

Test 5 — Step budget guardrail

Edit MAX_STEPS = 2 in agent.py and rerun Test 3.

✅ The trace should stop after step 2 with === STOPPED: max_steps=2 reached ===. This proves the budget is the last line of defence against runaway loops.

Test 6 — Calculator safety

python agent.py "Compute __import__('os').system('echo hacked')"

✅ The calculator must return ERROR: Disallowed expression: … rather than execute the call. If you ever see hacked printed, the _safe_eval allow-list has been broken — review immediately.

🛠️ Troubleshooting

Symptom	Likely cause	Fix
`RuntimeError: OPENROUTER_API_KEY is not set`	`.env` missing or empty.	`cp .env.example .env` and paste your key.
`ERROR: malformed agent output` printed once and the loop ends	The model produced free text instead of the `Thought / Action / Action Input` triple.	Try a stronger model slug; check that you didn't reduce `max_tokens` too low.
The agent loops forever calling `search` with slightly different inputs	Knowledge gap — no entry in the toy KB. Add it to `_KB` in tools.py or accept that `final_answer` should admit uncertainty.	Lower `MAX_STEPS` to enforce earlier termination.
`search('capital of mars')` returns an unrelated value (e.g. `3.14159265`)	An old version used substring matching, so `pi` matched `caPItal`.	Current `search()` uses whole-word set containment — upgrade to the latest tools.py.
`LLM call failed: 401`	Invalid OpenRouter key.	Generate a new key.

🔧 Tuning knobs

In agent.py:

LLM_MODEL = "anthropic/claude-sonnet-4"   # any OpenRouter slug
MAX_STEPS = 6                             # the most important guardrail

In client.chat.completions.create(...):

temperature=0.0           # determinism — tools should be picked predictably
stop=["Observation:"]     # don't let the model hallucinate observations
max_tokens=400            # cap each turn to keep cost predictable

To add a new tool, edit tools.py:

def my_tool(arg: str) -> str:
    ...

TOOLS["my_tool"] = my_tool
TOOL_DESCRIPTIONS["my_tool"] = "What it does. Input: …"

The system prompt rebuilds itself from these dicts on next launch — no prompt edits required.

💡 Extension ideas

rag_search tool that calls the FAISS retriever from PoC 1 — the agent can now answer questions about any PDF you've indexed.
product_search tool wired to the Chroma collection from PoC 2.
web_search tool using DuckDuckGo's HTML endpoint or the SerpApi. This is the single most impactful tool to add for a real research agent.
Long-term memory. Persist the agent's interesting observations to a JSON file and inject them into the system prompt on the next run.
Native tool calling. Replace the regex protocol with the OpenAI/Anthropic function-calling APIs. Trade-off: more robust, but ties you to a specific provider's schema.
Reflexion. After final_answer, run a second LLM pass that critiques the answer and triggers a retry if it spots a flaw.
Multi-agent. Add a planner that decomposes goals and an executor that runs sub-tasks. Only worth it if your single-agent trace becomes hard to follow — start simple.

📂 Files in this PoC

agent.py — the ReAct loop, system prompt, regex parser, OpenRouter client.
tools.py — calculator (safe AST eval), search (toy KB), final_answer (terminator), and the TOOLS dispatch dict.
requirements.txt — openai, python-dotenv.
.env.example — copy to .env and paste your OPENROUTER_API_KEY.
.gitignore — excludes .venv/, .env, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

PoC 3 — A ReAct Agent with Tools

🎯 What you will learn

📋 The exact Copilot Agent prompt

🏗️ Architecture

🧩 Components — file by file

`tools.py` — the agent's hands

`agent.py` — the ReAct loop

One iteration of the loop, in detail

⚙️ Setup

▶️ Run

📚 Example data & query cheatsheet

Curated example queries

Adding your own facts

🧪 Test plan

Test 1 — A pure search task

Test 2 — A pure calculation task

Test 3 — Multi-step (the headline ReAct demo)

Test 4 — Graceful failure when knowledge is missing

Test 5 — Step budget guardrail

Test 6 — Calculator safety

🛠️ Troubleshooting

🔧 Tuning knobs

💡 Extension ideas

📂 Files in this PoC

FilesExpand file tree

poc3_react_agent

Directory actions

More options

Directory actions

More options

Latest commit

History

poc3_react_agent

Folders and files

parent directory

README.md

PoC 3 — A ReAct Agent with Tools

🎯 What you will learn

📋 The exact Copilot Agent prompt

🏗️ Architecture

🧩 Components — file by file

tools.py — the agent's hands

agent.py — the ReAct loop

One iteration of the loop, in detail

⚙️ Setup

▶️ Run

📚 Example data & query cheatsheet

Curated example queries

Adding your own facts

🧪 Test plan

Test 1 — A pure search task

Test 2 — A pure calculation task

Test 3 — Multi-step (the headline ReAct demo)

Test 4 — Graceful failure when knowledge is missing

Test 5 — Step budget guardrail

Test 6 — Calculator safety

🛠️ Troubleshooting

🔧 Tuning knobs

💡 Extension ideas

📂 Files in this PoC

`tools.py` — the agent's hands

`agent.py` — the ReAct loop