Part of Forge Documentation
The runtime engine powers `forge run` — executing agent tasks via LLM providers with tool calling, conversation memory, and lifecycle hooks.
The core agent loop follows a simple pattern:
1. Initialize memory with the system prompt and task history
2. Append the user message
3. Call the LLM with the conversation and available tool definitions
4. If the LLM returns tool calls: execute each tool, append the results, and go back to step 3
5. If the LLM returns a text response: return it as the final answer
6. If max iterations are exceeded: return an error
```
User message → Memory → LLM → tool_calls? → Execute tools → LLM → ... → text → Done
```

The loop terminates when `FinishReason == "stop"` or `len(ToolCalls) == 0`.
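The loop above can be sketched in Go. This is a minimal illustration, not the engine's actual code: the `LLM`, `Response`, and `ToolFunc` types here are hypothetical stand-ins, and the real runtime's interfaces differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified types; the real engine's interfaces differ.
type ToolCall struct{ Name, Args string }

type Response struct {
	Text         string
	ToolCalls    []ToolCall
	FinishReason string
}

type LLM interface {
	Chat(history []string) (Response, error)
}

type ToolFunc func(args string) string

// runLoop implements the core pattern: call the LLM, execute any
// requested tools, append the results, and repeat until a text answer.
func runLoop(llm LLM, tools map[string]ToolFunc, history []string, maxIters int) (string, error) {
	for i := 0; i < maxIters; i++ {
		resp, err := llm.Chat(history)
		if err != nil {
			return "", err
		}
		// Terminate when the model stops without requesting tools.
		if resp.FinishReason == "stop" || len(resp.ToolCalls) == 0 {
			return resp.Text, nil
		}
		for _, tc := range resp.ToolCalls {
			result := tools[tc.Name](tc.Args)
			history = append(history, "tool:"+tc.Name+" -> "+result)
		}
	}
	return "", errors.New("max iterations exceeded")
}

// fakeLLM requests one tool call, then answers with the last tool result.
type fakeLLM struct{ calls int }

func (f *fakeLLM) Chat(history []string) (Response, error) {
	f.calls++
	if f.calls == 1 {
		return Response{ToolCalls: []ToolCall{{Name: "echo", Args: "hi"}}, FinishReason: "tool_calls"}, nil
	}
	return Response{Text: history[len(history)-1], FinishReason: "stop"}, nil
}

func main() {
	tools := map[string]ToolFunc{"echo": func(a string) string { return a }}
	answer, err := runLoop(&fakeLLM{}, tools, []string{"system", "user: hi"}, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(answer) // "tool:echo -> hi"
}
```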
Forge supports multiple LLM providers with automatic fallback:
| Provider | Default Model | Auth |
|---|---|---|
| `openai` | `gpt-5.2-2025-12-11` | API key or OAuth; optional Organization ID |
| `anthropic` | `claude-sonnet-4-20250514` | API key |
| `gemini` | `gemini-2.5-flash` | API key |
| `ollama` | `llama3` | None (local) |
| Custom | Configurable | API key |
```yaml
model:
  provider: openai
  name: gpt-4o
```

Or override with environment variables:

```bash
export FORGE_MODEL_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
forge run
```

The provider is auto-detected from available API keys if not explicitly set. Provider configuration is resolved via `ResolveModelConfig()` in priority order:

1. CLI flag `--provider` (highest priority)
2. Environment variables: `FORGE_MODEL_PROVIDER`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`
3. `forge.yaml` `model` section (lowest priority)
For OpenAI, Forge supports browser-based OAuth login (matching the Codex CLI flow) as an alternative to API keys:
```bash
forge init my-agent
# Select "OpenAI" -> "Login with browser (OAuth)"
# Browser opens for authentication
```

OAuth tokens are stored in `~/.forge/credentials/openai.json` and refreshed automatically.
Enterprise OpenAI accounts can set an Organization ID to route API requests to the correct org:
```yaml
model:
  provider: openai
  name: gpt-4o
  organization_id: "org-xxxxxxxxxxxxxxxxxxxxxxxx"
```

Or via environment variable (overrides YAML):

```bash
export OPENAI_ORG_ID=org-xxxxxxxxxxxxxxxxxxxxxxxx
```

The `OpenAI-Organization` header is sent on all OpenAI API requests (chat, embeddings, responses). Fallback providers inherit the primary org ID unless overridden per-fallback. The org ID is also injected into skill subprocess environments as `OPENAI_ORG_ID`.
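Attaching the header looks roughly like this. The `OpenAI-Organization` header name and endpoint are real OpenAI API conventions; the helper function itself is a hypothetical sketch, not Forge's client code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newOpenAIRequest builds a chat request and attaches the
// OpenAI-Organization header when an org ID is configured.
func newOpenAIRequest(apiKey, orgID, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost,
		"https://api.openai.com/v1/chat/completions", strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	if orgID != "" {
		// Routes the request to the given org on enterprise accounts.
		req.Header.Set("OpenAI-Organization", orgID)
	}
	return req, nil
}

func main() {
	req, err := newOpenAIRequest("sk-test", "org-123", `{}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("OpenAI-Organization")) // org-123
}
```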
Configure fallback providers for automatic failover when the primary provider is unavailable:
```yaml
model:
  provider: openai
  name: gpt-4o
  fallbacks:
    - provider: anthropic
      name: claude-sonnet-4-20250514
    - provider: gemini
```

Or via environment variable:

```bash
export FORGE_MODEL_FALLBACKS="anthropic:claude-sonnet-4-20250514,gemini:gemini-2.5-flash"
```

Fallback behavior:
- Retriable errors (rate limits, overloaded, timeouts) try the next provider
- Non-retriable errors (auth, billing, bad format) abort immediately
- Per-provider exponential backoff cooldowns prevent thundering herd
- Fallbacks are also auto-detected from available API keys when not explicitly configured
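The failover rules above can be sketched as a small Go loop. The error classification and backoff here are deliberately simplified stand-ins (sentinel errors, doubling cooldown) for illustration; Forge's actual retriable-error detection and cooldown bookkeeping differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Sentinel errors standing in for provider failures.
var (
	errRateLimited = errors.New("rate limited") // retriable (rate limit, overload, timeout)
	errBadAuth     = errors.New("invalid key")  // non-retriable (auth, billing, bad format)
)

func isRetriable(err error) bool {
	return errors.Is(err, errRateLimited)
}

type provider struct {
	name     string
	call     func() (string, error)
	cooldown time.Duration // grows per failure (exponential backoff)
}

// callWithFallback tries the primary provider first, then each fallback.
// Retriable errors move on to the next provider; non-retriable errors abort.
func callWithFallback(providers []*provider) (string, error) {
	var lastErr error
	for _, p := range providers {
		out, err := p.call()
		if err == nil {
			return out, nil
		}
		if !isRetriable(err) {
			return "", fmt.Errorf("%s: %w", p.name, err) // abort immediately
		}
		// Back off before this provider is eligible again,
		// spreading retries to avoid a thundering herd.
		if p.cooldown == 0 {
			p.cooldown = time.Second
		} else {
			p.cooldown *= 2
		}
		lastErr = err
	}
	return "", fmt.Errorf("all providers failed: %w", lastErr)
}

func main() {
	providers := []*provider{
		{name: "openai", call: func() (string, error) { return "", errRateLimited }},
		{name: "anthropic", call: func() (string, error) { return "ok", nil }},
	}
	out, err := callWithFallback(providers)
	fmt.Println(out, err) // ok <nil>
}
```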
The runtime supports multiple executor implementations:
| Executor | Use Case |
|---|---|
| `LLMExecutor` | Custom agents with LLM-powered tool calling |
| `SubprocessExecutor` | Framework agents (CrewAI, LangChain) running as subprocesses |
| `StubExecutor` | Returns canned responses for testing |
Executor selection happens in `runner.go` based on framework type and configuration.
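The selection logic is roughly a switch over framework type and flags. This sketch is hypothetical: the `Executor` interface, function name, and the exact conditions in `runner.go` are assumptions for illustration.

```go
package main

import "fmt"

// Executor is a simplified stand-in for the runtime's executor interface.
type Executor interface{ Name() string }

type LLMExecutor struct{}
type SubprocessExecutor struct{}
type StubExecutor struct{}

func (LLMExecutor) Name() string        { return "llm" }
func (SubprocessExecutor) Name() string { return "subprocess" }
func (StubExecutor) Name() string       { return "stub" }

// selectExecutor sketches the decision runner.go makes from framework
// type and configuration (conditions simplified for illustration).
func selectExecutor(framework string, mockTools bool) Executor {
	switch {
	case mockTools:
		return StubExecutor{} // --mock-tools: canned responses for testing
	case framework == "crewai" || framework == "langchain":
		return SubprocessExecutor{} // framework agents run as subprocesses
	default:
		return LLMExecutor{} // custom agents use the LLM tool-calling loop
	}
}

func main() {
	fmt.Println(selectExecutor("crewai", false).Name()) // subprocess
	fmt.Println(selectExecutor("custom", true).Name())  // stub
}
```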
Run the agent as a foreground HTTP server. Used for development and container deployments.
```bash
# Development (all interfaces, immediate shutdown)
forge run --with slack --port 8080

# Container deployment
forge run --host 0.0.0.0 --shutdown-timeout 30s
```

| Flag | Default | Description |
|---|---|---|
| `--port` | `8080` | HTTP server port |
| `--host` | `""` (all interfaces) | Bind address |
| `--shutdown-timeout` | `0` (immediate) | Graceful shutdown timeout |
| `--with` | — | Channel adapters (e.g. `slack`, `telegram`) |
| `--mock-tools` | `false` | Use the mock executor for testing |
| `--model` | — | Override the model name |
| `--provider` | — | Override the LLM provider |
| `--env` | `.env` | Path to the env file |
| `--enforce-guardrails` | `true` | Treat guardrail violations as errors |
| `--no-guardrails` | `false` | Disable all guardrail enforcement |
Manage the agent as a background daemon process with PID/log management.
```bash
# Start daemon (secure defaults: 127.0.0.1, 30s shutdown timeout)
forge serve

# Start on a custom port
forge serve start --port 9090 --host 0.0.0.0

# Stop the daemon
forge serve stop

# Check status (PID, uptime, health)
forge serve status

# View recent logs (last 100 lines)
forge serve logs
```

| Subcommand | Description |
|---|---|
| `start` (default) | Start the daemon in the background |
| `stop` | Send SIGTERM (10s timeout, SIGKILL fallback) |
| `status` | Show PID, listen address, health check |
| `logs` | Tail `.forge/serve.log` |
The daemon forks `forge run` in the background with `setsid`, writes state to `.forge/serve.json`, and redirects output to `.forge/serve.log`. Passphrase prompting for encrypted secrets happens in the parent process (which has TTY access) before forking.
The runtime configures a `FilesDir` for tool-generated files (e.g., from `file_create`). This directory defaults to `<WorkDir>/.forge/files/` and is injected into the execution context so tools can write files that other tools can reference by path.
```
<WorkDir>/
  .forge/
    files/      ← file_create output (patches.yaml, reports, etc.)
    sessions/   ← conversation persistence
    memory/     ← long-term memory
```
The `FilesDir` is set via `LLMExecutorConfig.FilesDir` and made available to tools through `runtime.FilesDirFromContext(ctx)`. See Tools — File Create for details.
For details on session persistence, context window management, compaction, and long-term memory, see Memory.
The engine fires hooks at key points in the loop. See Hooks for details.
The runner registers four hook groups: logging, audit, progress, and guardrail hooks. The guardrail `AfterToolExec` hook scans tool output for secrets and PII, redacting or blocking before results enter the LLM context. See Tool Output Scanning.
The current implementation (v1) runs the full tool-calling loop non-streaming. `ExecuteStream` calls `Execute` internally and emits the final response as a single message on a channel. True word-by-word streaming during tool loops is planned for v2.
← Tools | Back to README | Memory →