code review: mistral, 31.05.2026

# AgentLoop Repository Review - Actionable Refactoring Report

---

## Executive Summary

The `huberp/agentloop` repository is a TypeScript-first LangChain runtime for tool-using coding agents. It demonstrates strong architectural foundations with modular design, clear separation of concerns, and effective use of framework abstractions (LangChain, LangGraph). However, several areas require attention to improve maintainability, type safety, testability, and adherence to best practices.

This report identifies **25 actionable items** across **6 priority tiers**, each with:
- Problem statement
- Refactoring goal
- Implementation steps
- Acceptance criteria

---

## Priority 1: Critical Issues (Must Fix)

### 1. **Circular Dependency Risk in `src/index.ts`**

**Problem:**
`src/index.ts` imports from and re-exports multiple modules (`toolRegistry`, `agentProfileRegistry`, `skillRegistry`), while also importing dependencies that those modules require (e.g., `config`, `logger`). This creates a tight coupling and potential circular dependency issues.

**Goal:**
Eliminate circular dependencies by restructuring imports and exports.

**Steps:**
1. Move all registry singletons (`toolRegistry`, `agentProfileRegistry`, `skillRegistry`) to a dedicated `src/registries/index.ts` file.
2. Have `src/index.ts` import registries only from `src/registries/index.ts`.
3. Ensure no module in `src/registries/` imports from `src/index.ts`.
4. Use dependency injection for shared state where possible.

**Acceptance Criteria:**
- No circular dependency warnings during build or runtime.
- All existing functionality preserved.
- `madge --circular src/` reports no circular dependencies.

---

### 2. **Global State Mutation in `src/config.ts`**

**Problem:**
`appConfig` is a mutable global object that is directly modified by `applyApiKeyOverride()`. This violates immutability principles and makes the configuration state unpredictable, especially in multi-tenant or test scenarios.

**Goal:**
Make configuration immutable and thread-safe.

**Steps:**
1. Convert `appConfig` to a `Readonly` object.
2. Replace `applyApiKeyOverride()` with a function that returns a new configuration object.
3. Use a context or dependency injection pattern to pass configuration to components.
4. Update all consumers to use the new pattern.

**Acceptance Criteria:**
- `appConfig` is declared as `const` or `Readonly`.
- No direct mutations of `appConfig` exist in the codebase.
- Tests can safely override configuration without affecting other tests.

---

### 3. **Unsafe Dynamic Imports in `ToolRegistry.loadFromDirectory()`**

**Problem:**
`loadFromDirectory()` uses `import(fileUrl)` with user-provided paths, which is a security risk (arbitrary code execution). Additionally, it uses `eval`-like behavior via `require` and dynamic imports.

**Goal:**
Restrict dynamic imports to trusted paths and validate tool definitions.

**Steps:**
1. Restrict `loadFromDirectory()` to only load from `src/tools` and explicitly configured directories.
2. Validate `toolDefinition` exports using a schema (e.g., Zod) before registration.
3. Add a `safeMode` flag that disables dynamic loading in production.
4. Log warnings for any dynamic imports.

**Acceptance Criteria:**
- Dynamic imports are restricted to allowlisted directories.
- All tool definitions are validated before registration.
- Security audit passes with no arbitrary code execution risks.

---

### 4. **Lack of Input Validation in `AgentProfileRegistry.loadFromDirectory()`**

**Problem:**
`loadFromDirectory()` in `src/agents/registry.ts` does not validate the structure or content of loaded agent profile files. Malformed files could cause runtime errors or security issues.

**Goal:**
Validate agent profile files before loading.

**Steps:**
1. Define a Zod schema for `AgentProfile`.
2. Validate each loaded profile against the schema.
3. Skip or reject invalid profiles with clear error messages.
4. Add unit tests for validation.

**Acceptance Criteria:**
- All agent profiles are validated before registration.
- Invalid profiles are rejected with descriptive errors.
- Unit tests cover validation edge cases.

---

### 5. **Race Condition in `PromptRegistry._saveChain`**

**Problem:**
`PromptRegistry._saveChain` uses a promise chain to serialize writes, but this pattern is fragile. If an error occurs during a save, subsequent saves may be blocked indefinitely.

**Goal:**
Replace the promise chain with a more robust serialization mechanism.

**Steps:**
1. Replace `_saveChain` with a proper async queue (e.g., `p-queue`).
2. Ensure errors in one save operation do not block future operations.
3. Add logging for failed saves.

**Acceptance Criteria:**
- No blocked saves due to prior errors.
- All save operations are logged.
- Errors are handled gracefully.

---

## Priority 2: High Impact (Should Fix)

### 6. **Tight Coupling Between `src/index.ts` and Submodules**

**Problem:**
`src/index.ts` directly imports and uses internal details of many submodules (e.g., `toolRegistry`, `permissionManager`, `concurrencyLimiter`). This makes it difficult to test or replace individual components.

**Goal:**
Reduce coupling by using dependency injection.

**Steps:**
1. Define interfaces for all dependencies (e.g., `ToolRegistryInterface`, `PermissionManagerInterface`).
2. Accept dependencies as constructor parameters or via a context object.
3. Update `src/index.ts` to inject dependencies rather than importing them directly.
4. Provide default implementations for backward compatibility.

**Acceptance Criteria:**
- `src/index.ts` does not directly instantiate or import concrete implementations.
- All dependencies are injectable.
- Unit tests can mock dependencies easily.

---

### 7. **Lack of Type Safety in `ToolDefinition.execute`**

**Problem:**
`ToolDefinition.execute` is typed as `(args: any) => Promise<string>`, which provides no type safety for tool arguments. This can lead to runtime errors if tools receive incorrect arguments.

**Goal:**
Enforce type safety for tool arguments and return values.

**Steps:**
1. Define a generic `ToolDefinition` interface with typed `args` and `returns`.
2. Update all tool definitions to use explicit types.
3. Use Zod schemas to validate arguments at runtime.
4. Update `ToolRegistry` to enforce type constraints.

**Acceptance Criteria:**
- All tool definitions have explicit argument and return types.
- Runtime validation catches type mismatches.
- TypeScript compiler enforces type safety.

---

### 8. **Hardcoded Paths in `ToolRegistry.loadFromDirectory()`**

**Problem:**
`loadFromDirectory()` uses `path.join(__dirname, "tools")` and other hardcoded paths, which makes the code less portable and harder to test.

**Goal:**
Make paths configurable and relative to a base directory.

**Steps:**
1. Accept a `baseDir` parameter in `loadFromDirectory()`.
2. Resolve all paths relative to `baseDir`.
3. Default `baseDir` to `__dirname` for backward compatibility.
4. Update callers to pass explicit paths.

**Acceptance Criteria:**
- No hardcoded paths in `loadFromDirectory()`.
- Paths are resolved relative to a configurable base directory.
- Backward compatibility is maintained.

---

### 9. **Missing Error Handling in `McpClient` Connection Logic**

**Problem:**
`registerMcpTools()` in `src/mcp/bridge.ts` swallows errors when connecting to MCP servers, which can mask configuration issues or connection failures.

**Goal:**
Improve error handling and reporting for MCP connections.

**Steps:**
1. Log detailed error messages for failed MCP connections.
2. Provide a way to retry or reconnect to failed servers.
3. Expose connection status via an API (e.g., `getMcpStatus()`).
4. Add unit tests for error scenarios.

**Acceptance Criteria:**
- All MCP connection errors are logged with context.
- Failed connections can be retried.
- Connection status is queryable.

---

### 10. **Inconsistent Error Handling in Tool Execution**

**Problem:**
Tool execution errors are caught and converted to strings in multiple places (e.g., `src/index.ts`, `src/streaming.ts`), but the error handling is inconsistent. Some errors are logged, others are not, and the error messages vary.

**Goal:**
Standardize error handling for tool execution.

**Steps:**
1. Define a `ToolExecutionResult` type with `success`, `error`, and `output` fields.
2. Create a centralized `executeTool()` function that handles errors consistently.
3. Update all tool execution paths to use this function.
4. Ensure all errors are logged with consistent context.

**Acceptance Criteria:**
- All tool execution errors are handled consistently.
- Errors are logged with full context.
- Error messages are user-friendly and actionable.

---

## Priority 3: Medium Impact (Nice to Fix)

### 11. **Lack of Dependency Injection in `streamWithTools()`**

**Problem:**
`streamWithTools()` in `src/streaming.ts` accepts a `StreamingDeps` object, but this pattern is not used consistently. Some dependencies (e.g., `logger`) are still imported directly.

**Goal:**
Fully adopt dependency injection in streaming logic.

**Steps:**
1. Add `logger` to `StreamingDeps`.
2. Update `streamWithTools()` to use the injected `logger`.
3. Ensure all other dependencies are injectable.
4. Update callers to provide all required dependencies.

**Acceptance Criteria:**
- `streamWithTools()` has no direct imports of dependencies.
- All dependencies are injectable via `StreamingDeps`.

---

### 12. **Hardcoded Timeouts in `invokeWithTimeout()`**

**Problem:**
`invokeWithTimeout()` in `src/retry.ts` uses a hardcoded timeout for all tools, which may not be optimal for all use cases.

**Goal:**
Make timeouts configurable per tool and per call.

**Steps:**
1. Accept a `timeoutMs` parameter in `invokeWithTimeout()`.
2. Allow per-tool timeout overrides in `ToolDefinition`.
3. Use the per-tool timeout if available, otherwise fall back to a default.
4. Update callers to pass explicit timeouts where needed.

**Acceptance Criteria:**
- Timeouts are configurable per tool and per call.
- Default timeouts are used when not specified.

---

### 13. **Lack of Retry Logic for Tool Execution**

**Problem:**
Tool execution does not have built-in retry logic, unlike LLM calls. Transient failures (e.g., network issues) may cause unnecessary task failures.

**Goal:**
Add retry logic for tool execution.

**Steps:**
1. Extend `invokeWithTimeout()` to support retries with exponential backoff.
2. Add a `maxRetries` field to `ToolDefinition`.
3. Use the retry logic for all tool executions.
4. Log retry attempts with context.

**Acceptance Criteria:**
- Tools can be configured to retry on failure.
- Retry attempts are logged.
- Exponential backoff is used for retries.

---

### 14. **Inconsistent Logging Format**

**Problem:**
Logging across the codebase is inconsistent. Some modules use `logger.info()`, others use `console.log()`, and the log messages vary in structure and detail.

**Goal:**
Standardize logging across the codebase.

**Steps:**
1. Define a logging standard (e.g., structured logging with consistent fields).
2. Replace all `console.log()` calls with `logger` methods.
3. Ensure all log messages include relevant context (e.g., `toolName`, `iteration`).
4. Add log levels (debug, info, warn, error) and use them appropriately.

**Acceptance Criteria:**
- All logging uses the `logger` module.
- Log messages are structured and consistent.
- Log levels are used appropriately.

---

### 15. **Missing Unit Tests for Core Logic**

**Problem:**
Core logic (e.g., `trimMessages()`, `countTokens()`, `backoffMs()`) lacks unit tests, which increases the risk of regressions.

**Goal:**
Add comprehensive unit tests for core logic.

**Steps:**
1. Identify all core utility functions (e.g., in `src/context.ts`, `src/retry.ts`).
2. Write unit tests for each function, covering edge cases.
3. Ensure tests are fast and deterministic.
4. Add tests to the CI pipeline.

**Acceptance Criteria:**
- All core utility functions have unit tests.
- Tests cover edge cases and error scenarios.
- CI pipeline runs all tests.

---

## Priority 4: Low Impact (Good to Fix)

### 16. **Hardcoded `cl100k_base` Encoding in `countTokens()`**

**Problem:**
`countTokens()` in `src/context.ts` hardcodes the `cl100k_base` encoding, which may not be optimal for all models (e.g., Mistral uses a different tokenizer).

**Goal:**
Make the tokenizer configurable.

**Steps:**
1. Accept a `tokenizer` parameter in `countTokens()`.
2. Default to `cl100k_base` for backward compatibility.
3. Allow per-model tokenizer overrides.
4. Update callers to pass explicit tokenizers where needed.

**Acceptance Criteria:**
- Tokenizer is configurable per model.
- Default tokenizer is used when not specified.

---

### 17. **Lack of Input Sanitization in `file-read.ts` and `file-write.ts`**

**Problem:**
File tools (`file-read.ts`, `file-write.ts`) do not sanitize input paths, which could allow path traversal attacks (e.g., `../../../etc/passwd`).

**Goal:**
Sanitize and validate file paths.

**Steps:**
1. Add a `sanitizePath()` utility function that:
   - Resolves paths to absolute paths.
   - Checks that the path is within the `workspaceRoot`.
   - Rejects paths that escape the workspace.
2. Use `sanitizePath()` in all file tools.
3. Add unit tests for path sanitization.

**Acceptance Criteria:**
- All file paths are sanitized before use.
- Path traversal attacks are prevented.
- Unit tests cover edge cases.

---

### 18. **Missing Rate Limiting for Web Search Tools**

**Problem:**
Web search tools (e.g., DuckDuckGo) do not have built-in rate limiting, which could lead to IP bans or rate limit errors.

**Goal:**
Add rate limiting for web search tools.

**Steps:**
1. Implement a rate limiter (e.g., token bucket or leaky bucket).
2. Apply rate limiting to all web search tools.
3. Make rate limits configurable via environment variables.
4. Log rate limit events.

**Acceptance Criteria:**
- Web search tools respect rate limits.
- Rate limits are configurable.
- Rate limit events are logged.

---

### 19. **Hardcoded `MAX_ITERATIONS` in `executeWithTools()`**

**Problem:**
`MAX_ITERATIONS` is hardcoded in `src/index.ts` and not easily configurable per agent or per call.

**Goal:**
Make `MAX_ITERATIONS` configurable.

**Steps:**
1. Move `MAX_ITERATIONS` to `appConfig`.
2. Allow per-agent or per-call overrides.
3. Update documentation to reflect the change.

**Acceptance Criteria:**
- `MAX_ITERATIONS` is configurable via `appConfig`.
- Per-agent or per-call overrides are supported.

---

### 20. **Lack of Documentation for Public API**

**Problem:**
The public API (e.g., `agentExecutor.invoke()`, `agentExecutor.stream()`) lacks comprehensive documentation, making it difficult for external consumers to use the library.

**Goal:**
Add comprehensive API documentation.

**Steps:**
1. Add JSDoc comments to all public functions and classes.
2. Document all parameters, return types, and error cases.
3. Generate API documentation (e.g., using TypeDoc).
4. Publish documentation to a public site (e.g., GitHub Pages).

**Acceptance Criteria:**
- All public APIs have JSDoc comments.
- API documentation is generated and published.

---

## Priority 5: Cosmetic Issues (Fix if Time Permits)

### 21. **Inconsistent Naming Conventions**

**Problem:**
Naming conventions are inconsistent (e.g., `toolRegistry` vs. `agentProfileRegistry`, `getCachedPromptContext` vs. `loadFromDirectory`).

**Goal:**
Standardize naming conventions.

**Steps:**
1. Define a naming convention (e.g., camelCase for functions, PascalCase for classes).
2. Rename all identifiers to conform to the convention.
3. Update all references.

**Acceptance Criteria:**
- All identifiers follow the naming convention.

---

### 22. **Inconsistent Code Formatting**

**Problem:**
Code formatting is inconsistent (e.g., indentation, line length, quote style).

**Goal:**
Enforce consistent code formatting.

**Steps:**
1. Adopt a code formatter (e.g., Prettier).
2. Configure the formatter with a standard style.
3. Add a pre-commit hook to enforce formatting.
4. Format all existing code.

**Acceptance Criteria:**
- All code is formatted consistently.
- Pre-commit hook enforces formatting.

---

### 23. **Missing Code Comments**

**Problem:**
Some complex logic (e.g., in `src/langgraph/compiler.ts`, `src/streaming.ts`) lacks comments, making it difficult to understand.

**Goal:**
Add comments to complex or non-obvious code.

**Steps:**
1. Identify all complex or non-obvious code sections.
2. Add comments explaining the purpose and logic.
3. Ensure comments are up-to-date with the code.

**Acceptance Criteria:**
- All complex code has explanatory comments.
- Comments are accurate and up-to-date.

---

### 24. **Lack of Examples in Documentation**

**Problem:**
Documentation lacks examples for common use cases (e.g., adding a custom tool, configuring an agent profile).

**Goal:**
Add examples to documentation.

**Steps:**
1. Identify common use cases.
2. Add code examples to the README and docs.
3. Ensure examples are tested and up-to-date.

**Acceptance Criteria:**
- Documentation includes examples for all common use cases.
- Examples are tested and up-to-date.

---

### 25. **Inconsistent Use of `async/await` vs. Promises**

**Problem:**
Some code uses `async/await`, while other code uses Promise chains (e.g., `.then()`). This inconsistency makes the code harder to read and maintain.

**Goal:**
Standardize on `async/await`.

**Steps:**
1. Convert all Promise chains to `async/await`.
2. Ensure error handling is consistent (e.g., `try/catch` for `async/await`).
3. Update all affected code.

**Acceptance Criteria:**
- All asynchronous code uses `async/await`.
- Error handling is consistent.

---

## Implementation Roadmap

| Phase | Duration | Focus Area | Items |
|---|---|---|---|
| 1 | 1-2 weeks | Critical Issues | 1-5 |
| 2 | 2-3 weeks | High Impact | 6-10 |
| 3 | 1-2 weeks | Medium Impact | 11-15 |
| 4 | 1 week | Low Impact | 16-20 |
| 5 | Ongoing | Cosmetic Issues | 21-25 |

---

## Testing Strategy

1. **Unit Tests:** Add unit tests for all core utility functions (e.g., `trimMessages()`, `countTokens()`).
2. **Integration Tests:** Add integration tests for major components (e.g., `ToolRegistry`, `AgentProfileRegistry`).
3. **E2E Tests:** Add end-to-end tests for common workflows (e.g., agent loop execution, tool invocation).
4. **Regression Tests:** Add tests for known issues to prevent regressions.

---

## Monitoring and Observability

1. **Logging:** Standardize logging and ensure all critical operations are logged.
2. **Metrics:** Add metrics for key operations (e.g., tool execution time, LLM call latency).
3. **Tracing:** Ensure tracing is comprehensive and covers all major operations.
4. **Error Tracking:** Add error tracking for all unhandled exceptions.

---

## Conclusion

The `agentloop` repository is well-structured and leverages modern TypeScript and framework abstractions effectively. However, addressing the **25 actionable items** in this report will significantly improve its maintainability, type safety, testability, and security. Prioritize **Priority 1 (Critical)** and **Priority 2 (High Impact)** items first, then proceed to lower-priority items as time permits.

---

## Appendix: File Structure Overview

```
agentloop/
├── src/
│   ├── index.ts              # Main entry point
│   ├── config.ts             # Configuration
│   ├── errors.ts             # Custom errors
│   ├── context.ts            # Token counting and message trimming
│   ├── logger.ts             # Logging
│   ├── retry.ts              # Retry and timeout utilities
│   ├── security.ts           # Permission management and concurrency limiting
│   ├── observability.ts      # Tracing and metrics
│   ├── streaming.ts          # Streaming agent loop
│   ├── agents/               # Agent profiles and activation
│   │   ├── registry.ts       # Agent profile registry
│   │   ├── activator.ts      # Profile activation logic
│   │   ├── types.ts          # Agent profile types
│   │   └── builtin/          # Built-in agent profiles
│   ├── tools/                # Built-in tools
│   │   ├── registry.ts       # Tool registry
│   │   ├── file-read.ts      # File read tool
│   │   ├── file-write.ts     # File write tool
│   │   └── ...              # Other tools
│   ├── prompts/              # Prompt management
│   │   ├── registry.ts       # Prompt template registry
│   │   ├── context.ts        # Prompt context building
│   │   └── system.ts         # System prompt generation
│   ├── langgraph/           # LangGraphJS orchestrator
│   │   ├── index.ts          # Public API
│   │   ├── compiler.ts       # Plan compilation
│   │   ├── graph.ts          # Graph execution
│   │   └── ...              # Other langgraph modules
│   ├── mcp/                 # MCP integration
│   │   ├── bridge.ts         # MCP tool registration
│   │   └── client.ts         # MCP client
│   ├── skills/               # Agent skills
│   │   ├── registry.ts       # Skill registry
│   │   └── builtin/          # Built-in skills
│   └── ...                  # Other modules
├── tests/                   # Tests
├── benchmarks/              # Performance benchmarks
└── docs/                    # Documentation
```

---

## Appendix: Key Dependencies

- **LangChain:** Core agent and tool abstractions (`@langchain/core`, `@langchain/langgraph`, `@langchain/mistralai`).
- **MCP SDK:** Model Context Protocol integration (`@modelcontextprotocol/sdk`).
- **TypeScript:** Type safety and compilation.
- **Jest:** Testing framework.
- **Pino:** Logging.
- **Zod:** Schema validation.

---

## Appendix: Environment Variables

Key environment variables (see `src/config.ts` for full list):
- `MISTRAL_API_KEY`: Mistral API key.
- `MAX_ITERATIONS`: Maximum agent loop iterations.
- `MAX_CONTEXT_TOKENS`: Maximum context window size.
- `TOOL_TIMEOUT_MS`: Per-tool execution timeout.
- `AUTO_APPROVE_ALL`: Bypass confirmation prompts for dangerous tools.
- `TRACING_ENABLED`: Enable invocation tracing.
- `UI_MODE`: User interface mode (`cli` or `tui`).
- `ORCHESTRATOR`: Orchestrator engine (`default` or `langgraph`).

Phase	Duration	Focus Area	Items
1	1-2 weeks	Critical Issues	1-5
2	2-3 weeks	High Impact	6-10
3	1-2 weeks	Medium Impact	11-15
4	1 week	Low Impact	16-20
5	Ongoing	Cosmetic Issues	21-25

code review: mistral, 31.05.2026 #127

Description

AgentLoop Repository Review - Actionable Refactoring Report

Executive Summary

Priority 1: Critical Issues (Must Fix)

1. Circular Dependency Risk in src/index.ts

2. Global State Mutation in src/config.ts

3. Unsafe Dynamic Imports in ToolRegistry.loadFromDirectory()

4. Lack of Input Validation in AgentProfileRegistry.loadFromDirectory()

5. Race Condition in PromptRegistry._saveChain

Priority 2: High Impact (Should Fix)

6. Tight Coupling Between src/index.ts and Submodules

7. Lack of Type Safety in ToolDefinition.execute

8. Hardcoded Paths in ToolRegistry.loadFromDirectory()

9. Missing Error Handling in McpClient Connection Logic

10. Inconsistent Error Handling in Tool Execution

Priority 3: Medium Impact (Nice to Fix)

11. Lack of Dependency Injection in streamWithTools()

12. Hardcoded Timeouts in invokeWithTimeout()

13. Lack of Retry Logic for Tool Execution

14. Inconsistent Logging Format

15. Missing Unit Tests for Core Logic

Priority 4: Low Impact (Good to Fix)

16. Hardcoded cl100k_base Encoding in countTokens()

17. Lack of Input Sanitization in file-read.ts and file-write.ts

18. Missing Rate Limiting for Web Search Tools

19. Hardcoded MAX_ITERATIONS in executeWithTools()

20. Lack of Documentation for Public API

Priority 5: Cosmetic Issues (Fix if Time Permits)

21. Inconsistent Naming Conventions

22. Inconsistent Code Formatting

23. Missing Code Comments

24. Lack of Examples in Documentation

25. Inconsistent Use of async/await vs. Promises

Implementation Roadmap

Testing Strategy

Monitoring and Observability

Conclusion

Appendix: File Structure Overview

Appendix: Key Dependencies

Appendix: Environment Variables

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. Circular Dependency Risk in `src/index.ts`

2. Global State Mutation in `src/config.ts`

3. Unsafe Dynamic Imports in `ToolRegistry.loadFromDirectory()`

4. Lack of Input Validation in `AgentProfileRegistry.loadFromDirectory()`

5. Race Condition in `PromptRegistry._saveChain`

6. Tight Coupling Between `src/index.ts` and Submodules

7. Lack of Type Safety in `ToolDefinition.execute`

8. Hardcoded Paths in `ToolRegistry.loadFromDirectory()`

9. Missing Error Handling in `McpClient` Connection Logic

11. Lack of Dependency Injection in `streamWithTools()`

12. Hardcoded Timeouts in `invokeWithTimeout()`

16. Hardcoded `cl100k_base` Encoding in `countTokens()`

17. Lack of Input Sanitization in `file-read.ts` and `file-write.ts`

19. Hardcoded `MAX_ITERATIONS` in `executeWithTools()`

25. Inconsistent Use of `async/await` vs. Promises