
feat: Enhance model registry and agent execution with new skills and …#26

Merged
MikePfunk28 merged 3 commits into main from 012-visual-builder-update
Feb 14, 2026

Conversation

MikePfunk28 (Owner) commented Feb 13, 2026

…thinking capabilities

  • Added thinkingApiPath to ModelMetadata for reasoning invocation paths.
  • Introduced new models including DeepSeek V3.2, MiniMax M2, Zhipu GLM 4.7, and Qwen3.
  • Updated getModelThinkingConfig to handle reasoning configurations.
  • Enhanced schema to support agent skills and thinking levels.
  • Modified agent execution to utilize skills and thinking levels during model invocation.
  • Updated AgentBuilder and AgentTester components to manage skills and test models.
  • Improved model catalog with new entries for recently added models.
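To make the first two bullets concrete, here is a minimal sketch of what a registry entry with `thinkingApiPath` and a `getModelThinkingConfig` lookup could look like. The field names beyond `thinkingApiPath`, the registry entries, and which models enable thinking are illustrative assumptions, not the PR's actual values:

```typescript
// Sketch only: the real ModelMetadata and registry live in convex/modelRegistry.ts.
interface ModelMetadata {
  id: string;
  provider: string;
  supportsThinking: boolean;
  // Which invocation path to use when reasoning is enabled:
  // a Claude-native request body vs. the Bedrock Converse API.
  thinkingApiPath?: "claude" | "converse";
}

const REGISTRY: Record<string, ModelMetadata> = {
  "anthropic.claude-sonnet-4": { id: "anthropic.claude-sonnet-4", provider: "anthropic", supportsThinking: true, thinkingApiPath: "claude" },
  "deepseek.v3-2": { id: "deepseek.v3-2", provider: "deepseek", supportsThinking: true, thinkingApiPath: "converse" },
  "qwen.qwen3": { id: "qwen.qwen3", provider: "qwen", supportsThinking: false },
};

// Resolve how (and whether) to enable reasoning for a given model.
function getModelThinkingConfig(modelId: string): { enabled: boolean; apiPath: "claude" | "converse" | null } {
  const meta = REGISTRY[modelId];
  if (!meta || !meta.supportsThinking) return { enabled: false, apiPath: null };
  return { enabled: true, apiPath: meta.thinkingApiPath ?? "converse" };
}
```

The point of the `apiPath` split is that callers can branch once on the resolved config instead of pattern-matching model IDs at every invocation site.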

Summary by CodeRabbit

  • New Features

    • Skills management UI and registry; attach skills and set agent thinking level (low/medium/high)
    • Iterative Mode for testing with per-iteration history and stop criteria
    • Test-time model override and expanded model catalog (several new models)
  • Improvements

    • More accurate token usage tracking and billing
    • Model-aware workflows and safer input validation/escaping for generated code
    • Deployment/test flows simplified to a unified agent runtime
  • Documentation

    • README updated with tiers, models, and builder options

MikePfunk28 self-assigned this Feb 13, 2026
MikePfunk28 added the `bug` (Something isn't working) and `enhancement` (New feature or request) labels Feb 13, 2026
Copilot AI review requested due to automatic review settings February 13, 2026 23:29
vercel bot commented Feb 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agent-builder-application | ✅ Ready | Preview, Comment | Feb 14, 2026 7:53pm |

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Adds model-aware workflows, a centralized tool dispatch + skills system, schema and UI for agent skills/thinking levels, Bedrock gating and model registry enhancements, iterative agent loop with tool execution and per-model token accounting, numerous API signature tightenings, and removal of ECS-based orchestration.

Changes

Cohort / File(s) Summary
Workflow & Execution Core
convex/agentBuilderWorkflow.ts, convex/strandsAgentExecution.ts, convex/strandsAgentExecutionDynamic.ts, convex/agentcoreTestExecution.ts, convex/unifiedAgentExecution.ts, convex/workflowExecutor.ts
Introduce modelId wiring and DEFAULT_WORKFLOW_MODEL, add AST_ANALYSIS/TEST_GENERATION/VERIFICATION stages, thread userId/modelId through workflows, replace local toolMap with centralized INTERNAL_TOOL_MAP, add Bedrock gating (requireBedrockAccess) and per-model invocation paths, and update token accumulation and billing hooks.
Tool Dispatch & Skills System
convex/lib/toolDispatch.ts, convex/metaTooling.ts, convex/agentSkills.test.ts, convex/strandsAgentsTools.ts
New centralized tool dispatch (dispatchToolCall), INTERNAL_TOOL_MAP, skill CRUD (create/list/get/seed), thinking-level → payload mapping, tool-result message builders, token accumulation, depth/iteration limits, and comprehensive tests.
Model Registry & Model Thinking
convex/modelRegistry.ts, src/data/modelCatalog.ts, convex/lib/dynamicModelSwitching.ts, convex/lib/unifiedModalitySwitching.ts
Extend model metadata (thinkingApiPath), add getModelThinkingConfig/isOllamaModelId/deriveDeploymentType helpers, expand BEDROCK models and short-name mappings, and update model-tier/cost derivations to use registry values.
MCP, Billing & MCP Clients
convex/mcpClient.ts, convex/mcpConfig.ts, convex/mcpFileUpload.ts, convex/agentcoreDeployment.ts, convex/deploymentRouter.ts
Tighten MCP param types, add optional userId to public invokeMCPTool for billing, add built-in MCP server entry, enforce ownership/auth in MCP server lookup, propagate userId to MCP calls, and make billing non-fatal with guarded increments.
Agent Design & Meta Workflow
convex/metaAgent.ts, convex/metaAgentWorkflow.ts
Replace static model database with registry-built model list, require modelId in designAgent/designAgentWithWorkflow, gate access via requireBedrockAccess, thread userId/modelId through analysis and MCP calls, and return richer design outputs.
Iterative Loop & Orchestration
convex/lib/iterativeLoop.ts, convex/strandsAgentExecution.ts, convex/executeIterativeAgent
New iterative loop module (completion criteria, continuation prompts, limits) and iterative agent execution that performs multi-iteration tool invocation loops, accumulates token usage, and applies completion checks.
Schema & DB Shape
convex/schema.ts, convex/agents.ts, convex/dynamicTools*
Add agent skills array, thinkingLevel, iterativeConfig, skillType/skillConfig/toolDefinition metadata, indexes by_skill_type, and stricter env/mcpInputSchema typings (string/env record).
Type Tightening & Validation
many convex/*.ts (agentValidation, agentCoreTester, cdkGenerator, cloudFormationGenerator, codeGenerator, diagramGenerator, realAgentTesting, etc.)
Replace many v.any() argument shapes with structured v.object schemas for tools, headers, metadata, parameters, and results; enforce string-keyed records for metadata/env; update several public action arg types accordingly.
AWS Credentials & Removal of ECS Orchestration
convex/lib/aws/credentials.ts, convex/lib/aws/ecsClient.ts (removed), convex/containerOrchestrator.ts (removed), convex/queueProcessor.ts
Introduce centralized AWS credential helpers (validate/requireAwsCredentials), remove ECS client and container orchestrator modules and ECS-based queue paths, and route tests through AgentCore.
Frontend: Skills UI & Testing
src/components/AgentBuilder.tsx, src/components/AgentTester.tsx, src/components/AgentBuilderWorkflowUI.tsx
Add SkillsStep UI, AgentConfig.skills and thinkingLevel, client zod validation on save, iterative test mode UI + executeIterative integration, and add new workflow stages in the builder UI.
Utilities & Safety
convex/constants.ts, convex/deploymentPackageGenerator.ts, convex/codeGenerator.ts, convex/agentAsToolGenerator.ts, convex/lib/jsonUtils.ts
Add Python string escaping utilities, use escaping in generated code and test scaffolds, add safeJsonParse, and improve sanitization in code/tool generation templates.
Logging & Minor Cleanups
many convex/* and src/* files
Remove or reduce many console.log debug statements; tighten request validation in HTTP MCP route; update .github workflow to hard-code PROJECT_ID: "4" and remove project creation steps; add test:skills script and zod dependency.
Removals
convex/containerOrchestrator.ts, convex/lib/aws/ecsClient.ts
Remove ECS/Fargate orchestration modules and their exported actions/interfaces; queue and test routing updated accordingly.
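The "thinking-level → payload mapping" named in the Tool Dispatch & Skills cohort can be sketched roughly as below. The shape of the `thinking` block follows the Anthropic extended-thinking request format; the budget numbers per level are illustrative assumptions, not the PR's actual values:

```typescript
// Hypothetical sketch of mapThinkingLevelToPayload; budgets are assumptions.
type ThinkingLevel = "low" | "medium" | "high";

const THINKING_BUDGETS: Record<ThinkingLevel, number> = {
  low: 1024,
  medium: 4096,
  high: 16384,
};

function mapThinkingLevelToPayload(
  level?: ThinkingLevel,
): { thinking?: { type: "enabled"; budget_tokens: number } } {
  // No level configured on the agent → omit the thinking block entirely.
  if (!level) return {};
  return { thinking: { type: "enabled", budget_tokens: THINKING_BUDGETS[level] } };
}
```

A mapping like this lets the execution path spread `...mapThinkingLevelToPayload(agent.thinkingLevel)` into the request body instead of hard-coding a single budget.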

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as UI/Client
    participant Workflow as Workflow Executor
    participant Registry as Model Registry
    participant Bedrock as Bedrock API
    participant Dispatch as Tool Dispatch
    participant MCP as MCP Server
    participant Billing as Billing Service

    UI->>Workflow: Start workflow (args include modelId, user)
    Workflow->>Registry: resolveModelThinkingConfig(modelId)
    Registry-->>Workflow: thinking config / effectiveModelId
    Workflow->>Bedrock: invoke model (Claude or Converse path) with effectiveModelId
    Bedrock-->>Workflow: response (text, thinking, tool_use) + usage
    Workflow->>Dispatch: dispatchToolCall(toolName, input, skills, userId)
    Dispatch->>MCP: call MCP / internal / sandbox as routed
    MCP-->>Dispatch: tool result + tokenUsage
    Dispatch->>Billing: increment usage (non-fatal)
    Dispatch-->>Workflow: tool_result
    Workflow->>Workflow: accumulate tokens & build next prompt (loop)
    alt completion reached
        Workflow->>Billing: final usage report (effectiveModelId)
        Billing-->>Workflow: report ack
        Workflow-->>UI: final response + totalUsage
    end
```
```mermaid
sequenceDiagram
    participant Designer as Client/UI
    participant Meta as MetaAgent Workflow
    participant Gate as Bedrock Gate
    participant MCP as MCP Server
    participant Bedrock as Bedrock API
    participant DB as Database

    Designer->>Meta: designAgent(name, modelId, requirements)
    Meta->>Gate: requireBedrockAccess(user, modelId)
    Gate-->>Meta: userId (or throw)
    Meta->>MCP: invokeMCPTool(analysisPrompt, userId, modelId)
    MCP->>Bedrock: invoke with thinking payload (if supported)
    Bedrock-->>MCP: analysis result + tokens
    MCP-->>Meta: analysis result
    Meta->>DB: create tools / agent config (includes skills & thinkingLevel)
    Meta-->>Designer: design result (agents, reasoning, points)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • 012 visual builder update #24 — Overlapping model-aware workflow, tool/skill dispatch, Bedrock gating, and token-usage billing changes; substantial code-level intersection.
  • 012 visual builder update #25 — Related billing adjustments (non-fatal billing increments), MCP invocation/billing patterns, and similar Convex module edits.

Poem

🐰 I nibble code and hop along,

Skills and models sing a song,
Bedrock gates and dispatch calls,
Tokens counted in the halls,
Thoughts loop on until they're strong.

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 78.69%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
Merge Conflict Detection: ⚠️ Warning. Merge conflicts detected (73 files):

⚔️ .github/workflows/setup-github-projects_Version3.yml (content)
⚔️ .gitignore (content)
⚔️ README.md (content)
⚔️ convex/_generated/api.d.ts (content)
⚔️ convex/agentAsToolGenerator.ts (content)
⚔️ convex/agentBuilderWorkflow.ts (content)
⚔️ convex/agentCapabilities.ts (content)
⚔️ convex/agentCoreTester.ts (content)
⚔️ convex/agentExecution.test.ts (content)
⚔️ convex/agentImprovement.ts (content)
⚔️ convex/agentValidation.ts (content)
⚔️ convex/agentcoreDeployment.ts (content)
⚔️ convex/agentcoreTestExecution.ts (content)
⚔️ convex/agents.ts (content)
⚔️ convex/apiKeys.ts (content)
⚔️ convex/auditLogs.ts (content)
⚔️ convex/authDebug.ts (content)
⚔️ convex/authErrorHandler.ts (content)
⚔️ convex/automatedAgentBuilder.ts (content)
⚔️ convex/awsAuth.ts (content)
⚔️ convex/awsDeployment.ts (content)
⚔️ convex/cdkGenerator.ts (content)
⚔️ convex/cloudFormationGenerator.ts (content)
⚔️ convex/codeGenerator.ts (content)
⚔️ convex/constants.ts (content)
⚔️ convex/conversationAnalysis.ts (content)
⚔️ convex/conversations.ts (content)
⚔️ convex/debuggingAgent.ts (content)
⚔️ convex/deploymentPackageGenerator.ts (content)
⚔️ convex/deploymentRouter.ts (content)
⚔️ convex/diagramGenerator.ts (content)
⚔️ convex/errorLogging.ts (content)
⚔️ convex/http.ts (content)
⚔️ convex/integration.test.ts (content)
⚔️ convex/interleavedReasoning.ts (content)
⚔️ convex/lib/dynamicModelSwitching.ts (content)
⚔️ convex/lib/messageExecutor.ts (content)
⚔️ convex/lib/tierConfig.ts (content)
⚔️ convex/lib/unifiedModalitySwitching.ts (content)
⚔️ convex/maintenance.ts (content)
⚔️ convex/mcpClient.ts (content)
⚔️ convex/mcpConfig.ts (content)
⚔️ convex/mcpFileUpload.ts (content)
⚔️ convex/memory.ts (content)
⚔️ convex/memoryData.ts (content)
⚔️ convex/metaAgent.ts (content)
⚔️ convex/metaAgentWorkflow.ts (content)
⚔️ convex/metaTooling.ts (content)
⚔️ convex/modelBenchmarks.ts (content)
⚔️ convex/modelRegistry.ts (content)
⚔️ convex/multiAgentRuntime.ts (content)
⚔️ convex/promptChainExecutor.ts (content)
⚔️ convex/queueProcessor.ts (content)
⚔️ convex/realAgentTesting.ts (content)
⚔️ convex/schema.ts (content)
⚔️ convex/strandsAgentExecution.ts (content)
⚔️ convex/strandsAgentExecutionDynamic.ts (content)
⚔️ convex/strandsAgentsTools.ts (content)
⚔️ convex/swarmTestingOrchestrator.ts (content)
⚔️ convex/testExecution.ts (content)
⚔️ convex/tools.ts (content)
⚔️ convex/unifiedAgentExecution.ts (content)
⚔️ convex/workflowExecutor.ts (content)
⚔️ convex/workflowTemplateSeeder.ts (content)
⚔️ convex/workflows.ts (content)
⚔️ package.json (content)
⚔️ src/App.tsx (content)
⚔️ src/SignInForm.tsx (content)
⚔️ src/components/AgentBuilder.tsx (content)
⚔️ src/components/AgentBuilderWorkflowUI.tsx (content)
⚔️ src/components/AgentTester.tsx (content)
⚔️ src/components/ConversationChat.tsx (content)
⚔️ src/data/modelCatalog.ts (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The title accurately summarizes the primary changes: enhancing the model registry and agent execution with skills and thinking capabilities, which aligns with the substantial modifications across model handling, schema extensions, and agent execution logic.


Copilot AI left a comment

Pull request overview

This PR enhances the agent builder platform with advanced reasoning capabilities and a flexible skills system. It introduces support for new AI models (DeepSeek V3.2, MiniMax M2, Zhipu GLM 4.7, Qwen3), adds thinking levels to control reasoning depth, and implements a reusable skills framework that allows agents to invoke capabilities like memory management and advanced reasoning during conversations.

Changes:

  • Added thinkingApiPath metadata to model registry for routing reasoning invocations to the correct Bedrock API
  • Extended agent schema with optional skills and thinkingLevel fields for backward compatibility
  • Implemented shared tool dispatch infrastructure (lib/toolDispatch.ts) consolidating tool routing logic
  • Enhanced agent execution loop to support multi-turn tool use following the Anthropic tool_use protocol
  • Added Skills step to AgentBuilder UI and test model override in AgentTester
  • Created comprehensive test suite (agentSkills.test.ts) covering the new features
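The multi-turn tool_use loop mentioned in the bullets above can be sketched as follows. Real execution is async and calls Bedrock through the shared dispatch module; here a stubbed synchronous `invokeModel` stands in, and names like `dispatchToolCall` and `MAX_TOOL_LOOP_ITERATIONS` mirror (but are not) the PR's actual code:

```typescript
// Minimal sketch of the Anthropic-style tool_use loop, with stubbed callables.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };
type Message = { role: "user" | "assistant"; content: Block[] };

const MAX_TOOL_LOOP_ITERATIONS = 5;

function runToolLoop(
  invokeModel: (messages: Message[]) => Block[],
  dispatchToolCall: (name: string, input: unknown) => string,
  messages: Message[],
): string {
  for (let i = 0; i < MAX_TOOL_LOOP_ITERATIONS; i++) {
    const content = invokeModel(messages);
    const toolUses = content.filter(
      (b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use",
    );
    if (toolUses.length === 0) {
      // No more tool calls: return the final text answer.
      return content.filter((b) => b.type === "text").map((b: any) => b.text).join("");
    }
    // Feed each tool result back as a user turn and continue the loop.
    messages.push({ role: "assistant", content });
    messages.push({
      role: "user",
      content: toolUses.map((t) => ({
        type: "text" as const,
        text: `tool_result(${t.id}): ${dispatchToolCall(t.name, t.input)}`,
      })),
    });
  }
  throw new Error("Exceeded MAX_TOOL_LOOP_ITERATIONS without a final answer");
}
```

The iteration cap is what keeps a model that keeps emitting tool_use blocks from looping forever; what to return on exhaustion (error vs. partial result) is one of the review threads below.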

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
convex/modelRegistry.ts Added 7 new model entries with thinkingApiPath metadata and getModelThinkingConfig helper function
src/data/modelCatalog.ts Added corresponding UI catalog entries for new models
convex/schema.ts Extended agents table with skills and thinkingLevel, dynamicTools with skill-related fields
convex/lib/toolDispatch.ts New shared module with INTERNAL_TOOL_MAP, skill dispatch logic, thinking level mapping, and tool loop helpers
convex/strandsAgentExecution.ts Refactored to load agent skills and execute multi-turn tool loops with token accumulation
convex/workflowExecutor.ts Migrated to use shared INTERNAL_TOOL_MAP from toolDispatch.ts
convex/testExecution.ts Added testModelId override to allow testing with different models
convex/agentcoreTestExecution.ts Updated to respect testModelId override and improved error handling
convex/metaTooling.ts Added skill CRUD operations (listSkills, createSkill, getSkillById, seedBuiltinSkills)
convex/metaAgent.ts Refactored to build model scores from BEDROCK_MODELS registry dynamically
convex/metaAgentWorkflow.ts Added modelId parameter for user-selected workflow models
convex/agentBuilderWorkflow.ts Implemented model-aware invocation routing (Claude vs Converse API)
convex/mcpClient.ts Added userId parameter and token billing integration for MCP tool invocations
convex/mcpConfig.ts Security fix: prevent cross-user data leakage in getMCPServerByNameInternal
convex/deploymentRouter.ts Cleaned up Stripe mutation imports
src/components/AgentBuilder.tsx Added SkillsStep component with thinking level selector and skills chooser
src/components/AgentTester.tsx Added testModelId parameter to support testing with overridden models
convex/agentSkills.test.ts Comprehensive test suite with 14 tests covering tooling, schema, and skills CRUD
package.json Added test:skills script for focused test execution
.github/workflows/setup-github-projects_Version3.yml Updated project name


```typescript
if ( !tc.id || !tc.name ) continue;
const result = await dispatchToolCall(
  ctx, tc.name, ( tc.input as Record<string, any> ) || {},
  agentSkills, String( agent.createdBy ),
```
Copilot AI commented Feb 13, 2026

The userId is converted to string using String(agent.createdBy), but agent.createdBy is an Id<"users"> type which should already be compatible. Additionally, the dispatchToolCall function expects userId as a string parameter, but at line 178 in toolDispatch.ts, it's cast to 'any' to match the expected type Id<"users">. This type mismatch could indicate a design inconsistency. Consider using the Id type directly or clarifying the expected type in the function signature.

Suggested change:

```diff
-agentSkills, String( agent.createdBy ),
+agentSkills, agent.createdBy,
```

Comment on lines 90 to 94:

```typescript
const testModel = args.testModelId || agent.model;
const testDeployType = args.testModelId
  ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
  : agent.deploymentType;
const isOllamaModel = testDeployType === "ollama" || (!testDeployType && testModel.includes(':') && !testModel.includes('.'));
```
Copilot AI commented Feb 13, 2026

The logic for determining if a model is Ollama appears duplicated on lines 90-94. The same heuristic (checking if modelId contains ":" but not ".") is used to infer deployment type when testModelId is provided. This heuristic might not be reliable for all future model naming patterns. Consider extracting this logic into a utility function or using a more robust method to determine the deployment type, such as looking up the model in a registry.
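A centralized helper along the lines suggested here (the walkthrough mentions an `isOllamaModelId` helper was added; this sketch is illustrative, not the PR's actual implementation) would keep the heuristic in one place until a registry lookup replaces it:

```typescript
// Sketch: Ollama tags look like "llama3:8b" (colon, no dot), while Bedrock IDs
// like "anthropic.claude-haiku-4-5-20251001-v1:0" contain dots before the colon.
// A model-registry lookup would be more robust than this string heuristic.
function isOllamaModelId(modelId: string): boolean {
  return modelId.includes(":") && !modelId.includes(".");
}
```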

```typescript
      return { success: false, output: "", error: error.message };
    }
  }
  break;
```
Copilot AI commented Feb 13, 2026

When skillType is "internal" but the actionName is not found in INTERNAL_TOOL_MAP, the function breaks out of the switch without returning a value, falling through to the Priority 3 fallback. This could lead to unexpected behavior. Consider adding an explicit return statement or error handling for cases where an internal skill's actionName is invalid.

Suggested change:

```diff
-break;
+return {
+  success: false,
+  output: "",
+  error: actionName
+    ? `Unknown internal tool actionName: ${actionName}`
+    : "Missing actionName for internal tool",
+};
```

```typescript
  accumulateTokenUsage, MAX_TOOL_LOOP_ITERATIONS, buildToolResultMessages } = await import( "./lib/toolDispatch" );

let agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];
const agentDoc = agent as any; // Access optional fields added to schema
```
Copilot AI commented Feb 13, 2026

The code uses 'as any' type assertion to access optional fields added to the schema. While the comment acknowledges this, consider using proper TypeScript types or optional chaining with type guards for better type safety. For example, you could define a type that extends AgentDoc with the optional fields, or use type guards to check for the existence of these fields.

coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
convex/deploymentRouter.ts (3)

192-192: ⚠️ Potential issue | 🟡 Minor

Type mismatch: deployTier3 still accepts _userId: string while deployTier1/deployTier2 use Id<"users">.

The caller on line 43 passes userId from getAuthUserId, which returns Id<"users">. This compiles only because Id<"users"> extends string at runtime, but it's inconsistent with the other two tier functions and will mask type errors if the signature is relied upon later.

Proposed fix:

```diff
-async function deployTier3(_ctx: any, _args: any, _userId: string): Promise<any> {
+async function deployTier3(_ctx: any, _args: any, _userId: Id<"users">): Promise<any> {
```

328-340: ⚠️ Potential issue | 🟠 Major

updateHealthStatus is a public mutation with no auth check.

This mutation is exported and callable by any client. An unauthenticated user could modify the healthStatus and lastHealthCheck of any deployment by ID. Consider adding authentication or making this an internal mutation if it's only meant to be called from monitorAgentCoreHealth.

Option A: make it an internal mutation

```diff
-import { mutation, query } from "./_generated/server";
+import { mutation, query, internalMutation } from "./_generated/server";
 ...
-export const updateHealthStatus = mutation({
+export const updateHealthStatus = internalMutation({
```

Then update the caller on line 309 to use internal.deploymentRouter.updateHealthStatus.

Option B: add an auth guard

```diff
 export const updateHealthStatus = mutation({
   args: {
     deploymentId: v.id("deployments"),
     healthStatus: v.string(),
     lastHealthCheck: v.number(),
   },
   handler: async (ctx, args) => {
+    const userId = await getAuthUserId(ctx);
+    if (!userId) throw new Error("Not authenticated");
     await ctx.db.patch(args.deploymentId, {
```

260-268: ⚠️ Potential issue | 🟠 Major

Authorization bypass: agentId branch drops the userId filter.

When args.agentId is provided, the query is completely replaced with one that filters only by agentId, losing the userId authorization check. Any authenticated user can retrieve deployment history for any agent by providing its ID.

The deployments table lacks a composite by_agent_user index (unlike the conversations table which has one), so the fix requires either:

  1. Adding a .filter() to restore the userId check on the indexed query, or
  2. Creating a composite index by_agent_user on deployments and using it for dual-field indexed queries
Proposed fix — filter by both userId and agentId:

```diff
     if (args.agentId) {
       query = ctx.db
         .query("deployments")
-        .withIndex("by_agent", (q) => q.eq("agentId", args.agentId!));
+        .withIndex("by_agent", (q) => q.eq("agentId", args.agentId!))
+        .filter((q) => q.eq(q.field("userId"), userId));
     }
```
     }
convex/testExecution.ts (1)

420-422: ⚠️ Potential issue | 🔴 Critical

Bug: retryTest mis-classifies Bedrock models as Ollama, skipping billing.

Line 422 uses agent.model.includes(':') to detect Ollama, but many Bedrock model IDs contain colons (e.g., anthropic.claude-haiku-4-5-20251001-v1:0). This causes isOllamaModel to be true for standard Bedrock models, so billing at line 485 is incorrectly skipped.

The submitTest handler correctly checks includes(':') && !includes('.') — the same guard must be applied here.

🐛 Proposed fix:

```diff
-    const isOllamaModel = agent ? (agent.model.includes(':') || agent.deploymentType === "ollama") : false;
+    const isOllamaModel = agent
+      ? (agent.deploymentType === "ollama" || (agent.model.includes(':') && !agent.model.includes('.')))
+      : false;
```
convex/agentcoreTestExecution.ts (2)

135-143: ⚠️ Potential issue | 🟠 Major

Lambda fallback uses agent.model instead of effectiveModel, ignoring the test-model override.

Line 138 passes agent.model to the Lambda fallback, while the direct Bedrock path at line 120 correctly uses effectiveModel. If a user selected a different model for testing, the Lambda fallback would silently revert to the agent's original model.

🐛 Proposed fix
           result = await executeViaLambda( {
             agentCode: agent.generatedCode,
             input: args.input,
-            modelId: agent.model,
+            modelId: effectiveModel,
             tools: agent.tools || [],
           } );

354-364: ⚠️ Potential issue | 🔴 Critical

SSRF allowlist missing host.docker.internal — blocks the default Ollama endpoint set by extractModelConfig.

In convex/testExecution.ts (line 809), the default Ollama baseUrl is http://host.docker.internal:11434. That URL flows through testDetails.modelConfig?.baseUrl at line 112 of this file. But line 355 here only allows localhost, 127.0.0.1, and ::1, so the default Docker-internal Ollama endpoint would be rejected.

🐛 Proposed fix
     // Validate Ollama endpoint to prevent SSRF
-    const allowedHosts = ["localhost", "127.0.0.1", "::1"];
+    const allowedHosts = ["localhost", "127.0.0.1", "::1", "host.docker.internal"];
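A hardened version of this allowlist check could parse the URL rather than compare substrings, so a host like `localhost.evil.com` cannot slip past. This is a hypothetical helper using the WHATWG `URL` parser (available as a global in Node), not the PR's actual validation code:

```typescript
// Hypothetical SSRF allowlist check for the Ollama endpoint.
// Note: WHATWG URL reports IPv6 hostnames with brackets, hence "[::1]".
const ALLOWED_OLLAMA_HOSTS = new Set([
  "localhost",
  "127.0.0.1",
  "[::1]",
  "host.docker.internal",
]);

function isAllowedOllamaUrl(raw: string): boolean {
  try {
    const url = new URL(raw);
    if (url.protocol !== "http:" && url.protocol !== "https:") return false;
    // Exact hostname match, never a substring check.
    return ALLOWED_OLLAMA_HOSTS.has(url.hostname);
  } catch {
    return false; // unparseable input is rejected
  }
}
```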
src/components/AgentBuilder.tsx (2)

310-310: ⚠️ Potential issue | 🟠 Major

Step index not updated after inserting the Skills step — auto-generate navigates to Architecture instead of Test.

setCurrentStep(4) at line 310 was likely intended to navigate to the Test step after auto-generating from a plan. With the new "Skills" step inserted at index 3, all subsequent indices shifted by 1: the Test step is now at index 5, not 4.

🐛 Proposed fix:

```diff
-      setCurrentStep( 4 );
+      setCurrentStep( 5 );
```

126-133: ⚠️ Potential issue | 🟠 Major

Include skills and thinkingLevel in the generateAgent and createAgent calls.

The skills and thinkingLevel fields are defined in the AgentConfig interface and managed in UI state (lines 1267–1290), but they are not passed to generateAgent (line 127) or createAgent (line 156). Since the backend execution reads thinkingLevel from the persisted agent document (convex/strandsAgentExecution.ts:214–216), these fields must be included in the createAgent call to be saved. Add them to both mutation/action arguments:

```typescript
// Line 127: generateAgent call
const result = await generateAgent( {
  name: config.name,
  model: config.model,
  systemPrompt: config.systemPrompt,
  tools: config.tools,
  deploymentType: config.deploymentType,
  skills: config.skills,
  thinkingLevel: config.thinkingLevel,
} );

// Line 156: createAgent call
const agentId = await createAgent( {
  name: config.name,
  description: config.description,
  model: config.model,
  systemPrompt: config.systemPrompt,
  tools: config.tools,
  generatedCode,
  dockerConfig,
  deploymentType: config.deploymentType,
  isPublic: false,
  exposableAsMCPTool: config.exposableAsMCPTool,
  mcpToolName: config.mcpToolName,
  mcpInputSchema: config.mcpInputSchema,
  skills: config.skills,
  thinkingLevel: config.thinkingLevel,
} );
```

The backend mutation and action signatures will also need to be updated to accept these parameters.

convex/metaAgentWorkflow.ts (1)

181-190: ⚠️ Potential issue | 🔴 Critical

Bug: research field contains the function reference, not the result string.

Line 186 assigns researchStep (the function defined at line 229) instead of researchResult (the string captured at line 135). This means the workflow output contains a function reference rather than the research text, which will serialize incorrectly (likely as null or cause a runtime error).

🐛 Proposed fix:

```diff
       return {
         success: true,
         ...design,
         workflow: {
           thinking1: thinkingStep1,
-          research: researchStep,
+          research: researchResult,
           thinking2: thinkingStep2,
           thinking3: thinkingStep3,
         },
       };
```
🤖 Fix all issues with AI agents
In `@convex/agentcoreTestExecution.ts`:
- Around line 83-84: effectiveProvider is being taken directly from
agent.modelProvider which may be undefined; change logic to derive provider from
deploymentType like testExecution.ts does by calling
extractModelConfig(effectiveModel, effectiveDeploymentType) and using its
provider result. Specifically, replace the use of agent.modelProvider when
computing effectiveProvider (currently alongside testDetails.modelProvider) with
the provider returned from extractModelConfig(effectiveModel,
effectiveDeploymentType) so functions like isOllama correctly detect Ollama
agents; update references to effectiveDeploymentType (derived from
testDetails.deploymentType || agent.deploymentType) and ensure
extractModelConfig is imported/available in this file.

In `@convex/lib/toolDispatch.ts`:
- Around line 193-208: dispatchToolCall currently allows unbounded recursive
agent dispatch when skillType === "agent" because it calls
executeAgentWithStrandsAgents; add a depth guard by introducing a depth
parameter (default 0) or a shared MAX_AGENT_DISPATCH_DEPTH constant, check depth
before the agent branch in dispatchToolCall and refuse/return an error when
depth >= MAX_AGENT_DISPATCH_DEPTH, and when invoking
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents, ...)
include/increment the depth so downstream dispatches call dispatchToolCall with
depth+1; update any callers of dispatchToolCall to pass the initial depth (0)
and ensure the returned error message clearly indicates exceeded max agent
dispatch depth.
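The depth-guard pattern described in that instruction can be sketched as below. The names mirror the PR's (`dispatchToolCall`, `MAX_AGENT_DISPATCH_DEPTH`) but the signatures are simplified stand-ins, with agent execution reduced to a callback instead of a Convex `ctx.runAction` call:

```typescript
// Sketch: bounded recursive agent dispatch. The real dispatcher routes many
// skill types; only the "agent" branch needs the depth check.
const MAX_AGENT_DISPATCH_DEPTH = 3;

type ToolResult = { success: boolean; output: string; error?: string };

function dispatchToolCall(
  skillType: string,
  runAgent: (depth: number) => ToolResult, // stand-in for ctx.runAction(...)
  depth = 0, // callers start at 0; nested agent runs pass depth + 1
): ToolResult {
  if (skillType === "agent") {
    if (depth >= MAX_AGENT_DISPATCH_DEPTH) {
      return {
        success: false,
        output: "",
        error: `Exceeded max agent dispatch depth (${MAX_AGENT_DISPATCH_DEPTH})`,
      };
    }
    // Downstream agent execution re-enters dispatch with an incremented depth.
    return runAgent(depth + 1);
  }
  return { success: true, output: "non-agent tool executed" };
}
```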

In `@convex/mcpClient.ts`:
- Around line 187-189: invokeMCPTool currently uses caller-supplied args.userId
for billing; change it to derive and use the authenticated user id instead of
trusting args.userId: inside invokeMCPTool replace use of args.userId in the
billing/charge block with getAuthUserId(ctx) (or validate that args.userId ===
getAuthUserId(ctx) and reject/throw if not), and remove/ignore the userId
argument for billing purposes so only the authenticated user's account can be
charged.

In `@convex/modelRegistry.ts`:
- Around line 494-498: The new hardcoded Bedrock IDs in the modelRegistry block
are unverified and some lack the standard suffixes; update this by querying
Bedrock’s /v1/models at runtime and using that result instead of the hardcoded
entries: implement a fetch (e.g., add or update a helper like fetchBedrockModels
or getAvailableBedrockModels) to discover and validate IDs before adding them to
the existing modelRegistry map, fall back to a safe feature-flagged list (or
disable models) if verification fails, normalize IDs to include the expected
suffixes (e.g., ensure `:0` where applicable), and add unit/integration checks
that assert discovered IDs exist to prevent ValidationException on use.
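The normalization and verification steps can be sketched as pure helpers. The fetch itself is elided; `normalizeBedrockModelId`, `filterToDiscovered`, and the example ids below are illustrative assumptions, not confirmed Bedrock identifiers:

```typescript
// Sketch of the verification flow: normalize candidate IDs, then keep
// only those present in the set discovered from the /v1/models listing.
// The helper names and example IDs are assumptions for illustration.
function normalizeBedrockModelId(id: string): string {
  // Bedrock base IDs are provider.model-name-version, usually with a
  // version suffix like ":0"; append it when missing.
  return id.includes(":") ? id : `${id}:0`;
}

function filterToDiscovered(
  candidateIds: string[],
  discoveredIds: Set<string>,
): { verified: string[]; unverified: string[] } {
  const verified: string[] = [];
  const unverified: string[] = [];
  for (const id of candidateIds) {
    const normalized = normalizeBedrockModelId(id);
    (discoveredIds.has(normalized) ? verified : unverified).push(normalized);
  }
  return { verified, unverified };
}
```

Unverified entries would then be feature-flagged off rather than registered, so a bad id degrades to "model unavailable" instead of a runtime ValidationException.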

In `@convex/strandsAgentExecution.ts`:
- Around line 354-357: When extractTokenUsage(responseBody, modelId) yields
zero, the fallback estimateTokenUsage currently passes an empty string for the
output which underestimates tokens; update the fallback to call
estimateTokenUsage(JSON.stringify(payload), JSON.stringify(responseBody)) (or
otherwise pass the actual response content) so the function receives both the
input and real output for a correct estimate—modify the code around
iterationTokens, extractTokenUsage, and estimateTokenUsage to use responseBody
as the output argument.
- Around line 340-448: The loop can exhaust MAX_TOOL_LOOP_ITERATIONS causing
silent success with empty finalToolCalls/finalContent; after the for loop in
strandsAgentExecution (the block using MAX_TOOL_LOOP_ITERATIONS, finalToolCalls,
finalContent, hasSkills, toolCalls, dispatchToolCall, buildToolResultMessages),
add a guard that detects exhaustion (e.g., a flag or check that the loop ran to
MAX_TOOL_LOOP_ITERATIONS and the last iteration produced tool_use blocks or
hasSkills was true) and then either log a warning and set a clear error result
or return success: false with an explanatory message and preserved token
accounting; ensure finalToolCalls/finalContent are set to reflect the last
observed toolCalls/content before returning so callers get a meaningful
response.
- Around line 263-266: Replace the hardcoded fallback that forces thinking on
Anthropic models by always using the mapThinkingLevelToPayload helper;
specifically, remove the ternary that sets `{ thinking: { type: "enabled",
budget_tokens: 3000 } }` and instead call mapThinkingLevelToPayload with the
current thinkingLevel and modelId (e.g., set thinkingConfig =
mapThinkingLevelToPayload(thinkingLevel, modelId)), so that undefined
thinkingLevel returns {} and avoids sending unsupported `thinking` params to
Claude 3.x/3.5.
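The token-estimate fix from the first bullet can be shown self-contained. The chars/4 heuristic below stands in for the real `estimateTokenUsage` in `convex/lib/tokenBilling`; the point is only that the output argument must be the serialized `responseBody`, not an empty string:

```typescript
// Illustrative stand-in for estimateTokenUsage (crude chars/4 heuristic).
function estimateTokenUsage(inputText: string, outputText: string) {
  const inputTokens = Math.ceil(inputText.length / 4);
  const outputTokens = Math.ceil(outputText.length / 4);
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

const payload = { messages: [{ role: "user", content: "hello" }] };
const responseBody = { content: [{ type: "text", text: "hi there" }] };

// Buggy fallback: output tokens are always zero.
const bad = estimateTokenUsage(JSON.stringify(payload), "");
// Fixed fallback: pass the actual response content as the output.
const good = estimateTokenUsage(
  JSON.stringify(payload),
  JSON.stringify(responseBody),
);
```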
🧹 Nitpick comments (10)
convex/testExecution.ts (1)

89-94: Duplicated Ollama-detection heuristic — extract a shared helper.

The same includes(":") && !includes(".") logic is computed twice: once for rate-limit bypass (lines 90-94) and again for model config extraction (lines 188-190). If one branch is updated without the other, they'll silently diverge. Extract a single deriveDeploymentType(modelId, fallback) helper and call it once.

♻️ Proposed refactor
+// Near the top of the handler, compute once:
+function deriveDeploymentType(modelId: string, fallbackType?: string): string {
+  if (modelId.includes(":") && !modelId.includes(".")) return "ollama";
+  return fallbackType || "bedrock";
+}

 // In submitTest handler, replace both computations:
-    const testModel = args.testModelId || agent.model;
-    const testDeployType = args.testModelId
-      ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
-      : agent.deploymentType;
-    const isOllamaModel = testDeployType === "ollama" || (!testDeployType && testModel.includes(':') && !testModel.includes('.'));
+    const effectiveModel = args.testModelId || agent.model;
+    const effectiveDeploymentType = args.testModelId
+      ? deriveDeploymentType(args.testModelId, agent.deploymentType)
+      : agent.deploymentType;
+    const isOllamaModel = effectiveDeploymentType === "ollama"
+      || (!effectiveDeploymentType && deriveDeploymentType(effectiveModel) === "ollama");

 // ...and remove the second computation at lines 187-190, reusing the same variables.

Also applies to: 185-191

convex/agentBuilderWorkflow.ts (1)

493-509: streamWorkflowExecution always uses ConverseCommand, bypassing the Claude-specific path in invokeWorkflowModel.

executeWorkflowStage routes Claude models to InvokeModelCommand (via invokeWorkflowModel) for optimal results, but streamWorkflowExecution unconditionally uses ConverseCommand at line 495. While Converse works with Claude, this means the two workflow entry points behave differently for the same model. Consider reusing invokeWorkflowModel here to keep behavior consistent.

♻️ Proposed refactor sketch
     for ( const stageName of stages ) {
       const stage = WORKFLOW_STAGES[stageName as keyof typeof WORKFLOW_STAGES];
       // ... build fullPrompt ...

-      const command = new ConverseCommand( { ... } );
-      const response = await bedrockClient.send( command );
-      const stageOutput = response.output?.message?.content?.[0]?.text || "";
-      const usage = response.usage ?? { inputTokens: 0, outputTokens: 0 };
+      const result = await invokeWorkflowModel( {
+        modelId: effectiveModelId,
+        stageName: stage.name,
+        systemPrompt: stage.systemPrompt,
+        userPrompt: fullPrompt,
+      } );
+      const stageOutput = result.outputText;
+      totalInputTokens += result.inputTokens;
+      totalOutputTokens += result.outputTokens;
convex/metaTooling.ts (1)

756-820: Consider validating name format to avoid collisions with seeded built-in skill names.

A user could call createSkill with a name like "short_term_memory" which collides with a built-in skill name if the built-in was seeded under a different user's ID. The duplicate check at line 785 only checks the current user's namespace, so the user-created skill would shadow the public built-in for that user. This may be intentional (user customization), but worth calling out.
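If the shadowing is not intended, a name guard in createSkill is one option. The reserved-name set and `validateSkillName` below are hypothetical; the real list would come from the seeded skills table:

```typescript
// Hypothetical guard for createSkill: reject names that collide with
// seeded built-ins regardless of which user's namespace they live in.
// BUILT_IN_SKILL_NAMES is an assumption for illustration.
const BUILT_IN_SKILL_NAMES = new Set(["short_term_memory", "web_search"]);

function validateSkillName(name: string): void {
  if (!/^[a-z][a-z0-9_]{2,63}$/.test(name)) {
    throw new Error("Skill name must be snake_case, 3-64 chars");
  }
  if (BUILT_IN_SKILL_NAMES.has(name)) {
    throw new Error(`"${name}" is reserved by a built-in skill`);
  }
}
```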

convex/metaAgent.ts (1)

84-99: Clarify that modelId is the "analysis" model, not the resulting agent's model.

The user passes modelId which is used to run the requirements analysis (line 99), but the actual agent models are selected by selectOptimalModels from the scoring database (line 102). A reader might expect modelId to be the model assigned to the designed agent. A brief inline comment would prevent confusion.

convex/strandsAgentExecution.ts (3)

184-185: Unused dynamic import in executeViaAgentCore.

Lines 184–185 import six symbols from ./lib/toolDispatch, but none are used within executeViaAgentCore. All actual usage happens in executeDirectBedrock, which has its own identical import at lines 229–232. This is a wasted async import on every call.

♻️ Suggested fix — remove the unused import
 async function executeViaAgentCore(
   ctx: ActionCtx,
   agent: AgentDoc,
   message: string,
   history: ConversationMessage[]
 ): Promise<AgentExecutionResult> {
-  // Load agent's skills from dynamicTools table (if configured)
-  const { dispatchToolCall, buildToolsArray, mapThinkingLevelToPayload,
-    accumulateTokenUsage, MAX_TOOL_LOOP_ITERATIONS, buildToolResultMessages } = await import( "./lib/toolDispatch" );
-
+  // Load agent's skills from dynamicTools table (if configured)
+  const { SkillDefinition } = await import( "./lib/toolDispatch" );

Actually, only the SkillDefinition type is needed, and types are erased at runtime — so the entire import can be removed. The inline import("./lib/toolDispatch").SkillDefinition type annotation on line 187 is sufficient.

-  const { dispatchToolCall, buildToolsArray, mapThinkingLevelToPayload,
-    accumulateTokenUsage, MAX_TOOL_LOOP_ITERATIONS, buildToolResultMessages } = await import( "./lib/toolDispatch" );
-

187-187: agentSkills is never reassigned — use const.

The array is only mutated via .push(), so the binding itself is constant.

-  let agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];
+  const agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];

188-188: as any cast bypasses type safety for agent document fields.

skills and thinkingLevel are defined in the schema as optional fields on the agents table. If the AgentDoc type (from Doc<"agents">) is generated from the schema, these fields should already be available without casting. If the generated types haven't caught up yet, a narrower type assertion is preferable.

♻️ Suggested approach
-  const agentDoc = agent as any; // Access optional fields added to schema
+  // skills and thinkingLevel are optional schema fields on agents
+  const agentDoc = agent as AgentDoc & {
+    skills?: Array<{ skillId?: string; name: string; enabled?: boolean }>;
+    thinkingLevel?: "low" | "medium" | "high";
+  };
convex/lib/toolDispatch.ts (1)

73-91: mapThinkingLevelToPayload ignores _modelId — consider validating model compatibility.

The _modelId parameter is unused (prefixed with _), but not all Claude models support extended thinking. If a non-thinking-capable model is passed with a thinkingLevel, this will produce a payload that causes an API error. Consider validating model compatibility here or documenting the caller's responsibility.
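A model-aware version could look like the sketch below. The budget numbers and the model-family gate are illustrative assumptions, not the helper's actual values:

```typescript
// Sketch of a model-aware mapThinkingLevelToPayload. The budgets and the
// Claude 3.7 Sonnet / Claude 4 family gate are assumptions for illustration;
// the real helper lives in convex/lib/toolDispatch.
type ThinkingLevel = "low" | "medium" | "high";

function supportsExtendedThinking(modelId: string): boolean {
  // Extended thinking ships with Claude 3.7 Sonnet and the Claude 4 family.
  return /claude-(3-7-sonnet|opus-4|sonnet-4|haiku-4)/.test(modelId);
}

function mapThinkingLevelToPayload(
  level: ThinkingLevel | undefined,
  modelId: string,
): Record<string, unknown> {
  // Undefined level or an unsupported model yields no thinking params.
  if (!level || !supportsExtendedThinking(modelId)) return {};
  const budgets: Record<ThinkingLevel, number> = {
    low: 1024,
    medium: 4096,
    high: 16384,
  };
  return { thinking: { type: "enabled", budget_tokens: budgets[level] } };
}
```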

convex/agentSkills.test.ts (2)

44-51: Redundant assertion: typeof ref check after toBeDefined().

Line 49 (expect(typeof ref).not.toBe("undefined")) is logically equivalent to the expect(ref).toBeDefined() on line 48. You could remove one or replace both with a more meaningful assertion — e.g., checking that each ref is a string (Convex function references serialize to strings).


1-12: Test header lists dispatchToolCall routing logic but no test covers it.

The comment on line 6 mentions testing dispatch routing, but dispatchToolCall is an async function requiring ActionCtx mocking and isn't tested here. Either add a note that it's tested via integration tests elsewhere, or remove it from the header to avoid confusion.

Comment on lines +494 to +498
// ============================================================================
// NEW MODELS — Feb 10, 2026 announcement
// https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-bedrock-adds-support-six-open-weights-models/
// NOTE: Model IDs predicted from naming patterns. Verify in AWS console.
// ============================================================================
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

AWS Bedrock February 2026 announcement new models DeepSeek MiniMax GLM Qwen Kimi

💡 Result:

AWS’s February 2026 Amazon Bedrock announcement (posted Feb 10, 2026) added six new fully managed “open weights” models—including the providers you listed (DeepSeek, MiniMax, Zhipu/GLM, Qwen, Moonshot/Kimi): DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next. It also states these models are served via Project Mantle, a new distributed inference engine on Bedrock, and mentions OpenAI API–compatible endpoints. [1]

AWS also posted a region-specific follow-up on Feb 12, 2026 saying the latest open-weight models are available in Asia Pacific (Sydney) via the bedrock-mantle endpoint (powered by Project Mantle). [2]

Sources: [1], [2]


🌐 Web query:

AWS Bedrock DeepSeek V3.2 MiniMax M2.1 model ID format bedrock-mantle

💡 Result:

On Amazon Bedrock’s bedrock-mantle (OpenAI‑compatible) endpoint, you don’t hardcode a special “bedrock-mantle” model ID format—you use the model id returned by the Mantle /v1/models (OpenAI Models API), and then pass that id as the model field in Responses / Chat Completions requests. (docs.aws.amazon.com)

What the IDs generally look like (Bedrock side)

Bedrock model identifiers are typically of the form:

  • Base model ID: provider.model-name-version (often with an optional suffix like :0) (docs.aws.amazon.com)
    Example shown in AWS docs for DeepSeek V3.1: deepseek.v3-v1:0. (docs.aws.amazon.com)
  • Cross‑Region inference profile ID: adds a geographic prefix like us.
    Example in AWS docs: us.deepseek.r1-v1:0. (docs.aws.amazon.com)

DeepSeek V3.2 and MiniMax M2.1 (Project Mantle)

AWS announced DeepSeek V3.2 and MiniMax M2.1 as models served via Project Mantle (and therefore available via the Mantle/OpenAI‑compatible endpoints). (aws.amazon.com)

Because the exact Mantle “model id” strings you must pass can vary by rollout/Region, the dependable way to get the correct IDs is to list models from your target bedrock-mantle.<region>.api.aws endpoint and use the returned id. (docs.aws.amazon.com)

Minimal way to discover the correct IDs

curl https://bedrock-mantle.us-east-1.api.aws/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

(Use the id values in the response as your model.) (docs.aws.amazon.com)


🏁 Script executed:

sed -n '494,620p' convex/modelRegistry.ts

Repository: MikePfunk28/agent_builder_application



🏁 Script executed:

# Check if there's any indication these IDs have been tested or verified
grep -n "deepseek.v3-0324\|minimax.minimax-m2\|zhipu.glm\|moonshot.kimi\|qwen.qwen3" convex/modelRegistry.ts | head -20



🏁 Script executed:

# Check if there are any tests that use these new model IDs
grep -r "deepseek.v3-0324\|minimax.minimax-m2\|zhipu.glm\|moonshot.kimi-k2-5\|qwen.qwen3" --include="*.test.*" --include="*.spec.*" --include="*.test.ts" --include="*.spec.ts" 2>/dev/null | head -20



🏁 Script executed:

# Check if there's any documentation or README about verifying these model IDs
find . -name "README*" -o -name "CONTRIBUTING*" -o -name "*MODELS*" -o -name "*.md" | xargs grep -l "bedrock\|model.*id\|verify" 2>/dev/null | head -10



🏁 Script executed:

# Check the project README for any model verification guidance
head -100 README.md



Model IDs lack verification despite real AWS announcement.

The February 10, 2026 AWS Bedrock announcement is confirmed, but the hardcoded model IDs in lines 494–620 remain unverified. The code itself documents this with inline notes ("predicted from naming patterns," "not yet confirmed in AWS docs"). Additionally:

  • Several IDs deviate from standard Bedrock format (MiniMax and Kimi entries lack the :0 suffix convention)
  • GLM 4.7 entries explicitly note uncertain prefix: "Verify prefix (zhipu. vs z-ai.)"
  • AWS best practice recommends discovering model IDs dynamically via the /v1/models API endpoint rather than hardcoding them
  • No tests verify these IDs will work with Bedrock

If any ID is incorrect, all invocations for that model will fail with a Bedrock ValidationException. Before merging to main, either verify each ID against the AWS console/SDK or gate these models behind a feature flag.

🤖 Prompt for AI Agents
In `@convex/modelRegistry.ts` around lines 494 - 498, The new hardcoded Bedrock
IDs in the modelRegistry block are unverified and some lack the standard
suffixes; update this by querying Bedrock’s /v1/models at runtime and using that
result instead of the hardcoded entries: implement a fetch (e.g., add or update
a helper like fetchBedrockModels or getAvailableBedrockModels) to discover and
validate IDs before adding them to the existing modelRegistry map, fall back to
a safe feature-flagged list (or disable models) if verification fails, normalize
IDs to include the expected suffixes (e.g., ensure `:0` where applicable), and
add unit/integration checks that assert discovered IDs exist to prevent
ValidationException on use.

Comment on lines 263 to 266
    // Determine thinking config: use thinkingLevel if set, else default
    const thinkingConfig = thinkingLevel
      ? mapThinkingLevelToPayload( thinkingLevel, modelId )
      : { thinking: { type: "enabled", budget_tokens: 3000 } };
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the file structure and find the mapThinkingLevelToPayload function
find . -type f -name "*.ts" -o -name "*.js" | head -20



🏁 Script executed:

# Search for the mapThinkingLevelToPayload function definition
rg "mapThinkingLevelToPayload" --type ts --type js -A 10



🏁 Script executed:

# Check the context of the file mentioned in the review
fd strandsAgentExecution.ts



🌐 Web query:

Anthropic Claude extended thinking support which models

💡 Result:

Anthropic Claude extended thinking is supported on these models: Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, Claude Sonnet 3.7 (deprecated), and Claude Haiku 4.5. [1]


🏁 Script executed:

# Search for model ID references throughout the codebase
rg "claude" --type ts -i | grep -i "model\|version"



🏁 Script executed:

# Check the strandsAgentExecution.ts file more closely around the problematic code
cat convex/strandsAgentExecution.ts | sed -n '250,280p'



🏁 Script executed:

# Search for any configuration or constants related to model selection
rg "claude-3\|claude-opus\|claude-sonnet\|claude-haiku" --type ts



Remove hardcoded thinking default—it breaks Claude 3.x and 3.5 models still referenced in the codebase.

The hardcoded fallback { thinking: { type: "enabled", budget_tokens: 3000 } } forces extended thinking on all Anthropic models when thinkingLevel is undefined, even though older Claude 3.x and 3.5 models (which are still used in tests and defaults throughout the codebase) don't support the thinking parameter. This will cause API errors.

The mapThinkingLevelToPayload(undefined, modelId) function already correctly returns {} for undefined levels. Use it as the default:

Suggested fix
-    const thinkingConfig = thinkingLevel
-      ? mapThinkingLevelToPayload( thinkingLevel, modelId )
-      : { thinking: { type: "enabled", budget_tokens: 3000 } };
+    const thinkingConfig = mapThinkingLevelToPayload( thinkingLevel, modelId );
🤖 Prompt for AI Agents
In `@convex/strandsAgentExecution.ts` around lines 263 - 266, Replace the
hardcoded fallback that forces thinking on Anthropic models by always using the
mapThinkingLevelToPayload helper; specifically, remove the ternary that sets `{
thinking: { type: "enabled", budget_tokens: 3000 } }` and instead call
mapThinkingLevelToPayload with the current thinkingLevel and modelId (e.g., set
thinkingConfig = mapThinkingLevelToPayload(thinkingLevel, modelId)), so that
undefined thinkingLevel returns {} and avoids sending unsupported `thinking`
params to Claude 3.x/3.5.

Comment on lines 340 to 448
for ( let iteration = 0; iteration < MAX_TOOL_LOOP_ITERATIONS; iteration++ ) {
  const command = new InvokeModelCommand( {
    modelId: modelId,
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify( payload ),
  } );

  const response = await client.send( command );
  const responseBody = JSON.parse(
    new TextDecoder().decode( response.body )
  ) as BedrockInvokeResponse;

  // ─── Token extraction per iteration ─────────────────────────────────
  let iterationTokens = extractTokenUsage( responseBody, modelId );
  if ( iterationTokens.totalTokens === 0 ) {
    iterationTokens = estimateTokenUsage( JSON.stringify( payload ), "" );
  }
  accumulateTokenUsage( accumulatedTokens, iterationTokens );

  // ─── Parse response ─────────────────────────────────────────────────
  let content = "";
  let reasoning = "";
  const toolCalls: ToolCall[] = [];
  const assistantContentBlocks: Array<{ type: string;[key: string]: any }> = [];

  if ( responseBody.content && Array.isArray( responseBody.content ) ) {
    // Anthropic models: content is an array of typed blocks
    for ( const block of responseBody.content ) {
      assistantContentBlocks.push( block );
      if ( block.type === "text" ) {
        content += block.text;
      } else if ( block.type === "thinking" ) {
        reasoning += block.thinking;
      } else if ( block.type === "tool_use" ) {
        const id = typeof block.id === "string" ? block.id : undefined;
        const name = typeof block.name === "string" ? block.name : undefined;
        toolCalls.push( {
          id,
          name,
          input: block.input,
        } );
      }
    }
  } else if ( typeof responseBody.generation === "string" ) {
    // Meta/Llama models: single generation string
    content = responseBody.generation;
  } else if ( responseBody.outputs && Array.isArray( responseBody.outputs ) ) {
    // Mistral models: outputs array with text fields
    content = responseBody.outputs.map( ( o ) => o.text || "" ).join( "" );
  } else if ( responseBody.generations && Array.isArray( responseBody.generations ) ) {
    // Cohere models: generations array
    content = responseBody.generations.map( ( g ) => g.text || "" ).join( "" );
  } else if ( responseBody.completions && Array.isArray( responseBody.completions ) ) {
    // AI21 models: completions array
    content = responseBody.completions.map( ( c ) => c.data?.text || "" ).join( "" );
  } else if ( responseBody.results && Array.isArray( responseBody.results ) ) {
    // Amazon Titan models: results array
    content = responseBody.results.map( ( r ) => r.outputText || "" ).join( "" );
  } else {
    // Fallback: try to extract text from any field in the response.
    // Use the already-parsed `responseBody` and avoid logging raw/sensitive content.
    console.warn( `Unrecognized Bedrock response format for model ${modelId}. Response did not match expected fields.` );
    try {
      if ( typeof responseBody === "string" ) {
        const parsed = JSON.parse( responseBody );
        content = typeof parsed === "string" ? parsed : JSON.stringify( parsed );
      } else if ( responseBody && typeof responseBody === "object" ) {
        // Preserve a JSON representation of the object as the fallback content
        content = JSON.stringify( responseBody );
      } else {
        content = String( responseBody );
      }
    } catch {
      // If JSON.parse fails for some reason, fall back to a best-effort string
      content = String( responseBody );
    }
  }

  // Accumulate text and reasoning across iterations
  if ( content ) finalContent = content; // Last iteration's text wins
  if ( reasoning ) finalReasoning += ( finalReasoning ? "\n---\n" : "" ) + reasoning;

  // ─── Agent Loop: Execute tool calls if agent has skills ──────────────
  if ( hasSkills && toolCalls.length > 0 ) {
    // Execute each tool call via shared dispatch
    const toolResults: import( "./lib/toolDispatch" ).ToolResult[] = [];
    for ( const tc of toolCalls ) {
      if ( !tc.id || !tc.name ) continue;
      const result = await dispatchToolCall(
        ctx, tc.name, ( tc.input as Record<string, any> ) || {},
        agentSkills, String( agent.createdBy ),
      );
      toolResults.push( {
        toolUseId: tc.id,
        output: result.success ? result.output : ( result.error || "Tool execution failed" ),
        isError: !result.success,
      } );
    }

    // Build assistant + tool_result messages and append to conversation
    const loopMessages = buildToolResultMessages( assistantContentBlocks, toolResults );
    // Update payload messages for the next iteration
    const currentMessages = ( payload.messages as any[] ) || [];
    payload.messages = [...currentMessages, ...loopMessages];

    // Continue the loop — model will see tool results and respond
    continue;
  }

  // No tool calls (or no skills) — we're done, break out of loop
  finalToolCalls = toolCalls;
  break;
}
⚠️ Potential issue | 🟠 Major

Silent failure when tool loop exhausts MAX_TOOL_LOOP_ITERATIONS.

If the model continuously requests tool calls for all 10 iterations, the for loop exits without hitting the break on line 447. In that case:

  • finalToolCalls remains [] (never assigned).
  • finalContent is the last iteration's text, which may be empty if the last response was purely tool_use blocks.
  • The function returns success: true with potentially empty content and no indication that the loop was truncated.

Add a guard after the loop to detect exhaustion and either warn or return a meaningful error.

🛡️ Suggested fix
   // No tool calls (or no skills) — we're done, break out of loop
   finalToolCalls = toolCalls;
   break;
  }
+
+ // If the loop completed without breaking, the model exhausted the iteration limit
+ if ( hasSkills && finalToolCalls.length === 0 && !finalContent.trim() ) {
+   console.warn( `Agent loop exhausted ${MAX_TOOL_LOOP_ITERATIONS} iterations without producing a final response.` );
+ }
🤖 Prompt for AI Agents
In `@convex/strandsAgentExecution.ts` around lines 340 - 448, The loop can exhaust
MAX_TOOL_LOOP_ITERATIONS causing silent success with empty
finalToolCalls/finalContent; after the for loop in strandsAgentExecution (the
block using MAX_TOOL_LOOP_ITERATIONS, finalToolCalls, finalContent, hasSkills,
toolCalls, dispatchToolCall, buildToolResultMessages), add a guard that detects
exhaustion (e.g., a flag or check that the loop ran to MAX_TOOL_LOOP_ITERATIONS
and the last iteration produced tool_use blocks or hasSkills was true) and then
either log a warning and set a clear error result or return success: false with
an explanatory message and preserved token accounting; ensure
finalToolCalls/finalContent are set to reflect the last observed
toolCalls/content before returning so callers get a meaningful response.

- Updated model identification logic to use `isOllamaModelId` for better clarity and accuracy.
- Removed unnecessary console logs from agent execution and unified agent execution functions.
- Enhanced validation for agent configuration in the AgentBuilder component using Zod schema.
- Improved handling of dynamic input types across various tools and workflows.
- Added iterative execution mode in AgentTester with completion criteria and result tracking.
- Cleaned up console logs in ConversationChat for better readability and performance.
- Updated package.json to include Zod for schema validation.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 14, 2026

Deploying agent-builder-application with Cloudflare Pages

Latest commit: 384e1aa
Status: ✅  Deploy successful!
Preview URL: https://4273bf92.agent-builder-application.pages.dev
Branch Preview URL: https://012-visual-builder-update.agent-builder-application.pages.dev

View logs

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (18)
convex/swarmTestingOrchestrator.ts (2)

782-788: ⚠️ Potential issue | 🔴 Critical

Runtime crash: .map() called on a plain object (Record<string, any>).

sendMessageToSwarm returns responses as Record<string, any> (line 137), not an array. Calling .map() on it will throw TypeError: result.responses.map is not a function. Line 392 handles this correctly with Object.entries(...).

🐛 Proposed fix
-      const formattedResults = result.responses.map(([agentId, response]: [string, any]) => ({
+      const formattedResults = Object.entries(result.responses).map(([agentId, response]: [string, any]) => ({
         agentId,
         success: response.success,
         output: response.content || response.response,
         executionTime: response.executionTime,
         error: response.error,
       }));

418-419: ⚠️ Potential issue | 🟡 Minor

Division by zero when results is empty yields NaN scores.

If no agents respond (empty responses), results.length is 0 and both scores become NaN. Guard with a zero check.

Proposed fix
-      const isolationScore = (totalIsolationScore / results.length) * 100;
-      const coordinationScore = (totalCoordinationScore / results.length) * 100;
+      const isolationScore = results.length > 0 ? (totalIsolationScore / results.length) * 100 : 0;
+      const coordinationScore = results.length > 0 ? (totalCoordinationScore / results.length) * 100 : 0;
convex/apiKeys.ts (1)

259-266: ⚠️ Potential issue | 🟠 Major

Math.random() is not cryptographically secure for API key generation.

generateSecureToken uses Math.random(), which is a PRNG and predictable. For API key material, use crypto.getRandomValues (available in the Web Crypto API / Node.js) to produce cryptographically strong tokens.

🔒 Proposed fix
 function generateSecureToken(length: number): string {
   const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
-  let result = '';
-  for (let i = 0; i < length; i++) {
-    result += chars.charAt(Math.floor(Math.random() * chars.length));
-  }
-  return result;
+  const values = new Uint32Array(length);
+  crypto.getRandomValues(values);
+  return Array.from(values, (v) => chars[v % chars.length]).join('');
 }
convex/integration.test.ts (1)

1112-1148: ⚠️ Potential issue | 🟠 Major

beforeEach data is invisible to tests due to separate convexTest instances.

The beforeEach at line 1112 creates its own convexTest(schema, modules) and inserts records (setting testUserId, testAgentId). However, every test block (e.g., line 1151) creates a new convexTest instance with its own in-memory database. The IDs stored in testUserId / testAgentId won't resolve in the test's t, so agents.find(a => a._id === testAgentId) at line 1160 will never match the inserted agent.

The same pattern is repeated in the "MCP Server Integration" (line 673), "AWS Diagram Generator" (line 831), and "Bedrock AgentCore" (line 1009) describe blocks. Each beforeEach is effectively dead code.

Either share the same t instance across beforeEach and test, or move the setup into each test.

convex/agentCoreTester.ts (1)

298-301: ⚠️ Potential issue | 🟠 Major

Hardcoded fallback "placeholder-token" will silently send unauthenticated requests.

If AGENTCORE_AUTH_TOKEN is unset, this falls through to a literal placeholder string that will be sent as a Bearer token to the real AgentCore endpoint. This should throw an error (like the credential helpers in convex/lib/aws/credentials.ts) rather than silently degrade.

Proposed fix
   private async getAuthToken(): Promise<string> {
-    return process.env.AGENTCORE_AUTH_TOKEN || "placeholder-token";
+    const token = process.env.AGENTCORE_AUTH_TOKEN;
+    if (!token) {
+      throw new Error("Missing AGENTCORE_AUTH_TOKEN environment variable");
+    }
+    return token;
   }
src/components/AgentTester.tsx (1)

212-221: ⚠️ Potential issue | 🟡 Minor

Stop button is non-functional during iterative mode.

isRunning on line 105 includes iterativeRunning, so the Stop button is shown during iterative execution. However, handleStop (line 212) early-returns when currentTestId is null, and currentTestId is never set in the iterative path. Clicking Stop during iterative mode silently does nothing.

Either disable the Stop button when iterativeRunning is true, or implement cancellation for the iterative action (e.g., via an AbortController).

Quick fix: disable Stop in iterative mode
           <button
             onClick={handleStop}
+            disabled={iterativeRunning}
             className="flex items-center gap-2 px-6 py-3 bg-red-600 text-white rounded-lg hover:bg-red-700 transition-colors"
           >

Also applies to: 105-105
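For the AbortController route, a cancellable loop could look like the following sketch. This is illustrative only: runIterativeTest and runIteration are hypothetical stand-ins for the real per-iteration action call, not code from this PR.

```typescript
// Sketch: a cancellable iterative loop. The component would keep the
// controller in a ref and call controller.abort() from handleStop,
// instead of relying on currentTestId.
async function runIterativeTest(
  maxIterations: number,
  runIteration: (i: number) => Promise<string>,
  signal: AbortSignal,
): Promise<string[]> {
  const outputs: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    // Check between iterations; an in-flight call still completes.
    if (signal.aborted) break;
    outputs.push(await runIteration(i));
  }
  return outputs;
}
```

Stop then becomes a plain `controller.abort()`, and partial per-iteration history is preserved because the loop returns whatever it collected before the abort.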

convex/agentcoreTestExecution.ts (1)

140-148: ⚠️ Potential issue | 🟠 Major

Lambda fallback uses agent.model instead of effectiveModel, bypassing the test-level model override.

Line 143 passes agent.model to the Lambda fallback, while lines 115, 125 correctly use effectiveModel. If a test specifies a model override via testDetails.modelConfig?.modelId, the Lambda fallback will invoke the wrong model.

Proposed fix
           result = await executeViaLambda( {
             agentCode: agent.generatedCode,
             input: args.input,
-            modelId: agent.model,
+            modelId: effectiveModel,
             tools: agent.tools || [],
           } );
convex/promptChainExecutor.ts (1)

245-253: ⚠️ Potential issue | 🟠 Major

Missing try/catch on parallel billing — inconsistent with sequential and testPrompt paths.

The sequential chain (lines 124–137) and testPrompt (lines 463–477) both wrap billing in a try/catch marked as non-fatal. The parallel path will throw and fail the entire action if incrementUsageAndReportOverage rejects, even though the model invocations already succeeded.

Proposed fix: wrap in try/catch like the other paths
     // Meter: token-based billing for all parallel prompts
     if ( gateUserId && ( totalInputTokens > 0 || totalOutputTokens > 0 ) ) {
-      await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
-        userId: gateUserId,
-        modelId: gateModelId,
-        inputTokens: totalInputTokens,
-        outputTokens: totalOutputTokens,
-      } );
+      try {
+        await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
+          userId: gateUserId,
+          modelId: gateModelId,
+          inputTokens: totalInputTokens,
+          outputTokens: totalOutputTokens,
+        } );
+      } catch ( billingErr ) {
+        console.error( "promptChainExecutor: parallel billing failed (non-fatal)", {
+          userId: gateUserId, modelId: gateModelId,
+          inputTokens: totalInputTokens, outputTokens: totalOutputTokens,
+          error: billingErr instanceof Error ? billingErr.message : billingErr,
+        } );
+      }
     }
convex/codeGenerator.ts (1)

475-488: ⚠️ Potential issue | 🟡 Minor

Remaining raw tool.name interpolations in generated Python f-strings.

Lines 480 and 487 interpolate tool.name directly into Python f-string literals without escaping. If tool.name contains ", {, }, or newlines, the generated Python code will be syntactically broken or could allow injection.

Line 475 correctly uses safeToolName — apply the same treatment to lines 480 and 487.

Proposed fix
-        result = f"Tool ${tool.name} executed successfully"
+        result = f"Tool ${safeToolName} executed successfully"
         ${params.length > 0 ? `
-        # Available parameters: ${params.map((p: any) => p.name).join(', ')}
+        # Available parameters: ${params.map((p: any) => String(p.name).replace(/[^a-zA-Z0-9_]/g, '_')).join(', ')}
         logger.debug(f"Parameters: {locals()}")` : ''}

         return result
     except Exception as e:
-        logger.error(f"Error in custom tool ${tool.name}: {str(e)}")
+        logger.error(f"Error in custom tool ${safeToolName}: {str(e)}")
         raise`;
convex/agentAsToolGenerator.ts (1)

112-118: ⚠️ Potential issue | 🟡 Minor

Raw agentName remains in Python f-strings at lines 114 and 117, undoing the escaping applied elsewhere.

The error-path strings still interpolate the original agentName directly into Python f-strings. If agentName contains ", {, or }, the generated Python will break. Use safeAgentNameDQ (which is already computed on line 48) for consistency with the rest of this function.

Proposed fix
-            return f"Error from ${agentName}: {result.get('error', 'Unknown error')}"
+            return f"Error from ${safeAgentNameDQ}: {result.get('error', 'Unknown error')}"

     except Exception as e:
-        return f"Failed to invoke ${agentName}: {str(e)}"
+        return f"Failed to invoke ${safeAgentNameDQ}: {str(e)}"
convex/agents.ts (1)

181-196: ⚠️ Potential issue | 🟡 Minor

Parse mcpInputSchema before returning — type inconsistency causes test failures.

When agent.mcpInputSchema is a non-empty string, line 185 returns it directly as inputSchema. However, test code at convex/metaToolingIntegration.test.ts:480-483 accesses .properties on inputSchema, expecting an object. This causes a type mismatch: inputSchema is either a raw JSON string or an object literal.

Parse the string before returning:

Proposed fix
     return agents.map(agent => ({
       _id: agent._id,
       name: agent.mcpToolName || agent.name,
       description: agent.description || `AI agent: ${agent.name}`,
-      inputSchema: agent.mcpInputSchema || {
+      inputSchema: (agent.mcpInputSchema ? JSON.parse(agent.mcpInputSchema) : null) || {
         type: "object",
         properties: {
           input: { 
             type: "string", 
             description: "Input to the agent" 
           }
         },
         required: ["input"]
       }
     }));

Consider using safeJsonParse from convex/lib/jsonUtils.ts instead to handle malformed data gracefully.

convex/agentExecution.test.ts (1)

686-707: ⚠️ Potential issue | 🟡 Minor

Test is now a no-op — no assertions remain.

After removing the ECS assertions, this test patches fields and queries data but asserts nothing. It should either be removed or updated to verify AgentCore-specific cleanup behavior.

Suggested fix: add minimal assertion or mark as TODO
       // Test routed through AgentCore — ECS fields are deprecated
       // Verify the test was created and queued successfully
+      expect(test).toBeDefined();
+      expect(test.status).toBe("QUEUED");
+      // TODO: Add AgentCore-specific resource cleanup assertions
     });
convex/diagramGenerator.ts (1)

92-128: ⚠️ Potential issue | 🟡 Minor

Unsanitized tool.name interpolated into generated Python code.

tool.name is injected directly into Python string literals (e.g., line 115: "${tool.name}"). A tool name containing a double quote or backslash would produce invalid Python. agentName has the same risk in the diagram title string (line 96).

Consider escaping or sanitizing these values for use in Python strings.

Suggested helper
+function escapePythonString(s: string): string {
+  return s.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
+}

Then use escapePythonString(tool.name) in template literals.

convex/tools.ts (1)

414-422: ⚠️ Potential issue | 🟡 Minor

Potential crash if all reasoning paths fail with identical error messages.

sortedAnswers[0] on Line 421 will throw if voteCounts is empty. Since answers is always pushed to (even on error), voteCounts normally has at least one entry; identical error strings simply collapse into a single high-vote answer. But if numPaths is 0 (e.g., someone passes numPaths: 0), answers stays empty and indexing sortedAnswers[0] crashes. Pre-existing issue, but worth noting.
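A guard is a one-line change; a minimal sketch with a hypothetical pickMajorityAnswer helper (the real function's shape may differ):

```typescript
// Sketch: pick the majority answer, guarding the empty case instead of
// indexing sortedAnswers[0] unconditionally.
function pickMajorityAnswer(answers: string[]): string | null {
  if (answers.length === 0) return null; // numPaths: 0, or every path skipped
  const voteCounts = new Map<string, number>();
  for (const a of answers) voteCounts.set(a, (voteCounts.get(a) ?? 0) + 1);
  const sorted = [...voteCounts.entries()].sort((x, y) => y[1] - x[1]);
  return sorted[0][0];
}
```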

convex/metaAgent.ts (1)

246-279: ⚠️ Potential issue | 🟠 Major

Handle missing requiredTools/requiredDataSources from LLM output.

If the model returns partial JSON, the current loops will throw and fail the whole design flow. Defensive defaults keep the system resilient to imperfect model output.

🛠️ Suggested fix
-  for (const requiredTool of analysis.requiredTools) {
+  const requiredTools = Array.isArray( analysis.requiredTools ) ? analysis.requiredTools : [];
+  for (const requiredTool of requiredTools) {
     const toolKey = requiredTool.toLowerCase().replace(/\s+/g, "_");
     if (toolMapping[toolKey]) {
       tools.push(toolMapping[toolKey]);
     }
   }

-  for (const dataSource of analysis.requiredDataSources) {
+  const requiredDataSources = Array.isArray( analysis.requiredDataSources ) ? analysis.requiredDataSources : [];
+  for (const dataSource of requiredDataSources) {
     tools.push({
convex/agentBuilderWorkflow.ts (1)

269-313: ⚠️ Potential issue | 🟠 Major

Use resolved Bedrock model ID consistently for gating + billing.

invokeWorkflowModel resolves model IDs internally, but billing uses effectiveModelId. If the user supplies an alias, billing/pricing lookups can mis‑match or fail. Resolve once and use the resolved ID for gating, invocation, and billing.

🛠️ Suggested fix
-    const effectiveModelId = args.modelId || DEFAULT_WORKFLOW_MODEL;
+    const effectiveModelId = args.modelId || DEFAULT_WORKFLOW_MODEL;
+    const { resolveBedrockModelId } = await import( "./modelRegistry.js" );
+    const resolvedModelId = resolveBedrockModelId( effectiveModelId );

     const { requireBedrockAccess } = await import( "./lib/bedrockGate" );
     const gateResult = await requireBedrockAccess(
-      ctx, effectiveModelId,
+      ctx, resolvedModelId,
       async ( lookupArgs ) => ctx.runQuery( internal.users.getInternal, lookupArgs ),
     );

     const result = await invokeWorkflowModel( {
-      modelId: effectiveModelId,
+      modelId: resolvedModelId,
       stageName: stage.name,
       systemPrompt: stage.systemPrompt,
       userPrompt: fullPrompt,
     } );

     await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
       userId: gateResult.userId,
-      modelId: effectiveModelId,
+      modelId: resolvedModelId,
       inputTokens: result.inputTokens,
       outputTokens: result.outputTokens,
     } );
src/components/AgentBuilder.tsx (1)

128-134: ⚠️ Potential issue | 🟠 Major

Add skills and thinkingLevel fields to agent save, generate, and export payloads.

The SkillsStep component mutates config with these fields, but the backend mutation and action signatures don't accept them, causing user selections to be silently dropped. This breaks the end-to-end feature.

🛠️ Suggested fix

Update the following backend functions to accept and persist these fields:

export const generateAgent = action({
  args: {
    name: v.string(),
    model: v.string(),
    systemPrompt: v.string(),
    tools: v.array(...),
+   skills: v.optional(v.array(v.object({
+     skillId: v.optional(v.id("dynamicTools")),
+     name: v.string(),
+     enabled: v.optional(v.boolean()),
+   }))),
+   thinkingLevel: v.optional(v.union(
+     v.literal("low"),
+     v.literal("medium"),
+     v.literal("high")
+   )),
    deploymentType: v.string(),
    ...
  },

Also update agents.create and agents.update mutations with the same fields, and include them in the frontend payloads at lines 128–134, 173–186, and 404–413.

convex/modelRegistry.ts (1)

969-1032: ⚠️ Potential issue | 🟠 Major

Mismatch between SHORT_NAME_TO_BEDROCK_ID and BEDROCK_MODELS causes lookup failures.

SHORT_NAME_TO_BEDROCK_ID maps to cross-region prefixed IDs like "us.amazon.nova-pro-v1:0" and "us.meta.llama3-3-70b-instruct-v1:0", but BEDROCK_MODELS keys use base IDs without region prefixes (e.g., "amazon.nova-pro-v1:0", "meta.llama3-3-70b-instruct-v1:0"). When resolveBedrockModelId("nova-pro") returns the prefixed ID, subsequent lookups in getUnitsForModel() and getModelThinkingConfig() fail to find entries and return incorrect defaults (1 unit, no thinking support).

Either add the region-prefixed entries to BEDROCK_MODELS, strip the prefix in resolveBedrockModelId before returning, or normalize IDs in lookup functions.

🤖 Fix all issues with AI agents
In `@convex/apiKeys.ts`:
- Around line 273-276: convex/apiKeys.ts now throws when process.env.JWT_PRIVATE_KEY is missing, which breaks verification of existing API keys that were hashed with the old 'default-secret'. Provide a migration path: when verifying an API key, fall back to legacy verification with 'default-secret' if the new secret fails, and on a successful legacy match re-hash the key with JWT_PRIVATE_KEY and persist the new hash (or emit a migration event). Alternatively, ship a one-time migration script that reads all stored hashes, re-computes them with the new secret, and replaces them. Either way, surface the breaking migration in the release notes.

In `@convex/deploymentPackageGenerator.ts`:
- Around line 190-237: The generated Python scaffold in
generateBasicTestScaffold inserts unescaped values (agentName, tool names in
EXPECTED_TOOLS and tool docstrings) directly into string literals, risking
syntax errors or injection; fix by escaping these values for safe Python string
and triple-quote contexts before interpolation (use the existing
escapePythonString and escapePythonTripleQuote helpers), ensure EXPECTED_TOOLS
is populated with escaped/JSON-safe entries, replace direct "${agentName}" with
an escaped Python string, and use escaped tool names in toolTests/docstrings
while keeping safeName only for function identifiers.

In `@convex/http.ts`:
- Around line 201-206: The code computes message from rawInput and
JSON.stringify(args) but when args is undefined JSON.stringify(undefined)
returns undefined, so ensure message is always a string before calling
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents). Update
the logic around rawInput/message (the variables named rawInput and message) to
coerce a safe string (e.g. if typeof rawInput === "string" use it, else use
JSON.stringify(args ?? {}) or fallback to an empty string) so that the message
argument passed with agent._id is never undefined.
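The coercion described above fits in a small helper. A sketch, with coerceMessage as a hypothetical name and the rawInput/args shapes assumed:

```typescript
// Sketch: always produce a string message before ctx.runAction.
function coerceMessage(rawInput: unknown, args: unknown): string {
  if (typeof rawInput === "string") return rawInput;
  // JSON.stringify(undefined) returns undefined, so default args first.
  return JSON.stringify(args ?? {}) ?? "";
}
```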

In `@convex/lib/iterativeLoop.ts`:
- Around line 32-110: The JSDoc for CompletionCriteria.successPattern is
misleading: checkCompletionCriteria (and its use of response.content.includes)
treats successPattern as a plain substring, not a regex; update the JSDoc for
CompletionCriteria.successPattern to state "substring pattern" (or explicitly
"plain substring") and adjust the comment near checkCompletionCriteria to note
that successPattern is matched using String.prototype.includes; reference
CompletionCriteria.successPattern and the checkCompletionCriteria function when
making this small doc-only change.

In `@convex/lib/unifiedModalitySwitching.ts`:
- Around line 50-56: The avgCostPer1K function currently returns 0 on a
BEDROCK_MODELS miss which makes missing models appear free; update avgCostPer1K
to detect when BEDROCK_MODELS[modelId] is undefined, emit a warning (e.g.,
console.warn or your project logger) that includes the modelId, and return a
non-zero sentinel (preferably Number.POSITIVE_INFINITY or a very large value) so
selectUnifiedModel’s cost-based sorting does not prefer registry-missing models.
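The warn-plus-sentinel behavior can be sketched against a mock registry (MOCK_BEDROCK_MODELS and its shape are assumptions for illustration, not the real table):

```typescript
// Sketch: return Infinity on a registry miss so cost-based sorting never
// prefers a model that is missing from the registry.
const MOCK_BEDROCK_MODELS: Record<string, { costPer1K: { input: number; output: number } }> = {
  "amazon.nova-pro-v1:0": { costPer1K: { input: 0.0008, output: 0.0032 } },
};

function avgCostPer1K(modelId: string): number {
  const model = MOCK_BEDROCK_MODELS[modelId];
  if (!model) {
    console.warn(`avgCostPer1K: unknown model "${modelId}", treating as most expensive`);
    return Number.POSITIVE_INFINITY;
  }
  return (model.costPer1K.input + model.costPer1K.output) / 2;
}
```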

In `@convex/metaAgent.ts`:
- Around line 38-63: The buildModelDatabase function can produce NaN scores when
a model in BEDROCK_MODELS lacks costPer1MTokens or contextWindow; update
buildModelDatabase to validate those fields before computing scores: for each
[id, model] check that model.costPer1MTokens and a numeric
model.costPer1MTokens.output exist and are finite (skip the model if not), and
treat a missing or non-numeric model.contextWindow with a safe default (e.g., 0
or a configured fallback) before using Math.round; ensure totalScore is computed
only from validated numeric parts so selection and budget logic using totalScore
remain reliable.
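The validation could look like this sketch; the registry shape and the scoring weights here are assumptions, only the skip/default pattern is the point:

```typescript
// Sketch: skip models without a finite output cost and default a missing
// contextWindow to 0 so no score ever becomes NaN.
type MockModel = { costPer1MTokens?: { output?: number }; contextWindow?: number };

function scoreModels(models: Record<string, MockModel>): Array<{ id: string; totalScore: number }> {
  const scored: Array<{ id: string; totalScore: number }> = [];
  for (const [id, model] of Object.entries(models)) {
    const outCost = model.costPer1MTokens?.output;
    if (typeof outCost !== "number" || !Number.isFinite(outCost)) continue; // skip invalid entry
    const contextWindow = typeof model.contextWindow === "number" ? model.contextWindow : 0;
    const costScore = outCost > 0 ? 1000 / outCost : 0; // hypothetical weighting
    const contextScore = Math.round(contextWindow / 1000);
    scored.push({ id, totalScore: costScore + contextScore });
  }
  return scored;
}
```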

In `@convex/modelRegistry.ts`:
- Around line 1117-1129: getModelThinkingConfig and getUnitsForModel currently
fail to match models when passed region-prefixed IDs; normalize the incoming
modelId before lookup by calling or mimicking resolveBedrockModelId (or strip
the leading region prefix like "us."/"eu." etc.) and use the normalized ID when
indexing BEDROCK_MODELS so lookups succeed; update getModelThinkingConfig (and
apply the same change to getUnitsForModel) to compute a lookupId =
resolveBedrockModelId(modelId) or remove the region segment, then use
BEDROCK_MODELS[lookupId] for capability and billing decisions.
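Stripping the region prefix is straightforward; a sketch (the prefix list is an assumption based on AWS cross-region inference profile IDs):

```typescript
// Sketch: normalize region-prefixed Bedrock IDs before registry lookups.
const REGION_PREFIXES = ["us.", "eu.", "apac."];

function normalizeBedrockId(modelId: string): string {
  for (const prefix of REGION_PREFIXES) {
    if (modelId.startsWith(prefix)) return modelId.slice(prefix.length);
  }
  return modelId;
}
```

Both getModelThinkingConfig and getUnitsForModel would then index BEDROCK_MODELS with the normalized ID.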

In `@convex/realAgentTesting.ts`:
- Around line 26-31: The tools schema in the v.array object is missing the
extrasPip field, which makes the branch in generateFullRequirements that reads
tool.extrasPip dead; update the schema (the tools definition in this file) to
include extrasPip as an optional array of strings (matching other definitions
like in toolRegistry.ts and codeGenerator.ts) so tool.extrasPip is validated and
included when generateFullRequirements iterates tools.

In `@convex/strandsAgentExecution.ts`:
- Line 330: The code computes hasSkills as agentSkills.length > 0 &&
isAnthropicModel which silently drops skills for non‑Anthropic models; update
the logic to keep the check but also emit a clear warning when skills are
configured but the chosen model doesn't support tools: after the existing
hasSkills calculation in strandsAgentExecution.ts (reference symbols: hasSkills,
agentSkills, isAnthropicModel) add a conditional that when agentSkills.length >
0 && !isAnthropicModel calls the project's logger (e.g., processLogger.warn or
logger.warn) to note that skills will be ignored for the selected model and list
the configured skills to aid debugging.
- Line 550: The variable declaration for totalTokens should use const instead of
let since the binding is never reassigned; update the declaration of totalTokens
in strandsAgentExecution.ts (the object initialized as { inputTokens: 0,
outputTokens: 0, totalTokens: 0 }) from let to const — you can still mutate its
properties as needed, so replace the let binding with const for the variable
named totalTokens.
- Around line 100-118: The Bedrock detection logic in the isBedrock check is
missing several registered prefixes (e.g., deepseek, moonshot, qwen, minimax,
zhipu) and so can bypass gating; update isBedrock to reuse the canonical Bedrock
prefix list or utility instead of the hardcoded regex — import and use
BEDROCK_PREFIXES from modelRegistry (or call an existing helper like
isOllamaModelId/isBedrockModel if available) to test agent.model while still
honoring agent.deploymentType === "bedrock", and keep the existing bedrock
gating flow that calls requireBedrockAccessForUser with agentOwner and
agent.model.
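A prefix-list check could be sketched as follows. The list here is illustrative (the real source of truth would live in modelRegistry), and the example model IDs are hypothetical:

```typescript
// Sketch: centralize the Bedrock prefix list instead of a hardcoded regex,
// so newly registered vendors cannot bypass gating.
const BEDROCK_PREFIXES = [
  "anthropic.", "amazon.", "meta.", "mistral.", "cohere.",
  "deepseek.", "moonshot.", "qwen.", "minimax.", "zhipu.",
];

function isBedrockModel(modelId: string, deploymentType?: string): boolean {
  if (deploymentType === "bedrock") return true;
  // Drop a cross-region prefix such as "us." before matching vendors.
  const normalized = modelId.replace(/^(us|eu|apac)\./, "");
  return BEDROCK_PREFIXES.some((p) => normalized.startsWith(p));
}
```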

In `@convex/swarmTestingOrchestrator.ts`:
- Around line 626-643: storeSwarm and getSwarm are stubs that cause createSwarm
to report success while nothing is persisted; fix by adding an in-memory
persistence map in this module (e.g., swarmStore: Map<string, SwarmDefinition>)
and implement storeSwarm to validate the incoming swarm object has an identifier
(e.g., swarm.id or swarm.swarmId), save it into swarmStore keyed by that id (or
throw if no id), and implement getSwarm to look up and return the stored
SwarmDefinition (or null if missing); also ensure createSwarm checks the result
of storeSwarm (propagate/throw on failure) rather than unconditionally returning
success so callers do not get a false-positive creation.
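The in-memory store described above is a few lines; a sketch with an assumed SwarmDefinition shape:

```typescript
// Sketch: minimal in-memory persistence so createSwarm cannot report a
// false-positive success while nothing is stored.
interface SwarmDefinition { swarmId: string; agents: unknown[] }

const swarmStore = new Map<string, SwarmDefinition>();

function storeSwarm(swarm: SwarmDefinition): void {
  if (!swarm.swarmId) throw new Error("storeSwarm: swarm has no identifier");
  swarmStore.set(swarm.swarmId, swarm);
}

function getSwarm(swarmId: string): SwarmDefinition | null {
  return swarmStore.get(swarmId) ?? null;
}
```

createSwarm would let storeSwarm's throw propagate rather than returning success unconditionally.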

In `@convex/testExecution.ts`:
- Around line 184-190: The ad-hoc Ollama detection used to compute
effectiveDeploymentType is inconsistent with isOllamaModelId and can misclassify
testModelId; update the effectiveDeploymentType calculation so it uses
isOllamaModelId(args.testModelId) instead of the string heuristic (keep fallback
to agent.deploymentType when no testModelId), ensuring the variables
effectiveModel, effectiveDeploymentType and the call to
extractModelConfig(effectiveModel, effectiveDeploymentType) are derived
consistently with the rate-limiting logic that uses isOllamaModelId.
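The consistent derivation can be sketched as below; isOllamaModelId here is a simplified stand-in for the real predicate in modelRegistry, and the deployment-type strings are assumptions:

```typescript
// Sketch: derive deployment type from the same predicate used by the
// rate-limiting logic, instead of an ad-hoc string heuristic.
function isOllamaModelId(modelId: string): boolean {
  return modelId.startsWith("ollama/") || modelId.includes(":latest");
}

function deriveDeploymentType(
  testModelId: string | undefined,
  agentDeploymentType: string,
): string {
  if (!testModelId) return agentDeploymentType; // no override: keep the agent's type
  return isOllamaModelId(testModelId) ? "ollama" : "bedrock";
}
```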

In `@convex/workflowExecutor.ts`:
- Around line 771-792: In evaluateExpression, the early
includes("success")/includes("error") checks can shadow operator logic (e.g.,
"status != success"); update the function to evaluate operator-based expressions
first (the !==/!= and ===/== branches using the regex splits) before the keyword
checks, or alternatively use word-boundary regexes when matching keywords (e.g.,
/\bsuccess\b/ and /\berror\b/) so operators with operands "success"/"error" are
parsed by the operator branches; ensure left/right resolution still uses
context[left] ?? context.result?.[left] and the same null/true/false coercion
logic in the operator handlers.
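An operator-first ordering can be sketched as follows; the context-resolution details are assumptions matching the description above, not the file's exact code:

```typescript
// Sketch: evaluate !=/== branches before keyword checks so an expression
// like "status != success" is parsed as a comparison, not as the bare
// "success" keyword.
function evaluateExpression(expr: string, context: Record<string, unknown>): boolean {
  const resolve = (token: string): unknown => {
    const t = token.trim();
    if (t === "null") return null;
    if (t === "true") return true;
    if (t === "false") return false;
    // Fall back to the literal token when it is not a context key.
    return context[t] ?? (context.result as Record<string, unknown> | undefined)?.[t] ?? t;
  };
  const neq = expr.split(/!==?/);
  if (neq.length === 2) return resolve(neq[0]) !== resolve(neq[1]);
  const eq = expr.split(/===?/);
  if (eq.length === 2) return resolve(eq[0]) === resolve(eq[1]);
  // Keyword checks only run when no operator matched, with word boundaries.
  if (/\bsuccess\b/.test(expr)) return context.success === true;
  if (/\berror\b/.test(expr)) return context.error != null;
  return false;
}
```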

In `@src/App.tsx`:
- Around line 181-183: The onAgentsCreated callback passed to AIAgentBuilder is
a no-op so newly created agents are dropped and the user receives no feedback;
update the onAgentsCreated handler to call the existing onNavigate function and
navigate to the builder/dashboard route (or builder view) with the created agent
ID (received as the _agents parameter) so the UI transitions to the generated
agent (e.g., call onNavigate(...) with the route that includes the agent ID or a
query param); ensure AIAgentBuilder's onAgentsCreated uses the actual agent id
value rather than ignoring _agents.

In `@src/components/AgentBuilder.tsx`:
- Around line 155-162: The client-side saving schema allows agent names up to
200 chars while validateAgentConfig enforces 100, causing server rejections;
update the saveSchema (the z.object declared as saveSchema in AgentBuilder.tsx)
to set name: z.string().min(1, "Agent name is required").max(100, "Agent name
too long") so the client validation matches validateAgentConfig and prevents
mismatched acceptance.
🧹 Nitpick comments (18)
convex/swarmTestingOrchestrator.ts (3)

655-656: Consider replacing v.any() with a more constrained union or v.object with optional fields.

v.any() on internal action args disables all runtime validation, so malformed tool invocations will propagate silently. Even a loose schema (e.g., v.object({ agents: v.optional(v.array(v.any())), strategy: v.optional(v.string()), ... })) would catch structural mismatches early.

Same applies to line 628 and line 740.


93-93: substr is deprecated; use substring instead.

String.prototype.substr is deprecated per the ECMAScript spec. Replace with .substring(2, 11).


900-900: ctx: any loses type safety for the Convex action context.

Consider typing ctx with the appropriate Convex context type (e.g., ActionCtx) to get compile-time checks on ctx.runQuery calls.

.github/workflows/setup-github-projects_Version3.yml (1)

25-25: Hardcoded magic PROJECT_ID duplicated across two steps — consider a job-level env and add a comment.

"4" is a magic number with no explanation of which project it refers to. If the project is ever recreated, both steps silently target the wrong (or nonexistent) project. Two improvements:

  1. Define it once at the job level to keep things DRY.
  2. Add a brief comment so future maintainers know what project 4 is.
♻️ Suggested refactor
 jobs:
   setup-project:
     runs-on: ubuntu-latest
+    env:
+      # GitHub Projects V2 project number for the main backlog board
+      PROJECT_ID: "4"
     steps:
       - name: Checkout
         uses: actions/checkout@v4

       - name: Backfill existing open issues
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          PROJECT_ID: "4"
         run: |
           ...

       - name: Add new issues/PRs to project
         if: github.event_name == 'issues' || github.event_name == 'pull_request'
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          PROJECT_ID: "4"
         run: |
           ...

Also applies to: 55-55

convex/authDebug.ts (1)

31-35: Hardcoded fallback URLs will silently produce wrong callback URLs in new deployments.

The env-var approach is a good improvement, but the fallback values (resolute-kudu-325, 633051e6) are specific to a single deployment. In a fresh environment where none of these env vars are set, the returned OAuth callback URLs will point to stale/incorrect endpoints with no visible warning.

Consider either logging a warning when a fallback is used, or dropping the fallbacks entirely and omitting an entry when the corresponding env var is not set.

convex/lib/dynamicModelSwitching.ts (2)

69-89: sonnet and sonnet45 tiers are identical in every field except capabilityRating.

Both tiers reference SONNET_ID, so they share the same modelId, cost, and maxTokens. The dynamic routing between complexity scores 30–59 and 60+ will select different tiers but invoke the exact same model at the same cost. The 3-tier escalation is effectively 2-tier.

If this is a temporary state until a distinct "sonnet45" model ID is available, consider adding a // TODO to make that clear. Otherwise, the sonnet45 tier entry is dead weight that could confuse future maintainers.


33-43: costPer1K falls back to zero cost — could silently mask billing issues.

If a model ID isn't found in BEDROCK_MODELS, the function returns { input: 0, output: 0 }. While the comment says this shouldn't happen, a warning log would help catch misconfiguration early.

Optional: add a warning
   // Fallback — should not happen for models listed below
+  console.warn(`costPer1K: model "${modelId}" not found in BEDROCK_MODELS — defaulting to 0`);
   return { input: 0, output: 0 };
convex/queueProcessor.ts (1)

12-12: parseInt on env var without radix and no NaN guard.

parseInt(process.env.MAX_CONCURRENT_TESTS || "10") will correctly fall back to "10" when the env var is unset, but if the env var is set to a non-numeric string (e.g., "abc"), parseInt returns NaN, which would cause runningCount >= MAX_CONCURRENT_TESTS to always be false, effectively removing the concurrency limit.

This is a pre-existing issue, but worth a quick guard since this file is being updated.

Optional: guard against NaN
-const MAX_CONCURRENT_TESTS = parseInt(process.env.MAX_CONCURRENT_TESTS || "10");
+const MAX_CONCURRENT_TESTS = parseInt(process.env.MAX_CONCURRENT_TESTS || "10", 10) || 10;
convex/deploymentPackageGenerator.ts (1)

215-218: assert len(EXPECTED_TOOLS) >= 0 always passes — list length is never negative.

The comment acknowledges "Some agents may have no tools," but the assertion is a no-op. If the intent is simply a presence check, replace it with a meaningful check (e.g., asserting the expected tool names) or remove the assertion and keep the comment.

convex/agentcoreTestExecution.ts (2)

455-461: Two statements on one line reduce readability.

Line 457 combines JSON.parse(...) assignment and return { on a single line, making it easy to miss the return.

Proposed fix
-      const errorPayload = JSON.parse( rawPayload ); return {
+      const errorPayload = JSON.parse( rawPayload );
+      return {

279-285: Timeout timer not cleared on success for both Bedrock and Lambda Promise.race calls.

When the Bedrock call resolves before the timeout, the setTimeout callback remains scheduled. While it won't cause a crash (the rejected promise is ignored), it keeps the timer handle alive unnecessarily. The same applies to the Lambda call at lines 448–453. Consider wrapping the race in an AbortController pattern or clearing the timer, similar to how the Ollama path (lines 367–386) correctly clears it.

Example fix for Bedrock (apply similarly to Lambda)
-    const response = await Promise.race( [
-      client.send( command ),
-      new Promise<never>( ( _, reject ) =>
-        setTimeout( () => reject( new Error( "Bedrock timeout" ) ), 60000 )
-      ),
-    ] );
+    let timeoutId: ReturnType<typeof setTimeout> | undefined;
+    const response = await Promise.race( [
+      client.send( command ),
+      new Promise<never>( ( _, reject ) => {
+        timeoutId = setTimeout( () => reject( new Error( "Bedrock timeout" ) ), 60000 );
+      } ),
+    ] ).finally( () => clearTimeout( timeoutId ) );
convex/cloudFormationGenerator.ts (1)

71-80: TemplateConfig interface still uses any[] for tools, losing the type safety added at the action boundary.

Consider updating the interface to match the typed shape from the action args, so downstream code benefits from the tighter typing as well.

Proposed fix
 interface TemplateConfig {
   agentName: string;
   model: string;
-  tools: any[];
+  tools: Array<{
+    name: string;
+    type: string;
+    requiresPip?: boolean;
+    pipPackages?: string[];
+  }>;
   region: string;
   environment: string;
-  vpcConfig: any;
-  monitoring: any;
-  scaling: any;
+  vpcConfig: { createVpc: boolean; vpcCidr?: string; availabilityZones?: string[] };
+  monitoring: { enableXRay: boolean; enableCloudWatch: boolean; logRetentionDays?: number };
+  scaling: { minCapacity?: number; maxCapacity?: number; targetCpuUtilization?: number };
 }
convex/mcpFileUpload.ts (1)

72-72: Good: env values narrowed to string.

Note that uploadMCPConfig (line 49) also writes env from parsed JSON without validating that values are strings. If the schema enforces Record<string, string>, consider adding validation in uploadMCPConfig as well to prevent inconsistent data.

src/components/AgentBuilderWorkflowUI.tsx (1)

32-66: New workflow stages added — ensure backend alignment.

The stage-to-result mapping at line 107 (result.workflow[index]?.output) relies on positional alignment between WORKFLOW_STAGES and the backend's result.workflow array. If the backend doesn't return results for ast_analysis, test_generation, or verification at the matching indices, outputs will be misattributed.

Consider keying results by stage name instead of array index for robustness.
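A name-keyed lookup is a small change; a sketch with assumed result shapes (the real result.workflow entries may carry more fields):

```typescript
// Sketch: key workflow results by stage name so UI stages and backend
// results cannot drift out of positional alignment.
interface StageResult { stage: string; output: string }

function resultsByStage(workflow: StageResult[]): Record<string, string> {
  const byName: Record<string, string> = {};
  for (const entry of workflow) byName[entry.stage] = entry.output;
  return byName;
}

// A UI stage then looks up its own name instead of its array index:
// const output = resultsByStage(result.workflow)["ast_analysis"];
```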

convex/tools.ts (1)

49-53: Mid-file import statement.

The import { safeJsonParse } on Line 53 sits inside a comment block between function definitions. While this works in TypeScript (imports are hoisted), placing it with the other imports at the top of the file would improve readability.

Suggested: move import to top of file
 import { isOllamaModelId } from "./modelRegistry";
+import { safeJsonParse } from "./lib/jsonUtils";

Then remove Lines 49–53 (the mid-file comment block and import).

convex/strandsAgentExecutionDynamic.ts (1)

152-163: Model switch could bypass the Bedrock gate for the actually-executed model.

The Bedrock gate at Line 133 checks agent.model, but executeWithModelSwitching (Line 157) may select a different model via decideModelSwitch. If the switched model is in a different model family or tier, the gate wouldn't have checked it. This is mitigated by decideModelSwitch being tier-aware, but the defense-in-depth gap is worth noting.
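One way to close the defense-in-depth gap is to re-run the gate against the switched model before executing it. A sketch under assumed names (`canUseBedrockModel`, the tier values, and the gating rule are illustrative, not the repo's actual logic):

```typescript
// Sketch: re-check the Bedrock access gate against whichever model the
// switcher actually selects, not just agent.model.
type Tier = "freemium" | "pro";

function canUseBedrockModel(tier: Tier, modelId: string): boolean {
  // Illustrative gate: freemium users only get haiku-class models.
  return tier === "pro" || modelId.includes("haiku");
}

function gatedModelSwitch(tier: Tier, currentModel: string, candidate: string): string {
  // Defense in depth: fall back to the already-gated model when the
  // candidate would bypass the check.
  return canUseBedrockModel(tier, candidate) ? candidate : currentModel;
}
```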

convex/automatedAgentBuilder.ts (1)

264-264: buildSystemPrompt and analyzeAndAskNext use pervasive any types.

requirements: any, conversationHistory: any[], and requirements: any in the return type weaken the type safety gained by the structured validators in updateBuildSession. Consider defining a shared AgentRequirements interface and reusing it across both the mutation args and these helper functions.
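A sketch of what the shared interface could look like — field names are illustrative assumptions, not the actual schema, and the helper bodies only demonstrate the typed signatures:

```typescript
// Sketch: a shared requirements type to replace the pervasive `any`s,
// reusable across the mutation args and both helper functions.
interface AgentRequirements {
  purpose?: string;
  tools?: string[];
  model?: string;
  constraints?: string[];
}

interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
}

function buildSystemPrompt(requirements: AgentRequirements): string {
  const tools = (requirements.tools ?? []).join(", ");
  return `Purpose: ${requirements.purpose ?? "unspecified"}. Tools: ${tools || "none"}.`;
}

function analyzeAndAskNext(
  requirements: AgentRequirements,
  conversationHistory: ConversationTurn[],
): { nextQuestion: string | null; requirements: AgentRequirements } {
  // Illustrative: ask for a purpose if one hasn't been captured yet.
  const nextQuestion = requirements.purpose ? null : "What should this agent do?";
  return { nextQuestion, requirements };
}
```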

convex/strandsAgentExecution.ts (1)

180-210: Remove unnecessary as any cast — skills and thinkingLevel are now in the agents schema.

Both fields are defined in convex/schema.ts as optional properties:

  • skills (line 57): optional array with skillId, name, enabled
  • thinkingLevel (line 64): optional union of "low" | "medium" | "high"

Since AgentDoc is typed as Doc<"agents">, the cast on line 182 can be removed once Convex codegen is re-run. Access these fields directly without as any to preserve type safety.

Comment on lines +273 to +276
  const secret = process.env.JWT_PRIVATE_KEY;
  if (!secret) {
    throw new Error("JWT_PRIVATE_KEY environment variable is required for API key hashing");
  }

⚠️ Potential issue | 🟡 Minor

Good hardening — but consider the migration path for keys hashed with the old default.

Removing the 'default-secret' fallback is the right call. However, any environment that previously ran without JWT_PRIVATE_KEY stored its API key hashes salted with 'default-secret'. Once the env var is set, those stored hashes will never match again (and until it is set, hashing now throws), silently invalidating every existing key.

Consider whether a one-time migration or re-hash step is needed, or at minimum document this as a breaking change in the PR/release notes.

🤖 Prompt for AI Agents
In `@convex/apiKeys.ts` around lines 273 - 276, The check in convex/apiKeys.ts now
throws when process.env.JWT_PRIVATE_KEY is missing, which will break
verification of existing API keys previously hashed with the old
'default-secret'; update the code to handle a migration path: when verifying API
keys (where you currently reference secret / JWT_PRIVATE_KEY) attempt a legacy
verification using the old 'default-secret' if the new secret fails, and on
successful legacy verification re-hash the key with the new JWT_PRIVATE_KEY and
persist the new hash (or emit a migration event), or alternatively provide a
one-time migration script that reads all stored hashes, re-computes them with
the new secret and replaces them; also ensure you log or surface that the change
is a breaking migration in release notes.
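A hedged sketch of the legacy-fallback path described above — it assumes the stored hash is an HMAC of the key; `hashApiKey` and the return shape are illustrative, not the repo's actual implementation:

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper mirroring the assumed hashing scheme.
function hashApiKey(key: string, secret: string): string {
  return createHmac("sha256", secret).update(key).digest("hex");
}

const LEGACY_SECRET = "default-secret";

// Verify against the current secret first; fall back to the legacy salt.
// `needsRehash` tells the caller to persist a hash under the new secret.
function verifyWithMigration(
  key: string,
  storedHash: string,
  currentSecret: string,
): { valid: boolean; needsRehash: boolean } {
  if (hashApiKey(key, currentSecret) === storedHash) {
    return { valid: true, needsRehash: false };
  }
  if (hashApiKey(key, LEGACY_SECRET) === storedHash) {
    // Matched under the old default salt — migrate on next write.
    return { valid: true, needsRehash: true };
  }
  return { valid: false, needsRehash: false };
}
```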

Comment on lines +190 to +237
function generateBasicTestScaffold(agent: any): string {
  const agentName = agent.name || "Agent";
  const tools = (agent.tools || []).map((t: any) => t.name);
  const toolTests = tools.map((toolName: string) => {
    const safeName = toolName.replace(/[^a-zA-Z0-9]/g, "_");
    const escaped = escapePythonString(toolName);
    return `def test_tool_${safeName}_exists():
    """Verify ${safeName} tool is registered."""
    assert "${escaped}" in EXPECTED_TOOLS`;
  }).join("\n\n");

  return `"""
Auto-generated test scaffold for ${agentName}.
Run with: pytest test_agent.py -v
"""
import pytest

# Expected tools from agent configuration
EXPECTED_TOOLS = ${JSON.stringify(tools)}

def test_agent_has_system_prompt():
    """Verify agent has a non-empty system prompt."""
    system_prompt = """${(() => { let s = escapePythonTripleQuote(agent.systemPrompt || "").slice(0, 500); if (s.endsWith("\\")) s = s.slice(0, -1); return s; })()}"""
    assert len(system_prompt.strip()) > 0

def test_agent_has_tools():
    """Verify agent has at least one tool configured."""
    assert len(EXPECTED_TOOLS) >= 0  # Some agents may have no tools

${toolTests || "# No tool-specific tests (agent has no tools configured)"}

def test_agent_name():
    """Verify agent name is set."""
    assert "${agentName}" != ""

def test_requirements_parseable():
    """Verify requirements.txt can be parsed."""
    import pathlib
    req_file = pathlib.Path(__file__).parent / "requirements.txt"
    if req_file.exists():
        lines = req_file.read_text().strip().split("\\n")
        for line in lines:
            line = line.strip()
            if line and not line.startswith("#"):
                # Basic check: each line should be a valid package spec
                assert len(line) > 0
`;
}

⚠️ Potential issue | 🟠 Major

Python injection risk: agentName is interpolated without escaping.

Line 223 embeds agentName directly into a Python string literal:

assert "${agentName}" != ""

If agentName contains a double-quote (e.g., My "Agent"), the generated Python will have a syntax error. More importantly, tool names on line 194 could have the same issue — safeName sanitizes for function names but not for the docstring on line 197.

Proposed fix for agentName
 def test_agent_name():
     """Verify agent name is set."""
-    assert "${agentName}" != ""
+    assert "${escapePythonString(agentName)}" != ""
🤖 Prompt for AI Agents
In `@convex/deploymentPackageGenerator.ts` around lines 190 - 237, The generated
Python scaffold in generateBasicTestScaffold inserts unescaped values
(agentName, tool names in EXPECTED_TOOLS and tool docstrings) directly into
string literals, risking syntax errors or injection; fix by escaping these
values for safe Python string and triple-quote contexts before interpolation
(use the existing escapePythonString and escapePythonTripleQuote helpers),
ensure EXPECTED_TOOLS is populated with escaped/JSON-safe entries, replace
direct "${agentName}" with an escaped Python string, and use escaped tool names
in toolTests/docstrings while keeping safeName only for function identifiers.
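For reference, a sketch of what an escape helper for double-quoted Python literals might look like — the repo's actual `escapePythonString` may differ; this is illustrative only:

```typescript
// Sketch: escape a value for safe interpolation into a double-quoted
// Python string literal (backslashes first, then quotes and newlines).
function escapePythonStringSketch(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}
```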

Comment on lines +201 to +206
      const rawInput = (args as Record<string, unknown>)?.input;
      const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args);

      const result = await ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents, {
        agentId: agent._id,
        message,

⚠️ Potential issue | 🟠 Major

message can be undefined when arguments is omitted.

When the request body has no arguments field (valid per MCP spec — line 122 allows it), args is undefined. Then:

  • (undefined as Record<string, unknown>)?.input evaluates to undefined
  • JSON.stringify(undefined) returns undefined (the primitive, not a string)

So message is undefined, which is passed to executeAgentWithStrandsAgents. This likely violates the expected string type of the message parameter.

Proposed fix
      const rawInput = (args as Record<string, unknown>)?.input;
-     const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args);
+     const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args ?? {});
🤖 Prompt for AI Agents
In `@convex/http.ts` around lines 201 - 206, The code computes message from
rawInput and JSON.stringify(args) but when args is undefined
JSON.stringify(undefined) returns undefined, so ensure message is always a
string before calling
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents). Update
the logic around rawInput/message (the variables named rawInput and message) to
coerce a safe string (e.g. if typeof rawInput === "string" use it, else use
JSON.stringify(args ?? {}) or fallback to an empty string) so that the message
argument passed with agent._id is never undefined.

Comment on lines +32 to +110
export interface CompletionCriteria {
  type: CompletionCriteriaType;
  /** Regex or substring pattern that indicates success (for tests_pass). */
  successPattern?: string;
  /** Regex or substring pattern that indicates errors (for no_errors). */
  errorPatterns?: string[];
}

export interface CompletionCheckResult {
  isComplete: boolean;
  reason?: string;
}

export interface IterationResult {
  iteration: number;
  content: string;
  success: boolean;
  tokenUsage?: { inputTokens: number; outputTokens: number; totalTokens: number };
}

// ─── Default error patterns ────────────────────────────────────────────────

const DEFAULT_ERROR_PATTERNS = [
  "Error:",
  "error:",
  "ERROR:",
  "failed",
  "FAILED",
  "Exception:",
  "Traceback",
  "SyntaxError",
  "TypeError",
  "ReferenceError",
];

/**
 * Check whether fail indicators match independently of pass indicators.
 * Returns true if there is at least one fail match NOT contained within a pass match.
 * e.g. "0 failed" contains "failed", but that "failed" is part of "0 failed" (a pass indicator).
 */
function hasIndependentFailIndicator(
  lower: string,
  passIndicators: string[],
  failIndicators: string[],
): boolean {
  return failIndicators.some( f => {
    const idx = lower.indexOf( f );
    if ( idx === -1 ) return false;
    return !passIndicators.some( pass => {
      const passIdx = lower.indexOf( pass );
      return passIdx !== -1 && idx >= passIdx && idx < passIdx + pass.length;
    } );
  } );
}

// ─── Core Functions ─────────────────────────────────────────────────────────

/**
 * Check whether the agent's response meets the completion criteria.
 *
 * @param criteria - The criteria to check against.
 * @param response - The agent's last response.
 * @returns - Whether the loop should stop.
 */
export function checkCompletionCriteria(
  criteria: CompletionCriteria,
  response: { success: boolean; content: string },
): CompletionCheckResult {
  switch ( criteria.type ) {
    case "tests_pass": {
      if ( !response.success ) {
        return { isComplete: false, reason: "Agent execution failed" };
      }
      if ( criteria.successPattern ) {
        const found = response.content.includes( criteria.successPattern );
        return found
          ? { isComplete: true, reason: `Success pattern matched: "${criteria.successPattern}"` }
          : { isComplete: false, reason: `Success pattern not found: "${criteria.successPattern}"` };
      }

⚠️ Potential issue | 🟡 Minor

Clarify successPattern semantics (substring vs regex).

The JSDoc says “Regex or substring,” but the code uses includes only. Either update the doc to “substring” or add regex support to avoid confusion.

🛠️ Suggested fix (doc clarification)
-  /** Regex or substring pattern that indicates success (for tests_pass). */
+  /** Substring pattern that indicates success (for tests_pass). */
🤖 Prompt for AI Agents
In `@convex/lib/iterativeLoop.ts` around lines 32 - 110, The JSDoc for
CompletionCriteria.successPattern is misleading: checkCompletionCriteria (and
its use of response.content.includes) treats successPattern as a plain
substring, not a regex; update the JSDoc for CompletionCriteria.successPattern
to state "substring pattern" (or explicitly "plain substring") and adjust the
comment near checkCompletionCriteria to note that successPattern is matched
using String.prototype.includes; reference CompletionCriteria.successPattern and
the checkCompletionCriteria function when making this small doc-only change.
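If regex support is preferred over the doc-only fix, one hedged sketch (the function name is illustrative; invalid regex sources fall back to substring matching, so existing plain-substring patterns keep working):

```typescript
// Sketch: accept either a plain substring or a regex source in successPattern.
// An invalid regex source degrades gracefully to String.prototype.includes.
function matchesSuccessPattern(content: string, pattern: string): boolean {
  try {
    return new RegExp(pattern).test(content);
  } catch {
    return content.includes(pattern);
  }
}
```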

Comment on lines +50 to +56
function avgCostPer1K(modelId: string): number {
  const meta = BEDROCK_MODELS[modelId];
  if (meta?.costPer1MTokens) {
    return (meta.costPer1MTokens.input + meta.costPer1MTokens.output) / 2 / 1000;
  }
  return 0;
}

⚠️ Potential issue | 🟡 Minor

Silent zero-cost fallback could break cost-based model routing.

avgCostPer1K returns 0 when a model ID isn't found in the registry. Since selectUnifiedModel sorts by costPer1KTokensOrUnit ascending for freemium users and low-complexity requests, a registry miss would make that model appear free and always be selected. Consider logging a warning or using a non-zero sentinel so cost routing doesn't silently degrade.

Suggested: add a warning for missing models
 function avgCostPer1K(modelId: string): number {
   const meta = BEDROCK_MODELS[modelId];
   if (meta?.costPer1MTokens) {
     return (meta.costPer1MTokens.input + meta.costPer1MTokens.output) / 2 / 1000;
   }
+  console.warn(`avgCostPer1K: model "${modelId}" not found in BEDROCK_MODELS, defaulting to 0`);
   return 0;
 }
🤖 Prompt for AI Agents
In `@convex/lib/unifiedModalitySwitching.ts` around lines 50 - 56, The
avgCostPer1K function currently returns 0 on a BEDROCK_MODELS miss which makes
missing models appear free; update avgCostPer1K to detect when
BEDROCK_MODELS[modelId] is undefined, emit a warning (e.g., console.warn or your
project logger) that includes the modelId, and return a non-zero sentinel
(preferably Number.POSITIVE_INFINITY or a very large value) so
selectUnifiedModel’s cost-based sorting does not prefer registry-missing models.
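A sketch of the sentinel approach from the prompt above — the registry is passed in so the example is self-contained; the function name and warning text are illustrative:

```typescript
// Sketch: non-zero sentinel so registry misses sort last in ascending
// cost order instead of appearing free and always winning selection.
const UNKNOWN_COST_SENTINEL = Number.POSITIVE_INFINITY;

function avgCostPer1KSafe(
  registry: Record<string, { costPer1MTokens?: { input: number; output: number } }>,
  modelId: string,
): number {
  const meta = registry[modelId];
  if (meta?.costPer1MTokens) {
    return (meta.costPer1MTokens.input + meta.costPer1MTokens.output) / 2 / 1000;
  }
  console.warn(`avgCostPer1K: model "${modelId}" not in registry; treating as most expensive`);
  return UNKNOWN_COST_SENTINEL;
}
```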

Comment on lines 626 to 643
export const storeSwarm = internalAction({
args: {
swarm: v.any(),
swarm: v.any(), // v.any(): swarm definition structure varies by topology and communication protocol
},
handler: async (ctx, args) => {
handler: async (_ctx, _args) => {
// Store swarm definition (in-memory for now)
console.log("Storing swarm:", args.swarm);
// TODO: Persist to swarmDefinitions table
},
});

export const getSwarm = internalQuery({
args: {
swarmId: v.string(),
},
handler: async (ctx, args): Promise<SwarmDefinition | null> => {
// Retrieve swarm definition (mock implementation)
console.log("Retrieving swarm:", args.swarmId);
handler: async (_ctx, _args): Promise<SwarmDefinition | null> => {
// Retrieve swarm definition (stub — needs swarmDefinitions table)
return null;
},

⚠️ Potential issue | 🟠 Major

storeSwarm silently discards data while createSwarm reports success — every subsequent swarm operation will fail.

createSwarm (line 107) calls storeSwarm, which is now a no-op, then returns success: true with a swarmId. Any follow-up call (sendMessageToSwarm, switchToAgent, testSwarmCoordination, etc.) hits getSwarm which always returns null, resulting in "Swarm not found". The user gets a successful creation response but a broken swarm.

At minimum, createSwarm should surface that persistence is stubbed, or these stubs should throw so callers don't silently succeed.

🤖 Prompt for AI Agents
In `@convex/swarmTestingOrchestrator.ts` around lines 626 - 643, storeSwarm and
getSwarm are stubs that cause createSwarm to report success while nothing is
persisted; fix by adding an in-memory persistence map in this module (e.g.,
swarmStore: Map<string, SwarmDefinition>) and implement storeSwarm to validate
the incoming swarm object has an identifier (e.g., swarm.id or swarm.swarmId),
save it into swarmStore keyed by that id (or throw if no id), and implement
getSwarm to look up and return the stored SwarmDefinition (or null if missing);
also ensure createSwarm checks the result of storeSwarm (propagate/throw on
failure) rather than unconditionally returning success so callers do not get a
false-positive creation.
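A minimal sketch of the in-memory variant, with assumed field names on SwarmDefinition. Note that module-level state in Convex is not durable across action invocations, so this only papers over the gap — a swarmDefinitions table remains the real fix:

```typescript
// Sketch: module-level in-memory store so getSwarm can return what
// storeSwarm saved (within one process). `swarmId` is an assumed field.
interface SwarmDefinition {
  swarmId: string;
  topology: string;
  agents: string[];
}

const swarmStore = new Map<string, SwarmDefinition>();

function storeSwarmInMemory(swarm: SwarmDefinition): void {
  if (!swarm.swarmId) {
    throw new Error("Swarm definition is missing an id; refusing to store");
  }
  swarmStore.set(swarm.swarmId, swarm);
}

function getSwarmFromMemory(swarmId: string): SwarmDefinition | null {
  return swarmStore.get(swarmId) ?? null;
}
```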

Comment on lines +184 to +190
    // Determine model provider and config — use testModelId override if provided,
    // so the user can design with one model and test with another.
    const effectiveModel = args.testModelId || agent.model;
    const effectiveDeploymentType = args.testModelId
      ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
      : agent.deploymentType;
    const { modelProvider, modelConfig } = extractModelConfig(effectiveModel, effectiveDeploymentType);

⚠️ Potential issue | 🔴 Critical

Inconsistent Ollama detection: ad-hoc heuristic vs. centralized isOllamaModelId.

Line 91 uses isOllamaModelId(args.testModelId) (the new centralized registry-backed function) to derive testDeployType, but Line 188 uses the old ad-hoc heuristic args.testModelId.includes(":") && !args.testModelId.includes(".") to derive effectiveDeploymentType. These can produce different results for the same testModelId — for example, models like "deepseek-v3" or "mistral-nemo" (no colon, no dot) would be classified as non-Ollama by Line 188 but as Ollama by isOllamaModelId via the OLLAMA_MODELS registry fallback.

This means extractModelConfig on Line 190 could route to Bedrock when the rate-limiting gate (Line 96) correctly treated it as Ollama, causing a mismatch between gating and actual execution config.

🐛 Proposed fix: use isOllamaModelId consistently
     const effectiveModel = args.testModelId || agent.model;
     const effectiveDeploymentType = args.testModelId
-      ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
+      ? ( isOllamaModelId( args.testModelId ) ? "ollama" : agent.deploymentType )
       : agent.deploymentType;
     const { modelProvider, modelConfig } = extractModelConfig(effectiveModel, effectiveDeploymentType);
🤖 Prompt for AI Agents
In `@convex/testExecution.ts` around lines 184 - 190, The ad-hoc Ollama detection
used to compute effectiveDeploymentType is inconsistent with isOllamaModelId and
can misclassify testModelId; update the effectiveDeploymentType calculation so
it uses isOllamaModelId(args.testModelId) instead of the string heuristic (keep
fallback to agent.deploymentType when no testModelId), ensuring the variables
effectiveModel, effectiveDeploymentType and the call to
extractModelConfig(effectiveModel, effectiveDeploymentType) are derived
consistently with the rate-limiting logic that uses isOllamaModelId.

Comment on lines 771 to 792
function evaluateExpression( expression: string, context: Record<string, any> ): boolean {
  // Simple checks for now (TODO: use proper safe evaluator)
  if ( expression.includes( "success" ) ) {
    return context.success === true;
  }
  if ( expression.includes( "error" ) ) {
    return context.result?.error !== undefined;
  }
  if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
    const [left, right] = expression.split( /!==?/ ).map( ( s ) => s.trim() );
    const leftValue = context[left] ?? context.result?.[left];
    const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
    return leftValue !== rightValue;
  }
  if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
    // Very basic equality check
    const [left, right] = expression.split( /===?/ ).map( ( s ) => s.trim() );
    const leftValue = context[left] ?? context.result?.[left];
    const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
    return leftValue === rightValue;
  }

  // Default false for safety — unrecognized expressions are rejected
  return false;

⚠️ Potential issue | 🟡 Minor

Expression keyword checks shadow operator-based evaluation.

If an expression contains the word "success" or "error" and an operator like != or ==, the keyword check on lines 772–777 takes precedence and the operator is never evaluated. For example, "status != success" would hit the includes("success") branch and return context.success === true instead of performing an inequality comparison.

Consider checking operators first (they are more specific), or using a more structured parse that distinguishes keywords from operands.

Proposed reordering
 function evaluateExpression( expression: string, context: Record<string, any> ): boolean {
+  // Check operators first (more specific) before keyword-only checks
+  if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
+    const [left, right] = expression.split( /!==?/ ).map( ( s ) => s.trim() );
+    const leftValue = context[left] ?? context.result?.[left];
+    const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
+    return leftValue !== rightValue;
+  }
+  if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
+    const [left, right] = expression.split( /===?/ ).map( ( s ) => s.trim() );
+    const leftValue = context[left] ?? context.result?.[left];
+    const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
+    return leftValue === rightValue;
+  }
   if ( expression.includes( "success" ) ) {
     return context.success === true;
   }
   if ( expression.includes( "error" ) ) {
     return context.result?.error !== undefined;
   }
-  if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
-    ...
-  }
-  if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
-    ...
-  }
   return false;
 }
🤖 Prompt for AI Agents
In `@convex/workflowExecutor.ts` around lines 771 - 792, In evaluateExpression,
the early includes("success")/includes("error") checks can shadow operator logic
(e.g., "status != success"); update the function to evaluate operator-based
expressions first (the !==/!= and ===/== branches using the regex splits) before
the keyword checks, or alternatively use word-boundary regexes when matching
keywords (e.g., /\bsuccess\b/ and /\berror\b/) so operators with operands
"success"/"error" are parsed by the operator branches; ensure left/right
resolution still uses context[left] ?? context.result?.[left] and the same
null/true/false coercion logic in the operator handlers.

Comment on lines +181 to 183
      return <AIAgentBuilder onAgentsCreated={(_agents) => {
        // TODO: Navigate to builder with pre-filled agents
      }} />;

⚠️ Potential issue | 🟡 Minor

No-op callback silently drops the generated agent — user gets no navigation feedback.

After AIAgentBuilder successfully generates an agent, it invokes onAgentsCreated with the new agent ID. The current no-op means the user stays on the same page with no visible result. Since onNavigate is already in scope, you could at least navigate to the builder/dashboard:

💡 Minimal fix to navigate after creation
-      return <AIAgentBuilder onAgentsCreated={(_agents) => {
-        // TODO: Navigate to builder with pre-filled agents
-      }} />;
+      return <AIAgentBuilder onAgentsCreated={(_agents) => {
+        // TODO: Pre-fill builder with generated agents
+        onNavigate("builder");
+      }} />;

Would you like me to open an issue to track implementing the full pre-fill behavior?

🤖 Prompt for AI Agents
In `@src/App.tsx` around lines 181 - 183, The onAgentsCreated callback passed to
AIAgentBuilder is a no-op so newly created agents are dropped and the user
receives no feedback; update the onAgentsCreated handler to call the existing
onNavigate function and navigate to the builder/dashboard route (or builder
view) with the created agent ID (received as the _agents parameter) so the UI
transitions to the generated agent (e.g., call onNavigate(...) with the route
that includes the agent ID or a query param); ensure AIAgentBuilder's
onAgentsCreated uses the actual agent id value rather than ignoring _agents.

Comment on lines +155 to +162
    // Validate config before saving
    const saveSchema = z.object( {
      name: z.string().min( 1, "Agent name is required" ).max( 200, "Agent name too long" ),
      description: z.string().max( 2000, "Description too long" ),
      model: z.string().min( 1, "Model selection is required" ),
      systemPrompt: z.string().max( 50000, "System prompt too long" ),
      deploymentType: z.string().min( 1 ),
    } );

⚠️ Potential issue | 🟡 Minor

Align client name length validation with backend (100 chars).

validateAgentConfig enforces 100 chars, but the client allows 200, leading to server-side rejection after UI acceptance.

🛠️ Suggested fix
-      name: z.string().min( 1, "Agent name is required" ).max( 200, "Agent name too long" ),
+      name: z.string().min( 1, "Agent name is required" ).max( 100, "Agent name too long" ),
🤖 Prompt for AI Agents
In `@src/components/AgentBuilder.tsx` around lines 155 - 162, The client-side
saving schema allows agent names up to 200 chars while validateAgentConfig
enforces 100, causing server rejections; update the saveSchema (the z.object
declared as saveSchema in AgentBuilder.tsx) to set name: z.string().min(1,
"Agent name is required").max(100, "Agent name too long") so the client
validation matches validateAgentConfig and prevents mismatched acceptance.

@MikePfunk28 MikePfunk28 merged commit 6eafeda into main Feb 14, 2026
5 checks passed

Labels: bug (Something isn't working), enhancement (New feature or request)

Projects: None yet

1 participant