feat: Enhance model registry and agent execution with new skills and … #26

MikePfunk28 merged 3 commits into main
Conversation
…thinking capabilities

- Added `thinkingApiPath` to `ModelMetadata` for reasoning invocation paths.
- Introduced new models including DeepSeek V3.2, MiniMax M2, Zhipu GLM 4.7, and Qwen3.
- Updated `getModelThinkingConfig` to handle reasoning configurations.
- Enhanced schema to support agent skills and thinking levels.
- Modified agent execution to utilize skills and thinking levels during model invocation.
- Updated AgentBuilder and AgentTester components to manage skills and test models.
- Improved model catalog with new entries for recently added models.
📝 Walkthrough

Adds model-aware workflows, a centralized tool dispatch + skills system, schema and UI for agent skills/thinking levels, Bedrock gating and model registry enhancements, an iterative agent loop with tool execution and per-model token accounting, numerous API signature tightenings, and removal of ECS-based orchestration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as UI/Client
    participant Workflow as Workflow Executor
    participant Registry as Model Registry
    participant Bedrock as Bedrock API
    participant Dispatch as Tool Dispatch
    participant MCP as MCP Server
    participant Billing as Billing Service
    UI->>Workflow: Start workflow (args include modelId, user)
    Workflow->>Registry: resolveModelThinkingConfig(modelId)
    Registry-->>Workflow: thinking config / effectiveModelId
    Workflow->>Bedrock: invoke model (Claude or Converse path) with effectiveModelId
    Bedrock-->>Workflow: response (text, thinking, tool_use) + usage
    Workflow->>Dispatch: dispatchToolCall(toolName, input, skills, userId)
    Dispatch->>MCP: call MCP / internal / sandbox as routed
    MCP-->>Dispatch: tool result + tokenUsage
    Dispatch->>Billing: increment usage (non-fatal)
    Dispatch-->>Workflow: tool_result
    Workflow->>Workflow: accumulate tokens & build next prompt (loop)
    alt completion reached
        Workflow->>Billing: final usage report (effectiveModelId)
        Billing-->>Workflow: report ack
        Workflow-->>UI: final response + totalUsage
    end
```

```mermaid
sequenceDiagram
    participant Designer as Client/UI
    participant Meta as MetaAgent Workflow
    participant Gate as Bedrock Gate
    participant MCP as MCP Server
    participant Bedrock as Bedrock API
    participant DB as Database
    Designer->>Meta: designAgent(name, modelId, requirements)
    Meta->>Gate: requireBedrockAccess(user, modelId)
    Gate-->>Meta: userId (or throw)
    Meta->>MCP: invokeMCPTool(analysisPrompt, userId, modelId)
    MCP->>Bedrock: invoke with thinking payload (if supported)
    Bedrock-->>MCP: analysis result + tokens
    MCP-->>Meta: analysis result
    Meta->>DB: create tools / agent config (includes skills & thinkingLevel)
    Meta-->>Designer: design result (agents, reasoning, points)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Pull request overview
This PR enhances the agent builder platform with advanced reasoning capabilities and a flexible skills system. It introduces support for new AI models (DeepSeek V3.2, MiniMax M2, Zhipu GLM 4.7, Qwen3), adds thinking levels to control reasoning depth, and implements a reusable skills framework that allows agents to invoke capabilities like memory management and advanced reasoning during conversations.
Changes:
- Added `thinkingApiPath` metadata to the model registry for routing reasoning invocations to the correct Bedrock API
- Extended the agent schema with optional `skills` and `thinkingLevel` fields for backward compatibility
- Implemented shared tool dispatch infrastructure (`lib/toolDispatch.ts`) consolidating tool routing logic
- Enhanced the agent execution loop to support multi-turn tool use following the Anthropic tool_use protocol
- Added a Skills step to the AgentBuilder UI and a test model override in AgentTester
- Created a comprehensive test suite (`agentSkills.test.ts`) covering the new features
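The multi-turn tool loop mentioned above can be sketched as follows. This is a hypothetical, self-contained sketch: `invokeModel` and `dispatchToolCall` are injected stubs, the content-block shapes follow the Anthropic Messages API, and the cap only mirrors the PR's `MAX_TOOL_LOOP_ITERATIONS` constant in name — the real signatures live in `convex/lib/toolDispatch.ts` and differ.

```typescript
// Hypothetical sketch of a multi-turn tool_use loop (Anthropic protocol).
// invokeModel and dispatchToolCall are injected stubs, not the PR's real code.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

interface ModelTurn {
  content: ContentBlock[];
  stopReason: "end_turn" | "tool_use";
}

const MAX_TOOL_LOOP_ITERATIONS = 5; // assumed cap; the PR's actual value may differ

async function runToolLoop(
  invokeModel: (messages: unknown[]) => Promise<ModelTurn>,
  dispatchToolCall: (name: string, input: Record<string, unknown>) => Promise<string>,
  messages: unknown[],
): Promise<string> {
  for (let i = 0; i < MAX_TOOL_LOOP_ITERATIONS; i++) {
    const turn = await invokeModel(messages);
    const toolUses = turn.content.filter(
      (b): b is Extract<ContentBlock, { type: "tool_use" }> => b.type === "tool_use",
    );
    // No tool requests: concatenate the text blocks as the final answer.
    if (turn.stopReason !== "tool_use" || toolUses.length === 0) {
      return turn.content.flatMap((b) => (b.type === "text" ? [b.text] : [])).join("");
    }
    // Echo the assistant turn back, then answer each tool_use with a tool_result.
    messages.push({ role: "assistant", content: turn.content });
    for (const tu of toolUses) {
      const output = await dispatchToolCall(tu.name, tu.input);
      messages.push({
        role: "user",
        content: [{ type: "tool_result", tool_use_id: tu.id, content: output }],
      });
    }
  }
  throw new Error("Tool loop exceeded MAX_TOOL_LOOP_ITERATIONS");
}
```

The final `throw` is deliberate: without it, a conversation that never reaches `end_turn` would fall out of the loop and silently return nothing.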
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| convex/modelRegistry.ts | Added 7 new model entries with thinkingApiPath metadata and getModelThinkingConfig helper function |
| src/data/modelCatalog.ts | Added corresponding UI catalog entries for new models |
| convex/schema.ts | Extended agents table with skills and thinkingLevel, dynamicTools with skill-related fields |
| convex/lib/toolDispatch.ts | New shared module with INTERNAL_TOOL_MAP, skill dispatch logic, thinking level mapping, and tool loop helpers |
| convex/strandsAgentExecution.ts | Refactored to load agent skills and execute multi-turn tool loops with token accumulation |
| convex/workflowExecutor.ts | Migrated to use shared INTERNAL_TOOL_MAP from toolDispatch.ts |
| convex/testExecution.ts | Added testModelId override to allow testing with different models |
| convex/agentcoreTestExecution.ts | Updated to respect testModelId override and improved error handling |
| convex/metaTooling.ts | Added skill CRUD operations (listSkills, createSkill, getSkillById, seedBuiltinSkills) |
| convex/metaAgent.ts | Refactored to build model scores from BEDROCK_MODELS registry dynamically |
| convex/metaAgentWorkflow.ts | Added modelId parameter for user-selected workflow models |
| convex/agentBuilderWorkflow.ts | Implemented model-aware invocation routing (Claude vs Converse API) |
| convex/mcpClient.ts | Added userId parameter and token billing integration for MCP tool invocations |
| convex/mcpConfig.ts | Security fix: prevent cross-user data leakage in getMCPServerByNameInternal |
| convex/deploymentRouter.ts | Cleaned up Stripe mutation imports |
| src/components/AgentBuilder.tsx | Added SkillsStep component with thinking level selector and skills chooser |
| src/components/AgentTester.tsx | Added testModelId parameter to support testing with overridden models |
| convex/agentSkills.test.ts | Comprehensive test suite with 14 tests covering tooling, schema, and skills CRUD |
| package.json | Added test:skills script for focused test execution |
| .github/workflows/setup-github-projects_Version3.yml | Updated project name |
convex/strandsAgentExecution.ts (outdated)

```ts
if ( !tc.id || !tc.name ) continue;
const result = await dispatchToolCall(
  ctx, tc.name, ( tc.input as Record<string, any> ) || {},
  agentSkills, String( agent.createdBy ),
```
The userId is converted to string using String(agent.createdBy), but agent.createdBy is an Id<"users"> type which should already be compatible. Additionally, the dispatchToolCall function expects userId as a string parameter, but at line 178 in toolDispatch.ts, it's cast to 'any' to match the expected type Id<"users">. This type mismatch could indicate a design inconsistency. Consider using the Id type directly or clarifying the expected type in the function signature.
Suggested change:

```diff
-  agentSkills, String( agent.createdBy ),
+  agentSkills, agent.createdBy,
```
convex/testExecution.ts (outdated)

```ts
const testModel = args.testModelId || agent.model;
const testDeployType = args.testModelId
  ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
  : agent.deploymentType;
const isOllamaModel = testDeployType === "ollama" || (!testDeployType && testModel.includes(':') && !testModel.includes('.'));
```
The logic for determining if a model is Ollama appears duplicated on lines 90-94. The same heuristic (checking if modelId contains ":" but not ".") is used to infer deployment type when testModelId is provided. This heuristic might not be reliable for all future model naming patterns. Consider extracting this logic into a utility function or using a more robust method to determine the deployment type, such as looking up the model in a registry.
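A minimal version of the extracted helper suggested above might look like this (the name `deriveDeploymentType` and the `"bedrock"` default are the reviewer's suggestion plus an assumption, not existing code):

```typescript
// Hypothetical shared helper for the duplicated heuristic: Ollama model IDs
// look like "llama3:8b" (colon, no dot), while Bedrock IDs such as
// "anthropic.claude-haiku-4-5-20251001-v1:0" always contain a dot.
function deriveDeploymentType(modelId: string, fallback?: string): string {
  if (modelId.includes(":") && !modelId.includes(".")) return "ollama";
  return fallback ?? "bedrock"; // assumed default when no deployment type is known
}
```

As the comment notes, a registry lookup would be more robust than this naming heuristic; the helper at least keeps the heuristic in one place.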
convex/lib/toolDispatch.ts (outdated)

```ts
      return { success: false, output: "", error: error.message };
    }
  }
  break;
```
When skillType is "internal" but the actionName is not found in INTERNAL_TOOL_MAP, the function breaks out of the switch without returning a value, falling through to the Priority 3 fallback. This could lead to unexpected behavior. Consider adding an explicit return statement or error handling for cases where an internal skill's actionName is invalid.
Suggested change:

```diff
-      break;
+      return {
+        success: false,
+        output: "",
+        error: actionName
+          ? `Unknown internal tool actionName: ${actionName}`
+          : "Missing actionName for internal tool",
+      };
```
```ts
  accumulateTokenUsage, MAX_TOOL_LOOP_ITERATIONS, buildToolResultMessages } = await import( "./lib/toolDispatch" );

let agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];
const agentDoc = agent as any; // Access optional fields added to schema
```
The code uses 'as any' type assertion to access optional fields added to the schema. While the comment acknowledges this, consider using proper TypeScript types or optional chaining with type guards for better type safety. For example, you could define a type that extends AgentDoc with the optional fields, or use type guards to check for the existence of these fields.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
convex/deploymentRouter.ts (3)
192-192: ⚠️ Potential issue | 🟡 Minor

Type mismatch: `deployTier3` still accepts `_userId: string` while `deployTier1`/`deployTier2` use `Id<"users">`.

The caller on line 43 passes `userId` from `getAuthUserId`, which returns `Id<"users">`. This compiles only because `Id<"users">` is assignable to `string`, but it's inconsistent with the other two tier functions and will mask type errors if the signature is relied upon later.

Proposed fix:

```diff
-async function deployTier3(_ctx: any, _args: any, _userId: string): Promise<any> {
+async function deployTier3(_ctx: any, _args: any, _userId: Id<"users">): Promise<any> {
```
328-340: ⚠️ Potential issue | 🟠 Major

`updateHealthStatus` is a public mutation with no auth check.

This mutation is exported and callable by any client. An unauthenticated user could modify the `healthStatus` and `lastHealthCheck` of any deployment by ID. Consider adding authentication or making this an internal mutation if it's only meant to be called from `monitorAgentCoreHealth`.

Option A: Make it an internal mutation

```diff
-import { mutation, query } from "./_generated/server";
+import { mutation, query, internalMutation } from "./_generated/server";
 ...
-export const updateHealthStatus = mutation({
+export const updateHealthStatus = internalMutation({
```

Then update the caller on line 309 to use `internal.deploymentRouter.updateHealthStatus`.

Option B: Add auth guard

```diff
 export const updateHealthStatus = mutation({
   args: {
     deploymentId: v.id("deployments"),
     healthStatus: v.string(),
     lastHealthCheck: v.number(),
   },
   handler: async (ctx, args) => {
+    const userId = await getAuthUserId(ctx);
+    if (!userId) throw new Error("Not authenticated");
     await ctx.db.patch(args.deploymentId, {
```
260-268: ⚠️ Potential issue | 🟠 Major

Authorization bypass: the `agentId` branch drops the `userId` filter.

When `args.agentId` is provided, the query is completely replaced with one that filters only by `agentId`, losing the `userId` authorization check. Any authenticated user can retrieve deployment history for any agent by providing its ID.

The deployments table lacks a composite `by_agent_user` index (unlike the conversations table, which has one), so the fix requires either:

- Adding a `.filter()` to restore the userId check on the indexed query, or
- Creating a composite index `by_agent_user` on deployments and using it for dual-field indexed queries

Proposed fix — filter by both userId and agentId:

```diff
 if (args.agentId) {
   query = ctx.db
     .query("deployments")
-    .withIndex("by_agent", (q) => q.eq("agentId", args.agentId!));
+    .withIndex("by_agent", (q) => q.eq("agentId", args.agentId!))
+    .filter((q) => q.eq(q.field("userId"), userId));
 }
```

convex/testExecution.ts (1)
420-422: ⚠️ Potential issue | 🔴 Critical

Bug: `retryTest` mis-classifies Bedrock models as Ollama, skipping billing.

Line 422 uses `agent.model.includes(':')` to detect Ollama, but many Bedrock model IDs contain colons (e.g., `anthropic.claude-haiku-4-5-20251001-v1:0`). This causes `isOllamaModel` to be `true` for standard Bedrock models, so billing at line 485 is incorrectly skipped.

The `submitTest` handler correctly checks `includes(':') && !includes('.')` — the same guard must be applied here.

🐛 Proposed fix

```diff
-    const isOllamaModel = agent ? (agent.model.includes(':') || agent.deploymentType === "ollama") : false;
+    const isOllamaModel = agent
+      ? (agent.deploymentType === "ollama" || (agent.model.includes(':') && !agent.model.includes('.')))
+      : false;
```

convex/agentcoreTestExecution.ts (2)
135-143: ⚠️ Potential issue | 🟠 Major

Lambda fallback uses `agent.model` instead of `effectiveModel`, ignoring the test-model override.

Line 138 passes `agent.model` to the Lambda fallback, while the direct Bedrock path at line 120 correctly uses `effectiveModel`. If a user selected a different model for testing, the Lambda fallback would silently revert to the agent's original model.

🐛 Proposed fix

```diff
 result = await executeViaLambda( {
   agentCode: agent.generatedCode,
   input: args.input,
-  modelId: agent.model,
+  modelId: effectiveModel,
   tools: agent.tools || [],
 } );
```
354-364: ⚠️ Potential issue | 🔴 Critical

SSRF allowlist missing `host.docker.internal` — blocks the default Ollama endpoint set by `extractModelConfig`.

In `convex/testExecution.ts` (line 809), the default Ollama `baseUrl` is `http://host.docker.internal:11434`. That URL flows through `testDetails.modelConfig?.baseUrl` at line 112 of this file. But line 355 here only allows `localhost`, `127.0.0.1`, and `::1`, so the default Docker-internal Ollama endpoint would be rejected.

🐛 Proposed fix

```diff
 // Validate Ollama endpoint to prevent SSRF
-const allowedHosts = ["localhost", "127.0.0.1", "::1"];
+const allowedHosts = ["localhost", "127.0.0.1", "::1", "host.docker.internal"];
```

src/components/AgentBuilder.tsx (2)
310-310: ⚠️ Potential issue | 🟠 Major

Step index not updated after inserting the Skills step — auto-generate navigates to Architecture instead of Test.

`setCurrentStep(4)` at line 310 was likely intended to navigate to the Test step after auto-generating from a plan. With the new "Skills" step inserted at index 3, all subsequent indices shifted by 1: the Test step is now at index 5, not 4.

🐛 Proposed fix

```diff
-  setCurrentStep( 4 );
+  setCurrentStep( 5 );
```
126-133: ⚠️ Potential issue | 🟠 Major

Include `skills` and `thinkingLevel` in the `generateAgent` and `createAgent` calls.

The `skills` and `thinkingLevel` fields are defined in the `AgentConfig` interface and managed in UI state (lines 1267–1290), but they are not passed to `generateAgent` (line 127) or `createAgent` (line 156). Since the backend execution reads `thinkingLevel` from the persisted agent document (convex/strandsAgentExecution.ts:214–216), these fields must be included in the `createAgent` call to be saved. Add them to both mutation/action arguments:

```ts
// Line 127: generateAgent call
const result = await generateAgent( {
  name: config.name,
  model: config.model,
  systemPrompt: config.systemPrompt,
  tools: config.tools,
  deploymentType: config.deploymentType,
  skills: config.skills,
  thinkingLevel: config.thinkingLevel,
} );

// Line 156: createAgent call
const agentId = await createAgent( {
  name: config.name,
  description: config.description,
  model: config.model,
  systemPrompt: config.systemPrompt,
  tools: config.tools,
  generatedCode,
  dockerConfig,
  deploymentType: config.deploymentType,
  isPublic: false,
  exposableAsMCPTool: config.exposableAsMCPTool,
  mcpToolName: config.mcpToolName,
  mcpInputSchema: config.mcpInputSchema,
  skills: config.skills,
  thinkingLevel: config.thinkingLevel,
} );
```

The backend mutation and action signatures will also need to be updated to accept these parameters.
convex/metaAgentWorkflow.ts (1)
181-190: ⚠️ Potential issue | 🔴 Critical

Bug: `research` field contains the function reference, not the result string.

Line 186 assigns `researchStep` (the function defined at line 229) instead of `researchResult` (the string captured at line 135). This means the workflow output contains a function reference rather than the research text, which will serialize incorrectly (likely as `null`) or cause a runtime error.

🐛 Proposed fix

```diff
 return {
   success: true,
   ...design,
   workflow: {
     thinking1: thinkingStep1,
-    research: researchStep,
+    research: researchResult,
     thinking2: thinkingStep2,
     thinking3: thinkingStep3,
   },
 };
```
🤖 Fix all issues with AI agents
In `@convex/agentcoreTestExecution.ts`:
- Around line 83-84: effectiveProvider is being taken directly from
agent.modelProvider which may be undefined; change logic to derive provider from
deploymentType like testExecution.ts does by calling
extractModelConfig(effectiveModel, effectiveDeploymentType) and using its
provider result. Specifically, replace the use of agent.modelProvider when
computing effectiveProvider (currently alongside testDetails.modelProvider) with
the provider returned from extractModelConfig(effectiveModel,
effectiveDeploymentType) so functions like isOllama correctly detect Ollama
agents; update references to effectiveDeploymentType (derived from
testDetails.deploymentType || agent.deploymentType) and ensure
extractModelConfig is imported/available in this file.
In `@convex/lib/toolDispatch.ts`:
- Around line 193-208: dispatchToolCall currently allows unbounded recursive
agent dispatch when skillType === "agent" because it calls
executeAgentWithStrandsAgents; add a depth guard by introducing a depth
parameter (default 0) or a shared MAX_AGENT_DISPATCH_DEPTH constant, check depth
before the agent branch in dispatchToolCall and refuse/return an error when
depth >= MAX_AGENT_DISPATCH_DEPTH, and when invoking
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents, ...)
include/increment the depth so downstream dispatches call dispatchToolCall with
depth+1; update any callers of dispatchToolCall to pass the initial depth (0)
and ensure the returned error message clearly indicates exceeded max agent
dispatch depth.
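A standalone sketch of that depth guard, with the real `ctx`/skills/userId parameters simplified away (the constant's value and the `runAgent` shape are assumptions, and `runAgent` stands in for the `executeAgentWithStrandsAgents` action):

```typescript
// Hypothetical depth guard against unbounded recursive agent dispatch.
const MAX_AGENT_DISPATCH_DEPTH = 3; // assumed limit

interface DispatchResult {
  success: boolean;
  output: string;
  error?: string;
}

async function dispatchWithDepth(
  runAgent: (depth: number) => Promise<DispatchResult>, // stand-in for the real agent action
  skillType: string,
  depth = 0,
): Promise<DispatchResult> {
  if (skillType === "agent") {
    if (depth >= MAX_AGENT_DISPATCH_DEPTH) {
      return {
        success: false,
        output: "",
        error: `Exceeded max agent dispatch depth (${MAX_AGENT_DISPATCH_DEPTH})`,
      };
    }
    return runAgent(depth + 1); // downstream dispatches see the incremented depth
  }
  return { success: true, output: "non-agent skill handled" };
}
```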
In `@convex/mcpClient.ts`:
- Around line 187-189: invokeMCPTool currently uses caller-supplied args.userId
for billing; change it to derive and use the authenticated user id instead of
trusting args.userId: inside invokeMCPTool replace use of args.userId in the
billing/charge block with getAuthUserId(ctx) (or validate that args.userId ===
getAuthUserId(ctx) and reject/throw if not), and remove/ignore the userId
argument for billing purposes so only the authenticated user's account can be
charged.
In `@convex/modelRegistry.ts`:
- Around line 494-498: The new hardcoded Bedrock IDs in the modelRegistry block
are unverified and some lack the standard suffixes; update this by querying
Bedrock’s /v1/models at runtime and using that result instead of the hardcoded
entries: implement a fetch (e.g., add or update a helper like fetchBedrockModels
or getAvailableBedrockModels) to discover and validate IDs before adding them to
the existing modelRegistry map, fall back to a safe feature-flagged list (or
disable models) if verification fails, normalize IDs to include the expected
suffixes (e.g., ensure `:0` where applicable), and add unit/integration checks
that assert discovered IDs exist to prevent ValidationException on use.
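The verification step can be expressed as a pure function over the id list returned by the `/v1/models` endpoint; the `:0` normalization rule below is an assumption based on common Bedrock ID suffixes, not a documented guarantee:

```typescript
// Hypothetical post-discovery filter: keep only candidate IDs that the
// endpoint actually reports, normalizing bare IDs to a ":0" revision suffix
// when only the suffixed form exists. Unverified IDs are dropped rather than
// risking a ValidationException at invocation time.
function verifyModelIds(discovered: string[], candidates: string[]): string[] {
  const available = new Set(discovered);
  return candidates.flatMap((id) => {
    if (available.has(id)) return [id];
    const suffixed = `${id}:0`;
    if (available.has(suffixed)) return [suffixed];
    return [];
  });
}
```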
In `@convex/strandsAgentExecution.ts`:
- Around line 354-357: When extractTokenUsage(responseBody, modelId) yields
zero, the fallback estimateTokenUsage currently passes an empty string for the
output which underestimates tokens; update the fallback to call
estimateTokenUsage(JSON.stringify(payload), JSON.stringify(responseBody)) (or
otherwise pass the actual response content) so the function receives both the
input and real output for a correct estimate—modify the code around
iterationTokens, extractTokenUsage, and estimateTokenUsage to use responseBody
as the output argument.
- Around line 340-448: The loop can exhaust MAX_TOOL_LOOP_ITERATIONS causing
silent success with empty finalToolCalls/finalContent; after the for loop in
strandsAgentExecution (the block using MAX_TOOL_LOOP_ITERATIONS, finalToolCalls,
finalContent, hasSkills, toolCalls, dispatchToolCall, buildToolResultMessages),
add a guard that detects exhaustion (e.g., a flag or check that the loop ran to
MAX_TOOL_LOOP_ITERATIONS and the last iteration produced tool_use blocks or
hasSkills was true) and then either log a warning and set a clear error result
or return success: false with an explanatory message and preserved token
accounting; ensure finalToolCalls/finalContent are set to reflect the last
observed toolCalls/content before returning so callers get a meaningful
response.
- Around line 263-266: Replace the hardcoded fallback that forces thinking on
Anthropic models by always using the mapThinkingLevelToPayload helper;
specifically, remove the ternary that sets `{ thinking: { type: "enabled",
budget_tokens: 3000 } }` and instead call mapThinkingLevelToPayload with the
current thinkingLevel and modelId (e.g., set thinkingConfig =
mapThinkingLevelToPayload(thinkingLevel, modelId)), so that undefined
thinkingLevel returns {} and avoids sending unsupported `thinking` params to
Claude 3.x/3.5.
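Under those constraints, the helper might look like the sketch below; the budget numbers are illustrative assumptions, and the real helper also takes a `modelId` parameter (currently unused) that is omitted here:

```typescript
// Sketch of a thinking-level mapper: an undefined level means no `thinking`
// param is sent at all, avoiding errors on models that don't support it.
type ThinkingLevel = "low" | "medium" | "high";

function mapThinkingLevelToPayload(level?: ThinkingLevel): Record<string, unknown> {
  if (!level) return {};
  // Illustrative budgets, not the PR's actual values.
  const budgets: Record<ThinkingLevel, number> = { low: 1024, medium: 4096, high: 16384 };
  return { thinking: { type: "enabled", budget_tokens: budgets[level] } };
}
```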
🧹 Nitpick comments (10)
convex/testExecution.ts (1)
89-94: Duplicated Ollama-detection heuristic — extract a shared helper.

The same `includes(":") && !includes(".")` logic is computed twice: once for rate-limit bypass (lines 90-94) and again for model config extraction (lines 188-190). If one branch is updated without the other, they'll silently diverge. Extract a single `deriveDeploymentType(modelId, fallback)` helper and call it once.

♻️ Proposed refactor

```diff
+// Near the top of the handler, compute once:
+function deriveDeploymentType(modelId: string, fallbackType?: string): string {
+  if (modelId.includes(":") && !modelId.includes(".")) return "ollama";
+  return fallbackType || "bedrock";
+}
 // In submitTest handler, replace both computations:
-const testModel = args.testModelId || agent.model;
-const testDeployType = args.testModelId
-  ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
-  : agent.deploymentType;
-const isOllamaModel = testDeployType === "ollama" || (!testDeployType && testModel.includes(':') && !testModel.includes('.'));
+const effectiveModel = args.testModelId || agent.model;
+const effectiveDeploymentType = args.testModelId
+  ? deriveDeploymentType(args.testModelId, agent.deploymentType)
+  : agent.deploymentType;
+const isOllamaModel = effectiveDeploymentType === "ollama"
+  || (!effectiveDeploymentType && deriveDeploymentType(effectiveModel) === "ollama");
 // ...and remove the second computation at lines 187-190, reusing the same variables.
```

Also applies to: 185-191
convex/agentBuilderWorkflow.ts (1)
493-509: `streamWorkflowExecution` always uses `ConverseCommand`, bypassing the Claude-specific path in `invokeWorkflowModel`.

`executeWorkflowStage` routes Claude models to `InvokeModelCommand` (via `invokeWorkflowModel`) for optimal results, but `streamWorkflowExecution` unconditionally uses `ConverseCommand` at line 495. While Converse works with Claude, this means the two workflow entry points behave differently for the same model. Consider reusing `invokeWorkflowModel` here to keep behavior consistent.

♻️ Proposed refactor sketch

```diff
 for ( const stageName of stages ) {
   const stage = WORKFLOW_STAGES[stageName as keyof typeof WORKFLOW_STAGES];
   // ... build fullPrompt ...
-  const command = new ConverseCommand( { ... } );
-  const response = await bedrockClient.send( command );
-  const stageOutput = response.output?.message?.content?.[0]?.text || "";
-  const usage = response.usage ?? { inputTokens: 0, outputTokens: 0 };
+  const result = await invokeWorkflowModel( {
+    modelId: effectiveModelId,
+    stageName: stage.name,
+    systemPrompt: stage.systemPrompt,
+    userPrompt: fullPrompt,
+  } );
+  const stageOutput = result.outputText;
+  totalInputTokens += result.inputTokens;
+  totalOutputTokens += result.outputTokens;
```

convex/metaTooling.ts (1)
756-820: Consider validating `name` format to avoid collisions with seeded built-in skill names.

A user could call `createSkill` with a name like `"short_term_memory"`, which collides with a built-in skill name if the built-in was seeded under a different user's ID. The duplicate check at line 785 only checks the current user's namespace, so the user-created skill would shadow the public built-in for that user. This may be intentional (user customization), but worth calling out.

convex/metaAgent.ts (1)
84-99: Clarify that `modelId` is the "analysis" model, not the resulting agent's model.

The user passes `modelId`, which is used to run the requirements analysis (line 99), but the actual agent models are selected by `selectOptimalModels` from the scoring database (line 102). A reader might expect `modelId` to be the model assigned to the designed agent. A brief inline comment would prevent confusion.

convex/strandsAgentExecution.ts (3)
184-185: Unused dynamic import in `executeViaAgentCore`.

Lines 184–185 import six symbols from `./lib/toolDispatch`, but none are used within `executeViaAgentCore`. All actual usage happens in `executeDirectBedrock`, which has its own identical import at lines 229–232. This is a wasted async import on every call.

♻️ Suggested fix — remove the unused import

```diff
 async function executeViaAgentCore( ctx: ActionCtx, agent: AgentDoc, message: string, history: ConversationMessage[] ): Promise<AgentExecutionResult> {
-  // Load agent's skills from dynamicTools table (if configured)
-  const { dispatchToolCall, buildToolsArray, mapThinkingLevelToPayload,
-    accumulateTokenUsage, MAX_TOOL_LOOP_ITERATIONS, buildToolResultMessages } = await import( "./lib/toolDispatch" );
```

Only the `SkillDefinition` type is needed here, and types are erased at runtime — so the entire import can be removed. The inline `import("./lib/toolDispatch").SkillDefinition` type annotation on line 187 is sufficient.
187-187: `agentSkills` is never reassigned — use `const`.

The array is only mutated via `.push()`, so the binding itself is constant.

```diff
-let agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];
+const agentSkills: import( "./lib/toolDispatch" ).SkillDefinition[] = [];
```
188-188: `as any` cast bypasses type safety for agent document fields.

`skills` and `thinkingLevel` are defined in the schema as optional fields on the `agents` table. If the `AgentDoc` type (from `Doc<"agents">`) is generated from the schema, these fields should already be available without casting. If the generated types haven't caught up yet, a narrower type assertion is preferable.

♻️ Suggested approach

```diff
-const agentDoc = agent as any; // Access optional fields added to schema
+// skills and thinkingLevel are optional schema fields on agents
+const agentDoc = agent as AgentDoc & {
+  skills?: Array<{ skillId?: string; name: string; enabled?: boolean }>;
+  thinkingLevel?: "low" | "medium" | "high";
+};
```
73-91: `mapThinkingLevelToPayload` ignores `_modelId` — consider validating model compatibility.

The `_modelId` parameter is unused (prefixed with `_`), but not all Claude models support extended thinking. If a non-thinking-capable model is passed with a thinkingLevel, this will produce a payload that causes an API error. Consider validating model compatibility here or documenting the caller's responsibility.

convex/agentSkills.test.ts (2)
44-51: Redundant assertion: `typeof ref` check after `toBeDefined()`.

Line 49 (`expect(typeof ref).not.toBe("undefined")`) is logically equivalent to the `expect(ref).toBeDefined()` on line 48. You could remove one, or replace both with a more meaningful assertion — e.g., checking that each ref is a string (Convex function references serialize to strings).
1-12: Test header lists `dispatchToolCall` routing logic but no test covers it.

The comment on line 6 mentions testing dispatch routing, but `dispatchToolCall` is an async function requiring `ActionCtx` mocking and isn't tested here. Either add a note that it's tested via integration tests elsewhere, or remove it from the header to avoid confusion.
```ts
// ============================================================================
// NEW MODELS — Feb 10, 2026 announcement
// https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-bedrock-adds-support-six-open-weights-models/
// NOTE: Model IDs predicted from naming patterns. Verify in AWS console.
// ============================================================================
```
🧩 Analysis chain
🌐 Web query:
AWS Bedrock February 2026 announcement new models DeepSeek MiniMax GLM Qwen Kimi
💡 Result:
AWS’s February 2026 Amazon Bedrock announcement (posted Feb 10, 2026) added six new fully managed “open weights” models—including the providers you listed (DeepSeek, MiniMax, Zhipu/GLM, Qwen, Moonshot/Kimi): DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next. It also states these models are served via Project Mantle, a new distributed inference engine on Bedrock, and mentions OpenAI API–compatible endpoints. [1]
AWS also posted a region-specific follow-up on Feb 12, 2026 saying the latest open-weight models are available in Asia Pacific (Sydney) via the bedrock-mantle endpoint (powered by Project Mantle). [2]
Sources: [1], [2]
🌐 Web query:
AWS Bedrock DeepSeek V3.2 MiniMax M2.1 model ID format bedrock-mantle
💡 Result:
On Amazon Bedrock’s bedrock-mantle (OpenAI‑compatible) endpoint, you don’t hardcode a special “bedrock-mantle” model ID format—you use the model id returned by the Mantle /v1/models (OpenAI Models API), and then pass that id as the model field in Responses / Chat Completions requests. (docs.aws.amazon.com)
What the IDs generally look like (Bedrock side)

Bedrock model identifiers are typically of the form:

- Base model ID: `provider.model-name-version` (often with an optional suffix like `:0`) (docs.aws.amazon.com)
  Example shown in AWS docs for DeepSeek V3.1: `deepseek.v3-v1:0`. (docs.aws.amazon.com)
- Cross‑Region inference profile ID: adds a geographic prefix like `us.`
  Example in AWS docs: `us.deepseek.r1-v1:0`. (docs.aws.amazon.com)
DeepSeek V3.2 and MiniMax M2.1 (Project Mantle)
AWS announced DeepSeek V3.2 and MiniMax M2.1 as models served via Project Mantle (and therefore available via the Mantle/OpenAI‑compatible endpoints). (aws.amazon.com)
Because the exact Mantle “model id” strings you must pass can vary by rollout/Region, the dependable way to get the correct IDs is to list models from your target bedrock-mantle.<region>.api.aws endpoint and use the returned id. (docs.aws.amazon.com)
Minimal way to discover the correct IDs

```shell
curl https://bedrock-mantle.us-east-1.api.aws/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

(Use the `id` values in the response as your `model`.) (docs.aws.amazon.com)
Citations:
- 1: https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html?utm_source=openai
- 2: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_FoundationModelDetails.html?utm_source=openai
- 3: https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html
- 4: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-deepseek.html?utm_source=openai
- 5: https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-bedrock-adds-support-six-open-weights-models/?utm_source=openai
🏁 Script executed:
sed -n '494,620p' convex/modelRegistry.ts

Repository: MikePfunk28/agent_builder_application
Length of output: 4432
🏁 Script executed:
# Check if there's any indication these IDs have been tested or verified
grep -n "deepseek.v3-0324\|minimax.minimax-m2\|zhipu.glm\|moonshot.kimi\|qwen.qwen3" convex/modelRegistry.ts | head -20

Repository: MikePfunk28/agent_builder_application
Length of output: 858
🏁 Script executed:
# Check if there are any tests that use these new model IDs
grep -r "deepseek.v3-0324\|minimax.minimax-m2\|zhipu.glm\|moonshot.kimi-k2-5\|qwen.qwen3" --include="*.test.*" --include="*.spec.*" --include="*.test.ts" --include="*.spec.ts" 2>/dev/null | head -20

Repository: MikePfunk28/agent_builder_application
Length of output: 63
🏁 Script executed:
# Check if there's any documentation or README about verifying these model IDs
find . -name "README*" -o -name "CONTRIBUTING*" -o -name "*MODELS*" -o -name "*.md" | xargs grep -l "bedrock\|model.*id\|verify" 2>/dev/null | head -10

Repository: MikePfunk28/agent_builder_application
Length of output: 523
🏁 Script executed:
# Check the project README for any model verification guidance
head -100 README.md

Repository: MikePfunk28/agent_builder_application
Length of output: 5107
Model IDs lack verification despite real AWS announcement.
The February 10, 2026 AWS Bedrock announcement is confirmed, but the hardcoded model IDs in lines 494–620 remain unverified. The code itself documents this with inline notes ("predicted from naming patterns," "not yet confirmed in AWS docs"). Additionally:
- Several IDs deviate from standard Bedrock format (MiniMax and Kimi entries lack the `:0` suffix convention)
- GLM 4.7 entries explicitly note uncertain prefix: "Verify prefix (zhipu. vs z-ai.)"
- AWS best practice recommends discovering model IDs dynamically via the `/v1/models` API endpoint rather than hardcoding them
- No tests verify these IDs will work with Bedrock
If any ID is incorrect, all invocations for that model will fail with a Bedrock ValidationException. Before merging to main, either verify each ID against the AWS console/SDK or gate these models behind a feature flag.
🤖 Prompt for AI Agents
In `@convex/modelRegistry.ts` around lines 494 - 498, The new hardcoded Bedrock
IDs in the modelRegistry block are unverified and some lack the standard
suffixes; update this by querying Bedrock’s /v1/models at runtime and using that
result instead of the hardcoded entries: implement a fetch (e.g., add or update
a helper like fetchBedrockModels or getAvailableBedrockModels) to discover and
validate IDs before adding them to the existing modelRegistry map, fall back to
a safe feature-flagged list (or disable models) if verification fails, normalize
IDs to include the expected suffixes (e.g., ensure `:0` where applicable), and
add unit/integration checks that assert discovered IDs exist to prevent
ValidationException on use.
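The normalize-then-verify step the prompt describes can be sketched as a pure helper pair. This is an illustration only: `ensureVersionSuffix` and `validateRegistryIds` are hypothetical names that do not exist in the codebase, and the discovered-model shape is assumed to match the `id` field of a `/v1/models` listing.

```typescript
// Hypothetical sketch: validate hardcoded registry IDs against a discovered
// model list (e.g., the ids returned by listing models on the Mantle endpoint).
interface DiscoveredModel { id: string; }

// Normalize a candidate ID to the `provider.model-version:N` convention by
// appending ":0" when no version suffix is present.
function ensureVersionSuffix(modelId: string): string {
  return /:\d+$/.test(modelId) ? modelId : `${modelId}:0`;
}

// Partition hardcoded IDs into those confirmed by discovery and those that
// should stay behind a feature flag (or disabled) until verified.
function validateRegistryIds(
  hardcoded: string[],
  discovered: DiscoveredModel[],
): { verified: string[]; unverified: string[] } {
  const known = new Set(discovered.map((m) => m.id));
  const verified: string[] = [];
  const unverified: string[] = [];
  for (const id of hardcoded) {
    const normalized = ensureVersionSuffix(id);
    (known.has(normalized) || known.has(id) ? verified : unverified).push(id);
  }
  return { verified, unverified };
}
```

Running this at deploy time (or in a test that mocks the discovery response) would catch a bad hardcoded ID before it surfaces as a runtime ValidationException.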
convex/strandsAgentExecution.ts
Outdated
```typescript
// Determine thinking config: use thinkingLevel if set, else default
const thinkingConfig = thinkingLevel
    ? mapThinkingLevelToPayload( thinkingLevel, modelId )
    : { thinking: { type: "enabled", budget_tokens: 3000 } };
```
🧩 Analysis chain
🏁 Script executed:
# First, let's explore the file structure and find the mapThinkingLevelToPayload function
find . -type f -name "*.ts" -o -name "*.js" | head -20

Repository: MikePfunk28/agent_builder_application
Length of output: 656
🏁 Script executed:
# Search for the mapThinkingLevelToPayload function definition
rg "mapThinkingLevelToPayload" --type ts --type js -A 10

Repository: MikePfunk28/agent_builder_application
Length of output: 5586
🏁 Script executed:
# Check the context of the file mentioned in the review
fd strandsAgentExecution.ts

Repository: MikePfunk28/agent_builder_application
Length of output: 112
🌐 Web query:
Anthropic Claude extended thinking support which models
💡 Result:
Anthropic Claude extended thinking is supported on these models: Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, Claude Sonnet 3.7 (deprecated), and Claude Haiku 4.5. [1]
🏁 Script executed:
# Search for model ID references throughout the codebase
rg "claude" --type ts -i | grep -i "model\|version"

Repository: MikePfunk28/agent_builder_application
Length of output: 16802
🏁 Script executed:
# Check the strandsAgentExecution.ts file more closely around the problematic code
cat convex/strandsAgentExecution.ts | sed -n '250,280p'

Repository: MikePfunk28/agent_builder_application
Length of output: 1097
🏁 Script executed:
# Search for any configuration or constants related to model selection
rg "claude-3\|claude-opus\|claude-sonnet\|claude-haiku" --type ts

Repository: MikePfunk28/agent_builder_application
Length of output: 63
Remove hardcoded thinking default—it breaks Claude 3.x and 3.5 models still referenced in the codebase.
The hardcoded fallback { thinking: { type: "enabled", budget_tokens: 3000 } } forces extended thinking on all Anthropic models when thinkingLevel is undefined, even though older Claude 3.x and 3.5 models (which are still used in tests and defaults throughout the codebase) don't support the thinking parameter. This will cause API errors.
The mapThinkingLevelToPayload(undefined, modelId) function already correctly returns {} for undefined levels. Use it as the default:
Suggested fix

```diff
- const thinkingConfig = thinkingLevel
-     ? mapThinkingLevelToPayload( thinkingLevel, modelId )
-     : { thinking: { type: "enabled", budget_tokens: 3000 } };
+ const thinkingConfig = mapThinkingLevelToPayload( thinkingLevel, modelId );
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
// Determine thinking config: use thinkingLevel if set, else default
const thinkingConfig = mapThinkingLevelToPayload( thinkingLevel, modelId );
```
🤖 Prompt for AI Agents
In `@convex/strandsAgentExecution.ts` around lines 263 - 266, Replace the
hardcoded fallback that forces thinking on Anthropic models by always using the
mapThinkingLevelToPayload helper; specifically, remove the ternary that sets `{
thinking: { type: "enabled", budget_tokens: 3000 } }` and instead call
mapThinkingLevelToPayload with the current thinkingLevel and modelId (e.g., set
thinkingConfig = mapThinkingLevelToPayload(thinkingLevel, modelId)), so that
undefined thinkingLevel returns {} and avoids sending unsupported `thinking`
params to Claude 3.x/3.5.
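The gating behavior the fix relies on can be shown as a pure function. This is a sketch only: `mapThinkingLevelToPayloadSketch`, the budget values, and the regex-based support check are assumptions for illustration, not the repository's real `mapThinkingLevelToPayload`.

```typescript
// Sketch of a level mapper that returns an empty object for undefined levels,
// so no `thinking` field is ever sent to models that did not opt in.
type ThinkingLevel = "low" | "medium" | "high";

const THINKING_BUDGETS: Record<ThinkingLevel, number> = {
  low: 1024,
  medium: 4096,
  high: 16384,
};

function mapThinkingLevelToPayloadSketch(
  level: ThinkingLevel | undefined,
  modelId: string,
): Record<string, unknown> {
  // Undefined level: no thinking config at all (safe for Claude 3.x/3.5).
  if (!level) return {};
  // Hypothetical support check; the real model registry would drive this.
  const supportsThinking = /claude-(sonnet|opus|haiku)-4|claude-3-7/.test(modelId);
  if (!supportsThinking) return {};
  return { thinking: { type: "enabled", budget_tokens: THINKING_BUDGETS[level] } };
}
```

Because the function degrades to `{}` in both the "no level" and "unsupported model" cases, the caller can spread the result into the payload unconditionally.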
```typescript
for ( let iteration = 0; iteration < MAX_TOOL_LOOP_ITERATIONS; iteration++ ) {
    const command = new InvokeModelCommand( {
        modelId: modelId,
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify( payload ),
    } );

    const response = await client.send( command );
    const responseBody = JSON.parse(
        new TextDecoder().decode( response.body )
    ) as BedrockInvokeResponse;

    // ─── Token extraction per iteration ─────────────────────────────────
    let iterationTokens = extractTokenUsage( responseBody, modelId );
    if ( iterationTokens.totalTokens === 0 ) {
        iterationTokens = estimateTokenUsage( JSON.stringify( payload ), "" );
    }
    accumulateTokenUsage( accumulatedTokens, iterationTokens );

    // ─── Parse response ─────────────────────────────────────────────────
    let content = "";
    let reasoning = "";
    const toolCalls: ToolCall[] = [];
    const assistantContentBlocks: Array<{ type: string; [key: string]: any }> = [];

    if ( responseBody.content && Array.isArray( responseBody.content ) ) {
        // Anthropic models: content is an array of typed blocks
        for ( const block of responseBody.content ) {
            assistantContentBlocks.push( block );
            if ( block.type === "text" ) {
                content += block.text;
            } else if ( block.type === "thinking" ) {
                reasoning += block.thinking;
            } else if ( block.type === "tool_use" ) {
                const id = typeof block.id === "string" ? block.id : undefined;
                const name = typeof block.name === "string" ? block.name : undefined;
                toolCalls.push( { id, name, input: block.input } );
            }
        }
    } else if ( typeof responseBody.generation === "string" ) {
        // Meta/Llama models: single generation string
        content = responseBody.generation;
    } else if ( responseBody.outputs && Array.isArray( responseBody.outputs ) ) {
        // Mistral models: outputs array with text fields
        content = responseBody.outputs.map( ( o ) => o.text || "" ).join( "" );
    } else if ( responseBody.generations && Array.isArray( responseBody.generations ) ) {
        // Cohere models: generations array
        content = responseBody.generations.map( ( g ) => g.text || "" ).join( "" );
    } else if ( responseBody.completions && Array.isArray( responseBody.completions ) ) {
        // AI21 models: completions array
        content = responseBody.completions.map( ( c ) => c.data?.text || "" ).join( "" );
    } else if ( responseBody.results && Array.isArray( responseBody.results ) ) {
        // Amazon Titan models: results array
        content = responseBody.results.map( ( r ) => r.outputText || "" ).join( "" );
    } else {
        // Fallback: use the already-parsed `responseBody` and avoid logging raw/sensitive content.
        console.warn( `Unrecognized Bedrock response format for model ${modelId}. Response did not match expected fields.` );
        try {
            if ( typeof responseBody === "string" ) {
                const parsed = JSON.parse( responseBody );
                content = typeof parsed === "string" ? parsed : JSON.stringify( parsed );
            } else if ( responseBody && typeof responseBody === "object" ) {
                content = JSON.stringify( responseBody );
            } else {
                content = String( responseBody );
            }
        } catch {
            try {
                content = JSON.stringify( responseBody );
            } catch {
                content = String( responseBody );
            }
        }
    }

    // Accumulate text and reasoning across iterations
    if ( content ) finalContent = content; // Last iteration's text wins
    if ( reasoning ) finalReasoning += ( finalReasoning ? "\n---\n" : "" ) + reasoning;

    // ─── Agent Loop: Execute tool calls if agent has skills ──────────────
    if ( hasSkills && toolCalls.length > 0 ) {
        // Execute each tool call via shared dispatch
        const toolResults: import( "./lib/toolDispatch" ).ToolResult[] = [];
        for ( const tc of toolCalls ) {
            if ( !tc.id || !tc.name ) continue;
            const result = await dispatchToolCall(
                ctx, tc.name, ( tc.input as Record<string, any> ) || {},
                agentSkills, String( agent.createdBy ),
            );
            toolResults.push( {
                toolUseId: tc.id,
                output: result.success ? result.output : ( result.error || "Tool execution failed" ),
                isError: !result.success,
            } );
        }

        // Build assistant + tool_result messages and append to conversation
        const loopMessages = buildToolResultMessages( assistantContentBlocks, toolResults );
        const currentMessages = ( payload.messages as any[] ) || [];
        payload.messages = [...currentMessages, ...loopMessages];

        // Continue the loop — model will see tool results and respond
        continue;
    }

    // No tool calls (or no skills) — we're done, break out of loop
    finalToolCalls = toolCalls;
    break;
}
```
Silent failure when tool loop exhausts MAX_TOOL_LOOP_ITERATIONS.
If the model continuously requests tool calls for all 10 iterations, the for loop exits without hitting the break on line 447. In that case:
- `finalToolCalls` remains `[]` (never assigned).
- `finalContent` is the last iteration's text, which may be empty if the last response was purely `tool_use` blocks.
- The function returns `success: true` with potentially empty content and no indication that the loop was truncated.
Add a guard after the loop to detect exhaustion and either warn or return a meaningful error.
🛡️ Suggested fix

```diff
    // No tool calls (or no skills) — we're done, break out of loop
    finalToolCalls = toolCalls;
    break;
  }
+
+ // If the loop completed without breaking, the model exhausted the iteration limit
+ if ( hasSkills && finalToolCalls.length === 0 && !finalContent.trim() ) {
+     console.warn( `Agent loop exhausted ${MAX_TOOL_LOOP_ITERATIONS} iterations without producing a final response.` );
+ }
```
+ }🤖 Prompt for AI Agents
In `@convex/strandsAgentExecution.ts` around lines 340 - 448, The loop can exhaust
MAX_TOOL_LOOP_ITERATIONS causing silent success with empty
finalToolCalls/finalContent; after the for loop in strandsAgentExecution (the
block using MAX_TOOL_LOOP_ITERATIONS, finalToolCalls, finalContent, hasSkills,
toolCalls, dispatchToolCall, buildToolResultMessages), add a guard that detects
exhaustion (e.g., a flag or check that the loop ran to MAX_TOOL_LOOP_ITERATIONS
and the last iteration produced tool_use blocks or hasSkills was true) and then
either log a warning and set a clear error result or return success: false with
an explanatory message and preserved token accounting; ensure
finalToolCalls/finalContent are set to reflect the last observed
toolCalls/content before returning so callers get a meaningful response.
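The exhaustion condition can be illustrated with a simplified, self-contained loop. `runAgentLoop` and `IterationResult` are stand-ins for the real Bedrock invocation loop, introduced here only to show the guard's logic.

```typescript
// Each entry simulates one model response: text plus any requested tool calls.
interface IterationResult { content: string; toolCalls: string[]; }

function runAgentLoop(
  iterations: IterationResult[],
  maxIterations: number,
): { content: string; exhausted: boolean } {
  let finalContent = "";
  let completed = false;
  for (let i = 0; i < Math.min(iterations.length, maxIterations); i++) {
    const result = iterations[i];
    if (result.content) finalContent = result.content;
    if (result.toolCalls.length > 0) continue; // model wants tools: keep looping
    completed = true; // no tool calls: final answer reached
    break;
  }
  // True when the loop ran out of budget while the model still wanted tools.
  const exhausted = !completed;
  return { content: finalContent, exhausted };
}
```

A caller can then surface `exhausted` as a warning or a `success: false` result instead of silently returning whatever partial text the last iteration produced.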
- Updated model identification logic to use `isOllamaModelId` for better clarity and accuracy.
- Removed unnecessary console logs from agent execution and unified agent execution functions.
- Enhanced validation for agent configuration in the AgentBuilder component using Zod schema.
- Improved handling of dynamic input types across various tools and workflows.
- Added iterative execution mode in AgentTester with completion criteria and result tracking.
- Cleaned up console logs in ConversationChat for better readability and performance.
- Updated package.json to include Zod for schema validation.
…ssage handling in MCP tool invocation
Deploying agent-builder-application with

| | |
| --- | --- |
| Latest commit: | 384e1aa |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4273bf92.agent-builder-application.pages.dev |
| Branch Preview URL: | https://012-visual-builder-update.agent-builder-application.pages.dev |
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (18)
convex/swarmTestingOrchestrator.ts (2)
782-788: ⚠️ Potential issue | 🔴 Critical

Runtime crash: `.map()` called on a plain object (`Record<string, any>`).

`sendMessageToSwarm` returns `responses` as `Record<string, any>` (line 137), not an array. Calling `.map()` on it will throw `TypeError: result.responses.map is not a function`. Line 392 handles this correctly with `Object.entries(...)`.

🐛 Proposed fix

```diff
- const formattedResults = result.responses.map(([agentId, response]: [string, any]) => ({
+ const formattedResults = Object.entries(result.responses).map(([agentId, response]: [string, any]) => ({
    agentId,
    success: response.success,
    output: response.content || response.response,
    executionTime: response.executionTime,
    error: response.error,
  }));
```
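The failure mode is easy to reproduce in isolation. The data below is illustrative, not the orchestrator's real response shape:

```typescript
// A Record has no .map, so the responses object must go through Object.entries.
const responses: Record<string, { success: boolean; content: string }> = {
  "agent-1": { success: true, content: "ok" },
  "agent-2": { success: false, content: "" },
};

// (responses as any).map(...) would throw: responses.map is not a function.
const formatted = Object.entries(responses).map(([agentId, r]) => ({
  agentId,
  success: r.success,
}));
```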
418-419: ⚠️ Potential issue | 🟡 Minor

Division by zero when `results` is empty yields `NaN` scores.

If no agents respond (empty `responses`), `results.length` is 0 and both scores become `NaN`. Guard with a zero check.

Proposed fix

```diff
- const isolationScore = (totalIsolationScore / results.length) * 100;
- const coordinationScore = (totalCoordinationScore / results.length) * 100;
+ const isolationScore = results.length > 0 ? (totalIsolationScore / results.length) * 100 : 0;
+ const coordinationScore = results.length > 0 ? (totalCoordinationScore / results.length) * 100 : 0;
```

convex/apiKeys.ts (1)
259-266: ⚠️ Potential issue | 🟠 Major

`Math.random()` is not cryptographically secure for API key generation.

`generateSecureToken` uses `Math.random()`, which is a PRNG and predictable. For API key material, use `crypto.getRandomValues` (available in the Web Crypto API / Node.js) to produce cryptographically strong tokens.

🔒 Proposed fix

```diff
  function generateSecureToken(length: number): string {
    const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
-   let result = '';
-   for (let i = 0; i < length; i++) {
-     result += chars.charAt(Math.floor(Math.random() * chars.length));
-   }
-   return result;
+   const values = new Uint32Array(length);
+   crypto.getRandomValues(values);
+   return Array.from(values, (v) => chars[v % chars.length]).join('');
  }
```

convex/integration.test.ts (1)
1112-1148: ⚠️ Potential issue | 🟠 Major

`beforeEach` data is invisible to tests due to separate `convexTest` instances.

The `beforeEach` at line 1112 creates its own `convexTest(schema, modules)` and inserts records (setting `testUserId`, `testAgentId`). However, every `test` block (e.g., line 1151) creates a new `convexTest` instance with its own in-memory database. The IDs stored in `testUserId`/`testAgentId` won't resolve in the test's `t`, so `agents.find(a => a._id === testAgentId)` at line 1160 will never match the inserted agent.

The same pattern is repeated in the "MCP Server Integration" (line 673), "AWS Diagram Generator" (line 831), and "Bedrock AgentCore" (line 1009) `describe` blocks. Each `beforeEach` is effectively dead code.

Either share the same `t` instance across `beforeEach` and `test`, or move the setup into each test.

convex/agentCoreTester.ts (1)
298-301: ⚠️ Potential issue | 🟠 Major

Hardcoded fallback `"placeholder-token"` will silently send unauthenticated requests.

If `AGENTCORE_AUTH_TOKEN` is unset, this falls through to a literal placeholder string that will be sent as a Bearer token to the real AgentCore endpoint. This should throw an error (like the credential helpers in `convex/lib/aws/credentials.ts`) rather than silently degrade.

Proposed fix

```diff
  private async getAuthToken(): Promise<string> {
-   return process.env.AGENTCORE_AUTH_TOKEN || "placeholder-token";
+   const token = process.env.AGENTCORE_AUTH_TOKEN;
+   if (!token) {
+     throw new Error("Missing AGENTCORE_AUTH_TOKEN environment variable");
+   }
+   return token;
  }
```

src/components/AgentTester.tsx (1)
212-221: ⚠️ Potential issue | 🟡 Minor

Stop button is non-functional during iterative mode.

`isRunning` on line 105 includes `iterativeRunning`, so the Stop button is shown during iterative execution. However, `handleStop` (line 212) early-returns when `currentTestId` is `null`, and `currentTestId` is never set in the iterative path. Clicking Stop during iterative mode silently does nothing.

Either disable the Stop button when `iterativeRunning` is true, or implement cancellation for the iterative action (e.g., via an `AbortController`).

Quick fix: disable Stop in iterative mode

```diff
  <button
    onClick={handleStop}
+   disabled={iterativeRunning}
    className="flex items-center gap-2 px-6 py-3 bg-red-600 text-white rounded-lg hover:bg-red-700 transition-colors"
  >
```

Also applies to: 105-105
convex/agentcoreTestExecution.ts (1)
140-148: ⚠️ Potential issue | 🟠 Major

Lambda fallback uses `agent.model` instead of `effectiveModel`, bypassing the test-level model override.

Line 143 passes `agent.model` to the Lambda fallback, while lines 115, 125 correctly use `effectiveModel`. If a test specifies a model override via `testDetails.modelConfig?.modelId`, the Lambda fallback will invoke the wrong model.

Proposed fix

```diff
  result = await executeViaLambda( {
    agentCode: agent.generatedCode,
    input: args.input,
-   modelId: agent.model,
+   modelId: effectiveModel,
    tools: agent.tools || [],
  } );
```

convex/promptChainExecutor.ts (1)
245-253: ⚠️ Potential issue | 🟠 Major

Missing try/catch on parallel billing — inconsistent with sequential and testPrompt paths.

The sequential chain (lines 124-137) and `testPrompt` (lines 463-477) both wrap billing in a try/catch marked as non-fatal. The parallel path will throw and fail the entire action if `incrementUsageAndReportOverage` rejects, even though the model invocations already succeeded.

Proposed fix: wrap in try/catch like the other paths

```diff
  // Meter: token-based billing for all parallel prompts
  if ( gateUserId && ( totalInputTokens > 0 || totalOutputTokens > 0 ) ) {
-   await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
-     userId: gateUserId,
-     modelId: gateModelId,
-     inputTokens: totalInputTokens,
-     outputTokens: totalOutputTokens,
-   } );
+   try {
+     await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
+       userId: gateUserId,
+       modelId: gateModelId,
+       inputTokens: totalInputTokens,
+       outputTokens: totalOutputTokens,
+     } );
+   } catch ( billingErr ) {
+     console.error( "promptChainExecutor: parallel billing failed (non-fatal)", {
+       userId: gateUserId, modelId: gateModelId,
+       inputTokens: totalInputTokens, outputTokens: totalOutputTokens,
+       error: billingErr instanceof Error ? billingErr.message : billingErr,
+     } );
+   }
  }
```

convex/codeGenerator.ts (1)
475-488: ⚠️ Potential issue | 🟡 Minor

Remaining raw `tool.name` interpolations in generated Python f-strings.

Lines 480 and 487 interpolate `tool.name` directly into Python f-string literals without escaping. If `tool.name` contains `"`, `{`, `}`, or newlines, the generated Python code will be syntactically broken or could allow injection.

Line 475 correctly uses `safeToolName` — apply the same treatment to lines 480 and 487.

Proposed fix

```diff
-       result = f"Tool ${tool.name} executed successfully"
+       result = f"Tool ${safeToolName} executed successfully"
        ${params.length > 0 ? `
-       # Available parameters: ${params.map((p: any) => p.name).join(', ')}
+       # Available parameters: ${params.map((p: any) => String(p.name).replace(/[^a-zA-Z0-9_]/g, '_')).join(', ')}
        logger.debug(f"Parameters: {locals()}")` : ''}
        return result
    except Exception as e:
-       logger.error(f"Error in custom tool ${tool.name}: {str(e)}")
+       logger.error(f"Error in custom tool ${safeToolName}: {str(e)}")
        raise`;
```

convex/agentAsToolGenerator.ts (1)
112-118: ⚠️ Potential issue | 🟡 Minor

Raw `agentName` remains in Python f-strings at lines 114 and 117, undoing the escaping applied elsewhere.

The error-path strings still interpolate the original `agentName` directly into Python f-strings. If `agentName` contains `"`, `{`, or `}`, the generated Python will break. Use `safeAgentNameDQ` (which is already computed on line 48) for consistency with the rest of this function.

Proposed fix

```diff
-       return f"Error from ${agentName}: {result.get('error', 'Unknown error')}"
+       return f"Error from ${safeAgentNameDQ}: {result.get('error', 'Unknown error')}"
    except Exception as e:
-       return f"Failed to invoke ${agentName}: {str(e)}"
+       return f"Failed to invoke ${safeAgentNameDQ}: {str(e)}"
```

convex/agents.ts (1)
181-196: ⚠️ Potential issue | 🟡 Minor

Parse `mcpInputSchema` before returning — type inconsistency causes test failures.

When `agent.mcpInputSchema` is a non-empty string, line 185 returns it directly as `inputSchema`. However, test code at `convex/metaToolingIntegration.test.ts:480-483` accesses `.properties` on `inputSchema`, expecting an object. This causes a type mismatch: `inputSchema` is either a raw JSON string or an object literal.

Parse the string before returning:

Proposed fix

```diff
  return agents.map(agent => ({
    _id: agent._id,
    name: agent.mcpToolName || agent.name,
    description: agent.description || `AI agent: ${agent.name}`,
-   inputSchema: agent.mcpInputSchema || {
+   inputSchema: (agent.mcpInputSchema ? JSON.parse(agent.mcpInputSchema) : null) || {
      type: "object",
      properties: { input: { type: "string", description: "Input to the agent" } },
      required: ["input"]
    }
  }));
```

Consider using `safeJsonParse` from `convex/lib/jsonUtils.ts` instead to handle malformed data gracefully.

convex/agentExecution.test.ts (1)
686-707: ⚠️ Potential issue | 🟡 Minor

Test is now a no-op — no assertions remain.

After removing the ECS assertions, this test patches fields and queries data but asserts nothing. It should either be removed or updated to verify AgentCore-specific cleanup behavior.

Suggested fix: add minimal assertion or mark as TODO

```diff
  // Test routed through AgentCore — ECS fields are deprecated
  // Verify the test was created and queued successfully
+ expect(test).toBeDefined();
+ expect(test.status).toBe("QUEUED");
+ // TODO: Add AgentCore-specific resource cleanup assertions
  });
```

convex/diagramGenerator.ts (1)
92-128: ⚠️ Potential issue | 🟡 Minor

Unsanitized `tool.name` interpolated into generated Python code.

`tool.name` is injected directly into Python string literals (e.g., line 115: `"${tool.name}"`). A tool name containing a double quote or backslash would produce invalid Python. `agentName` has the same risk in the diagram title string (line 96).

Consider escaping or sanitizing these values for use in Python strings.

Suggested helper

```diff
+function escapePythonString(s: string): string {
+  return s.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
+}
```

Then use `escapePythonString(tool.name)` in template literals.

convex/tools.ts (1)
414-422: ⚠️ Potential issue | 🟡 Minor

Potential crash if all reasoning paths fail with identical error messages.

`sortedAnswers[0]` on line 421 will throw if `voteCounts` is empty. While `answers` is always pushed to (even on error), it's possible all entries are identical error strings, in which case `sortedAnswers` will have entries. However, if `numPaths` is 0 (e.g., someone passes `numPaths: 0`), `answers` would be empty, causing a crash. Pre-existing issue, but worth noting.

convex/metaAgent.ts (1)
246-279: ⚠️ Potential issue | 🟠 Major

Handle missing `requiredTools`/`requiredDataSources` from LLM output.

If the model returns partial JSON, the current loops will throw and fail the whole design flow. Defensive defaults keep the system resilient to imperfect model output.

🛠️ Suggested fix

```diff
- for (const requiredTool of analysis.requiredTools) {
+ const requiredTools = Array.isArray( analysis.requiredTools ) ? analysis.requiredTools : [];
+ for (const requiredTool of requiredTools) {
    const toolKey = requiredTool.toLowerCase().replace(/\s+/g, "_");
    if (toolMapping[toolKey]) {
      tools.push(toolMapping[toolKey]);
    }
  }
- for (const dataSource of analysis.requiredDataSources) {
+ const requiredDataSources = Array.isArray( analysis.requiredDataSources ) ? analysis.requiredDataSources : [];
+ for (const dataSource of requiredDataSources) {
    tools.push({
```

convex/agentBuilderWorkflow.ts (1)
269-313: ⚠️ Potential issue | 🟠 Major

Use resolved Bedrock model ID consistently for gating + billing.

`invokeWorkflowModel` resolves model IDs internally, but billing uses `effectiveModelId`. If the user supplies an alias, billing/pricing lookups can mis-match or fail. Resolve once and use the resolved ID for gating, invocation, and billing.

🛠️ Suggested fix

```diff
  const effectiveModelId = args.modelId || DEFAULT_WORKFLOW_MODEL;
+ const { resolveBedrockModelId } = await import( "./modelRegistry.js" );
+ const resolvedModelId = resolveBedrockModelId( effectiveModelId );
  const { requireBedrockAccess } = await import( "./lib/bedrockGate" );
  const gateResult = await requireBedrockAccess(
-   ctx, effectiveModelId,
+   ctx, resolvedModelId,
    async ( lookupArgs ) => ctx.runQuery( internal.users.getInternal, lookupArgs ),
  );
  const result = await invokeWorkflowModel( {
-   modelId: effectiveModelId,
+   modelId: resolvedModelId,
    stageName: stage.name,
    systemPrompt: stage.systemPrompt,
    userPrompt: fullPrompt,
  } );
  await ctx.runMutation( internal.stripeMutations.incrementUsageAndReportOverage, {
    userId: gateResult.userId,
-   modelId: effectiveModelId,
+   modelId: resolvedModelId,
    inputTokens: result.inputTokens,
    outputTokens: result.outputTokens,
  } );
```

src/components/AgentBuilder.tsx (1)
128-134: ⚠️ Potential issue | 🟠 Major

Add `skills` and `thinkingLevel` fields to agent save, generate, and export payloads.

The SkillsStep component mutates `config` with these fields, but the backend mutation and action signatures don't accept them, causing user selections to be silently dropped. This breaks the end-to-end feature.

🛠️ Suggested fix

Update the following backend functions to accept and persist these fields:

```diff
  export const generateAgent = action({
    args: {
      name: v.string(),
      model: v.string(),
      systemPrompt: v.string(),
      tools: v.array(...),
+     skills: v.optional(v.array(v.object({
+       skillId: v.optional(v.id("dynamicTools")),
+       name: v.string(),
+       enabled: v.optional(v.boolean()),
+     }))),
+     thinkingLevel: v.optional(v.union(
+       v.literal("low"),
+       v.literal("medium"),
+       v.literal("high")
+     )),
      deploymentType: v.string(),
      ...
    },
```

Also update `agents.create` and `agents.update` mutations with the same fields, and include them in the frontend payloads at lines 128-134, 173-186, and 404-413.

convex/modelRegistry.ts (1)
969-1032: ⚠️ Potential issue | 🟠 Major

Mismatch between SHORT_NAME_TO_BEDROCK_ID and BEDROCK_MODELS causes lookup failures.

SHORT_NAME_TO_BEDROCK_ID maps to cross-region prefixed IDs like `"us.amazon.nova-pro-v1:0"` and `"us.meta.llama3-3-70b-instruct-v1:0"`, but BEDROCK_MODELS keys use base IDs without region prefixes (e.g., `"amazon.nova-pro-v1:0"`, `"meta.llama3-3-70b-instruct-v1:0"`). When `resolveBedrockModelId("nova-pro")` returns the prefixed ID, subsequent lookups in `getUnitsForModel()` and `getModelThinkingConfig()` fail to find entries and return incorrect defaults (1 unit, no thinking support).

Either add the region-prefixed entries to BEDROCK_MODELS, strip the prefix in resolveBedrockModelId before returning, or normalize IDs in lookup functions.
🤖 Fix all issues with AI agents
In `@convex/apiKeys.ts`:
- Around line 273-276: The check in convex/apiKeys.ts now throws when
process.env.JWT_PRIVATE_KEY is missing, which will break verification of
existing API keys previously hashed with the old 'default-secret'; update the
code to handle a migration path: when verifying API keys (where you currently
reference secret / JWT_PRIVATE_KEY) attempt a legacy verification using the old
'default-secret' if the new secret fails, and on successful legacy verification
re-hash the key with the new JWT_PRIVATE_KEY and persist the new hash (or emit a
migration event), or alternatively provide a one-time migration script that
reads all stored hashes, re-computes them with the new secret and replaces them;
also ensure you log or surface that the change is a breaking migration in
release notes.
In `@convex/deploymentPackageGenerator.ts`:
- Around line 190-237: The generated Python scaffold in
generateBasicTestScaffold inserts unescaped values (agentName, tool names in
EXPECTED_TOOLS and tool docstrings) directly into string literals, risking
syntax errors or injection; fix by escaping these values for safe Python string
and triple-quote contexts before interpolation (use the existing
escapePythonString and escapePythonTripleQuote helpers), ensure EXPECTED_TOOLS
is populated with escaped/JSON-safe entries, replace direct "${agentName}" with
an escaped Python string, and use escaped tool names in toolTests/docstrings
while keeping safeName only for function identifiers.
In `@convex/http.ts`:
- Around line 201-206: The code computes message from rawInput and
JSON.stringify(args) but when args is undefined JSON.stringify(undefined)
returns undefined, so ensure message is always a string before calling
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents). Update
the logic around rawInput/message (the variables named rawInput and message) to
coerce a safe string (e.g. if typeof rawInput === "string" use it, else use
JSON.stringify(args ?? {}) or fallback to an empty string) so that the message
argument passed with agent._id is never undefined.
In `@convex/lib/iterativeLoop.ts`:
- Around line 32-110: The JSDoc for CompletionCriteria.successPattern is
misleading: checkCompletionCriteria (and its use of response.content.includes)
treats successPattern as a plain substring, not a regex; update the JSDoc for
CompletionCriteria.successPattern to state "substring pattern" (or explicitly
"plain substring") and adjust the comment near checkCompletionCriteria to note
that successPattern is matched using String.prototype.includes; reference
CompletionCriteria.successPattern and the checkCompletionCriteria function when
making this small doc-only change.
In `@convex/lib/unifiedModalitySwitching.ts`:
- Around line 50-56: The avgCostPer1K function currently returns 0 on a
BEDROCK_MODELS miss which makes missing models appear free; update avgCostPer1K
to detect when BEDROCK_MODELS[modelId] is undefined, emit a warning (e.g.,
console.warn or your project logger) that includes the modelId, and return a
non-zero sentinel (preferably Number.POSITIVE_INFINITY or a very large value) so
selectUnifiedModel’s cost-based sorting does not prefer registry-missing models.
In `@convex/metaAgent.ts`:
- Around line 38-63: The buildModelDatabase function can produce NaN scores when
a model in BEDROCK_MODELS lacks costPer1MTokens or contextWindow; update
buildModelDatabase to validate those fields before computing scores: for each
[id, model] check that model.costPer1MTokens and a numeric
model.costPer1MTokens.output exist and are finite (skip the model if not), and
treat a missing or non-numeric model.contextWindow with a safe default (e.g., 0
or a configured fallback) before using Math.round; ensure totalScore is computed
only from validated numeric parts so selection and budget logic using totalScore
remain reliable.
In `@convex/modelRegistry.ts`:
- Around line 1117-1129: getModelThinkingConfig and getUnitsForModel currently
fail to match models when passed region-prefixed IDs; normalize the incoming
modelId before lookup by calling or mimicking resolveBedrockModelId (or strip
the leading region prefix like "us."/"eu." etc.) and use the normalized ID when
indexing BEDROCK_MODELS so lookups succeed; update getModelThinkingConfig (and
apply the same change to getUnitsForModel) to compute a lookupId =
resolveBedrockModelId(modelId) or remove the region segment, then use
BEDROCK_MODELS[lookupId] for capability and billing decisions.
In `@convex/realAgentTesting.ts`:
- Around line 26-31: The tools schema in the v.array object is missing the
extrasPip field, which makes the branch in generateFullRequirements that reads
tool.extrasPip dead; update the schema (the tools definition in this file) to
include extrasPip as an optional array of strings (matching other definitions
like in toolRegistry.ts and codeGenerator.ts) so tool.extrasPip is validated and
included when generateFullRequirements iterates tools.
In `@convex/strandsAgentExecution.ts`:
- Line 330: The code computes hasSkills as agentSkills.length > 0 &&
isAnthropicModel which silently drops skills for non‑Anthropic models; update
the logic to keep the check but also emit a clear warning when skills are
configured but the chosen model doesn't support tools: after the existing
hasSkills calculation in strandsAgentExecution.ts (reference symbols: hasSkills,
agentSkills, isAnthropicModel) add a conditional that when agentSkills.length >
0 && !isAnthropicModel calls the project's logger (e.g., processLogger.warn or
logger.warn) to note that skills will be ignored for the selected model and list
the configured skills to aid debugging.
- Line 550: The variable declaration for totalTokens should use const instead of
let since the binding is never reassigned; update the declaration of totalTokens
in strandsAgentExecution.ts (the object initialized as { inputTokens: 0,
outputTokens: 0, totalTokens: 0 }) from let to const — you can still mutate its
properties as needed, so replace the let binding with const for the variable
named totalTokens.
- Around line 100-118: The Bedrock detection logic in the isBedrock check is
missing several registered prefixes (e.g., deepseek, moonshot, qwen, minimax,
zhipu) and so can bypass gating; update isBedrock to reuse the canonical Bedrock
prefix list or utility instead of the hardcoded regex — import and use
BEDROCK_PREFIXES from modelRegistry (or call an existing helper like
isOllamaModelId/isBedrockModel if available) to test agent.model while still
honoring agent.deploymentType === "bedrock", and keep the existing bedrock
gating flow that calls requireBedrockAccessForUser with agentOwner and
agent.model.
In `@convex/swarmTestingOrchestrator.ts`:
- Around line 626-643: storeSwarm and getSwarm are stubs that cause createSwarm
to report success while nothing is persisted; fix by adding an in-memory
persistence map in this module (e.g., swarmStore: Map<string, SwarmDefinition>)
and implement storeSwarm to validate the incoming swarm object has an identifier
(e.g., swarm.id or swarm.swarmId), save it into swarmStore keyed by that id (or
throw if no id), and implement getSwarm to look up and return the stored
SwarmDefinition (or null if missing); also ensure createSwarm checks the result
of storeSwarm (propagate/throw on failure) rather than unconditionally returning
success so callers do not get a false-positive creation.
In `@convex/testExecution.ts`:
- Around line 184-190: The ad-hoc Ollama detection used to compute
effectiveDeploymentType is inconsistent with isOllamaModelId and can misclassify
testModelId; update the effectiveDeploymentType calculation so it uses
isOllamaModelId(args.testModelId) instead of the string heuristic (keep fallback
to agent.deploymentType when no testModelId), ensuring the variables
effectiveModel, effectiveDeploymentType and the call to
extractModelConfig(effectiveModel, effectiveDeploymentType) are derived
consistently with the rate-limiting logic that uses isOllamaModelId.
In `@convex/workflowExecutor.ts`:
- Around line 771-792: In evaluateExpression, the early
includes("success")/includes("error") checks can shadow operator logic (e.g.,
"status != success"); update the function to evaluate operator-based expressions
first (the !==/!= and ===/== branches using the regex splits) before the keyword
checks, or alternatively use word-boundary regexes when matching keywords (e.g.,
/\bsuccess\b/ and /\berror\b/) so operators with operands "success"/"error" are
parsed by the operator branches; ensure left/right resolution still uses
context[left] ?? context.result?.[left] and the same null/true/false coercion
logic in the operator handlers.
In `@src/App.tsx`:
- Around line 181-183: The onAgentsCreated callback passed to AIAgentBuilder is
a no-op so newly created agents are dropped and the user receives no feedback;
update the onAgentsCreated handler to call the existing onNavigate function and
navigate to the builder/dashboard route (or builder view) with the created agent
ID (received as the _agents parameter) so the UI transitions to the generated
agent (e.g., call onNavigate(...) with the route that includes the agent ID or a
query param); ensure AIAgentBuilder's onAgentsCreated uses the actual agent id
value rather than ignoring _agents.
In `@src/components/AgentBuilder.tsx`:
- Around line 155-162: The client-side saving schema allows agent names up to
200 chars while validateAgentConfig enforces 100, causing server rejections;
update the saveSchema (the z.object declared as saveSchema in AgentBuilder.tsx)
to set name: z.string().min(1, "Agent name is required").max(100, "Agent name
too long") so the client validation matches validateAgentConfig and prevents
mismatched acceptance.
🧹 Nitpick comments (18)
convex/swarmTestingOrchestrator.ts (3)
655-656: Consider replacing `v.any()` with a more constrained union or `v.object` with optional fields.

`v.any()` on internal action args disables all runtime validation, so malformed tool invocations will propagate silently. Even a loose schema (e.g., `v.object({ agents: v.optional(v.array(v.any())), strategy: v.optional(v.string()), ... })`) would catch structural mismatches early. Same applies to line 628 and line 740.

93-93: `substr` is deprecated; use `substring` instead.

`String.prototype.substr` is deprecated per the ECMAScript spec. Replace with `.substring(2, 11)`.

900-900: `ctx: any` loses type safety for the Convex action context.

Consider typing `ctx` with the appropriate Convex context type (e.g., `ActionCtx`) to get compile-time checks on `ctx.runQuery` calls.

.github/workflows/setup-github-projects_Version3.yml (1)
25-25: Hardcoded magic `PROJECT_ID` duplicated across two steps — consider a job-level env and add a comment.

`"4"` is a magic number with no explanation of which project it refers to. If the project is ever recreated, both steps silently target the wrong (or nonexistent) project. Two improvements:

- Define it once at the job level to keep things DRY.
- Add a brief comment so future maintainers know what project `4` is.

♻️ Suggested refactor

```diff
 jobs:
   setup-project:
     runs-on: ubuntu-latest
+    env:
+      # GitHub Projects V2 project number for the main backlog board
+      PROJECT_ID: "4"
     steps:
       - name: Checkout
         uses: actions/checkout@v4

       - name: Backfill existing open issues
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          PROJECT_ID: "4"
         run: |
           ...

       - name: Add new issues/PRs to project
         if: github.event_name == 'issues' || github.event_name == 'pull_request'
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          PROJECT_ID: "4"
         run: |
           ...
```

Also applies to: 55-55
convex/authDebug.ts (1)
31-35: Hardcoded fallback URLs will silently produce wrong callback URLs in new deployments.

The env-var approach is a good improvement, but the fallback values (`resolute-kudu-325`, `633051e6`) are specific to a single deployment. In a fresh environment where none of these env vars are set, the returned OAuth callback URLs will point to stale/incorrect endpoints with no visible warning.

Consider either logging a warning when a fallback is used, or dropping the fallbacks entirely and omitting an entry when the corresponding env var is not set.
convex/lib/dynamicModelSwitching.ts (2)
69-89: `sonnet` and `sonnet45` tiers are identical in every field except `capabilityRating`.

Both tiers reference `SONNET_ID`, so they share the same `modelId`, cost, and `maxTokens`. The dynamic routing between complexity scores 30–59 and 60+ will select different tiers but invoke the exact same model at the same cost. The 3-tier escalation is effectively 2-tier.

If this is a temporary state until a distinct "sonnet45" model ID is available, consider adding a `// TODO` to make that clear. Otherwise, the `sonnet45` tier entry is dead weight that could confuse future maintainers.

33-43: `costPer1K` falls back to zero cost — could silently mask billing issues.

If a model ID isn't found in `BEDROCK_MODELS`, the function returns `{ input: 0, output: 0 }`. While the comment says this shouldn't happen, a warning log would help catch misconfiguration early.

Optional: add a warning

```diff
   // Fallback — should not happen for models listed below
+  console.warn(`costPer1K: model "${modelId}" not found in BEDROCK_MODELS — defaulting to 0`);
   return { input: 0, output: 0 };
```

convex/queueProcessor.ts (1)
12-12: `parseInt` on env var without radix and no `NaN` guard.

`parseInt(process.env.MAX_CONCURRENT_TESTS || "10")` will correctly fall back to `"10"` when the env var is unset, but if the env var is set to a non-numeric string (e.g., `"abc"`), `parseInt` returns `NaN`, which would cause `runningCount >= MAX_CONCURRENT_TESTS` to always be `false`, effectively removing the concurrency limit.

This is a pre-existing issue, but worth a quick guard since this file is being updated.

Optional: guard against NaN

```diff
-const MAX_CONCURRENT_TESTS = parseInt(process.env.MAX_CONCURRENT_TESTS || "10");
+const MAX_CONCURRENT_TESTS = parseInt(process.env.MAX_CONCURRENT_TESTS || "10", 10) || 10;
```

convex/deploymentPackageGenerator.ts (1)
215-218: `assert len(EXPECTED_TOOLS) >= 0` always passes — list length is never negative.

The comment acknowledges "Some agents may have no tools," but the assertion is a no-op. If the intent is simply a presence check, consider making it a descriptive pass or removing the assertion.
convex/agentcoreTestExecution.ts (2)
455-461: Two statements on one line reduce readability.

Line 457 combines the `JSON.parse(...)` assignment and `return {` on a single line, making it easy to miss the return.

Proposed fix

```diff
-      const errorPayload = JSON.parse( rawPayload ); return {
+      const errorPayload = JSON.parse( rawPayload );
+      return {
```

279-285: Timeout timer not cleared on success for both Bedrock and Lambda `Promise.race` calls.

When the Bedrock call resolves before the timeout, the `setTimeout` callback remains scheduled. While it won't cause a crash (the rejected promise is ignored), it keeps the timer handle alive unnecessarily. The same applies to the Lambda call at lines 448–453. Consider wrapping the race in an `AbortController` pattern or clearing the timer, similar to how the Ollama path (lines 367–386) correctly clears it.

Example fix for Bedrock (apply similarly to Lambda)

```diff
-    const response = await Promise.race( [
-      client.send( command ),
-      new Promise<never>( ( _, reject ) =>
-        setTimeout( () => reject( new Error( "Bedrock timeout" ) ), 60000 )
-      ),
-    ] );
+    let timeoutId: ReturnType<typeof setTimeout>;
+    const response = await Promise.race( [
+      client.send( command ),
+      new Promise<never>( ( _, reject ) => {
+        timeoutId = setTimeout( () => reject( new Error( "Bedrock timeout" ) ), 60000 );
+      } ),
+    ] ).finally( () => clearTimeout( timeoutId ) );
```

convex/cloudFormationGenerator.ts (1)
71-80: `TemplateConfig` interface still uses `any[]` for tools, losing the type safety added at the action boundary.

Consider updating the interface to match the typed shape from the action args, so downstream code benefits from the tighter typing as well.

Proposed fix

```diff
 interface TemplateConfig {
   agentName: string;
   model: string;
-  tools: any[];
+  tools: Array<{
+    name: string;
+    type: string;
+    requiresPip?: boolean;
+    pipPackages?: string[];
+  }>;
   region: string;
   environment: string;
-  vpcConfig: any;
-  monitoring: any;
-  scaling: any;
+  vpcConfig: { createVpc: boolean; vpcCidr?: string; availabilityZones?: string[] };
+  monitoring: { enableXRay: boolean; enableCloudWatch: boolean; logRetentionDays?: number };
+  scaling: { minCapacity?: number; maxCapacity?: number; targetCpuUtilization?: number };
 }
```

convex/mcpFileUpload.ts (1)
72-72: Good: env values narrowed to `string`.

Note that `uploadMCPConfig` (line 49) also writes `env` from parsed JSON without validating that values are strings. If the schema enforces `Record<string, string>`, consider adding validation in `uploadMCPConfig` as well to prevent inconsistent data.

src/components/AgentBuilderWorkflowUI.tsx (1)
32-66: New workflow stages added — ensure backend alignment.

The stage-to-result mapping at line 107 (`result.workflow[index]?.output`) relies on positional alignment between `WORKFLOW_STAGES` and the backend's `result.workflow` array. If the backend doesn't return results for `ast_analysis`, `test_generation`, or `verification` at the matching indices, outputs will be misattributed.

Consider keying results by stage `name` instead of array index for robustness.

convex/tools.ts (1)
49-53: Mid-file import statement.

The `import { safeJsonParse }` on Line 53 sits inside a comment block between function definitions. While this works in TypeScript (imports are hoisted), placing it with the other imports at the top of the file would improve readability.

Suggested: move import to top of file

```diff
 import { isOllamaModelId } from "./modelRegistry";
+import { safeJsonParse } from "./lib/jsonUtils";
```

Then remove Lines 49–53 (the mid-file comment block and import).
convex/strandsAgentExecutionDynamic.ts (1)
152-163: Model switch could bypass the Bedrock gate for the actually-executed model.

The Bedrock gate at Line 133 checks `agent.model`, but `executeWithModelSwitching` (Line 157) may select a different model via `decideModelSwitch`. If the switched model is in a different model family or tier, the gate wouldn't have checked it. This is mitigated by `decideModelSwitch` being tier-aware, but the defense-in-depth gap is worth noting.

convex/automatedAgentBuilder.ts (1)
264-264: `buildSystemPrompt` and `analyzeAndAskNext` use pervasive `any` types.

`requirements: any`, `conversationHistory: any[]`, and `requirements: any` in the return type weaken the type safety gained by the structured validators in `updateBuildSession`. Consider defining a shared `AgentRequirements` interface and reusing it across both the mutation args and these helper functions.

convex/strandsAgentExecution.ts (1)
180-210: Remove unnecessary `as any` cast — `skills` and `thinkingLevel` are now in the agents schema.

Both fields are defined in convex/schema.ts as optional properties:

- `skills` (line 57): optional array with skillId, name, enabled
- `thinkingLevel` (line 64): optional union of "low" | "medium" | "high"

Since `AgentDoc` is typed as `Doc<"agents">`, the cast on line 182 can be removed once Convex codegen is re-run. Access these fields directly without `as any` to preserve type safety.
```ts
const secret = process.env.JWT_PRIVATE_KEY;
if (!secret) {
  throw new Error("JWT_PRIVATE_KEY environment variable is required for API key hashing");
}
```
Good hardening — but consider the migration path for keys hashed with the old default.
Removing the 'default-secret' fallback is the right call. However, if any existing environment was previously running without JWT_PRIVATE_KEY set, all previously hashed API keys were stored using 'default-secret' as the salt. Once the env var is set (or if it was already missing), those stored hashes will never match again, silently invalidating every existing key.
Consider whether a one-time migration or re-hash step is needed, or at minimum document this as a breaking change in the PR/release notes.
🤖 Prompt for AI Agents
In `@convex/apiKeys.ts` around lines 273 - 276, The check in convex/apiKeys.ts now
throws when process.env.JWT_PRIVATE_KEY is missing, which will break
verification of existing API keys previously hashed with the old
'default-secret'; update the code to handle a migration path: when verifying API
keys (where you currently reference secret / JWT_PRIVATE_KEY) attempt a legacy
verification using the old 'default-secret' if the new secret fails, and on
successful legacy verification re-hash the key with the new JWT_PRIVATE_KEY and
persist the new hash (or emit a migration event), or alternatively provide a
one-time migration script that reads all stored hashes, re-computes them with
the new secret and replaces them; also ensure you log or surface that the change
is a breaking migration in release notes.
```ts
function generateBasicTestScaffold(agent: any): string {
  const agentName = agent.name || "Agent";
  const tools = (agent.tools || []).map((t: any) => t.name);
  const toolTests = tools.map((toolName: string) => {
    const safeName = toolName.replace(/[^a-zA-Z0-9]/g, "_");
    const escaped = escapePythonString(toolName);
    return `def test_tool_${safeName}_exists():
    """Verify ${safeName} tool is registered."""
    assert "${escaped}" in EXPECTED_TOOLS`;
  }).join("\n\n");

  return `"""
Auto-generated test scaffold for ${agentName}.
Run with: pytest test_agent.py -v
"""
import pytest

# Expected tools from agent configuration
EXPECTED_TOOLS = ${JSON.stringify(tools)}

def test_agent_has_system_prompt():
    """Verify agent has a non-empty system prompt."""
    system_prompt = """${(() => { let s = escapePythonTripleQuote(agent.systemPrompt || "").slice(0, 500); if (s.endsWith("\\")) s = s.slice(0, -1); return s; })()}"""
    assert len(system_prompt.strip()) > 0

def test_agent_has_tools():
    """Verify agent has at least one tool configured."""
    assert len(EXPECTED_TOOLS) >= 0  # Some agents may have no tools

${toolTests || "# No tool-specific tests (agent has no tools configured)"}

def test_agent_name():
    """Verify agent name is set."""
    assert "${agentName}" != ""

def test_requirements_parseable():
    """Verify requirements.txt can be parsed."""
    import pathlib
    req_file = pathlib.Path(__file__).parent / "requirements.txt"
    if req_file.exists():
        lines = req_file.read_text().strip().split("\\n")
        for line in lines:
            line = line.strip()
            if line and not line.startswith("#"):
                # Basic check: each line should be a valid package spec
                assert len(line) > 0
`;
}
```
Python injection risk: `agentName` is interpolated without escaping.

Line 223 embeds `agentName` directly into a Python string literal:

```python
assert "${agentName}" != ""
```

If `agentName` contains a double-quote (e.g., `My "Agent"`), the generated Python will have a syntax error. More importantly, tool names on line 194 could have the same issue — `safeName` sanitizes for function names but not for the docstring on line 197.
Proposed fix for agentName

```diff
 def test_agent_name():
     """Verify agent name is set."""
-    assert "${agentName}" != ""
+    assert "${escapePythonString(agentName)}" != ""
```

🤖 Prompt for AI Agents
In `@convex/deploymentPackageGenerator.ts` around lines 190 - 237, The generated
Python scaffold in generateBasicTestScaffold inserts unescaped values
(agentName, tool names in EXPECTED_TOOLS and tool docstrings) directly into
string literals, risking syntax errors or injection; fix by escaping these
values for safe Python string and triple-quote contexts before interpolation
(use the existing escapePythonString and escapePythonTripleQuote helpers),
ensure EXPECTED_TOOLS is populated with escaped/JSON-safe entries, replace
direct "${agentName}" with an escaped Python string, and use escaped tool names
in toolTests/docstrings while keeping safeName only for function identifiers.
```diff
+    const rawInput = (args as Record<string, unknown>)?.input;
+    const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args);

     const result = await ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents, {
       agentId: agent._id,
-      message: args?.input || JSON.stringify(args),
+      message,
```
`message` can be undefined when `arguments` is omitted.

When the request body has no `arguments` field (valid per MCP spec — line 122 allows it), `args` is `undefined`. Then:

- `(undefined as Record<string, unknown>)?.input` → `undefined`
- `JSON.stringify(undefined)` → `undefined` (the primitive, not a string)

So `message` is `undefined`, which is passed to `executeAgentWithStrandsAgents`. This likely violates the expected string type of the `message` parameter.
Proposed fix

```diff
     const rawInput = (args as Record<string, unknown>)?.input;
-    const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args);
+    const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args ?? {});
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
    const rawInput = (args as Record<string, unknown>)?.input;
    const message = typeof rawInput === "string" ? rawInput : JSON.stringify(args ?? {});

    const result = await ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents, {
      agentId: agent._id,
      message,
```
🤖 Prompt for AI Agents
In `@convex/http.ts` around lines 201 - 206, The code computes message from
rawInput and JSON.stringify(args) but when args is undefined
JSON.stringify(undefined) returns undefined, so ensure message is always a
string before calling
ctx.runAction(api.strandsAgentExecution.executeAgentWithStrandsAgents). Update
the logic around rawInput/message (the variables named rawInput and message) to
coerce a safe string (e.g. if typeof rawInput === "string" use it, else use
JSON.stringify(args ?? {}) or fallback to an empty string) so that the message
argument passed with agent._id is never undefined.
```ts
export interface CompletionCriteria {
  type: CompletionCriteriaType;
  /** Regex or substring pattern that indicates success (for tests_pass). */
  successPattern?: string;
  /** Regex or substring pattern that indicates errors (for no_errors). */
  errorPatterns?: string[];
}

export interface CompletionCheckResult {
  isComplete: boolean;
  reason?: string;
}

export interface IterationResult {
  iteration: number;
  content: string;
  success: boolean;
  tokenUsage?: { inputTokens: number; outputTokens: number; totalTokens: number };
}

// ─── Default error patterns ────────────────────────────────────────────────

const DEFAULT_ERROR_PATTERNS = [
  "Error:",
  "error:",
  "ERROR:",
  "failed",
  "FAILED",
  "Exception:",
  "Traceback",
  "SyntaxError",
  "TypeError",
  "ReferenceError",
];

/**
 * Check whether fail indicators match independently of pass indicators.
 * Returns true if there is at least one fail match NOT contained within a pass match.
 * e.g. "0 failed" contains "failed", but that "failed" is part of "0 failed" (a pass indicator).
 */
function hasIndependentFailIndicator(
  lower: string,
  passIndicators: string[],
  failIndicators: string[],
): boolean {
  return failIndicators.some( f => {
    const idx = lower.indexOf( f );
    if ( idx === -1 ) return false;
    return !passIndicators.some( pass => {
      const passIdx = lower.indexOf( pass );
      return passIdx !== -1 && idx >= passIdx && idx < passIdx + pass.length;
    } );
  } );
}

// ─── Core Functions ─────────────────────────────────────────────────────────

/**
 * Check whether the agent's response meets the completion criteria.
 *
 * @param criteria - The criteria to check against.
 * @param response - The agent's last response.
 * @returns - Whether the loop should stop.
 */
export function checkCompletionCriteria(
  criteria: CompletionCriteria,
  response: { success: boolean; content: string },
): CompletionCheckResult {
  switch ( criteria.type ) {
    case "tests_pass": {
      if ( !response.success ) {
        return { isComplete: false, reason: "Agent execution failed" };
      }
      if ( criteria.successPattern ) {
        const found = response.content.includes( criteria.successPattern );
        return found
          ? { isComplete: true, reason: `Success pattern matched: "${criteria.successPattern}"` }
          : { isComplete: false, reason: `Success pattern not found: "${criteria.successPattern}"` };
      }
```
Clarify `successPattern` semantics (substring vs regex).

The JSDoc says "Regex or substring," but the code uses `includes` only. Either update the doc to "substring" or add regex support to avoid confusion.

🛠️ Suggested fix (doc clarification)

```diff
-  /** Regex or substring pattern that indicates success (for tests_pass). */
+  /** Substring pattern that indicates success (for tests_pass). */
```

🤖 Prompt for AI Agents
In `@convex/lib/iterativeLoop.ts` around lines 32 - 110, The JSDoc for
CompletionCriteria.successPattern is misleading: checkCompletionCriteria (and
its use of response.content.includes) treats successPattern as a plain
substring, not a regex; update the JSDoc for CompletionCriteria.successPattern
to state "substring pattern" (or explicitly "plain substring") and adjust the
comment near checkCompletionCriteria to note that successPattern is matched
using String.prototype.includes; reference CompletionCriteria.successPattern and
the checkCompletionCriteria function when making this small doc-only change.
```ts
function avgCostPer1K(modelId: string): number {
  const meta = BEDROCK_MODELS[modelId];
  if (meta?.costPer1MTokens) {
    return (meta.costPer1MTokens.input + meta.costPer1MTokens.output) / 2 / 1000;
  }
  return 0;
}
```
Silent zero-cost fallback could break cost-based model routing.
avgCostPer1K returns 0 when a model ID isn't found in the registry. Since selectUnifiedModel sorts by costPer1KTokensOrUnit ascending for freemium users and low-complexity requests, a registry miss would make that model appear free and always be selected. Consider logging a warning or using a non-zero sentinel so cost routing doesn't silently degrade.
Suggested: add a warning for missing models

```diff
 function avgCostPer1K(modelId: string): number {
   const meta = BEDROCK_MODELS[modelId];
   if (meta?.costPer1MTokens) {
     return (meta.costPer1MTokens.input + meta.costPer1MTokens.output) / 2 / 1000;
   }
+  console.warn(`avgCostPer1K: model "${modelId}" not found in BEDROCK_MODELS, defaulting to 0`);
   return 0;
 }
```
🤖 Prompt for AI Agents
In `@convex/lib/unifiedModalitySwitching.ts` around lines 50 - 56, The
avgCostPer1K function currently returns 0 on a BEDROCK_MODELS miss which makes
missing models appear free; update avgCostPer1K to detect when
BEDROCK_MODELS[modelId] is undefined, emit a warning (e.g., console.warn or your
project logger) that includes the modelId, and return a non-zero sentinel
(preferably Number.POSITIVE_INFINITY or a very large value) so
selectUnifiedModel’s cost-based sorting does not prefer registry-missing models.
```diff
 export const storeSwarm = internalAction({
   args: {
-    swarm: v.any(),
+    swarm: v.any(), // v.any(): swarm definition structure varies by topology and communication protocol
   },
-  handler: async (ctx, args) => {
+  handler: async (_ctx, _args) => {
     // Store swarm definition (in-memory for now)
-    console.log("Storing swarm:", args.swarm);
     // TODO: Persist to swarmDefinitions table
   },
 });

 export const getSwarm = internalQuery({
   args: {
     swarmId: v.string(),
   },
-  handler: async (ctx, args): Promise<SwarmDefinition | null> => {
-    // Retrieve swarm definition (mock implementation)
-    console.log("Retrieving swarm:", args.swarmId);
+  handler: async (_ctx, _args): Promise<SwarmDefinition | null> => {
+    // Retrieve swarm definition (stub — needs swarmDefinitions table)
     return null;
   },
```
storeSwarm silently discards data while createSwarm reports success — every subsequent swarm operation will fail.
createSwarm (line 107) calls storeSwarm, which is now a no-op, then returns success: true with a swarmId. Any follow-up call (sendMessageToSwarm, switchToAgent, testSwarmCoordination, etc.) hits getSwarm which always returns null, resulting in "Swarm not found". The user gets a successful creation response but a broken swarm.
At minimum, createSwarm should surface that persistence is stubbed, or these stubs should throw so callers don't silently succeed.
🤖 Prompt for AI Agents
In `@convex/swarmTestingOrchestrator.ts` around lines 626 - 643, storeSwarm and
getSwarm are stubs that cause createSwarm to report success while nothing is
persisted; fix by adding an in-memory persistence map in this module (e.g.,
swarmStore: Map<string, SwarmDefinition>) and implement storeSwarm to validate
the incoming swarm object has an identifier (e.g., swarm.id or swarm.swarmId),
save it into swarmStore keyed by that id (or throw if no id), and implement
getSwarm to look up and return the stored SwarmDefinition (or null if missing);
also ensure createSwarm checks the result of storeSwarm (propagate/throw on
failure) rather than unconditionally returning success so callers do not get a
false-positive creation.
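A minimal sketch of the in-memory store described in the prompt. The `SwarmDefinition` shape is assumed from context, and real code would keep the Convex `internalAction`/`internalQuery` wrappers (and eventually a `swarmDefinitions` table):

```typescript
// Module-level in-memory persistence so getSwarm can see what storeSwarm saved.
interface SwarmDefinition {
  swarmId: string;   // field name assumed; the PR's type is not shown
  topology?: string;
}

const swarmStore = new Map<string, SwarmDefinition>();

function storeSwarm(swarm: SwarmDefinition): void {
  if (!swarm.swarmId) throw new Error("storeSwarm: swarm has no swarmId");
  swarmStore.set(swarm.swarmId, swarm);
}

function getSwarm(swarmId: string): SwarmDefinition | null {
  return swarmStore.get(swarmId) ?? null;
}
```

Throwing on a missing identifier lets `createSwarm` propagate the failure instead of returning a false-positive `success: true`.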
```ts
// Determine model provider and config — use testModelId override if provided,
// so the user can design with one model and test with another.
const effectiveModel = args.testModelId || agent.model;
const effectiveDeploymentType = args.testModelId
  ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
  : agent.deploymentType;
const { modelProvider, modelConfig } = extractModelConfig(effectiveModel, effectiveDeploymentType);
```
Inconsistent Ollama detection: ad-hoc heuristic vs. centralized isOllamaModelId.
Line 91 uses isOllamaModelId(args.testModelId) (the new centralized registry-backed function) to derive testDeployType, but Line 188 uses the old ad-hoc heuristic args.testModelId.includes(":") && !args.testModelId.includes(".") to derive effectiveDeploymentType. These can produce different results for the same testModelId — for example, models like "deepseek-v3" or "mistral-nemo" (no colon, no dot) would be classified as non-Ollama by Line 188 but as Ollama by isOllamaModelId via the OLLAMA_MODELS registry fallback.
This means extractModelConfig on Line 190 could route to Bedrock when the rate-limiting gate (Line 96) correctly treated it as Ollama, causing a mismatch between gating and actual execution config.
🐛 Proposed fix: use isOllamaModelId consistently

```diff
 const effectiveModel = args.testModelId || agent.model;
 const effectiveDeploymentType = args.testModelId
-  ? ( args.testModelId.includes( ":" ) && !args.testModelId.includes( "." ) ? "ollama" : agent.deploymentType )
+  ? ( isOllamaModelId( args.testModelId ) ? "ollama" : agent.deploymentType )
   : agent.deploymentType;
 const { modelProvider, modelConfig } = extractModelConfig(effectiveModel, effectiveDeploymentType);
```
🤖 Prompt for AI Agents
In `@convex/testExecution.ts` around lines 184 - 190, The ad-hoc Ollama detection
used to compute effectiveDeploymentType is inconsistent with isOllamaModelId and
can misclassify testModelId; update the effectiveDeploymentType calculation so
it uses isOllamaModelId(args.testModelId) instead of the string heuristic (keep
fallback to agent.deploymentType when no testModelId), ensuring the variables
effectiveModel, effectiveDeploymentType and the call to
extractModelConfig(effectiveModel, effectiveDeploymentType) are derived
consistently with the rate-limiting logic that uses isOllamaModelId.
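For illustration, a registry-first check in the spirit of `isOllamaModelId`. The registry contents and the function body here are assumptions; the review only states that the real function is registry-backed with a heuristic fallback:

```typescript
// Hypothetical sketch of a registry-first Ollama model check.
const OLLAMA_MODELS = new Set(["deepseek-v3", "mistral-nemo", "llama3.1:8b"]);

function isOllamaModelId(modelId: string): boolean {
  // Registry hit covers names the string heuristic misclassifies
  // (no colon, or containing a dot like "llama3.1:8b").
  if (OLLAMA_MODELS.has(modelId)) return true;
  // Fallback heuristic: Ollama tags look like "name:tag"; Bedrock IDs contain dots.
  return modelId.includes(":") && !modelId.includes(".");
}
```

Routing both the rate-limiting gate and `extractModelConfig` through one such function is what keeps gating and execution in agreement.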
```diff
 function evaluateExpression( expression: string, context: Record<string, any> ): boolean {
   // Simple checks for now (TODO: use proper safe evaluator)
   if ( expression.includes( "success" ) ) {
     return context.success === true;
   }
   if ( expression.includes( "error" ) ) {
     return context.result?.error !== undefined;
   }
   if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
     const [left, right] = expression.split( /!==?/ ).map( ( s ) => s.trim() );
     const leftValue = context[left] ?? context.result?.[left];
     const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
     return leftValue !== rightValue;
   }
   if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
     // Very basic equality check
     const [left, right] = expression.split( /===?/ ).map( ( s ) => s.trim() );
-    const leftValue = context[left] || context.result?.[left];
+    const leftValue = context[left] ?? context.result?.[left];
     const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
     return leftValue === rightValue;
   }

-  // Default false for safety
+  // Default false for safety — unrecognized expressions are rejected
   return false;
```
Expression keyword checks shadow operator-based evaluation.
If an expression contains the word "success" or "error" and an operator like != or ==, the keyword check on lines 772–777 takes precedence and the operator is never evaluated. For example, "status != success" would hit the includes("success") branch and return context.success === true instead of performing an inequality comparison.
Consider checking operators first (they are more specific), or using a more structured parse that distinguishes keywords from operands.
Proposed reordering
function evaluateExpression( expression: string, context: Record<string, any> ): boolean {
+ // Check operators first (more specific) before keyword-only checks
+ if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
+ const [left, right] = expression.split( /!==?/ ).map( ( s ) => s.trim() );
+ const leftValue = context[left] ?? context.result?.[left];
+ const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
+ return leftValue !== rightValue;
+ }
+ if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
+ const [left, right] = expression.split( /===?/ ).map( ( s ) => s.trim() );
+ const leftValue = context[left] ?? context.result?.[left];
+ const rightValue = right === "null" ? null : right === "true" ? true : right === "false" ? false : right;
+ return leftValue === rightValue;
+ }
if ( expression.includes( "success" ) ) {
return context.success === true;
}
if ( expression.includes( "error" ) ) {
return context.result?.error !== undefined;
}
- if ( expression.includes( "!=" ) || expression.includes( "!==" ) ) {
- ...
- }
- if ( expression.includes( "==" ) || expression.includes( "===" ) ) {
- ...
- }
return false;
}📝 Committable suggestion
🤖 Prompt for AI Agents
In `@convex/workflowExecutor.ts` around lines 771 - 792, In evaluateExpression,
the early includes("success")/includes("error") checks can shadow operator logic
(e.g., "status != success"); update the function to evaluate operator-based
expressions first (the !==/!= and ===/== branches using the regex splits) before
the keyword checks, or alternatively use word-boundary regexes when matching
keywords (e.g., /\bsuccess\b/ and /\berror\b/) so operators with operands
"success"/"error" are parsed by the operator branches; ensure left/right
resolution still uses context[left] ?? context.result?.[left] and the same
null/true/false coercion logic in the operator handlers.
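The operator-first version can be tried as a standalone sketch (same coercion rules as the original; still not a full safe expression evaluator):

```typescript
// Operator-first evaluator sketch: "!="/"==" branches run before the
// bare "success"/"error" keyword checks, so "status != success" is
// parsed as an inequality rather than captured by the keyword branch.
function evaluateExpression(expression: string, context: Record<string, any>): boolean {
  const coerce = (raw: string) =>
    raw === "null" ? null : raw === "true" ? true : raw === "false" ? false : raw;
  const resolve = (key: string) => context[key] ?? context.result?.[key];

  if (/!==?/.test(expression)) {
    const [left, right] = expression.split(/!==?/).map((s) => s.trim());
    return resolve(left) !== coerce(right);
  }
  if (/===?/.test(expression)) {
    const [left, right] = expression.split(/===?/).map((s) => s.trim());
    return resolve(left) === coerce(right);
  }
  if (/\bsuccess\b/.test(expression)) return context.success === true;
  if (/\berror\b/.test(expression)) return context.result?.error !== undefined;
  return false; // unrecognized expressions are rejected
}
```

Note the `!=` test must precede `==`, since `/===?/` also matches the `==` inside `!==`.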
```tsx
return <AIAgentBuilder onAgentsCreated={(_agents) => {
  // TODO: Navigate to builder with pre-filled agents
}} />;
```
No-op callback silently drops the generated agent — user gets no navigation feedback.
After AIAgentBuilder successfully generates an agent, it invokes onAgentsCreated with the new agent ID. The current no-op means the user stays on the same page with no visible result. Since onNavigate is already in scope, you could at least navigate to the builder/dashboard:
💡 Minimal fix to navigate after creation

```diff
-return <AIAgentBuilder onAgentsCreated={(_agents) => {
-  // TODO: Navigate to builder with pre-filled agents
-}} />;
+return <AIAgentBuilder onAgentsCreated={(_agents) => {
+  // TODO: Pre-fill builder with generated agents
+  onNavigate("builder");
+}} />;
```

Would you like me to open an issue to track implementing the full pre-fill behavior?
🤖 Prompt for AI Agents
In `@src/App.tsx` around lines 181 - 183, The onAgentsCreated callback passed to
AIAgentBuilder is a no-op so newly created agents are dropped and the user
receives no feedback; update the onAgentsCreated handler to call the existing
onNavigate function and navigate to the builder/dashboard route (or builder
view) with the created agent ID (received as the _agents parameter) so the UI
transitions to the generated agent (e.g., call onNavigate(...) with the route
that includes the agent ID or a query param); ensure AIAgentBuilder's
onAgentsCreated uses the actual agent id value rather than ignoring _agents.
```ts
// Validate config before saving
const saveSchema = z.object( {
  name: z.string().min( 1, "Agent name is required" ).max( 200, "Agent name too long" ),
  description: z.string().max( 2000, "Description too long" ),
  model: z.string().min( 1, "Model selection is required" ),
  systemPrompt: z.string().max( 50000, "System prompt too long" ),
  deploymentType: z.string().min( 1 ),
} );
```
Align client name length validation with backend (100 chars).
validateAgentConfig enforces 100 chars, but the client allows 200, leading to server-side rejection after UI acceptance.
🛠️ Suggested fix

```diff
- name: z.string().min( 1, "Agent name is required" ).max( 200, "Agent name too long" ),
+ name: z.string().min( 1, "Agent name is required" ).max( 100, "Agent name too long" ),
```
🤖 Prompt for AI Agents
In `@src/components/AgentBuilder.tsx` around lines 155 - 162, The client-side
saving schema allows agent names up to 200 chars while validateAgentConfig
enforces 100, causing server rejections; update the saveSchema (the z.object
declared as saveSchema in AgentBuilder.tsx) to set name: z.string().min(1,
"Agent name is required").max(100, "Agent name too long") so the client
validation matches validateAgentConfig and prevents mismatched acceptance.
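One way to prevent this class of drift is a single exported limit that both the zod schema and `validateAgentConfig` read. `AGENT_NAME_MAX` and `validateAgentName` below are illustrative names, not identifiers from the PR:

```typescript
// Shared limit so the client schema and server validator cannot disagree.
export const AGENT_NAME_MAX = 100;

// Server-side check mirroring the client schema; returns an error message
// or null when the name is valid.
function validateAgentName(name: string): string | null {
  if (name.length === 0) return "Agent name is required";
  if (name.length > AGENT_NAME_MAX) return `Agent name too long (max ${AGENT_NAME_MAX})`;
  return null;
}
```

The client schema would then use `.max( AGENT_NAME_MAX, "Agent name too long" )` instead of a hard-coded 200.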