
Conversation

@sroussey (Collaborator) commented Dec 20, 2025

Introduce ModelConfig and ModelRecord schemas, allowing AiTask to accept full model configurations directly and enabling job workers to run without a model repository.

The original AiTask input schema suggested it could take a ModelConfig object, but the implementation only supported a string identifier, requiring a model repository lookup at runtime. This change aligns the implementation with the schema and decouples job execution from the model repository by embedding the necessary ModelConfig directly into job payloads. ModelConfig is a lighter schema for task inputs, while ModelRecord is used for repository persistence.
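For orientation, here is a minimal sketch of how the two shapes might differ. The field names below are assumptions drawn from this summary, not the exact schemas in packages/ai/src/model/ModelSchema.ts:

// Illustrative sketch only; the real schemas are defined in ModelSchema.ts and may differ.
type ModelConfig = {
  provider: string; // e.g. "hf-transformers"; the default job queue is derived from this
  providerConfig: Record<string, unknown>; // provider-specific settings such as modelPath, pipeline
  tasks?: string[]; // optional list of task types this model supports
};

type ModelRecord = ModelConfig & {
  model_id: string; // stable identifier required only for repository persistence
};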




Note

Allow passing full ModelConfig in task/job inputs (no repo lookup), introduce ModelConfigSchema vs ModelRecordSchema, update provider run fns and repositories accordingly.

  • Models:
    • Introduce ModelConfigSchema (for task/job inputs) and ModelRecordSchema (for persistence); export ModelConfig/ModelRecord.
    • Repositories (InMemoryModelRepository, ModelRepository, test bindings) now use ModelRecordSchema.
  • Task/Job Execution:
    • AiTask/AiVisionTask accept string | ModelConfig, resolve inline configs without repository, and embed model config into AiJobInput (see the usage sketch after this list).
    • Validation/narrowing updated to handle inline configs and task compatibility; default queue derives from model.provider when config given.
    • AiJob passes ModelConfig directly to run funcs; no repository lookup.
    • AiProviderRegistry types updated to use ModelConfig; worker wrapper forwards [input, model].
  • Providers:
    • HF Transformers and TF MediaPipe job run fns now typed with HfTransformersOnnxModelConfig/TFMPModelConfig.
    • HF pipeline cache key/unload use model.providerConfig.modelPath:pipeline (not model_id).
  • Schemas/Utilities:
    • AiTaskSchemas.TypeModel* now build from ModelConfigSchema.
  • Tests:
    • Update tests to pass inline model objects and remove repository dependency in AiJob execution.
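A usage-level sketch of the change summarized above. Constructor and option names, and the example modelPath, are assumptions based on this summary, not verified against the codebase:

// Sketch only: exported names and option shapes are assumed for illustration.
import { AiTask, type ModelConfig } from "@workglow/ai";

// Before: the model had to be a string id, resolved through the ModelRepository at runtime.
const byId = new AiTask({ model: "hypothetical-model-id" });

// After: an inline ModelConfig can be embedded directly in the job payload, so a worker
// can run without a repository; the default queue is derived from model.provider.
const inlineConfig: ModelConfig = {
  provider: "hf-transformers",
  providerConfig: { modelPath: "Xenova/gpt2", pipeline: "text-generation" },
};
const byConfig = new AiTask({ model: inlineConfig });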

Written by Cursor Bugbot for commit a29226e.

- Replaced instances of `ModelRecord as ModelRecordSchema` with `ModelRecordSchema` for consistency across multiple files.
- Updated the model schema definitions to use `ModelConfigSchema` and `ModelRecordSchema` in relevant areas.
- Enhanced code formatting for better readability in various task and model repository files.
Copilot AI (Contributor) left a comment


Pull request overview

This PR introduces a clearer separation between ModelConfig (for task/job inputs) and ModelRecord (for repository persistence), enabling job workers to execute AI tasks without requiring access to a model repository at runtime. The key architectural change allows AiTask to accept and pass full model configurations directly in job payloads, aligning the implementation with the original schema design intent.

  • Introduces ModelConfigSchema with minimal required fields (provider, providerConfig) and ModelRecordSchema with full requirements for persistence
  • Updates AiTask to accept both string model identifiers and ModelConfig objects, resolving strings via repository while passing configs directly to jobs
  • Modifies job execution to use embedded ModelConfig instead of performing runtime repository lookups

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Summary per file:

packages/ai/src/model/ModelSchema.ts: Introduces ModelConfigSchema (lightweight) and ModelRecordSchema (persistence) with distinct required fields
packages/ai/src/task/base/AiTask.ts: Adds getModelConfigForInput() method, updates validation to handle both string and object models, modifies job input generation to embed configs
packages/ai/src/task/base/AiTaskSchemas.ts: Updates model type schema reference from ModelSchema to ModelConfigSchema
packages/ai/src/task/base/AiVisionTask.ts: Updates type signatures to accept ModelConfig in addition to strings
packages/ai/src/job/AiJob.ts: Removes model repository lookup, directly uses ModelConfig from job input
packages/ai/src/provider/AiProviderRegistry.ts: Updates run function type signature to accept ModelConfig instead of ModelRecord
packages/ai/src/model/ModelRepository.ts: Updates repository to use ModelRecordSchema for persistence
packages/ai/src/model/InMemoryModelRepository.ts: Updates constructor to use ModelRecordSchema
packages/test/src/binding/SqliteModelRepository.ts: Updates tabular repository instantiation with ModelRecordSchema
packages/test/src/binding/PostgresModelRepository.ts: Updates tabular repository instantiation with ModelRecordSchema
packages/test/src/binding/IndexedDbModelRepository.ts: Updates tabular repository instantiation with ModelRecordSchema
packages/ai-provider/src/tf-mediapipe/common/TFMP_ModelSchema.ts: Adds TFMPModelConfig type extending the base model config schema
packages/ai-provider/src/tf-mediapipe/common/TFMP_JobRunFns.ts: Updates all run function signatures to accept TFMPModelConfig
packages/ai-provider/src/hf-transformers/common/HFT_ModelSchema.ts: Adds HfTransformersOnnxModelConfig type with proper schema separation
packages/ai-provider/src/hf-transformers/common/HFT_JobRunFns.ts: Updates type signatures and cache key logic to use modelPath:pipeline instead of model_id
packages/test/src/test/ai-provider/AiProviderRegistry.test.ts: Updates test to pass full model config object instead of string identifier


Comment on lines +283 to +286
+ const combined: (string | ModelConfig)[] = [...requestedInline, ...usingStrings];

  // we alter input to be the models that were found for this kind of input
- usingModels = usingModels.length > 1 ? usingModels : usingModels[0];
- (input as any)[key] = usingModels;
+ (input as any)[key] = combined.length > 1 ? combined : combined[0];

Copilot AI Dec 20, 2025


If both requestedInline and usingStrings are empty, the combined array will be empty, and combined[0] will be undefined. This could lead to unexpected behavior when a single model is expected but all requested models were filtered out. Consider adding a check to handle this case explicitly or maintaining the original value when no compatible models are found.
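A guard along these lines would make the empty case explicit (sketch only; variable names follow the snippet above):

// Sketch: keep the original value (or fail fast) when nothing survives filtering,
// so a single-model input never silently becomes undefined.
if (combined.length > 0) {
  (input as any)[key] = combined.length > 1 ? combined : combined[0];
}
// else: leave (input as any)[key] unchanged and let validation report the mismatch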

- import { ModelSchema } from "@workglow/ai";
+ import { ModelRecordSchema } from "@workglow/ai";

Copilot AI Dec 20, 2025


The import statement is missing ModelConfigSchema which should be imported from "@workglow/ai" alongside ModelRecordSchema. This is needed to properly extend the ModelConfig schema for the TFMPModelConfig type.
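The fix would be an import along these lines (sketch; assuming both schemas are exported from the package root):

import { ModelConfigSchema, ModelRecordSchema } from "@workglow/ai";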

Comment on lines 57 to 63
const ExtendedModelConfigSchema = {
type: "object",
properties: {
...ModelRecordSchema.properties,
...TFMPModelSchema.properties,
},
required: [...ModelRecordSchema.required, ...TFMPModelSchema.required],

Copilot AI Dec 20, 2025


The ExtendedModelConfigSchema should extend ModelConfigSchema instead of ModelRecordSchema. This is inconsistent with the HFT implementation and defeats the purpose of having a lighter ModelConfig schema for task inputs. The required fields should be [...ModelConfigSchema.required, ...TFMPModelSchema.required] instead of [...ModelRecordSchema.required, ...].
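Concretely, the suggestion amounts to composing from the lighter schema, which is what the follow-up commit below applies (sketch mirroring the snippet above):

// Sketch of the suggested fix: build the extension on ModelConfigSchema, not ModelRecordSchema.
const ExtendedModelConfigSchema = {
  type: "object",
  properties: {
    ...ModelConfigSchema.properties, // light task-input schema, not the persistence schema
    ...TFMPModelSchema.properties,
  },
  required: [...ModelConfigSchema.required, ...TFMPModelSchema.required],
} as const;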

sroussey and others added 3 commits December 21, 2025 01:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Replaced `ModelRecordSchema` with `ModelConfigSchema` in the ExtendedModelConfigSchema definition.
- Adjusted required properties to align with the new schema structure.
- Improved code consistency and readability in the model schema file.
@sroussey marked this pull request as ready for review December 21, 2025 01:15
cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


taskModels?.find((m) => m.model_id === model)
);

const combined: (string | ModelConfig)[] = [...requestedInline, ...usingStrings];

Inline models not filtered for task compatibility in narrowInput

The narrowInput method filters string models based on task compatibility but keeps all inline ModelConfig objects unfiltered. However, validateInput rejects inline configs with a tasks array that doesn't include the current task type. This inconsistency means incompatible inline configs pass through narrowInput but fail validation later. The purpose of narrowInput is to strip incompatible models in dataflows so validation succeeds when at least some models are compatible. Inline configs with explicit incompatible tasks arrays should be filtered out in narrowInput to maintain consistency with validateInput.
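A sketch of the filter being asked for (variable names follow the snippet above; the task-type accessor is an assumption):

// Sketch: exclude inline configs whose explicit `tasks` list omits the current task type,
// mirroring how string ids are filtered against the repository's task models.
const compatibleInline = requestedInline.filter(
  (m) => !m.tasks || m.tasks.includes(currentTaskType) // `currentTaskType` is assumed
);
const combined: (string | ModelConfig)[] = [...compatibleInline, ...usingStrings];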


  progressScaleMax: number = 10
) => {
- const cacheKey = `${model.model_id}:${model.providerConfig.pipeline}`;
+ const cacheKey = `${model.providerConfig.modelPath}:${model.providerConfig.pipeline}`;

Pipeline cache key ignores device and dtype configuration

The pipeline cache key changed from model.model_id:pipeline to model.providerConfig.modelPath:pipeline. However, the pipeline is created with additional configuration options including dType (dtype) and device (lines 372, 376). Since these options aren't included in the cache key, two model configs with the same modelPath and pipeline but different dType or device settings will incorrectly share the same cached pipeline. This could cause models to run on the wrong device (e.g., CPU instead of WebGPU) or with the wrong precision, leading to incorrect behavior or errors.
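One way to widen the key (sketch; the dtype/device property names follow the comment above, everything else is assumed):

// Sketch: include device and dtype so pipelines with different runtime settings
// do not collide in the cache.
const cacheKey = [
  model.providerConfig.modelPath,
  model.providerConfig.pipeline,
  model.providerConfig.device ?? "default",
  model.providerConfig.dtype ?? "default",
].join(":");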


@sroussey merged commit b3091ce into main Dec 21, 2025
2 checks passed
@sroussey (Collaborator, Author) commented:
Closes #149

@sroussey deleted the cursor/aitask-model-config-handling-bdcf branch December 21, 2025 01:27