Foxx Schema Usage and Ownership
Goal
Identify all locations where schemas are referenced, mutated, or relied upon, and clarify how schema lifecycle (create, update, delete) behaves across:
- Web UI
- Python client
- Foxx API
- Background tasks
- External schema service
1. Schema Usage by Entry Point
1.1 End User (Web UI)
Users can:
- Create a schema
- Revise an existing schema
- Apply a schema to a record
- View schema validation errors for a record
- Edit/update a schema
- View a schema definition
- Delete a schema only when not in use
- Use a schema helper to build queries
1.2 End User (Python Client)
Users can:
- Apply a schema to a record
(All other schema interactions are mediated through backend APIs.)
1.3 Foxx API Usage
Schemas are referenced in the following routes:
| Route | Path | Schema Usage |
| --- | --- | --- |
| Update record | dat/update | Uses schema ID |
| Batch update | dat/update/batch | Uses schema ID |
| View record | dat/view | Uses schema ID, returns schema ID + version |
| Create record | dat/create | Uses schema ID; recordCreate derives sch_ver |
| Batch create | dat/create/batch | Same as above |
| Delete record | dat/delete | Decrements schema cnt |
| Task execution | task/run | Handles schema count decrement in background deletes |
Relevant call paths:
dat/update -> recordUpdate
dat/update/batch -> recordUpdate
dat/view -> schema lookup + return
dat/create -> recordCreate
dat/create/batch -> recordCreate
dat/delete
  -> tasks.js
  -> _deleteDataRecord
  -> decrement schema cnt
task/run
  -> taskGetRunFunc
  -> taskRunProjectDelete / taskRunRecCollDelete
  -> _deleteDataRecords
  -> decrement schema cnt
2. Architectural Constraint: Background Tasks
Schema reference counting (cnt) is updated inside background tasks.
This complicates decoupling schemas into a separate service because:
- Background tasks need schema access
- Network calls introduce failure states
- Retries must be safe and idempotent
Decision
Keep cnt in the schema document, not in the external schema service.
Rationale:
- Schema usage is tightly coupled to DataFed records
- Reference counting must be local, fast, and transactional
- Externalizing it increases failure modes without benefit
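The local reference-counting rule can be illustrated with a minimal sketch. This is not the Foxx implementation: `SchemaDoc`, `attach_schema`, `detach_schema`, and `can_delete` are hypothetical names, and an in-memory object stands in for the schema document that a real service would update inside a database transaction.

```python
from dataclasses import dataclass


@dataclass
class SchemaDoc:
    """Hypothetical in-memory stand-in for a schema document."""
    id: str
    cnt: int = 0  # number of records currently referencing this schema


def attach_schema(schema: SchemaDoc) -> None:
    """Called when a record starts referencing the schema."""
    schema.cnt += 1


def detach_schema(schema: SchemaDoc) -> None:
    """Called when a referencing record is deleted, directly or by a task."""
    if schema.cnt <= 0:
        # A negative count would be silent corruption, so fail loudly instead.
        raise ValueError(f"cnt underflow for schema {schema.id!r}")
    schema.cnt -= 1


def can_delete(schema: SchemaDoc) -> bool:
    """A schema may be deleted only when no record references it."""
    return schema.cnt == 0
```

Because the counter lives next to the records that change it, a background delete can presumably adjust `cnt` in the same local transaction, with no network call whose failure would need to be retried.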
3. Data Model
3.1 Record Document
Record
└── sch_id = "<schema_name>:"
This is a logical reference supplied by the user.
3.2 Schema Document
Schema
├── _id = "sch/281578009"
├── _key = "281578009"
├── id = "<schema_name>:"
├── ver =
├── cnt =
├── own_id = "<u/user_id>"
├── pub =
├── desc =
└── status = <pending | exists | deleting> (NEW)
3.3 Schema Version Edge (sch_ver)
SchemaVersionEdge
_from ──>
_to ──>
Models schema version lineage.
3.4 Schema Dependency Edge (sch_dep)
SchemaDependencyEdge
_from ──>
_to ──>
Models schema dependencies.
4. Required Additions
4.1 Schema Backend / Plugin Identifier
The schema document needs an identifier for the schema backend or plugin.
- The schema service address belongs in the orchestration service config
- The schema document should store only a logical identifier
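One way to picture this split is a lookup on the orchestrator side. The sketch below is illustrative only: the field name `backend`, the `SCHEMA_BACKENDS` mapping, the `resolve_backend` helper, and the example address are all assumptions, not names from the existing code.

```python
# Hypothetical orchestrator-side configuration: logical backend name -> address.
SCHEMA_BACKENDS = {
    "default": "https://schema.example.org",
}


def resolve_backend(schema_doc: dict) -> str:
    """Map the logical identifier stored on a schema document to an address.

    The schema document never stores the address itself, so the service can be
    moved by changing configuration only.
    """
    backend = schema_doc.get("backend", "default")  # assumed field name
    try:
        return SCHEMA_BACKENDS[backend]
    except KeyError:
        raise LookupError(f"unknown schema backend {backend!r}") from None
```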
4.2 Schema Status Field (Required)
Add a status field to track lifecycle state:
- pending → creation requested, not yet confirmed
- exists → schema successfully created
- deleting → deletion in progress
This enables retries and safe recovery from partial failures.
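The three states can be treated as a small state machine. The sketch below assumes a particular transition table (pending to exists, exists to deleting); the source does not specify which transitions are legal, so `ALLOWED` and `transition` are hypothetical.

```python
from enum import Enum


class SchemaStatus(str, Enum):
    PENDING = "pending"
    EXISTS = "exists"
    DELETING = "deleting"


# Assumed transition table; the design text names the states but not the edges.
ALLOWED = {
    SchemaStatus.PENDING: {SchemaStatus.EXISTS},
    SchemaStatus.EXISTS: {SchemaStatus.DELETING},
    SchemaStatus.DELETING: set(),
}


def transition(current: SchemaStatus, target: SchemaStatus) -> SchemaStatus:
    """Return the new status, treating a repeated request as a no-op."""
    if target == current:
        return current  # a retry of an already-applied step is safe
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Treating a same-state request as a no-op is what lets a retried step succeed without special handling.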
5. Schema Lifecycle Flows
5.1 Schema Creation Flow
sequenceDiagram
participant Client
participant Foxx
participant Orchestrator
participant SchemaService
Client->>Foxx: Create schema request
Foxx->>Foxx: Authorization check
Foxx->>Foxx: Create schema doc (status=pending)
Foxx->>Orchestrator: Request schema creation
Orchestrator->>SchemaService: Create schema
SchemaService-->>Orchestrator: Success
Orchestrator-->>Foxx: Confirm creation
Foxx->>Foxx: Update status=exists
Failure handling:
- If creation fails, status remains pending
- A read that encounters a schema in the pending state retries creation
- Creation must be idempotent
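The creation flow and its failure handling can be sketched as follows. This is a simplified model under stated assumptions: a dict stands in for the schema collection, `FakeSchemaService` stands in for the orchestrator and schema service together, and `create_schema` is a hypothetical helper rather than an existing route handler.

```python
class FakeSchemaService:
    """Hypothetical stand-in for the orchestrator plus schema service."""

    def __init__(self, fail_times: int = 0):
        self.schemas = {}
        self.fail_times = fail_times  # number of calls that fail before success

    def create(self, schema_id: str, definition: dict) -> None:
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError("schema service unavailable")
        # Idempotent: creating the same ID again leaves exactly one entry.
        self.schemas[schema_id] = definition


def create_schema(db: dict, service: FakeSchemaService,
                  schema_id: str, definition: dict) -> dict:
    """Create the schema, or retry creation if a pending document exists."""
    doc = db.setdefault(
        schema_id, {"id": schema_id, "def": definition, "status": "pending"}
    )
    if doc["status"] != "pending":
        return doc  # already confirmed (or being deleted); nothing to retry
    try:
        service.create(schema_id, doc["def"])
    except ConnectionError:
        return doc  # status stays pending; a later call retries
    doc["status"] = "exists"
    return doc
```

Calling `create_schema` again after a failure, or after a success, converges on a single schema with status `exists`, which is the property the idempotency requirement is meant to guarantee.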
5.2 Schema Deletion Flow
sequenceDiagram
participant Client
participant Foxx
participant Orchestrator
participant SchemaService
Client->>Foxx: Delete schema request
Foxx->>Foxx: Authorization check
Foxx->>Foxx: Check cnt
alt cnt > 0
Foxx-->>Client: Reject delete
else cnt == 0
Foxx->>Foxx: status=deleting
Foxx->>Orchestrator: Delete schema
Orchestrator->>SchemaService: Delete schema
SchemaService-->>Orchestrator: Success
end
Deletes are idempotent. Retrying with the same schema ID is safe.
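A minimal sketch of the deletion flow, under the same kind of assumptions: a dict models the schema collection, `InMemorySchemaService` is a hypothetical stand-in for the orchestrator and schema service, and removing the local document only after the remote delete succeeds is one possible ordering that the diagram does not prescribe.

```python
class InMemorySchemaService:
    """Hypothetical stand-in for the orchestrator plus schema service."""

    def __init__(self):
        self.schemas = {}

    def delete(self, schema_id: str) -> None:
        # Idempotent: deleting a missing schema is not an error.
        self.schemas.pop(schema_id, None)


def delete_schema(db: dict, service: InMemorySchemaService, schema_id: str) -> str:
    """Return 'rejected', 'deleting', or 'deleted' for the given schema ID."""
    doc = db.get(schema_id)
    if doc is None:
        return "deleted"  # already gone, so a retry is a no-op
    if doc["cnt"] > 0:
        return "rejected"  # still referenced by at least one record
    doc["status"] = "deleting"
    try:
        service.delete(schema_id)
    except ConnectionError:
        return "deleting"  # leave the marker so a later call can finish
    del db[schema_id]  # assumed final step: drop the local document
    return "deleted"
```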
5.3 Failure Mode: Orphaned Schemas
If orchestration fails after DataFed removes the schema:
- DataFed no longer references the schema
- Schema service may retain an orphan
This is acceptable:
- Orphan cleanup can occur asynchronously
- Idempotent deletes allow safe retries
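Asynchronous orphan cleanup could amount to a periodic set difference between the schema IDs DataFed knows about and those held by the schema service. The sketch below is an assumption about how such a sweep might look; `find_orphans` and `cleanup_orphans` are hypothetical, and a dict models the schema service's store.

```python
def find_orphans(datafed_ids, service_ids):
    """Schema IDs present in the schema service but unknown to DataFed."""
    return sorted(set(service_ids) - set(datafed_ids))


def cleanup_orphans(datafed_ids, service_schemas: dict) -> list:
    """Delete orphans from the (modelled) schema service and report them."""
    removed = find_orphans(datafed_ids, service_schemas)
    for schema_id in removed:
        # pop with a default keeps the delete idempotent across repeated sweeps
        service_schemas.pop(schema_id, None)
    return removed
```

Running the sweep twice in a row removes nothing the second time, so it is safe to schedule without coordinating with in-flight deletes.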
6. Summary of Decisions
- Schema usage count (cnt) lives in the schema document
- Background tasks update schema usage locally
- Schema service interaction is mediated by the orchestrator
- Schema lifecycle requires explicit status
- Create/delete operations must be idempotent
- Temporary inconsistency is acceptable; silent corruption is not