[Discovery] - Foxx Identify how schemas are currently used #1828

@JoshuaSBrown

Foxx Schema Usage, Ownership

Goal

Identify all locations where schemas are referenced, mutated, or relied upon, and clarify how schema lifecycle (create, update, delete) behaves across:

  • Web UI
  • Python client
  • Foxx API
  • Background tasks
  • External schema service

1. Schema Usage by Entry Point

1.1 End User (Web UI)

Users can:

  • Create a schema
  • Revise an existing schema
  • Apply a schema to a record
  • View schema validation errors for a record
  • Edit/update a schema
  • View a schema definition
  • Delete a schema only when not in use
  • Use a schema helper to build queries

1.2 End User (Python Client)

Users can:

  • Apply a schema to a record

(All other schema interactions are mediated through backend APIs.)


1.3 Foxx API Usage

Schemas are referenced in the following routes:

| Route | Path | Schema usage |
| --- | --- | --- |
| Update record | dat/update | Uses schema ID |
| Batch update | dat/update/batch | Uses schema ID |
| View record | dat/view | Uses schema ID; returns schema ID + version |
| Create record | dat/create | Uses schema ID; recordCreate derives sch_ver |
| Batch create | dat/create/batch | Same as dat/create |
| Delete record | dat/delete | Decrements schema cnt |
| Task execution | task/run | Handles schema cnt decrement in background deletes |

Relevant call paths:

dat/update -> recordUpdate
dat/update/batch -> recordUpdate
dat/view -> schema lookup + return
dat/create -> recordCreate
dat/create/batch -> recordCreate

dat/delete
-> tasks.js
-> _deleteDataRecord
-> decrement schema cnt

task/run
-> taskGetRunFunc
-> taskRunProjectDelete / taskRunRecCollDelete
-> _deleteDataRecords
-> decrement schema cnt
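
The decrement step at the end of both delete paths can be sketched as follows. This is a minimal illustration, not the actual Foxx code: the function name, the in-memory dicts standing in for collections, and the placeholder schema IDs ("schema_a", "schema_b") are all assumptions.

```python
from collections import Counter


def decrement_schema_counts(schemas, deleted_records):
    """Decrement cnt once per deleted record that references a schema.

    schemas: dict mapping schema ID -> schema document (a dict with "cnt").
    deleted_records: iterable of record dicts; "sch_id" may be absent.
    Both structures are hypothetical stand-ins for database collections.
    """
    # Group deletions by schema so each schema document is written once.
    per_schema = Counter(
        rec["sch_id"] for rec in deleted_records if rec.get("sch_id")
    )
    for sch_id, n in per_schema.items():
        doc = schemas.get(sch_id)
        if doc is None:
            continue  # schema document not found; nothing to decrement
        # Clamp at zero so a miscount cannot drive cnt negative.
        doc["cnt"] = max(0, doc["cnt"] - n)
    return schemas
```

Grouping by schema first keeps a batch delete to one write per schema document, which matters when a project delete removes many records that share a schema.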


2. Architectural Constraint: Background Tasks

Schema reference counting (cnt) is updated inside background tasks.

This complicates decoupling schemas into a separate service because:

  • Background tasks need schema access
  • Network calls introduce failure states
  • Retries must be safe and idempotent

Decision

Keep cnt in the schema document, not in the external schema service.

Rationale:

  • Schema usage is tightly coupled to DataFed records
  • Reference counting must be local, fast, and transactional
  • Externalizing it increases failure modes without benefit
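
One way to make a retried background task safe is to remember which records have already been counted. The sketch below assumes the task keeps a set of processed record IDs (the `applied` set is hypothetical) and that this set is persisted in the same local transaction as the cnt update, which is exactly what keeping cnt in the schema document makes possible.

```python
def apply_decrement_once(schema_doc, record_id, applied):
    """Decrement schema_doc["cnt"] for record_id unless already applied.

    applied: set of record IDs this task has already processed
    (hypothetical task state, assumed to be stored transactionally
    alongside the schema document).
    Returns True if cnt changed, False if the call was a harmless retry.
    """
    if record_id in applied:
        return False  # retry of a step that already succeeded: no-op
    schema_doc["cnt"] -= 1
    applied.add(record_id)
    return True
```

If cnt lived in an external service, the decrement and the bookkeeping could not share a transaction, and a crash between the two would either lose or double-count a decrement.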

3. Data Model

3.1 Record Document

Record
└── sch_id = "<schema_name>:"

This is a logical reference supplied by the user.

3.2 Schema Document

Schema
├── _id = "sch/281578009"
├── _key = "281578009"
├── id = "<schema_name>:"
├── ver =
├── cnt =
├── own_id = "<u/user_id>"
├── pub =
├── desc =
└── status = <pending | exists | deleting> (NEW)
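
For reference, the schema document above could be modelled like this. The field types for ver, cnt, pub, and desc are assumptions, since the tree does not show their values; the class and property names are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class SchemaStatus(Enum):
    PENDING = "pending"
    EXISTS = "exists"
    DELETING = "deleting"


@dataclass
class SchemaDoc:
    key: str            # ArangoDB _key, e.g. "281578009"
    id: str             # logical schema ID that records store in sch_id
    own_id: str         # owner, e.g. "u/user_id"
    ver: int = 0        # assumed to be an integer version
    cnt: int = 0        # assumed to be an integer count of referencing records
    pub: bool = False   # assumed to be a boolean public flag
    desc: str = ""      # assumed to be free text
    status: SchemaStatus = SchemaStatus.PENDING  # the new lifecycle field

    @property
    def doc_id(self) -> str:
        # ArangoDB builds _id as "<collection>/<_key>".
        return f"sch/{self.key}"

    @property
    def in_use(self) -> bool:
        # A schema is in use while at least one record references it.
        return self.cnt > 0
```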

3.3 Schema Version Edge (sch_ver)

SchemaVersionEdge
_from ──>
_to ──>

Models schema version lineage.

3.4 Schema Dependency Edge (sch_dep)

SchemaDependencyEdge
_from ──>
_to ──>

Models schema dependencies.

4. Required Additions

4.1 Schema Backend / Plugin Identifier

The schema document needs an identifier for the schema backend or plugin.

  • The schema service address belongs in the orchestration service config.
  • The schema document should store only a logical identifier.
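
A rough sketch of that split: the schema document carries only a logical backend name, and the orchestrator resolves it to an address from its own config. The field name `backend`, the config layout, the backend name, and the example address are all hypothetical.

```python
# Hypothetical orchestrator-side config: logical backend name -> address.
ORCHESTRATOR_CONFIG = {
    "schema_backends": {
        "json-schema-default": {"address": "https://schema-service.example/api"},
    }
}


def resolve_backend_address(schema_doc, config):
    """Map the logical identifier stored on a schema document to an address."""
    backend = schema_doc["backend"]  # logical identifier only, never a URL
    try:
        return config["schema_backends"][backend]["address"]
    except KeyError:
        raise LookupError(f"unknown schema backend: {backend!r}") from None
```

With this arrangement, moving the schema service means editing one config entry rather than rewriting every schema document.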

4.2 Schema Status Field (Required)

Add a status field to track lifecycle state:

pending → creation requested, not yet confirmed
exists → schema successfully created
deleting → deletion in progress

This enables retries and safe recovery from partial failures.
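
The three states imply a small state machine. The sketch below is one possible reading: it allows only pending → exists and exists → deleting, and treats a repeated transition to the current state as a no-op so retries are harmless. Whether other transitions (for example, abandoning a pending schema) should be legal is not decided here.

```python
# Assumed transition table; the issue text lists the states but not the edges.
ALLOWED_TRANSITIONS = {
    "pending": {"exists"},
    "exists": {"deleting"},
    "deleting": set(),  # the document is removed rather than transitioned
}


def transition(doc, new_status):
    """Move doc["status"] to new_status, rejecting illegal jumps."""
    current = doc["status"]
    if new_status == current:
        return doc  # retried request: already in the requested state
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    doc["status"] = new_status
    return doc
```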

5. Schema Lifecycle Flows

5.1 Schema Creation Flow

sequenceDiagram
participant Client
participant Foxx
participant Orchestrator
participant SchemaService

Client->>Foxx: Create schema request
Foxx->>Foxx: Authorization check
Foxx->>Foxx: Create schema doc (status=pending)
Foxx->>Orchestrator: Request schema creation
Orchestrator->>SchemaService: Create schema
SchemaService-->>Orchestrator: Success
Orchestrator-->>Foxx: Confirm creation
Foxx->>Foxx: Update status=exists

Failure handling:

  • If creation fails, status remains pending.
  • A read that encounters a pending schema retries the creation request.
  • Creation must therefore be idempotent.
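
The creation flow and its failure handling might look like this in outline. Everything here is a stand-in: `StubOrchestrator` simulates the orchestrator and schema service, `db` is a plain dict instead of a collection, and the authorization check is omitted.

```python
class StubOrchestrator:
    """Hypothetical orchestrator stand-in that fails its first few calls."""

    def __init__(self, failures=0):
        self.failures = failures
        self.created = set()

    def create_schema(self, sch_id):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("orchestrator unreachable")
        self.created.add(sch_id)  # adding twice is harmless: idempotent create


def ensure_created(db, orch, sch_id):
    """Called on create and on reads: retry creation while status is pending."""
    doc = db[sch_id]
    if doc["status"] != "pending":
        return doc
    try:
        orch.create_schema(sch_id)
    except ConnectionError:
        return doc  # status stays pending; a later read retries
    doc["status"] = "exists"
    return doc


def create_schema(db, orch, sch_id, owner):
    # Write the local document first so a failure leaves a retryable marker.
    db.setdefault(
        sch_id, {"id": sch_id, "own_id": owner, "cnt": 0, "status": "pending"}
    )
    return ensure_created(db, orch, sch_id)
```

Because the pending document is written before the orchestrator is called, a crash at any later point leaves enough state for the next read to finish the job.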

5.2 Schema Deletion Flow

sequenceDiagram
participant Client
participant Foxx
participant Orchestrator
participant SchemaService

Client->>Foxx: Delete schema request
Foxx->>Foxx: Authorization check
Foxx->>Foxx: Check cnt

alt cnt > 0
    Foxx-->>Client: Reject delete
else cnt == 0
    Foxx->>Foxx: status=deleting
    Foxx->>Orchestrator: Delete schema
    Orchestrator->>SchemaService: Delete schema
    SchemaService-->>Orchestrator: Success
end

Deletes are idempotent. Retrying with the same schema ID is safe.
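
A sketch of the deletion flow under the same kind of assumptions: `StubSchemaService` and the dict-backed `db` are hypothetical, authorization is omitted, and removing the local document once the service reports success is an assumption, since the diagram stops at the success response.

```python
class StubSchemaService:
    """Hypothetical stand-in for orchestrator + schema service."""

    def __init__(self, schemas):
        self.schemas = set(schemas)

    def delete_schema(self, sch_id):
        # discard() does not raise if the ID is absent: idempotent delete.
        self.schemas.discard(sch_id)


def delete_schema(db, service, sch_id):
    doc = db.get(sch_id)
    if doc is None:
        return "gone"  # already deleted; a retried request is a no-op
    if doc["cnt"] > 0:
        return "rejected"  # still referenced by at least one record
    doc["status"] = "deleting"  # marker that survives a crash mid-delete
    service.delete_schema(sch_id)
    del db[sch_id]  # assumed final step, not shown in the diagram
    return "deleted"
```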

5.3 Failure Mode: Orphaned Schemas

If orchestration fails after DataFed removes the schema:

  • DataFed no longer references the schema.
  • The schema service may retain an orphan.

This is acceptable:

  • Orphan cleanup can occur asynchronously.
  • Idempotent deletes allow safe retries.
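
Asynchronous orphan cleanup could be as simple as a periodic set difference. The sketch assumes the schema service can list the IDs it holds and that its delete is idempotent; both function names and the set standing in for the service are hypothetical.

```python
def find_orphans(service_ids, datafed_ids):
    """IDs the schema service still holds that DataFed no longer references."""
    return sorted(set(service_ids) - set(datafed_ids))


def cleanup_orphans(service_store, datafed_ids):
    """Delete orphans from a set standing in for the schema service."""
    removed = []
    for sch_id in find_orphans(service_store, datafed_ids):
        service_store.discard(sch_id)  # idempotent: safe to repeat
        removed.append(sch_id)
    return removed
```

Running the cleanup twice in a row removes nothing the second time, so it can be scheduled without coordination with in-flight deletes.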

6. Summary of Decisions

  • Schema usage count (cnt) lives in the schema document.
  • Background tasks update schema usage locally.
  • Schema service interaction is mediated by the orchestrator.
  • Schema lifecycle requires an explicit status field.
  • Create/delete operations must be idempotent.
  • Temporary inconsistency is acceptable; silent corruption is not.
