[Discovery] - Foxx: Identify how schemas are currently used #1828

# Foxx Schema Usage and Ownership

## Goal

Identify **all locations where schemas are referenced, mutated, or relied upon**, and clarify how the schema lifecycle (create, update, delete) behaves across:

- Web UI
- Python client
- Foxx API
- Background tasks
- External schema service

---

## 1. Schema Usage by Entry Point

### 1.1 End User (Web UI)

Users can:

- Create a schema
- Revise an existing schema (create a new version)
- Apply a schema to a record
- View schema validation errors for a record
- Edit/update a schema
- View a schema definition
- Delete a schema **only when not in use**
- Use a schema helper to build queries

---

### 1.2 End User (Python Client)

Users can:

- Apply a schema to a record

(All other schema interactions are mediated through backend APIs.)

---

### 1.3 Foxx API Usage

Schemas are referenced in the following routes:

| Route | Path | Schema Usage |
|----|----|----|
| Update record | `dat/update` | Uses schema ID |
| Batch update | `dat/update/batch` | Uses schema ID |
| View record | `dat/view` | Uses schema ID, returns schema ID + version |
| Create record | `dat/create` | Uses schema ID; `recordCreate` derives `sch_ver` |
| Batch create | `dat/create/batch` | Same as above |
| Delete record | `dat/delete` | Decrements schema `cnt` |
| Task execution | `task/run` | Handles schema count decrement in background deletes |

Relevant call paths:

```text
dat/update        -> recordUpdate
dat/update/batch  -> recordUpdate
dat/view          -> schema lookup + return
dat/create        -> recordCreate
dat/create/batch  -> recordCreate

dat/delete
  -> tasks.js
  -> _deleteDataRecord
  -> decrement schema cnt

task/run
  -> taskGetRunFunc
  -> taskRunProjectDelete / taskRunRecCollDelete
  -> _deleteDataRecords
  -> decrement schema cnt
```

---

## 2. Architectural Constraint: Background Tasks

Schema reference counting (`cnt`) is updated **inside background tasks**.

This complicates decoupling schemas into a separate service because:

- Background tasks need schema access
- Network calls introduce failure states
- Retries must be safe and idempotent

### Decision

**Keep `cnt` in the schema document**, not in the external schema service.

**Rationale:**

- Schema usage is tightly coupled to DataFed records
- Reference counting must be local, fast, and transactional
- Externalizing it increases failure modes without benefit
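
To make the retry requirement concrete, here is a minimal Python sketch of the decrement a background delete task performs. It assumes in-memory dicts in place of ArangoDB collections, schemas keyed by their `id` (`<schema_name>:<version>`), and a hypothetical `delete_record` helper standing in for one database transaction; it is not the actual `_deleteDataRecord` code. The decrement happens only when the record is actually removed, so re-running an interrupted task does not decrement `cnt` twice.

```python
def delete_record(records: dict, schemas: dict, record_key: str) -> bool:
    """Hypothetical helper: delete one record and decrement its schema's `cnt`.

    Returns True if the record was deleted, False if it was already gone.
    """
    record = records.get(record_key)
    if record is None:
        # An earlier (possibly interrupted) attempt already deleted it,
        # so this retry must not touch `cnt` again.
        return False

    sch_id = record.get("sch_id")
    if sch_id is not None:
        schema = schemas[sch_id]
        if schema["cnt"] <= 0:
            # Surface count drift instead of silently clamping it.
            raise ValueError(f"cnt for schema {sch_id!r} would go negative")
        schema["cnt"] -= 1

    # Remove the record last, so a failure above leaves both documents unchanged.
    del records[record_key]
    return True


records = {"r1": {"sch_id": "my_schema:1"}, "r2": {"sch_id": "my_schema:1"}}
schemas = {"my_schema:1": {"cnt": 2}}

delete_record(records, schemas, "r1")  # first attempt: cnt 2 -> 1
delete_record(records, schemas, "r1")  # retry: no-op, cnt stays 1
```

Keeping the check, the decrement, and the removal in one local operation is what a transactional, in-database `cnt` allows; the same guarantee is harder to give across a network call.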

---

## 3. Data Model

### 3.1 Record Document

```text
Record
└── sch_id = "<schema_name>:<version>"
```

This is a logical reference supplied by the user.
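
Because `sch_id` is a plain string, code that needs the schema name or version has to split it. A minimal sketch, assuming the version is a non-negative integer that follows the last `:` (so names containing `:` still parse); `parse_sch_id` is a hypothetical helper, not an existing DataFed function.

```python
import re

# Greedy `.+` means the split happens at the last ":".
_SCH_ID = re.compile(r"(?P<name>.+):(?P<ver>\d+)")


def parse_sch_id(sch_id: str) -> tuple[str, int]:
    """Split "<schema_name>:<version>" into (name, version)."""
    match = _SCH_ID.fullmatch(sch_id)
    if match is None:
        raise ValueError(f"expected '<schema_name>:<version>', got {sch_id!r}")
    return match["name"], int(match["ver"])


print(parse_sch_id("my_schema:3"))  # ('my_schema', 3)
```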

### 3.2 Schema Document

```text
Schema
├── _id     = "sch/281578009"
├── _key    = "281578009"
├── id      = "<schema_name>:<version>"
├── ver     = <version>
├── cnt     = <number of records using this schema>
├── own_id  = "<u/user_id>"
├── pub     = <boolean>
├── desc    = <string>
└── status  = <pending | exists | deleting>   (NEW)
```
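
The schema document fields can be mirrored in a small typed model, which is one way to keep `status` constrained to its three allowed values. This is a hypothetical Python representation for illustration; the defaults (`cnt=0`, `status=pending`, `pub=False`, empty `desc`) are assumptions, not values taken from DataFed.

```python
from dataclasses import dataclass
from enum import Enum


class SchemaStatus(str, Enum):
    """Lifecycle states proposed for the new `status` field."""
    PENDING = "pending"
    EXISTS = "exists"
    DELETING = "deleting"


@dataclass
class SchemaDoc:
    key: str              # `_key`, e.g. "281578009"
    id: str               # "<schema_name>:<version>"
    ver: int
    own_id: str           # owner, e.g. "u/<user_id>"
    pub: bool = False
    desc: str = ""
    cnt: int = 0          # number of records using this schema
    status: SchemaStatus = SchemaStatus.PENDING

    def to_doc(self) -> dict:
        """Render the ArangoDB-style document, deriving `_id` from `_key`."""
        return {
            "_id": f"sch/{self.key}",
            "_key": self.key,
            "id": self.id,
            "ver": self.ver,
            "cnt": self.cnt,
            "own_id": self.own_id,
            "pub": self.pub,
            "desc": self.desc,
            "status": self.status.value,
        }
```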

### 3.3 Schema Version Edge (sch_ver)

```text
SchemaVersionEdge
_from ──> <schema _id>
_to   ──> <schema _id>
```

Models schema version lineage.
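
One use of these edges is finding the newest version reachable from a given schema. A minimal sketch over in-memory edge dicts, assuming (this section does not say so) that each edge points from an older version (`_from`) to the next one (`_to`) and that lineage is linear; `latest_version` is a hypothetical helper.

```python
def latest_version(start_id: str, sch_ver_edges: list[dict]) -> str:
    """Walk sch_ver edges from `start_id` and return the newest schema `_id`."""
    next_of: dict[str, str] = {}
    for edge in sch_ver_edges:
        if edge["_from"] in next_of:
            # The linear-lineage assumption is violated.
            raise ValueError(f"version lineage branches at {edge['_from']}")
        next_of[edge["_from"]] = edge["_to"]

    seen = {start_id}
    current = start_id
    while current in next_of:
        current = next_of[current]
        if current in seen:
            raise ValueError(f"cycle in sch_ver edges at {current}")
        seen.add(current)
    return current


edges = [
    {"_from": "sch/1", "_to": "sch/2"},
    {"_from": "sch/2", "_to": "sch/3"},
]
print(latest_version("sch/1", edges))  # sch/3
```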

### 3.4 Schema Dependency Edge (sch_dep)

```text
SchemaDependencyEdge
_from ──> <schema _id>
_to   ──> <schema _id>
```

Models schema dependencies.

---

## 4. Required Additions

### 4.1 Schema Backend / Plugin Identifier

The schema document needs an identifier for the schema backend or plugin.

The schema service address belongs in the orchestration service configuration, not in the schema document.

The schema document should store only a logical identifier.
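
Keeping the address out of the document means the orchestrator resolves the logical identifier at call time. A minimal sketch, assuming a hypothetical `backend` field on the schema document and a hypothetical `schema_backends` mapping in the orchestrator configuration; neither name comes from DataFed.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorConfig:
    # Hypothetical mapping: logical backend identifier -> service address.
    schema_backends: dict[str, str] = field(default_factory=dict)

    def resolve(self, backend_id: str) -> str:
        """Return the address for a logical backend identifier."""
        try:
            return self.schema_backends[backend_id]
        except KeyError:
            raise LookupError(
                f"no schema backend configured for {backend_id!r}"
            ) from None


config = OrchestratorConfig({"default": "http://localhost:8080"})
schema_doc = {"id": "my_schema:1", "backend": "default"}  # logical ID only
address = config.resolve(schema_doc["backend"])
```

With this split, moving or replacing the schema service touches only the orchestrator configuration; stored schema documents keep the same identifier.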

### 4.2 Schema Status Field (Required)

Add a status field to track lifecycle state:

- `pending` → creation requested, not yet confirmed
- `exists` → schema successfully created
- `deleting` → deletion in progress

This enables retries and safe recovery from partial failures.
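
The three states imply a small set of legal transitions. A minimal sketch of a guard that status updates could pass through; the transition table is an assumption drawn from the creation and deletion flows, and the self-transitions exist so a retried step can re-apply its own state. Whether a `pending` schema may move straight to `deleting` is not settled here; the sketch disallows it.

```python
from enum import Enum


class SchemaStatus(Enum):
    PENDING = "pending"
    EXISTS = "exists"
    DELETING = "deleting"


# Assumed transition table. After `deleting`, the document is removed
# rather than moved to another status.
ALLOWED = {
    SchemaStatus.PENDING: {SchemaStatus.PENDING, SchemaStatus.EXISTS},
    SchemaStatus.EXISTS: {SchemaStatus.EXISTS, SchemaStatus.DELETING},
    SchemaStatus.DELETING: {SchemaStatus.DELETING},
}


def transition(current: SchemaStatus, target: SchemaStatus) -> SchemaStatus:
    """Return `target` if the change is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal schema status change: {current.value} -> {target.value}"
        )
    return target
```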

---

## 5. Schema Lifecycle Flows

### 5.1 Schema Creation Flow

```mermaid
sequenceDiagram
    participant Client
    participant Foxx
    participant Orchestrator
    participant SchemaService

    Client->>Foxx: Create schema request
    Foxx->>Foxx: Authorization check
    Foxx->>Foxx: Create schema doc (status=pending)
    Foxx->>Orchestrator: Request schema creation
    Orchestrator->>SchemaService: Create schema
    SchemaService-->>Orchestrator: Success
    Orchestrator-->>Foxx: Confirm creation
    Foxx->>Foxx: Update status=exists
```

Failure handling:

- If creation fails, the status remains `pending`.
- Reads that encounter a `pending` schema retry creation.
- Creation must be idempotent.
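
The failure-handling rules can be sketched as a small in-memory model. This is not DataFed code: `db` stands in for the Foxx schema collection, the hypothetical `FakeSchemaService` collapses the orchestrator and schema service into one object, authorization is omitted, and keeping the definition on the `pending` document (so a retry has something to resend) is an assumption.

```python
class FakeSchemaService:
    """Stand-in for orchestrator + schema service."""

    def __init__(self) -> None:
        self.schemas: dict = {}
        self.fail_next = False  # simulate one network failure

    def create(self, sch_id: str, definition: dict) -> None:
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("schema service unreachable")
        # Idempotent: a repeated create for the same ID is a no-op.
        self.schemas.setdefault(sch_id, definition)


def create_schema(
    db: dict, service: FakeSchemaService, sch_id: str, definition: dict
) -> dict:
    doc = db.get(sch_id)
    if doc is None:
        # Record intent first, so a failure below leaves a `pending` doc to retry.
        doc = {"id": sch_id, "definition": definition, "cnt": 0, "status": "pending"}
        db[sch_id] = doc
    if doc["status"] == "pending":
        service.create(sch_id, doc["definition"])  # may raise; status stays pending
        doc["status"] = "exists"
    return doc


def read_schema(db: dict, service: FakeSchemaService, sch_id: str) -> dict:
    """Reads that encounter a `pending` schema retry creation."""
    doc = db[sch_id]
    if doc["status"] == "pending":
        doc = create_schema(db, service, sch_id, doc["definition"])
    return doc


db: dict = {}
service = FakeSchemaService()
service.fail_next = True
try:
    create_schema(db, service, "my_schema:1", {"type": "object"})
except ConnectionError:
    pass  # the doc is left in `pending`
read_schema(db, service, "my_schema:1")  # retry succeeds, status -> exists
```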

### 5.2 Schema Deletion Flow

```mermaid
sequenceDiagram
    participant Client
    participant Foxx
    participant Orchestrator
    participant SchemaService

    Client->>Foxx: Delete schema request
    Foxx->>Foxx: Authorization check
    Foxx->>Foxx: Check cnt

    alt cnt > 0
        Foxx-->>Client: Reject delete
    else cnt == 0
        Foxx->>Foxx: status=deleting
        Foxx->>Orchestrator: Delete schema
        Orchestrator->>SchemaService: Delete schema
        SchemaService-->>Orchestrator: Success
    end
```

Deletes must be idempotent, so retrying a delete with the same schema ID is safe.
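
The delete path can be sketched in the same in-memory style. It is a minimal sketch under stated assumptions: authorization is omitted, a hypothetical `FakeSchemaService` stands in for the orchestrator and schema service, and the Foxx document is removed only after the service confirms, a step the diagram does not show. A failure during the service call leaves the document in `deleting`, and a retry repeats the idempotent delete.

```python
class SchemaInUseError(Exception):
    """Raised when a delete is rejected because `cnt` > 0."""


class FakeSchemaService:
    def __init__(self, schemas: dict) -> None:
        self.schemas = dict(schemas)

    def delete(self, sch_id: str) -> None:
        # Idempotent: deleting an unknown ID counts as success.
        self.schemas.pop(sch_id, None)


def delete_schema(db: dict, service: FakeSchemaService, sch_id: str) -> None:
    doc = db.get(sch_id)
    if doc is None:
        return  # already gone; a retry has nothing left to do
    if doc["cnt"] > 0:
        raise SchemaInUseError(f"{sch_id} is used by {doc['cnt']} record(s)")
    doc["status"] = "deleting"
    service.delete(sch_id)  # if this raises, the doc stays in `deleting`
    del db[sch_id]


db = {
    "unused:1": {"cnt": 0, "status": "exists"},
    "in_use:1": {"cnt": 3, "status": "exists"},
}
service = FakeSchemaService({"unused:1": {}, "in_use:1": {}})
delete_schema(db, service, "unused:1")
delete_schema(db, service, "unused:1")  # retry is a no-op
```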

### 5.3 Failure Mode: Orphaned Schemas

If orchestration fails after DataFed removes the schema:

- DataFed no longer references the schema.
- The schema service may retain an orphan.

This is acceptable because:

- Orphan cleanup can occur asynchronously.
- Idempotent deletes allow safe retries.
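
Asynchronous cleanup could be a periodic reconciliation that deletes any service-side schema DataFed no longer knows about. A minimal sketch, assuming the service can list its schema IDs (the hypothetical `list_ids` method) and that the caller supplies the IDs of every DataFed schema document in any status. Listing the service before reading DataFed matters: because creation writes the `pending` document before contacting the service, every schema in the listing already has a document visible to the later read, so an in-progress creation is not mistaken for an orphan.

```python
from typing import Callable, Iterable


class FakeSchemaService:
    def __init__(self, schemas: dict) -> None:
        self.schemas = dict(schemas)

    def list_ids(self) -> list[str]:
        return list(self.schemas)

    def delete(self, sch_id: str) -> None:
        self.schemas.pop(sch_id, None)  # idempotent


def cleanup_orphans(
    service: FakeSchemaService,
    get_datafed_ids: Callable[[], Iterable[str]],
) -> list[str]:
    """Delete service-side schemas with no DataFed document; return their IDs."""
    # List the service first, then read DataFed (see the ordering note above).
    service_ids = service.list_ids()
    known = set(get_datafed_ids())
    orphans = sorted(i for i in service_ids if i not in known)
    for sch_id in orphans:
        service.delete(sch_id)  # safe to rerun after a crash
    return orphans


service = FakeSchemaService({"a:1": {}, "b:1": {}, "c:2": {}})
print(cleanup_orphans(service, lambda: ["a:1"]))  # ['b:1', 'c:2']
```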

---

## 6. Summary of Decisions

- Schema usage count (`cnt`) lives in the schema document.
- Background tasks update schema usage locally.
- Schema service interaction is mediated by the orchestrator.
- The schema lifecycle requires an explicit status.
- Create and delete operations must be idempotent.
- Temporary inconsistency is acceptable; silent corruption is not.



