Support schema v2.0.0 (citations, provenance, hatch_schema_version, breaking field changes) #19

@LittleCoinCoin

Context

Hatch-Schemas PR #29 introduced package/v2.0.0/hatch_pkg_metadata_schema.json. This issue tracks the corresponding validator work needed to support packages using hatch_schema_version: "2.0.0".

The implementation follows the established pattern: add a hatch_validator/package/v2_0_0/ module, wire it into the validator chain, and register it in the factory.


Breaking changes from v1.2.2 → v2.0.0

These are changes that affect existing validation logic and require deliberate handling, not just schema passthrough.

1. Version field rename: package_schema_version → hatch_schema_version

The field used to route packages to the correct validator has been renamed. The current chain-of-responsibility dispatcher reads metadata.get("package_schema_version", "") (see v1_2_2/validator.py:67). The v2.0.0 validator must read hatch_schema_version instead.

Note: Packages with the old field name should still be routed to the appropriate ≤1.2.2 validator. The dispatch key used for routing must not break backward compatibility.
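The routing rule above could be sketched roughly as follows. This is an illustration only, not the validator's actual code: resolve_schema_version is a hypothetical helper, and the precedence (new field first, legacy field as fallback) is an assumption.

```python
from typing import Any, Mapping


def resolve_schema_version(metadata: Mapping[str, Any]) -> str:
    """Return the declared schema version, or "" if none is declared.

    Hypothetical helper: prefers the v2.0.0 field name and falls back to
    the legacy one so that <=1.2.2 packages keep routing as before.
    """
    # Field name introduced in v2.0.0.
    version = metadata.get("hatch_schema_version")
    if version is None:
        # Legacy field name used by packages at or below v1.2.2.
        version = metadata.get("package_schema_version", "")
    return str(version)
```

Each validator in the chain could then compare this single value against the version it handles, instead of reading one field name directly.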

2. author → authors (renamed + stricter)

The singular author field is now authors, an array with minItems: 1. Each item requires name and allows an optional email. Any accessor or dependency logic reading the author field must be updated.
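As a rough sketch of what the stricter authors rule implies, assuming a custom check were wanted on top of the JSON schema (check_authors and its error strings are hypothetical, not part of hatch_validator):

```python
from typing import Any, List


def check_authors(authors: Any) -> List[str]:
    """Return a list of error messages; an empty list means the value passed.

    Hypothetical check mirroring the rule: array, minItems 1, each item
    requires "name"; "email" is optional and not inspected here.
    """
    if not isinstance(authors, list) or len(authors) < 1:
        return ["authors must be an array with at least one item"]
    errors: List[str] = []
    for index, author in enumerate(authors):
        if not isinstance(author, dict):
            errors.append(f"authors[{index}] must be an object")
        elif "name" not in author:
            errors.append(f"authors[{index}] is missing required field 'name'")
    return errors
```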

3. tools[].description → tools[].desc

The description field inside each tool entry was renamed from description to desc. The tools_validation module must be updated accordingly.

4. Docker dependency: version_constraint → tag, digest now required

  • digest (sha256:<64 hex chars>) is now a required field on docker dependencies.
  • version_constraint is replaced by the optional tag field.
  • The docker dependency validator must enforce digest presence and stop looking for version_constraint.
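The docker rules above might look roughly like this in a custom check. check_docker_dependency is a hypothetical name, and restricting the digest to lowercase hex is an assumption; the schema itself is the authority on the exact pattern.

```python
import re
from typing import Any, List, Mapping

# Assumption: lowercase hex only; adjust to match the published schema.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def check_docker_dependency(dep: Mapping[str, Any]) -> List[str]:
    """Return error messages for a single docker dependency entry."""
    errors: List[str] = []
    digest = dep.get("digest")
    if digest is None:
        errors.append("docker dependency is missing required field 'digest'")
    elif not isinstance(digest, str) or not DIGEST_RE.fullmatch(digest):
        errors.append("digest must be 'sha256:' followed by 64 hex characters")
    # "tag" is optional and version_constraint is no longer consulted,
    # so neither is required here.
    return errors
```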

5. version_constraint is optional for all dependency types

Previously version_constraint was required on some dependency objects. In v2.0.0 only name is required for hatch, python, and system dependencies. The validator must not error when version_constraint is absent.
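A minimal sketch of the relaxed rule for hatch, python, and system dependencies follows. check_basic_dependency is a hypothetical helper, and treating version_constraint as a string when present is an assumption.

```python
from typing import Any, List, Mapping


def check_basic_dependency(dep: Mapping[str, Any]) -> List[str]:
    """Return error messages for a hatch, python, or system dependency."""
    errors: List[str] = []
    if "name" not in dep:
        errors.append("dependency is missing required field 'name'")
    # version_constraint is optional in v2.0.0: inspect it only if present.
    constraint = dep.get("version_constraint")
    if constraint is not None and not isinstance(constraint, str):
        errors.append("version_constraint must be a string when provided")
    return errors
```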


New fields requiring validation

6. provenance (optional object)

{
  "git_sha": "a1b2c3d",      // ^[0-9a-f]{7,40}$
  "build_env": "conda-lock"  // enum: conda-lock | pip-compile | manual
}

The object is optional, but if present it must satisfy anyOf: [required: git_sha, required: build_env] — at least one of the two fields must be provided. additionalProperties: false.
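If a provenance_validation.py strategy were added, its core check might look something like the sketch below. check_provenance is a hypothetical name; the pattern and enum values are taken from the example above, and the JSON schema remains the source of truth.

```python
import re
from typing import Any, List, Mapping

GIT_SHA_RE = re.compile(r"[0-9a-f]{7,40}")
BUILD_ENVS = {"conda-lock", "pip-compile", "manual"}
ALLOWED_KEYS = {"git_sha", "build_env"}


def check_provenance(provenance: Mapping[str, Any]) -> List[str]:
    """Return error messages for a provenance object that is present."""
    errors: List[str] = []
    # additionalProperties: false
    for key in sorted(set(provenance) - ALLOWED_KEYS):
        errors.append(f"unexpected provenance field '{key}'")
    # anyOf: at least one of git_sha / build_env must be provided.
    if not ALLOWED_KEYS & set(provenance):
        errors.append("provenance requires at least one of 'git_sha' or 'build_env'")
    if "git_sha" in provenance:
        sha = provenance["git_sha"]
        if not isinstance(sha, str) or not GIT_SHA_RE.fullmatch(sha):
            errors.append("git_sha must be 7 to 40 lowercase hex characters")
    if "build_env" in provenance:
        env = provenance["build_env"]
        if not isinstance(env, str) or env not in BUILD_ENVS:
            errors.append("build_env must be one of conda-lock, pip-compile, manual")
    return errors
```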

7. citations (optional array)

Each citation item requires format (enum) and value, with an optional note. The value field has format-specific patterns enforced via allOf/if-then:

| format | value pattern / constraint |
|---|---|
| doi | ^10\.\d{4,}/\S+$ |
| arxiv | ^\d{4}\.\d{4,5}(v\d+)? or https://arxiv.org/abs/... |
| pmid | ^\d+$ |
| isbn | ^(97[89])?\d{9}[\dX]$ |
| url | URI format |
| bibtex, ris, csl-json, formatted | free-form string |

These conditional validations are handled by JSON Schema itself, but any custom citation-level checks belong in a new citations_validation.py strategy.
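For illustration, a citations_validation.py strategy could reuse the same patterns to produce friendlier error messages. The sketch below is an assumption about how that might look, not existing code. It covers only the doi, pmid, and isbn patterns, which are quoted in full in this issue; arxiv and url are left to the JSON schema, and citation_value_matches is a hypothetical helper.

```python
import re

# Patterns copied from this issue; re.ASCII keeps \d limited to 0-9,
# which is closer to how JSON Schema regexes treat it.
CITATION_PATTERNS = {
    "doi": re.compile(r"10\.\d{4,}/\S+", re.ASCII),
    "pmid": re.compile(r"\d+", re.ASCII),
    "isbn": re.compile(r"(97[89])?\d{9}[\dX]", re.ASCII),
}


def citation_value_matches(fmt: str, value: str) -> bool:
    """Return True if value fits the pattern known for fmt.

    Formats without a pattern here (arxiv, url, bibtex, ris, csl-json,
    formatted) return True and are left to the JSON schema.
    """
    pattern = CITATION_PATTERNS.get(fmt)
    if pattern is None:
        return True
    return pattern.fullmatch(value) is not None
```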


Non-breaking schema changes (no custom logic needed)

Enforced by the fetched JSON schema, no custom Python validation required:

  • version now validates full SemVer.
  • name follows reverse-DNS format with a single / separator.
  • description.maxLength increased to 200 (was 100).
  • license and tags are no longer required fields.
  • additionalProperties: false added to nested objects — stricter schema surfaces unexpected fields automatically.

Implementation checklist

  • Create hatch_validator/package/v2_0_0/ with:
    • __init__.py
    • accessor.py — reads hatch_schema_version; handles authors (array); handles tools[].desc
    • schema_validation.py — fetches package schema at tag schemas-package-v2.0.0
    • dependency_validation.py — enforces digest required on docker; tag replaces version_constraint for docker; version_constraint optional everywhere
    • tools_validation.py — reads desc instead of description per tool
    • entry_point_validation.py — unchanged from v1.2.2, delegate or copy
    • citations_validation.py (if custom logic beyond JSON Schema is desired)
    • provenance_validation.py (if custom logic beyond JSON Schema is desired)
    • validator.py — dispatches on hatch_schema_version == "2.0.0"
  • Register v2.0.0 in hatch_validator/core/validator_factory.py
  • Update hatch_validator/schemas/schema_fetcher.py if the release tag prefix changes for v2.0.0
  • Add test fixtures using examples/v2.0.0/server_v2.0.0_example.json (valid) and server_v2.0.0_invalid_example.json (invalid) from Hatch-Schemas
  • Add tests/test_package_validator_for_v2_0_0.py covering: schema routing, citations, provenance, docker digest, tools desc, authors array
