| 1 | +# The Schema-First Mental Model |
| 2 | + |
| 3 | +On the previous page we described the problem: a schema layer that exists in |
| 4 | +every organisation but is governed by nobody, invisible below the OpenAPI |
| 5 | +surface, duplicated across teams, and never treated as shared infrastructure. |
| 6 | +The solution is not a new process or a new tool. It is a change in how you |
| 7 | +think about schemas. |
| 8 | + |
| 9 | +Schemas are code. They define the structure and meaning of your data, determine |
| 10 | +what is valid and what is not, and represent the contracts that every API in |
| 11 | +your organisation either honours or violates. They deserve the same discipline |
| 12 | +you apply to any other critical piece of code: version control, review, and a |
| 13 | +single authoritative source. |
| 14 | + |
| 15 | +## The architecture in a nutshell |
| 16 | + |
| 17 | +- A single Git repository holds your organisation's canonical schemas |
| 18 | +- A registry sits on top of that repository and exposes its contents over HTTP: |
| 19 | + searchable, browsable, and queryable by any tooling or pipeline that needs |
| 20 | + it |
| 21 | +- Changes go through Git, via the same pull request workflow your organisation
| 22 | + already uses for everything else |
| 23 | +- Consumers reference schemas either directly via |
| 24 | + [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) pointing at the |
| 25 | + registry, or by running [`jsonschema |
| 26 | + install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown) |
| 27 | + to pull schemas into their own repository and reference them locally |
| 28 | +- API teams keep their OpenAPI specs wherever it makes sense for them, pointing
| 29 | + their schema references at the central source |
| 30 | + |
| 31 | +That is the entire model. Everything that follows builds on these five pieces.
| 32 | + |
| 33 | +## This model is already proven in practice |
| 34 | + |
| 35 | +Maintaining a Git repository of schemas and serving it over HTTP is not a novel |
| 36 | +proposal. Most organisations that do it keep their schemas private, but several public
| 37 | +examples are instructive. |
| 38 | + |
| 39 | +[SchemaStore](https://github.com/SchemaStore/schemastore) is the most visible |
| 40 | +instance: a Git repository of JSON Schemas for popular tools, maintained |
| 41 | +through pull requests and served over HTTP. When your editor autocompletes a |
| 42 | +GitHub Actions workflow or a Kubernetes manifest, it is reading from a schema |
| 43 | +reviewed and merged through a PR there. |
| 44 | + |
| 45 | +[KrakenD](https://github.com/krakend/krakend-schema) maintains its |
| 46 | +configuration schemas the same way. So do [NASA's General Coordinates
| 47 | +Network](https://github.com/nasa-gcn/gcn-schema/tree/main/gcn), for astronomical
| 48 | +alert schemas, and the [Human Cell
| 49 | +Atlas](https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema),
| 50 | +for biological metadata.
| 51 | + |
| 52 | +Ikenna Nwaiwu's [Automating API Delivery: APIOps with |
| 53 | +OpenAPI](https://www.amazon.com/Automating-API-Delivery-APIOps-OpenAPI/dp/1633438783) |
| 54 | +(Manning, 2024) formalises the same approach for API definitions. This guide |
| 55 | +extends that thinking one layer deeper, to the schema layer those definitions |
| 56 | +depend on. |
| 57 | + |
| 58 | +## Why Git? Governance by construction |
| 59 | + |
| 60 | +*This approach is a direct application of [GitOps](https://opengitops.dev/): |
| 61 | +using Git as the single source of truth for system state, where every change |
| 62 | +goes through a version-controlled, reviewable workflow.* |
| 63 | + |
| 64 | +By contrast, a stateful registry such as Apicurio Registry or Confluent Schema
| 65 | +Registry accepts writes directly through its own API, which means schemas can
| 66 | +enter your canonical layer outside any review workflow. Some products bolt
| 67 | +approval flows on top, but in doing so they rebuild what Git already provides
| 68 | +natively: audit history, access controls, branch-based proposals, and CI
| 69 | +integration. They rebuild those capabilities with less maturity, less
| 70 | +flexibility, and less familiarity than infrastructure your engineers have used for years.
| 71 | + |
| 72 | +!!! NOTE |
| 73 | + |
| 74 | + Stateful registries are still the right tool for certain problems. In event
| 75 | + streaming architectures, producers and consumers must share a schema
| 76 | + out-of-band to deserialise messages. That is a coordination problem, and a
| 77 | + stateful push model fits it. API governance is a different problem.
| 78 | + |
| 79 | +With a Git-native approach, the registry has no write path. The only way to
| 80 | +change what it serves is to change what is in Git, which, with branch protection
| 81 | +on the main branch, means going through a pull request: proposed, reviewed,
| 82 | +reversible, and audited by default. Governance is not a policy people follow.
| 83 | +It is a structural property of the system.
| 84 | + |
| 85 | +Git's extensibility compounds this advantage. You can attach anything to the |
| 86 | +workflow: AI reviewers, webhooks to notify downstream systems, automated |
| 87 | +duplicate detection, Slack notifications. The entire CI ecosystem the industry |
| 88 | +has built around Git works here without additional integration. No schema |
| 89 | +registry plugin system required. |
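
As one illustration of attaching checks to the workflow, here is a minimal sketch of a CI step that a schema repository might run on every pull request. It makes two assumptions: schemas live as `*.json` files under a single directory, and a hypothetical house rule requires every schema to declare `$schema` and `$id`. The function names and rules are illustrative only. A real pipeline would more likely call an existing JSON Schema linter alongside checks like these.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical house rule for this sketch: every schema must declare these.
REQUIRED_KEYWORDS = ("$schema", "$id")


def check_schemas(root: Path) -> List[str]:
    """Return human-readable problems found in every *.json file under root."""
    problems: List[str] = []
    seen_ids: Dict[str, Path] = {}  # $id -> first file that declared it

    for path in sorted(root.rglob("*.json")):
        try:
            schema = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as error:
            problems.append(f"{path}: invalid JSON ({error.msg})")
            continue
        if not isinstance(schema, dict):
            problems.append(f"{path}: top-level value is not an object")
            continue
        for keyword in REQUIRED_KEYWORDS:
            if keyword not in schema:
                problems.append(f"{path}: missing {keyword}")
        schema_id = schema.get("$id")
        if isinstance(schema_id, str):
            if schema_id in seen_ids:
                problems.append(
                    f"{path}: duplicate $id {schema_id} (also in {seen_ids[schema_id]})"
                )
            else:
                seen_ids[schema_id] = path
    return problems


def main(argv: List[str]) -> int:
    """CI entry point: print every problem and return a process exit code."""
    root = Path(argv[0]) if argv else Path("schemas")
    problems = check_schemas(root)
    for problem in problems:
        print(problem)
    # Non-zero makes the CI job, and therefore the pull request check, fail.
    return 1 if problems else 0
```

A CI job could call `main` with the repository's schema directory and fail the check whenever it returns non-zero. Duplicate `$id` detection is the cheapest form of duplicate detection. It catches two files claiming the same identity, but not two files describing the same data.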
| 90 | + |
| 91 | +## Why a registry on top of Git, and not just Git |
| 92 | + |
| 93 | +A raw file tree is not enough for everyone who needs to interact with the |
| 94 | +schema layer. A product manager should not need a terminal and a Git client to
| 95 | +understand what a `Transaction` schema contains. A governance team needs a |
| 96 | +health view across hundreds of definitions, not file diffs. A pipeline |
| 97 | +resolving a [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) will |
| 98 | +prefer a stable HTTP endpoint to a repository clone. A compliance stakeholder
| 99 | +needs rendered documentation, not raw JSON. |
| 100 | + |
| 101 | +A read-only registry, like Sourcemeta One, turns the source of truth into |
| 102 | +something the whole organisation can use, without changing what the source of |
| 103 | +truth is. The registry is the interface. Git is the authority. |
| 104 | + |
| 105 | +Because the registry holds no state, its operational profile is fundamentally
| 106 | +simpler: no database, no sync process, no stateful service. A stateless
| 107 | +registry can run as a single container, trivial to deploy and scale horizontally.
| 108 | +Reliability improves because one of the most common sources of failure in
| 109 | +distributed systems, stateful persistence, has been removed entirely.
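
To make "no write path" and "no state" concrete, here is a deliberately small sketch built on Python's standard library. It is not how Sourcemeta One is implemented. It assumes schemas are `*.json` files in a local checkout, served at URLs that mirror their repository paths (a hypothetical layout), and it reads them once at startup.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Dict, Type


def load_schemas(root: Path) -> Dict[str, bytes]:
    """Read every *.json file once, keyed by its URL path (its path under root)."""
    return {
        "/" + path.relative_to(root).as_posix(): path.read_bytes()
        for path in root.rglob("*.json")
    }


def make_handler(schemas: Dict[str, bytes]) -> Type[BaseHTTPRequestHandler]:
    class ReadOnlyRegistryHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            body = schemas.get(self.path)
            if body is None:
                self.send_error(404, "No such schema")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/schema+json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        # No do_PUT, do_POST, or do_DELETE: BaseHTTPRequestHandler answers any
        # method without a do_* handler with 501 Not Implemented.

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep the sketch quiet

    return ReadOnlyRegistryHandler


def serve(root: Path, port: int = 8000) -> ThreadingHTTPServer:
    """Snapshot the checkout at startup; serving new schemas means restarting."""
    handler = make_handler(load_schemas(root))
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `serve(Path("schemas")).serve_forever()` would start it on port 8000. The handler only defines `do_GET`, so write methods are answered with `501 Not Implemented`: there is no code path that writes. Serving a changed schema means changing Git and restarting the process.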
| 110 | + |
| 111 | +## `$ref` versus `jsonschema install` |
| 112 | + |
| 113 | +Referencing schemas via |
| 114 | +[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) is convenient for |
| 115 | +tooling, OpenAPI authoring, and local development. The tradeoff is a runtime |
| 116 | +dependency on the registry being available, which is fine for most internal |
| 117 | +workflows but a risk for production systems or airgapped environments. |
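
One way to see that runtime dependency is to list every registry URL a document reaches through `$ref`. The sketch below walks a parsed OpenAPI or JSON Schema document and collects absolute `http(s)` references with their fragments removed. `schemas.example.com` is a hypothetical registry host. The sketch ignores relative references, which a full resolver would resolve against the enclosing `$id` or the document location.

```python
from typing import Any, Set
from urllib.parse import urldefrag, urlparse


def remote_refs(document: Any) -> Set[str]:
    """Collect absolute http(s) $ref targets in a parsed document, fragments removed."""
    found: Set[str] = set()
    stack = [document]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and urlparse(ref).scheme in ("http", "https"):
                found.add(urldefrag(ref).url)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return found


# Hypothetical OpenAPI 3.1 fragment: two registry references and one local one.
spec = {
    "openapi": "3.1.0",
    "components": {
        "schemas": {
            "Charge": {
                "type": "object",
                "properties": {
                    "method": {"$ref": "https://schemas.example.com/payments/payment-method.json"},
                    "amount": {"$ref": "https://schemas.example.com/common/money.json#/$defs/amount"},
                    "note": {"$ref": "#/components/schemas/Note"},
                },
            },
            "Note": {"type": "string"},
        }
    },
}

print(sorted(remote_refs(spec)))
# ['https://schemas.example.com/common/money.json',
#  'https://schemas.example.com/payments/payment-method.json']
```

Each URL in that set is a service that must be reachable whenever the document is resolved at run time.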
| 118 | + |
| 119 | +The [`jsonschema |
| 120 | +install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown) |
| 121 | +command fetches schemas with integrity verification and writes them to disk, |
| 122 | +where they can be committed and used with no network dependency. The pattern
| 123 | +mirrors npm: you depend on the local copy, not on the registry being live.
| 124 | +Many teams use [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) |
| 125 | +during development and [`jsonschema |
| 126 | +install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown) |
| 127 | +for production builds and CI. |
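
The integrity step can be sketched as follows. This is not how `jsonschema install` is implemented, and it does not reproduce the CLI's configuration or lock file format. The following are all assumptions made for illustration:

- the `install` function
- the in-memory `lock` mapping
- the use of SHA-256
- the on-disk layout

The fetch function is passed in so the sketch stays network-free.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse


def install(uri: str, lock: Dict[str, str], vendor_dir: Path,
            fetch: Callable[[str], bytes]) -> Path:
    """Fetch uri, check it against the pinned digest in lock, then write it to disk."""
    data = fetch(uri)
    digest = hashlib.sha256(data).hexdigest()
    pinned = lock.get(uri)
    if pinned is None:
        lock[uri] = digest  # first install: pin exactly what was received
    elif pinned != digest:
        # The registry now serves different bytes than were pinned: refuse to
        # overwrite the vendored copy.
        raise ValueError(f"integrity check failed for {uri}")
    # Hypothetical layout: <vendor_dir>/<host>/<path>, committed to the repository.
    parsed = urlparse(uri)
    target = vendor_dir / parsed.netloc / parsed.path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

This is the same choice lock files make in package managers. The first install pins the exact bytes. A later install that receives different bytes fails loudly instead of silently changing what production validates against.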
| 128 | + |
| 129 | +## One source of truth, but never a blocker |
| 130 | + |
| 131 | +A source of truth requires one place, because two authoritative repositories |
| 132 | +means teams must choose, and that choice is where inconsistency begins. But the |
| 133 | +path there runs through local iteration, not around it. |
| 134 | + |
| 135 | +A common fear about centralised governance is that it becomes a bottleneck. |
| 136 | +That is a misunderstanding of what this model is trying to achieve. |
| 137 | + |
| 138 | +The goal is not to gate development behind a registry. It is to centralise and |
| 139 | +make official what teams are already building, in parallel, without ever |
| 140 | +blocking them. Local schemas are a staging area. If the registry does not have |
| 141 | +what you need, define it locally, ship, and keep building. Upstream proposals |
| 142 | +happen as a separate concern. |
| 143 | + |
| 144 | +## What the flow looks like |
| 145 | + |
| 146 | +A developer on the payments team needs a `PaymentMethod` definition. The |
| 147 | +registry does not have one. They define it locally, reference it from their |
| 148 | +OpenAPI spec, and ship. Development is not blocked. |
| 149 | + |
| 150 | +Later, when a pull request proposes `PaymentMethod` to the central repository,
| 151 | +the platform team notices that the `currency` field uses a freeform string where [ISO
| 152 | +4217](https://www.iso.org/iso-4217-currency-codes.html) already defines a
| 153 | +controlled vocabulary, and suggests referencing an internal schema for it
| 154 | +instead. The schema becomes stronger through review. The PR merges, triggering
| 155 | +a CI action that redeploys Sourcemeta One. On startup the registry reads from
| 156 | +the updated repository, and `PaymentMethod` is now available canonically.
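
To see why the review comment matters, here is a sketch of the `PaymentMethod` schema before and after the change. It comes with a toy validator that understands only the handful of keywords used. The registry URL, the `type` property, and the abbreviated currency list are hypothetical. A real project would use a proper JSON Schema implementation and the full ISO 4217 code list.

```python
from typing import Any, Dict

# Hypothetical registry contents keyed by URI. The enum is an abbreviated
# subset of ISO 4217 codes, for illustration only.
REGISTRY: Dict[str, Dict[str, Any]] = {
    "https://schemas.example.com/common/iso-4217.json": {
        "type": "string",
        "enum": ["EUR", "GBP", "JPY", "USD"],
    },
}

PAYMENT_METHOD_BEFORE: Dict[str, Any] = {
    "type": "object",
    "required": ["type", "currency"],
    "properties": {
        "type": {"type": "string"},
        "currency": {"type": "string"},  # freeform: any string passes
    },
}

PAYMENT_METHOD_AFTER: Dict[str, Any] = {
    "type": "object",
    "required": ["type", "currency"],
    "properties": {
        "type": {"type": "string"},
        "currency": {"$ref": "https://schemas.example.com/common/iso-4217.json"},
    },
}


def is_valid(instance: Any, schema: Dict[str, Any]) -> bool:
    """Toy check for the few keywords above; not a JSON Schema implementation."""
    if "$ref" in schema and not is_valid(instance, REGISTRY[schema["$ref"]]):
        return False
    expected = schema.get("type")
    if expected == "string" and not isinstance(instance, str):
        return False
    if expected == "object" and not isinstance(instance, dict):
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], subschema):
                return False
    return True
```

With the freeform string, `{"type": "card", "currency": "dollars"}` passes. Once `currency` references the controlled vocabulary, the same payload is rejected while `"USD"` is accepted.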
| 157 | + |
| 158 | +The payments team replaces their local definition with a |
| 159 | +[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) to the registry. |
| 160 | +The next team finds it through the registry and skips defining their own. A |
| 161 | +third team runs [`jsonschema |
| 162 | +install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown) |
| 163 | +to pull it locally. The local schema served its purpose and is gone. |
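
The switch from the local definition to the canonical one is mechanical enough to script. Here is a minimal sketch, assuming the OpenAPI spec has already been parsed into Python dictionaries; the local file path and registry URL below are hypothetical. It returns a copy of the document with every matching `$ref` rewritten and leaves the original untouched.

```python
import copy
from typing import Any


def swap_ref(document: Any, local_ref: str, canonical_ref: str) -> Any:
    """Return a deep copy of document with every $ref equal to local_ref replaced."""
    result = copy.deepcopy(document)
    stack = [result]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if node.get("$ref") == local_ref:
                node["$ref"] = canonical_ref
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return result


# Hypothetical before/after references for the payments team's spec.
LOCAL = "./schemas/payment-method.json"
CANONICAL = "https://schemas.example.com/payments/payment-method.json"

spec = {
    "paths": {
        "/charges": {
            "post": {
                "requestBody": {
                    "content": {
                        "application/json": {"schema": {"$ref": LOCAL}}
                    }
                }
            }
        }
    }
}

migrated = swap_ref(spec, LOCAL, CANONICAL)
```

Once nothing references the local file any more, it can be deleted from the team's repository.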
| 164 | + |
| 165 | +!!! NOTE |
| 166 | + |
| 167 | + In some organisations the developer then proposes it upstream directly. In |
| 168 | + larger organisations, an API platform team continuously monitors what |
| 169 | + schemas are being defined locally across teams, identifies patterns worth |
| 170 | + standardising, and drives centralisation over time, replacing local |
| 171 | + definitions with canonical ones as adoption spreads. Either way, the schema |
| 172 | + moves upstream without blocking anyone. |
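
A platform team monitoring local schemas could start by looking for exact structural matches. The sketch below fingerprints each schema by hashing its key-sorted JSON after dropping a few top-level identity and annotation keywords. It then groups locations that share a fingerprint. The ignored keyword set and the location names are assumptions. Nested annotations still count, and only exact duplicates are found. Near-duplicates, such as one copy adding a `maxLength`, would need a fuzzier comparison.

```python
import hashlib
import json
from collections import defaultdict
from typing import Any, Dict, List

# Top-level keywords treated as identity or annotation for this sketch, so two
# teams' copies still match if only their $id or title differ.
IGNORED_TOP_LEVEL = {"$id", "$schema", "$comment", "title", "description", "examples"}


def fingerprint(schema: Dict[str, Any]) -> str:
    """SHA-256 of the schema's key-sorted JSON, minus ignored top-level keywords."""
    core = {key: value for key, value in schema.items() if key not in IGNORED_TOP_LEVEL}
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(schemas: Dict[str, Dict[str, Any]]) -> List[List[str]]:
    """Group locations whose schemas share a fingerprint; singletons are dropped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for location, schema in schemas.items():
        groups[fingerprint(schema)].append(location)
    return sorted(sorted(locations) for locations in groups.values() if len(locations) > 1)
```

Each group returned is a candidate for a single canonical schema in the central repository.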
| 173 | + |
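| 174 | +The fragmentation cycle is broken: teams still write local schemas, but each one is a temporary step on a reviewed path to the canonical layer, not a permanent duplicate.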