
Commit 09f5de3

Kickstart a larger documentation guide (#814)
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
1 parent f475869 commit 09f5de3

7 files changed

Lines changed: 353 additions & 1 deletion


docs/getting-started.md

Lines changed: 2 additions & 1 deletion
@@ -130,7 +130,8 @@ Congratulations! You've just built your first Sourcemeta One instance in under
 two minutes (told you so!). Whilst our single-schema service might seem modest,
 you've got the perfect foundation to experiment and expand.

-Ready to take things further? Take a look at our
+Ready to take things further? Take a look at our more comprehensive
+getting-started [guide](guide/index.md). Also explore our
 [integrations](integrations.md) which cover the ways in which you can pull and
 use the schemas in a growing number of programming languages and applications.

docs/guide/approach.md

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# The Schema-First Mental Model

On the previous page we described the problem: a schema layer that exists in
every organisation but is governed by nobody, invisible below the OpenAPI
surface, duplicated across teams, and never treated as shared infrastructure.
The solution is not a new process or a new tool. It is a change in how you
think about schemas.

Schemas are code. They define the structure and meaning of your data, determine
what is valid and what is not, and represent the contracts that every API in
your organisation either honours or violates. They deserve the same discipline
you apply to any other critical piece of code: version control, review, and a
single authoritative source.

## The architecture in a nutshell

- A single Git repository holds your organisation's canonical schemas
- A registry sits on top of that repository and exposes its contents over HTTP:
  searchable, browsable, and queryable by any tooling or pipeline that needs it
- Changes go through Git, using the same pull request workflow your
  organisation already uses for everything else
- Consumers reference schemas either directly via
  [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) pointing at the
  registry, or by running [`jsonschema
  install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
  to pull schemas into their own repository and reference them locally
- API teams keep their OpenAPI specs wherever makes sense for them, pointing
  their schema references at the central source

That is the entire model. Everything that follows builds on those steps.
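For example, an API team's OpenAPI components section can delegate a
definition to the registry via `$ref`. A minimal sketch, in which the registry
hostname and schema path are hypothetical placeholders for your own deployment:

```yaml
# Hypothetical OpenAPI fragment: the Customer schema lives in the
# central registry rather than inline in this spec. The hostname
# and path below are placeholders, not real endpoints.
components:
  schemas:
    Customer:
      $ref: "https://schemas.example.com/common/customer.json"
```

The spec itself stays wherever the API team keeps it; only the schema
reference points at the central source.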

## This model is already proven in practice

Maintaining a Git repository of schemas and serving it over HTTP is not a novel
proposal. Most organisations doing it are closed-source, but several public
examples are instructive.

[SchemaStore](https://github.com/SchemaStore/schemastore) is the most visible
instance: a Git repository of JSON Schemas for popular tools, maintained
through pull requests and served over HTTP. When your editor autocompletes a
GitHub Actions workflow or a Kubernetes manifest, it is reading from a schema
reviewed and merged through a PR there.

[KrakenD](https://github.com/krakend/krakend-schema) maintains its
configuration schemas the same way. So does [NASA's General Coordinates
Network](https://github.com/nasa-gcn/gcn-schema/tree/main/gcn) for astronomical
alert schemas, and the [Human Cell
Atlas](https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema)
for biological metadata.

Ikenna Nwaiwu's [Automating API Delivery: APIOps with
OpenAPI](https://www.amazon.com/Automating-API-Delivery-APIOps-OpenAPI/dp/1633438783)
(Manning, 2024) formalises the same approach for API definitions. This guide
extends that thinking one layer deeper, to the schema layer those definitions
depend on.
## Why Git? Governance by construction

*This approach is a direct application of [GitOps](https://opengitops.dev/):
using Git as the single source of truth for system state, where every change
goes through a version-controlled, reviewable workflow.*

By contrast, a stateful registry like Apicurio or Confluent accepts writes
directly to itself over its API. This means that schemas can enter your
canonical layer outside any review workflow. Some solutions bolt approval flows
on top, but in doing so they are rebuilding what Git already provides natively:
audit history, access controls, branch-based proposals, CI integration. They
are rebuilding those capabilities with less maturity, less flexibility, and
less familiarity than infrastructure your engineers have used for years.

!!! note

    It is worth noting that stateful registries are the right tool for certain
    problems. In event streaming architectures, sharing a schema out-of-band
    for deserialisation is a coordination problem, and a stateful push model
    fits. API governance is a different problem.

With a Git-native approach, the registry has no write path. The only way to
change what it serves is to change what is in Git, which means going through a
pull request: proposable, reviewable, reversible, and audited by default.
Governance is not a policy people follow. It is a structural property of the
system.

Git's extensibility compounds this advantage. You can attach anything to the
workflow: AI reviewers, webhooks to notify downstream systems, automated
duplicate detection, Slack notifications. The entire CI ecosystem the industry
has built around Git works here without additional integration. No schema
registry plugin system required.

## Why a registry on top of Git, and not just Git

A raw file tree is not enough for everyone who needs to interact with the
schema layer. A product manager should not need a terminal and a Git client to
understand what a `Transaction` schema contains. A governance team needs a
health view across hundreds of definitions, not file diffs. A pipeline
resolving a [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) will
prefer a stable HTTP endpoint, not a repository clone. A compliance stakeholder
needs rendered documentation, not raw JSON.

A read-only registry, like Sourcemeta One, turns the source of truth into
something the whole organisation can use, without changing what the source of
truth is. The registry is the interface. Git is the authority.

Because the registry holds no state, its operational profile is fundamentally
simpler: no database, no sync process, no stateful service. A stateless
registry is a single container, trivial to deploy and scale horizontally.
Reliability increases precisely because the most common source of failure in
distributed systems, stateful persistence, has been removed.

## `$ref` versus `jsonschema install`

Referencing schemas via
[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) is convenient for
tooling, OpenAPI authoring, and local development. The tradeoff is a runtime
dependency on the registry being available, which is fine for most internal
workflows but a risk for production systems or airgapped environments.

The [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
command fetches schemas with integrity verification and writes them to disk,
where they can be committed and used with no network dependency. The pattern is
identical to npm: you depend on the local copy, not on the registry being live.
Many teams use [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/)
during development and [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
for production builds and CI.
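To make the tradeoff concrete, a schema vendored into the repository by
`jsonschema install` is referenced by a relative path rather than a registry
URL. A sketch in which the directory layout and file names are purely
illustrative:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "customer": {
      "$comment": "Illustrative path: a vendored copy committed to this repo, so builds need no network access",
      "$ref": "./vendor/schemas/customer.json"
    }
  }
}
```

Switching between the two modes is then a matter of swapping the `$ref`
target, not restructuring the schema.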

## One source of truth, but never a blocker

A source of truth requires one place, because two authoritative repositories
mean teams must choose, and that choice is where inconsistency begins. But the
path there runs through local iteration, not around it.

A common fear about centralised governance is that it becomes a bottleneck.
That is a misunderstanding of what this model is trying to achieve.

The goal is not to gate development behind a registry. It is to centralise and
make official what teams are already building, in parallel, without ever
blocking them. Local schemas are a staging area. If the registry does not have
what you need, define it locally, ship, and keep building. Upstream proposals
happen as a separate concern.

## What the flow looks like

A developer on the payments team needs a `PaymentMethod` definition. The
registry does not have one. They define it locally, reference it from their
OpenAPI spec, and ship. Development is not blocked.

When the PR lands, the platform team notices the `currency` field uses a
freeform string where [ISO
4217](https://www.iso.org/iso-4217-currency-codes.html) already defines a
controlled vocabulary, and suggests referencing that standard instead. The
schema becomes stronger through review. The PR merges, triggering a CI action
that redeploys Sourcemeta One. On startup the registry reads from the updated
repository, and `PaymentMethod` is now available canonically.
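The reviewed definition might end up looking something like this. A
hypothetical sketch, not a schema from any real registry, with the `currency`
field now constrained to ISO 4217 alpha-3 codes rather than a freeform string:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Hypothetical PaymentMethod sketch; property names and enum values are invented for illustration",
  "type": "object",
  "required": ["type", "currency"],
  "properties": {
    "type": { "enum": ["card", "bank_transfer", "wallet"] },
    "currency": { "type": "string", "pattern": "^[A-Z]{3}$" }
  }
}
```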

The payments team replaces their local definition with a
[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) to the registry.
The next team finds it through the registry and skips defining their own. A
third team runs [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
to pull it locally. The local schema served its purpose and is gone.

!!! note

    In some organisations the developer then proposes it upstream directly. In
    larger organisations, an API platform team continuously monitors what
    schemas are being defined locally across teams, identifies patterns worth
    standardising, and drives centralisation over time, replacing local
    definitions with canonical ones as adoption spreads. Either way, the schema
    moves upstream without blocking anyone.

The fragmentation cycle is stopped before it can take hold.

docs/guide/development.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Working Locally

Coming soon!

docs/guide/evolution.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Tackling Schema Evolution

Coming soon!

docs/guide/index.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# The Problem With API Governance Today

APIs are the connective tissue of modern software. Most organisations know
this. Most have also, at some point, decided to "govern their APIs."

And yet the same problems keep coming back.

## The symptoms are familiar

Pick any reasonably large engineering organisation and you tend to find the
same problems, regardless of how much has been invested in API tooling.

**Duplication.** The payments team has a `Customer` object. So does the
notifications service, the billing system, and the analytics pipeline. Each
definition made sense to the team that wrote it. None of them were being
careless. There was no shared definition to reach for, so they wrote their own.
Now you have five definitions of `Customer`, subtly inconsistent. An `address`
that is a string in one API and a nested object in another. `dateOfBirth` here,
`dob` there. `email` required in three specs, optional in two, and only
validated in one.
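Two such `Customer` fragments, side by side, show how quietly the divergence
creeps in. Both are invented for illustration:

```json
{
  "$comment": "Illustrative only: the same business concept, modelled twice by different teams",
  "payments-Customer": {
    "type": "object",
    "required": ["email"],
    "properties": {
      "address": { "type": "string" },
      "dateOfBirth": { "type": "string", "format": "date" }
    }
  },
  "billing-Customer": {
    "type": "object",
    "properties": {
      "address": {
        "type": "object",
        "properties": {
          "street": { "type": "string" },
          "city": { "type": "string" }
        }
      },
      "dob": { "type": "string" }
    }
  }
}
```

Neither definition is wrong in isolation. Every integration between the two
systems now needs a translation layer.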

**Discoverability.** A sixth team arrives. They search for existing
definitions, find nothing organised or trustworthy, and write their own. The
problem does not just persist, it compounds. According to [Postman's 2025 State
of the API Report](https://www.postman.com/state-of-api/2025/), **34% of
developers cannot find the APIs they need within their own organisation**, and
**55% struggle with inconsistent documentation**. Teams are rebuilding work
that already exists because there is no reliable way to discover it, let alone
reuse it.

**Integration pain.** No two APIs in the organisation agree on how to represent
the same concept. Some of this is unavoidable, as your business domain has
concepts unique to you, and without a shared internal definition every team
invents their own version independently. But a significant part is entirely
avoidable. Standards for many common concepts already exist: [ISO
8601](https://www.iso.org/iso-8601-date-and-time-format.html) for dates, [ISO
3166](https://www.iso.org/iso-3166-country-codes.html) for country codes, [ISO
4217](https://www.iso.org/iso-4217-currency-codes.html) for currencies, [ISO
20022](https://www.iso20022.org/) for financial data models used by SWIFT and
SEPA, and many more. For those concepts, the thinking has already been done,
and your teams are spending effort reinventing it, most probably to a lower
standard. The consequence either way is the same: every integration boundary in
your system, internal or external, requires a translation layer that would not
exist if everyone had agreed on the same definitions to begin with.
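In JSON Schema terms, leaning on those standards can start as small as a file
of shared primitives that every team references instead of redefining. A
sketch in which the names and constraints are illustrative, not canonical:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Sketch of shared primitives; definition names and constraints are invented for illustration",
  "$defs": {
    "CurrencyCode": {
      "type": "string",
      "description": "ISO 4217 alpha-3 currency code",
      "pattern": "^[A-Z]{3}$"
    },
    "CountryCode": {
      "type": "string",
      "description": "ISO 3166-1 alpha-2 country code",
      "pattern": "^[A-Z]{2}$"
    },
    "CalendarDate": {
      "type": "string",
      "description": "ISO 8601 calendar date",
      "format": "date"
    }
  }
}
```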

## The data confirms it

This pattern is not a niche complaint. It has become the norm.

[SmartBear's 2023 State of Software Quality: API
report](https://smartbear.com/state-of-software-quality/api/), surveying over
1,100 API practitioners across 17 industries, found that **API standardisation
is the top challenge cited by 51% of organisations**, ahead of security,
tooling, and performance. It has held the top spot in every edition of the
survey since 2016.

[Axway's 2024 State of Enterprise API Maturity
report](https://resources.axway.com/build-api-marketplace/rpt-state-of-enterprise-api-maturity-in-2024-en),
based on 600 senior IT and business decision-makers across nine countries,
found that **78% of enterprise leaders do not know how many APIs their
organisation has**, and **74% acknowledge that more than 20% of their APIs are
entirely unmanaged**. If organisations cannot account for their APIs, which are
at least visible as endpoints, as documentation pages, as things that break in
production, consider how much less visible the schema layer inside those APIs
must be. APIs surface in logs, in incidents, in partner complaints. Schemas are
embedded silently inside specs, never inventoried, never compared across teams.
The governance gap at the API level is actually the optimistic number.

These are not symptoms of bad engineers or of underinvestment in tooling. Most
of these organisations already have tooling. They are symptoms of a missing
layer.

## The industry standardised on OpenAPI and stopped one layer too high

These are not random failures. They have a common structural cause.

OpenAPI became the industry standard for describing APIs. That was genuinely
good. Teams started writing specs, tooling matured, design-first workflows
became real. An entire ecosystem of editors, linters, gateways, and
documentation portals emerged around the format.

But almost all of that tooling shares the same blind spot: it treats each
OpenAPI spec as the unit of work. It helps you design a better spec in
isolation, with consistent naming within it, well-structured responses, and
clear documentation. What it does not do is help you share anything *across*
specs. The OpenAPI spec is always the starting point, and it is never
decomposed further.

This matters because a non-trivial OpenAPI spec is, in terms of raw content,
mostly schemas. The endpoints, the HTTP verbs, the status codes: that is
structural boilerplate. The substance is the definitions. What a `Customer`
looks like. What an `Invoice` contains. What an `Address` requires. In any
real-world API, well over 80% of what matters lives in the schema layer.

And nobody talked about sharing that layer.

Teams write OpenAPI specs in isolation. Each spec defines its own schemas
inline, because that is what every tutorial shows and because there was nowhere
central to put them even if a team wanted to. The result: every OpenAPI spec is
an island. Individually excellent. Collectively incoherent.

## But what about API spec first?

Designing the contract before writing code is genuinely better than the
alternative. But the API-spec-first approach still takes the OpenAPI
specification as its atom, which means the schemas inside it remain ungoverned,
uninventoried, and unshared across teams. You can be entirely rigorous about
your OpenAPI spec and still end up with five definitions of `Customer`. The
discipline is real. The level of abstraction is wrong.

What makes this particularly hard to notice is that the tooling actively
reinforces the false confidence. A team runs their OpenAPI spec through a
linter. It passes. Naming conventions: consistent. Response codes: correct.
Required fields: present. But the linter checked the structure of the spec, not
the quality of the schemas inside it. It did not ask whether `Customer` was
well-modelled, whether it duplicated something three other teams had already
defined, or whether it diverged from an industry standard. The spec looks
clean. The schema layer is still a mess. Most OpenAPI linters treat schema
content as largely opaque, because the schema layer is not what they were
designed to govern. You can pass every lint rule and still have a fundamental
governance problem.
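To illustrate, an OpenAPI fragment like the following would sail past most
structural linters, even though the inline schema underneath is barely a
contract at all. Everything here is invented for illustration:

```yaml
# Illustrative: structurally valid OpenAPI that a typical linter
# accepts, while the schema layer underneath says almost nothing.
paths:
  /customers/{id}:
    get:
      operationId: getCustomer
      responses:
        "200":
          description: The requested customer
          content:
            application/json:
              schema:
                # No required fields, no constraints, no reuse:
                # lint-clean, yet ungoverned.
                type: object
                additionalProperties: true
```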

## AI makes this more urgent, not less

[Gartner predicts that by 2026, more than 30% of the increase in API demand
will come from AI tools and large language
models](https://www.gartner.com/en/newsroom/press-releases/2024-03-20-gartner-predicts-more-than-30-percent-of-the-increase-in-demand-for-apis-will-come-from-ai-and-tools-using-llms-by-2026).
Separately, [40% of enterprise applications will be integrated with
task-specific AI agents by the end of
2026](https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025),
up from less than 5% today. AI agents consume interface definitions literally
and at scale. The quality of those definitions, and the documentation generated
out of them, has never mattered more.

But it is worth being precise about what "interface definition" means here.
Most engineers currently think of the OpenAPI spec as the interface. That is
understandable: it is what gets shared with consumers, what gets published to
documentation portals, what gets handed to a new developer on day one. *But
OpenAPI is a wrapper format*. It makes schemas usable in the context of APIs,
describing which schemas appear at which endpoints and under which HTTP
methods. *The actual interfaces, the data contracts themselves, are the schemas
underneath.*

An AI agent consuming a well-defined, rich shared schema behaves consistently
and predictably. An AI agent consuming five slightly different inline schemas
with loose descriptions and no metadata produces five slightly different
interpretations, all potentially incorrect.

## The solution is to treat schemas as their own layer

The problem is not that teams write bad APIs. It is that the schema layer, the
layer that defines what data *means* across your organisation, has never been
treated as infrastructure in its own right.

Every other layer has been addressed. You have source control for code. You
have registries for container images. You have package managers for
dependencies. *The schema layer, the shared vocabulary of your entire API
landscape, has been left as an afterthought embedded inside individual specs:
invisible and ungoverned.*

That is what this guide addresses. Not another linter for your OpenAPI specs. A
foundation for properly introducing and governing the layer beneath them.

docs/guide/setup.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Setting Up a Schema Registry

Coming soon!

mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -24,6 +24,12 @@ plugins:
 nav:
   - index.md
   - getting-started.md
+  - Guide:
+      - guide/index.md
+      - guide/approach.md
+      - guide/setup.md
+      - guide/development.md
+      - guide/evolution.md
   - configuration.md
   - integrations.md
   - api.md
- api.md
