
Commit 09f5de3

Kickstart a larger documentation guide (#814)
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
1 parent f475869 commit 09f5de3

7 files changed

Lines changed: 353 additions & 1 deletion


docs/getting-started.md

Lines changed: 2 additions & 1 deletion
@@ -130,7 +130,8 @@ Congratulations! You've just built your first Sourcemeta One instance in under
 two minutes (told you so!). Whilst our single-schema service might seem modest,
 you've got the perfect foundation to experiment and expand.

-Ready to take things further? Take a look at our
+Ready to take things further? Take a look at our more comprehensive
+getting-started [guide](guide/index.md). Also explore our
 [integrations](integrations.md) which cover the ways in which you can pull and
 use the schemas in a growing number of programming languages and applications.

docs/guide/approach.md

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# The Schema-First Mental Model

On the previous page we described the problem: a schema layer that exists in
every organisation but is governed by nobody, invisible below the OpenAPI
surface, duplicated across teams, and never treated as shared infrastructure.
The solution is not a new process or a new tool. It is a change in how you
think about schemas.

Schemas are code. They define the structure and meaning of your data, determine
what is valid and what is not, and represent the contracts that every API in
your organisation either honours or violates. They deserve the same discipline
you apply to any other critical piece of code: version control, review, and a
single authoritative source.

## The architecture in a nutshell

- A single Git repository holds your organisation's canonical schemas
- A registry sits on top of that repository and exposes its contents over HTTP:
  searchable, browsable, and queryable by any tooling or pipeline that needs it
- Changes go through Git, using the same pull request workflow your
  organisation already uses for everything else
- Consumers reference schemas either directly via
  [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) pointing at the
  registry, or by running [`jsonschema
  install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
  to pull schemas into their own repository and reference them locally
- API teams keep their OpenAPI specs wherever makes sense for them, pointing
  their schema references at the central source

That is the entire model. Everything that follows builds on those steps.
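For example, an API team's OpenAPI components section can delegate a
definition to the registry via `$ref`. A minimal sketch, in which the registry
hostname and schema path are hypothetical placeholders for your own deployment:

```yaml
# Hypothetical OpenAPI fragment: the Customer schema lives in the
# central registry rather than inline in this spec. The hostname
# and path below are placeholders, not real endpoints.
components:
  schemas:
    Customer:
      $ref: "https://schemas.example.com/common/customer.json"
```

The spec itself stays wherever the API team keeps it; only the schema
reference points at the central source.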

## This model is already proven in practice

Maintaining a Git repository of schemas and serving it over HTTP is not a novel
proposal. Most organisations doing it are closed-source, but several public
examples are instructive.

[SchemaStore](https://github.com/SchemaStore/schemastore) is the most visible
instance: a Git repository of JSON Schemas for popular tools, maintained
through pull requests and served over HTTP. When your editor autocompletes a
GitHub Actions workflow or a Kubernetes manifest, it is reading from a schema
reviewed and merged through a PR there.

[KrakenD](https://github.com/krakend/krakend-schema) maintains its
configuration schemas the same way. So does [NASA's General Coordinates
Network](https://github.com/nasa-gcn/gcn-schema/tree/main/gcn) for astronomical
alert schemas, and the [Human Cell
Atlas](https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema)
for biological metadata.

Ikenna Nwaiwu's [Automating API Delivery: APIOps with
OpenAPI](https://www.amazon.com/Automating-API-Delivery-APIOps-OpenAPI/dp/1633438783)
(Manning, 2024) formalises the same approach for API definitions. This guide
extends that thinking one layer deeper, to the schema layer those definitions
depend on.
## Why Git? Governance by construction

*This approach is a direct application of [GitOps](https://opengitops.dev/):
using Git as the single source of truth for system state, where every change
goes through a version-controlled, reviewable workflow.*

By contrast, a stateful registry like Apicurio or Confluent accepts writes
directly to itself over its API. This means that schemas can enter your
canonical layer outside any review workflow. Some solutions bolt approval flows
on top, but in doing so they are rebuilding what Git already provides natively:
audit history, access controls, branch-based proposals, CI integration. They
are rebuilding those capabilities with less maturity, less flexibility, and
less familiarity than infrastructure your engineers have used for years.

!!! note

    It is worth noting that stateful registries are the right tool for certain
    problems. In event streaming architectures, sharing a schema out-of-band
    for deserialisation is a coordination problem, and a stateful push model
    fits. API governance is a different problem.

With a Git-native approach, the registry has no write path. The only way to
change what it serves is to change what is in Git, which means going through a
pull request: proposable, reviewable, reversible, and audited by default.
Governance is not a policy people follow. It is a structural property of the
system.

Git's extensibility compounds this advantage. You can attach anything to the
workflow: AI reviewers, webhooks to notify downstream systems, automated
duplicate detection, Slack notifications. The entire CI ecosystem the industry
has built around Git works here without additional integration. No schema
registry plugin system required.

## Why a registry on top of Git, and not just Git

A raw file tree is not enough for everyone who needs to interact with the
schema layer. A product manager should not need a terminal and a Git client to
understand what a `Transaction` schema contains. A governance team needs a
health view across hundreds of definitions, not file diffs. A pipeline
resolving a [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) will
prefer a stable HTTP endpoint, not a repository clone. A compliance stakeholder
needs rendered documentation, not raw JSON.

A read-only registry, like Sourcemeta One, turns the source of truth into
something the whole organisation can use, without changing what the source of
truth is. The registry is the interface. Git is the authority.

Because the registry holds no state, its operational profile is fundamentally
simpler: no database, no sync process, no stateful service. A stateless
registry is a single container, trivial to deploy and scale horizontally.
Reliability increases precisely because the most common source of failure in
distributed systems, stateful persistence, has been removed.

## `$ref` versus `jsonschema install`

Referencing schemas via
[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) is convenient for
tooling, OpenAPI authoring, and local development. The tradeoff is a runtime
dependency on the registry being available, which is fine for most internal
workflows but a risk for production systems or airgapped environments.

The [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
command fetches schemas with integrity verification and writes them to disk,
where they can be committed and used with no network dependency. The pattern is
identical to npm: you depend on the local copy, not on the registry being live.
Many teams use [`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/)
during development and [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
for production builds and CI.
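To make the tradeoff concrete, a schema vendored into the repository by
`jsonschema install` is referenced by a relative path rather than a registry
URL. A sketch in which the directory layout and file names are purely
illustrative:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "customer": {
      "$comment": "Illustrative path: a vendored copy committed to this repo, so builds need no network access",
      "$ref": "./vendor/schemas/customer.json"
    }
  }
}
```

Switching between the two modes is then a matter of swapping the `$ref`
target, not restructuring the schema.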

## One source of truth, but never a blocker

A source of truth requires one place, because two authoritative repositories
mean teams must choose, and that choice is where inconsistency begins. But the
path there runs through local iteration, not around it.

A common fear about centralised governance is that it becomes a bottleneck.
That is a misunderstanding of what this model is trying to achieve.

The goal is not to gate development behind a registry. It is to centralise and
make official what teams are already building, in parallel, without ever
blocking them. Local schemas are a staging area. If the registry does not have
what you need, define it locally, ship, and keep building. Upstream proposals
happen as a separate concern.

## What the flow looks like

A developer on the payments team needs a `PaymentMethod` definition. The
registry does not have one. They define it locally, reference it from their
OpenAPI spec, and ship. Development is not blocked.

When the PR lands, the platform team notices the `currency` field uses a
freeform string where [ISO
4217](https://www.iso.org/iso-4217-currency-codes.html) already defines a
controlled vocabulary, and suggests referencing that standard instead. The
schema becomes stronger through review. The PR merges, triggering a CI action
that redeploys Sourcemeta One. On startup the registry reads from the updated
repository, and `PaymentMethod` is now available canonically.
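The reviewed definition might end up looking something like this. A
hypothetical sketch, not a schema from any real registry, with the `currency`
field now constrained to ISO 4217 alpha-3 codes rather than a freeform string:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Hypothetical PaymentMethod sketch; property names and enum values are invented for illustration",
  "type": "object",
  "required": ["type", "currency"],
  "properties": {
    "type": { "enum": ["card", "bank_transfer", "wallet"] },
    "currency": { "type": "string", "pattern": "^[A-Z]{3}$" }
  }
}
```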

The payments team replaces their local definition with a
[`$ref`](https://www.learnjsonschema.com/2020-12/core/ref/) to the registry.
The next team finds it through the registry and skips defining their own. A
third team runs [`jsonschema
install`](https://github.com/sourcemeta/jsonschema/blob/main/docs/install.markdown)
to pull it locally. The local schema served its purpose and is gone.

!!! note

    In some organisations the developer then proposes it upstream directly. In
    larger organisations, an API platform team continuously monitors what
    schemas are being defined locally across teams, identifies patterns worth
    standardising, and drives centralisation over time, replacing local
    definitions with canonical ones as adoption spreads. Either way, the schema
    moves upstream without blocking anyone.

The fragmentation cycle is stopped before it can take hold.

docs/guide/development.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Working Locally

Coming soon!

docs/guide/evolution.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Tackling Schema Evolution

Coming soon!

docs/guide/index.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# The Problem With API Governance Today

APIs are the connective tissue of modern software. Most organisations know
this. Most have also, at some point, decided to "govern their APIs."

And yet the same problems keep coming back.

## The symptoms are familiar

Pick any reasonably large engineering organisation and you tend to find the
same problems, regardless of how much has been invested in API tooling.

**Duplication.** The payments team has a `Customer` object. So does the
notifications service, the billing system, and the analytics pipeline. Each
definition made sense to the team that wrote it. None of them were being
careless. There was no shared definition to reach for, so they wrote their own.
Now you have five definitions of `Customer`, subtly inconsistent. An `address`
that is a string in one API and a nested object in another. `dateOfBirth` here,
`dob` there. `email` required in three specs, optional in two, and only
validated in one.
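Two such `Customer` fragments, side by side, show how quietly the divergence
creeps in. Both are invented for illustration:

```json
{
  "$comment": "Illustrative only: the same business concept, modelled twice by different teams",
  "payments-Customer": {
    "type": "object",
    "required": ["email"],
    "properties": {
      "address": { "type": "string" },
      "dateOfBirth": { "type": "string", "format": "date" }
    }
  },
  "billing-Customer": {
    "type": "object",
    "properties": {
      "address": {
        "type": "object",
        "properties": {
          "street": { "type": "string" },
          "city": { "type": "string" }
        }
      },
      "dob": { "type": "string" }
    }
  }
}
```

Neither definition is wrong in isolation. Every integration between the two
systems now needs a translation layer.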

**Discoverability.** A sixth team arrives. They search for existing
definitions, find nothing organised or trustworthy, and write their own. The
problem does not just persist, it compounds. According to [Postman's 2025 State
of the API Report](https://www.postman.com/state-of-api/2025/), **34% of
developers cannot find the APIs they need within their own organisation**, and
**55% struggle with inconsistent documentation**. Teams are rebuilding work
that already exists because there is no reliable way to discover it, let alone
reuse it.

**Integration pain.** No two APIs in the organisation agree on how to represent
the same concept. Some of this is unavoidable, as your business domain has
concepts unique to you, and without a shared internal definition every team
invents their own version independently. But a significant part is entirely
avoidable. Standards for many common concepts already exist: [ISO
8601](https://www.iso.org/iso-8601-date-and-time-format.html) for dates, [ISO
3166](https://www.iso.org/iso-3166-country-codes.html) for country codes, [ISO
4217](https://www.iso.org/iso-4217-currency-codes.html) for currencies, [ISO
20022](https://www.iso20022.org/) for financial data models used by SWIFT and
SEPA, and many more. For those concepts, the thinking has already been done,
and your teams are spending effort reinventing it, most probably to a lower
standard. The consequence either way is the same: every integration boundary in
your system, internal or external, requires a translation layer that would not
exist if everyone had agreed on the same definitions to begin with.
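In JSON Schema terms, leaning on those standards can start as small as a file
of shared primitives that every team references instead of redefining. A
sketch in which the names and constraints are illustrative, not canonical:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$comment": "Sketch of shared primitives; definition names and constraints are invented for illustration",
  "$defs": {
    "CurrencyCode": {
      "type": "string",
      "description": "ISO 4217 alpha-3 currency code",
      "pattern": "^[A-Z]{3}$"
    },
    "CountryCode": {
      "type": "string",
      "description": "ISO 3166-1 alpha-2 country code",
      "pattern": "^[A-Z]{2}$"
    },
    "CalendarDate": {
      "type": "string",
      "description": "ISO 8601 calendar date",
      "format": "date"
    }
  }
}
```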

## The data confirms it

This pattern is not a niche complaint. It has become the norm.

[SmartBear's 2023 State of Software Quality: API
report](https://smartbear.com/state-of-software-quality/api/), surveying over
1,100 API practitioners across 17 industries, found that **API standardisation
is the top challenge cited by 51% of organisations**, ahead of security,
tooling, and performance. It has held the top spot in every edition of the
survey since 2016.

[Axway's 2024 State of Enterprise API Maturity
report](https://resources.axway.com/build-api-marketplace/rpt-state-of-enterprise-api-maturity-in-2024-en),
based on 600 senior IT and business decision-makers across nine countries,
found that **78% of enterprise leaders do not know how many APIs their
organisation has**, and **74% acknowledge that more than 20% of their APIs are
entirely unmanaged**. If organisations cannot account for their APIs, which are
at least visible as endpoints, as documentation pages, as things that break in
production, consider how much less visible the schema layer inside those APIs
must be. APIs surface in logs, in incidents, in partner complaints. Schemas are
embedded silently inside specs, never inventoried, never compared across teams.
The governance gap at the API level is actually the optimistic number.

These are not symptoms of bad engineers or of underinvestment in tooling. Most
of these organisations already have tooling. They are symptoms of a missing
layer.

## The industry standardised on OpenAPI and stopped one layer too high

These are not random failures. They have a common structural cause.

OpenAPI became the industry standard for describing APIs. That was genuinely
good. Teams started writing specs, tooling matured, design-first workflows
became real. An entire ecosystem of editors, linters, gateways, and
documentation portals emerged around the format.

But almost all of that tooling shares the same blind spot: it treats each
OpenAPI spec as the unit of work. It helps you design a better spec in
isolation, with consistent naming within it, well-structured responses, and
clear documentation. What it does not do is help you share anything *across*
specs. The OpenAPI spec is always the starting point, and it is never
decomposed further.

This matters because a non-trivial OpenAPI spec is, in terms of raw content,
mostly schemas. The endpoints, the HTTP verbs, the status codes: that is
structural boilerplate. The substance is the definitions. What a `Customer`
looks like. What an `Invoice` contains. What an `Address` requires. In any
real-world API, well over 80% of what matters lives in the schema layer.

And nobody talked about sharing that layer.

Teams write OpenAPI specs in isolation. Each spec defines its own schemas
inline, because that is what every tutorial shows and because there was nowhere
central to put them even if a team wanted to. The result: every OpenAPI spec is
an island. Individually excellent. Collectively incoherent.

## But what about API spec first?

Designing the contract before writing code is genuinely better than the
alternative. But the API-spec-first approach still takes the OpenAPI
specification as its atom, which means the schemas inside it remain ungoverned,
uninventoried, and unshared across teams. You can be entirely rigorous about
your OpenAPI spec and still end up with five definitions of `Customer`. The
discipline is real. The level of abstraction is wrong.

What makes this particularly hard to notice is that the tooling actively
reinforces the false confidence. A team runs their OpenAPI spec through a
linter. It passes. Naming conventions: consistent. Response codes: correct.
Required fields: present. But the linter checked the structure of the spec, not
the quality of the schemas inside it. It did not ask whether `Customer` was
well-modelled, whether it duplicated something three other teams had already
defined, or whether it diverged from an industry standard. The spec looks
clean. The schema layer is still a mess. Most OpenAPI linters treat schema
content as largely opaque, because the schema layer is not what they were
designed to govern. You can pass every lint rule and still have a fundamental
governance problem.
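To illustrate, an OpenAPI fragment like the following would sail past most
structural linters, even though the inline schema underneath is barely a
contract at all. Everything here is invented for illustration:

```yaml
# Illustrative: structurally valid OpenAPI that a typical linter
# accepts, while the schema layer underneath says almost nothing.
paths:
  /customers/{id}:
    get:
      operationId: getCustomer
      responses:
        "200":
          description: The requested customer
          content:
            application/json:
              schema:
                # No required fields, no constraints, no reuse:
                # lint-clean, yet ungoverned.
                type: object
                additionalProperties: true
```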

## AI makes this more urgent, not less

[Gartner predicts that by 2026, more than 30% of the increase in API demand
will come from AI tools and large language
models](https://www.gartner.com/en/newsroom/press-releases/2024-03-20-gartner-predicts-more-than-30-percent-of-the-increase-in-demand-for-apis-will-come-from-ai-and-tools-using-llms-by-2026).
Separately, [40% of enterprise applications will be integrated with
task-specific AI agents by the end of
2026](https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025),
up from less than 5% today. AI agents consume interface definitions literally
and at scale. The quality of those definitions, and the documentation generated
out of them, has never mattered more.

But it is worth being precise about what "interface definition" means here.
Most engineers currently think of the OpenAPI spec as the interface. That is
understandable: it is what gets shared with consumers, what gets published to
documentation portals, what gets handed to a new developer on day one. *But
OpenAPI is a wrapper format*. It makes schemas usable in the context of APIs,
describing which schemas appear at which endpoints and under which HTTP
methods. *The actual interfaces, the data contracts themselves, are the schemas
underneath.*

An AI agent consuming a well-defined, rich shared schema behaves consistently
and predictably. An AI agent consuming five slightly different inline schemas
with loose descriptions and no metadata produces five slightly different
interpretations, all potentially incorrect.

## The solution is to treat schemas as their own layer

The problem is not that teams write bad APIs. It is that the schema layer, the
layer that defines what data *means* across your organisation, has never been
treated as infrastructure in its own right.

Every other layer has been addressed. You have source control for code. You
have registries for container images. You have package managers for
dependencies. *The schema layer, the shared vocabulary of your entire API
landscape, has been left as an afterthought embedded inside individual specs:
invisible and ungoverned.*

That is what this guide addresses. Not another linter for your OpenAPI specs. A
foundation for properly introducing and governing the layer beneath them.

docs/guide/setup.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Setting Up a Schema Registry

Coming soon!

mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -24,6 +24,12 @@ plugins:
 nav:
   - index.md
   - getting-started.md
+  - Guide:
+      - guide/index.md
+      - guide/approach.md
+      - guide/setup.md
+      - guide/development.md
+      - guide/evolution.md
   - configuration.md
   - integrations.md
   - api.md
- api.md
