
LDP server with LDES output for change-driven search index updates #254

@ddeboer

Description


Summary

Implement an LDP (Linked Data Platform) server that persists RDF resources and exposes them as an LDES (Linked Data Event Stream), enabling downstream consumers — such as a search indexer (#252) — to react to changes incrementally rather than reindexing from scratch.

Problem

Change discovery is a shared challenge across all data platforms, including the Dataset Register (DR). Platform services (selection, search index, semantic analysis, notification) currently have no standard way to detect what has changed in dataset descriptions, distributions, or individual objects. Polling SPARQL endpoints is expensive and misses deletes; full reindexes are wasteful.

Approach

LDP write interface

Sources (including DR) write dataset descriptions and individual objects via standard HTTP:

  • POST — create a new resource
  • PUT — replace an existing resource
  • DELETE — remove a resource

Granularity: documents = resources (not individual triples). Each write operation is a self-contained RDF document.
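The write semantics above can be sketched with an in-memory container; the class name, URI scheme, and store shape are illustrative assumptions, not the actual server API:

```typescript
// Sketch of LDP document-granularity writes over an in-memory store.
// Each operation replaces or removes a whole RDF document, never single triples.
type Resource = { body: string; contentType: string };

class LdpContainer {
  private resources = new Map<string, Resource>();
  private counter = 0;

  // POST: create a new resource under the container, returning its URI
  post(body: string, contentType = "text/turtle"): string {
    const uri = `/resources/${++this.counter}`;
    this.resources.set(uri, { body, contentType });
    return uri;
  }

  // PUT: replace an existing resource in full; returns whether it existed
  put(uri: string, body: string, contentType = "text/turtle"): boolean {
    const existed = this.resources.has(uri);
    this.resources.set(uri, { body, contentType });
    return existed;
  }

  // DELETE: remove the resource; returns whether anything was deleted
  delete(uri: string): boolean {
    return this.resources.delete(uri);
  }

  get(uri: string): Resource | undefined {
    return this.resources.get(uri);
  }
}
```

Because every write carries a complete document, a consumer never has to reconstruct state from partial triple-level patches.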

LDES output

Every write is appended to an LDES as a versioned member. Consumers subscribe to the stream and process only the delta since their last checkpoint:

  • A new/updated resource → new LDES member containing the full document, linked to the canonical resource via dcterms:isVersionOf (the property declared by the stream's ldes:versionOfPath)
  • A deleted resource → tombstone member (or an out-of-band signal, TBD)
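A minimal sketch of appending writes as versioned members, assuming a simple in-memory stream; the member shape and field names are illustrative, and tombstone modelling is still TBD per the issue:

```typescript
// Sketch: every write becomes an immutable, versioned LDES member.
// Field names are assumptions; the version link would typically be
// expressed as dcterms:isVersionOf in the actual RDF.
type LdesMember = {
  id: string;          // versioned member IRI (unique per write)
  isVersionOf: string; // canonical resource IRI
  timestamp: string;   // version timestamp
  body: string | null; // full RDF document, or null for a tombstone
  deleted: boolean;    // tombstone flag for deletes
};

const stream: LdesMember[] = [];

function appendVersion(
  resourceIri: string,
  body: string | null,
  deleted = false
): LdesMember {
  const member: LdesMember = {
    // uniqueness via stream position; real servers would mint proper IRIs
    id: `${resourceIri}#${Date.now()}-${stream.length}`,
    isVersionOf: resourceIri,
    timestamp: new Date().toISOString(),
    body,
    deleted,
  };
  stream.push(member); // members are append-only: never mutated or removed
  return member;
}
```

Append-only members are what let consumers checkpoint: the stream itself is the change log, so no separate diffing is needed.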

Integration with #252

The LDES stream is the ideal input for an incremental indexing strategy in @lde/search-typesense:

  1. LDES consumer reads new members since the last cursor
  2. For each member: upsert (create/update) or delete the corresponding Typesense document
  3. Advance cursor — no full reindex needed

This replaces the 'full reindex via collection alias swap' strategy in #252 with a continuous, low-latency sync.
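The three steps above can be sketched as a cursor-driven loop; `Index` stands in for a Typesense client, and member shape, sequence numbers, and method names are illustrative assumptions:

```typescript
// Sketch of incremental sync: read members past the cursor, upsert or
// delete the corresponding index document, advance the checkpoint.
type Member = { seq: number; isVersionOf: string; body: string | null; deleted: boolean };

interface Index {
  upsert(id: string, doc: string): void;
  remove(id: string): void;
}

function sync(stream: Member[], cursor: number, index: Index): number {
  const pending = stream
    .filter((x) => x.seq > cursor)        // 1. only members since the last cursor
    .sort((a, b) => a.seq - b.seq);       //    processed in stream order
  for (const m of pending) {
    if (m.deleted) {
      index.remove(m.isVersionOf);        // 2a. tombstone → delete the document
    } else {
      index.upsert(m.isVersionOf, m.body ?? ""); // 2b. new/updated → upsert full doc
    }
    cursor = m.seq;                       // 3. advance checkpoint per member
  }
  return cursor; // persist as the new checkpoint; re-running is a no-op
}
```

Because the cursor only advances after a member is applied, a crash mid-batch at worst replays members, and upsert/delete are idempotent, so replays are safe.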

Notes

  • Data platforms (including DR) are unaware of changes — change detection is a shared cost/efficiency problem
  • LDES is the chosen standard for event streams in this ecosystem
  • Copying is an explicit design goal: downstream services maintain their own copy, decoupled from the source
  • Granularity is documents/resources, not triples
