Convex replicator module #574

Open
kobiebotha wants to merge 89 commits into main from module-convex

@kobiebotha
Contributor

@kobiebotha kobiebotha commented Mar 19, 2026

This adds support for Convex as a replication source. Since Convex itself is open source (technically also FSL), it was quite feasible for me to implement this.

As with any datastore, there are many quirks. I've attempted to document pertinent ones in a README.md in the module root. Required reading is the section titled "Mutation Transaction Atomicity".

To get a feel for the system, run the Convex self-host demo: https://github.com/powersync-ja/self-host-demo/tree/convex-demo. Then open two instances of the new Convex React demo app (http://localhost:3030) side by side.

When running the demo, you need to jump through some hoops to log in to the Convex dashboard:

  1. Check the container logs for the convex-keygen Docker container
  2. Get the "Admin key" printed to console

TODO

  • Test against Convex Cloud (it has been a while since I did that)
  • Fix replication metrics (currently reporting per transaction "page", should report per mutation?)
  • Update Test Connection logic to ensure that the powersync_checkpoints table and write mutation function exist
  • Measure replication performance
  • Docs: "feat: Convex docs" (powersync-docs#456)

await this.assertHostAllowed();

const primaryPath = `/api/${options.endpoint}`;
const fallbackPath = `/api/streaming_export/${options.endpoint}`;
Collaborator

I could see this being used on either cloud or self-hosted deployments. It seems like the /api/streaming_export fallback routes never get used, so I've removed them.

@stevensJourney
Collaborator

As part of reviewing the initial POC, I went through the implementation in a bit more detail and then addressed a few items that seemed worth tightening up before this lands.

The main thing I spent time on was the replication flow itself. I reviewed the snapshot + delta approach, where we pin one global Convex snapshot boundary, snapshot selected tables at that same boundary, and then resume document_deltas from that stored LSN. I also looked at resumable snapshots and verified the behaviour by adding integration tests.
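The snapshot-plus-delta flow described above can be sketched against an in-memory stand-in. This is only an illustration under stated assumptions: `FakeConvexSource`, `initialSnapshot`, `applyDeltas` and the numeric cursor are hypothetical names and shapes, not the module's actual types or the Convex API.

```typescript
// Illustration only: these shapes are hypothetical, not the module's real types.
type Doc = { _id: string; _table: string; _ts: number; _deleted: boolean; [field: string]: unknown };
type Store = Record<string, Doc>;

// In-memory stand-in for a Convex deployment: an append-only change log with a logical clock.
class FakeConvexSource {
  private log: Doc[] = [];
  private clock = 0;

  write(table: string, id: string, fields: Record<string, unknown>, deleted = false): void {
    this.clock += 1;
    this.log.push({ ...fields, _id: id, _table: table, _ts: this.clock, _deleted: deleted });
  }

  // Current global position; used as the pinned snapshot boundary.
  head(): number {
    return this.clock;
  }

  // Documents of one table as they existed at `boundary`.
  listSnapshot(table: string, boundary: number): Doc[] {
    const latest: Record<string, Doc> = {};
    for (const doc of this.log) {
      if (doc._table === table && doc._ts <= boundary) latest[doc._id] = doc;
    }
    return Object.keys(latest).map((id) => latest[id]).filter((doc) => !doc._deleted);
  }

  // All changes strictly after `cursor`, in commit order.
  documentDeltas(cursor: number): { docs: Doc[]; cursor: number } {
    return { docs: this.log.filter((doc) => doc._ts > cursor), cursor: this.clock };
  }
}

// Pin one boundary, snapshot every selected table at that same boundary,
// and return the boundary as the position to resume deltas from.
function initialSnapshot(source: FakeConvexSource, tables: string[], store: Store): number {
  const boundary = source.head();
  for (const table of tables) {
    for (const doc of source.listSnapshot(table, boundary)) store[doc._id] = doc;
  }
  return boundary;
}

// Apply changes after the stored position and return the new position.
function applyDeltas(source: FakeConvexSource, tables: string[], store: Store, lsn: number): number {
  const page = source.documentDeltas(lsn);
  for (const doc of page.docs) {
    if (tables.indexOf(doc._table) === -1) continue; // table not selected
    if (doc._deleted) delete store[doc._id];
    else store[doc._id] = doc;
  }
  return page.cursor;
}

const demoSource = new FakeConvexSource();
demoSource.write("todos", "a", { title: "one" });
demoSource.write("todos", "b", { title: "two" });
const demoStore: Store = {};
let demoLsn = initialSnapshot(demoSource, ["todos"], demoStore);
demoSource.write("todos", "c", { title: "three" });
demoSource.write("todos", "a", {}, true); // delete "a" after the snapshot
demoLsn = applyDeltas(demoSource, ["todos"], demoStore, demoLsn);
```

Because every table is read at the same boundary, no change can fall between the snapshot and the first delta page, which is the property the integration tests are meant to protect.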

I verified the write checkpoint flow. The important ordering here is that we read the Convex head, create the managed PowerSync write checkpoint in the callback, and only then write the Convex marker mutation. That marker write is what gives an idle Convex deployment a later observable delta so the checkpoint can actually be acknowledged. AI-generated docs have been added to give more detail about this process. We can remove those if necessary, but they might help other AI agents in the future.
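The ordering above can be shown with a minimal sketch. The `CheckpointDeps` interface and its method names are hypothetical and exist only to make the sequence visible; the module's real functions may look quite different.

```typescript
// Hypothetical dependency surface; the real module's function names may differ.
interface CheckpointDeps {
  readConvexHead(): string;
  createManagedCheckpoint(head: string): number;
  writeMarkerMutation(): void;
}

function createWriteCheckpoint(deps: CheckpointDeps): { head: string; checkpointId: number } {
  // 1. Read the current Convex head first.
  const head = deps.readConvexHead();
  // 2. Create the managed PowerSync write checkpoint against that head.
  const checkpointId = deps.createManagedCheckpoint(head);
  // 3. Only then write the marker, so an idle deployment still produces a later delta.
  deps.writeMarkerMutation();
  return { head, checkpointId };
}

// Record the call order with stub dependencies.
const callOrder: string[] = [];
const checkpointResult = createWriteCheckpoint({
  readConvexHead: () => {
    callOrder.push("readHead");
    return "lsn-10";
  },
  createManagedCheckpoint: (head) => {
    callOrder.push("createCheckpoint@" + head);
    return 1;
  },
  writeMarkerMutation: () => {
    callOrder.push("writeMarker");
  },
});
```

If the marker were written before the checkpoint existed, the delta it produces could be observed too early to acknowledge that checkpoint, which is why the marker goes last.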

I reviewed the route API adapter as well, especially createReplicationHead, schema/debug table handling, and connection testing. I added a connection test check for the powersync_checkpoints table and mutator so misconfigured Convex projects fail earlier with a more useful message.
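A connection-test check of this kind might look roughly like the sketch below. It assumes the test has already collected the deployment's table and function names; `DeploymentInfo`, `checkCheckpointSetup` and the mutation name `powersync:writeCheckpoint` are all hypothetical, and only the `powersync_checkpoints` table name comes from this PR.

```typescript
// Hypothetical summary of what a connection test managed to discover.
interface DeploymentInfo {
  tables: string[];
  functions: string[];
}

// Returns human-readable problems; an empty array means the setup looks complete.
function checkCheckpointSetup(info: DeploymentInfo, mutationName: string): string[] {
  const problems: string[] = [];
  if (info.tables.indexOf("powersync_checkpoints") === -1) {
    problems.push("Table powersync_checkpoints was not found in the Convex project.");
  }
  if (info.functions.indexOf(mutationName) === -1) {
    problems.push("Mutation " + mutationName + " was not found in the Convex project.");
  }
  return problems;
}

// "powersync:writeCheckpoint" is a placeholder name for illustration only.
const setupProblems = checkCheckpointSetup(
  { tables: ["todos"], functions: ["powersync:writeCheckpoint"] },
  "powersync:writeCheckpoint"
);
```

Collecting every problem instead of failing on the first one lets a misconfigured project report both issues in a single test run.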

On the testing side, I added real integration tests against a local Convex backend and wired them into CI. These cover the module connection path, route API adapter, streaming replication, and resumable snapshots. Some of the original storage-mocking tests were AI-generated. They are less important now that we have real local Convex integration tests, but I left them in place since they might still have some use.

I also did a pass on the Convex API client and value conversion. The current Convex API response typings have been checked with cloud and self-hosted integration tests.

For Convex -> SQLite conversion, I verified the JSON schema responses received from Convex backends and made some cleanup improvements. This process has some limitations which are listed as known issues. I'll mention more on these in an upcoming docs PR.

If you'd like to take this for a spin, feel free to try the React Convex Todolist demo from powersync-ja/powersync-js#952 - this uses a development PowerSync service image. Note that this demo will be moved to its own repository soon.

AI Usage disclaimer:
I believe most of the original implementation was AI-generated. Most of my review improvements were hand-coded. AI (Codex GPT-5.5 medium) was used to assist with writing the integration tests; these tests were thoroughly debugged, tweaked and verified. The README content and docs pages are all AI-generated. This code has been reviewed in multiple passes by Codex GPT-5.5 medium.

stevensJourney marked this pull request as ready for review May 14, 2026 15:56
stevensJourney requested a review from rkistner May 14, 2026 15:56

// TODO! It seems like Convex might not report the schema value for values which have not
// been populated in the DB yet. This can cause many issues - and we need to work around this.
// We perform runtime checks and conversions at this point.
if (value == null) {
Collaborator

After some additional thought, I’m leaning toward disabling json_schemas for SQLite row conversion in the Convex replication path.

The issue is that using schema metadata makes row values inconsistent depending on whether Convex happened to report a field in json_schemas at the time we cached it:

  • Int64 with schema metadata becomes bigint
  • the same Int64 without schema metadata stays the raw JSON string
  • Bytes with schema metadata becomes a Uint8Array/blob
  • the same Bytes without schema metadata stays the raw base64 string

That inconsistency is probably worse than preserving the raw wire types. If we only use the types from list_snapshot / document_deltas, then behavior is stable:

  • Convex Int64 is always a string
  • Convex Bytes is always a base64 string
  • number / float64 is always a JS number
  • booleans come through as booleans, which are already accepted by the Convex-to-SQLite conversion layer

Then users can explicitly normalize ambiguous fields in Sync Streams rules, e.g. CAST(points AS INTEGER) for Int64 columns. This is predictable and avoids cases where the same column changes type depending on whether a populated value existed when json_schemas was fetched.

I think we can still keep json_schemas for table discovery and admin/diagnostic schema reporting, but avoid using it to coerce replicated row values.
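A schema-free conversion along these lines could be sketched as follows. This is an assumption-laden illustration, not the module's conversion layer: `toSqliteValue` is a hypothetical name, and mapping booleans to 0/1 and nested values to JSON text are choices made for this sketch only.

```typescript
type SqliteValue = string | number | null;

// Convert one field of a replicated document using only the wire value itself.
// No json_schemas metadata is consulted, so the same input always maps to the same output.
function toSqliteValue(value: unknown): SqliteValue {
  if (value === null || value === undefined) return null;
  // Strings pass through unchanged. Per the comment above, Int64 and Bytes
  // also arrive as strings and are deliberately left as-is.
  if (typeof value === "string") return value;
  if (typeof value === "number") return value;
  // Assumption of this sketch: booleans are stored as 0/1.
  if (typeof value === "boolean") return value ? 1 : 0;
  // Assumption of this sketch: nested objects and arrays are stored as JSON text.
  return JSON.stringify(value);
}

const convertedRow = {
  points: toSqliteValue("42"), // an Int64 stays a string; a rule can CAST(points AS INTEGER)
  done: toSqliteValue(true),
  score: toSqliteValue(1.5),
  tags: toSqliteValue(["a", "b"]),
};
```

The function never looks at cached schema state, so a column cannot change type depending on when `json_schemas` was fetched.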

@stevensJourney
Collaborator

After making the above changes for JSON Schema usage, some additional improvements could be made to cater for schema changes.

After digging into how the Convex replicator actually uses source metadata, we’ve narrowed the schema-change problem down quite a bit.

For most of our other replicators, schema-change detection is important because it protects against things like replica identity changes, stale relation metadata, table renames/drops, and DDL changes that require a re-snapshot. Convex is different in a few important ways:

  • _id is always the replication identity, so there is no replica-id drift to detect.
  • Runtime row conversion does not use json_schemas; it uses the actual JSON document payload from list_snapshot and document_deltas.
  • Field additions, removals, and type changes should therefore flow through normal document mutations/deltas.
  • Convex data migrations are expected to be online document writes, so they should replicate as data changes rather than schema-triggered re-snapshots.

The one remaining question was wildcard table discovery. We added an integration test to verify that json_schemas lists schema-defined tables even when they contain no documents. That passed, which means initial wildcard expansion can discover empty tables up front. Based on that, the stream no longer needs to snapshot a table inline when it is first observed in document_deltas; if the table appears later in deltas, the delta payload is the source of truth.

Code/docs changes from this:

  • Removed the Convex stream’s schema cache / forced schema refresh path.
  • Exact table patterns now resolve directly from Sync Streams rules.
  • Wildcards still use json_schemas for initial expansion.
  • Newly observed selected tables in document_deltas are resolved and marked snapshot-complete, then the delta row is applied directly. No inline snapshot.
  • Added/updated tests around exact table resolution, wildcard discovery, and empty schema-defined tables.
  • Added docs/convex/schema-change-handling.md with the rationale and limitations.
  • Updated the Convex README to reflect the new behavior.
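The table-resolution behavior in the list above can be sketched like this. The helper names are hypothetical, and modelling a wildcard as a trailing `%` is an assumption of this sketch rather than something confirmed in this PR.

```typescript
type TableState = Record<string, { snapshotComplete: boolean }>;

// Assumption: a wildcard pattern ends in "%" and matches by prefix.
function isWildcard(pattern: string): boolean {
  return pattern.charAt(pattern.length - 1) === "%";
}

function matchesPattern(pattern: string, table: string): boolean {
  if (isWildcard(pattern)) return table.indexOf(pattern.slice(0, -1)) === 0;
  return pattern === table;
}

// Exact patterns resolve directly; wildcards expand against the table names
// reported by json_schemas (which include empty schema-defined tables).
function resolveInitialTables(patterns: string[], schemaTables: string[]): string[] {
  const resolved: string[] = [];
  for (const pattern of patterns) {
    const candidates = isWildcard(pattern)
      ? schemaTables.filter((table) => matchesPattern(pattern, table))
      : [pattern];
    for (const table of candidates) {
      if (resolved.indexOf(table) === -1) resolved.push(table);
    }
  }
  return resolved;
}

// Called for each delta row. Returns true when the row should be applied.
// A selected table seen for the first time is marked snapshot-complete
// immediately; no inline snapshot is taken.
function onDeltaTable(patterns: string[], state: TableState, table: string): boolean {
  if (state[table]) return true;
  if (!patterns.some((pattern) => matchesPattern(pattern, table))) return false;
  state[table] = { snapshotComplete: true };
  return true;
}

const demoPatterns = ["todos", "list_%"];
const initialTables = resolveInitialTables(demoPatterns, ["list_a", "list_b", "users"]);
const tableState: TableState = {};
const appliedNewTable = onDeltaTable(demoPatterns, tableState, "list_c");
const appliedUnselected = onDeltaTable(demoPatterns, tableState, "users");
```

Here `todos` resolves even though `json_schemas` did not list it, and `list_c`, which first appears in deltas, is accepted with the delta payload as its source of truth.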

One important limitation we validated: deleting a table from the Convex dashboard does not emit per-document _deleted rows in document_deltas. That means previously replicated rows can remain synced to clients. The docs now recommend using the dashboard “Clear Table” action before deleting a table, or deleting documents through mutation paths that emit document deltas. Otherwise, dashboard/schema-only table removal needs to be treated as a sync-rule/deployment state change where affected PowerSync state may need to be cleared or re-replicated.
