Convex replicator module #574
built using gpt-5.3
sync rules will mostly have multiple table
it's always a timestamp
hardcode to 60
also cleaned up handling of convex types
# Conflicts: # packages/schema/src/scripts/compile-json-schema.ts # packages/schema/tsconfig.json
| await this.assertHostAllowed(); | ||
| const primaryPath = `/api/${options.endpoint}`; | ||
| const fallbackPath = `/api/streaming_export/${options.endpoint}`; |
I could see this being used on either cloud or self-hosted deployments. It seems like the `/api/streaming_export` fallback routes don't get used, so I've removed the fallback.
As part of reviewing the initial POC, I went through the implementation in a bit more detail and then addressed a few items that seemed worth tightening up before this lands.

The main thing I spent time on was the replication flow itself. I reviewed the snapshot + delta approach, where we pin one global Convex snapshot boundary, snapshot selected tables at that same boundary, and then resume …

I verified the write checkpoint flow. The important ordering here is that we read the Convex head, create the managed PowerSync write checkpoint in the callback, and only then write the Convex marker mutation. That marker write is what gives an idle Convex deployment a later observable delta so the checkpoint can actually be acknowledged. AI-generated docs have been added to give more detail about this process. We can remove those if necessary, perhaps they might help other AI agents in the future.

I reviewed the route API adapter as well, especially …

On the testing side, I added real integration tests against a local Convex backend and wired them into CI. These cover the module connection path, route API adapter, streaming replication, and resumable snapshots. Some of the original storage-mocking tests were AI-generated. They are less important now that we have real local Convex integration tests, but I left them in place since they might still have some use.

I also did a pass on the Convex API client and value conversion. The current Convex API response typings have been checked with cloud and self-hosted integration tests. For Convex -> SQLite conversion, I verified the JSON schema responses received from Convex backends and made some cleanup improvements. This process has some limitations which are listed as known issues. I'll mention more on these in an upcoming docs PR.

If you'd like to take this for a spin, feel free to try the React Convex Todolist demo from powersync-ja/powersync-js#952 - this uses a development PowerSync service image. Note that this demo will be moved to its own repository soon.

AI Usage disclaimer:
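Returning to the write checkpoint ordering described in the review notes above, here is a minimal sketch of that sequence. Every name in it (`ConvexSource`, `readHead`, `writeMarker`, `CreateCheckpoint`, `createWriteCheckpoint`) is hypothetical and chosen for illustration; none of them is the module's actual API, and the cursor is modelled as an opaque string as an assumption.

```typescript
// All names below are hypothetical; they are not the module's real API.
type ConvexCursor = string; // assumption: the head is treated as an opaque timestamp string

interface ConvexSource {
  /** Reads the current head of the Convex deployment. */
  readHead(): Promise<ConvexCursor>;
  /** Runs a small marker mutation so that a later delta becomes observable. */
  writeMarker(): Promise<void>;
}

/** Callback that creates the managed PowerSync write checkpoint for a given head. */
type CreateCheckpoint = (head: ConvexCursor) => Promise<string>;

async function createWriteCheckpoint(
  source: ConvexSource,
  createCheckpoint: CreateCheckpoint
): Promise<string> {
  // 1. Read the Convex head first.
  const head = await source.readHead();
  // 2. Create the managed write checkpoint in the callback.
  const checkpointId = await createCheckpoint(head);
  // 3. Only then write the marker, so an idle deployment still emits a later delta.
  await source.writeMarker();
  return checkpointId;
}

// In-memory fakes that record the order in which the steps run.
async function demoCheckpointOrder(): Promise<string[]> {
  const calls: string[] = [];
  const source: ConvexSource = {
    async readHead() {
      calls.push('readHead');
      return '1000';
    },
    async writeMarker() {
      calls.push('writeMarker');
    }
  };
  await createWriteCheckpoint(source, async (head) => {
    calls.push(`createCheckpoint@${head}`);
    return 'checkpoint-1';
  });
  return calls;
}
```

Writing the marker last is the point of the sketch: if the marker were written before the checkpoint existed, the delta it produces could be observed too early to acknowledge that checkpoint.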
| // TODO! It seems like Convex might not report the schema value for values which have not | ||
| // been populated in the DB yet. This can cause many issues - and we need to work around this. | ||
| // We perform runtime checks and conversions at this point. | ||
| if (value == null) { |
After some additional thought, I’m leaning toward disabling json_schemas for SQLite row conversion in the Convex replication path.
The issue is that using schema metadata makes row values inconsistent depending on whether Convex happened to report a field in json_schemas at the time we cached it:
- `Int64` with schema metadata becomes `bigint`
- the same `Int64` without schema metadata stays the raw JSON string
- `Bytes` with schema metadata becomes a `Uint8Array`/blob
- the same `Bytes` without schema metadata stays the raw base64 string
That inconsistency is probably worse than preserving the raw wire types. If we only use the types from list_snapshot / document_deltas, then behavior is stable:
- Convex `Int64` is always a string
- Convex `Bytes` is always a base64 string
- `number`/`float64` is always a JS number
- booleans come through as booleans, which are already accepted by the Convex-to-SQLite conversion layer
Then users can explicitly normalize ambiguous fields in Sync Streams rules, e.g. CAST(points AS INTEGER) for Int64 columns. This is predictable and avoids cases where the same column changes type depending on whether a populated value existed when json_schemas was fetched.
I think we can still keep json_schemas for table discovery and admin/diagnostic schema reporting, but avoid using it to coerce replicated row values.
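
To make the proposal concrete, here is a rough sketch of a conversion that looks only at the runtime type of each wire value and never consults `json_schemas`. The names `WireValue`, `SqliteValue`, `wireToSqlite`, and `rowToSqlite` are hypothetical, and two details are assumptions of this sketch rather than the module's behavior: booleans are mapped to 1/0, and nested arrays/objects are stored as JSON text.

```typescript
// Hypothetical types and function names, for illustration only.
type WireValue =
  | string
  | number
  | boolean
  | null
  | WireValue[]
  | { [key: string]: WireValue };

type SqliteValue = string | number | null;

function wireToSqlite(value: WireValue): SqliteValue {
  if (value === null) {
    return null;
  }
  switch (typeof value) {
    case 'string':
      // Int64 and Bytes both arrive as strings and are kept as-is.
      return value;
    case 'number':
      return value;
    case 'boolean':
      // Assumption of this sketch: store booleans as 1/0.
      return value ? 1 : 0;
    default:
      // Assumption of this sketch: nested arrays/objects become JSON text.
      return JSON.stringify(value);
  }
}

function rowToSqlite(doc: { [key: string]: WireValue }): { [key: string]: SqliteValue } {
  const row: { [key: string]: SqliteValue } = {};
  for (const key of Object.keys(doc)) {
    row[key] = wireToSqlite(doc[key]);
  }
  return row;
}

// Illustrative document: `points` stands in for an Int64 column, `avatar` for Bytes.
const exampleRow = rowToSqlite({
  points: '42',
  avatar: 'aGVsbG8=',
  score: 1.5,
  done: true,
  tags: ['a'],
  note: null
});
```

Because the output depends only on the value itself, the same column always converts the same way, and a rule such as `CAST(points AS INTEGER)` can then normalize the string form where an integer is wanted.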
After making the above changes for JSON Schema usage, some additional improvements could be made to cater for schema changes.

After digging into how the Convex replicator actually uses source metadata, we’ve narrowed the schema-change problem down quite a bit. For most of our other replicators, schema-change detection is important because it protects against things like replica identity changes, stale relation metadata, table renames/drops, and DDL changes that require a re-snapshot. Convex is different in a few important ways:

The one remaining question was wildcard table discovery. We added an integration test to verify that …

Code/docs changes from this:

One important limitation we validated: deleting a table from the Convex dashboard does not emit per-document …
…-schemas list of tables (for non-wildcard table patterns).
This adds support for Convex as a replication source. Since Convex itself is open source (technically also FSL), it was quite feasible for me to implement this.
As with any datastore, there are many quirks. I've attempted to document pertinent ones in a README.md in the module root. Required reading is the section titled "Mutation Transaction Atomicity".
To get a feel for the system, run the `convex` self-host-demo: https://github.com/powersync-ja/self-host-demo/tree/convex-demo. Then simply open two instances of the Convex React demo app (new) (http://localhost:3030) side by side.

When running the demo, to log into the Convex dashboard, you need to jump through some hoops:
- `convex-keygen` Docker container

TODO
- `powersync_checkpoints` table and write mutation function exist