Detect missed deletes #368

Open
Yostra wants to merge 8 commits into main from detect_missed_deletes
Yostra (Collaborator) commented May 7, 2026

Summary

Adds a reconcile-cleanup path that detects records hard-deleted in Stripe but missed by the event stream, and tombstones them in the destination on the next reconcile pass.

This introduces two optional connector capabilities:

  • Destination.getStaleRecords — emits batches of { stream, ids[] } for rows whose destination-stamped _last_synced_at predates the current syncRunStartedAt, optionally scoped by a filter (e.g. { _account_id }) for safe multi-tenant operation.
  • Source.verifyRecords — given those batches, re-fetches each record upstream and yields a recordDeleted: true message for anything Stripe returns as 404 or { deleted: true }.

Composing the two via the destination's existing write path turns "missing in source" into a tombstone in the destination — no special delete primitive required.
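
The composition described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the type names (StaleBatch, DeleteMessage, UpstreamResult, FetchRecord), the isGoneUpstream helper, and the reconcile function are all hypothetical, and the upstream lookup is injected as a plain function rather than a real Stripe client.

```typescript
// Hypothetical shapes; the PR's real connector types may differ.
type StaleBatch = { stream: string; ids: string[] };
type DeleteMessage = { stream: string; id: string; recordDeleted: true };

// Assumed upstream lookup result: an HTTP-like status plus an optional body.
type UpstreamResult = { status: number; body?: { id: string; deleted?: boolean } };
type FetchRecord = (stream: string, id: string) => Promise<UpstreamResult>;

// A record counts as gone when upstream answers 404 or { deleted: true }.
function isGoneUpstream(res: UpstreamResult): boolean {
  return res.status === 404 || res.body?.deleted === true;
}

// Source side: re-fetch each stale id and yield a delete message for
// anything that no longer exists upstream.
async function* verifyRecords(
  batches: AsyncIterable<StaleBatch>,
  fetchRecord: FetchRecord,
): AsyncGenerator<DeleteMessage> {
  for await (const { stream, ids } of batches) {
    for (const id of ids) {
      if (isGoneUpstream(await fetchRecord(stream, id))) {
        yield { stream, id, recordDeleted: true };
      }
    }
  }
}

// Composition: stale ids -> verification -> the destination's normal write
// path. Returns how many delete messages were written.
async function reconcile(
  getStaleRecords: () => AsyncIterable<StaleBatch>,
  fetchRecord: FetchRecord,
  write: (msg: DeleteMessage) => Promise<void>,
): Promise<number> {
  let tombstoned = 0;
  for await (const msg of verifyRecords(getStaleRecords(), fetchRecord)) {
    await write(msg);
    tombstoned += 1;
  }
  return tombstoned;
}
```

Because the delete messages go through the same write callback as ordinary records, the destination in this sketch needs no extra delete operation; records that still exist upstream simply produce no message.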

Destination stamp column

The "is this row stale?" check needs a destination-stamped timestamp that advances on every successful sync. Both destinations now use the same column name:

| Destination | Column | Notes |
| --- | --- | --- |
| postgres | _last_synced_at | Already in the table DDL on main; this PR makes upsertMany actually populate it. |
| google_sheets | _last_synced_at | Re-introduces the writer behavior that was reverted on main, aligned with the Postgres column name for cross-destination consistency. |

_last_synced_at is intentionally not marked as a volatile column in the Postgres upsert — every sync (including no-op rows) advances the timestamp, so each reconcile pass converges on a shrinking set of stale ids instead of an ever-growing pile.
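
To make the convergence argument concrete, here is a small in-memory sketch. It models only the observable behavior described above (every upsert advances the stamp, and rows older than the run start are stale); MemoryTable, its methods, and the numeric timestamps are hypothetical stand-ins rather than the Postgres or Sheets implementation.

```typescript
// Hypothetical in-memory stand-in for a destination table.
type Row = { id: string; _account_id: string; _last_synced_at: number };

class MemoryTable {
  private rows = new Map<string, Row>();

  // Every upsert advances _last_synced_at, even when the payload is unchanged.
  upsertMany(records: { id: string; _account_id: string }[], now: number): void {
    for (const r of records) {
      this.rows.set(r.id, { id: r.id, _account_id: r._account_id, _last_synced_at: now });
    }
  }

  // Rows whose stamp predates the run start were not touched by this run.
  // The optional filter scopes the scan to a single tenant.
  staleIds(syncRunStartedAt: number, filter?: { _account_id: string }): string[] {
    const ids: string[] = [];
    this.rows.forEach((row) => {
      if (filter && row._account_id !== filter._account_id) return;
      if (row._last_synced_at < syncRunStartedAt) ids.push(row.id);
    });
    return ids;
  }
}

// Run 1 (t=100) writes three rows across two accounts.
const table = new MemoryTable();
table.upsertMany(
  [
    { id: "cus_a", _account_id: "acct_1" },
    { id: "cus_b", _account_id: "acct_1" },
    { id: "cus_c", _account_id: "acct_2" },
  ],
  100,
);

// Run 2 starts at t=200 and only sees cus_a upstream (unchanged payload).
table.upsertMany([{ id: "cus_a", _account_id: "acct_1" }], 210);

// Only cus_b is a stale candidate for acct_1; cus_a was re-stamped.
const staleForAccount = table.staleIds(200, { _account_id: "acct_1" });
```

If no-op upserts did not advance the stamp, cus_a would also show up as stale on every pass even though it still exists upstream, which is the ever-growing set the paragraph above warns about.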

Temporal activity

reconcileCleanup(pipelineId, syncRunStartedAt) resolves the destination via a small allowlist (postgres, google_sheets), composes getStaleRecords → verifyRecords → write, and heartbeats once per stream and every 15s while writing. Failures are logged and swallowed rather than rethrown, so the next reconcile interval simply retries the cleanup.
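
The control flow of such an activity might look like the sketch below. It is an assumption-laden outline, not the PR's implementation: CleanupDeps and every field on it are hypothetical, heartbeat and the clock are injected instead of calling the Temporal SDK, and the 15s cadence is approximated by checking elapsed time after each write rather than by a timer.

```typescript
// Hypothetical dependency bundle; none of these names come from the PR.
type Batch = { stream: string; ids: string[] };
type Tombstone = { stream: string; id: string; recordDeleted: true };
type CleanupDeps = {
  destinationType: string;
  getStaleRecords: (syncRunStartedAt: Date) => AsyncIterable<Batch>;
  verifyRecords: (batches: AsyncIterable<Batch>) => AsyncIterable<Tombstone>;
  write: (msg: Tombstone) => Promise<void>;
  heartbeat: (details: string) => void;
  now: () => number; // milliseconds
  log: (msg: string) => void;
};

const SUPPORTED_DESTINATIONS = new Set(["postgres", "google_sheets"]);
const HEARTBEAT_INTERVAL_MS = 15000;

// Pass batches through unchanged, heartbeating whenever a new stream starts.
async function* heartbeatPerStream(
  batches: AsyncIterable<Batch>,
  beat: (details: string) => void,
): AsyncGenerator<Batch> {
  let current: string | undefined;
  for await (const batch of batches) {
    if (batch.stream !== current) {
      current = batch.stream;
      beat("stream:" + batch.stream);
    }
    yield batch;
  }
}

async function reconcileCleanupSketch(
  deps: CleanupDeps,
  syncRunStartedAt: Date,
): Promise<number> {
  let written = 0;
  try {
    if (!SUPPORTED_DESTINATIONS.has(deps.destinationType)) {
      throw new Error("unsupported destination: " + deps.destinationType);
    }
    let lastBeat = deps.now();
    const beat = (details: string): void => {
      deps.heartbeat(details);
      lastBeat = deps.now();
    };
    const stale = heartbeatPerStream(deps.getStaleRecords(syncRunStartedAt), beat);
    for await (const msg of deps.verifyRecords(stale)) {
      await deps.write(msg);
      written += 1;
      // Simplified cadence: beat if at least 15s passed since the last one.
      if (deps.now() - lastBeat >= HEARTBEAT_INTERVAL_MS) beat("written:" + written);
    }
  } catch (err) {
    // Log and swallow: the next reconcile interval retries the cleanup.
    deps.log("reconcileCleanup failed: " + String(err));
  }
  return written;
}
```

Injecting heartbeat, now, and log keeps the sketch testable without a Temporal worker; in a real activity those would presumably map onto the SDK's heartbeat call and the activity logger.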

The activity is registered in createActivities and a proxy is declared in workflows/_shared.ts for future workflow integration. No production workflow calls it yet; it stays inert pending a separate change that wires it into pipeline-lifecycle.

CI

  • Disables the docs (Vercel deploy) and e2e_cdn jobs with if: false, because the Vercel project was deleted. Both jobs otherwise stay in the workflow file unchanged, so re-enabling them once a new project is provisioned only requires removing that flag.

Test plan

E2E (MockActivityEnvironment runs the production activity end-to-end). Each suite seeds two customers via the in-process engine, hard-deletes one in Stripe without replaying the customer.deleted event, runs the activity, and asserts the doomed row is tombstoned while the survivor remains. The customers stream name matches the Stripe source's catalog on main.

# Postgres
POSTGRES_URL=postgres://postgres:postgres@localhost:55432/postgres \
  pnpm --filter @stripe/sync-e2e exec vitest run stripe-reconcile-cleanup.test.ts

# Sheets — additionally requires GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET / GOOGLE_REFRESH_TOKEN.
# Optional GOOGLE_SPREADSHEET_ID to reuse an existing sheet instead of creating one.
pnpm --filter @stripe/sync-e2e exec vitest run stripe-reconcile-cleanup.test.ts

Yostra force-pushed the detect_missed_deletes branch from a0f9db7 to ddf19aa on May 7, 2026 06:17
Yostra force-pushed the detect_missed_deletes branch from 757e0e9 to 3074828 on May 7, 2026 18:32
Yostra marked this pull request as ready for review on May 7, 2026 18:32