Skip to content

feat(registry): add source column to discovered_properties for property reconciliation#4125

Draft
bokelley wants to merge 1 commit intomainfrom
claude/issue-4111-discovered-properties-source
Draft

feat(registry): add source column to discovered_properties for property reconciliation#4125
bokelley wants to merge 1 commit intomainfrom
claude/issue-4111-discovered-properties-source

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 5, 2026

Closes #4111

Summary

Resolves the known limitation in hosted-property-sync.ts where properties removed from an AAO-hosted manifest persisted in the registry until manually cleared. Adds a source column ('crawler' | 'aao_hosted') to discovered_properties (migration 467) and upgrades the sync from additive-only to full reconcile.

Key design decisions:

  • Hosted manifest is authoritative: any discovered_properties row for the domain not in the current manifest is deleted, regardless of source. A publisher who removes a property sees it gone on next sync.
  • Crawler facts preserved on conflict: if the row already has source='crawler', the crawler's identifiers, tags, and source label are preserved — origin-verified facts take precedence over hosted-manifest values.
  • Advisory-lock transaction: property upserts + reconcile DELETE run inside a single transaction guarded by pg_advisory_xact_lock(hashtext('dp:'||domain)) to prevent concurrent-sync interleave races (same pattern as org-intake-lock.ts).
  • (name, property_type) keying: the DELETE uses unnest-based tuple matching so a property reclassified to a different type is correctly removed rather than silently left as a ghost row.

Also corrects the factual contradiction in the hosted-property-fed-index-sync.md changeset, which previously described discovered_properties as additive-only.

Non-breaking justification

Adds optional source TEXT NOT NULL DEFAULT 'crawler' column to discovered_properties with a safe default — existing crawler rows backfill as 'crawler' with no rewrite. The upsertProperty interface gains an optional source field; all existing callers that don't pass it get 'crawler'. The registry API response shape is unchanged. Changeset is --empty (server-side sync logic, not the published AdCP protocol spec).

Pre-PR review

  • code-reviewer: approved after two passes — stale comment updated, statement_timeout added, identifiers/tags guard fixed, synced counter after-await confirmed, throw condition analyzed and correct (gated on errors > 0).
  • adtech-product-expert: approved after two passes — (name, property_type) keying correct, source-scope removal from DELETE correct (crawler rows for in-manifest properties survive via NOT EXISTS subquery), behavioral commitment documented in code comments.

Nits (not fixed, noted for reviewers):

  • source='adagents_json' rows (distinct from 'crawler') would be overwritten to 'aao_hosted' on conflict — pre-existing enum gap in the schema, out of scope for this PR.
  • Authorization reconcile loop is not under the same transaction scope as the property reconcile — concurrent-sync window exists there too (same exposure as before this PR).

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

  • Push fixup commits directly: gh pr checkout <num>
    fix → push.
  • Or re-trigger: comment /triage execute on the source
    issue.

See #3121
for context.

Session: https://claude.ai/code/session_01P6jrAAgREAnLNQodRxEyfQ


Generated by Claude Code

…ty reconciliation

Resolves the known limitation in hosted-property-sync where removed
properties persisted indefinitely. Adds a `source` column
('crawler' | 'aao_hosted') to `discovered_properties` (migration 467) so
the hosted-property sync can identify which rows it wrote and reconcile
(delete) them when the publisher removes properties from their manifest.

Key design decisions:
- The hosted manifest is authoritative for the publisher's property list;
  any row not in the current manifest is deleted, regardless of source.
- On conflict with a crawler-attested row (source='crawler'), the crawler's
  source label, identifiers, and tags are preserved — origin-verified facts
  take precedence over hosted-manifest values.
- Property upserts and the reconcile DELETE run inside a single transaction
  guarded by a domain-scoped advisory lock to prevent concurrent-sync races.

Closes #4111

https://claude.ai/code/session_01P6jrAAgREAnLNQodRxEyfQ
@bokelley bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AAO: discovered_properties.source column to enable property-removal reconciliation in hosted sync

2 participants