feat: canonicalize highlights across channels#3838

Draft
idoshamun wants to merge 4 commits into main from codex/canonical-highlights

Conversation

@idoshamun
Member

What changed

  • reworked highlights to use a canonical post_highlight row per post plus a new post_highlight_channel placement table
  • migrated the generation flow from per-channel runs to a single global evaluation pass with placement into all matching publish channels
  • replaced the legacy api.v1.post-highlighted flow with a canonical api.v1.highlight-created topic and migrated the dependent workers
  • updated GraphQL highlights reads, sitemap queries, cleanup, and focused tests to use the canonical model

Why

The previous model treated channel as part of the highlight identity. That duplicated highlights across channels, forced deduplication in read paths, and missed coverage because each channel was generated independently.

This change makes highlights canonical first, evaluates candidate stories once, and keeps channel membership as placement metadata.
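
As a minimal sketch of that idea (not the code in this PR), the snippet below evaluates each candidate once, creates one canonical record per post, and records channel membership as separate placement records. The interface shapes, the `Candidate` type, and `placeHighlights` are hypothetical names chosen for illustration; only the table names `post_highlight` and `post_highlight_channel` come from the description above.

```typescript
// Hypothetical in-memory shapes; the real entity columns are not shown here.
interface PostHighlight {
  postId: string; // one canonical record per post
  headline: string;
}

interface PostHighlightChannel {
  postId: string;
  channel: string; // placement metadata, not part of the highlight identity
}

interface Candidate {
  postId: string;
  headline: string;
  channels: string[]; // channels the story matched during evaluation
}

interface PlacementResult {
  highlights: PostHighlight[];
  placements: PostHighlightChannel[];
}

// Single pass: each post becomes at most one highlight, then gets one
// placement per matching publish channel.
function placeHighlights(
  candidates: Candidate[],
  publishChannels: string[],
): PlacementResult {
  const seen: Record<string, boolean> = {};
  const highlights: PostHighlight[] = [];
  const placements: PostHighlightChannel[] = [];

  for (const candidate of candidates) {
    if (seen[candidate.postId]) continue; // already canonical, skip duplicates
    seen[candidate.postId] = true;
    highlights.push({ postId: candidate.postId, headline: candidate.headline });

    // Keep each channel once, and only if it is a publish channel.
    const matching = candidate.channels.filter(
      (channel, index, all) =>
        all.indexOf(channel) === index && publishChannels.indexOf(channel) !== -1,
    );
    for (const channel of matching) {
      placements.push({ postId: candidate.postId, channel });
    }
  }

  return { highlights, placements };
}
```

In this sketch a story that matches no publish channel still gets a canonical record with zero placements, which is what lets a global read ignore channels entirely.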

Impact

  • one highlight row now represents one story
  • channel pages read through live placements
  • /highlights continues to be significance-filtered, but now reads canonical highlights directly
  • highlight-created side effects now fire once per canonical creation instead of once per channel row
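
To illustrate the read-path difference, here is a rough sketch of a channel page resolving highlights through live placements. It assumes a placement counts as live when a nullable `retiredAt` field is null and that highlights carry a `highlightedAt` timestamp; both field names and `highlightsForChannel` are assumptions for the example, not names taken from the PR.

```typescript
// Hypothetical shapes for illustration only.
interface Highlight {
  postId: string;
  headline: string;
  highlightedAt: Date;
}

interface Placement {
  postId: string;
  channel: string;
  retiredAt: Date | null; // assumption: null means the placement is live
}

// Channel pages: follow live placements to the canonical highlight rows.
function highlightsForChannel(
  channel: string,
  highlights: Highlight[],
  placements: Placement[],
): Highlight[] {
  const byPost: Record<string, Highlight> = {};
  for (const highlight of highlights) byPost[highlight.postId] = highlight;

  return placements
    .filter((p) => p.channel === channel && p.retiredAt === null)
    .map((p) => byPost[p.postId])
    .filter((h): h is Highlight => h !== undefined)
    .sort((a, b) => b.highlightedAt.getTime() - a.highlightedAt.getTime());
}
```

Because each post has a single canonical record, the result needs no deduplication step, unlike a read over per-channel rows.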

Validation

  • NODE_ENV=test pnpm run db:migrate:reset
  • NODE_ENV=test npx jest __tests__/highlights.ts __tests__/workers/generateChannelHighlight.ts __tests__/workers/newHighlightRealTime.ts __tests__/cron/cleanChannelHighlights.ts __tests__/workers/majorHighlightTweet.ts __tests__/cron/channelHighlights.ts __tests__/sitemaps.ts __tests__/workers/cdc/primary.ts --testEnvironment=node --runInBand --modulePathIgnorePatterns='<rootDir>/build/'
  • pnpm run build

Notes

  • the targeted Jest run still reports an existing worker/open-handle shutdown warning after all suites pass

@pulumi

pulumi Bot commented Apr 30, 2026

🍹 The Update (preview) for dailydotdev/api/prod (at 36769ef) was successful.

✨ Neo Explanation

This is a significant schema migration and event system refactor that replaces per-channel highlight rows with a canonical-highlight + placement-join-table model. The primary risk is the destructive migration (legacy table dropped, `reason` column removed, unique key changed) running during a live rolling deployment, where a brief overlap with old-schema writer code could cause constraint violations.

🟡 Moderate Risk

This PR refactors the channel highlights system to a canonical data model: a single post_highlight row per post (unique on postId only) with a new post_highlight_channel join table tracking per-channel placements. It also consolidates the per-channel fan-out into a single global api.v1.generate-highlights event and replaces the protobuf-based PostHighlightedMessage with plain JSON on a new api.v1.highlight-created topic.

🟡 Warning — Destructive database migration running at deploy time

The CanonicalHighlights1776200000000 migration renames post_highlight to post_highlight_legacy, creates a new post_highlight table (with a unique constraint on postId alone, dropping the old channel+postId unique key and the reason column), backfills data via a deduplication CTE, then drops post_highlight_legacy. The dedup logic picks one canonical row per postId (preferring live over retired, then most recent), which means any post previously highlighted in multiple channels simultaneously will be collapsed to a single canonical record. This is intentional but irreversible once the legacy table is dropped. Reviewers should verify the CTE dedup logic correctly handles the edge case where the same post had active highlights in multiple channels at once.
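
The selection rule described above (prefer live over retired, then most recent) can be sketched in TypeScript as a way to reason about the edge case. This is not the migration's SQL: the `LegacyHighlight` fields `retiredAt` and `highlightedAt` and the function names are assumptions made for the example.

```typescript
// Hypothetical legacy row shape; column names are assumptions.
interface LegacyHighlight {
  postId: string;
  channel: string;
  retiredAt: Date | null; // assumption: null means the row is live
  highlightedAt: Date;
}

// True when `a` should win over `b`: live beats retired, then newer beats older.
function isPreferred(a: LegacyHighlight, b: LegacyHighlight): boolean {
  const aLive = a.retiredAt === null;
  const bLive = b.retiredAt === null;
  if (aLive !== bLive) return aLive;
  return a.highlightedAt.getTime() > b.highlightedAt.getTime();
}

// Collapse legacy per-channel rows to one canonical row per postId.
function pickCanonical(rows: LegacyHighlight[]): LegacyHighlight[] {
  const best: Record<string, LegacyHighlight> = {};
  for (const row of rows) {
    const current = best[row.postId];
    if (current === undefined || isPreferred(row, current)) {
      best[row.postId] = row;
    }
  }
  return Object.keys(best).map((postId) => best[postId]);
}
```

For a post that is live in two channels at once, this rule keeps only the more recent row as the canonical record, so the other channel's membership survives only if the backfill also writes a placement for it. That is the behavior worth checking in the actual CTE.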

🟡 Warning — Deployment ordering risk

The old post_highlight schema enforced uniqueness on (channel, postId). The new schema enforces uniqueness on postId alone. If any old-schema code (still in-flight pods) attempts to write a new highlight row after the migration completes but before the rolling update finishes, it will hit a unique constraint violation on postId. The risk window is the rolling deployment interval.
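
A small in-memory simulation makes the window concrete. It models a unique key as a function over a row and shows the same pair of legacy-style writes succeeding under the old key but failing under the new one. `makeTable`, `Row`, and the sample values are hypothetical; no real database behavior beyond "a duplicate unique key is rejected" is assumed.

```typescript
interface Row {
  channel: string;
  postId: string;
}

// Minimal stand-in for a table with a single unique key.
function makeTable(keyOf: (row: Row) => string) {
  const seen: Record<string, boolean> = {};
  return {
    // Returns false where a real database would raise a unique violation.
    insert(row: Row): boolean {
      const key = keyOf(row);
      if (seen[key]) return false;
      seen[key] = true;
      return true;
    },
  };
}

// Old schema: unique on (channel, postId). New schema: unique on postId alone.
const oldSchema = makeTable((row) => `${row.channel}:${row.postId}`);
const newSchema = makeTable((row) => row.postId);

// An old-schema writer highlights the same post in two channels.
const legacyWrites: Row[] = [
  { channel: "ai", postId: "p1" },
  { channel: "web", postId: "p1" },
];

const oldResults = legacyWrites.map((row) => oldSchema.insert(row));
const newResults = legacyWrites.map((row) => newSchema.insert(row));
```

Here `oldResults` is all successes, while the second entry of `newResults` is a rejection, which is the failure an in-flight old pod would see after the migration completes.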

🔵 Info — PubSub subscription cutover

The four old subscriptions (api.new-highlight-real-time, api.major-highlight-tweet, api.major-headline-added-notification, api.generate-channel-highlight) are deleted and replaced with -v2 versions pointing to new topics. Messages in-flight on the old subscriptions at the time of deletion will be lost, but since these are highlight generation/notification events (not financial or audit-critical), a brief gap is acceptable. The old CDC-based api.v1.post-highlighted event path is also removed entirely; events are now only emitted from the worker after successful replaceHighlights(), which is the correct place.

🔵 Info — New api.agentic-digest-tweet subscription

A net-new worker subscription is added on the api.v1.post-visible topic for agentic digest tweet functionality.

Resource Changes

    Name                                                    Type                                  Operation
~   vpc-native-update-tags-str-cron                         kubernetes:batch/v1:CronJob           update
~   vpc-native-update-trending-cron                         kubernetes:batch/v1:CronJob           update
+   api-sub-api.major-highlight-tweet-v2                    gcp:pubsub/subscription:Subscription  create
~   vpc-native-clean-zombie-opportunities-cron              kubernetes:batch/v1:CronJob           update
~   vpc-native-personalized-digest-deployment               kubernetes:apps/v1:Deployment         update
~   vpc-native-clean-gifted-plus-cron                       kubernetes:batch/v1:CronJob           update
~   vpc-native-materialize-monthly-best-post-archives-cron  kubernetes:batch/v1:CronJob           update
+   api-sub-api.major-headline-added-notification-v2        gcp:pubsub/subscription:Subscription  create
~   vpc-native-clean-stale-user-transactions-cron           kubernetes:batch/v1:CronJob           update
~   vpc-native-generate-search-invites-cron                 kubernetes:batch/v1:CronJob           update
-   api-sub-api.major-headline-added-notification           gcp:pubsub/subscription:Subscription  delete
~   vpc-native-hourly-notification-cron                     kubernetes:batch/v1:CronJob           update
~   vpc-native-materialize-yearly-best-post-archives-cron   kubernetes:batch/v1:CronJob           update
~   vpc-native-private-deployment                           kubernetes:apps/v1:Deployment         update
+   vpc-native-api-db-migration-aafa0f5e                    kubernetes:batch/v1:Job               create
~   vpc-native-temporal-deployment                          kubernetes:apps/v1:Deployment         update
~   vpc-native-user-profile-analytics-clickhouse-cron       kubernetes:batch/v1:CronJob           update
~   vpc-native-generic-referral-reminder-cron               kubernetes:batch/v1:CronJob           update
~   vpc-native-user-profile-updated-sync-cron               kubernetes:batch/v1:CronJob           update
~   vpc-native-channel-digests-cron                         kubernetes:batch/v1:CronJob           update
~   vpc-native-post-analytics-clickhouse-cron               kubernetes:batch/v1:CronJob           update
~   vpc-native-daily-digest-cron                            kubernetes:batch/v1:CronJob           update
~   vpc-native-update-current-streak-cron                   kubernetes:batch/v1:CronJob           update
~   vpc-native-expire-super-agent-trial-cron                kubernetes:batch/v1:CronJob           update
~   vpc-native-rotate-daily-quests-cron                     kubernetes:batch/v1:CronJob           update
~   vpc-native-ws-deployment                                kubernetes:apps/v1:Deployment         update
~   vpc-native-update-tag-materialized-views-cron           kubernetes:batch/v1:CronJob           update
~   vpc-native-clean-channel-highlights-cron                kubernetes:batch/v1:CronJob           update
-   api-sub-api.new-highlight-real-time                     gcp:pubsub/subscription:Subscription  delete
-   vpc-native-api-db-migration-601d1ed0                    kubernetes:batch/v1:Job               delete
~   vpc-native-calculate-top-readers-cron                   kubernetes:batch/v1:CronJob           update
~   vpc-native-rotate-weekly-quests-cron                    kubernetes:batch/v1:CronJob           update
-   vpc-native-api-clickhouse-migration-601d1ed0            kubernetes:batch/v1:Job               delete
+   vpc-native-api-clickhouse-migration-aafa0f5e            kubernetes:batch/v1:Job               create
~   vpc-native-check-analytics-report-cron                  kubernetes:batch/v1:CronJob           update
~   vpc-native-channel-highlights-cron                      kubernetes:batch/v1:CronJob           update
-   api-sub-api.generate-channel-highlight                  gcp:pubsub/subscription:Subscription  delete
~   vpc-native-sync-subscription-with-cio-cron              kubernetes:batch/v1:CronJob           update
~   vpc-native-bg-deployment                                kubernetes:apps/v1:Deployment         update
~   vpc-native-deployment                                   kubernetes:apps/v1:Deployment         update
~   vpc-native-update-views-cron                            kubernetes:batch/v1:CronJob           update
... and 20 other changes
