feat(cells): Add post deploy migration to fix OrganizationMapping.date_created field#115406
lynnagara wants to merge 8 commits into
…e_created fields
Follow-up to #115325. That PR made `update_organization_mapping_from_instance` carry `Organization.date_added` into `OrganizationMapping.date_created` going forward, but existing mapping rows still hold whatever `timezone.now()` was captured at mapping insert time. This migration repairs them.
The migration emits one `CellOutbox` per `Organization` with `category=ORGANIZATION_UPDATE`. The standard cell receiver drains each one and re-runs `Organization.handle_async_replication`, which now plumbs `date_added` → `date_created` through to the mapping.
This is a post-deploy migration to be triggered manually.
Note: this migration replicates the whole row, so `date_updated` will be bumped on every row. Nothing reads that today (Synapse will, later).
One alternative approach that I considered was a new temporary, more targeted RPC method that updates only the `date_created` field. I decided to reuse the existing RPC method here: creating all the throwaway methods, receivers, and categories is probably not worth it for a one-shot migration, and it adds the risk of an incorrect implementation.
This PR has a migration; here is the generated SQL:
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
for org_id in Organization.objects.values_list("id", flat=True).iterator(
    chunk_size=_BATCH_SIZE
):
I don't completely remember why, but I feel like .iterator doesn't work for us... I think it tries to use server side cursors maybe, which don't work due to pgbouncer
Instead, you can use something like
for org_id in RangeQuerySetWrapperWithProgressBar(
    Organization.objects.values_list("id", flat=True),
    result_value_getter=lambda values: values[0],
):
Alternatively, there aren't that many orgs. You could just bring the whole set of org ids into memory and iterate over them
Wouldn't using RangeQuerySetWrapperWithProgressBar be a solution we know is going to work?
It's the more standard solution, I just like to give folks options
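For illustration, the "bring the whole set of org ids into memory" option from this thread could look roughly like the sketch below. It is plain Python with no Django or Sentry imports: `batched_ids` and `BATCH_SIZE` are hypothetical names, and the hard-coded list stands in for the result of the `values_list("id", flat=True)` query.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 3  # hypothetical; stands in for the migration's _BATCH_SIZE


def batched_ids(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Assumption: in the migration this list would come from evaluating the
# org id queryset once, with no server-side cursor involved.
org_ids = [1, 2, 3, 4, 5, 6, 7]
batches = list(batched_ids(org_ids, BATCH_SIZE))
```

Since the ids are plain integers, holding all of them in memory is cheap, and each batch can then be written in its own short transaction.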
markstory left a comment
Did you consider using the existing outbox backfill tooling for this? By creating this many outboxes all at once, we are more likely to impact latency for customer operations, and potentially contribute to contention in postgres.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Are you saying incrementing the replication version will take care of everything?
I'm not too familiar with this tooling. If we don't need this to run on self-hosted, maybe it makes more sense?
… date_created (#115423)
Triggers backfill_outboxes.py to re-replicate every Organization, upserting OrganizationMapping.date_created with the value from Organization.date_added now that #115325 plumbs it through.
Alternative to #115406.
Roll out:
- needs to be flipped to 5 in sentry-options-automator to take effect in production
Superseded by #115423

Context
Follow-up to #115325. That PR made `update_organization_mapping_from_instance` carry `Organization.date_added` into `OrganizationMapping.date_created` going forward, but existing mapping rows still hold whatever `timezone.now()` was captured at mapping insert time. This migration repairs them.
Implementation
The migration emits one `CellOutbox` per `Organization` with `category=ORGANIZATION_UPDATE`. The standard cell receiver drains each one and re-runs `Organization.handle_async_replication`, which now plumbs `date_added` → `date_created` through to the mapping.
Roll out notes
This is a post-deploy migration to be triggered manually.
The migration replicates the whole row. The `date_updated` field will be bumped on every row. Since nothing reads that today (only Synapse will, later), this should be safe.
An increase in memcached set commands is expected during the migration, as caches will be updated for each row. Given the number of orgs (relatively small) and the outbox delivery concurrency cap of 5, this volume is trivial for memcached.
Alternative approaches considered
An alternative approach considered was a new temporary, more targeted RPC method that updates only the `date_created` field and leaves the rest of the fields untouched. However, creating all the throwaway methods, receivers, and categories comes with its own cost and adds the risk of an incorrect implementation. It's probably not worth it for a one-shot migration.
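To make the flow in the Implementation section concrete, here is a minimal toy model of it, assuming nothing about Sentry's real classes: `Org`, `Mapping`, `Outbox`, `emit_outboxes` and `drain` are all hypothetical stand-ins, and `drain` only mimics the one effect that matters here (copying `date_added` onto `date_created`), not the full row replication.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Org:  # hypothetical stand-in for Organization
    id: int
    date_added: datetime


@dataclass
class Mapping:  # hypothetical stand-in for OrganizationMapping
    organization_id: int
    date_created: datetime


@dataclass
class Outbox:  # hypothetical stand-in for a CellOutbox row
    category: str
    object_identifier: int


def emit_outboxes(orgs: Dict[int, Org]) -> List[Outbox]:
    """Emit one ORGANIZATION_UPDATE outbox per organization."""
    return [Outbox("ORGANIZATION_UPDATE", org_id) for org_id in orgs]


def drain(outboxes: List[Outbox], orgs: Dict[int, Org],
          mappings: Dict[int, Mapping]) -> int:
    """Replay each outbox and return how many mappings actually changed."""
    repaired = 0
    for outbox in outboxes:
        org = orgs[outbox.object_identifier]
        mapping = mappings[org.id]
        if mapping.date_created != org.date_added:
            mapping.date_created = org.date_added
            repaired += 1
    return repaired


orgs = {
    1: Org(1, datetime(2020, 1, 1, tzinfo=timezone.utc)),
    2: Org(2, datetime(2021, 6, 1, tzinfo=timezone.utc)),
}
mappings = {
    # Org 1's mapping holds a stale insert-time timestamp; org 2's is correct.
    1: Mapping(1, datetime(2024, 3, 5, tzinfo=timezone.utc)),
    2: Mapping(2, orgs[2].date_added),
}
first_pass = drain(emit_outboxes(orgs), orgs, mappings)
second_pass = drain(emit_outboxes(orgs), orgs, mappings)
```

In this model a second pass changes nothing, which reflects why re-running the replication for every organization is safe: rows that are already correct are simply rewritten with the same value.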