chore(container-runtime): remove DisableBatchIdTracking kill-switch#27296
chore(container-runtime): remove DisableBatchIdTracking kill-switch#27296dannimad wants to merge 2 commits into
Conversation
Removes the Fluid.ContainerRuntime.DisableBatchIdTracking config kill-switch and gates batchId tracking on the existing Fluid.Container.enableOfflineFull opt-in. The internal class member is renamed to offlineEnabled to reflect its single source of truth. Containers that do not opt into Offline Load no longer run DuplicateBatchDetector; the natural off-ramp for a batchId tracking regression is now to disable enableOfflineFull. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (117 lines, 5 files), I've queued these reviewers:
How this works
|
There was a problem hiding this comment.
Pull request overview
This PR removes the Fluid.ContainerRuntime.DisableBatchIdTracking kill-switch and makes batchId tracking (including DuplicateBatchDetector and batchId stamping on resubmit) conditional on the existing Offline Load opt-in (Fluid.Container.enableOfflineFull). This narrows duplicate batch detection to the “forked container via Offline Load” scenario and updates tests/documentation accordingly.
Changes:
- Removed the
DisableBatchIdTrackingconfig kill-switch and gated batchId tracking onFluid.Container.enableOfflineFull. - Updated container-runtime tests to explicitly opt into Offline Load when exercising batchId tracking / duplicate detection behavior.
- Updated end-to-end test to rely on default-off detector behavior (no longer needs the kill-switch).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/test/test-end-to-end-tests/src/test/fewerBatches.spec.ts | Removes use of the kill-switch in the out-of-order op test and updates commentary to reflect Offline Load gating. |
| packages/runtime/container-runtime/src/test/containerRuntime.spec.ts | Updates expectations and opts tests into Offline Load where batchId tracking / duplicate detection is required. |
| packages/runtime/container-runtime/src/containerRuntime.ts | Replaces kill-switch + default-on tracking with enableOfflineFull as the single source of truth; renames the internal flag accordingly. |
| .changeset/calm-batches-track.md | Adds a release note entry describing the removal and behavior change. |
Comments suppressed due to low confidence (1)
packages/runtime/container-runtime/src/containerRuntime.ts:1908
- Same as above: this UsageError message refers to "Offline mode" while the rest of the code/comments refer to the "Offline Load" feature. Updating the wording would make the error clearer and consistent.
if (this.offlineEnabled && !this.groupedBatchingEnabled) {
const error = new UsageError("Offline mode requires grouped batching to be enabled");
this.closeFn(error);
throw error;
| Remove the `Fluid.ContainerRuntime.DisableBatchIdTracking` config kill-switch and gate batchId tracking on the Offline Load opt-in | ||
|
|
||
| The internal `Fluid.ContainerRuntime.DisableBatchIdTracking` config flag has been removed. It was previously used as a kill-switch to suppress batchId stamping and `DuplicateBatchDetector` activation when both `FlushMode.TurnBased` and grouped batching were enabled. The flag is no longer needed: batchId tracking is now enabled iff the Offline Load feature is opted into via `Fluid.Container.enableOfflineFull`, which is also the natural off-ramp if a regression is observed. | ||
|
|
||
| Containers that do not opt into Offline Load no longer run `DuplicateBatchDetector`. Forked-container duplicate detection now requires the Offline Load opt-in. |
| if (this.offlineEnabled && this._flushMode !== FlushMode.TurnBased) { | ||
| const error = new UsageError("Offline mode is only supported in turn-based mode"); | ||
| this.closeFn(error); | ||
| throw error; |
| // sent as a placeholder grouped batch to preserve their batchId (see | ||
| // OpGroupingManager.createEmptyGroupedBatch / outbox.flushEmptyBatch). | ||
| this.offlineEnabled = | ||
| this.mc.config.getBoolean("Fluid.Container.enableOfflineFull") === true; |
There was a problem hiding this comment.
Deep Review: The new runtime gate is strictly narrower than the loader's offline opt-in, which silently regresses fork detection for Fluid.Container.enableOfflineLoad consumers.
The divergence. Runtime now derives offlineEnabled from a single config:
this.offlineEnabled = this.mc.config.getBoolean("Fluid.Container.enableOfflineFull") === true;But the loader treats offline-load as enabled when any of three signals is set (packages/loader/container-loader/src/container.ts:964-968):
const offlineLoadEnabled =
this.isInteractiveClient &&
(this.mc.config.getBoolean("Fluid.Container.enableOfflineLoad") ??
this.mc.config.getBoolean("Fluid.Container.enableOfflineFull") ??
options.enableOfflineLoad !== false);The first arm — Fluid.Container.enableOfflineLoad — is a live (non-deprecated) config flag. Only the third typed-options arm is @deprecated Do not use (packages/common/container-definitions/src/loader.ts:608-616).
Concrete consequence. A host opted into offline load via Fluid.Container.enableOfflineLoad continues to wire up SerializedStateManager / closeAndGetPendingLocalState at the loader, but the runtime no longer instantiates DuplicateBatchDetector (this file, the if (this.offlineEnabled) allocation) and no longer stamps batchId on resubmit (batchId: this.offlineEnabled ? batchId : undefined). The exact forked-container scenario DuplicateBatchDetector was built to catch (#21767, #22497) goes silently undetected.
Why pre-PR was safe. Pre-PR, runtime tracking was default-on (TurnBased + grouped batching ⇒ tracking on), accidentally papering over the divergence so enableOfflineLoad consumers still got the detector. This PR removes that safety net without unifying the gates or documenting the migration. The changeset only states "Forked-container duplicate detection now requires the Offline Load opt-in" — implying a single opt-in when there are three at the loader layer.
Historical. tyler-cai-microsoft flagged this exact multi-flag/loader-runtime back-compat concern in #21767 inline on containerRuntime.ts:4134 (2024-07-10); markfields replied "I'll follow up... since none of this has shipped yet." The follow-up never landed; this PR cements the divergence.
Precedent for the union. packages/loader/container-loader/src/snapshotRefresher.ts:98-101 already accepts either Fluid.Container.enableOfflineSnapshotRefresh or Fluid.Container.enableOfflineFull — the codebase has a pattern for honoring multiple offline opt-ins at downstream layers.
Pick one:
- (a) Widen the runtime gate to
enableOfflineLoad ?? enableOfflineFull, mirroringsnapshotRefresher.ts:98-101. Lowest-risk — preserves pre-PR behavior for legacy-flag consumers without re-introducing the kill-switch. - (b) Propagate the loader's resolved
offlineLoadEnabled(the full three-arm union fromcontainer.ts:964-968) into the runtime viaIContainerContextso both layers always agree. Bigger surface change, eliminates the divergence permanently. - (c) If the narrowing is intentional, expand
.changeset/calm-batches-track.mdand theofflineEnableddoc-comment (containerRuntime.ts:1457-1464) to call out that hosts usingFluid.Container.enableOfflineLoad(or relying onIContainerLoadMode.enableOfflineLoad: true) must migrate toFluid.Container.enableOfflineFullto retainDuplicateBatchDetectorand resubmit batchId stamping. Add aUsageErroror telemetry warning when the loader has serialized-pending-state plumbing engaged but the runtime gate is off. If you go this route, also clarify theoutbox.ts:358-368"we do it always" comment to referenceenableOfflineFull, sinceflushAll's placeholder grouped-batch path is now transitively gated on the same flag.
Question for you: for consumers currently enabling offline load via Fluid.Container.enableOfflineLoad (the config flag, not the deprecated typed option), is the intent that they migrate to Fluid.Container.enableOfflineFull, or should the runtime honor the union?
Deep ReviewReviewed commit Readiness: 5/10 — MAKING PROGRESS Not ready for sign-off. The single-flag runtime gate ( Path to Ready
Context for Reviewers
For human reviewer
Review history (1 prior review)
|
|
@dannimad i don't think we should remove this yet. the feature hasn't even shipped yet. the kill switch exists for use to turn this off if we see issues running the feature. we can't remove it until we are confident the feature is working well. |
Description
Removes the
Fluid.ContainerRuntime.DisableBatchIdTrackingconfig kill-switch and ties batchId tracking (which powersDuplicateBatchDetectorand batchId stamping on resubmit) to the existingFluid.Container.enableOfflineFullopt-in. The kill-switch had no known consumers, andenableOfflineFullis the natural off-ramp if a regression in batchId tracking appears.Behavior change: containers that do not opt into Offline Load no longer run
DuplicateBatchDetector. Forked-container duplicate detection now requires the Offline Load opt-in. The internal class memberbatchIdTrackingEnabledis renamed toofflineEnabledto reflect its new single source of truth.Reviewer Guidance
The review process is outlined on this wiki page.
Process empty batch,Can roundtrip DuplicateBatchDetector state through summary/snapshot) now setenableOfflineFull: truebecause they exercise behavior that is now gated.fewerBatches.spec.ts"Op reentry submits two batches" previously used the kill-switch to suppress the detector for an unrelated artificial sequence-number reuse; it now relies on the default-off behavior with no flag.