Replace the flat master queue index with a two-level tenant dispatch to fix the noisy neighbor problem. When a tenant has many queues at capacity, the scheduler now iterates tenants (Level 1) rather than queues, then fetches per-tenant queues (Level 2) only for eligible tenants. Single-deploy migration: new enqueues write to the dispatch indexes only, while the consumer drains the old master queue alongside the new dispatch path until it is empty. refs TRI-7082
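The two-level selection described above can be sketched in plain TypeScript. This is a hypothetical, in-memory illustration only (the names `selectQueues`, `queuesByTenant`, and `isTenantAtCapacity` are stand-ins; the real implementation lives in `TenantDispatch` and reads Redis sorted sets):

```typescript
type TenantQueues = { tenantId: string; queues: string[] };

function selectQueues(
  tenants: string[],                                // Level 1: tenant ids for this shard
  queuesByTenant: Map<string, string[]>,            // Level 2: per-tenant queue index
  isTenantAtCapacity: (tenantId: string) => boolean // capacity check (e.g. concurrency manager)
): TenantQueues[] {
  const result: TenantQueues[] = [];
  for (const tenantId of tenants) {
    // Level 1: skip ineligible tenants before touching their queues,
    // so a tenant with many at-capacity queues costs one check, not N.
    if (isTenantAtCapacity(tenantId)) continue;
    // Level 2: fetch this tenant's queues only when the tenant is eligible.
    const queues = queuesByTenant.get(tenantId) ?? [];
    if (queues.length > 0) result.push({ tenantId, queues });
  }
  return result;
}
```

The key property is that a noisy tenant never forces the scheduler to iterate its queue list: ineligibility is decided at Level 1.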
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
Walkthrough

This PR implements a two-level tenant dispatch for fair-queue: a new TenantDispatch manages shard-level tenant indices (Level 1) and per-tenant queue indices (Level 2). FairQueue integrates TenantDispatch and wires it into the enqueue, batch-enqueue, dequeue/claim, release/complete/retry, and shutdown paths. New Redis Lua V2 scripts and commander commands update both index levels and propagate tenantId. The DRR scheduler gains a dispatch-based selection path with a legacy fallback; key producers, visibility flows, telemetry (dispatchLength), and public types/exports are extended. Tests for tenant dispatch, visibility, and race conditions are added/updated.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
The releaseMessage and releaseMessageBatch Lua scripts were still writing to the old master queue shards. Updated them to write to the new dispatch indexes instead, so released/reclaimed messages go into the new two-level index atomically. Also removed the legacyDrainComplete flag in favor of checking ZCARD on each iteration (O(1)), and removed the redundant legacyDrainComplete otel metric since master_queue.length already shows drain status.
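The per-iteration ZCARD check replacing the `legacyDrainComplete` flag can be sketched as follows. This is a hypothetical illustration (the function names and the injected `zcard` are stand-ins for the real shard-iteration code and its ioredis client), showing why the check is cheap and self-correcting:

```typescript
// Sketch of one shard iteration during migration. `zcard` is injected so the
// logic is testable without a live Redis; in the real code it would be an O(1)
// ZCARD call against the legacy master queue key.
async function runIteration(
  zcard: (key: string) => Promise<number>,
  drainLegacy: () => Promise<void>,
  dispatchPath: () => Promise<void>,
  masterQueueKey: string
): Promise<void> {
  // The new two-level dispatch path always runs.
  await dispatchPath();
  // The legacy drain path runs only while the old index is non-empty. Because
  // the check repeats every iteration, a straggler write to the old index
  // simply re-enables draining — no persisted "drain complete" flag needed.
  if ((await zcard(masterQueueKey)) > 0) {
    await drainLegacy();
  }
}
```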
…ests: Tests that release (retry) and reclaim (visibility timeout) correctly update the dispatch indexes instead of the master queue. Also tests that legacy pre-deploy messages in the old master queue migrate to the new dispatch index when they are reclaimed or retried.
The fallback path for schedulers without selectQueuesFromDispatch was building allQueues from the dispatch index but then ignoring it and calling scheduler.selectQueues with the old master queue key (which is empty for new messages). Fixed it to group the fetched queues by tenant directly, with capacity filtering.
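The grouping step described in that fix can be sketched like this (a minimal illustration; `extractTenantId` is assumed here, standing in for the key producer's real tenant extraction):

```typescript
// Group a flat list of queue ids by tenant. In the real code the tenant id is
// derived from the queue key by the key producer; here it is injected.
function groupByTenant(
  queueIds: string[],
  extractTenantId: (queueId: string) => string
): Map<string, string[]> {
  const byTenant = new Map<string, string[]>();
  for (const queueId of queueIds) {
    const tenantId = extractTenantId(queueId);
    const existing = byTenant.get(tenantId) ?? [];
    existing.push(queueId);
    byTenant.set(tenantId, existing);
  }
  return byTenant;
}
```

Capacity filtering would then drop entire tenants from the resulting map before any of their queues are considered.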
Force-pushed from 337420c to 4a62c4b.
🧹 Nitpick comments (2)
packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts (1)
850-863: Consider using vitest's `waitFor` helper instead of a custom implementation.

The custom `waitFor` function duplicates functionality available in vitest. However, this implementation is simple and works correctly.

♻️ Alternative using vitest:

```ts
import { vi } from "vitest";

// Usage in tests:
await vi.waitFor(
  async () => {
    expect(await condition()).toBe(true);
  },
  { timeout: 5000, interval: 50 }
);
```

This would align with the pattern used in `raceConditions.test.ts`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts` around lines 850-863: replace the custom `waitFor` helper with vitest's built-in implementation. Remove the local async function `waitFor` and import `vi` from "vitest", then update tests that call `waitFor(...)` to call `vi.waitFor(...)` (passing the same timeout/interval options), or use `vi.waitFor(() => expect(...), { timeout, interval })` as appropriate; ensure references to the function name `waitFor` are updated and any test-specific timing options are preserved.

packages/redis-worker/src/fair-queue/index.ts (1)
936-946: Unused variable `_score`.

The score is extracted but never used. Consider using an underscore-only name or restructuring the loop.

♻️ Suggested fix

```diff
 for (let i = 0; i < results.length; i += 2) {
   const queueId = results[i];
-  const _score = results[i + 1];
-  if (queueId && _score) {
+  if (queueId) {
     const tenantId = this.keys.extractTenantId(queueId);
     const existing = byTenant.get(tenantId) ?? [];
     existing.push(queueId);
     byTenant.set(tenantId, existing);
   }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/redis-worker/src/fair-queue/index.ts` around lines 936-946: the variable `_score` is assigned from `results` but never used. In the loop that builds `byTenant` (the block that does `const queueId = results[i]; const _score = results[i + 1]; ...`), either remove the unused assignment or rename it to a conventional discard name (e.g., `_`) to avoid the lint error, and keep the rest of the logic that pushes `queueId` into `byTenant` unchanged.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
- packages/redis-worker/src/fair-queue/visibility.ts
- scripts/enhance-release-pr.mjs
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/redis-worker/src/fair-queue/visibility.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.test.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptivedescribeanditblocks
Tests should avoid mocks or stubs and use the helpers from@internal/testcontainerswhen Redis or Postgres are needed
Use vitest for running unit tests
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.test.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.test.{ts,tsx,js}: Never mock anything in tests - use testcontainers instead for Redis and PostgreSQL
Test files should be placed next to source files (e.g., `MyService.ts` → `MyService.test.ts`)
Use vitest exclusively for testing
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
{packages,integrations}/**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
When modifying public packages in `packages/*` or `integrations/*`, add a changeset using `pnpm run changeset:add`
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Import from `@trigger.dev/core` using subpaths only, never the root
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
**/{src,app}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/{src,app}/**/*.{ts,tsx}: Always import Trigger.dev tasks from `@trigger.dev/sdk`. Never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and include a unique `id` string property
Files:
- packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
🧠 Learnings (3)
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to {packages/core,apps/webapp}/**/*.{ts,tsx} : Use zod for validation in packages/core and apps/webapp
Applied to files:
packages/redis-worker/src/fair-queue/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
Applied to files:
packages/redis-worker/src/fair-queue/index.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.
Applied to files:
- packages/redis-worker/src/fair-queue/index.ts
- packages/redis-worker/src/fair-queue/tests/visibility.test.ts
🧬 Code graph analysis (2)
packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts (2)
- packages/redis-worker/src/fair-queue/schedulers/drr.ts (1)
  - `keys` (271-274)
- packages/redis-worker/src/fair-queue/index.ts (4)
  - `queueId` (1586-1599), `queueId` (1601-1635), `queueId` (1637-1639), `queueId` (1650-1677)

packages/redis-worker/src/fair-queue/tests/visibility.test.ts (1)
- packages/redis-worker/src/fair-queue/keyProducer.ts (4)
  - `tenantQueueIndexKey` (83-85), `dispatchKey` (79-81), `queueKey` (39-41), `queueItemsKey` (43-45)
🔇 Additional comments (12)
scripts/enhance-release-pr.mjs (1)
305-309: Good fix for multiline markdown list rendering.

This keeps continuation lines under the same bullet item in "Server changes," preventing broken list formatting.
packages/redis-worker/src/fair-queue/tests/visibility.test.ts (2)
450-607: LGTM! Test updates correctly align with the new two-level dispatch API.

The `releaseBatch` tests properly use the new signature with `tenantQueueIndexKey`, `dispatchKey`, and `tenantId` parameters. The assertions verify both index updates after release operations.

610-868: LGTM! `reclaimTimedOut` callback updated consistently across all test cases.

The callback shape now correctly provides `tenantQueueIndexKey`, `dispatchKey`, and `tenantId` instead of the old `masterQueueKey`, aligning with the two-level dispatch changes in the visibility manager.

packages/redis-worker/src/fair-queue/index.ts (4)
117-119: LGTM! TenantDispatch integration properly initialized.

The two-level tenant dispatch is correctly wired into the FairQueue lifecycle, with proper initialization and cleanup in the `close()` method.

816-906: LGTM! Dual-path processing correctly implements the migration strategy.

`#processShardIteration` properly runs both the new tenant dispatch path and the legacy drain path. The O(1) `ZCARD` check on line 832 efficiently skips the drain path when the legacy master queue is empty.

1786-1902: LGTM! V2 Lua scripts correctly implement two-level index updates.

The scripts properly:
- Update per-queue storage
- Update Level 2 (tenant queue index) with the queue's oldest message timestamp
- Update Level 1 (dispatch index) with the tenant's oldest timestamp across all queues
- Handle cleanup when queues/tenants become empty in `updateDispatchIndexes`
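For illustration, the bookkeeping these scripts perform can be sketched in plain TypeScript. This is a hypothetical stand-in, not the actual Lua: the in-memory maps play the role of Redis sorted sets, and the function name merely mirrors the script's:

```typescript
type ZSet = Map<string, number>; // member -> score (oldest message timestamp)

// Maintain both index levels after a queue's contents change.
// `oldestInQueue` is undefined when the queue is now empty.
function updateDispatchIndexes(
  tenantQueueIndex: ZSet, // Level 2: queueId -> oldest msg timestamp in that queue
  dispatchIndex: ZSet,    // Level 1: tenantId -> oldest timestamp across the tenant's queues
  tenantId: string,
  queueId: string,
  oldestInQueue: number | undefined
): void {
  if (oldestInQueue === undefined) {
    tenantQueueIndex.delete(queueId); // queue drained: drop it from Level 2
  } else {
    tenantQueueIndex.set(queueId, oldestInQueue);
  }
  if (tenantQueueIndex.size === 0) {
    dispatchIndex.delete(tenantId); // tenant fully drained: drop it from Level 1
  } else {
    // Level 1 score is the minimum (oldest) score across the tenant's queues.
    dispatchIndex.set(tenantId, Math.min(...Array.from(tenantQueueIndex.values())));
  }
}
```

The real scripts do this atomically inside Redis; the sketch only shows the invariant each level maintains.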
1645-1677: LGTM! Dual-index update handles migration correctly.

The `#updateAllIndexesAfterDequeue` method properly updates both the legacy master queue and the new dispatch indexes. Both calls are idempotent (ZREM on a non-existent member is a no-op), correctly handling queues that may exist in either or both indexes during the migration period.

packages/redis-worker/src/fair-queue/tests/raceConditions.test.ts (1)

627-633: LGTM! `reclaimTimedOut` callback updated to match the new API.

The callback correctly provides the new dispatch-related keys (`tenantQueueIndexKey`, `dispatchKey`, `tenantId`) instead of the old `masterQueueKey`.

packages/redis-worker/src/fair-queue/tests/tenantDispatch.test.ts (4)
1-170: LGTM! Well-structured test helper for two-level dispatch tests.

The `TestHelper` class cleanly wraps FairQueue setup with sensible defaults and provides a clean message handler interface. The consumer loop correctly handles message parsing and lifecycle operations.

515-515: Unused variable `blockT1`.

This variable is declared but never read. Remove it to clean up the test.

🧹 Remove unused variable

```diff
 const processed: Array<{ tenantId: string; value: string }> = [];
-let blockT1 = true;
```
393-458: LGTM! Legacy drain test correctly validates migration behavior.

The test properly simulates pre-deploy state by writing directly to the old master queue and per-queue storage, then verifies that both legacy and new messages are processed and the master queue is drained.

714-846: LGTM! Legacy migration tests cover both reclaim and retry paths.

These tests ensure that messages originally in the legacy master queue correctly migrate to the new dispatch indexes when:
- A message times out and is reclaimed
- A message fails and is scheduled for retry
This validates that the single-deploy migration strategy works correctly.
add server changeset file
```diff
 if (claimedMessages.length === 0) {
-  // Queue is empty, update master queue and clean up caches
-  const removed = await this.redis.updateMasterQueueIfEmpty(masterQueueKey, queueKey, queueId);
-  if (removed === 1) {
-    this.queueDescriptorCache.delete(queueId);
-    this.queueCooloffStates.delete(queueId);
-  }
+  // Queue is empty, update both old and new indexes and clean up caches
+  await this.#updateAllIndexesAfterDequeue(queueId, tenantId);
+  this.queueDescriptorCache.delete(queueId);
+  this.queueCooloffStates.delete(queueId);
```
🟡 Unconditional cache cleanup when claim returns empty, unlike old conditional behavior
When claimBatch returns 0 messages in #claimAndPushToWorkerQueue, the new code unconditionally deletes from queueDescriptorCache and queueCooloffStates without checking whether the queue is actually empty. The old code only cleaned up caches when updateMasterQueueIfEmpty returned 1 (confirming the queue was empty).
Root Cause and Impact
The old code at packages/redis-worker/src/fair-queue/index.ts (pre-PR) was:

```ts
const removed = await this.redis.updateMasterQueueIfEmpty(masterQueueKey, queueKey, queueId);
if (removed === 1) {
  this.queueDescriptorCache.delete(queueId);
  this.queueCooloffStates.delete(queueId);
}
```

The new code at lines 1130-1134 is:

```ts
await this.#updateAllIndexesAfterDequeue(queueId, tenantId);
this.queueDescriptorCache.delete(queueId);
this.queueCooloffStates.delete(queueId);
```

Although claimBatch returning 0 messages strongly implies the queue is empty, there's a race window where a new message could be enqueued between the claim and the index update. In that case, #updateAllIndexesAfterDequeue would correctly update scores (not remove), but the caches would still be prematurely cleared. The queueDescriptorCache entry would be re-created on the next enqueue, but clearing the queueCooloffStates entry loses any accumulated cooloff counter, potentially allowing a queue that was approaching cooloff to reset its counter.
Impact: Minor — the race window is very small and the worst case is a cooloff counter reset or a cache miss requiring re-population.
Suggested fix:

```diff
 if (claimedMessages.length === 0) {
   // Queue is empty, update both old and new indexes and clean up caches
-  await this.#updateAllIndexesAfterDequeue(queueId, tenantId);
-  this.queueDescriptorCache.delete(queueId);
-  this.queueCooloffStates.delete(queueId);
+  const { queueEmpty } = await this.#updateAllIndexesAfterDequeue(queueId, tenantId);
+  if (queueEmpty) {
+    this.queueDescriptorCache.delete(queueId);
+    this.queueCooloffStates.delete(queueId);
+  }
   return 0;
```
```ts
async #fallbackDispatchToLegacyScheduler(
  loopId: string,
  shardId: number,
  context: DispatchSchedulerContext,
  parentSpan?: Span
): Promise<TenantQueues[]> {
  // Get tenants from dispatch
  const tenants = await this.tenantDispatch.getTenantsFromShard(shardId);
  if (tenants.length === 0) return [];

  // For each tenant, get their queues and build grouped result
  const tenantQueues: TenantQueues[] = [];
  for (const { tenantId } of tenants) {
    if (this.concurrencyManager) {
      const atCapacity = await this.concurrencyManager.isAtCapacity("tenant", tenantId);
      if (atCapacity) continue;
    }
    const queues = await this.tenantDispatch.getQueuesForTenant(tenantId);
    if (queues.length > 0) {
      tenantQueues.push({ tenantId, queues: queues.map((q) => q.queueId) });
    }
  }

  return tenantQueues;
}
```
🚩 Fallback scheduler path doesn't use DRR deficit tracking

The `#fallbackDispatchToLegacyScheduler` method (lines 969-993) is the code path for schedulers that don't implement `selectQueuesFromDispatch`. It reads tenants from dispatch and fetches their queues, but doesn't call `selectQueues` on the scheduler — it just returns all eligible tenants' queues directly. This means schedulers without `selectQueuesFromDispatch` lose their fairness algorithm entirely when using the dispatch path. The old code would have called `selectQueues` with the master queue key. This path should only be hit by non-DRR schedulers (e.g., round-robin, weighted), and it at least filters at-capacity tenants.
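For context, the deficit tracking this fallback loses can be sketched as a minimal deficit round-robin over tenants. This is an illustrative sketch only (the quantum/cost model and names are hypothetical; the actual DRR scheduler tracks deficits in Redis, not in memory):

```typescript
type Deficits = Map<string, number>;

// One DRR round: each tenant earns `quantum` credit; a tenant is dispatched
// only when its accumulated deficit covers `cost`, otherwise the credit
// carries over to the next round. This caps any one tenant's share per round.
function drrPick(
  tenants: string[],
  deficits: Deficits,
  cost: number,    // credit required to dispatch one batch
  quantum: number  // credit granted per round
): string[] {
  const picked: string[] = [];
  for (const tenant of tenants) {
    const credit = (deficits.get(tenant) ?? 0) + quantum;
    if (credit >= cost) {
      picked.push(tenant);
      deficits.set(tenant, credit - cost); // spend the credit
    } else {
      deficits.set(tenant, credit); // carry the deficit forward
    }
  }
  return picked;
}
```

The fallback path returns every eligible tenant's queues with no such accounting, which is exactly the fairness loss the comment flags.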