
feat(scheduler): dispatch applies from workers #136

Draft
aparajon wants to merge 12 commits into main from armand/scheduler-owned-dispatch


@aparajon (Collaborator) commented May 21, 2026

Summary

  • Queue apply requests durably in SchemaBot storage, then use a best-effort scheduler wake so fresh applies usually start immediately
  • Make scheduler workers own claimed execution until the apply reaches a durable terminal or retry-waiting state, instead of only launching background work
  • Keep retry-waiting applies visible until the stale-heartbeat window elapses before the scheduler retries them, preventing tight retry loops while preserving recovery
  • Route stopped apply starts back through the scheduler: /start marks the apply claimable, wakes workers, and returns accepted/rejected without running schema change work in the HTTP handler
  • Preserve the clear deferred-deploy rejection when /start is called before an apply reaches waiting_for_deploy
  • Update local and gRPC Tern clients to run queued or resumed applies from scheduler claims while preserving observers, options, tasks, and remote apply IDs
  • Serve queued gRPC progress from local storage until the data-plane apply ID exists, reporting the handoff as pending instead of polling the data plane with a control-plane ID
  • Release gRPC scheduler workers when a known remote apply is not found or repeatedly reports no active progress, marking the local apply failed instead of polling forever
  • Refresh stopped MySQL task state from data-plane progress so stop/start flows remain startable when scheduler progress polling races with a user stop
  • Guard atomic/deferred resumes with a final schema check so work that completed near stop/cutover is marked complete instead of reapplied
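
The retry-waiting and heartbeat bullets above can be read as a single claimability rule. The following is a minimal sketch of that rule, not the PR's implementation: the `State` values, the `Claimable` function, and the idea that a running apply with a stale heartbeat becomes reclaimable are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// State is a hypothetical apply state; the names are illustrative only.
type State string

const (
	StatePending      State = "pending"
	StateRunning      State = "running"
	StateRetryWaiting State = "retry_waiting"
	StateCompleted    State = "completed"
)

// Claimable reports whether a scheduler worker may claim an apply,
// given how long ago its heartbeat was last written.
func Claimable(s State, sinceHeartbeat, staleAfter time.Duration) bool {
	switch s {
	case StatePending:
		// Freshly queued or restarted work is claimable immediately.
		return true
	case StateRunning, StateRetryWaiting:
		// A live lease blocks other workers. Retry-waiting applies also
		// wait out the window, so a failing apply cannot loop tightly.
		return sinceHeartbeat >= staleAfter
	default:
		// Terminal states are never claimed again.
		return false
	}
}

func main() {
	stale := 60 * time.Second
	fmt.Println(Claimable(StatePending, 0, stale))                   // true
	fmt.Println(Claimable(StateRetryWaiting, 10*time.Second, stale)) // false
	fmt.Println(Claimable(StateRetryWaiting, 90*time.Second, stale)) // true
	fmt.Println(Claimable(StateCompleted, 90*time.Second, stale))    // false
}
```

Under these assumptions, the same window that lets another worker recover an abandoned apply also serves as the retry backoff, so no separate retry timer is needed.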

The wake fast path keeps the first-time CLI/webhook experience responsive: after an apply is accepted or restarted, users should see work start right away instead of waiting for the next scheduler poll. It is not a second queue; workers still claim from storage, and the normal polling loop remains the correctness path if a wake is dropped or coalesced.

POST /apply or /start
    |
    v
SchemaBot API ------ create/mark pending apply ------> Storage
    |                                                 ^
    | best-effort wake                               | claim + heartbeat lease
    v                                                 |
Scheduler worker ----------------------------> local/gRPC Tern
    |                                                 |
    +------ owns execution until terminal/retry-waiting state
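
One common way to get a best-effort wake that coalesces and can be dropped safely is a buffered channel of size one paired with a poll timer. The sketch below assumes that approach; the `Waker` type and its methods are hypothetical names, and the PR may implement the wake differently.

```go
package main

import (
	"fmt"
	"time"
)

// Waker is a hypothetical coalescing wake signal: at most one wake is
// pending at a time, and sending never blocks the caller.
type Waker struct{ ch chan struct{} }

func NewWaker() *Waker { return &Waker{ch: make(chan struct{}, 1)} }

// Wake returns true if the signal was queued and false if a wake was
// already pending (coalesced). It never blocks the HTTP handler.
func (w *Waker) Wake() bool {
	select {
	case w.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// Wait blocks until a wake arrives or the poll interval elapses.
// It returns true when woken and false when the poll timer fired.
// Either way the worker goes on to claim from storage, so a lost
// wake only delays work by at most one poll interval.
func (w *Waker) Wait(poll time.Duration) bool {
	t := time.NewTimer(poll)
	defer t.Stop()
	select {
	case <-w.ch:
		return true
	case <-t.C:
		return false
	}
}

func main() {
	w := NewWaker()
	fmt.Println(w.Wake())                      // true: signal queued
	fmt.Println(w.Wake())                      // false: coalesced with the pending wake
	fmt.Println(w.Wait(time.Second))           // true: woken without waiting for the poll
	fmt.Println(w.Wait(10 * time.Millisecond)) // false: no wake, poll timer fired
}
```

The wake carries no payload, which matches the description above: it is a hint to look at storage sooner, not a second queue.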

🤖 Generated by Codex.

Copilot AI review requested due to automatic review settings May 21, 2026 15:48

Copilot AI left a comment


Pull request overview

This PR shifts apply execution to a durable queue plus scheduler-worker dispatch model. HTTP/webhook requests persist the apply and return quickly, while background workers claim and start engine work (local and gRPC Tern), preserving observers and IDs.

Changes:

  • Updated Service.ExecuteApply to enqueue apply/task records durably and wake scheduler workers instead of dispatching engine work inline.
  • Extended Local and gRPC Tern clients to dispatch queued applies from scheduler claims (including external_id syncing for gRPC).
  • Updated integration/E2E/unit tests to start the scheduler where applies now require worker dispatch.
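
For the gRPC path, the PR description mentions serving progress locally until the remote ID exists and releasing the worker when the remote apply is missing or repeatedly idle. A minimal sketch of that decision logic follows. `PollTracker`, `Decision`, `ErrRemoteNotFound`, the `MaxIdle` threshold, and the handling of transient errors are all assumptions for illustration, not names or behavior taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRemoteNotFound is a hypothetical sentinel meaning the data plane
// does not know the apply ID it was asked about.
var ErrRemoteNotFound = errors.New("remote apply not found")

// Decision is what the worker does after one progress poll.
type Decision string

const (
	ServeLocalPending Decision = "serve_local_pending"
	KeepPolling       Decision = "keep_polling"
	MarkFailed        Decision = "mark_failed"
)

// PollTracker counts consecutive polls that reported no active progress.
type PollTracker struct {
	MaxIdle int // assumed threshold before giving up
	idle    int
}

// Observe classifies one poll result for an apply.
func (t *PollTracker) Observe(externalID string, active bool, err error) Decision {
	switch {
	case externalID == "":
		// No data-plane ID yet: report the handoff as pending from
		// local storage instead of polling with a control-plane ID.
		return ServeLocalPending
	case errors.Is(err, ErrRemoteNotFound):
		return MarkFailed
	case err != nil:
		// Assumption: other errors are transient and do not count as idle.
		return KeepPolling
	case active:
		t.idle = 0
		return KeepPolling
	}
	t.idle++
	if t.idle >= t.MaxIdle {
		return MarkFailed
	}
	return KeepPolling
}

func main() {
	t := &PollTracker{MaxIdle: 3}
	fmt.Println(t.Observe("", false, nil))         // serve_local_pending
	fmt.Println(t.Observe("remote-1", true, nil))  // keep_polling
	fmt.Println(t.Observe("remote-1", false, nil)) // keep_polling
	fmt.Println(t.Observe("remote-1", false, nil)) // keep_polling
	fmt.Println(t.Observe("remote-1", false, nil)) // mark_failed
}
```

Resetting the idle counter on any active poll keeps a slow but healthy apply from being failed by isolated empty responses.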

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/webhook/webhook_e2e_test.go | Starts scheduler in webhook E2E setup so queued applies are dispatched. |
| pkg/webhook/comment_observer.go | Updates observer documentation for queued/scheduler-based dispatch timing. |
| pkg/webhook/apply_handlers.go | Aligns webhook handler comments with enqueue-before-dispatch flow. |
| pkg/tern/observer.go | Clarifies pending-observer semantics vs API service’s pending-observer registry. |
| pkg/tern/local_control_resume.go | Adds queued-apply dispatch path for pending applies during resume. |
| pkg/tern/local_client.go | Updates pendingObserver comment to match new direct-Apply-only usage. |
| pkg/tern/grpc_client.go | Implements queued apply dispatch for gRPC mode and stores external_id after dispatch; adjusts metrics on completion. |
| pkg/tern/client.go | Updates client interface docs to reflect scheduler-claimed work semantics. |
| pkg/api/service.go | Adds scheduler synchronization primitives and per-target pending observer registry. |
| pkg/api/scheduler.go | Adds worker wakeups for queued applies and mutex-protects scheduler lifecycle. |
| pkg/api/proto_helpers.go | Removes now-unused helper for converting stored table changes to proto changes. |
| pkg/api/progress_handlers.go | Maps VSchema change type to vschema_update for API responses. |
| pkg/api/plan_handlers.go | Reworks apply to enqueue durable apply/task rows, attach observer, wake scheduler, and return apply_identifier. |
| pkg/api/handlers_test.go | Adds coverage for enqueueing + scheduler wake behavior and updates handler expectations (no inline dispatch). |
| integration/workflow_test.go | Starts scheduler in integration server setup. |
| integration/scheduler_test.go | Stops scheduler explicitly in test to validate subsequent recovery behavior. |
| integration/resolve_apply_id_test.go | Waits for scheduler to dispatch queued gRPC apply and populate external_id. |
| integration/hybrid_mode_test.go | Starts scheduler for hybrid local/remote targets. |
| integration/grpc_integration_test.go | Passes storage to gRPC client and waits for scheduler-dispatched external_id; starts scheduler. |
| integration/cli_test.go | Starts scheduler in CLI integration setups and passes storage to gRPC client. |


Comment thread pkg/api/scheduler.go Outdated
Comment thread pkg/api/plan_handlers.go
Comment thread pkg/tern/grpc_client.go Outdated
@aparajon aparajon force-pushed the armand/scheduler-owned-dispatch branch 4 times, most recently from 2472dfa to 1419f3d May 21, 2026 17:20
@aparajon aparajon marked this pull request as ready for review May 21, 2026 17:25
@aparajon aparajon requested review from Kiran01bm and morgo as code owners May 21, 2026 17:25
@aparajon aparajon force-pushed the armand/scheduler-owned-dispatch branch from 1419f3d to 4a8eb4b May 21, 2026 17:45
@aparajon aparajon force-pushed the armand/scheduler-owned-dispatch branch from 9b1402f to e399bdf May 21, 2026 19:51
@aparajon aparajon marked this pull request as draft May 21, 2026 23:54

2 participants