
[scheduler/cuebot/pycue/rqd/pyoutline] Booking by slot limit #2101

Closed
DiegoTavares wants to merge 102 commits into AcademySoftwareFoundation:new-scheduler from DiegoTavares:scheduler_resource_naive_mode

@DiegoTavares (Collaborator) commented Dec 11, 2025

Add a new booking mode that ignores cores and memory and instead enforces a predefined limit on how many frames a host may run concurrently.

Rationale: Booking by slot is useful for pipelines whose frames are small and limited not by their
CPU/memory consumption but by other resources, such as storage bandwidth or network availability. In
these scenarios, limiting concurrency matters more than tracking resource consumption.

Attention: This branch is stacked on top of #2002.

Tasks:

  • Implement booking logic on Scheduler
  • Add new columns to Host to record how many slots are available and populate them in Cuebot
  • Add a new column to Layer to define the slot limit and implement the logic to populate it in Cuebot
  • Handle new attributes on Host and Layer using pycue
  • Handle new attributes on Host and Layer using pyoutline
  • Handle new attributes on Host and Layer using cuegui
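
The booking rule implied by these tasks can be sketched in Python as below. This is a hypothetical illustration, not the scheduler's implementation: `Host`, `Layer`, `slot_limit`, `running_slots`, `slots_per_frame`, `can_book`, and `book_frames` are assumed names, and the sketch assumes each frame consumes a fixed number of slots defined on its layer. Cores and memory are deliberately never consulted.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    slot_limit: int          # hypothetical: max concurrent frames allowed on this host
    running_slots: int = 0   # hypothetical: slots currently in use

    @property
    def free_slots(self) -> int:
        return self.slot_limit - self.running_slots


@dataclass
class Layer:
    name: str
    slots_per_frame: int = 1  # hypothetical: slots one frame of this layer consumes


def can_book(host: Host, layer: Layer) -> bool:
    """True if the host has enough free slots for one more frame of the layer."""
    return host.free_slots >= layer.slots_per_frame


def book_frames(host: Host, layer: Layer, pending: int) -> int:
    """Book as many pending frames as the host's free slots allow.

    Only slot counts matter here; cores and memory are ignored on purpose.
    Returns the number of frames booked and updates host.running_slots.
    """
    booked = 0
    while booked < pending and can_book(host, layer):
        host.running_slots += layer.slots_per_frame
        booked += 1
    return booked
```

For example, a host with `slot_limit=4` books at most four single-slot frames no matter how many cores or how much memory it reports.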

- Implement FrameRange and FrameSet structs to parse and represent complex frame range syntaxes,
  including stepped, inverse stepped, negative-step, and interleaved ranges
- Support chunking FrameSets into compact sub-ranges for dispatching
- Integrate FrameSet chunking in RqdDispatcher for precise frame chunking
- Improve dispatch error handling with distinct error types
- Update host DAO and models to include allocation info for resource checks
- Add .gitignore entry for /sandbox/kafka*
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
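
A rough Python sketch of frame-set parsing and chunking along the lines of this commit. It assumes Cue-style syntax (`A-B`, `A-BxS` for stepped, `A-ByS` for inverse stepped, a negative `S` for descending ranges, comma-separated parts) and omits interleaved ranges because their exact ordering is not described here. All function names are hypothetical; the actual `FrameRange`/`FrameSet` structs may behave differently.

```python
import re

# Assumed syntax: "N", "A-B", "A-BxS" (every S-th frame), "A-ByS" (frames of A-B NOT in A-BxS).
_SPEC = re.compile(r"^(-?\d+)(?:-(-?\d+)(?:([xy])(-?\d+))?)?$")


def _expand(start: int, end: int, step: int) -> list[int]:
    """Inclusive range from start to end, moving by step."""
    if step == 0:
        raise ValueError("step cannot be zero")
    if (end - start) * step < 0:
        raise ValueError(f"step {step} never reaches {end} from {start}")
    return list(range(start, end + (1 if step > 0 else -1), step))


def parse_frame_range(spec: str) -> list[int]:
    m = _SPEC.match(spec.strip())
    if not m:
        raise ValueError(f"invalid frame range: {spec!r}")
    start = int(m.group(1))
    if m.group(2) is None:
        return [start]
    end = int(m.group(2))
    default_step = 1 if end >= start else -1
    if m.group(3) is None:
        return _expand(start, end, default_step)
    stepped = _expand(start, end, int(m.group(4)))
    if m.group(3) == "x":
        return stepped
    # "y" (inverse step): every frame of the full range that the step skips.
    skip = set(stepped)
    return [f for f in _expand(start, end, default_step) if f not in skip]


def parse_frame_set(spec: str) -> list[int]:
    """Comma-separated ranges; duplicates dropped, first-seen order kept."""
    seen: set[int] = set()
    frames: list[int] = []
    for part in spec.split(","):
        for f in parse_frame_range(part):
            if f not in seen:
                seen.add(f)
                frames.append(f)
    return frames


def compact(frames: list[int]) -> str:
    """Render a non-empty chunk as the shortest equivalent range string."""
    if len(frames) == 1:
        return str(frames[0])
    step = frames[1] - frames[0]
    if all(b - a == step for a, b in zip(frames, frames[1:])):
        base = f"{frames[0]}-{frames[-1]}"
        return base if step in (1, -1) else f"{base}x{step}"
    return ",".join(str(f) for f in frames)  # uneven spacing: fall back to a list


def chunk_frame_set(spec: str, chunk_size: int) -> list[str]:
    """Split a frame set into chunks of chunk_size frames, each compacted."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    frames = parse_frame_set(spec)
    return [compact(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]
```

With these assumptions, `chunk_frame_set("1-10x2", 3)` yields `["1-5x2", "7-9x2"]`: each chunk stays a short range string rather than an explicit frame list, and parsing a chunk string gives back exactly the frames in that chunk.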
The producer module publishes an event to Kafka for each pending job. The consumer modules consume
these events and book jobs on hosts, still relying on the database.
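
The producer/consumer split can be illustrated with a minimal in-process Python sketch. Here `queue.Queue` stands in for the Kafka topic, and `PendingJobEvent`, `produce`, `consume`, and the `book_job` callback are hypothetical names; real code would use a Kafka client and the database-backed booking path.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PendingJobEvent:
    """Hypothetical event payload: one per pending job."""
    job_id: str


def produce(topic: queue.Queue, pending_job_ids: list[str]) -> None:
    """Producer: publish one event per pending job."""
    for job_id in pending_job_ids:
        topic.put(PendingJobEvent(job_id))


def consume(topic: queue.Queue, book_job: Callable[[str], None],
            stop: threading.Event) -> None:
    """Consumer: drain events and hand each job to the booking callback."""
    while not (stop.is_set() and topic.empty()):
        try:
            event = topic.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing queued yet; re-check the stop flag
        try:
            book_job(event.job_id)  # stand-in for the database-backed booking
        finally:
            topic.task_done()


if __name__ == "__main__":
    topic: queue.Queue = queue.Queue()
    booked: list[str] = []
    stop = threading.Event()
    worker = threading.Thread(target=consume, args=(topic, booked.append, stop))
    worker.start()
    produce(topic, ["job-a", "job-b", "job-c"])
    topic.join()   # block until every event has been processed
    stop.set()
    worker.join()
    print(booked)  # ['job-a', 'job-b', 'job-c']
```

Decoupling the two sides this way lets the producer keep publishing while consumers book at their own pace; with a single consumer, events are handled in publish order.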
This version still has an issue when multiple tests run at the same time: the tests share a single
database instance and each one relies on that instance existing in order to work.
Optimized async + pgpool interaction, but still far from perfect.
Last commit before giving up on dashmap
HostDao protects against processing multiple bookings on a single host at the same time by taking a
database lock. This protection is intended for multiple scheduler instances running at the same
time. However, it was also being triggered with a single instance, which indicated a race
condition.

The race condition happens because a host can belong to multiple groups at the same time.
Use a central host store to prevent a split-brain condition when a host belongs to multiple clusters
at the same time.
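
A minimal Python sketch of the central-host-store idea, assuming in-process threads play the role of per-cluster workers. `HostStore`, `HostState`, `free_slots`, `upsert`, and `try_book` are hypothetical names, not the scheduler's API. Clusters refer to hosts only by id and every booking goes through one shared object guarded by a per-host lock, so two clusters that share a host cannot both reserve its last slot.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class HostState:
    host_id: str
    free_slots: int  # hypothetical field used for illustration
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


class HostStore:
    """Single process-wide owner of host state.

    A host in several clusters is one shared HostState, not one copy per
    cluster, which is the duplication that allows a split brain.
    """

    def __init__(self) -> None:
        self._hosts: dict[str, HostState] = {}
        self._guard = threading.Lock()  # protects the dict itself

    def upsert(self, host_id: str, free_slots: int) -> None:
        """Insert a host or refresh its state (e.g. from fresh stats)."""
        with self._guard:
            host = self._hosts.get(host_id)
            if host is None:
                self._hosts[host_id] = HostState(host_id, free_slots)
                return
        with host.lock:
            host.free_slots = free_slots

    def try_book(self, host_id: str, slots: int = 1) -> bool:
        """Atomically reserve slots; False if the host is unknown or full."""
        with self._guard:
            host = self._hosts.get(host_id)
        if host is None:
            return False
        with host.lock:  # serializes bookings coming from any cluster
            if host.free_slots < slots:
                return False
            host.free_slots -= slots
            return True
```

The per-host lock keeps contention local: bookings on different hosts never wait on each other, while the short-lived `_guard` only protects lookups in the shared map.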
In addition, use host_stats for up-to-date memory information when updating the host cache.
To simplify testing, these changes are being migrated to a new PR
Entries were migrated to a new PR isolating the feature they were related to
The new option is defined as:
```yaml
```
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
This field limits the number of concurrent frames allowed to run on a specific host.
This commit is the first step toward a new booking mode that ignores cores and memory and instead
enforces a predefined limit on how many frames a host may run concurrently.

@DiegoTavares marked this pull request as draft on December 11, 2025 18:54
@DiegoTavares changed the base branch from master to new-scheduler on December 12, 2025 17:44
@DiegoTavares (Collaborator, Author)

A new PR has been created to handle the branch stacking: #2105

