Add support for Swarming preprocess queue and task scheduling #5282
IvanBM18 wants to merge 4 commits into
4993e5d to 99bf603
from clusterfuzz._internal.metrics import logs

PREPROCESS_TARGET_SIZE_DEFAULT = 10000
SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT = 5
The Swarming pool has a hard limit of 25 (Linux) bots, each running 1 task at a time.
- At an average of 1 hour per fuzzing/Swarming task, those 25 bots finish roughly 4 to 5 tasks every 10 minutes (the interval at which the cron job runs).
- Because the 2,000 (in prod) preprocess workers instantly process the preprocess queue, the target size acts as an injection rate more than a buffer.
So injecting 5 tasks every 10 minutes matches the expected Swarming rate and prevents an infinitely growing backlog of stale tasks. This is still only the default value; the real value is managed through a feature flag, which we will tweak later based on metrics and on how Swarming handles this workload, so that we end up with a more accurate value.
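The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch: the function name and its parameters are illustrative and are not part of ClusterFuzz; the inputs are the figures quoted in the comment.

```python
def tasks_completed_per_interval(bots, avg_task_minutes, interval_minutes):
  """Average number of tasks a fully busy pool finishes per cron interval."""
  # Each bot runs one task at a time, so on average it completes
  # interval_minutes / avg_task_minutes tasks during one interval.
  return bots * interval_minutes / avg_task_minutes


# Figures from the comment: 25 bots, ~1 hour per task, cron every 10 minutes.
rate = tasks_completed_per_interval(
    bots=25, avg_task_minutes=60, interval_minutes=10)
print(round(rate, 2))
```

The result lands between 4 and 5, which is why a default of 5 tasks per cron run roughly matches what the pool can absorb.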
fernandofloresg left a comment
lgtm just had one question
PREPROCESS_QUEUE_SIZE_LIMIT = 'preprocess_queue_size_limit'

SWARMING_REMOTE_EXECUTION = 'swarming_remote_execution'
# TODO(ibarba): Set this value based off dev & stage metrics and tests.
Is this still true? If this is going to master, then is it going to stage and prod?
Yes, this is still true. Right now no Swarming-related code executes outside of dev; we have a ton of feature flags in place for this reason. So when these changes reach stage/prod, they are going to be safe.
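The gating described in this reply could look roughly like the sketch below. It is a simplified assumption, not the ClusterFuzz feature-flag API: only the flag string `swarming_remote_execution` comes from the diff, while the `flag_states` dictionary and both function names are hypothetical.

```python
SWARMING_REMOTE_EXECUTION = 'swarming_remote_execution'


def swarming_enabled(flag_states):
  """True only when the flag is explicitly enabled; a missing flag means off."""
  # flag_states is a hypothetical {flag_name: bool} mapping for this sketch.
  return flag_states.get(SWARMING_REMOTE_EXECUTION, False) is True


def backends_to_schedule(flag_states):
  """Batch is always scheduled; Swarming only when its flag is on."""
  backends = ['batch']
  if swarming_enabled(flag_states):
    backends.append('swarming')
  return backends


dev_backends = backends_to_schedule({SWARMING_REMOTE_EXECUTION: True})
prod_backends = backends_to_schedule({})
print(dev_backends, prod_backends)
```

Defaulting to "off" when the flag is absent is what keeps environments without the flag on the existing Batch-only path.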
Overview
This change adds support for scheduling tasks to the new Swarming backend. Because Swarming uses a different execution model, and so that we can later account for backpressure, it requires its own separate preprocess queue and a much lower default target size (5) to prevent unbounded task queuing. By refactoring the cron scheduling logic, we can now feed the Swarming and Batch environments simultaneously, each at its own ideal rate.
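To make the idea of separate target sizes concrete, here is a minimal sketch of topping up two queues independently on each cron run. It assumes the scheduler fills each queue up to its target size; the `tasks_to_inject` helper and the example queue depths are hypothetical, and only the two default constants come from the diff.

```python
PREPROCESS_TARGET_SIZE_DEFAULT = 10000
SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT = 5


def tasks_to_inject(target_size, current_queue_size):
  """Number of tasks needed to bring a queue back up to its target size."""
  # Never return a negative count when the queue is already over target.
  return max(0, target_size - current_queue_size)


# Hypothetical queue depths observed at the start of one cron run.
batch_count = tasks_to_inject(PREPROCESS_TARGET_SIZE_DEFAULT, 2500)
swarming_count = tasks_to_inject(SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT, 0)
print(batch_count, swarming_count)
```

When the queue is drained between runs, as described for the preprocess workers, the Swarming count equals the target size on every run, which is why the target behaves like an injection rate.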
Changes
- SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT set to 5 to support the Swarming backend's task capacity needs.
- […] (SWARMING_QUEUES).
- BaseFuzzTaskScheduler turned into an abstract base class with a generic _schedule_fuzz_tasks method to support multiple backends.
- ChromeFuzzTaskScheduler changed to independently schedule Swarming tasks (_schedule_swarming_fuzz_tasks) alongside standard Batch tasks.
- schedule_fuzz_test.py changed to match the updated scheduler class signatures.
TODO
src/clusterfuzz/_internal/base/feature_flags.py: Update this value based off dev & stage metrics and tests.
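The scheduler refactor listed under Changes could be shaped roughly as follows. Only the class and method names (BaseFuzzTaskScheduler, ChromeFuzzTaskScheduler, _schedule_fuzz_tasks, _schedule_swarming_fuzz_tasks) come from the PR description; the signatures, the `get_fuzz_tasks` hook, the list-based queues, and every method body are assumptions made for illustration and do not reflect the actual implementation.

```python
import abc


class BaseFuzzTaskScheduler(abc.ABC):
  """Hypothetical shape of the abstract scheduler; not the real class."""

  @abc.abstractmethod
  def get_fuzz_tasks(self):
    """Returns the candidate tasks this scheduler wants to enqueue."""

  def _schedule_fuzz_tasks(self, queue, target_size):
    """Generic helper: enqueues at most target_size tasks onto queue."""
    tasks = self.get_fuzz_tasks()[:target_size]
    queue.extend(tasks)
    return len(tasks)


class ChromeFuzzTaskScheduler(BaseFuzzTaskScheduler):
  """Feeds a Swarming queue through the shared generic helper."""

  def __init__(self, candidates):
    self._candidates = list(candidates)

  def get_fuzz_tasks(self):
    return list(self._candidates)

  def _schedule_swarming_fuzz_tasks(self, swarming_queue, target_size=5):
    # Assumed default of 5 mirrors SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT.
    return self._schedule_fuzz_tasks(swarming_queue, target_size)


swarming_queue = []
scheduler = ChromeFuzzTaskScheduler(['task-%d' % i for i in range(8)])
scheduled = scheduler._schedule_swarming_fuzz_tasks(swarming_queue)
print(scheduled, swarming_queue)
```

Keeping the capping and enqueueing logic in the base class means each backend only has to supply its own queue and target size.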