Bulter.py: Adds preprocess butler script for local preprocess#5266
IvanBM18 wants to merge 9 commits into
Did you use a |
|
Does it need to be a subcommand of |
uworker_env = _get_job_environment(args.job)
uworker_env.update(_get_fuzzer_environment(args.fuzzer, args.job))

# Replicate what process_command_impl does in a real tworker
Could we use process_command_impl() then instead?
At the end of that method we call run_command():
https://github.com/google/clusterfuzz/blob/master/src/clusterfuzz/_internal/bot/tasks/commands.py#L482
That in turn triggers a workflow in which the preprocess step, once finished, either immediately queues the main task for remote execution or simply executes all three steps on the same machine (depending on the setup). We don't want either: we want to stop right after preprocess finishes so that we can manually trigger the main portion wherever and whenever we need to.
I see, thanks. Then this works for me. It will inevitably drift apart from the prod implementation, especially since this codepath is not exercised in production, but it's good enough to run for now and commit for others to reuse :)
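The distinction discussed in this thread can be sketched as a toy model. This is not ClusterFuzz code: the three stage functions, `run_all_stages`, and `run_preprocess_only` are hypothetical names used only for illustration, and the real preprocess step builds and uploads a uworker_input rather than returning a placeholder string.

```python
def preprocess(task: str) -> str:
    # Hypothetical stand-in: the real step serializes the uworker_input,
    # uploads it, and yields a signed download URL.
    return f"signed-url-for-{task}"


def utask_main(input_url: str) -> str:
    # Hypothetical stand-in for the main (untrusted) portion of the task.
    return f"output-for-{input_url}"


def postprocess(output: str) -> str:
    # Hypothetical stand-in for the final trusted step.
    return f"done:{output}"


def run_all_stages(task: str) -> str:
    """Models the production path: preprocess flows straight into the rest."""
    return postprocess(utask_main(preprocess(task)))


def run_preprocess_only(task: str) -> str:
    """Models the butler script: stop after preprocess and hand back the URL."""
    return preprocess(task)
```

With this split, the caller of `run_preprocess_only` decides where and when the main portion runs, which is the behaviour the script is after.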
Not in this case, but it's possible to use a service account: you just need to generate a key, save it locally, and set it up as the default credentials for any gcloud library and CLI operation. This is done using the
Added more context in the description so future reviewers can easily understand this.
Yes, it needs to, as butler already handles a lot of bootstrapping operations for the same purpose; for example, if we didn't use
I think we are referring to different things. What I mean is that I think we could have this as a standalone butler script, so that we can run it with python butler.py run <name_of_script> --non-dry-run --config $MY_DIR. That way we don't have to handle all of that by ourselves. Does it make sense?
Thanks!
letitz left a comment
LGTM as a testing tool, not production code.
Adds preprocess butler script

This butler script allows developers to trigger the preprocess portion of a fuzz task and, as a consequence, generate the serialized and compressed uworker_input payload, upload it to real GCS, and get the signed download URL, exactly as happens remotely. We can then use the resulting URL to trigger a task in any backend that we want:
utask_main queue

This accelerates local debugging of the tworker preprocessing phase without relying on remote execution queues, which have proven to take multiple hours to "ACK" a task request.

Note: To use this command you need the Secret Manager Secret Accessor role for Dev, or to set up a service account key locally (by using the gcloud auth CLI) that has said role and any other role required for a tworker's preprocess.

Example:
With an input like this:
pipenv run python butler.py run preprocess --config-dir ~/clusterfuzz-config/configs/ --script_args fuzzer linux_asan_fuzzer
You should get an output like this:
....... Preprocess successful! Input Download URL: https://storage.googleapis.com/uworker-input.project/.....

Changes
commands.py module workflow.

Tests performed
Executed the following command in dev:
Successfully creates and uploads the payload and returns a valid signed URL. This signed URL was later used to trigger a Swarming task through pRPC; here are the logs
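Separately from the logs, passing the signed URL on to a later step means pulling it out of the script output first. A minimal sketch, assuming the output contains an "Input Download URL:" line as in the example above; `extract_input_download_url` is a hypothetical helper and not part of butler or ClusterFuzz.

```python
import re
from typing import Optional

# Assumes the URL follows the "Input Download URL:" label and has no spaces.
_URL_LINE = re.compile(r"Input Download URL:\s*(\S+)")


def extract_input_download_url(output: str) -> Optional[str]:
    """Return the signed URL printed by the script, or None if absent."""
    match = _URL_LINE.search(output)
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller decide how to treat a run that never printed a URL, for example a failed preprocess.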