perf: parallelize EBS disk creation with other setup work #56
Draft
wdvr wants to merge 1 commit into
Split disk creation into start_disk_creation() (non-blocking) and wait_for_disk_ready() (blocking). The create_volume API call returns immediately while AWS creates the volume in the background. We now do GitHub key fetching and EFS setup during that time, then wait for the volume only right before pod creation.

Also:
- Store ebs_volume_id in DynamoDB immediately after create_volume so cancel/cleanup can always find and clean up orphaned volumes
- Add orphan volume cleanup in the outer except block when allocation fails after volume creation
- Reduce SSH daemon poll interval from 10s to 3s (60 retries) since the default image has openssh-server pre-installed
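The orphan-handling items above could look roughly like the following. This is only a sketch under stated assumptions: FakeEC2 is a hypothetical stand-in for a boto3 EC2 client (only create_volume and delete_volume are mimicked), a plain dict stands in for the DynamoDB table, and allocate() and its parameters are illustrative names, not the PR's actual code.

```python
class FakeEC2:
    """Hypothetical stand-in for a boto3 EC2 client (assumption, not the real client)."""

    def __init__(self):
        self.volumes = set()

    def create_volume(self, **kwargs):
        volume_id = "vol-0123456789abcdef0"  # made-up ID for illustration
        self.volumes.add(volume_id)
        return {"VolumeId": volume_id}

    def delete_volume(self, VolumeId):
        self.volumes.discard(VolumeId)


def allocate(ec2, table, reservation_id, later_step):
    """Create a volume, record its ID at once, and delete it if a later step fails."""
    volume_id = None
    try:
        resp = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100)
        volume_id = resp["VolumeId"]
        # Persist the ID immediately so cancel/cleanup can find the volume
        # even if everything after this line fails.
        table[reservation_id] = {"ebs_volume_id": volume_id}
        later_step()  # placeholder for key fetching, EFS setup, pod creation
        return volume_id
    except Exception:
        # Outer handler: the volume exists but the allocation failed, so remove it.
        if volume_id is not None:
            ec2.delete_volume(VolumeId=volume_id)
        raise


def failing_step():
    raise RuntimeError("pod creation failed")


ec2 = FakeEC2()
table = {}
try:
    allocate(ec2, table, "res-1", failing_step)
    raised = False
except RuntimeError:
    raised = True
```

In this sketch the error is re-raised after cleanup, so the caller still sees the original failure while the volume no longer exists.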
Summary
- Split create_disk_from_snapshot_or_empty into two phases: start_disk_creation() (non-blocking) and wait_for_disk_ready() (blocking)
- The create_volume API returns immediately with a volume_id, so we now do GitHub key fetching + EFS setup while AWS creates the volume in the background

Bug fixes included
- ebs_volume_id is now written to DynamoDB immediately after create_volume returns, so cancel/cleanup can always find the volume even if the reservation fails mid-way.

New flow
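A minimal sketch of the two-phase ordering, for illustration only. The function names start_disk_creation() and wait_for_disk_ready() come from this PR, but their signatures here are assumptions, as are the gp3 volume type, the availability zone, and the size. FakeEC2 is a hypothetical stub that mimics the response shape of boto3's create_volume and describe_volumes so the example runs without AWS.

```python
import time


class FakeEC2:
    """Hypothetical stub: reports 'creating' twice, then 'available'."""

    def __init__(self):
        self._polls = 0

    def create_volume(self, **kwargs):
        return {"VolumeId": "vol-0123456789abcdef0", "State": "creating"}

    def describe_volumes(self, VolumeIds):
        self._polls += 1
        state = "available" if self._polls >= 3 else "creating"
        return {"Volumes": [{"VolumeId": VolumeIds[0], "State": state}]}


def start_disk_creation(ec2, az, size_gib):
    """Non-blocking: request the volume and return its ID right away."""
    resp = ec2.create_volume(AvailabilityZone=az, Size=size_gib, VolumeType="gp3")
    return resp["VolumeId"]


def wait_for_disk_ready(ec2, volume_id, poll_s=0.0, max_polls=60):
    """Blocking: poll until the volume reports 'available', else time out."""
    for _ in range(max_polls):
        volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
        if volume["State"] == "available":
            return
        time.sleep(poll_s)
    raise TimeoutError(f"volume {volume_id} not available after {max_polls} polls")


def reserve(ec2, steps):
    """Record the order of operations; the setup steps are placeholders."""
    steps.append("disk_create_start")
    volume_id = start_disk_creation(ec2, "us-east-1a", 100)
    steps.append("fetch_github_keys")  # runs while AWS creates the volume
    steps.append("setup_efs")          # runs while AWS creates the volume
    steps.append("disk_wait_start")
    wait_for_disk_ready(ec2, volume_id)
    steps.append("disk_create_end")
    steps.append("create_pod")
    return volume_id


steps = []
volume_id = reserve(FakeEC2(), steps)
```

The blocking wait sits directly before the pod step, so any time spent on key fetching and EFS setup overlaps with volume creation instead of adding to it.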
Test plan
- Run with --trace and compare disk_create_start→disk_create_end vs disk_wait_start→disk_create_end (the gap shows time saved)
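The comparison in the test plan amounts to a small subtraction. The sketch below assumes the trace can be reduced to a mapping from event name to a timestamp in seconds; that format and the sample values are made up for illustration, and only the three event names come from this PR.

```python
def disk_timings(trace):
    """Return (total_create_s, blocked_wait_s, saved_s) from event timestamps."""
    total = trace["disk_create_end"] - trace["disk_create_start"]
    blocked = trace["disk_create_end"] - trace["disk_wait_start"]
    # The gap is the part of volume creation that overlapped with other setup work.
    return total, blocked, total - blocked


# Made-up timestamps (seconds since the start of the reservation).
sample_trace = {
    "disk_create_start": 0.0,
    "disk_wait_start": 4.5,
    "disk_create_end": 6.0,
}
total_s, blocked_s, saved_s = disk_timings(sample_trace)
```

With these sample values the volume takes 6.0 s to create, but only 1.5 s of that is spent blocked in the wait, so 4.5 s overlaps with other work.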