Pull request overview
Enables multi-datacenter deployments by expanding the DataCenter enum, allowing datacenter=None (all DCs) or lists, and adding first-class support for attaching multiple network volumes (one per datacenter) with corresponding GraphQL payload updates.
Changes:
- Expand `DataCenter` to all supported RunPod DCs and default endpoint/resource datacenter behavior to `None` (no restriction / all DCs).
- Add multi-volume support (accept a single volume or a list; validate one volume per DC; send `networkVolumeIds` when multiple).
- Add CPU datacenter restrictions via exported `CPU_DATACENTERS`, with validation and clearer errors.
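The CPU restriction in the last bullet could look roughly like this — a minimal sketch, not the SDK's actual implementation; the DC ids are taken from the PR description and the function body is illustrative:

```python
# Sketch of CPU datacenter validation; the real check lives in the SDK's
# validate_cpu_datacenters. DC ids per the PR description.
CPU_DATACENTERS = frozenset({"EU-RO-1", "US-TX-1", "EU-SE-1"})


def validate_cpu_datacenters(datacenters):
    """Raise a clear error when a DC doesn't support CPU endpoints."""
    invalid = sorted(dc for dc in datacenters if dc not in CPU_DATACENTERS)
    if invalid:
        raise ValueError(
            f"CPU endpoints are not available in: {', '.join(invalid)}; "
            f"supported CPU datacenters: {', '.join(sorted(CPU_DATACENTERS))}"
        )
```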
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit/test_p2_gaps.py | Updates expectations for default locations/datacenter behavior when unset. |
| tests/unit/test_endpoint.py | Adds coverage for volume normalization to lists and expanded datacenter input forms. |
| tests/unit/test_deprecations.py | Marks CPU_DATACENTERS as non-deprecated public API. |
| tests/unit/resources/test_serverless.py | Updates/extends tests for multi-DC datacenter normalization, locations syncing, multi-volume validation, and CPU DC validation. |
| tests/unit/resources/test_network_volume.py | Adds tests for NetworkVolume(datacenter=...) alias behavior. |
| src/runpod_flash/runtime/resource_provisioner.py | Passes locations from manifest into reconstructed resources. |
| src/runpod_flash/endpoint.py | Accepts volume as single or list; forwards datacenter as-is; serializes multi-volume configs into networkVolumes. |
| src/runpod_flash/core/resources/serverless_cpu.py | Excludes networkVolumes from CPU endpoint payload. |
| src/runpod_flash/core/resources/serverless.py | Adds multi-volume normalization/validation; datacenter normalization; CPU DC validation; deploy/update inject networkVolumeIds. |
| src/runpod_flash/core/resources/network_volume.py | Expands DataCenter; adds from_string; exports CPU_DATACENTERS; adds datacenter alias for dataCenterId. |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Excludes networkVolumes from CPU load balancer payload. |
| src/runpod_flash/core/resources/init.py | Re-exports CPU_DATACENTERS. |
| src/runpod_flash/core/api/runpod.py | Extends saveEndpoint selection set to include networkVolumeIds. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Persists locations into the build manifest deployment config. |
| src/runpod_flash/init.py | Exposes CPU_DATACENTERS at the top-level package API. |
| docs/Flash_SDK_Reference.md | Documents expanded DC set, datacenter=None, CPU_DATACENTERS, and multi-volume examples. |
| docs/Flash_Deploy_Guide.md | Updates persistent storage examples for explicit DC targeting and multi-volume usage. |
Comments suppressed due to low confidence (2)
src/runpod_flash/cli/commands/build_utils/manifest.py:242

- Manifest extraction still only serializes the singular `networkVolume` (and `networkVolumeId`), so endpoints configured with multiple volumes (`networkVolumes`) will be written to the manifest with only the first volume. This will break deploy/runtime provisioning for multi-volume configs. Extend manifest extraction to include `networkVolumes` when present (and ensure runtime reconstruction supports it).
```python
if hasattr(resource_config, "locations") and resource_config.locations:
    config["locations"] = resource_config.locations
if hasattr(resource_config, "env") and resource_config.env:
    env_dict = dict(resource_config.env)
    env_dict.pop("RUNPOD_API_KEY", None)
    if env_dict:
        config["env"] = env_dict
if hasattr(resource_config, "networkVolume") and resource_config.networkVolume:
    nv = resource_config.networkVolume
    nv_config = {"name": nv.name}
    if nv.size is not None:
        nv_config["size"] = nv.size
    if hasattr(nv, "dataCenterId") and nv.dataCenterId is not None:
        nv_config["dataCenterId"] = (
            nv.dataCenterId.value
            if hasattr(nv.dataCenterId, "value")
            else nv.dataCenterId
        )
    config["networkVolume"] = nv_config
```
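The suggested extension could be sketched along these lines — attribute and key names (`networkVolumes`) are assumptions that mirror the singular path above, and plain objects stand in for the real resource model:

```python
from types import SimpleNamespace  # stand-in for the real resource model


def _volume_to_config(nv):
    """Serialize one volume into a manifest dict, mirroring the singular path."""
    nv_config = {"name": nv.name}
    if getattr(nv, "size", None) is not None:
        nv_config["size"] = nv.size
    dc = getattr(nv, "dataCenterId", None)
    if dc is not None:
        nv_config["dataCenterId"] = dc.value if hasattr(dc, "value") else dc
    return nv_config


def extract_volumes(resource_config, config):
    """Persist the plural networkVolumes list into the manifest config."""
    volumes = getattr(resource_config, "networkVolumes", None)
    if volumes:
        config["networkVolumes"] = [_volume_to_config(v) for v in volumes]
    return config
```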
src/runpod_flash/runtime/resource_provisioner.py:157

- `create_resource_from_manifest` can reconstruct only `networkVolume`/`networkVolumeId`, but multi-volume endpoints are now represented via `networkVolumes` (and deployed via `networkVolumeIds`). Without handling those keys, manifest-based provisioning will silently drop additional volumes. Add support for `networkVolumes` (and/or `networkVolumeIds`) reconstruction here.
```python
if "locations" in resource_data:
    deployment_kwargs["locations"] = resource_data["locations"]
# Reconstruct NetworkVolume from manifest data if present
if "networkVolume" in resource_data:
    from runpod_flash.core.resources.network_volume import NetworkVolume

    nv_data = resource_data["networkVolume"]
    deployment_kwargs["networkVolume"] = NetworkVolume(**nv_data)
elif "networkVolumeId" in resource_data:
    deployment_kwargs["networkVolumeId"] = resource_data["networkVolumeId"]
```
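Handling the plural keys could be sketched as follows — the real code would build `NetworkVolume` objects, while plain dicts stand in here, and the key names (`networkVolumes`, `networkVolumeIds`) follow the PR description:

```python
# Sketch: reconstruct the plural volume keys alongside the singular ones.
# Plain dicts stand in for NetworkVolume(**nv_data) in the real provisioner.
def reconstruct_volume_kwargs(resource_data, deployment_kwargs):
    if "networkVolumes" in resource_data:
        deployment_kwargs["networkVolumes"] = [
            dict(nv) for nv in resource_data["networkVolumes"]
        ]
    elif "networkVolumeIds" in resource_data:
        deployment_kwargs["networkVolumeIds"] = list(
            resource_data["networkVolumeIds"]
        )
    return deployment_kwargs
```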
```python
# CPU endpoints don't use GPU-specific fields, so exclude them from API payload
# This prevents the RunPod GraphQL API from rejecting CPU endpoints with GPU-specific fields
# Note: instanceIds is NOT in _input_only, so it will be sent to the API
_input_only = {
    "id",
    "cudaVersions",  # GPU-specific, exclude from API payload
    "datacenter",
    "env",
    "gpus",  # Inherited from parent, but always None for CPU endpoints
    "gpuIds",  # GPU-specific API field, exclude from payload
    "gpuCount",  # GPU-specific API field, exclude from payload
    "allowedCudaVersions",  # GPU-specific API field, exclude from payload
    "flashboot",
    "flashEnvironmentId",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}
```
CpuServerlessEndpoint now excludes networkVolumes from the API payload via _input_only, but the overridden config_hash below only includes networkVolume (singular). That means changing the multi-volume configuration won’t change config_hash, so drift detection/updates may not trigger. Include networkVolumes in the CPU config hash (or otherwise ensure multi-volume changes affect drift detection).
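One way to make multi-volume changes affect the hash — a minimal sketch, assuming the real `config_hash` serializes selected fields before hashing (the field set and helper name here are illustrative):

```python
import hashlib
import json


def cpu_config_hash(config):
    """Hash drift-relevant config, including the plural networkVolumes."""
    hash_fields = {
        "networkVolume": config.get("networkVolume"),
        # Adding the plural key means attaching/removing a volume changes
        # the hash, so drift detection triggers an update.
        "networkVolumes": config.get("networkVolumes"),
        # ...plus whatever fields the real hash already covers
    }
    canonical = json.dumps(hash_fields, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()
```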
```python
_input_only = {
    "id",
    "cudaVersions",
    "datacenter",
    "env",
    "flashboot",
    "flashEnvironmentId",
    "gpus",
    "gpuIds",
    "gpuCount",
    "allowedCudaVersions",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}
```
CpuLoadBalancerSlsResource now excludes networkVolumes from the API payload via _input_only, but its custom config_hash implementation later in the class only hashes networkVolume (singular). Multi-volume changes may not be detected as drift, preventing needed updates. Include networkVolumes in the CPU LB config hash (or ensure multi-volume affects drift detection).
```python
from runpod_flash import NetworkVolume, DataCenter

# existing volume by ID
vol = NetworkVolume(id="vol_abc123")
```
The docs show vol = NetworkVolume(id="vol_abc123"), but NetworkVolume currently requires a name field (no default) in the model. Either update the example to include name=... when referencing an existing volume, or make name optional when id is provided.
Suggested change:

```diff
- vol = NetworkVolume(id="vol_abc123")
+ vol = NetworkVolume(id="vol_abc123", name="my-existing-volume")
```
```python
async def _ensure_network_volume_deployed(self) -> None:
    """Ensures all network volumes are deployed.

    Deploys each volume in networkVolumes and collects their IDs.
    Sets networkVolumeId (singular) for backward compat with the first volume.
    Populates _deployed_volume_ids for multi-volume API payloads.
    """
    self._deployed_volume_ids: list[str] = []

    if self.networkVolumeId:
        self._deployed_volume_ids.append(self.networkVolumeId)
        return
```
_ensure_network_volume_deployed assigns self._deployed_volume_ids on a Pydantic model without declaring it as a field/private attr. With the current BaseResource.model_config (no extra='allow'), setting an undeclared attribute will raise at runtime, breaking deploy/update. Define _deployed_volume_ids as a Pydantic PrivateAttr (or stash it via self.__dict__[...]) so it can be set safely.
```python
if isinstance(volume, NetworkVolume):
    return [volume]
if isinstance(volume, list):
    return volume or None
```
_normalize_volumes accepts a list but doesn’t validate that every element is a NetworkVolume. If a caller passes a mixed/incorrect list, the failure will happen later (e.g., when calling model_dump), producing a less clear error. Consider validating list element types here and raising a ValueError that points to the bad element.
Suggested change (replacing `return volume or None`):

```python
if not volume:
    # preserve existing behavior: empty list normalizes to None
    return None
for idx, vol in enumerate(volume):
    if not isinstance(vol, NetworkVolume):
        raise ValueError(
            "volume list elements must be NetworkVolume; "
            f"element at index {idx} is {type(vol).__name__}"
        )
return volume
```
Currently, flash is hardcoded to EU-RO-1. The `DataCenter` enum only has one value, `NetworkVolume.dataCenterId` is frozen to it, and `ServerlessResource.datacenter` defaults to it with a prod-only guard that syncs it to `locations`.

`DataCenter` now has all 10 listed RunPod DCs. The `datacenter` parameter on both `Endpoint` and `ServerlessResource` accepts a single DC, a list, a string, or `None` (no location restriction, matching the console's default "all datacenters" behavior). The old default of EU-RO-1 is replaced with `None`.

`Endpoint(volume=...)` accepts a single `NetworkVolume` or a list. Network volumes are tied to individual DCs, so a one-volume-per-datacenter constraint is enforced in `ServerlessResource.normalize_network_volumes`. When multiple volumes are attached, the deploy path sends `networkVolumeIds` to the GraphQL API instead of the singular `networkVolumeId`.

CPU availability is limited to a subset of DCs. `CPU_DATACENTERS` (EU-RO-1, US-TX-1, EU-SE-1) is exported and validated against in `validate_cpu_datacenters` so you get a clear error.

Closes AE-2422
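The one-volume-per-datacenter constraint described above could be sketched like this — the real check lives in `ServerlessResource.normalize_network_volumes` and operates on `NetworkVolume` objects, while plain dicts stand in here:

```python
# Sketch of the one-volume-per-datacenter check described above.
def check_one_volume_per_dc(volumes):
    seen = {}  # datacenter id -> volume name already claiming it
    for vol in volumes:
        dc = vol["dataCenterId"]
        if dc in seen:
            raise ValueError(
                f"datacenter {dc} has multiple volumes: "
                f"{seen[dc]!r} and {vol['name']!r}; "
                "attach at most one volume per datacenter"
            )
        seen[dc] = vol["name"]
```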