
feat: enable multi-datacenter support#266

Open
KAJdev wants to merge 5 commits into main from zeke/ae-2422-enable-adding-dcs

Conversation

KAJdev (Contributor) commented on Mar 11, 2026

flash is currently hardcoded to EU-RO-1: the DataCenter enum has only one value, NetworkVolume.dataCenterId is frozen to it, and ServerlessResource.datacenter defaults to it, with a prod-only guard that syncs it into locations.

DataCenter now includes all 10 supported RunPod DCs. The datacenter parameter on both Endpoint and ServerlessResource accepts a single DC, a list, a string, or None (no location restriction, matching the console's default "all datacenters" behavior). The old EU-RO-1 default is replaced with None.

Endpoint(volume=...) accepts a single NetworkVolume or a list. Network volumes are tied to individual DCs, so a one-volume-per-datacenter constraint is enforced in ServerlessResource.normalize_network_volumes. When multiple volumes are attached, the deploy path sends networkVolumeIds to the GraphQL API instead of the singular networkVolumeId.
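The one-volume-per-datacenter constraint could look like the following sketch (NetworkVolume here is a trimmed stand-in for the SDK model, and the function name is illustrative rather than the actual normalize_network_volumes implementation):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class NetworkVolume:
    """Trimmed stand-in; the real SDK model has more fields."""
    name: str
    dataCenterId: str


def validate_one_volume_per_datacenter(volumes: list[NetworkVolume]) -> None:
    """Network volumes live in a single DC, so reject two volumes in the same DC."""
    counts = Counter(v.dataCenterId for v in volumes)
    dupes = sorted(dc for dc, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(
            f"Only one network volume per datacenter is allowed; duplicates in: {dupes}"
        )
```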

CPU availability is limited to a subset of DCs. CPU_DATACENTERS (EU-RO-1, US-TX-1, EU-SE-1) is exported and validated against in validate_cpu_datacenters, so you get a clear error up front.
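A minimal sketch of what that validation might amount to (the exported names match the PR description; the function body is an assumption):

```python
CPU_DATACENTERS = frozenset({"EU-RO-1", "US-TX-1", "EU-SE-1"})


def validate_cpu_datacenters(datacenters) -> None:
    """Fail early with a readable message instead of a late API rejection."""
    if datacenters is None:
        return  # no restriction requested
    unsupported = sorted(set(datacenters) - CPU_DATACENTERS)
    if unsupported:
        raise ValueError(
            f"CPU endpoints are not available in {unsupported}; "
            f"choose from {sorted(CPU_DATACENTERS)}"
        )
```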

Closes AE-2422

KAJdev force-pushed the zeke/ae-2422-enable-adding-dcs branch from c567dc6 to c01fa70 on March 11, 2026 at 22:04.
KAJdev requested review from deanq and runpod-Henrik on March 11, 2026 at 22:05.

Copilot AI left a comment

Pull request overview

Enables multi-datacenter deployments by expanding the DataCenter enum, allowing datacenter=None (all DCs) or lists, and adding first-class support for attaching multiple network volumes (one per datacenter) with corresponding GraphQL payload updates.

Changes:

  • Expand DataCenter to all supported RunPod DCs and default endpoint/resource datacenter behavior to None (no restriction / all DCs).
  • Add multi-volume support (accept single or list; validate one volume per DC; send networkVolumeIds when multiple).
  • Add CPU datacenter restrictions via exported CPU_DATACENTERS, with validation and clearer errors.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Changed files:

  • tests/unit/test_p2_gaps.py: Updates expectations for default locations/datacenter behavior when unset.
  • tests/unit/test_endpoint.py: Adds coverage for volume normalization to lists and expanded datacenter input forms.
  • tests/unit/test_deprecations.py: Marks CPU_DATACENTERS as non-deprecated public API.
  • tests/unit/resources/test_serverless.py: Updates/extends tests for multi-DC datacenter normalization, locations syncing, multi-volume validation, and CPU DC validation.
  • tests/unit/resources/test_network_volume.py: Adds tests for NetworkVolume(datacenter=...) alias behavior.
  • src/runpod_flash/runtime/resource_provisioner.py: Passes locations from the manifest into reconstructed resources.
  • src/runpod_flash/endpoint.py: Accepts volume as single or list; forwards datacenter as-is; serializes multi-volume configs into networkVolumes.
  • src/runpod_flash/core/resources/serverless_cpu.py: Excludes networkVolumes from the CPU endpoint payload.
  • src/runpod_flash/core/resources/serverless.py: Adds multi-volume normalization/validation, datacenter normalization, and CPU DC validation; deploy/update inject networkVolumeIds.
  • src/runpod_flash/core/resources/network_volume.py: Expands DataCenter; adds from_string; exports CPU_DATACENTERS; adds a datacenter alias for dataCenterId.
  • src/runpod_flash/core/resources/load_balancer_sls_resource.py: Excludes networkVolumes from the CPU load balancer payload.
  • src/runpod_flash/core/resources/__init__.py: Re-exports CPU_DATACENTERS.
  • src/runpod_flash/core/api/runpod.py: Extends the saveEndpoint selection set to include networkVolumeIds.
  • src/runpod_flash/cli/commands/build_utils/manifest.py: Persists locations into the build manifest deployment config.
  • src/runpod_flash/__init__.py: Exposes CPU_DATACENTERS at the top-level package API.
  • docs/Flash_SDK_Reference.md: Documents the expanded DC set, datacenter=None, CPU_DATACENTERS, and multi-volume examples.
  • docs/Flash_Deploy_Guide.md: Updates persistent storage examples for explicit DC targeting and multi-volume usage.
Comments suppressed due to low confidence (2)

src/runpod_flash/cli/commands/build_utils/manifest.py:242

  • Manifest extraction still only serializes the singular networkVolume (and networkVolumeId), so endpoints configured with multiple volumes (networkVolumes) will be written to the manifest with only the first volume. This will break deploy/runtime provisioning for multi-volume configs. Extend manifest extraction to include networkVolumes when present (and ensure runtime reconstruction supports it).
        if hasattr(resource_config, "locations") and resource_config.locations:
            config["locations"] = resource_config.locations

        if hasattr(resource_config, "env") and resource_config.env:
            env_dict = dict(resource_config.env)
            env_dict.pop("RUNPOD_API_KEY", None)
            if env_dict:
                config["env"] = env_dict

        if hasattr(resource_config, "networkVolume") and resource_config.networkVolume:
            nv = resource_config.networkVolume
            nv_config = {"name": nv.name}
            if nv.size is not None:
                nv_config["size"] = nv.size
            if hasattr(nv, "dataCenterId") and nv.dataCenterId is not None:
                nv_config["dataCenterId"] = (
                    nv.dataCenterId.value
                    if hasattr(nv.dataCenterId, "value")
                    else nv.dataCenterId
                )
            config["networkVolume"] = nv_config
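A hypothetical extension addressing this comment, with `extract_volume_config`/`extract_network_volumes` as illustrative helper names (not the repo's actual functions) and SimpleNamespace standing in for real resource objects:

```python
from types import SimpleNamespace


def extract_volume_config(nv) -> dict:
    """Serialize one NetworkVolume into plain manifest data."""
    nv_config = {"name": nv.name}
    if getattr(nv, "size", None) is not None:
        nv_config["size"] = nv.size
    dc = getattr(nv, "dataCenterId", None)
    if dc is not None:
        # Enum members serialize via .value; plain strings pass through
        nv_config["dataCenterId"] = getattr(dc, "value", dc)
    return nv_config


def extract_network_volumes(resource_config, config: dict) -> None:
    """Prefer the plural field so multi-volume endpoints round-trip intact."""
    volumes = getattr(resource_config, "networkVolumes", None)
    if volumes:
        config["networkVolumes"] = [extract_volume_config(nv) for nv in volumes]
    elif getattr(resource_config, "networkVolume", None):
        config["networkVolume"] = extract_volume_config(resource_config.networkVolume)
```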

src/runpod_flash/runtime/resource_provisioner.py:157

  • create_resource_from_manifest can reconstruct only networkVolume/networkVolumeId, but multi-volume endpoints are now represented via networkVolumes (and deployed via networkVolumeIds). Without handling those keys, manifest-based provisioning will silently drop additional volumes. Add support for networkVolumes (and/or networkVolumeIds) reconstruction here.
    if "locations" in resource_data:
        deployment_kwargs["locations"] = resource_data["locations"]

    # Reconstruct NetworkVolume from manifest data if present
    if "networkVolume" in resource_data:
        from runpod_flash.core.resources.network_volume import NetworkVolume

        nv_data = resource_data["networkVolume"]
        deployment_kwargs["networkVolume"] = NetworkVolume(**nv_data)
    elif "networkVolumeId" in resource_data:
        deployment_kwargs["networkVolumeId"] = resource_data["networkVolumeId"]



Comment on lines 110 to 128
# CPU endpoints don't use GPU-specific fields, so exclude them from the API payload
# This prevents the RunPod GraphQL API from rejecting CPU endpoints with GPU-specific fields
# Note: instanceIds is NOT in _input_only, so it will be sent to the API
_input_only = {
    "id",
    "cudaVersions",  # GPU-specific, exclude from API payload
    "datacenter",
    "env",
    "gpus",  # Inherited from parent, but always None for CPU endpoints
    "gpuIds",  # GPU-specific API field, exclude from payload
    "gpuCount",  # GPU-specific API field, exclude from payload
    "allowedCudaVersions",  # GPU-specific API field, exclude from payload
    "flashboot",
    "flashEnvironmentId",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}

Copilot AI Mar 12, 2026


CpuServerlessEndpoint now excludes networkVolumes from the API payload via _input_only, but the overridden config_hash below only includes networkVolume (singular). That means changing the multi-volume configuration won’t change config_hash, so drift detection/updates may not trigger. Include networkVolumes in the CPU config hash (or otherwise ensure multi-volume changes affect drift detection).

Comment on lines 332 to 347
_input_only = {
    "id",
    "cudaVersions",
    "datacenter",
    "env",
    "flashboot",
    "flashEnvironmentId",
    "gpus",
    "gpuIds",
    "gpuCount",
    "allowedCudaVersions",
    "imageName",
    "networkVolume",
    "networkVolumes",
    "python_version",
}

Copilot AI Mar 12, 2026


CpuLoadBalancerSlsResource now excludes networkVolumes from the API payload via _input_only, but its custom config_hash implementation later in the class only hashes networkVolume (singular). Multi-volume changes may not be detected as drift, preventing needed updates. Include networkVolumes in the CPU LB config hash (or ensure multi-volume affects drift detection).

from runpod_flash import NetworkVolume, DataCenter

# existing volume by ID
vol = NetworkVolume(id="vol_abc123")

Copilot AI Mar 12, 2026


The docs show vol = NetworkVolume(id="vol_abc123"), but NetworkVolume currently requires a name field (no default) in the model. Either update the example to include name=... when referencing an existing volume, or make name optional when id is provided.

Suggested change
vol = NetworkVolume(id="vol_abc123")
vol = NetworkVolume(id="vol_abc123", name="my-existing-volume")

Comment on lines 580 to 591
async def _ensure_network_volume_deployed(self) -> None:
    """Ensures all network volumes are deployed.

    Deploys each volume in networkVolumes and collects their IDs.
    Sets networkVolumeId (singular) for backward compat with the first volume.
    Populates _deployed_volume_ids for multi-volume API payloads.
    """
    self._deployed_volume_ids: list[str] = []

    if self.networkVolumeId:
        self._deployed_volume_ids.append(self.networkVolumeId)
        return

Copilot AI Mar 12, 2026


_ensure_network_volume_deployed assigns self._deployed_volume_ids on a Pydantic model without declaring it as a field/private attr. With the current BaseResource.model_config (no extra='allow'), setting an undeclared attribute will raise at runtime, breaking deploy/update. Define _deployed_volume_ids as a Pydantic PrivateAttr (or stash it via self.__dict__[...]) so it can be set safely.

if isinstance(volume, NetworkVolume):
    return [volume]
if isinstance(volume, list):
    return volume or None

Copilot AI Mar 12, 2026


_normalize_volumes accepts a list but doesn’t validate that every element is a NetworkVolume. If a caller passes a mixed/incorrect list, the failure will happen later (e.g., when calling model_dump), producing a less clear error. Consider validating list element types here and raising a ValueError that points to the bad element.

Suggested change
return volume or None
if not volume:
    # preserve existing behavior: empty list normalizes to None
    return None
for idx, vol in enumerate(volume):
    if not isinstance(vol, NetworkVolume):
        raise ValueError(
            "volume list elements must be NetworkVolume; "
            f"element at index {idx} is {type(vol).__name__}"
        )
return volume
