Date: Wed, 11 Mar 2026 11:46:32 -0400
Subject: [PATCH 3/5] Simplify Flash apps documentation
---
flash/apps/customize-app.mdx | 234 +++++-----------------------
flash/apps/deploy-apps.mdx | 117 ++------------
flash/apps/initialize-project.mdx | 41 ++---
flash/apps/local-testing.mdx | 34 +----
flash/apps/overview.mdx | 244 +++---------------------------
flash/apps/requests.mdx | 4 +-
6 files changed, 85 insertions(+), 589 deletions(-)
diff --git a/flash/apps/customize-app.mdx b/flash/apps/customize-app.mdx
index 04058679..6312d5b2 100644
--- a/flash/apps/customize-app.mdx
+++ b/flash/apps/customize-app.mdx
@@ -8,212 +8,45 @@ import { LoadBalancingEndpointsTooltip, QueueBasedEndpointsTooltip } from "/snip
After running `flash init`, you have a working project template with example <QueueBasedEndpointsTooltip /> and <LoadBalancingEndpointsTooltip />. This guide shows you how to customize the template to build your application.
-## Understanding endpoint architecture
+## Endpoint types
-The relationship between endpoint configurations and deployed Serverless endpoints differs between load-balanced and queue-based endpoints. Understanding this mapping is critical for building Flash apps correctly.
+Flash supports two endpoint types, each suited for different use cases:
-### Key rules
+| Type | Best for | Functions per endpoint |
+|------|----------|------------------------|
+| **Queue-based** | Long-running GPU tasks | One |
+| **Load-balanced** | Fast HTTP APIs | Multiple (via routes) |
-**Queue-based endpoints** follow a strict 1:1:1 rule:
-- 1 endpoint configuration : 1 `@Endpoint` function : 1 Serverless endpoint.
-- Each function must have its own unique endpoint name.
-- Each endpoint gets its own URL (e.g., `https://api.runpod.ai/v2/abc123xyz`)
-- Called via `/run` or `/runsync` routes.
+
+
+ Each `@Endpoint` function creates a separate Serverless endpoint:
-**Load-balanced endpoints** allow multiple routes on one endpoint:
-- 1 endpoint instance = multiple route decorators = 1 Serverless endpoint.
-- Multiple routes can share the same endpoint configuration.
-- All routes share one URL with different paths (e.g., `/generate`, `/health`).
-- Each route defined by `.get()`, `.post()`, etc. method decorators.
+ ```python
+ @Endpoint(name="preprocess", gpu=GpuType.NVIDIA_A100_80GB_PCIe)
+ def preprocess(data): ...
-
-**Do not reuse the same endpoint name for multiple queue-based functions when deploying Flash apps.** Each queue-based function must have its own unique `name` parameter.
-
-
-### Examples
-
-The following sections demonstrate progressively complex scenarios:
-
-
-
-**Your code:**
-
-```python title="gpu_worker.py"
-from runpod_flash import Endpoint, GpuType
-
-@Endpoint(
- name="gpu-inference",
- gpu=GpuType.NVIDIA_A100_80GB_PCIe,
- dependencies=["torch"]
-)
-async def process_data(input: dict) -> dict:
- import torch
- # Your processing logic
- return {"result": "processed"}
-```
-
-**What gets deployed:**
-
-- **1 Serverless endpoint**: `https://api.runpod.ai/v2/abc123xyz`
- - Named: `gpu-inference`
- - Hardware: A100 80GB GPUs.
- - When you call the endpoint: A worker runs the `process_data` function using the input data you provide.
+ @Endpoint(name="inference", gpu=GpuType.NVIDIA_A100_80GB_PCIe)
+ def run_model(input): ...
+ ```
-**How to call it:**
+ Call via `/run` or `/runsync`: `https://api.runpod.ai/v2/{endpoint_id}/runsync`
+
+
+ Multiple routes share one endpoint:
-```bash
-# Synchronous call:
-curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
- -H "Authorization: Bearer $RUNPOD_API_KEY" \
- -d '{"input": {"your": "data"}}'
-
-# Asynchronous call:
-curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
- -H "Authorization: Bearer $RUNPOD_API_KEY" \
- -d '{"input": {"your": "data"}}'
-```
+ ```python
+ api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))
-**Key takeaway:** Each queue-based function must have its own unique endpoint name. Do not reuse the same name for multiple queue-based functions in Flash apps.
-
+ @api.post("/generate")
+ def generate_text(prompt: str): ...
-
+ @api.get("/health")
+ def health_check(): ...
+ ```
-**Your code:**
-
-```python title="gpu_worker.py"
-from runpod_flash import Endpoint, GpuType
-
-# Each function needs its own endpoint
-@Endpoint(
- name="preprocess",
- gpu=GpuType.NVIDIA_A100_80GB_PCIe,
- dependencies=["torch"]
-)
-async def preprocess(data: dict) -> dict:
- return {"preprocessed": data}
-
-@Endpoint(
- name="inference",
- gpu=GpuType.NVIDIA_A100_80GB_PCIe,
- dependencies=["transformers"]
-)
-async def run_model(input: dict) -> dict:
- return {"output": "result"}
-```
-
-**What gets deployed:**
-
-- **2 Serverless endpoints**:
- 1. `https://api.runpod.ai/v2/abc123xyz` (Named: `preprocess` in the console)
- 2. `https://api.runpod.ai/v2/def456xyz` (Named: `inference` in the console)
-
-**How to call them:**
-
-```bash
-# Call preprocess endpoint:
-curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
- -H "Authorization: Bearer $RUNPOD_API_KEY" \
- -d '{"input": {"your": "data"}}'
-
-# Call inference endpoint:
-curl -X POST https://api.runpod.ai/v2/def456xyz/runsync \
- -H "Authorization: Bearer $RUNPOD_API_KEY" \
- -d '{"input": {"your": "data"}}'
-```
-
-**Key takeaway:** Each queue-based function must have its own unique endpoint name. Do not reuse the same name for multiple queue-based functions in Flash apps.
-
-
-
-
-
-**Your code:**
-
-```python title="lb_worker.py"
-from runpod_flash import Endpoint
-
-api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))
-
-@api.post("/generate")
-async def generate_text(prompt: str) -> dict:
- return {"text": "generated"}
-
-@api.post("/translate")
-async def translate_text(text: str, target: str) -> dict:
- return {"translated": text}
-
-@api.get("/health")
-async def health_check() -> dict:
- return {"status": "healthy"}
-```
-
-**What gets deployed:**
-
-- **1 Serverless endpoint**: `https://abc123xyz.api.runpod.ai` (Named: `api-server`)
-- **3 HTTP routes**: `POST /generate`, `POST /translate`, `GET /health` (Defined by the route decorators in `lb_worker.py`)
-
- **How to call them:**
-
-```bash
-# Call /generate route:
-curl -X POST https://abc123xyz.api.runpod.ai/generate \
- -H "Authorization: Bearer $RUNPOD_API_KEY" \
- -d '{"prompt": "hello"}'
-
-# Call /health route (same endpoint URL):
-curl -X GET https://abc123xyz.api.runpod.ai/health \
- -H "Authorization: Bearer $RUNPOD_API_KEY"
-```
-
-**Key takeaway:** Load-balanced endpoints can have multiple routes on a single Serverless endpoint. The route decorator determines each route.
-
-
-
-
-
-**Your code:**
-
-```python title="mixed_api_worker.py"
-from runpod_flash import Endpoint, GpuType
-
-# Public-facing API (load-balanced)
-api = Endpoint(name="public-api", cpu="cpu5c-4-8", workers=(1, 5))
-
-@api.post("/process")
-async def handle_request(data: dict) -> dict:
- # Call internal GPU worker
- result = await run_gpu_inference(data)
- return {"result": result}
-
-# Internal GPU worker (queue-based)
-@Endpoint(
- name="gpu-backend",
- gpu=GpuType.NVIDIA_A100_80GB_PCIe,
- dependencies=["torch"]
-)
-async def run_gpu_inference(input: dict) -> dict:
- import torch
- # Heavy GPU computation
- return {"inference": "result"}
-```
-
-**What gets deployed:**
-
-- **2 Serverless endpoints**:
- 1. `https://abc123xyz.api.runpod.ai` (public-api, load-balanced)
- 2. `https://api.runpod.ai/v2/def456xyz` (gpu-backend, queue-based)
-
-**Key takeaway:** You can mix endpoint types. Load-balanced endpoints can call queue-based endpoints internally.
-
-
-
-### Quick reference
-
-| Endpoint Type | Configuration rule | Result |
-|---------------|-------------|--------|
-| Queue-based | 1 name : 1 function | 1 Serverless endpoint |
-| Load-balanced | 1 endpoint : 1 or more routes | 1 Serverless endpoint with >= 1 paths |
-| Mixed | Different names : Different functions | Separate Serverless endpoints |
+ Call via HTTP routes: `https://{endpoint_id}.api.runpod.ai/generate`
+
+
## Add load balancing routes
@@ -279,9 +112,10 @@ async def train_model(config: dict) -> dict:
This creates two separate Serverless endpoints, each with its own URL and scaling configuration.
-**Each queue-based function must have its own unique endpoint name.** Do not assign multiple `@Endpoint` functions to the same `name` when building Flash apps.
+**Do not reuse the same endpoint name for multiple queue-based functions when deploying Flash apps.** Each queue-based `@Endpoint` must have its own unique `name` parameter.
+
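To make the rule concrete, a tiny pre-deploy check like this (a hypothetical helper, not part of the Flash CLI) would catch an accidental name collision before it reaches deployment:

```python
from collections import Counter

def find_duplicate_names(names: list[str]) -> list[str]:
    """Return endpoint names that appear more than once."""
    return [name for name, count in Counter(names).items() if count > 1]

# Two queue-based functions accidentally sharing the "preprocess" name:
names = ["preprocess", "inference", "preprocess"]
print(find_duplicate_names(names))  # ['preprocess']
```

Any name returned by the check violates the one-name-per-function rule and should be made unique.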
## Modify endpoint configurations
Customize endpoint configurations for each worker function in your app. Each `@Endpoint` function can have its own GPU type, scaling parameters, and timeouts optimized for its specific workload.
@@ -303,7 +137,11 @@ async def preprocess(data): ...
async def inference(data): ...
```
-See [Configuration parameters](/flash/configuration/parameters) for all available options, [GPU types](/flash/configuration/gpu-types) for selecting hardware, and [Best practices](/flash/configuration/best-practices) for optimization guidance.
+For details, see:
+
+- [Configuration parameters](/flash/configuration/parameters) for all available options.
+- [GPU types](/flash/configuration/gpu-types) for selecting hardware.
+- [Best practices](/flash/configuration/best-practices) for optimization guidance.
## Test your customizations
diff --git a/flash/apps/deploy-apps.mdx b/flash/apps/deploy-apps.mdx
index 11ea5869..000aa19c 100644
--- a/flash/apps/deploy-apps.mdx
+++ b/flash/apps/deploy-apps.mdx
@@ -1,25 +1,11 @@
---
title: "Deploy Flash apps to Runpod"
sidebarTitle: "Deploy to Runpod"
-description: "Build and deploy your FastAPI app to Runpod."
+description: "Build and deploy your Flash app for production serving."
---
import { LoadBalancingEndpointsTooltip, QueueBasedEndpointsTooltip } from "/snippets/tooltips.jsx";
-Flash provides a complete deployment workflow for taking your local development project to production. Use `flash deploy` to build and deploy your application in a single command, or use `flash build` for more control over the build process.
-
-## Deployment workflow
-
-A typical deployment workflow looks like this:
-
-1. **Create a new project**: Use [`flash init`](/flash/cli/init) to create a new project.
-2. **Develop locally**: Use [`flash run`](/flash/cli/run) to test your application. Any functions decorated with `@Endpoint` will be run on Runpod Serverless workers.
-3. **Preview** (optional): Use [`flash deploy --preview`](/flash/cli/deploy) to test locally with Docker.
-4. **Deploy**: Use [`flash deploy`](/flash/cli/deploy) to push to Runpod Serverless.
-5. **Manage**: Use [`flash env`](/flash/cli/env) and [`flash app`](/flash/cli/app) to manage your deployments.
-
-## Deploy your application
-
When you're satisfied with your endpoint functions and ready to move to production, use `flash deploy` to build and deploy your Flash application:
```bash
@@ -35,49 +21,17 @@ This command performs the following steps:
### Deployment architecture
-Flash deploys your application as multiple independent Serverless endpoints. Each endpoint configuration in your worker files becomes a separate endpoint:
-
-```mermaid
-%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
-
-flowchart TB
- Users(["USERS"])
- StateManager["Runpod GraphQL API<br/>• Service discovery<br/>• Manifest registry"]
-
- subgraph Runpod ["RUNPOD SERVERLESS"]
- LB["lb_worker ENDPOINT<br/>(load-balanced)<br/>• POST /process<br/>• GET /health"]
- GPU["gpu_worker ENDPOINT<br/>(queue-based)<br/>• POST /runsync"]
- CPU["cpu_worker ENDPOINT<br/>(queue-based)<br/>• POST /runsync"]
-
- LB <-.->|"inter-endpoint calls"| GPU
- LB <-.->|"inter-endpoint calls"| CPU
-
- LB -.->|"service discovery"| StateManager
- GPU -.->|"service discovery"| StateManager
- CPU -.->|"service discovery"| StateManager
- end
-
- Users -->|"call directly"| LB
- Users -->|"call directly"| GPU
- Users -->|"call directly"| CPU
-
- style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
- style Users fill:#4D38F5,stroke:#4D38F5,color:#fff
- style LB fill:#5F4CFE,stroke:#5F4CFE,color:#fff
- style GPU fill:#22C55E,stroke:#22C55E,color:#000
- style CPU fill:#22C55E,stroke:#22C55E,color:#000
- style StateManager fill:#AE6DFF,stroke:#AE6DFF,color:#fff
-```
+Flash deploys your application as multiple independent Serverless endpoints. Each endpoint configuration in your worker files becomes a separate endpoint.
**How Flash deployments work:**
-- **One endpoint name = one endpoint**: Each unique endpoint configuration (defined by its `name` parameter) creates a separate Serverless endpoint with its own URL.
+- **One `Endpoint` instance = one Serverless endpoint**: Each unique endpoint configuration (defined by its `name` parameter) creates a separate Serverless endpoint with its own URL.
- **Call any endpoint**: After deployment, you can call whichever endpoint you need—`lb_worker` for API requests, `gpu_worker` for GPU tasks, `cpu_worker` for CPU tasks.
- [Load balancing endpoints](/flash/create-endpoints#load-balanced-endpoints): Create HTTP APIs with custom routes using `.get()`, `.post()`, etc. decorators.
- [Queue-based endpoints](/flash/create-endpoints#queue-based-endpoints): Run compute tasks using the `/runsync` or `/run` routes.
- **Inter-endpoint communication**: Endpoints can call each other's functions when needed, using the Runpod GraphQL service for discovery.
-### Deploy to an environment
+### Deploy to a specific environment
Flash organizes deployments using [apps and environments](/flash/apps/apps-and-environments). Deploy to a specific environment using the `--env` flag:
@@ -122,7 +76,7 @@ Queue-based endpoints:
Each endpoint is independent with its own URL and authentication.
-## Understanding endpoint architecture
+
The relationship between endpoint configurations and deployed endpoints differs between load-balanced and queue-based endpoints:
@@ -170,7 +124,7 @@ curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
```
-**Important:** For deployed queue-based endpoints, you must use **one function per endpoint name**. Each function creates its own Serverless endpoint. Do not put multiple `@Endpoint` functions with the same name when building Flash apps.
+**Important:** For deployed queue-based endpoints, you must use **one function per endpoint name**. Each function creates its own Serverless endpoint. Do not create multiple `@Endpoint` functions with the same `name` when building Flash apps.
### Load-balanced endpoints (multiple routes per endpoint)
@@ -211,14 +165,11 @@ curl -X GET https://abc123xyz.api.runpod.ai/health \
-H "Authorization: Bearer $RUNPOD_API_KEY"
```
-### Key takeaway
-
-- **Queue-based**: 1 endpoint name = 1 function = 1 Serverless endpoint
-- **Load-balanced**: 1 endpoint instance = multiple routes = 1 Serverless endpoint
+
## Preview before deploying
-Test your deployment locally using Docker before pushing to production using the `--preview` flag:
+Before pushing to production, you can test your deployment locally in Docker using the `--preview` flag:
```bash
flash deploy --preview
@@ -231,13 +182,6 @@ This command:
3. Starts one container per endpoint configuration (`lb_worker`, `gpu_worker`, `cpu_worker`, etc.).
4. Exposes all endpoints for local testing.
-Use preview mode to:
-
-- Validate your deployment configuration.
-- Test cross-endpoint function calls.
-- Debug resource provisioning issues.
-- Verify the manifest structure.
-
Press `Ctrl+C` to stop the preview environment.
## Managing deployment size
@@ -278,10 +222,6 @@ When you run `flash deploy` (or `flash build`), Flash:
5. **Installs** dependencies with Linux x86_64 compatibility.
6. **Packages** everything into `.flash/artifact.tar.gz`.
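The final packaging step amounts to writing a gzipped tarball; a toy stdlib sketch of that step (the file names here are invented for illustration, and the real build includes dependencies and the manifest):

```python
import tarfile
import tempfile
from pathlib import Path

# Build a toy project directory and package it like step 6 above.
project = Path(tempfile.mkdtemp())
(project / "gpu_worker.py").write_text("def handler(x): return x\n")

artifact = project / "artifact.tar.gz"
with tarfile.open(artifact, "w:gz") as tar:
    tar.add(project / "gpu_worker.py", arcname="gpu_worker.py")

# Inspect the resulting archive:
with tarfile.open(artifact, "r:gz") as tar:
    print(tar.getnames())  # ['gpu_worker.py']
```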
-### Cross-platform builds
-
-Flash automatically handles cross-platform builds. You can build on macOS, Windows, or Linux, and the resulting package will run correctly on Runpod's Linux x86_64 infrastructure.
-
### Build artifacts
After building, these artifacts are created in the `.flash/` directory:
@@ -292,7 +232,7 @@ After building, these artifacts are created in the `.flash/` directory:
| `.flash/flash_manifest.json` | Service discovery configuration |
| `.flash/.build/` | Temporary build directory (removed by default) |
-## What gets deployed to Runpod
+## What gets deployed
When you deploy a Flash app, you're deploying a **build artifact** (tarball) onto pre-built Flash Docker images. This architecture is similar to AWS Lambda layers: the base runtime is pre-built, and your code and dependencies are layered on top.
@@ -365,45 +305,16 @@ The `flash_manifest.json` file is the brain of your deployment. It tells each en
### What gets created on Runpod
-For each endpoint configuration in the manifest, Flash creates an independent Serverless endpoint. Each endpoint runs as its own service with its own URL.
-
-**load-balanced endpoints** ([load balancer](/serverless/load-balancing/overview))
-
-- **Purpose**: HTTP-facing services for custom API routes
-- **Image**: Pre-built `runpod/flash-lb-cpu:latest` or `runpod/flash-lb:latest`
-- **Use cases**: REST APIs, webhooks, public-facing services
-- **Example**: `lb_worker.py` with `@api.post("/process")`
-- **Routes**: Custom HTTP endpoints defined in your route decorators
-- **Startup process**:
- 1. Container extracts your tarball
- 2. Auto-generated handler imports your worker file (e.g., `lb_worker.py`)
- 3. Routes are registered from decorators
- 4. Uvicorn server starts on port 8000
-- **Service discovery**: Queries the state manager for cross-endpoint calls
-
-**queue-based endpoints** (serverless compute)
-
-- **Purpose**: Background compute for intensive `@Endpoint` functions
-- **Image**: Pre-built `runpod/flash:latest` (GPU) or `runpod/flash-cpu:latest` (CPU)
-- **Use cases**: GPU inference, batch processing, heavy computation
-- **Example**: `gpu_worker.py` with `@Endpoint(name="...", gpu=...)`
-- **Routes**: Automatic `/runsync` endpoint for job submission
-- **Startup process**:
- 1. Container extracts your tarball
- 2. Worker module is imported (e.g., `gpu_worker.py`)
- 3. Function registry maps function names to callables
- 4. Worker listens for jobs from job queue
-- **Execution**: Sequential job processing with automatic retry logic
-- **Service discovery**: Queries the state manager for cross-endpoint calls
+For each endpoint configuration in the manifest, Flash creates an independent Serverless endpoint, identified by its `name` parameter.
### Cross-endpoint communication
When one endpoint needs to call a function on another endpoint:
-1. **Manifest lookup**: Calling endpoint checks `flash_manifest.json` for function-to-resource mapping
-2. **Service discovery**: Queries the state manager (Runpod GraphQL API) for target endpoint URL
-3. **Direct call**: Makes HTTP request directly to target endpoint
-4. **Response**: Target endpoint executes function and returns result
+1. **Manifest lookup**: The calling endpoint checks `flash_manifest.json` for function-to-resource mapping.
+2. **Service discovery**: It queries the state manager (Runpod GraphQL API) for the target endpoint's URL.
+3. **Direct call**: It makes an HTTP request directly to the target endpoint.
+4. **Response**: The target endpoint executes the function and returns the result.
Each endpoint maintains its own connection to the state manager, querying for peer endpoint URLs as needed and caching results for 300 seconds to minimize API calls.
diff --git a/flash/apps/initialize-project.mdx b/flash/apps/initialize-project.mdx
index 88c5d528..aecc7b91 100644
--- a/flash/apps/initialize-project.mdx
+++ b/flash/apps/initialize-project.mdx
@@ -70,43 +70,20 @@ cp .env.example .env
# RUNPOD_API_KEY=your_api_key_here
```
-## How it fits into the workflow
-
-`flash init` is the first step in the Flash development workflow:
-
-```mermaid
-%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
-
-flowchart LR
- Init["flash init"]
- Dev["flash run"]
- Deploy["flash deploy"]
-
- Init -->|"Create project"| Dev
- Dev -->|"Test locally"| Deploy
-
- style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff
- style Dev fill:#22C55E,stroke:#22C55E,color:#000
- style Deploy fill:#4D38F5,stroke:#4D38F5,color:#fff
-```
-
-1. **`flash init`**: Creates project structure and boilerplate.
-2. **`flash run`**: Starts local development server for testing.
-3. **`flash deploy`**: Builds and deploys to Runpod Serverless.
-
## Handle existing files
If you run `flash init` in a directory with existing files, Flash detects conflicts and prompts for confirmation:
```text
-┌─ File Conflicts Detected ─────────────────────┐
-│ Warning: The following files will be │
-│ overwritten: │
-│ │
-│ • lb_worker.py │
-│ • gpu_worker.py │
-│ • requirements.txt │
-└───────────────────────────────────────────────┘
+File Conflicts Detected
+
+Warning: The following files will be overwritten:
+ • requirements.txt
+ • gpu_worker.py
+ • README.md
+ • lb_worker.py
+ • cpu_worker.py
+
Continue and overwrite these files? [y/N]:
```
diff --git a/flash/apps/local-testing.mdx b/flash/apps/local-testing.mdx
index f4b52120..8b3458a9 100644
--- a/flash/apps/local-testing.mdx
+++ b/flash/apps/local-testing.mdx
@@ -4,14 +4,9 @@ sidebarTitle: "Test locally"
description: "Use flash run to test your Flash application locally before deploying."
---
-The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. The development server runs locally and updates automatically as you edit files. When you call a `@Endpoint` function, Flash sends the latest function code to Serverless workers on Runpod, so your changes are reflected immediately.
+The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. The development server runs locally and updates automatically as you edit files.
-Use `flash run` when you want to:
-
-- Iterate quickly with automatic code updates.
-- Test `@Endpoint` functions against real GPU/CPU workers.
-- Debug request/response handling before deployment.
-- Develop without redeploying after every change.
+When you call an `@Endpoint` function, Flash sends the latest function code to Serverless workers on Runpod, so your changes are reflected immediately.
## Start the development server
@@ -23,7 +18,7 @@ flash run
The server starts at `http://localhost:8888` by default. Your endpoints are available immediately for testing, and `@Endpoint` functions provision Serverless endpoints on first call.
-### Custom host and port
+### Using a custom host and port
```bash
# Change port
@@ -127,27 +122,6 @@ flowchart TB
Your code updates automatically as you edit files. Endpoints created by `flash run` are prefixed with `live-` to distinguish them from production endpoints.
-## Development workflow
-
-A typical development cycle looks like this:
-
-1. Start the server: `flash run`
-2. Make changes to your code.
-3. The server reloads automatically.
-4. Test your changes via curl or the API explorer.
-5. Repeat until ready to deploy.
-
-When you're done, use `flash undeploy` to clean up the `live-` endpoints created during development.
-
-## Differences from production
-
-| Aspect | `flash run` | `flash deploy` |
-|--------|-------------|----------------|
-| FastAPI app runs on | Your machine | Runpod Serverless |
-| Endpoint naming | `live-` prefix | No prefix |
-| Automatic updates | Yes | No |
-| Authentication | Not required | Required |
-
## Clean up after testing
Endpoints created by `flash run` persist until you delete them. To clean up:
@@ -184,7 +158,7 @@ flash run --auto-provision
Ensure `RUNPOD_API_KEY` is set in your `.env` file or environment:
```bash
-export RUNPOD_API_KEY=your_api_key_here
+export RUNPOD_API_KEY="your_api_key_here"
```
## Next steps
diff --git a/flash/apps/overview.mdx b/flash/apps/overview.mdx
index d5738f05..868eb54e 100644
--- a/flash/apps/overview.mdx
+++ b/flash/apps/overview.mdx
@@ -6,48 +6,21 @@ description: "Understand the Flash app development lifecycle."
import { ServerlessTooltip } from "/snippets/tooltips.jsx";
-A Flash app is a collection of endpoints deployed to Runpod. When you deploy an app, Runpod:
-
-1. Packages your code, dependencies, and deployment manifest into a tarball (max 500 MB).
-2. Uploads the tarball to Runpod.
-3. Provisions independent Serverless endpoints based on your [endpoint configurations](/flash/create-endpoints).
-
-This page explains the key concepts and processes you'll use when building Flash apps.
-
-
-If you prefer to learn by doing, follow this tuturial to [build your first Flash app](/flash/apps/build-app).
-
+A Flash app is a collection of endpoints deployed to Runpod.
+
+
+ Create a Flash app, test it locally, and deploy it to production.
+
+
+ Create boilerplate code for a new Flash project with `flash init`.
+
+
-## App development overview
+## App development workflow
Building a Flash application follows a clear progression from initialization to production deployment:
-
-```mermaid
-%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
-
-flowchart TB
- Init["flash init<br/>Create project"]
- Code["Define endpoints with<br/>@Endpoint functions"]
- Run["Test locally with<br/>flash run"]
- Deploy["Deploy to Runpod with<br/>flash deploy"]
- Manage["Manage apps and<br/>environments with<br/>flash app and flash env"]
-
- Init --> Code
- Code --> Run
- Run -->|"Ready for production"| Deploy
- Deploy --> Manage
- Run -->|"Continue development"| Code
-
- style Init fill:#5F4CFE,stroke:#5F4CFE,color:#fff
- style Code fill:#22C55E,stroke:#22C55E,color:#000
- style Run fill:#4D38F5,stroke:#4D38F5,color:#fff
- style Deploy fill:#AE6DFF,stroke:#AE6DFF,color:#000
- style Manage fill:#9289FE,stroke:#9289FE,color:#fff
-```
-
-
Use `flash init` to create a new project with example workers:
@@ -79,7 +52,7 @@ flowchart TB
return {"result": "..."}
```
- [Learn more about endpoint functions](/flash/create-endpoints).
+ [Learn more about customizing your app](/flash/apps/customize-app).
@@ -89,7 +62,7 @@ flowchart TB
flash run
```
- Your app runs locally and updates automatically. When you call an `@Endpoint` function, Flash sends the latest code to Runpod workers. This hybrid architecture lets you iterate quickly without redeploying. [Learn more about local testing](/flash/apps/local-testing).
+ Your app runs locally and updates automatically. When you call an `@Endpoint` function, Flash sends the latest code to Runpod workers. [Learn more about local testing](/flash/apps/local-testing).
@@ -99,7 +72,14 @@ flowchart TB
flash deploy
```
+ When you deploy an app, Runpod:
+
+ 1. Packages your code, dependencies, and deployment manifest into a tarball (max 500 MB).
+ 2. Uploads the tarball to Runpod.
+ 3. Provisions independent Serverless endpoints based on your [endpoint configurations](/flash/create-endpoints).
+
Your entire application—including all worker functions—runs on Runpod infrastructure. [Learn more about deployment](/flash/apps/deploy-apps).
+
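Since the tarball is capped at 500 MB, a quick local sanity check before deploying can save a failed upload (a hypothetical helper, not part of the Flash CLI):

```python
import tempfile
from pathlib import Path

MAX_BYTES = 500 * 1024 * 1024  # the 500 MB artifact limit

def check_artifact_size(path: Path) -> int:
    """Raise if the artifact exceeds the limit; return its size in bytes."""
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name} is {size} bytes, over the 500 MB limit")
    return size

# Stand-in artifact for the example:
artifact = Path(tempfile.mkdtemp()) / "artifact.tar.gz"
artifact.write_bytes(b"\x00" * 1024)
print(check_artifact_size(artifact))  # 1024
```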
@@ -107,195 +87,11 @@ flowchart TB
+
## Apps and environments
Flash uses a two-level organizational structure: **apps** (project containers) and **environments** (deployment stages like dev, staging, production). See [Apps and environments](/flash/apps/apps-and-environments) for complete details.
-## Local vs production deployment
-
-Flash supports two modes of operation:
-
-### Local development (`flash run`)
-
-```mermaid
-%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
-
-flowchart TB
- subgraph Local ["YOUR MACHINE"]
- DevServer["Development Server<br/>• Auto-reload on changes<br/>• localhost:8888"]
- end
-
- subgraph Runpod ["RUNPOD SERVERLESS"]
- LB["live-lb_worker"]
- GPU["live-gpu_worker"]
- CPU["live-cpu_worker"]
- end
-
- DevServer -->|"HTTPS"| LB
- DevServer -->|"HTTPS"| GPU
- DevServer -->|"HTTPS"| CPU
-
- style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
- style Runpod fill:#1a1a2e,stroke:#22C55E,stroke-width:2px,color:#fff
- style DevServer fill:#5F4CFE,stroke:#5F4CFE,color:#fff
- style LB fill:#22C55E,stroke:#22C55E,color:#000
- style GPU fill:#22C55E,stroke:#22C55E,color:#000
- style CPU fill:#22C55E,stroke:#22C55E,color:#000
-```
-
-**How it works:**
-
-- Development server runs on your machine and updates automatically.
-- `@Endpoint` functions deploy to Runpod endpoints (one for each endpoint configuration).
-- Endpoints are prefixed with `live-` for easy identification.
-- No authentication required for local testing.
-- Fast iteration on application logic.
-
-### Production deployment (`flash deploy`)
-
-```mermaid
-%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%
-
-flowchart TB
- Users(["USERS"])
- StateManager["Runpod GraphQL API<br/>• Service discovery<br/>• Manifest registry"]
-
- subgraph Runpod ["RUNPOD SERVERLESS"]
- LB["lb_worker ENDPOINT<br/>(load-balanced)<br/>• POST /process<br/>• GET /health"]
- GPU["gpu_worker ENDPOINT<br/>(queue-based)<br/>• POST /runsync"]
- CPU["cpu_worker ENDPOINT<br/>(queue-based)<br/>• POST /runsync"]
-
- LB <-.->|"inter-endpoint calls"| GPU
- LB <-.->|"inter-endpoint calls"| CPU
-
- LB -.->|"service discovery"| StateManager
- GPU -.->|"service discovery"| StateManager
- CPU -.->|"service discovery"| StateManager
- end
-
- Users -->|"HTTPS (auth required)"| LB
- Users -->|"HTTPS (auth required)"| GPU
- Users -->|"HTTPS (auth required)"| CPU
-
- style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
- style Users fill:#4D38F5,stroke:#4D38F5,color:#fff
- style LB fill:#5F4CFE,stroke:#5F4CFE,color:#fff
- style GPU fill:#22C55E,stroke:#22C55E,color:#000
- style CPU fill:#22C55E,stroke:#22C55E,color:#000
- style StateManager fill:#AE6DFF,stroke:#AE6DFF,color:#fff
-```
-
-**How it works:**
-
-- All endpoints run independently on Runpod Serverless (one for each endpoint configuration).
-- Each endpoint has its own public HTTPS URL.
-- API key authentication is required for all requests.
-- Automatic scaling based on load.
-- Production-grade reliability and performance.
-
-### Endpoint functions vs. Serverless endpoints
-
-Understanding the relationship between your code (endpoint functions) and deployed infrastructure (Serverless endpoints) is crucial for building Flash apps.
-
-**Serverless endpoints** are the underlying infrastructure Flash creates on Runpod. Each unique endpoint configuration (defined by its `name` parameter) creates one Serverless endpoint with specific hardware (GPU type, worker count, etc.). Each Serverless endpoint gets its own public HTTPS URL (e.g., `https://abc123xyz.api.runpod.ai` for load-balanced or `https://api.runpod.ai/v2/abc123xyz` for queue-based).
-
-You call these endpoints to execute your functions. The endpoint configuration type determines the behavior and HTTPS URL of the endpoint:
-
-- **For queue-based endpoints**: You can only have one function per endpoint, which will be executed when you call `/runsync` or `/run` on the endpoint.
-- **For load-balanced endpoints**: You can have multiple functions with different HTTP routes per endpoint, which will be executed when you call the endpoint with the appropriate HTTP method and path.
-
-#### Queue-based example
-
-Queue-based endpoints must have exactly one function defined per endpoint configuration, which will be executed when you call the `/runsync` or `/run` route on the endpoint.
-
-```python
-from runpod_flash import Endpoint, GpuType
-
-# Each queue-based function needs its own endpoint configuration
-@Endpoint(name="preprocess", gpu=GpuType.NVIDIA_A100_80GB_PCIe)
-def preprocess(data): ...
-
-@Endpoint(name="inference", gpu=GpuType.NVIDIA_A100_80GB_PCIe)
-def run_model(input): ...
-```
-
-This creates two separate Serverless endpoints, each with its own public HTTPS URL and `/run` or `/runsync` route.
-
-The URL depends on your endpoint ID, which is randomly generated when you deploy your app. For example, if your endpoint ID is `fexh32emkg3az7`, the `/runsync` URL will be `https://api.runpod.ai/v2/fexh32emkg3az7/runsync`.
-
-#### Load-balancing example
-
-Load-balancing endpoints can have multiple routes on a single Serverless endpoint. Use the route decorator pattern:
-
-```python
-from runpod_flash import Endpoint
-
-# One endpoint can host multiple HTTP routes
-api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))
-
-@api.post("/generate")
-def generate_text(prompt: str): ...
-
-@api.get("/health")
-def health_check(): ...
-```
-
-This creates one Serverless endpoint with multiple routes: `POST /generate` and `GET /health`, which will be executed when you call the endpoint with the appropriate HTTP method and path.
-
-The final endpoint URL depends on your endpoint ID, which is randomly generated when you deploy your app, and the HTTP routes defined in your decorators. For example, if your endpoint ID is `l66m1rhm9dhbjd`, the `/generate` route will be available at `https://l66m1rhm9dhbjd.api.runpod.ai/generate`.
-
-[Learn more about endpoint mapping](/flash/apps/customize-app#understanding-endpoint-architecture).
-
-## Common workflows
-
-### Simple projects (single environment)
-
-For solo projects or simple applications:
-
-```bash
-# Initialize and develop
-flash init PROJECT_NAME
-cd PROJECT_NAME
-
-# Test locally
-flash run
-
-# Deploy to production (creates 'production' environment by default)
-flash deploy
-```
-
-### Team projects (multiple environments)
-
-For team collaboration with dev, staging, and production stages:
-
-```bash
-# Create environments
-flash env create dev
-flash env create staging
-flash env create production
-
-# Development cycle
-flash run # Test locally
-flash deploy --env dev # Deploy to dev for integration testing
-flash deploy --env staging # Deploy to staging for QA
-flash deploy --env production # Deploy to production after approval
-```
-
-### Feature development
-
-For testing new features in isolation:
-
-```bash
-# Create temporary feature environment
-flash env create FEATURE_NAME
-
-# Deploy and test
-flash deploy --env FEATURE_NAME
-
-# Clean up after merging
-flash env delete FEATURE_NAME
-```
-
## Next steps
diff --git a/flash/apps/requests.mdx b/flash/apps/requests.mdx
index 39d893e9..44edb6fa 100644
--- a/flash/apps/requests.mdx
+++ b/flash/apps/requests.mdx
@@ -46,13 +46,13 @@ curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
}
```
-**Check status later:**
+**Check job status and retrieve results:**
```bash
curl https://api.runpod.ai/v2/abc123xyz/status/job-abc-123 \
-H "Authorization: Bearer $RUNPOD_API_KEY"
```
-**When job completes:**
+**When the job completes:**
```json
{
"id": "job-abc-123",
From 1f768ca5fc97c9ae010263912ae18f84d73ac376 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Wed, 11 Mar 2026 13:40:04 -0400
Subject: [PATCH 4/5] Move Flash CLI to CLI tab in the top nav
---
docs.json | 33 ++++++++++++++-------------------
flash/cli/overview.mdx | 2 +-
2 files changed, 15 insertions(+), 20 deletions(-)
diff --git a/docs.json b/docs.json
index 76fd1b0a..52a5497a 100644
--- a/docs.json
+++ b/docs.json
@@ -63,8 +63,6 @@
"flash/configuration/best-practices"
]
},
- "flash/execution-model",
- "flash/troubleshooting",
{
"group": "Build apps",
"pages": [
@@ -78,20 +76,8 @@
"flash/apps/requests"
]
},
- {
- "group": "CLI reference",
- "pages": [
- "flash/cli/overview",
- "flash/cli/init",
- "flash/cli/login",
- "flash/cli/run",
- "flash/cli/build",
- "flash/cli/deploy",
- "flash/cli/env",
- "flash/cli/app",
- "flash/cli/undeploy"
- ]
- }
+ "flash/execution-model",
+ "flash/troubleshooting"
]
},
{
@@ -467,14 +453,23 @@
"tab": "CLI",
"groups": [
{
- "group": "Runpod CLI",
+ "group": "Flash CLI",
"pages": [
- "runpodctl/overview"
+ "flash/cli/overview",
+ "flash/cli/init",
+ "flash/cli/login",
+ "flash/cli/run",
+ "flash/cli/build",
+ "flash/cli/deploy",
+ "flash/cli/env",
+ "flash/cli/app",
+ "flash/cli/undeploy"
]
},
{
- "group": "Reference",
+ "group": "Runpod CLI",
"pages": [
+ "runpodctl/overview",
"runpodctl/reference/runpodctl-config",
"runpodctl/reference/runpodctl-create-pod",
"runpodctl/reference/runpodctl-create-pods",
diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx
index b3aaa290..82e78427 100644
--- a/flash/cli/overview.mdx
+++ b/flash/cli/overview.mdx
@@ -6,7 +6,7 @@ description: "Learn how to use the Flash CLI for local development and deploymen
The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless.
-Before using the CLI, make sure you've [installed Flash](/flash/overview#install-flash) and set your [Runpod API key](/get-started/api-keys) in your environment.
+Before using the CLI, make sure you've [installed Flash](/flash/overview#install-flash).
## Available commands
From fe120cbd31472ab298d6616dabe5e31cfd94a9fc Mon Sep 17 00:00:00 2001
From: Mo King
Date: Wed, 11 Mar 2026 14:20:11 -0400
Subject: [PATCH 5/5] Add beta tag to Flash CLI
---
flash/cli/overview.mdx | 1 +
1 file changed, 1 insertion(+)
diff --git a/flash/cli/overview.mdx b/flash/cli/overview.mdx
index 82e78427..f9436b6a 100644
--- a/flash/cli/overview.mdx
+++ b/flash/cli/overview.mdx
@@ -2,6 +2,7 @@
title: "CLI overview"
sidebarTitle: "Overview"
description: "Learn how to use the Flash CLI for local development and deployment."
+tag: "BETA"
---
The Flash CLI provides commands for initializing projects, running local development servers, building deployment artifacts, and managing your applications on Runpod Serverless.