diff --git a/01_getting_started/01_hello_world/README.md b/01_getting_started/01_hello_world/README.md
index beb291c..7ae39af 100644
--- a/01_getting_started/01_hello_world/README.md
+++ b/01_getting_started/01_hello_world/README.md
@@ -18,17 +18,30 @@ uv run flash login
 
 Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
 
-### 3. Run Locally
+### 3. Run the Example
 
 ```bash
-uv run flash run
+python gpu_worker.py
 ```
 
-Server starts at **http://localhost:8888**
+The function executes on a Runpod GPU and prints the result directly:
+
+```
+Testing GPU worker with payload: {'message': 'Testing GPU worker'}
+Result: {'status': 'success', 'message': 'Testing GPU worker', 'worker_type': 'GPU', ...}
+```
+
+First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
+
+### Alternative: HTTP API Testing
 
-### 4. Test the API
+To test via HTTP endpoints instead:
 
-Visit **http://localhost:8888/docs** for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
+```bash
+uv run flash run
+```
+
+Visit **http://localhost:8888/docs** for interactive API documentation.
 
 ```bash
 curl -X POST http://localhost:8888/gpu_worker/runsync \
@@ -36,13 +49,6 @@ curl -X POST http://localhost:8888/gpu_worker/runsync \
   -d '{"message": "Hello GPU!"}'
 ```
 
-### Full CLI Documentation
-
-For complete CLI usage including deployment, environment management, and troubleshooting:
-- **[CLI Reference](../../CLI-REFERENCE.md)** - All commands and options
-- **[Getting Started Guide](../../docs/cli/getting-started.md)** - Step-by-step tutorial
-- **[Workflows](../../docs/cli/workflows.md)** - Common development patterns
-
 ## What This Demonstrates
 
 ### GPU Worker (`gpu_worker.py`)
@@ -133,14 +139,14 @@ The worker uses PyTorch to detect and report GPU information:
 
 ## Development
 
-### Test Worker Locally
+### Run the Worker
 
 ```bash
 python gpu_worker.py
 ```
 
-### Run the Application
+### HTTP API Testing (Optional)
 
 ```bash
-flash run
+uv run flash run
 ```
 
 ## Next Steps
diff --git a/01_getting_started/01_hello_world/gpu_worker.py b/01_getting_started/01_hello_world/gpu_worker.py
index d7a330f..6636dc9 100644
--- a/01_getting_started/01_hello_world/gpu_worker.py
+++ b/01_getting_started/01_hello_world/gpu_worker.py
@@ -1,6 +1,6 @@
-# gpu serverless worker -- detects available GPU hardware.
-# run with: flash run
-# test directly: python gpu_worker.py
+# GPU serverless worker -- detects available GPU hardware.
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import Endpoint, GpuType
 
diff --git a/01_getting_started/02_cpu_worker/README.md b/01_getting_started/02_cpu_worker/README.md
index 4d5fb88..8f80804 100644
--- a/01_getting_started/02_cpu_worker/README.md
+++ b/01_getting_started/02_cpu_worker/README.md
@@ -18,17 +18,30 @@ uv run flash login
 
 Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
 
-### 3. Run Locally
+### 3. Run the Example
 
 ```bash
-uv run flash run
+python cpu_worker.py
 ```
 
-Server starts at **http://localhost:8888**
+The function executes on a Runpod CPU worker and prints the result directly:
+
+```
+Testing CPU worker with payload: {'name': 'Testing CPU worker'}
+Result: {'status': 'success', 'message': 'Hello, Testing CPU worker!', 'worker_type': 'CPU', ...}
+```
+
+First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
+
+### Alternative: HTTP API Testing
 
-### 4. Test the API
+To test via HTTP endpoints instead:
 
-Visit **http://localhost:8888/docs** for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
+```bash
+uv run flash run
+```
+
+Visit **http://localhost:8888/docs** for interactive API documentation.
 
 ```bash
 curl -X POST http://localhost:8888/cpu_worker/runsync \
@@ -36,13 +49,6 @@ curl -X POST http://localhost:8888/cpu_worker/runsync \
   -d '{"name": "Flash User"}'
 ```
 
-### Full CLI Documentation
-
-For complete CLI usage including deployment, environment management, and troubleshooting:
-- **[CLI Reference](../../CLI-REFERENCE.md)** - All commands and options
-- **[Getting Started Guide](../../docs/cli/getting-started.md)** - Step-by-step tutorial
-- **[Workflows](../../docs/cli/workflows.md)** - Common development patterns
-
 ## What This Demonstrates
 
 ### CPU Worker (`cpu_worker.py`)
@@ -135,14 +141,14 @@ The CPU worker scales to zero when idle:
 
 ## Development
 
-### Test Worker Locally
+### Run the Worker
 
 ```bash
 python cpu_worker.py
 ```
 
-### Run the Application
+### HTTP API Testing (Optional)
 
 ```bash
-flash run
+uv run flash run
 ```
 
 ## When to Use CPU Workers
diff --git a/01_getting_started/02_cpu_worker/cpu_worker.py b/01_getting_started/02_cpu_worker/cpu_worker.py
index 0679296..94b702b 100644
--- a/01_getting_started/02_cpu_worker/cpu_worker.py
+++ b/01_getting_started/02_cpu_worker/cpu_worker.py
@@ -1,6 +1,6 @@
-# cpu serverless worker -- lightweight processing without GPU.
-# run with: flash run
-# test directly: python cpu_worker.py
+# CPU serverless worker -- lightweight processing without GPU.
+# Run: python cpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import CpuInstanceType, Endpoint
 
diff --git a/01_getting_started/03_mixed_workers/README.md b/01_getting_started/03_mixed_workers/README.md
index e85fad4..c1199dd 100644
--- a/01_getting_started/03_mixed_workers/README.md
+++ b/01_getting_started/03_mixed_workers/README.md
@@ -40,16 +40,33 @@ Response
 
 **Prerequisites**: Complete the [repository setup](../../README.md#quick-start) first (clone, `make dev`, set API key).
 
-### Run This Example
+### Test Individual Workers
+
+Run the CPU and GPU workers directly:
 
 ```bash
 cd 01_getting_started/03_mixed_workers
-flash run
+
+# Test CPU preprocessing worker
+python cpu_worker.py
+
+# Test GPU inference worker
+python gpu_worker.py
 ```
 
-### Alternative: Standalone Setup
+First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
+
+### Run the Full Pipeline
 
-If you haven't run the repository-wide setup:
+The pipeline endpoint (`/classify`) orchestrates multiple workers via HTTP. To test it:
+
+```bash
+uv run flash run
+```
+
+Server starts at http://localhost:8888
+
+### Setup (if needed)
 
 ```bash
 # Install dependencies
@@ -58,13 +75,8 @@ uv sync
 
 # Authenticate
 uv run flash login
 # Or create .env file with RUNPOD_API_KEY=your_api_key_here
-
-# Run
-uv run flash run
 ```
 
-Server starts at http://localhost:8888
-
 ## Test the Pipeline
 
 ```bash
diff --git a/01_getting_started/03_mixed_workers/cpu_worker.py b/01_getting_started/03_mixed_workers/cpu_worker.py
index f65fd6c..d96001a 100644
--- a/01_getting_started/03_mixed_workers/cpu_worker.py
+++ b/01_getting_started/03_mixed_workers/cpu_worker.py
@@ -1,7 +1,7 @@
-# cpu workers for text preprocessing and postprocessing.
-# part of the mixed CPU/GPU pipeline example.
-# run with: flash run
-# test directly: python cpu_worker.py
+# CPU workers for text preprocessing and postprocessing.
+# Part of the mixed CPU/GPU pipeline example.
+# Run: python cpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import CpuInstanceType, Endpoint
 
diff --git a/01_getting_started/03_mixed_workers/gpu_worker.py b/01_getting_started/03_mixed_workers/gpu_worker.py
index b6ae065..7ac28a1 100644
--- a/01_getting_started/03_mixed_workers/gpu_worker.py
+++ b/01_getting_started/03_mixed_workers/gpu_worker.py
@@ -1,7 +1,7 @@
-# gpu worker for ML inference (sentiment classification).
-# part of the mixed CPU/GPU pipeline example.
-# run with: flash run
-# test directly: python gpu_worker.py
+# GPU worker for ML inference (sentiment classification).
+# Part of the mixed CPU/GPU pipeline example.
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import Endpoint, GpuGroup
 
diff --git a/01_getting_started/03_mixed_workers/pipeline.py b/01_getting_started/03_mixed_workers/pipeline.py
index 6a4615f..1088627 100644
--- a/01_getting_started/03_mixed_workers/pipeline.py
+++ b/01_getting_started/03_mixed_workers/pipeline.py
@@ -1,6 +1,13 @@
-# classification pipeline: CPU preprocess -> GPU inference -> CPU postprocess.
-# demonstrates cross-worker orchestration via a load-balanced endpoint.
-# run with: flash run
+# Classification pipeline: CPU preprocess -> GPU inference -> CPU postprocess.
+# Demonstrates cross-worker orchestration via a load-balanced endpoint.
+# Run: python pipeline.py (local testing)
+# Alternative: flash run (for HTTP route testing)
+import sys
+from pathlib import Path
+
+# Ensure sibling modules (cpu_worker, gpu_worker) are importable regardless of cwd
+sys.path.insert(0, str(Path(__file__).parent))
+
 from runpod_flash import Endpoint
 
 pipeline = Endpoint(name="01_03_classify_pipeline", cpu="cpu3c-1-2", workers=(1, 3))
diff --git a/01_getting_started/04_dependencies/README.md b/01_getting_started/04_dependencies/README.md
index cf9a4a8..e54b615 100644
--- a/01_getting_started/04_dependencies/README.md
+++ b/01_getting_started/04_dependencies/README.md
@@ -29,14 +29,16 @@ Learn how to manage Python packages and system dependencies in Flash workers.
 
 ```bash
 cd 01_getting_started/04_dependencies
-flash run
-```
 
-Server starts at http://localhost:8888
+# Run any worker directly
+python gpu_worker.py
+python cpu_worker.py
+python mixed_worker.py
+```
 
-### Alternative: Standalone Setup
+First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
 
-If you haven't run the repository-wide setup:
+### Setup (if needed)
 
 ```bash
 # Install dependencies
@@ -45,11 +47,18 @@ uv sync
 
 # Authenticate
 uv run flash login
 # Or create .env file with RUNPOD_API_KEY=your_api_key_here
+```
 
-# Run
+### Alternative: HTTP API Testing
+
+To test via HTTP endpoints:
+
+```bash
 uv run flash run
 ```
 
+Server starts at http://localhost:8888
+
 ## GPU vs CPU Packaging
 
 GPU and CPU endpoints use different base Docker images, which affects how dependencies are resolved:
diff --git a/01_getting_started/04_dependencies/cpu_worker.py b/01_getting_started/04_dependencies/cpu_worker.py
index 64e2c96..3b542ee 100644
--- a/01_getting_started/04_dependencies/cpu_worker.py
+++ b/01_getting_started/04_dependencies/cpu_worker.py
@@ -1,6 +1,6 @@
-# cpu workers demonstrating data science and zero-dependency patterns.
-# run with: flash run
-# test directly: python cpu_worker.py
+# CPU workers demonstrating data science and zero-dependency patterns.
+# Run: python cpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import CpuInstanceType, Endpoint
 
diff --git a/01_getting_started/04_dependencies/gpu_worker.py b/01_getting_started/04_dependencies/gpu_worker.py
index 07df859..8d951a8 100644
--- a/01_getting_started/04_dependencies/gpu_worker.py
+++ b/01_getting_started/04_dependencies/gpu_worker.py
@@ -1,6 +1,6 @@
-# gpu workers demonstrating Python and system dependency management.
-# run with: flash run
-# test directly: python gpu_worker.py
+# GPU workers demonstrating Python and system dependency management.
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import Endpoint, GpuGroup
 
diff --git a/01_getting_started/04_dependencies/mixed_worker.py b/01_getting_started/04_dependencies/mixed_worker.py
index 4b15892..6736d1d 100644
--- a/01_getting_started/04_dependencies/mixed_worker.py
+++ b/01_getting_started/04_dependencies/mixed_worker.py
@@ -3,8 +3,8 @@
 # - GPU images (runpod/pytorch:*) have numpy pre-installed
 # - CPU images (python-slim) install numpy from the build artifact
 #
-# run with: flash run
-# test directly: python mixed_worker.py
+# Run: python mixed_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import CpuInstanceType, Endpoint, GpuType
 
diff --git a/02_ml_inference/01_text_to_speech/README.md b/02_ml_inference/01_text_to_speech/README.md
index 4b89a47..af2e3af 100644
--- a/02_ml_inference/01_text_to_speech/README.md
+++ b/02_ml_inference/01_text_to_speech/README.md
@@ -33,31 +33,30 @@ Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
 
 ### Run
 
 ```bash
-uv run flash run
+python gpu_worker.py
 ```
 
-First run provisions the endpoint (~1 min). Server starts at http://localhost:8888
+First run provisions the endpoint (~1 min) and downloads the model. The result is printed directly to your terminal.
+
+Subsequent runs take 5-10 seconds (worker is already running).
 
-### Test the Endpoint
+### Alternative: HTTP API Testing
+
+To test via HTTP endpoints:
 
-Visit http://localhost:8888/docs for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
-
-**Generate speech (JSON with base64 audio):**
 ```bash
-curl -X POST http://localhost:8888/gpu_worker/runsync \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Hello world!", "speaker": "Ryan", "language": "English"}'
+uv run flash run
 ```
 
-**List available voices:**
+Server starts at http://localhost:8888. Visit http://localhost:8888/docs for interactive API documentation.
+
+**Generate speech (JSON with base64 audio):**
 ```bash
 curl -X POST http://localhost:8888/gpu_worker/runsync \
   -H "Content-Type: application/json" \
-  -d '{}'
+  -d '{"text": "Hello world!", "speaker": "Ryan", "language": "English"}'
 ```
 
-Check `/docs` for the exact auto-generated endpoint paths and schemas.
-
 ## API Functions
 
 QB (queue-based) endpoints are auto-generated from `@Endpoint` functions. Visit `/docs` for the full API schema.
diff --git a/02_ml_inference/01_text_to_speech/gpu_worker.py b/02_ml_inference/01_text_to_speech/gpu_worker.py
index 6d60e01..4245efb 100644
--- a/02_ml_inference/01_text_to_speech/gpu_worker.py
+++ b/02_ml_inference/01_text_to_speech/gpu_worker.py
@@ -1,6 +1,6 @@
 # Qwen3-TTS text-to-speech GPU worker.
-# run with: flash run
-# test directly: python gpu_worker.py
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import Endpoint, GpuGroup
 
diff --git a/03_advanced_workers/05_load_balancer/README.md b/03_advanced_workers/05_load_balancer/README.md
index 2c6eadc..d9e855a 100644
--- a/03_advanced_workers/05_load_balancer/README.md
+++ b/03_advanced_workers/05_load_balancer/README.md
@@ -37,13 +37,29 @@ uv run flash login
 
 Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
 
-### 3. Run Locally (from repository root)
+### 3. Test Individual Workers
+
+Run each load-balanced worker directly:
+
+```bash
+# Test GPU load-balanced worker
+python gpu_lb.py
+
+# Test CPU load-balanced worker
+python cpu_lb.py
+```
+
+This tests the worker setup. Results are printed directly to your terminal.
+
+### 4. Test HTTP Routes
+
+Load-balanced endpoints expose HTTP routes. To test the full API:
 
 ```bash
 uv run flash run
 ```
 
-Visit **http://localhost:8888/docs** for interactive API documentation (unified app with all examples).
+Visit **http://localhost:8888/docs** for interactive API documentation.
 
-### 4. Test Endpoints (via unified app)
+### 5. Test Endpoints (via unified app)
 
diff --git a/03_advanced_workers/05_load_balancer/cpu_lb.py b/03_advanced_workers/05_load_balancer/cpu_lb.py
index 08a9105..8b239b3 100644
--- a/03_advanced_workers/05_load_balancer/cpu_lb.py
+++ b/03_advanced_workers/05_load_balancer/cpu_lb.py
@@ -1,6 +1,6 @@
-# cpu load-balanced endpoints with custom HTTP routes.
-# run with: flash run
-# test directly: python cpu_lb.py
+# CPU load-balanced endpoints with custom HTTP routes.
+# Run: python cpu_lb.py (test worker setup)
+# Run: flash run (test HTTP routes)
 
 from runpod_flash import Endpoint
 
 api = Endpoint(
diff --git a/03_advanced_workers/05_load_balancer/gpu_lb.py b/03_advanced_workers/05_load_balancer/gpu_lb.py
index 2637bef..d0f1218 100644
--- a/03_advanced_workers/05_load_balancer/gpu_lb.py
+++ b/03_advanced_workers/05_load_balancer/gpu_lb.py
@@ -1,6 +1,6 @@
-# gpu load-balanced endpoints with custom HTTP routes.
-# run with: flash run
-# test directly: python gpu_lb.py
+# GPU load-balanced endpoints with custom HTTP routes.
+# Run: python gpu_lb.py (test worker setup)
+# Run: flash run (test HTTP routes)
 
 from runpod_flash import Endpoint, GpuType
 
 api = Endpoint(
diff --git a/04_scaling_performance/01_autoscaling/README.md b/04_scaling_performance/01_autoscaling/README.md
index 0e02e67..5b090ca 100644
--- a/04_scaling_performance/01_autoscaling/README.md
+++ b/04_scaling_performance/01_autoscaling/README.md
@@ -6,22 +6,32 @@ Configure Flash worker autoscaling for different workload patterns. This example
 
 **Prerequisites**: Complete the [repository setup](../../README.md#quick-start) first, or run `flash login` to authenticate.
 
+### Run the Examples
+
 ```bash
 cd 04_scaling_performance/01_autoscaling
-flash run
+
+# Run GPU worker
+python gpu_worker.py
+
+# Run CPU worker
+python cpu_worker.py
 ```
 
-Server starts at http://localhost:8888 -- visit http://localhost:8888/docs for interactive API docs.
+First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.
 
-### Test Individual Strategies
+
+### Alternative: HTTP API Testing
+
+To test via HTTP endpoints:
 
 ```bash
-# Scale-to-zero GPU worker
-curl -X POST http://localhost:8888/gpu_worker/runsync \
-  -H "Content-Type: application/json" \
-  -d '{"matrix_size": 512}'
+uv run flash run
+```
 
-# Always-on GPU worker (same payload, different endpoint)
+Server starts at http://localhost:8888. Visit http://localhost:8888/docs for interactive API docs.
+
+```bash
+# Scale-to-zero GPU worker
 curl -X POST http://localhost:8888/gpu_worker/runsync \
   -H "Content-Type: application/json" \
   -d '{"matrix_size": 512}'
diff --git a/04_scaling_performance/01_autoscaling/cpu_worker.py b/04_scaling_performance/01_autoscaling/cpu_worker.py
index 6660ea3..17a6ada 100644
--- a/04_scaling_performance/01_autoscaling/cpu_worker.py
+++ b/04_scaling_performance/01_autoscaling/cpu_worker.py
@@ -1,6 +1,6 @@
-# cpu autoscaling strategies -- scale-to-zero and burst-ready.
-# run with: flash run
-# test directly: python cpu_worker.py
+# CPU autoscaling strategies -- scale-to-zero and burst-ready.
+# Run: python cpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import CpuInstanceType, Endpoint
 
diff --git a/04_scaling_performance/01_autoscaling/gpu_worker.py b/04_scaling_performance/01_autoscaling/gpu_worker.py
index 2d12fb0..8c900a0 100644
--- a/04_scaling_performance/01_autoscaling/gpu_worker.py
+++ b/04_scaling_performance/01_autoscaling/gpu_worker.py
@@ -1,6 +1,6 @@
-# gpu autoscaling strategies -- scale-to-zero, always-on, high-throughput.
-# run with: flash run
-# test directly: python gpu_worker.py
+# GPU autoscaling strategies -- scale-to-zero, always-on, high-throughput.
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 from runpod_flash import Endpoint, GpuType, ServerlessScalerType
 
diff --git a/05_data_workflows/01_network_volumes/README.md b/05_data_workflows/01_network_volumes/README.md
index bd9cf24..13473c1 100644
--- a/05_data_workflows/01_network_volumes/README.md
+++ b/05_data_workflows/01_network_volumes/README.md
@@ -22,29 +22,32 @@ uv run flash login
 
 Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
 
-### 3. Run Locally
+### 3. Run the GPU Worker
+
+Generate an image by running the GPU worker directly:
 
 ```bash
-uv run flash run
+python gpu_worker.py
 ```
 
-Server starts at `http://localhost:8888`
+First run takes 60-120 seconds (provisioning + model download). The image is saved to the network volume and the result is printed to your terminal.
+
+### 4. Test the CPU Worker (HTTP API)
 
-### 4. Test the API
+The CPU worker serves images via HTTP routes. To test it:
 
-**Generate an image (GPU worker):**
 ```bash
-curl -X POST http://localhost:8888/gpu_worker/runsync \
-  -H "Content-Type: application/json" \
-  -d '{"prompt": "a sunset over mountains"}'
+uv run flash run
 ```
 
-**List generated images (CPU worker):**
+Server starts at `http://localhost:8888`
+
+**List generated images:**
 ```bash
 curl http://localhost:8888/images
 ```
 
-**Get a specific image (CPU worker):**
+**Get a specific image:**
 ```bash
 curl http://localhost:8888/images/sd_generated_20240101_120000.png
 ```
diff --git a/05_data_workflows/01_network_volumes/cpu_worker.py b/05_data_workflows/01_network_volumes/cpu_worker.py
index 5d1dad4..68d6fb3 100644
--- a/05_data_workflows/01_network_volumes/cpu_worker.py
+++ b/05_data_workflows/01_network_volumes/cpu_worker.py
@@ -1,6 +1,6 @@
-# cpu worker with network volume for listing and serving generated images.
-# run with: flash run
-# test directly: python cpu_worker.py
+# CPU worker with network volume for listing and serving generated images.
+# This is an LB endpoint with HTTP routes - use flash run to test routes.
+# Run: flash run (required for HTTP route testing)
 
 from runpod_flash import Endpoint, NetworkVolume
 
 volume = NetworkVolume(
diff --git a/05_data_workflows/01_network_volumes/gpu_worker.py b/05_data_workflows/01_network_volumes/gpu_worker.py
index fd4c7b2..d7aa18c 100644
--- a/05_data_workflows/01_network_volumes/gpu_worker.py
+++ b/05_data_workflows/01_network_volumes/gpu_worker.py
@@ -1,6 +1,6 @@
-# gpu worker with network volume for Stable Diffusion image generation.
-# run with: flash run
-# test directly: python gpu_worker.py
+# GPU worker with network volume for Stable Diffusion image generation.
+# Run: python gpu_worker.py
+# Alternative: flash run (for HTTP API testing)
 
 import logging
 
 from runpod_flash import Endpoint, GpuType, NetworkVolume
 
diff --git a/06_real_world/README.md b/06_real_world/README.md
index e640184..3cec2b4 100644
--- a/06_real_world/README.md
+++ b/06_real_world/README.md
@@ -116,9 +116,17 @@ All real-world examples include:
 
 ## Deployment Patterns
 
 ### Development
+
+Run individual workers directly:
 ```bash
 cd example_name
-flash run
+python gpu_worker.py
+python cpu_worker.py
+```
+
+Or run the full app with HTTP routes:
+```bash
+uv run flash run
 ```
 
 ### Production
diff --git a/README.md b/README.md
index c73bcdf..bd581d7 100644
--- a/README.md
+++ b/README.md
@@ -2,38 +2,6 @@
 A collection of example applications showcasing Runpod Flash - a framework for building production-ready AI applications with distributed GPU and CPU computing.
 
-## What is Flash?
-
-Flash is a Python framework that lets you run functions on Runpod's Serverless infrastructure with a single decorator. Write code locally, deploy globally—Flash handles provisioning, scaling, and routing automatically.
-
-```python
-from runpod_flash import Endpoint, GpuType
-
-@Endpoint(name="image-gen", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch", "diffusers"])
-async def generate_image(prompt: str) -> bytes:
-    # This runs on a cloud GPU, not your laptop
-    ...
-```
-
-**Key features:**
-- **`@Endpoint` decorator**: Mark any async function to run on serverless infrastructure
-- **Auto-scaling**: Scale to zero when idle, scale up under load
-- **Local development**: `flash run` starts a local server with hot reload
-- **One-command deploy**: `flash deploy` packages and ships your code
-
-## Prerequisites
-
-- **Python 3.10+**
-- **uv**: Install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
-- **Runpod account**: [Sign up here](https://runpod.io/console/signup)
-
-### Python version in deployed workers
-
-Your local Python version does not affect what runs in the cloud. `flash build` downloads wheels for the container's Python version automatically.
-
-- **GPU workers**: Python 3.12 only. The GPU base image ships multiple interpreters (3.9-3.14) for interactive pod use, but torch and CUDA libraries are installed only for 3.12.
-- **CPU workers**: Python 3.10, 3.11, or 3.12. Configurable via `PYTHON_VERSION` build arg.
-
 ## Quick Start
 
 ```bash
@@ -45,11 +13,12 @@ uv sync && uv pip install -e .
 
 # Authenticate with Runpod
 uv run flash login
 
-# Run all examples locally
-uv run flash run
+# Run an example
+cd 01_getting_started/01_hello_world
+python gpu_worker.py
 ```
 
-Open **http://localhost:8888/docs** to explore all endpoints.
+The function executes on a Runpod GPU and prints the result directly. First run takes 30-60 seconds (provisioning); subsequent runs take 2-3 seconds.
 
 > **Using pip, poetry, or conda?** See [DEVELOPMENT.md](./DEVELOPMENT.md) for alternative setups.
 
@@ -68,85 +37,30 @@ Open **http://localhost:8888/docs** to explore all endpoints.
 
 More examples coming soon in each category.
 
-## CLI Commands
-
-```bash
-flash login          # Authenticate with Runpod (opens browser)
-flash run            # Run development server (localhost:8888)
-flash build          # Build deployment package
-flash deploy --env   # Build and deploy to environment
-flash undeploy       # Delete deployed endpoint
-```
-
-See **[CLI-REFERENCE.md](./CLI-REFERENCE.md)** for complete documentation.
-
-## Key Concepts
-
-### Endpoint
-
-The `Endpoint` class configures functions for execution on Runpod's serverless infrastructure:
+## What is Flash?
 
-**Queue-based (one function = one endpoint):**
+Flash is a Python framework that lets you run functions on Runpod's Serverless infrastructure with a single decorator. Write code locally, deploy globally—Flash handles provisioning, scaling, and routing automatically.
 
 ```python
 from runpod_flash import Endpoint, GpuType
 
-@Endpoint(name="my-worker", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, workers=(0, 3), dependencies=["torch"])
-async def process(data: dict) -> dict:
-    import torch
-    # this code runs on Runpod GPUs
-    return {"result": "processed"}
-```
-
-**Load-balanced (multiple routes, shared workers):**
-
-```python
-from runpod_flash import Endpoint
-
-api = Endpoint(name="my-api", cpu="cpu3c-1-2", workers=(1, 3))
-
-@api.get("/health")
-async def health():
-    return {"status": "ok"}
-
-@api.post("/compute")
-async def compute(data: dict) -> dict:
-    return {"result": data}
-```
-
-**Client mode (connect to an existing endpoint):**
-
-```python
-from runpod_flash import Endpoint
-
-ep = Endpoint(id="ep-abc123")
-job = await ep.run({"prompt": "hello"})
-await job.wait()
-print(job.output)
+@Endpoint(name="image-gen", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch", "diffusers"])
+async def generate_image(prompt: str) -> bytes:
+    # This runs on a cloud GPU, not your laptop
+    ...
 ```
 
-### Resource Types
-
-**GPU Workers** (`gpu=`):
-| Type | Use Case |
-|------|----------|
-| `GpuType.NVIDIA_GEFORCE_RTX_4090` | RTX 4090 (24GB) |
-| `GpuType.NVIDIA_RTX_6000_ADA_GENERATION` | RTX 6000 Ada (48GB) |
-| `GpuType.NVIDIA_A100_80GB_PCIe` | A100 (80GB) |
-
-**CPU Workers** (`cpu=`):
-| Type | Specs |
-|------|-------|
-| `cpu3g-2-8` | 2 vCPU, 8GB RAM |
-| `cpu3c-4-8` | 4 vCPU, 8GB RAM (Compute) |
-| `cpu5c-4-16` | 4 vCPU, 16GB RAM (Latest) |
+**Key features:**
+- **`@Endpoint` decorator**: Mark any async function to run on serverless infrastructure
+- **Auto-scaling**: Scale to zero when idle, scale up under load
+- **Local development**: `flash run` starts a local server with hot reload
+- **One-command deploy**: `flash deploy` packages and ships your code
 
-### Auto-Scaling
+## Prerequisites
 
-Workers automatically scale based on demand:
-- `workers=(0, 3)` - Scale from 0 to 3 workers (cost-efficient)
-- `workers=(1, 5)` - Keep 1 warm, scale up to 5
-- `idle_timeout=5` - Seconds before scaling down
+- **Python 3.10-3.12**
+- **uv**: Install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
+- **Runpod account**: [Sign up here](https://runpod.io/console/signup)
 
 ## Resources