Simple example demonstrating CPU-based serverless workers with automatic scaling on Runpod's infrastructure.
```bash
uv sync
uv run flash login
```

Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.

```bash
uv run flash run
```

The server starts at http://localhost:8888.
Visit http://localhost:8888/docs for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
```bash
curl -X POST http://localhost:8888/cpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"name": "Flash User"}'
```

For complete CLI usage, including deployment, environment management, and troubleshooting, see:
- CLI Reference - All commands and options
- Getting Started Guide - Step-by-step tutorial
- Workflows - Common development patterns
Simple CPU-based serverless function that:
- Processes requests without GPU overhead
- Returns system and platform information
- Scales from 0-3 workers automatically
- Runs on general-purpose CPU instances
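The logic above can be sketched in plain Python. This is a hypothetical illustration, not the actual `cpu_worker.py`: the function name `build_response` is invented here, and the field names simply mirror the sample response shown later in this README.

```python
import platform
from datetime import datetime

# Hypothetical sketch of the worker's handler logic; the real
# cpu_worker.py wraps something like this in an @Endpoint function.
def build_response(data: dict) -> dict:
    name = data.get("name", "World")
    return {
        "status": "success",
        "message": f"Hello, {name}!",
        "worker_type": "CPU",
        "timestamp": datetime.now().isoformat(),
        "platform": platform.system(),  # e.g. "Linux"
        "python_version": platform.python_version(),
    }
```

Everything here is standard-library Python, which is exactly why a CPU instance suffices: there is no GPU dependency to provision.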
The worker demonstrates:
- Remote execution with the `@Endpoint` decorator
- CPU resource configuration via the `cpu=` parameter
- Automatic scaling via the `workers=` parameter
- Lightweight API request handling
QB (queue-based) endpoints are auto-generated from `@Endpoint` functions. Visit `/docs` for the full API schema.
Executes a simple CPU worker and returns a greeting with system information.
Request:
```json
{
  "name": "Flash User"
}
```

Response:

```json
{
  "status": "success",
  "message": "Hello, Flash User!",
  "worker_type": "CPU",
  "timestamp": "2024-01-24T10:30:45.123456",
  "platform": "Linux",
  "python_version": "3.11.0"
}
```

Project structure:

```
02_cpu_worker/
├── cpu_worker.py      # CPU worker with @Endpoint decorator
├── pyproject.toml     # Project metadata
├── requirements.txt   # Dependencies
├── .env.example       # Environment variables template
└── README.md          # This file
```
The @Endpoint decorator transparently executes functions on serverless infrastructure:
- Code runs locally during development
- Automatically deploys to Runpod when configured
- Handles serialization and resource management
```python
from runpod_flash import Endpoint

@Endpoint(name="my-worker", cpu="cpu3c-1-2", workers=(0, 3))
async def my_function(data: dict) -> dict:
    return {"result": "processed"}
```

Available CPU configurations:

- `CpuInstanceType.CPU3G_2_8`: 2 vCPU, 8 GB RAM (General Purpose)
- `CpuInstanceType.CPU3C_4_8`: 4 vCPU, 8 GB RAM (Compute Optimized)
- `CpuInstanceType.CPU5G_4_16`: 4 vCPU, 16 GB RAM (Latest Gen)
CPU type can be specified as an enum or a string shorthand:
```python
# Enum
@Endpoint(name="worker", cpu=CpuInstanceType.CPU3C_1_2)

# String shorthand
@Endpoint(name="worker", cpu="cpu3c-1-2")
```

The CPU worker scales to zero when idle:

- `workers=(0, 3)`: scale from 0 to 3 workers
- `idle_timeout=5`: 5 minutes before scaling down
Run the worker directly:

```bash
python cpu_worker.py
```

or through the dev server:

```bash
flash run
```

Choose CPU workers for:
- API request handling
- Data processing and transformation
- Lightweight compute tasks
- Cost-sensitive workloads
- No GPU requirements
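As a concrete example of the kind of lightweight task listed above, a data-transformation handler needs only the standard library. This is a hypothetical example, not part of this project; `normalize_records` is an invented name.

```python
# Hypothetical CPU-bound task: normalize a batch of records.
# No GPU and no heavy dependencies -- a good fit for a CPU worker.
def normalize_records(records: list[dict]) -> list[dict]:
    return [
        {"name": r["name"].strip().title(), "chars": len(r["name"].strip())}
        for r in records
    ]
```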
Compare with GPU workers when you need:
- Machine learning inference
- Image/video processing
- CUDA acceleration
- GPU-specific libraries (PyTorch, TensorFlow)
- Customize the CPU type: change `"cpu3c-1-2"` to a different instance type
- Add request validation and error handling
- Integrate with databases or external APIs
- Deploy to production with `flash deploy`