36 changes: 21 additions & 15 deletions 01_getting_started/01_hello_world/README.md
@@ -18,31 +18,37 @@ uv run flash login

Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.

### 3. Run Locally
### 3. Run the Example

```bash
uv run flash run
python gpu_worker.py
```

> **Contributor:** stay uv

Server starts at **http://localhost:8888**
The function executes on a Runpod GPU and prints the result directly:

```
Testing GPU worker with payload: {'message': 'Testing GPU worker'}
Result: {'status': 'success', 'message': 'Testing GPU worker', 'worker_type': 'GPU', ...}
```

First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.

### Alternative: HTTP API Testing

### 4. Test the API
To test via HTTP endpoints instead:

Visit **http://localhost:8888/docs** for interactive API documentation. QB (queue-based) endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
```bash
uv run flash run
```

Visit **http://localhost:8888/docs** for interactive API documentation.

```bash
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"message": "Hello GPU!"}'
```
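The same request can be made from Python using only the standard library. This is a sketch based on the curl example above: the route `/gpu_worker/runsync` and the payload shape are taken from it, and the server started by `flash run` is assumed to be listening on localhost:8888.

```python
import json
import urllib.request

def build_runsync_request(payload: dict) -> urllib.request.Request:
    """Build a POST request for the locally served /gpu_worker/runsync route."""
    return urllib.request.Request(
        "http://localhost:8888/gpu_worker/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_runsync_request({"message": "Hello GPU!"})
# Send it with urllib.request.urlopen(req) once the server is running.
print(req.full_url, req.get_method())  # → http://localhost:8888/gpu_worker/runsync POST
```

Constructing the request does not hit the network, so the payload encoding can be checked before the server is up.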

### Full CLI Documentation

For complete CLI usage including deployment, environment management, and troubleshooting:
- **[CLI Reference](../../CLI-REFERENCE.md)** - All commands and options
- **[Getting Started Guide](../../docs/cli/getting-started.md)** - Step-by-step tutorial
- **[Workflows](../../docs/cli/workflows.md)** - Common development patterns

## What This Demonstrates

### GPU Worker (`gpu_worker.py`)
@@ -133,14 +139,14 @@ The worker uses PyTorch to detect and report GPU information:

## Development

### Test Worker Locally
### Run the Worker
```bash
python gpu_worker.py
```

> **Contributor:** uv

### Run the Application
### HTTP API Testing (Optional)
```bash
flash run
uv run flash run
```

## Next Steps
6 changes: 3 additions & 3 deletions 01_getting_started/01_hello_world/gpu_worker.py
@@ -1,6 +1,6 @@
# gpu serverless worker -- detects available GPU hardware.
# run with: flash run
# test directly: python gpu_worker.py
# GPU serverless worker -- detects available GPU hardware.
# Run: python gpu_worker.py

> **Contributor:** uv run gpu_worker.py

# Alternative: flash run (for HTTP API testing)
from runpod_flash import Endpoint, GpuType


36 changes: 21 additions & 15 deletions 01_getting_started/02_cpu_worker/README.md
@@ -18,31 +18,37 @@ uv run flash login

Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.

### 3. Run Locally
### 3. Run the Example

```bash
uv run flash run
python cpu_worker.py
```

> **Contributor:** uv

Server starts at **http://localhost:8888**
The function executes on a Runpod CPU worker and prints the result directly:

```
Testing CPU worker with payload: {'name': 'Testing CPU worker'}
Result: {'status': 'success', 'message': 'Hello, Testing CPU worker!', 'worker_type': 'CPU', ...}
```

First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.

### Alternative: HTTP API Testing

### 4. Test the API
To test via HTTP endpoints instead:

Visit **http://localhost:8888/docs** for interactive API documentation. QB (queue-based) endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
```bash
uv run flash run
```

Visit **http://localhost:8888/docs** for interactive API documentation.

```bash
curl -X POST http://localhost:8888/cpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"name": "Flash User"}'
```

### Full CLI Documentation

For complete CLI usage including deployment, environment management, and troubleshooting:
- **[CLI Reference](../../CLI-REFERENCE.md)** - All commands and options
- **[Getting Started Guide](../../docs/cli/getting-started.md)** - Step-by-step tutorial
- **[Workflows](../../docs/cli/workflows.md)** - Common development patterns

## What This Demonstrates

### CPU Worker (`cpu_worker.py`)
@@ -135,14 +141,14 @@ The CPU worker scales to zero when idle:

## Development

### Test Worker Locally
### Run the Worker
```bash
python cpu_worker.py
```

> **Contributor:** uv

### Run the Application
### HTTP API Testing (Optional)
```bash
flash run
uv run flash run
```

## When to Use CPU Workers
6 changes: 3 additions & 3 deletions 01_getting_started/02_cpu_worker/cpu_worker.py
@@ -1,6 +1,6 @@
# cpu serverless worker -- lightweight processing without GPU.
# run with: flash run
# test directly: python cpu_worker.py
# CPU serverless worker -- lightweight processing without GPU.
# Run: python cpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import CpuInstanceType, Endpoint


30 changes: 21 additions & 9 deletions 01_getting_started/03_mixed_workers/README.md
@@ -40,16 +40,33 @@ Response

**Prerequisites**: Complete the [repository setup](../../README.md#quick-start) first (clone, `make dev`, set API key).

### Run This Example
### Test Individual Workers

Run the CPU and GPU workers directly:

```bash
cd 01_getting_started/03_mixed_workers
flash run

# Test CPU preprocessing worker
python cpu_worker.py

# Test GPU inference worker
python gpu_worker.py
```

> **Contributor** (on lines +51 to +54): uv

### Alternative: Standalone Setup
First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.

### Run the Full Pipeline

If you haven't run the repository-wide setup:
The pipeline endpoint (`/classify`) orchestrates multiple workers via HTTP. To test it:

```bash
uv run flash run
```

Server starts at http://localhost:8888
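The three-stage flow behind the `/classify` route can be sketched with plain local functions. Everything below is a stand-in for illustration: the function names, payload shapes, and the keyword-based "classifier" are assumptions, not the actual worker signatures, and in the real pipeline each stage runs on its own Runpod worker rather than in-process.

```python
def preprocess(text: str) -> dict:
    # CPU stage: normalize the input before inference (stand-in logic).
    return {"text": text.strip().lower()}

def infer(features: dict) -> dict:
    # GPU stage: a sentiment model would run here (stubbed with a keyword check).
    label = "positive" if "good" in features["text"] else "negative"
    return {"label": label}

def postprocess(result: dict) -> dict:
    # CPU stage: shape the final response.
    return {"status": "success", "sentiment": result["label"]}

def classify(text: str) -> dict:
    # The real /classify endpoint dispatches each stage to a separate worker
    # over HTTP; the local calls here show the data flow only.
    return postprocess(infer(preprocess(text)))

print(classify("This is good"))  # → {'status': 'success', 'sentiment': 'positive'}
```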

### Setup (if needed)

```bash
# Install dependencies
@@ -58,13 +75,8 @@ uv sync
# Authenticate
uv run flash login
# Or create .env file with RUNPOD_API_KEY=your_api_key_here

# Run
uv run flash run
```

Server starts at http://localhost:8888

## Test the Pipeline

```bash
8 changes: 4 additions & 4 deletions 01_getting_started/03_mixed_workers/cpu_worker.py
@@ -1,7 +1,7 @@
# cpu workers for text preprocessing and postprocessing.
# part of the mixed CPU/GPU pipeline example.
# run with: flash run
# test directly: python cpu_worker.py
# CPU workers for text preprocessing and postprocessing.
# Part of the mixed CPU/GPU pipeline example.
# Run: python cpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import CpuInstanceType, Endpoint


8 changes: 4 additions & 4 deletions 01_getting_started/03_mixed_workers/gpu_worker.py
@@ -1,7 +1,7 @@
# gpu worker for ML inference (sentiment classification).
# part of the mixed CPU/GPU pipeline example.
# run with: flash run
# test directly: python gpu_worker.py
# GPU worker for ML inference (sentiment classification).
# Part of the mixed CPU/GPU pipeline example.
# Run: python gpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import Endpoint, GpuGroup


13 changes: 10 additions & 3 deletions 01_getting_started/03_mixed_workers/pipeline.py
@@ -1,6 +1,13 @@
# classification pipeline: CPU preprocess -> GPU inference -> CPU postprocess.
# demonstrates cross-worker orchestration via a load-balanced endpoint.
# run with: flash run
# Classification pipeline: CPU preprocess -> GPU inference -> CPU postprocess.
# Demonstrates cross-worker orchestration via a load-balanced endpoint.
# Run: python pipeline.py (local testing)

> **Contributor:** uv

# Alternative: flash run (for HTTP route testing)
import sys
from pathlib import Path

# Ensure sibling modules (cpu_worker, gpu_worker) are importable regardless of cwd
sys.path.insert(0, str(Path(__file__).parent))

from runpod_flash import Endpoint

pipeline = Endpoint(name="01_03_classify_pipeline", cpu="cpu3c-1-2", workers=(1, 3))
21 changes: 15 additions & 6 deletions 01_getting_started/04_dependencies/README.md
@@ -29,14 +29,16 @@ Learn how to manage Python packages and system dependencies in Flash workers.

```bash
cd 01_getting_started/04_dependencies
flash run
```

Server starts at http://localhost:8888
# Run any worker directly
python gpu_worker.py
python cpu_worker.py
python mixed_worker.py
```

> **Contributor** (on lines +34 to +36): uv

### Alternative: Standalone Setup
First run takes 30-60 seconds (provisioning). Subsequent runs take 2-3 seconds.

If you haven't run the repository-wide setup:
### Setup (if needed)

```bash
# Install dependencies
@@ -45,11 +47,18 @@ uv sync
# Authenticate
uv run flash login
# Or create .env file with RUNPOD_API_KEY=your_api_key_here
```

# Run
### Alternative: HTTP API Testing

To test via HTTP endpoints:

```bash
uv run flash run
```

Server starts at http://localhost:8888

## GPU vs CPU Packaging

GPU and CPU endpoints use different base Docker images, which affects how dependencies are resolved:
6 changes: 3 additions & 3 deletions 01_getting_started/04_dependencies/cpu_worker.py
@@ -1,6 +1,6 @@
# cpu workers demonstrating data science and zero-dependency patterns.
# run with: flash run
# test directly: python cpu_worker.py
# CPU workers demonstrating data science and zero-dependency patterns.
# Run: python cpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import CpuInstanceType, Endpoint


6 changes: 3 additions & 3 deletions 01_getting_started/04_dependencies/gpu_worker.py
@@ -1,6 +1,6 @@
# gpu workers demonstrating Python and system dependency management.
# run with: flash run
# test directly: python gpu_worker.py
# GPU workers demonstrating Python and system dependency management.
# Run: python gpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import Endpoint, GpuGroup


4 changes: 2 additions & 2 deletions 01_getting_started/04_dependencies/mixed_worker.py
Original file line number Diff line number Diff line change
@@ -3,8 +3,8 @@
# - GPU images (runpod/pytorch:*) have numpy pre-installed
# - CPU images (python-slim) install numpy from the build artifact
#
# run with: flash run
# test directly: python mixed_worker.py
# Run: python mixed_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import CpuInstanceType, Endpoint, GpuType


23 changes: 11 additions & 12 deletions 02_ml_inference/01_text_to_speech/README.md
@@ -33,31 +33,30 @@ Or create a `.env` file with `RUNPOD_API_KEY=your_api_key_here`.
### Run

```bash
uv run flash run
python gpu_worker.py
```

> **Contributor:** uv

First run provisions the endpoint (~1 min). Server starts at http://localhost:8888
First run provisions the endpoint (~1 min) and downloads the model. The result is printed directly to your terminal.

### Test the Endpoint
Subsequent runs take 5-10 seconds (worker is already running).

Visit http://localhost:8888/docs for interactive API documentation. QB endpoints are auto-generated by `flash run` based on your `@Endpoint` functions.
### Alternative: HTTP API Testing

To test via HTTP endpoints:

**Generate speech (JSON with base64 audio):**
```bash
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"text": "Hello world!", "speaker": "Ryan", "language": "English"}'
uv run flash run
```

**List available voices:**
Server starts at http://localhost:8888. Visit http://localhost:8888/docs for interactive API documentation.

**Generate speech (JSON with base64 audio):**
```bash
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{}'
-d '{"text": "Hello world!", "speaker": "Ryan", "language": "English"}'
```

Check `/docs` for the exact auto-generated endpoint paths and schemas.
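Since the `runsync` response embeds the audio as base64 inside JSON, the client has to decode it before playback. A minimal sketch of that step; the field name `audio_base64` is an assumption for illustration — check `/docs` for the actual response schema.

```python
import base64
from pathlib import Path

def save_audio(response: dict, out_path: str = "speech.wav") -> int:
    """Decode the base64 audio field (field name assumed) and write it to disk."""
    audio_bytes = base64.b64decode(response["audio_base64"])
    Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)

# Dummy payload standing in for a real response:
fake = {"audio_base64": base64.b64encode(b"RIFF....WAVE").decode()}
print(save_audio(fake, "demo.wav"))  # → 12
```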

## API Functions

QB (queue-based) endpoints are auto-generated from `@Endpoint` functions. Visit `/docs` for the full API schema.
4 changes: 2 additions & 2 deletions 02_ml_inference/01_text_to_speech/gpu_worker.py
@@ -1,6 +1,6 @@
# Qwen3-TTS text-to-speech GPU worker.
# run with: flash run
# test directly: python gpu_worker.py
# Run: python gpu_worker.py

> **Contributor:** uv

# Alternative: flash run (for HTTP API testing)
from runpod_flash import Endpoint, GpuGroup

