A production-ready forecasting service that provides time series predictions using state-of-the-art models including TiRex and Amazon Chronos. This service is designed for deployment as a serverless endpoint on RunPod.
This project implements a forecasting API that supports multiple time series forecasting models:
- TiRex: An xLSTM-based forecasting model from NX-AI
- Chronos: Amazon's pretrained time series forecasting models
The service is optimized for serverless deployment and provides both point forecasts and uncertainty quantification through quantile predictions.
- Multi-Model Support: Choose between TiRex and Chronos models
- Quantile Forecasting: Get uncertainty estimates with 10th, 50th, and 90th percentiles
- Serverless Ready: Optimized for RunPod serverless deployment
- GPU Accelerated: Leverages CUDA for fast inference
- CPU Fallback: Automatic CPU detection and fallback when CUDA unavailable
- Production Ready: Error handling, logging, and input validation
- Auto-Configuration: Automatic CPU/CUDA detection with environment variable overrides
The fastest way to get started is using the pre-built Docker image from Docker Hub:
```bash
# Pull the latest image
docker pull egargale/forecasting:latest

# Run with GPU support (if CUDA is available)
docker run --gpus all -p 8000:8000 egargale/forecasting:latest

# Run with CPU only
docker run -p 8000:8000 egargale/forecasting:latest

# Run with custom environment variables
docker run -e USE_CPU=true -p 8000:8000 egargale/forecasting:latest

# Test the service
curl -X POST http://localhost:8000/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"model": "tirex", "context": [1,2,3,4,5,6,7,8,9,10], "prediction_length": 5}}'
```
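The same request can be issued from a script with only the Python standard library. A minimal client sketch, assuming the container above is listening on `localhost:8000` (the `build_payload`/`forecast` helper names are illustrative, not part of the service):

```python
import json
import urllib.request

def build_payload(context, prediction_length=5, model="tirex"):
    """Build the request body in the shape the service expects."""
    return {"input": {"model": model,
                      "context": list(context),
                      "prediction_length": prediction_length}}

def forecast(context, prediction_length=5, model="tirex",
             url="http://localhost:8000/runsync"):
    """POST a forecast request and return the parsed JSON response."""
    body = json.dumps(build_payload(context, prediction_length, model)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a running container:
# result = forecast([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# print(result["forecast"])
```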
1. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

2. Test CPU/CUDA Configuration

   ```bash
   # Check automatic CPU/CUDA detection
   python test_cpu_config.py
   ```

3. Run Local Tests

   ```bash
   # Test with TiRex model
   python -c "import json; f=open('test_input.json'); print(json.load(f))"

   # Test with Chronos model
   python -c "import json; f=open('test_input_chronos.json'); print(json.load(f))"
   ```

4. Start Serverless Worker

   ```bash
   python rp_handler.py
   ```
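The worker follows the standard RunPod shape: a handler function receives the job dict and returns the result, and `runpod.serverless.start` wires it up. A simplified sketch of what `rp_handler.py` does (the real handler also validates input and runs actual model inference; `run_model` below is a placeholder):

```python
def run_model(model_name, context, horizon):
    # Placeholder for the actual TiRex/Chronos inference call.
    last = float(context[-1])
    point = [last] * horizon
    quantiles = [[last * 0.9, last, last * 1.1] for _ in range(horizon)]
    return point, quantiles

def handler(job):
    """RunPod entry point: receive a job dict, return the forecast."""
    inp = job["input"]
    model_name = inp.get("model", "tirex")
    forecast, quantiles = run_model(
        model_name, inp["context"], inp["prediction_length"])
    return {"model": model_name, "forecast": forecast, "quantiles": quantiles}

if __name__ == "__main__":
    import runpod  # RunPod serverless SDK
    runpod.serverless.start({"handler": handler})
```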
The service accepts POST requests with the following JSON structure:
```json
{
  "input": {
    "model": "tirex",
    "context": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "prediction_length": 5
  }
}
```

Parameters:

- `model`: Choose between `"tirex"` or `"chronos"` (default: `"tirex"`)
- `context`: Array of historical time series values
- `prediction_length`: Number of future points to predict
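Requests that violate these constraints should be rejected before inference. A sketch of such validation, assuming the parameter rules above (the function name and error messages are illustrative, not the service's actual responses):

```python
def validate_input(inp):
    """Return (model, context, prediction_length) or raise ValueError."""
    model = inp.get("model", "tirex")
    if model not in ("tirex", "chronos"):
        raise ValueError(f"unknown model: {model!r}")
    context = inp.get("context")
    if (not isinstance(context, list) or not context
            or not all(isinstance(x, (int, float)) for x in context)):
        raise ValueError("context must be a non-empty list of numbers")
    horizon = inp.get("prediction_length")
    if not isinstance(horizon, int) or horizon < 1:
        raise ValueError("prediction_length must be a positive integer")
    return model, context, horizon
```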
Response Format:
```json
{
  "model": "tirex",
  "forecast": [11.2, 12.3, 13.1, 14.5, 15.2],
  "quantiles": [[10.1, 11.2, 12.3], [11.5, 12.3, 13.8], ...]
}
```

The service automatically detects CPU/CUDA availability and configures models accordingly:

- CUDA Available: Uses GPU acceleration with `torch_dtype=torch.bfloat16`
- CUDA Unavailable: Falls back to CPU with `torch_dtype=torch.float32`
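The selection logic can be expressed as a small pure function. A sketch of the pattern, where `cuda_available` would come from `torch.cuda.is_available()` in the real handler (the function name is illustrative):

```python
import os

def pick_device(cuda_available, env=os.environ):
    """Mirror the CPU/CUDA selection: (device, dtype) as strings.

    Honors the USE_CPU override described in the environment
    variable table below.
    """
    force_cpu = str(env.get("USE_CPU", "")).lower() in ("true", "1", "yes")
    if cuda_available and not force_cpu:
        return "cuda", "bfloat16"
    return "cpu", "float32"
```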
Override automatic detection with these environment variables:
| Variable | Values | Description |
|---|---|---|
| `USE_CPU` | `true`, `1`, `yes` | Force CPU mode regardless of CUDA availability |
| `TIREX_NO_CUDA` | `1` | Disable TiRex CUDA kernels (set automatically in CPU mode) |
```bash
# Force CPU mode
export USE_CPU=true
python rp_handler.py

# Use CUDA (default when available)
unset USE_CPU
python rp_handler.py
```
1. Build Docker Image

   ```bash
   docker build -t forecasting-service .
   ```

2. Deploy to RunPod

   - Upload your Docker image to a container registry
   - Create a new serverless endpoint on RunPod
   - Configure GPU requirements (minimum: 1x A100 or T4)

3. Environment Variables

   - No additional environment variables required
   - Models are downloaded automatically on startup
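Since weights are downloaded at startup, a common serverless pattern is to load each model at most once per worker process and reuse it across invocations. A sketch of that caching pattern (`load_model` is a stand-in for the actual TiRex/Chronos loaders, not the service's API):

```python
_MODEL_CACHE = {}

def load_model(name):
    # Stand-in for the real loader that downloads pretrained weights.
    return f"<{name} model>"

def get_model(name):
    """Load each model at most once per worker process, then reuse it."""
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = load_model(name)
    return _MODEL_CACHE[name]
```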
Test your handler locally before deployment:
```bash
# Install runpod package
pip install runpod

# Test the handler
python rp_handler.py
```

TiRex

- Type: xLSTM-based forecasting model
- Strengths: Fast inference, good for short to medium-term forecasts
- Model Size: ~100M parameters
- Use Case: General-purpose forecasting with uncertainty
Chronos

- Type: Pretrained time series language model
- Strengths: Zero-shot forecasting, handles multiple frequencies
- Model Size: ~50M parameters
- Use Case: Cross-domain forecasting without retraining
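To compare the two models on the same series, send identical requests that differ only in the `model` field. A small helper sketch (the function name is illustrative; the payload shape matches the API described above):

```python
def payloads_for_both(context, prediction_length):
    """Build identical requests for TiRex and Chronos over the same series."""
    return {
        name: {"input": {"model": name,
                         "context": list(context),
                         "prediction_length": prediction_length}}
        for name in ("tirex", "chronos")
    }
```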
```
forecasting/
├── main.py                  # Simple entry point for local testing
├── rp_handler.py            # RunPod serverless handler
├── test_cpu_config.py       # CPU/CUDA configuration test script
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── test_input.json          # Sample TiRex input
├── test_input_chronos.json  # Sample Chronos input
├── test_input_tirex.json    # Sample TiRex input (alias)
├── chronos-forecasting/     # Chronos model submodule
└── tirex/                   # TiRex model submodule
```
1. Clone with Submodules

   ```bash
   git clone --recurse-submodules <repository-url>
   cd forecasting
   ```

2. Install in Development Mode

   ```bash
   pip install -e .
   ```

3. Update Submodules

   ```bash
   git submodule update
   ```
This project incorporates third-party models and software:
Chronos

- License: Apache License 2.0
- Source: https://github.com/amazon-science/chronos-forecasting
- Copyright: Amazon.com, Inc. or its affiliates

TiRex

- License: Apache License 2.0
- Source: https://github.com/NX-AI/tirex
- Copyright: NX-AI GmbH
This software is provided "as is" without warranty of any kind, express or implied. The authors and copyright holders are not liable for any claims, damages, or other liability arising from the use of this software.