This is a hard fork of the pPXF package (originally by Michele Cappellari) designed for high-performance spectral fitting.
- Multi-Backend Support: Works seamlessly on macOS (MPS/Metal) and NVIDIA (CUDA).
- Native PyTorch Implementation: Replaced `numpy`/`scipy` linear algebra with `torch` equivalents (`fft`, `lstsq`).
- Automatic Fallback: Gracefully falls back to CPU if no GPU is detected.
- Massive Speedup: Achieves ~20-30x speedup over sequential execution for large datasets.
- Full Capfit Integration: Runs the complete `capfit` non-linear optimization for every spectrum (kinematics are NOT tied).
- Auto-Sizing: Automatically estimates the optimal batch size based on your GPU's available memory.
- Memory Safe: Processes huge datasets (e.g., 10k+ spectra) in efficient chunks to prevent OOM errors.
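The fallback, auto-sizing, and chunking behaviour listed above can be illustrated with a minimal sketch. The function names `pick_device`, `estimate_batch_size`, and `iter_chunks`, the 50% safety margin, and the assumption that one design matrix per spectrum dominates memory use are all hypothetical choices made for illustration; they are not taken from this package's code. On CUDA, a value for `free_bytes` could be obtained from `torch.cuda.mem_get_info()`; here it is passed in by hand.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu', preferring a GPU when one is usable."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: the only option is the CPU path.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def estimate_batch_size(free_bytes, n_pixels, n_templates,
                        bytes_per_value=8, safety=0.5):
    """Hypothetical heuristic: spectra that fit in a fraction of free memory."""
    # Assumption: one (n_pixels x n_templates) design matrix per spectrum
    # dominates the memory footprint.
    per_spectrum = n_pixels * n_templates * bytes_per_value
    return max(1, int(free_bytes * safety) // per_spectrum)


def iter_chunks(n_spectra, batch_size):
    """Yield (start, stop) index pairs that cover range(n_spectra)."""
    for start in range(0, n_spectra, batch_size):
        yield start, min(start + batch_size, n_spectra)


device = pick_device()
batch = estimate_batch_size(free_bytes=4 * 1024**3, n_pixels=2000, n_templates=150)
chunks = list(iter_chunks(10_000, batch))
print(f"device={device}, batch size={batch}, chunks={len(chunks)}")
```

Each `(start, stop)` pair would select one slice of the spectra array to send to the device, so peak memory is bounded by the batch size and not by the size of the full dataset.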
We recommend using conda for environment management.
```bash
conda create -n ppxf_gpu python=3.10 numpy scipy matplotlib astropy
conda activate ppxf_gpu
```

Install PyTorch following the official instructions for your hardware.

For Mac (M1/M2/M3):

```bash
pip install torch torchvision
```

For NVIDIA GPU (Linux/Windows):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu118  # or your CUDA version
```

Then install this package in editable mode:

```bash
pip install -e .
```

To run a single fit on the GPU, add `gpu=True` to your `ppxf()` call.

```python
from ppxf.ppxf import ppxf

pp = ppxf(templates, galaxy, noise, velscale, start,
          moments=2, degree=4, gpu=True)  # <--- Enable GPU
```

Use the `ppxf_batch` wrapper to fit N spectra in parallel on the GPU.

```python
from ppxf.ppxf_batch import ppxf_batch

# spectra: (n_pixels, n_spectra) array
# noise: (n_pixels, n_spectra) array OR (n_pixels,) vector
results = ppxf_batch(
    templates,
    spectra,
    noise,
    velscale,
    start,
    moments=2,
    degree=4,
    gpu=True
)

# results is a list of ppxf objects (one per spectrum)
for i, pp in enumerate(results):
    print(f"Spectrum {i}: Vel={pp.sol[0]:.1f}, Sig={pp.sol[1]:.1f}")
```

| Scenario | Sequential (CPU) | Batched (GPU) | Speedup |
|---|---|---|---|
| 100 Spectra | ~20 s | ~1.0 s | ~20x |
| 10,000 Spectra | ~1.7 hours | ~4 mins | ~25x |

Benchmarks were run on an Apple M1 Pro (16-core GPU).
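
The speedup column is the ratio of the two timings. As a quick check of the arithmetic, using only the approximate values quoted in the table (the helper name `speedup` is just for illustration):

```python
def speedup(cpu_seconds, gpu_seconds):
    """Ratio of sequential CPU time to batched GPU time."""
    return cpu_seconds / gpu_seconds


# Approximate timings copied from the benchmark table.
small = speedup(20.0, 1.0)             # 100 spectra
large = speedup(1.7 * 3600, 4 * 60)    # 10,000 spectra: 1.7 hours vs 4 minutes

# Average wall-clock cost per spectrum in the 10,000-spectra run.
cpu_per_spectrum = 1.7 * 3600 / 10_000
gpu_per_spectrum = 4 * 60 / 10_000

print(f"100 spectra:    ~{small:.0f}x")
print(f"10,000 spectra: ~{large:.1f}x")
print(f"per spectrum:   {cpu_per_spectrum:.3f} s (CPU) vs {gpu_per_spectrum:.3f} s (GPU)")
```

Because the inputs are rounded, the ratios are only indicative; timings on other hardware or with other template sets will differ.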
- Linear Step (GPU): The expensive convolution and least-squares solving (for weights) are batched and executed on the GPU.
- Non-Linear Step (CPU+GPU): The `capfit` optimizer runs independently for each spectrum (CPU), but offloads the heavy `linear_fit` evaluations to the GPU in batches.
- Precision: Results match the standard CPU version to within floating-point precision.

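
The idea behind the batched linear step can be sketched as follows: rather than solving one least-squares problem per spectrum in a loop, all problems are stacked along a leading axis and solved in a single call. This sketch uses NumPy as a CPU stand-in for the `torch` equivalents, solves unconstrained normal equations, and uses random data; the function name `batched_lstsq` and the array shapes are assumptions for illustration, and the real linear step may apply constraints (such as bounds on the weights) that are omitted here.

```python
import numpy as np


def batched_lstsq(design, spectra):
    """Solve min ||design[k] @ w - spectra[k]|| for every k in one call.

    design:  (n_spectra, n_pixels, n_templates)
    spectra: (n_spectra, n_pixels)
    returns: (n_spectra, n_templates)
    """
    at = np.swapaxes(design, 1, 2)      # (n_spectra, n_templates, n_pixels)
    ata = at @ design                   # stacked normal matrices A^T A
    atb = at @ spectra[..., None]       # stacked right-hand sides A^T b
    return np.linalg.solve(ata, atb)[..., 0]


rng = np.random.default_rng(0)
n_spectra, n_pixels, n_templates = 8, 200, 5
design = rng.normal(size=(n_spectra, n_pixels, n_templates))
true_weights = rng.uniform(0.1, 1.0, size=(n_spectra, n_templates))
spectra = (design @ true_weights[..., None])[..., 0]

# Batched solve: one call for all spectra.
weights = batched_lstsq(design, spectra)

# Sequential reference: one lstsq call per spectrum.
reference = np.stack([
    np.linalg.lstsq(design[k], spectra[k], rcond=None)[0]
    for k in range(n_spectra)
])

# Compare with a tolerance; the two paths use different operation orders,
# so agreement is expected to rounding error, not bit for bit.
max_diff = float(np.max(np.abs(weights - reference)))
print(f"max |batched - sequential| = {max_diff:.2e}")
```

Comparing with a tolerance, as in the last step, is also a reasonable way to check GPU results against the CPU version, since a different order of floating-point operations changes the last few bits.
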
This project retains the original pPXF License.