CVPaul/xpandas

xpandas


The only pandas-compatible DataFrame library that compiles to TorchScript and runs in pure C++ inference — no Python runtime required.

Write your trading strategy with familiar pd.DataFrame / pd.Series syntax, compile it with torch.jit.script, ship the .pt artifact to a C++ engine, and run at bare-metal speed.

Key Features

  • 🔁 Drop-in replacement — import xpandas as pd replaces import pandas as pd, zero rewrite
  • TorchScript native — every op is a registered TORCH_LIBRARY C++ op, fully torch.jit.script-compatible
  • 🚀 Pure C++ inference — torch::jit::load("alpha.pt") + dlopen(libxpandas_ops.so), no Python at runtime
  • 📊 35 ops covering groupby, rolling, ewm, shift, fillna, rank, zscore, cumulative, datetime, and more
  • 🏎️ 2–163× faster than pandas on element-wise and rolling ops (see Benchmarks)

Target Use Cases

| Scenario | Why xpandas? |
|---|---|
| High-frequency / quantitative trading | Prototype Alpha signals in Python pandas, deploy to sub-millisecond C++ engines with zero rewrite |
| Online model serving | Embed feature engineering (rolling stats, z-scores, pct_change) inside a TorchScript model served by torch::jit in C++ |
| Low-latency inference pipelines | Eliminate Python GIL and interpreter overhead — the entire signal path runs in compiled C++ |
| Edge / embedded deployment | Ship a single .pt file + shared library — no Python installation needed on the target machine |

How xpandas Differs from Alternatives

| | xpandas | pandas | Polars | Modin | cuDF (RAPIDS) |
|---|---|---|---|---|---|
| Primary goal | TorchScript compilation + C++ inference | General data analysis | Fast DataFrame engine | Scale pandas with parallelism | GPU-accelerated DataFrames |
| torch.jit.script support | ✅ First-class — every op is a TORCH_LIBRARY custom op | ❌ | ❌ | ❌ | ❌ |
| Pure C++ inference | ✅ torch::jit::load() — no Python at runtime | ❌ Requires Python | ❌ Requires Rust runtime | ❌ Requires Python | ❌ Requires Python + CUDA |
| Deployment artifact | Single .pt file + .so | Python source + env | Python/Rust source + env | Python source + env | Python source + env |
| Python GIL-free execution | ✅ All ops run in C++ | ❌ | ✅ (Rust) | Partial (Ray) | ✅ (GPU) |
| API compatibility | pandas subset (35 ops) | Full pandas API | Own API (SQL-like) | Full pandas API | pandas subset |
| Best for | Quant signals → C++ prod | EDA, general analytics | Large-scale data processing | Scaling existing pandas code | GPU batch processing |

In short: Other libraries optimize how fast you can crunch data in Python.
xpandas solves a fundamentally different problem: getting your pandas logic out of Python entirely and into a compiled, deployable, GIL-free C++ artifact.

Why?

Quantitative trading strategies are often prototyped in Python using pandas. Deploying them to a low-latency C++ engine traditionally requires a full rewrite. xpandas bridges this gap:

  1. Replace import pandas as pd with import xpandas as pd
  2. torch.jit.script(model) compiles the module to TorchScript
  3. Load the .pt file in C++ -- no Python runtime needed

Architecture

  Python side                        C++ side
  -----------                        --------
  import xpandas                     dlopen(libxpandas_ops.so)
  model = Alpha()                    auto m = torch::jit::load("alpha.pt")
  scripted = torch.jit.script(model) m.get_method("on_bod")({ts, data})
  scripted.save("alpha.pt")          auto sig = m.forward({ts, data})

Data model:

  • Use xpandas.DataFrame exactly like pandas.DataFrame — same API, zero rewrite
  • Columns are 1-D float64 tensors (numeric) or int64 tensors (enum-encoded strings)
  • Internally, each pandas-like operation dispatches to a registered torch.ops.xpandas.* C++ op

Project Structure

xpandas/
  __init__.py              # Package init, loads C++ extension
  ops_meta.py              # FakeTensor kernels (for torch.compile)
  csrc/ops/
    ops.h                  # Common header with op declarations
    register.cpp           # TORCH_LIBRARY schema + CPU dispatch
    groupby_resample_ohlc.cpp
    compare.cpp
    cast.cpp
    lookup.cpp
    breakout_signal.cpp
    rank.cpp               # Example op (see CONTRIBUTING.md)
    to_datetime.cpp        # to_datetime + dt_floor
    groupby_agg.cpp        # groupby_sum/mean/count/std
    groupby_minmax.cpp     # groupby_min/max/first/last
    rolling.cpp            # rolling_sum/mean/std
    rolling_minmax.cpp     # rolling_min/max (O(n) monotonic deque)
    shift.cpp              # shift (lag/lead)
    fillna.cpp             # fillna
    where.cpp              # where_, masked_fill
    pct_change.cpp         # pct_change
    cumulative.cpp         # cumsum, cumprod
    clip.cpp               # clip
    math_ops.cpp           # abs_, log_, zscore
    ewm.cpp                # ewm_mean
    sort.cpp               # sort_by
inference/
  main.cpp                 # Pure C++ inference driver
examples/
  alpha_original.py        # Original pandas-based Alpha (reference)
  alpha_ts.py              # TorchScript-compatible Alpha (breakout)
  alpha_vwap.py            # TorchScript VWAP mean-reversion Alpha
  alpha_momentum.py        # TorchScript momentum z-score Alpha
  trace_and_save.py        # Script + test + save alpha.pt
benchmarks/
  bench_ops.py             # xpandas vs pandas performance comparison
tests/
  test_ops.py                  # Unit tests for each C++ op (110 tests)
  test_wrappers.py             # Wrapper API tests (233 tests)
  test_alpha_e2e.py            # End-to-end TorchScript tests (10 tests)
  test_alpha_xpandas_e2e.py    # End-to-end xpandas wrapper tests (5 tests)

Quickstart

Prerequisites

  • Python >= 3.9
  • PyTorch >= 2.0
  • A C++ compiler with C++17 support

Install (Python)

pip install --no-build-isolation -e .

Note: --no-build-isolation is required to ensure the C++ extension is compiled with the same ABI as your installed PyTorch.

Run Tests

pytest tests/ -v

Script and Save a Model

python examples/trace_and_save.py
# produces alpha.pt

Build and Run C++ Inference

mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
make -j

./alpha_infer ../alpha.pt ./libxpandas_ops.so
# Output: Signal: [+1.0, -1.0]

Available Ops (35 total)

DataFrame Utilities

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `lookup` | `(Dict(str, Tensor) table, str key) -> Tensor` | `df['col']` |
| `sort_by` | `(Dict(str, Tensor) table, str by, bool ascending) -> Dict(str, Tensor)` | `df.sort_values(by)` |

Groupby / Aggregation

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `groupby_resample_ohlc` | `(Tensor key, Tensor value) -> (Tensor, Tensor, Tensor, Tensor, Tensor)` | `df.groupby(key)[val].resample().{first,max,min,last}()` |
| `groupby_sum` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].sum()` |
| `groupby_mean` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].mean()` |
| `groupby_count` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].count()` |
| `groupby_std` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].std()` |
| `groupby_min` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].min()` |
| `groupby_max` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].max()` |
| `groupby_first` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].first()` |
| `groupby_last` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].last()` |
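All groupby ops return aggregates in sorted key order, mirroring the sorted std::map used by the C++ kernels (see Benchmarks). A pure-Python sketch of the `groupby_sum` semantics — a hypothetical reference helper, not the actual C++ op:

```python
def groupby_sum(keys, values):
    """Reference semantics of torch.ops.xpandas.groupby_sum:
    returns (sorted_unique_keys, per_group_sums)."""
    sums = {}
    for k, v in zip(keys, values):
        sums[k] = sums.get(k, 0.0) + v
    ordered = sorted(sums)                      # deterministic, std::map-like order
    return ordered, [sums[k] for k in ordered]

keys, sums = groupby_sum([2, 1, 2, 1], [10.0, 20.0, 30.0, 40.0])
# keys == [1, 2], sums == [60.0, 40.0]
```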

Element-wise Comparison

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `compare_gt` | `(Tensor a, Tensor b) -> Tensor` | `series > series` |
| `compare_lt` | `(Tensor a, Tensor b) -> Tensor` | `series < series` |

Type Casting

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `bool_to_float` | `(Tensor x) -> Tensor` | `series.astype(float)` |

Fused Signals

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `breakout_signal` | `(Tensor price, Tensor high, Tensor low) -> Tensor` | `(price > high).float() - (price < low).float()` |

Statistical

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `rank` | `(Tensor x) -> Tensor` | `series.rank(method='average')` |
| `zscore` | `(Tensor x) -> Tensor` | `(series - series.mean()) / series.std()` |
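`rank` follows pandas' `method='average'` tie handling: tied values share the mean of the ranks they would otherwise occupy. A pure-Python sketch of those semantics (a hypothetical reference helper, not the C++ kernel):

```python
def rank_average(xs):
    """pandas-style rank(method='average'): ties get the mean of their 1-based ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                         # extend the block of tied values
        avg = (i + j) / 2 + 1              # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

rank_average([10.0, 20.0, 10.0, 30.0])     # -> [1.5, 3.0, 1.5, 4.0]
```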

Datetime

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `to_datetime` | `(Tensor epochs, str unit) -> Tensor` | `pd.to_datetime(series, unit=...)` |
| `dt_floor` | `(Tensor dt_ns, int interval_ns) -> Tensor` | `series.dt.floor(freq)` |
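Since datetimes are plain int64 nanosecond tensors, `dt_floor` reduces to integer arithmetic. A pure-Python sketch of the semantics (hypothetical helper; Python's `%` floors toward negative infinity, so pre-epoch timestamps floor correctly too):

```python
def dt_floor(dt_ns, interval_ns):
    """Floor each nanosecond timestamp down to a multiple of interval_ns."""
    return [t - t % interval_ns for t in dt_ns]

DAY_NS = 24 * 60 * 60 * 10**9              # '1D' as nanoseconds
dt_floor([DAY_NS + 123, 2 * DAY_NS - 1], DAY_NS)   # -> [DAY_NS, DAY_NS]
```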

Rolling Window

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `rolling_sum` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).sum()` |
| `rolling_mean` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).mean()` |
| `rolling_std` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).std()` |
| `rolling_min` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).min()` |
| `rolling_max` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).max()` |
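`rolling_min`/`rolling_max` use the O(n) monotonic-deque algorithm noted for rolling_minmax.cpp in the project structure. A pure-Python sketch of the same idea (not the actual C++ kernel; assumes the pandas default of NaN for the first window-1 entries):

```python
from collections import deque

def rolling_min(xs, window):
    """O(n) sliding-window minimum via a monotonically increasing deque."""
    out = [float("nan")] * len(xs)
    dq = deque()                       # indices whose values are increasing
    for i, x in enumerate(xs):
        while dq and xs[dq[-1]] >= x:  # drop candidates dominated by x
            dq.pop()
        dq.append(i)
        if dq[0] <= i - window:        # evict the index that left the window
            dq.popleft()
        if i >= window - 1:
            out[i] = xs[dq[0]]         # front of deque is the window minimum
    return out

rolling_min([3.0, 1.0, 4.0, 1.0, 5.0], 3)   # -> [nan, nan, 1.0, 1.0, 1.0]
```

Each index is pushed and popped at most once, so the whole pass is O(n) rather than the naive O(n·window).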

Shift / Lag

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `shift` | `(Tensor x, int periods) -> Tensor` | `series.shift(periods)` |

NaN Handling

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `fillna` | `(Tensor x, float fill_value) -> Tensor` | `series.fillna(value)` |

Conditional

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `where_` | `(Tensor cond, Tensor x, Tensor other) -> Tensor` | `series.where(cond, other)` |
| `masked_fill` | `(Tensor x, Tensor mask, float fill_value) -> Tensor` | `series.mask(mask, value)` |

Percentage Change

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `pct_change` | `(Tensor x, int periods) -> Tensor` | `series.pct_change(periods)` |
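The pandas `pct_change(periods)` contract is `x[i] / x[i - periods] - 1`, with NaN wherever no prior value exists. A pure-Python sketch (hypothetical reference helper):

```python
def pct_change(xs, periods=1):
    """x[i] / x[i - periods] - 1, NaN for the first `periods` entries."""
    out = [float("nan")] * len(xs)
    for i in range(periods, len(xs)):
        out[i] = xs[i] / xs[i - periods] - 1.0
    return out

pct_change([100.0, 110.0, 99.0])   # -> [nan, ~0.1, ~-0.1]
```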

Cumulative

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `cumsum` | `(Tensor x) -> Tensor` | `series.cumsum()` |
| `cumprod` | `(Tensor x) -> Tensor` | `series.cumprod()` |

Clipping

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `clip` | `(Tensor x, float lower, float upper) -> Tensor` | `series.clip(lower, upper)` |

Math

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `abs_` | `(Tensor x) -> Tensor` | `series.abs()` |
| `log_` | `(Tensor x) -> Tensor` | `np.log(series)` |

Exponential Weighted

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `ewm_mean` | `(Tensor x, int span) -> Tensor` | `series.ewm(span=span, adjust=False).mean()` |
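With `adjust=False`, pandas' EWM mean is the simple recursion `y[0] = x[0]`, `y[i] = (1 - alpha) * y[i-1] + alpha * x[i]` where `alpha = 2 / (span + 1)`. A pure-Python sketch (hypothetical helper, not the C++ kernel):

```python
def ewm_mean(xs, span):
    """series.ewm(span=span, adjust=False).mean() semantics."""
    alpha = 2.0 / (span + 1)
    out, prev = [], None
    for x in xs:
        prev = x if prev is None else (1 - alpha) * prev + alpha * x
        out.append(prev)
    return out

ewm_mean([1.0, 2.0, 3.0], span=3)   # alpha = 0.5 -> [1.0, 1.5, 2.25]
```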

Benchmarks

Run python benchmarks/bench_ops.py to compare xpandas ops against their pandas equivalents. Example output (N=10,000, 20 repeats, median time):

Op                         pandas (us)  xpandas (us)   speedup
--------------------------------------------------------------
clip                             481.2           5.6    85.91x >>>
to_datetime                     2706.3          16.6   163.02x >>>
rolling_sum                      142.1           9.1    15.66x >>>
breakout_signal                  187.5          10.3    18.26x >>>
pct_change                       207.7          17.3    11.99x >>>
...
--------------------------------------------------------------
Geometric mean speedup:                                  2.23x
Faster in 23/34 ops

Key wins: element-wise ops (clip, compare_*, fillna), rolling window ops (rolling_sum/mean/std), fused ops (breakout_signal), and datetime conversion (to_datetime — 163x). Groupby ops are slower because xpandas uses sorted std::map keys (for deterministic TorchScript output) vs pandas' optimized Cython hashmaps.

Wrapper Benchmarks

Run python benchmarks/bench_wrappers.py to measure Python wrapper overhead and end-to-end Alpha performance. Example output (N=10,000, 30 repeats, median time):

Part 1: Wrapper Overhead

| Operation | Direct (μs) | Wrapper (μs) | Overhead |
|---|---|---|---|
| `Series.gt` | 9.4 | 9.7 | +4% |
| `Series.lt` | 9.4 | 9.9 | +5% |
| `Series.sub` | 3.7 | 3.9 | +7% |
| `Series.astype(float)` | 9.9 | 10.2 | +3% |
| `DataFrame.getattr` | 0.1 | 0.7 | +596% |
| GroupBy→OHLC chain | 127.5 | 517.9 | +306% |
| OHLC×4 cached | 509.7 | 131.1 | −74% 🏆 |

Part 2: End-to-End Alpha (pandas vs xpandas, rolling mean crossover)

| Size | Instruments | Pandas (ms) | xpandas (ms) | Speedup |
|---|---|---|---|---|
| Small (10×50) | 10 | 0.3 | 0.0 | 11.9× |
| Medium (50×100) | 50 | 0.4 | 0.1 | 7.8× |
| Large (500×10,000) | 500 | 315.4 | 54.2 | 5.8× |

Wrapper overhead on element-wise ops is negligible (<10%). xpandas is consistently faster than pandas at all tested scales. At production scale (500 instruments × 10,000 ticks), xpandas completes a rolling mean crossover signal in 54 ms vs pandas' 315 ms — a 5.8× speedup. The GroupBy→OHLC chain is an exception: xpandas uses sorted std::map keys for deterministic TorchScript output, which is slower than pandas' Cython hashmaps for groupby-heavy workloads.

API Reference (Python Wrappers)

The Python wrapper API (import xpandas as pd) provides pandas-compatible classes that dispatch to C++ ops under the hood.

Core Classes

| Class | Description | Key Methods |
|---|---|---|
| `pd.DataFrame` | Dict-backed DataFrame (`Dict[str, Tensor]`) | `__getitem__`, `__setitem__`, `columns`, `shape`, `dtypes`, `head()`, `tail()`, `drop()`, `rename()`, `sort_values()`, `merge()`, `describe()`, `apply()`, `groupby()` |
| `pd.Series` | 1-D Tensor wrapper | Arithmetic (`+`, `-`, `*`, `/`, `**`, `%`), comparison (`>`, `<`, `>=`, `<=`, `==`, `!=`), `abs()`, `log()`, `zscore()`, `rank()`, `fillna()`, `shift()`, `pct_change()`, `cumsum()`, `cumprod()`, `clip()`, `where()`, `mask()`, `rolling()`, `ewm()`, `expanding()`, `mean()`, `std()`, `sum()`, `min()`, `max()` |
| `pd.GroupBy` | GroupBy entry point | `__getitem__(col)` → `GroupByColumn` |
| `pd.GroupByColumn` | Single-column group aggregation | `sum()`, `mean()`, `count()`, `std()`, `min()`, `max()`, `first()`, `last()`, `resample(freq)` — returns `(keys, values)` tuples |
| `pd.Rolling` | Rolling window | `mean()`, `sum()`, `std()`, `min()`, `max()` |
| `pd.EWM` | Exponential weighted | `mean()` |
| `pd.Expanding` | Expanding window | `sum()`, `mean()` |
| `pd.Resampler` | OHLC resampling | `first()`, `max()`, `min()`, `last()` (cached — one C++ call for all four) |
| `pd.Index` | Index wrapper | `get_level_values()` |
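The `pd.Resampler` caching noted above (one C++ call serving all four OHLC accessors, the "−74%" row in the wrapper benchmarks) can be illustrated with a pure-Python stand-in. This is a hypothetical sketch of the caching strategy, not the real wrapper class:

```python
class CachedOHLC:
    """Compute first/max/min/last once, then serve all four from the cache."""
    def __init__(self, values):
        self._values = values
        self._cache = None

    def _compute(self):
        if self._cache is None:        # one pass shared by all four accessors
            v = self._values
            self._cache = {"first": v[0], "max": max(v),
                           "min": min(v), "last": v[-1]}
        return self._cache

    def first(self): return self._compute()["first"]
    def max(self):   return self._compute()["max"]
    def min(self):   return self._compute()["min"]
    def last(self):  return self._compute()["last"]

bar = CachedOHLC([101.0, 105.0, 99.0, 103.0])
(bar.first(), bar.max(), bar.min(), bar.last())  # -> (101.0, 105.0, 99.0, 103.0)
```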

Module-Level Functions

| Function | Description |
|---|---|
| `pd.concat(items, axis=0)` | Concatenate Series (axis=0) or DataFrames (axis=1) |
| `pd.to_datetime(tensor, unit='s')` | Convert epoch timestamps to nanosecond datetime tensors |
| `pd.dt_floor(tensor, freq='1D')` | Floor datetime tensors to a frequency |

Important Differences from pandas

  • All tensors must be torch.double (float64) for value columns, torch.long (int64) for groupby keys
  • GroupBy returns (keys_tensor, values_tensor) tuples, not pandas-style grouped DataFrames
  • DataFrame is Dict[str, Tensor] internally — column order depends on insertion order
  • to_datetime and dt_floor are module-level functions, not Series methods

See examples/wrapper_api_tour.py for a complete working demo of every class and method.

Migration Guide

Migrating from pandas to xpandas is straightforward — most code changes are mechanical.

Step 1: Change your import

# Before
import pandas as pd

# After
import xpandas as pd
import torch

Step 2: Use torch.tensor instead of Python lists

# Before (pandas)
df = pd.DataFrame({'price': [100.0, 101.5, 99.8]})

# After (xpandas)
df = pd.DataFrame({'price': torch.tensor([100.0, 101.5, 99.8], dtype=torch.double)})

Step 3: Adapt GroupBy results

# pandas: returns a DataFrame/Series with group index
result = df.groupby('sector')['price'].mean()
print(result['tech'])  # index-based access

# xpandas: returns (keys_tensor, values_tensor) tuple
keys, means = df.groupby('sector')['price'].mean()
print(keys, means)  # tensor([0, 1, 2]), tensor([100.5, 98.3, 105.1])

Step 4: Use module-level datetime functions

# pandas
df['date'] = pd.to_datetime(df['epoch'], unit='s')
df['date_floor'] = df['date'].dt.floor('1D')

# xpandas
df['date'] = pd.to_datetime(df['epoch'], unit='s')
df['date_floor'] = pd.dt_floor(df['date'], freq='1D')

Common Patterns Side-by-Side

| pandas | xpandas | Notes |
|---|---|---|
| `df['col']` | `df['col']` | ✅ Same |
| `df.col` | `df.col` | ✅ Same |
| `series + series` | `series + series` | ✅ Same |
| `series.rolling(5).mean()` | `series.rolling(5).mean()` | ✅ Same |
| `series.ewm(span=10).mean()` | `series.ewm(span=10).mean()` | ✅ Same |
| `series.fillna(0)` | `series.fillna(0)` | ✅ Same |
| `df.sort_values('col')` | `df.sort_values(by='col')` | ✅ Same |
| `df.merge(other, on='key')` | `df.merge(other, on='key')` | ✅ Same |
| `series.where(cond, -1.0)` | `series.where(cond, tensor)` | ⚠️ `other` must be a Tensor |
| `df.groupby('k')['v'].sum()` | `keys, vals = df.groupby('k')['v'].sum()` | ⚠️ Returns tuple |
| `pd.to_datetime(s, unit='s')` | `pd.to_datetime(t, unit='s')` | ✅ Same (module-level) |
| `s.dt.floor('1D')` | `pd.dt_floor(t, freq='1D')` | ⚠️ Module-level function |

See examples/pandas_migration.py for a fully runnable side-by-side comparison.

Troubleshooting / FAQ

Q: I get RuntimeError: expected scalar type Double — what's wrong?

All xpandas value columns must be torch.double (float64). Check your tensor creation:

# Wrong
t = torch.tensor([1.0, 2.0, 3.0])            # defaults to float32!

# Right
t = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double)

Q: GroupBy raises an error about Long tensors?

GroupBy key columns must be torch.long (int64):

df = pd.DataFrame({
    'group': torch.tensor([1, 2, 1, 2], dtype=torch.long),   # int64 keys
    'value': torch.tensor([10.0, 20.0, 30.0, 40.0], dtype=torch.double)
})
keys, sums = df.groupby('group')['value'].sum()

Q: where() or mask() fails with a scalar argument?

Unlike pandas, xpandas requires other to be a Tensor, not a scalar:

# Wrong
result = series.where(cond, -1.0)

# Right
result = series.where(cond, torch.full_like(series.values, -1.0))

Q: My model fails during torch.jit.script() — what should I check?

  1. Ensure all DataFrame columns are Tensors (no Python lists or NumPy arrays)
  2. GroupBy keys must be torch.long, values must be torch.double
  3. Use pd.to_datetime() and pd.dt_floor() as module-level calls, not methods
  4. Avoid Python-only constructs inside @torch.jit.script (list comprehensions, f-strings, etc.)

Q: How do I deploy to C++ inference?

# 1. Script and save in Python
python examples/trace_and_save.py  # produces alpha.pt

# 2. Build C++ inference binary
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
make -j

# 3. Run — no Python needed
./alpha_infer ../alpha.pt ./libxpandas_ops.so

See inference/main.cpp for the complete C++ driver code.

Q: Are groupby ops slower than pandas?

Yes — by design. xpandas groupby uses sorted std::map keys to guarantee deterministic output order in TorchScript. pandas uses optimized Cython hashmaps that are faster but non-deterministic. If groupby performance is critical, consider pre-sorting your data or reducing group cardinality.

Q: Can I use xpandas with torch.compile?

Basic support exists via FakeTensor kernels in ops_meta.py. However, the primary compilation target is torch.jit.script. For production deployment, use TorchScript.

Q: What about GPU support?

Currently all ops are CPU-only. The ops dispatch through PyTorch's dispatcher, so adding CUDA kernels is architecturally possible but not yet implemented.

Contributing

See CONTRIBUTING.md (中文) for a step-by-step guide to adding a new op, using rank as a worked example.

License

Apache-2.0. See LICENSE.
