The only pandas-compatible DataFrame library that compiles to TorchScript and runs in pure C++ inference — no Python runtime required.
Write your trading strategy with familiar `pd.DataFrame` / `pd.Series` syntax, compile it with `torch.jit.script`, ship the `.pt` artifact to a C++ engine, and run at bare-metal speed.
- 🔁 **Drop-in replacement** — `import xpandas as pd` replaces `import pandas as pd`, zero rewrite
- ⚡ **TorchScript native** — every op is a registered `TORCH_LIBRARY` C++ op, fully `torch.jit.script`-compatible
- 🚀 **Pure C++ inference** — `torch::jit::load("alpha.pt")` + `dlopen(libxpandas_ops.so)`, no Python at runtime
- 📊 **35 ops** covering groupby, rolling, ewm, shift, fillna, rank, zscore, cumulative, datetime, and more
- 🏎️ 2–163× faster than pandas on element-wise and rolling ops (see Benchmarks)
| Scenario | Why xpandas? |
|---|---|
| High-frequency / quantitative trading | Prototype Alpha signals in Python pandas, deploy to sub-millisecond C++ engines with zero rewrite |
| Online model serving | Embed feature engineering (rolling stats, z-scores, pct_change) inside a TorchScript model served by torch::jit in C++ |
| Low-latency inference pipelines | Eliminate Python GIL and interpreter overhead — the entire signal path runs in compiled C++ |
| Edge / embedded deployment | Ship a single .pt file + shared library — no Python installation needed on the target machine |
| | xpandas | pandas | Polars | Modin | cuDF (RAPIDS) |
|---|---|---|---|---|---|
| Primary goal | TorchScript compilation + C++ inference | General data analysis | Fast DataFrame engine | Scale pandas with parallelism | GPU-accelerated DataFrames |
| `torch.jit.script` support | ✅ First-class — every op is a `TORCH_LIBRARY` custom op | ❌ | ❌ | ❌ | ❌ |
| Pure C++ inference | ✅ `torch::jit::load()` — no Python at runtime | ❌ Requires Python | ❌ Requires Rust runtime | ❌ Requires Python | ❌ Requires Python + CUDA |
| Deployment artifact | Single `.pt` file + `.so` | Python source + env | Python/Rust source + env | Python source + env | Python source + env |
| Python GIL-free execution | ✅ All ops run in C++ | ❌ | ✅ (Rust) | Partial (Ray) | ✅ (GPU) |
| API compatibility | pandas subset (35 ops) | Full pandas API | Own API (SQL-like) | Full pandas API | pandas subset |
| Best for | Quant signals → C++ prod | EDA, general analytics | Large-scale data processing | Scaling existing pandas code | GPU batch processing |
In short: Other libraries optimize how fast you can crunch data in Python.
xpandas solves a fundamentally different problem: getting your pandas logic out of Python entirely and into a compiled, deployable, GIL-free C++ artifact.
Quantitative trading strategies are often prototyped in Python using pandas. Deploying them to a low-latency C++ engine traditionally requires a full rewrite. xpandas bridges this gap:
- Replace `import pandas as pd` with `import xpandas as pd`
- `torch.jit.script(model)` compiles the module to TorchScript
- Load the `.pt` file in C++ — no Python runtime needed
```
Python side                           C++ side
-----------                           --------
import xpandas                        dlopen(libxpandas_ops.so)
model = Alpha()                       auto m = torch::jit::load("alpha.pt")
scripted = torch.jit.script(model)    m.get_method("on_bod")({ts, data})
scripted.save("alpha.pt")             auto sig = m.forward({ts, data})
```
Data model:
- Use `xpandas.DataFrame` exactly like `pandas.DataFrame` — same API, zero rewrite
- Columns are 1-D `float64` tensors (numeric) or `int64` tensors (enum-encoded strings)
- Internally, each pandas-like operation dispatches to a registered `torch.ops.xpandas.*` C++ op
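The data model can be made concrete with a small pure-Python sketch. `MiniFrame` below is a hypothetical stand-in, not part of the library: lists replace the float64 tensors, and the `zscore` method inlines the formula that xpandas dispatches to its C++ `zscore` op (sample std, ddof=1):

```python
# Hypothetical sketch only: lists stand in for 1-D float64 tensors.
class MiniFrame:
    def __init__(self, columns):
        self.columns = dict(columns)   # column name -> column, like Dict[str, Tensor]

    def __getitem__(self, name):
        # Mirrors the `lookup` op: df['col'] -> column
        return self.columns[name]

    def zscore(self, name):
        # Mirrors the zscore op: (x - mean) / std, with sample std (ddof=1)
        x = self.columns[name]
        n = len(x)
        mean = sum(x) / n
        std = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5
        return [(v - mean) / std for v in x]

df = MiniFrame({'price': [1.0, 2.0, 3.0]})
df.zscore('price')  # -> [-1.0, 0.0, 1.0]
```

The real library keeps the same shape of API but stores tensors, which is what lets `torch.jit.script` compile the whole pipeline.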
```
xpandas/
    __init__.py               # Package init, loads C++ extension
    ops_meta.py               # FakeTensor kernels (for torch.compile)
csrc/ops/
    ops.h                     # Common header with op declarations
    register.cpp              # TORCH_LIBRARY schema + CPU dispatch
    groupby_resample_ohlc.cpp
    compare.cpp
    cast.cpp
    lookup.cpp
    breakout_signal.cpp
    rank.cpp                  # Example op (see CONTRIBUTING.md)
    to_datetime.cpp           # to_datetime + dt_floor
    groupby_agg.cpp           # groupby_sum/mean/count/std
    groupby_minmax.cpp        # groupby_min/max/first/last
    rolling.cpp               # rolling_sum/mean/std
    rolling_minmax.cpp        # rolling_min/max (O(n) monotonic deque)
    shift.cpp                 # shift (lag/lead)
    fillna.cpp                # fillna
    where.cpp                 # where_, masked_fill
    pct_change.cpp            # pct_change
    cumulative.cpp            # cumsum, cumprod
    clip.cpp                  # clip
    math_ops.cpp              # abs_, log_, zscore
    ewm.cpp                   # ewm_mean
    sort.cpp                  # sort_by
inference/
    main.cpp                  # Pure C++ inference driver
examples/
    alpha_original.py         # Original pandas-based Alpha (reference)
    alpha_ts.py               # TorchScript-compatible Alpha (breakout)
    alpha_vwap.py             # TorchScript VWAP mean-reversion Alpha
    alpha_momentum.py         # TorchScript momentum z-score Alpha
    trace_and_save.py         # Script + test + save alpha.pt
benchmarks/
    bench_ops.py              # xpandas vs pandas performance comparison
tests/
    test_ops.py               # Unit tests for each C++ op (110 tests)
    test_wrappers.py          # Wrapper API tests (233 tests)
    test_alpha_e2e.py         # End-to-end TorchScript tests (10 tests)
    test_alpha_xpandas_e2e.py # End-to-end xpandas wrapper tests (5 tests)
```
- Python >= 3.9
- PyTorch >= 2.0
- A C++ compiler with C++17 support
```bash
pip install --no-build-isolation -e .
```

Note: `--no-build-isolation` is required to ensure the C++ extension is compiled with the same ABI as your installed PyTorch.
```bash
# Run tests
pytest tests/ -v

# Script and save the example Alpha
python examples/trace_and_save.py   # produces alpha.pt

# Build and run the pure C++ inference driver
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
make -j
./alpha_infer ../alpha.pt ./libxpandas_ops.so
# Output: Signal: [+1.0, -1.0]
```

| Op | Schema | Pandas Equivalent |
|---|---|---|
| `lookup` | `(Dict(str, Tensor) table, str key) -> Tensor` | `df['col']` |
| `sort_by` | `(Dict(str, Tensor) table, str by, bool ascending) -> Dict(str, Tensor)` | `df.sort_values(by)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `groupby_resample_ohlc` | `(Tensor key, Tensor value) -> (Tensor, Tensor, Tensor, Tensor, Tensor)` | `df.groupby(key)[val].resample().{first,max,min,last}()` |
| `groupby_sum` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].sum()` |
| `groupby_mean` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].mean()` |
| `groupby_count` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].count()` |
| `groupby_std` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].std()` |
| `groupby_min` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].min()` |
| `groupby_max` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].max()` |
| `groupby_first` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].first()` |
| `groupby_last` | `(Tensor key, Tensor value) -> (Tensor, Tensor)` | `df.groupby(key)[val].last()` |
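The `(keys, values)` return convention and the sorted key order (the sorted `std::map` behavior noted under Benchmarks) can be sketched in pure Python. The function below is a list-based reference, not the real tensor op:

```python
from collections import defaultdict

def groupby_sum(keys, values):
    """List-based reference for groupby_sum semantics: aggregate values per
    key and emit keys in sorted order, mirroring the sorted std::map the
    C++ op uses for deterministic TorchScript output."""
    sums = defaultdict(float)
    for k, v in zip(keys, values):
        sums[k] += v
    out_keys = sorted(sums)                    # deterministic, sorted key order
    return out_keys, [sums[k] for k in out_keys]

keys, sums = groupby_sum([2, 1, 2, 1], [10.0, 20.0, 30.0, 40.0])
# keys == [1, 2], sums == [60.0, 40.0]
```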
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `compare_gt` | `(Tensor a, Tensor b) -> Tensor` | `series > series` |
| `compare_lt` | `(Tensor a, Tensor b) -> Tensor` | `series < series` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `bool_to_float` | `(Tensor x) -> Tensor` | `series.astype(float)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `breakout_signal` | `(Tensor price, Tensor high, Tensor low) -> Tensor` | `(price > high).float() - (price < low).float()` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `rank` | `(Tensor x) -> Tensor` | `series.rank(method='average')` |
| `zscore` | `(Tensor x) -> Tensor` | `(series - series.mean()) / series.std()` |
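`rank` with `method='average'` gives tied values the mean of the 1-based positions they would occupy in sorted order. A list-based pure-Python reference (the real op works on float64 tensors in C++):

```python
def rank_average(x):
    """Reference for rank(method='average'): ties receive the mean of the
    1-based positions they would occupy, like pandas."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0            # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

rank_average([3.0, 1.0, 3.0, 2.0])  # -> [3.5, 1.0, 3.5, 2.0]
```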
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `to_datetime` | `(Tensor epochs, str unit) -> Tensor` | `pd.to_datetime(series, unit=...)` |
| `dt_floor` | `(Tensor dt_ns, int interval_ns) -> Tensor` | `series.dt.floor(freq)` |
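`dt_floor` reduces to snapping each int64 nanosecond timestamp down to the nearest multiple of the interval. A list-based pure-Python reference (the real op takes tensors; Python's `%` floors toward negative infinity, so pre-epoch timestamps floor correctly too):

```python
def dt_floor(dt_ns, interval_ns):
    """Floor nanosecond timestamps to a fixed interval: each timestamp is
    snapped down to the nearest multiple of interval_ns."""
    return [t - t % interval_ns for t in dt_ns]

DAY_NS = 24 * 60 * 60 * 10**9
dt_floor([DAY_NS + 123, 2 * DAY_NS - 1], DAY_NS)  # -> [DAY_NS, DAY_NS]
```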
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `rolling_sum` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).sum()` |
| `rolling_mean` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).mean()` |
| `rolling_std` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).std()` |
| `rolling_min` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).min()` |
| `rolling_max` | `(Tensor x, int window) -> Tensor` | `series.rolling(window).max()` |
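The O(n) monotonic-deque algorithm noted for `rolling_minmax.cpp` can be sketched in pure Python (lists stand in for tensors; NaN for positions before the first full window, matching the pandas equivalent):

```python
from collections import deque
import math

def rolling_max(x, window):
    """O(n) rolling max via a monotonic deque: the deque holds indices of
    surviving candidates in decreasing value order, so the front is always
    the max of the current window."""
    out, dq = [], deque()
    for i, v in enumerate(x):
        while dq and x[dq[-1]] <= v:      # drop candidates dominated by v
            dq.pop()
        dq.append(i)
        if dq[0] <= i - window:           # evict the index that left the window
            dq.popleft()
        out.append(x[dq[0]] if i >= window - 1 else math.nan)
    return out

rolling_max([1.0, 3.0, 2.0, 5.0, 4.0], 3)  # -> [nan, nan, 3.0, 5.0, 5.0]
```

Each index is pushed and popped at most once, which is what makes the whole pass O(n) rather than O(n·window).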
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `shift` | `(Tensor x, int periods) -> Tensor` | `series.shift(periods)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `fillna` | `(Tensor x, float fill_value) -> Tensor` | `series.fillna(value)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `where_` | `(Tensor cond, Tensor x, Tensor other) -> Tensor` | `series.where(cond, other)` |
| `masked_fill` | `(Tensor x, Tensor mask, float fill_value) -> Tensor` | `series.mask(mask, value)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `pct_change` | `(Tensor x, int periods) -> Tensor` | `series.pct_change(periods)` |
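`pct_change` computes the relative change against the value `periods` steps back; the first `periods` positions have no prior value and come out NaN. A list-based pure-Python reference (the real op works on float64 tensors):

```python
import math

def pct_change(x, periods=1):
    """Reference for pct_change: x[i] / x[i - periods] - 1, with NaN for the
    first `periods` positions."""
    return [math.nan if i < periods else x[i] / x[i - periods] - 1.0
            for i in range(len(x))]

pct_change([100.0, 110.0, 99.0])  # -> [nan, ~0.10, ~-0.10]
```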
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `cumsum` | `(Tensor x) -> Tensor` | `series.cumsum()` |
| `cumprod` | `(Tensor x) -> Tensor` | `series.cumprod()` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `clip` | `(Tensor x, float lower, float upper) -> Tensor` | `series.clip(lower, upper)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `abs_` | `(Tensor x) -> Tensor` | `series.abs()` |
| `log_` | `(Tensor x) -> Tensor` | `np.log(series)` |
| Op | Schema | Pandas Equivalent |
|---|---|---|
| `ewm_mean` | `(Tensor x, int span)` `-> Tensor` | `series.ewm(span=span, adjust=False).mean()` |
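For `adjust=False`, the exponentially weighted mean is the simple recurrence `y[0] = x[0]`, `y[t] = (1 - a) * y[t-1] + a * x[t]` with `a = 2 / (span + 1)`. A list-based pure-Python reference (the real op works on float64 tensors):

```python
def ewm_mean(x, span):
    """Reference for ewm(span=span, adjust=False).mean():
    y[0] = x[0]; y[t] = (1 - a) * y[t-1] + a * x[t], where a = 2 / (span + 1)."""
    a = 2.0 / (span + 1)
    out = []
    for v in x:
        out.append(v if not out else (1 - a) * out[-1] + a * v)
    return out

ewm_mean([1.0, 2.0, 3.0], 3)  # a = 0.5 -> [1.0, 1.5, 2.25]
```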
Run `python benchmarks/bench_ops.py` to compare xpandas ops against their pandas equivalents. Example output (N=10,000, 20 repeats, median time):

```
Op                  pandas (us)   xpandas (us)   speedup
--------------------------------------------------------------
clip                      481.2            5.6    85.91x >>>
to_datetime              2706.3           16.6   163.02x >>>
rolling_sum               142.1            9.1    15.66x >>>
breakout_signal           187.5           10.3    18.26x >>>
pct_change                207.7           17.3    11.99x >>>
...
--------------------------------------------------------------
Geometric mean speedup: 2.23x
Faster in 23/34 ops
```
Key wins: element-wise ops (`clip`, `compare_*`, `fillna`), rolling window ops (`rolling_sum`/`mean`/`std`), fused ops (`breakout_signal`), and datetime conversion (`to_datetime` — 163×). Groupby ops are slower because xpandas uses sorted `std::map` keys (for deterministic TorchScript output) vs pandas' optimized Cython hashmaps.
Run `python benchmarks/bench_wrappers.py` to measure Python wrapper overhead and end-to-end Alpha performance. Example output (N=10,000, 30 repeats, median time):
Part 1: Wrapper Overhead
| Operation | Direct (μs) | Wrapper (μs) | Overhead |
|---|---|---|---|
| Series.gt | 9.4 | 9.7 | +4% |
| Series.lt | 9.4 | 9.9 | +5% |
| Series.sub | 3.7 | 3.9 | +7% |
| Series.astype(float) | 9.9 | 10.2 | +3% |
| DataFrame.getattr | 0.1 | 0.7 | +596% |
| GroupBy→OHLC chain | 127.5 | 517.9 | +306% |
| OHLC×4 cached | 509.7 | 131.1 | -74% 🏆 |
Part 2: End-to-End Alpha (pandas vs xpandas, rolling mean crossover)
| Size | Instruments | Pandas (ms) | xpandas (ms) | Speedup |
|---|---|---|---|---|
| Small (10×50) | 10 | 0.3 | 0.0 | 11.9× |
| Medium (50×100) | 50 | 0.4 | 0.1 | 7.8× |
| Large (500×10,000) | 500 | 315.4 | 54.2 | 5.8× |
Wrapper overhead on element-wise ops is negligible (<10%). xpandas is consistently
faster than pandas at all tested scales. At production scale (500 instruments ×
10,000 ticks), xpandas completes a rolling mean crossover signal in 54 ms
vs pandas' 315 ms — a 5.8× speedup. The GroupBy→OHLC chain is an exception:
xpandas uses sorted std::map keys for deterministic TorchScript output, which is
slower than pandas' Cython hashmaps for groupby-heavy workloads.
The Python wrapper API (`import xpandas as pd`) provides pandas-compatible classes that dispatch to C++ ops under the hood.
| Class | Description | Key Methods |
|---|---|---|
| `pd.DataFrame` | Dict-backed DataFrame (`Dict[str, Tensor]`) | `__getitem__`, `__setitem__`, `columns`, `shape`, `dtypes`, `head()`, `tail()`, `drop()`, `rename()`, `sort_values()`, `merge()`, `describe()`, `apply()`, `groupby()` |
| `pd.Series` | 1-D Tensor wrapper | Arithmetic (`+`, `-`, `*`, `/`, `**`, `%`), comparison (`>`, `<`, `>=`, `<=`, `==`, `!=`), `abs()`, `log()`, `zscore()`, `rank()`, `fillna()`, `shift()`, `pct_change()`, `cumsum()`, `cumprod()`, `clip()`, `where()`, `mask()`, `rolling()`, `ewm()`, `expanding()`, `mean()`, `std()`, `sum()`, `min()`, `max()` |
| `pd.GroupBy` | GroupBy entry point | `__getitem__(col)` → `GroupByColumn` |
| `pd.GroupByColumn` | Single-column group aggregation | `sum()`, `mean()`, `count()`, `std()`, `min()`, `max()`, `first()`, `last()`, `resample(freq)` — returns `(keys, values)` tuples |
| `pd.Rolling` | Rolling window | `mean()`, `sum()`, `std()`, `min()`, `max()` |
| `pd.EWM` | Exponential weighted | `mean()` |
| `pd.Expanding` | Expanding window | `sum()`, `mean()` |
| `pd.Resampler` | OHLC resampling | `first()`, `max()`, `min()`, `last()` (cached — one C++ call for all four) |
| `pd.Index` | Index wrapper | `get_level_values()` |
| Function | Description |
|---|---|
| `pd.concat(items, axis=0)` | Concatenate Series (`axis=0`) or DataFrames (`axis=1`) |
| `pd.to_datetime(tensor, unit='s')` | Convert epoch timestamps to nanosecond datetime tensors |
| `pd.dt_floor(tensor, freq='1D')` | Floor datetime tensors to a frequency |
- All tensors must be `torch.double` (float64) for value columns, `torch.long` (int64) for groupby keys
- GroupBy returns `(keys_tensor, values_tensor)` tuples, not pandas-style grouped DataFrames
- DataFrame is `Dict[str, Tensor]` internally — column order depends on insertion order
- `to_datetime` and `dt_floor` are module-level functions, not Series methods
See `examples/wrapper_api_tour.py` for a complete working demo of every class and method.
Migrating from pandas to xpandas is straightforward — most code changes are mechanical.
```python
# Before
import pandas as pd

# After
import xpandas as pd
import torch
```

```python
# Before (pandas)
df = pd.DataFrame({'price': [100.0, 101.5, 99.8]})

# After (xpandas)
df = pd.DataFrame({'price': torch.tensor([100.0, 101.5, 99.8], dtype=torch.double)})
```

```python
# pandas: returns a DataFrame/Series with group index
result = df.groupby('sector')['price'].mean()
print(result['tech'])  # index-based access

# xpandas: returns (keys_tensor, values_tensor) tuple
keys, means = df.groupby('sector')['price'].mean()
print(keys, means)  # tensor([0, 1, 2]), tensor([100.5, 98.3, 105.1])
```

```python
# pandas
df['date'] = pd.to_datetime(df['epoch'], unit='s')
df['date_floor'] = df['date'].dt.floor('1D')

# xpandas
df['date'] = pd.to_datetime(df['epoch'], unit='s')
df['date_floor'] = pd.dt_floor(df['date'], freq='1D')
```

| pandas | xpandas | Notes |
|---|---|---|
| `df['col']` | `df['col']` | ✅ Same |
| `df.col` | `df.col` | ✅ Same |
| `series + series` | `series + series` | ✅ Same |
| `series.rolling(5).mean()` | `series.rolling(5).mean()` | ✅ Same |
| `series.ewm(span=10).mean()` | `series.ewm(span=10).mean()` | ✅ Same |
| `series.fillna(0)` | `series.fillna(0)` | ✅ Same |
| `df.sort_values('col')` | `df.sort_values(by='col')` | ✅ Same |
| `df.merge(other, on='key')` | `df.merge(other, on='key')` | ✅ Same |
| `series.where(cond, -1.0)` | `series.where(cond, tensor)` | `other` must be a Tensor |
| `df.groupby('k')['v'].sum()` | `keys, vals = df.groupby('k')['v'].sum()` | Returns a `(keys, values)` tuple |
| `pd.to_datetime(s, unit='s')` | `pd.to_datetime(t, unit='s')` | ✅ Same (module-level) |
| `s.dt.floor('1D')` | `pd.dt_floor(t, freq='1D')` | Module-level function |
See `examples/pandas_migration.py` for a fully runnable side-by-side comparison.
All xpandas value columns must be torch.double (float64). Check your tensor creation:
```python
# Wrong
t = torch.tensor([1.0, 2.0, 3.0])  # defaults to float32!

# Right
t = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double)
```

GroupBy key columns must be `torch.long` (int64):
```python
df = pd.DataFrame({
    'group': torch.tensor([1, 2, 1, 2], dtype=torch.long),             # int64 keys
    'value': torch.tensor([10.0, 20.0, 30.0, 40.0], dtype=torch.double)
})
keys, sums = df.groupby('group')['value'].sum()
```

Unlike pandas, xpandas requires `other` to be a Tensor, not a scalar:
```python
# Wrong
result = series.where(cond, -1.0)

# Right
result = series.where(cond, torch.full_like(series.values, -1.0))
```

- Ensure all DataFrame columns are Tensors (no Python lists or NumPy arrays)
- GroupBy keys must be `torch.long`, values must be `torch.double`
- Use `pd.to_datetime()` and `pd.dt_floor()` as module-level calls, not methods
- Avoid Python-only constructs inside `@torch.jit.script` (list comprehensions, f-strings, etc.)
```bash
# 1. Script and save in Python
python examples/trace_and_save.py   # produces alpha.pt

# 2. Build C++ inference binary
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
make -j

# 3. Run — no Python needed
./alpha_infer ../alpha.pt ./libxpandas_ops.so
```

See `inference/main.cpp` for the complete C++ driver code.
**Is groupby slower than pandas?** Yes — by design. xpandas groupby uses sorted `std::map` keys to guarantee deterministic output order in TorchScript. pandas uses optimized Cython hashmaps that are faster but non-deterministic. If groupby performance is critical, consider pre-sorting your data or reducing group cardinality.

**Does xpandas support `torch.compile`?** Basic support exists via FakeTensor kernels in `ops_meta.py`. However, the primary compilation target is `torch.jit.script`. For production deployment, use TorchScript.

**Is there GPU support?** Currently all ops are CPU-only. The ops dispatch through PyTorch's dispatcher, so adding CUDA kernels is architecturally possible but not yet implemented.
See CONTRIBUTING.md (in Chinese) for a step-by-step guide to adding a new op, using `rank` as a worked example.
Apache-2.0. See LICENSE.