Quadrants is a high-performance, multi-platform compiler for large-scale physics simulation and robotics workloads, continuously developed by Genesis AI. It compiles Python code into highly optimized parallel kernels that run on:
- NVIDIA GPUs (CUDA)
- Vulkan-compatible GPUs (SPIR-V)
- Apple Metal GPUs
- AMD GPUs (ROCm HIP)
- x86 and ARM64 CPUs
The Quadrants project was forked from Taichi in June 2025. Since the original Taichi is no longer maintained and the codebase has evolved into a fully independent compiler with its own direction and long-term roadmap, we decided to give it a name that reflects both its roots and its new identity. The name Quadrants is inspired by the Chinese saying:
太极生两仪,两仪生四象
The Supreme Polarity (Taichi) gives rise to the Two Modes (Yin & Yang), which in turn give rise to the Four Forms (Quadrants).
Quadrants captures this idea of progression from Taichi: built on the same foundation, evolving in its own direction while acknowledging its roots. The project is now fully independent and does not aim to maintain backward compatibility with upstream Taichi.
While the repository still resembles upstream in structure, major changes include:
- LLVM 22, ARM (aarch64) support
- Kernel-level code coverage — device-side branch coverage in standard `coverage.py` format, integrated with pytest-cov
- AI-driven checks for line wrapping, deleted comments, test coverage, and feature factorization
- `dataclasses.dataclass` structs — work with ndarrays and fields, nestable, passable to `qd.func`, zero kernel-runtime overhead
- `qd.Tensor` — unified API over fields and ndarrays with per-tensor layout control, pickle support, and a `backend=` switch
- `BufferView` — safe sub-range ndarray access with bounds checking in debug mode
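Since the struct feature above builds on the standard-library `dataclasses.dataclass`, a plain-Python sketch shows the shape of such a struct. The names here (`Vec2`, `Particle`) are illustrative, not part of the Quadrants API; per the list above, Quadrants would let structs like these be stored in fields/ndarrays and passed to `qd.func` with no kernel-runtime overhead.

```python
from dataclasses import dataclass

# Illustrative struct types; any Quadrants-specific behavior is assumed,
# only the stdlib dataclass mechanics are shown here.
@dataclass
class Vec2:
    x: float
    y: float

@dataclass
class Particle:
    pos: Vec2     # dataclasses nest, mirroring nested structs in kernels
    mass: float

p = Particle(pos=Vec2(1.0, 2.0), mass=0.5)
```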
To focus the compiler and reduce maintenance burden, we removed: GUI/GGUI, C-API, AOT, DX11/DX12, iOS/Android, OpenGL/GLES, argpack, CLI.
- Reduced launch latency — ndarray CPU performance improved 4.5×; ndarray GPU performance went from 11× slower than fields to ~30% slower (5090 GPU, Genesis benchmark)
- Fastcache — opt-in source-level cache (`@qd.kernel(fastcache=True)`) that bypasses front-end AST parsing; reduces warm-cache kernel load from 7.2 s to 0.3 s on Genesis benchmarks
- GPU Graphs — `@qd.kernel(graph=True)` captures kernel sequences into a graph; `qd.graph_do_while` runs GPU-side iteration loops (hardware conditional nodes on CUDA SM 9.0+)
- perf_dispatch — auto-benchmarks multiple kernel implementations and selects the fastest at runtime
- Zero-copy interop — `to_torch(copy=False)` / `to_numpy(copy=False)` via DLPack on CUDA, CPU, AMDGPU, and Metal; direct torch tensor pass-through into kernels
- Tile16x16 — register-resident 16×16 matrix tiles with Cholesky, triangular solve, and rank-1 updates; 5× faster than shared-memory baselines on blocked linear algebra
- Subgroup ops — cross-platform `shuffle`, `shuffle_down`, `reduce_add`, `reduce_all_add` across CUDA, AMDGPU, Metal, and Vulkan
- Autodiff with dynamic loops — computes the gradient of any kernel transparently using reverse-mode differentiation and runtime-based memory allocation
- Forward-mode AD, custom gradients (`@qd.ad.grad_replaced`), `qd.ad.Tape`
- Python backend — `qd.init(qd.python)` interprets kernels as plain Python so they can be stepped through in a standard Python debugger
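To make the "autodiff with dynamic loops" item above concrete, here is a plain-Python sketch of the underlying idea (this is not Quadrants code): a tape records each step of a loop whose trip count is only known at runtime, and a reverse sweep over that tape accumulates the gradient.

```python
# Conceptual sketch of reverse-mode AD over a dynamic loop.
# f(xs) = sum of squares; the gradient is recovered from a recorded tape.
def f_and_grad(xs):
    tape = []                      # (index, local derivative) per step
    total = 0.0
    for i in range(len(xs)):       # trip count known only at runtime
        total += xs[i] * xs[i]
        tape.append((i, 2.0 * xs[i]))
    grad = [0.0] * len(xs)
    for i, d in reversed(tape):    # reverse sweep over the tape
        grad[i] += d
    return total, grad

val, grad = f_and_grad([1.0, 2.0, 3.0])
# val = 14.0, grad = [2.0, 4.0, 6.0]
```

The runtime-based memory allocation mentioned above corresponds to the tape here: its size cannot be fixed at compile time because the loop bounds are dynamic.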
- Python 3.10-3.13
- macOS 14 or 15, Windows, or Ubuntu 22.04-24.04 (or compatible)
- ROCm 5.2 or newer for AMD GPU support
```shell
pip install quadrants
```
(For how to build from source, see our CI build scripts, e.g. the Linux build scripts.)
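As a first sanity check after installing, a minimal kernel might look like the sketch below. The Quadrants decorators are shown in comments only, so the snippet stays runnable as plain Python, which is exactly the semantics the `qd.init(qd.python)` debug backend gives kernels; the exact `@qd.kernel` signature here is an assumption.

```python
# Hypothetical first-kernel sketch; Quadrants-specific lines are
# commented out so the body runs as ordinary Python.

# import quadrants as qd
# qd.init(qd.python)        # interpret kernels as plain Python

# @qd.kernel
def saxpy(a: float, x: list, y: list):
    # in a compiled kernel this loop would be auto-parallelized
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]

x = [1.0, 2.0]
y = [10.0, 20.0]
saxpy(2.0, x, y)
# y is now [12.0, 24.0]
```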
Quadrants stands on the shoulders of the original Taichi project, built with care and vision by many contributors over the years. For the full list of contributors and credits, see the original Taichi repository.
We are grateful for that foundation.