
EfficientPPML/MORPH


MORPH

TPU-accelerated, free, immediate, fast, and cheap Zero Knowledge Proof serving for everyone

| paper | code | tutorial |

🔥 We have delivered a tutorial at ASPLOS'26 to help you get started with MORPH. Please visit CPA_tutorial to learn more. For questions, please send an email to our community address.


MORPH: Enabling AI Accelerators for Zero Knowledge Proof

License: MIT

1. What is MORPH?

MORPH is the first project to enable AI accelerators, such as Google TPUs, to accelerate Zero Knowledge Proof primitives (Multi-Scalar Multiplication and the Number Theoretic Transform), achieving state-of-the-art (SotA) throughput and energy efficiency (performance per watt). Together with CROSS, MORPH turns AI ASICs into SotA-throughput machines for cryptographic primitives across a wide range of precisions.

It features

  • MXU Lazy Modular Reduction: brings high-precision modular reduction down from quadratic to linear cost.

  • Dataflow optimizations for MSM and NTT (details in the paper).
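The paper describes how MORPH maps lazy reduction onto the MXU; the sketch below only illustrates the general idea of deferring modular reduction, in plain Python. The modulus P = 8191 and the 32-bit accumulator width are hypothetical values chosen for readability, not MORPH parameters. Unreduced products are summed and reduced once at the end, which stays correct as long as the unreduced sum cannot overflow the accumulator.

```python
"""Eager vs. lazy modular reduction for a dot product (illustrative sketch only)."""

P = 8191        # hypothetical small prime (2**13 - 1), not a MORPH parameter
ACC_BITS = 32   # hypothetical accumulator width


def dot_mod_eager(xs, ys, p=P):
    """Reduce after every multiply-add: one reduction per term."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = (acc + x * y) % p
    return acc


def dot_mod_lazy(xs, ys, p=P, acc_bits=ACC_BITS):
    """Accumulate unreduced products and reduce only once at the end.

    Valid only while the worst-case unreduced sum fits in the accumulator:
    len(xs) * (p - 1)**2 < 2**acc_bits.
    """
    if len(xs) * (p - 1) ** 2 >= 2**acc_bits:
        raise ValueError("too many terms: the accumulator could overflow")
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # no reduction inside the loop
    return acc % p
```

With these toy values, up to 64 terms fit in 32 bits before a reduction is forced; the trade-off is fewer reductions in exchange for a wider accumulator.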

This branch (asplos) contains demo scripts for profiling and comparing the two core workloads, MSM and NTT.

Project Structure

├── finite_field_context.py           # Finite field arithmetic (MORPH & CROSS backends)
├── elliptic_curve_context.py         # Elliptic curve point arithmetic
├── multiscalar_multiplication_context.py    # Multi-scalar multiplication (MSM)
├── number_theory_transform_context.py       # Number Theoretic Transform (NTT)
├── utils.py                          # JAX kernel utilities, number theory helpers
├── profiler.py                       # Trace parsing and kernel profiling
├── configurations.toml               # Curve parameters (BLS12-377)
├── c_kernels/                        # Custom C kernels for TPU acceleration
├── deployments/                      # Serialized compiled JAX kernels

Each module has companion *_test.py and *_perf_test.py scripts for correctness and performance testing.
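As a reference point for the NTT workload, a naive O(n^2) transform over a toy field is sketched below; it is the kind of ground truth a correctness test can compare against. The prime P = 17, length N = 8, and root OMEGA = 2 are toy parameters chosen for readability (real ZKP fields are hundreds of bits wide), and this is not the repository's TPU kernel.

```python
"""Naive O(n^2) Number Theoretic Transform over a toy prime field (sketch)."""

P = 17      # toy prime
N = 8       # transform length; must divide P - 1
OMEGA = 2   # primitive N-th root of unity mod 17 (2**4 = 16 = -1, 2**8 = 1)


def ntt(a, omega=OMEGA, p=P):
    """Evaluate the polynomial with coefficients a at omega**0 .. omega**(n-1)."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, p) for j in range(n)) % p for i in range(n)]


def intt(a_hat, omega=OMEGA, p=P):
    """Inverse NTT: forward transform with omega**-1, then scale by n**-1."""
    n = len(a_hat)
    omega_inv = pow(omega, -1, p)
    n_inv = pow(n, -1, p)
    return [(x * n_inv) % p for x in ntt(a_hat, omega_inv, p)]


def cyclic_convolution(a, b, p=P):
    """Schoolbook cyclic convolution, used to check the convolution theorem."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) % p for i in range(n)]
```

The property that makes the NTT useful is that pointwise products in the transformed domain correspond to cyclic convolution of the coefficients, which turns an O(n^2) polynomial product into O(n log n) work with a fast transform.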

Key Concepts

| Concept | Description |
|---|---|
| DRNS (Double RNS) | Residue Number System representation enabling efficient large-integer modular arithmetic on TPU |
| MORPH | Alternative modular multiplication backend using a chunk-based representation |
| MSM | Multi-scalar multiplication: computing $\sum_i s_i \cdot P_i$ over elliptic curve points |
| Bucket Accumulation | MSM decomposition strategy: scalars are sliced into windows, and points are accumulated into buckets per window |
| Compiled Kernels | Pre-compiled JAX/C kernels stored in deployments/ for fast TPU execution |
| Sharding | Distribution of computation across TPU cores |
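To make the bucket-accumulation row concrete, here is a minimal Pippenger-style sketch. It assumes a stand-in additive group (integers modulo a hypothetical Q) in place of BLS12-377 curve points, so point addition is modular addition and $s \cdot P$ is `s * P mod Q`; the window and scalar widths are illustrative, and this is not MORPH's TPU dataflow.

```python
"""Bucket (Pippenger-style) MSM sketch over a stand-in additive group."""

Q = 2**61 - 1  # hypothetical group modulus; "points" are integers mod Q, not curve points


def add(p1, p2):
    """Group addition; a stand-in for elliptic-curve point addition."""
    return (p1 + p2) % Q


def msm_naive(scalars, points):
    """Reference: sum_i s_i * P_i computed directly."""
    return sum(s * p for s, p in zip(scalars, points)) % Q


def msm_bucket(scalars, points, window_bits=4, scalar_bits=16):
    """Slice scalars into windows and accumulate points into buckets per window."""
    num_windows = -(-scalar_bits // window_bits)  # ceiling division
    mask = (1 << window_bits) - 1
    window_sums = []
    for w in range(num_windows):
        buckets = [0] * (1 << window_bits)  # bucket k collects points whose digit is k
        for s, p in zip(scalars, points):
            digit = (s >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Compute sum_k k * buckets[k] with additions only (running-sum trick).
        running, total = 0, 0
        for k in range(mask, 0, -1):
            running = add(running, buckets[k])
            total = add(total, running)
        window_sums.append(total)
    # Combine windows from most to least significant: acc = acc * 2**c + window_sum.
    acc = 0
    for total in reversed(window_sums):
        for _ in range(window_bits):
            acc = add(acc, acc)  # doubling
        acc = add(acc, total)
    return acc
```

The point of the bucket method is that every scalar multiplication disappears: each window costs one addition per point plus about 2 * 2**c additions for the bucket sums, and windows are combined with doublings.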

2. Environment Setup

Inside the TPU VM, run the following steps to configure the environment.

Step 1: Install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x ./Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

Step 2: Create the environment and install the required packages

source ~/.bashrc
conda create --name jaxite python=3.13
conda activate jaxite
pip install -U "jax[tpu]"
pip install xprof
pip install absl-py
pip install toml
pip install gdown
pip install pandas
pip install gmpy2
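A quick way to confirm that the Step 2 packages are importable from the jaxite environment is a small script like the hypothetical one below. It assumes the usual import names for these pip packages (absl-py installs the absl module), leaves out xprof because its import name is not assumed here, and prints jax.devices(), which on a TPU VM should list the TPU devices.

```python
"""Check that the packages from Step 2 are importable (hypothetical helper)."""
import importlib.util

# Assumed import names for the pip packages; absl-py provides "absl".
REQUIRED_MODULES = ["jax", "absl", "toml", "gdown", "pandas", "gmpy2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
    if "jax" not in missing:
        import jax
        print(jax.devices())  # on a TPU VM this should list TPU devices
```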

Step 3: Install the C++ toolchain for the MSM C kernel.

The MSM path uses a CPU C kernel (c_kernels/distribution.cpp) that is compiled on the first import of multiscalar_multiplication_context. You need a host g++ with OpenMP, and the conda env's bundled libstdc++ must be recent enough to satisfy the symbols emitted by that compiler. On modern Ubuntu (g++ 13+) this means GLIBCXX_3.4.32, which the conda default libstdcxx-ng 11.2.0 does not ship — so install a newer one from conda-forge:

sudo apt-get install -y g++          # skip if already installed
conda install -n jaxite -c conda-forge 'libstdcxx-ng>=13' 'libgcc-ng>=13'

If you see OSError: ... libstdc++.so.6: version 'GLIBCXX_3.4.XX' not found when importing multiscalar_multiplication_context, the conda libstdc++ is older than your system g++ — re-run the conda install line above (or LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 for a one-shot workaround).
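To see which GLIBCXX versions a given libstdc++ provides (similar to running strings on the library and grepping for GLIBCXX), a hypothetical helper like the one below scans the binary for GLIBCXX_* version strings. The path passed on the command line is an example, such as the conda env's lib/libstdc++.so.6 or /usr/lib/x86_64-linux-gnu/libstdc++.so.6.

```python
"""List the GLIBCXX symbol versions advertised by a libstdc++ binary (sketch)."""
import re
import sys

# Matches version strings such as GLIBCXX_3.4 or GLIBCXX_3.4.32.
_VERSION_RE = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def glibcxx_versions(path):
    """Return sorted (major, minor, patch) tuples of GLIBCXX_* strings in a binary."""
    with open(path, "rb") as f:
        data = f.read()
    found = set()
    for m in _VERSION_RE.finditer(data):
        found.add(tuple(int(g) if g is not None else 0 for g in m.groups()))
    return sorted(found)


def provides(path, required=(3, 4, 32)):
    """True if the library at `path` advertises the required GLIBCXX version."""
    return tuple(required) in glibcxx_versions(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    lib = sys.argv[1]  # e.g. the conda env's lib/libstdc++.so.6
    versions = glibcxx_versions(lib)
    print("newest:", versions[-1] if versions else None, "| has 3.4.32:", provides(lib))
```

If the conda copy reports an older newest version than the system copy, that matches the error described above.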

Step 4: Download the reference data

mkdir -p data && gdown 1aJhANlS8hWrjSt9j0nBKoFRBoZh0W1aa -O data/data.tar.gz && cd data && tar -xvf data.tar.gz

Step 5 (optional): Pre-build the MSM C kernel.

The C kernel is compiled automatically by c_kernels/build.py on the first import of multiscalar_multiplication_context, so no separate build step is required. To pre-build (or force a rebuild) ahead of time:

python -m c_kernels.build           # build if missing or stale
python -m c_kernels.build --force   # always rebuild

The compiler defaults to g++ with -std=c++17 -fopenmp -O2 -fPIC -shared plus -I<jaxlib>/include. Override via the CXX and CXXFLAGS env vars (e.g. point CXX at conda's gxx_linux-64 to keep the build inside the env).
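The auto-build behavior described above (build if missing, rebuild when the source is newer or when --force is given, with CXX and CXXFLAGS overrides) can be sketched as follows. This is a hypothetical illustration, not the contents of c_kernels/build.py: it assumes CXXFLAGS replaces the default flags rather than extending them, and that the caller supplies the jaxlib include directory.

```python
"""Hypothetical sketch of "build if missing or stale" logic for a C++ kernel."""
import os
import shlex
import subprocess

DEFAULT_CXX = "g++"
DEFAULT_FLAGS = "-std=c++17 -fopenmp -O2 -fPIC -shared"


def needs_build(src, out, force=False):
    """Rebuild when forced, when the output is missing, or when the source is newer."""
    if force or not os.path.exists(out):
        return True
    return os.path.getmtime(src) > os.path.getmtime(out)


def build_command(src, out, include_dir, env=os.environ):
    """Assemble the compiler command; CXX / CXXFLAGS replace the defaults if set (assumption)."""
    cxx = env.get("CXX", DEFAULT_CXX)
    flags = shlex.split(env.get("CXXFLAGS", DEFAULT_FLAGS))
    return [cxx, *flags, f"-I{include_dir}", src, "-o", out]


def build(src, out, include_dir, force=False):
    """Compile src into the shared library out only when needed."""
    if needs_build(src, out, force):
        subprocess.run(build_command(src, out, include_dir), check=True)
```

Comparing modification times keeps repeat imports cheap: the compiler runs only on the first import or after the source changes.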

3. TPU Setup

The code is optimized for TPU execution, but it also runs on NVIDIA GPU and CPU for functional preview (not optimized for these devices).

  • Step 1: Create a Google Cloud project by following the tutorial.

Note the project name (<google_project_name>) and the project ID of the created project.

  • Step 2: Apply for the free 30-day TPU trial through the TPU Research Cloud (TRC) program.

After you submit the request, you will receive an email within one day with a link to a survey, where you enter your Google project ID.

  • Step 3: Launch a TPU VM. You can create the TPU VM either through the Google Cloud console or with the gcloud CLI on your local machine. The gcloud CLI commands are given here because they work for all TPU generations (v4 and newer).

For TPUv6e,

gcloud config set project <google_project_name>
gcloud config set compute/zone us-east1-d
gcloud alpha compute tpus queued-resources create <google_project_name> --node-id=<your_favoriate_node_name> \
    --zone=us-east1-d \
    --accelerator-type=v6e-1  \
    --runtime-version=v2-alpha-tpuv6e \
    --provisioning-model=spot

Note that TPUv5e and TPUv6e can only be provisioned with the spot provisioning model: they are in high demand, and Google Cloud can preempt them when higher-priority tasks need these resources. A TPUv4 VM, which is in lower demand, can be kept active long term.

  • Step 4: Set up remote SSH (VS Code or Cursor) to the TPU VM. Once the requested TPU VM is up and running, as shown in the Google Cloud console, use gcloud to forward the SSH port of the remote machine to a port on your local machine, and then set up VS Code Remote-SSH.

First, set up a local SSH key for Google Compute Engine by following the linked instructions. Afterwards, the SSH key is saved to <path_to_local_user>/.ssh/google_compute_engine.

gcloud compute tpus tpu-vm ssh <gcloud_user_name>@<your_favoriate_node_name> -- -L 9009:localhost:22

Here 9009 is the port on the local machine and 22 is the SSH port of the TPU VM.

While this tunnel is open, configure the VS Code Remote-SSH extension to access the TPU VM with the following host entry:

Host tpu-vm
    User <gcloud_user_name>
    HostName localhost
    Port 9009
    IdentityFile  <path_to_local_user>/.ssh/google_compute_engine

Then follow the steps in the linked Remote-SSH guide to log into the TPU VM.

4. Ready to Play?

Run functional correctness tests for both NTT and MSM:

python3 number_theory_transform_test.py
python3 multiscalar_multiplication_test.py

Run performance tests for both NTT and MSM:

python3 number_theory_transform_perf_test.py
python3 multiscalar_multiplication_perf_test.py

Notes:

  • The first MSM test run auto-compiles c_kernels/distribution.cpp into c_kernels/distribution.so (a few seconds). Subsequent runs reuse the cached .so and rebuild only when the source is newer.
  • The first run of each test also JIT-compiles JAX kernels; expect a longer first iteration that is then cached under deployments/.
  • Performance tests assume the reference data from Step 4 of Environment Setup is present under ./data/.

5. Call for Action

Our mission is to build an open-source SotA library for the community.

  • If you find this repository helpful, please consider giving it a star :)
  • For any questions, please feel free to open an issue.
  • For any suggestions or new features, please feel free to open a pull request.

Contact

Citation

@inproceedings{tong2025MORPH,
author = {Jianming Tong and Jingtian Dang and Simon Langowski and Tianhao Huang and Asra Ali and Jeremy Kun and Srini Devadas and Tushar Krishna},
title = {MORPH: Enabling AI ASICs for Zero Knowledge Proof},
year = {2026},
publisher = {IEEE Press},
booktitle = {Proceedings of the 63rd Annual ACM/IEEE Design Automation Conference},
location = {Los Angeles, California, United States},
series = {DAC '26}
}

Enjoy! :D

About

Enabling AI ASIC for ZKP
