
pycleora 3.2.0


@ponythewhite ponythewhite released this 31 Mar 12:44


March 2026 — Performance & Ecosystem Release

Graph embeddings. Blazing fast.


New Features

  • Rust-native Full Embed Loop (embed_fast) — The entire embedding pipeline (initialization, propagation, normalization, iteration) now runs inside a single Rust call. Eliminates Python↔Rust boundary crossing on every iteration. 3.7× faster on roadNet (2M nodes): 15.8s → 4.3s. 1.7× faster on Cora (2.5K nodes).

  • Embedding Whitening (whiten_embeddings) — Post-processing that mean-centers and decorrelates embedding dimensions via eigendecomposition. Boosts node classification accuracy from 0.26 → 0.70 on Cora. Combined with multiscale, achieves 0.83 accuracy. Available as whiten=True parameter in embed().

  • Residual Connections — Mix propagated embeddings with previous iteration: emb = (1-α)·propagated + α·prev. Prevents over-smoothing on deep iterations. Parameter: residual_weight in embed().

  • Convergence-Based Early Stopping — Automatically detects when embeddings stabilize (RMSE between iterations drops below threshold). Saves compute on graphs that converge before max_iterations. Parameter: convergence_threshold in embed().

  • Graph Statistics Module (pycleora.stats) — graph_summary(), degree_distribution(), clustering_coefficient(), connected_components(), diameter(), betweenness_centrality(), pagerank().

  • Graph Preprocessing Module (pycleora.preprocess) — clean_graph(), largest_connected_component(), filter_by_degree().

  • ANN Search Module (pycleora.search) — ANNIndex class for approximate nearest-neighbor queries. HNSW backend via optional hnswlib; ball-tree fallback with no extra dependencies.

  • Embedding Compression Module (pycleora.compress) — pca_compress(), random_projection(), product_quantize(); PQIndex with reconstruct() and search().

  • Embedding Alignment Module (pycleora.align) — procrustes(), cca_align(), alignment_score().

  • Ensemble Embeddings Module (pycleora.ensemble) — combine() merges multiple embedding matrices via concatenation, mean, weighted average, or SVD reduction.

  • Extended I/O — from_pandas(), from_scipy_sparse(), from_numpy(), from_edge_list().
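The whitening step (the whiten=True option above) can be sketched in NumPy. This is an illustrative PCA-whitening implementation — mean-centering followed by eigendecomposition of the covariance and per-axis rescaling — not the library's actual code:

```python
import numpy as np

def whiten_embeddings(emb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean-center the embedding matrix, then decorrelate its dimensions
    via eigendecomposition of the covariance matrix (PCA whitening)."""
    centered = emb - emb.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # covariance is symmetric -> eigh
    # Rotate into the eigenbasis and rescale each axis to unit variance.
    return (centered @ eigvecs) / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated dims
white = whiten_embeddings(emb)
# After whitening, the covariance of the dimensions is close to identity.
```

Decorrelated, unit-variance dimensions are what lets a linear classifier separate classes more cleanly, which is the mechanism behind the accuracy gain reported above.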
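Residual mixing and convergence-based early stopping (the residual_weight and convergence_threshold parameters above) can be illustrated with a toy propagation loop. The row-normalized Markov operator and L2 normalization here are simplified assumptions for the sketch, not pycleora internals:

```python
import numpy as np

def embed_with_residual(adj, dim=16, max_iterations=50,
                        residual_weight=0.1, convergence_threshold=1e-4,
                        seed=0):
    """Toy propagation loop showing residual mixing and RMSE-based
    early stopping (parameter names follow the release notes)."""
    rng = np.random.default_rng(seed)
    # Row-normalized adjacency acts as the propagation (Markov) operator.
    markov = adj / adj.sum(axis=1, keepdims=True)
    emb = rng.normal(size=(adj.shape[0], dim))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    for it in range(max_iterations):
        propagated = markov @ emb
        # Residual connection: keep a fraction of the previous iterate,
        # emb = (1 - alpha) * propagated + alpha * prev.
        new = (1 - residual_weight) * propagated + residual_weight * emb
        new /= np.linalg.norm(new, axis=1, keepdims=True)
        rmse = np.sqrt(np.mean((new - emb) ** 2))
        emb = new
        if rmse < convergence_threshold:   # embeddings have stabilized
            break
    return emb, it + 1

# Ring graph of 6 nodes with self-loops, so every row is non-zero.
adj = np.eye(6) + np.roll(np.eye(6), 1, axis=0) + np.roll(np.eye(6), -1, axis=0)
emb, iters = embed_with_residual(adj)
```

The residual term damps each update, which is what counteracts over-smoothing when many iterations are run; the RMSE check stops the loop once successive iterates are effectively identical.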

Improvements

  • Rust Core: Double-Buffered Propagation — Two pre-allocated buffers are swapped instead of allocating a new embedding matrix every iteration. Reduces GC pressure and memory allocator overhead.

  • Rust Core: Faster Initialization Hashing — Replaced SipHash (DefaultHasher) with FxHash in init_value(). FxHash runs at ~0.3 cycles/byte vs SipHash's ~4 cycles/byte — 10× faster initialization for large graphs.

  • Rust Core: GIL Release During Embedding — py.allow_threads() releases Python's GIL during the entire Rust embedding computation, enabling true multi-threaded parallelism.

  • Rust Core: Vectorization-Friendly Inner Loop — SpMM kernel rewritten with direct slice access instead of ndarray iterators, enabling better auto-vectorization by LLVM.
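The double-buffering pattern can be mimicked in NumPy: write each propagation result into a pre-allocated spare buffer and swap references, so no embedding matrix is allocated inside the loop. A sketch of the idea, not the Rust implementation:

```python
import numpy as np

def propagate_double_buffered(markov, emb, iterations):
    """Swap two pre-allocated buffers each iteration instead of
    allocating a fresh embedding matrix (mirrors the pattern
    described for the Rust core)."""
    front = emb.copy()
    back = np.empty_like(emb)          # allocated once, reused every pass
    for _ in range(iterations):
        np.matmul(markov, front, out=back)   # write into the spare buffer
        back /= np.linalg.norm(back, axis=1, keepdims=True)
        front, back = back, front            # pointer swap, no allocation
    return front

adj = np.eye(4) + np.roll(np.eye(4), 1, axis=0)
markov = adj / adj.sum(axis=1, keepdims=True)
emb0 = np.random.default_rng(1).normal(size=(4, 3))
out = propagate_double_buffered(markov, emb0, 5)
```

In Rust the same swap avoids both allocator traffic and cache churn; in NumPy the `out=` argument plays the role of writing into the reused buffer.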


Benchmarks

Cleora achieves state-of-the-art results on node classification across 5 real-world datasets from SNAP, Planetoid, and DGL.

Dataset       Nodes   Cleora  NetMF   DeepWalk  Node2Vec  HOPE   GraRep  ProNE  RandNE
ego-Facebook  4K      0.990   0.957   0.958     0.958     0.890  T/O     0.075  0.212
Cora          2.7K    0.861   0.839   0.835     0.835     0.821  0.809   0.179  0.247
CiteSeer      3.3K    0.824   0.810   0.806     0.806     0.740  0.756   0.189  0.244
PubMed        19.7K   0.879   OOM     T/O       T/O       T/O    OOM     0.339  0.351
PPI           3.9K    1.000   OOM     T/O       T/O       T/O    OOM     0.023  0.073

(T/O = timed out; OOM = out of memory.)

Cleora achieves the highest accuracy on every dataset while using 10–24× less memory than the accuracy-competitive baselines.

Install

pip install pycleora

Links