
pycleora 3.2.0


@ponythewhite ponythewhite released this 31 Mar 12:44


March 2026 — Performance & Ecosystem Release

Graph embeddings. Blazing fast.


New Features

  • Rust-native Full Embed Loop (embed_fast) — The entire embedding pipeline (initialization, propagation, normalization, iteration) now runs inside a single Rust call. Eliminates Python↔Rust boundary crossing on every iteration. 3.7× faster on roadNet (2M nodes): 15.8s → 4.3s. 1.7× faster on Cora (2.5K nodes).

  • Embedding Whitening (whiten_embeddings) — Post-processing that mean-centers and decorrelates embedding dimensions via eigendecomposition. Boosts node classification accuracy from 0.26 → 0.70 on Cora. Combined with multiscale, achieves 0.83 accuracy. Available as whiten=True parameter in embed().

  • Residual Connections — Mix propagated embeddings with previous iteration: emb = (1-α)·propagated + α·prev. Prevents over-smoothing on deep iterations. Parameter: residual_weight in embed().

  • Convergence-Based Early Stopping — Automatically detects when embeddings stabilize (RMSE between iterations drops below threshold). Saves compute on graphs that converge before max_iterations. Parameter: convergence_threshold in embed().

  • Graph Statistics Module (pycleora.stats) — graph_summary(), degree_distribution(), clustering_coefficient(), connected_components(), diameter(), betweenness_centrality(), pagerank().

  • Graph Preprocessing Module (pycleora.preprocess) — clean_graph(), largest_connected_component(), filter_by_degree().

  • ANN Search Module (pycleora.search) — ANNIndex class for approximate nearest-neighbor queries. HNSW backend via optional hnswlib; ball-tree fallback with no extra dependencies.

  • Embedding Compression Module (pycleora.compress) — pca_compress(), random_projection(), product_quantize(); PQIndex with reconstruct() and search().

  • Embedding Alignment Module (pycleora.align) — procrustes(), cca_align(), alignment_score().

  • Ensemble Embeddings Module (pycleora.ensemble) — combine() merges multiple embedding matrices via concatenation, mean, weighted average, or SVD reduction.

  • Extended I/O — from_pandas(), from_scipy_sparse(), from_numpy(), from_edge_list().
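The whitening step (the whiten=True option above) can be sketched in NumPy. This is an illustrative PCA-whitening implementation — mean-centering followed by eigendecomposition of the covariance and per-axis rescaling — not the library's actual code:

```python
import numpy as np

def whiten_embeddings(emb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean-center the embedding matrix, then decorrelate its dimensions
    via eigendecomposition of the covariance matrix (PCA whitening)."""
    centered = emb - emb.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # covariance is symmetric -> eigh
    # Rotate into the eigenbasis and rescale each axis to unit variance.
    return (centered @ eigvecs) / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated dims
white = whiten_embeddings(emb)
# After whitening, the covariance of the dimensions is close to identity.
```

Decorrelated, unit-variance dimensions are what lets a linear classifier separate classes more cleanly, which is the mechanism behind the accuracy gain reported above.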
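Residual mixing and convergence-based early stopping (the residual_weight and convergence_threshold parameters above) can be illustrated with a toy propagation loop. The row-normalized Markov operator and L2 normalization here are simplified assumptions for the sketch, not pycleora internals:

```python
import numpy as np

def embed_with_residual(adj, dim=16, max_iterations=50,
                        residual_weight=0.1, convergence_threshold=1e-4,
                        seed=0):
    """Toy propagation loop showing residual mixing and RMSE-based
    early stopping (parameter names follow the release notes)."""
    rng = np.random.default_rng(seed)
    # Row-normalized adjacency acts as the propagation (Markov) operator.
    markov = adj / adj.sum(axis=1, keepdims=True)
    emb = rng.normal(size=(adj.shape[0], dim))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    for it in range(max_iterations):
        propagated = markov @ emb
        # Residual connection: keep a fraction of the previous iterate,
        # emb = (1 - alpha) * propagated + alpha * prev.
        new = (1 - residual_weight) * propagated + residual_weight * emb
        new /= np.linalg.norm(new, axis=1, keepdims=True)
        rmse = np.sqrt(np.mean((new - emb) ** 2))
        emb = new
        if rmse < convergence_threshold:   # embeddings have stabilized
            break
    return emb, it + 1

# Ring graph of 6 nodes with self-loops, so every row is non-zero.
adj = np.eye(6) + np.roll(np.eye(6), 1, axis=0) + np.roll(np.eye(6), -1, axis=0)
emb, iters = embed_with_residual(adj)
```

The residual term damps each update, which is what counteracts over-smoothing when many iterations are run; the RMSE check stops the loop once successive iterates are effectively identical.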

Improvements

  • Rust Core: Double-Buffered Propagation — Two pre-allocated buffers are swapped instead of allocating a new embedding matrix every iteration. Reduces GC pressure and memory allocator overhead.

  • Rust Core: Faster Initialization Hashing — Replaced SipHash (DefaultHasher) with FxHash in init_value(). FxHash runs at ~0.3 cycles/byte vs SipHash's ~4 cycles/byte — 10× faster initialization for large graphs.

  • Rust Core: GIL Release During Embedding — py.allow_threads() releases Python's GIL during the entire Rust embedding computation, enabling true multi-threaded parallelism.

  • Rust Core: Vectorization-Friendly Inner Loop — SpMM kernel rewritten with direct slice access instead of ndarray iterators, enabling better auto-vectorization by LLVM.
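The double-buffering pattern can be mimicked in NumPy: write each propagation result into a pre-allocated spare buffer and swap references, so no embedding matrix is allocated inside the loop. A sketch of the idea, not the Rust implementation:

```python
import numpy as np

def propagate_double_buffered(markov, emb, iterations):
    """Swap two pre-allocated buffers each iteration instead of
    allocating a fresh embedding matrix (mirrors the pattern
    described for the Rust core)."""
    front = emb.copy()
    back = np.empty_like(emb)          # allocated once, reused every pass
    for _ in range(iterations):
        np.matmul(markov, front, out=back)   # write into the spare buffer
        back /= np.linalg.norm(back, axis=1, keepdims=True)
        front, back = back, front            # pointer swap, no allocation
    return front

adj = np.eye(4) + np.roll(np.eye(4), 1, axis=0)
markov = adj / adj.sum(axis=1, keepdims=True)
emb0 = np.random.default_rng(1).normal(size=(4, 3))
out = propagate_double_buffered(markov, emb0, 5)
```

In Rust the same swap avoids both allocator traffic and cache churn; in NumPy the `out=` argument plays the role of writing into the reused buffer.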


Benchmarks

Cleora achieves state-of-the-art results on node classification across 5 real-world datasets from SNAP, Planetoid, and DGL.

Dataset       Nodes   Cleora  NetMF   DeepWalk  Node2Vec  HOPE   GraRep  ProNE  RandNE
ego-Facebook  4K      0.990   0.957   0.958     0.958     0.890  T/O     0.075  0.212
Cora          2.7K    0.861   0.839   0.835     0.835     0.821  0.809   0.179  0.247
CiteSeer      3.3K    0.824   0.810   0.806     0.806     0.740  0.756   0.189  0.244
PubMed        19.7K   0.879   OOM     T/O       T/O       T/O    OOM     0.339  0.351
PPI           3.9K    1.000   OOM     T/O       T/O       T/O    OOM     0.023  0.073

(T/O = timed out; OOM = out of memory.)

Cleora achieves the highest accuracy on every dataset while using 10–24× less memory than the accuracy-competitive baselines.

Install

pip install pycleora

Links