Releases: BaseModelAI/cleora
Default to interwhiten
pycleora 3.2.0
March 2026 — Performance & Ecosystem Release
Graph embeddings. Blazing fast.
New Features
- **Rust-native Full Embed Loop (`embed_fast`)** — The entire embedding pipeline (initialization, propagation, normalization, iteration) now runs inside a single Rust call, eliminating the Python↔Rust boundary crossing on every iteration. 3.7× faster on roadNet (2M nodes): 15.8 s → 4.3 s; 1.7× faster on Cora (2.5K nodes).
- **Embedding Whitening (`whiten_embeddings`)** — Post-processing that mean-centers and decorrelates embedding dimensions via eigendecomposition. Boosts node-classification accuracy on Cora from 0.26 to 0.70; combined with multiscale embeddings, it reaches 0.83. Available as the `whiten=True` parameter in `embed()`.
- **Residual Connections** — Mix propagated embeddings with the previous iteration: `emb = (1 - α)·propagated + α·prev`. Prevents over-smoothing on deep iterations. Parameter: `residual_weight` in `embed()`.
- **Convergence-Based Early Stopping** — Automatically detects when embeddings stabilize (the RMSE between iterations drops below a threshold), saving compute on graphs that converge before `max_iterations`. Parameter: `convergence_threshold` in `embed()`.
- **Graph Statistics Module (`pycleora.stats`)** — `graph_summary()`, `degree_distribution()`, `clustering_coefficient()`, `connected_components()`, `diameter()`, `betweenness_centrality()`, `pagerank()`.
- **Graph Preprocessing Module (`pycleora.preprocess`)** — `clean_graph()`, `largest_connected_component()`, `filter_by_degree()`.
- **ANN Search Module (`pycleora.search`)** — `ANNIndex` class for approximate nearest-neighbor queries. HNSW backend via the optional `hnswlib` dependency, with a dependency-free ball-tree fallback.
- **Embedding Compression Module (`pycleora.compress`)** — `pca_compress()`, `random_projection()`, and `product_quantize()`, which returns a `PQIndex` with `reconstruct()` and `search()`.
- **Embedding Alignment Module (`pycleora.align`)** — `procrustes()`, `cca_align()`, `alignment_score()`.
- **Ensemble Embeddings Module (`pycleora.ensemble`)** — `combine()` merges multiple embedding matrices via concatenation, mean, weighted average, or SVD reduction.
- **Extended I/O** — `from_pandas()`, `from_scipy_sparse()`, `from_numpy()`, `from_edge_list()`.
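The whitening, residual, and early-stopping mechanics above can be sketched in plain NumPy. This is an illustrative reimplementation of the math described in the notes, not the pycleora API — the function names here are hypothetical:

```python
import numpy as np

def whiten(emb):
    """Mean-center, then decorrelate dimensions via eigendecomposition
    of the covariance matrix, scaling each direction to unit variance."""
    centered = emb - emb.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return centered @ eigvecs / np.sqrt(eigvals + 1e-12)

def propagate_with_residual(adj, emb, residual_weight=0.2,
                            max_iterations=32, convergence_threshold=1e-4):
    """Propagation with a residual mix of the previous iteration
    (emb = (1 - α)·propagated + α·prev) and RMSE-based early stopping."""
    for _ in range(max_iterations):
        propagated = adj @ emb
        # L2-normalize rows so vectors stay on the unit sphere.
        propagated /= np.linalg.norm(propagated, axis=1, keepdims=True)
        new_emb = (1 - residual_weight) * propagated + residual_weight * emb
        rmse = np.sqrt(np.mean((new_emb - emb) ** 2))
        emb = new_emb
        if rmse < convergence_threshold:
            break  # embeddings have stabilized
    return emb
```

After whitening, the embedding dimensions have zero mean and an identity covariance, which is what removes the redundancy between dimensions that hurts downstream linear classifiers.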
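The `procrustes()` alignment in `pycleora.align` presumably solves the classic orthogonal Procrustes problem — finding the rotation that best maps one embedding space onto another. A minimal NumPy version (a sketch of the standard technique, not the shipped API) looks like:

```python
import numpy as np

def procrustes_align(source, target):
    """Find the orthogonal matrix R minimizing ||source @ R - target||_F
    and return the rotated source embeddings plus R itself."""
    # SVD of the cross-covariance gives the optimal rotation.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation
```

This is useful, for example, for comparing embeddings trained with different random seeds, whose dimensions are otherwise arbitrarily rotated relative to each other.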
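For intuition on the compression module, here is what PCA compression and Gaussian random projection typically look like in NumPy; the real `pca_compress()` and `random_projection()` signatures may differ from these hypothetical ones:

```python
import numpy as np

def pca_compress(emb, dim):
    """Project embeddings onto their top `dim` principal components."""
    centered = emb - emb.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def random_projection(emb, dim, seed=0):
    """Johnson-Lindenstrauss style Gaussian random projection."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(scale=1.0 / np.sqrt(dim), size=(emb.shape[1], dim))
    return emb @ proj
```

PCA keeps the directions of maximum variance; random projection trades a little distortion for being data-independent and much cheaper on very wide embeddings.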
Improvements
- **Rust Core: Double-Buffered Propagation** — Two pre-allocated buffers are swapped instead of allocating a new embedding matrix every iteration, reducing GC pressure and memory-allocator overhead.
- **Rust Core: Faster Initialization Hashing** — Replaced SipHash (`DefaultHasher`) with FxHash in `init_value()`. FxHash runs at ~0.3 cycles/byte vs SipHash's ~4 cycles/byte — 10× faster initialization for large graphs.
- **Rust Core: GIL Release During Embedding** — `py.allow_threads()` releases Python's GIL during the entire Rust embedding computation, enabling true multi-threaded parallelism.
- **Rust Core: Vectorization-Friendly Inner Loop** — The SpMM kernel was rewritten with direct slice access instead of ndarray iterators, enabling better auto-vectorization by LLVM.
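The double-buffering idea can be illustrated in NumPy: instead of allocating a fresh result matrix each iteration, two pre-allocated buffers are written in turn (via `out=`) and swapped. This is only a sketch of the pattern — the actual implementation lives in the Rust core:

```python
import numpy as np

def propagate_double_buffered(adj, emb, iterations):
    """Repeated propagation using two reusable buffers, so no new
    matrix is allocated inside the loop."""
    front = emb.copy()
    back = np.empty_like(emb)
    for _ in range(iterations):
        np.matmul(adj, front, out=back)  # write into the spare buffer
        front, back = back, front        # swap pointers: no allocation
    return front
```

The result is identical to the naive `emb = adj @ emb` loop; only the allocation behavior changes.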
Benchmarks
Cleora achieves state-of-the-art results on node classification across 5 real-world datasets from SNAP, Planetoid, and DGL.
| Dataset | Nodes | Cleora | NetMF | DeepWalk | Node2Vec | HOPE | GraRep | ProNE | RandNE |
|---|---|---|---|---|---|---|---|---|---|
| ego-Facebook | 4K | 0.990 | 0.957 | 0.958 | 0.958 | 0.890 | T/O | 0.075 | 0.212 |
| Cora | 2.7K | 0.861 | 0.839 | 0.835 | 0.835 | 0.821 | 0.809 | 0.179 | 0.247 |
| CiteSeer | 3.3K | 0.824 | 0.810 | 0.806 | 0.806 | 0.740 | 0.756 | 0.189 | 0.244 |
| PubMed | 19.7K | 0.879 | OOM | T/O | T/O | T/O | OOM | 0.339 | 0.351 |
| PPI | 3.9K | 1.000 | OOM | T/O | T/O | T/O | OOM | 0.023 | 0.073 |
Cleora wins on accuracy on every single dataset while using 10–24× less memory than accuracy-competitive methods.
Install
`pip install pycleora`

Links
- Website: https://cleora.ai
- Documentation: https://cleora.ai/docs
- API Reference: https://cleora.ai/api
- Benchmarks: https://cleora.ai/benchmarks
- Changelog: https://cleora.ai/changelog
v2.0.0
Cleora is now available as a Python package pycleora. Key improvements compared to the previous version:
- performance optimizations: ~10x faster embedding times
- performance optimizations: significantly reduced memory usage
- latest research: improved embedding quality
- new feature: graphs can be created from a Python iterator in addition to `tsv` files
- new feature: seamless integration with NumPy
- new feature: item attributes support via custom embeddings initialization
- new feature: adjustable vector projection / normalization after each propagation step
Breaking changes:
- the `transient` modifier is no longer supported; creating `complex::reflexive` columns for hypergraph embeddings, grouped by the transient entity, gives better results
v1.2.3
v1.2.2
v1.2.1
v1.2.0
v1.1.1
Cleora v1.1.0
Changed
- Bumped `env_logger` to `0.8.2` and `smallvec` to `1.5.1`; removed the `fnv` hasher (#11).
Added
- Tests (snapshots) for in-memory and memory-mapped files calculations of embeddings (#12).
- Support for NumPy output format (available via the `--output-format` program argument) (#15).
- Jupyter notebooks with experiments (#16).