Releases · BaseModelAI/cleora

02 Apr 15:27

ponythewhite

v3.2.1

f1cb240

Default to interwhiten (Latest)

Default to interwhiten

31 Mar 12:44

ponythewhite

v3.2.0

9bfb69f

pycleora 3.2.0

March 2026 — Performance & Ecosystem Release

Graph embeddings. Blazing fast.

New Features

Rust-native Full Embed Loop (embed_fast): The entire embedding pipeline (initialization, propagation, normalization, iteration) now runs inside a single Rust call, which eliminates a Python↔Rust boundary crossing on every iteration. 3.7× faster on roadNet (2M nodes): 15.8s → 4.3s. 1.7× faster on Cora (2.7K nodes).
Embedding Whitening (whiten_embeddings): Post-processing that mean-centers and decorrelates embedding dimensions via eigendecomposition. Raises node classification accuracy on Cora from 0.26 to 0.70; combined with multiscale, it reaches 0.83. Available as the whiten=True parameter of embed().
Residual Connections: Mix propagated embeddings with the previous iteration: emb = (1-α)·propagated + α·prev, where α is the residual_weight parameter of embed(). Prevents over-smoothing on deep iterations.
Convergence-Based Early Stopping: Automatically detects when embeddings stabilize (the RMSE between consecutive iterations drops below a threshold). Saves compute on graphs that converge before max_iterations. Parameter: convergence_threshold in embed().
Graph Statistics Module (pycleora.stats): graph_summary(), degree_distribution(), clustering_coefficient(), connected_components(), diameter(), betweenness_centrality(), pagerank().
Graph Preprocessing Module (pycleora.preprocess): clean_graph(), largest_connected_component(), filter_by_degree().
ANN Search Module (pycleora.search): ANNIndex class for approximate nearest neighbor queries. Uses an HNSW backend when the optional hnswlib package is installed, and falls back to a dependency-free ball tree otherwise.
Embedding Compression Module (pycleora.compress): pca_compress(), random_projection(), and product_quantize(), which returns a PQIndex with reconstruct() and search().
Embedding Alignment Module (pycleora.align): procrustes(), cca_align(), alignment_score().
Ensemble Embeddings Module (pycleora.ensemble): combine() merges multiple embedding matrices via concatenation, mean, weighted average, or SVD reduction.
Extended I/O: from_pandas(), from_scipy_sparse(), from_numpy(), from_edge_list().
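
The residual mixing, RMSE-based early stopping, and whitening described above can be illustrated with a small NumPy sketch. This is not pycleora's code and does not call pycleora: the functions toy_embed and whiten are hypothetical names, the parameter names are borrowed from the notes for readability only, and the whitening shown is plain PCA whitening, which is an assumption about the variant used.

```python
import numpy as np

def whiten(emb, eps=1e-8):
    """PCA whitening: mean-center, then decorrelate via eigendecomposition."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(emb) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance is symmetric
    # Rotate onto the eigenvectors, then rescale each direction to unit variance.
    scale = np.sqrt(np.maximum(eigvals, 0.0) + eps)
    return centered @ eigvecs / scale

def toy_embed(adj, dim=8, max_iterations=20, residual_weight=0.2,
              convergence_threshold=1e-4, seed=0):
    """Illustrative dense loop only (hypothetical helper, not the Rust core)."""
    rng = np.random.default_rng(seed)
    # Row-normalize the adjacency matrix so each step averages over neighbors.
    degree = adj.sum(axis=1, keepdims=True)
    transition = adj / np.maximum(degree, 1.0)
    emb = rng.uniform(-1.0, 1.0, size=(adj.shape[0], dim))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    iterations_run = 0
    for _ in range(max_iterations):
        propagated = transition @ emb
        # Residual connection: emb = (1 - alpha) * propagated + alpha * prev
        mixed = (1.0 - residual_weight) * propagated + residual_weight * emb
        norms = np.linalg.norm(mixed, axis=1, keepdims=True)
        mixed /= np.maximum(norms, 1e-12)  # L2-normalize each row
        # Early stopping: RMSE between consecutive iterations.
        rmse = np.sqrt(np.mean((mixed - emb) ** 2))
        emb = mixed
        iterations_run += 1
        if rmse < convergence_threshold:
            break
    return emb, iterations_run

# Small demo on a complete graph with 4 nodes.
adj = np.ones((4, 4)) - np.eye(4)
emb, n_iter = toy_embed(adj, max_iterations=50)
print(emb.shape, n_iter)
```

On a graph this small the rows quickly drift toward a common direction, so the loop stops well before max_iterations. That collapse is the over-smoothing effect the residual term is meant to slow down.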

Improvements

Rust Core: Double-Buffered Propagation. Two pre-allocated buffers are swapped instead of allocating a new embedding matrix every iteration. Reduces memory allocator overhead.
Rust Core: Faster Initialization Hashing. Replaced SipHash (DefaultHasher) with FxHash in init_value(). FxHash runs at ~0.3 cycles/byte versus ~4 cycles/byte for SipHash, which makes hashing during initialization roughly 10× faster on large graphs.
Rust Core: GIL Release During Embedding. py.allow_threads() releases Python's GIL for the entire Rust embedding computation, so other Python threads can run while the Rust code works in parallel.
Rust Core: Vectorization-Friendly Inner Loop. SpMM kernel rewritten with direct slice access instead of ndarray iterators, enabling better auto-vectorization by LLVM.
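
The double-buffering idea can be sketched in NumPy as follows. This is an illustration under assumptions, not the Rust implementation: propagate_double_buffered is a hypothetical name, and it uses a dense transition matrix where the Rust core works on a sparse one.

```python
import numpy as np

def propagate_double_buffered(transition, emb, iterations):
    """Apply `transition @ emb` repeatedly using two preallocated buffers."""
    current = np.array(emb, dtype=np.float64)  # buffer A: a copy of the input
    scratch = np.empty_like(current)           # buffer B: allocated once
    for _ in range(iterations):
        # Write the product into the spare buffer instead of a new array.
        np.matmul(transition, current, out=scratch)
        # Swap the two references; no data is copied.
        current, scratch = scratch, current
    return current

# Small demo with a row-stochastic transition matrix.
rng = np.random.default_rng(0)
transition = rng.random((6, 6))
transition /= transition.sum(axis=1, keepdims=True)
start = rng.random((6, 3))
result = propagate_double_buffered(transition, start, iterations=5)
print(result.shape)
```

The loop allocates nothing after the two buffers are created, which is the property the release note describes. The input array is left unmodified because the first buffer is a copy.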

Benchmarks

Cleora achieves state-of-the-art results on node classification across 5 real-world datasets from SNAP, Planetoid, and DGL.

Dataset	Nodes	Cleora	NetMF	DeepWalk	Node2Vec	HOPE	GraRep	ProNE	RandNE
ego-Facebook	4K	0.990	0.957	0.958	0.958	0.890	T/O	0.075	0.212
Cora	2.7K	0.861	0.839	0.835	0.835	0.821	0.809	0.179	0.247
CiteSeer	3.3K	0.824	0.810	0.806	0.806	0.740	0.756	0.189	0.244
PubMed	19.7K	0.879	OOM	T/O	T/O	T/O	OOM	0.339	0.351
PPI	3.9K	1.000	OOM	T/O	T/O	T/O	OOM	0.023	0.073

Cleora has the highest accuracy on all five datasets while using 10–24× less memory than accuracy-competitive methods. In the table, T/O marks a run that timed out and OOM marks a run that ran out of memory.

Install

pip install pycleora

v1.2.3

Changed

Bump libs (#60).

Fixed

Check for malformed lines in input (#59).

24 Jun 13:20

github-actions

v1.2.2

56269cd

v1.2.2

Changed

Allow cleora to accept multiple input files as positional arguments. The named argument 'input' is being deprecated (#55).

13 Apr 10:04

github-actions

v1.2.1

4d85152

v1.2.1

Changed

Optimize "--output-format numpy" mode so that it doesn't require additional memory when writing the output file (#50).
Bump libs (#52).

17 Mar 16:52

github-actions

v1.2.0

a9cea69

v1.2.0

Added

Use the default hasher for vector init (#47).

14 May 12:58

github-actions

v1.1.1

d78a53b

v1.1.1

Added

Init embedding with seed during training (#27).

23 Dec 18:07

github-actions

v1.1.0

ded180a

Cleora v1.1.0

Changed

Bumped env_logger to 0.8.2, smallvec to 1.5.1, removed fnv hasher (#11).

Added

Snapshot tests for embedding calculations on in-memory and memory-mapped files (#12).
Support for NumPy output format (available via --output-format program argument) (#15).
Jupyter notebooks with experiments (#16).

Improved

Use a vector for hash_to_id mappings, a non-allocating cartesian product, and the ryu crate for faster writes (#13).
Refactor the sparse matrix (cleanup, simplification, iterators, speedup) and use Cargo.toml data for the clap crate (#17).
Unify and simplify embeddings calculation for in-memory and mmap matrices (#18).

23 Nov 16:37

github-actions

v1.0.1

8216dc8

Cleora v1.0.1

Fixed

Skip lines that are not valid UTF-8 when reading input (#8).
Fix clippy warnings (#7).

Added

JSON support (#3).
Snapshot testing (#5).
