Releases: BaseModelAI/cleora

Default to interwhiten

02 Apr 15:27
f1cb240

pycleora 3.2.0

31 Mar 12:44

March 2026 — Performance & Ecosystem Release

Graph embeddings. Blazing fast.


New Features

  • Rust-native Full Embed Loop (embed_fast): The entire embedding pipeline (initialization, propagation, normalization, iteration) now runs inside a single Rust call, which eliminates a Python↔Rust boundary crossing on every iteration. 3.7× faster on roadNet (2M nodes): 15.8s → 4.3s. 1.7× faster on Cora (2.7K nodes).

  • Embedding Whitening (whiten_embeddings): Post-processing that mean-centers and decorrelates embedding dimensions via eigendecomposition. Boosts node classification accuracy on Cora from 0.26 to 0.70; combined with multiscale, it reaches 0.83. Available as the whiten=True parameter in embed().

  • Residual Connections — Mix propagated embeddings with previous iteration: emb = (1-α)·propagated + α·prev. Prevents over-smoothing on deep iterations. Parameter: residual_weight in embed().

  • Convergence-Based Early Stopping — Automatically detects when embeddings stabilize (RMSE between iterations drops below threshold). Saves compute on graphs that converge before max_iterations. Parameter: convergence_threshold in embed().

  • Graph Statistics Module (pycleora.stats): graph_summary(), degree_distribution(), clustering_coefficient(), connected_components(), diameter(), betweenness_centrality(), pagerank().

  • Graph Preprocessing Module (pycleora.preprocess): clean_graph(), largest_connected_component(), filter_by_degree().

  • ANN Search Module (pycleora.search): ANNIndex class for approximate nearest neighbor queries. HNSW backend via optional hnswlib, ball-tree fallback without dependencies.

  • Embedding Compression Module (pycleora.compress): pca_compress(), random_projection(), product_quantize(), PQIndex with reconstruct() and search().

  • Embedding Alignment Module (pycleora.align): procrustes(), cca_align(), alignment_score().

  • Ensemble Embeddings Module (pycleora.ensemble): combine() merges multiple embedding matrices via concat, mean, weighted average, or SVD reduction.

  • Extended I/O: from_pandas(), from_scipy_sparse(), from_numpy(), from_edge_list().
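
The loop-related features above (residual connections, convergence-based early stopping, whitening) can be illustrated with a small NumPy sketch. It is not pycleora's implementation and does not call its API. The function names embed_sketch and whiten_sketch, the dense transition matrix, the random initialization, the per-row L2 normalization, and the eps term are all assumptions made for illustration. Only the parameter names residual_weight and convergence_threshold and the mixing formula are taken from the notes above.

```python
import numpy as np


def embed_sketch(transition, dim=8, max_iterations=20,
                 residual_weight=0.2, convergence_threshold=1e-4, seed=0):
    """Toy dense loop: propagate, mix with previous iteration, normalize, check RMSE."""
    rng = np.random.default_rng(seed)
    # Assumption: random uniform init; the real library uses its own init scheme.
    emb = rng.uniform(-1.0, 1.0, size=(transition.shape[0], dim))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    iterations_run = 0
    for iterations_run in range(1, max_iterations + 1):
        propagated = transition @ emb
        # Residual connection: emb = (1 - alpha) * propagated + alpha * prev
        mixed = (1.0 - residual_weight) * propagated + residual_weight * emb
        mixed /= np.linalg.norm(mixed, axis=1, keepdims=True)
        # Early stopping: RMSE between consecutive iterations below the threshold.
        rmse = np.sqrt(np.mean((mixed - emb) ** 2))
        emb = mixed
        if rmse < convergence_threshold:
            break
    return emb, iterations_run


def whiten_sketch(emb, eps=1e-8):
    """Mean-center, then decorrelate dimensions via eigendecomposition."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (emb.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the eigenbasis and rescale each direction to unit variance.
    return centered @ eigvecs / np.sqrt(eigvals + eps)
```

Calling embed_sketch(transition) on a row-normalized adjacency matrix returns the embeddings together with the number of iterations actually run. whiten_sketch() is meaningful only when there are clearly more rows than dimensions, since otherwise the covariance matrix is rank-deficient.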

Improvements

  • Rust Core: Double-Buffered Propagation — Two pre-allocated buffers are swapped instead of allocating a new embedding matrix every iteration. Reduces GC pressure and memory allocator overhead.

  • Rust Core: Faster Initialization Hashing — Replaced SipHash (DefaultHasher) with FxHash in init_value(). FxHash runs at ~0.3 cycles/byte vs SipHash's ~4 cycles/byte — 10× faster initialization for large graphs.

  • Rust Core: GIL Release During Embedding: py.allow_threads() releases Python's GIL during the entire Rust embedding computation, enabling true multi-threaded parallelism.

  • Rust Core: Vectorization-Friendly Inner Loop — SpMM kernel rewritten with direct slice access instead of ndarray iterators, enabling better auto-vectorization by LLVM.
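
The double-buffering idea from the list above can be shown in a few lines of NumPy. This is only an analogy for the Rust core, not its code: propagate_double_buffered is a hypothetical name, and a dense matrix product stands in for the sparse SpMM kernel.

```python
import numpy as np


def propagate_double_buffered(transition, initial, iterations):
    """Repeated propagation using two pre-allocated buffers that swap roles."""
    src = np.array(initial, dtype=np.float64)  # buffer holding the current embeddings
    dst = np.empty_like(src)                   # buffer that receives the next result
    for _ in range(iterations):
        # Write the product straight into dst instead of allocating a new matrix.
        np.matmul(transition, src, out=dst)
        # Swap roles: the freshly written buffer becomes the input of the next step.
        src, dst = dst, src
    return src
```

The two buffers are allocated once, so the memory footprint stays at two embedding matrices regardless of the iteration count.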


Benchmarks

Cleora achieves state-of-the-art results on node classification across 5 real-world datasets from SNAP, Planetoid, and DGL.

Dataset        Nodes   Cleora  NetMF  DeepWalk  Node2Vec  HOPE   GraRep  ProNE  RandNE
ego-Facebook   4K      0.990   0.957  0.958     0.958     0.890  T/O     0.075  0.212
Cora           2.7K    0.861   0.839  0.835     0.835     0.821  0.809   0.179  0.247
CiteSeer       3.3K    0.824   0.810  0.806     0.806     0.740  0.756   0.189  0.244
PubMed         19.7K   0.879   OOM    T/O       T/O       T/O    OOM     0.339  0.351
PPI            3.9K    1.000   OOM    T/O       T/O       T/O    OOM     0.023  0.073

(T/O: timed out; OOM: out of memory)

Cleora wins on accuracy on every single dataset while using 10–24× less memory than accuracy-competitive methods.

Install

pip install pycleora

v2.0.0

24 Nov 21:52
3cc300d

Cleora is now available as a Python package pycleora. Key improvements compared to the previous version:

  • performance optimizations: ~10x faster embedding times
  • performance optimizations: significantly reduced memory usage
  • latest research: improved embedding quality
  • new feature: can create graphs from a Python iterator in addition to tsv files
  • new feature: seamless integration with NumPy
  • new feature: item attributes support via custom embeddings initialization
  • new feature: adjustable vector projection / normalization after each propagation step
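
As a rough picture of what the iterator input, custom initialization, and per-step normalization features amount to, here is a NumPy-only stand-in. It deliberately does not call pycleora: build_transition, l2_normalize, and propagate are hypothetical helpers, the dense matrix replaces the library's sparse structure, and linking every pair of entities that share an input line is an assumption about the graph construction.

```python
import numpy as np


def build_transition(lines):
    """Build a row-normalized co-occurrence matrix from an iterator of lines.

    Each line is a whitespace-separated group of entity ids; entities that
    share a line are linked to each other.
    """
    index = {}
    pairs = []
    for line in lines:
        ids = [index.setdefault(tok, len(index)) for tok in line.split()]
        pairs.extend((a, b) for a in ids for b in ids if a != b)
    adjacency = np.zeros((len(index), len(index)))
    for a, b in pairs:
        adjacency[a, b] += 1.0
    degree = adjacency.sum(axis=1, keepdims=True)
    # Rows without neighbors stay all-zero instead of dividing by zero.
    transition = np.divide(adjacency, degree,
                           out=np.zeros_like(adjacency), where=degree > 0)
    return list(index), transition


def l2_normalize(emb):
    """Scale every row to unit L2 norm (guarding against zero rows)."""
    norms = np.linalg.norm(emb, ord=2, axis=-1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def propagate(transition, emb, steps=3, normalize=l2_normalize):
    """Apply `steps` propagation rounds, normalizing after each one."""
    for _ in range(steps):
        emb = normalize(transition @ emb)
    return emb


# The graph comes from a plain Python iterator rather than a TSV file.
baskets = iter(["alice milk bread", "bob bread eggs", "carol milk eggs"])
entity_ids, transition = build_transition(baskets)
# Custom initialization: any (n_entities, dim) array can be used as the start.
initial = np.random.default_rng(0).uniform(-1.0, 1.0, size=(len(entity_ids), 16))
embeddings = propagate(transition, initial)
```

Passing a different callable as normalize changes the projection applied after each step, and passing an attribute-derived matrix as the initial embeddings is one way item attributes can enter the result.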

Breaking changes:

  • The transient modifier is no longer supported. Creating complex::reflexive columns for hypergraph embeddings, grouped by the transient entity, gives better results.

v1.2.3

29 Jun 14:56
4810af4

Changed

  • Bump libs (#60).

Fixed

  • Check for malformed lines in input (#59).

v1.2.2

24 Jun 13:20
56269cd

Changed

  • Allow cleora to accept multiple input files as positional args. The named argument 'input' is being deprecated (#55).

v1.2.1

13 Apr 10:04
4d85152

Changed

  • Optimize "--output-format numpy" mode, so it doesn't require additional memory when writing output file (#50).
  • Bump libs (#52).

v1.2.0

17 Mar 16:52

Added

  • Use default hasher for vector init (#47).

v1.1.1

14 May 12:58
d78a53b

Added

  • Init embedding with seed during training (#27).
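
A minimal sketch of what seeded, hash-based initialization can look like, assuming each vector component is derived from a hash of the seed, the column index, and the entity id. The helper names are hypothetical, and BLAKE2 from the Python standard library is used only as a stand-in hasher; the mapping Cleora actually applies is not shown in these notes.

```python
import hashlib
import struct


def hashed_init_value(entity, column, seed=0):
    """Map (seed, column, entity) deterministically to a float in roughly [-1, 1]."""
    key = f"{seed}:{column}:{entity}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    (raw,) = struct.unpack("<Q", digest)  # unsigned 64-bit integer
    return raw / 2**63 - 1.0


def hashed_init_vector(entity, dim, seed=0):
    """Initial embedding for one entity; same inputs always give the same vector."""
    return [hashed_init_value(entity, column, seed) for column in range(dim)]
```

Because the values depend only on the inputs, two runs with the same seed start from identical embeddings, while changing the seed yields a different starting point.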

Cleora v1.1.0

23 Dec 18:07
ded180a

Changed

  • Bumped env_logger to 0.8.2, smallvec to 1.5.1, removed fnv hasher (#11).

Added

  • Tests (snapshots) for in-memory and memory-mapped files calculations of embeddings (#12).
  • Support for NumPy output format (available via --output-format program argument) (#15).
  • Jupyter notebooks with experiments (#16).

Improved

  • Used vector for hash_to_id mappings, non-allocating cartesian product, ryu crate for faster write (#13).
  • Sparse Matrix refactor (cleanup, simplification, using iter, speedup). Use Cargo.toml data for clap crate (#17).
  • Unify and simplify embeddings calculation for in-memory and mmap matrices (#18).

Cleora v1.0.1

23 Nov 16:37

Fixed

  • Skip reading invalid UTF-8 line (#8).
  • Fix clippy warnings (#7).

Added

  • JSON support (#3).
  • Snapshot testing (#5).