Skip to content

protomn/ieee754-compression-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

72 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IEEE754 Compression Engine

A low-latency, bit-level compression engine for IEEE-754 double-precision time-series data.

Implements Gorilla-style XOR delta encoding with window reuse and adaptive mantissa precision tracking.

Benchmarked on 100M financial ticks (800MB raw):

  • 3.26–3.36× compression
  • 200–275M values/sec encoding
  • 170–270M values/sec decoding
  • Apple M2, clang -O3 -march=native

Quick Start

# Build everything
make all

# Run full test suite (validated on 100M ticks)
./bin/adaptive_encoder_benchmark && \
./bin/AdaptivePrecisionDecoder-Analysis && \
./bin/fast_encoder_benchmark && \
./bin/adaptive_integrity_test && \
./bin/precision_encoder_test && \
./bin/precision_decoder_test && \
./bin/stress_test data/market_data.csv

Results Overview:

  • All 24 tests pass (12 encoder + 12 decoder)
  • 100M tick stress test: 3.26-3.36:1 compression ratio
  • Throughput: 200-275 M values/sec encoding, 170-270 M values/sec decoding
  • Space savings: 69-70% on financial data

Overview

This library implements multiple variants of delta-XOR compression algorithms inspired by Facebook's Gorilla time-series database. It achieves 69-70% compression ratios on real financial market data (100M ticks) with throughput ranging from 200-275 M values/second for encoding and 170-270 M values/second for decoding on modern hardware.

Key Features:

  • Lossless compression verified to 1e-9 tolerance (100% pass rate on 10M+ test values)
  • Multiple algorithm variants optimized for different use cases
  • Zero external dependencies (C++20 STL only; header-only fast_float.h for testing)
  • Cache-optimized memory layout (64-byte alignment)
  • Hardware-accelerated bit manipulation (CLZ/CTZ instructions)
  • Comprehensive test suite with real-world stress testing (100M ticks validated)
  • Validated on 100M synthetic financial tick dataset with 65.2% identical values, 11.4% small changes

Architecture

Core Components

1. IEEE754 Utilities (ieee754.hpp)

Low-level IEEE754 bit manipulation using modern C++20 features:

  • std::bit_cast for type-punning without undefined behavior
  • Hardware-accelerated bit counting (std::countl_zero, std::countr_zero)
  • Bit field extraction for sign (1-bit), exponent (11-bit), mantissa (52-bit)
  • All functions are inline constexpr noexcept for zero runtime overhead

2. Bitstream I/O (bitstream.hpp)

Custom bit-level I/O implementation with performance optimizations:

linBuffer:

  • 64-byte cache-aligned linear buffer (L1 cache optimization)
  • Move-only semantics (deleted copy constructors)
  • No zero-initialization on hot paths
  • std::aligned_alloc for cache line alignment

bitWriter:

  • Variable-length bit packing into 64-bit words
  • Scratch buffer technique for efficient flushing
  • Branch prediction hints (__builtin_expect)
  • Little-endian bit ordering

bitReader:

  • Stateful bit-level reading from compressed buffers
  • Efficient cross-word boundary handling
  • Position tracking with byte offset and bit index

3. Compression Algorithms

Base Encoder/Decoder (encoder.hpp, decoder.hpp)

Algorithm: Gorilla-style XOR delta encoding

Format:
  First value:     [64 bits: raw IEEE754 double]
  Unchanged:       [1 bit:  '0']
  Changed:         [1 bit:  '1']
                   [5 bits: leading zeros count (0-31)]
                   [6 bits: meaningful length (1-63)]
                   [N bits: meaningful bits]

Encoding Process:

  1. Store first value as raw 64 bits
  2. For each subsequent value:
    • Compute delta = current XOR previous
    • If delta == 0: write single control bit 0
    • If delta != 0:
      • Count leading zeros (lzs) and trailing zeros (tzs)
      • Calculate meaningful length: len = 64 - lzs - tzs
      • Write control bit 1, lzs (5 bits), len (6 bits), meaningful bits

Compression Characteristics:

  • Identical consecutive values: 1 bit per value (>98% compression)
  • Small changes: 10-30 bits per value (60-80% compression)
  • Random changes: 40-76 bits per value (10-40% compression)
Adaptive Encoder/Decoder

Enhancement: Window reuse optimization

When consecutive deltas fit within the same leading/trailing zero window, reuses the previous encoding parameters. This saves 11 bits of header overhead per value when precision is stable.

Format (when reusing):
  [1 bit: changed '1']
  [1 bit: reuse '0']
  [N bits: meaningful bits from previous window]

Benefit: 10-15% additional compression on stable time-series data.

Fast Encoder/Decoder

Optimizations:

  • __attribute__((always_inline)) on hot path functions
  • Direct use of __builtin_clzll and __builtin_ctzll (hardware instructions)
  • Combined header writes (lzs + len packed into single 11-bit operation)
  • Branch prediction hints for common paths
  • Non-virtual fastAppend() method for zero vtable overhead

Use Case: Maximum throughput scenarios where compression ratio can be sacrificed for speed.

Precision Encoder/Decoder (precision_encoder.hpp, precision_decoder.hpp)

Innovation: Mantissa precision tracking

Monitors which mantissa bits actually vary across values and maintains a "significant_bits" state representing the current precision level.

Format:
  Unchanged:         [1 bit: '0']
  Changed + Reuse:   [1 bit: '1'][1 bit: '0'][N bits: data]
  Changed + New:     [1 bit: '1'][1 bit: '1'][5 bits: lzs][6 bits: len][N bits: data]

Reuse Condition:

  • Current leading zeros >= previous leading zeros
  • Current meaningful length <= tracked significant bits

Benefit: Better compression when only lower-order mantissa bits change (common in financial tick data).

Adaptive Precision Encoder/Decoder

Sophistication: Statistical precision tracking with predictive adjustment

Features:

  • Sliding window history (16 samples) for precision/leading-zero patterns
  • Statistical analysis: average precision, volatility (min/max range), stability detection
  • Regime detection: distinguishes stable vs volatile periods
  • Predictive precision adjustment with ±2 bit tolerance
  • Hysteresis prevents oscillation between precision levels

Data Structures:

  • Circular buffer (power-of-2 sized for fast modulo via idx & (SIZE-1))
  • Running statistics computed amortized (every 4th update)
  • Counters for reuse rate, precision changes, stability

Algorithm: Balances header overhead vs bit savings using historical patterns. In stable regimes, aggressively reuses precision; in volatile periods, adapts quickly to new precision levels.

Metrics Tracked:

double getPrecisionReuseRate() const;  // Percentage of precision reuses
int getAvPrecision() const;            // Average precision level
int getPrecisionVol() const;           // Precision volatility (max - min)
Fast Precision Encoder/Decoder

Combines precision tracking with fast-path optimizations (inlining, hardware intrinsics, branch hints).


Performance Characteristics

Optimization Techniques

Compiler-Level:

  • -O3 -march=native -std=c++20 compilation flags
  • Always-inline hot path functions
  • Branch prediction hints (__builtin_expect)
  • Zero-cost abstractions (inline, noexcept)

Hardware-Level:

  • 64-byte cache-aligned memory (prevents false sharing)
  • Hardware bit manipulation instructions (CLZ: Count Leading Zeros, CTZ: Count Trailing Zeros)
  • Sequential access patterns for cache locality

Algorithmic:

  • Amortized statistics computation (every 4th update in adaptive encoders)
  • Scratch buffer reduces memory write operations
  • Power-of-2 buffer sizes enable fast modulo via bitwise AND
  • Minimal branching in critical paths

Memory:

  • Pre-allocated buffers (no reallocation overhead)
  • No zero-initialization of buffers on hot paths
  • Move semantics (deleted copy constructors prevent expensive copies)
  • RAII principles ensure exception-safe resource management

Benchmark Results (100M Real Financial Tick Data)

Test Dataset Characteristics:

  • Size: 100,000,000 tick values (800 MB raw)
  • Price Range: $38.85 - $105.88 (mean: $79.14, σ: $16.84)
  • Data Distribution:
    • Identical consecutive: 65.2% (65,235,251 ticks)
    • Small changes (≤$0.01): 11.4% (11,354,912 ticks)
    • Medium changes (≤$0.05): 23.4% (23,409,836 ticks)
    • Maximum change: $0.04

Encoding Performance (M values/sec):

Encoder Variant Throughput Compressed Size Ratio Space Saved
Basic Encoder 228.68 M/s 238.25 MB 3.36:1 70.2%
Adaptive Encoder 214.06 M/s 245.56 MB 3.26:1 69.3%
Fast Encoder 268.12 M/s 238.25 MB 3.36:1 70.2%
Precision Encoder 216.27 M/s 245.65 MB 3.26:1 69.3%
Adaptive Precision 201.77 M/s 245.65 MB 3.26:1 69.3%
Fast Precision 274.36 M/s 245.65 MB 3.26:1 69.3%

Decoding Performance (M values/sec):

Decoder Variant Throughput
Basic Decoder 171.85 M/s
Adaptive Decoder 194.44 M/s
Fast Decoder 255.60 M/s
Precision Decoder 196.81 M/s
Adaptive Precision 183.15 M/s
Fast Precision 271.15 M/s

Key Observations:

  • Fast variants achieve 25-35% higher throughput than base implementations
  • Compression ratio: 3.26-3.36:1 (~69-70% space savings)
  • All decoders verified lossless (tolerance: 1e-9)
  • Fast Precision Encoder achieves best overall performance: 274 M/s encoding, 271 M/s decoding

Build System

Requirements

  • Compiler: C++20 compliant compiler (tested with Apple Clang 17.0.0, GCC 11+)
  • Platform: Linux, macOS (ARM64 and x86-64), Windows (with MinGW)
  • Dependencies: None (header-only external library fast_float.h included for CSV parsing in tests)

Compilation

# Build all tests and benchmarks
make

# Clean build artifacts
make clean

Build Configuration:

CXX      := g++
CXXFLAGS := -std=c++20 -O3 -march=native -Wall -Wextra -Wpedantic

Flags Explained:

  • -std=c++20: Modern C++ features (std::bit_cast, std::countl_zero)
  • -O3: Maximum optimization
  • -march=native: CPU-specific optimizations (SIMD, hardware bit instructions)
  • -Wall -Wextra -Wpedantic: Strict warning levels

Testing Infrastructure

Validated with:

  • 24 total encoder/decoder correctness tests
  • 10M–100M value stress tests
  • Edge case coverage (identical values, drift, alternating patterns, boundary conditions)
  • Full lossless verification (1e-9 tolerance)

Stress Test

100M financial tick dataset (800MB raw) - all encoder/decoder pairs verified lossless.

Implementation: tests/stress_test.cpp (431 lines)

Actual Test Output (100M ticks):

Loading data from: data/market_data.csv
Loaded 100000000 data points.

Data Characteristics:

Price Stats:
Total ticks: 100000000
Price range: $38.85 - $105.88
Mean price: $79.14
Std deviation: $16.84

Change Patterns:
Identical consecutive: 65235251 (65.2%)
Small changes (<=$0.01): 11354912 (11.4%)
Medium changes (<=$0.05): 23409836 (23.4%)
Maximum change: $0.0400

Testing Correctness:
Testing Basic Encoder/Decoder correctness.
PASS (lossless).

Testing Adaptive Encoder/Adaptive Decoder correctness.
PASS (lossless).

Testing Precision Encoder/Precision Decoder correctness.
PASS (lossless).

Testing Adaptive Precision Encoder/Adaptive Precision Decoder correctness.
PASS (lossless).

Correctness test passed.
Lossless compression/decompression verified.

Encoder Performance Benchmarking
Encoder                         Throughput      Compressed             Ratio         Saved
Basic Encoder                   228.68 M/s      238254128 bytes        3.36:1        70.2%
Adaptive Encoder                214.06 M/s      245558784 bytes        3.26:1        69.3%
Fast Encoder                    268.12 M/s      238254128 bytes        3.36:1        70.2%
Adaptive Encoder                216.27 M/s      245652384 bytes        3.26:1        69.3%
Adaptive Precision Encoder      201.77 M/s      245652384 bytes        3.26:1        69.3%
Fast Precision Encoder          274.36 M/s      245652384 bytes        3.26:1        69.3%

Decoder Performance Benchmarking
Decoder                        Throughput
Basic Decoder                  171.85 M/s
Adaptive Decoder               194.44 M/s
Fast Decoder                   255.60 M/s
Precision Decoder              196.81 M/s
Adaptive Precision Decoder     183.15 M/s
Fast Precision Decoder         271.15 M/s

Summary:
All tests complete!

Usage Examples

Basic Encoding/Decoding

#include "encoder.hpp"
#include "decoder.hpp"
#include <vector>

int main() {
    std::vector<double> data = {100.0, 100.01, 100.02, 100.01, 100.0};

    // Encoding
    comp::Encoder encoder(data.size() * 128);  // Buffer sized for worst case
    for (const auto& val : data) {
        encoder.append(val);
    }
    encoder.fin();  // Flush remaining bits

    // Access compressed data
    size_t compressed_size = encoder.sizeInBytes();
    const comp::linBuffer& buffer = encoder.get_buffer();

    // Decoding
    comp::Decoder decoder(buffer.front(), buffer.size());
    for (size_t i = 0; i < data.size(); ++i) {
        double decoded;
        if (decoder.next(decoded)) {
            // Use decoded value
            assert(std::abs(decoded - data[i]) < 1e-9);  // Lossless
        }
    }

    return 0;
}

Fast Encoder (Maximum Throughput)

#include "encoder.hpp"
#include <vector>

int main() {
    std::vector<double> prices;
    // ... populate prices from market feed ...

    comp::FastEncoder encoder(prices.size() * 128);

    for (const auto& price : prices) {
        encoder.fastAppend(price);  // Non-virtual, always-inlined
    }

    encoder.fin();

    // Compressed data ready for transmission/storage
    size_t bytes = encoder.sizeInBytes();

    return 0;
}

Adaptive Precision Encoder (Best Compression)

#include "precision_encoder.hpp"
#include "precision_decoder.hpp"
#include <iostream>

int main() {
    std::vector<double> tick_data;
    // ... load financial tick data ...

    comp::AdaptivePrecisionEncoder encoder(tick_data.size() * 128);

    for (const auto& tick : tick_data) {
        encoder.append(tick);
    }

    encoder.fin();

    // Access statistics
    std::cout << "Average precision: " << encoder.getAvPrecision() << " bits\n";
    std::cout << "Precision volatility: " << encoder.getPrecisionVol() << " bits\n";
    std::cout << "Precision reuse rate: "
              << (encoder.getPrecisionReuseRate() * 100.0) << "%\n";

    // Compression metrics
    size_t raw_size = tick_data.size() * sizeof(double);
    size_t compressed_size = encoder.sizeInBytes();
    double compression_ratio = (double)raw_size / compressed_size;

    std::cout << "Compression ratio: " << compression_ratio << ":1\n";
    std::cout << "Space saved: "
              << (100.0 * (1.0 - 1.0 / compression_ratio)) << "%\n";

    // Decode
    comp::AdaptivePrecisionDecoder decoder(
        encoder.get_buffer().front(),
        encoder.get_buffer().size()
    );

    std::vector<double> decoded;
    double val;
    while (decoder.next(val)) {
        decoded.push_back(val);
    }

    return 0;
}

Data Generation

Includes Python script for:

  • Yahoo Finance data fetch
  • Realistic microstructure simulation (random walk + discrete tick noise)

Used to generate 100M tick dataset for benchmarking.


Benchmarking

Available Benchmarks (benchmarks/)

  1. initial-benchmark.cpp - Basic encoder comparison
  2. adaptive_encoder_benchmark.cpp - Adaptive vs base encoder
  3. fast_encoder_benchmark.cpp - Fast variant throughput
  4. adaptive_decoder_benchmark.cpp - Decoding performance
  5. fast_decoder_benchmark.cpp - Fast decoding throughput
  6. AdaptivePrecisionDecoder-Analysis.cpp - Detailed analysis with CSV export

Running Benchmarks

# Compile all benchmarks
make clean && make

# Run individual benchmarks
./bin/adaptive_encoder_benchmark
./bin/fast_encoder_benchmark

# Run comprehensive analysis
./bin/AdaptivePrecisionDecoder-Analysis

# Run full test suite (as validated)
./bin/adaptive_encoder_benchmark && \
./bin/AdaptivePrecisionDecoder-Analysis && \
./bin/fast_encoder_benchmark && \
./bin/adaptive_integrity_test && \
./bin/precision_encoder_test && \
./bin/precision_decoder_test && \
./bin/stress_test data/market_data.csv

Technical Details

Memory Layout

Buffer Alignment:

// 64-byte alignment matches typical L1 cache line size
void* buffer = std::aligned_alloc(64, capacity);

Benefits:

  • Prevents false sharing in multi-threaded scenarios
  • Maximizes cache line utilization
  • Reduces cache misses

Scratch Buffer Pattern:

class bitWriter {
    uint64_t scratch;    // Accumulates bits until full
    int scratch_bits;    // Tracks current fill level

    void write_bits(uint64_t val, int nbits) {
        scratch |= (val << scratch_bits);
        scratch_bits += nbits;

        if (scratch_bits >= 64) {
            flush();  // Write to main buffer only when necessary
        }
    }
};

Rationale: Minimizes memory write operations, improves write bandwidth.

Bit Manipulation

IEEE754 Double Format:

[Sign: 1 bit][Exponent: 11 bits][Mantissa: 52 bits]
 ↑             ↑                   ↑
 63           62-52               51-0

Type-Safe Bit Casting:

// Modern C++20 approach (no undefined behavior)
inline uint64_t to_uint64(double val) noexcept {
    return std::bit_cast<uint64_t>(val);
}

// Avoid this (undefined behavior via pointer aliasing):
// uint64_t u64 = *reinterpret_cast<uint64_t*>(&val);  // UB!

Hardware-Accelerated Counting:

// Compiles to single CPU instruction (e.g., BSR on x86, CLZ on ARM)
int leading_zeros = __builtin_clzll(value);   // Count Leading Zeros
int trailing_zeros = __builtin_ctzll(value);  // Count Trailing Zeros

Compression Format Wire Protocol

Base Encoder Format:

[First value: 64 bits raw IEEE754]
[Control bit: 1 bit]
  ↓ if 0: value unchanged
  ↓ if 1: value changed
    [Leading zeros: 5 bits]  // 0-31
    [Meaningful length: 6 bits]  // 1-63
    [Meaningful bits: N bits]  // N = meaningful length

Precision Encoder Format:

[First value: 64 bits raw IEEE754]
[Control bit: 1 bit]
  ↓ if 0: value unchanged
  ↓ if 1: value changed
    [Precision control: 1 bit]
      ↓ if 0: reuse precision
        [Meaningful bits: N bits]  // N = tracked significant_bits
      ↓ if 1: new precision
        [Leading zeros: 5 bits]
        [Meaningful length: 6 bits]
        [Meaningful bits: N bits]

Decoding Algorithm:

// Reconstruct original value from XOR delta
uint64_t meaningful_bits = read_bits(len);
uint64_t del = meaningful_bits << trailing_zeros;
uint64_t reconstructed = del ^ previous_value;
double decoded = to_double(reconstructed);

Algorithm Analysis

Theoretical Compression Bounds

Best Case: Identical consecutive values

Average Case: Financial tick data with small consecutive changes

Worst Case: Random uncorrelated values

Complexity Analysis

Encoding Time Complexity: O(n) where n = number of values

Decoding Time Complexity: O(n)

Space Complexity: O(n) for buffer storage


References

Academic Papers

  1. Gorilla: A Fast, Scalable, In-Memory Time Series Database (Facebook, 2015)

  2. Lightweight Compression for Time Series Data (Alibaba, 2019)

IEEE754 Specification

  • IEEE Standard 754-2019: IEEE Standard for Floating-Point Arithmetic
  • Wikipedia

Last Updated: February 2026

About

A bit-level compression engine for IEEE-754 double time-series data. Uses XOR-data encoding, leading/trailing zero elision, and adaptive precision bucketing. Designed for cache-efficient encoding with benchmarks for cycle/values and cache misses.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors