Closed
56 commits
15805c2
Add MSVC support for Windows port
peterboncz Mar 28, 2026
9eb2e6c
had forgotten to add the split interpreter files (were split to make …
peterboncz Mar 28, 2026
7be131b
fixes to get CI back working
peterboncz Mar 28, 2026
cd428a2
hmm.. this CI seems a bit rusty -- some more bumps and fixes
peterboncz Mar 28, 2026
9bffa5e
more minor fixes to pacify CI (probably not the last ones)
peterboncz Mar 28, 2026
9ba018e
remove unused includes
peterboncz Mar 28, 2026
436d9c6
new test matrix
peterboncz Mar 28, 2026
11884a3
attempt #1 at fixing mvsc++ build
peterboncz Mar 28, 2026
f0dba9c
attempt #2 at fixing mvsc++ build
peterboncz Mar 28, 2026
252b13b
attempt #3 at fixing mvsc++ build
peterboncz Mar 28, 2026
18dbd88
attempt #4 at fixing mvsc++ build
peterboncz Mar 28, 2026
46ffe2a
attempt #5 at fixing mvsc++ build
peterboncz Mar 28, 2026
d03ab56
attempt #6 at fixing mvsc++ build
peterboncz Mar 28, 2026
bad405b
attempt #7 at fixing mvsc++ build
peterboncz Mar 28, 2026
2fde9b3
attempt #8 at fixing mvsc++ build
peterboncz Mar 28, 2026
f710ad1
attempt #9 at fixing mvsc++ build
peterboncz Mar 28, 2026
04a63b2
attempt #10 at fixing mvsc++ build
peterboncz Mar 28, 2026
5f16c17
attempt #11 at fixing mvsc++ build
peterboncz Mar 28, 2026
2ffa803
attempt #12 at fixing mvsc++ build
peterboncz Mar 28, 2026
0495d68
attempt #13 at fixing mvsc++ build
peterboncz Mar 28, 2026
58dbb04
attempt #14 at fixing mvsc++ build
peterboncz Mar 28, 2026
53d08be
attempt #15 at fixing mvsc++ build
peterboncz Mar 28, 2026
bbc588b
attempt #16 at fixing mvsc++ build
peterboncz Mar 28, 2026
0f4163a
attempt #17 at fixing mvsc++ build
peterboncz Mar 28, 2026
16ea23b
back to attempt #8
peterboncz Mar 28, 2026
079a80a
go back at attempt #11
peterboncz Mar 28, 2026
4124b31
try to evolve #11 so that mvsc works..
peterboncz Mar 28, 2026
2b99c2f
try to evolve #11 so that mvsc works.. take#2
peterboncz Mar 28, 2026
76f7e5e
try to evolve #11 so that mvsc works.. take#3
peterboncz Mar 29, 2026
53d2538
try to evolve #11 so that mvsc works.. take#4
peterboncz Mar 29, 2026
21b09e5
try to evolve #11 so that mvsc works.. take#5
peterboncz Mar 29, 2026
196b8bb
try to evolve #11 so that mvsc works.. take#6
peterboncz Mar 29, 2026
84a13fa
try to evolve #11 so that mvsc works.. take#7
peterboncz Mar 29, 2026
36376a1
try to evolve #11 so that mvsc works.. take#8
peterboncz Mar 29, 2026
4657ca2
try to evolve #11 so that mvsc works.. take#10
peterboncz Mar 29, 2026
13ee6fd
try to evolve #11 so that mvsc works.. take#11
peterboncz Mar 30, 2026
44eaf9e
use std::range to allow also some older clang's to be used
peterboncz Mar 30, 2026
9cc1ea4
attempt at fixing SINGLE_COLUMN_JPEG test
peterboncz Mar 30, 2026
e6c6bf9
Fix three MSVC portability bugs caused by 32-bit unsigned long on Win…
peterboncz Mar 31, 2026
ec98391
make format
peterboncz Mar 31, 2026
8c4e2ab
fix casting
peterboncz Mar 31, 2026
4e9387b
small fix
peterboncz Mar 31, 2026
9d295c3
add missing includes
peterboncz Mar 31, 2026
d4bacc0
make sure the null map is always allocated
peterboncz Mar 31, 2026
f9669c4
Fix three MSVC test failures: GALP null deref, fill_in UB, and GTest …
peterboncz Apr 2, 2026
315e782
Merge branch 'windows-port' of github.com:cwida/FastLanes into window…
peterboncz Apr 2, 2026
8769826
format-fix
peterboncz Apr 2, 2026
4349bc5
Merge branch 'windows-port' of github.com:cwida/FastLanes into window…
peterboncz Apr 2, 2026
5a9894f
now that victory over the bugs is achieved move to get DLLs
peterboncz Apr 2, 2026
d11f2ef
make format
peterboncz Apr 2, 2026
2655d81
enable shared library (DLL) builds with explicit FLS_API symbol expor…
peterboncz Apr 2, 2026
ea2af7b
fix header
peterboncz Apr 2, 2026
f58eb48
format-fix
peterboncz Apr 2, 2026
428e9fe
remove redundant include
peterboncz Apr 3, 2026
9066389
trying to get to fully green
peterboncz Apr 3, 2026
57c3951
- add gcc as acompiler
peterboncz Apr 3, 2026
7 changes: 5 additions & 2 deletions .github/actions/generate-dataset/action.yml
@@ -21,9 +21,11 @@ runs:
 # 2️⃣ Remove any .venv that might have been created earlier
 # with Python 3.13, so we always recreate it with 3.12.
 # ─────────────────────────────────────────────────────────────
-- name: Remove stale virtual-env
+- name: Recreate virtual-env with the correct Python
 shell: bash
-run: rm -rf "$GITHUB_WORKSPACE/.venv"
+run: |
+rm -rf "$GITHUB_WORKSPACE/.venv"
+python3 -m venv "$GITHUB_WORKSPACE/.venv"

 # ─────────────────────────────────────────────────────────────
 # 3️⃣ Generate the synthetic data
@@ -36,5 +38,6 @@ runs:
 # 4️⃣ Generate the sentence embeddings
 # ─────────────────────────────────────────────────────────────
 - name: Generate embeddings
+if: runner.os != 'Windows'
 shell: bash
 run: make generate-embeddings
6 changes: 3 additions & 3 deletions .github/workflows/benchmark.yaml
@@ -10,9 +10,9 @@ run-name: >-
 on:
 push:
-branches: [ '*' ]
+branches: [ main, dev ]
 pull_request:
-branches: [ '*' ]
+branches: [ main, dev ]

 concurrency:
 group: Benchmarker CI-${{ github.ref }}
@@ -28,7 +28,7 @@ jobs:
 strategy:
 fail-fast: false
 matrix:
-platform: [ ubuntu-latest, macos-latest ]
+platform: [ ubuntu-latest ]

 defaults:
 run:
238 changes: 155 additions & 83 deletions .github/workflows/cpp.yaml

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions .github/workflows/examples.yml
@@ -10,6 +10,7 @@ run-name: >-

 on:
 push:
+branches: [ main, dev ]
 pull_request:
 branches: [ main, dev ]

6 changes: 4 additions & 2 deletions .github/workflows/flatbuffers-ci.yml
@@ -12,8 +12,10 @@ run-name: >-
 # Trigger on every push & PR, on all branches
 # ─────────────────────────────────────────────────────────────
 on:
-push: # no branches filter ⇒ every branch
-pull_request: # no branches filter ⇒ every target branch
+push:
+branches: [ main, dev ]
+pull_request:
+branches: [ main, dev ]
 concurrency:
 group: flatbuffers-${{ github.ref }}
 cancel-in-progress: true
3 changes: 2 additions & 1 deletion .github/workflows/fsst.yaml
@@ -13,8 +13,9 @@ run-name: >-
 # ──────────────────────────────────────────────────────────────────────────────
 on:
 push:
+branches: [ main, dev ]
 pull_request:
-branches: [ "main", "dev" ]
+branches: [ main, dev ]

 # Cancel in-flight runs on the same branch/PR so we do not waste minutes
 concurrency:
1 change: 1 addition & 0 deletions .github/workflows/header-check.yml
@@ -11,6 +11,7 @@ run-name: >-

 on:
 push:
+branches: [ main, dev ]
 pull_request:
 branches: [ main, dev ]

1 change: 1 addition & 0 deletions .github/workflows/python.yml
@@ -13,6 +13,7 @@ run-name: >-
 # ────────────────────────────────────────────────────────
 on:
 push:
+branches: [ main, dev ]
 pull_request:
 branches: [ main, dev ]
 workflow_dispatch:
3 changes: 2 additions & 1 deletion .github/workflows/rust.yml
@@ -14,8 +14,9 @@ run-name: >-

 on:
 push:
+branches: [ main, dev ]
 pull_request:
-branches: [ "main", "dev" ]
+branches: [ main, dev ]

 concurrency:
 group: ${{ github.workflow }}-${{ github.ref }}
5 changes: 5 additions & 0 deletions .gitignore
@@ -126,3 +126,8 @@ rust/target/
 skbuild-*/
 .idea/
 .venv/
+
+# Windows build artifacts
+build_win.bat
+cmake_output.txt
+build_win/
121 changes: 121 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,121 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What is FastLanes

FastLanes is a C++20 columnar compression storage format — "Like Parquet, but with 40% better compression and 40× faster decoding." Zero external dependencies, SIMD-friendly without explicit SIMD instructions. Bindings exist for Python (`python/`), Rust (`rust/`), C (`src/c_api/`), and CUDA (`cuda/`).

## Build Commands

FastLanes uses CMake 3.22+ with Ninja. On Linux/macOS it requires Clang >= 13. On Windows it uses MSVC (set up via `vcvarsall.bat`).

### Configure and build (Release with tests)
```bash
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DFLS_BUILD_TESTING=ON
cmake --build build --parallel
```

### Run all tests
```bash
cd build && ctest -j4 --output-on-failure --timeout 300 -E QuickFuzz
```

### Run a single test by filter
```bash
# Windows path shown; on Linux/macOS the binary has no .exe suffix
build/test/src/dataset_tests/dataset_tests.exe --gtest_filter=FastLanesReaderTester.issue_000
```

### Run a single test target
```bash
cmake --build build --target unit_test && ctest --test-dir build -R unit_test --output-on-failure
```

### Key CMake options
| Option | Default | Purpose |
|--------|---------|---------|
| `FLS_BUILD_TESTING` | OFF | Build tests (fetches GoogleTest v1.15.2) |
| `FLS_BUILD_SHARED_LIBS` | OFF | Build as shared library (DLL) instead of static |
| `FLS_BUILD_BENCHMARKING` | OFF | Build benchmarks |
| `FLS_BUILD_PYTHON` | OFF | Build Python bindings |
| `FLS_BUILD_CUDA` | OFF | Build CUDA reader |
| `FLS_ENABLE_CLANG_TIDY` | OFF | Enable clang-tidy on all targets |

### Windows-specific (MSVC)

Invoke builds via a `.bat` that calls `vcvarsall.bat` first. Example pattern:
```bat
call "C:\Program Files\Microsoft Visual Studio\...\vcvarsall.bat" arm64
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DFLS_BUILD_TESTING=ON
cmake --build build --parallel
```

Test data can be cached across builds by setting the `FASTLANES_DATA_DIR` environment variable to an existing data directory (e.g., `build_release/_deps/data-src`).

### Format code
The project uses `.clang-format` (LLVM base, tabs, 120-column limit). Run clang-format on changed files before committing.

## Architecture

### Public API

The main entry point is `fastlanes::Connection` (in `src/include/fls/connection.hpp`):
```cpp
auto conn = fastlanes::connect();
conn->read_csv("input/"); // ingest CSV
conn->to_fls("output/"); // write FastLanes format
auto reader = conn->read_fls("data.fls"); // read back
auto table = reader->materialize();
```

`TableReader` provides rowgroup-level random access. `RowgroupReader` decompresses individual rowgroups. The reader stack: `TableReader` → `RowgroupReader` → `RowgroupView` → `ColumnView` → `SegmentView`.

### Library structure

All source is under `src/`. Each subdirectory builds an OBJECT library that gets linked into the single `FastLanes` library target. Key components:

- **`cor/`** — Core: architecture detection, CPU features, layout (`Buf`), compression/decompression engines
- **`expression/`** — Expression-based encoding: physical expressions, operators (RLE, FSST, ALP, dict, delta, etc.), interpreter
- **`encoder/`** — High-level encoding pipeline, materializer (decompression)
- **`wizard/`** — Schema discovery: analyzes data and selects optimal encoding per column
- **`reader/`** — File reading: segments, column views, rowgroup views, table reader
- **`table/`** — In-memory table representation: `Rowgroup`, `Table`, `Vector`, typed columns
- **`footer/`** — FlatBuffers-generated metadata descriptors (table, rowgroup, column, segment)
- **`alp/`** — ALP (Adaptive Lossless Floating-Point) compression codec
- **`primitive/`** — Low-level primitives: bitpacking, patching, FSST string compression

### DLL / Shared library support (Windows)

The `FLS_API` macro in `src/include/fls/api/api.hpp` controls symbol visibility:
- `FLS_STATIC` defined → `FLS_API` is empty (static build)
- `FLS_BUILD_DLL` defined → `FLS_API` is `__declspec(dllexport)` (building the DLL)
- Neither defined → `FLS_API` is `__declspec(dllimport)` (consuming the DLL)

When `FLS_BUILD_SHARED_LIBS=ON`, `FLS_BUILD_DLL` is set directory-scoped via `add_compile_definitions` in `src/CMakeLists.txt` so all object libraries under `src/` get it. Test targets (under `test/`) don't get it, so `FLS_API` correctly resolves to `dllimport` for them.
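
A minimal sketch of how such a header could select `FLS_API`. The real `api.hpp` is not part of this diff, so the `_WIN32` guard, the non-Windows `visibility("default")` fallback, and the `FLS_API_MODE` string and `fls_answer` function are assumptions added so the sketch also compiles with GCC or Clang outside Windows:

```cpp
#include <cassert>
#include <cstring>

// Stands in for what src/CMakeLists.txt injects via add_compile_definitions
// when FLS_BUILD_SHARED_LIBS=ON (assumption: defined by hand for this sketch)
#define FLS_BUILD_DLL

#if defined(FLS_STATIC)
#define FLS_API // static build: no decoration needed
#define FLS_API_MODE "static"
#elif defined(_WIN32)
#if defined(FLS_BUILD_DLL)
#define FLS_API __declspec(dllexport) // compiling the DLL itself
#define FLS_API_MODE "dllexport"
#else
#define FLS_API __declspec(dllimport) // test or consumer code using the DLL
#define FLS_API_MODE "dllimport"
#endif
#else
// Assumed non-Windows fallback: keep the symbol visible from a shared object
#define FLS_API __attribute__((visibility("default")))
#define FLS_API_MODE "default-visibility"
#endif

// Hypothetical exported function standing in for any public FastLanes API
FLS_API int fls_answer() {
	return 42;
}
```

A source file compiled without `FLS_BUILD_DLL` and without `FLS_STATIC` (as the test targets are) falls through to the `dllimport` branch on Windows.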

Any public function or class that test code (or external consumers) calls across the DLL boundary must be marked `FLS_API`. For template functions, the explicit instantiations in the .cpp must also carry `FLS_API`.
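
The template rule, sketched with a hypothetical `sum_values` function template: the header part only declares it, and the `.cpp` part defines it and explicitly instantiates the supported types with `FLS_API`, so exactly those instantiations are exported. The empty `FLS_API` fallback is only there so the sketch compiles on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in; in FastLanes, FLS_API comes from fls/api/api.hpp
#ifndef FLS_API
#define FLS_API
#endif

// ---- header part: declaration only, so consumers cannot instantiate it themselves ----
template <typename T>
T sum_values(const T* data, std::size_t n);

// ---- .cpp part: definition plus exported explicit instantiations ----
template <typename T>
T sum_values(const T* data, std::size_t n) {
	T total {};
	for (std::size_t i = 0; i < n; ++i) {
		total += data[i];
	}
	return total;
}

// Without FLS_API these instantiations would exist inside the DLL but would not
// be exported, so callers across the DLL boundary would fail to link.
template FLS_API std::int32_t sum_values<std::int32_t>(const std::int32_t*, std::size_t);
template FLS_API double       sum_values<double>(const double*, std::size_t);
```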

Note: CMake's `WINDOWS_EXPORT_ALL_SYMBOLS` does NOT work for this project: the number of symbols exceeds the 65,535-symbol limit on exports from a single Windows DLL, which the generated `.def` file runs into.

**MSVC dllexport gotchas:** MSVC eagerly instantiates all special member functions for `__declspec(dllexport)` classes. This causes two problems:

1. **Non-copyable members** (e.g., `vector<unique_ptr<T>>`): MSVC tries to generate copy ctor/assign and fails. Fix: explicitly `= delete` copy operations on the class.

2. **Incomplete types in unique_ptr**: MSVC tries to generate the destructor inline, which needs the complete type. Fix: either include the complete type's header, or declare the destructor in the header and define it `= default` in the .cpp where the type is complete.
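
Both fixes applied to one class, as a sketch with hypothetical names (`ColumnSet`, `ColumnImpl`) and a stand-in `FLS_API`. The "header" and ".cpp" halves are shown in one file, split at the point where `ColumnImpl` becomes a complete type:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

#ifndef FLS_API
#define FLS_API // stand-in so the sketch compiles outside a FastLanes build
#endif

// ---- "header" part: ColumnImpl is only forward-declared here ----
struct ColumnImpl; // hypothetical type, incomplete at this point

class FLS_API ColumnSet { // hypothetical exported class
public:
	ColumnSet();
	// Gotcha 2: only declare the destructor (and, to be safe, the move
	// operations) here; they are defaulted below, where ColumnImpl is complete
	~ColumnSet();
	ColumnSet(ColumnSet&&) noexcept;
	ColumnSet& operator=(ColumnSet&&) noexcept;

	// Gotcha 1: vector<unique_ptr<T>> cannot be copied, so delete copying explicitly
	ColumnSet(const ColumnSet&)            = delete;
	ColumnSet& operator=(const ColumnSet&) = delete;

	void        add(int value);
	std::size_t size() const;
	int         sum() const;

private:
	std::vector<std::unique_ptr<ColumnImpl>> m_columns;
};

// ---- ".cpp" part: ColumnImpl is complete from here on ----
struct ColumnImpl {
	int value;
};

ColumnSet::ColumnSet()                                = default;
ColumnSet::~ColumnSet()                               = default;
ColumnSet::ColumnSet(ColumnSet&&) noexcept            = default;
ColumnSet& ColumnSet::operator=(ColumnSet&&) noexcept = default;

void ColumnSet::add(int value) {
	m_columns.push_back(std::make_unique<ColumnImpl>(ColumnImpl {value}));
}

std::size_t ColumnSet::size() const {
	return m_columns.size();
}

int ColumnSet::sum() const {
	int total = 0;
	for (const auto& column : m_columns) {
		total += column->value;
	}
	return total;
}

static_assert(!std::is_copy_constructible_v<ColumnSet>, "copying is deleted (gotcha 1)");
static_assert(std::is_nothrow_move_constructible_v<ColumnSet>, "moving stays available");
```

Defining the destructor `= default` out of line, rather than in the class body, keeps the `unique_ptr<ColumnImpl>` deleter from being instantiated while `ColumnImpl` is still incomplete.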

### Type aliases

Defined in `src/include/fls/common/alias.hpp`:
- `n_t` = `uint64_t` (counts), `idx_t` = `uint32_t` (indices), `bw_t` = `uint8_t` (bit width)
- `up<T>` = `unique_ptr<T>`, `sp<T>` = `shared_ptr<T>`
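
A sketch of what such an alias header might contain; the `fastlanes` namespace is an assumption. Spelling widths out with `<cstdint>` types matters for the Windows port: `unsigned long` is 32 bits under MSVC (LLP64) but 64 bits on Linux and macOS (LP64), the mismatch named in the "32-bit unsigned long" commit above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace fastlanes { // assumed namespace
using n_t   = std::uint64_t; // counts
using idx_t = std::uint32_t; // indices
using bw_t  = std::uint8_t;  // bit width

template <typename T>
using up = std::unique_ptr<T>;
template <typename T>
using sp = std::shared_ptr<T>;
} // namespace fastlanes

// Fixed-width aliases have the same size on LP64 (Linux/macOS) and LLP64 (Windows)
static_assert(sizeof(fastlanes::n_t) == 8, "n_t must be 64-bit everywhere");
static_assert(sizeof(fastlanes::idx_t) == 4, "idx_t must be 32-bit everywhere");
```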

### Test structure

Tests live in `test/src/` with six suites: `dataset_tests`, `expression_tests`, `fls_reader_tests`, `primitive_tests`, `quick_fuzz_tests`, `unit_tests`. All use GoogleTest. On MSVC, a `msvc_heap_guard` object library handles SEH guard-page exceptions that would otherwise cause spurious test failures.

## Code Style

- `.clang-tidy` is strict: `WarningsAsErrors: '*'` — all warnings are errors
- Types: `CamelCase`. Functions: any case (`aNy_CasE`, i.e. not enforced). Members: `lower_case` (private: `m_` prefix). Constants: `UPPER_CASE`. Typedefs: `lower_case` with `_t` suffix
- Tabs for indentation, 120-column limit
- PRs target the `dev` branch
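
The naming rules above, illustrated with hypothetical identifiers (none of them come from the FastLanes sources):

```cpp
#include <cassert>
#include <cstdint>

using bit_width_t = std::uint8_t;         // typedef: lower_case with _t suffix
constexpr bit_width_t MAX_BIT_WIDTH = 64; // constant: UPPER_CASE

struct PackStats {                   // type: CamelCase
	std::uint64_t packed_values = 0; // public member: lower_case
};

class BitPacker { // type: CamelCase
public:
	explicit BitPacker(bit_width_t bw)
	    : m_bw(bw) {
	}

	// functions: any case is accepted; lower_case is used here
	bit_width_t bit_width() const {
		return m_bw;
	}
	bool is_valid() const {
		return m_bw <= MAX_BIT_WIDTH;
	}
	void pack(std::uint64_t n_values) {
		m_stats.packed_values += n_values;
	}
	const PackStats& stats() const {
		return m_stats;
	}

private:
	bit_width_t m_bw; // private member: m_ prefix
	PackStats   m_stats;
};
```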