More flexibility on numerical/string doc id and quantization #39

Merged

amallia merged 24 commits into master from amallia on Jul 27, 2025


@amallia (Member) commented Jul 26, 2025

No description provided.


Copilot AI left a comment


Pull Request Overview

This PR adds more flexibility to the JSONL to CIFF converter by allowing document IDs to be either strings or numbers (numbers are automatically converted to strings) and by introducing an optional quantization feature for scores. With these changes, users can work with numerical document IDs and choose between pre-quantized integer scores and automatic 8-bit scalar quantization.

Key Changes:

  • Added support for both string and numerical document IDs with automatic conversion to strings
  • Introduced optional 8-bit scalar quantization that maps score ranges to integers 1-256
  • Modified the CLI to include a quantize flag for enabling the new quantization behavior
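The doc ID change boils down to normalizing two input shapes into one string. The PR does this with a custom deserializer in src/lib.rs; the sketch below is not that deserializer and does not parse JSON. It assumes a hypothetical `RawDocId` enum standing in for the two shapes an `id` field may take once parsed (a JSON string or a JSON integer), and a hypothetical `normalize_doc_id` helper that shows only the conversion step. Floating-point IDs are deliberately not handled here.

```rust
/// Hypothetical stand-in for the two shapes a JSONL `id` field may take
/// once parsed: a JSON string or a JSON integer.
enum RawDocId {
    Str(String),
    Int(i64),
}

/// Hypothetical helper: normalize either shape to the string doc ID
/// that ends up in the CIFF output.
fn normalize_doc_id(raw: RawDocId) -> String {
    match raw {
        // Strings pass through unchanged.
        RawDocId::Str(s) => s,
        // Integers are rendered in decimal, e.g. 42 -> "42".
        RawDocId::Int(n) => n.to_string(),
    }
}

fn main() {
    let ids = vec![RawDocId::Str("doc-7".to_string()), RawDocId::Int(42)];
    for id in ids {
        println!("{}", normalize_doc_id(id));
    }
}
```

In a serde-based converter, an `#[serde(untagged)]` enum or a `deserialize_with` function is one common way to accept either shape during deserialization; which mechanism the PR's custom deserializer uses is not stated here.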

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Changed files:

  • src/lib.rs: Implements a custom deserializer for flexible doc IDs and adds quantization logic with two-pass processing
  • src/jsonl2ciff.rs: Adds the quantize CLI flag and updates the converter configuration
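The two-pass quantization can be sketched under one assumption: linear min-max scaling. The first pass finds the global minimum and maximum score; the second maps each score onto the integers 1 through 256. The function name `quantize_scores`, the round-to-nearest rule, and the handling of a constant score range are illustrative assumptions, not details taken from the PR.

```rust
/// Hypothetical sketch of two-pass min-max scalar quantization
/// (not the PR's code). Scores are mapped linearly onto 1..=levels.
fn quantize_scores(scores: &[f32], levels: u32) -> Vec<u32> {
    // Pass 1: find the global minimum and maximum score.
    let (min, max) = scores
        .iter()
        .fold((f32::INFINITY, f32::NEG_INFINITY), |(lo, hi), &s| {
            (lo.min(s), hi.max(s))
        });
    let range = max - min;

    // Assumption: if all scores are identical (or the input is empty),
    // map everything to the lowest bucket.
    if !(range > 0.0) {
        return vec![1; scores.len()];
    }

    // Pass 2: normalize each score to [0, 1], then scale and round
    // onto 1..=levels. u32 is used because 256 does not fit in a u8.
    scores
        .iter()
        .map(|&s| {
            let norm = (s - min) / range;
            1 + (norm * (levels - 1) as f32).round() as u32
        })
        .collect()
}

fn main() {
    let scores = [0.0_f32, 0.5, 1.0, 2.0];
    println!("{:?}", quantize_scores(&scores, 256));
}
```

With `levels = 256`, the lowest score maps to 1 and the highest to 256, so `main` prints `[1, 65, 129, 256]`. Two passes are needed because the range is unknown until every score has been seen.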

@elshize self-requested a review July 27, 2025 19:10
@elshize (Member) left a comment


Looks good! 🔥

@amallia merged commit 0689e2a into master Jul 27, 2025
11 checks passed
@amallia deleted the amallia branch July 27, 2025 19:10
@amallia (Member, Author) commented Jul 27, 2025

Thanks, Michal, for the review!


3 participants