@ryankert01 ryankert01 commented Dec 23, 2025

Refactor QDP to Support Multiple Input Types

Problem

QDP needs to support multiple input types. It currently supports Parquet and Arrow IPC, and we want to add more, such as NumPy and PyTorch. The solution needed to:

  1. Make it relatively easy to add more input types
  2. Not sacrifice speed or memory

This PR introduces a flexible, trait-based system that meets both goals:

Core Architecture

  • DataReader trait: Basic batch reading interface
    • read_batch()
    • get_sample_size()
    • get_num_samples()
  • StreamingDataReader trait: Advanced streaming for large files
    • read_chunk()
    • total_rows()
  • Format implementations: Parquet (batch + streaming), Arrow IPC (batch), NumPy (batch)
  • Placeholders: PyTorch (with implementation guide)
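
To make the two-trait split concrete, here is a minimal sketch of how such traits could look. Only the trait and method names come from the list above; every signature, the `ReaderError` type, and the `InMemoryReader` toy implementation are assumptions for illustration, not the code in this PR.

```rust
/// Hypothetical error type for this sketch; the PR's real error type is not shown.
#[derive(Debug)]
pub struct ReaderError(pub String);

/// Batch interface. Method names follow the PR description; signatures are assumed.
pub trait DataReader {
    /// Reads everything at once: (flat data, number of samples, values per sample).
    fn read_batch(&mut self) -> Result<(Vec<f64>, usize, usize), ReaderError>;
    /// Values per sample, if known before reading.
    fn get_sample_size(&self) -> Option<usize>;
    /// Number of samples, if known before reading.
    fn get_num_samples(&self) -> Option<usize>;
}

/// Streaming interface for inputs too large to load in one batch (signatures assumed).
pub trait StreamingDataReader {
    /// Copies up to `buffer.len()` values into `buffer`; returns how many were
    /// written, with 0 meaning the input is exhausted.
    fn read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize, ReaderError>;
    /// Total number of rows (samples) in the input.
    fn total_rows(&self) -> usize;
}

/// Toy in-memory "format" standing in for Parquet, Arrow IPC, or NumPy.
pub struct InMemoryReader {
    data: Vec<f64>,
    sample_size: usize,
    cursor: usize,
}

impl InMemoryReader {
    pub fn new(data: Vec<f64>, sample_size: usize) -> Result<Self, ReaderError> {
        if sample_size == 0 || data.len() % sample_size != 0 {
            return Err(ReaderError(
                "data length must be a multiple of a non-zero sample size".to_string(),
            ));
        }
        Ok(Self { data, sample_size, cursor: 0 })
    }
}

impl DataReader for InMemoryReader {
    fn read_batch(&mut self) -> Result<(Vec<f64>, usize, usize), ReaderError> {
        let num_samples = self.data.len() / self.sample_size;
        // Cloning keeps the toy reader reusable; a real reader would hand over its buffer.
        Ok((self.data.clone(), num_samples, self.sample_size))
    }

    fn get_sample_size(&self) -> Option<usize> {
        Some(self.sample_size)
    }

    fn get_num_samples(&self) -> Option<usize> {
        Some(self.data.len() / self.sample_size)
    }
}

impl StreamingDataReader for InMemoryReader {
    fn read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize, ReaderError> {
        let remaining = self.data.len() - self.cursor;
        let n = remaining.min(buffer.len());
        buffer[..n].copy_from_slice(&self.data[self.cursor..self.cursor + n]);
        self.cursor += n;
        Ok(n)
    }

    fn total_rows(&self) -> usize {
        self.data.len() / self.sample_size
    }
}
```

Under these assumptions, adding a new input type amounts to writing one struct and implementing `DataReader` for it, plus `StreamingDataReader` only when the format can be read incrementally.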

Zero Performance Impact

  • Static dispatch: No virtual function overhead
  • Memory efficient: Maintains streaming (memory use is bounded by the chunk size, independent of file size)
  • Zero-copy: Direct buffer access where possible (NumPy uses into_raw_vec_and_offset)
  • Benchmarks: Same performance as before the refactoring, plus a new NumPy benchmark
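
The static-dispatch and streaming points can be illustrated with a short sketch. The `ChunkSource` trait, `CountingSource`, and `sum_streaming` below are hypothetical names invented for this example, not QDP's API. The function is generic over the reader type, so the compiler generates a specialized copy per reader with no `dyn` indirection, and it reuses one fixed-size buffer regardless of how much data the source yields.

```rust
/// Hypothetical chunked source, reduced to the single method this sketch needs.
pub trait ChunkSource {
    /// Fills `buffer` with up to `buffer.len()` values; returns the count written,
    /// with 0 meaning the source is exhausted.
    fn read_chunk(&mut self, buffer: &mut [f64]) -> usize;
}

/// Toy source that yields 0.0, 1.0, 2.0, ... without holding the data in memory.
pub struct CountingSource {
    next: u64,
    remaining: usize,
}

impl CountingSource {
    pub fn new(len: usize) -> Self {
        Self { next: 0, remaining: len }
    }
}

impl ChunkSource for CountingSource {
    fn read_chunk(&mut self, buffer: &mut [f64]) -> usize {
        let n = self.remaining.min(buffer.len());
        for slot in &mut buffer[..n] {
            *slot = self.next as f64;
            self.next += 1;
        }
        self.remaining -= n;
        n
    }
}

/// Generic over `R`, so each concrete reader gets its own monomorphized copy
/// (static dispatch) instead of a call through a `dyn ChunkSource` vtable.
pub fn sum_streaming<R: ChunkSource>(reader: &mut R, chunk_len: usize) -> f64 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    // The only allocation: one buffer, reused for every chunk.
    let mut buffer = vec![0.0f64; chunk_len];
    let mut total = 0.0;
    loop {
        let n = reader.read_chunk(&mut buffer);
        if n == 0 {
            break;
        }
        total += buffer[..n].iter().sum::<f64>();
    }
    total
}
```

For example, `sum_streaming(&mut CountingSource::new(10), 4)` processes the ten values in chunks of 4, 4, and 2 while only ever holding four of them in memory.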

Benchmark

python benchmark_numpy_io.py --qubits 16 --samples 1000

For reviewer

qdp/docs/readers/README.md

@rich7420 rich7420 changed the title refactor: introduce 2 traits for flexible io type w/ example [QDP] refactor: introduce 2 traits for flexible io type w/ example Dec 23, 2025
@rich7420 rich7420 requested a review from guan404ming December 23, 2025 09:44
400Ping commented Dec 24, 2025

Nice PR, I could help with the PyTorch implementation part.

@ryankert01

PTAL @guan404ming @rich7420 @400Ping
