feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310
Open
rich7420 wants to merge 1 commit into
Open
feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310rich7420 wants to merge 1 commit into
rich7420 wants to merge 1 commit into
Conversation
The single-sample f32 zero-copy entry point existed as a standalone `pub unsafe fn` on each encoder type, while the batch variant (`encode_batch_from_gpu_ptr_f32`) lived on the `QuantumEncoder` trait with a default `NotImplemented` body. That asymmetry meant adding a new encoder with f32 zero-copy required updating the dispatcher in three separate places. Changes: - Add `QuantumEncoder::encode_from_gpu_ptr_f32` with a default body returning `MahoutError::NotImplemented`, mirroring the existing batch variant exactly. Cfg-gated to Linux/CUDA same as the batch sibling. - Override on `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` — each delegate to the existing inherent `_with_stream` workhorse fn so hot paths can keep calling without a vtable. - Add `AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` as a standalone inherent fn matching the angle/basis pattern. The kernel-launch + L2-norm sequence was previously inlined in `QdpEngine::encode_from_gpu_ptr_f32_with_stream`; that engine method now delegates to the new encoder fn (-62 LoC inline, +0 net behaviour). - Add 4 trait-dispatch tests in `tests/gpu_ptr_encoding.rs`: amplitude/angle/basis success via trait method, plus phase's default body returning a `NotImplemented` with the expected message. Verified on RTX 2080 Ti: - 76 lib + 12 + 64 GPU + 6 type tests passing - Baseline throughput: amplitude 14,316 / angle 204,209 / basis 648,117 vec/s (no regression vs post-apache#1275 baseline)
8 tasks
Contributor
There was a problem hiding this comment.
Pull request overview
Refactors the f32 single-sample zero-copy entry point onto the QuantumEncoder trait with a default NotImplemented body, mirroring the existing batch sibling. Encoders that support f32 (Amplitude, Angle, Basis) override the trait method and delegate to their inherent _with_stream workhorse; Amplitude gains the missing inherent workhorse (extracted verbatim from QdpEngine). Pure structural change, no behavior or kernel changes.
Changes:
- Add
encode_from_gpu_ptr_f32toQuantumEncoder(Linux/CUDA-gated) with defaultNotImplementedbody, and override on Amplitude/Angle/Basis. - Extract Amplitude's inlined f32 single-sample logic out of
QdpEngine::encode_from_gpu_ptr_f32_with_streaminto a newAmplitudeEncoder::encode_from_gpu_ptr_f32_with_streamworkhorse. - Add four trait-dispatch integration tests in
gpu_ptr_encoding.rscovering amplitude/angle/basis success and phase defaultNotImplemented.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| qdp/qdp-core/src/gpu/encodings/mod.rs | Adds the f32 single-sample method to the QuantumEncoder trait with a default NotImplemented body. |
| qdp/qdp-core/src/gpu/encodings/amplitude.rs | Adds trait override and new inherent encode_from_gpu_ptr_f32_with_stream workhorse extracted from lib.rs. |
| qdp/qdp-core/src/gpu/encodings/angle.rs | Adds trait override delegating to the existing inherent workhorse. |
| qdp/qdp-core/src/gpu/encodings/basis.rs | Adds trait override delegating to the existing inherent workhorse (drops unused input_len). |
| qdp/qdp-core/src/lib.rs | QdpEngine::encode_from_gpu_ptr_f32_with_stream now delegates to the encoder workhorse. |
| qdp/qdp-core/tests/gpu_ptr_encoding.rs | Adds four trait-dispatch tests via Encoding::encoder(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related Issues
Changes
Why
The single-sample f32 zero-copy entry point exists as a standalone
pub unsafe fnon each encoder type (AmplitudeEncoder/AngleEncoder/BasisEncoder), while the batch sibling (encode_batch_from_gpu_ptr_f32) lives on theQuantumEncodertrait with a defaultNotImplementedbody.That asymmetry means adding a new encoder with f32 zero-copy support requires updating three separate touch points (inherent fn on the encoder, dispatcher in
engine.rs, engine-level entry method) instead of one. The batch side already enforces the trait-override discipline; the single-sample side should too.This is a pure structural refactor — no new functionality, no behaviour change, no kernel changes.
How
Trait change (
qdp-core/src/gpu/encodings/mod.rs)Add
encode_from_gpu_ptr_f32toQuantumEncoderwith a default body returningMahoutError::NotImplemented. Signature mirrors the existingencode_batch_from_gpu_ptr_f32. Cfg-gated to Linux / CUDA, same as the batch sibling.Encoder overrides
AmplitudeEncoder,AngleEncoder,BasisEncodereach gain a traitimplthat delegates to the existing inherent_with_streamworkhorse fn. Hot paths can keep calling the inherent fn directly without going through a vtable.NotImplementedbody — same pattern as the batch sibling.Amplitude alignment (
qdp-core/src/gpu/encodings/amplitude.rs)AmplitudeEncoderpreviously had no standaloneencode_from_gpu_ptr_f32_with_stream— the L2-norm + kernel-launch sequence was inlined inQdpEngine::encode_from_gpu_ptr_f32_with_stream(lib.rs). Add the missing inherent fn to mirror the angle / basis pattern, then haveQdpEnginedelegate to it (−62 LoC inline, no behaviour change).Tests (
qdp-core/tests/gpu_ptr_encoding.rs)Four new trait-dispatch tests route through
Encoding::encoder()(the trait object) rather than the inherent helper, so they exercise the new override mechanism:NotImplementedwith the expected messageLocal verification
gpu_ptr_encoding,gpu_fidelity,gpu_angle_encoding): all passing (incl. the 4 new ones)Checklist