feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait by rich7420 · Pull Request #1310 · apache/mahout

rich7420 · 2026-05-12T02:53:59Z

Related Issues

Changes

Why

The single-sample f32 zero-copy entry point exists as a standalone pub unsafe fn on each encoder type (AmplitudeEncoder / AngleEncoder / BasisEncoder), while the batch sibling (encode_batch_from_gpu_ptr_f32) lives on the QuantumEncoder trait with a default NotImplemented body.

That asymmetry means adding a new encoder with f32 zero-copy support requires updating three separate touch points (inherent fn on the encoder, dispatcher in engine.rs, engine-level entry method) instead of one. The batch side already enforces the trait-override discipline; the single-sample side should too.

This is a pure structural refactor — no new functionality, no behaviour change, no kernel changes.

How

Trait change (qdp-core/src/gpu/encodings/mod.rs)
Add encode_from_gpu_ptr_f32 to QuantumEncoder with a default body returning MahoutError::NotImplemented. Signature mirrors the existing encode_batch_from_gpu_ptr_f32. Cfg-gated to Linux / CUDA, same as the batch sibling.

Encoder overrides

AmplitudeEncoder, AngleEncoder, BasisEncoder each gain a trait impl that delegates to the existing inherent _with_stream workhorse fn. Hot paths can keep calling the inherent fn directly without going through a vtable.
Encoders that don't have an f32 zero-copy path yet (Phase, IQP, IQP-Z) inherit the default NotImplemented body — same pattern as the batch sibling.

Amplitude alignment (qdp-core/src/gpu/encodings/amplitude.rs)
AmplitudeEncoder previously had no standalone encode_from_gpu_ptr_f32_with_stream — the L2-norm + kernel-launch sequence was inlined in QdpEngine::encode_from_gpu_ptr_f32_with_stream (lib.rs). Add the missing inherent fn to mirror the angle / basis pattern, then have QdpEngine delegate to it (−62 LoC inline, no behaviour change).

Tests (qdp-core/tests/gpu_ptr_encoding.rs)
Four new trait-dispatch tests route through Encoding::encoder() (the trait object) rather than the inherent helper, so they exercise the new override mechanism:

amplitude / angle / basis success via trait method
phase's default body returns NotImplemented with the expected message

Local verification

Rust unit + integration: all passing
GPU-required tests (gpu_ptr_encoding, gpu_fidelity, gpu_angle_encoding): all passing (incl. the 4 new ones)
Python pytest: all passing
Clippy / fmt / license-header: clean
Baseline throughput (16 qubits, batch_size = 64, RTX 2080 Ti) holds: amplitude 14,316 / angle 204,209 / basis 648,118 vec/s

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes

The single-sample f32 zero-copy entry point existed as a standalone `pub unsafe fn` on each encoder type, while the batch variant (`encode_batch_from_gpu_ptr_f32`) lived on the `QuantumEncoder` trait with a default `NotImplemented` body. That asymmetry meant adding a new encoder with f32 zero-copy required updating the dispatcher in three separate places. Changes: - Add `QuantumEncoder::encode_from_gpu_ptr_f32` with a default body returning `MahoutError::NotImplemented`, mirroring the existing batch variant exactly. Cfg-gated to Linux/CUDA same as the batch sibling. - Override on `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` — each delegate to the existing inherent `_with_stream` workhorse fn so hot paths can keep calling without a vtable. - Add `AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` as a standalone inherent fn matching the angle/basis pattern. The kernel-launch + L2-norm sequence was previously inlined in `QdpEngine::encode_from_gpu_ptr_f32_with_stream`; that engine method now delegates to the new encoder fn (-62 LoC inline, +0 net behaviour). - Add 4 trait-dispatch tests in `tests/gpu_ptr_encoding.rs`: amplitude/angle/basis success via trait method, plus phase's default body returning a `NotImplemented` with the expected message. Verified on RTX 2080 Ti: - 76 lib + 12 + 64 GPU + 6 type tests passing - Baseline throughput: amplitude 14,316 / angle 204,209 / basis 648,117 vec/s (no regression vs post-apache#1275 baseline)

Copilot

Pull request overview

Refactors the f32 single-sample zero-copy entry point onto the QuantumEncoder trait with a default NotImplemented body, mirroring the existing batch sibling. Encoders that support f32 (Amplitude, Angle, Basis) override the trait method and delegate to their inherent _with_stream workhorse; Amplitude gains the missing inherent workhorse (extracted verbatim from QdpEngine). Pure structural change, no behavior or kernel changes.

Changes:

Add encode_from_gpu_ptr_f32 to QuantumEncoder (Linux/CUDA-gated) with default NotImplemented body, and override on Amplitude/Angle/Basis.
Extract Amplitude's inlined f32 single-sample logic out of QdpEngine::encode_from_gpu_ptr_f32_with_stream into a new AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream workhorse.
Add four trait-dispatch integration tests in gpu_ptr_encoding.rs covering amplitude/angle/basis success and phase default NotImplemented.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
qdp/qdp-core/src/gpu/encodings/mod.rs	Adds the f32 single-sample method to the `QuantumEncoder` trait with a default `NotImplemented` body.
qdp/qdp-core/src/gpu/encodings/amplitude.rs	Adds trait override and new inherent `encode_from_gpu_ptr_f32_with_stream` workhorse extracted from `lib.rs`.
qdp/qdp-core/src/gpu/encodings/angle.rs	Adds trait override delegating to the existing inherent workhorse.
qdp/qdp-core/src/gpu/encodings/basis.rs	Adds trait override delegating to the existing inherent workhorse (drops unused `input_len`).
qdp/qdp-core/src/lib.rs	`QdpEngine::encode_from_gpu_ptr_f32_with_stream` now delegates to the encoder workhorse.
qdp/qdp-core/tests/gpu_ptr_encoding.rs	Adds four trait-dispatch tests via `Encoding::encoder()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

rich7420 requested review from 400Ping, guan404ming and ryankert01 as code owners May 12, 2026 02:54

0lai0 mentioned this pull request May 25, 2026

[Roadmap] GSOC2026 Rust code quality: QDP pipeline #1338

Open

8 tasks

rich7420 requested a review from Copilot May 26, 2026 03:49

Copilot started reviewing on behalf of rich7420 May 26, 2026 03:49 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310

feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait#1310
rich7420 wants to merge 1 commit into
apache:mainfrom
rich7420:feat/encode-from-gpu-ptr-f32-on-trait

rich7420 commented May 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rich7420 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related Issues

Changes

Why

How

Local verification

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rich7420 commented May 12, 2026 •

edited

Loading