pac_clip_sum, pac_clip_min, pac_clip_max: Per-User Contribution Clipping for PAC Aggregates #13

Merged
ila merged 29 commits into main from pac_clip on Apr 2, 2026

ila commented Apr 1, 2026

Adds contribution clipping to PAC aggregates. When pac_clip_support is set, contributions at a magnitude level that has too few distinct contributors are treated as outliers and hard-zeroed, which prevents variance side-channel attacks. Supports integer, float, double, and HUGEINT types for SUM, and integer/float/double for MIN/MAX.

pac_clip_sum (integer)
Level-based magnitude decomposition with SWAR bitslice counters. Each value is routed to a level based on its magnitude (62 levels, 2-bit shift = 4x per level, covering the full 128-bit range). Each level maintains 64 SWAR uint16 counters + overflow uint32 counters + a 64-bit distinct-contributor bitmap. At finalization, levels with fewer distinct contributors than pac_clip_support contribute nothing (hard-zero). Signed values are handled by splitting into separate positive and negative accumulators.
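The hard-zero rule at finalization can be illustrated with a minimal sketch. It is not the PR's implementation: it replaces the SWAR counters and bitmaps with a plain per-level sum and an already-computed contributor estimate, and the names `SketchLevel` and `SketchFinalizeHardZero` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one magnitude level after accumulation.
struct SketchLevel {
    int64_t sum;           // total contribution routed to this level
    int distinct_estimate; // estimated number of distinct contributors
};

// Levels with fewer distinct contributors than clip_support contribute
// nothing (hard-zero); all other levels are summed as-is.
inline int64_t SketchFinalizeHardZero(const std::vector<SketchLevel> &levels, int clip_support) {
    int64_t total = 0;
    for (const auto &level : levels) {
        if (level.distinct_estimate < clip_support) {
            continue; // unsupported level: skipped entirely
        }
        total += level.sum;
    }
    return total;
}
```

With a support of 2, a level holding a single user's large value is dropped while a level shared by several users survives. If the support exceeds the number of contributors in every level, the result is zero, which is the over-clipping case listed under Tests.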

Float/double support
Floating-point values are converted to int64 before entering the integer-based level machinery via ScaleFloatToInt64<FLOAT_TYPE, SHIFT>. The scale factors are powers of 2 (2^20 for float, 2^27 for double), so the multiplication only changes the exponent and is exact in IEEE 754 as long as the result neither overflows nor underflows; the scaling itself introduces no rounding error. Precision is lost only when the scaled value is converted to an integer. Branchless clamping to [INT64_MIN, INT64_MAX] handles overflow. At finalization, the accumulated integer result is divided by the scale factor to recover the original floating-point range. This approach preserves roughly 6 decimal places for float and 8 for double (an absolute resolution of 2^-20 and 2^-27 respectively), which is sufficient for the PAC noise regime where the noise magnitude exceeds the lost precision.
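A minimal sketch of the scale-and-clamp step, under stated assumptions: the names `SketchScaleFloatToInt64` and `SketchUnscale` are hypothetical, the clamp uses ordinary comparisons rather than the branchless form mentioned above, and mapping NaN to 0 is an illustrative choice, not something the PR specifies.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Scale a floating-point value by 2^SHIFT and convert to int64 with clamping.
template <class FLOAT_TYPE, int SHIFT>
inline int64_t SketchScaleFloatToInt64(FLOAT_TYPE value) {
    // 2^SHIFT and 2^63 are exactly representable in both float and double.
    const FLOAT_TYPE scale = static_cast<FLOAT_TYPE>(int64_t(1) << SHIFT);
    const FLOAT_TYPE limit = static_cast<FLOAT_TYPE>(9223372036854775808.0); // 2^63
    const FLOAT_TYPE scaled = value * scale;
    if (std::isnan(scaled)) {
        return 0; // assumption: NaN handling is not described in the PR
    }
    if (scaled >= limit) {
        return std::numeric_limits<int64_t>::max();
    }
    if (scaled <= -limit) {
        return std::numeric_limits<int64_t>::min();
    }
    return static_cast<int64_t>(scaled); // truncates toward zero
}

// Undo the scaling on an accumulated integer result.
template <class FLOAT_TYPE, int SHIFT>
inline FLOAT_TYPE SketchUnscale(int64_t accumulated) {
    return static_cast<FLOAT_TYPE>(accumulated) / static_cast<FLOAT_TYPE>(int64_t(1) << SHIFT);
}
```

Clamping before the cast matters because converting an out-of-range floating-point value to int64 is undefined behavior in C++.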

pac_clip_min / pac_clip_max
Level-based clipping for MIN/MAX using int8_t extremes per level instead of uint16 counters. Each value is routed to a level by magnitude (same 62-level, 2-bit-shift structure as clip_sum), then an arithmetic right shift compresses it to int8_t [-128, 127]. The sign is preserved because arithmetic shift extends the sign bit.

Each level stores:

  • 8 × uint64_t SWAR-packed int8_t extremes (64 worlds × 1 byte each)
  • 1 × uint64_t bitmap — distinct-contributor tracking, same birthday-paradox estimation as clip_sum
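
The PR does not spell out the estimator behind the 64-bit contributor bitmap. One plausible reading, sketched here purely as an assumption, is that each contributor sets one hashed bit and the distinct count is recovered from the number of set bits with the standard occupancy (linear-counting) formula n ≈ -64 · ln(1 - k/64). The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Portable population count (Kernighan's method).
inline int SketchPopCount64(uint64_t x) {
    int n = 0;
    while (x) {
        x &= x - 1; // clears the lowest set bit
        ++n;
    }
    return n;
}

// Assumption: one bit per contributor, chosen by the low 6 bits of a hash.
inline void SketchMarkContributor(uint64_t &bitmap, uint64_t user_hash) {
    bitmap |= uint64_t(1) << (user_hash & 63);
}

// Estimate distinct contributors from the number of occupied bits.
inline double SketchEstimateDistinct(uint64_t bitmap) {
    const int k = SketchPopCount64(bitmap);
    if (k == 64) {
        return std::numeric_limits<double>::infinity(); // bitmap saturated
    }
    return -64.0 * std::log(1.0 - k / 64.0);
}
```

For small counts the estimate stays close to the number of set bits, which is the regime that matters when comparing against a small pac_clip_support threshold.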

At finalization, per-level extremes are reconstructed by left-shifting by level * 2 bits. Levels below the pac_clip_support threshold are excluded (hard-zero). The final result is the most extreme (smallest for MIN, largest for MAX) surviving value across all non-zeroed levels.
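
The compress-and-reconstruct round trip for a single value can be sketched as follows. This is an illustration, not the PR's code: the helper names are hypothetical, and the level is found by a simple loop instead of the bit tricks a real implementation would likely use.

```cpp
#include <cstdint>

constexpr int kSketchBitsPerLevel = 2; // 2-bit shift per level, as in the PR text

// Smallest level at which the shifted value fits into int8_t [-128, 127].
// Right-shifting a negative value is arithmetic on mainstream compilers
// (and guaranteed since C++20), so the sign is preserved.
inline int SketchSignedLevel(int64_t v) {
    int level = 0;
    while ((v >> (level * kSketchBitsPerLevel)) > 127 || (v >> (level * kSketchBitsPerLevel)) < -128) {
        ++level;
    }
    return level;
}

inline int8_t SketchCompress(int64_t v, int level) {
    return static_cast<int8_t>(v >> (level * kSketchBitsPerLevel));
}

// Multiplication is used instead of << so that negative extremes are
// well-defined in pre-C++20 dialects too.
inline int64_t SketchReconstruct(int8_t extreme, int level) {
    return static_cast<int64_t>(extreme) * (int64_t(1) << (level * kSketchBitsPerLevel));
}
```

The low bits discarded by the shift are lost, so 1000 comes back as 992 and -1000 as -1008: the reconstruction is exact only to the granularity of the level.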

BOUNDOPT optimization: Each level tracks the worst-of-64 extreme as a scalar level_bounds[k]. During update, if the incoming shifted value cannot beat the current bound, the expensive SWAR update is skipped entirely. The bound is recomputed every 64 updates. This optimization is critical for skewed distributions where most values land in the same few levels.
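
A scalar sketch of the bound check for the MAX case follows. It makes several assumptions that the PR text does not confirm: a plain array stands in for the SWAR-packed words, a 64-bit mask says which worlds a row belongs to, the bound is taken to be the weakest (smallest) of the 64 per-world maxima, and the recompute counter counts only updates that were not skipped. All names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

struct SketchMaxLevel {
    std::array<int8_t, 64> extremes; // running max per world
    int8_t bound;                    // lower bound on every entry of extremes
    uint32_t applied;                // updates that reached the slow path

    SketchMaxLevel() : bound(INT8_MIN), applied(0) {
        extremes.fill(INT8_MIN);
    }

    // Returns false when the update was skipped via the bound check.
    bool Update(int8_t value, uint64_t world_mask) {
        if (value <= bound) {
            return false; // cannot raise the max of any world
        }
        for (int w = 0; w < 64; ++w) {
            if (((world_mask >> w) & 1) && value > extremes[w]) {
                extremes[w] = value;
            }
        }
        if (++applied % 64 == 0) {
            // Refresh the bound: the smallest of the 64 per-world maxima.
            bound = *std::min_element(extremes.begin(), extremes.end());
        }
        return true;
    }
};
```

A stale bound is still safe in this sketch: per-world maxima only grow, so the true minimum is never below the stored bound, and a skipped value could not have changed any world.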

Inline level optimization: One level can be stored inline in the state struct (overlapping the last 9 pointer slots = 72 bytes), avoiding an arena allocation for the common case where only one level is active.

Tests

  • test/sql/pac_clip_sum.test (485 lines): level boundaries, HUGEINT, over-clipping, multi-group, float/double scaling, mixed types
  • test/sql/pac_clip_min_max.test (282 lines): basic min/max, signed values, float/double, hard-zero at low support, NULL handling

peterboncz and others added 29 commits March 23, 2026 23:36
Change suffix attenuation from soft-clamp (scale by 16^distance) to hard-zero
(skip entirely). Unsupported magnitude levels now contribute nothing to the
result, fully eliminating the variance side-channel.

Attack results with clip_support=2:
- Small filter (3-4 users): 96% → 47% (random)
- 20K small items: 96% → 53% (random)
- Std ratio in/out: 90x → 0.87x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Finer-grained magnitude levels (2-bit bands, 4x per level) allow the clipping
mechanism to catch moderate outliers that were previously invisible within the
same 16x-wide level. A 10x outlier (50k vs 5k normal) now lands in a different
level and gets hard-zeroed.

Changes:
- PAC2_LEVEL_SHIFT: 4 → 2
- PAC2_NUM_LEVELS: 31 → 32 (covers int64; HUGEINT clamps to level 31)
- GetLevel/GetLevel128: divide by 2 instead of 4, clamp to max level
- Inline optimization threshold: 13 → 14
- All shift extraction: level << 2 → level << 1

Memory: +8 bytes per state (256 vs 248 byte pointer array). Negligible.
Performance: no regression on TPCH Q01 SF1 (1.38s → 1.31s).
Security: moderate outlier attack drops from 76.5% to 52.9% (random).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
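
The effect of narrowing the bands from 16x to 4x can be reproduced with a small routing sketch. The actual GetLevel is not shown on this page, so the function below is hypothetical; in particular, the base width of 8 bits for level 0 is an assumption chosen only to make the 5k versus 50k example concrete.

```cpp
#include <cstdint>

// Number of bits needed to represent v (0 for v == 0).
inline int SketchBitWidth(uint64_t v) {
    int n = 0;
    while (v) {
        ++n;
        v >>= 1;
    }
    return n;
}

// Route a magnitude to a level. SHIFT is the number of bits per level
// (2 => 4x per level, 4 => 16x per level). BASE_BITS is an assumed width
// for level 0 and is not taken from the PR.
template <int SHIFT, int BASE_BITS = 8, int NUM_LEVELS = 32>
inline int SketchGetLevel(uint64_t magnitude) {
    const int bits = SketchBitWidth(magnitude);
    if (bits <= BASE_BITS) {
        return 0;
    }
    const int level = (bits - BASE_BITS + SHIFT - 1) / SHIFT; // ceiling division
    return level < NUM_LEVELS ? level : NUM_LEVELS - 1;       // clamp to max level
}
```

Under these assumptions, 5000 and 50000 share level 2 with 4-bit bands but fall into levels 3 and 4 with 2-bit bands, so the outlier's level can be judged on its own support.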
…vior

With hard-zero, unsupported outlier levels contribute nothing, so the
clipped result equals (not exceeds) the no-outlier baseline. Change > to >=.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase PAC2_NUM_LEVELS from 32 to 62 to cover the full 128-bit range
without clamping. int64 values naturally use only levels 0-29 (the extra
pointer slots remain NULL, no per-level data is allocated). The inline
optimization threshold moves from 14 to 44 accordingly.

Memory: +240 bytes per state for the pointer array (496 vs 256 bytes).
Per-level data allocations are unchanged for int64 workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…i-group

New test cases:
- Level boundary routing (same-level vs cross-level with 4x bands)
- HUGEINT outlier clipping (values at 2^70, beyond int64 range)
- Negative HUGEINT outlier via neg_state
- Over-clipping (clip_support > group size → zero result)
- Multi-group with outlier isolated to one group

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fetched from main and added:
- Development rules: test coverage, no test removal, codebase-first search,
  helper function reuse, duckdb submodule is read-only
- Reference to the PAC paper (arXiv:2603.15023)
- PAC_DEBUG_PRINT usage guidance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Attack scripts testing the variance side-channel MIA against pac_clip_sum:
- clip_attack_test.sh: main suite (small filter, wide filter, 10K users, etc.)
- clip_multirow_test.sh: 20K small items user (tests pre-aggregation)
- clip_hardzero_stress.sh: stress tests (high trials, composed queries, collusion)
- clip_shift2_stress.sh: tests with 4x magnitude levels (shift=2)
- clipping_experiment.sh: input clipping (Winsorization) baseline
- output_clipping_experiment.sh: post-hoc output clipping baseline
- output_clipping_v2_experiment.sh: output clipping before noise
- clip_attack_results.md: full evaluation with findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: added code style rules (clang-tidy naming, clang-format style),
  attack evaluation section, development rules
- .claude/settings.json: PostToolUse hook to auto-run make format-fix after edits
- Skills: /run-attacks, /test-clip, /explain-pac, /explain-dp, /explain-pac-ddl

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the pac_metadata JSON sidecar files: naming convention, auto-loading,
save/clear pragmas, and the important note to delete metadata when recreating DBs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
explain-pac: added formal PAC definition, 4-step privatization template,
MI-to-posterior success rate table, composition theorem, PAC vs DP comparison,
and SIMD-PAC-DB implementation details.

explain-dp: added PAC vs DP comparison table, loose bounds insight,
privacy-conscious design (MSE = Bias² + (1/(2B)+1)·Var), and implications
for clipping (reducing variance improves privacy-utility tradeoff).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements level-based clipping for MIN/MAX aggregates (pac_clip_min,
pac_clip_max, pac_noised_clip_min, pac_noised_clip_max) using int8_t
extremes with per-level bitmaps for support estimation. Replaces the
previous alias-only stubs with a real implementation that reuses
UpdateExtremesSIMD from pac_min_max.hpp.

Adds native FLOAT/DOUBLE overloads for pac_clip_sum and pac_clip_min_max
using power-of-2 scale factors (2^20 for float, 2^27 for double) to
convert to int64 before entering the integer-based level machinery.
Removes the lossy BIGINT cast workaround from the expression builder.

Includes BOUNDOPT (per-level bound optimization), AllValid fast paths,
and shared ScaleFloatToInt64 helper with branchless clamping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds github.com/ila/duckdb-claude-skills at .claude/skills/shared/
with 7 generic DuckDB extension skills: best-practices, code-review,
plan-feature, project-review, duckdb-internals, write-docs, run-tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… previous

 merge commit already -- apologies).

Refactor pac_clip: shared code, two-sided unsigned min/max, unified outlier clipping

- Factor shared code into pac_clip_aggr.hpp: CLIP_* constants, ScaleFloatToInt64,
  ClipEstimateDistinct, PacClipBindData, PacClipBind functions. Remove duplicates
  from pac_clip_sum.hpp/cpp and pac_clip_min_max.hpp/cpp.

- Convert pac_clip_min_max from signed int8_t to two-sided unsigned uint8_t:
  positive values in pos_state, absolute negatives in neg_state (with !IS_MAX).
  GetLevel threshold 128→256, giving 8-bit precision instead of 7-bit.
  Lazy neg_state allocation: positive-only data never allocates it.

- Unify outlier elimination across sum and min/max using shared
  ClipFindSupportedRange and ClipEffectiveLevel helpers. Both now use
  first/last supported boundary logic (min/max previously did per-level
  independent filtering, missing interior-level preservation).

- Add pac_clip_scale setting (BOOLEAN, default false). When false, unsupported
  prefix/suffix levels are omitted. When true, they are scaled to the nearest
  supported boundary (4^distance). This replaces sum's previous asymmetric
  behavior (prefix scaled, suffix omitted) with a symmetric policy.

- Remove stale clip min/max stub registrations from pac_min_max.cpp
  (superseded by real implementations in pac_clip_min_max.cpp).

- Remove C++17 if constexpr usage from pac_clip_min_max.

- Add tests for negative values, mixed pos/neg, negative-only, and
  neg-outlier clipping in pac_clip_min_max.test.
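
The first/last supported boundary logic can be sketched as below for the sum case with pac_clip_scale left at false. This is a reading of the commit message, not the code of ClipFindSupportedRange: the names, the plain vectors, and the use of exact contributor counts are all assumptions.

```cpp
#include <cstdint>
#include <vector>

struct SketchSupportedRange {
    int first; // lowest supported level, or -1 if none
    int last;  // highest supported level, or -1 if none
};

// Find the first and last levels whose contributor count meets the support.
inline SketchSupportedRange SketchFindSupportedRange(const std::vector<int> &distinct, int support) {
    SketchSupportedRange range{-1, -1};
    for (int i = 0; i < static_cast<int>(distinct.size()); ++i) {
        if (distinct[i] >= support) {
            if (range.first < 0) {
                range.first = i;
            }
            range.last = i;
        }
    }
    return range;
}

// Sum every level inside [first, last]; unsupported prefix and suffix
// levels are omitted, interior levels are kept even if thinly populated.
inline int64_t SketchSumSupported(const std::vector<int64_t> &sums, const std::vector<int> &distinct,
                                  int support) {
    const SketchSupportedRange range = SketchFindSupportedRange(distinct, support);
    if (range.first < 0) {
        return 0;
    }
    int64_t total = 0;
    for (int i = range.first; i <= range.last; ++i) {
        total += sums[i];
    }
    return total;
}
```

The difference from per-level filtering shows up at an interior level with a single contributor: it is dropped when each level is filtered independently but preserved here, because it lies between two supported levels.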
Memory optimizations for clipping:
- Save the second state pointer for unsigned types (one-sided).
- Only HUGEINT needs 62 levels; int64 needs just 30.
  Templating makes both variants possible in the same code.
- We do not go below int64, because inlining would then stop working
  and there would be no memory savings anyway.
ila merged commit c01d39e into main on Apr 2, 2026
16 checks passed
ila deleted the pac_clip branch on April 2, 2026 20:46