-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathbeginner.html
More file actions
309 lines (272 loc) · 15.4 KB
/
beginner.html
File metadata and controls
309 lines (272 loc) · 15.4 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
<!DOCTYPE html>
<html lang="en" data-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Beginner's Guide - Privacy-Preserving AI Computing</title>
<meta name="description" content="Beginner's guide to Privacy-Preserving AI Computing.">
<!-- Google Fonts: Inter & Outfit for a more modern look -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link
href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=Outfit:wght@400;500;700;800&display=swap"
rel="stylesheet">
<link rel="stylesheet" href="styles.css">
</head>
<body>
<div class="app-layout">
<!-- Sidebar Navigation -->
<aside class="sidebar">
<div class="sidebar-header">
<div class="logo">CPA TUTORIAL</div>
<button id="mobile-close-btn" class="icon-btn" aria-label="Close Menu">
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<path d="M18 6L6 18M6 6l12 12" />
</svg>
</button>
</div>
<nav class="sidebar-nav">
<!-- New Sections -->
<div class="nav-section">
<h3 class="nav-title">tutorial</h3>
<ul class="nav-links">
<li><a href="index.html">ASPLOS'26@Pittsburgh</a></li>
</ul>
</div>
<div class="nav-section">
<h3 class="nav-title">docs</h3>
<ul class="nav-links">
<li><a href="beginner.html" class="active">Easy HE Background</a></li>
<li><a href="cross.html">CROSS</a></li>
<li><a href="tpu.html">TPU Novice to Master</a></li>
<li><a href="ntt.html">NTT Algorithms</a></li>
<li><a href="open_challenge.html">Open Challenge</a></li>
</ul>
</div>
</nav>
</aside>
<!-- Main Content Area -->
<main class="main-content">
<header class="top-bar">
<button id="mobile-menu-btn" class="icon-btn" aria-label="Open Menu">
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<path d="M3 12h18M3 6h18M3 18h18" />
</svg>
</button>
<div class="header-title">Beginner Guide</div>
<div class="top-actions">
<button id="theme-toggle" class="icon-btn" aria-label="Toggle Theme">
<!-- Sun Icon -->
<svg class="sun-icon" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor"
stroke-width="2">
<circle cx="12" cy="12" r="5" />
<path
d="M12 1v2M12 21v2M4.22 4.22l1.42 1.42M18.36 18.36l1.42 1.42M1 12h2M21 12h2M4.22 19.78l1.42-1.42M18.36 5.64l1.42-1.42" />
</svg>
<!-- Moon Icon -->
<svg class="moon-icon" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor"
stroke-width="2">
<path d="M21 12.79A9 9 0 1 1 11.21 3 7 7 0 0 0 21 12.79z" />
</svg>
</button>
</div>
</header>
<div class="content-wrapper">
<article class="doc-content">
<section id="chapter-1">
<h2>Chapter 1: Introduction to CKKS HE</h2>
<h3 id="what-is-he">What is HE and where is it used?</h3>
<p>
<strong>Homomorphic Encryption (HE)</strong> defines a scheme to encrypt data so that computation (such as
multiplication and addition) can be directly performed in the encrypted format. This is known as
<strong>encrypted computation</strong>.
</p>
<ul>
<li><strong>Privacy-Preserving AI:</strong> HE is primarily used for sensitive applications like
healthcare (protecting genomic data), finance (analyzing investment data), and government.</li>
<li><strong>Security Foundation:</strong> HE is built on <strong>lattice-based cryptography</strong>,
which is considered "post-quantum"—meaning it is secure even against future quantum computers.</li>
<li><strong>The CKKS Scheme:</strong> This specific version of HE is designed for <strong>approximate
arithmetic</strong>, making it perfect for real-world tasks like AI inference and signal processing
where slight rounding errors are acceptable.</li>
</ul>
<h3 id="data-representation">How is data represented after encryption?</h3>
<p>
To encrypt a list of numbers (a <strong>vector</strong>), we go through a process called
<strong>Packing</strong>.
</p>
<ol>
<li><strong>Ciphertext Structure:</strong> A single ciphertext contains a pair of two high-precision
polynomials of degree $N$.</li>
<li><strong>Slots and Packing:</strong> Each pair of polynomials can hold $N/2$ data elements, termed as
<strong>"slots"</strong>. The specific way individual data is placed into these slots is called
<strong>"packing"</strong> or <strong>"layout"</strong>.
</li>
<li><strong>High Precision:</strong> Each coefficient in these polynomials can be <strong>thousands of
bits long</strong> to ensure security, which is much larger than the 32-bit or 64-bit numbers standard
computers use.</li>
</ol>
<p>
<strong>Example:</strong> As shown in the illustration below, for $N=65536$, a vector of 32,768 data
points is encoded into an $N$-degree polynomial and then encrypted into two polynomials of the same
degree.
</p>
<img src="fig/image8.png" alt="Raw data encoding into polynomials">
<p>
<strong>The SIMD Mindset:</strong> Once data is encrypted, it behaves like a <strong>SIMD (Single
Instruction, Multiple Data)</strong> machine with a massive vector length. If you add two ciphertexts,
the computer adds all $N/2$ slots at once. This makes HE-based encrypted processing exactly work like a
SIMD vectorized hardware, such as AVX512. And HE is analogous to an AVX<N/2>, e.g., AVX32768 for
N=65536.
</p>
<p><strong>Allowed Operations:</strong></p>
<ul>
<li><strong>Addition:</strong> Slot-wise addition of two ciphertexts.</li>
<li><strong>Multiplication:</strong> Slot-wise multiplication. Note: This is computationally "expensive"
and requires extra steps like <strong>Relinearization</strong> and <strong>Rescaling</strong> to manage
noise and data size.</li>
<li><strong>Rotation:</strong> Moves all data in the slots left or right (e.g., move every number one slot
to the right).</li>
</ul>
<p>
Such restriction reformulates the efficiency optimization problem as a problem of how to convert
application into a sequence of SIMD Addition, SIMD Multiplication and SIMD rotation of a very-long vector,
to minimize overall costs.
</p>
</section>
<section id="he-compilation-stack">
<h2>Chapter 2: The HE Compilation Stack</h2>
<p>
To tackle this entire problem structurally, HE acceleration techniques, particularly for CKKS encryption,
could be formally categorized into five distinct layers, including Packing, Mapping, Scheduling,
Decomposing and Binding.
</p>
<ol>
<li><strong>Packing</strong> defines how data is organized within ciphertext slots. Specifically, a CKKS
ciphertext operates like a vector of SIMD units, encoding multiple data per ciphertext and enforcing
lock-step operator for all encoded data. Thus, operators in original applications initially designed for
element-wise computations must be transformed into SIMD-compatible, HE-specific Privacy-Preserving
Operators (PP-Ops).</li>
<li><strong>Mapping</strong> translates PP-Ops into sequences of fundamental HE operators. Optimal mapping
seeks maximum arithmetic intensity, data reuse, and parallelism while minimizing computational and
memory overhead to reduce latency.</li>
<li><strong>Scheduling</strong> determines how HE kernels are scheduled for each HE operator (e.g.,
addition, multiplication, rotation, rescale, bootstrapping).</li>
<li><strong>Decomposing</strong> specifies arithmetic and memory operations on individual ciphertexts for
HE kernels.</li>
<li><strong>Binding:</strong> Arithmetic and memory operations are translated into hardware-specific
programming interfaces (e.g. JAX for TPU), or low-level hardware ISAs (e.g. SIMD ISA for TPU).</li>
</ol>
<img src="fig/image12.png" alt="HE Compilation Stack Overview">
</section>
<section id="math-foundations">
<h2>Chapter 3: Mathematical Foundations & Data Representation</h2>
<p>
Standard hardware units (like 32-bit CPUs or 8-bit TPU MXUs) cannot natively support thousand-bit
coefficients. We solve this through <strong>precision lowering</strong> and <strong>tensor
representation</strong>.
</p>
<h3>Mathematical Basics: Fields and Rings</h3>
<p>To understand HE from a computation perspective, we look at how numbers are constrained:</p>
<ul>
<li><strong>Finite Fields (Coefficients):</strong> Each polynomial coefficient is an integer in a finite
field, meaning there are limited choices of integer values. For example, a modulus $q=14$ gives 14
choices (0–13). If a value overflows, a modular reduction brings it back into range.</li>
</ul>
<img src="fig/image11.png" alt="Finite Field Modulus Example">
<ul>
<li><strong>Polynomial Rings:</strong> Putting all coefficients together forms a <strong>ring</strong>,
which has a limited number of degrees. When the degree overflows, a modular reduction by a polynomial
modulus (e.g., $\Phi(x) = x^{16} + 1$) is used to bring it back.</li>
</ul>
<img src="fig/image3.png" alt="Polynomial Ring Example">
<h3>Precision Lowering (thousands to 32 or to 8)</h3>
<p>Standard hardware cannot process 1000-bit integers directly. We use two mathematical "tricks" to solve
this:</p>
<ol>
<li><strong>Residue Number System (RNS):</strong> We break one "giant" polynomial into $L$ smaller
polynomials called <strong>Limbs</strong> (or Towers). Each limb uses smaller coefficients (e.g., 32-bit
or 64-bit) that hardware natively supports.</li>
</ol>
<img src="fig/image9.png" alt="RNS Limbs">
<ol start="2">
<li><strong>Digit Decomposition:</strong> For even smaller hardware units (like 8-bit units in a TPU), we
further break these limbs into "chunks" or "bytes".</li>
</ol>
<img src="fig/image1.png" alt="Digit Decomposition">
<h3>Data Representation (4D or 5D Tensor)</h3>
<p>
After lowering the precision using techniques like the <strong>Residue Number System (RNS)</strong>, we
can represent ciphertext as a multi-dimensional tensor.
</p>
<ol>
<li><strong>4D Tensor:</strong> Each ciphertext is represented by a 4D vector indexed by {Batch (B),
Elements (E), Degree (N), Limbs (L)}. In this representation, each specific combination of indices
specifies a single coefficient.</li>
</ol>
<img src="fig/image2.png" alt="4D Tensor Representation" style="width: 70%;">
<ol start="2">
<li><strong>5D Tensor:</strong> If each individual byte of a coefficient must be explicitly represented
(for low-precision hardware like 8-bit units), the data becomes a 5D tensor: {Batch (B), Elements (E),
Degree (N), Limbs (L), Bytes (K)}.</li>
</ol>
<img src="fig/image5.png" alt="5D Tensor Representation">
<p>
All operations do not alter the value of degree, hence we could just hide it in the visualization for
simplicity. Therefore, we simplify it into representing each limb as a circle, as shown below.
</p>
<img src="fig/image4.png" alt="Simplified Limb Representation">
<h3>Acceleration with NTT</h3>
<p>
Multiplying polynomials naively is expensive ($O(N^2)$). We use the <strong>Number-Theoretic Transform
(NTT)</strong> to transform coefficients into the <strong>evaluation domain</strong>, where
multiplication becomes element-wise.
</p>
<ul>
<li><strong>Blue Circles:</strong> Represent the <strong>coefficient domain</strong>.</li>
<li><strong>Red Circles:</strong> Represent the <strong>evaluation domain</strong>.</li>
</ul>
<img src="fig/image6.png" alt="NTT Domain Transformation">
</section>
<!-- Citation Section -->
<section id="citation" class="citation-section">
<h2>Citation</h2>
<p>If you find this tutorial helpful, feel free to:</p>
<ul>
<li>Star CROSS repo at <a
href="https://github.com/EfficientPPML/CROSS">https://github.com/EfficientPPML/CROSS</a></li>
<li>Cite our paper with biblatex below:</li>
</ul>
<pre><code>@inproceedings{tong2025CROSS,
author = {Jianming Tong and Tianhao Huang and Jingtian Dang and Leo de Castro and Anirudh Itagi and Anupam
Golder and Asra Ali and Jevin Jiang and Jeremy Kun and Arvind and G. Edward Suh and Tushar Krishna},
title = {Leveraging ASIC AI Chips for Homomorphic Encryption},
year = {2026},
publisher = {2026 IEEE International Symposium on High Performance Computer Architecture (HPCA)},
address = {Australia},
keywords = {AI ASICs, TPU, Fully Homomorphic Encryption},
location = {Australia},
series = {HPCA'26} }</code></pre>
</section>
</article>
<aside class="on-this-page">
<h4>On this page</h4>
<ul>
<li><a href="#chapter-1">Introduction</a></li>
<li><a href="#what-is-he">What is HE?</a></li>
<li><a href="#data-representation">Data Representation</a></li>
<li><a href="#he-compilation-stack">Compilation Stack</a></li>
<li><a href="#math-foundations">Math Foundations</a></li>
<li><a href="#computational-flow">Computational Flow</a></li>
<li><a href="#citation">Citation</a></li>
</ul>
</aside>
</div>
<footer class="site-footer">
<p>© 2026 Cryptography Primitives Acceleration Tutorial.</p>
</footer>
</main>
</div>
<script src="script.js"></script>
</body>
</html>