paper/analysis.tex (10 additions, 10 deletions)
@@ -75,7 +75,7 @@ \subsection{Methods}
75
75
76
76
For constant $\mtx{\Omega}$, the estimator~\refequ{equ:nystrompp-trace-estimator} coincides with the Nyström++ estimator from~\cite{persson-2022-improved-variants}, which is based on the Hutch++ estimator~\cite{meyer-2021-hutch-optimal}. In this situation (constant $\mtx{B}$), these estimators were both shown to achieve a relative $\varepsilon$-error with $\mathcal{O}(\varepsilon^{-1})$ matrix-vector products only, independent of the singular value decay of $\mtx{B}$.
77
77
78
-
\textcolor{green}{At this point, we acknowledge the existence of the XNysTrace estimator from \cite{epperly-2024-xtrace-making} which oftentimes outperforms the Nyström++ estimator. Moreover, a straight-forward extension of the XNysTrace estimator to the parameter-dependent setting seems to be in reach. However, what distinguishes \refalg{alg:nystrom-chebyshev-pp} for estimating spectral densities are two key observations --- using the cyclic invariance of the trace and the affine linear form of Chebyshev expansions (see \refsec{subsubsec:chebyshev-nystrom-implementation} for details) --- which it exploits to significantly speed up the computation. Unfortunately, we do not see a way of marrying these observations with the efficient implementation of XNysTrace described in \cite[Section 2.2]{epperly-2024-xtrace-making}, which limits its suitability for spectral density estimation.} \textcolor{green}{??? Maybe this would be more appropriate in the introduction? ???}
78
+
At this point, we acknowledge the XNysTrace estimator from~\cite{epperly-2024-xtrace-making}, which often outperforms the Nyström++ estimator. Moreover, a straightforward extension of the XNysTrace estimator to the parameter-dependent setting seems to be within reach. However, \refalg{alg:nystrom-chebyshev-pp} is distinguished for estimating spectral densities by two key observations, namely the cyclic invariance of the trace and the affine linear form of Chebyshev expansions (see \refsec{subsubsec:chebyshev-nystrom-implementation} for details), which it exploits to significantly speed up the computation. Unfortunately, we do not see a way of combining these observations with the efficient implementation of XNysTrace described in \cite[Section 2.2]{epperly-2024-xtrace-making}, which limits the suitability of XNysTrace for spectral density estimation.
79
79
%We can interpret this estimator as an interpolation between the trace of the Nyström approximation and the Girard-Hutchinson estimator.
80
80
81
81
%In the remainder of this section, we will derive upper bounds on the error of these estimators.
@@ -360,7 +360,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
360
360
\label{qu:nystrompp-theorem-bound}
361
361
\end{equation}
362
362
holds with probability at least $1 - \gamma^{-n_{\mtx{\Omega}} / 4}$ for
363
-
$c = \textcolor{green}{154}$. In particular, given $\varepsilon > 0$ and $\delta\in (0, 1)$, the bound $\int_{a}^{b} \lVert\mtx{B}(t) - \Nystr{\mtx{\Omega}}{\mtx{B}}(t) \rVert _F~\mathrm{d}t \leq\varepsilon\int_{a}^{b} \Trace(\mtx{B}(t))~\mathrm{d}t$ holds with probability at least $1-\delta$ if $n_{\mtx{\Omega}} = \mathcal{O}(\varepsilon^{-2} + \log(\delta^{-1}))$.
363
+
$c = 154$. In particular, given $\varepsilon > 0$ and $\delta\in (0, 1)$, the bound $\int_{a}^{b} \lVert\mtx{B}(t) - \Nystr{\mtx{\Omega}}{\mtx{B}}(t) \rVert _F~\mathrm{d}t \leq\varepsilon\int_{a}^{b} \Trace(\mtx{B}(t))~\mathrm{d}t$ holds with probability at least $1-\delta$ if $n_{\mtx{\Omega}} = \mathcal{O}(\varepsilon^{-2} + \log(\delta^{-1}))$.
364
364
\end{lemma}
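To make the dependence on $\delta$ in the lemma concrete, the failure probability can be inverted for $n_{\mtx{\Omega}}$. This is a reader's sketch that assumes $\gamma > 1$ is the free parameter of the lemma; the $\varepsilon^{-2}$ term stems from the bound itself, whose display is not reproduced in this excerpt.

```latex
% Reader's note, assuming \gamma > 1 (so that \log(\gamma) > 0):
\begin{equation*}
    \gamma^{-n_{\mtx{\Omega}} / 4} \leq \delta
    \quad \Longleftrightarrow \quad
    n_{\mtx{\Omega}} \geq \frac{4 \log(\delta^{-1})}{\log(\gamma)}.
\end{equation*}
```

For instance, the choice $\gamma = e$ would require $n_{\mtx{\Omega}} \geq 4 \log(\delta^{-1})$, which is consistent with the $\log(\delta^{-1})$ term in the stated sample size.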
365
365
366
366
%\todo{Proof idea: structural bound, then higher order moment bound to apply Markov's inequality.}
@@ -380,7 +380,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
380
380
\end{equation*}
381
381
we set
382
382
$\mtx{\Omega}_1 := \mtx{U}_1^{\top} \mtx{\Omega} \in\mathbb{R}^{k \times n_{\mtx{\Omega}}}$ and $\mtx{\Omega}_2 := \mtx{U}_2^{\top} \mtx{\Omega} \in\mathbb{R}^{(n - k) \times n_{\mtx{\Omega}}}$, which are independent Gaussian random matrices.
383
-
Applying Theorem B.1 from~\cite{persson-2023-randomized-lowrank} for $f(x) = x$, see also \textcolor{green}{proof of \cite[Corollary 8.2]{tropp-2023-randomized-algorithms}}, yields the bound
383
+
Applying Theorem B.1 from~\cite{persson-2023-randomized-lowrank} for $f(x) = x$, see also the proof of \cite[Corollary 8.2]{tropp-2023-randomized-algorithms}, yields the bound
where we used \textcolor{green}{$(1 + m)^{\sfrac{1}{m}} \leq e$ and}$(m!)^{-\sfrac{1}{m}} \leq e/m$. Note that, in contrast
425
+
where we used $(1 + m)^{\sfrac{1}{m}} \leq e$ and $(m!)^{-\sfrac{1}{m}} \leq e/m$. Note that, in contrast
426
426
to the result of~\cite[Lemma B.3]{tropp-2023-randomized-algorithms}, this inequality is valid for arbitrarily large $n_{\mtx{\Omega}}$, at the expense of a slightly larger constant.
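Both elementary inequalities used above follow from the exponential series. A short derivation for integers $m \geq 1$, included as a reader's note and not taken from the original proof, could read:

```latex
% Each left-hand side is a single term (or two terms) of the series for e^m.
\begin{align*}
    1 + m \leq \sum_{k=0}^{\infty} \frac{m^k}{k!} = e^m
        \quad &\Longrightarrow \quad (1 + m)^{\sfrac{1}{m}} \leq e, \\
    \frac{m^m}{m!} \leq \sum_{k=0}^{\infty} \frac{m^k}{k!} = e^m
        \quad &\Longrightarrow \quad (m!)^{-\sfrac{1}{m}} \leq \frac{e}{m}.
\end{align*}
```

The second line rearranges $m! \geq (m/e)^m$ and then takes the $m$-th root.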
427
427
%
428
428
% To match the decay rate of the moments of the first term in \refequ{equ:nystrom-proof-persson-bonud}, we will need to ensure that this term also is of order $\mathcal{O}(\Trace(\mtx{B}) / \sqrt{k})$. We bound the moments of $\lVert \mtx{\Omega}_1^{\dagger} \rVert _2$ similarly to \cite[Lemma B.3]{tropp-2023-randomized-algorithms}, but without restricting $q$ to be smaller than $18$. The explicit integration of \cite[Equation B.7]{tropp-2023-randomized-algorithms} imposes the condition $n_{\mtx{\Omega}} - k - 2q \geq 0$. Both $n_{\mtx{\Omega}}$ and $k$ are integers and must be of the same order to ensure a decay of $\mathcal{O}(\Trace(\mtx{B}) / \sqrt{k})$. To avoid restricting the choice of $n_{\mtx{\Omega}}$ too much, we let it be even and set $k = n_{\mtx{\Omega}}/2$. The moment $q$ should be chosen as large as possible to ensure a fast decay of the failure probability, so we fix it to $q = n_{\mtx{\Omega}}/4$. Therefore, we get
@@ -443,7 +443,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
443
443
The second factor in~\refequ{equ:nystrom-proof-processed-tail} is bounded using \reflem{lem:spectral-norm-moment} with $\mtx{A} = \mtx{\Lambda}_2^{\sfrac{1}{2}}$: % and $p = n_{\mtx{\Omega}}/2$:
%The $q$-th moment of the second term can be processed with standard matrix norm manipulations and the stochastic independence of $\mtx{\Omega}_1$ and $\mtx{\Omega}_2$ to
@@ -478,19 +478,19 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
478
478
Inserting this inequality and~\refequ{equ:pinv-spectral-norm-bound} into~\refequ{equ:nystrom-proof-processed-tail} gives
Inserting~\refequ{equ:nystrom-proof-tail-bound} along with \refequ{equ:nystrom-proof-frobenius-trace} in \refequ{equ:nystrom-proof-persson-bonud}\textcolor{green}{, letting $c=154$,} and using the triangle inequality for $\mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}[\cdot]$, we obtain
493
+
Inserting~\refequ{equ:nystrom-proof-tail-bound} along with \refequ{equ:nystrom-proof-frobenius-trace} in \refequ{equ:nystrom-proof-persson-bonud}, letting $c=154$, and using the triangle inequality for $\mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}[\cdot]$, we obtain
paper/intro.tex (1 addition, 1 deletion)
@@ -12,7 +12,7 @@ \section{Introduction}
12
12
13
13
For a \emph{constant} matrix $\mtx{B}$, one of the most popular trace estimators is the Girard-Hutchinson estimator \cite{girard-1989-fast-montecarlo, hutchinson-1990-stochastic-estimator} along with variance reduction techniques~\cite{gambhir-2017-deflation-method, saibaba-2017-randomized-matrixfree, lin-2017-randomized-estimation, meyer-2021-hutch-optimal, persson-2022-improved-variants, chen-2023-krylovaware-stochastic, epperly-2024-xtrace-making}. Suitable extensions to parameter-dependent matrices have been considered, e.g., in~\cite{lin-2017-randomized-estimation,chen-2023-krylovaware-stochastic}, but we are not aware of an analysis providing rigorous justification of and insight into these extensions. In passing, we note that dynamic trace estimation~\cite{dharangutte-2024-dynamic-trace,woodruff-2024-optimal-query} is an efficient technique for successively estimating the traces of matrices $\mtx{B}(t_1), \dots, \mtx{B}(t_m)$ when the increments $\mtx{B}(t_{i+1}) - \mtx{B}(t_i)$ are relatively small in norm. The potential of dynamic trace estimation appears to be limited in our setting because $\mtx{B}(t)$ may change rapidly close to eigenvalues, with $g_{\sigma}$ approximating a Dirac delta function.
14
14
15
-
All methods considered in this work are based on the following simple idea: Apply an existing randomized trace estimator to $\Trace(\mtx{B}(t))$ with \emph{constant} random vectors, that is, the same randomization is used for each value of the parameter $t$. \textcolor{green}{For example, the Girard-Hutchinson estimator becomes $n_{\mtx{\Psi}}^{-1} \sum_{j=1}^{n_{\mtx{\Psi}}} \vct{\psi}_j^{\top} \mtx{B}(t) \vct{\psi}_j$ for $n_{\mtx{\Psi}}$ constant Gaussian random vectors $\vct{\psi}_1, \dots, \vct{\psi}_{n_{\mtx{\Psi}}}$.}
15
+
All methods considered in this work are based on the following simple idea: Apply an existing randomized trace estimator to $\Trace(\mtx{B}(t))$ with \emph{constant} random vectors, that is, the same randomization is used for each value of the parameter $t$. For example, the Girard-Hutchinson estimator becomes $n_{\mtx{\Psi}}^{-1} \sum_{j=1}^{n_{\mtx{\Psi}}} \vct{\psi}_j^{\top} \mtx{B}(t) \vct{\psi}_j$ for $n_{\mtx{\Psi}}$ constant Gaussian random vectors $\vct{\psi}_1, \dots, \vct{\psi}_{n_{\mtx{\Psi}}}$.
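The idea of reusing the same random vectors for every parameter value can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the matrix function `B(t) = A + t^2 I`, the helper names `draw_probes` and `girard_hutchinson`, and the sample sizes are hypothetical and not taken from the paper or its code.

```python
import random


def draw_probes(n, n_psi, seed=0):
    """Draw n_psi Gaussian probe vectors of length n once; they are reused for every t."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_psi)]


def B(t):
    """Hypothetical parameter-dependent matrix B(t) = A + t^2 I (illustration only)."""
    A = [[2.0, 1.0, 0.0, 0.0],
         [1.0, 2.0, 1.0, 0.0],
         [0.0, 1.0, 2.0, 1.0],
         [0.0, 0.0, 1.0, 2.0]]
    return [[A[i][j] + (t * t if i == j else 0.0) for j in range(4)]
            for i in range(4)]


def girard_hutchinson(matrix, probes):
    """Average of the quadratic forms psi^T B psi over the given probe vectors."""
    total = 0.0
    for psi in probes:
        # Matrix-vector product B psi, then the inner product with psi.
        B_psi = [sum(row[j] * psi[j] for j in range(len(psi))) for row in matrix]
        total += sum(p * q for p, q in zip(psi, B_psi))
    return total / len(probes)


# The probes are drawn once and shared by all parameter values t.
probes = draw_probes(n=4, n_psi=2000)
for t in (0.0, 0.5, 1.0):
    estimate = girard_hutchinson(B(t), probes)
    exact = sum(B(t)[i][i] for i in range(4))
    print(f"t={t:.1f}  estimate={estimate:.3f}  exact={exact:.3f}")
```

Because the probes are fixed, the estimate is a deterministic function of $t$ once the vectors are drawn, and it inherits the linear structure of $\mtx{B}(t)$: in this toy example, the estimates at two parameter values differ exactly by the change in $t^2$ times the average squared norm of the probes.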