remove green color

FMatti · FMatti · commit 191f7fb0bbbd · 2025-02-18T19:04:01.000+01:00
diff --git a/paper/analysis.tex b/paper/analysis.tex
@@ -75,7 +75,7 @@ \subsection{Methods}
 
 For constant $\mtx{\Omega}$, the estimator~\refequ{equ:nystrompp-trace-estimator} coincides with the Nyström++ estimator from~\cite{persson-2022-improved-variants}, which is based on the Hutch++ estimator~\cite{meyer-2021-hutch-optimal}. In this situation (constant $\mtx{B}$), these estimators were both shown to achieve a relative $\varepsilon$-error with $\mathcal{O}(\varepsilon^{-1})$ matrix-vector products only, independent of the singular value decay of $\mtx{B}$.
 
-\textcolor{green}{At this point, we acknowledge the existence of the XNysTrace estimator from \cite{epperly-2024-xtrace-making}, which oftentimes outperforms the Nyström++ estimator. Moreover, a straightforward extension of the XNysTrace estimator to the parameter-dependent setting seems to be within reach. However, what distinguishes \refalg{alg:nystrom-chebyshev-pp} for estimating spectral densities are two key observations --- using the cyclic invariance of the trace and the affine linear form of Chebyshev expansions (see \refsec{subsubsec:chebyshev-nystrom-implementation} for details) --- which it exploits to significantly speed up the computation. Unfortunately, we do not see a way of marrying these observations with the efficient implementation of XNysTrace described in \cite[Section 2.2]{epperly-2024-xtrace-making}, which limits its suitability for spectral density estimation.} \textcolor{green}{??? Maybe this would be more appropriate in the introduction? ???}
+At this point, we acknowledge the existence of the XNysTrace estimator from \cite{epperly-2024-xtrace-making}, which oftentimes outperforms the Nyström++ estimator. Moreover, a straightforward extension of the XNysTrace estimator to the parameter-dependent setting seems to be within reach. However, what distinguishes \refalg{alg:nystrom-chebyshev-pp} for estimating spectral densities are two key observations --- using the cyclic invariance of the trace and the affine linear form of Chebyshev expansions (see \refsec{subsubsec:chebyshev-nystrom-implementation} for details) --- which it exploits to significantly speed up the computation. Unfortunately, we do not see a way of marrying these observations with the efficient implementation of XNysTrace described in \cite[Section 2.2]{epperly-2024-xtrace-making}, which limits its suitability for spectral density estimation.
 %We can interpret this estimator as an interpolation between the trace of the Nyström approximation and the Girard-Hutchinson estimator.
 
 %In the remainder of this section, we will derive upper bounds on the error of these estimators.
@@ -360,7 +360,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
         \label{qu:nystrompp-theorem-bound}
     \end{equation}
         holds with probability at least $1 - \gamma^{-n_{\mtx{\Omega}} / 4}$ for 
-        $c = \textcolor{green}{154}$. In particular, given $\varepsilon > 0$ and $\delta \in (0, 1)$, the bound $\int_{a}^{b} \lVert \mtx{B}(t) - \Nystr{\mtx{\Omega}}{\mtx{B}}(t) \rVert _F~\mathrm{d}t \leq \varepsilon \int_{a}^{b} \Trace(\mtx{B}(t))~\mathrm{d}t$ holds with probability at least $1-\delta$ if $n_{\mtx{\Omega}} = \mathcal{O}(\varepsilon^{-2} + \log(\delta^{-1}))$.
+        $c = 154$. In particular, given $\varepsilon > 0$ and $\delta \in (0, 1)$, the bound $\int_{a}^{b} \lVert \mtx{B}(t) - \Nystr{\mtx{\Omega}}{\mtx{B}}(t) \rVert _F~\mathrm{d}t \leq \varepsilon \int_{a}^{b} \Trace(\mtx{B}(t))~\mathrm{d}t$ holds with probability at least $1-\delta$ if $n_{\mtx{\Omega}} = \mathcal{O}(\varepsilon^{-2} + \log(\delta^{-1}))$.
 \end{lemma}
 
 %\todo{Proof idea: structural bound, then higher order moment bound to apply Markov's inequality.}
@@ -380,7 +380,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
     \end{equation*}
     we set
     $\mtx{\Omega}_1 := \mtx{U}_1^{\top} \mtx{\Omega} \in \mathbb{R}^{k \times n_{\mtx{\Omega}}}$ and $\mtx{\Omega}_2 := \mtx{U}_2^{\top} \mtx{\Omega} \in \mathbb{R}^{(n - k) \times n_{\mtx{\Omega}}}$, which are independent Gaussian random matrices.
-Applying Theorem B.1 from~\cite{persson-2023-randomized-lowrank} for $f(x) = x$, see also \textcolor{green}{the proof of \cite[Corollary 8.2]{tropp-2023-randomized-algorithms}}, yields the bound
+Applying Theorem B.1 from~\cite{persson-2023-randomized-lowrank} for $f(x) = x$, see also the proof of \cite[Corollary 8.2]{tropp-2023-randomized-algorithms}, yields the bound
     \begin{equation}
         \lVert \mtx{B}(t) - \Nystr{\mtx{\Omega}}{\mtx{B}}(t) \rVert _F 
         \leq  \lVert \mtx{\Lambda}_2 \rVert _F + \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \mtx{\Omega}_2 \mtx{\Omega}_1^{\dagger} \rVert _{(4)}^2,
@@ -410,7 +410,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
     \[
      \mathbb{E}\left[ \big\| ( \mtx{\Omega}_1 \mtx{\Omega}_1^{\top} )^{-1} \big\|_2^{\sfrac{n_{\mtx{\Omega}}}{4}} \right]%  \notag \\
         \leq
-        \textcolor{green}{\left(1 + \frac{n_{\mtx{\Omega}}}{2}\right)}
+        \left(1 + \frac{n_{\mtx{\Omega}}}{2}\right)
         \left( \frac{3}{4} n_{\mtx{\Omega}}\right)^{\sfrac{n_{\mtx{\Omega}}}{4}}
         \big( ( n_{\mtx{\Omega}} / 2 + 1)!\big)^{-\frac{n_{\mtx{\Omega}}}{2+n_{\mtx{\Omega}}}}.
     \]
@@ -422,7 +422,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
         \frac{e^4 n_{\mtx{\Omega}}}{(n_{\mtx{\Omega}} / 2+ 1)^2} \le 
         \frac{3}{4}  \frac{e^4}{n_{\mtx{\Omega}}},
     \end{equation}
-    where we used \textcolor{green}{$(1 + m)^{\sfrac{1}{m}} \leq e$ and} $(m!)^{-\sfrac{1}{m}} \leq e/m$. Note that, in contrast 
+    where we used $(1 + m)^{\sfrac{1}{m}} \leq e$ and $(m!)^{-\sfrac{1}{m}} \leq e/m$. Note that, in contrast 
     to the result of~\cite[Lemma B.3]{tropp-2023-randomized-algorithms}, this inequality is valid for arbitrarily large $n_{\mtx{\Omega}}$, at the expense of a slightly larger constant.
 %
 %     To match the decay rate of the moments of the first term in \refequ{equ:nystrom-proof-persson-bonud}, we will need to ensure that this term also is of order $\mathcal{O}(\Trace(\mtx{B}) / \sqrt{k})$. We bound the moments of $\lVert \mtx{\Omega}_1^{\dagger} \rVert _2$ similarly to \cite[Lemma B.3]{tropp-2023-randomized-algorithms}, but without restricting $q$ to be smaller than $18$. The explicit integration of \cite[Equation B.7]{tropp-2023-randomized-algorithms} imposes the condition $n_{\mtx{\Omega}} - k - 2q \geq 0$. Both $n_{\mtx{\Omega}}$ and $k$ are integers and must be of the same order to ensure a decay of $\mathcal{O}(\Trace(\mtx{B}) / \sqrt{k})$. To avoid restricting the choice of $n_{\mtx{\Omega}}$ too much, we let it be even and set $k = n_{\mtx{\Omega}}/2$. The moment $q$ should be chosen as large as possible to ensure a fast decay of the failure probability, so we fix it to $q = n_{\mtx{\Omega}}/4$. Therefore, we get
@@ -443,7 +443,7 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
     The second factor in~\refequ{equ:nystrom-proof-processed-tail} is bounded using \reflem{lem:spectral-norm-moment} with $\mtx{A} = \mtx{\Lambda}_2^{\sfrac{1}{2}}$: % and $p = n_{\mtx{\Omega}}/2$:
     \[
         \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}\left[ \big\| \mtx{\Lambda}_2^{\sfrac{1}{2}} \mtx{\Omega}_2 \big\|_2^2 \right]
-        \leq \textcolor{green}{\frac{5}{4}} n_{\mtx{\Omega}} \Big( \textcolor{green}{2} \big\| \mtx{\Lambda}_2^{\sfrac{1}{2}} \big\|_2^2 + \frac{1}{n_{\mtx{\Omega}}} \big\| \mtx{\Lambda}_2^{\sfrac{1}{2}} \big\|_F^2 \Big).
+        \leq \frac{5}{4} n_{\mtx{\Omega}} \Big( 2 \big\| \mtx{\Lambda}_2^{\sfrac{1}{2}} \big\|_2^2 + \frac{1}{n_{\mtx{\Omega}}} \big\| \mtx{\Lambda}_2^{\sfrac{1}{2}} \big\|_F^2 \Big).
       %  \label{equ:spectral-norm-bound-applied}
     \]
     %The $q$-th moment of the second term can be processed with standard matrix norm manipulations and the stochastic independence of $\mtx{\Omega}_1$ and $\mtx{\Omega}_2$ to
@@ -478,19 +478,19 @@ \subsection{Nyström++ estimator for parameter-dependent matrices}
     Inserting this inequality and~\refequ{equ:pinv-spectral-norm-bound} into~\refequ{equ:nystrom-proof-processed-tail} gives
     \begin{equation}
         \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}\left[ \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \mtx{\Omega}_2 \mtx{\Omega}_1^{\dagger} \rVert _{(4)}^2 \right]
-        \leq \textcolor{green}{\frac{15 e^4}{16}}  \sqrt{n_{\mtx{\Omega}}} \Big( \textcolor{green}{2} \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _2^2 + \frac{1}{n_{\mtx{\Omega}}} \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _F^2 \Big).
+        \leq \frac{15 e^4}{16}  \sqrt{n_{\mtx{\Omega}}} \Big( 2 \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _2^2 + \frac{1}{n_{\mtx{\Omega}}} \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _F^2 \Big).
         %\leq \sqrt{k} \frac{e^4}{2} \frac{(k + n_{\mtx{\Omega}})(2p + n_{\mtx{\Omega}})}{(n_{\mtx{\Omega}} - k + 1)^2}  \left( 3 \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _2^2 + \frac{1}{n_{\mtx{\Omega}}} \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _F^2 \right).
     \end{equation}
     Bounding $\lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _2^2 = \lambda_{k+1}  \leq \Trace(\mtx{B})/k$ and $\lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _F^2 = \Trace(\mtx{\Lambda}_2) \le \Trace(\mtx{B})$ (recall that $k = n_{\mtx{\Omega}}/2$) yields
     \begin{equation}
         \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}\left[ \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \mtx{\Omega}_2 \mtx{\Omega}_1^{\dagger} \rVert _{(4)}^2 \right]
-        \leq \textcolor{green}{\frac{15 e^4}{16}}  \sqrt{n_{\mtx{\Omega}}} \Big( \frac{2}{n_{\mtx{\Omega}}} \Trace(\mtx{B}) + \frac{1}{n_{\mtx{\Omega}}} \Trace(\mtx{B}) \Big)
-        \leq  \frac{\textcolor{green}{154}}{\sqrt{n_{\mtx{\Omega}}}} \Trace(\mtx{B}).
+        \leq \frac{15 e^4}{16}  \sqrt{n_{\mtx{\Omega}}} \Big( \frac{2}{n_{\mtx{\Omega}}} \Trace(\mtx{B}) + \frac{1}{n_{\mtx{\Omega}}} \Trace(\mtx{B}) \Big)
+        \leq  \frac{154}{\sqrt{n_{\mtx{\Omega}}}} \Trace(\mtx{B}).
         \label{equ:nystrom-proof-tail-bound}
         %\leq \sqrt{k} \frac{e^4}{2} \frac{(k + n_{\mtx{\Omega}})(2p + n_{\mtx{\Omega}})}{(n_{\mtx{\Omega}} - k + 1)^2}  \left( 3 \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _2^2 + \frac{1}{n_{\mtx{\Omega}}} \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \rVert _F^2 \right).
     \end{equation}
     
-    Inserting~\refequ{equ:nystrom-proof-tail-bound} along with \refequ{equ:nystrom-proof-frobenius-trace} in \refequ{equ:nystrom-proof-persson-bonud}\textcolor{green}{, letting $c=154$,} and using the triangle inequality for $\mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}[\cdot]$, we obtain
+    Inserting~\refequ{equ:nystrom-proof-tail-bound} along with \refequ{equ:nystrom-proof-frobenius-trace} in \refequ{equ:nystrom-proof-persson-bonud}, letting $c=154$, and using the triangle inequality for $\mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}}[\cdot]$, we obtain
     \begin{equation} \label{eq:blubber}
         \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}} \left[\lVert \mtx{B} - \Nystr{\mtx{\Omega}}{\mtx{B}} \rVert _F \right]
         \leq \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}} \left[ \lVert \mtx{\Lambda}_2 \rVert _F \right] + \mathbb{E}^{\sfrac{n_{\mtx{\Omega}}}{4}} \left[ \lVert \mtx{\Lambda}_2^{\sfrac{1}{2}} \mtx{\Omega}_2 \mtx{\Omega}_1^{\dagger} \rVert _{(4)}^2 \right]
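
The numerical constants in the hunks above can be sanity-checked with a few lines of Python. This is only a sketch: it checks the arithmetic of the last displayed step, $\frac{15 e^4}{16} \cdot (2 + 1) \leq 154$, and the two elementary inequalities $(1 + m)^{1/m} \leq e$ and $(m!)^{-1/m} \leq e/m$ for a range of integers $m$. It does not verify the probabilistic argument itself. The names `tail_constant` and `elementary_inequalities_hold` are hypothetical helpers introduced here for illustration.

```python
import math

# Final arithmetic step of the tail bound as displayed in the diff:
# (15 e^4 / 16) * (2 + 1) should not exceed c = 154.
tail_constant = 15 * math.exp(4) / 16 * 3


def elementary_inequalities_hold(m: int) -> bool:
    """Check (1 + m)^(1/m) <= e and (m!)^(-1/m) <= e/m for an integer m >= 1."""
    first = (1 + m) ** (1 / m)
    # (m!)^(-1/m) computed via the log-gamma function to avoid overflow for large m.
    second = math.exp(-math.lgamma(m + 1) / m)
    return first <= math.e and second <= math.e / m


if __name__ == "__main__":
    print(f"(15 e^4 / 16) * 3 = {tail_constant:.3f}")
    print(all(elementary_inequalities_hold(m) for m in range(1, 10_001)))
```
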
diff --git a/paper/intro.tex b/paper/intro.tex
@@ -12,7 +12,7 @@ \section{Introduction}
 
 For a \emph{constant} matrix $\mtx{B}$, one of the most popular trace estimators is the Girard-Hutchinson estimator \cite{girard-1989-fast-montecarlo, hutchinson-1990-stochastic-estimator} along with variance reduction techniques~\cite{gambhir-2017-deflation-method, saibaba-2017-randomized-matrixfree, lin-2017-randomized-estimation, meyer-2021-hutch-optimal, persson-2022-improved-variants, chen-2023-krylovaware-stochastic, epperly-2024-xtrace-making}. Suitable extensions to parameter-dependent matrices have been considered, e.g., in~\cite{lin-2017-randomized-estimation,chen-2023-krylovaware-stochastic}, but we are not aware of an analysis providing rigorous justification and insight of these extensions. In passing, we note that dynamic trace estimation~\cite{dharangutte-2024-dynamic-trace,woodruff-2024-optimal-query} is an efficient technique for subsequently estimating the traces of matrices $\mtx{B}(t_1), \dots, \mtx{B}(t_m)$ when the increments $\mtx{B}(t_{i+1}) - \mtx{B}(t_i)$ are relatively small in norm. The potential of dynamic trace estimation appears to be limited in our setting because $\mtx{B}(t)$ may change rapidly close to eigenvalues, with $g_{\sigma}$ approximating a Dirac delta function.
 
-All methods considered in this work are based on the following simple idea: Apply an existing randomized trace estimator to $\Trace(\mtx{B}(t))$ with \emph{constant} random vectors, that is, the same randomization is used for each value of the parameter $t$. \textcolor{green}{For example, the Girard-Hutchinson estimator becomes $n_{\mtx{\Psi}}^{-1} \sum_{j=1}^{n_{\mtx{\Psi}}} \vct{\psi}_j^{\top} \mtx{B}(t) \vct{\psi}_j$ for $n_{\mtx{\Psi}}$ constant Gaussian random vectors $\vct{\psi}_1, \dots, \vct{\psi}_{n_{\mtx{\Psi}}}$.}
+All methods considered in this work are based on the following simple idea: Apply an existing randomized trace estimator to $\Trace(\mtx{B}(t))$ with \emph{constant} random vectors, that is, the same randomization is used for each value of the parameter $t$. For example, the Girard-Hutchinson estimator becomes $n_{\mtx{\Psi}}^{-1} \sum_{j=1}^{n_{\mtx{\Psi}}} \vct{\psi}_j^{\top} \mtx{B}(t) \vct{\psi}_j$ for $n_{\mtx{\Psi}}$ constant Gaussian random vectors $\vct{\psi}_1, \dots, \vct{\psi}_{n_{\mtx{\Psi}}}$.
 ??? STOP HERE ???
 
 
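
The idea stated in the introduction hunk above, reusing the same random vectors for every parameter value, can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function name `girard_hutchinson`, the callable interface `B(t)` returning a dense matrix, and the argument names are assumptions made here.

```python
import numpy as np


def girard_hutchinson(B, ts, n_psi, rng):
    """Estimate Trace(B(t)) for each t in ts using one fixed set of Gaussian vectors.

    B     : callable mapping a parameter t to an (n, n) array (hypothetical interface)
    ts    : sequence of parameter values
    n_psi : number of Gaussian random vectors
    rng   : numpy.random.Generator
    """
    n = B(ts[0]).shape[0]
    # Draw the random vectors once; they stay constant across all values of t.
    Psi = rng.standard_normal((n, n_psi))
    # sum_j psi_j^T B(t) psi_j is the entrywise sum of Psi * (B(t) @ Psi).
    return np.array([np.sum(Psi * (B(t) @ Psi)) / n_psi for t in ts])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(np.arange(1.0, 51.0))
    estimates = girard_hutchinson(lambda t: t * A, [1.0, 2.0, 3.0], 2000, rng)
    print(estimates, np.trace(A))
```

Because the randomization is shared, the estimates inherit the parameter dependence of $\mtx{B}(t)$: in the toy example with $\mtx{B}(t) = t \mtx{A}$, the estimate at $t = 2$ is exactly twice the estimate at $t = 1$.
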
diff --git a/paper/paper.tex b/paper/paper.tex
@@ -40,7 +40,7 @@
 \textcolor{red}{TODO for Fabio:
 \begin{itemize}
 %\item Add author list. Matti first, rest (He/Kressner/Lam) in alphabetical order.
- \item The text style is a bit too generous but no need to change this now. \textcolor{green}{Ok! Let's talk about it at some point.}
+ \item The text style is a bit too generous but no need to change this now.
 \end{itemize}
 }
 \color{blue}
