Daniel LeJeune [lɜ 'ʒɜ˞n]

One of the cool things about math is that you can arrive at the same conclusion in different ways. Not only does this provide you with many ways to reach the conclusion yourself and thereby increase your odds of success, but it also deepens your understanding of and connection with mathematics once you discover the other ways to prove something. I want to share how I discovered one particular result connecting the principal eigenspace of a matrix with contour integration of the resolvent.

The statement and the “right” way

The resolvent formalism is a very powerful tool in operator theory, very frequently used to characterize the spectrum of a matrix in random matrix theory, which is where I encountered it. Given a matrix $A$, the resolvent is the matrix $(A - zI)^{-1}$ for $z \in \mathbb{C}$. The statement that I discovered is the following.

Proposition. Let $A$ be a normal matrix and let $\gamma_\Lambda$ be a closed curve in $\mathbb{C}$, traversed counterclockwise, encircling the set of eigenvalues $\Lambda$ of $A$ and no other eigenvalues. Then $U_\Lambda = -\frac{1}{2 \pi i}\oint_{\gamma_\Lambda} (A - z I)^{-1} dz$ is the projection operator onto the direct sum of the eigenspaces of the eigenvalues $\Lambda$.

For example, take $\Lambda$ to be the top $k$ eigenvalues of a Gram matrix. Then $U_\Lambda$ is the PCA projection matrix, which can thus be represented as a contour integral, no explicit eigenvalue decomposition required. Pretty interesting that something as practical as PCA, which I usually think about computationally, can be studied using complex analysis!
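If you want to convince yourself numerically, here’s a minimal NumPy sketch (all names are mine, purely for illustration) that discretizes the contour integral with the trapezoidal rule and compares it against the top-$k$ eigenprojection of a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# A random symmetric (hence normal) matrix.
X = rng.standard_normal((n, n))
A = (X + X.T) / 2

# Reference: the projection onto the span of the top-k eigenvectors.
evals, evecs = np.linalg.eigh(A)  # eigenvalues in ascending order
P_ref = evecs[:, -k:] @ evecs[:, -k:].T

# A circle enclosing exactly the top-k (real) eigenvalues: centered on
# them, with the radius reaching halfway into the spectral gap.
lo, hi, gap = evals[-k], evals[-1], evals[-k] - evals[-k - 1]
center, radius = (lo + hi) / 2, (hi - lo) / 2 + gap / 2

# U = -1/(2*pi*i) * \oint (A - zI)^{-1} dz, discretized with the
# trapezoidal rule (very accurate for smooth periodic integrands).
m = 200
U = np.zeros((n, n), dtype=complex)
for theta in 2 * np.pi * np.arange(m) / m:
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / m)
    U -= np.linalg.inv(A - z * np.eye(n)) * dz / (2j * np.pi)

print(np.max(np.abs(U - P_ref)))  # tiny: the integral recovers P_ref
```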

Now the “right” way to prove this, which is pretty obvious to someone with a complex analysis background, is to use Cauchy’s residue theorem, which tells us that for any matrix $B$,

$$ \mathrm{tr}[U_\Lambda B] = \sum_{\lambda \in \Lambda} \lim_{z \to \lambda} (\lambda - z) \, \mathrm{tr}[(A - z I)^{-1} B]. $$

From there, each limit on the right-hand side pretty clearly evaluates to $\mathrm{tr}[U_\lambda B]$ for the corresponding eigenspace projection operator $U_\lambda$, and since $A$ is normal its eigenspaces are orthogonal, so the sum equals the left-hand side. Since $B$ was arbitrary, we have exactly described $U_\Lambda$.
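Here’s a quick numerical illustration of that limit (again just a sketch with made-up names), approaching an eigenvalue $\lambda$ from off the real axis:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                  # a normal (symmetric) test matrix
B = rng.standard_normal((n, n))    # an arbitrary B

evals, evecs = np.linalg.eigh(A)
lam, u = evals[-1], evecs[:, -1:]  # top eigenvalue and its eigenvector
target = np.trace(u @ u.T @ B)     # tr[U_lam B]

for eps in [1e-2, 1e-4, 1e-6]:
    z = lam + 1j * eps             # approach lam off the real axis
    val = (lam - z) * np.trace(np.linalg.inv(A - z * np.eye(n)) @ B)
    print(eps, abs(val - target))  # error shrinks like O(eps)
```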

That’s the “right” way to prove this, and the “right” way to learn this would have been to crack open a text on operator theory, as this is a pretty classic formulation.

The longer but fun way

I, however, discovered the above result a different way. I had become familiar with the resolvent as a powerful tool used in random matrix theory and wanted to study PCA, so I was trying to figure out if there was a way that I could express PCA in terms of resolvents. I decided to target a function

$$ f(\lambda) = \begin{cases} 1 & \colon & \lambda > \lambda_0 \\ 0 & \colon & \lambda < \lambda_0 \end{cases}. $$

If I could apply $f$ to $A$ (in the operator sense, where $f$ is applied to each eigenvalue), then I would recover the PCA projection operator for the eigenvalues greater than $\lambda_0$. That is, for $\Lambda = \{\lambda \colon \lambda > \lambda_0\}$, $U_\Lambda = f(A)$.
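In code, applying $f$ to each eigenvalue is just a thresholded eigendecomposition; here’s a minimal sketch (the function name is mine) of the thing we’re trying to re-express with resolvents:

```python
import numpy as np

def pca_projector(A, lam0):
    """Apply f to symmetric A: keep eigenspaces with eigenvalue > lam0."""
    evals, evecs = np.linalg.eigh(A)
    V = evecs[:, evals > lam0]  # eigenvectors for Lambda = {lam > lam0}
    return V @ V.T              # U_Lambda = f(A)
```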

For reasons that will be clearer in a bit, it’s actually a bit easier to study $g(\lambda) = 1 - f(\lambda)$, so I’ll make that switch here.

These functions $f$ and $g$ of course don’t really make things any clearer, but we could take a sequence of functions $g_1, g_2, \ldots$ converging to $g$ that might help. One such sequence is

$$ g_k(\lambda) = \frac{\lambda_0^k}{\lambda^k + \lambda_0^k}. $$

We still haven’t really gained anything immediately. We can technically apply this function at the operator level, since it’s a rational function, and get $I - U_\Lambda = g(A) = \lim_{k \to \infty} \lambda_0^k (A^k + \lambda_0^k I)^{-1}$, which is sort of resolvent-like, but it’s a resolvent of a power of $A$ rather than of $A$ itself. This form, if reimagined as a scalar rational function, does bring back memories of using partial fraction decompositions to solve college integral calculus problems, which suggests a way to simplify things.
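Before simplifying, we can at least spot-check the pointwise convergence $g_k \to g$ on the scalar side (a throwaway sketch):

```python
lam0 = 0.5
g = lambda lam: float(lam < lam0)                  # the target step function
g_k = lambda lam, k: lam0**k / (lam**k + lam0**k)

for lam in [0.1, 0.4, 0.6, 2.0]:
    print(lam, [round(g_k(lam, k), 4) for k in (5, 20, 80)], g(lam))
```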

Partial fraction decomposition

The polynomial $\lambda^k + \lambda_0^k$ is nice because its roots are scaled $k$-th roots of $-1$:

$$ z_j = \lambda_0 e^{\frac{(2j + 1) \pi i}{k}} \text{ for } j \in [k]. $$

Here are a few of those visualized.

[Figure: the roots $z_j$ on the circle of radius $\lambda_0$ in the complex plane]

Since the roots are simple, Taylor’s theorem gives us a pretty simple form for the decomposition (note that this is where $g$ is simpler than $f$, as we can just pull out the constant numerator):

$$ g_k(\lambda) = \lambda_0^k \sum_{j=1}^k \frac{a_j}{\lambda - z_j} \text{ where } a_j = \frac{1}{\prod_{j' \neq j} (z_j - z_{j'})}. $$

This is starting to look great! That sum is very resolvent-like; we just need to evaluate these $a_j$ coefficients. We’ll use two facts to do that. First, $z_j^{k-1} = -\lambda_0^k / z_j$. The second is a bit less obvious (update: see an expanded note with better intuition):

$$ \prod_{j=1}^{k-1} \Big( 1 - e^{\frac{2 \pi i j}{k}} \Big) = k. $$

Proof. First, note that $x^k - 1 = x^k - x^{k-1} + x^{k-1} - \ldots - x + x - 1 = (x - 1) \sum_{j=0}^{k-1} x^j$ (telescoping sum). Next, note that the roots of $x^k - 1$, similarly to the roots of $\lambda^k + \lambda_0^k$ above but with a sign flip, are $e^{\frac{2 \pi i j}{k}}$ ($k$ distinct values). Therefore,

$$ x^k - 1 = \prod_{j=0}^{k-1} \Big( x - e^{\frac{2 \pi i j}{k}} \Big) = (x - 1) \sum_{j=0}^{k-1} x^j. $$

Finally, note that one of the factors in the product is $(x-1)$, which cancels out the same factor in front of the sum, and then we can take the limit as $x \to 1$ to obtain the result.
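This identity is also easy to spot-check numerically (a one-off sketch):

```python
import numpy as np

for k in [2, 3, 7, 16]:
    roots = np.exp(2j * np.pi * np.arange(1, k) / k)
    print(k, np.prod(1 - roots))  # ~ k (up to floating-point noise)
```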

Now we can compute the $a_j$, or to be a little cleaner, their reciprocals. Factoring out $z_j$ from each difference gives $z_j - z_{j'} = z_j \big( 1 - e^{\frac{2 \pi i (j' - j)}{k}} \big)$, and since $j' - j$ ranges over all $k - 1$ nonzero residues modulo $k$, we can reindex the product:

$$ \frac{1}{a_j} = z_j^{k-1} \prod_{j'=1}^{k-1} \Big( 1 - e^{\frac{2 \pi i j'}{k}} \Big) = z_j^{k-1} k = - \frac{\lambda_0^k k}{z_j}. $$

And thus our $g_k$ functions are really quite simple:

$$ g_k(\lambda) = - \frac{1}{k} \sum_{j=1}^k \frac{z_j}{\lambda - z_j}. $$

And we can even apply $g_k$ to $A$ directly in terms of the resolvents.

$$ g_k(A) = - \frac{1}{k} \sum_{j=1}^k z_j (A - z_j I)^{-1}. $$

To sanity check, let’s plot an example with $\lambda_0 = 0.5$.

[Figure: $g_k$ compared with its partial fraction decomposition for $\lambda_0 = 0.5$]
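Beyond eyeballing a plot, we can check the operator identity numerically; here’s a sketch (made-up matrix and names, not anyone’s official implementation) comparing the resolvent sum against the direct form for a moderate $k$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, lam0 = 5, 12, 0.5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([0.1, 0.2, 0.4, 0.9, 1.3]) @ Q.T  # known spectrum

# Direct form: lam0^k (A^k + lam0^k I)^{-1}.
direct = lam0**k * np.linalg.inv(
    np.linalg.matrix_power(A, k) + lam0**k * np.eye(n))

# Resolvent form: -(1/k) sum_j z_j (A - z_j I)^{-1}.
j = np.arange(1, k + 1)
zs = lam0 * np.exp((2 * j + 1) * np.pi * 1j / k)
resolvent_sum = sum(-z / k * np.linalg.inv(A - z * np.eye(n)) for z in zs)

print(np.max(np.abs(direct - resolvent_sum)))  # tiny: the forms agree
```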

Taking $k \to \infty$

We ultimately want the limiting function $g$ rather than $g_k$, since that is what corresponds to PCA. But since our sum is over points $z_j$ that become increasingly dense on the circle of radius $\lambda_0$, this suggests that our sum converges to a contour integral. What we are missing is the $dz$ term, which for the discrete sum corresponds to $\Delta_j = z_{j+1} - z_j$. Due to the nice form of the $z_j$, we obtain a simple expression with an even simpler approximation:

$$ \Delta_j = z_j \Big( e^{\frac{2 \pi i}{k}} - 1 \Big) \approx z_j \frac{2 \pi i}{k}. $$

This approximation is exact asymptotically, which means it corresponds to the limiting integral. We just need to make sure that we introduce it in our sum.

$$ g_k(\lambda) = - \frac{1}{k} \sum_{j=1}^k \frac{z_j}{\lambda - z_j} \frac{\Delta_j}{\Delta_j} \approx - \frac{1}{2 \pi i} \sum_{j=1}^k \frac{\Delta_j}{\lambda - z_j}. $$

And therefore, taking the limit, with $\gamma$ defined as the circle of radius $\lambda_0$ in the complex plane (traversed counterclockwise),

$$ g(\lambda) = -\frac{1}{2 \pi i} \oint_\gamma \frac{dz}{\lambda - z}. $$
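Numerically, this contour integral really does behave like the indicator of the inside of the circle; a quick sketch with the circle discretized just like the Riemann sum above:

```python
import numpy as np

lam0, m = 0.5, 400
theta = 2 * np.pi * np.arange(m) / m
z = lam0 * np.exp(1j * theta)    # gamma: the circle of radius lam0
dz = 1j * z * (2 * np.pi / m)    # z'(theta) dtheta

g = lambda lam: -np.sum(dz / (lam - z)) / (2j * np.pi)
for lam in [0.1, 0.3, 0.7, 2.0]:
    print(lam, g(lam))           # ~1 inside the circle, ~0 outside
```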

The extension from $\lambda$ to $A$ follows naturally as before. Of course, $g$ is actually the complement of $f$, so it actually gets us the projection operator onto the orthogonal complement of the principal eigenspace, which is the projection operator for the eigenvectors with eigenvalues below $\lambda_0$. From here one could add and subtract contour integrals for different choices of $\lambda_0$ to carve out any desired subset of the spectrum for a projection operator. Of course, basic properties of holomorphic functions allow you to combine these into the single contour integral in the original statement.
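For instance, here’s a closing sketch (all names mine, assuming a positive semidefinite matrix so the circles centered at the origin make sense) that carves out the eigenvalues in an interval $(\lambda_0, \lambda_1)$ by differencing the complement projectors for the two thresholds, each computed as a contour integral:

```python
import numpy as np

def complement_projector(A, lam0, m=400):
    """g(A) for PSD A: the projector onto eigenvalues below lam0, i.e.
    the contour integral of -(A - zI)^{-1} / (2*pi*i) over |z| = lam0."""
    n = A.shape[0]
    U = np.zeros((n, n), dtype=complex)
    for theta in 2 * np.pi * np.arange(m) / m:
        z = lam0 * np.exp(1j * theta)
        dz = 1j * z * (2 * np.pi / m)
        U -= np.linalg.inv(A - z * np.eye(n)) * dz / (2j * np.pi)
    return U.real

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([0.1, 0.3, 0.7, 0.9, 2.0]) @ Q.T

# Projector onto eigenvalues in (0.5, 1.5): subtract two contours.
band = complement_projector(A, 1.5) - complement_projector(A, 0.5)
print(np.max(np.abs(band - Q[:, 2:4] @ Q[:, 2:4].T)))  # tiny
```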