import numpy as np
from matplotlib import pyplot as plt
PCA and Contour Integration¶
Tools from complex analysis can be very useful in linear algebra, even for real positive semidefinite matrices whose eigenvalues and eigenvectors are also real. The Stieltjes transform commonly used in random matrix theory comes to mind. In this note, I'll describe contour integration and the interesting way that it can be used to extract subspaces corresponding to the eigenvalues of a matrix, such as in principal component analysis (PCA).
But first, let's consider the basic PCA problem. Suppose we have a positive semidefinite matrix $\mathbf{A} \in \mathbb{R}^{d \times d}$ with eigenvalue decomposition $\mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{U}^\top$. If this matrix is an estimate of the covariance of our data, then we can find a rank-$r$ projection described by $\mathbf{V} \in \mathbb{R}^{d \times r}$ by using a rank-$r$ truncation of $\mathbf{A}$, keeping only the largest $r$ eigenvalues. That is, we should choose $\mathbf{V} = \mathbf{U}_{:,1:r}$, the columns of $\mathbf{U}$ corresponding to the $r$ largest eigenvalues, as our low-dimensional projection operator; among all rank-$r$ projections, this one preserves the most variance in the data.
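To ground the discussion, here is a minimal sketch of this classical recipe. The matrix A below is a synthetic stand-in for a covariance estimate (my own construction, not from the derivation above), built with a deliberate spectral gap so we can reuse it for numerical checks later in the note.
rng = np.random.default_rng(0)
d, r = 6, 2
# A synthetic positive semidefinite test matrix with a clear spectral gap:
# two large eigenvalues (the principal subspace) and four small ones.
U_true, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U_true @ np.diag([0.1, 0.2, 0.3, 0.4, 2.0, 3.0]) @ U_true.T
# np.linalg.eigh returns eigenvalues in ascending order, so the top-r
# eigenvectors sit in the last r columns of U.
eigvals, U = np.linalg.eigh(A)
V = U[:, -r:]
P_top = V @ V.T  # the rank-r projection V V^T onto the principal subspace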
Continuous PCA?¶
The classical approach described above is very discrete and is straightforward to apply if you know the eigenvalue decomposition. However, perhaps we can approach this problem in a more "continuous" manner. Suppose we want to find the low-rank projection matrix $\mathbf{V} \mathbf{V}^\top$ for the principal subspace of $\mathbf{A}$ corresponding to all eigenvalues larger than $\lambda_0$. To make things slightly simpler, we can instead find the complement of this matrix, $\mathbf{Q} \mathbf{Q}^\top = \mathbf{I} - \mathbf{V} \mathbf{V}^\top$. We can obtain this matrix if we can apply the following transformation $g \colon \mathbb{R} \to \mathbb{R}$ to the eigenvalues $\lambda_1 \leq \ldots \leq \lambda_d$ of $\mathbf{A}$:
$$ g(\lambda) = \begin{cases} 1 & \text{if } \lambda < \lambda_0, \\ 0 & \text{if } \lambda > \lambda_0, \\ c_g & \text{if } \lambda = \lambda_0. \end{cases} $$Here we let $g(\lambda_0) = c_g$ since we don't really care what happens at $\lambda_0$. We can obtain this function as the limit as $k \to \infty$ of the following parameterized continuous function:
$$ g_k(\lambda) = \frac{\lambda_0^k}{\lambda^k + \lambda_0^k}. $$The nice thing about this form is that we can apply $g_k$ to $\mathbf{A}$ (considering a scalar function applied to a matrix to be the result of applying the scalar function to the eigenvalues of the matrix) using only matrix products, sums, and an inverse, such that
$$ \mathbf{Q} \mathbf{Q}^\top = \lim_{k \to \infty} g_k(\mathbf{A}) = \lim_{k \to \infty} \lambda_0^k (\mathbf{A}^k + \lambda_0^k \mathbf{I})^{-1}. $$This may not yet seem like anything useful, since the matrix needs to be raised to a large power and then inverted. This suggests finding a simplification of $g_k$, and a partial fraction decomposition offers just such a thing.
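Before moving on, here is a quick numerical sanity check of this limiting formula on the synthetic A defined above, taking $\lambda_0 = 1$ (which sits in its spectral gap) and a moderate $k$:
# g_k(A) = lamda_0^k (A^k + lamda_0^k I)^{-1} should approach I - V V^T.
lamda_0 = 1.0
k = 20
Q_approx = lamda_0 ** k * np.linalg.inv(
    np.linalg.matrix_power(A, k) + lamda_0 ** k * np.eye(d))
print(np.abs(Q_approx - (np.eye(d) - P_top)).max())  # small, ~1e-6 here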
Partial Fraction Decomposition¶
Since $x \mapsto x^k + \lambda_0^k$ has $k$ distinct simple roots $z_1, \ldots, z_k \in \mathbb{C}$, a partial fraction decomposition gives
$$ g_k(\lambda) = \lambda_0^k \sum_{j=1}^k \frac{a_j}{\lambda - z_j}, \text{ where } a_j = \frac{1}{\prod_{j' \neq j} (z_j - z_{j'})}. $$The roots $(z_j)_{j=1}^k$ are the solutions to $x^k = -\lambda_0^k$, which means they are equally spaced on the circle of radius $\lambda_0$ in the complex plane:
$$ z_j = \lambda_0 e^{\frac{(2j + 1)\pi i}{k}}. $$We can visualize this for a few different values of $k$ with $\lambda_0 = 1$.
# Axis limits shared by all root plots.
unit_range = (-1.3, 1.3)

def scatter_complex(x, ax=None, *args, **kwargs):
    """Scatter-plot complex numbers in the plane (Re on x, Im on y)."""
    if ax is None:
        ax = plt.gca()
    ax.scatter(np.real(x), np.imag(x), *args, **kwargs)

plt.figure(dpi=100, figsize=(7, 3))
for i, k in enumerate([3, 4, 5]):
    # Roots of x^k + 1, i.e. the z_j for lambda_0 = 1.
    zs = np.roots([1] + [0] * (k - 1) + [1])
    ax = plt.subplot(1, 3, i + 1)
    ax.add_patch(plt.Circle((0, 0), 1, ls='--', fill=False, zorder=-1))
    scatter_complex(zs, ax=ax)
    ax.set_xlim(unit_range)
    ax.set_ylim(unit_range)
    ax.set_aspect(1)
    plt.title('k = %d' % k)
    if i == 0:
        plt.ylabel('Im(z)')
    plt.xlabel('Re(z)')
plt.tight_layout()
plt.show()
Let us evaluate the coefficients $a_j$. The expression is simple given two facts. One is that $z_j^{k-1} = -\lambda_0^k / z_j$ by the definition of $z_j$. The second is that
$$ \prod_{j' = 1}^{k-1} \left(1 - e^{\frac{2 \pi i j'}{k}} \right) = k, $$which follows from combining the telescoping identity $x^k - 1 = (x - 1) \sum_{j=0}^{k-1} x^j$ with the factorization $x^k - 1 = \prod_{j=1}^k \left(x - e^{\frac{2 \pi i j}{k}} \right)$: cancel the common factor $x - 1$ (the $j = k$ term of the product) and plug in $x = 1$.
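This identity is easy to check numerically; for instance, with $k = 7$:
# prod_{j=1}^{k-1} (1 - e^{2 pi i j / k}) should equal k.
k = 7
print(np.prod(1 - np.exp(2j * np.pi * np.arange(1, k) / k)))  # ~ (7+0j)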
Then, writing $z_{j'} = z_j e^{\frac{2 \pi i (j' - j)}{k}}$ and factoring $z_j$ out of each factor of the product, we obtain
\begin{align*} a_j &= \frac{1}{z_j^{k-1}} \frac{1}{\prod_{j' = 1}^{k-1} \left(1 - e^{\frac{2 \pi i j'}{k}} \right)} \\ &= \frac{1}{k z_j^{k-1}} \\ &= \frac{-z_j}{k \lambda_0^k}, \end{align*}which gives us the much simpler form for $g_k$:
$$ g_k(\lambda) = \frac{-1}{k} \sum_{j=1}^k \frac{z_j}{\lambda - z_j}. $$This in turn gives a much simpler form for the application of $g_k$ to matrices, since it depends only on resolvents:
$$ g_k(\mathbf{A}) = \frac{-1}{k} \sum_{j=1}^k z_j (\mathbf{A} - z_j \mathbf{I})^{-1}. $$Let's quickly verify our partial fraction decomposition for $k = 10$.
k = 10
lamda_0 = 0.5
# A grid of lambda values on which to compare the two forms.
lamdas = np.linspace(0, 1)
# The roots z_j of x^k + lamda_0^k.
zs = np.roots([1] + [0] * (k - 1) + [lamda_0 ** k])
plt.figure(dpi=100, figsize=(4, 3))
plt.plot(lamdas, lamda_0 ** k / (lamdas ** k + lamda_0 ** k), label=r'$\frac{\lambda_0^k}{\lambda^k + \lambda_0^k}$')
# The partial fraction form; np.mean over the roots supplies the 1/k factor.
plt.plot(lamdas, -np.real(np.mean(zs[:, None] / (lamdas[None, :] - zs[:, None]), axis=0)), '--', label=r'$\frac{-1}{k} \sum_{j=1}^k \frac{z_j}{\lambda - z_j}$')
plt.xlabel(r'$\lambda$')
plt.legend()
plt.show()
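We can run the same check in matrix form, reusing the roots zs just computed and the synthetic A from earlier (with its eigendecomposition eigvals, U):
# -1/k sum_j z_j (A - z_j I)^{-1} should equal g_k(A), computed here by
# applying the scalar g_k to the eigenvalues of A.
resolvent_sum = -np.mean(
    [z * np.linalg.inv(A - z * np.eye(d)) for z in zs], axis=0)
direct = U @ np.diag(lamda_0 ** k / (eigvals ** k + lamda_0 ** k)) @ U.T
print(np.abs(resolvent_sum - direct).max())  # tiny: the identity is exact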
Taking $k \to \infty$¶
Of course, to extract the PCA subspace, we need to consider very large $k$. What is curious about the form of $g_k$ that we just obtained is that the values of $z_j$ become increasingly dense on the circle of radius $\lambda_0$, which suggests that taking the limit yields a contour integral.
# The roots z_j for k = 50 (lambda_0 = 1) are nearly dense on the circle.
k = 50
plt.figure(dpi=100, figsize=(4, 3))
zs = np.roots([1] + [0] * (k - 1) + [1])
ax = plt.gca()
ax.add_patch(plt.Circle((0, 0), 1, ls='--', fill=False, zorder=-1))
scatter_complex(zs, ax=ax)
ax.set_xlim(unit_range)
ax.set_ylim(unit_range)
ax.set_aspect(1)
plt.title('k = %d' % k)
plt.ylabel('Im(z)')
plt.xlabel('Re(z)')
plt.show()
We can write $\Delta_j = (z_{j + 1} - z_j)$ as we make our way counter-clockwise around the circle. Then for large $k$ this approximation holds well:
\begin{align*} \Delta_j &= z_j \left(e^{\frac{2 \pi i}{k}} - 1 \right) \\ &\approx z_j \frac{2 \pi i}{k}. \end{align*}Substituting $z_j / k \approx \Delta_j / (2 \pi i)$ into $g_k$ gives $g_k(\lambda) \approx \frac{-1}{2 \pi i} \sum_{j=1}^k \frac{\Delta_j}{\lambda - z_j}$, which is a Riemann sum. Therefore, for $\lambda \neq \lambda_0$, $g_k(\lambda)$ approaches the contour integral over the circle $\gamma = \{\lambda_0 e^{\theta i} \colon \theta \in [0, 2 \pi) \}$:
\begin{align*} g(\lambda) &= \lim_{k \to \infty} g_k(\lambda) \\ &= \frac{-1}{2 \pi i} \oint_\gamma \frac{dz}{\lambda - z}. \end{align*}
Returning to PCA¶
Based on the above reasoning, it seems that we can obtain the complement projector $\mathbf{Q} \mathbf{Q}^\top$ (and from it the PCA subspace) via contour integration as long as $\lambda_0$ is not an eigenvalue of $\mathbf{A}$; i.e.,
\begin{align*} \mathbf{Q} \mathbf{Q}^\top &= g(\mathbf{A}) \\ &= \frac{-1}{2 \pi i} \oint_\gamma (\mathbf{A} - z \mathbf{I})^{-1} dz. \end{align*}Let's verify this. Consider an arbitrary $\mathbf{B} \in \mathbb{R}^{d \times d}$. By Cauchy's residue theorem, we have
$$ \mathrm{tr} [g(\mathbf{A}) \mathbf{B}] = - \sum_{j : \lambda_j < \lambda_0} \mathrm{Res} (z \mapsto \mathrm{tr} [(\mathbf{A} - z \mathbf{I})^{-1} \mathbf{B}], \lambda_j). $$The residue has a simple form:
\begin{align*} \mathrm{Res} (z \mapsto \mathrm{tr} [(\mathbf{A} - z \mathbf{I})^{-1} \mathbf{B}], \lambda_j) &= \lim_{z \to \lambda_j} (z - \lambda_j) \mathrm{tr} [(\mathbf{A} - z \mathbf{I})^{-1} \mathbf{B}] \\ &= \lim_{z \to \lambda_j} \sum_{j'} \frac{z - \lambda_j}{\lambda_{j'} - z} \mathrm{tr} [\tilde{U}_{j'} \tilde{U}_{j'}^\top \mathbf{B}] \\ &= - \mathrm{tr} [\tilde{U}_j \tilde{U}_j^\top \mathbf{B}], \end{align*}where the columns of $\tilde{U}_j$ form an orthonormal basis for the eigenspace of $\lambda_j$, so that $(\mathbf{A} - z \mathbf{I})^{-1} = \sum_{j'} (\lambda_{j'} - z)^{-1} \tilde{U}_{j'} \tilde{U}_{j'}^\top$ with the sum running over the distinct eigenvalues. Therefore we indeed obtain
$$ \mathrm{tr} [g(\mathbf{A}) \mathbf{B}] = \mathrm{tr} [\mathbf{Q} \mathbf{Q}^\top \mathbf{B}], $$and since $\mathbf{B}$ was arbitrary, we have $\mathbf{Q} \mathbf{Q}^\top = g(\mathbf{A})$.
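We can also confirm this numerically by discretizing the contour. Parameterizing $\gamma$ as $z(\theta) = \lambda_0 e^{i \theta}$ gives $dz = i z \, d\theta$, so the trapezoidal rule (which converges geometrically for periodic analytic integrands) turns the integral into a finite sum much like $g_k$; here is a minimal sketch on the synthetic $\mathbf{A}$, again with $\lambda_0 = 1$ in its spectral gap:
# Approximate -1/(2 pi i) \oint (A - zI)^{-1} dz on the circle |z| = lamda_0.
# With z = lamda_0 e^{i theta} and dz = i z dtheta, the trapezoidal rule
# gives -(1/m) sum_j z_j (A - z_j I)^{-1}.
lamda_0, m = 1.0, 64
nodes = lamda_0 * np.exp(2j * np.pi * np.arange(m) / m)
Q_contour = -np.mean(
    [z * np.linalg.inv(A - z * np.eye(d)) for z in nodes], axis=0)
print(np.abs(Q_contour - (np.eye(d) - P_top)).max())  # ~1e-15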
General Principal Subspaces¶
Although in the above derivation we took $\gamma$ to be a circle centered at the origin, it could be any simple closed curve enclosing any subset of the eigenvalues of $\mathbf{A}$, so long as it passes through none of them; the contour integral then yields the projection operator onto the principal subspace spanned by the eigenvectors of the enclosed eigenvalues. This is a key idea in the resolvent formalism, which enables the application of techniques from complex analysis to the spectral analysis of operators.
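For example, with the same discretization idea as before, a small circle around only the middle eigenvalues of the synthetic $\mathbf{A}$ recovers the projector onto exactly that eigenspace (the center and radius below are my own choices for this test matrix):
# A circle centered at 0.3 with radius 0.15 encloses only the eigenvalues
# {0.2, 0.3, 0.4} of A. With z = c + R e^{i theta}, dz = i R e^{i theta} dtheta.
center, radius, m = 0.3, 0.15, 256
offsets = radius * np.exp(2j * np.pi * np.arange(m) / m)
P_band = -np.mean(
    [w * np.linalg.inv(A - (center + w) * np.eye(d)) for w in offsets], axis=0)
# Compare against the projector assembled from the enclosed eigenvectors.
mask = (eigvals > center - radius) & (eigvals < center + radius)
P_exact = U[:, mask] @ U[:, mask].T
print(np.abs(P_band - P_exact).max())  # ~1e-13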