There’s a neat result on matrix decompositions in a paper by Schirru et al. fresh from the arxiv (here). Take two square matrices, A and B. Assume that the matrices commute {\mathbf{AB=BA}}. Then they have the same eigenvectors, because if

\displaystyle  \mathbf{Av}=\lambda\mathbf{v}


\displaystyle  \mathbf{BAv}=\lambda\mathbf{B}\mathbf{v}

but also

\displaystyle  \mathbf{ABv}=\lambda\mathbf{Bv}

This means that if v is an eigenvector of A, then Bv is an eigenvector of A too. This is only possible if A and B have the same eigenvectors, because B is not allowed to map v to something that is not an eigenvector of A.

What this means for computation is that once you have the eigendecomposition of A, you can get that of B for cheap.


\displaystyle  \mathbf{A}=\mathbf{U}\mathbf{D}_{A}\mathbf{U}^{t}


\displaystyle  \mathbf{B}=\mathbf{U}\mathbf{D}_{B}\mathbf{U}^{t}

and we got U from the eigendecomposition of A, all we need now is:

\displaystyle  \mathbf{D}_{B}=\mathbf{U}^{t}\mathbf{B}\mathbf{U}

and we now that the matrix {\mathbf{D}_{B}} is diagonal so that we don’t need to compute explictly every element.

Schirru et al. apply this result to the kernel matrix in Gaussian process regression, and use this to speed up inference over kernel hyperparameters. It doesn’t solve the initial problem of having to compute the eigendecomposition of a large matrix, but it is very useful nonetheless.