
Algorithm choices for Koopman Analysis and DMD

Spectral analysis of nonlinear flows

Assume one has the linear dynamical system below, where n is so large that we cannot compute the eigenvalues of $\pmb{A}$ directly.

$$\pmb{x}_{k+1} = \pmb{A}\pmb{x}_k, \quad \pmb{x}_{k}\in\mathbb{R}^n$$

To estimate these eigenvalues, a standard approach is a Krylov method. One starts with a random vector $\pmb{x}_0$ and computes iterates of it until one has a collection of $m$ vectors spanning a Krylov subspace. To find approximate eigenvalues and eigenvectors, one projects $\pmb{A}$ onto this $m$-dimensional subspace and computes the eigenvalues and eigenvectors of the resulting low-rank operator. Projecting means representing the action of $\pmb{A}$ by linear combinations of the basis of $\mathcal{K}_m$ (whose basis vectors are linearly independent).

$$\mathcal{K}_m := \mathrm{span}\{\pmb{x}_0,\pmb{A}\pmb{x}_0,\cdots,\pmb{A}^{m-1}\pmb{x}_0\}$$
$$\begin{align} \pmb{K} &= [\pmb{x}_0 \quad \pmb{A}\pmb{x}_0 \quad \pmb{A}^2\pmb{x}_0 \quad \cdots \quad \pmb{A}^{m-1}\pmb{x}_0]\\ &= [\pmb{x}_0 \quad \pmb{x}_1 \quad \pmb{x}_2 \quad \cdots \quad \pmb{x}_{m-1}] \end{align}$$
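As a rough illustration of this projection idea (not taken from the cited papers; the matrix sizes and variable names are made up), the following MATLAB sketch builds $\pmb{K}$ for a small synthetic $\pmb{A}$ and performs a Rayleigh-Ritz projection onto an orthonormal basis of $\mathcal{K}_m$:

```matlab
% Sketch only: Krylov sequence and Rayleigh-Ritz projection for a synthetic A
n = 200; m = 10;
A  = randn(n)/sqrt(n);           % stand-in for the (in practice, too large) system matrix
x0 = randn(n,1);

K = zeros(n,m);                  % K = [x0, A*x0, ..., A^(m-1)*x0]
K(:,1) = x0;
for k = 2:m
    K(:,k) = A*K(:,k-1);
end

[Q,~] = qr(K,0);                 % orthonormal basis of the Krylov subspace K_m
H     = Q'*(A*Q);                % projection of A onto the subspace
ritz  = eig(H);                  % Ritz values: approximations to some eigenvalues of A
```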

The Arnoldi algorithm is a type of Krylov method in which one orthonormalizes the iterates at each step; it therefore requires computing the action of $\pmb{A}$ on arbitrary vectors. Below is one variant following (Saad 1980) and (Schmid 2008). The logic: as the number of snapshots increases, $\mathcal{K}_m$ captures the dominant features of the underlying physical process, so it is reasonable to assume that, beyond a critical number of snapshots, adding further snapshots to the data sequence will not improve $\mathcal{K}_m$. When that limit is reached, one can express the vector $\pmb{x}_m$ as a linear combination of the previous, linearly independent vectors.

$$\pmb{x}_m = \pmb{A}\pmb{x}_{m-1} = \begin{bmatrix} \pmb{x}_0 & \pmb{x}_1 & \pmb{x}_2 & \cdots & \pmb{x}_{m-1} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{bmatrix}$$
$$\pmb{x}_m = \pmb{K}\pmb{c}$$

Expanding this to matrix form:

$$\begin{align} \pmb{A}\pmb{K} &= [\pmb{A}\pmb{x}_0 \quad \pmb{A}\pmb{x}_1 \quad \cdots \quad \pmb{A}\pmb{x}_{m-1}] \\ &= [\pmb{x}_1 \quad \pmb{x}_2 \quad \cdots \quad \pmb{x}_m] \\ &= [\pmb{x}_0 \quad \pmb{x}_1 \quad \pmb{x}_2 \quad \cdots \quad \pmb{x}_{m-1}] \begin{bmatrix} 0 & 0 & \cdots & 0 & c_0 \\ 1 & 0 & & 0 & c_1 \\ 0 & 1 & & 0 & c_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & c_{m-1} \end{bmatrix} \end{align}$$
$$\pmb{A}\pmb{K} = \pmb{K}\pmb{C}$$

The matrix $\pmb{C}$ is of companion type. The eigenvalues of $\pmb{C}$ then approximate some of the eigenvalues of $\pmb{A}$: whenever $\pmb{A}\pmb{K}=\pmb{K}\pmb{C}$ holds exactly and $\pmb{K}$ has full column rank, every eigenvalue of $\pmb{C}$ is also an eigenvalue of $\pmb{A}$.

$$\pmb{C}\pmb{\alpha} = \lambda \pmb{\alpha}$$
$$\pmb{A}\pmb{K}\pmb{\alpha} = \pmb{K}\pmb{C}\pmb{\alpha} = \pmb{K}\lambda\pmb{\alpha} = \lambda \pmb{K}\pmb{\alpha}$$

It follows that $\pmb{v}=\pmb{K}\pmb{\alpha}$ is an eigenvector of $\pmb{A}$ with eigenvalue $\lambda$ (provided $\pmb{K}\pmb{\alpha}\neq\pmb{0}$). A quick numerical check of this is sketched below.
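The check uses an artificially small system so that $\pmb{x}_m = \pmb{K}\pmb{c}$ holds exactly (a sketch only; all names are illustrative):

```matlab
% Sketch only: when x_m = K*c holds exactly, v = K*alpha is an eigenvector of A
n = 6; m = n;                          % m = n so K is square and (generically) invertible
A = randn(n);  x0 = randn(n,1);
K = zeros(n,m);  K(:,1) = x0;
for k = 2:m, K(:,k) = A*K(:,k-1); end
xm = A*K(:,m);                         % the m-th iterate
c  = K \ xm;                           % exact here, since K is invertible

C = [[zeros(1,m-1); eye(m-1)], c];     % companion matrix
[Alpha, Lambda] = eig(C);
j = 1;                                 % pick any eigenpair
v = K*Alpha(:,j);  lam = Lambda(j,j);
disp(norm(A*v - lam*v))                % ~ 0 (up to the conditioning of K)
```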


Now comes the realistic situation, where the $m$-th iterate is not an exact linear combination of the previous iterates; a residual is added to the equation:

$$\pmb{x}_m = \pmb{K}\pmb{c} + \pmb{r}$$

The residual is minimized by carefully choosing $\pmb{c}$ such that

$$\pmb{r} \perp \mathrm{span}\{\pmb{x}_0,\pmb{A}\pmb{x}_0,\cdots,\pmb{A}^{m-1}\pmb{x}_0\}$$
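This orthogonality condition is exactly what an ordinary least-squares solve delivers; a small sketch with synthetic data (variable names are ours):

```matlab
% Sketch only: the least-squares c makes the residual orthogonal to the Krylov span
n = 50; m = 8;
A = randn(n)/sqrt(n);  K = zeros(n,m);  K(:,1) = randn(n,1);
for k = 2:m, K(:,k) = A*K(:,k-1); end
xm = A*K(:,m);

c = K \ xm;                 % least-squares solution, minimizes ||xm - K*c||_2
r = xm - K*c;
disp(norm(K'*r))            % ~ 0: r is orthogonal to span{x0, A*x0, ..., A^(m-1)*x0}
```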

It is clearly stated in (Budisic 2012) that $\pmb{c}$ is chosen to minimize the norm of $\pmb{r}$, which corresponds to choosing the projection operator so that the new observables are least-squares approximations of the original dynamical system. This implies that if the norm of $\pmb{r}$ is sufficiently small, then in the following equation $\pmb{C}$ can be regarded as an approximation of the action of the Koopman operator on the associated finite-dimensional space $\mathcal{K}_m$ (Raak, 2016).
In this case, we can expand the relation to:

$$\pmb{A}\pmb{K} = \pmb{K}\pmb{C} + \pmb{r}\pmb{e}_m^T = [\pmb{x}_0 \quad \pmb{x}_1 \quad \cdots \quad \pmb{x}_{m-1}] \begin{bmatrix} 0 & 0 & \cdots & 0 & c_0 \\ 1 & 0 & & 0 & c_1 \\ 0 & 1 & & 0 & c_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & c_{m-1} \end{bmatrix} + [\pmb{0} \quad \pmb{0} \quad \cdots \quad \pmb{0} \quad \pmb{r}]$$

where $\pmb{e}_m^T = [0 \ \cdots \ 0 \ 1]$ is the $m$-th unit row vector.

Ritz values - eigenvalues of $\pmb{C}$, approximations to the eigenvalues of $\pmb{A}$
Ritz vectors - corresponding approximate eigenvectors of $\pmb{A}$, given by $\pmb{v}=\pmb{K}\pmb{\alpha}$

In the literature on DMD, the above procedure is sometimes called the Arnoldi algorithm, which, in our opinion, is a misnomer. The proper name should be the Krylov-Rayleigh-Ritz procedure, to point out that the procedure described here is simply Rayleigh-Ritz in a subspace defined by the Krylov basis. The Arnoldi algorithm proper applies Gram-Schmidt orthogonalization (QR factorization) implicitly to the sequence of the $\pmb{x}_i$. A true Arnoldi scheme in the DMD framework was proposed only recently in "A parallel Dynamic Mode Decomposition algorithm using modified Full Orthogonalization Arnoldi for large sequential snapshots" (Remark 7.11).

(Note that there are the following variants of the Arnoldi method, which eventually lead the problem from Koopman mode calculations to the degenerate version, dynamic mode decomposition. The full Arnoldi method reduces $\pmb{A}$ to an upper Hessenberg matrix, while the simplified Arnoldi focuses on the companion matrix $\pmb{C}$ discussed above.)

  1. Arnoldi. $\pmb{A}$ -> Hessenberg, by a simple QR-decomposition of $\pmb{K}$. One successively orthonormalizes the vectors of $\pmb{K}$, resulting in a decomposition of the form $\pmb{A}\pmb{Q}\approx\pmb{Q}\pmb{H}$ with $\pmb{K}=\pmb{Q}\pmb{R}$ and the Hessenberg matrix $\pmb{H}=\pmb{R}\pmb{C}\pmb{R}^{-1}$. The eigenvalues of $\pmb{H}$ approximate some of the eigenvalues of $\pmb{A}$. (Schmid 2010)
  2. Classical Arnoldi. $\pmb{A}$ -> Hessenberg, by a sequence of projections onto successive Krylov subspaces. This is more stable, but the matrix $\pmb{A}$ has to be available. (Schmid 2010)
  3. Variant Arnoldi. $\pmb{A}$ -> Companion. Based on the observation that $\pmb{c}=\pmb{K}^\dagger \pmb{x}_m$ is one solution, where the dagger denotes the Moore-Penrose pseudoinverse. This solution is unique if and only if $\pmb{K}$ has linearly independent columns, in which case we can use the expansion $\pmb{K}^\dagger=(\pmb{K}^*\pmb{K})^{-1}\pmb{K}^*$. This is analytically correct but ill-conditioned in practice, especially when applied to noisy experimental data.[^2]
  4. Variant Arnoldi: QR DMD. $\pmb{A}$ -> Companion. Denote by $\pmb{K}=\pmb{Q}\pmb{R}$ the economy-size QR-decomposition of the data sequence; then the last column of the companion matrix $\pmb{C}$ is $\pmb{c}=\pmb{R}^{-1}\pmb{Q}^H\pmb{x}_m$. In practice, however, this can be so ill-conditioned that it often cannot extract more than the first one or two dominant dynamic modes. (Schmid 2010) (Both this and the previous route to $\pmb{c}$ are sketched in code after this list.)
  5. Variant Arnoldi: Standard DMD. $\pmb{A}$ -> Companion. Introduce the SVD and project $\pmb{A}$ onto a POD basis. There is no orthogonalization step, so it gives up some of the algorithmic stability and convergence properties of a true Arnoldi iteration, but it allows 'model-free' application. (Schmid 2010)
  6. Variant Arnoldi: Prony-type method or Hankel DMD. It uses time delays and Krylov subspaces in observable space. Its advantage is that it prevents the rank deficiency observed in the original Arnoldi-type algorithm.[^1]
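To make variants 3 and 4 concrete, here is a sketch (illustrative only, with synthetic snapshots) of the two routes to the last companion-matrix column $\pmb{c}$:

```matlab
% Sketch only: pseudoinverse route vs. QR route to the companion-matrix column c
n = 50; m = 8;
A = randn(n)/sqrt(n);  K = zeros(n,m);  K(:,1) = randn(n,1);
for k = 2:m, K(:,k) = A*K(:,k-1); end
xm = A*K(:,m);

c_pinv = (K'*K) \ (K'*xm);   % variant 3: c = (K*K)^(-1) K* xm  (ill-conditioned if K'*K is)
[Q,R]  = qr(K,0);            % variant 4: economy-size QR of the data sequence
c_qr   = R \ (Q'*xm);        %            c = R^(-1) Q^H xm
disp(norm(c_pinv - c_qr))    % the two agree when K has full column rank and is well conditioned
```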

(To conclude, we arrive at the algorithm excerpted from (Rowley 2009). However, the paper gives no details on how to obtain the proper constants $\pmb{c}$; a minimal implementation sketch follows the listing below.)

Input: $\pmb{x}_{k+1} = \pmb{A}\pmb{x}_k, \quad \pmb{x}_{k}\in\mathbb{R}^n$
Output: empirical Ritz values $\lambda_j$, empirical Ritz vectors $\pmb{v}_j$

  1. Define $\pmb{K} = [\pmb{x}_0 \quad \pmb{A}\pmb{x}_0 \quad \cdots \quad \pmb{A}^{m-1}\pmb{x}_0]$
  2. Find constants $\pmb{c}$ such that
     $\pmb{r}=\pmb{x}_m-\pmb{K}\pmb{c}$,
     $\pmb{r}\perp \mathrm{span}\{\pmb{x}_0,\pmb{A}\pmb{x}_0,\cdots,\pmb{A}^{m-1}\pmb{x}_0\}$
  3. Define the companion matrix
     $$\pmb{C}=\begin{bmatrix} 0 & 0 & \cdots & 0 & c_0 \\ 1 & 0 & & 0 & c_1 \\ 0 & 1 & & 0 & c_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & c_{m-1} \end{bmatrix}$$
  4. Find its eigenvalues and eigenvectors from $\pmb{C}=\pmb{T}^{-1}\pmb{\Lambda}\pmb{T}$
  5. The empirical Ritz values are the diagonal entries of $\pmb{\Lambda}$
  6. The empirical Ritz vectors are the columns of $\pmb{V}=\pmb{K}\pmb{T}^{-1}$
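A minimal MATLAB sketch of steps 1-6, with the caveat noted above: step 2 is realized here by an ordinary least-squares solve, which is one common choice rather than anything prescribed by (Rowley 2009); the function and variable names are ours.

```matlab
function [lambda, V] = empirical_ritz(X)
% Sketch only. X = [x0 x1 ... xm] is an n x (m+1) snapshot matrix.
% Returns empirical Ritz values lambda and empirical Ritz vectors (columns of V).
    K  = X(:, 1:end-1);                    % step 1: K = [x0 ... x_{m-1}]
    xm = X(:, end);
    m  = size(K, 2);

    c = K \ xm;                            % step 2: least-squares c, so r = xm - K*c is orthogonal to range(K)
    C = [[zeros(1, m-1); eye(m-1)], c];    % step 3: companion matrix

    lambda = eig(C);                       % steps 4-5: empirical Ritz values
    T = fliplr(vander(lambda));            % Vandermonde matrix, T(j,k) = lambda_j^(k-1)
    V = K / T;                             % step 6: empirical Ritz vectors, V = K*T^(-1)
end
```

Used as `[lambda, V] = empirical_ritz([x0 x1 ... xm])`; building $\pmb{T}$ explicitly as the Vandermonde matrix keeps the mode scaling consistent with the expansion derived further below.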

But how should these values be understood?
If the linear relationship holds exactly, these empirical Ritz values are the usual Ritz values of $\pmb{A}$ after $m$ steps of the Arnoldi method, and the Ritz values and vectors are good approximations of the eigenvalues and eigenvectors of $\pmb{A}$.
If it does not hold, the algorithm instead produces approximations of the Koopman modes and the associated eigenvalues. Why is that?


In step 6 we defined $\pmb{V}=\pmb{K}\pmb{T}^{-1}$; rearranging, we get:

$$\pmb{K} = \pmb{V}\pmb{T} = [\pmb{v}_1 \quad \cdots \quad \pmb{v}_m] \begin{bmatrix} 1 & \lambda_1 & \lambda^2_1 & \cdots & \lambda^{m-1}_1 \\ 1 & \lambda_2 & \lambda^2_2 & \cdots & \lambda^{m-1}_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_m & \lambda^2_m & \cdots & \lambda^{m-1}_m \end{bmatrix}$$
$$\begin{align} \pmb{A}\pmb{K}&=\pmb{K}\pmb{C}+\pmb{r}\pmb{e}_m^T = \pmb{K}\pmb{T}^{-1}\pmb{\Lambda}\pmb{T}+\pmb{r}\pmb{e}_m^T = \pmb{V}\pmb{\Lambda}\pmb{T}+\pmb{r}\pmb{e}_m^T \\ &=[\pmb{v}_1 \quad \cdots \quad \pmb{v}_m] \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_{m} \end{bmatrix} \begin{bmatrix} 1 & \lambda_1 & \lambda^2_1 & \cdots & \lambda^{m-1}_1 \\ 1 & \lambda_2 & \lambda^2_2 & \cdots & \lambda^{m-1}_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_m & \lambda^2_m & \cdots & \lambda^{m-1}_m \end{bmatrix} + [\pmb{0} \quad \cdots \quad \pmb{0} \quad \pmb{r}] \end{align}$$

Recall that $\pmb{K} = [\pmb{x}_0 \quad \pmb{x}_1 \quad \cdots \quad \pmb{x}_{m-1}]$; comparing columns in the equations above, we have

$$\pmb{x}_k = \sum^m_{j=1}{\lambda^k_j\pmb{v}_j}, \quad k = 0,\cdots,m-1$$
$$\pmb{x}_m = \sum^m_{j=1}{\lambda^m_j\pmb{v}_j} + \pmb{r}, \quad \pmb{r}\perp \mathrm{span}\{\pmb{x}_0,\pmb{x}_1,\cdots,\pmb{x}_{m-1}\}$$
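A sketch of this reconstruction, using the `empirical_ritz` function sketched earlier (assumed to be on the path; synthetic data, and exactness degrades with the conditioning of the Vandermonde matrix $\pmb{T}$):

```matlab
% Sketch only: check x_k = sum_j lambda_j^k v_j for k = 0, ..., m-1
n = 50; m = 8;
A = randn(n)/sqrt(n);  X = zeros(n, m+1);  X(:,1) = randn(n,1);
for k = 2:m+1, X(:,k) = A*X(:,k-1); end

[lambda, V] = empirical_ritz(X);       % sketch from the algorithm listing above
k = 3;                                 % any snapshot index 0 <= k <= m-1
xk_rebuilt = V * (lambda.^k);          % sum_j lambda_j^k v_j
disp(norm(X(:,k+1) - xk_rebuilt))      % ~ 0; at k = m the residual r would remain
```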

The spatial & temporal attributes

Space << time, $n \ll m$. The rank of $\pmb{K}$ is at most $n$, so rank deficiency is inevitable when computing $\pmb{c}$, which means $\pmb{c}$ is not unique. (It is unique if and only if $\pmb{K}$ has linearly independent columns, i.e., full column rank.)

Space >> time, $n \gg m$. Especially in fluid mechanics, $\pmb{K}$ may have a row dimension of $\mathcal{O}(10^5\sim 10^{10})$ and a column dimension of $\mathcal{O}(10^1\sim 10^3)$. In this case, $\pmb{K}^*\pmb{K}$ is much smaller than $\pmb{K}$, so the DMD method by Schmid is much more efficient and has a lower memory cost. (This is another kind of rank deficiency, where the spatial dimensions are truncated more easily with the SVD.)[^2] (DMD is more suitable for local, short-time dynamics; Chen calls it the "snapshots approach".) The tacit assumption in a DMD analysis is that $m \ll n$, i.e., that the number of snapshots $m$ is much smaller than the ambient space dimension $n$. In other words, the action of $\pmb{A}$ is empirically known only in a small-dimensional subspace of its domain $\mathbb{R}^n$.[^1] (p. 163)
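A sketch of why the $n \gg m$ regime favours working with $m \times m$ objects (the sizes are illustrative):

```matlab
% Sketch only: for n >> m, the m x m Gram matrix and the economy-size SVD avoid any n x n object
n = 1e5; m = 50;
K = randn(n, m);                 % stand-in snapshot matrix
G = K' * K;                      % m x m Gram matrix (the "method of snapshots" flavour)
[U, S, W] = svd(K, 'econ');      % U is n x m; S and W are m x m -- no n x n factor is formed
whos K G U S W                   % compare the memory footprints
```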

Examples:
- 28 × 241: Arnoldi for the KMD of conditioned room air temperature (Hiramatsu 2020)
- 20 × 24~120 and 141 × 48~132: Arnoldi vs. DMD on a wind power farm (Raak 2016)
- 7 × 24~240: Arnoldi vs. DMD on a power system (Raak 2016)
- 1024^2 × 150~250: DMD on high-speed flame images (Ghosal 2016)
- 256 × 201 × 144 ~ 251: Arnoldi-like DMD on horseshoe turbulence (Rowley 2009)


Tips
The eigenvalues of the following cyclic shift matrix are the $n$-th roots of unity, $\lambda_j=e^{2\pi i j/n}$, because its characteristic equation is $\lambda^n=1$.

$$\begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
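A quick check of this claim (sketch only):

```matlab
% Sketch only: eigenvalues of the cyclic shift matrix satisfy lambda^n = 1
n = 8;
P = circshift(eye(n), 1);     % the shift matrix shown above (single 1 in the top-right corner)
lambda = eig(P);
disp(norm(lambda.^n - 1))     % ~ 0: every eigenvalue is an n-th root of unity
```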

SVD ![[SVD.jpg | 500 ]]

[U,S,V] = svd(A)
[U,S,V] = svd(A,"econ")
  1. Full SVD
  2. Thin/economy-sized SVD (remove the columns of U that do not correspond to rows of V*)
  3. Compact SVD (keep only the nonzero singular values)
  4. Truncated SVD (keep only the r largest singular values; illustrated below)
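A sketch of how the returned factor sizes differ (the matrix here is arbitrary):

```matlab
% Sketch only: factor sizes for the full vs. economy-size SVD, plus a rank-r truncation
A = randn(1000, 30);
[U , S , V ] = svd(A);           % full:    U is 1000x1000, S is 1000x30, V is 30x30
[Ue, Se, Ve] = svd(A, 'econ');   % economy: Ue is 1000x30,  Se and Ve are 30x30
r = 5;                           % truncated SVD: keep only the r largest singular values
Ar = Ue(:,1:r) * Se(1:r,1:r) * Ve(:,1:r)';   % best rank-r approximation of A
```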

QR decomposition: $\pmb{Q}$ is an orthogonal ($\pmb{Q}^T=\pmb{Q}^{-1}$) or unitary ($\pmb{Q}^*=\pmb{Q}^{-1}$) matrix, and $\pmb{R}$ is an upper triangular matrix. The decomposition follows the Gram-Schmidt process. Writing $\pmb{A}=\pmb{Q}\pmb{R}$ column by column gives:

$$(\pmb{v}_1, \pmb{v}_2, \cdots, \pmb{v}_k)=(\pmb{e}_1, \pmb{e}_2, \cdots, \pmb{e}_k) \begin{bmatrix} \Vert \pmb{u}_1 \Vert & \mathrm{proj}_{\pmb{u}_1}(\pmb{v}_2) & \cdots & \mathrm{proj}_{\pmb{u}_1}(\pmb{v}_k) \\ 0 & \Vert \pmb{u}_2 \Vert & \cdots & \mathrm{proj}_{\pmb{u}_2}(\pmb{v}_k) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathrm{proj}_{\pmb{u}_{k-1}}(\pmb{v}_k) \\ 0 & 0 & \cdots & \Vert \pmb{u}_k \Vert \end{bmatrix}$$
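A classical Gram-Schmidt sketch that produces exactly this factorization (illustrative only; a library `qr` call is preferable in practice):

```matlab
% Sketch only: classical Gram-Schmidt building A = Q*R column by column
A = randn(6, 4);
[n, k] = size(A);
Q = zeros(n, k);  R = zeros(k, k);
for j = 1:k
    u = A(:, j);
    for i = 1:j-1
        R(i, j) = Q(:, i)' * A(:, j);   % projection coefficient onto the i-th orthonormal vector
        u = u - R(i, j) * Q(:, i);      % subtract that projection
    end
    R(j, j) = norm(u);                  % ||u_j||
    Q(:, j) = u / R(j, j);              % normalized e_j
end
disp(norm(A - Q*R))                     % ~ 0
```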


Footnotes

[^1]: Mauroy, Mezić, Susuki (eds.). The Koopman Operator in Systems and Control: Concepts, Methodologies, and Applications. Springer, 2020.

[^2]: Chen, Tu, Rowley. Variants of Dynamic Mode Decomposition: Boundary Condition, Koopman, and Fourier Analyses. J Nonlinear Sci, 2012.