类似知识

PCA

待写

SVD

待写

Tensor Decomposition

用于feature extraction、dimensionality reduction和knowledge discovery。随着时间的变化，张量的数据可能增加、减少或者修改在他任何的维度上。现实生活中最常用的动态张量就是随着时间变化的数据，而其它的维度保持不变。这种张量称为 online tensors, tensor streams incremental tensors.
在大规模的online tensor中，找到其分解结果是很困难的，他的难点主要是

online tensors are growing with time,他的整体大小其实是没有限制的。TD 需要高效和可扩展，从时间和空间来看。
高数据生成速率需要分解方法提供实时或接近实时的性能。

Notation and basic operations

symbol	definition
$A^{T}$	转置 transpose
$A^{-1}$	逆 inverse
$A^{\dagger}$	Moore-Penrose pseudoinverse
$\lVert A\rVert$	frobenius norm
$\odot$	khatri-rao product
$\otimes$	hadamard product
$X_{(n)}$	张量的mode-n展开

预知识

CP分解

cp分解被广泛用于探索多维数据的潜在结构。给定一个N阶的张量$\mathcal{X}\in \mathbb{R}^{I_1\times \cdots \times I_N}$，CP分解近似这个张量通过N个loading 矩阵，因此

$\begin{equation} X_{(n)}\approx A^{(n)}(A^{(N)}\odot \cdots A^{(n+1)}\odot A^{(n-1)}\cdots A^{(1)})^T =A^{(n)}(\odot^N_{i\neq n}A^{(i)})^T=[A^{(1)},\cdots,A^{(N)}] \end{equation}$

其中$[\cdot]$定义为cp分解符号。
为了找到CP分解的结果，目标函数是最小化估计误差 $\mathcal{L}$，定义为

$\mathcal{L}=\frac{1}{2}\| X_{(n)}-A^{(n)}(\odot^N_{i\neq n}A^{(i)})^T\|^2$

但是直接最小化$\mathcal{L}$很困难，因为$\mathcal{L}$关于$A^{(1)},\cdots,A^{(N)}$是nonconvex的，因此常用的办法是使用ALS，通过固定n-th矩阵，交替最小化，此时$\mathcal{X}$关于$A^{(n)}$是convex的，此时

$A^{(n)}\gets argmin_{A^{(n)}} \frac{1}{2}\| X_{(n)}-A^{(n)}(\odot^N_{i\neq n}A^{(i)})^T\|^2$

online CP分解

给出一个三维的online tensor$\mathcal{X}\in \mathbb{X}^{I\times J\times (t_{old}+t_{new})}$，这个$\mathcal{X}$是通过对$\mathcal{X}_{old}\in \mathbb{R}^{I\times J\times t_{old}}$的最后一个mode进行追加$\mathcal{X}_{new}\in \mathbb{R}^{I\times J\times t_{new}}$.通常追加的张量很小，因此假设$\mathcal{X}_{new}\ll \mathcal{X}_{old}$. $\mathcal{X}_{old}$的cp分解写作$[A_{old},B_{old},C_{old}]$.目标是找到$\mathcal{X}$的cp分解$[A,B,C]$.
针对三阶张量

$C\gets argmin_{C} \frac{1}{2}\| X_{(3)}-C(B\odot A)^T\|^2$

^{_^}. 下面介绍一个现有的online cp分解的方法 ↩

SDT and RLST 这两种方法都是将online张量分解问题转换成增量矩阵分解问题，令$D=A\odot B$, 因此公式(1)可以被写成$X_{(3)}=CD^T$，问题就转变为如何去估计C和D。这两种方法不同之处在于计算C和D。SDT选择SVD来进行$ X_{old(3)}=U_{old}\sum_{old}V^T_{old}$，这时总有一个矩阵$W_{old}$使得$C_{old}=U_{old}W^{-1}_{old}$, $D_{old}=V_{old}\sum_{old}W_{old}^T$.同理得到，可以找到一个矩阵$W$，使得$C=UW^{-1}$, $D=V\sum W^T$,其中$U\sum V^T$是$X_{(3)}$的SVD分解。作者假设$D$和$D_{old}$只有微小的区别，因此

接下来介绍Zhou^[Zhou S, Vinh N X, Bailey J, et al. Accelerating Online CP Decompositions for Higher Order Tensors[A]. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. New York, NY, USA: Association for Computing Machinery, 2016: 1375–1384.]的做法。

update temporal mode C

首先固定矩阵$A,B$,然后更新$C$，以此类推更新AB。

$\begin{aligned} C &\gets argmin_{C} \frac{1}{2}\| X_{(3)}-C(B\odot A)^T\|^2 \\ &=argmin_{C} \frac{1}{2}\left \lVert \left [ \begin{aligned}X_{old(3)}\\X_{new(3)}\\\end{aligned}\right]- \left [ \begin{aligned}C^{(1)}\\C^{(2)}\\\end{aligned}\right](B\odot A)^T\right\rVert^2\\ &=argmin_{C} \frac{1}{2}\left \lVert \left [ \begin{aligned}X_{old(3)}-C^{(1)}(B\odot A)^T \\X_{new(3)}-C^{(2)}(B\odot A)^T \\\end{aligned}\right]\right\rVert^2 \end{aligned}$

第一行通过固定$A_{old} B_{old}$得到$C_{old}$来最小化范数，即

$C_{old}(C^{(1)})=X_{old(3)}((B_{old}\odot A_{old})^T)^\dagger$

在关于t的online张量中，$C_{old}$是已经计算好的，$A$和$B$用$A_{old} B_{old}$。因此上面的最小化问题其实只要最小化第二行，整个问题变成计算

$C_{new}(C^{(2)})=X_{new(3)}((B\odot A)^T)^\dagger$

综合上述得到

$C= \left [\begin{aligned}C_{old} \\C_{new} \\\end{aligned}\right ] =\left [\begin{aligned}C_{old}& \\X_{new(3)}((B\odot &A)^T)^\dagger\\\end{aligned} \right ]$

update non-temporal modes A and B

首先更新$A$。通过固定$B$和$C$，这个误差估计函数$\mathcal{L}$可以写成$\frac{1}{2}| X_{(1)}-A(C\odot B)^T|^2$，其导数为

$\frac{\partial \mathcal{L}}{\partial A}=X_{(1)}(C\odot B)-A(C\odot B)^T(C\odot B)$

设置导数为0，并令$P=X_{(1)}(C\odot B)$以及$Q=(C\odot B)^T(C\odot B)$可以得到

$A=PQ^{-1}$

直接去计算$P$和$Q$是很费时的，主要是因为$(C\odot B)$是一个huge矩阵。
为了提升效率，需要一种更快的方法。

$\begin{aligned} P&=X_{(1)}(C\odot B)\\ &=[X_{old(1)},X_{new(1)}]\left(\left[\begin{aligned}C_{old} \\C_{new} \\\end{aligned}\right] \odot B \right)\\ &=[X_{old(1)},X_{new(1)}]\left[\begin{aligned}C_{old}\odot B \\C_{new}\odot B \\\end{aligned}\right]\\ &=X_{old(1)}(C_{old}\odot B )+X_{new(1)}(C_{new}\odot B) \end{aligned}$

注意到$B$可以固定为$B_{old}$,因此上式中的最后一行的第一部分其实是上次迭代的结果，因此上式可以写成

$P=P_{old}+X_{new(1)}(C_{new}\odot B)$

这意味着通过保存之前的$P$可以避免大量的计算，将P初始化为$\mathcal{X}$的一部分。这样无论何时新的数据进来，$P$都可以高效计算。
同样地，$Q$可以估计为

$\begin{aligned} Q&=Q_{old}+(C_{new}\odot B)^T(C_{new}\odot B)\\ &=Q_{old}+(C^T_{new}C_{new})\otimes (B^TB) \end{aligned}$

结合上一次的ALS迭代结果，矩阵$A$的更新规则为

$\begin{aligned} P&\gets P+X_{new(1)}(C_{new}\odot B)\\ Q&\gets Q+(C^T_{new}C_{new})\otimes (B^TB)\\ A&\gets PQ^{-1}\\ \end{aligned}$

同样地$B$的更新规则为

$\begin{aligned} U&\gets U+X_{new(2)}(C_{new}\odot A)\\ V&\gets V+(C^T_{new}C_{new})\otimes (A^TA)\\ B&\gets UV^{-1} \end{aligned}$

总结对于随着时间增长的三阶张量，提出了称为OnlineCP的算法，分为以下两步

初始化：对于non-temporal 模态，矩阵$PQUV$使用$\mathcal{X}_{init}$和$[![A,B,C]!]$进行初始化，即 $\begin{aligned} P=X_{init(1)}(C\odot B),& Q=(C^TC)\otimes (B^TB)\\ U=X_{init(2)}(C\odot A),& V=(C^TC)\otimes (A^TA) \end{aligned}$
更新: 对每个新进的数据块$X_{new}$
a. 对于temporal mode 3, C的更新方式为 $C= \left [\begin{aligned}C_{old} \\C_{new} \\\end{aligned}\right ] =\left [\begin{aligned}C_{old}& \\X_{new(3)}((B\odot &A)^T)^\dagger\\\end{aligned} \right ]$ b. 对于non-temporal mode 1 and 2, A的更新方式为 $\begin{aligned}P&\gets P+X_{new(1)}(C_{new}\odot B)\\Q&\gets Q+(C^T_{new}C_{new})\otimes (B^TB)\\A&\gets PQ^{-1}\end{aligned}$ 同理B的更新方式为 $\begin{aligned}U&\gets U+X_{new(2)}(C_{new}\odot A)\\V&\gets V+(C^T_{new}C_{new})\otimes (A^TA)\\B&\gets UV^{-1}\end{aligned}$
高阶张量的扩展
令$\mathcal{X}_{old}\in \mathbb{R}^{I_1 \times \cdots \times I_{N-1}\times t_{old}}$是$N$阶张量，他的CP分解是$[![A^{(1)}_{old}, \cdots,A^{(N-1)}_{old},A^{(N)}_{old} ]!]$, N-th mode表示时间。一个新的张量$\mathcal{X}_{new}\in \mathbb{R}^{I_1 \times \cdots \times I_{N-1}\times t_{old}}$添加到旧的张量中

upupwords

online CP decomposition