Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Background and Motivation

functions $f$: $\mathbb{R^d}\rightarrow \mathbb{R}$ of the form $f(x)=\sum^N_{i=1}f_i(x)$. 当$N$很大时，偏向使用$f$的随机估计，使用SGD的变体或者stochastic gradient langevin dynamics：

$X_{t+1}^{SGD}=X_{t}^{SGD} - \alpha_t N \nabla_{f_{I_t}}(X_{t}^{SGD})$ $X_{t+1}^{SGLD}=X_{t}^{SGLD} - \alpha_t N \nabla_{f_{I_t}}(X_{t}^{SGLD})$

$\{\alpha_t\}_{t=1}^T$表示一系列的step-sizes，索引$I_t$从$[N]$从随机挑选，使得$N\nabla_{f_{I_t}}(x)$是$f$的梯度的无偏估计。众所周知，这些算法给出的答案的质量取决于梯度估计器的（迹）方差，并且已经做出了相当大的努力来设计减少这种方差的方法。

本文关注于使用importance sampling来实现方差减少，在每次迭代，算法根据给定的$p^t$分布中选择$I_t$，估计方差为$\hat{g}^t:=\frac{1}{p^t_{I_t}\nabla f_{I_t}(x_t)}$. $\hat{g}^t$是$g^t:=\nabla f(x_t)$的无偏估计。只要选好了$p^t$,就可以显著地减少方差。但是计算方差减少的最优分布，需要计算每次迭代中的所有个体梯度$g^t_i:=\nabla f_i(x_t)$的euclidean范数，太expensive。

本文关注于online problem with bandit feedback. 尝试设计一个带有sub-linear expected static regret, which is defined as

$Regret_S(T):=\sum^T_{t=1}c_t(p^t)-min_{p\in \triangle}\sum_{t=1}^Tc_t(p)$

其中$\triangle$是$\mathbb{R}^N$的probability simplex，$c_t(p)$是$\hat{g}^t$的协方差矩阵的迹

$c_t(p):=\sum_{i=1}^N\frac{1}{p_i}\|g_i^t\|_2^2-\|g^t\|_2^2$

注意，第二项在regret的定义中被取消，剩下的讨论中忽略了它。在这这个公式中，为了保证计算负载的可管理，只能使用$I^{th}_t$ gradient的范数形式的部分反馈，而不能使用完整的cost function。在一致有界梯度的假设下，[12]中提出了一种具有$\hat{\mathcal{O}}(T^{\frac{2}{3}})$静态regret算法。
更好用的是动态regret算法

$Regret_D(T):=\sum^T_{t=1}c_t(p^t)-\sum_{t=1}^Tmin_{p\in \triangle}c_t(p)$

upupwords

importance sampling in stochastic gradient

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Background and Motivation

Main Contributions

Theoretical Framework/Algorithm

Experimental Design and Results