Deep Latent Gaussian Models
Deep latent Gaussian models (DLGMs) are a general class of deep directed graphical models that consist of Gaussian latent variables at each layer of a processing hierarchy. The model consists of $L$ layers of latent variables. To generate a sample from the model, we begin at the top-most layer ($L$) by drawing from a Gaussian distribution.
The activation \(\mathbf{h}_{l}\) at any lower layer is formed by a non-linear transformation of the layer above, \(\mathbf{h}_{l+1}\), perturbed by Gaussian noise. We descend through the hierarchy and generate observations \(\mathbf{v}\) by sampling from the observation likelihood using the activation of the lowest layer \(\mathbf{h}_1\). This process is described graphically in figure 1(a).
This generative process is described as follows:
\[\begin{aligned} & \mathbf{\xi}_l \sim \mathcal{N}(\mathbf{\xi}_l \mid \mathbf{0}, \mathbf{I}), \enspace l = 1, \dots, L \\ & \mathbf{h}_L = \mathbf{G}_L\mathbf{\xi}_L, \\ & \mathbf{h}_l = \mathit{T}_l(\mathbf{h}_{l+1}) + \mathbf{G}_l\mathbf{\xi}_l, \enspace l = 1, \dots, L - 1 \\ & \mathbf{v} \sim \pi(\mathbf{v} \mid \mathit{T}_0(\mathbf{h}_1)), \end{aligned}\]
where $\mathbf{\xi}_l$ are mutually independent Gaussian variables. The transformations $\mathit{T}_l$ represent multi-layer perceptrons (MLPs) and $\mathbf{G}_l$ are matrices. At the visible layer, the data is generated from any appropriate distribution $\pi(\mathbf{v}\mid\cdot)$ whose parameters are specified by a transformation $\mathit{T}_0(\mathbf{h}_1)$ of the first latent layer.
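To make the top-down sampling order concrete, here is a minimal NumPy sketch of one ancestral draw from such a model. It is an illustration under stated assumptions, not a reference implementation: the layer sizes, the one-layer tanh maps standing in for the MLPs $\mathit{T}_l$, the linear $\mathit{T}_0$, and the Bernoulli choice for $\pi$ are all hypothetical, as are the names `make_layer` and `sample_dlgm`.

```python
import numpy as np

def make_layer(rng, d_in, d_out):
    """Hypothetical stand-in for an MLP T_l: a single tanh layer with random weights."""
    W = rng.standard_normal((d_out, d_in)) * 0.5
    b = np.zeros(d_out)
    return lambda h: np.tanh(W @ h + b)

def sample_dlgm(G, T, T0, rng):
    """Draw one (v, h_1) pair by ancestral sampling.

    G  : list [G_1, ..., G_L] of noise-scaling matrices
    T  : list [T_1, ..., T_{L-1}] of callables, T_l maps h_{l+1} to the mean of h_l
    T0 : callable mapping h_1 to the likelihood parameters
    """
    L = len(G)
    # Top layer: h_L = G_L xi_L with xi_L ~ N(0, I)
    h = G[L - 1] @ rng.standard_normal(G[L - 1].shape[1])
    # Descend through layers l = L-1, ..., 1: h_l = T_l(h_{l+1}) + G_l xi_l
    for l in range(L - 2, -1, -1):
        xi = rng.standard_normal(G[l].shape[1])
        h = T[l](h) + G[l] @ xi
    # Assumed likelihood: independent Bernoulli units with logits T_0(h_1)
    p = 1.0 / (1.0 + np.exp(-T0(h)))
    v = (rng.random(p.shape) < p).astype(int)
    return v, h

rng = np.random.default_rng(0)
dims = [4, 3, 2]                      # assumed sizes of h_1, h_2, h_3 (so L = 3)
G = [rng.standard_normal((d, d)) * 0.1 for d in dims]
T = [make_layer(rng, dims[l + 1], dims[l]) for l in range(len(dims) - 1)]
W0 = rng.standard_normal((6, dims[0]))  # assumed linear T_0 onto 6 visible units
T0 = lambda h: W0 @ h

v, h1 = sample_dlgm(G, T, T0, rng)
```

Each call returns a binary vector `v` together with the lowest-layer activation `h1` that parameterized it. Swapping the Bernoulli step for a Gaussian or categorical draw changes only the last three lines of `sample_dlgm`.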
Stochastic Backpropagation
Gradient descent methods in latent variable models typically require computations of the form \(\nabla_\theta \mathbb{E}_{q_\theta}\left[f(\mathbf{\xi})\right]\), where the expectation is taken with respect to a distribution \(q_\theta(\cdot)\) with parameters \(\mathbf{\theta}\), and $f$ is a loss function that we assume to be integrable and smooth.
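As a small illustration of such a gradient, consider the scalar case where \(q_\theta\) is a Gaussian with mean `mu` and standard deviation `sigma`. One well-known way to estimate the gradient is to write \(\xi = \mu + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\) and differentiate through the sample. The sketch below assumes this Gaussian form and the test function \(f(\xi) = \xi^2\), neither of which comes from the text above; `pathwise_grad` is a hypothetical helper name.

```python
import numpy as np

def pathwise_grad(f_prime, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[f(xi)], xi ~ N(mu, sigma^2).

    Uses xi = mu + sigma * eps with eps ~ N(0, 1), so that
      d/dmu    E[f(xi)] = E[f'(xi)]
      d/dsigma E[f(xi)] = E[f'(xi) * eps]
    """
    eps = rng.standard_normal(n_samples)
    xi = mu + sigma * eps
    g = f_prime(xi)
    return g.mean(), (g * eps).mean()

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7

# f(xi) = xi^2, so f'(xi) = 2 * xi
grad_mu, grad_sigma = pathwise_grad(lambda x: 2.0 * x, mu, sigma, 200_000, rng)

# Analytic check: E[xi^2] = mu^2 + sigma^2, so the exact gradients
# are 2 * mu with respect to mu and 2 * sigma with respect to sigma.
exact_mu, exact_sigma = 2.0 * mu, 2.0 * sigma
```

With enough samples the estimates `grad_mu` and `grad_sigma` should land close to `exact_mu` and `exact_sigma`. The smoothness assumption on $f$ matters here because the estimator evaluates the derivative of $f$ at each sample.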