
Matthew Varble

matthew@rodent.club

Mathematician for hire!

roadmap

  • formulate generative models and associated language
  • give examples of generative models for intuition
  • discuss inference of a generative model
  • summarize a cool inference algorithm

my research

[Diagram: my research sits inside mathematics and probability, touching Monte Carlo, large deviations, and "esoteric" objects; this talk lies at the intersection with statistics and machine learning.]

quick terminology

  • Probability.  The comprehensive study of probability measures, measurable functions, kernels, and their related operations and properties.

  • Statistics. The study of modeling indeterminism in real-world measurements as quantities $Y$ distributed under a measure $\mu_\theta$ determined by a parameter $\theta \in \Theta$. Investigate $\mu_\theta$ to solve for $\theta$.

  • Machine learning. Construct a massive family $(F(\cdot,\alpha))_{\alpha \in \mathcal{A}}$ of functions $F(\cdot,\alpha)$ and use data-centric variational methods to select $\alpha \in \mathcal{A}$, ultimately coercing $F(\cdot,\alpha)$ into behaving nicely in some context.

probabilistic operations

  • Probability.  The comprehensive study of probability measures, measurable functions, kernels, and their related operations and properties.

$(\mu, f) \mapsto \displaystyle\int_{\mathbb{X}} f(x)\,\mu(\mathrm{d}x)$

$(\mu, T) \mapsto T_\#\mu = \big(\Gamma \mapsto \mu(T^{-1}\Gamma)\big)$

Other notation: $X_\#\mu = \mu(X \in \cdot) = \mu_X$

$(\mu, \kappa) \mapsto \kappa \ast \mu = \Big(\Gamma \mapsto \displaystyle\int_{\mathbb{X}} \kappa(x, \Gamma)\,\mu(\mathrm{d}x)\Big)$

$(\kappa, T) \mapsto T_\#\kappa = \big((x, \Gamma) \mapsto \kappa\big(T(x), T^{-1}\Gamma\big)\big)$

$(\mu, X, Y) \mapsto \mu_{X|Y}, \quad \mu_{(X,Y)} = \mu_{X|Y} \ast \mu_Y$
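To make these operations concrete, here is a minimal numerical sketch (assuming NumPy and a standard Gaussian source; every specific choice below is illustrative rather than taken from the slides): integration against $\mu$ becomes a Monte Carlo average, a pushforward $T_\#\mu$ is sampled by transforming draws of $\mu$, and $\kappa \ast \mu$ is sampled by drawing $x \sim \mu$ and then $y \sim \kappa(x, \cdot)$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100_000

# Source: draws from mu = N(0, 1).
x = rng.standard_normal(M)

# Integration: int f dmu approximated by a Monte Carlo average.
f = lambda x: x**2
integral = f(x).mean()          # ~ 1.0, the second moment of N(0, 1)

# Pushforward: samples of T_# mu are just T applied to samples of mu.
T = lambda x: 2.0 * x + 1.0
y_pushforward = T(x)            # distributed as N(1, 4)

# Kernel: kappa(x, .) = N(x, 0.5^2); samples of kappa * mu mix over x ~ mu.
y_kernel = x + 0.5 * rng.standard_normal(M)   # distributed as N(0, 1.25)

print(integral, y_pushforward.mean(), y_kernel.var())
```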

generative modeling

Construct computer algorithms that sample quantities/measures via successive operations (a small sketch follows the list below):

  • Sources. $x \sim \mu$ for simple $\mu$.

  • Transports/maps. $y \sim T_\#\nu$;

    i.e. $x \sim \nu$ and $y = T(x)$.

  • Kernels/flatmaps. $y \sim \kappa \ast \nu$;

    i.e. $x \sim \nu$ and $y \sim \kappa(x, \cdot)$.

  • All sorts of combinations thereof!
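As a tiny illustrative composition (my own toy choices, assuming NumPy), the sketch below chains a source, a transport, and a kernel into one generative program:

```python
import numpy as np

rng = np.random.default_rng(1)

def generate(n_samples: int) -> np.ndarray:
    """Source -> transport -> kernel, composed into one generative model."""
    x = rng.exponential(scale=1.0, size=n_samples)   # source: x ~ Exp(1)
    z = np.log1p(x)                                   # transport: z = T(x) = log(1 + x)
    y = rng.normal(loc=z, scale=0.1)                  # kernel: y ~ N(z, 0.1^2)
    return y

samples = generate(10_000)
print(samples.mean(), samples.std())
```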

example: hidden Markov model

[Diagram: a hidden Markov model as a generative model. Hidden states $X_0, X_1, \ldots$ evolve through kernels $\kappa_1 \ast \cdot,\ \kappa_2 \ast \cdot, \ldots$ while observations $Y_0, Y_1, \ldots$ are emitted through kernels $\lambda_1 \ast \cdot,\ \lambda_2 \ast \cdot, \ldots$]
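A minimal sampler in this spirit might look as follows (illustrative only, assuming NumPy, with time-homogeneous Gaussian transition kernels $\kappa$ and emission kernels $\lambda$):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_hmm(n_steps: int, x0: float = 0.0):
    """Generate (X_0..X_{n-1}, Y_0..Y_{n-1}) from a toy Gaussian HMM."""
    xs, ys = [], []
    x = x0
    for _ in range(n_steps):
        y = rng.normal(loc=x, scale=0.5)        # emission kernel lambda(x, .) = N(x, 0.5^2)
        xs.append(x)
        ys.append(y)
        x = rng.normal(loc=0.9 * x, scale=1.0)  # transition kernel kappa(x, .) = N(0.9 x, 1)
    return np.array(xs), np.array(ys)

xs, ys = sample_hmm(100)
print(xs[:5], ys[:5])
```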

example: recurrent neural network

[Diagram: a recurrent neural network as a generative model. At each step, weights $W_k \sim \mu_W$ and quantities $L_k \sim \mu_L$ feed a state $X_k$ that is propagated by the transport map $T(\cdot, \theta)_\# \cdot$, while observations $Y_k$ are produced through kernels $\lambda_1 \ast \cdot,\ \lambda_2 \ast \cdot, \ldots$]

calibration

When our measurement is a quantity $Y$, we may calibrate $\theta$ by repeatedly generating $Y$ under $\mu_\theta$.
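One naive way to do this, sketched below under assumptions of my own (NumPy, a toy location model, and a mean/variance-matching criterion not prescribed by the slides), is to scan candidate parameters and compare summary statistics of simulated draws of $Y$ against the observed data:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_y(theta: float, n: int) -> np.ndarray:
    """Toy generative model mu_theta: Y = theta + standard Gaussian noise."""
    return theta + rng.standard_normal(n)

y_observed = simulate_y(theta=2.0, n=500)   # pretend this is real data

# Calibrate theta by matching simulated mean/variance to the observed ones.
grid = np.linspace(-5, 5, 201)

def mismatch(theta):
    y_sim = simulate_y(theta, 5_000)
    return (y_sim.mean() - y_observed.mean())**2 + (y_sim.var() - y_observed.var())**2

theta_hat = min(grid, key=mismatch)
print(theta_hat)   # should land near 2.0
```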

inference

All generative models produce some joint measure $\mu$ (respectively $\mu_\theta$). Inference is the notion of studying one quantity $X$ conditioned on another $Y$; i.e. making judgements about $\mu_{X|Y}$.

Bayesian inference.

$$\mu(X \in \mathrm{d}x, Y \in \mathrm{d}y) = p_{XY}(x,y)\,\mathrm{d}x\,\mathrm{d}y$$

$$\leadsto \quad \begin{aligned} \mu(X \in \mathrm{d}x) &= p_X(x)\,\mathrm{d}x \\ \mu(Y \in \mathrm{d}y) &= p_Y(y)\,\mathrm{d}y \\ \mu(X \in \mathrm{d}x \mid Y = y) &= p_{X|Y}(x|y)\,\mathrm{d}x \\ \mu(Y \in \mathrm{d}y \mid X = x) &= p_{Y|X}(y|x)\,\mathrm{d}y \end{aligned}$$

$$\begin{aligned} p_{X|Y} &= p_{XY}/p_Y \\ &= p_X\, p_{Y|X}/p_Y \end{aligned}$$
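As a quick sanity check of this density manipulation (a standard conjugate-Gaussian example of my own choosing, not taken from the slides), suppose $X \sim N(0, \sigma_X^2)$ and $Y \mid X = x \sim N(x, \sigma_Y^2)$. Then

$$p_{X|Y}(x|y) \propto p_X(x)\, p_{Y|X}(y|x) \propto \exp\!\left(-\frac{x^2}{2\sigma_X^2} - \frac{(y-x)^2}{2\sigma_Y^2}\right),$$

and completing the square gives

$$X \mid Y = y \;\sim\; N\!\left(\frac{\sigma_X^2}{\sigma_X^2 + \sigma_Y^2}\, y,\; \frac{\sigma_X^2 \sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}\right).$$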

Bayesian inference

Various estimators for $X \mid Y = y$, often based on $p_{X|Y} \propto p_{Y|X}\, p_X$:

  • MLE (respectively MAP). Maximize $p_{Y|X}(y|\cdot)$ (respectively $p_{X|Y}(\cdot|y)$).

  • Importance sampling. Sample a weighted prior (a runnable sketch follows this list).

    $\displaystyle\int_{\mathbb{X}} x\, \mu_{X|Y}(\mathrm{d}x|y) \propto \int_{\mathbb{X}} x\, p_{Y|X}(y|x)\, \mu_X(\mathrm{d}x)$

    i.e. sample $x_1, \ldots, x_M \sim \mu_X$ and take $\displaystyle \hat{x} = \sum_{i=1}^M \frac{p(y|x_i)\, x_i}{\sum_{j=1}^M p(y|x_j)}$.

  • MCMC. Construct a Markov chain with proposal kernel

    $Q(\mathrm{d}x'|x) = q(x'|x)\,\mathrm{d}x'$

    and a rejection scheme to ensure the invariant distribution is $\mu_{X|Y}$.
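Here is a self-normalized importance-sampling sketch for the posterior mean, under an assumed Gaussian prior $X \sim N(0,1)$ and Gaussian likelihood $Y \mid X = x \sim N(x, 0.5^2)$ (all specific choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed toy model: X ~ N(0, 1) (prior), Y | X = x ~ N(x, 0.5^2) (likelihood).
def likelihood(y, x, sigma=0.5):
    return np.exp(-0.5 * ((y - x) / sigma) ** 2)

y = 1.3                                    # observed value
M = 50_000
xs = rng.standard_normal(M)                # x_1, ..., x_M ~ mu_X (the prior)

weights = likelihood(y, xs)
weights /= weights.sum()                   # self-normalize the weights
x_hat = np.sum(weights * xs)               # estimate of E[X | Y = y]

print(x_hat)   # close to the exact posterior mean 1.3 / 1.25 = 1.04
```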

Markov-chain Monte Carlo

Do we get the idea?

1. Initialize state $x_0$.
2. For $k = 0, \ldots, L-1$ do:
3.   Sample proposal $\tilde x_{k+1} \sim Q(\cdot \mid x_k)$.
4.   Sample $a_k \sim U(0,1)$ and set the following.

$$a(x_k, \tilde x_{k+1}) = \min\left\{1,\ \frac{p_X(\tilde x_{k+1})\, p_{Y|X}(y \mid \tilde x_{k+1})\, q(x_k \mid \tilde x_{k+1})}{p_X(x_k)\, p_{Y|X}(y \mid x_k)\, q(\tilde x_{k+1} \mid x_k)}\right\}$$

$$x_{k+1} = \begin{cases} \tilde x_{k+1} & a_k \leq a(x_k, \tilde x_{k+1}) \\ x_k & \text{otherwise} \end{cases}$$
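A compact runnable version of this Metropolis-Hastings loop, using the same toy Gaussian prior/likelihood as the earlier sketches and a symmetric random-walk proposal $Q(\cdot \mid x) = N(x, \tau^2)$ so the $q$ terms cancel (the specific numbers are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

def log_target(x, y, sigma_x=1.0, sigma_y=0.5):
    """log p_X(x) + log p_{Y|X}(y|x), up to an additive constant."""
    return -0.5 * (x / sigma_x) ** 2 - 0.5 * ((y - x) / sigma_y) ** 2

def metropolis_hastings(y, L=20_000, tau=1.0, x0=0.0):
    xs = np.empty(L + 1)
    xs[0] = x0
    for k in range(L):
        x = xs[k]
        x_prop = x + tau * rng.standard_normal()   # symmetric proposal: q ratio is 1
        log_a = log_target(x_prop, y) - log_target(x, y)
        xs[k + 1] = x_prop if np.log(rng.uniform()) <= log_a else x
    return xs

chain = metropolis_hastings(y=1.3)
print(chain[2_000:].mean())   # again close to the exact posterior mean of about 1.04
```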

transport map Markov-chain Monte Carlo

Learn a proposal kernel $T(\cdot, \alpha)_\#\, Q$ through a variational method.

1. Initialize state $x_0$ and parameter $\alpha$.
2. For $k = 0, \ldots, L-1$ do:
3a.   Compute reference $r_k = T(x_k, \alpha)$.
3b.   Sample proposal reference $\tilde r_{k+1} \sim Q(\cdot \mid r_k)$.
3c.   Evaluate proposal $\tilde x_{k+1} = T^{-1}(\tilde r_{k+1}, \alpha)$.
4.   Sample $a_k \sim U(0,1)$ and set the following.

$$a(x_k, \tilde x_{k+1}) = \min\left\{1,\ \frac{p_X(\tilde x_{k+1})\, p_{Y|X}(y \mid \tilde x_{k+1})\, q(r_k \mid \tilde r_{k+1})\, |\det \nabla T(x_k, \alpha)|}{p_X(x_k)\, p_{Y|X}(y \mid x_k)\, q(\tilde r_{k+1} \mid r_k)\, |\det \nabla T(\tilde x_{k+1}, \alpha)|}\right\}$$

$$x_{k+1} = \begin{cases} \tilde x_{k+1} & a_k \leq a(x_k, \tilde x_{k+1}) \\ x_k & \text{otherwise} \end{cases}$$

5. If $(k+1) \bmod K_U = 0$ then
6.   Update $\alpha$ by optimizing an estimated divergence $\gamma \mapsto C\big(T(\cdot; \gamma)\big)$ induced by the running chain $\{x_1, \ldots, x_{k+1}\}$.
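To make the idea concrete, here is a heavily simplified, illustrative sketch (my own toy choices throughout): the transport map is affine, $T(x, \alpha) = (x - m)/s$ with $\alpha = (m, s)$, the reference proposal is a symmetric random walk, and the periodic "variational" update simply refits $m, s$ to the running chain's mean and standard deviation rather than optimizing a formal divergence.

```python
import numpy as np

rng = np.random.default_rng(6)

def log_target(x, y, sigma_x=1.0, sigma_y=0.5):
    """log p_X(x) + log p_{Y|X}(y|x), up to a constant (same toy model as before)."""
    return -0.5 * (x / sigma_x) ** 2 - 0.5 * ((y - x) / sigma_y) ** 2

def transport_map_mh(y, L=20_000, K_U=1_000, tau=1.0):
    m, s = 0.0, 1.0                      # alpha = (m, s): T(x, alpha) = (x - m) / s
    xs = np.empty(L + 1)
    xs[0] = 0.0
    for k in range(L):
        x = xs[k]
        r = (x - m) / s                                  # 3a: reference r_k = T(x_k, alpha)
        r_prop = r + tau * rng.standard_normal()         # 3b: symmetric proposal in reference space
        x_prop = m + s * r_prop                          # 3c: proposal = T^{-1}(r_prop, alpha)
        # 4: q is symmetric and |det grad T| = 1/s is constant in x, so both ratios cancel.
        log_a = log_target(x_prop, y) - log_target(x, y)
        xs[k + 1] = x_prop if np.log(rng.uniform()) <= log_a else x
        if (k + 1) % K_U == 0:
            # 5-6: crude stand-in for the variational update: refit the affine map to the chain.
            m, s = xs[:k + 2].mean(), max(xs[:k + 2].std(), 1e-3)
    return xs

chain = transport_map_mh(y=1.3)
print(chain[2_000:].mean())
```

With an affine map the proposal is just an adaptively rescaled random walk; the richer version in the algorithm above uses a learned, nonlinear $T$ whose Jacobian determinant no longer cancels.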

That's it!
