In Bayesian inference, the score is defined as the gradient of the log evidence \(P(x)\) with respect to the observable random variable \(x\):

\(\nabla_x \log P(x)\)
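For intuition, here is a minimal numeric sketch I am adding (not from the original text): for a 1-D Gaussian \(N(\mu, \sigma^2)\), the score has the closed form \(-(x-\mu)/\sigma^2\), and a finite-difference gradient of the log density should reproduce it. The values of `mu`, `sigma`, and `x` below are arbitrary illustrative choices.

```python
from scipy.stats import norm

# For a 1-D Gaussian N(mu, sigma^2), the score grad_x log P(x)
# has the closed form -(x - mu) / sigma^2.
mu, sigma = 0.5, 2.0  # arbitrary illustrative values
x = 1.3

analytic = -(x - mu) / sigma**2

# Finite-difference estimate of grad_x log P(x).
eps = 1e-6
numeric = (norm.logpdf(x + eps, mu, sigma)
           - norm.logpdf(x - eps, mu, sigma)) / (2 * eps)

print(analytic, numeric)  # both approximately -0.2
```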
The key is that the observable variable \(x\) is also connected to a hidden variable \(y\), so the evidence can be decomposed:

\(P(x) = \int P(x|y) P(y)\, dy\)

Using Bayes' rule, we can easily obtain the posterior of the hidden variable \(y\):
\(P(y|x) = \frac{P(x|y)P(y)}{P(x)} = \frac{P(x|y)P(y)}{\int P(x|y) P(y)\, dy}\)

The trick is that the score can be rewritten as an expectation under this posterior. Differentiating the evidence under the integral and multiplying and dividing by \(P(x|y)\):
\(\nabla_x \log P(x) = \frac{\nabla_x P(x)}{P(x)} = \int \frac{\nabla_x P(x|y)}{P(x|y)} \cdot \frac{P(x|y)P(y)}{P(x)}\, dy = \int \nabla_x \log P(x|y)\, P(y|x)\, dy = \mathbb{E}_{y\sim P(y|x)}\left[\nabla_x \log P(x|y)\right]\)

This trick is used in score matching and diffusion models, as mentioned in A.
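To see the identity in action, here is a small numerical check (a sketch I am adding, assuming a two-component 1-D Gaussian mixture in which the hidden variable \(y\) is the component index; the weights, means, and variances below are arbitrary). The posterior-weighted average of the conditional scores should match a finite-difference gradient of \(\log P(x)\).

```python
import numpy as np
from scipy.stats import norm

# Assumed setup: P(y=k) = w[k], P(x|y=k) = N(mu[k], sigma[k]^2).
w = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
sigma = np.array([0.5, 1.5])
x = 0.4  # arbitrary evaluation point

# Evidence P(x) = sum_k P(x|y=k) P(y=k); posterior by Bayes' rule.
lik = norm.pdf(x, mu, sigma)        # P(x|y=k)
evidence = np.sum(lik * w)          # P(x)
posterior = lik * w / evidence      # P(y=k|x)

# Conditional scores: grad_x log P(x|y=k) = -(x - mu[k]) / sigma[k]^2.
cond_scores = -(x - mu) / sigma**2

# The identity: the score of the evidence is the posterior-weighted
# average of the conditional scores.
score_via_posterior = np.sum(posterior * cond_scores)

# Finite-difference check of grad_x log P(x).
eps = 1e-6
logp = lambda t: np.log(np.sum(norm.pdf(t, mu, sigma) * w))
score_numeric = (logp(x + eps) - logp(x - eps)) / (2 * eps)

print(score_via_posterior, score_numeric)  # the two should agree
```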
For example, the minimizer of the function below yields the score matching method:

\(l_t(\mathbf{s}) = \mathbb{E}\left[\left\| \mathbf{s}(\mathbf{X}_t) - \nabla_{\mathbf{x}_t} \log P_{t|0}(\mathbf{X}_t \mid \mathbf{X}_0) \right\|^2\right]\)

where the expectation is taken with respect to the posterior \(P_{0|t}(\mathbf{X}_0 \mid \mathbf{X}_t)\).
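As a sketch of how this objective can be estimated (my illustration, not the original's code): assume a Gaussian kernel \(P_{t|0}(\mathbf{X}_t \mid \mathbf{X}_0) = N(\mathbf{X}_t; \mathbf{X}_0, \sigma_t^2 I)\), so the conditional score is \(-(\mathbf{X}_t - \mathbf{X}_0)/\sigma_t^2\) in closed form, and \(l_t\) is estimated by Monte Carlo over pairs \((\mathbf{X}_0, \mathbf{X}_t)\), which averages the posterior expectation over \(\mathbf{X}_t\) as well. The noise level `sigma_t` and the toy score candidates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Gaussian forward kernel: P_{t|0}(x_t | x_0) = N(x_t; x_0, sigma_t^2 I),
# whose conditional score is the closed form -(x_t - x_0) / sigma_t^2.
sigma_t = 0.7  # hypothetical noise level

def dsm_loss(score_fn, x0_samples):
    """Monte-Carlo estimate of l_t(s): sample X_t from the kernel and
    regress s(X_t) onto the conditional score grad log P_{t|0}(X_t | X_0)."""
    xt = x0_samples + sigma_t * rng.standard_normal(x0_samples.shape)
    target = -(xt - x0_samples) / sigma_t**2  # conditional score
    return np.mean(np.sum((score_fn(xt) - target) ** 2, axis=-1))

# Toy data: X_0 ~ N(0, I), so X_t ~ N(0, (1 + sigma_t^2) I) and the true
# marginal score is -x / (1 + sigma_t^2).
x0 = rng.standard_normal((10_000, 2))
true_score = lambda x: -x / (1.0 + sigma_t**2)
wrong_score = lambda x: -x  # mismatched variance

# The true marginal score should achieve the lower loss.
print(dsm_loss(true_score, x0), dsm_loss(wrong_score, x0))
```

With this Gaussian toy data the true marginal score is linear and attains the lower loss; note that the minimum of \(l_t\) is not zero, since the conditional score is a noisy target whose posterior mean is the marginal score, exactly the identity derived above.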