In Bayesian inference, the score is defined as the gradient of the log evidence \(P(x)\) with respect to the observable random variable \(x\):

\(\nabla_x \log P(x)\)
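For intuition, here is a minimal numeric sketch I am adding (not from the original text): for a 1-D Gaussian \(N(\mu, \sigma^2)\), the score has the closed form \(-(x-\mu)/\sigma^2\), and a finite-difference gradient of the log density should reproduce it. The values of `mu`, `sigma`, and `x` below are arbitrary illustrative choices.

```python
from scipy.stats import norm

# For a 1-D Gaussian N(mu, sigma^2), the score grad_x log P(x)
# has the closed form -(x - mu) / sigma^2.
mu, sigma = 0.5, 2.0  # arbitrary illustrative values
x = 1.3

analytic = -(x - mu) / sigma**2

# Finite-difference estimate of grad_x log P(x).
eps = 1e-6
numeric = (norm.logpdf(x + eps, mu, sigma)
           - norm.logpdf(x - eps, mu, sigma)) / (2 * eps)

print(analytic, numeric)  # both approximately -0.2
```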
The key is that the observable variable \(x\) is also connected to a hidden variable \(y\), so the evidence can be decomposed:

\(P(x) = \int P(x|y) P(y)\, dy\)

Using Bayes' rule, we can easily obtain the posterior of the hidden variable \(y\):
\(P(y|x) = \frac{P(x|y)P(y)}{P(x)} = \frac{P(x|y)P(y)}{\int P(x|y) P(y)\, dy}\)

The trick is that the score can be rewritten as an expectation under this posterior. Differentiating the evidence under the integral and multiplying and dividing by \(P(x|y)\):
\(\nabla_x \log P(x) = \frac{\nabla_x P(x)}{P(x)} = \int \frac{\nabla_x P(x|y)}{P(x|y)} \cdot \frac{P(x|y)P(y)}{P(x)}\, dy = \int \nabla_x \log P(x|y)\, P(y|x)\, dy = \mathbb{E}_{y\sim P(y|x)}\left[\nabla_x \log P(x|y)\right]\)

This trick is used in score matching and diffusion models, as mentioned in A.
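To see the identity in action, here is a small numerical check (a sketch I am adding, assuming a two-component 1-D Gaussian mixture in which the hidden variable \(y\) is the component index; the weights, means, and variances below are arbitrary). The posterior-weighted average of the conditional scores should match a finite-difference gradient of \(\log P(x)\).

```python
import numpy as np
from scipy.stats import norm

# Assumed setup: P(y=k) = w[k], P(x|y=k) = N(mu[k], sigma[k]^2).
w = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
sigma = np.array([0.5, 1.5])
x = 0.4  # arbitrary evaluation point

# Evidence P(x) = sum_k P(x|y=k) P(y=k); posterior by Bayes' rule.
lik = norm.pdf(x, mu, sigma)        # P(x|y=k)
evidence = np.sum(lik * w)          # P(x)
posterior = lik * w / evidence      # P(y=k|x)

# Conditional scores: grad_x log P(x|y=k) = -(x - mu[k]) / sigma[k]^2.
cond_scores = -(x - mu) / sigma**2

# The identity: the score of the evidence is the posterior-weighted
# average of the conditional scores.
score_via_posterior = np.sum(posterior * cond_scores)

# Finite-difference check of grad_x log P(x).
eps = 1e-6
logp = lambda t: np.log(np.sum(norm.pdf(t, mu, sigma) * w))
score_numeric = (logp(x + eps) - logp(x - eps)) / (2 * eps)

print(score_via_posterior, score_numeric)  # the two should agree
```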
For example, the minimizer of the function below yields the score matching method:

\(l_t(\mathbf{s}) = \mathbb{E}\left[\left\| \mathbf{s}(\mathbf{X}_t) - \nabla_{\mathbf{x}_t} \log P_{t|0}(\mathbf{X}_t \mid \mathbf{X}_0) \right\|^2\right]\)

where the expectation is taken with respect to the posterior \(P_{0|t}(\mathbf{X}_0 \mid \mathbf{X}_t)\).
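As a sketch of how this objective can be estimated (my illustration, not the original's code): assume a Gaussian kernel \(P_{t|0}(\mathbf{X}_t \mid \mathbf{X}_0) = N(\mathbf{X}_t; \mathbf{X}_0, \sigma_t^2 I)\), so the conditional score is \(-(\mathbf{X}_t - \mathbf{X}_0)/\sigma_t^2\) in closed form, and \(l_t\) is estimated by Monte Carlo over pairs \((\mathbf{X}_0, \mathbf{X}_t)\), which averages the posterior expectation over \(\mathbf{X}_t\) as well. The noise level `sigma_t` and the toy score candidates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Gaussian forward kernel: P_{t|0}(x_t | x_0) = N(x_t; x_0, sigma_t^2 I),
# whose conditional score is the closed form -(x_t - x_0) / sigma_t^2.
sigma_t = 0.7  # hypothetical noise level

def dsm_loss(score_fn, x0_samples):
    """Monte-Carlo estimate of l_t(s): sample X_t from the kernel and
    regress s(X_t) onto the conditional score grad log P_{t|0}(X_t | X_0)."""
    xt = x0_samples + sigma_t * rng.standard_normal(x0_samples.shape)
    target = -(xt - x0_samples) / sigma_t**2  # conditional score
    return np.mean(np.sum((score_fn(xt) - target) ** 2, axis=-1))

# Toy data: X_0 ~ N(0, I), so X_t ~ N(0, (1 + sigma_t^2) I) and the true
# marginal score is -x / (1 + sigma_t^2).
x0 = rng.standard_normal((10_000, 2))
true_score = lambda x: -x / (1.0 + sigma_t**2)
wrong_score = lambda x: -x  # mismatched variance

# The true marginal score should achieve the lower loss.
print(dsm_loss(true_score, x0), dsm_loss(wrong_score, x0))
```

With this Gaussian toy data the true marginal score is linear and attains the lower loss; note that the minimum of \(l_t\) is not zero, since the conditional score is a noisy target whose posterior mean is the marginal score, exactly the identity derived above.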