It may look like a very old topic(around 2015). But I do spend some time on it, and it really helps me understand better about GADT, its advantages in handling transformations, which we're always encountering in a lot of tasks, including Bayesian inference.

The key point is that, why do we need GADT, as we already have ADT, what the benefit it may bring us? It takes me some time to fully understand this. And with the example of GADT in Bayesian inference, it really makes it clear!

Before we get started, I should mention all the materials this post is based on. This is important, since it looks like there happened to be a discussion and practice of this idea around 2015(8 years earlier!), and currently the technique is not used any more by those people(maybe there are some drawbacks of GADT which I haven't figured it out). So remember, it may be an old/discrepancy thing!

So, let's start!

The basic idea is to define a probability monad in the following ways:

data Dist a where
    Return      :: a -> Dist a
    Bind        :: Dist b -> (b -> Dist a) -> Dist a
    Primitive   :: (Sampleable d) => d a -> Dist a
    Conditional :: (a -> Prob) -> Dist a -> Dist a

Here, using GADT, 4 types of constructors are built.

As you can see, all about GADT is registering. It registers values and functions(of certain type) into a single type signature. And all the values and functions will not be evaluated until necessary(this is actually an implementation of lazy evaluation, and please do not confuse this with the built-in lazy evaluation in haskell). So this way, you define a probability graph most conveniently.

So let's take a look at how and when it will be evaluated:

instance Sampleable Dist where
    sample g (Return x)     = x
    sample g (Primitive d)     = sample g d
    sample g (Bind d f)        = sample g1 $ f $ sample g2 d where
        (g1, g2) = split g
    sample g (Conditional c d) = error "Attempted to sample from a conditional distribution."

Here, the parameter g refers to some random number generator, such as StdGen provided in System.Random. Let's explain sample function of each case:

Summary:

Compared with ADT, what GADT provides you, is a flexible way to wrap more types of data into you data type. You can not only register some type of data or function, but also register a combination of data and functions together, where the behaviors of this combination could be defined later in some trigger function. By doing this, you immediately obtain the power of expressing any computational graph into your single data type. That is amazing!

Notes:

For the necessary of using GADT instead of ADT, considering rewriting the example shown above using normal ADT. For example:

data Dist a where
    Return      :: a -> Dist a
    Bind        :: Dist b -> (b -> Dist a) -> Dist a
    Primitive   :: (Sampleable d) => d a -> Dist a
    Conditional :: (a -> Prob) -> Dist a -> Dist a

should be rewritten as the follows:

data Dist a = Return a 
            | Bind (Dist b) (b -> Dist a)
            | Primitive (d a)
            | Conditional (Dist a)

This code will not compile, and will raise the following error:

error: Not in scope: type variable ‘b’
error: Not in scope: type variable ‘d’






CC BY-SA 4.0 Septimia Zenobia. Last modified: May 28, 2024. Website built with Franklin.jl and the Julia programming language.