# Bayesian Analysis of Normal Distributions with Python

*This post is all about dealing with Gaussians in a Bayesian way; it’s a prelude to the next post: “Bayesian A/B Testing with a Log-Normal Model.” So don’t go analyzing the non-binary data from your $$A/B$$ tests until you read both blog posts!*

Let’s say you have some data and you’re pretty sure it can be well-modeled by a normal (Gaussian) distribution. Sure, you could just find maximum likelihood estimates of your mean $$\mu$$ and variance $$\sigma^2$$, but that’s boring. Point estimates are *so* last century. So, you instead put on your felted Reverend Bayes hat, and decide to model the joint posterior distribution of *both* parameters given the data:

$$p(\mu,\sigma^2|\text{data}).$$

Yes! Nothing could be more heroic.

But how does one actually do this? We only had one parameter last time (which was headache enough) and now there are two…

Thankfully, this is a well-studied problem, and we won’t have to reinvent any wheels. First, apply Bayes rule:

$$p(\mu,\sigma^2|\text{data}) = \frac{p(\text{data}|\mu,\sigma^2)\,p(\mu,\sigma^2)}{p(\text{data})},$$

where $$p(\text{data}|\mu,\sigma^2)$$ is the (straightforward) Gaussian likelihood function, $$p(\text{data})$$ doesn’t matter (because it’s not a function of $$\mu$$ or $$\sigma^2$$), and $$p(\mu,\sigma^2)$$ is the joint prior. I’m sure you’re already thinking to yourself, “And which prior function will we be using?” Why, I’m glad you asked…
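To make the likelihood term concrete, here’s a small sketch (with made-up numbers) evaluating the Gaussian log-likelihood of some data with `scipy`:

```python
import numpy as np
from scipy.stats import norm

# hypothetical data and parameter values, just to make the likelihood concrete
data = np.array([3.8, 4.2, 4.0, 4.5])
mu, sigma_sq = 4.0, 1.0

# log p(data | mu, sigma^2): the data are i.i.d., so the log-likelihood
# is a sum of independent Gaussian log-densities
log_likelihood = norm.logpdf(data, loc=mu, scale=np.sqrt(sigma_sq)).sum()
```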

### Conjugate priors

*Note: The main reference for this is Gelman et al.’s book (2nd ed.), Section 3.3, but you can find all the same info (with slightly different notation) online.*

When faced with a Bayesian modeling task, I tend to shrug and then reach for a conjugate prior. It’s probably not the absolutely optimal approach, but it works well & is relatively easy to explain, and these two advantages are major currency here at RichRelevance.

We used a conjugate Beta prior for the Bernoulli case in the previous blog post, but now we need a joint prior for **two** parameters: $$\mu$$ and $$\sigma^2$$. I won’t go into too much detail, but Gelman & co. show that the conjugate prior is of the form:

$$p(\mu,\sigma^2) = p(\sigma^2)\,p(\mu|\sigma^2),$$

where $$p(\sigma^2)$$ is an inverse gamma distribution and $$p(\mu|\sigma^2)$$ is a normal distribution. (Note that Gelman uses a scaled inverse chi-squared, which can be reparametrized as an inverse gamma with a bit of bookkeeping.)
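Written out explicitly (one standard way to parametrize it; the symbols correspond to the prior parameters `m0`, `k0`, `s_sq0`, `v0` used in the code further down), the factorization is:

$$\sigma^2 \sim \text{Inv-Gamma}\left(\frac{v_0}{2}, \frac{v_0 s_0^2}{2}\right), \qquad \mu\,|\,\sigma^2 \sim \text{Normal}\left(m_0, \frac{\sigma^2}{k_0}\right),$$

where the inverse gamma form follows from the identity $$\text{inv-}\chi^2(v, s^2) = \text{inv-gamma}(v/2,\, v s^2/2).$$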

### Drawing samples from the posterior

Because we chose to use conjugate priors, we know the form of the joint posterior $$p(\mu,\sigma^2|\text{data})$$ without a lot of extra work. This means our model is easier to write down and compute with. Gelman et al. tell us how to draw samples from this posterior (page 80):

> To sample from the joint posterior distribution we first draw a $$\sigma^2$$ from its marginal posterior distribution, then draw $$\mu$$ from its normal conditional posterior distribution, using the simulated value of $$\sigma^2$$.

And here is a Python function that, given some data & prior parameters, does just that:

```python
from numpy import sum, mean, size, sqrt
from scipy.stats import norm, invgamma

def draw_mus_and_sigmas(data, m0, k0, s_sq0, v0, n_samples=10000):
    # number of samples
    N = size(data)
    # find the mean of the data
    the_mean = mean(data)
    # sum of squared differences between data and mean
    SSD = sum((data - the_mean)**2)

    # combining the prior with the data - page 79 of Gelman et al.
    # to make sense of this note that
    # inv-chi-sq(v,s^2) = inv-gamma(v/2,(v*s^2)/2)
    kN = float(k0 + N)
    mN = (k0/kN)*m0 + (N/kN)*the_mean
    vN = v0 + N
    vN_times_s_sqN = v0*s_sq0 + SSD + (N*k0*(m0 - the_mean)**2)/kN

    # 1) draw the variances from an inverse gamma
    # (params: alpha, beta)
    alpha = vN/2
    beta = vN_times_s_sqN/2
    # thanks to wikipedia, we know that:
    # if X ~ inv-gamma(a,1) then b*X ~ inv-gamma(a,b)
    sig_sq_samples = beta*invgamma.rvs(alpha, size=n_samples)

    # 2) draw means from a normal conditioned on the drawn sigmas
    # mu | sigma^2 ~ Normal(mN, sigma^2/kN), so the scale is sqrt(sigma^2/kN)
    mean_norm = mN
    std_norm = sqrt(sig_sq_samples/kN)
    mu_samples = norm.rvs(mean_norm, scale=std_norm, size=n_samples)

    # 3) return the mu_samples and sig_sq_samples
    return mu_samples, sig_sq_samples
```

Here is a quick guide to interpreting the four prior parameters:

- `m0` – Guess about where the mean is.
- `k0` – Certainty about `m0`. Compare with number of data samples.
- `s_sq0` – Guess about where the variance is (the scale of the prior on $$\sigma^2$$).
- `v0` – Certainty about `s_sq0` (the degrees of freedom). Compare with number of data samples.
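To see the roles of `m0` and `k0`, here’s a tiny sketch (with hypothetical numbers) of the blending that happens inside `draw_mus_and_sigmas()`: the posterior mean is a weighted average of the prior guess and the data mean, with `k0` acting like a pseudo-sample count.

```python
# hypothetical numbers: m0 is the prior guess for the mean, and k0 says
# that guess is worth k0 "pseudo-observations" against N real ones
m0, k0 = 4.0, 1.0
N, data_mean = 500, 4.1

kN = k0 + N
mN = (k0 / kN) * m0 + (N / kN) * data_mean
# with N >> k0, the blend lands almost exactly on the data mean
```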

And here is some code that uses the above function to do an $$A/B$$ test on synthetic normal data:

```python
from numpy import mean
from numpy.random import normal

# step 1: define prior parameters for the normal and inverse gamma
m0 = 4.
k0 = 1.
s_sq0 = 1.
v0 = 1.

# step 2: get some random data, with slightly different statistics
A_data = normal(loc=4.1, scale=0.9, size=500)
B_data = normal(loc=4.0, scale=1.0, size=500)

# step 3: get posterior samples
A_mus, A_sig_sqs = draw_mus_and_sigmas(A_data, m0, k0, s_sq0, v0)
B_mus, B_sig_sqs = draw_mus_and_sigmas(B_data, m0, k0, s_sq0, v0)

# step 4: perform numerical integration
# probability that mean of A is greater than mean of B:
print(mean(A_mus > B_mus))
# probability that variance of A is greater than variance of B:
print(mean(A_sig_sqs > B_sig_sqs))
```
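The same posterior samples support other summaries besides these comparison probabilities. Here’s a sketch using stand-in draws (in practice you’d reuse the `A_mus` and `B_mus` arrays from `draw_mus_and_sigmas()`; the numbers below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for the posterior draws of the two means; in practice these
# would be the A_mus and B_mus arrays returned by draw_mus_and_sigmas()
A_mus = rng.normal(4.10, 0.040, size=10000)
B_mus = rng.normal(4.00, 0.045, size=10000)

diffs = A_mus - B_mus
# probability that A's mean exceeds B's, plus a 95% credible interval
# for the difference -- both are just summaries of the sampled diffs
prob_A_beats_B = np.mean(diffs > 0)
lower, upper = np.percentile(diffs, [2.5, 97.5])
```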

There you have it! The only problem is that none of our real data is negative, so a normal distribution (which has support from negative infinity to positive infinity) is the wrong model. But a log-normal turns out to be right! In the next blog post we will use the function `draw_mus_and_sigmas()` as a subroutine in our master plan: to conduct an $$A/B$$ test when both the $$A$$ and $$B$$ sides of the experiment result in log-normally distributed data!
