Varying strength of prior for MCMC hierarchical linear model
I am training an MCMC model using PyMC3.
My aim is to build a series of linear regression models that will predict the time to unload a truck based on the number of crates to unload.
I have data for 2000 locations, which are divided into 4 categories. Locations of the same category tend to have similar unloading times.
For each location I have a series of recorded data points:
- number of crates on the truck
- how long it took to unload
I want to train a linear model for each location of this form:
y = α + βx
where y is the time, x is the number of crates.
For some locations I may have only 20 data points; for others I may have 10,000.
Obviously, where I have 10,000 data points there is enough data to train a simple linear regression at that location with no other input.
However, where I have a small amount of data, such as 20 points, I am hoping to train an over-arching model on all the data points from locations in that category, and use its α and β as priors for the α and β of that location.
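For concreteness, the category-level lookup (category_model_lookup, used in the code below) could be built along these lines; this is only a sketch, assuming scikit-learn's LinearRegression, which matches the .intercept_ and .coef_ attributes my code reads:

from sklearn.linear_model import LinearRegression

# Sketch: pool all data points from locations in the same category and
# fit one ordinary linear regression per category.
category_of_row = df_train["location_id"].map(location_category_lookup_dict)
category_model_lookup = {}
for category_id, rows in df_train.groupby(category_of_row):
    model = LinearRegression()
    model.fit(rows[["crates"]].values, rows["unloading_time"].values)
    category_model_lookup[category_id] = model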
I have been following the very helpful example at https://github.com/parsing-science/pymc3_quickstart_guide/blob/master/Quickstart_Guide_no_errors.ipynb.
My questions are:
- how can I vary the strength of the prior so that it is a really strong prior when I have few data points, and its influence weakens towards zero for locations where I have enough data?
- what values should I set for sigma_alpha and sigma_beta, the stdevs of the prior distributions of α and β? The PyMC3 examples use a uniform distribution; would the best idea be to work out the approximate distribution by training a linear model at each location and getting the stdev of the parameters (see the sketch after this list)?
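To illustrate the second question, the per-location fits I have in mind would look something like this sketch: an independent OLS fit per location, then the spread of the fitted parameters across locations as a rough starting point for sigma_alpha and sigma_beta.

from sklearn.linear_model import LinearRegression
import numpy as np

# Sketch: fit a separate OLS model at each location, then take the
# standard deviation of the fitted parameters across locations.
alphas, betas = [], []
for location_id, drops in df_train.groupby("location_id"):
    m = LinearRegression()
    m.fit(drops[["crates"]].values, drops["unloading_time"].values)
    alphas.append(m.intercept_)
    betas.append(m.coef_[0])
print("empirical stdev of alpha:", np.std(alphas))
print("empirical stdev of beta:", np.std(betas))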
Here's my MCMC code:
import theano.tensor as T
import numpy as np
import pymc3 as pm
import pandas as pd

# load pandas dataframe df_train, which has columns location_id, crates, unloading_time
# load location_category_lookup_dict, which gives the category of each location
# load category_model_lookup, which gives the pre-trained linear model for each of the 4 categories

num_features = 1
grouped = df_train.groupby(["location_id"])
location_no_to_mcmc_models = {}

for location_id, drops_at_this_location in grouped:
    location_category_id = location_category_lookup_dict[location_id]

    # Look up the intercept and coefficient of the previously trained
    # linear model for this location's category
    model_for_this_locations_category = category_model_lookup[location_category_id]
    alpha_this_category = model_for_this_locations_category.intercept_
    beta_this_category = model_for_this_locations_category.coef_

    # Get the features (number of crates) and regressand (unloading time)
    features = np.asarray([[c] for c in drops_at_this_location["crates"]])
    y = np.asarray(drops_at_this_location["unloading_time"])

    # Set up the PyMC3 model
    hlm = pm.Model()
    with hlm:
        # Both alpha and beta are drawn from similar distributions
        sigma_alpha = pm.Uniform("sigma_alpha", 0., 20., testval=2.)
        sigma_beta = pm.Uniform("sigma_beta", 0., 4., testval=2.)

        # Priors on alpha and beta, centred on the category-level fit
        alpha = pm.Normal("alpha", alpha_this_category, sigma_alpha, shape=1)
        beta = pm.Normal("beta", beta_this_category, sigma_beta, shape=(1, num_features))

        # Observation noise
        s = pm.Uniform("s", .01, 10, shape=1)

        # Likelihood of the observed unloading times
        yd = pm.Normal("y", mu=alpha[0] + T.sum(beta[0] * features, 1),
                       tau=s[0] ** -2, observed=y)

        step = pm.NUTS()
        tr = pm.sample(int(1e4), step)

    mcmc_summary = pm.summary(tr)
    location_no_to_mcmc_models[location_id] = mcmc_summary
    print(location_id, "intercept:", mcmc_summary["mean"]["alpha__0"],
          "weight:", mcmc_summary["mean"]["beta__0_0"])
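After the loop, a location's fitted parameters can be read back from the stored summaries; a minimal usage sketch, where some_location_id and num_crates are placeholder values:

# Predict an unloading time from the posterior means stored above.
# some_location_id and num_crates are placeholders, not real values.
summary = location_no_to_mcmc_models[some_location_id]
alpha_hat = summary["mean"]["alpha__0"]
beta_hat = summary["mean"]["beta__0_0"]
predicted_time = alpha_hat + beta_hat * num_crates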