A linear model with GPy.
Background: I’m working on a project aiming to extrapolate dialysis patient results over a 100-hour window (I’ll write future blog posts on this!). I’m working with James Fotheringham (Consultant Nephrologist) who brilliantly bridges the gap between clinic and research – allowing ivory-tower researchers (me) to get our expertise applied in the real world to useful applications.
Part of this project is the prediction of the patient’s weight. We’ll consider a simple model.
When a patient has haemodialysis, they will have fluid removed. If this doesn’t happen frequently or successfully enough the patient can experience *fluid overload*. Each dialysis session (in this simplified model) brings the patient’s weight back down to their dry-weight. Then this weight will increase (roughly linearly) as time goes by until their next dialysis session. I model their weight \(w(t,s)\) with a slow-moving function of time-since-start-of-trial, \(f(t)\), added to a linear function of time-since-last-dialysis-session, \(g(s)\):
$$w(t,s) = f(t) + g(s)$$
For now we’ll ignore f, and just consider g. This is a simple linear function \(g(s) = s*w\). The gradient \(w\) describes how much their weight increases for each day it’s been since dialysis.
We could estimate this from previous data from that patient (e.g. by fitting a Gaussian Process model to the data). But if the patient is new, we might want to provide a prior distribution on this parameter. We could get this by considering what gradient other similar patients have (e.g. those with the same age, vintage, gender, weight, comorbidity, etc might have a similar gradient).
Side note: Colleagues have recommended that we combine all the patients into one large coregionalised model. This has several problems: excessive computational demands, privacy (having to share the data), interpretability (the gradient might be a useful feature), etc.
Other side note: I plan on fitting another model, of gradients wrt patient demographics etc to make predictions for the priors of the patients.
So our model is: \(g(s) = \phi(s)w\) where we have a prior on \(w \sim N(\mu_p, \sigma_p^2)\).
If we find the mean and covariance of \(g(s)\):
\(E[g(s)] = \phi(s) E[w] = \phi(s) \mu_p\)
\($E[g(s)g(s’)] = \phi(s) E[ww^\top] \phi(s’) = \phi(s) (\mu_p \mu_p^\top + \Sigma_p) \phi(s’)\)
It seems a simple enough problem – but it does require than the prior mean is no longer zero (or flat wrt the inputs). I’m not sure how to do that in GPy.
Update: There is no way of doing this in GPy by default. So I solved this by adding a single point, with a crafted noise, to produce or ‘induce’ the prior we want.
I’ve written this into a simple python function (sorry it’s not on pip, but it seems a little too specialised to pollute python module namespace with). Download from <a href=”https://github.com/lionfish0/experimentation/blob/master/linear_kernel_prior.py”>github</a>. The docstring should explain how to use it (with an example).