Last week the Neurostats 2014 workshop took place at the University of Warwick (co-organised by Adam Johansen, Nicolas Chopin, and myself). The goal was to put some neuroscientists and statisticians together to talk about neural data and what to do with it. General impressions:
- The type of Bayesian hierarchical modelling that Andrew Gelman has been advocating for years is starting to see some use in neuroimaging. On the one hand it makes plenty of sense since the data at the level of individual subjects can be cr*p and so one could really use a bit of clever pooling. On the other, imaging data is very high-dimensional, running a Gibbs sampler can take days, and it’s not easy making the data comparable across subjects.
- You have to know your signals. Neural data can be unbelievably complicated and details matter a lot, as Jonathan Victor showed in his talk. One consequence is that if you as a neuroscientist have a data analysis problem, it’s not enough to go see a statistician and ask for advice. If you have EEG data you need to find someone who knows *specifically* about all the traps and pitfalls of EEG, or else someone who’s willing to learn about these things. Another consequence is that we should think about training neurostatisticians, the way we already have biostatisticians, econometricians and psychometricians.
There were plenty of interesting talks, but below are some of my personal highlights.
Michael Gutmann (University of Helsinki) recently wrote me with some comments on the Poisson transform paper (here). It turns out that the Poisson likelihood we define in the paper is a special case of more general frameworks he has worked on, the most recent being:
M.U. Gutmann and J. Hirayama (2011). Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, UAI.
available at arxiv.org/abs/1202.3727.
The paper gives a very general (and interesting) framework for estimation using divergences between the empirical distribution of the data and a theoretical model that is not necessarily normalised.
What we call the Poisson transform appears when taking as the generating function for the Bregman divergence. The same choice of Bregman divergence also corresponds to the generalised Kullback-Leibler divergence used in Minka (2005) Divergence measures and message passing. Presumably there are other connections we hadn’t seen either.
Michael also points out the following paper by Mnih & Teh (ICML 2012), who use noise-contrastive learning in a sequential unnormalised model: http://arxiv.org/abs/1206.6426. They ignore normalisation constants, which I wouldn’t recommend as a general strategy (it generally leads to biased estimates). See our paper for a solution that uses semiparametric inference.
A workshop on statistics and neuroscience, to take place at the University of Warwick, UK, Sept. 3-5 2014. We’ll talk spikes, voxels, pixels, MCMC, and so on. Official call for posters below the fold.
Nicolas Chopin has just arxived our manuscript on inference for unnormalised statistical models. An unnormalised statistical model is one whose likelihood function can be written, in generic notation, as
$p(y \mid \theta) = f(y; \theta) / Z(\theta)$
where $f(y; \theta)$ is easy to compute but the normalisation constant $Z(\theta)$ is hard. A lot of common models fall into that category, for example Ising models or restricted Boltzmann machines. Not having the normalisation constant makes inference much more difficult.
We show that there is a principled way of treating the missing normalisation constant as just another parameter: effectively, you pretend that your data came from a Poisson process. The normalisation constant becomes a parameter in an augmented likelihood function. We call this mapping the Poisson transform, because it generalises a much older idea called the Multinomial-Poisson transform.
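To make the idea concrete, here is a sketch for IID data. The notation ($f_\theta$ for the unnormalised log-density, $\nu$ for the extra parameter, $n$ for the sample size) is assumed for illustration and may differ from the manuscript's. Up to an additive constant, the log-likelihood of a Poisson process with intensity $n \exp\{f_\theta(s) + \nu\}$ is the second line below; maximising it over $\nu$ recovers the log of the missing normalisation constant, and plugging that value back in gives the original log-likelihood minus $n$.

```latex
\begin{align*}
p(y \mid \theta) &= \frac{\exp\{f_\theta(y)\}}{\int \exp\{f_\theta(s)\}\,\mathrm{d}s}, \\
\mathcal{M}(\theta, \nu) &= \sum_{i=1}^{n} \{ f_\theta(y_i) + \nu \}
    - n \int \exp\{ f_\theta(s) + \nu \}\,\mathrm{d}s, \\
\hat{\nu}(\theta) &= -\log \int \exp\{f_\theta(s)\}\,\mathrm{d}s, \\
\mathcal{M}(\theta, \hat{\nu}(\theta)) &= \sum_{i=1}^{n} \log p(y_i \mid \theta) - n .
\end{align*}
```

So maximising jointly over $(\theta, \nu)$ gives the same estimate of $\theta$ as maximum likelihood in the original model.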
The Poisson transform can be approximated in practice by logistic regression, and we show that this actually corresponds to Gutmann & Hyvärinen’s noise-contrastive estimation. Once you have seen the connection, generalising noise-contrastive estimation to non-IID models becomes easy, and we can do inference on unnormalised Markov chains, for example.
One nice thing about the result is that you can use it to turn a relatively exotic spatial Markov chain model into just another logistic regression. See the manuscript for details.
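The logistic-regression connection can be sketched on a toy problem. This is a minimal illustration, not the code from the manuscript: it assumes a one-dimensional Gaussian model written in unnormalised form, `a*y^2 + b*y`, an intercept `nu` standing in for the log normalisation constant, and a known noise distribution N(0, 3^2). All function names are made up for the example. Data points are labelled 1, noise points 0, and a logistic regression with the noise log-density as offset is fitted by damped Newton steps.

```python
import math
import random


def log_q(y, scale=3.0):
    """Log density of the noise distribution N(0, scale^2), which is fully known."""
    return -0.5 * (y / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [u - k * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(data, noise, n_iter=50):
    """Fit (a, b, nu) by logistic regression of 'data vs noise' labels.

    The logit for a point y is a*y^2 + b*y + nu - log q(y): the unnormalised
    log-density plus an intercept, with the noise log-density as an offset.
    This form assumes equally many data and noise points.
    """
    rows = [((y * y, y, 1.0), -log_q(y), 1.0) for y in data]
    rows += [((y * y, y, 1.0), -log_q(y), 0.0) for y in noise]

    def eta(beta, x, off):
        return sum(bj * xj for bj, xj in zip(beta, x)) + off

    def loglik(beta):
        # log p for data points, log(1 - p) = log_sigmoid(-eta) for noise points
        return sum(log_sigmoid(eta(beta, x, off) if lab else -eta(beta, x, off))
                   for x, off, lab in rows)

    beta = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, off, lab in rows:
            p = math.exp(log_sigmoid(eta(beta, x, off)))
            for i in range(3):
                grad[i] += (lab - p) * x[i]
                for j in range(3):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        if max(abs(s) for s in step) < 1e-10:
            break
        # Damped Newton: halve the step while it would lower the log-likelihood
        t, current = 1.0, loglik(beta)
        while t > 1e-8 and loglik([b + t * s for b, s in zip(beta, step)]) < current:
            t *= 0.5
        beta = [b + t * s for b, s in zip(beta, step)]
    return beta


random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(2000)]   # "observed" data
noise = [random.gauss(0.0, 3.0) for _ in range(2000)]  # same number of noise points
a, b, nu = fit(data, noise)

# Map the regression coefficients back to the usual Gaussian parameters
mu_hat, var_hat = -b / (2.0 * a), -1.0 / (2.0 * a)
print("mean:", mu_hat, "variance:", var_hat, "log normaliser (nu):", nu)
```

The intercept `nu` is estimated like any other regression coefficient, and it should land near the true log normalisation constant of the Gaussian even though the fitting code never computes an integral.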
I have just arxiv’ed a new manuscript on speeding up computation for functional additive models such as functional ANOVA. A functional additive model is essentially a model that says that a = b + c, where a, b and c are functions. It is a useful model when we want to express things like: I have three curves and I expect them to be related.
Gina Gruenhage has just arxived a new paper describing an algorithm we call cMDS. Here’s what it’s for: if you do any kind of data analysis you often find yourself comparing datapoints using some kind of distance metric. All’s well if you have a unique reasonable distance metric you can use, but often what you have is a family of possible distance functions, and very little idea how to choose among them. What if the patterns in the data change according to how you measure distance?
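A toy example of the problem (not of the cMDS algorithm itself): the three points and the weighted Euclidean family below are invented for illustration, with a hypothetical parameter `alpha` that slides the weight from the first coordinate to the second. The nearest neighbour of a point can change as `alpha` moves.

```python
import math


def distance(p, q, alpha):
    """Weighted Euclidean distance: alpha=0 uses only the first coordinate,
    alpha=1 only the second."""
    return math.sqrt((1 - alpha) * (p[0] - q[0]) ** 2 + alpha * (p[1] - q[1]) ** 2)


points = {"A": (0.0, 0.0), "B": (1.0, 5.0), "C": (4.0, 1.0)}


def nearest_to(name, alpha):
    """Name of the point closest to `name` under the distance indexed by alpha."""
    others = [k for k in points if k != name]
    return min(others, key=lambda k: distance(points[name], points[k], alpha))


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, nearest_to("A", alpha))
```

For small `alpha` the point closest to A is B, for large `alpha` it is C, so any conclusion about "neighbours" depends on where in the family you happen to sit.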
The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny.
(image from Mineault et al. 2009)
The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown, use knitr and pandoc to produce the slides, and add interaction using D3.
I’m not completely happy with the results but it’s a pretty cool set of tools.