## Poisson transform – update

Michael Gutmann (University of Helsinki) recently wrote me with some comments on the Poisson transform paper (here). It turns out that the Poisson likelihood we define in the paper is a special case of more general frameworks he has worked on, the most recent being:
M.U. Gutmann and J. Hirayama (2011). Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, UAI.
Available at arxiv.org/abs/1202.3727.

The paper gives a very general (and interesting) framework for estimation using divergences between the empirical distribution of the data and a theoretical model that is not necessarily normalised.
What we call the Poisson transform appears when taking $\Psi(x) = x\log x$ as the generating function for the Bregman divergence. The same choice of Bregman divergence also corresponds to the generalised Kullback-Leibler divergence used in Minka (2005) Divergence measures and message passing. Presumably there are other connections we hadn’t seen either.
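
To spell out the generalised Kullback-Leibler connection, here is a short sketch of the algebra. It assumes the usual scalar definition of a Bregman divergence, applied pointwise to two non-negative functions $p$ and $q$ that need not integrate to one (the symbols $d_\Psi$, $p$ and $q$ are my notation for this sketch, not necessarily the paper's):

$d_\Psi(a, b) = \Psi(a) - \Psi(b) - \Psi'(b)(a - b)$

With $\Psi(x) = x\log x$ we have $\Psi'(x) = \log x + 1$, so

$d_\Psi(a, b) = a\log a - b\log b - (\log b + 1)(a - b) = a\log\frac{a}{b} - a + b$

Integrating pointwise gives

$\int d_\Psi(p(y), q(y))\,dy = \int \left\{ p(y)\log\frac{p(y)}{q(y)} - p(y) + q(y) \right\} dy$

which is the generalised Kullback-Leibler divergence. When both $p$ and $q$ are normalised, the last two terms cancel and the ordinary Kullback-Leibler divergence is recovered.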

Michael also points out the following paper by Mnih & Teh (ICML 2012), who use noise-contrastive learning in a sequential unnormalised model: http://arxiv.org/abs/1206.6426. They ignore normalisation constants, which I wouldn’t recommend as a general strategy (it generally leads to biased estimates). See our paper for a solution that uses semiparametric inference.

## Statistical Challenges in Neuroscience

A workshop on statistics and neuroscience, to take place at the University of Warwick, UK, Sept. 3-5 2014. We’ll talk spikes, voxels, pixels, MCMC, and so on. Official call for posters below the fold.

## The Poisson Transform for Unnormalised Statistical Models

Nicolas Chopin has just arxived our manuscript on inference for unnormalised statistical models. An unnormalised statistical model is one whose likelihood function can be written

$p(y|\theta) = \frac{f(y;\theta)}{z(\theta)}$

where $f(y;\theta)$ is easy to compute but the normalisation constant $z(\theta)$ is hard. A lot of common models fall into that category, for example Ising models or restricted Boltzmann machines. Not having the normalisation constant makes inference much more difficult.

We show that there is a principled way of treating the missing normalisation constant as just another parameter: effectively, you pretend that your data came from a Poisson process. The normalisation constant becomes a parameter in an augmented likelihood function. We call this mapping the Poisson transform, because it generalises a much older idea called the Multinomial-Poisson transform.
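
As a rough sketch of why this works, take the IID case with observations $y_1, \dots, y_n$. The extra parameter $\nu$ and the function $M$ below are notation chosen for this sketch and may differ from the manuscript. The usual log-likelihood is

$\ell(\theta) = \sum_{i=1}^n \log f(y_i;\theta) - n\log z(\theta)$

Now pretend the data are the points of a Poisson process with intensity $n \exp(\nu) f(y;\theta)$. Up to an additive constant, its log-likelihood is

$M(\theta, \nu) = \sum_{i=1}^n \log f(y_i;\theta) + n\nu - n e^{\nu} z(\theta)$

Setting the derivative in $\nu$ to zero gives $n - n e^{\nu} z(\theta) = 0$, that is $\hat{\nu}(\theta) = -\log z(\theta)$, and plugging back in,

$M(\theta, \hat{\nu}(\theta)) = \ell(\theta) - n$

So maximising $M$ jointly over $(\theta, \nu)$ yields the same estimate of $\theta$ as maximising $\ell$, with $\nu$ standing in for minus the log-normalisation constant.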

The Poisson transform can be approximated in practice by logistic regression, and we show that this actually corresponds to Gutmann & Hyvärinen’s noise-contrastive divergence. Once you have seen the connection, generalising noise-contrastive divergence to non-IID models becomes easy, and we can do inference on unnormalised Markov chains, for example.
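
To make the logistic-regression idea concrete, here is a minimal sketch on a toy problem; it is an illustration under stated assumptions, not the code from the manuscript. The toy model is an unnormalised zero-mean Gaussian, $f(y;\theta) = \exp(-\theta y^2/2)$, and the noise distribution is a wider Gaussian with known density. The function names (`log_f`, `log_q`, `fit_nce`), the intercept `nu`, and the Newton solver with step halving are all choices made for this example. The classifier is trained to tell data from noise, and its intercept ends up estimating $-\log z(\theta)$.

```python
import math
import random


def log_f(y, theta):
    """Unnormalised log-density of the toy model: Gaussian with precision theta."""
    return -0.5 * theta * y * y


def log_q(y, sigma):
    """Normalised log-density of the N(0, sigma^2) noise distribution."""
    return -0.5 * (y / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_sigmoid(eta):
    """Numerically stable log(1 / (1 + exp(-eta)))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def fit_nce(data, noise, sigma, n_iter=30):
    """Logistic regression of data (label 1) against noise (label 0).

    Log-odds for a point y: log_f(y, theta) + nu - log_q(y, sigma) + log(n / m).
    Returns estimates of (theta, nu); nu plays the role of -log z(theta).
    """
    offset = math.log(len(data) / len(noise))
    # Each row: feature multiplying theta, fixed part of the log-odds, label.
    rows = [(-0.5 * y * y, offset - log_q(y, sigma), lab)
            for ys, lab in ((data, 1), (noise, 0)) for y in ys]

    def loglik(theta, nu):
        total = 0.0
        for x, fixed, lab in rows:
            eta = theta * x + nu + fixed
            total += log_sigmoid(eta) if lab else log_sigmoid(-eta)
        return total

    theta, nu = 1.0, 0.0  # arbitrary starting values
    current = loglik(theta, nu)
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, fixed, lab in rows:
            p = math.exp(log_sigmoid(theta * x + nu + fixed))
            r, w = lab - p, p * (1.0 - p)
            g0 += r * x
            g1 += r
            h00 += w * x * x
            h01 += w * x
            h11 += w
        # Newton direction d = H^{-1} g, with H the 2x2 negative Hessian.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        if g0 * d0 + g1 * d1 < 1e-9:  # Newton decrement: converged
            break
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        cand = loglik(theta + d0, nu + d1)
        while cand < current and step > 1e-8:
            step *= 0.5
            cand = loglik(theta + step * d0, nu + step * d1)
        theta, nu, current = theta + step * d0, nu + step * d1, cand
    return theta, nu


random.seed(0)
theta_true = 2.0   # true precision of the data
noise_sigma = 1.5  # noise is wider than the data so that it covers it
data = [random.gauss(0.0, 1.0 / math.sqrt(theta_true)) for _ in range(4000)]
noise = [random.gauss(0.0, noise_sigma) for _ in range(4000)]

theta_hat, nu_hat = fit_nce(data, noise, noise_sigma)
minus_log_z = -0.5 * math.log(2.0 * math.pi / theta_true)
print("theta_hat:", theta_hat, "(true value:", theta_true, ")")
print("nu_hat:", nu_hat, "(-log z at true theta:", minus_log_z, ")")
```

For this toy model the normalisation constant is known in closed form, which is the only reason the last line can compare `nu_hat` with $-\log z(\theta)$; in a genuinely unnormalised model that check is not available.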

One nice thing about the result is that you can use it to turn a relatively exotic spatial Markov chain model into just another logistic regression. See the manuscript for details.

## Fast matrix computations for functional additive models

I have just arxiv’ed a new manuscript on speeding up computation for functional additive models such as functional ANOVA. A functional additive model is essentially a model that says a = b + c, where a, b and c are functions. It is a useful model when we want to express things like: I have three curves and I expect them to be related.

## cMDS: visualising changing distances

Gina Gruenhage has just arxived a new paper describing an algorithm we call cMDS. Here’s what it’s for: if you do any kind of data analysis you often find yourself comparing datapoints using some kind of distance metric. All’s well if you have a unique reasonable distance metric you can use, but often what you have is a family of possible distance functions, and very little idea how to choose among them. What if the patterns in the data change according to how you measure distance?

## ECVP tutorial on classification images

The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny.

(image from Mineault et al. 2009)

The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown, use knitr and pandoc to produce the slides, and add interaction using D3.
I’m not completely happy with the results but it’s a pretty cool set of tools.

## Fitting psychometric functions using STAN

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising: it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful to psychophysicists out there.

## The ANOVA madness has to stop (rant)

Imagine a world in which people are taught that there’s two kinds of counting: there’s potato-counting, and there’s counting other stuff (beans, points, cards, etc.) Potatoes are special, so that potato-counting gets its own courses, under the name “Kartoffelanalysis”. When you take a Kartoffelanalysis 101 course, nobody mentions that you could use the same techniques to count other objects. Potatoes are special and unique. More advanced students learn that there are special techniques for counting a mix of potatoes and other things, and these sophisticated techniques are called Mixed Kartoffelanalysis. Only a select few ever learn that counting potatoes works pretty much the same way as counting other stuff.