Poisson transform – update

Michael Gutmann (University of Helsinki) recently wrote me with some comments on the Poisson transform paper (here). It turns out that the Poisson likelihood we define in the paper is a special case of more general frameworks he has worked on, the most recent being:
M.U. Gutmann and J. Hirayama (2011). Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, UAI.
available at arxiv.org/abs/1202.3727.

The paper gives a very general (and interesting) framework for estimation using divergences between the empirical distribution of the data and a theoretical model that is not necessarily normalised.
What we call the Poisson transform appears when taking \Psi(x) = x\log x as the generating function for the Bregman divergence. The same choice of Bregman divergence also corresponds to the generalised Kullback-Leibler divergence used in Minka (2005) Divergence measures and message passing. Presumably there are other connections we hadn’t seen either.
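
For reference, here is the pointwise form of that correspondence (my own summary, not a quote from either paper). The Bregman divergence generated by \Psi(x) = x\log x, summed or integrated over the domain, is

D(p, q) = \int \left\{ p(y)\log\frac{p(y)}{q(y)} - p(y) + q(y) \right\} \, dy

which is the generalised Kullback-Leibler divergence: it reduces to the usual KL divergence when p and q both integrate to one, but remains well-defined when q is unnormalised.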

Michael also points out the following paper by Mnih & Teh (ICML 2012), who use noise-contrastive learning in a sequential unnormalised model: http://arxiv.org/abs/1206.6426. They ignore normalisation constants, which I wouldn’t recommend as a general strategy (it typically leads to biased estimates). See our paper for a solution that uses semiparametric inference.

Statistical Challenges in Neuroscience

A workshop on statistics and neuroscience, to take place at the University of Warwick, UK, Sept. 3-5 2014. We’ll talk spikes, voxels, pixels, MCMC, and so on. Official call for posters below the fold.


The Poisson Transform for Unnormalised Statistical Models

Nicolas Chopin has just arxived our manuscript on inference for unnormalised statistical models. An unnormalised statistical model is one whose likelihood function can be written

p(y|\theta) = \frac{f(y;\theta)}{z(\theta)}

where f(y;\theta) is easy to compute but the normalisation constant z(\theta) (the sum or integral of f(y;\theta) over all possible values of y) is hard to obtain. A lot of common models fall into that category, for example Ising models or restricted Boltzmann machines. Not having the normalisation constant makes inference much more difficult.

We show that there is a principled way of treating the missing normalisation constant as just another parameter: effectively, you pretend that your data came from a Poisson process. The normalisation constant becomes a parameter in an augmented likelihood function. We call this mapping the Poisson transform, because it generalises a much older idea called the Multinomial-Poisson transform.
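
To sketch the idea in the notation above (see the manuscript for the precise statement): with observations y_1, \ldots, y_n, the augmented objective is the log-likelihood of a Poisson process with intensity e^{\nu} f(y;\theta),

\ell(\theta, \nu) = \sum_{i=1}^{n} \left\{ \log f(y_i;\theta) + \nu \right\} - e^{\nu} \int f(y;\theta) \, dy

Maximising over \nu alone gives e^{\nu} = n/z(\theta), and substituting that back in recovers the original log-likelihood up to constants, so treating \nu as a free parameter loses nothing.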

The Poisson transform can be approximated in practice by logistic regression, and we show that this actually corresponds to Gutmann & Hyvärinen’s noise-contrastive estimation. Once you have seen the connection, generalising noise-contrastive estimation to non-IID models becomes easy, and we can do inference on unnormalised Markov chains, for example.
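
To make the logistic-regression route concrete, here is a toy R sketch (mine, not code from the manuscript) for a one-parameter unnormalised Gaussian: label the observed points 1 and reference samples 0, and maximise the logistic likelihood in which a free intercept nu plays the role of the log normalisation constant.

set.seed(1)
y  <- rnorm(200, mean = 1)              # "data", unknown mean to be estimated
q  <- function(x) dnorm(x, 0, 3)        # reference (noise) density, fully known
x0 <- rnorm(2000, 0, 3)                 # reference samples, m = 10 per data point

logf <- function(x, mu) -(x - mu)^2 / 2 # unnormalised log-density of the model

negll <- function(par) {
  mu <- par[1]; nu <- par[2]            # nu stands in for the log normalisation constant
  m  <- length(x0) / length(y)
  eta1 <- logf(y,  mu) + nu - log(m * q(y))   # logit for data points (label 1)
  eta0 <- logf(x0, mu) + nu - log(m * q(x0))  # logit for reference points (label 0)
  -(sum(plogis(eta1, log.p = TRUE)) + sum(plogis(-eta0, log.p = TRUE)))
}
fit <- optim(c(0, 0), negll)
fit$par   # estimates of mu and nu; here nu should come out near -log(sqrt(2*pi))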

One nice thing about the result is that you can use it to turn a relatively exotic spatial Markov chain model into just another logistic regression. See the manuscript for details.


Fast matrix computations for functional additive models

I have just arxiv’ed a new manuscript on speeding up computation for functional additive models such as functional ANOVA. A functional additive model is essentially a model that says a = b + c, where a, b and c are functions. It is a useful model when we want to express things like: I have three curves and I expect them to be related.
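
As a toy illustration of what such a model looks like (just the idea, not the manuscript’s fast solver; mgcv is my choice of fitting tool here): each curve is a smooth grand mean plus a smooth group-specific effect, observed with noise.

library(mgcv)
d <- expand.grid(t = seq(0, 1, length.out = 100), group = factor(1:3))
d$y <- sin(2 * pi * d$t) + as.numeric(d$group) * d$t + rnorm(nrow(d), sd = 0.1)
fit <- gam(y ~ group + s(t) + s(t, by = group), data = d)  # a(t) = b(t) + c_g(t), plus noise
plot(fit, pages = 1)                                       # estimated functional components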

cMDS: visualising changing distances

Gina Gruenhage has just arxived a new paper describing an algorithm we call cMDS. Here’s what it’s for: if you do any kind of data analysis you often find yourself comparing datapoints using some kind of distance metric. All’s well if you have a unique reasonable distance metric you can use, but often what you have is a family of possible distance functions, and very little idea how to choose among them. What if the patterns in the data change according to how you measure distance?
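
To illustrate the problem (this is not the cMDS algorithm itself, just the standard-tools version of the difficulty): embed the same toy dataset with classical MDS under several members of a Minkowski family of distances and watch the configuration change.

x <- matrix(rnorm(50 * 4), 50, 4)               # toy data: 50 points in 4 dimensions
for (p in c(1, 2, 4)) {
  D   <- dist(x, method = "minkowski", p = p)   # one member of a family of distances
  emb <- cmdscale(D, k = 2)                     # classical 2-D MDS embedding
  plot(emb, main = paste("Minkowski p =", p),
       xlab = "MDS 1", ylab = "MDS 2")          # the layout shifts as p changes
}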



ECVP tutorial on classification images

The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny.


(image from Mineault et al. 2009)

The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown, use knitr and pandoc to produce the slides, and add interaction using D3.
I’m not completely happy with the results but it’s a pretty cool set of tools.
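
For the curious, the pipeline looks roughly like this (file names are hypothetical, and slidy is just one of the slide formats pandoc can target):

library(knitr)
knit("slides.Rmd", output = "slides.md")               # run the R chunks, emit Markdown
system("pandoc -s -t slidy slides.md -o slides.html")  # Markdown -> HTML slides (pandoc must be installed)
# D3-based interaction is then added by hand to the generated HTML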

Fitting psychometric functions using STAN

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising: it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful to psychophysicists out there.
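
As a flavour of what that looks like, here is a minimal sketch of my own (not the code from the introduction): a probit psychometric function with a threshold and a width parameter, fitted to binomial counts via rstan.

library(rstan)
model_code <- "
data {
  int<lower=1> N;        // number of stimulus levels
  vector[N] x;           // stimulus intensity at each level
  int<lower=1> n[N];     // trials per level
  int<lower=0> k[N];     // correct responses per level
}
parameters {
  real threshold;
  real<lower=0> width;
}
model {
  threshold ~ normal(0, 5);
  width ~ normal(0, 5);
  for (i in 1:N)
    k[i] ~ binomial(n[i], Phi((x[i] - threshold) / width));
}
"
x <- seq(-2, 2, length.out = 8)
dat <- list(N = 8, x = x, n = rep(40, 8),
            k = rbinom(8, 40, pnorm((x - 0.2) / 0.7)))  # fake data: threshold 0.2, width 0.7
fit <- stan(model_code = model_code, data = dat)
print(fit, pars = c("threshold", "width"))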


The ANOVA madness has to stop (rant)

Imagine a world in which people are taught that there are two kinds of counting: there’s potato-counting, and there’s counting other stuff (beans, points, cards, etc.). Potatoes are special, so that potato-counting gets its own courses, under the name “Kartoffelanalysis”. When you take a Kartoffelanalysis 101 course, nobody mentions that you could use the same techniques to count other objects. Potatoes are special and unique. More advanced students learn that there are special techniques for counting a mix of potatoes and other things, and these sophisticated techniques are called Mixed Kartoffelanalysis. Only a select few ever learn that counting potatoes works pretty much the same way as counting other stuff.

Predicting spatial locations using point processes

I’ve uploaded a draft tutorial on some aspects of prediction using point processes. I wrote it using R Markdown, so there are bits of R code for readers to play with. It’s hosted on Rpubs, which turns out to be a great deal more convenient than WordPress for that sort of thing.


Point processes for eye movements: update

We’ve just revised and re-arxived our manuscript on point processes for the analysis of eye movement data (joint work with Hans Trukenbrod & Ralf Engbert of the University of Potsdam, and Felix Wichmann of the University of Tübingen).

The main idea is that one is often interested mostly in where people have looked and why. Fixation locations are just points in space, so you can analyse that sort of data with point processes. The reason you’d want to do that is that point processes give interesting ways of characterising the statistical patterns of points in space. The thing we focus on is predicting what people look at based on image content, but that’s only one of the many things you can do in that framework.
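
As a flavour of the framework, here is a toy sketch of mine using the spatstat package (not the paper’s own analysis): fit an inhomogeneous Poisson process whose log-intensity is linear in an image-based covariate, and predict where fixations should land.

library(spatstat)
win <- owin(c(0, 1), c(0, 1))                             # the "image" domain
fix <- ppp(x = runif(100), y = runif(100), window = win)  # fake fixation locations
sal <- as.im(function(x, y) x, W = win)                   # fake image-based covariate (e.g. local contrast)
fit <- ppm(fix ~ sal)   # inhomogeneous Poisson process, log-intensity linear in the covariate
plot(predict(fit))      # predicted fixation intensity over the image
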
Of potential interest to eye movement researchers and people who like statistical models of stuff.
