Poisson transform – update

Michael Gutmann (University of Helsinki) recently wrote me with some comments on the Poisson transform paper (here). It turns out that the Poisson likelihood we define in the paper is a special case of more general frameworks he has worked on, the most recent being:
M.U. Gutmann and J. Hirayama (2011). Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, UAI. Available at arxiv.org/abs/1202.3727.

The paper gives a very general (and interesting) framework for estimation using divergences between the empirical distribution of the data and a theoretical model that is not necessarily normalised.
What we call the Poisson transform appears when taking \Psi(x) = x\log x as the generating function for the Bregman divergence. The same choice of generating function also yields the generalised Kullback-Leibler divergence used in Minka (2005), Divergence measures and message passing. Presumably there are other connections we hadn’t seen either.
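To make that connection concrete, here is a quick numerical check (standard identities only, nothing specific to either paper) that the Bregman divergence generated by \Psi(x) = x\log x is exactly the generalised KL divergence between unnormalised distributions:

```python
import numpy as np

def bregman(p, q, psi, dpsi):
    """Pointwise Bregman divergence: sum_i [psi(p_i) - psi(q_i) - dpsi(q_i)(p_i - q_i)]."""
    return np.sum(psi(p) - psi(q) - dpsi(q) * (p - q))

psi  = lambda x: x * np.log(x)        # generating function Psi(x) = x log x
dpsi = lambda x: np.log(x) + 1.0      # its derivative

# Two unnormalised "distributions" (they need not sum to one)
p = np.array([0.5, 1.2, 0.3])
q = np.array([0.4, 0.9, 0.8])

d_bregman = bregman(p, q, psi, dpsi)
# Generalised KL divergence: sum_i [p_i log(p_i/q_i) - p_i + q_i]
d_genkl = np.sum(p * np.log(p / q) - p + q)
print(np.allclose(d_bregman, d_genkl))  # True
```

Expanding by hand gives the same thing: p log p - q log q - (log q + 1)(p - q) = p log(p/q) - p + q.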

Michael also points out the following paper by Mnih & Teh (ICML 2012), who use noise-contrastive estimation in a sequential unnormalised model: http://arxiv.org/abs/1206.6426. They ignore normalisation constants, which I wouldn’t recommend as a general strategy (it typically leads to biased estimates). See our paper for a solution that uses semiparametric inference.

Statistical Challenges in Neuroscience

A workshop on statistics and neuroscience, to take place at the University of Warwick, UK, Sept. 3-5 2014. We’ll talk spikes, voxels, pixels, MCMC, and so on. Official call for posters below the fold.

Read more…

The Poisson Transform for Unnormalised Statistical Models

Nicolas Chopin has just arxived our manuscript on inference for unnormalised statistical models. An unnormalised statistical model is one whose likelihood function can be written

p(y|\theta) = \frac{f(y;\theta)}{z(\theta)}

where f(y;\theta) is easy to compute but the normalisation constant z(\theta) is hard. A lot of common models fall into that category, for example Ising models or restricted Boltzmann machines. Not having the normalisation constant makes inference much more difficult.
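To see why z(\theta) is the bottleneck, consider a tiny Ising chain: the normalisation constant is a sum over all 2^n spin configurations (a toy illustration, nothing from the manuscript):

```python
import itertools
import numpy as np

def unnormalised_ising(s, theta):
    """f(s; theta) = exp(theta * sum of neighbouring products), s in {-1,+1}^n."""
    s = np.asarray(s)
    return np.exp(theta * np.sum(s[:-1] * s[1:]))

def z(theta, n):
    """Normalisation constant by brute-force enumeration: 2^n terms."""
    return sum(unnormalised_ising(s, theta)
               for s in itertools.product([-1, 1], repeat=n))

print(z(0.5, 10))   # feasible for n = 10 (1024 terms)
```

For a one-dimensional chain the transfer-matrix trick gives the closed form z = 2(2\cosh\theta)^{n-1}, but for an n-by-n grid the sum has 2^{n^2} terms, which is why specialised inference methods are needed.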

We show that there is a principled way of treating the missing normalisation constant as just another parameter: effectively, you pretend that your data came from a Poisson process. The normalisation constant becomes a parameter in an augmented likelihood function. We call this mapping the Poisson transform, because it generalises a much older idea called the Multinomial-Poisson transform.
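Concretely, in a discrete setting the idea can be sketched as follows (my notation and a made-up toy model, not the paper's actual construction): writing \nu for the extra parameter, one maximises \sum_i [\log f(y_i;\theta) + \nu] - e^\nu \sum_y f(y;\theta) jointly in (\theta, \nu), and at the optimum e^\nu recovers n/z(\theta):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
support = np.arange(4)                 # toy model: y in {0, 1, 2, 3}
f = lambda y, th: np.exp(th * y)       # unnormalised f(y; theta) = exp(theta * y)

# Simulate data from the normalised model at theta_true = 0.5
theta_true = 0.5
probs = f(support, theta_true) / f(support, theta_true).sum()
y = rng.choice(support, size=500, p=probs)
n = len(y)

# Augmented (Poisson) negative log-likelihood in (theta, nu):
# -sum_i [log f(y_i; theta) + nu] + e^nu * sum_y f(y; theta)
def neg_poisson_ll(par):
    th, nu = par
    return -(np.sum(th * y + nu) - np.exp(nu) * f(support, th).sum())

th_hat, nu_hat = minimize(neg_poisson_ll, x0=[0.0, 0.0]).x
z_hat = f(support, th_hat).sum()
print(th_hat)                        # close to the ordinary MLE of theta
print(np.exp(nu_hat) * z_hat / n)    # ~ 1: exp(nu) recovers n / z(theta)
```

Setting the \nu-derivative to zero gives e^\nu \sum_y f(y;\theta) = n, so profiling out \nu hands back the usual (normalised) likelihood in \theta.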

The Poisson transform can be approximated in practice by logistic regression, and we show that this actually corresponds to Gutmann & Hyvärinen’s noise-contrastive estimation. Once you have seen the connection, generalising noise-contrastive estimation to non-IID models becomes easy, and we can do inference on unnormalised Markov chains, for example.
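The logistic-regression view can be sketched on a toy example (my set-up, chosen only for illustration): label model samples 1 and noise samples 0, use \log f(y;\theta) + \nu - \log p_{noise}(y) as the logit, and fit by maximum likelihood; \nu then estimates the negative log of the normalisation constant.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# "Data" from N(0,1); pretend we only know f(y; mu) = exp(-(y-mu)^2/2),
# not its normalising constant sqrt(2*pi).
y_data = rng.normal(0.0, 1.0, size=2000)
y_noise = rng.normal(0.0, 2.0, size=2000)   # noise samples, density known

def log_pnoise(y):
    """Log density of the N(0, 2) noise distribution."""
    return -0.5 * (y / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

def neg_logistic_ll(par):
    mu, nu = par
    def logit(y):   # log-odds "data vs noise": log f + nu - log p_noise
        return -0.5 * (y - mu) ** 2 + nu - log_pnoise(y)
    # Bernoulli log-likelihood: label 1 for data, 0 for noise
    return -(np.sum(-np.log1p(np.exp(-logit(y_data))))
             + np.sum(-np.log1p(np.exp(logit(y_noise)))))

mu_hat, nu_hat = minimize(neg_logistic_ll, x0=[0.0, 0.0]).x
print(mu_hat)               # ~ 0: the mean
print(np.exp(-nu_hat))      # ~ sqrt(2*pi) = 2.507: the normalising constant
```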

One nice thing about the result is that you can use it to turn a relatively exotic spatial Markov chain model into just another logistic regression. See the manuscript for details.


Fast matrix computations for functional additive models

I have just arxived a new manuscript on speeding up computation for functional additive models such as functional ANOVA. A functional additive model is essentially a model that says that a = b + c, where a, b and c are functions. It is a useful model when we want to express things like: I have three curves and I expect them to be related.
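As an illustration of the kind of matrix speed-up involved (a classical identity, not necessarily the one used in the manuscript): Kronecker-structured products, which show up all the time when models live on grids of basis functions, can be applied without ever forming the big matrix, via (A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^T):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(4, 4))
X = rng.normal(size=(4, 3))

# Naive: build the 12x12 Kronecker product explicitly
slow = np.kron(A, B) @ X.reshape(-1, order="F")   # order="F" = column-stacking vec
# Fast: never form the Kronecker product
fast = (B @ X @ A.T).reshape(-1, order="F")
print(np.allclose(slow, fast))  # True
```

For p x p factors the naive route costs O(p^4) storage and time while the right-hand side stays at O(p^3), which is the sort of gap that makes these models practical.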
Read more…

cMDS: visualising changing distances

Gina Gruenhage has just arxived a new paper describing an algorithm we call cMDS. Here’s what it’s for: if you do any kind of data analysis you often find yourself comparing datapoints using some kind of distance metric. All’s well if you have a unique reasonable distance metric you can use, but often what you have is a family of possible distance functions, and very little idea how to choose among them. What if the patterns in the data change according to how you measure distance?
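For background (this is plain classical MDS, not cMDS itself): given a single fixed distance matrix, Torgerson scaling recovers coordinates by double-centring the squared distances and taking the top eigenvectors; cMDS addresses what happens when the distance matrix itself varies.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: recover k-d coordinates from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    w, V = np.linalg.eigh(B)                   # eigendecomposition (ascending order)
    idx = np.argsort(w)[::-1][:k]              # keep the top-k eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))                   # a true 2-d configuration
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_hat))                   # distances are reproduced exactly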


Read more…

ECVP tutorial on classification images

The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny.


(image from Mineault et al. 2009)

The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown, use knitr and pandoc to produce the slides, and add interaction using D3.
I’m not completely happy with the results but it’s a pretty cool set of tools.
Read more…

Fitting psychometric functions using STAN

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising: it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful to psychophysicists out there.
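The tutorial itself uses STAN and R; just to sketch the model being fitted, here is the same kind of psychometric function fitted by plain maximum likelihood in Python (my parametrisation, not the tutorial's: a logistic function with threshold \alpha, slope 1/\beta, and a fixed 2AFC guess rate):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
gamma = 0.5                                     # 2AFC guess rate (known)

def pcorrect(x, alpha, beta):
    """Logistic psychometric function: guess rate plus a sigmoid in x."""
    return gamma + (1 - gamma) / (1 + np.exp(-(x - alpha) / beta))

# Simulate a small experiment: 100 trials at each of 7 stimulus levels
levels = np.linspace(-3, 3, 7)
x = np.repeat(levels, 100)
resp = rng.random(x.size) < pcorrect(x, alpha=0.5, beta=1.0)

def neg_ll(par):
    p = pcorrect(x, *par)
    return -np.sum(np.where(resp, np.log(p), np.log(1 - p)))

alpha_hat, beta_hat = minimize(neg_ll, x0=[0.0, 1.0],
                               bounds=[(-3, 3), (0.05, 10)],
                               method="L-BFGS-B").x
print(alpha_hat, beta_hat)   # close to the true (0.5, 1.0)
```

The point of using STAN instead is that you get full posteriors (and hierarchical extensions across subjects) rather than just point estimates.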


The ANOVA madness has to stop (rant)

Imagine a world in which people are taught that there are two kinds of counting: there’s potato-counting, and there’s counting other stuff (beans, points, cards, etc.) Potatoes are special, so that potato-counting gets its own courses, under the name “Kartoffelanalysis”. When you take a Kartoffelanalysis 101 course, nobody mentions that you could use the same techniques to count other objects. Potatoes are special and unique. More advanced students learn that there are special techniques for counting a mix of potatoes and other things, and these sophisticated techniques are called Mixed Kartoffelanalysis. Only a select few ever learn that counting potatoes works pretty much the same way as counting other stuff.
Read more…

Predicting spatial locations using point processes

I’ve uploaded a draft tutorial on some aspects of prediction using point processes. I wrote it using R-Markdown, so there’s bits of R code for readers to play with. It’s hosted on Rpubs, which turns out to be a great deal more convenient than WordPress for that sort of thing.
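One basic building block in this area is simulating an inhomogeneous Poisson process, which can be done by Lewis-Shedler thinning (a standard algorithm, sketched here with a made-up intensity; I don't claim the tutorial uses exactly this):

```python
import numpy as np

rng = np.random.default_rng(5)

def thin_poisson(intensity, lam_max, size=1.0):
    """Simulate an inhomogeneous Poisson process on [0, size]^2 by thinning:
    generate a homogeneous process at rate lam_max, then keep each point
    (x, y) with probability intensity(x, y) / lam_max."""
    n = rng.poisson(lam_max * size ** 2)          # homogeneous point count
    pts = rng.random((n, 2)) * size
    keep = rng.random(n) < intensity(pts[:, 0], pts[:, 1]) / lam_max
    return pts[keep]

# Example intensity: more points towards the right of the unit square
lam = lambda x, y: 200.0 * x
pts = thin_poisson(lam, lam_max=200.0)
print(len(pts))        # ~100 points on average (the integral of lam)
```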


Point processes for eye movements: update

We’ve just revised and re-arxived our manuscript on point processes for the analysis of eye movement data (joint work with Hans Trukenbrod & Ralf Engbert of the University of Potsdam, and Felix Wichmann of the University of Tübingen).

The main idea is that one is often interested mostly in where people have looked, and why. Fixation locations are just points in space, and so you can analyse that sort of data with point processes. The reason you’d want to do that is that point processes give interesting ways of characterising the statistical patterns of points in space. The thing we focus on is predicting what people look at based on image content, but that’s only one of the many things you can do in that framework.
Of potential interest to eye movement researchers and people who like statistical models of stuff.
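The kind of prediction described above can be caricatured as a Poisson regression of gridded fixation counts on an image-based covariate (a deliberately crude sketch with made-up numbers; the models in the manuscript are richer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# A toy "saliency map" covariate on a 30x30 grid over the unit square
g = 30
xs, ys = np.meshgrid(np.linspace(0, 1, g), np.linspace(0, 1, g))
s = np.sin(2 * np.pi * xs) * np.cos(2 * np.pi * ys)      # covariate per cell
area = 1.0 / g ** 2                                      # cell area

# Fixations as a Poisson process with log-linear intensity in the covariate
b0_true, b1_true = 6.0, 1.5
counts = rng.poisson(area * np.exp(b0_true + b1_true * s))

# Poisson regression of cell counts on the covariate (discretised likelihood)
def neg_ll(par):
    eta = par[0] + par[1] * s
    return -np.sum(counts * eta - area * np.exp(eta))

b0_hat, b1_hat = minimize(neg_ll, x0=[0.0, 0.0]).x
print(b0_hat, b1_hat)    # close to the true (6.0, 1.5)
```

Here b1 measures how strongly the covariate attracts fixations; with a real image you would plug in actual saliency or contrast maps instead of the synthetic s.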

Read more…