Michael Gutmann (University of Helsinki) recently wrote me with some comments on the Poisson transform paper (here). It turns out that the Poisson likelihood we define in the paper is a special case of more general frameworks he has worked on, the most recent being:
M.U. Gutmann and J. Hirayama (2011). Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, UAI.
Available at arxiv.org/abs/1202.3727.

The paper gives a very general (and interesting) framework for estimation using divergences between the empirical distribution of the data and a theoretical model that is not necessarily normalised.
What we call the Poisson transform appears when taking $\Psi(x) = x\log x$ as the generating function for the Bregman divergence. The same choice of Bregman divergence also corresponds to the generalised Kullback-Leibler divergence used in Minka (2005), "Divergence measures and message passing". Presumably there are other connections we haven't spotted yet.
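To make the connection concrete: for a convex generating function $\Psi$, the Bregman divergence between positive vectors $a$ and $b$ is $d_\Psi(a,b) = \sum_i [\Psi(a_i) - \Psi(b_i) - \Psi'(b_i)(a_i - b_i)]$. With $\Psi(x) = x\log x$ we have $\Psi'(x) = \log x + 1$, and expanding gives $\sum_i [a_i \log(a_i/b_i) - a_i + b_i]$, the generalised KL divergence. When both vectors sum to one the $-a_i + b_i$ terms cancel and it reduces to the ordinary KL divergence. The Python sketch here just checks this pointwise identity numerically; the function names are hypothetical, entries are assumed strictly positive, and it does not reproduce how Gutmann & Hirayama build their estimator.

```python
import numpy as np

def bregman(psi, dpsi, a, b):
    """Bregman divergence sum_i [psi(a_i) - psi(b_i) - psi'(b_i) (a_i - b_i)]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(psi(a) - psi(b) - dpsi(b) * (a - b)))

def x_log_x(x):
    # Generating function Psi(x) = x log x (entries assumed > 0)
    return x * np.log(x)

def d_x_log_x(x):
    # Its derivative Psi'(x) = log x + 1
    return np.log(x) + 1.0

def generalised_kl(a, b):
    """Generalised KL divergence sum_i [a_i log(a_i / b_i) - a_i + b_i]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b) - a + b))

# Unnormalised vectors: the two expressions agree
a = [0.2, 0.5, 1.3]
b = [0.4, 0.4, 1.0]
print(bregman(x_log_x, d_x_log_x, a, b), generalised_kl(a, b))

# Normalised vectors: the -a_i + b_i terms cancel, leaving the usual KL divergence
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
print(generalised_kl(p, q), float(np.sum(p * np.log(p / q))))
```

Because nothing forces $a$ or $b$ to sum to one, this divergence can compare an empirical measure with an unnormalised model directly, which is what makes it useful here.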

Michael also points out the following paper by Mnih & Teh (ICML 2012), who use noise-contrastive estimation in a sequential unnormalised model: http://arxiv.org/abs/1206.6426. They ignore normalisation constants, which I wouldn't recommend as a general strategy, since it usually leads to biased estimates. See our paper for a solution that uses semiparametric inference.
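A toy illustration of the bias, in Python: the data are standard normal and the unnormalised model is $\log p_m(u) = -\tau u^2/2 + c$, fitted by noise-contrastive estimation, i.e. logistic regression of data against noise samples (here Gaussian noise with standard deviation 2). Everything in this sketch (sample sizes, noise distribution, the damped Newton solver, the hypothetical `fit_nce` helper) is an assumption chosen for illustration. It is not Mnih & Teh's model, and treating $c$ as a free parameter is plain NCE, not the semiparametric approach of our paper. With $c$ free, $\tau$ should come out close to 1; with $c$ fixed at 0 the model cannot both integrate to one and have the right shape, so in this setting $\tau$ is pulled upwards.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                  # number of data points
nu = 1.0                  # noise-to-data ratio
m = int(nu * n)           # number of noise points
noise_sd = 2.0            # noise distribution N(0, noise_sd^2)

x = rng.normal(0.0, 1.0, n)         # data from N(0, 1): true precision tau = 1
y = rng.normal(0.0, noise_sd, m)    # noise samples

def log_noise(u):
    return -0.5 * (u / noise_sd) ** 2 - np.log(noise_sd * np.sqrt(2.0 * np.pi))

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z)))
    return -np.logaddexp(0.0, -z)

def fit_nce(free_c, iters=50):
    """Fit log p_m(u) = -tau u^2 / 2 + c by NCE (logistic regression with an offset).

    If free_c is False, c is fixed at 0, i.e. the normaliser is ignored.
    """
    u = np.concatenate([x, y])
    labels = np.concatenate([np.ones(n), np.zeros(m)])   # 1 = data, 0 = noise
    offset = -log_noise(u) - np.log(nu)
    cols = [-0.5 * u ** 2] + ([np.ones_like(u)] if free_c else [])
    X = np.column_stack(cols)

    def objective(beta):
        z = X @ beta + offset      # log p_m(u) - log p_n(u) - log(nu)
        return np.sum(labels * log_sigmoid(z) + (1 - labels) * log_sigmoid(-z))

    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ beta + offset
        h = np.exp(log_sigmoid(z))                       # P(data | u)
        grad = X.T @ (labels - h)
        hess = (X * (h * (1.0 - h))[:, None]).T @ X      # negative Hessian
        step = np.linalg.solve(hess, grad)
        if np.max(np.abs(step)) < 1e-10:
            break
        t = 1.0
        # Backtrack so the (concave) objective never decreases
        while objective(beta + t * step) < objective(beta) and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
    return beta

tau_free, c_free = fit_nce(free_c=True)
(tau_fixed,) = fit_nce(free_c=False)
print(f"free c:  tau = {tau_free:.3f}, c = {c_free:.3f} (true: 1, {-0.5 * np.log(2 * np.pi):.3f})")
print(f"c fixed: tau = {tau_fixed:.3f} (true: 1)")
```

The seed makes the run reproducible; how large the bias is depends on the noise distribution and on how far the fixed $c$ is from the true log-normaliser.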