## Point processes for eye movements: update

We’ve just revised and re-arxived our manuscript on point processes for the analysis of eye movement data (joint work with Hans Trukenbrod & Ralf Engbert of the University of Potsdam, Felix Wichmann of the University of Tübingen).

The main idea is that often one is mostly interested in where people have looked and why. Fixation locations are just points in space, so you can analyse that sort of data with point processes. The reason you’d want to do that is that point processes give you interesting ways of characterising the statistical patterns of points in space. The thing we focus on is predicting what people look at based on image content, but that’s only one of many things you can do in that framework.

Of potential interest to eye movement researchers and people who like statistical models of stuff.

## Finding patterns in time series using regular expressions

Regular expressions are a fantastic tool when you’re looking for patterns in time series. I wish I’d realised that sooner.

Here’s a timely example: by the traditional definition, two successive quarters of negative GDP growth constitute a recession. We have a quarterly GDP time series for Australia, and we want to know how many recessions they’ve been having down under. How do we count these events?

Of course you can always write a loop, but it’s clumsy. You can use the “embed” function, which is much better in a lot of cases. Or you can use regexps. Here’s how.

First, our Australian GDP data, courtesy of the expsmooth package:

plot(ausgdp, main = "Australian GDP per capita", ylab = "dollars", xlab = "Year")


(caveat: the series is per capita GDP, not absolute GDP, so my recession count might be off).
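To make the trick concrete, here’s a minimal sketch using a made-up growth series (the variable names and the data are purely illustrative, not the ausgdp series): encode the sign of each quarter’s growth as a character, collapse the series into one string, and count matches of “two or more successive negative quarters”.

```r
# Hypothetical quarterly growth rates (synthetic data for illustration)
growth <- c(0.4, 0.2, -0.1, -0.3, 0.5, -0.2, 0.1, -0.4, -0.1, -0.2, 0.6)

# Encode each quarter as "n" (negative growth) or "p" (non-negative) and collapse to one string
signs <- paste(ifelse(growth < 0, "n", "p"), collapse = "")
# signs is "ppnnpnpnnnp"

# A recession is a run of two or more successive "n"s: the regexp "nn+"
matches <- gregexpr("nn+", signs)[[1]]
n.recessions <- if (matches[1] == -1) 0 else length(matches)
n.recessions # 2 in this made-up series (the lone "n" doesn't count)
```

With the real data you’d first compute quarter-on-quarter growth from the level series, e.g. something like diff(ausgdp)/head(ausgdp, -1), before applying the same encoding.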

## Barycentric interpolation: fast interpolation on arbitrary grids

Barycentric interpolation generalises linear interpolation to arbitrary dimensions. It is very fast, although suboptimal if the function is smooth. You might know it as algorithm 21.7.1 in Numerical Recipes (“Two-dimensional Interpolation on an Irregular Grid”). Using the geometry package, it can be implemented in a few lines of R code.

Here’s a quick explanation of what the algorithm does: to approximate the value ${f(\mathbf{x})}$ of a function at a point ${\mathbf{x}}$, you look at the value of ${f}$ at neighbouring grid points and return some kind of weighted average of these values (with higher weights given to closer grid points). We start with random grid points:


n <- 100 #number of grid points
X <- cbind(rnorm(n),rnorm(n))


Barycentric interpolation on the plane uses just three near-neighbours to interpolate. As a first step, a triangulation is built: usually, but not necessarily, the Delaunay triangulation. This divides the convex hull of the grid into triangular tiles:


dn <- delaunayn(X)


The function delaunayn returns the indices of the three points that make up each triangle. The tiles define which neighbours to use in the interpolation: for example, if we interpolate anywhere within the blue tile we will be using the measured value of ${f}$ at the three highlighted points.

The only thing that we still need to define is the relative weights of the three points in the interpolated value. This is where barycentric coordinates come in: given a point ${\mathbf{x}}$ in the interior of a triangle formed by ${\mathbf{x}_{1},\mathbf{x}_{2},\mathbf{x}_{3}}$, we can express ${\mathbf{x}}$ as a weighted average (convex combination) of ${\mathbf{x}_{1},\mathbf{x}_{2},\mathbf{x}_{3}}$:

$\displaystyle \mathbf{x}=\sum_{i=1}^{3}\alpha_{i}\mathbf{x}_{i}$

where ${\alpha_{i}>0}$ and ${\sum\alpha_{i}=1}$. The ${\alpha_{i}}$‘s are called the barycentric coordinates of ${\mathbf{x}}$, and we can use them as weights in the interpolation:

$\displaystyle f\left(\mathbf{x}\right)\approx\sum_{i=1}^{3}\alpha_{i}f\left(\mathbf{x}_{i}\right)$

For a linear function ${f(\mathbf{x})=\mathbf{w}^{t}\mathbf{x}+b}$ this interpolation is exact, so barycentric interpolation makes sense for functions that are roughly piecewise linear on the triangulation.
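For completeness, the barycentric coordinates themselves come from a small linear system: substituting ${\alpha_{3}=1-\alpha_{1}-\alpha_{2}}$ into the convex combination gives

$\displaystyle \mathbf{x}-\mathbf{x}_{3}=\alpha_{1}\left(\mathbf{x}_{1}-\mathbf{x}_{3}\right)+\alpha_{2}\left(\mathbf{x}_{2}-\mathbf{x}_{3}\right)$

which in the plane is a 2×2 linear system in ${\alpha_{1},\alpha_{2}}$. We never have to solve it by hand here: tsearch returns the coordinates for us.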

The function tsearch will assign each interpolated location to its Delaunay tile, and return its barycentric coordinates. To avoid slow R loops I use a sparse matrix trick to compute the interpolation. The whole thing is essentially five lines of code:


#2D barycentric interpolation at points Xi for a function with values f measured at locations X
#For N-D interpolation simply replace tsearch with tsearchn and modify the sparse matrix definition to have non-zero values in the right spots
interp.barycentric <- function(X,f,Xi)
{
    require(geometry)
    require(Matrix)
    dn <- delaunayn(X)
    #For each row in Xi, find the enclosing Delaunay tile and its barycentric coordinates
    tri <- tsearch(X[,1],X[,2],dn,Xi[,1],Xi[,2],bary=TRUE)
    #For each row in Xi, these are the three points in X that contribute to the interpolation
    active <- dn[tri$idx,]
    #Define the interpolation as a sparse matrix operation. Faster than using apply, probably slower than a C implementation
    M <- sparseMatrix(i=rep(1:nrow(Xi),each=3),j=as.numeric(t(active)),x=as.numeric(t(tri$p)),dims=c(nrow(Xi),length(f)))
    as.numeric(M%*%f)
}
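As a quick sanity check (assuming interp.barycentric as defined above, with the geometry and Matrix packages installed), we can interpolate a linear function, for which the result should be exact anywhere inside the convex hull of the grid:

```r
set.seed(1)
n <- 100
X <- cbind(rnorm(n), rnorm(n))     #random grid points
f <- 2*X[,1] - 3*X[,2] + 1         #a linear function measured at the grid points

#Interpolate at a couple of locations near the centre of the grid
Xi <- cbind(c(0, 0.1), c(0, -0.2))
interp.barycentric(X, f, Xi)
#should match 2*Xi[,1] - 3*Xi[,2] + 1 up to numerical precision
```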


## Slightly silly D3 example: shift one datapoint to get a significant result

Have you ever seen these scatterplots that report a significant correlation between X and Y, and look like it’s just the one point to the upper-right driving the correlation? Thanks to this interactive tool, you too can do this at home.

## Interactive MDS visualisation using D3

Here’s a sneak peek at upcoming visualisation work. I’ve been working a bit on MDS (multidimensional scaling), a classical technique for visualising distance data. Classical MDS is useful, but interactive MDS is *much* more useful. Using D3, a Javascript visualisation framework, it’s relatively easy to make interactive MDS plots. This example shows how basic interaction can be used to show the approximation inherent in an MDS representation. See here for the interactive version, or click on the image below.

The source code is available as well, but seeing as I’m new to Javascript it’s not exactly a model of clarity and elegance. All the computational hard work is done in R rather than D3, and was adapted from the “cmdscale” documentation. Most things are still a lot easier to do in R than in D3. Fortunately there’s Shiny, and the R + Shiny + D3 toolchain looks very promising. More on that in a future post.

## French R Conference in Lyon – call for contributions

[of interest mostly to French and R bilinguals]

The next edition of the Rencontres R will take place in Lyon this June. Below, the official call for contributions.

————————————————–
Call for contributions for the 2nd Rencontres R:

In the spirit of the international useR! conference, and following the first edition held in Bordeaux on 2–3 July 2012, the second Rencontres R will take place on 27–28 June 2013 in Lyon (http://r2013-lyon.sciencesconf.org).

The aim of the meeting is to provide, at the national level, a venue for exchanging and sharing ideas on the use of R across disciplines (visualisation and graphics, applied statistics, biostatistics, Bayesian statistics, bioinformatics, data analysis, modelling, machine learning, high-performance computing…).

The meeting is open to all kinds of R users: researchers, teachers, industry practitioners, students… It is aimed at beginners as well as confirmed, experienced users.

## edply: combining plyr and expand.grid

Here’s a code snippet I thought I’d share. Very often I find myself checking the output of a function f(a,b) for a lot of different values of a and b, which I then need to plot somehow.
An example: here’s a function that computes the value of a sinusoidal function on a grid of points, and returns a data.frame.

fun <- function(freq,phase) {
x <- seq(0,2*pi,l=100);
data.frame(x=x,value=sin(freq*x-phase))
}


It takes a frequency and a phase argument, and we want to know what the output looks like for frequencies between 1 and 6 and phase values of 0 and 1.
Usually this means calling, e.g., expand.grid(freq=1:6,phase=c(0,1)) to get all possible combinations of the two variables, then calling one of the plyr functions to get the results in a usable form. The edply function does it all in one line:

d <- edply(list(freq=1:6,phase=c(0,1)),fun)


which returns a data.frame:

freq phase          x      value
1    1     0 0.00000000 0.00000000
2    1     0 0.06346652 0.06342392
3    1     0 0.12693304 0.12659245

which we can then plot:

ggplot(d,aes(x,value,col=as.factor(phase)))+facet_wrap( ~ freq)+geom_path()
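For comparison, much of this case can be emulated with stock plyr’s mdply, which applies a function to the rows of a data.frame, passing the columns as named arguments (a sketch, not a full replacement for edply, which also handles functions taking a list):

```r
library(plyr)

fun <- function(freq, phase) {
    x <- seq(0, 2*pi, l = 100)
    data.frame(x = x, value = sin(freq*x - phase))
}

#mdply applies fun to each row of the expanded grid, columns as named arguments
d <- mdply(expand.grid(freq = 1:6, phase = c(0, 1)), fun)
```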


The edply function can also be used to compute and plot a heatmap:

fun <- function(x,y) dnorm(x)*dnorm(y)*sin(x)
d <- edply(list(x=seq(-3,3,l=40),y=seq(-3,3,l=40)),fun)
ggplot(d,aes(x,y))+geom_raster(aes(fill=V1))


I’ve attached the code below, there really isn’t much to it. Note that there’s also an “elply” function that (not unexpectedly) returns a list.

#edply: combining plyr and expand.grid.
#Simon Barthelmé, University of Geneva
#
#Example usage
#-------------
#fun <- function(x,y) dnorm(x)*dnorm(y)*sin(x)
#d <- edply(list(x=seq(-3,3,l=40),y=seq(-3,3,l=40)),fun)
#ggplot(d,aes(x,y))+geom_raster(aes(fill=V1)) #Heatmap of f(x,y)

elply <- function(vars,fun,...,.progress="none",.parallel=FALSE)
{
    df <- do.call("expand.grid",vars)
    if (all(names(vars) %in% names(formals(fun))))
    {
        #We assume that fun takes the variables in vars as named arguments
        funt <- function(v,...)
        {
            do.call(fun,c(v,list(...)))
        }
        res <- alply(df,1,funt,...,.progress=.progress,.parallel=.parallel)
    }
    else
    {
        #We assume that fun takes a named list as its first argument
        res <- alply(df,1,fun,...,.progress=.progress,.parallel=.parallel)
    }
    res
}

edply <- function(...)
{
    res <- elply(...)
    plyr:::list_to_dataframe(res,attr(res, "split_labels"))
}