Have you ever seen those scatterplots that report a significant correlation between X and Y, where it looks like a single point in the upper right is driving the whole correlation? Thanks to this interactive tool, you too can do this at home.

Very nice. Why not add both an OLS and a robust regression line?

Thanks Eric! Statistical computing in JavaScript is a massive pain in the backside, so right now I don’t really see myself implementing robust regression. What one could do is use R as a back-end with Shiny as an interface. I might do it at some point; it could be a good demonstration of Shiny+D3 at work (there are already a couple of those here).
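For anyone curious what the two lines would look like on the R side, here is a minimal sketch, not what the tool does. It assumes the MASS package is available (it ships with most R installations) and picks `rlm()` with `method = "MM"` as the robust fit. That choice is mine: the comment above doesn't name an estimator, and I went with MM because the default Huber M-estimator offers little protection against a high-leverage point like this one. The seed and the simulated data are arbitrary.

```r
# Sketch: OLS line vs. a robust (MM-estimator) line on data with one extreme point.
library(MASS)  # provides rlm()

set.seed(1)                   # arbitrary seed, only for reproducibility
x <- c(rnorm(20, 0, 1), 10)   # 20 unrelated points plus one far to the upper right
y <- c(rnorm(20, 0, 1), 10)

ols <- lm(y ~ x)                  # ordinary least squares
rob <- rlm(y ~ x, method = "MM")  # robust fit; the MM choice is an assumption

plot(x, y)
abline(ols, col = 2)              # OLS line in red
abline(rob, col = 4, lty = 2)     # robust line in dashed blue
legend("topleft", legend = c("OLS", "robust (MM)"),
       col = c(2, 4), lty = c(1, 2))

# Compare the fitted intercepts and slopes side by side
rbind(OLS = coef(ols), robust = coef(rob))
```

If the two slopes differ a lot, that is a hint that a few points are doing most of the work in the OLS fit.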

Not silly at all. This looks to be a great teaching tool. It can be difficult to explain significance and p-values in a way that sticks correctly in students’ minds.

I actually think there’s a lot of potential in D3 for that sort of thing (interactive demos for teaching). Somebody should definitely start a stats version of the Wolfram Demonstrations Project, hopefully skipping the annoying marketing parts.

Very good topic! It is extremely important, since one encounters a lot of scientific publications in which a single influential data point drives the R-squared up!

In R, we could use the ‘cooks.distance’ function:

Example 1: no outlier

x <- rnorm(20, 0, 1)

y <- rnorm(20, 0, 1)

plot(x, y)

LM1 <- lm(y ~ x)

abline(LM1, col = 2)

summary(LM1)$r.squared

cooks.distance(LM1)

Example 2: an added outlier that drives R^2 up:

x <- rnorm(20, 0, 1)

y <- rnorm(20, 0, 1)

x[21] <- 10

y[21] <- 10

plot(x, y)

LM2 <- lm(y ~ x)

abline(LM2, col = 2)

summary(LM2)$r.squared

cooks.distance(LM2)

In the second example, point 21 shows a Cook’s distance one to two orders of magnitude higher than the other points.

Cheers,

Andrej

Since we’re talking about teaching aids here, it would be quite useful to plot the Cook’s distances in this example…
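A rough sketch of what such a plot could look like, rebuilding the outlier example from the comment above. The seed is arbitrary, and the dashed 4/n reference line is just one common rule of thumb that I'm assuming here, not something proposed in this thread.

```r
# Sketch: plot Cook's distance for each observation in the outlier example.
set.seed(42)                  # arbitrary seed, only for reproducibility
x <- c(rnorm(20, 0, 1), 10)   # 20 unrelated points plus the added point 21
y <- c(rnorm(20, 0, 1), 10)

LM2 <- lm(y ~ x)
cd <- cooks.distance(LM2)     # one value per observation

# Spike plot: one vertical line per observation
plot(cd, type = "h", xlab = "Observation index", ylab = "Cook's distance")
abline(h = 4 / length(cd), lty = 2)  # 4/n cutoff: a rule of thumb, an assumption

# Index of the most influential observation
unname(which.max(cd))
```

Base R can also draw a similar display directly from the fitted model with `plot(LM2, which = 4)`.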