Have you ever seen those scatterplots that report a significant correlation between X and Y, yet look as if a single point in the upper right is driving the whole correlation? Thanks to this interactive tool, you too can do this at home.
Very nice. Why not add both an OLS and a robust regression line?
Thanks Eric! Statistical computing in JavaScript is a massive pain in the backside, so right now I don’t really see myself implementing robust regression. What one could do is use R as a back-end with Shiny as an interface. I might do it at some point; it could be a good demonstration of Shiny+D3 at work (there’s already a couple of those here).
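For anyone who wants to try the OLS-versus-robust comparison directly in R, here is a minimal sketch. It is not part of the interactive tool. It assumes the MASS package is available (it ships with standard R installations) and uses MM-estimation via `rlm`, which is only one of several possible robust fits. The seed, the simulated data and the names `ols` and `robust` are made up for illustration.

```r
library(MASS)  # provides rlm(); assumed to be installed

set.seed(1)                    # arbitrary seed, only for reproducibility
x <- c(rnorm(20), 10)          # 20 uncorrelated points plus one far-out point
y <- c(rnorm(20), 10)

ols    <- lm(y ~ x)                    # ordinary least squares
robust <- rlm(y ~ x, method = "MM")    # MM-estimator, one robust alternative

plot(x, y)
abline(ols, col = 2)                   # OLS line, pulled towards the far-out point
abline(robust, col = 4, lty = 2)       # robust line
legend("topleft", legend = c("OLS", "robust (rlm, MM)"),
       col = c(2, 4), lty = c(1, 2))

coef(ols)
coef(robust)
```

With data like these, the OLS slope should be pulled strongly towards the extra point, while the robust slope should stay much closer to what the other 20 points suggest. The exact numbers depend on the seed.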
Not silly at all. This looks to be a great teaching tool. It can be difficult to explain significance and p-values in a way that sticks correctly in students’ minds.
I actually think there’s a lot of potential in D3 for that sort of thing (interactive demos for teaching). Somebody should definitely start a stats version of the Wolfram Demonstrations Project, hopefully skipping the annoying marketing parts.
Very good topic! It is extremely important, since one encounters a lot of scientific publications in which a single influential data point drives the R-squared up!
In R, we could use the ‘cooks.distance’ function:
# Example 1: no outlier
x <- rnorm(20, 0, 1)
y <- rnorm(20, 0, 1)
plot(x, y)
LM1 <- lm(y ~ x)
abline(LM1, col = 2)
summary(LM1)$r.squared
cooks.distance(LM1)
# Example 2: an added outlier drives R^2 up
x <- rnorm(20, 0, 1)
y <- rnorm(20, 0, 1)
x[21] <- 10
y[21] <- 10
plot(x, y)
LM2 <- lm(y ~ x)
abline(LM2, col = 2)
summary(LM2)$r.squared
cooks.distance(LM2)
In the last example, point 21 shows an influence one to two orders of magnitude higher than that of the other points.
Cheers,
Andrej
Since we’re talking about teaching aids here, it would be quite useful to plot the Cook’s distances in this example…
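One way to do that, as a rough sketch: rebuild the outlier example and draw the Cook’s distances as vertical bars, one per observation. The seed and the name `fit` are illustrative, and the dashed line at 4/n is only a common rule of thumb, not a formal test.

```r
set.seed(1)                    # arbitrary seed, only for reproducibility
x <- c(rnorm(20), 10)          # same setup as the outlier example: point 21 at (10, 10)
y <- c(rnorm(20), 10)
fit <- lm(y ~ x)

d <- cooks.distance(fit)       # one value per observation

plot(d, type = "h",
     xlab = "Observation index", ylab = "Cook's distance")
abline(h = 4 / length(d), lty = 2)   # rule-of-thumb cutoff (an assumption, not a test)

which.max(d)                   # index of the most influential observation
```

Base R also offers `plot(fit, which = 4)`, which draws a similar Cook’s distance plot straight from the fitted model.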