The reliably excellent Neuroskeptic links to a study called “The Distance between Mars and Venus”, which claims that men and women are actually more different in their personality than previously thought. The study has already engendered the sort of nonsense you can expect with the title it has, along with quite a bit of more serious commentary (e.g. Andrew Gelman here), but

(A) I don’t see anyone explaining things with graphs yet so here you go

(B) I think that there’s something potentially wrong with the whole thing, which is that the more traits you measure, the more men and women will be judged “different” according to the authors’ technique, although each difference taken alone might be completely trivial.

Janet Hyde’s “gender similarity” hypothesis says that men and women are essentially the same on most measures. That is, if you take just about any psychological attribute, and look at the difference between the female average and the male average, you’ll find it’s small relative to the variation in that attribute (in other words the “effect size” is small). In statistical terms, for most psychological traits, gender “explains” very little of the variability in the population.

Del Giudice et al. do not deny that, but claim that looking at traits taken *in isolation* is misleading and that we have to look at the overall pattern. The whole article is written in terms of distance, but one might also look at it in terms of clustering, because, as I’ll explain below, their technique lets you make the distance look as large as you like.

Let’s say I have a population of 13-year-olds, and I ask them to rate how much they like Justin Bieber and how much they like ponies. Then if we represent each person’s preferences as a point in space, the data might look like:


(notice how this graph subtly defies gender expectations?)

Now in my made-up data we see that girls like Justin and ponies more than boys do. If we compare the Bieber ratings between boys and girls, we find an effect size of 1. Same for ponies. Del Giudice et al. say that this underestimates the “true” distance, which should be about 1.4 (the square root of 2, to be exact).

Why? Because boys and girls form two “clusters” (partially distinct point clouds), which are actually easier to distinguish when you take all the traits you’ve measured into account, instead of just one.

Let’s create a synthetic trait that’s the sum of how much someone likes Bieber plus how much they like ponies. It’s clear that there’s more information in there than in the Bieber trait or the pony trait alone. Graphically this corresponds to measuring boys and girls along the line that joins the center of the boy cluster and the center of the girl cluster (note to stats people: our new synthetic trait is the asymptotic Fisher discriminant). If you measure the effect size along that line you’ll find it’s around 1.4.

(the two crosses are the cluster centers)

Another way of saying the same thing is that if we were to choose a one-number summary of people’s preferences so that the boys and girls distributions are as different as possible, then our new synthetic trait would be optimal.
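To make the numbers concrete, here is a small simulation sketch. It is not the data behind the graphs: I am assuming normally distributed ratings with unit variance, no correlation between the two traits, and group means one unit apart on each trait. The helper `cohens_d` and the sample size are my own choices for illustration.

```python
import math
import random
import statistics

def cohens_d(a, b):
    """Standardized mean difference, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000               # made-up sample size per group

# Assumption: (bieber, ponies) ratings are independent normals with variance 1;
# girls' means are 1 unit higher than boys' on both traits.
girls = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(n)]
boys = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

d_bieber = cohens_d([g[0] for g in girls], [b[0] for b in boys])
d_ponies = cohens_d([g[1] for g in girls], [b[1] for b in boys])
# Synthetic trait: Bieber + ponies, i.e. a projection on the line joining the centers.
d_sum = cohens_d([g[0] + g[1] for g in girls], [b[0] + b[1] for b in boys])

print(f"Bieber: {d_bieber:.2f}  ponies: {d_ponies:.2f}  sum: {d_sum:.2f}")
```

Under these assumptions the two single-trait effect sizes should land close to 1 and the synthetic trait close to the square root of 2, because the mean difference doubles while the standard deviation only grows by a factor of sqrt(2).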

What if we had also measured preference for football (sorry about all the gender clichés here)? Then our two populations would be even easier to distinguish: we could create a new synthetic trait Bieber + Ponies – Football, and the effect size would be even larger. Measure enough traits, and you can claim a distance as large as you like.
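Here is a sketch of how the distance grows as traits are added, under the simplifying assumption that traits are uncorrelated and standardized to unit variance within each group. In that special case the effect size along the line joining the two centers is just the Euclidean length of the vector of per-trait differences. The function name and the numbers are made up for illustration.

```python
import math

def max_effect_size(mean_diffs):
    """Effect size along the line joining the cluster centers.

    Assumes uncorrelated traits with unit within-group variance, so the
    Mahalanobis distance reduces to the Euclidean norm of the differences.
    """
    return math.sqrt(sum(d * d for d in mean_diffs))

# Girls-minus-boys differences in standard-deviation units (made-up values).
diffs = [("Bieber", 1.0), ("ponies", 1.0), ("football", -1.0)]

for k in range(1, len(diffs) + 1):
    names = ", ".join(name for name, _ in diffs[:k])
    size = max_effect_size([d for _, d in diffs[:k]])
    print(f"{names}: {size:.2f}")

# Many individually small differences add up too: 100 traits at 0.2 each.
many_small = max_effect_size([0.2] * 100)
print(f"100 traits with effect size 0.2 each: {many_small:.2f}")
```

With three traits the distance is the square root of 3, and a hundred traits that each differ by a modest 0.2 already give a combined distance of about 2.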

There are interesting things to be done with that kind of multivariate data, and it’s certainly useful to not just look at the distribution of one-dimensional traits, but I’m afraid there’s not much sense in trying to quantify how different *in absolute terms* men and women are.

Here’s a more technical version of the argument above: it’s easy to come up with a model that has Mahalanobis distance (equivalently, max. effect size) going to infinity in O(sqrt(n)), where n is the number of traits measured, even if each difference taken alone is trivial. For example let’s say that there’s an infinite collection of traits t_1 t_2 …
On each trait boys score: t_i \sim N(m_i, 1)
and girls: t_i \sim N(f_i, 1)

Then even if you have trivial differences, like m_i \sim U(0,.1) and f_i \sim U(.1,.2), the Mahalanobis distance is going to grow like \sqrt{n}*.1 (give or take), which goes to infinity as n grows large, albeit slowly.
The proof comes from taking the expectation of \sum { (m_i - f_i)^2 }, which is roughly .01*n, and then taking the square root.
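A quick numerical check of that growth, as a sketch only. I am assuming independent traits with unit variance in both groups, so that the squared Mahalanobis distance is simply \sum { (m_i - f_i)^2 }. For the uniform distributions above, each term has expectation 0.01 + 0.01/6 (squared mean difference plus the variance of the difference), so the distance divided by \sqrt{n} should settle near 0.108. The function name and seed are arbitrary.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def mahalanobis_for(n):
    """Distance between group means over n traits.

    Assumes independent, unit-variance traits, so the squared distance
    is the sum of squared mean differences.
    """
    total = 0.0
    for _ in range(n):
        m = rng.uniform(0.0, 0.1)  # boys' mean on this trait
        f = rng.uniform(0.1, 0.2)  # girls' mean on this trait
        total += (m - f) ** 2
    return math.sqrt(total)

# sqrt(E[(m_i - f_i)^2]) = sqrt(0.1^2 + 2 * 0.1^2 / 12)
expected_ratio = math.sqrt(0.01 + 0.01 / 6)

for n in (10, 100, 1000, 10000):
    d = mahalanobis_for(n)
    print(f"n={n:>5}  distance={d:.3f}  distance/sqrt(n)={d / math.sqrt(n):.3f}")
print(f"expected ratio: {expected_ratio:.3f}")
```

The distance keeps growing with n even though no single trait differs by more than 0.2 standard deviations between the two groups.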

(Note: technically you also need your sample size to go to infinity to be able to go on estimating the Mahalanobis distance as n increases.)

Seen differently: each additional trait gives you additional information allowing you to distinguish boys from girls.

This stays true (I think, haven’t proved it) if traits are correlated as long as the covariance matrix stays non-singular for all n.