Every day, a poor soul tries to understand copulas by reading the corresponding Wikipedia page, and gives up in despair. The incomprehensible mess that one finds there gives the impression that copulas are about as accessible as tensor theory, which is a shame, because they are actually a very nice tool. The only prerequisite is knowing the inverse cumulative function trick.

That trick runs as follows: suppose you want to generate samples from some distribution with probability density f(x). All you need is a source of *uniform* random variables, because you can transform these random variables to have the distribution that you want (which is why you can survive on a desert island with nothing but the rand function). Here’s how: if U is a random variable with uniform distribution over [0,1], and if F is the cumulative distribution function corresponding to the density f(x), then:

X = F^{-1}(U)

has the right probability density. This is easy to prove using the classical transformation formula for random variables.
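A minimal sketch of the trick in R (the exponential distribution here is just an arbitrary example; qexp is R’s quantile function, i.e. the inverse CDF):

```r
set.seed(42)
u <- runif(1e5)         # uniform samples on [0,1]
x <- qexp(u, rate = 2)  # qexp is the inverse CDF (quantile function) of the exponential
mean(x)                 # close to 1/2, the mean of an exponential with rate 2
```

The same pattern works with any q* function in R (qnorm, qgamma, qbeta, …) since each one is the inverse of the corresponding cumulative distribution function.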

The trick also works in the other direction: if you take X and run it through F, you get back a uniform random variable. So it’s also a way of making uniform random variables out of non-uniform ones.
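The reverse direction, again as a quick R sketch with an exponential example of my choosing:

```r
set.seed(42)
x <- rexp(1e5, rate = 2)  # decidedly non-uniform samples
u <- pexp(x, rate = 2)    # run them through their own CDF
# u is now uniform on [0,1]: mean near 1/2, variance near 1/12
c(mean(u), var(u))
```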

Now let’s say that you want to generate two correlated random variables x and y, representing for example “personal wealth” and “cigar consumption”. One obvious way to generate correlated random variables is to use a multivariate Gaussian, but here you can’t assume that your variables have marginal Gaussian distributions – wealth is notably non-Gaussian (I don’t know about cigar consumption). Let’s say that you want them to have marginal densities f and g (with cumulative distribution functions F and G), but you still want to preserve some kind of positive correlation.

Here’s a possible recipe: generate a, b from a correlated Gaussian distribution. Then transform them using the cumulative Gaussian distribution Φ into u = Φ(a), v = Φ(b). Now u and v have marginal *uniform* distributions, but are still positively correlated.

Finally, transform again to x = F^{-1}(u), y = G^{-1}(v) – you still have positive correlation, but the marginals you want. You’ve just used a Gaussian copula. The technical definition of a copula you’ll find on Wikipedia corresponds to the joint probability distribution you have over (u, v), i.e. at the step where you have uniform marginals.

Here’s some R code that illustrates this:

require(mvtnorm)
S <- matrix(c(1, .8, .8, 1), 2, 2)                 # Correlation matrix
AB <- rmvnorm(n = 1000, mean = c(0, 0), sigma = S) # Our Gaussian variables
U <- pnorm(AB)          # Now U is uniform - check using hist(U[,1]) or hist(U[,2])
x <- qgamma(U[,1], 2)   # x is gamma-distributed
y <- qbeta(U[,2], 1, 2) # y is beta-distributed
plot(x, y)              # They correlate!

That sort of stuff is tremendously useful when you want to have a statistical model for joint outcomes (for example when you want to describe how the dependency between wealth and cigar consumption changes depending on whether the country is the US or Cuba).

Another interesting aspect of copulas, more theoretical, is that this also gives you a way of studying dependency independently of what the marginals look like…

How do you correlate ‘x’ with a binomial or ordinal variable (y)?

One way is to use signal detection techniques such as the area under the ROC curve, which will be high if large values of x are associated with y=1 (and low values with y=0).
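That AUC can be computed directly from the rank-sum identity; the helper function below is my own sketch, not something from the post (ties are counted as 1/2, the usual convention):

```r
auc <- function(x, y) {
  # Probability that a randomly chosen x from the y=1 group exceeds
  # one from the y=0 group (ties count 1/2): the area under the ROC curve
  x1 <- x[y == 1]
  x0 <- x[y == 0]
  mean(outer(x1, x0, ">") + 0.5 * outer(x1, x0, "=="))
}
auc(c(1, 2, 3, 4), c(0, 0, 1, 1))  # perfect separation: AUC = 1
```

An AUC near 1 means large x goes with y = 1; near 0 means the opposite; 0.5 means no association.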

This is very useful, thank you. However, what if you want to preserve the exact correlation you had in the multivariate normal case? How would you do that?

I’m not sure you can do that easily with copulas, but why would you want to? Correlation coefficients are less useful for non-Gaussian variables.

I wish to test a model I was using. I want to simulate data for the model, knowing the exact correlation in advance, so I can see if the model estimates it well enough.

I found this Matlab code, which does it. Do you know how to write something like this in R?

n = 1000;
rho = .7;
nu = 1;
T = mvtrnd([1 rho; rho 1], nu, n);
U = tcdf(T, nu);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];

Use the mvtnorm package to simulate the correlated multivariate-t vectors (rmvt). The rest is just a few calls to cumulative distribution and quantile functions (e.g. gaminv = qgamma).
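A rough line-by-line R translation of the Matlab snippet above, assuming the current mvtnorm interface for rmvt (double-check the argument names against your installed version):

```r
library(mvtnorm)
n <- 1000; rho <- .7; nu <- 1
T <- rmvt(n, sigma = matrix(c(1, rho, rho, 1), 2, 2), df = nu)  # mvtrnd(...)
U <- pt(T, df = nu)                                             # tcdf(T, nu)
X <- cbind(qgamma(U[, 1], shape = 2, scale = 1),  # gaminv(U(:,1), 2, 1)
           qt(U[, 2], df = 5))                    # tinv(U(:,2), 5)
```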

Can it be done by copulas?

I am trying to find the CDF of $Z = \max(X_1,X_2,\dots,X_N)$, where in my case the $X_i$ are correlated. Is there any transform domain or one-to-one function where I can derive the CDF and then invert back to the original domain? Or how should I handle this type of problem?

Since the correlated joint distribution is not known (but the marginal distributions are), I was wondering if we could transform the variables into another domain, e.g. by multiplying by another matrix (whitening), so that the joint distribution would be a simple product of the marginal distributions. But I am not sure about the equivalent one-to-one transformation for the maximum.
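One option, if you are willing to *assume* a copula for the dependence (say a Gaussian copula with correlation matrix R), is to note that P(Z ≤ z) = P(X_1 ≤ z, …, X_N ≤ z), which is just the joint CDF evaluated at (z, …, z): map each marginal CDF value F_i(z) back to the Gaussian scale with qnorm and call a multivariate-normal CDF. A sketch in R, where the exponential marginal and the value of rho are placeholders, not anything from the question:

```r
library(mvtnorm)
rho <- 0.5                            # assumed pairwise correlation (placeholder)
R <- matrix(c(1, rho, rho, 1), 2, 2)  # assumed Gaussian-copula correlation matrix
cdf_max <- function(z) {
  Fz <- pexp(z, rate = 1)             # known marginal CDF (exponential stand-in)
  # Gaussian copula: P(max(X1, X2) <= z) = P(X1 <= z, X2 <= z)
  #                = Phi_R(qnorm(F1(z)), qnorm(F2(z)))
  pmvnorm(upper = rep(qnorm(Fz), 2), corr = R)[1]
}
cdf_max(1)
```

Whether this answers the question depends entirely on whether the Gaussian-copula assumption is reasonable for your data; the marginals alone do not determine the distribution of the maximum.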

Thanks for sharing! What if the copula is t instead of Gaussian? How do I replace rmvnorm() with something similar that takes a correlation matrix as input?

Hi Lisa. Use rmvt from package mvtnorm.

Great post. How can one use this concept to determine the correlation of two samples that come from non-Gaussian distributions without having to fit a known marginal? You allude to this in the statement “Another interesting aspect of copulas, more theoretical, is that this also gives you a way of studying dependency independently of what the marginals look like”. Is there a way?

You’ll find some information in this doctoral thesis by Mélanie Rey. The basic idea is that if you want to study the dependence of variables X and Y, then you can just look at how much their copula differs from an independent copula. By definition the copula is invariant to monotonic transformations, so you’ll get the same results whether you look at (X,Y), (log X, log Y), or (X^(1/2), 3*Y). There’s also quite a bit of literature on tail dependence (meaning a correlation between extreme events).
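A small illustration of that invariance, using Spearman’s rank correlation (which depends only on the copula); the gamma variables here are just an example I picked:

```r
set.seed(1)
x <- rgamma(1000, shape = 2)
y <- x + rgamma(1000, shape = 2)  # positively dependent, positive-valued
r1 <- cor(x, y, method = "spearman")
r2 <- cor(log(x), sqrt(y), method = "spearman")  # monotone transforms of each margin
r1 - r2  # exactly zero: ranks, hence the copula, are unchanged
```

Compare with the ordinary Pearson correlation, which does change under the same transformations.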

Thank you for the reply and for that link, this is helpful in getting me started.