Today I’m going to talk about Simpson’s Paradox. I’ll demonstrate it with a simple example: I decide to challenge my daughter to a Solitaire competition. The aim is to find out who is the best Solitaire player. For five days, we’re each going to play games in the evening and report our scores to declare a winner. 
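On Monday, I play ten games and win four of them. This gives me a success rate of 40.00% (4/10). My daughter plays seven games and wins three, a success rate of 42.86% (3/7). She wins the first day.

On Tuesday, I play nine games and win eight. My win percentage is 88.89% (8/9). My daughter wins all four of her games, for 100.00% (4/4). She wins the second day.

On Wednesday, I play seven games and win six. My win percentage is 85.71% (6/7). My daughter wins both of her games, for 100.00% (2/2). She wins the third day.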
Daily competitions continue, and each day, my daughter wins. Here are the results in tabular format:

| | Mon | Tue | Wed | Thu | Fri |
|---|---|---|---|---|---|
| Me | 4/10 (40.00%) | 8/9 (88.89%) | 6/7 (85.71%) | 9/10 (90.00%) | 19/30 (63.33%) |
| Daughter | 3/7 (42.86%) | 4/4 (100.00%) | 2/2 (100.00%) | 2/2 (100.00%) | 11/17 (64.71%) |

As you can see, my daughter has won every single daily challenge. So this must make her the overall champion, right?
Well, something interesting happens if we total up all five days. I have played a total of 66 games and won 46 of them. This gives me an overall winning percentage of 69.70% (46/66).

| | Mon | Tue | Wed | Thu | Fri | TOTAL |
|---|---|---|---|---|---|---|
| Me | 4/10 (40.00%) | 8/9 (88.89%) | 6/7 (85.71%) | 9/10 (90.00%) | 19/30 (63.33%) | 46/66 (69.70%) |
| Daughter | 3/7 (42.86%) | 4/4 (100.00%) | 2/2 (100.00%) | 2/2 (100.00%) | 11/17 (64.71%) | 22/32 (68.75%) |

My daughter has played a total of 32 games and won 22 of them. This gives her an overall percentage of 68.75% (22/32). This is lower than my percentage!
My daughter has won every individual day, but when all the days are combined, I end up winning.
That does not make sense. How can I lose every individual day yet still be the winner of the overall competition? This is Simpson’s Paradox. 
OK, what’s going on here? (Take your time and go back and check the arithmetic above; there is no funny business going on.) As we will see later, the 'issue' is that my daughter and I played a different number of games on each day.
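If you would rather let a computer check the arithmetic, here is a minimal Python sketch. The numbers are copied from the table above; the names `me`, `daughter`, `rate` and `overall` are my own choices for illustration.

```python
from fractions import Fraction

# (wins, games) per day, Monday to Friday, copied from the table above.
me = [(4, 10), (8, 9), (6, 7), (9, 10), (19, 30)]
daughter = [(3, 7), (4, 4), (2, 2), (2, 2), (11, 17)]

def rate(wins, games):
    """Exact win rate for one day."""
    return Fraction(wins, games)

def overall(results):
    """Pool every game: total wins divided by total games."""
    total_wins = sum(wins for wins, _ in results)
    total_games = sum(games for _, games in results)
    return Fraction(total_wins, total_games)

# One flag per day: True when my daughter has the higher daily rate.
daughter_wins_day = [rate(*d) > rate(*m) for m, d in zip(me, daughter)]

print(daughter_wins_day)                  # [True, True, True, True, True]
print(f"{float(overall(me)):.2%}")        # 69.70%
print(f"{float(overall(daughter)):.2%}")  # 68.75%
```

Using `Fraction` keeps the comparisons exact, so the reversal cannot be blamed on rounding.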
Let’s look at another example. In professional baseball, a player's batting average is recorded each season as hits divided by at-bats. If two players have very different numbers of at-bats from year to year, one player can have the higher batting average in each of two separate years, yet when both years are combined the result is inverted and he ends up with the lower batting average!
This situation happened in 1995-1996 between the players Derek Jeter and David Justice:

| | 1995 | 1996 | Combined |
|---|---|---|---|
| Derek Jeter | 12/48 (.250) | 183/582 (.314) | 195/630 (.310) |
| David Justice | 104/411 (.253) | 45/140 (.321) | 149/551 (.270) |

In both 1995 and 1996, individually, David had a higher batting average than Derek. However, when both years' data are combined, he has a lower batting average.
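One way to see why the combined figures flip is to write each combined average as a weighted mean of the season averages, where the weight is the share of that player's at-bats in each season. The sketch below does this in Python; the figures come from the table above, and the function names `weights` and `combined_average` are just illustrative.

```python
# (hits, at-bats) per season, copied from the table above.
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def weights(seasons):
    """Share of the player's total at-bats that falls in each season."""
    total_at_bats = sum(ab for _, ab in seasons.values())
    return {year: ab / total_at_bats for year, (_, ab) in seasons.items()}

def combined_average(seasons):
    """Combined average as an at-bat-weighted mean of the season averages."""
    w = weights(seasons)
    return sum(w[year] * (hits / ab) for year, (hits, ab) in seasons.items())

for name, seasons in (("Jeter", jeter), ("Justice", justice)):
    shares = {year: f"{share:.1%}" for year, share in weights(seasons).items()}
    print(name, shares, f"{combined_average(seasons):.3f}")
```

Over 90% of Derek's at-bats fall in 1996, the stronger season for both players, while roughly three quarters of David's fall in 1995, the weaker one. Each combined average is pulled toward the season where that player batted most.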
Interestingly, in this case the 'discrepancy' carried on into the next season:

| | 1995 | 1996 | 1997 | Combined |
|---|---|---|---|---|
| Derek Jeter | 12/48 (.250) | 183/582 (.314) | 190/654 (.291) | 385/1284 (.300) |
| David Justice | 104/411 (.253) | 45/140 (.321) | 163/495 (.329) | 312/1046 (.298) |

Effects of this kind were noted by Karl Pearson in 1899 and Udny Yule in 1903, but because of a paper written by Edward Simpson in 1951, the phenomenon was given the name Simpson's Paradox by Colin Blyth in 1972. It's not really a paradox; the mathematics is not lying or changing. It's just that if you compare only the percentages, you miss the important variable of the sample size.
To compare things more fairly, you should really normalize the data to the same denominator (sample size).
Looking at Derek Jeter's results for 1995, we see that he only went to bat 48 times. David Justice went to bat 411 times in the same year. To compare Derek to David more accurately, we should scale up (normalize) Derek's hits so that both have the same number of at-bats. (How many hits would we expect Derek to get if he'd been at bat the same number of times as David?)
Derek batted 12/48, which is 0.250. If we assume he continued at exactly that skill level, then had he gone to bat 411 times we would expect 0.250 × 411 = 102.75 hits, or 102.75/411.
Similarly, in 1996 the opposite occurred, with David only being at bat 140 times, compared with 582 times for Derek. We can scale up David's ratio of 45/140 to what it would be if he had batted 582 times: (45/140) × 582 = 187.07 hits, or 187.07/582.
Let's look at this effect on the data:

| | 1995 (scaled to 411 at-bats) | 1996 (scaled to 582 at-bats) | Combined |
|---|---|---|---|
| Derek Jeter | (12/48) × 411 = 102.75/411 (.250) | 183/582 (.314) | 285.75/993 (.288) |
| David Justice | 104/411 (.253) | (45/140) × 582 = 187.07/582 (.321) | 291.07/993 (.293) |

Now we can see that, once the data is normalized so that each player has the same denominator, the result no longer flips: David has the higher average the entire time. A similar result would have been obtained for the Solitaire results had they been scaled so that both of us had played the same number of games.
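The scaling step can be sketched in a few lines of Python. This is only an illustration of the approach described above: it assumes each player would have kept the same average over more at-bats, and it uses the larger at-bat count in each season as the shared denominator. The names `normalized_average` and `common` are my own.

```python
# (hits, at-bats) per season, copied from the original table.
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def normalized_average(seasons, common_at_bats):
    """Rescale each season to a shared at-bat count, then pool the seasons."""
    scaled_hits = sum(
        hits / ab * common_at_bats[year] for year, (hits, ab) in seasons.items()
    )
    return scaled_hits / sum(common_at_bats.values())

# Shared denominator per season: the larger at-bat count of the two players.
common = {year: max(jeter[year][1], justice[year][1]) for year in jeter}

print(common)                                       # {1995: 411, 1996: 582}
print(f"{normalized_average(jeter, common):.3f}")   # 0.288
print(f"{normalized_average(justice, common):.3f}") # 0.293
```

With a shared denominator in each season, both players' seasons get the same weights, so the player who leads in every season must also lead overall.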
One of the most famous real-life examples of Simpson's paradox occurred when the University of California, Berkeley was sued for bias against women who had applied for admission to graduate schools there. The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was "so large that it was unlikely to be due to chance".
Here are the total figures. At first glance it does look pretty damning:

| | Applicants | Admitted |
|---|---|---|
| Men | 8,442 | 44% |
| Women | 4,321 | 35% |

However, as we've learned, combining statistics with different denominators can lead to bogus interpretations.
In fact, when examining the individual departments, it can be seen that no department was significantly biased against women.
Of the six largest departments (listed below), there was even a "small but statistically significant bias" in favour of women.

A research paper into the issue concluded that women tended to apply to competitive departments with low rates of admission even among qualified applicants (such as in the English Department), whereas men tended to apply to less-competitive departments with high rates of admission among the qualified applicants (such as in engineering and chemistry).
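To see how the mix of applications alone can produce this effect, here is a small Python sketch. The figures in it are hypothetical, invented purely for illustration; they are not the Berkeley data, and the department names and function names are made up too.

```python
# HYPOTHETICAL figures, invented for illustration only (NOT the Berkeley data).
# department -> {group: (admitted, applicants)}
departments = {
    "high_admit_dept": {"men": (480, 800), "women": (65, 100)},
    "low_admit_dept": {"men": (40, 200), "women": (225, 900)},
}

def department_rates(data):
    """Admission rate for each group within each department."""
    return {
        dept: {group: adm / app for group, (adm, app) in groups.items()}
        for dept, groups in data.items()
    }

def pooled_rate(data, group):
    """Admission rate for one group across all departments combined."""
    admitted = sum(groups[group][0] for groups in data.values())
    applicants = sum(groups[group][1] for groups in data.values())
    return admitted / applicants

for dept, rates in department_rates(departments).items():
    print(dept, {group: f"{r:.0%}" for group, r in rates.items()})

print("pooled men:", f"{pooled_rate(departments, 'men'):.0%}")
print("pooled women:", f"{pooled_rate(departments, 'women'):.0%}")
```

In this made-up data, women have the higher admission rate in both departments, yet the lower pooled rate, simply because most of them applied to the department that admits few applicants.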
So, the issue was not gender bias in admissions. Sure, we still need to do a better job of encouraging more women to apply (the total number of applicants was lower for women), but the figures do not show a bias against the women who applied.