# Solitaire Championship

Today I’m going to talk about Simpson’s Paradox. I’ll demonstrate it with a simple example: I challenge my daughter to a Solitaire competition to find out who is the better player. For five days, we each play games in the evening and report our scores to declare a winner.

On Monday, I play ten games and win four of them. This gives me a success rate of 40% (4/10). My daughter plays seven games and wins three. Her win percentage is 42.86% (3/7). She wins the first day.

On Tuesday, I play nine games and win eight. My win percentage is 88.89% (8/9). My daughter plays four games and wins all four of them to achieve a perfect 100% (4/4). She wins the second day.

On Wednesday, I play seven games and win six. My win percentage is 85.71% (6/7). My daughter plays two games and wins both of them, again achieving a perfect score of 100% (2/2). She wins the third day.

Daily competitions continue, and each day, my daughter wins. Here are the results in tabular format:

| | Mon | Tue | Wed | Thu | Fri |
|---|---|---|---|---|---|
| Me | 4/10 (40.00%) | 8/9 (88.89%) | 6/7 (85.71%) | 9/10 (90.00%) | 19/30 (63.33%) |
| Daughter | 3/7 (42.86%) | 4/4 (100.00%) | 2/2 (100.00%) | 2/2 (100.00%) | 11/17 (64.71%) |

As you can see, my daughter has won every single daily challenge. So this must make her the overall champion, right?

Well, something interesting happens if we total up all five days. I have played a total of 66 games and won 46 of them. This gives me an overall win percentage of 69.70% (46/66).

| | Mon | Tue | Wed | Thu | Fri | TOTAL |
|---|---|---|---|---|---|---|
| Me | 4/10 (40.00%) | 8/9 (88.89%) | 6/7 (85.71%) | 9/10 (90.00%) | 19/30 (63.33%) | 46/66 (69.70%) |
| Daughter | 3/7 (42.86%) | 4/4 (100.00%) | 2/2 (100.00%) | 2/2 (100.00%) | 11/17 (64.71%) | 22/32 (68.75%) |

My daughter has played a total of 32 games and won 22 of them. This gives her an overall percentage of 68.75% (22/32). This is lower than my percentage!

My daughter has won every individual day, but when all the days are combined, I ended up winning.

 That does not make sense. How can I lose every individual day yet still be the winner of the overall competition? This is Simpson’s Paradox.

OK, what’s going on here? (Take your time and go back and check the arithmetic above. There is no funny business going on.) As we will see later, the 'issue' is that my daughter and I played a different number of games on each day.
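If you'd rather let a computer check the arithmetic, here is a minimal Python sketch. The variable and function names are my own, and the scores are simply copied from the table above. It uses exact fractions so that rounding cannot be blamed for the result.

```python
from fractions import Fraction

# (wins, games) per day, Monday to Friday, copied from the table above
me = [(4, 10), (8, 9), (6, 7), (9, 10), (19, 30)]
daughter = [(3, 7), (4, 4), (2, 2), (2, 2), (11, 17)]

def daily_rates(results):
    """Exact win rate for each day."""
    return [Fraction(wins, games) for wins, games in results]

def overall_rate(results):
    """Total wins divided by total games across all days."""
    total_wins = sum(wins for wins, _ in results)
    total_games = sum(games for _, games in results)
    return Fraction(total_wins, total_games)

# My daughter has the higher rate on every single day...
daughter_wins_every_day = all(
    d > m for d, m in zip(daily_rates(daughter), daily_rates(me))
)
# ...yet the pooled rate favours me.
i_win_overall = overall_rate(me) > overall_rate(daughter)

print(daughter_wins_every_day, i_win_overall)
print(f"{float(overall_rate(me)):.2%} vs {float(overall_rate(daughter)):.2%}")
```

Both flags come out true: she is ahead on each day, and I am ahead once the days are pooled.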

Let’s look at another example. In professional baseball, statistics are recorded for batting averages. If two players have very different numbers of at-bats in different years, one player can have the higher batting average in each of two separate years, yet when both years are combined the result is inverted and he ends up with the lower batting average!

This situation happened in 1995-1996 between the players Derek Jeter and David Justice:

| Player | 1995 | Avg | 1996 | Avg | Combined | Avg |
|---|---|---|---|---|---|---|
| Derek Jeter | 12/48 | .250 | 183/582 | .314 | 195/630 | .310 |
| David Justice | 104/411 | .253 | 45/140 | .321 | 149/551 | .270 |

In both 1995 and 1996, individually, David had a higher batting average than Derek. However, when both years' data are combined, David has the lower batting average.
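One way to see why this can happen: a combined average is a weighted mean of the yearly averages, where each year's weight is its share of the player's total at-bats. The sketch below (helper names are mine, figures are the 1995 and 1996 numbers quoted above) prints those weights for each player.

```python
# (hits, at-bats) for 1995 and 1996, as quoted above
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def weighted_breakdown(seasons):
    """Return ({year: (average, weight)}, combined average).

    The weight is the share of the player's total at-bats in that year.
    """
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    parts = {
        year: (hits / at_bats, at_bats / total_at_bats)
        for year, (hits, at_bats) in seasons.items()
    }
    combined = sum(avg * weight for avg, weight in parts.values())
    return parts, combined

for name, seasons in [("Jeter", jeter), ("Justice", justice)]:
    parts, combined = weighted_breakdown(seasons)
    for year, (avg, weight) in parts.items():
        print(f"{name} {year}: average {avg:.3f}, weight {weight:.1%}")
    print(f"{name} combined: {combined:.3f}")
```

More than 90% of Derek's at-bats fall in 1996, his better year, while roughly three quarters of David's fall in 1995, his weaker year. The two combined averages are therefore built from very different mixes of the two seasons.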

Interestingly, in this case the 'discrepancy' carried on into the next season:

| Player | 1995 | Avg | 1996 | Avg | 1997 | Avg | Combined | Avg |
|---|---|---|---|---|---|---|---|---|
| Derek Jeter | 12/48 | .250 | 183/582 | .314 | 190/654 | .291 | 385/1284 | .300 |
| David Justice | 104/411 | .253 | 45/140 | .321 | 163/495 | .329 | 312/1046 | .298 |

This phenomenon was noted by Karl Pearson in 1899 and by Udny Yule in 1903, but because of a paper written by Edward Simpson in 1951, it was given the name Simpson's Paradox by Colin Blyth in 1972. It's not really a paradox; the mathematics is not lying or changing. It's just that if you compare only the percentages, you are missing out on the important variable of the sample size.

To compare things more fairly, you should really normalize the data to get to the same denominator (sample size).

Looking at Derek Jeter's results for 1995, we see that he went to bat only 48 times. David Justice went to bat 411 times in the same year. To compare Derek to David more accurately, we should scale up (normalize) Derek's hits so that both players have the same number of at-bats. (How many hits would we expect Derek to get if he'd been at bat the same number of times as David?)

Derek batted 12/48, which is .250. If we assume he continued at exactly that skill level, then had he gone to bat 411 times we would expect him to have made 0.250 × 411 = 102.75 hits, i.e. 102.75/411.

In 1996 the opposite occurred, with David being at bat only 140 times, compared with 582 times for Derek. We can scale up David's ratio of 45/140 to what it would be if he had been at bat 582 times: (45/140) × 582 = 187.07 hits, i.e. 187.07/582.

Let's look at this effect on the data:

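| Player | 1995 (scaled to 411 at-bats) | Avg | 1996 (scaled to 582 at-bats) | Avg | Combined | Avg |
|---|---|---|---|---|---|---|
| Derek Jeter | (12/48) × 411 = 102.75/411 | .250 | 183/582 | .314 | 285.75/993 | .288 |
| David Justice | 104/411 | .253 | (45/140) × 582 = 187.07/582 | .321 | 291.07/993 | .293 |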

Now we can see that, once we've normalized the data so that each player has the same denominator, the result no longer flips, and David has the higher average the entire time. A similar result would have been obtained for the Solitaire results had they been scaled so that both of us had played the same number of games.
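The scaling above can be sketched in a few lines of Python. This is only an illustration under the same assumption made in the text, namely that a player would keep hitting at exactly his observed rate if given more at-bats. The function name is my own.

```python
# (hits, at-bats) per season, as quoted in the tables above
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def normalized_average(player, other):
    """Combined average after rescaling each season to a common at-bat count.

    For each year, the common count is the larger of the two players'
    at-bats, and the player's hits are scaled up at his observed rate.
    """
    scaled_hits = 0.0
    scaled_at_bats = 0
    for year, (hits, at_bats) in player.items():
        common = max(at_bats, other[year][1])
        scaled_hits += hits / at_bats * common
        scaled_at_bats += common
    return scaled_hits / scaled_at_bats

print(f"Jeter:   {normalized_average(jeter, justice):.3f}")
print(f"Justice: {normalized_average(justice, jeter):.3f}")
```

With equal denominators in each year, the combined figure is just the ordinary mean of equally weighted seasons, so the player who is ahead in every season stays ahead overall.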

# Gender Bias?

One of the most famous real-life examples of Simpson's paradox occurred when the University of California, Berkeley was sued for bias against women who had applied for admission to graduate schools there. The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was "so large that it was unlikely to be due to chance".

Here are the total figures. At first glance it does look pretty damning:

| | Applicants | Admitted |
|---|---|---|
| Men | 8,442 | 44% |
| Women | 4,321 | 35% |

However, as we've learned, combining statistics with different denominators can lead to bogus interpretations.

In fact, when examining the individual departments, it can be seen that no department was significantly biased against women.

Of the six largest departments (listed below), there was even a "small but statistically significant bias" in favour of women.

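| Department | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
|---|---|---|---|---|
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 373 | 6% | 341 | 7% |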

A research paper into the issue concluded that women tended to apply to competitive departments with low rates of admission even among qualified applicants (such as in the English Department), whereas men tended to apply to less-competitive departments with high rates of admission among the qualified applicants (such as in engineering and chemistry).
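To see the same effect in the numbers, here is a rough Python sketch that pools the six departments listed above. It is approximate: the admission percentages in the table are rounded, so the admitted counts are estimates, and it covers only these six departments rather than the whole university. The names are my own.

```python
# (applicants, admission rate) per department, from the table above.
# Rates are rounded percentages, so admitted counts are estimates.
men = {"A": (825, 0.62), "B": (560, 0.63), "C": (325, 0.37),
       "D": (417, 0.33), "E": (191, 0.28), "F": (373, 0.06)}
women = {"A": (108, 0.82), "B": (25, 0.68), "C": (593, 0.34),
         "D": (375, 0.35), "E": (393, 0.24), "F": (341, 0.07)}

def pooled_rate(departments):
    """Estimated admissions divided by total applicants."""
    applicants = sum(n for n, _ in departments.values())
    admitted = sum(n * rate for n, rate in departments.values())
    return admitted / applicants

# Departments where women had the higher admission rate
women_ahead = [dept for dept in men if women[dept][1] > men[dept][1]]

print("Women ahead in:", women_ahead)
print(f"Pooled: men {pooled_rate(men):.1%}, women {pooled_rate(women):.1%}")
```

Women have the higher admission rate in four of the six departments, yet their pooled rate is well below the men's, because most of the women's applications went to departments C to F, where the rates are low for everyone.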

So, the issue was not about gender bias. Sure, we still need to do a better job of encouraging more women to apply (the total number of applicants was lower for women), but it was not about a bias against the gender of those who applied.