
The importance of independence

Imagine there is a judge, and this judge is a good judge. After reviewing all the evidence in a case, she rules correctly 90% of the time. (This is a math article, not a philosophical one, nor a comment on the legal system! I know real legal matters are much more complex and opinion-based than the oversimplification I am presenting here, and it is open to interpretation what ‘correct’ really means!)
What if you wanted more than 90% confidence in a ruling? How could you get it?
You could hire more independent judges. For instance, you could employ three judges, not one, and say that you will accept the result of the simple majority of these (at least two out of the three). If each judge, independently, rules, and each of the judges is correct 90% of the time, what is the combined chance of getting the correct result?
Statisticians talk about events like these as Bernoulli trials (a judge either rules correctly, or does not), and there is a binomial formula to calculate the probabilities, but let’s draw a tree diagram enumerating all the possibilities:
The key here is that the judges make their rulings independently.
Each judge can rule correctly (90% of the time), or incorrectly. The paths that have (at least) two judges ruling correctly are marked with a green check. Overall, there is a 97.2% chance of getting to the correct answer if we add these successful outcome chances. A good improvement from the base 90%.

Binomial Probabilities

Another way to think of this is that there are ${}^{3}C_{2}$ ways of two judges voting correctly, and ${}^{3}C_{3}$ for all three judging correctly. For two judges, this happens with a $(0.90)^{2} \times (0.10)^{1}$ probability. For three judges it is $(0.90)^{3} \times (0.10)^{0}$.
Generically, if the probability of a judge ruling correctly is p, and if there are n judges, and we want to calculate the probability of exactly k of them voting correctly, then (number of ways this can happen, multiplied by the chances of getting that number of correct, and that number of incorrect, rulings):
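Written out, this is the standard binomial probability. The symbol P(k) is just a label chosen here for "exactly k judges correct"; p, n and k are as defined above.

```latex
P(k) = {}^{n}C_{k} \; p^{k} \, (1-p)^{n-k}
```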
So, for three judges, where we need a simple majority of at least two (p=0.9, n=3), we need the chances of getting exactly two, and the chances of getting exactly three:
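Substituting p = 0.9 and n = 3 into the binomial formula, the two terms work out as follows (a worked sketch of the arithmetic):

```latex
{}^{3}C_{2} \, (0.9)^{2} (0.1)^{1} + {}^{3}C_{3} \, (0.9)^{3} (0.1)^{0}
  = 3 \times 0.081 + 1 \times 0.729
  = 0.243 + 0.729
  = 0.972
```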
This confirms the 97.2% we calculated before.
If we have five judges, and we need a simple majority of (at least) three:
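Summing the three binomial terms for n = 5 and p = 0.9 (again, a worked sketch of the arithmetic):

```latex
{}^{5}C_{3} \, (0.9)^{3} (0.1)^{2} + {}^{5}C_{4} \, (0.9)^{4} (0.1)^{1} + {}^{5}C_{5} \, (0.9)^{5} (0.1)^{0}
  = 0.0729 + 0.32805 + 0.59049
  = 0.99144
```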
(Any combination of three judges, or any combination of four judges, or all five judges).
Increasing the number of judges to five increases the chance of the majority voting correctly to 99.144% (if each judge, independently, has a 90% chance of ruling correctly). We could go higher, but you get the idea. (When dealing with larger numbers, statisticians use tables of cumulative binomial distributions so that individual terms don't need to be summed; or, these days, the Excel function!)
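If you would rather not sum the terms by hand, the same calculation fits in a few lines of Python. This is a minimal sketch, assuming an odd number of equally skilled, independent judges; the function name majority_probability is made up for illustration.

```python
from math import comb


def majority_probability(n: int, p: float) -> float:
    """Chance that a simple majority of n independent judges rules correctly.

    Assumes n is odd and every judge is correct with the same probability p.
    """
    needed = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial terms for k = needed, needed + 1, ..., n correct judges.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_probability(n, 0.9), 5))
```

For p = 0.9 this reproduces the 90%, 97.2% and 99.144% figures for one, three and five judges.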

Experimenting

I wrote a few lines of code to allow simulations with varying numbers of judges, and for judges with asymmetric skills. Using simple Monte-Carlo simulation, and random numbers, it performs 100 million iterations of each event to get estimates of the probability.
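The original code is not reproduced here, but a simulation along these lines might look like the sketch below. The function name, the fixed seed and the much smaller default trial count are all assumptions made for illustration, not the settings used for the numbers in this article.

```python
import random


def simulate_majority(skills, trials=200_000, seed=12345):
    """Estimate the chance that a simple majority of independent judges is correct.

    skills is a list of per-judge probabilities of ruling correctly.
    trials and seed are illustrative defaults, not the article's settings.
    """
    rng = random.Random(seed)
    needed = len(skills) // 2 + 1  # votes required for a simple majority
    wins = 0
    for _ in range(trials):
        # Each judge rules independently: correct if a uniform draw falls below their skill.
        correct = sum(rng.random() < p for p in skills)
        if correct >= needed:
            wins += 1
    return wins / trials


print(simulate_majority([0.90, 0.90, 0.90]))
print(simulate_majority([0.90, 0.80, 0.70]))
```

With a few hundred thousand trials the estimate is typically good to about three decimal places; more iterations tighten it further.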
We know, from above, that if every judge is 90% accurate then there is a 97.2% chance of the correct ruling. What if we have judges with varying 'skill' levels?
Interestingly, if we have three judges with {0.90, 0.80, 0.70}, then we can still get over 90% accuracy (90.2%). If we have two judges at 90%, then we can still maintain 90% if the third is only accurate half the time! {0.90, 0.90, 0.50} = 90% (replacing the last judge with a coin flip).
Here are a few random examples of how the probabilities change with various combinations of judge ‘skills’ when we have three judges and require a simple majority of at least two.
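| Judge #1 | Judge #2 | Judge #3 | Overall Pr |
|----------|----------|----------|------------|
| 0.95 | 0.90 | 0.55 | 93.20% |
| 0.95 | 0.90 | 0.75 | 96.00% |
| 0.80 | 0.80 | 0.80 | 89.60% |
| 0.93 | 0.80 | 0.75 | 92.55% |
| 0.96 | 0.80 | 0.75 | 93.60% |
| 0.96 | 0.92 | 0.60 | 95.10% |
| 0.85 | 0.90 | 0.20 | 80.90% |
| 0.95 | 0.90 | 0.40 | 91.10% |
| 0.90 | 0.80 | 0.75 | 91.50% |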
Even if one of the judges is really bad (almost into sabotage territory with a probability of correctness below 0.5), because of independence, it’s possible to get a fairly good outcome when combined with others.
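For judges with unequal skills, simulation is not the only option: the probability can also be computed exactly by tracking how many judges are correct as each one is added. The sketch below is one way to do that, assuming independent judges; the function name is hypothetical, and this is a different technique from the Monte Carlo approach described above.

```python
def majority_probability_exact(skills):
    """Exact chance that a simple majority of independent judges rules correctly.

    skills is a list of per-judge probabilities of being correct.
    """
    # dist[k] = probability that exactly k of the judges processed so far are correct.
    dist = [1.0]
    for p in skills:
        new_dist = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new_dist[k] += prob * (1 - p)  # this judge rules incorrectly
            new_dist[k + 1] += prob * p    # this judge rules correctly
        dist = new_dist
    needed = len(skills) // 2 + 1
    return sum(dist[needed:])


for panel in ([0.90, 0.80, 0.70], [0.90, 0.90, 0.50], [0.85, 0.90, 0.20]):
    print(panel, round(majority_probability_exact(panel), 4))
```

The three panels printed here correspond to the 90.2%, 90% and 80.9% results quoted in the text.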

Paradox?

This independence can lead to situations which, at first glance, can appear paradoxical.
Imagine you have a panel of five judges with varying skills ranging from 93% accuracy down to 70% accuracy {0.93, 0.90, 0.90, 0.85, 0.70}, where a simple majority of three dictates the outcome. Using this breakdown, the probability of the panel selecting the correct outcome can be calculated to be 98.035%.
In this example, the worst judge on the panel has a skill of just 70% success. What if we changed it so that this low-performing judge, rather than using their own independent skill, simply copied the ruling of the first judge (the one with 93% skill)? Surely the chance of a successful verdict would go up? After all, we’re removing the vote of a poorly performing judge, and replacing it with a second vote from the best-performing judge.
Paradoxically, if we do this, we reduce the chance of a successful verdict to 97.679%.
Paradox? Removing the verdict of the poorest-performing judge, and replacing it with a doubled vote from the best performer, reduces the chances of getting the correct result!
No! The loss of independence affects the outcome (and also, the top-performing judge is correct most of the time already, so doubling this weight has less impact on the overall probability).
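The comparison can be checked by enumerating every combination of rulings from the judges who actually decide for themselves. This is a sketch under stated assumptions: the function name and the copies argument are invented for illustration, and a copied judge is assumed to be one who rules independently.

```python
from itertools import product


def panel_probability(skills, copies=None):
    """Exact chance of a correct simple-majority verdict.

    copies maps a follower's index to the index of the judge whose ruling
    they duplicate. Followers cast a vote but add no independent information.
    Assumes every copied judge is themselves independent (no chains of copying).
    """
    copies = copies or {}
    independent = [i for i in range(len(skills)) if i not in copies]
    needed = len(skills) // 2 + 1
    total = 0.0
    # Walk through every correct/incorrect pattern for the independent judges.
    for outcome in product([True, False], repeat=len(independent)):
        ruling = dict(zip(independent, outcome))
        prob = 1.0
        for i in independent:
            prob *= skills[i] if ruling[i] else 1 - skills[i]
        for follower, leader in copies.items():
            ruling[follower] = ruling[leader]  # duplicate the leader's ruling
        if sum(ruling.values()) >= needed:
            total += prob
    return total


skills = [0.93, 0.90, 0.90, 0.85, 0.70]
independent_panel = panel_probability(skills)
copying_panel = panel_probability(skills, copies={4: 0})  # last judge copies the first
print(f"{independent_panel:.5f} {copying_panel:.5f}")
```

The independent panel comes out at about 98.03% and the copying panel at about 97.68%, in line with the figures quoted above.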
If this is hard to comprehend, think of a case with just two judges, one at 93% and one at 10%, in which you need just one of them to rule to convict. Most of the time the strong judge will secure the conviction, but in the event the strong judge does not convict, there’s a chance, however small, that the weaker judge will provide the necessary verdict (overall, 1 − 0.07 × 0.90 = 93.7%, against 93% for the strong judge alone). If we simply doubled the vote of the first judge, this small contribution from the second judge would never be seen.
There’s a related, seemingly paradoxical, puzzle we discussed a little earlier, concerning a Chess Tournament Puzzle, and a need to win two games in a row against opponents of different skill levels.

Supreme Court

There are nine justices on The Supreme Court of the United States.
If all nine justices voted, and we needed a simple majority of at least five, and if each justice were similarly skilled, we can use the cumulative binomial function to calculate what skill levels would be needed to achieve certain levels of confidence.
Independence, and diversity, are good things.
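One way to run that calculation is to search for the individual skill level at which the majority probability first reaches a target. The sketch below uses simple bisection, assuming equally skilled, independent justices and a target strictly between 0.5 and 1; both function names and the target levels are made up for illustration.

```python
from math import comb


def majority_probability(n, p):
    """Chance that a simple majority of n equally skilled, independent judges is correct."""
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


def required_skill(n, target, tol=1e-9):
    """Smallest individual skill p (between 0.5 and 1) giving at least the target confidence.

    Relies on the majority probability increasing as p increases, so bisection applies.
    """
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if majority_probability(n, mid) < target:
            lo = mid  # not skilled enough yet
        else:
            hi = mid  # target reached, try a lower skill
    return hi


for target in (0.90, 0.95, 0.99):
    print(target, round(required_skill(9, target), 4))
```

Because nine independent votes are being combined, the required individual skill comes out lower than the target confidence itself.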