Algorithmic Justice League founder Joy Buolamwini gave a speech during the World Economic Forum this week on the need to fight AI bias.
Buolamwini is also an MIT Media Lab researcher and went somewhat viral for her TED Talk in 2016 titled ‘How I’m fighting bias in algorithms’.
Her latest speech included a presentation in which she went over an analysis of currently popular facial recognition algorithms.
Here were the overall accuracy results when guessing the gender of a face:
- Microsoft: 93.7 percent
- Face++: 90 percent
- IBM: 87.9 percent
Presented this way, the results appear to show little problem. But society is diverse, and algorithms need to be accurate for everyone.
When separated between males and females, a greater disparity becomes apparent:
- Microsoft: 89.3 percent (females), 97.4 percent (males)
- Face++: 78.7 percent (females), 99.3 percent (males)
- IBM: 79.7 percent (females), 94.4 percent (males)
Here we begin to see the underrepresentation of females in STEM careers come into effect. China-based Face++ fares the worst, likely a result of the country’s more severe gender gap (PDF) compared with the US.
Splitting between skin type also increases the disparity:
- Microsoft: 87.1 percent (darker), 99.3 percent (lighter)
- Face++: 83.5 percent (darker), 95.3 percent (lighter)
- IBM: 77.6 percent (darker), 96.8 percent (lighter)
The difference here is again likely to do with a racial disparity in STEM careers. A gap of roughly 12 to 19 percentage points is observed between darker and lighter skin tones.
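The gap can be checked directly from the skin type figures quoted above. The following is a minimal Python sketch; the dictionary layout and the `accuracy_gap` helper are illustrative assumptions, not part of Buolamwini's analysis.

```python
# Accuracy figures (percent) by skin type, as quoted in the article above.
skin_type_accuracy = {
    "Microsoft": {"darker": 87.1, "lighter": 99.3},
    "Face++": {"darker": 83.5, "lighter": 95.3},
    "IBM": {"darker": 77.6, "lighter": 96.8},
}


def accuracy_gap(scores):
    """Hypothetical helper: best minus worst accuracy, in percentage points."""
    return round(max(scores.values()) - min(scores.values()), 1)


# Gap between the best- and worst-served group for each algorithm.
gaps = {vendor: accuracy_gap(scores) for vendor, scores in skin_type_accuracy.items()}

for vendor, gap in gaps.items():
    print(f"{vendor}: {gap} percentage points")
```

Measured this way, the gaps come out at 12.2 points for Microsoft, 11.8 for Face++, and 19.2 for IBM.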
So far, the results are in line with a 2010 study by researchers at NIST and the University of Texas at Dallas. The researchers found (PDF) that algorithms designed and tested in East Asia are better at recognising East Asians, while those developed in Western countries are more accurate when detecting Caucasians.
“We did something that hadn’t been done in the field before, which was doing intersectional analysis,” explains Buolamwini. “If we only do single axis analysis – we only look at skin type, only look at gender… – we’re going to miss important trends.”
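To illustrate why single-axis analysis can hide a trend, here is a small Python sketch. The records are hypothetical toy data invented for this example, not figures from the study, and the `accuracy_by` helper is likewise an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (skin_type, gender, prediction_correct).
records = [
    ("lighter", "male", True), ("lighter", "male", True),
    ("lighter", "female", True), ("lighter", "female", True),
    ("darker", "male", True), ("darker", "male", True),
    ("darker", "female", True), ("darker", "female", False),
]


def accuracy_by(records, key):
    """Group records by key(record) and return percent correct per group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        correct[group] += rec[2]  # True counts as 1, False as 0
    return {group: 100.0 * correct[group] / totals[group] for group in totals}


by_skin = accuracy_by(records, lambda r: r[0])          # single axis: skin type
by_gender = accuracy_by(records, lambda r: r[1])        # single axis: gender
by_both = accuracy_by(records, lambda r: (r[0], r[1]))  # intersectional

print(by_skin)
print(by_gender)
print(by_both)
```

In this toy data, each single-axis view bottoms out at 75 percent, while the intersectional view shows one subgroup at 50 percent. That is the kind of trend a single-axis breakdown would miss.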
Here is where the results get most concerning. For each algorithm, results are in descending order from most accurate to least:
Microsoft:
- Lighter males: 100 percent
- Lighter females: 98.3 percent
- Darker males: 94 percent
- Darker females: 79.2 percent
Face++:
- Darker males: 99.3 percent
- Lighter males: 99.2 percent
- Lighter females: 94 percent
- Darker females: 65.5 percent
IBM:
- Lighter males: 99.7 percent
- Lighter females: 92.9 percent
- Darker males: 88 percent
- Darker females: 65.3 percent
The lack of accuracy with regard to females with darker skin tones is of particular note. Two of the three algorithms would get it wrong on approximately one-third of occasions.
Just imagine surveillance being used with these algorithms. Lighter skinned males would be recognised in most cases, but darker skinned females would be stopped often. That could be a lot of mistakes in areas with high footfall such as airports.
Prior to making her results public, Buolamwini sent the results to each company. IBM responded the same day and said their developers would address the issue.
When she reassessed IBM’s algorithm, the accuracy when assessing darker males jumped from 88
Buolamwini commented: “So for everybody who watched my TED Talk and said: ‘Isn’t the reason you weren’t detected because of, you know, physics? Your skin reflectance, contrast, et cetera,’ — the laws of physics did not change between December 2017, when I did the study, and 2018, when they launched the new results.”
“What did change is they made it a priority.”
You can watch Buolamwini’s full presentation at the WEF here.