An analysis of 630 billion words published online suggests that people tend to think of men when using gender-neutral terms, a sexist bias that could be learned by AI models
April 1, 2022
When people use gender-neutral words like “people” and “humanity,” they are more likely to think of men than women, reflecting the sexism present in many societies, according to an analysis of billions of words published online. The researchers behind the work warn that this sexist bias is passed on to artificial intelligence models trained on the same text.
April Bailey of New York University and colleagues used a statistical algorithm to analyze a collection of 630 billion words contained on 2.96 billion web pages collected in 2017, including informal text from blogs and discussion forums as well as more formal text, written by the media and corporations and governments, mostly in English. They used an approach called word embedding, which infers a word’s intended meaning from the frequency with which it occurs in context with other words.
They found that words like “person,” “people,” and “humanity” are used in contexts that more closely match those of words like “men,” “he,” and “male” than those of words like “women,” “she,” and “female.” Because these gender-neutral words were used more similarly to the words referring to men, a reflection of male-dominated society, the team says people may treat them as masculine in their conceptual meaning. The researchers accounted for the possibility that male authors are overrepresented in the dataset and found that this did not affect the result.
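The kind of comparison the study relies on can be sketched with a toy example. The vectors below are invented purely for illustration (real embeddings have hundreds of dimensions learned from co-occurrence statistics over the full corpus), but the measurement, cosine similarity between word vectors, is the standard one:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar contexts."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings", invented for illustration only.
embeddings = {
    "person": [0.9, 0.6, 0.4],
    "man":    [1.0, 0.5, 0.3],
    "woman":  [0.4, 0.9, 0.5],
}

sim_man = cosine_similarity(embeddings["person"], embeddings["man"])
sim_woman = cosine_similarity(embeddings["person"], embeddings["woman"])

# The paper's finding in miniature: the "neutral" word sits closer
# to the masculine word than to the feminine one.
print(f"person~man:   {sim_man:.3f}")
print(f"person~woman: {sim_woman:.3f}")
```

With vectors trained on real text rather than these hand-picked numbers, the study reports the same asymmetry at corpus scale.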
An open question is how dependent this is on English, the team says — other languages, such as Spanish, contain explicit gender references that could alter the results. The team also did not consider non-binary gender identities or differentiate between the biological and social aspects of sex and gender.
According to Bailey, finding evidence of sexist bias in English is not surprising, since previous studies have shown that words like “scientist” and “engineer” are also judged to be more closely related to “man” and “male” than to “woman” and “female.” However, she says it should still be of concern, because the same collection of texts trawled through by this research is used to train a range of AI tools that will inherit this bias, from language-translation websites to conversational chatbots.
“It learns from us, and then we learn from it,” says Bailey. “And we’re kind of in this reciprocal loop where we reflect it back and forth. It’s concerning because it suggests that if I were to snap my fingers now and magically get rid of everyone’s individual cognitive bias of thinking of a person as male rather than female, we would still have that bias in our society because it’s embedded in AI tools.”
Journal reference: Science Advances, DOI: 10.1126/sciadv.abm2463