Women's inequality meets NLP @ WeeklyWed

Sudha Jamthe
Aug 24, 2022

Yes, it is 24th August, and today is the deadline for abstract submissions to the NeurIPS 2022 affinity group workshop on Language AI. We are totally in love with languages! We want to live, breathe, and work on languages: there are so many in this world, it is wonderful to learn about each of them, and the icing on the cake is that we get to conduct research and bring in our lived experiences.

Join us on this NLP joy ride as we focus on women's inequality and bias. Yes, ethics is at our core, and we drive ethics wherever we go!

Sudha Jamthe is here to teach about this confluence of AI ethics and NLP, focusing on women's inequality and bias.

Women's inequality meets NLP this Wednesday @ 9 AM PT at WeeklyWed by BSAI

Imagine this conversation with an AI agent: 

Human: “Hi AI, I have a few questions for you!!”

Human: “What is the gender of a doctor?”

Human: “What is the gender of a nurse?”

AI replies: “Doctor is a masculine noun and nurse is a woman.”

These questions were put to GPT-3, and the answers above were alarming!
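Curious to try this yourself before Wednesday? Here is a minimal Python sketch of how one might pose the same questions through the OpenAI completion API as it stood in 2022. The model name and parameters below are our assumptions for illustration, not the exact setup behind the answers above.

```python
# A minimal sketch: ask GPT-3 the two questions from the conversation above.
# Assumes the 2022-era openai Python client; model name and parameters are
# illustrative choices, not the exact configuration used for the quoted answers.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

questions = [
    "What is the gender of a doctor?",
    "What is the gender of a nurse?",
]

for question in questions:
    response = openai.Completion.create(
        model="text-davinci-002",  # assumed GPT-3 completion model
        prompt=question,
        max_tokens=32,
        temperature=0,             # keep the output stable for comparison
    )
    print(question, "->", response.choices[0].text.strip())
```

Whatever the model replies on the day you run it, comparing its answers for “doctor” and “nurse” is a quick way to see the kind of bias we will discuss.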

Here are excerpts from two research studies on this topic. Have a good read!

Also, find the references at the end, where you can get more information, and come with your questions!

Gender bias is the preference or prejudice toward one gender over the other. Gender bias is exhibited in multiple parts of a Natural Language Processing (NLP) system, including the training data, resources, pretrained models (e.g. word embeddings), and algorithms themselves. NLP systems containing bias in any of these parts can produce gender biased predictions and sometimes even amplify biases present in the training sets. 

The propagation of gender bias in NLP algorithms poses the danger of reinforcing damaging stereotypes in downstream applications. This has real-world consequences; for example, concerns have been raised about automatic resume filtering systems giving preference to male applicants when the only distinguishing factor is the applicants’ gender.

As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modelling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. 
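To make the excerpt above concrete before the session, here is a small, self-contained Python sketch of our own (not taken from the studies) that probes a pretrained GloVe embedding for exactly this kind of gender association. The model name comes from the gensim-data catalogue, and the word lists are assumptions chosen for demonstration.

```python
# A small sketch showing how gender associations surface in pretrained word
# embeddings. The GloVe model name is from the gensim-data catalogue; the
# probe words are illustrative assumptions, not a benchmark.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the model on first use

# Analogy-style probe: "doctor" - "he" + "she" is closest to ...?
print(vectors.most_similar(positive=["doctor", "she"], negative=["he"], topn=3))

# Direct probe: which gendered pronoun sits closer to each occupation word?
for word in ["doctor", "nurse", "engineer", "homemaker"]:
    print(word,
          "she:", round(vectors.similarity(word, "she"), 3),
          "he:", round(vectors.similarity(word, "he"), 3))
```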

Unsupervised artificial intelligence (AI) models that automatically discover hidden patterns in natural language datasets capture linguistic regularities that reflect human biases, such as racism, sexism, and ableism. These unsupervised AI models, namely word embeddings, provide the foundational, general-purpose, numeric representation of language for machines to process textual data.

Word embeddings identify the hidden patterns in word co-occurrence statistics of language corpora, which include grammatical and semantic information as well as human-like biases. Word embeddings play a significant role in shaping the information sphere and can aid in making consequential inferences about individuals. Job interviews, university admissions, essay scores, content moderation, and many more decision-making processes that we might not be aware of increasingly depend on these NLP models.
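If you want a feel for what “co-occurrence statistics” means in practice, here is a toy Python sketch (our illustration, with a made-up four-sentence corpus and window size) that counts how often words appear near each other, which is the raw signal embedding algorithms compress into vectors.

```python
# Toy sketch of the co-occurrence counting that word embeddings are built on:
# count how often each word appears within a small window of every other word.
# The corpus and window size are made up purely for illustration.
from collections import Counter
from itertools import combinations

corpus = [
    "she stayed home with the family",
    "he built a career in technology",
    "she loves the arts",
    "he studied science",
]

window = 3
cooccur = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i, j in combinations(range(len(tokens)), 2):
        if j - i <= window:
            cooccur[(tokens[i], tokens[j])] += 1

# Pairs like ("she", "family") and ("he", "career") accumulate counts here,
# which is the pattern an embedding algorithm would encode into vectors.
print(cooccur.most_common(10))
```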

When words representing concepts appear frequently with certain attributes, word embeddings learn to associate the concept with the co-occurring attributes. For example, sentences that contain words related to kitchen or arts tend to contain words related to women. However, sentences that contain career, science, and technology terms tend to contain words related to men. As a result, when machines are processing language to learn word embeddings, women, as a social group, appear in close proximity to words like family and arts relative to men; whereas, men, as a social group, appear in close proximity to career, science, and technology. When these stereotypical associations propagate to downstream applications that present information on the internet or make consequential decisions about individuals, they disadvantage minority and underrepresented group members. As long as language corpora used to train NLP models contain biases, word embeddings will keep replicating historical injustices in downstream applications unless effective regulatory practices are implemented to deal with bias.
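As a hands-on teaser for the session, here is a rough sketch of a WEAT-style association score in the spirit of the work behind reference 2: it compares how close career words and family words sit to “she” and “he” in a pretrained embedding space. The word lists and model name are illustrative assumptions, not the published test sets.

```python
# Rough sketch of a WEAT-style association score: mean cosine similarity of a
# target word ("she" / "he") to sets of career and family attribute words.
# Word lists and the GloVe model name are assumptions for illustration.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

career = ["career", "salary", "office", "business", "profession"]
family = ["family", "home", "children", "marriage", "relatives"]

def association(target, attributes):
    # Mean cosine similarity between one target word and a set of attribute words.
    return np.mean([vectors.similarity(target, a) for a in attributes])

for target in ["she", "he"]:
    print(target,
          "career:", round(association(target, career), 3),
          "family:", round(association(target, family), 3))
# If the career/family gap flips sign between "she" and "he", that is the
# stereotypical association the excerpt above describes.
```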

References 

  1. Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J., ... & Wang, W. Y. (2019). Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976.

  2. Caliskan, A. (2021, May 10). Detecting and mitigating bias in natural language processing. Brookings. https://www.brookings.edu/research/detecting-and-mitigating-bias-in-natural-language-processing/ (accessed Aug. 24, 2022).

See you @ WeeklyWed.

We are here to learn, share and explore with all the students.

WeeklyWed @ 9 AM PT, 24th Aug, 2022.

— WeeklyWed Team