Algorithmic fairness is a vital field of research that aims to understand and correct biases in data and algorithms. Biases can be introduced into algorithms in several ways, including through problem framing, data collection, data preparation, and model selection. However, there are steps we can take to avoid bias, including being aware of our assumptions, collecting diverse data, regularly auditing our models, and being transparent about how they work. By understanding and correcting biases, we can build unbiased AI models that serve everyone.
Algorithms have become an integral part of our lives, from deciding what we buy to whom we connect with. However, their increasing use in decision-making processes has led to concerns about fairness. An algorithm is considered biased if it makes incorrect predictions that systematically disadvantage certain groups of people.
Influential companies like OpenAI have acknowledged this as a major problem with Large Language Models like ChatGPT and are working on various initiatives to counter these biases. Within academia, there has been a lot of exciting research over the past years, such as a recent approach by MIT researchers to train logic-aware language models in order to reduce harmful stereotypes like gender and racial biases.
These initiatives fall under the umbrella of algorithmic fairness, a field of research that aims to understand and correct these biases. The field includes, but is not limited to, the sources of bias and mitigation strategies discussed below.
Bias can be introduced into algorithms in several ways, including:
The problem framing stage involves defining the goal of a model in a way that can be computed. If the problem is framed in a way that unintentionally disadvantages certain groups, it can introduce bias into the model. For example, a study of an algorithm designed to optimize the delivery of ads promoting STEM jobs while keeping costs low found that men were more likely to be shown the ad, not because men were more likely to click on it, but because women are more expensive to advertise to. This issue, also dubbed the Alignment Problem, can systematically disadvantage specific groups and work against the actual intentions of the individual or company designing the algorithm.
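To make that mechanism concrete, here is a minimal sketch in Python with purely hypothetical cost numbers (none of these figures come from the study): an allocator that only minimizes cost per impression ends up showing the ad exclusively to the cheaper-to-reach group.

```python
# Minimal sketch with purely hypothetical costs: an allocator that only
# minimizes cost per impression shows the ad exclusively to the cheaper group.
BUDGET = 1_000.0
COST_PER_IMPRESSION = {"men": 0.05, "women": 0.08}  # assumed example costs

def cost_minimizing_allocation(budget, costs):
    """Spend the entire budget on the cheapest group: the 'optimal' plan."""
    cheapest = min(costs, key=costs.get)
    return {group: budget / cost if group == cheapest else 0.0
            for group, cost in costs.items()}

def split_budget_equally(budget, costs):
    """Split the budget evenly across groups, accepting a higher average cost."""
    per_group = budget / len(costs)
    return {group: per_group / cost for group, cost in costs.items()}

print(cost_minimizing_allocation(BUDGET, COST_PER_IMPRESSION))
# {'men': 20000.0, 'women': 0.0} -- women never see the ad
print(split_budget_equally(BUDGET, COST_PER_IMPRESSION))
# {'men': 10000.0, 'women': 6250.0} -- fewer impressions overall, fairer reach
```

The point of the sketch is that the "biased" outcome falls straight out of an objective that never mentions gender at all, which is exactly why problem framing deserves scrutiny.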
Bias can show up when the data collected is not representative of reality or reflects existing prejudices. For example, Web Summit's women in tech community recently ran an experiment on how generative AI views women in tech and what this might say about how large-scale AI models are trained:
“Across the board, ChatGPT and Dall-e were more likely to imagine a tech founder and CEO as a man. While ChatGPT tended to generate more Chinese-American name combinations, including ‘Sarah Chen’ or ‘John Kim’, Dall-e was more likely to produce images of white men. Neither of these large-scale AI models was very diverse – there was a noticeable underrepresentation of people of colour and other ethnic minorities. Ultimately, our little experiment with ChatGPT and Dall-e prompted us to ask: Who chooses the data? Is it representative of the world around us? And do the teams working on these AI models have an ethical obligation to ensure they are trained in a way that supports diversity, equity and inclusivity?” (Web Summit, 2023)
As a second example, Hungarian uses non-gendered pronouns, yet Google Translate's data set, collected from millions of examples of published text, gives the adjective "clever" a higher probability of appearing in a "male" context and "beautiful" in a "female" one. This data set is accurate to history but unfair.
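If you want to probe these learned associations yourself, a rough sketch is to compare word-embedding similarities. The snippet below uses gensim's pretrained GloVe vectors purely for illustration; this is our assumption of a convenient public model, not the model behind Google Translate.

```python
# Rough sketch: measure gendered associations learned from published text.
# GloVe vectors via gensim are used here purely for illustration; they are
# not the model behind Google Translate.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first run

for word in ["clever", "beautiful"]:
    # A positive score means the word sits closer to "he" than to "she"
    # in the embedding space; a negative score means the opposite.
    score = vectors.similarity(word, "he") - vectors.similarity(word, "she")
    print(f"{word}: {score:+.3f}")
```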
Bias can be introduced in the data preparation step, such as cleaning or subsampling the data. For example, Amazon's hiring algorithm was trained on historical hiring decisions that favored men over women. As a result, it learned to do the same, penalizing terms such as "women's" in CV text.
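As a small illustration of how an innocuous-looking preparation step can shift who is represented, here is a sketch with synthetic group labels (the 70/30 split and sample size are made up): naive random subsampling can drift away from the original group proportions, whereas stratified sampling preserves them by construction.

```python
# Sketch with synthetic labels: subsampling during data preparation can shift
# group proportions unless it is done in a stratified way.
import random
from collections import Counter

random.seed(42)

# Hypothetical applicant pool: 70% group A, 30% group B
population = ["A"] * 700 + ["B"] * 300

def stratified_sample(pop, k):
    """Draw from each group in proportion to its share of the population."""
    counts = Counter(pop)
    sample = []
    for group, count in counts.items():
        n = round(k * count / len(pop))
        sample += random.sample([g for g in pop if g == group], n)
    return sample

naive = random.sample(population, 50)           # proportions can drift by chance
stratified = stratified_sample(population, 50)  # stays at 35 A / 15 B by construction

print("naive:     ", Counter(naive))
print("stratified:", Counter(stratified))
```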
Unless a model is built on strong theories of the underlying causal mechanisms, it can be difficult to detect bias resulting from correlations with protected attributes. For example, Amazon's hiring algorithm started picking up on implicitly gendered words that were more highly correlated with men than with women, such as "executed" and "captured," and used them to make its decisions.
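A lightweight way to catch such proxy features before training is to check how strongly each column correlates with the protected attribute. The column names, data, and threshold below are illustrative assumptions, not part of Amazon's system.

```python
# Minimal sketch (synthetic data): flag features that act as proxies for a
# protected attribute by checking how strongly each one correlates with it.
import pandas as pd

df = pd.DataFrame({
    "uses_word_executed": [1, 1, 0, 1, 0, 0, 1, 0],
    "years_experience":   [5, 3, 4, 6, 5, 4, 3, 6],
    "gender_male":        [1, 1, 0, 1, 0, 0, 1, 0],  # protected attribute
})

protected = "gender_male"
correlations = df.drop(columns=protected).corrwith(df[protected]).abs()

# Any feature above an (arbitrary) threshold deserves a closer look before
# it is allowed to drive hiring decisions.
print(correlations[correlations > 0.5])
```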
☝️ These pitfalls illustrate one of the issues with deep learning models. Perhaps this could have been avoided by using a causal model instead of a natural language classification model. Amazon now uses a "much-watered-down version" of the recruiting engine to help with some rudimentary chores, including culling duplicate candidate profiles from databases.
Building unbiased AI models is crucial in today's world. Here are some tips to help you avoid bias in your models:
When framing the problem, be aware of your assumptions and biases. Consider what data you're using and how it could be biased. Seek outside perspectives if you're unsure. As a last resort, reach out to us... 😉
Collect data that is diverse and representative of reality. This can help you avoid reflecting existing prejudices and ensure that your model is fair.
A great example of this is how Google AI tackled the issue that image datasets are often geographically skewed because of how they were collected. Google AI introduced an initiative for users to contribute images of under-represented groups and regions, and started an "Inclusive Images Competition" on Kaggle to encourage submissions of diverse data.
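A simpler, everyday version of the same idea is to compare your dataset's group distribution against a reference distribution before training. The regions, proportions, and threshold below are hypothetical placeholders.

```python
# Sketch with hypothetical proportions: flag groups that are under-represented
# in the training set relative to a reference distribution.
reference = {"north_america": 0.25, "europe": 0.20, "asia": 0.40,
             "africa": 0.10, "other": 0.05}
training_set = {"north_america": 0.55, "europe": 0.30, "asia": 0.10,
                "africa": 0.03, "other": 0.02}

for region, target in reference.items():
    actual = training_set.get(region, 0.0)
    if actual < 0.5 * target:  # flag groups at less than half their expected share
        print(f"Under-represented: {region} ({actual:.0%} vs {target:.0%} expected)")
```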
Regularly audit your models to detect and correct biases. Tools such as AI Fairness 360, Fairness Flow, the What-If Tool, Fairlearn, and SageMaker Clarify can help you identify and mitigate bias.
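As a minimal sketch of what such an audit can look like, the snippet below uses Fairlearn (one of the tools listed above) to compare selection rates across groups. The labels, predictions, and group names are synthetic stand-ins for your own model's output.

```python
# Minimal audit sketch with Fairlearn: measure how a classifier's selection
# rate differs across a sensitive feature. The data here is synthetic.
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
gender = ["f", "f", "m", "f", "m", "m", "m", "f"]

frame = MetricFrame(metrics=selection_rate,
                    y_true=y_true, y_pred=y_pred,
                    sensitive_features=gender)
print(frame.by_group)  # selection rate per group

# A demographic parity difference far from 0 signals that one group is
# selected much more often than the other and warrants investigation.
print(demographic_parity_difference(y_true, y_pred, sensitive_features=gender))
```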
Provide as much transparency as possible to the users and people affected by your model's outcomes. By offering direct feedback channels on the model's output, you can build trust with users, collect crucial data to improve performance over time, and quickly identify cases of harmful bias.
Algorithmic fairness is an essential aspect of ensuring that algorithms work for everyone. By understanding and correcting biases, we can create a more equitable world, and by being aware of the ways bias can be introduced and taking steps to avoid it, we can build unbiased AI models that are less likely to accidentally cause harm.
The techniques mentioned above can help you identify and mitigate bias in your AI models and ensure that your algorithms are fair. If you have any open questions or feedback, always feel free to reach out to us directly via LinkedIn or email.