Press Release: OpenAI’s ChatGPT-4, Databricks’ Dolly 2.0 and Stability AI’s StableLM-Tuned-Alpha among the most gender biased LLMs, according to independent study
haia and Aligned AI reveal at CogX Festival the world's first independent tool and study to measure gender biases in top LLMs
London, September 15th 2023. haia, a non-profit focused on the responsible industrial adoption of AI that is safe and beneficial to society, and Aligned AI, a start-up focused on creating safer and more aligned AIs, have unveiled the world’s first independent study measuring gender biases in large language models (LLMs). Using faAIr, a first-of-its-kind tool for measuring gender bias, the study analyzed some of the most popular and widely used LLMs on the market, marking a first step towards addressing a pressing concern in the AI community. The study aims to empower the AI industry to improve objective, independent testing on the key values needed for AI to be adopted responsibly.
The study considered two key situations, giving two families of prompts: 1) Professional bias: prompts that describe the jobs of various women or men, or their working environment and habits, a key area of concern for potential bias; and 2) Fictional bias: prompts that present stories with male or female protagonists, something we encounter every time we read a book or watch a play or a show.
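To make the two families concrete, the snippet below lists illustrative prompt pairs in the spirit of those descriptions; these are hypothetical examples, not faAIr’s actual benchmark prompts.

```python
# Hypothetical prompt pairs, differing only in gender, illustrating the two
# families described above (not the actual faAIr benchmark prompts).
PROFESSIONAL_PROMPTS = [
    ("The engineer reviewed her designs before the client meeting. Afterwards, her manager said she was",
     "The engineer reviewed his designs before the client meeting. Afterwards, his manager said he was"),
]
FICTIONAL_PROMPTS = [
    ("The heroine of the story drew her sword and faced the dragon. Then she",
     "The hero of the story drew his sword and faced the dragon. Then he"),
]
```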
The faAIr findings shared during the CogX Festival have revealed that, when it comes to professional gender bias, OpenAI’s ChatGPT-4 is the most biased amongst the LLMs tested (19.2%), followed closely by Databricks’ Dolly 2.0 (18%) and then Stability AI’s StableLM-Tuned-Alpha (12.8%). On the other hand, EleutherAI’s GPT-J (6.4%) is the least gender-biased. When it comes to fiction/story bias, ChatGLM, developed by Tsinghua University, is the most biased amongst the LLMs tested (31.6%), followed by Databricks’ Dolly 2.0 (26.5%) and Stability AI’s StableLM-Tuned-Alpha (22.7%).
“We believe the safe and responsible adoption of AI can unleash humanity’s fullest potential. Mitigating biases is critical for AI to be beneficial to society. The initial testing results from this independent study show areas for improvement,” explained haia's representative Bart Veenman.
“‘You can’t manage what you can’t measure.’ Subjective impressions are useful, but can only give us an indication of how biased a model is. To do better, we need numerical benchmarks so that we can track changes and improvements, so hopefully this will help the industry to make much-needed improvements in LLMs,” said Dr. Stuart Armstrong, CSO at Aligned AI.
To measure the bias, faAIr examined how gender affects the probability that each LLM assigns to the next token. For example, consider the prompt “The doctor yelled at the nurse because she was late. Who was late?” For models that expose probabilities for the next token or word, we might see LLM1: “she” -> (90% “nurse”, 10% “doctor”) and LLM2: “she” -> (60% “nurse”, 40% “doctor”). LLM1 is clearly more gender-biased than LLM2, and the difference can be quantified by comparing 90%-10% and 60%-40% against the unbiased 50%-50%. faAIr quantifies an LLM’s bias by asking “how much does gender change the probability of the next token?” for many different prompts and many different tokens, and then averaging the results.
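As a rough illustration of the kind of measurement described above (not the faAIr implementation itself), the sketch below assumes a Hugging Face causal LM and compares next-token probabilities under a gender swap; the model name, prompt pair and candidate words are illustrative placeholders.

```python
# Minimal sketch (not the faAIr implementation) of measuring how swapping the
# gender in a prompt shifts a model's next-token probabilities. Assumes a
# Hugging Face causal LM; model name, prompts and candidate words are
# illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-j-6B"  # any causal LM that exposes next-token logits
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def next_token_probs(prompt: str) -> torch.Tensor:
    """Probability distribution the model assigns to the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the final position
    return torch.softmax(logits, dim=-1)

def pair_bias(prompt_female: str, prompt_male: str, candidates: list[str]) -> float:
    """Average absolute shift in the probability of each candidate continuation
    when the gender in the prompt is swapped (0.0 means no shift)."""
    p_f = next_token_probs(prompt_female)
    p_m = next_token_probs(prompt_male)
    shifts = []
    for word in candidates:
        token_id = tokenizer.encode(" " + word)[0]  # first sub-token of the word
        shifts.append(abs(p_f[token_id] - p_m[token_id]).item())
    return sum(shifts) / len(shifts)

# One prompt pair in the spirit of the doctor/nurse example above; a full
# benchmark would average such scores over many prompt pairs and many tokens.
prompt_f = ("The doctor yelled at the nurse because she was late. "
            "The person who was late was the")
prompt_m = ("The doctor yelled at the nurse because he was late. "
            "The person who was late was the")
score = pair_bias(prompt_f, prompt_m, ["nurse", "doctor"])
print(f"gender shift for this prompt pair: {score:.3f}")
```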
Commissioned by haia, Aligned AI carried out a black-box test on the most popular models to which API access could be obtained; as a result, more open-source models were tested. For the report, faAIr analyzed the following models: GPT-J (EleutherAI), GPT-3 ada (OpenAI), GPT-3 davinci (OpenAI), ChatGPT-4 (OpenAI), BLOOM (BigScience), ChatGLM (Tsinghua University), StableLM-Tuned-Alpha (Stability AI), LLaMA (7B) (Meta AI), LLaMA (13B) (Meta AI), Open LLaMA, Dolly 2.0 (Databricks), GAIA-1 (Wayve), RedPajama (Together, Ontocord.AI, ETH DS3Lab, AAI CERC, Université de Montréal, Mila - Québec AI Institute, Stanford CRFM, Hazy Research and LAION).
faAIr was one of two algorithms for which Aligned AI received the CogX Festival award for Best Innovation: Algorithmic Bias Mitigation.
About haia
haia is a global, non-profit alliance dedicated to the responsible industrial adoption of AI that is safe and beneficial to society. haia was initiated by the Happiness Foundation, a charitable foundation based in Oxford, UK, in collaboration with Oxford’s Future of Humanity Institute, the Royal College of Art, and Humans.ai. Together with its partners, experts and civil society, haia is developing a comprehensive cost-benefit human flourishing framework against which the societal impact of AI models and applications can be evaluated and tested. haia’s collaborators include the TED conference, the Conduit, the Design Museum, the Museum of the Future, Humanity 2.0, the Harvard Human Flourishing Program and the Pontifical Academy of Sciences in the Vatican, amongst others, and have brought together hundreds of multidisciplinary experts and AI researchers.
About Aligned AI
Founded in Oxford by Rebecca Gorman and Dr. Stuart Armstrong, Aligned AI is creating the next step change in machine learning: teaching AIs to hold human-like concepts, making them significantly safer and more capable so they can be deployed at scale. Aligned AI is pioneering “concept extrapolation”, enabling AIs to act as the trainer intends even in new or out-of-distribution situations, such as an LLM encountering toxic content or an autonomous vehicle driving in adverse weather conditions. Aligned AI is building foundational tools that can be integrated into any AI to make it safer, and therefore more capable and reliable, enabling wider adoption.