IWD2023: How can we prevent gender bias in next gen chatbots?
Silicon Valley startup OpenAI’s beta launch of ChatGPT has intensified the AI arms race. Big Tech companies are scrambling to develop generative AI that can act as a future gatekeeper of internet search, and that will also digitally transform enterprises with a host of new products and services.
Three months after its release, Microsoft, a long-time investor in OpenAI, announced that the chatbot – which provides human-like answers to questions – would be integrated into its search engine and released as an API for enterprises to innovate with. In response, Google released its own not-quite-ready large language model chatbot, Bard.
Early mistakes by both offerings have been well documented. A factual error by Bard wiped $100bn off the market value of Google’s parent company, Alphabet. ChatGPT’s blagging qualities have also been uncovered: it has been found to confidently present falsehoods as facts while offering up seemingly plausible citations.
All this comes as no surprise to those who work in the field and appreciate the challenges of developing large language models at scale.
Tech companies will likely spin early mistakes like these as learning opportunities. Meanwhile we, the users, are testing the tech for free. The approach OpenAI has taken to create a buzz is a bold one, but it is not a careful one, and in the rush to innovate tech companies could be inadvertently discriminating.
Training data
ChatGPT was initially trained on 570GB of data obtained from books, webtexts, Wikipedia, articles and other pieces of writing on the internet.
Algorithms are only as good as the data they are trained on, and all data contains bias. Sometimes the bias lies in what’s omitted rather than what’s included. The field of medical research, for instance, is dominated by men, and there is consequently far more data on male-specific conditions than on female-specific conditions such as endometriosis.
Historic data also contains biases that have resulted in blatant discrimination – remember Microsoft’s faulty facial-verification software that discriminated against Uber drivers of colour? Or Amazon’s recruitment software, which taught itself – based on 10 years’ worth of CVs – to select only male applicants for tech jobs?
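The mechanism behind the Amazon case is easy to reproduce. Below is a minimal, purely illustrative sketch – synthetic data and a hypothetical proxy feature, not Amazon’s actual system – showing how a model trained on historically biased hiring decisions learns to penalise anything correlated with being a woman:

# Illustrative only: synthetic data, hypothetical feature names.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(0, 1, n)          # genuinely job-relevant signal
womens_club = rng.integers(0, 2, n)  # hypothetical proxy, e.g. "women's chess club" on a CV
# Historical labels: past recruiters rewarded skill but penalised the proxy.
hired = (skill - 1.5 * womens_club + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([skill, womens_club]), hired)
print(dict(zip(["skill", "womens_club"], model.coef_[0])))
# The learned weight on the proxy is strongly negative: the model has
# faithfully reproduced the historical bias rather than corrected it.

Nothing in this pipeline “goes wrong” in a technical sense; the model does exactly what it is asked to do, which is why biased history produces biased predictions.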
Will the chatbots of the future now repeat the algorithmic sins of the past?
As Ivana Bartoletti, a data privacy officer at Wipro, warns in her book An Artificial Revolution – On Power, Politics and AI: “If society, as it is today, is the only model we use to train algorithms that are going to affect us tomorrow then we risk hard coding injustices and prejudices into societies of the future.”
And yet, in the race to produce an all-singing, all-dancing chatbot, it feels as though we’re at risk of hardcoding these prejudices into what may soon become trusted digital gatekeepers – ones that also power a myriad of enterprise-based products and services.
One of the datasets powering ChatGPT is Wikipedia – a site on which 80% of content is generated by men. Jimmy Wales, founder of the world’s fifth most visited site, admits that systemic biases are often reflected in its content curation. A 2021 study found that, in one month, 41% of the online encyclopedia’s biographies nominated for deletion were of women, despite women accounting for only 17% of published biographies.
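Skews like this are measurable before a corpus is ever used for training. As a rough, hedged sketch – pronoun counts are only a crude proxy for representation, and the function below is hypothetical – a corpus audit might start with something as simple as:

import re
from collections import Counter

def pronoun_skew(text: str) -> dict:
    # Count gendered pronouns as a crude first signal of representation skew.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    male = sum(counts[w] for w in ("he", "him", "his"))
    female = sum(counts[w] for w in ("she", "her", "hers"))
    total = male + female
    return {"male": male, "female": female,
            "female_share": round(female / total, 3) if total else None}

print(pronoun_skew("He said his results were final. She disagreed."))
# {'male': 2, 'female': 1, 'female_share': 0.333}

Real audits go much further – looking at topic coverage, whose biographies exist and who wrote the text – but the point stands: the skew is detectable before training, not only after deployment.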
Toxic labels
According to Bartoletti, bias can emerge at any point of the AI life cycle, from the training data to representation and evaluation. “Who are the people labelling the data? Where do they come from? Are they diverse enough?” she asks.
Unfortunately, the very act of protecting vulnerable groups online can sometimes lead to the harm of others. The laborious labelling of datasets, for instance, is often outsourced to workers in developing countries for very little pay.
ChatGPT is no exception. In January, Time magazine reported that OpenAI used Kenyan workers earning less than $2 per hour to label toxic content so that it could build a safety system against harmful material – a system eventually deployed in the bot we’re all currently experimenting with.
Workers there described being exposed to thousands of images of sexual abuse, hate speech, suicide and violence as “torture”, and Sama, the Silicon Valley-based outsourcing contractor, reportedly cancelled its work with OpenAI last February, eight months earlier than planned.
While not deliberately ill-intentioned, technology companies, in their rush to win the AI arms race, risk taking shortcuts at the expense of women and other minority groups in the name of ‘innovation’.
AI products will continue to be released before they’re ready, and not all biases will be addressed – many, in fact, may be repeated. Given these circumstances, it feels unrealistic to rely on commercially driven technology companies to self-regulate.
Universal digital rights
As the world moves towards a new version of the internet, Web 3.0 and the metaverse, it feels as though there’s an opportunity to flush out biases of the past and create a safer space for women and other minority groups.
It’s a belief that’s motivated several women leaders in AI, including Bartoletti, to form an alliance calling for a global set of rules to be brought in to regulate the internet and digital technology.
Led by former Greenpeace campaigner and activist Emma Gibson, the Alliance for Universal Digital Rights recognises that technology has no borders. It wants to take an approach similar to the international agreements brought in to tackle climate change.
“AI is reshaping the world right now,” says Gibson. “If you’re going to regulate for this, you need a globally agreed set of rules, and we think that the only way to empower women is if these rules are rooted in a human rights-based, feminist and intersectional approach.”
Serendipitously, the alliance has found an ally in UN Secretary-General António Guterres, who has made it a mission to create a common set of global standards to protect citizens online, and to get those who are digitally excluded connected.
As part of the UN’s Global Digital Compact, a tech envoy has been appointed, along with two co-facilitators (from Sweden and Rwanda), whose mission is to get countries around the world to reach an agreement, in the form of a set of core principles, by 2024.
The alliance is hopeful that ‘equality by design’ will be one of these principles – a principle that looks at how algorithms are designed and at how algorithmic impact and equality impact assessments can be carried out.
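What might such an assessment look like in practice? One common starting point – a minimal sketch assuming a simple demographic parity check, with illustrative group labels and an illustrative tolerance rather than anything mandated by the compact – is to compare a model’s positive-outcome rates across groups:

import numpy as np

def demographic_parity_gap(predictions: np.ndarray, groups: np.ndarray) -> float:
    # Absolute difference in positive-outcome rates between two groups.
    rate_a = predictions[groups == 0].mean()
    rate_b = predictions[groups == 1].mean()
    return abs(rate_a - rate_b)

# Toy outcomes for eight applicants, split into two demographic groups.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
grp = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(preds, grp)
print(f"parity gap: {gap:.2f}")  # positive rates of 0.75 vs 0.25 -> gap of 0.50
if gap > 0.1:                    # illustrative tolerance, not a legal threshold
    print("flag for equality impact review")

Demographic parity is only one of several competing fairness definitions – equalised odds and calibration are others – which is exactly why campaigners want such choices made under agreed rules rather than left to each vendor.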
“The compact is not a legally binding treaty but an agreement based on a set of shared principles that will inform the laws that are put in place,” Gibson explains.
Data reflects the society we live in, which is sadly still not equal. This data is then fed into machines like ChatGPT, which provide the answers we are looking for and inform our views. That’s why AI needs to be governed by external forces and not left to Big Tech.
As Bartoletti argues in her book: “Like nuclear power, AI can bring enormous opportunities, but to do so requires a form of authority enshrined in global governance to avoid its terrifying downsides.”