A digital duel: web scraping vs automated political misinformation
Since May of last year, there has been a 1000% increase in AI-generated news and information sites created to spread misinformation.
According to the latest data from misinformation watchdog NewsGuard, there are over 800 AI-generated sites in action, with generic names including iBusiness Day, Ireland Top News, and Daily Time Update – shaped to appear as established news sites, churning out dozens, if not hundreds, of bot-written articles on topics ranging from politics and technology to entertainment.
Some of these sites are state-funded with the sole purpose of spreading misinformation. Others use AI to find and churn out already-published articles, occasionally rewriting satirical pieces as fact, such as this one about Israel’s prime minister.
On social media, bots are running wild – and they’re more believable than ever. No longer do they spew unrelated comments on posts; they can now digest text and comments and tailor a realistic reply designed to lead readers to the links in their profile.
Experts believe that this kind of AI-generated text is a huge threat to the political landscape – an even larger one than audio, video, and image deepfakes.
“I think the greater threat is going to be the scale and speed you can generate text content. Especially given that it’s an election year, the threat to elections is really going to become messy,” says Sohrob Kazerounian, AI researcher at cyber security company, Vectra AI.
“You can generate text content that appeals to various subgroups of people and it’s going to be a much more difficult thing to detect and regulate. The stories we hear about deepfakes on the audio or video side of things are relatively quickly flagged,” he says.
Whether through sites, comments, articles, or profiles, AI and troll farms are making it ever easier for those with malicious intentions to influence public knowledge at scale, within people’s usual news consumption.
“The interesting thing with text-based disinformation propaganda is that you can generate highly tailored text,” expands Kazerounian.
It’s generally understood that state actors are behind the activity, whether it be to incentivise and disrupt another country’s elections, or to internally steer and manipulate it in their own favour.
“There are troll farms in Russia, and apparently some in Brazil, China, and some other countries, and those people’s main job is to shape public opinion using comments on social media, using fake accounts, or publishing misleading articles,” explains Vaidotas Šedys, Head of Risk Management at web scraping firm Oxylabs.
“The people responsible for spreading misinformation have deep pockets, sometimes funded by governments, and have access to unlimited resources, helping them to make broad misinformation campaigns.”
Web scraping to tackle misinformation
Šedys explains that Oxylabs is offering its web scraping tool to partners such as Debunk.org, the Civic Resilience Initiative, Bellingcat, and many more to help detect and track fabricated content at a large scale.
In a nutshell, web scraping refers to automated bots that visit websites, collect information, and feed it back to the user.
Google, for instance, is a web scraper: every search query is matched against pages its crawlers have collected and indexed, which dictates Google search rankings and has led to an entire sub-industry known as Search Engine Optimisation (SEO).
More advanced uses now include price comparison sites, which use the tool to compare prices on different products, such as flights.
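As an illustration, the extraction step of such a scraper might look like this minimal Python sketch. The HTML snippet and the `class="price"` tag structure are hypothetical stand-ins for what a price comparison bot would fetch from a real page:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collects the text of every element carrying class="price"."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

# Hypothetical snippet; a real scraper would fetch this over the network
html = '<div><span class="price">£79</span><span class="price">£85</span></div>'
parser = PriceParser()
parser.feed(html)
print(parser.prices)  # ['£79', '£85']
```

A production scraper would add fetching, error handling, and rate limiting on top of this parsing core, but the collect-and-extract loop is the same idea at any scale.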
For misinformation around elections, wars, and other areas such as health, the partners use the tool to monitor misinformation trends across the internet.
Once it is discovered, Šedys explains, the partners can use the information to create counter-misinformation campaigns and to educate the public on what’s happening.
“Once you get a small piece of misinformation, you can see what’s happening on the whole internet. Is it on a specific side of the internet? Is it just one article? Maybe there’s hundreds of it, and this is where you can start using the technology.”
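One simple way to check whether a piece of misinformation has spread "on the whole internet" is near-duplicate detection: fingerprint an article with word shingles and compare overlap. This is a minimal sketch of that idea, not Oxylabs' actual pipeline, and the example sentences are invented:

```python
def shingles(text, n=3):
    """Set of n-word shingles, lower-cased, as a rough text fingerprint."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two texts' shingle sets (1.0 = identical)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Invented examples: a claim, a light rewrite of it, and unrelated news
original  = "the minister secretly approved the controversial deal last week"
rewrite   = "the minister secretly approved the controversial deal on friday"
unrelated = "local team wins the regional football championship again"

print(round(jaccard(original, rewrite), 2))    # 0.56 – likely a copy
print(round(jaccard(original, unrelated), 2))  # 0.0 – no shared shingles
```

Run over thousands of scraped articles, a score threshold like this flags clusters of near-identical stories – exactly the "hundreds of it" pattern Šedys describes – far faster than manual reading could.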
For instance, before Russia invaded Ukraine in 2022, Oxylabs’ web scraping tool detected unusual activity from a “journalist” who apparently published 38,000 articles in one year, primarily about Ukraine.
Two days before the full-scale invasion, the journalist (or AI bot) produced 150 different articles in a single day, mainly propaganda aligned with the Kremlin’s agenda.
“Manually checking this would not be possible,” says Šedys.
Still, Šedys stresses that this is a contest between state-backed actors and non-governmental organisations with barely comparable budgets.
“NGOs and other institutions that track this misinformation have really limited resources. In most cases, it’s a small number of volunteers that are committed to this,” he says.
“By using web scraping, just by having a single piece and some understanding how the propaganda or misinformation works, you can replace hundreds and work on it efficiently.”
Web scrapers also have to work around sites that block the technology – by rotating through different IP addresses, for instance – and there is much debate over the legality of the practice.
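The rotation itself can be as simple as cycling through a pool of proxy addresses so that consecutive requests come from different IPs. This is a schematic sketch with made-up addresses; a real scraper would only route traffic through proxies it is legally permitted to use:

```python
from itertools import cycle

# Hypothetical proxy pool – placeholder addresses, not real endpoints
proxies = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
rotation = cycle(proxies)

def next_proxy():
    """Round-robin selection: each call returns the next proxy in the pool."""
    return next(rotation)

# Five consecutive requests wrap back around after the third proxy
used = [next_proxy() for _ in range(5)]
print(used)
```

Real-world rotation is usually smarter – retiring proxies that get blocked, spacing requests per IP – but round-robin is the baseline the blocking-and-evading contest starts from.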
“We have a strong legal team here, and our own internal guidelines on how to perform web scraping in order to comply with local laws and regulations,” says Šedys.
The team also hosts legal webinars for its users, which Šedys says are its most popular, “so this is being taken into account.”
Using AI
Former UK deputy Prime Minister Nick Clegg – now head of global affairs at Meta – said in a talk earlier this month that the social media giant is using AI to be a “sword and a shield” against misinformation.
Artificial intelligence can be used to scan content on its platforms and detect misinformation at scale, in a similar way to web scraping.
“Web scraping can be used to provide the data,” explains Kazerounian, in conversation over using AI as a whole.
This information can be used to train the models to recognise specific false information – “So all of these models are improving based on data that is being fed in by web scraping.”
But when it comes to using AI for misinformation, Kazerounian says it is “sort of a double-edged sword.”
“I think AI can be used to detect some of this [misinformation], but at the same time, any of those techniques that you have can be used itself to inform the generative side.”
So, if a generative AI model learns that its output is easily detectable (as is also the case for deepfakes), it can be improved so that it is less likely to be caught out.
“So, there’s this inherent tension. You have an AI system that is generating content, and an AI system that detects it versus the real content, and this adversarial process is what gets used to train both models, so every time one gets better, the other improves as a result,” Kazerounian adds.
“So, this sort of thing that happens in tandem is really what drives the process.”
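The tandem dynamic Kazerounian describes can be caricatured in a few lines: whenever one side is beaten, it improves just enough to overtake the other, so both ratchet upward together. This is a deliberately toy illustration of the adversarial feedback loop, not real model training – the "skill" numbers and update rule are invented:

```python
# Toy model of the adversarial loop: whichever side just "lost"
# improves past the other, so both skills rise round after round.
generator_skill = 1.0
detector_skill = 1.0

history = []
for _ in range(6):
    if generator_skill > detector_skill:
        detector_skill = generator_skill + 0.5  # detector catches up
    else:
        generator_skill = detector_skill + 0.5  # generator catches up
    history.append((generator_skill, detector_skill))

print(history[-1])  # both far above where they started
```

In actual generative adversarial training the "improvement" step is gradient descent on each model's loss against the other, but the ratchet structure – every time one gets better, the other improves as a result – is the same.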
As generative AI improves and text-based content becomes more convincing, it’s not only elections that are under threat, but also businesses, which are vulnerable to being scammed through tricks such as phishing.
To solve this, Kazerounian says there needs to be more awareness and more training on how to detect any fraudulent content.
“There’s going to be a degree of getting people comfortable with having a gut sense of understanding of what’s fake and what’s not.”
Alternatively, companies may start getting employees to cryptographically sign everything that is legitimately theirs, so that a message can be verified as genuinely coming from them rather than from a deepfake.
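The sign-then-verify idea can be sketched with Python's standard library. Note the simplification: this uses HMAC, a shared-secret authentication code, rather than a true public-key signature (which would require a third-party library such as `cryptography`); the key and message below are hypothetical:

```python
import hashlib
import hmac

# Hypothetical shared secret – a real deployment would use per-employee
# asymmetric keys so verifiers never hold the signing secret.
key = b"company-internal-secret"

def sign(message: bytes) -> str:
    """Produce an authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), tag)

msg = b"Please approve the payment request -- CFO"
tag = sign(msg)
print(verify(msg, tag))                  # True: message is authentic
print(verify(b"tampered request", tag))  # False: any change breaks the tag
```

The point is the workflow, not the primitive: if every legitimate internal message carries a verifiable tag, an AI-generated impersonation that lacks one fails the check regardless of how convincing its prose is.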