Maldita.es is using AI to support complex fact-checking

From chatbots to custom tools, this pioneering fact-checking foundation’s journey reveals what it really takes to fight disinformation at scale

Image generated using artificial intelligence on Canva.

By: Pablo Pérez

At Maldita.es, the fact-checking process is primarily driven by our community engagement through an automated chatbot system on closed messaging apps. When individuals come across content that appears potentially misleading, they can submit it directly via the chatbot, which serves as an accessible and anonymous entry point. Each submission is received and managed by a team of professional journalists, who review, tag, and analyse the content. This team conducts thorough research using reliable sources and established methodologies to determine the accuracy of the information and provide a clear, evidence-based response.

A lot has changed since 2018, when the Maldita.es Foundation started fighting disinformation. Among other things, our organisation has grown a lot. We now handle hundreds of searches every day from users who want to fact-check what they found online: videos, forwarded messages, memes, etc. We have published thousands of articles and gathered a valuable registry of experts who are willing to contribute their knowledge.

Our community has grown significantly, and with that growth come new challenges. As organisations expand, they accumulate more data, and journalists must navigate increasingly complex tagging systems, which can be quite demanding. As knowledge becomes fragmented, it often becomes necessary to knock on the door of the colleague who published an article years ago, or the one who manually manages the databases. This is especially true for newcomers, who are less familiar with the organisation’s materials.

In order to tackle these and other challenges, the JournalismAI Innovation Challenge, supported by the Google News Initiative, gave us the opportunity to develop an AI assistant for fact-checkers. Rather than an opaque, centralised, multipurpose access point, we wanted to build an AI layer on top of some of our internal tools. The goal is to leverage the abilities AI models are best at: pattern recognition, retrieval, clustering, and so on. Introducing these functionalities within an expanded workflow not only enhances the AI system’s performance, but also enables us to monitor and correct any deviations in its behaviour.
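To make the retrieval part of that layer concrete, here is a minimal sketch of embedding-based matching over previously published fact-checks. The model name, the sample articles and the function names are illustrative assumptions, not our production setup.

```python
# Minimal sketch of an embedding-based retrieval layer over published
# fact-checks. The model name and the in-memory article list are
# illustrative assumptions, not the production stack.
from sentence_transformers import SentenceTransformer, util

# Hypothetical sample of previously published fact-checks.
articles = [
    "No, this video does not show a recent flood in Valencia",
    "The forwarded message about a new tax on pensions is false",
    "This image of the politician is an AI-generated fake",
]

# A multilingual open-source embedding model (an assumed choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
article_embeddings = model.encode(articles, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 3):
    """Return the published fact-checks most similar to a user query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, article_embeddings)[0]
    ranked = scores.argsort(descending=True)[:top_k]
    return [(articles[int(i)], float(scores[i])) for i in ranked]

print(retrieve("¿Es real el vídeo de la inundación?", top_k=2))
```

A component like this stays auditable: journalists see which published articles were retrieved and why, rather than receiving a single opaque answer.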

This is an ongoing project that impacts many different aspects of how Maldita.es operates. As such, it has involved a range of roles across various contexts, and we expect this to continue over the coming months. But there are some valuable experiences we can share that will hopefully be beneficial for other newsrooms or organisations trying to use AI for different purposes. 

1. Open source: from idea to implementation

When we started this project, we were certain that we wanted to use open-source models for some of our retrieval-oriented tasks. We had identified multimodal AI models that seemed a perfect fit for us. All we had to do was implement them.

As you might expect, things got complicated as soon as we began the implementation phase. It became obvious very quickly that there was a huge gap between what those models were trained to do and what we wanted to do with them. More specifically, there was a mismatch between their data and our data.

The first barrier was language. Although Spanish is widely represented in the AI community, frontier niche models are still often available only in English, and while Spanish is close enough to be reasonably well supported, performance is clearly below what the same models achieve in English. But beyond language, the clean, structured and prototypical samples the models were trained on often differed from the collection of highly varied, messy and unbalanced datasets that we work with. Some types of data, like screenshots or extremely long forwarded texts, are very common for us, but they are not the kind of objects these models are trained on.

We obviously had to backtrack and carry out an unexpectedly long process of browsing and indexing candidate models for our task. This was not easy, as model configurations are not reported systematically. What we initially believed could be elegantly handled by a single model had to be reframed as a modular system using different models for different kinds of data, affecting both usage and efficiency.
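The sketch below illustrates that modular shape: one pipeline per kind of data, behind a shared dispatcher. The Submission type, the handler names and the routing table are hypothetical placeholders standing in for the actual model wrappers.

```python
# Sketch of the modular routing idea: instead of one multimodal model,
# each kind of submission goes to a pipeline suited to it. Handler names,
# the Submission type and the routing table are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Submission:
    content: str   # raw text, or a file path for images
    kind: str      # "text", "long_forward", "screenshot", ...

def embed_text(text: str) -> str:
    # Placeholder for a text embedding model (e.g. a multilingual encoder).
    return f"text-embedding({len(text)} chars)"

def embed_image(path: str) -> str:
    # Placeholder for an image/OCR pipeline used for screenshots.
    return f"image-embedding({path})"

ROUTES = {
    "text": embed_text,
    "long_forward": embed_text,   # long forwards reuse the text pipeline
    "screenshot": embed_image,
}

def route(submission: Submission) -> str:
    """Dispatch a submission to the model that handles its data type."""
    handler = ROUTES.get(submission.kind, embed_text)  # default to text
    return handler(submission.content)

print(route(Submission("hoax_screenshot.png", "screenshot")))
print(route(Submission("Very long forwarded message ...", "long_forward")))
```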

Although we managed to meet our minimum goal, additional features had to be discarded because there were simply no models that could support them, or we were unable to find them. This experience was very valuable to us, and moving forward, we plan to allocate more time and resources to taking open-source models from idea to implementation.

2. Iteration registry

As everyone who works with AI models surely knows, measuring a system’s performance is crucial for making decisions based on an objective picture of how the model actually behaves, rather than cherry-picking results that happen to fit what we want to find. With this in mind, we were eager to test our matching system against real-world data.
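As an illustration of what “objective” means here, a metric like recall@k over a labelled test partition can replace eyeballing a handful of examples. The retrieve_ids function and the test pairs below are toy assumptions for illustration, not our actual evaluation harness.

```python
# Sketch of an objective check: recall@k over a labelled test partition,
# instead of eyeballing a handful of examples. The retrieve_ids function
# and the test pairs are toy assumptions, not the real evaluation harness.

def recall_at_k(test_pairs, retrieve_ids, k=5):
    """test_pairs: list of (query, id_of_the_relevant_article) tuples."""
    hits = sum(
        1 for query, relevant_id in test_pairs
        if relevant_id in retrieve_ids(query, top_k=k)
    )
    return hits / len(test_pairs)

# Toy stand-in for the real matching system.
def retrieve_ids(query, top_k=5):
    return ["art-101", "art-204", "art-309"][:top_k]

test_pairs = [
    ("¿Es real este vídeo?", "art-204"),
    ("mensaje reenviado sobre pensiones", "art-999"),
]
print(recall_at_k(test_pairs, retrieve_ids, k=5))  # 0.5 on this toy data
```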

As our databases grew, testing on the full dataset became unfeasible, so we relied on specific partitions. However, this again led to imbalanced datasets, with variations in the date and nature of the content used for testing, which in turn led to inconsistent performance. If this had been properly documented, it would have offered valuable insights into the efficacy of our newly developed system in different scenarios. However, we were not systematic in documenting these processes, which made it difficult to determine to what extent performance differences could be attributed to dataset variations versus the model itself. This made us realise that proper documentation, not only of the metrics obtained but also of the conditions under which they were obtained, is essential for generating robust data to inform decision-making.
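A minimal version of the kind of registry we now aim for could be as simple as appending each evaluation run, together with the conditions it was obtained under, to a log file. The field names, the example values and the JSONL path below are assumptions, not an existing internal tool.

```python
# Minimal sketch of an iteration registry: every evaluation run is logged
# together with the conditions it was obtained under. Field names, example
# values and the JSONL path are assumptions, not an existing internal tool.
import json
from datetime import datetime, timezone

def log_run(registry_path, model_name, partition, metrics, notes=""):
    """Append one evaluation run, with its conditions, to a JSONL registry."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "partition": partition,   # e.g. date range, content types, size
        "metrics": metrics,       # e.g. {"recall@5": ...}
        "notes": notes,
    }
    with open(registry_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_run(
    "eval_registry.jsonl",
    model_name="multilingual-encoder-v2",
    partition={"dates": "2023-01/2023-06", "types": ["text", "screenshot"], "size": 500},
    metrics={"recall@5": 0.71},
    notes="imbalanced partition: few screenshots",
)
```

Keeping the partition description next to the metric is what lets us later separate dataset effects from model effects.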

3. Talk, talk, talk

Our project required us to engage deeply with many different departments: community managers, social network experts, journalists, fact-checkers, editors, archivists and more. Our intention was to improve their experience, and this meant that we had to work closely with them. During the development of the project we carried out one-on-one interviews, focus groups and internal testing sessions, all of which were crucial for making sense of the work.

We didn’t do it perfectly and learned a few lessons along the way. Sometimes we focused too much on what one colleague said and missed the bigger picture. Other times, the usefulness of our testing was limited by a prototype that failed at very basic features. But the overall result is that we always kept in mind the real problems and situations faced by our newsroom. This often meant advancing quickly and decisively, as our colleagues brought valuable ideas and input that we, as developers, might not have considered otherwise. In fact, it was often the editorial and community staff who put into practice the ideas we had drafted in the proposal. And this makes perfect sense, as they are the ones who will (hopefully) benefit from it!

  • Pablo Pérez is the AI Tech Lead at Fundación Maldita.es

———


This article is part of a series providing updates from 35 grantees on the JournalismAI Innovation Challenge, supported by the Google News Initiative. Click here to read other articles from our grantees.
