Shomrim’s AI tool flags subtle news distortions

Learn how this non-partisan newsroom is using AI to help journalists spot hidden reporting flaws – from vague sourcing and emotionally loaded language to overlooked gaps in coverage – all in service of more transparent journalism

Image generated by AI using BING

By: SHOMRIM’s Team


When facts are not enough

When high-profile events unfold, we often see the same occurrence described in dramatically different ways by different news outlets. The competing narratives can be so divergent that they appear to depict entirely separate realities. This phenomenon is not new – but it has become increasingly problematic in an era marked by a proliferation of content channels, growing societal polarisation, and the rampant spread of fake news and “alternative facts”.

Traditionally, efforts to combat misinformation and disinformation have relied on fact-checking – a binary approach that classifies claims as either “true” or “false” (the spectrum is wider, but the general logic is predicated on a true/false premise). Yet this approach often fails to capture the more subtle ways journalism can distort reality. Every time journalists report on an event, they face the task of compressing a complex, multifaceted reality into a concise narrative. This process inevitably involves omissions, which shape the lens through which readers perceive events and form opinions.

The question is: to what extent is a reader’s perception grounded in fact? Or, conversely, how much of it is shaped by evocative language, heightened drama, insinuations and vague generalisations – tactics that may be used to engage audiences or, at times, to promote a particular agenda? 

When significant gaps exist in reporting – and especially when the journalist is not transparent about them – readers have an immediate, often unconscious tendency to fill in the blanks, particularly if those gaps are seasoned with non-factual elements. We divide this mechanism into two groups of journalistic flaws: missing information (factual) and compensative information (anything but).

Here is a hypothetical (and deliberately general) example:

“The government is spying on citizens.”

[NOTE: this and all the following examples are hypothetical and any connection to real reporting is coincidental]

Syntactically, the object “citizens” is indefinite (it is not “the citizens”); semantically, it does not refer to any specific group in the real world (it is not “the citizens of X” or a defined population). When readers encounter such a phrase, they may complete the missing information themselves – for example: “The government is spying on [all] citizens” or “The government is spying on [innocent] citizens.” Through omission, an idea can be planted in the reader’s mind without the writer having to take responsibility for explicitly stating it.

This phenomenon can be traced back to a concept in Aristotle’s Rhetoric called the Enthymeme. In a nutshell (and extremely reductively), the enthymeme is a method by which an orator relies on the audience to complete an unstated argument – whether by drawing on shared knowledge or through rhetorical cues. These cues correspond to our second group of flaws: compensative information. Among them, we find non-factual elements such as tone, emotional description, opinion, and implication.

For example:

“Reminiscent of Orwell’s dystopian prophecy in his novel 1984, the government is spying surreptitiously on unsuspecting citizens.”

We were inspired by the challenge of creating a solution that could help journalists identify potential blind spots in their reporting – both in pre-publication and in post-publication analysis, as part of their broader journalistic work.

In this post, we describe an AI-based system that analyses journalistic articles and helps detect potential reporting flaws. The system works on a per-article basis and connects to a Large Language Model (LLM) capable of recognising over 30 predefined patterns commonly associated with rhetorical and structural weaknesses in reporting.

System in action: Spotlight on potential reporting flaws

As a beta implementation, we are focusing on newsroom workflows aligned with the principles of “slow journalism”. Imagine a journalist working on a story: before composing it, they read various articles that report on different aspects of the subject, and the finished story is then submitted for publication.

Whether it is their own draft or an already published article that may support it, the journalist can submit the text to the system and receive a set of reports that spotlight potential reporting flaws. The figure below shows the main report, which specifies the flaws.

Figure: The main flaws report. The left panel displays the article text (shown here in Hebrew). The right panel lists identified potential flaws. Clicking on a specific flaw highlights its location in the text and provides a detailed explanation of why the system flagged the issue.

Under the hood: Deconstructing journalism

We have configured a flexible programmatic workflow that interacts with multiple instances of an LLM (“agents”). The agents can be thought of as specialised analysts who work together to decompose articles into specific aspects and elements of reliability. The system does not attempt to validate the reported facts themselves; instead, it focuses on the quality of the reporting.
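To make the idea concrete, here is a minimal sketch of how such a multi-agent workflow might be orchestrated in Python. The agent catalogue, prompt wording, model choice and use of the OpenAI client are our illustrative assumptions – this is not Shomrim’s actual implementation.

```python
# Illustrative sketch only: agent catalogue, prompts and model choice are
# assumptions, not Shomrim's actual implementation.
import json

from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable

# Each "agent" is an LLM instance instructed to recognise one designated pattern.
PATTERN_AGENTS = {
    "vague_attribution": "Flag statements attributed to unnamed or undefined sources.",
    "emotional_language": "Flag emotionally charged words or phrases in the reporting.",
    "headline_mismatch": "Flag headlines that exceed what the article body supports.",
}

def run_agent(instruction: str, article: str) -> list[dict]:
    """Ask one specialised agent to scan the article for its designated pattern."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You analyse the quality of reporting, not the truth of the "
                    f"reported facts. {instruction} Respond with a JSON array of "
                    "objects with keys 'quote' (the flagged text) and 'explanation'."
                ),
            },
            {"role": "user", "content": article},
        ],
    )
    return json.loads(response.choices[0].message.content)

def analyse(article: str) -> dict[str, list[dict]]:
    """Run every pattern agent over the article and collect their flags."""
    return {name: run_agent(instruction, article)
            for name, instruction in PATTERN_AGENTS.items()}
```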

Each agent is instructed to recognise one designated pattern that may reflect a journalistic flaw. In total, the system can currently identify over 30 distinct patterns, across the following four categories:

1. Flaws in the presentation of the factual report

Flaws in this category occur when important aspects of a reported event are either missing entirely or not concrete enough. This may include, for example:

  • Partial or incomplete description of the event itself

  • Details of when it happened

  • Details of who was involved

  • Specification of where it happened

  • Explanation of why it happened

  • The outcomes of events

  • The sources of information from which the details are inferred

To date, the system can detect ten different factual-reporting patterns that may indicate flaws.

To demonstrate the operation of the system, we composed a hypothetical article, in Hebrew, that reports on tension in the nuclear talks between the US and Iran. The article contains a sentence that can be translated as: “Alongside the renewal of talks, voices from both sides are discussing the possibility of a failure in the negotiations.”

The system alerted that the article reports on “voices from both sides” but does not specify whose voices these are. 
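Using the JSON shape sketched earlier, a flag like this one might be rendered as follows (an illustrative rendering of the alert described above, not actual system output):

```python
# Illustrative rendering of the alert described above; not actual system output.
flags = {
    "vague_attribution": [
        {
            "quote": "voices from both sides are discussing the possibility "
                     "of a failure in the negotiations",
            "explanation": "The article reports on 'voices from both sides' "
                           "but does not specify whose voices these are.",
        }
    ]
}
```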

2. Flaws that relate to wording or language

Flaws in this category occur when wording or language may distort the reader’s perception of events. For example:

  • Descriptions that convey strong sentiment

  • Use of irony

  • Inferences by the journalist about a person’s internal emotions or intentions

  • Use of terms that evoke emotional confidence, such as “as we all remember”

  • Unnecessary use of adjectives or adverbs

  • Use of ambiguous terms

  • Use of direct-contradiction terms, such as “although” or “despite”

  • Use of metaphors as part of the reporting 

To date, the system can detect nine different patterns under this category.

For example, in the hypothetical article about the nuclear talks there is the following sentence: “In this way, Witkoff expressed a subtle threat towards the Iranians.” The system alerted that the term “subtle threat” is emotionally charged. 

Notice that the system does not aspire to make judgements: depending on editorial standards, such a term may be acceptable – in an opinion column, for example.

3. Flaws that relate to sources and citations

This category contains flaws that may occur when citing the evidence behind the reported facts, or when collecting relevant comments and responses. For example:

  • Citations from anonymous sources

  • The reporter’s own interpretations of citations

  • Short citations that appear to be cut from a longer statement

To date, the system can detect six different patterns under this category.

For example, in the same hypothetical nuclear-talks article, the system flagged the following sentence: “From the American side, on Friday, Witkoff issued a stern warning that if the talks in Oman are not productive, they will not continue” – and remarked that, since this is the first mention of Mr. Witkoff, he should have been introduced as the US Special Envoy to the Middle East, Steve Witkoff.


4. Flaws that refer to the article as a whole

This category contains flaws that refer to the entire text of the article and not just to a specific part of it. For example:

  • The headline does not fit the article’s full text

  • The headline is sensational

  • The main arguments of the article are presented in a redundant way

  • The article includes substantial colour or scene-setting descriptions that do not contribute to the factual report

To date, the system can detect six different patterns under this category. For example, the headline of the hypothetical nuclear-talks article was: “In Iran and the U.S., they anticipate failure in the nuclear talks,” whereas the article reported only on tension in the talks; the system alerted that the headline appears to draw a conclusion.
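Checks in this category take the whole article as input rather than a single sentence. A minimal sketch of the headline check, reusing the hypothetical client and JSON contract from the earlier sketch:

```python
# Illustrative sketch: a whole-article agent that compares the headline
# against the body, reusing the client and JSON contract sketched earlier.
def check_headline(headline: str, body: str) -> list[dict]:
    """Flag a headline that is sensational or unsupported by the body."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You analyse the quality of reporting. Flag the headline if "
                    "it is sensational or draws conclusions the body does not "
                    "support. Respond with a JSON array of objects with keys "
                    "'quote' and 'explanation'."
                ),
            },
            {"role": "user", "content": f"HEADLINE: {headline}\n\nBODY: {body}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```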

The challenges we faced: Atomic units and black boxes

Developing the prompts led to extensive internal discussions addressing critical questions such as: What constitutes a factual statement? How flexible should the system be regarding different journalistic genres and their reliance on factual reporting? This inquiry process – which we call “Journalistic Deconstruction” – has yielded significant insights.

One outcome is that we expanded upon the traditional "Five Ws" of journalism. Our analysis has determined that the mere presence of a subject (“Who”) is insufficient; the subject must also meet specific quality criteria. An indefinite subject (e.g., “citizens broke through the fences” versus “the citizens broke through the fences”) is considered a flaw. Furthermore, even a definite subject must represent a clearly defined entity with a verifiable presence in reality (e.g., “three protesters who participated in last night’s demonstrations broke through the fences”).

Another challenge was defining what constitutes a factual “atomic unit” for evaluation. After long and in-depth deliberations, we reached an interesting result: at a certain point, we decided to let the LLM itself break the articles down into factual units “as it sees fit”, without imposing a fixed set of rules. After several rounds of testing and iteration, we discovered that it performs this task remarkably well. This is a prime example of what is often referred to as the AI’s “black box”: we do not know exactly how the LLM segments the texts into factual components – but it does so effectively.
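In code, delegating the segmentation can be as simple as the following sketch (again reusing the hypothetical client; the prompt wording is our assumption):

```python
# Illustrative sketch: the LLM breaks the article into atomic factual units
# "as it sees fit", without fixed segmentation rules.
def segment_into_units(article: str) -> list[str]:
    """Return the article's atomic factual units, as segmented by the LLM."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Break the article down into atomic factual units, as you "
                    "see fit. Respond with a JSON array of strings, one per unit."
                ),
            },
            {"role": "user", "content": article},
        ],
    )
    return json.loads(response.choices[0].message.content)
```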

In this context, an additional development occurred. Initially, we decided that each “flaw” in the output would be accompanied by a brief explanation illustrating how it affects the quality of the factual information and why it warrants attention. We authored these explanations ourselves and stored them in a directory, from which the UI retrieved them at the output stage. As the project progressed, however, we discovered that the LLM is highly capable of independently generating strong justifications for each flaw. Moreover, it can tailor these explanations to the specific text under analysis, providing context-sensitive insights. This approach not only feels more natural to users but also avoids the pitfalls that often arise when relying on generalised explanations. We are currently exploring the implications of this capability, and we are closely monitoring an emerging pattern: the tool occasionally exercises a degree of autonomy beyond what was originally defined. As long as this autonomy enhances its performance, we view it positively; nevertheless, we are maintaining careful oversight to ensure it remains beneficial, and we are discussing possible measures to manage or limit it if necessary.

This fed into a more essential question in designing the system’s logic: at what resolution does the system conduct its evaluation? Does it assess an entire article? A paragraph? A sentence? All of the above? We held numerous discussions around this issue and ultimately decided on a modular resolution – the output refers both to the article as a whole (where relevant) and to the individual factual units that compose it.
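A report at this modular resolution could be represented roughly as follows (the field names are our assumption, not the system’s actual data model):

```python
# Illustrative schema for the modular-resolution output; field names are
# assumptions, not the system's actual data model.
from dataclasses import dataclass, field

@dataclass
class Flag:
    pattern: str      # e.g. "vague_attribution"
    quote: str        # the flagged span of text
    explanation: str  # generated by the LLM, tailored to this article

@dataclass
class Report:
    # Flaws that refer to the article as a whole (e.g. headline checks)
    article_flags: list[Flag] = field(default_factory=list)
    # Flaws keyed by the factual unit in which they were found
    unit_flags: dict[str, list[Flag]] = field(default_factory=dict)
```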

The UI was also something we discussed extensively, as the system is designed for a very specific professional user base. The system takes a URL as its input, and we coded it to support all major Israeli news sites. At the same time, to allow input from less common sources, we also enabled free-text input (which also lets journalists upload their own drafts). This created a UI challenge: copying free text from websites often brings along images, banners, and other elements that clutter the interface and could lead to inelegant results. Therefore, we decided – for now – that users will copy the free text into a Word document, which is then uploaded as input to the system. At this stage, which is still very early and intended for a limited, committed user base, this compromise is preferable to risking unforeseen delays for a smoother interface that does not align with the core goals we set for the project.
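Extracting clean text from such a Word upload is straightforward; here is a minimal sketch using the python-docx library (our illustrative choice, not necessarily the project’s):

```python
# Illustrative sketch: extract plain text from an uploaded Word document
# with python-docx (pip install python-docx).
from docx import Document

def extract_text(path: str) -> str:
    """Join the document's non-empty paragraphs into one clean text block."""
    doc = Document(path)
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
```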

Getting feedback: The litmus paper effect

Introducing the system to our colleagues in the newsroom was the biggest challenge for us, mainly because we didn’t know what to expect: we were mindful of the diversity within our team and the range of reactions we were likely to encounter. We also acknowledged that changing old habits is never easy – especially among journalists who work within a deeply rooted professional tradition.

When the time came to introduce the system, we felt it went beyond expectations, and honestly, the best way to describe it is that everything we feared did happen... but in the best possible way.

The first part, introducing the system’s logic and the core journalistic values it’s built around, went well. Reactions were enthusiastic, and it sparked a lot of interest and excitement. Then we showcased the system: we took an article from one of Israel’s most respected newspapers, ran it through the system and discussed a few results. Most of it went well, but the very first result, about the headline, immediately kicked off a heated debate. At one point two colleagues came up to us and said, “Yeah, we read that article over the weekend, and we thought it was really solid” – and we will admit, that moment hit us. It made us realise the level of rewiring we’re going to need to do.

But the upside is that it already started happening. That 15-minute debate made people start looking at things differently – and over the next couple of days, we could sense them starting to notice flaws. The rewiring began right there in the room. Another thing that kicked in immediately was improvement: literally just a few hours after the session, we had a new prompt and new ideas for the feedback mechanism. Things started moving fast. We realised that day that innovative ideas, as well as a healthy feedback culture, are like litmus paper with a reactive substance – even the slightest touch can trigger a ripple of change.

Our vision: The cyborg journalist

The current system is a beta demonstration of capabilities, implemented with a basic interface that supports Hebrew. We see the use of AI as fundamental to news systems in the pursuit of higher reliability and reporting quality. Our roadmap includes many plans, such as:

  • Enriching the existing analysis, including identifying rhetorical flaws in arguments from both reporters and quoted sources

  • Analysing images that accompany the reporting and may contribute to gaps or to how readers fill them

  • Expanding the analysis to video segments that sometimes accompany news coverage, and subsequently to entire news broadcasts

  • Improving the system’s success rates

  • Computing a quality score for articles and enabling benchmarking

In our vision, technology does not replace the judgment of a human editor, but supports it, expands it, and streamlines it. The technology supports the fundamental journalistic role of reporting reality in the most transparent and factual manner. We believe that supporting journalism in this way is more important today than ever before: to help humanity overcome the crises it has experienced in recent years.

———

This article is part of a series providing updates from 35 grantees on the JournalismAI Innovation Challenge, supported by the Google News Initiative. Click here to read other articles from our grantees.
