Leading AI Chatbots Struggle To Generate Accurate News Summaries

Artificial intelligence has proven useful for a multitude of tasks. One of the most touted features by AI-focused companies is the ability to summarize content. This seems great for very long or complex articles where the chatbot could offer a more “digestible” version. However, some of the leading AI chatbots have proven inaccurate when generating news summaries in tests.

The BBC tested four of the leading AI chatbots, focusing on their ability to summarize news. The chatbots in question are OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity. During testing, the BBC allowed the chatbots to access its news content. The outlet usually doesn’t permit this, as it uses a “robots.txt” file to tell AI platforms that they can’t grab content from its website. However, it temporarily lifted the restriction for the tests.
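The robots.txt mechanism the BBC relies on is simply a plain-text file served at a site’s root that well-behaved crawlers are expected to check before fetching pages. As a rough sketch, rules blocking AI crawlers might look like the following (GPTBot and Google-Extended are OpenAI’s and Google’s documented AI-crawler user agents; the specific rules any given site uses are an assumption here):

```
# Block OpenAI's training crawler from the entire site
User-agent: GPTBot
Disallow: /

# Block Google's AI-training crawler (separate from Googlebot, which indexes for search)
User-agent: Google-Extended
Disallow: /
```

Compliance is voluntary on the crawler’s side, which is why publishers like the BBC can also lift the restriction at will, as they did for this experiment.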

AI chatbots have a high probability of generating inaccurate news summaries, BBC tests show

The experiment asked the AI chatbots to generate summaries for 100 BBC news articles. The outlet then brought in journalists with expertise in the relevant news topics to rate the outputs. The results showed that 51% of the generated summaries had notable problems of some kind. Most worrying was a hallucination rate of 19%: the summaries for 19% of the articles included incorrect or non-existent statements of fact, figures, or dates.


The report also mentions that the chatbots “struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context.”

Deborah Turness, CEO of BBC News, had a few words regarding the results of the tests. She considers AI to be a source of “endless opportunities.” However, she also believes AI firms are “playing with fire.” “We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?” she asked.

AI platforms aren’t inherently bad at generating summaries

Turness says she is open to “work together in partnership to find solutions.” OpenAI was the only one of the four AI companies to offer a statement regarding the results. “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results,” a spokesperson said.
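The OAI-SearchBot mentioned in OpenAI’s statement is the crawler the company documents as determining how a site surfaces in ChatGPT search, separate from the GPTBot training crawler. A publisher wanting to appear in search results while opting out of model training could, in principle, use rules along these lines (a sketch only; the user-agent names are OpenAI’s documented ones, but sites should check OpenAI’s current crawler documentation for exact strings):

```
# Let ChatGPT search discover and link to the site
User-agent: OAI-SearchBot
Allow: /

# Opt out of crawls used for model training
User-agent: GPTBot
Disallow: /
```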

This doesn’t mean that AI platforms are inherently bad at generating summaries. They tend to do a pretty good job when it comes to small bits of information from different sources. AI-powered tools that summarize emails also work fine. However, it seems that things get more complicated when they have to deal with longer and more complex content.
