Leading AI Chatbots Struggle To Generate Accurate News Summaries

Artificial intelligence has proven useful for a multitude of tasks. One of the most touted features by AI-focused companies is the ability to summarize content. This seems great for very long or complex articles where the chatbot could offer a more “digestible” version. However, some of the leading AI chatbots have proven inaccurate when generating news summaries in tests.

The BBC tested four of the leading AI chatbots, focusing on their ability to summarize news. The chatbots in question are OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity. During testing, the BBC allowed the chatbots to access its news content. The outlet usually doesn’t permit this, as it uses a “robots.txt” file to tell AI platforms that they can’t grab content from its website. However, it temporarily lifted the restriction for the tests.
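The robots.txt mechanism the BBC relies on is simply a plain-text file served at a site’s root that well-behaved crawlers are expected to check before fetching pages. As a rough sketch, rules blocking AI crawlers might look like the following (GPTBot and Google-Extended are OpenAI’s and Google’s documented AI-crawler user agents; the specific rules any given site uses are an assumption here):

```
# Block OpenAI's training crawler from the entire site
User-agent: GPTBot
Disallow: /

# Block Google's AI-training crawler (separate from Googlebot, which indexes for search)
User-agent: Google-Extended
Disallow: /
```

Compliance is voluntary on the crawler’s side, which is why publishers like the BBC can also lift the restriction at will, as they did for this experiment.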

AI chatbots have a high probability of generating inaccurate news summaries, BBC tests show

The experiment asked the AI chatbots to generate summaries for 100 BBC news articles. The outlet then brought in journalists with expertise in the relevant news topics to rate the outputs. The results showed that 51% of the generated summaries had notable problems of some kind. Most worrying was a hallucination rate of 19%: the summaries for 19% of the articles included incorrect or non-existent statements of fact, figures, or dates.


The report also mentions that the chatbots “struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context.”

Deborah Turness, CEO of BBC News, had a few words regarding the results of the tests. She considers AI to be a source of “endless opportunities.” However, she also believes AI firms are “playing with fire.” “We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?” she asked.

AI platforms aren’t inherently bad at generating summaries

Turness says she is open to “work together in partnership to find solutions.” OpenAI was the only one of the four AI companies to offer a statement regarding the results. “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results,” a spokesperson said.
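The OAI-SearchBot mentioned in OpenAI’s statement is the crawler the company documents as determining how a site surfaces in ChatGPT search, separate from the GPTBot training crawler. A publisher wanting to appear in search results while opting out of model training could, in principle, use rules along these lines (a sketch only; the user-agent names are OpenAI’s documented ones, but sites should check OpenAI’s current crawler documentation for exact strings):

```
# Let ChatGPT search discover and link to the site
User-agent: OAI-SearchBot
Allow: /

# Opt out of crawls used for model training
User-agent: GPTBot
Disallow: /
```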

This doesn’t mean that AI platforms are inherently bad at generating summaries. They tend to do a pretty good job when it comes to small bits of information from different sources. AI-powered tools that summarize emails also work fine. However, it seems that things get more complicated when they have to deal with longer and more complex content.
