AI ‘podcasts’ are prone to errors


AI-generated “podcasts” are useful but prone to errors, warns a researcher.

Google’s NotebookLM can take a dense research paper and turn it into an entertaining podcast, but as with many tools that rely on large language models (LLMs), the final product is still prone to mistakes.

Ian Flynn, research assistant professor in the geology and environmental science department at the University of Pittsburgh, outlined pros and cons of using such tools in the planetary sciences in an October paper published in the American Geophysical Union journal Perspectives of Earth and Space Scientists.

Working with coauthor Sean Peters, a visiting assistant professor at Middlebury College in Vermont, Flynn chose three published papers to transform into NotebookLM audio overviews, which Google describes as “deep-dive discussions between AI hosts that provide in-depth summaries of the key topics in your uploaded sources.”

The overviews are supposed to objectively represent the sources they draw from, “rather than subjective opinions from the AI hosts.”

The three papers, all related to volcanism on Mars, were each in a slightly different format: One was a letter-type, five-page paper with three figures and one table. The second was a typical research paper, 29 pages long with nine figures and four tables. The final publication was a 23-page review paper with 11 figures.

Flynn and Peters found the overviews produced engaging, plain-language summaries and used “fairly accurate and creative analogies” that could be helpful when it came to education and accessibility of complicated research.

Each overview also contained errors, however, and they always appeared toward the end. Sometimes the errors took the form of unjustified extrapolation—for instance, from the identification of a certain volcanic feature, the first overview extrapolated the presence of liquid water and the possibility of life on Mars. Neither of these conclusions was in the original paper.

But sometimes the errors were more subtle, which could make them more difficult for nonexperts to detect. To remedy this, Flynn and Peters suggest always reading the original source material.

Ultimately, the team understood the appeal of NotebookLM audio overviews. They could see several ways the overviews could be useful: for learning about a topic, for teaching how such AI-generated summaries should be interpreted, and for expanding access to complex scientific topics.

“Overall, NotebookLM’s generated audio overviews can be a useful tool for the planetary science community,” the paper concludes, “but it is doubtful to be a replacement for critically reading the source material.”

Source: University of Pittsburgh