Like a snake eating its own tail: What happens when AI consumes its own data?

Asked ChatGPT anything lately? Talked with a customer service chatbot? Read the results of Google's "AI Overviews" summary feature? If you've used the Internet lately, chances are you've consumed content created by a large language model. These models, like DeepSeek-R1 or OpenAI's ChatGPT, are kind of like the predictive text feature in your phone on steroids. In order for them to "learn" how to write, the models are trained on millions of examples of human-written text. Thanks in part to these same large language models, a lot of content on the Internet today is written by generative AI. That means that AI models trained nowadays may be consuming their own synthetic content ... and suffering the consequences.

View the AI-generated images mentioned in this episode.

Have another topic in artificial intelligence you want us to cover? Let us know by emailing shortwave@npr.org!

Listen to every episode of Short Wave sponsor-free and support our work at NPR by signing up for Short Wave+ at plus.npr.org/shortwave.

Feb 18, 2025 - 09:05
In large language model collapse, there are generally three sources of errors: The model itself, the way the model is trained and the data — or lack thereof — that the model is trained on.

(Image credit: Andriy Onufriyenko)