Reddit sues Perplexity for allegedly ripping its content to feed AI

Reddit is suing Perplexity and three “data-scraping service providers” to “stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit,” according to the complaint.

The company equates the data scraping companies — SerpApi, Oxylabs, and AWMProxy — to “would-be bank robbers” who “knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” Reddit alleges that Perplexity is a customer of “at least one” of the data scraping companies, saying that it “will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine’ — that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.”

According to the lawsuit, Reddit sent a cease-and-desist letter to Perplexity in May 2024 “demanding that it stop scraping Reddit data.” While Perplexity told Reddit at the time that it didn’t use Reddit content to train AI models and that it would respect Reddit’s robots.txt, after that letter, the volume of Reddit citations on Perplexity actually increased. Reddit also created a post that could only be crawled by Google, and “within hours,” Perplexity “produced the contents” of that post, the company says.

“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” Reddit writes.

Reddit’s data, posts on all sorts of topics written by and ranked by humans, is hugely valuable for training AI models, and the company knows it; the API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck deals with AI companies including OpenAI and Google, and it reportedly wants better ones. Reddit has also previously taken legal action against Anthropic, alleging that Anthropic’s bots accessed Reddit’s platform even after Anthropic said they would not.

“AI companies are locked in an arms race for quality human content — and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Ben Lee, Reddit’s chief legal officer, says in a statement. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.

“Defendants Oxylabs UAB, AWM Proxy, and SerpApi — a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention tactics — are textbook examples of this illegal behavior,” Lee says. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”

“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” Jesse Dwyer, Perplexity’s head of communication, tells The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

2 Comments

lindgren.lessie


October 22, 2025, 6:20 pm

This is an interesting development in the ongoing conversation about data usage and intellectual property. It’s crucial to consider the implications for both content creators and AI development. Looking forward to seeing how this unfolds!
gabriel03


October 22, 2025, 6:54 pm

Absolutely! It’s fascinating to see how legal actions like this could shape the future of AI training and content ownership. This case might set important precedents for how companies handle data, especially regarding user-generated content.
