
Reddit is suing Perplexity and three “data-scraping service providers” to “stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit,” according to the complaint.
The company equates the data scraping companies (SerpApi, Oxylabs, and AWMProxy) to “would-be bank robbers” who “knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” Reddit alleges that Perplexity is a customer of “at least one” of the data scraping companies, saying that it “will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine’ - that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.”
According to the lawsuit, Reddit sent a cease-and-desist letter to Perplexity in May 2024 “demanding that it stop scraping Reddit data.” While Perplexity told Reddit at the time that it didn’t use Reddit content to train AI models and that it would respect Reddit’s robots.txt, after that letter, the volume of Reddit citations on Perplexity actually increased. Reddit also created a post that could only be crawled by Google, and “within hours,” Perplexity “produced the contents” of that post, the company says.
“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” Reddit writes.
Reddit’s data (posts on all sorts of topics written by and ranked by humans) is hugely helpful for training AI models, and the company knows it; the API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck deals with AI companies including OpenAI and Google, and it reportedly wants better ones. And Reddit has previously taken legal action against Anthropic, alleging that Anthropic’s bots accessed Reddit’s platform even after Anthropic said they wouldn’t be doing that.
“AI companies are locked in an arms race for quality human content - and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Ben Lee, Reddit’s chief legal officer, says in a statement. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.
“Defendants Oxylabs UAB, AWM Proxy, and SerpApi - a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention tactics - are textbook examples of this illegal behavior,” Lee says. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”
“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” Jesse Dwyer, Perplexity’s head of communication, tells The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

