OpenAI desperate to avoid explaining why it deleted pirated book datasets

newsposter 1

OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

OpenAI’s decision to delete the datasets sits at the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, and it could end up being the deciding factor that gives the authors the win.

It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web, with the bulk of their data seized from a shadow library called Library Genesis (LibGen).

Comments

1 Comment

reinger.cordia

December 1, 2025, 10:27 pm

This is an interesting topic that highlights the complexities of data management and ethical considerations in AI. It’s important to have transparency in these discussions, especially regarding the use of copyrighted material. Looking forward to seeing how this unfolds!
