Meta Reportedly Torrented 82TB Of Copyrighted Books For AI Training

Facebook’s parent company Meta is in the middle of a potential legal mess following a lawsuit filed by a group of authors. The lawsuit accuses Meta of using copyrighted material without permission for the development of AI products. In a new update, Meta is said to have torrented up to 82TB of content for AI training.

The “new” lawsuit against Meta arose in mid-January 2025. It is actually a follow-up to a lawsuit originally filed in 2023 that had already been dismissed. The plaintiffs claimed that Meta illegally used content from books to train its Llama AI models. At the time, the amount of copyrighted content used by the firm was estimated to be as high as 32TB. The data was reportedly obtained from LibGen, a dataset that was available on the internet for a while and included content from books of all kinds—from comedy to science.

The size of datasets reportedly torrented by Meta for AI training reaches 82TB

That said, the latest updates on the case indicate that Meta may have used a much larger amount of data. In addition to LibGen, Meta reportedly used the Anna’s Archive and Z-Library datasets. In total, the documents state that Meta torrented about 82TB of files for AI training.

The evidence presented in the case shows the alleged concerns expressed by employees involved in Meta’s project. “I don’t think we should use pirated material. I really need to draw a line here,” a senior AI researcher reportedly said in 2022. “Using pirated material should be beyond our ethical threshold,” another researcher reportedly said. “SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they’re infringing it,” they added.

The original complaint also claims that Mark Zuckerberg was aware of the origin of the datasets. However, in a meeting in 2023, Meta’s CEO reportedly approved their use. “We need to move this stuff forward… we need to find a way to unblock all this,” Zuckerberg reportedly said. “Torrenting from a corporate laptop doesn’t feel right [laughing out loud emoji],” one Meta employee reportedly told another in a conversation.

Meta could have tried to avoid leaving traces of downloads

The documents even claim that Meta tried to cover its tracks. The company reportedly took steps to avoid leaving evidence that would allow the dataset downloads to be traced back to its servers. This suggests that Meta may have deliberately tried to circumvent copyright law.

It doesn’t seem like the case will be resolved soon. The first rulings on the matter are expected within months. Plus, if the outcome is unfavorable for Meta, the company will almost certainly appeal, which will further prolong the process. It is possible that, in the end, we will not have a final verdict in this case for years. This lawsuit, like others, is an example of how copyright for AI training remains in a “gray area” years later.