Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company trained its Llama AI models on pirated copies of their works sourced from notorious websites like LibGen and Sci-Hub, without permission or compensation. The lawsuit accuses Meta of repeatedly copying copyrighted materials, including books and journal articles, to fuel its AI development. The case threatens to upend the use of large language models in content creation.
Overview
The lawsuit, filed by Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage, and author Scott Turow, alleges that Meta engaged in one of the most massive infringements of copyrighted materials in history. The publishers claim that Meta repeatedly copied their books and journal articles without permission, using material from notorious pirate sites such as LibGen, Anna's Archive, Sci-Hub, Sci-Mag, and others.
What Each Side Says
The plaintiffs accuse Meta of training its Llama AI models on copyrighted books and journal articles without permission. They also allege that the company used the Common Crawl dataset, which they say contains unauthorized copies of copyrighted works, and that Llama consequently outputs verbatim and near-verbatim substitutes for copyrighted material.
Earlier Rulings
A federal judge ruled in favor of Meta in a previous lawsuit, but he stressed that his ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. A group of authors also sued Anthropic for copyright infringement. In that case, a federal judge ruled that training AI models on legally purchased books without permission is fair use, but he allowed the authors to move forward with a class action lawsuit.
Why It Matters
The case highlights the role of fair use and copyright law in the development of AI models. AI is powering transformative innovations, productivity, and creativity for individuals and companies, and courts have found in earlier cases that training AI on copyrighted material can qualify as fair use.
Bottom Line
Meta will fight the lawsuit aggressively, according to a spokesperson. The outcome of the case could have significant implications for the use of large language models in content creation.