U.S. District Judge Vince Chhabria

A federal judge told a group of fiction writers they needed a better strategy to successfully sue Meta over allegations that it downloaded their works into a “shadow library” to train a generative artificial intelligence model.

Describing such technology as “software products that are capable of generating text, images, videos or sound,” U.S. District Judge Vince Chhabria in his June 25 opinion noted “companies have been unable to resist the temptation to feed copyright-protected materials into their models — without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.”

At issue is Meta’s Llama platform, a so-called “large language model,” or LLM, which the Facebook parent company released in February 2023. Meta released Llama 3 in April 2024 and plans a fourth version in 2025. Chhabria said 13 authors — “mostly famous fiction writers” — allege Meta violated copyright protections by downloading their works from illicit sources and feeding the content into Llama’s training module.

On cross motions for summary judgment, the parties clashed over whether Meta’s conduct constituted fair use. Chhabria rejected the authors’ arguments, noting that although Llama can produce small snippets of the authors’ text, it can’t spit out enough to matter, and further said the authors can’t claim a diminished ability to sell their works to LLM developers as they aren’t entitled to that market.

“As for the potentially winning argument — that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution — the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works,” Chhabria wrote. “But in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors — not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”

Chhabria wrote at length about the Copyright Act and how the concept of fair use balances protection of ownership with room for innovation. Examples of fair use include commentary and criticism, news reporting, teaching or research. The law is designed to foster creativity by protecting the rights of artists, he explained, and fair use allows courts to make sure copyrights don’t improperly stifle creative output.

The law has four factors courts should use when discerning fair use, Chhabria continued, including whether use of protected material is commercial or nonprofit, the nature of the copyrighted work, how much of an original work is used, and the effect the use has on the market or value of the source material.

Although he said the concept of fair use is “flexible,” and the four factors are not exhaustive, he also said the fourth factor is clearly the most important: “If the law allowed people to copy your creations in a way that would diminish the market for your works, this would diminish your incentive to create more in the future. Thus, the key question in virtually any case where a defendant has copied someone’s original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original work.”

Chhabria said Meta engineers valued books as Llama training material and noted the company’s “head of generative AI discussed spending up to $100 million on licensing,” but a problem with that strategy was how many individual authors, and not publishers, hold such rights. Meta abandoned licensing efforts after it downloaded a database from Library Genesis, which Chhabria said is, like other shadow libraries, “an online repository that provides things like books, academic journal articles, music or films for free download, regardless of whether that media is copyrighted.”

Meta also downloaded Anna’s Archive, a compilation of shadow libraries, using BitTorrent, which allows more efficient downloading and uploading of reams of data. Chhabria noted Meta’s contention that it trained Llama to ensure the model would not simply spit out the text fed into it.

“Even using ‘adversarial’ prompts designed to get Llama to regurgitate its training data, Llama will not produce more than 50 words of any of the plaintiffs’ books,” Chhabria wrote. “And there is no indication that it will generate longer portions of text that would function as ‘repackaging’ of those books.”

Meta projects Llama will generate between $460 billion and $1.4 trillion in revenue over a decade, Chhabria said. He called that factor relevant, as was Meta’s means of accessing the authors’ protected texts, noting some shadow library operators have been indicted for copyright infringement. But the authors submitted no evidence on those fronts, and he said Meta successfully argued its use of the material was transformative — it wanted the words to train Llama, not to sell other authors’ written output.

Still, he wrote, “it’s easy to imagine a situation in which a secondary use is highly transformative but the secondary user nonetheless loses on fair use because allowing people to engage in that kind of use would have too great an effect on the market for the original work.” He then laid out the ways the authors might have won summary judgment, such as by contending Llama eventually could generate works similar enough to theirs to adequately compete for consumers.

“But the plaintiffs’ presentation is so weak that it does not move the needle, or even raise a dispute of fact sufficient to defeat summary judgment,” Chhabria wrote. He noted the authors and a supporting brief focused only on whether an LLM’s output directly infringes copyrighted material.

“This is not a case where an original work is being compared to one secondary work. Nor is this case like the previous fair use cases involving creation of a digital tool,” Chhabria wrote. “This case, unlike any of those cases, involves a technology that can generate literally millions of secondary works, with a miniscule fraction of the time and creativity used to create the original works it was trained on. No other use — whether it’s the creation of a single secondary work or the creation of other digital tools — has anything near the potential to flood the market with competing works the way that LLM training does. And so the concept of market dilution becomes highly relevant.”

Although he said market dilution would likely be a decisive argument in favor of the authors, Chhabria noted he couldn’t decide based on what he thought should happen. To stave off summary judgment, he said, the authors would have needed to raise a factual dispute suitable for a jury. But they didn’t do so, even as Meta noted the lack of evidence that Llama training hampered book sales.

Chhabria said he wouldn’t infer the potential of a dampened market for the authors. Meta argued its use of the material was transformative and said Llama users can’t access the protected writing.

“On this record, then, Meta has defeated the plaintiffs’ half-hearted argument that its copying causes or threatens significant market harm,” Chhabria wrote. “That conclusion may be in significant tension with reality, but it’s dictated by the choice the plaintiffs made to put forward two flawed theories of market harm while failing to present meaningful evidence on the effect of training LLMs like Llama with their books on the market for those books.”

Chhabria called “nonsense” Meta’s argument that barring the use of copyrighted text to train LLMs would disserve the public interest, explaining that developers could pay to license books if taking them without permission wasn’t fair use.

“No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books,” Chhabria wrote in conclusion. “And some cases might present even stronger arguments against fair use.”

But in this instance, the authors failed to make such arguments, and Chhabria granted summary judgment to Meta. He will rule separately on a claim about unlawful distribution using the torrenting process.