Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
By MATT O’BRIEN
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn’t break the law by training its chatbot Claude on millions of copyrighted books.
But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online “shadow libraries” of pirated copies.
U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling of thousands of written works so that it could produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative."
“Like any reader aspiring to be a writer, Anthropic’s (AI large language models) trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” Alsup wrote.
But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works.
“Anthropic had no entitlement to use pirated copies for its central library,” Alsup wrote.
A trio of writers — Andrea Bartz, Charles Graeber and Kirk Wallace Johnson — alleged in their lawsuit last summer that Anthropic’s practices amounted to “large-scale theft,” and that the San Francisco-based company “seeks to profit from strip-mining the human expression and ingenuity behind each one of those works.”
Books are known to be important sources of the data — in essence, billions of words carefully strung together — that are needed to build large language models. In the race to outdo each other in developing the most advanced AI chatbots, a number of tech companies have turned to online repositories of stolen books that they can get for free.
Documents disclosed in San Francisco’s federal court showed Anthropic employees’ internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles.
With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. But that didn’t undo the earlier piracy, according to the judge.
“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages,” Alsup wrote.
The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram.
Anthropic — founded by ex-OpenAI leaders in 2021 — has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way.
But the lawsuit filed last year alleged that Anthropic’s actions “have made a mockery of its lofty goals” by building its AI product on pirated writings.
Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with “copyright’s purpose in enabling creativity and fostering scientific progress.” Its statement didn’t address the piracy claims.
The authors’ attorneys declined comment.