AI Copyright Lawsuits: Where Things Stand for Publishers
The legal battle over AI and copyright is now well underway. Multiple lawsuits from publishers against AI companies are working through courts, and the outcomes will shape the industry for years.
Here’s my attempt to summarize a complicated legal landscape for media professionals who need to understand the stakes.
The Core Question
At its simplest, the legal question is this: Did AI companies infringe copyright by using publisher content to train their models without permission or payment?
The AI companies generally argue their use constitutes fair use—transformative use that doesn’t substitute for the original works.
Publishers argue that training on copyrighted material without a license is infringement, regardless of what the trained model produces.
The courts will ultimately decide, but the answer isn’t obvious. Fair use doctrine is notoriously fact-specific and hard to predict.
Major Active Cases
Several significant lawsuits are in progress:
The New York Times vs. OpenAI and Microsoft. Perhaps the highest-profile case, filed late 2023. The Times alleges OpenAI and Microsoft trained on Times content without permission and that outputs sometimes reproduce Times material nearly verbatim. The case is proceeding through discovery.
Authors Guild vs. OpenAI. A class action on behalf of authors whose books were allegedly used for training. Similar issues to the Times case but focused on book content rather than journalism.
Getty Images vs. Stability AI. Focuses on image generation rather than text. Getty alleges Stable Diffusion was trained on millions of Getty images without license.
ANI vs. OpenAI (India). Asian News International’s case in Indian courts raises similar issues in a different jurisdiction with different copyright frameworks.
Various other actions. Smaller publishers, music companies, artists, and software developers have filed related cases raising similar issues.
The cases are at different stages. None has reached a definitive resolution yet.
The Fair Use Analysis
American fair use analysis considers four factors:
Purpose and character of use. Is the use transformative? AI companies argue training creates something fundamentally different from source material. Publishers argue it’s just copying at industrial scale.
Nature of the original work. Creative works get more protection than factual ones. Journalism often blends both.
Amount and substantiality. AI training typically ingests entire works, not excerpts. Copying a whole work usually weighs against fair use, though AI companies argue that training is different in kind because the copies are not shown to readers.
Market effect. Does the use harm the market for the original? Publishers argue it undermines their licensing revenue and competitive position. AI companies argue trained models don’t substitute for reading original articles.
How courts weigh these factors will determine outcomes. Reasonable lawyers disagree on predictions.
What Publishers Should Expect
Several possible outcomes:
Publisher wins broadly. Courts rule that training on copyrighted material requires permission. AI companies would need to license training data retroactively and prospectively. Major financial implications for AI companies; significant revenue opportunity for publishers.
AI company wins broadly. Courts rule training constitutes fair use. Publishers lose leverage; AI companies can train freely on published content. This would fundamentally change the power dynamic.
Nuanced outcome. Courts find some uses fair and others not. Perhaps training is fair use but reproducing near-verbatim content in outputs is not. Or perhaps commercial use is treated differently from research use. The details would matter enormously.
Settlement. Parties negotiate a resolution before courts rule. This is increasingly common in complex litigation. Settlements would establish a practice of payment without producing a binding legal ruling.
Legislative resolution. Congress or other legislatures act to clarify rules. This could supersede litigation outcomes.
Most observers expect some combination—some cases settling, some reaching judgment, potentially different outcomes in different jurisdictions.
Implications for Strategy
While legal uncertainty persists, publishers should consider:
Negotiation leverage. The litigation creates leverage for licensing negotiations. That AI companies are settling with major publishers suggests they see litigation risk.
Documentation. Publishers should document their content’s use in AI training, potential harms, and licensing value. This evidence matters whether for litigation or negotiation.
Technical protection. Robots.txt signals, paywall protection, and other technical measures help show that a publisher did not consent to training use.
Collective action. Publishers acting together have more leverage than publishers acting alone. Industry associations and collective licensing bodies may become more important.
Revenue planning. Don’t assume AI licensing revenue will materialize. But don’t assume it won’t either. Scenario planning makes sense.
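For the technical protection point above, here is a minimal sketch of how a publisher might check what its robots.txt actually tells AI crawlers. It uses Python's standard urllib.robotparser module. The robots.txt content, the crawler_access helper, and the example.com URL are hypothetical and for illustration only. GPTBot and CCBot are the user-agent tokens published by OpenAI and Common Crawl, but check each operator's current documentation before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it asks two AI-related crawlers
# to stay out of the whole site while leaving other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether the given robots.txt permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.com is a placeholder; substitute a real article URL.
    report = crawler_access(
        ROBOTS_TXT,
        ["GPTBot", "CCBot", "Googlebot"],
        "https://example.com/news/story.html",
    )
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Keep in mind that robots.txt is a voluntary signal, not an access control. It records the publisher's stated preference, which is why it matters as evidence, but it does not stop a crawler that chooses to ignore it.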
International Dimensions
Copyright law varies by jurisdiction. Outcomes may differ:
United States. Fair use doctrine is relatively permissive compared to other countries. But it’s also unpredictable.
European Union. The EU has a text and data mining exception, but it comes with conditions, including the ability of rightsholders to reserve their rights and opt out. Implementation varies by member state.
United Kingdom. The UK considered but did not adopt a broad training exception. The legal position is somewhat uncertain.
Australia. Australian fair dealing is narrower than American fair use. Publishers may have stronger claims here, though no major cases have reached resolution.
Global AI companies must navigate a patchwork of different legal regimes. This creates complexity for both AI companies and publishers.
What I’m Watching
Several developments may prove significant:
Discovery in the Times case. What internal documents reveal about how OpenAI approached copyrighted content could influence public opinion and other cases.
Early judicial opinions. Rulings on motions to dismiss or preliminary matters signal how judges are viewing the core issues.
Settlement patterns. Which publishers are settling, on what terms, and whether settlements create industry standards.
Regulatory movement. Whether regulatory activity (EU AI Act implementation, FTC action, ACCC attention) creates new obligations independent of litigation.
Technology evolution. Whether AI companies develop training approaches that reduce copyright exposure.
Practical Steps
For publishers uncertain how to proceed:
Consult legal expertise. Copyright litigation is specialized. General counsel may need outside support.
Participate in industry efforts. Publishers acting collectively through industry associations have more influence.
Document your position. Clear records of what you’ve published, how it’s protected, and what uses you’ve permitted matter.
Engage with AI companies. Even while litigation proceeds, negotiation continues. Understanding what deals are possible informs strategy.
Consider technical measures. Work with partners who understand both the technology and the legal landscape. Team400’s AI team can help evaluate technical protection options.
The Longer View
Whatever happens in current litigation, the underlying tension persists: AI systems need content to train on; publishers need compensation for their work.
Some resolution will emerge—through courts, legislation, or market dynamics. The question is whether that resolution fairly compensates content creators or whether the value of their work is captured by AI companies without payment.
Publishers who engage actively with this process—legally, through negotiation, through industry coordination—will influence outcomes more than those who wait passively.
The stakes are high enough to warrant the investment. Working with advisors who understand both journalism and AI—like AI consultants in Sydney—can help publishers navigate the complexity.
The legal battle is underway. Publishers need to pay attention.
I’m tracking these cases and will update as significant developments occur. If you’re involved in related litigation or negotiation, I’d welcome your perspective.