Background
In late 2023, the New York Times sued OpenAI and Microsoft. The Times alleged three things. First, the companies had ingested millions of NYT articles to train GPT-class large language models. Second, the resulting models can be prompted to regurgitate substantial portions of NYT articles verbatim. Third, ChatGPT and Microsoft’s Bing/Copilot products compete directly with the Times’s business by substituting for visits to nytimes.com. Sister complaints from the Daily News and the Center for Investigative Reporting (CIR) made parallel allegations and were consolidated with the Times case for pretrial purposes.
The defendants moved to dismiss most of the claims. They argued, among other things, that training on copyrighted material is a quintessentially transformative fair use; that contributory liability fails because GPT has substantial noninfringing uses under Sony; that the DMCA § 1202(b) claims required removal of complete copyright management information rather than excerpts; that hot-news misappropriation is preempted by the Copyright Act; and that the trademark dilution claims failed because no commercial use of the Times’s marks was plausibly alleged.
The Court’s Holding
Judge Sidney H. Stein of the U.S. District Court for the Southern District of New York largely denied the motion to dismiss. The court held that the following claims state plausible causes of action and may proceed to discovery: (1) direct copyright infringement against OpenAI, based both on alleged regurgitation outputs and on training; (2) contributory copyright infringement, where the Sony “substantial noninfringing uses” defense was deemed premature at the pleadings stage and must await a developed factual record; (3) DMCA § 1202(b)(1) claims by the Daily News and CIR plaintiffs against OpenAI, based on allegations that copyright management information (CMI) was knowingly stripped from training inputs; and (4) federal trademark dilution claims, based on the alleged commercial use of the Times’s name and trademarks in connection with chatbot output.
The court dismissed several theories. Most DMCA § 1202(b)(3) claims (which require removal or alteration of CMI in copies that are then distributed) were dismissed because the alleged outputs were partial excerpts rather than full reproductions. Hot-news misappropriation was dismissed as preempted by the Copyright Act. DMCA claims against Microsoft were dismissed because Microsoft’s role in training was less direct than OpenAI’s.
An important conceptual move in the opinion is Judge Stein’s distinction between “regurgitations,” meaning verbatim or near-verbatim reproduction of NYT text in chatbot output, and “abridgements,” meaning summaries that convey the gist of an article without copying its expressive language. Regurgitations are potentially infringing; abridgements may not be. That distinction will likely shape how the parties (and litigants in other AI cases) frame their proof at summary judgment.
Key Takeaways
- Fair use is not resolved at the pleadings stage. Defendants who hoped to win the AI-training cases on a Rule 12 motion under Authors Guild v. Google or Sega v. Accolade will not get that easy ruling.
- The contributory-infringement claim survives, so OpenAI cannot, at least at this stage, escape liability by attributing infringing outputs to the end users who prompted them. The case will probe what OpenAI knew about regurgitation and what it did to mitigate that risk.
- DMCA § 1202(b)(1) is a live front. Stripping author bylines, copyright notices, or RSS metadata from training inputs may itself be actionable, separate from the underlying copyright claim. 17 U.S.C. § 1203 also authorizes statutory damages of $2,500 to $25,000 per violation.
- Federal dilution claims survived, signaling that publisher names and marks may have meaningful trademark protection in the AI-output context — an area where almost no caselaw existed before.
- The regurgitation-vs-abridgement frame will probably structure the proof phase: plaintiffs will document verbatim outputs, defendants will emphasize abridgements and the role of guardrails.
Why It Matters
NYT v. Microsoft is the marquee AI-copyright case. Because the defendants largely lost the motion to dismiss, the case will proceed through full discovery. That discovery will cover OpenAI’s training corpus, its deduplication and filtering practices, and model behavior. It will also reach Microsoft’s contracting and product-integration relationships with OpenAI. The Times can now seek discovery of training data, model checkpoints, alignment instructions, and internal communications about regurgitation risk. What that discovery reveals will likely influence how other AI-copyright cases across the country are litigated and resolved.
The survival of the trademark-dilution claims is also notable. Until now, AI-output litigation has been almost entirely a copyright fight. After this ruling, publishers whose famous names or marks appear in chatbot output have a plausible federal dilution theory to consider. The remedies potentially available include injunctive relief and, for willful dilution, damages and profits. Attorneys’ fees may also be available in exceptional cases. That significantly broadens publishers’ leverage in licensing negotiations with AI labs.