
OpenAI argues its alleged use of authors’ work is “fair use”

Brendyn Lotz

1st September 2023

OpenAI’s lawyers have argued that authors claiming it infringed on their copyright don’t understand how copyright works.
The legal team argues that it hasn’t infringed the copyright of authors as the firm’s use of the work falls under fair use.
The legal team has asked the judge to dismiss five of the six claims the authors have made.

Fair use is a term that anybody who creates content or consumes said content has likely come across in recent years. This legal tool gives creators the ability to use copyrighted content without the need to license it, provided some conditions are met.

Broadcasting a movie in its entirety on a live stream is not fair use, but using some scenes from that movie for the purpose of a review could be seen as fair use.

But what about a multi-billion dollar company using the literary works of an author to train a large language model (LLM)? Would that pass muster as “fair use”?

That’s what OpenAI appears to be arguing in a legal tussle with novelists Paul Tremblay and Mona Awad.

The pair filed a lawsuit against OpenAI, accusing the company of using their work to train its LLM. While the content of the books is mentioned, the pair allege that, by OpenAI’s own admission, the company uses books to teach its model how to parse long pieces of text.

However, as reported by The Register, OpenAI has hit back in a rather creative manner.

“At the heart of Plaintiffs’ Complaints are copyright claims. Those claims, however, misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence,” OpenAI’s legal team writes.

Citing various examples where fair use has allowed existing technology to be transformed into something different, the legal eagles argue that fair use of existing products is how many tech companies have developed their innovations.

But the core of the argument is that ChatGPT isn’t creating a “derivative work” every time it answers a prompt simply because that answer is drawn from an enormous training dataset.

“According to the Complaints, every single ChatGPT output—from a simple response to a question (e.g., ‘Yes’), to the name of the President of the United States, to a paragraph describing the plot, themes, and significance of Homer’s The Iliad—is necessarily an infringing ‘derivative work’ of Plaintiffs’ books. Worse still, each of those outputs would simultaneously be an infringing derivative of each of the millions of other individual works contained in the training corpus—regardless of whether there are any similarities between the output and the training works. That is not how copyright law works,” argues the legal team.

The lawyers have asked the court to throw out five of the six claims Tremblay and Awad have presented. These claims include vicarious infringement, violation of the Digital Millennium Copyright Act, unfair competition, negligence, and unjust enrichment.

However, OpenAI seemingly hopes to push ahead with fighting the claim of direct copyright infringement.

We have to say, the argument that using an author’s work to train an LLM is fair use sure is creative. Whether it will pass legal muster comes down to how well it’s argued in court and what the judge can be convinced to rule.

Should the judge accept OpenAI’s argument, it could set a precedent that would make it much harder for those whose work is exploited by LLMs to get compensation.

[Image – Gülfer ERGİN on Unsplash]
