OpenAI defends alleged use of novels in training data sets for “innovation” – The Hindu

OpenAI has defended the use of copyrighted materials such as novels in data sets for training large language models (LLMs), claiming that fair use protects such innovation.

In a court filing dated August 28, OpenAI responded to the suit filed by authors Paul Tremblay and Mona Awad, who claimed that the AI startup used their copyrighted work to train ChatGPT. OpenAI called for most of the claims to be dismissed, and also referenced the class action complaint filed by authors Chris Golden, Sarah Silverman, and Richard Kadrey, who also said their creative work was mined.

“At the heart of Plaintiffs’ Complaints are copyright claims. Those claims, however, misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence. The constitutional purpose of copyright is ‘[t]o promote the Progress of Science and useful Arts’,” said OpenAI in its filing.

Authors have previously alleged that Google and Meta also scraped copyrighted works for their own AI products.

Some writers claimed that ChatGPT’s ability to partially summarise their work was evidence that it had been trained on their novels, while a plaintiff in a similar suit against Google claimed that the AI chatbot Bard regenerated parts of their book almost down to the last word.

Authors have also claimed that OpenAI, Google, and Meta scraped copyrighted works available for free on book piracy websites.

OpenAI said in its filing that many courts in the past had applied the fair use doctrine to acknowledge that it was permissible for “innovators” to use copyrighted work in “transformative ways.”