
OpenAI and Microsoft are hit with a new copyright lawsuit over AI training


A view shows a Microsoft logo at Microsoft offices in Issy-les-Moulineaux near Paris, France, January 25, 2023. REUTERS/Gonzalo Fuentes/File Photo


The backstory: To give a generative artificial intelligence (AI) model its “intelligence,” tech companies have to train it on existing material, essentially building a library for the model to learn from. For the generative AI we see today, those libraries are enormous, containing data drawn from all over the internet. Large language models (LLMs) in particular need to be trained on vast amounts of varied material to be able to complete natural language processing (NLP) tasks. OpenAI, which is behind ChatGPT and DALL-E, has drawn its training data from the internet and, for some of its services and tools, from other media such as books.

More recently: When AI image generators first became popular last year, many artists noticed that AI would regularly steal their unique styles. Many of the people whose work ends up in the training datasets of AI models, such as authors, artists and other creators, have been taking action against tech firms that use their work without permission. This year, well-known writers including screenwriter Michael Chabon, journalist Rachel Louise Snyder, George R.R. Martin, John Grisham, Jodi Picoult and comedian Sarah Silverman filed lawsuits against OpenAI for copyright infringement. Artists and other creatives have also sued OpenAI for using their works to train AI. In response to these kinds of allegations, OpenAI has argued that the content its systems generate doesn’t count as a “derivative work,” and so is not copyright infringement.

The development: OpenAI was just hit by a large class action lawsuit that also names Microsoft (a major investor in and partner of OpenAI) as a co-defendant. The lawsuit accuses the companies of using other people's intellectual property without permission to train generative AI technology. It was filed by a group of nonfiction authors who say their books and academic journal articles were used to train OpenAI’s LLMs without payment or permission. The complaint alleges that OpenAI and Microsoft ignore copyright law, making large sums of money from copyrighted content that took writers years to research and write.

Author and Hollywood Reporter editor Julian Sancton is leading the suit, which claims that OpenAI copied tens of thousands of nonfiction books without permission to train LLMs to respond to human prompts. In practice, this would mean that OpenAI’s LLMs could regurgitate these authors' research, ideas and styles without permission, proper credit or compensation.

According to the lawsuit, OpenAI has already confirmed that Sancton’s book, “Madhouse at the End of the Earth: The Belgica’s Journey Into the Dark Antarctic,” was part of the dataset used to train ChatGPT. This is the first class action lawsuit related to ChatGPT to also put the blame on Microsoft, arguing that Microsoft is "deeply involved" in training and developing OpenAI’s models and is therefore also liable for copyright infringement.

Key comments:

“Nonfiction authors often spend years conceiving, researching, and writing their creations. While OpenAI and Microsoft refuse to pay nonfiction authors, their AI platform is worth a fortune. The basis of the OpenAI platform is nothing less than the rampant theft of copyrighted works,” the lawsuit states.

"While OpenAI and Microsoft refuse to pay nonfiction authors, their AI platform is worth a fortune," Julian Sancton's attorney, Justin Nelson, said in a statement. "The basis of OpenAI is nothing less than the rampant theft of copyrighted works."