The Use Of Copyrighted Works As Training Material May Be In Vogue, But Is It Infringement?

AI is increasingly impacting how we learn, work, and create. Whether you’re mapping out the details of your summer vacation, predicting consumer trends for your fashion brand, or generating graphics for your children’s book, AI is simplifying our lives and enhancing our experiences. But have you ever wondered where AI gets its content? It doesn’t appear out of thin air. While training an AI model is complex, it essentially involves feeding the model large datasets of images, text, or even sounds, allowing it to learn by discerning patterns and relationships within the data. This process raises legal questions about intellectual property infringement, some of which courts are already considering. I will focus on whether the use of copyrighted works as training material constitutes copyright infringement, the question posed in a class-action lawsuit brought by Sarah Silverman, Christopher Golden, and Richard Kadrey against OpenAI. I will also explore the repercussions of this lawsuit for the fashion industry.

This suit was filed against OpenAI in the Northern District of California in July 2023, asserting direct and vicarious copyright infringement, violation of Section 1202(b) of the Digital Millennium Copyright Act (DMCA), violation of California and common-law unfair competition laws, negligence, and unjust enrichment. The class members are book authors who claim that the use of their copyrighted books as training material for ChatGPT, without their consent, constitutes copyright infringement. Specifically, the proposed class consists of all U.S. persons “that own a U.S. copyright in any work that was used as training data for the OpenAI Language Models.”

The subject of this lawsuit is OpenAI’s ChatGPT, which is powered by a large language model (OpenAI LLM) allegedly trained by “copying massive amounts of text and extracting expressive information from it.” As evidence of copying, the complaint explains that “ChatGPT can accurately summarize a certain copyrighted book.” According to the complaint, books are key to training a large language model because “books offer the best examples of high-quality longform writing.” The plaintiffs claim that the OpenAI LLM was trained using multiple datasets of books containing hundreds of thousands of titles.

In their infringement claim, the plaintiffs allege that OpenAI’s copying of their books during the training of the OpenAI LLM constitutes direct infringement. The plaintiffs also argue that every output from the OpenAI LLM constitutes an act of vicarious infringement. Their rationale is that because the OpenAI LLM’s outputs are based on information extracted from the plaintiffs’ books, every output is an infringing derivative work. Additionally, the plaintiffs claim OpenAI intentionally removed copyright-management information (CMI), such as the copyright notice, title, and other identifying information, from the plaintiffs’ infringed works in violation of 17 U.S.C. § 1202(b)(1). The plaintiffs seek certification of their class action, monetary damages, and injunctive relief requiring OpenAI to make “changes to ChatGPT to ensure that all applicable information set forth in 17 U.S.C. § 1203(b)(1) is included when appropriate.”

In its motion to dismiss, OpenAI pushes back on all of the plaintiffs’ claims except direct copyright infringement, while hinting at its intention to assert a fair use affirmative defense against that claim. OpenAI argues that the plaintiffs’ claims “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.” Drawing on the precedent set by Google LLC v. Oracle America, Inc., 141 S. Ct. 1183 (2021), OpenAI contends that “it is not an infringement to create ‘wholesale cop[ies] of [a work] as a preliminary step’ to develop a new, non-infringing product, even if the new product competes with the original.”

Let’s take a closer look at Google v. Oracle. Oracle, which owns a copyright in Java SE, sued Google for copyright infringement after Google copied roughly 11,500 lines of code from the Java SE program. The copied lines are part of an Application Programming Interface (API), a tool that allows programmers to call upon prewritten computing tasks for use in their own programs. Although the Federal Circuit reversed the jury’s verdict in Google’s favor, the Supreme Court found that “Google’s copying of the API to reimplement a user interface, taking only what was needed to allow users to put their accrued talents to work in a new and transformative program, constituted a fair use of that material as a matter of law.” The Court highlighted that “[t]he fact that computer programs are primarily functional makes it difficult to apply traditional copyright concepts in that technological world.” While OpenAI’s use of the plaintiffs’ books may appear “transformative,” it remains uncertain whether functional computer code is an apt comparison for the books used to train the OpenAI LLM.

In their opposition brief, the plaintiffs notably criticize OpenAI for discussing fair use, which they assert is “an affirmative defense … not properly considered as part of a motion to dismiss.” The plaintiffs clarify that “OpenAI has not moved to dismiss plaintiffs’ direct copyright-infringement claim. Nevertheless, OpenAI still tries to leverage its motion to pre-litigate issues it thinks will carry the day in the future. This is improper on a motion to dismiss and should be disregarded.”

Furthermore, the plaintiffs accuse OpenAI of “seek[ing] to rewrite Ninth Circuit copyright law in its favor by claiming that substantial similarity is an essential element for Plaintiffs’ claim.” The plaintiffs further contend that substantial similarity is not necessary “in cases that involve direct, wholesale digital copying of copyrighted works — such as this one.”

Addressing the plaintiffs’ vicarious infringement claim in its motion to dismiss, OpenAI first argues that the claim fails because the plaintiffs have not shown that any direct infringement actually occurred. Specifically, OpenAI contends that the plaintiffs treat every output as an infringing derivative work simply because it is “based on” an original work. OpenAI points out, however, that the Ninth Circuit has rejected the notion that a secondary work “based on” an original is necessarily a derivative work; instead, the plaintiffs must articulate how the outputs are substantially similar to their works. Second, OpenAI argues that the plaintiffs fail to plead facts supporting the elements of vicarious infringement, including that the defendants had the “right and ability to supervise” the direct infringement at issue and a “direct financial interest” in it.

In their opposition, the plaintiffs defend their vicarious infringement claim, emphasizing that (1) every output of the OpenAI LLM is an infringing output initiated by a third party, and these outputs are the direct infringements that serve as the predicate for the vicarious infringement claim against OpenAI; (2) because OpenAI created and released ChatGPT and exclusively controls how it works, it can plausibly be inferred that OpenAI has always had the “right to stop” the infringement; and (3) the allegation that OpenAI “profits richly from the use of Plaintiffs’ and Class members’ copyrighted materials” is clear evidence of a financial interest in the infringing activity.

In its motion to dismiss the plaintiffs’ DMCA claim, OpenAI first argues that the plaintiffs do not plausibly allege that any CMI was removed during the OpenAI LLM’s training process. In fact, OpenAI argues, the plaintiffs’ alleged facts “suggest the exact opposite,” since the ChatGPT outputs attached to the complaint include multiple references to the plaintiffs’ names. Second, OpenAI argues that even if the training process did result in the omission of CMI from the alleged copies in its training dataset, the plaintiffs do not plead facts sufficient to draw a reasonable inference that OpenAI designed its process with the requisite intent to conceal infringement. Third, OpenAI argues that the DMCA claim fails because Section 1202(b)(3) applies only if the defendant “distribute[d]” the plaintiff’s actual “works” or “copies of [them],” which is not alleged here. OpenAI highlights the difference between removing CMI from a work and creating a new work that does not contain the CMI.

The plaintiffs’ opposition contends that OpenAI knowingly removed or altered CMI because it trained its large language models by “copying massive amounts of texts … and feeding these copies into the model.” Further, the plaintiffs claim that ChatGPT never reproduced any of the CMI that the plaintiffs included with their works. Moreover, the plaintiffs claim that the DMCA’s disputed “scienter element is evidenced by OpenAI’s conscious and telling failure to reveal which internet book corpora ChatGPT is trained on.”

Addressing the plaintiffs’ unfair competition claim, which rests on the alleged DMCA violations, OpenAI presents three primary arguments in its motion to dismiss. First, OpenAI contends that the claim lacks merit because the plaintiffs have not adequately pleaded DMCA claims. Second, OpenAI asserts that the claim is deficient because the plaintiffs have not alleged an economic injury directly resulting from the alleged DMCA violations. Third, OpenAI argues that the claim fails because the plaintiffs have not pleaded facts that would justify any relief under the Unfair Competition Law (UCL).

In defense of their unfair competition claim, the plaintiffs allege that the “unfair” prong of the UCL is met because “[d]efendants misappropriated their books and names which they marketed and sold in their commercial AI product”; that the “fraudulent” prong is met because “OpenAI knowingly and secretively scraped Plaintiffs names and CMI for use as parameters in ChatGPT, along with their copyrighted books, from sources OpenAI knew were unauthorized databases of collected copyrighted books for inclusion in a generative AI product”; and that the “unlawful” prong is met because of the alleged DMCA violation.

Lastly, OpenAI argues in its motion to dismiss that the plaintiffs’ negligence and unjust enrichment claims fail because they are state claims preempted by federal copyright law.

In response to OpenAI’s preemption defense to the negligence and unjust enrichment claims, the plaintiffs contend that “the Ninth Circuit has held that a state law tort claim concerning the unauthorized use of the software’s end-product is not within the rights protected by the federal Copyright Act.” The plaintiffs further assert that they have adequately stated a claim for negligence, as OpenAI “owed a duty of care towards Plaintiffs and putative class members when they possessed, controlled and had the authority to control Plaintiffs’ Works, and failed to safeguard that information and/or intentionally and affirmatively remove such information when training the ChatGPT system.” Similarly, the plaintiffs assert that they have adequately stated a claim for unjust enrichment, because they invested time and energy in creating valuable works and OpenAI benefited by taking those works for use in its LLM training.

Given the precedent on this issue, the authors may find it challenging to secure a favorable outcome in this case. In 2016, the U.S. Supreme Court declined to review the Second Circuit’s ruling in Authors Guild v. Google, which rejected the authors’ claim that Google’s digitizing of millions of books, making them searchable, and showing small excerpts to users constituted copyright infringement. Deven Desai, a professor of business law and ethics at the Georgia Institute of Technology, has said he believes “what OpenAI has done with books is awfully close to what Google was allowed to do with its Google Books project and so will be legal.”

The outcome of this case may provide some guidance on the use of AI in the fashion industry, where AI plays a prominent role in tasks ranging from trend forecasting and design development to marketing and customer interactions. Much as OpenAI uses its LLM, trained in part on books, to generate content, fashion brands can employ AI to generate fresh designs.

Esteemed names like Zegna, Valentino, and Moncler have incorporated generative AI into their marketing campaigns. Specialized tools like the Artificial Intelligence-based Interactive Design Assistant for Fashion (AiDA) enable designers to input sketches, materials, and colors into a virtual mood board and use AI to generate innovative designs. A group of Amazon researchers in Israel developed a machine-learning system that can analyze just a few labels along with images and then determine whether a certain look is stylish. Additionally, Tommy Hilfiger partnered with IBM and the Fashion Institute of Technology (FIT) on a project called Reimagine Retail, which allowed designers to use IBM’s AI tools to create new designs. The AI was trained on 15,000 Tommy Hilfiger product images, roughly 600,000 publicly available runway images, and nearly 100,000 patterns from fabric sites.

Numerous IP issues arise with the use of AI in fashion. Who is considered the designer when AI is involved? What protections are afforded to designs created with the assistance of AI? Does using copyright-protected works to train the AI infringe those copyrights? Is an output generated by an AI trained on protected images a derivative work?

In the United States, where AI regulation remains largely uncharted territory, these intellectual property concerns add a layer of complexity. However, new cases are emerging that address some of these issues, with implications for sectors like the fashion industry. The OpenAI case, centered specifically on copyright, is a significant contribution to this ongoing debate. Its resolution could set a precedent that influences how both tech and creative industries harness AI while respecting intellectual property rights.

Nicolette Shamsian joined Above the Law as a fashion law columnist in 2023. Nicolette earned her B.A., summa cum laude, in Political Science with a minor in Entrepreneurship from the University of California, Los Angeles and her Juris Doctor from UCLA School of Law. Nicolette is an attorney whose work focuses on intellectual property litigation. As a fashion law aficionado, Nicolette enjoys leading discussions to keep attorneys up to date on noteworthy fashion law cases.