AI-generated derivative works: the case for mandatory disclosure of weights and prompts

June 29, 2025

The courts are beginning to grapple with the issue of whether works generated by artificial intelligence, particularly music, art, and prose, can be characterized as derivative of the material on which the AI was trained.

Under US law, a derivative work is

a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a ‘derivative work.’

17 USC sec. 101. Copyright law in the European Union is similar.

The creation of derivative works is one of the exclusive rights the law confers on copyright owners. The emergence of sophisticated generative AI models trained on copyrighted material has engendered litigation over whether the models themselves, or their outputs, constitute derivative works of such material.

This litigation is in its early stages. The caselaw thus far, and recent scholarly commentary on the topic, seem to cast doubt on whether the derivative works right is infringed by the use of general-purpose generative AI.

For example, in Kadrey v. Meta Platforms, Inc. (ND Cal 2023), the plaintiffs advancing claims against Meta’s generative AI, LLaMA (an acronym for “Large Language Model Meta AI”), contend that “every output of the LLaMA language models is an infringing derivative work,” and that because users initiate queries of LLaMA, “every output from the LLaMA language models constitutes an act of vicarious copyright infringement.”

The US District Court flatly rejected this theory of liability in granting in part Meta’s motion to dismiss:

The plaintiffs are wrong to say that, because their books were duplicated in full as part of the LLaMA training process, they do not need to allege any similarity between LLaMA outputs and their books to maintain a claim based on derivative infringement. To prevail on a theory that LLaMA’s outputs constitute derivative infringement, the plaintiffs would indeed need to allege and ultimately prove that the outputs “incorporate in some form a portion of” the plaintiffs’ books. Litchfield v. Spielberg, 736 F.2d 1352, 1357 (9th Cir. 1984); see also Andersen v. Stability AI Ltd., No. 23-CV-00201-WHO, 2023 WL 7132064, at *7-8 (N.D. Cal. Oct. 30, 2023) (“[T]he alleged infringer’s derivative work must still bear some similarity to the original work or contain the protected elements of the original work.”); 2 Melville B. Nimmer & David Nimmer, Nimmer on Copyright § 8.09 (Matthew Bender Rev. Ed. 2023) (“Unless enough of the preexisting work is contained in the later work to constitute the latter an infringement of the former, the latter, by definition, is not a derivative work.”); 1 Melville B. Nimmer & David Nimmer, Nimmer on Copyright § 3.01 (Matthew Bender Rev. Ed. 2023) (“A work is not derivative unless it has substantially copied from a prior work.” (emphasis omitted)).

Legal commentary is in accord. See, e.g., Oren Bracha, Generating Derivatives: AI and Copyright’s Most Troublesome Right, NC J L & Tech (2024) (arguing that derivative rights claims against general-purpose generative AI “threaten basic copyright principles of subject matter (what informational elements are within the domain of copyright) and scope (the breadth of the right to exclude with respect to copyrightable subject matter).”)

The thrust of these cases and this commentary is, thus far, limited to the situation of general-purpose AI models trained on vast quantities of wildly varying material. See Andersen v. Stability AI (ND Cal 2023) (Stability AI created and released a “general-purpose” software program called Stable Diffusion and is alleged to have used billions of copies of training images).

The case for caution in applying the derivative works right is compelling. There’s a legitimate fear of prematurely kneecapping technology that has enormous potential.

We shouldn’t lose sight of the easy cases, however. We shouldn’t allow the hard cases to prejudice our ability to call something a duck if it quacks and waddles like a duck.

If an AI model is trained exclusively on, say, the works of Banksy, so that one can, within seconds and with little effort, generate art that immediately evokes Banksy, the model itself is, in all likelihood, either a derivative work of Banksy’s art, or a means for the creation of infringing derivative works. As such, the dissemination of the model would constitute contributory or vicarious infringement.

Likewise, if a user of a generative music AI enters prompts that instruct the model to create a song that sounds exactly like an Oasis song, with the same anthemic vibe, jangling guitar riffs, and a voice indistinguishable from Oasis frontman Liam Gallagher, the end result would be, well, this.

Note that a fair use defense would likely fail in these scenarios: nothing transformative is occurring, and the commercial market for the original works is plainly impaired. The intention behind the model in the Banksy scenario, and behind the user in the Oasis scenario, is to create a ready substitute for the original. The US Supreme Court’s decision in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (2023) stands as a near-insurmountable barrier to fair use in this situation.

It stands to reason, therefore, that the intention of the user and/or the AI developer becomes a paramount consideration. If the intent of the user or the AI developer is to create, or enable the creation of, generated works that are in fact substantially similar to or contain protected elements of the original works, then infringement of the derivative works right becomes much more obvious.

Consequently, the inquiry would be considerably aided if the prompts the user entered to generate the output in question were known and readily available. An AI prompt is any text, information, or code that communicates to the AI what response is sought. Prompts can be quite elaborate and complex, such that prompt engineers are now in high demand.

Likewise, the weights the AI developer arrived at in training the model would bear directly on the infringement question. In the AI context, weights are the numerical values that determine the strength and direction of connections between neurons in an artificial neural network. Here’s a simplistic yet illuminating explanation:

Imagine you have a special robot that can learn and do tasks all by itself. To make the robot smart, we give it a brain called a “machine learning model.” The model has little switches called “weights” that help the robot make decisions.

Each weight is like a knob that the robot can turn to change how important different things are. Let’s say the robot is learning to recognize cats and dogs in pictures. It looks at different features like the shape of the ears, the color of the fur, and the size of the nose. The weights decide how much importance the robot gives to each feature.
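The knob analogy above can be made concrete with a minimal sketch in Python. The feature names and weight values here are invented for illustration; the point is only that “weights” are ordinary numbers that scale how much each input feature matters to the model’s decision.

```python
# Toy illustration of "weights": each input feature gets a numerical
# importance, and the model's output is just their weighted combination.
# Feature names and values are hypothetical, chosen for illustration.

def score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of features: the core operation a trained model performs."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical weights learned during training: this model "cares" most
# about ear shape when distinguishing cats from dogs.
weights = {"ear_shape": 0.8, "fur_color": 0.1, "nose_size": 0.3}

# One input image, reduced to numeric features.
image = {"ear_shape": 1.0, "fur_color": 0.5, "nose_size": 0.2}

print(score(image, weights))  # a higher score pushes the decision toward "cat"
```

Real models contain billions of such numbers arranged in layers, but the legal point survives the simplification: the weights, not the training images themselves, are what the distributed model actually embodies.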

The copyright significance of weights in AI models is coming into focus. See Hassan Uriostegui, AI-Copyright Weights: A New Frontier in Intellectual Property Law (“Regulating the generative output of AI is important, but recognizing the role of weights in this process could be equally crucial. By examining this concept, we might just be touching the surface of the ‘mind’ of AI, prompting a reassessment of our relationship with these sophisticated systems and the legal frameworks that govern them.”)

Disclosure of this information would go a long way in assisting judicial decision-making in the derivative rights context. If the AI developer utilized weights in a way that does not favor expressive copyrightable content, and if the generative AI user entered prompts that did not attempt to elicit the reproduction of protected expression, whether infringement of the derivatives right has occurred becomes a much easier question.

One commentator has gone so far as to condition copyrightability of AI-generated content on disclosure of the prompts that engendered it. Ziyong “Sean” Li, Rethinking Copyright Law: The Case for Protecting AI-Generated Content and Rewarding Those Who Truly Know What They Want:

Granting copyright protection to AI-generated content in exchange for the disclosure of prompts, while establishing that these prompts are not protected by copyright when used with LLMs, could resolve many issues. First, it offers incentives for prompt writers to share their prompts with the public, enhancing our awareness of our own thoughts and desires. Second, users would be free to use the prompts however they wish without restriction. Third, it simplifies oversight for the LLM platforms. Instead of filtering prompts, platforms need only ensure that the outputs are not identical to the copyrighted content initially generated by the copyright owner.

Indeed, the prudent and law-abiding user and developer should want to be able to demonstrate the prompts and weights used in creation so as to be immunized from a derivative infringement claim. As lawyers, we should be advising our clients to document this information so that it can be readily retrieved, and in a form that allows the generated results to be reproduced from the prompts and weights used.
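One minimal way a client might follow that advice, sketched in Python with invented field names, is to record each generation in an append-only log: the prompt, a cryptographic fingerprint of the exact weights file used, and the random seed, which together are what one would need to reproduce and audit the output later.

```python
import hashlib
import json
from datetime import datetime, timezone

def weights_fingerprint(weights_path: str) -> str:
    """SHA-256 hash of the model weights file, identifying the exact model used."""
    h = hashlib.sha256()
    with open(weights_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

def log_generation(prompt: str, weights_path: str, seed: int, log_path: str) -> dict:
    """Append one auditable record containing what is needed to reproduce the output."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "weights_sha256": weights_fingerprint(weights_path),
        "seed": seed,  # fixing the seed makes the generation repeatable
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

This is a sketch, not a compliance standard: the essential design choice is hashing the weights rather than storing them, which proves which model version produced an output without retaining gigabytes of parameters in the log itself.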