Generative Artificial Intelligence models: Not necessarily a matter of copyright

Lydia Mendola

24 February 2023

In recent months, the topic of AI has gone mainstream, and we have learned that after a generative artificial intelligence model has “read” and “digested” thousands of pieces of content, e.g., text or images (that may or may not be protected by copyright law), it can autonomously produce seemingly creative output.

The first question to be answered is whether works generated by artificial intelligence can be considered works protected by copyright law.

The second question, which need only be answered if we agree that AI-generated works can be protected by copyright, is how to identify the author of an AI-generated creative work. Is it the AI? Is it a human being? If so, who among the multiple actors involved in the creative process should be considered the author?

Can an AI model be considered the author of a work?

Regarding the first question, according to principles of international copyright law and EU law, a work can be protected if it is a new creation with creative character. The requirement of “intellectual creation” is explicit in the concept of literary and artistic work enshrined, among other places, in the Berne Convention and in Italian law. According to that requirement, work protectable under copyright is work that is the fruit “of the mind” and possesses the “creative character” typical of literature, music, figurative arts, architecture, theater, and cinematography, but also (under certain conditions) software and databases, “whatever the mode or form of expression.” This definition is commonly interpreted to mean that the work must be new and original, where original is intended to mean that it is an expression of the author’s own intellectual creation.

The issue of authorship is therefore closely linked to the possibility of interpreting output generated by AI as an expression of an author’s intellectual creation. In the United States, the U.S. Copyright Office (USCO) states in its Compendium that “the Office will not register works produced by a machine or by a mere mechanical process operating randomly or automatically without any input or creative intervention by a human author.” Across the pond, the European Union has launched an Action Plan for Intellectual Property that considers the issue of authorship of AI-generated and AI-assisted works. In a nutshell, according to the European Commission, AI systems cannot be treated as authors and therefore the product of an AI machine cannot be considered a copyrighted work.

The fact that only a human being can be the author of a work seems to be undisputed at the moment in all EU Member States and beyond, and therefore the traditional rules for originality and authorship, as well as novelty, should apply. AI-generated works may thus fall into the public domain unless an autonomous and sufficiently substantial human creative contribution can be identified.

For instance, in September 2022, the U.S. Copyright Office granted registration to a comic book generated with the help of artificial intelligence text-to-image software called Midjourney. The comic book is a complete work: an 18-page narrative with characters, dialogue, and a traditional comic-book layout. The decision may still be overturned on the basis of more in-depth analysis pinpointing the creative input of the user of the AI model, i.e., the artist who created the work. That artist has stated that the USCO asked for details of the creative process to demonstrate substantial human involvement.

I believe the issue of measuring human creative contribution will be crucial whenever an attempt is made to invoke authorship protection for works generated by or with the help of artificial intelligence models. It is likely that most of the results of generative AI models used today are not eligible for authorship protection unless the author can prove that the AI model was a moment or tool within a more complex creative process.

Can copyrighted content be used to train artificial intelligence models?

Many doubts have arisen in relation to lawfulness and other issues related to the ways generative AI models are fed. Most systems are trained using huge amounts of content (text, code, images) “scraped” from the web.

There is no easy answer to whether or not scraping is lawful, not least because it is difficult even to comprehend the size and complexity of generative AI training datasets. The answer will vary depending on the specific case and applicable law.

In the United States, AI researchers, startups, and established technology companies invoke the doctrine of fair use as legal grounds for using massive amounts of works protected under U.S. copyright law.

The doctrine of fair use ultimately aims to balance the interests of those holding exclusive rights to protected work with the social and cultural benefits derived from the creation and distribution of derivative works.

The doctrine of fair use has no direct match in Italian or EU law, but this does not mean that the European legislature and the Italian legislature have failed to address the balancing of the rights and interests of authors and other rightsholders with those of users in connection to certain new types of use of digital works.

This issue was most recently addressed by Articles 3 and 4 of the Directive on Copyright and Related Rights in the Digital Single Market (Directive (EU) 2019/790), which resulted in an amendment to Italian copyright law.

In particular, for our purposes here of considering limits on the legality of training artificial intelligence models, Articles 3 and 4 of the EU directive provide that, in certain circumstances, copyright or exclusive database rights may not be used by their respective holders to prevent massive extraction of protected content, i.e., “text and data mining.”

Text and data mining encompasses any automated analytical technique for analyzing text and data in digital form to generate information, including, but not limited to, patterns, trends, and correlations.

According to the EU directive and the Italian implementing legislation, such massive digital data mining and reproduction are freely permitted when carried out by research organizations or cultural heritage institutions acting within the limits of nonprofit study and research activities, and provided that access to such data is lawful. The exception is not limited to this scenario, however, because the European and Italian legislatures recognized that text and data mining is also a valuable tool for all participants in the digital economy, not just a handful of research institutions. According to Article 4 of the directive, Member States must establish that text and data mining is lawful provided that the use of the extracted works and other materials has not been expressly reserved by the rightsholders in an appropriate manner. In other words, from the European perspective, it is rightsholders who must take action, by appropriate means, to keep content over which they can exercise exclusive rights from being subject to massive data mining.

Recital 18 of the same directive specifies that “in the case of content made publicly available online, it should be considered appropriate to reserve such rights only through the use of machine-readable means, including metadata and the terms and conditions of a website or service” and that “in other cases, it may be appropriate to reserve rights by other means, such as contractual agreements or a unilateral declaration.” The Italian legislature has decided not to give specific instructions for how exclusive rightsholders may reserve rights and not authorize text and data mining, even implicitly. In Germany and the Netherlands, on the other hand, use of works accessible online can be effectively reserved only if it takes machine-readable form. One example of technology going in that direction is DeviantArt, which has created a metadata tag for images shared on the web that warns AI researchers not to scrape its content.
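To make the idea of a machine-readable reservation more concrete, here is a minimal Python sketch of how a scraper might check a web page for an opt-out tag before collecting its content. The directive names `noai` and `noimageai` are assumed here for illustration, as values of a `robots` meta tag. The helper names `RobotsMetaParser` and `mining_reserved` are hypothetical. The sketch shows only the technical check and says nothing about whether a given tag is a legally sufficient reservation under any national law.

```python
from html.parser import HTMLParser

# Assumed opt-out directive names, used for illustration only.
OPT_OUT_DIRECTIVES = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def mining_reserved(html: str) -> bool:
    """Return True if the page carries one of the assumed opt-out directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & OPT_OUT_DIRECTIVES)


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(mining_reserved(page))
```

A scraper that respects such a reservation would skip any page for which `mining_reserved` returns `True`. A reservation stated only in a website's terms and conditions would not be detected by a check of this kind.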

This approach has met with mixed reactions from the art community. How does a no-scraping tag utilized by some help artists whose work has already been used to train an artificial intelligence system? And even if upstream data acquisition becomes legal in the future, how do we control how downstream datasets are used?

In fact, even if training generative artificial intelligence using data protected by copyright were legal because it was authorized—or at least not expressly reserved—it would then be necessary to state plainly how such datasets were used. In other words, it is possible to train an artificial intelligence model using other people’s data, but what is done with that model could constitute infringement.

Think of a text-to-image artificial intelligence model used in different scenarios. Training the model on millions of images and using it to generate new images is highly unlikely to constitute copyright infringement, provided that upstream training did not use “reserved” material. The training data is transformed during the production process for the final output, and it is highly likely that the result does not even partially overlap any of the works originally used to train the artificial intelligence model. However, if the model is trained on 100 images by a specific artist, with the goal of generating images reproducing that artist’s style, recurring themes of that person’s artistic production, and that person’s technique—in other words, work that could be mistaken for one of that artist’s original works—then the artist in question could have legal grounds for complaint, even if the artist had failed to express reservations about their rights after the introduction of the text and data mining exception.

However, between the two extremes of lawful and unlawful use of an AI model and its output, there are countless scenarios in which input, purpose, and output have different weights and interactions, and this could influence a legal assessment in one direction or the other. The assessment must be made on a case-by-case basis, bearing in mind the most recent Italian case law, according to which the essential elements of an original work must match those of the work resulting from the transposition in order for copyright infringement to occur. However, plagiarism of another person’s work does not consist solely of total or partial reproduction of a protected work, but may also take the form of what is known as “evolutionary plagiarism.” Evolutionary plagiarism presupposes a merely formal distinction between the works being compared, so that the new work, although not slavishly imitative or reproductive of the original, is only a substantial reworking of it with minimal intervention, thus resulting not in an original and individual work, albeit inspired by the pre-existing one, but in the unauthorized reworking of the latter. It should be noted, however, that the right to create derivative works (the right to elaborate, transform, modify, and so on) is one of the author’s exclusive prerogatives, even in the case of non-minimal and not purely formal transformations. The distinction between a mere inspiration and a derivative work lies in whether or not the original work is recognizable in the derivative.

This leads to another question: Who is responsible for the potentially illegal activity? Who is responsible for the plagiarism? The generative artificial intelligence model, its programmer, the company that owns the relevant platform, or the user who prompted the artificial intelligence model to obtain the plagiarized work? Here again, the answer is the same as for questions of authorship: the steps of the (possibly) creative process that led to the production of certain content need to be investigated.
