The interface between AI and copyright: Authorship and the use of copyrighted works for machine learning

19 October 2021
In 2016 Google’s artificial intelligence (AI) wrote this melancholy poem:

there is no one else in the world.

there is no one else in sight.

they were the only ones who mattered.

they were the only ones left.

he had to be with me.

she had to be with him.

i had to do this.

i wanted to kill him.

i started to cry.

i turned to him.

How can AI write poetry like this? The answer is simple: by “reading” thousands of romance books and, more specifically, by employing machine learning, the type of AI used for AI-generated creativity. In a nutshell, the AI digests a huge number of creative works (which may or may not be protected by copyright law) and learns how to produce creative output autonomously.

This is just one of many possible examples of algorithmic art.[1] AI is definitely forcing the legal community to review the conceptual categories of the law with respect to certain fundamental aspects, from the concept of civil liability[2] to the concept of inventorship.[3] In the domain of copyright, there are two important aspects that have to be taken into account when it comes to AI-generated creativity: the concept of authorship and the extent to which copyrighted works can be used to train AI.


It has been said that copyright cannot be recognized as “a grudgingly tolerated way station on the road to the public domain,” but that copyright law makes sense and can be properly understood only if one recognizes the centrality of the figure of the author as the owner of rights.[4] Indeed, the author is the first owner of economic rights and the holder of moral rights, and the duration of protection is calculated based on the author’s life. Furthermore, under article 3 of the Berne Convention for the Protection of Literary and Artistic Works (“Berne Convention”), the author’s nationality and residence are two of the criteria used to determine eligibility for protection.

AI-driven creativity can be divided into two categories: i) AI-assisted creations where a human author programs and uses AI as part of the creative process and ii) AI-generated works that are produced with full autonomy by a machine. This article will briefly analyze the latter category of works.

The first issue that needs to be established is whether AI-generated works can be considered works protected by copyright law.

The second issue, which comes into consideration only if we agree that AI-generated works can be copyrightable, is how to identify the author of an AI-generated creative work. Is it the AI? Is it a human? If so, who among the multiple actors involved in the creative process should be regarded as the author?

With regard to the first issue, as a matter of principle, under international copyright law and EU law, a work can be protected if it is a new and original creation of the author.[5] Under the Berne Convention, any product in the literary, scientific, or artistic domain, whatever its mode or form of expression, is protectable as an artistic or literary work. The “intellectual creation” requirement is explicit in the concept of literary and artistic works, so the work must be new and original. Under EU law, the case law of the Court of Justice has established that a work can be protected by copyright only if it is the author’s own intellectual creation and if the protected subject matter is expressed in a manner that makes it identifiable with sufficient precision and objectivity.

The topic of authorship is closely linked to whether an AI-generated work can be considered an expression of the author’s own intellectual creation. In the United States, the Copyright Office in its Compendium states that “the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.”[6] At the European level, the European Union launched an Intellectual Property Action Plan that considers the issue of authorship of AI-generated and AI-assisted works. According to the European Commission, “Whilst […] creations autonomously created by AI technologies are still mostly a matter for the future, the Commission takes the view that AI systems should not be treated as authors,”[7] and therefore the output of an AI machine is not a copyrighted work.

The situation in the United Kingdom is interesting, as the Copyright, Designs and Patents Act 1988 (“CDPA”) provides specific rules for computer-generated works (defined as “works generated by a computer in circumstances such that there is no human author of the work”).[8] According to section 9(3) of the CDPA, “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”[9] According to section 12(7) of the CDPA, the term of protection for computer-generated works is “50 years from the end of the calendar year in which the work was made.” However, it has been pointed out that United Kingdom law makes no reference to the originality requirement for such works, and therefore it might be suggested that “the originality requirement will have to be self-standing and independent of authorship.”[10] Another point of uncertainty relates to the meaning of the phrase “person by whom the arrangements necessary for the creation of the work are undertaken,” exacerbated by the fact that case law on this provision is scarce. If we carry that definition over to the field of AI, where the arrangements necessary for the creation of a work are particularly complex, the uncertainty increases.

In most countries, the principle that only a human can be an author seems undisputed at the moment, and therefore the traditional rules on originality and authorship should be applied. If no author can be identified, AI-generated works may fall into the public domain or, more likely, be protected under a different set of provisions.

Therefore, there is still much uncertainty in this field, both at the international and at the national level in most countries, but the more the technology develops the more answers and solutions will come. So far, no AI has won the prize of “Most Human Machine” by acing the famous Turing Test (a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human), but the time when one does may not be far in the future, and with “human” machines the question of authorship will become even more pressing.

Copyrighted works used to train AI

A related hot topic is the protection available to the authors of copyrighted works used to feed an AI machine learning system. The use of copyrighted works for text and data mining purposes was addressed by the European Union in Directive 2019/790 (the “Copyright Directive”). Article 3(1) of the Copyright Directive allows research organizations and cultural heritage institutions to make reproductions and extractions in order to perform text and data mining of works and other subject matter to which they have lawful access, for the purposes of scientific research. Article 4 provides an exception for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining, regardless of the status of the data miner, on the condition that the use of the works and other subject matter has not been expressly reserved by their rightsholders in an appropriate manner, such as via machine-readable means in the case of content made publicly available online.

Given the above, the rightsholder’s exclusive rights limit the free exploitation of copyrighted works for feeding machine learning systems only when the rightsholder properly reserves those rights. It will be interesting to see how the Italian legislature implements the passage on the “appropriate manner” in which the rightsholder may reserve rights: that is, whether it leaves it to rightsholders to choose the best disclosure tools or, as the German legislature did in implementing the Copyright Directive, suggests metadata or alternative automated tools in order to prevent uncertainty.


