Screen scraping: to what extent is it lawful?

19 November 2019

According to a recent decision of the Court of Rome, screen scraping should be lawful if carried out by legitimate users, including professionals, who scrape publicly available data in a fragmented and non-extensive way for contingent use.

By its decision of September 5, 2019 (the “Decision”), the Court of Rome left some room for “screen scraping”, also known as “web harvesting” or “web data extraction”. Briefly, this is the process of using software to extract and collect data displayed on a website and to reuse them on another website or in another application.

The Decision was issued in interim proceedings brought by Italy’s primary rail-transport operator, Trenitalia S.p.A. (“Trenitalia”), as petitioner, against Go Bright Media Ltd. (“Go Bright”), an English company that owns an application called “Trenìt”, as defendant. At the request of users, Trenìt compares rail travel solutions by accessing (rectius: scraping) the data contained in the official databases of rail-transport operators, including Trenitalia. By order of June 26, 2019, issued on an urgent basis without the defendant being heard, the Court of Rome had first found that Trenìt infringed Trenitalia’s exclusive right over its database and suspended the app. Having heard the parties, the same Court then revoked that interim order: on further consideration, the Decision found Trenìt’s activity to be legitimate.

According to Go Bright, the software used by Trenìt carries out a “selection and continuous acquisition over time of data useful for the individual user who makes a simultaneous request” (our translation).

Depending on the originality of the scraped content and on how massive and systematic the scraping is, this activity could amount to copyright and/or database right infringement. As a preliminary point, it is important to stress that the scraping activities under scrutiny in this case did not involve any personal data; these remarks therefore cannot be applied to the scraping of personal data.

As to copyright infringement, under Law No. 633/1941 (the “Italian Copyright Law”, which also transposes Directive 96/9/EC on the legal protection of databases), only databases that, on account of the selection or arrangement of their contents, constitute the author’s own intellectual creation qualify for copyright protection. Therefore, whether an act of screen scraping infringes copyright in a database must be assessed on a case-by-case basis, primarily by determining whether the scraped database can be considered the creative expression of its author.

However, even when the scraped database is not eligible for copyright protection, under certain circumstances scraping may nonetheless infringe the sui generis right over that database. Under this limited protection system (Article 102-bis ff. of the Italian Copyright Law), rights over non-creative databases are exclusively vested in their creators (i.e., those who make significant investments in creating a database or in its verification or presentation, and commit financial resources, time or work to this end), so that they can exclude third parties from extracting and reusing, in full or in (substantial) part, the information contained therein. This sui generis right lasts only 15 years from the year following the creation of the non-creative database, as opposed to the term of 70 years after the death of the author(s) for copyright-protected databases.

Consequently, authorization from right holders is required only when the extraction or reuse concerns the totality or a substantial part of a sui generis database. Conversely, extracting or reusing these data in a limited or partial way (to be understood in both quantitative and qualitative terms) is implicitly considered admissible, even without authorization.

However, the same legal provision that grants sui generis protection to databases also expressly prohibits the repeated and systematic extraction or reuse of even non-substantial parts of the information contained in the database, insofar as such acts require operations that conflict with the normal use of the database or cause excessive harm to the legitimate interests of its creator(s). Moreover, excessive harm must consist of something more than just the loss of profit or income that derives from the absence of a license (and the resulting non-payment of license fees).

In the case under consideration, when assessing the impact (and thus the legitimacy) of the scraping activity performed by Go Bright, the Court also looked at the traffic of the scraped website. Since the number of Trenitalia’s database logs amounted to 800,000, roughly 30% of the total daily accesses to Trenitalia’s website (a figure that the Court defined as “non impressive”), the Court reasoned that no substantial part of the database had been unlawfully extracted. Given that (at least on the evidence acquired in these interim proceedings) the software used by Trenìt accesses Trenitalia’s database only at the request of one of its users and extracts data only to fulfil that request, the scraping performed by Go Bright cannot be considered a systematic and extensive extraction. Rather, it should be construed as a periodic and selective acquisition of data on Go Bright’s servers, without any demonstrable and unequivocal misappropriation of data from Trenitalia. Moreover, the Court pointed out that the number of accesses to the scraped website signaled that Go Bright stored the data at issue on its servers only temporarily; otherwise, the defendant would not have needed to query the petitioner’s database repeatedly. For all these reasons, the Court held that the scraping in question was partial and non-substantial: the fragmented and non-extensive nature of Go Bright’s data acquisition indicates that Trenìt uses the data contingently. Finally, according to the Court, the absence of any proven unjustified harm to Trenitalia excludes any infringement of the sui generis right over its database.

Conversely, it is Trenitalia’s behavior that may be deemed anticompetitive: its attempt to prevent Trenìt from accessing its database could be interpreted as an attempt to prevent legitimate comparisons with competitors’ offers, particularly since the rail-transport market is an oligopoly.

The Decision gives some guidance to the increasingly numerous companies that base their business on screen scraping and emphasizes the extent to which such activity may be carried out in compliance with the rights over a database.
