Is web scraping legal?

Updated on November 25, 2020

Web scraping is a computer technique that allows information to be extracted automatically from web pages. This definition alone highlights the serious doubts that the technique raises from the point of view of intellectual property law and other branches of law. However, its use does not always imply an infringement of the rights recognized in the Intellectual Property Law (LPI). For this reason, it is necessary to specify the limits that must be respected when using it, especially since web scraping can be very useful for numerous perfectly legitimate applications.

The first clarification to make is that, for web scraping to be legal, it must not target works or performances protected by copyright (cinematographic or literary works, photographs, etc.) without the authorization of the respective rights holders. It is not legally possible to extract such works and performances using this procedure without the consent of their respective owners, as this would infringe, at least, the reproduction rights held by those owners. Nor is it possible to extract, without consent, elements protected by other rights, such as the right to one's own image (set out in Organic Law 1/1982, of May 5, on civil protection of the right to honor, personal and family privacy, and one's own image).

Web scraping may also be unlawful when it affects personal data, that is, any information about an identified or identifiable natural person. Extracting this data constitutes processing, which will only be lawful if one of the conditions in Article 6 of the General Data Protection Regulation (GDPR) is met.

However, not all information of interest is subject to copyright or image rights, or affects personal data; and, in principle, information not subject to prior rights could be the subject of lawful web scraping. We must remember, though, that a compilation of information may be protected by intellectual property law, even if the information compiled is not itself covered by any prior rights, if that compilation constitutes a database protected by our Intellectual Property Law.

In this article, we will focus primarily on the implications that web scraping may have, firstly, from the perspective of intellectual property rights; and secondly, in relation to the terms and conditions of use of the scraped website.

What is a database?

A database is, above all, a collection of elements. These elements may consist of works or performances that are eligible for protection, but they may also consist of other elements that are not protected by prior rights. For example, a database may compile translations of words (as in a dictionary or translator), or data on restaurants or hotels (name, address, price, location, etc.).

A requirement for us to be able to talk about a database is that the elements collected must be independent of each other. This means, on the one hand, that each of these elements must retain its meaning and significance independently of the others, and must be understandable on its own; and, on the other hand, that if one of these elements were removed, the database would continue to be functional and useful for the purpose for which it was designed.

Furthermore, for the purposes of applying the protection conferred by our Intellectual Property Law, which we will see below, the elements collected must pre-exist the database. This is a crucial aspect, as there will be no database (and, consequently, no right preventing web scraping) if, instead of collecting existing elements, the elements are created from scratch. This was established by the Supreme Court in the Ryanair case (STS 572/2012 of October 9), in which it held that the website of this well-known airline did not deserve the protection afforded to a database, since no data about flight schedules was collected; rather, this information was generated by the website itself based on a series of parameters. This is a very important distinction: if pre-existing information is compiled, we are dealing with a database in terms of our Intellectual Property Law; if new information is created and then displayed on a website, there is no database.

Finally, the data collected must be organized systematically and methodically, so that each element can be accessed individually.

Therefore, if we are dealing with a compilation of elements that are independent of each other, pre-existing, ordered, and individually accessible, we are dealing with a database that can be protected by intellectual property law. The question now is what exactly this protection consists of.

How are databases protected?

The LPI provides for two different ways of protecting databases.

Firstly, when the selection or arrangement of its elements is original, the database itself will be a work protected by copyright (Article 12 LPI). It does not matter whether the elements collected are original or not; what matters is that the selection or arrangement of those elements is. We can then say that this compilation of elements constitutes a collection, which is protected under the aforementioned Article 12 of the Intellectual Property Law.

But when can the selection and arrangement of elements in a database be considered original? The Court of Justice of the European Union has ruled that a database will be original “when, through the selection or arrangement of the data it contains, its author expresses his creative ability in an original manner by making free and creative choices and thus imprints his personal touch” and, conversely, there will be no originality “when the constitution of the database is dictated by technical considerations” (Judgment of the Court of Justice of the European Union of March 1, 2012, C‑604/10).

Therefore, not every form of selection or orderly arrangement of the elements of a database is original. Originality requires making free decisions that are not imposed by technical requirements. It will be necessary to identify some feature of this selection or arrangement of data that distinguishes it from the ordinary selections or arrangements found in similar databases.

Now, it may be that the way in which the elements of a database are selected and arranged is not particularly original, but that a substantial investment has been made to carry it out. In this case, the LPI does not leave the investor unprotected, but rather provides in their favor a sui generis right on databases (Art. 133 LPI). Here it is no longer necessary for the database to be original; it is sufficient that a substantial investment has been made in obtaining, verifying, or presenting its contents.

In this regard, it should be remembered that a database compiles pre-existing elements. Therefore, for the investment made to justify the existence of the sui generis right, that investment must have been made precisely for the purpose of obtaining, verifying, or presenting existing content, never for creating content from scratch.

Finally, it should be noted that, despite the existence of two different means of protection, both routes can be combined. In other words, if the selection and arrangement of the elements of a database is original and, at the same time, a significant investment has been made in obtaining, verifying, or presenting its contents, that database will be protected by both copyright and sui generis rights.

But what happens when neither means of protection is applicable? If a website is not a database that can be protected by copyright or sui generis rights, is web scraping always possible? There is another possible barrier: the website's Terms and Conditions (T&C).

The Terms and Conditions

The owner of a website who wants to prevent it from being the subject of web scraping may include in its T&C an express prohibition on doing so, so that if, despite this, a third party were to carry out such activity, they could (potentially) incur the corresponding contractual liability.

However, the aforementioned Supreme Court ruling in the Ryanair vs. Atrápalo case raised some doubts in this regard. The Supreme Court ruled in this case that Ryanair's terms and conditions were not applicable.

The High Court held that the T&C are a contract governing a specific legal situation, in this case the online booking of flights. However, the activity of Atrápalo (web scraping) was completely different from the subject matter of the T&C (the purchase of airline tickets). In fact, web scraping was not only outside the subject matter of the contract, but was expressly prohibited by it. Therefore, the Supreme Court did not consider the T&C applicable and, consequently, did not find any breach of contract.

Thus, in the case discussed, the existence of an express prohibition on web scraping on Ryanair's website proved useless in preventing (or penalizing) this practice. However, we should not rush to conclude that, in light of this ruling, T&C are completely useless in preventing web scraping. Each case must be considered individually.

Let's imagine, for example, a website where, unlike Ryanair's site, no services can be contracted and which simply offers information. It could be, perhaps, a website where you can look up the names, reviews, and contact details of restaurants (including their phone numbers and email addresses), but which does not allow you to make reservations directly through the website itself. In this case, it could not be said that the T&C regulate the contracting of a service; on the contrary, these T&C would be a contract that simply governs navigation on that website. In that sense, it could perhaps be argued that they regulate such navigation by prohibiting web scraping, which means that this practice could perhaps be considered a breach of contract.

This example demonstrates the existence of a wide range of cases in practice with regard to web scraping; however, at present, few court rulings have addressed the issue. Therefore, if you have any doubts in this regard, it is best to consult a professional. At Bamboo.legal, we are specialists in intellectual property and can answer your questions.

[Article written by Luis Mª Benito Cerezo]