When the General Data Protection Regulation (EU) 2016/679 (GDPR) rolled out last year, it required companies and individuals to obtain the explicit consent of individuals residing in the European Union before storing their information on any system. Such information includes personally identifiable data entered via forms, cookies, etc. The law also requires businesses to delete that information if an individual asks to opt out of sharing it.
What about web scraping then?
All this creates confusion and difficulty for companies that collate and curate various kinds of data off the internet. Web scraping, the process of downloading large amounts of data from various internet sources, comes directly under the purview of GDPR. Unfortunately, no business can survive without engaging in web scraping; marketing wouldn't exist without some form of it. Thankfully, if you stick to the GDPR protocol, you can have the web scraping cake and eat it too.
Let us take a look at how you can safely engage in web scraping and comply with GDPR at the same time.
1. Confirm if your web scraping project collects personally identifiable information
If you are collecting or using personally identifiable information such as names, residential addresses, dates of birth, email addresses, credit card details, health records or video/audio recordings, you will need the explicit permission of the individual to whom such information belongs. Though the law specifies that it is applicable only to those who reside in the European Union, it applies to practically everyone in the world because anybody could "reside" in the European Union, even if it is for just a few hours during transit!
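As a rough first pass, a scraped record can be screened for fields that look like personal data before anything is stored. The sketch below is only illustrative: the field names in PII_FIELD_NAMES and the find_pii_fields helper are hypothetical, the email pattern is deliberately simple, and a real project would need a far more thorough review than a keyword and regex check.

```python
import re

# Hypothetical field names that commonly hold personal data; adapt to your own schema.
PII_FIELD_NAMES = {"name", "address", "date_of_birth", "email", "credit_card", "health_record"}

# Deliberately simple pattern: a first-pass filter, not a complete email validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii_fields(record):
    """Return the sorted names of fields that look like personal data."""
    flagged = set()
    for field, value in record.items():
        if field.lower() in PII_FIELD_NAMES:
            # The field name itself suggests personal data.
            flagged.add(field)
        elif isinstance(value, str) and EMAIL_PATTERN.search(value):
            # Free text that happens to contain an email address.
            flagged.add(field)
    return sorted(flagged)


scraped = {
    "title": "Blue widget",
    "price": "19.99",
    "contact": "Write to jane@example.com",
    "name": "Jane",
}
print(find_pii_fields(scraped))
```

Any record for which this returns a non-empty list would then be routed to a consent check, or dropped, rather than stored as-is.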
2. Do not collect sensitive information
Make sure not to collect data that can be used to discriminate politically or socially against certain demographic groups. This includes information related to one's race, sexual orientation, political ideology, genetic and biometric data, and other potentially sensitive details. If such data is compromised, it could result in different kinds of harm to the individual, not to mention potential lawsuits filed against you. No ethical business needs this kind of information to run successful marketing campaigns anyway.
3. Minimize data collection and use it only for the stated purpose
Make sure that you collect only as much data as your business requirements call for. Reducing the quantity of data you collect makes it easier to manage, and easier to delete when the need arises. Do not use the data that you collect for purposes other than the stated one. If you have collected email addresses for the purpose of marketing, do not use them for another purpose without explicit consent (which needs to be periodically sought anyway).
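One way to enforce this in a scraping pipeline is to tie every stored field to a declared purpose and discard the rest before saving. The following is a minimal sketch under assumed names: the ALLOWED_FIELDS mapping, the purpose labels, and the minimize helper are all hypothetical and would need to mirror your own documented purposes.

```python
# Hypothetical mapping from a declared purpose to the only fields that purpose needs.
ALLOWED_FIELDS = {
    "price_monitoring": {"product_id", "title", "price"},
    "email_marketing": {"email", "consent_timestamp"},
}


def minimize(record, purpose):
    """Keep only the fields allowed for the stated purpose."""
    # A KeyError for an undeclared purpose is intentional: no purpose, no storage.
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


scraped = {"product_id": 1, "title": "Blue widget", "price": 19.99, "seller_name": "Jane"}
print(minimize(scraped, "price_monitoring"))
```

Because the seller's name is not needed for price monitoring, it never reaches storage, which also leaves less to manage and delete later.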
4. Implement a data retention policy
Get your data retention policies right, as individuals can request that their data be taken off your system. They may also request a copy of the data you have stored, so you will need to be able to respond to Data Subject Access Requests (DSARs) as well. Update your policies and contracts regularly to reflect the changes taking place within the GDPR realm.
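A retention policy usually comes down to three operations on stored records: purging what has aged out, erasing one person's data on request, and exporting one person's data on request. The sketch below assumes an in-memory list of records with hypothetical subject_id and collected_at fields and an arbitrary 365-day retention period; GDPR does not prescribe that figure, and a real system would run the equivalent queries against its database.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention period for illustration only; set this from your own policy.
RETENTION = timedelta(days=365)


def purge_expired(records, now):
    """Drop records collected before the retention cutoff."""
    cutoff = now - RETENTION
    return [r for r in records if r["collected_at"] >= cutoff]


def erase_subject(records, subject_id):
    """Remove every record belonging to one individual (deletion request)."""
    return [r for r in records if r["subject_id"] != subject_id]


def export_subject(records, subject_id):
    """Collect every record belonging to one individual (copy request)."""
    return [r for r in records if r["subject_id"] == subject_id]


now = datetime(2019, 6, 1, tzinfo=timezone.utc)
records = [
    {"subject_id": "a1", "email": "a@example.com", "collected_at": datetime(2017, 1, 1, tzinfo=timezone.utc)},
    {"subject_id": "b2", "email": "b@example.com", "collected_at": datetime(2019, 3, 1, tzinfo=timezone.utc)},
]
records = purge_expired(records, now)      # the 2017 record is past retention
print(export_subject(records, "b2"))       # copy of everything held on b2
records = erase_subject(records, "b2")     # b2 asked to be removed
print(records)
```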
5. Make vendor compliance mandatory
There have been instances where a client is compliant with GDPR but the offshore vendor isn't, resulting in lawsuits being filed against both the vendor and the client. Confirm that the external vendor you are working with to scrape data off the internet complies with GDPR as well. Even if you comply with GDPR yourself, you will still violate its terms if your external partner isn't compliant. Include GDPR compliance in your outsourcing contract before you delegate web scraping projects to a third-party vendor.
6. Revisit older web scraping projects and conduct an audit
Conduct an audit of your older web scraping projects, and make sure that they comply with GDPR. If they don't, immediately delete the data you have stored from them. Many businesses take the data stored in their records very lightly. Just because you engaged in web scraping years before GDPR rolled out doesn't mean that the data is exempt from GDPR compliance. You can get into serious trouble unless you audit all the data you have collected over the years and seek permission all over again.
GDPR makes web scraping more relevant
It can seem daunting and confusing to engage in web scraping with GDPR lurking in the background. However, when you work with an agency that strictly follows the GDPR protocol while rendering web scraping services, you will discover that you are left with far more useful information than you would have if you didn't comply with GDPR. After all, if you do not have permission to use personally identifiable information of certain individuals, they probably weren't interested in doing business with your organization anyway. It wouldn't be an exaggeration to say that GDPR helps you keep your scraped data updated and relevant.