Researchers in the US have used machine learning techniques to study the GDPR privacy policies of over a thousand representative websites based in the EU. They found that 97% of the sites studied failed to comply with at least one requirement of the European Union’s 2018 regulatory framework, and that they complied least of all with regulatory requirements around the practice of ‘user profiling’.
The paper states:
‘Our results show that even after GDPR went into effect, 97% of websites still fail to comply with at least one requirement of GDPR.’
The study is titled Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning, and comes from three researchers at the University of Virginia at Charlottesville.
The area of least compliance, according to the study, concerned GDPR’s stipulations about user profiling, with the authors stating that only 15.3% of the sites studied were in full compliance with this particular rule.
User profiling (where a person’s interaction with websites is recorded and often used to ‘target’ them in other online contexts, such as advertising) has become one of the hottest controversies in tech since the Cambridge Analytica scandal.
On Tuesday, a key committee of the European Parliament passed the first stage of the new Digital Markets Act (DMA) legislation, which would ban the behavioral targeting of minors, imposing fines of up to 20% of global annual sales for infringing companies.
Though the Act has been received by the media as a direct response to the growing influence of tech giants such as Facebook and Google, the sheer scale of non-compliance represented by the new research suggests that the vast majority of EU companies (including EU-resident offices for American companies trading in Europe) are legally exposed to GDPR fines.
Additionally, Italy has this week imposed the maximum allowable fine of 10 million euros ($11.2 million USD) against Apple and Google for exploiting user profiling, among other infractions.
The sites examined in the new research were sampled from the top 10,000 websites listed in Quantcast, the English-language privacy policies of which were extracted through Yandex searches on UK-based VPNs (in order to ensure that the policies were not geo-blocked).
EU websites have been obliged to provide prescribed privacy policies, covering 18 central requirements (see graph above) since the General Data Protection Regulation (GDPR) act came into full effect in May 2018.
The researchers limited their extraction of privacy policies to the period from August 2018 onward, to allow reasonable time for domains to have published the required policies (a requirement of which they had at least a year's advance notice during GDPR's two-year lead-in period from 2016).
The filtering process produced a privacy corpus of 9,761 policies, from which 1,080 policies were randomly selected by the researchers.
The team employed two legal experts to train four human annotators to label the policies against each of the 18 disclosure requirements mandated by GDPR.
Some of the legalese in the policies covered more than one of the 18 requirements, making it necessary to use a Convolutional Neural Network (CNN) to detect language features associated with each requirement.
An initial attempt to train a model to identify compliance based on language achieved 80.5% success. To improve these results, the researchers applied Active Learning to bolster the model’s performance using less labeled data. By these means it was possible to train the classifier CNN up to an accuracy of 89.2%, with an F1 score of 0.88 (where ‘1’ is complete success).
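The article does not say which query strategy the researchers used to choose new examples for labeling. As an illustration only, the sketch below shows least-confidence sampling, one common form of Active Learning: the model scores the unlabeled items, and the ones it is least sure about are sent to human annotators first. The function name, the probabilities and the two-class setup are all hypothetical.

```python
def least_confident(probabilities, k):
    """Return the indices of the k items the model is least sure about.

    `probabilities` holds one list of class probabilities per unlabeled item.
    This is generic least-confidence sampling, assumed for illustration;
    it is not taken from the paper.
    """
    # An item's confidence is the highest probability the model assigns to any class.
    confidence = [max(p) for p in probabilities]
    # Rank item indices from least to most confident.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical model outputs for five unlabeled policy segments:
# [P(requirement absent), P(requirement present)]
unlabeled_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
    [0.70, 0.30],
]

# The two segments to hand to annotators in the next labeling round.
to_annotate = least_confident(unlabeled_probs, k=2)
print(to_annotate)
```

Labeling the most ambiguous items first is what lets this kind of approach improve a classifier with fewer labeled examples than random selection would need.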
As per standard practice, the final data was split 80/20 between training data and test data (i.e. randomly selected data against which the accuracy of the algorithm is judged). A human-in-the-loop measurement study was added to the architecture in order to evaluate the quality of results.
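For readers unfamiliar with these measures, here is a minimal sketch of an 80/20 split and of how accuracy and an F1 score are computed for a single yes/no label. The toy labels and helper names are invented for illustration; the paper's own evaluation covers 18 requirements and may average its scores differently.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle the items and return (training set, test set) in an 80/20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = requirement disclosed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# 1,080 policy identifiers, matching the sample size reported in the study.
train_set, test_set = split_80_20(range(1080))

# Hypothetical ground-truth labels and model predictions for eight test items.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(len(train_set), len(test_set), accuracy, round(f1, 2))
```

Unlike plain accuracy, F1 penalizes a model that scores well simply by predicting the more common label, which matters when most policies omit a given requirement.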
Besides the results already mentioned, the researchers found that portability – the right under GDPR to transfer or export data held by a company – was almost as poorly served as profiling.
The researchers conclude:
‘[Requirements] such as users’ Right to Portability and providing the contact information of Data Protection Officer (DPO contact) are covered by 15.5% and 16.4% websites, respectively. Other primary requirements, such as users’ right to Lodge Complaint, Withdraw Consent, Right to Object, and Adequacy Decision, are covered by 17-20% websites.’
‘It appears that only 3% of websites fully comply with 18 requirements. These findings indicate that many websites still do not follow the requirements of GDPR.’
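Figures of this kind can be derived from a simple table of per-site labels. The sketch below assumes a hypothetical structure in which each site has one 0/1 flag per requirement; the site names, the three requirement names and the values are invented, and the real study tracks 18 requirements rather than three.

```python
# Hypothetical subset of requirements; the study uses 18.
REQUIREMENTS = ["profiling", "portability", "dpo_contact"]

# One row per website, one 0/1 flag per requirement (1 = disclosed in the policy).
policies = {
    "site-a.example": [1, 1, 1],
    "site-b.example": [0, 1, 0],
    "site-c.example": [0, 0, 1],
    "site-d.example": [0, 0, 0],
}


def coverage_by_requirement(policies, names):
    """Share of websites whose policy covers each requirement."""
    total = len(policies)
    return {
        name: sum(row[i] for row in policies.values()) / total
        for i, name in enumerate(names)
    }


def fully_compliant_share(policies):
    """Share of websites whose policy covers every requirement."""
    return sum(1 for row in policies.values() if all(row)) / len(policies)


coverage = coverage_by_requirement(policies, REQUIREMENTS)
full_share = fully_compliant_share(policies)
print(coverage, full_share)
```

A site counts as fully compliant here only if every flag is set, which is why the overall share can be far lower than the coverage of any single requirement.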
7pm 26/11/2021 – Clarified first graph caption. – MA