
The GDPR and the biggest mess of all: why accurate legal definitions really matter….


Issued last week, here is what seems to be the final version of the General Data Protection Regulation (the GDPR)! This 6 April 2016 version, likely to be adopted by the European Parliament this week, is now in the kiosks! HIP HIP HOORAY I hear you thinking, either ironically, because more than 4 years of legislative process was definitely too long, or in (relatively) good faith, because we finally have a regulation (and not a directive anymore) on these important matters!

The GDPR is clearly a very ambitious piece of legislation. It attempts to set the floor for all sectors in the field of data protection and this for the whole EU (i.e. 28 or 27 Member States depending upon one’s own anticipation of the UK situation) and as such one can only appreciate the effort!

With this said, the adoption of the GDPR will not suddenly solve all data protection law issues, far from it. If the final version of the GDPR shows us one thing that deserves careful thought, it is the topic of definitions, and in particular legal definitions attempting to capture technological practices. To demonstrate this, I will take one example and one example only, as this example really goes to the core of what data protection law is about. [For those who want to read a more scientific demonstration, with fewer exclamation marks, we also have an article on this, which will soon be in the kiosks as well!] This will hopefully also show that the future European Data Protection Board (for more details on this new body, see here) will have to be less dogmatic and more pragmatic.

Remember the soon-to-be-defunct Data Protection Directive (DPD)? In its Recital 26 it excludes “data rendered anonymous” from the scope of data protection law. This is what Recital 26 says:

“Whereas the principles of protection must apply to any information concerning an identified or identifiable person; whereas, to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person; whereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable;”

Reading this recital, it thus seems that the DPD embodies a risk-based approach to delineating the category of personal data, the test of which is that of “the means likely reasonably to be used” by the data controller and third parties. If the approach is risk-based, and this is important, this should mean that the anonymisation process does not have to be irreversible for the data to be considered, legally, as rendered anonymous and therefore outside the scope of data protection law.

What does the GDPR say in relation to anonymisation? Similar things. Recital 26 of the GDPR is wordier but goes in the same direction:

“To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes;”

The GDPR, just like the DPD, embodies a risk-based approach, the test of which is that of the “means reasonably likely to be used” by the data controller and third parties in determining whether a particular person may be deemed identifiable from the data.

The GDPR, nevertheless, goes beyond the DPD in that it introduces a new definition in its Article 4: the definition of pseudonymisation.

“’pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;”

[Note that so-called pseudonymisation processes implemented for targeted advertising purposes are not pseudonymisation processes within the meaning of the GDPR, for the very rich nature of the information collected for targeted advertising purposes makes it very likely that a natural person is identifiable from the information collected!].

Moreover, Recital 26 expands upon its definition in confirming that:

“Data which has undergone pseudonymisation, which could be attributed to a natural person by the use of additional information, should be considered as information on an identifiable natural person”.
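To make the definition more concrete, here is a minimal sketch in Python of one common pseudonymisation technique, a keyed hash (HMAC-SHA256). This is an illustration under my own assumptions, not a method prescribed by the GDPR: the records, the key and the function names `pseudonymise` and `reattribute` are all hypothetical. The key plays the role of the “additional information” that must be kept separately.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical records; names and fields are illustrative only.
records = [
    {"name": "Alice Martin", "diagnosis": "asthma"},
    {"name": "Bob Dupont", "diagnosis": "diabetes"},
]

# The "additional information": in practice it would be stored separately,
# under technical and organisational measures (assumed here, not implemented).
secret_key = b"hypothetical-key-stored-separately"

# The pseudonymised dataset no longer contains the name itself.
pseudonymised = [
    {"pseudonym": pseudonymise(r["name"], secret_key), "diagnosis": r["diagnosis"]}
    for r in records
]


def reattribute(name: str, key: bytes, dataset: list) -> list:
    """Find the rows belonging to a named person, which requires the key."""
    token = pseudonymise(name, key)
    return [row for row in dataset if row["pseudonym"] == token]
```

Whoever holds the key can call `reattribute("Alice Martin", secret_key, pseudonymised)` and get her row back, whereas the same call with a different key finds nothing. This is the sense in which such data “could be attributed to a natural person by the use of additional information”.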

While anonymisation processes can lead to data being brought outside the scope of data protection law, pseudonymisation processes would not have the same upshot. Why?

The ICO, in its Code of Practice of 2012 (Anonymisation: Managing Data Protection Risk Code of Practice), states that effective anonymisation through pseudonymisation is not impossible (page 21).

Why is the guidance in this area such a mess, with seemingly contradictory statements on whether anonymisation efforts can transform personal data into a form that takes its (further) processing outside the scope of the data protection rules? One way to make sense of it all is to go back to the EU Article 29 Data Protection Working Party (WP)’s opinion on Anonymisation Techniques of 2014. The WP wrote in 2014 that:

“it is critical to understand that when a data controller does not delete the original (identifiable) data at event-level, and the data controller hands over part of this dataset (for example after removal or masking of identifiable data), the resulting dataset is still personal data”.

As Francis Aldhouse (previously the UK’s Deputy Information Commissioner) explained in his keynote speech at a workshop dedicated to anonymisation practices in March at the University of Southampton, such a statement is clearly problematic. It is basically implying that a dataset can never be considered anonymised (i.e. as non-personal data, the processing of which would not trigger the application of data protection rules) as long as the initial raw dataset has not been destroyed.

Reading the GDPR in the light of the WP’s 2014 opinion, it would thus seem that as long as the initial raw dataset has not been destroyed, the subsequent de-identified dataset can only be called a pseudonymised dataset. [An alternative reading would be to say that pseudonymised data does not necessarily amount to personal data, as the word “could” is being used in the GDPR phrase “which could be attributed to a natural person by the use of additional information”. However, on that interpretation, it makes less sense to say, then, that pseudonymisation is a means for data controllers to comply with their data protection obligations. Moreover, this interpretation seems at odds with a straightforward reading of Recital 26 of the GDPR.]

So what about researchers then? Would they have to comply with all data subject rights? There is an exception for the right to information (Article 14(5)(b) of the GDPR), but things are less obvious in relation to the right of access, for example (although Article 11 should be of some help… But does singling out also mean identifying?).

Does this really make sense? If one adopts a risk-based approach, I am afraid the answer is no.

Besides, what about aggregated datasets? Technically, it would seem that aggregation is not exactly the same as pseudonymisation, while both processes in principle could be used to attempt to anonymise datasets. Is the EU legislator now merging aggregation and pseudonymisation? What a mess!
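The technical difference between the two can be illustrated with a short Python sketch. It rests on my own assumptions: the rows, the field names, the function names `aggregate` and `small_cells`, and the threshold of 2 are all hypothetical. A pseudonymised dataset still holds one row per individual, whereas an aggregated dataset only holds counts per group.

```python
from collections import Counter

# Hypothetical event-level data that has already been pseudonymised:
# there is still one row per individual.
pseudonymised_rows = [
    {"pseudonym": "a1f3", "age_band": "30-39", "diagnosis": "asthma"},
    {"pseudonym": "9c2e", "age_band": "30-39", "diagnosis": "asthma"},
    {"pseudonym": "77b0", "age_band": "30-39", "diagnosis": "diabetes"},
    {"pseudonym": "e45d", "age_band": "40-49", "diagnosis": "asthma"},
]


def aggregate(rows):
    """Count individuals per (age_band, diagnosis) group, dropping per-person rows."""
    return Counter((row["age_band"], row["diagnosis"]) for row in rows)


def small_cells(counts, threshold=2):
    """Return the groups with fewer than `threshold` individuals (assumed threshold)."""
    return sorted(group for group, n in counts.items() if n < threshold)


aggregated = aggregate(pseudonymised_rows)
risky = small_cells(aggregated)
```

Here `aggregated` no longer contains any pseudonym, only group counts. The `risky` list is a reminder that a group of one may still single a person out, so aggregation does not automatically mean anonymisation either, which is exactly why the two processes deserve to be assessed separately.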

Surely, assuming it is possible to adopt a risk-based approach, and the work of the UK Anonymisation Network seems to show that it should be possible with a bit of good will and further research, it would make sense to leave the legal door open to the possibility of adequate de-identification, as long as it is clear that once the data is rendered anonymous this does not mean that the initial data controller can forget about it all! In fact the whole point of a risk-based approach is to say that to maintain the characterisation of “data rendered anonymous” it is crucial to continuously monitor the data environment!

What is more, the position adopted by the EU legislator in the GDPR has been justified as an application of a precautionary principle. But what is Recital 29 really implying? It is written that:

“In order to create incentives to apply pseudonymisation when processing personal data, measures of pseudonymisation should, whilst allowing general analysis, be possible within the same controller when that controller has taken technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented, and that additional information for attributing the personal data to a specific data subject is kept separately.”

In other words, through the means of this recital, the EU legislator seems to be legitimising big data analytics practices on data debris (i.e. data already collected), at least where they take place within the same controller, as long as pseudonymisation processes are in place. At the same time, when compared with the DPD, the GDPR has made one safeguard disappear [see my previous post on this here]: the further processing of personal data for historical, statistical or scientific purposes is deemed compatible under the DPD (Recital 29 and Article 6(b)) with the initial processing, as long as the further processing does not aim at taking measures or decisions against the data subjects [although obviously Article 89 of the GDPR brings in new safeguards. See also Article 21(6)].

What does this story show? That legal definitions can be a minefield!

Besides, it also shows that once again the WP, or the European Data Protection Board as it will soon become, will need to have a strong back, as it has huge responsibilities given the relatively limited role played by the Court of Justice of the European Union (CJEU) in the interpretation of data protection law in practice.

Two last calls:

  • To Article 29 WP: the ball is in your court, it’s never too late to distinguish aggregation from pseudonymisation!
  • To the European Commission: the e-privacy Directive also contains the words “anonymous” and “pseudonymous”! When reviewing it, if you do touch these terms, could you please make sure that you remain consistent with the GDPR and the new terminology that is being developed around it!


Sophie Stalla-Bourdillon

