Collect, delete, repeat …. From ‘Where I am’ to ‘Who I am’, and back again?
To pick up the thread from my previous posts on the topic of location data here and here, this final piece in the set returns to the first theme I discussed: the legal debate over when location data are deemed personal data, such that their processing is subject to compliance with obligations under the Data Protection Directive (as implemented e.g. by the UK Data Protection Act) and, from 25 May 2018, the GDPR (Regulation 2016/679).
The importance of this debate is considered in this post in the context of the argument made by many companies that, where they collect individual-level location information from which a data subject is identifiable, this is subsequently anonymised (i.e. transformed into a state of non-personal data outside the data protection framework). [Therefore, it is argued by implication, its further processing cannot cause any privacy harm to the individual to whom it originally related.] Anecdotal examples of this argument being used include claims by some mobile operators that, while they do not offer any means for subscribers to their services to opt out of their advertising analytics platform, data collected from users are anonymised before being shared with third parties. As such, they claim that any obligation to obtain data subject consent for data reuse is bypassed through anonymisation. Consequently, key to many privacy-related issues arising from geo-location data is the question of when a level of anonymisation is achieved that – accepting that perfect anonymisation is impossible – satisfies legal standards of, at least, adequate ‘functional anonymisation’ in practice.
While it is easy to claim that location data stripped of identifying details are anonymous, the problem with such claims lies in the fact that – whereas direct identifiability from data may be excluded – indirect identifiability of the data subject may still be possible when the ‘anonymised’ data are combined with other information. For example, Wi-Fi analytics involves the use of information obtained through unique identifiers (media access control (MAC) addresses), which are transmitted by Wi-Fi enabled devices when searching for Wi-Fi networks. (This happens even when the device is switched off, as long as the Wi-Fi feature is switched on.) The MAC address not only identifies the device but can be used to track its location over time. Such information could be personal data if individuals can be identified from it in association with other information held by network operators.
Identification and re-identification risks related to location data are accentuated by the highly unique and often granularly accurate nature of such data. (A US government website reports findings that, in 2011, the GPS accuracy on Android smartphones ranged from five to eight meters: see ‘How Accurate is the GPS on my Smart Phone? (Part 2)’.) Furthermore, location-relevant privacy effects are possible when personal data are processed – see recent US news of new functionality to deliver personalised adverts on billboards that track you when your car drives past, which in turn depends upon the scanning and collection of massive amounts of data from smartphones in cars.
I won’t dawdle to dip into wider discussions around research that alleges that it is relatively straightforward to reverse engineer apparently anonymised data to identify an individual (see e.g. ‘Unique in the Shopping Mall: on the re-identifiability of credit card metadata’ and its claim that “four cell points in a mobile trace are enough to uniquely identify 95 per cent of the individuals in a sample of 1.5 million people”; as well as a recent rebuttal of the findings in this paper, e.g. here). However, it seems reasonable to assume that human movement patterns are largely predictable and non-random, and that there is a strong degree of continuity between geo-location data and single (device owner) users.
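To make the kind of claim quoted above more concrete, here is a minimal Python sketch of how a 'uniqueness' figure for a set of location traces might be estimated. It is an illustration only, not the method of any of the research mentioned: the function name `unicity`, the representation of a trace as a set of (cell, hour) pairs and the toy data are all my own assumptions.

```python
import random


def unicity(traces, k, seed=0):
    """Estimate the share of users singled out by k points of their trace.

    `traces` maps a (hypothetical) user label to a set of (cell, hour)
    observations. For each user with at least k observations, k of them
    are drawn at random; the user counts as unique if no other trace
    contains all k drawn observations.
    """
    rng = random.Random(seed)  # seeded so that repeated runs agree
    eligible = 0
    unique = 0
    for user, trace in traces.items():
        if len(trace) < k:
            continue  # too few observations to draw k points
        eligible += 1
        sample = set(rng.sample(sorted(trace), k))
        # Every trace that is consistent with the drawn observations.
        candidates = [u for u, t in traces.items() if sample <= t]
        if candidates == [user]:
            unique += 1
    return unique / eligible if eligible else 0.0


# Toy data: users "a" and "b" share a morning routine, "c" does not.
toy_traces = {
    "a": {("cell1", 8), ("cell2", 9), ("cell3", 18)},
    "b": {("cell1", 8), ("cell2", 9), ("cell4", 18)},
    "c": {("cell5", 8), ("cell6", 9), ("cell7", 18)},
}

print(unicity(toy_traces, k=3))  # share of users singled out by 3 points
print(unicity(toy_traces, k=1))  # share singled out by a single point
```

No names or other direct identifiers appear anywhere in the toy data, yet a handful of observations can still be enough to isolate one trace from all the others, which is the point at which indirect identifiability becomes a live question.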
Moving to recent UK data protection guidance on this issue, the ICO’s Wi-Fi Analytics Guidance, which relates to compliance with the DPA when processing Wi-Fi geolocation analytics data, recommends that network operators convert MAC addresses into an alternative format that removes any identifiable elements, and delete original data that are no longer required. (Remember here, as alluded to in my previous post, that the ICO considers the concept of identifiability as encompassing the possibility of specific individuals whose offline identities are unknown/unknowable being singled out in relation to data. This can be enabled through device-location tracking over time. See also the “singling out” reference in Recital 26 of the GDPR.) To bolster this recommendation, the ICO suggests that privacy impact assessments be carried out in advance to identify the extent of any privacy risks associated with processing, in particular taking into account the location of data collection devices – along with the timing of collection and the use of sampling methods – to minimise privacy intrusion. Clearly defined data retention periods for aggregated data that were previously individual-level data are also recommended.
Turning, by way of comparison, to guidance published in 2011 by the EU Article 29 Data Protection Working Party (WP), its Opinion on Geo-location services on smart mobile devices warns against assuming that stripping direct identifiers from location data is sufficient, such that the data can then be forgotten about:
“In case it is demonstrably necessary for the developer [of an operating system] and/or controller of a geolocation infrastructure to collect anonymous location history data for the purpose of updating or enhancing its service, extreme care must be taken to avoid making this data (indirectly) identifiable.”
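By way of illustration, one way of converting a MAC address into an 'alternative format' is a keyed hash. The Python sketch below is a minimal example under my own assumptions: the ICO guidance, as described above, does not prescribe this (or any) particular technique, and the function names and sample address are hypothetical. A keyed hash is used rather than a plain hash because the range of possible MAC addresses is small enough for an unkeyed hash to be reversed by simply trying every address.

```python
import hashlib
import hmac
import secrets


def normalise_mac(mac: str) -> bytes:
    """Turn 'AA:BB:CC:DD:EE:FF' or 'aa-bb-cc-dd-ee-ff' into 6 raw bytes."""
    hex_digits = mac.replace(":", "").replace("-", "").lower()
    if len(hex_digits) != 12:
        raise ValueError("a MAC address must contain 12 hex digits")
    return bytes.fromhex(hex_digits)


def pseudonymise_mac(mac: str, key: bytes) -> str:
    """Replace a MAC address with a keyed SHA-256 digest (HMAC).

    Without the key, the digest cannot be recomputed from a candidate
    MAC address; once the key is destroyed, digests produced under it
    can no longer be matched to devices by recomputation.
    """
    return hmac.new(key, normalise_mac(mac), hashlib.sha256).hexdigest()


# Hypothetical usage: one random key per collection period, kept separate
# from the analytics data and destroyed when the period ends.
period_key = secrets.token_bytes(32)
token = pseudonymise_mac("AA:BB:CC:DD:EE:FF", period_key)
print(token)
```

Note that the output is still a stable identifier for as long as the same key is in use, so a device can still be singled out and followed over that period. A sketch like this addresses the storage of raw MAC addresses; it does not by itself answer the legal question of whether the result is anonymous.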
The WP also associates anonymisation policies closely with data deletion policies (reminding data controllers that personal data must not be kept for longer than is necessary for the purpose for which it was originally collected), while distinguishing between the effects of the two. It simultaneously differentiates between the strength of different anonymisation techniques. Therefore, the WP states that providers of geo-location applications or services should implement retention policies which ensure that geo-location data and profiles derived from such data are deleted after a justified period of time. However, it goes on to say that if the developer of the operating system or controller of the geo-location infrastructure allocates a unique id number (pseudonym) in relation to location data, this may still only be stored for a maximum period of 24 hours, for operational purposes. The reasoning is as follows:
“After that period this UDID [Unique Device Identifier] should be further anonymised while taking into account that true anonymisation is increasingly hard to realise and that the combined location data might still lead to identification. Such a UDID should neither be linkable to previous or future UDIDs attributed to the device, nor should it be linkable to any fixed identifier of the user or the telephone (such as a MAC address, IMEI or IMSI number or any other account numbers).”
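As a rough sketch of what the 24-hour limit and the unlinkability requirement could look like in practice, the following Python example keeps only a current identifier and replaces it with a freshly generated random one once it is 24 hours old. The class name, the injectable clock and the identifier length are my own assumptions for illustration; the WP Opinion does not specify an implementation.

```python
import secrets
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24-hour limit mentioned by the WP


class RotatingDeviceId:
    """Hold a random identifier and replace it once it is 24 hours old.

    Each identifier comes from a cryptographic random source rather than
    being derived from the MAC address, IMEI, IMSI or a previous
    identifier, so one identifier cannot be computed from another.
    """

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so that rotation can be tested
        self._value = None
        self._issued_at = None

    def current(self) -> str:
        now = self._clock()
        expired = (
            self._value is None
            or now - self._issued_at >= MAX_AGE_SECONDS
        )
        if expired:
            # The old value is overwritten here and kept nowhere else.
            self._value = secrets.token_hex(16)
            self._issued_at = now
        return self._value


device_id = RotatingDeviceId()
print(device_id.current())
```

Even with rotation of this kind, the WP's own caveat still applies: the location points collected under successive identifiers may be linkable to each other, and to a person, through the pattern of locations itself.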
The WP also implies that anonymisation of certain types of location data is a dynamic process, requiring on-going monitoring and action by data controllers:
“With regard to data about WiFi access points, once the MAC address of a WiFi access point is associated with a new location, based on the continuous observations of owners of smart mobile devices, the previous location must immediately be deleted, to prevent any further use of the data for inappropriate purposes, such as marketing aimed at people that have changed their location.”
It is not surprising that the ICO and WP show such concern on this issue. To revert to the issue of when location data are personal data (and the converse issue of when personal data become non-personal data), as mentioned previously, the acknowledged status of location data as an example of personal data is highlighted in the GDPR. See Article 4(1) and its definition of personal data, which refers to location data explicitly as a factor by reference to which a person can be identified, “directly or indirectly”; this is a new addition over the DPD text.
[It should be noted, though, that Recital 30 of the GDPR appears more nuanced in suggesting that another type of data linked to location tracking – RFID tags – may become personal data, but in less strong terms than Article 4(1): “Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them”. Indeed, Sophie and I question in a forthcoming paper whether it makes sense to treat all types of location data the same way in law, as factors such as the granularity of other information associated with the relevant data are decisive considerations for making that determination. To an extent this debate is also about how risk-averse data protection law should be in making leaps to conclusions that certain types of data are always to be deemed personal – and their processing therefore subject to data protection law – because of the potential risks of harm to individuals typically involved with their use. In particular, the impact on individuals whose data of these types are being processed is predicted to be nearly always considerable because of the data’s close association with them. But surely the nature of the processing to be carried out on the data is equally relevant?]
And, of course, these types of legal discussion around whether and when location data are subject to privacy regulation are not confined to the EU. A US case in point is Yershov v. Gannett Satellite Information Network, Inc., No. 15-1719 (1st Cir. Apr. 29, 2016), involving an individual who downloaded and used an app (by Gannett) that provided access to videos and collected information about him. In particular, this information comprised the GPS coordinates of his device at the time a video was viewed and certain identifiers associated with his device, such as its unique Android ID, along with the title of the video viewed. One of the legal questions addressed was whether that information was personally identifiable information, and the court agreed that it was. I quote directly from the judgment:
“Many types of information other than a name can easily identify a person. Revealing a person’s social security number to the government, for example, plainly identifies the person. Similarly, when a football referee announces a violation by “No. 12 on the offense,” everyone with a game program knows the name of the player who was flagged. Here, the complaint and its reasonable inferences describe what for very many people is a similar type of identification, effectively revealing the name of the video viewer. To use a specific example, imagine Gannett had disclosed that a person viewed 146 videos on a single device at 2 sets of specified GPS coordinates. Given how easy it is to locate a GPS coordinate on a street map, this disclosure would enable most people to identify what are likely the home and work addresses of the viewer (e.g., Judge Bork’s home and the federal courthouse). And, according to the complaint, when Gannett makes such a disclosure to Adobe, it knows that Adobe has the “game program,” so to speak, allowing it to link the GPS address and device identifier information to a certain person by name, address, phone number, and more. While there is certainly a point at which the linkage of information to identity becomes too uncertain, or too dependent on too much yet-to-be-done, or unforeseeable detective work, here the linkage, as plausibly alleged, is both firm and readily foreseeable to Gannett.”
By way of an interesting contrast, however, in terms of how the courts are developing the implications of similar conclusions around personally identifiable information (‘PII’) involving location data, this month brought a decision from the US Supreme Court in Spokeo, Inc. v. Robins. In this case, the Court had to opine on whether an individual (Robins) had suffered a concrete harm when a data broker, Spokeo, posted false information about him on a website. In the context of the issues discussed in this post, Justice Alito said that “it is difficult to imagine how the dissemination of an incorrect zip code, without more, could work any concrete harm” (p.11). Yet, as Daniel Solove (rightly in my mind) points out here: “Congress might have imagined that an incorrect zip code in a credit report is a concrete injury. Perhaps because a lot can be inferred about a person based on where they live. I bet you can probably make some guesses about a person’s wealth if her zip code is 90210. Other zip codes might lead to demographic generalizations about race, religion, or ethnicity. Marketing companies have found it useful to segment by zip code based on generalizations about people who live in certain areas. Maybe you’re trying to get a job where you need to be on call and living nearby, but the wrong zip code puts you very far away. Or maybe you said you lived at a particular address but your zip code doesn’t match the one in your profile due to an error, and you might be viewed as lying”. (To note, home address postcode – as it is called in the UK – is typically used in analytics to link to third party demographics data, such as Experian Mosaic.)
To conclude and draw these different strands together one last time, there appears to be a shift towards recognition by key regulators that new geo-location technological developments can facilitate personal identifiability in relation to data (effectively jumping from ‘where you are’ to ‘who you are’). The most accurate and potentially privacy-invasive of such data are GPS data. [However – and this is something that the regulators perhaps do not emphasise enough – the extent of the privacy harm that might flow from the use of location data can only be assessed fully by considering not just the type of data in the particular circumstances at issue (e.g. a post code or MAC address) and other information associated with the relevant data, but also the processing activity to which the data are subjected and perhaps the associated purpose of the data controller.]
Moreover, while anonymisation is a good security tool, longitudinal location data are undeniably much harder to depersonalise adequately in practice (and according to legal standards) than many other data elements relating to people. But how far should someone go when attempting to render location data non-identifiable (in considering “the means reasonably likely” that others could use to thwart their efforts, citing Recital 26 of the GDPR, including in respect of new technical possibilities and other factors related to the associated data environment)? There is the additional proviso, of course, that the stronger the anonymisation, the lower the re-identification risk (and, presumptively, the better the privacy protection), but the lower the informational value that can be extracted from the dataset. Nor do the risks from a data subject’s point of view come only from commercial operators who might unfairly use their data; they also encompass lax security around the storage of location data that could leave data subjects open to harm from cyber criminals.
Finally, in ending this series of posts on location data and data protection, we should not forget the international picture. Rules under the DPD and the ePrivacy Directive (currently under consultation) can apply to those who collect and process location data related to identifiable EU citizens from outside the EU. This is something that the Facebooks and Googles of this world (as well as the newer developers of software that creates innovative solutions by mashing up and creating value from predictive patterns in our location data) must always bear in mind!
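The trade-off between anonymisation strength and data value can be seen in even the simplest technique, coarsening coordinates. The Python sketch below is a toy illustration under my own assumptions (the function names and sample points are invented, and rounding is only one of many possible generalisation methods): the fewer decimal places are kept, the more points collapse into the same cell and the less precisely any individual movement can be reconstructed. As a rough guide, one degree of latitude is about 111 km, so three decimal places correspond to roughly 100 metres.

```python
def coarsen(point, decimals):
    """Round a (latitude, longitude) pair to the given number of decimals."""
    lat, lon = point
    return (round(lat, decimals), round(lon, decimals))


def distinct_cells(points, decimals):
    """Count how many different cells remain after coarsening."""
    return len({coarsen(p, decimals) for p in points})


# Invented sample points; the first two are only a few metres apart.
sample_points = [
    (51.50735, -0.12776),
    (51.50741, -0.12790),
    (51.50912, -0.12345),
    (51.51800, -0.10000),
]

for decimals in (5, 3, 1):
    print(decimals, distinct_cells(sample_points, decimals))
```

At one decimal place every point in this sample falls into a single cell, which is good for privacy but leaves almost nothing for an analyst to work with; at five decimal places every point remains distinct.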