The Counter-Terrorism and Security Bill (CTSB) was introduced in the House of Commons on 26 November 2014. According to Theresa May, its purpose is to “address the capabilities gap the authorities face when it comes to communications data”. In particular, the aim is to “require internet providers to retain Internet Protocol – or IP – address data to identify individual users of internet services”.
What is the Counter-Terrorism and Security Bill really bringing to the table [of law enforcement bodies]? Having already been caught off guard by the Data Retention and Investigatory Powers Act (DRIPA) in July 2014, do we really need CTSB?
DRIPA was adopted over the summer to ensure that UK data retention rules would survive the ruling of the Court of Justice of the European Union (CJEU) in the Digital Rights Ireland case, which declared the Data Retention Directive invalid in April 2014.
A few things to mention about DRIPA:
- DRIPA starts by adopting the language of the Regulation of Investigatory Powers Act 2000 (RIPA) [which sets out the rules for access to communications data by law enforcement bodies]. s.2(1) provides that “‘communications data’ has the meaning given by section 21(4) of [RIPA] so far as that meaning applies in relation to telecommunications services and telecommunication systems”. [Note that communications data within the meaning of RIPA cover a great deal and arguably could go beyond network-level metadata to include the payload.]
- However, s.1(1) of DRIPA provides that retention notices can only target “relevant communications data”. The adjective ‘relevant’ is very important, as it narrows the types of data that public telecommunications operators can be required to retain. In other words, the category of relevant communications data is narrower than the category of communications data found in RIPA. The category of ‘relevant communications data’ is expressly defined in the Schedule to the Data Retention (EC Directive) Regulations 2009 [the substance of which is now to be found in the Schedule to the Data Retention Regulations 2014].
Part 3 of this Schedule to the 2009 Regulations identifies 5 types of data relating to Internet access, Internet e-mail or Internet telephony, which comprise:
- Data necessary to trace and identify the source of a communication, i.e. user ID and telephone number allocated to the communication, the name and address of the subscriber or registered user to whom an IP address, telephone number or user ID was allocated
- Data necessary to identify the destination of a communication, i.e. user ID, telephone number, name and address of the subscriber or registered user at the other end [No mention of IP addresses here]
- Data necessary to identify the date, time and duration of a communication, i.e. IP address, user ID and date and time of the log-in and log-off
- Data necessary to identify the type of communication, i.e. the internet service used
- Data necessary to identify users’ communication equipment, i.e. calling telephone number or DSL or other end point of the originator of the communication
Part 3 of the Schedule to the 2009 Regulations does not cover all types of network-level metadata (the traditional 5-tuple of source and destination IP addresses, source and destination port numbers, and protocol). In particular, port numbers (which hint at the applications involved, as certain applications generally use the same port numbers) and the protocol used are missing.
This means that the retained data does not always make it possible to single out one individual. Several reasons explain this “impossibility”. First, by definition IP addresses are assigned to devices, not individuals. Second, IP addresses can be shared: Network Address Translation (NAT) allows many computers to share one public IP address, and those computers may belong to unrelated people.
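To make the point concrete, here is a minimal Python sketch assuming, following the reading of Part 3 above, that of the 5-tuple only the subscriber's (public) source IP address is retained. The `Flow` class, the device labels and the addresses (drawn from the documentation ranges) are all hypothetical; the sketch simply projects two flows from different devices behind one home router onto the retained fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One flow, described by the classic network-level 5-tuple."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    device: str  # ground truth, invisible to an outside observer


# Assumption: of the 5-tuple, only the (public) source IP address is retained.
RETAINED_FIELDS = ("src_ip",)


def retained_view(flow: Flow) -> tuple:
    """Project a flow onto the fields the retention regime keeps."""
    return tuple(getattr(flow, name) for name in RETAINED_FIELDS)


# Hypothetical traffic from two family devices behind one home router,
# already translated to the household's single public address.
flows = [
    Flow("203.0.113.7", "198.51.100.20", 40000, 443, "TCP", device="laptop"),
    Flow("203.0.113.7", "192.0.2.55", 40001, 443, "TCP", device="phone"),
]

views = {retained_view(f) for f in flows}
devices = {f.device for f in flows}
print(len(devices), "devices ->", len(views), "retained record(s)")  # 2 devices -> 1 retained record(s)
```

Both flows collapse into a single retained record: nothing in it distinguishes the laptop from the phone, let alone who was using either.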
Indeed, Tim Chown explains that “in a home network, NAT shares all family devices behind a single IP address, so by using the observed source IP address of a communication alone, an ISP cannot determine the individual – father, son, mother or daughter – whose traffic is observed. Some homes though, such as student residences, or perhaps blocks of flats, have people who may only loosely know each other, if that. And in a wireless ‘hotspot’, such as at a coffee shop or airport, there will be many completely unrelated people sharing one IP address.
Because NAT usually rewrites TCP/UDP source port numbers as well as the internal private source IP address, someone looking to identify an individual needs to combine IP address (source and destination) and port number (source and destination) data. But without any access to the logs of the IP/port mappings of the NAT device (the router for the home network or hotspot), and the mappings of internal private IP addresses to physical device addresses (MAC/Ethernet addresses) this is impossible to do for an external observer (such as the ISP or a content provider)”.
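The mechanism Tim describes can be sketched as a toy port-translating NAT in Python. This is a hypothetical illustration, not how any particular router works: the `ToyNapt` class, its sequential port-allocation policy and the addresses are assumptions. The point it shows is that the outside world only ever sees the shared public address and a rewritten port, while the table mapping them back to internal addresses lives on the NAT device.

```python
import itertools


class ToyNapt:
    """Toy NAT with port translation (NAPT), as in a home router or hotspot.

    Hypothetical sketch: real devices also track protocol, timeouts, etc.
    """

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        # Translation table: public port -> (private IP, private port).
        self.mapping = {}
        self._by_private = {}

    def outbound(self, private_ip: str, private_port: int) -> tuple:
        """Rewrite a private source address and port to the shared public ones."""
        key = (private_ip, private_port)
        if key not in self._by_private:
            public_port = next(self._ports)
            self._by_private[key] = public_port
            self.mapping[public_port] = key
        return self.public_ip, self._by_private[key]


router = ToyNapt("203.0.113.7")
a = router.outbound("192.168.1.10", 51000)  # e.g. the father's laptop
b = router.outbound("192.168.1.11", 51000)  # e.g. the daughter's phone

# An ISP or content provider sees only the public side of each connection.
print(a, b)  # ('203.0.113.7', 40000) ('203.0.113.7', 40001)
# Only whoever holds the router's table can map a public port back inside.
print(router.mapping[b[1]])  # ('192.168.1.11', 51000)
```

Note that both devices happen to use the same private source port; the NAT keeps them apart only by assigning different public ports, which is why port numbers matter for any attempt at resolution.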
What does CTSB do then? It extends the data that retention notices may cover beyond Part 3 of the Schedule to the 2009 Regulations. Retention notices can now also target “relevant internet data”, which means data that:
“(a) relates to an internet access service or an internet communications service,
(b) may be used to identify, or assist in identifying, which internet protocol address, or other identifier, belongs to the sender or recipient of a communication (whether or not a person)”.
This would mean that destination IP addresses, source and destination port numbers, protocols and even MAC (media access control) addresses would also have to be retained [“Data necessary for the resolution of IP addresses could include port numbers or MAC (media access control) addresses”, say the explanatory notes], as long as these data are actually “generated or processed in the United Kingdom by public telecommunications operators in the process of supplying the telecommunications services concerned”. This is where it gets trickier, but it could be argued that, at least for carrier-grade NAT, ISPs generate translation data and could therefore be required to retain the logs, although retaining all of them could be quite expensive. The better interpretation would be that no retention can be required where logs are not generated. However, the text refers to data ‘generated or processed’, and processing arguably does not presuppose that the data is stored, so data that is merely processed could still fall within the scope of a retention notice.
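To give a sense of why retaining all carrier-grade NAT logs could be costly, the arithmetic is simply subscribers × translated sessions per subscriber per day × bytes per log record × days retained. The Python sketch below runs that calculation. Every input is a hypothetical placeholder rather than a figure for any real ISP; only the 365-day period is anchored in the law, reflecting DRIPA's 12-month maximum retention period.

```python
def nat_log_bytes(subscribers: int, sessions_per_sub_per_day: int,
                  bytes_per_record: int, retention_days: int) -> int:
    """Back-of-envelope size of per-session NAT translation logs, in bytes.

    All inputs are hypothetical placeholders, not measured figures.
    """
    return subscribers * sessions_per_sub_per_day * bytes_per_record * retention_days


# Hypothetical ISP: 1 million subscribers, 10,000 translated sessions each per
# day, 100-byte log records, kept for 365 days.
total = nat_log_bytes(1_000_000, 10_000, 100, 365)
print(f"{total / 10**12:.0f} TB")  # 365 TB
```

Because the result scales linearly with each input, halving the record size or the number of logged sessions halves the storage bill; the order of magnitude is the point, not the exact number.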
With this said, ISPs do not always generate translation data themselves. As Tim notes, “in a home network the NAT device (home router) is often bought by the home user, and logs, even if enabled, are not available to the ISP. As we said above, to identify a specific device, the NAT mapping logs, and logs of which physical device was assigned to which internal private IP address, would need to be retained and correlated.
Notably, even if ISPs have access to the NAT mapping logs, the physical identifier (generally the Ethernet/MAC address) can be changed by the user (indeed, an experiment on MAC address privacy was recently run at the 91st IETF meeting in the USA).
In larger commercial networks, e.g. enterprise networks, or university campuses, it may be the case (particularly in academic networks) that NAT is not used, as the organisation has enough globally unique IP addresses for all its users/devices. And where NAT is used, it is probable that some form of network authentication is used, by which individuals can be identified (independently of the use of NAT). Even where a unique IP address is known for a device, e.g. a smart phone on a campus, without appropriate authentication control, it is challenging to identify the user concerned.”
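The correlation step described in the quote can be sketched as two lookups: from an externally observed public port and time to a private IP address (the NAT log), and from that private IP address to a MAC address. Here the second log is assumed to be a DHCP lease log, one common way private addresses are assigned. Both log formats and the `resolve_device` helper are hypothetical and simplified (for instance, NAT mappings have no expiry). Even a successful lookup ends at a device's MAC address, which the user may be able to change.

```python
from datetime import datetime
from typing import Optional

# Hypothetical log formats; real home routers log differently, if at all.
# NAT log: (time the mapping was created, public port, private IP).
nat_log = [
    (datetime(2014, 11, 26, 10, 0), 40000, "192.168.1.10"),
    (datetime(2014, 11, 26, 10, 5), 40001, "192.168.1.11"),
]
# DHCP lease log: (private IP, MAC address, lease start, lease end).
dhcp_leases = [
    ("192.168.1.10", "02:00:00:aa:bb:01",
     datetime(2014, 11, 26, 9), datetime(2014, 11, 26, 21)),
    ("192.168.1.11", "02:00:00:aa:bb:02",
     datetime(2014, 11, 26, 9), datetime(2014, 11, 26, 21)),
]


def resolve_device(when: datetime, public_port: int) -> Optional[str]:
    """Map an externally observed (time, public port) pair to a MAC address.

    Needs both logs: if either lookup fails, the answer is None.
    """
    # Step 1: most recent NAT mapping for this public port at that time.
    mappings = [(t, ip) for t, port, ip in nat_log
                if port == public_port and t <= when]
    if not mappings:
        return None
    _, private_ip = max(mappings)
    # Step 2: which device held that private IP address at that time.
    for ip, mac, start, end in dhcp_leases:
        if ip == private_ip and start <= when < end:
            return mac
    return None


print(resolve_device(datetime(2014, 11, 26, 10, 6), 40001))  # 02:00:00:aa:bb:02
```

Remove either list and the function can only return `None`, which is the situation of an ISP that has no access to the home router's logs.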
The definition of ‘relevant internet data’ comes with a two-fold rider. It does not include data which:
“(i) may be used to identify an internet communications service to which a communication is transmitted through an internet access service for the purpose of obtaining access to, or running, a computer file or computer program, and
(ii) is generated or processed by a public telecommunications operator in the process of supplying the internet access service to the sender of the communication (whether or not a person)”
What does the rider mean? That application-level metadata, and in particular URLs, are excluded from the scope of relevant internet data? [In other words, ISPs could not be asked to carry out deep packet inspection to extract elements of the payload.] That would make sense, as communications data cannot cover the content of communications. [But wait: Theresa May also intends to revive the Draft Communications Data Bill!]
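The difference between network-level metadata and the payload can be illustrated in Python: addresses and ports sit in packet headers, whereas a URL only appears inside the application-layer payload and has to be parsed out of it. This is a minimal sketch assuming an unencrypted HTTP/1.1 request carried in a single packet; the packet dictionary and the `url_from_http_payload` helper are hypothetical. Over HTTPS, the request line and Host header would be encrypted and unreadable this way.

```python
from typing import Optional


def url_from_http_payload(payload: bytes) -> Optional[str]:
    """Recover the URL from a plaintext HTTP/1.1 request in a TCP payload.

    Returns None if the payload does not look like such a request
    (e.g. because it is TLS-encrypted).
    """
    head = payload.split(b"\r\n\r\n", 1)[0]
    try:
        text = head.decode("ascii")
    except UnicodeDecodeError:
        return None
    lines = text.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    target = parts[1]
    host = next((line.split(":", 1)[1].strip() for line in lines[1:]
                 if line.lower().startswith("host:")), None)
    if host is None:
        return None
    return f"http://{host}{target}"


# Hypothetical packet: the header fields are network-level metadata...
packet = {
    "dst_ip": "192.0.2.55",
    "dst_port": 80,
    "payload": b"GET /news/index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
}
# ...but the URL only emerges from inspecting the payload.
print(url_from_http_payload(packet["payload"]))  # http://example.com/news/index.html
```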
What is more, since it is plausible [to say the least] that one of DRIPA’s objectives was to widen the category of service providers that can be required to retain relevant communications data to include over-the-top service providers, these providers could also be asked to retain extended logs, as their servers might generate such logs.
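One way to read the structure of the definition together with its rider is as boolean logic: data must satisfy limbs (a) and (b), and the rider removes it only when (i) and (ii) hold together, since they are joined by ‘and’. The Python sketch below encodes that reading. The field names and the three example classifications are hypothetical illustrations, not an authoritative interpretation of the Bill; the last one shows why, on a literal reading, the rider would not shield data generated by an over-the-top provider, since limb (ii) is confined to the provider of the internet access service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Hypothetical answers to each limb of the definition for one data type."""
    relates_to_internet_service: bool      # limb (a)
    helps_identify_ip_or_identifier: bool  # limb (b)
    identifies_service_accessed: bool      # rider (i)
    generated_by_access_provider: bool     # rider (ii)


def is_relevant_internet_data(item: DataItem) -> bool:
    """(a) and (b) must hold; the rider excludes only if (i) AND (ii) hold."""
    excluded = item.identifies_service_accessed and item.generated_by_access_provider
    return (item.relates_to_internet_service
            and item.helps_identify_ip_or_identifier
            and not excluded)


# Illustrative classifications (assumptions, not an official reading):
nat_ports_at_isp = DataItem(True, True, False, True)  # ports kept to resolve IPs
urls_at_isp = DataItem(True, True, True, True)        # URLs seen by the access ISP
urls_at_ott = DataItem(True, True, True, False)       # same data, over-the-top provider

for name, item in [("nat_ports_at_isp", nat_ports_at_isp),
                   ("urls_at_isp", urls_at_isp),
                   ("urls_at_ott", urls_at_ott)]:
    print(name, is_relevant_internet_data(item))
```

Under these assumptions the ISP's port logs are caught, the ISP's URL logs are excluded by the rider, and the same URL data held by an over-the-top provider is caught again.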
Well, [to be a bit cynical] there may be hope even in despair: the situation could have been even worse. What if the Home Office had decided to get rid of the adjective “relevant”?