A few words on ‘Internet Connection Records’

There are many things in the new draft Investigatory Powers Bill that need very careful attention – some of which may be cautiously welcomed, some of which need to be taken with a distinct pinch of salt. The issues surrounding ‘Bulk Powers’ (which we’re not supposed to call ‘mass surveillance’) and ‘Equipment Interference’ (which I presume we’re not supposed to call hacking) will be examined in great detail, and quite rightly so because they’re of critical importance, and clearly recognised as such. The issue of ‘Internet Connection Records’, on the other hand, does not yet seem to have been given the attention it deserves – but I am sure that will change, because the collection of them has massive significance and represents a major change in surveillance, for all that they are described in the introduction to the bill as just ‘restoring capabilities that have been lost as a result of changes in the way people communicate’. They don’t restore capabilities: they provide hitherto unprecedented intrusion into people’s lives.

Internet Connection Records (ICRs)

The description of ICRs in the bill leaves quite a lot to be desired. In the introductory explanation they are set out as:

[Image: screenshot of the bill’s description of ICRs]

In accordance with the bill, these ICRs will be captured and stored for a year by the communications providers. This means, essentially, that a rolling record of a year of everyone’s browsing history will be stored. Not, it seems, beyond the top level of website (so that you’ve visited ‘www.bbc.co.uk’ but not each individual page within that website, nor what you have ‘done’ on that website). The significance of this data is very much underplayed: it is presented as just a way of checking that so-and-so accessed Facebook at a particular time, in a similar way to saying ‘so-and-so called the following number’ on the phone, hence the supposed ‘restoring of capabilities’ referred to. That, however, misunderstands both the significance of the data and the way that we use the technology.
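
To make the ‘top level of website’ point concrete, here is a minimal sketch of what reducing a page visit to such a record might look like. The record format and the function name `to_connection_record` are assumptions for illustration only; the bill does not specify a format, and this is not how any provider is known to implement it.

```python
from datetime import datetime
from urllib.parse import urlsplit


def to_connection_record(url: str, when: datetime) -> dict:
    """Reduce a visited URL to a hypothetical 'top level' record.

    Only the hostname and the time are kept; the path, query string
    and fragment (the individual page) are discarded.
    """
    host = urlsplit(url).hostname
    return {"time": when.isoformat(), "host": host}


# Hypothetical example visit (illustrative values only).
record = to_connection_record(
    "https://www.bbc.co.uk/news/uk-politics?page=2",
    datetime(2015, 11, 5, 9, 33),
)
print(record)
```

Even in this reduced form, each record still says which site was visited and exactly when.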

The latter part is perhaps the most easily missed. Our ‘online life’ isn’t just about what is traditionally called ‘communications’, and isn’t the equivalent of what we used to do with our old, landline phones. For most people, it is almost impossible to find an aspect of their life that does not have an online element. We don’t just talk to our friends online, or just do our professional work online, we do almost everything online. We bank online. We shop online. We research online. We find relationships online. We listen to music and watch TV and movies online. We plan our holidays online. Monitoring the websites we visit isn’t like having an itemised telephone bill (an analogy that more than one person used yesterday); it’s like following a person around as they visit the shops (both window shopping and the real thing), go to the pub, go to the cinema, turn on their radio, go to the park, visit the travel agent, look at books in the library and so forth.

That, however, is only part of the problem. The other aspect is perhaps even more important – the inferences that can be gleaned from analysis of the ICRs. There are two different sides to this:

  1. The first is the ‘logical’ analysis of web browsing data: the kind of inferences that can be made by looking at the kinds of sites visited, the times that they are visited and so forth. This can be very direct, like using knowledge that a person visited sites connected with a particular religion to ‘guess’ their own religion, or that they visited sites connected with a particular health condition to ‘guess’ that they might be concerned about their own health. It can also be less direct but similarly logical – men who spend a lot of time watching Top Gear might be thought to have sympathy for Jeremy Clarkson’s views on ‘political correctness’ or be skeptical about climate change, or people who visit a lot of ‘news’ websites might be particularly interested in politics. People who visit pizza delivery websites regularly might be ‘guessed’ to have unhealthy lifestyles. The number of possibilities is huge – and they relate not just to the actual sites visited, but to the time and pattern of those visits. Browse a great deal in the middle of the night, and that says something very different to browsing only during working hours.
  2. The second is perhaps even more concerning: the ‘big data’ analysis of ICRs. One of the critical aspects of ‘big data’ is that it picks up traits and establishes correlations rather than seeking to find logical connections for things. This has been studied by academics, with some surprising findings – the story from one such study that ‘liking’ (in Facebook terms) curly fries correlates to higher intelligence makes the point. This kind of data – and it really is ‘big data’ – allows far more inferences to be drawn than are immediately obvious. Moreover, it is a kind of analysis that is being worked on, and worked on extensively, by some of the biggest, most powerful and most technologically advanced corporations in the world. What Google, Facebook and others develop in order to identify target audiences for advertising or markets for products is just as suitable for identifying people with particular political views.
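
The first, ‘logical’ kind of inference above can be sketched in a few lines. Everything here is an assumption for illustration: the site names, the category map `SITE_CATEGORIES`, the records and the choice of midnight to 5am as ‘night’ are all invented, and real profiling systems are far more elaborate.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping of hosts to categories (invented for illustration).
SITE_CATEGORIES = {
    "health-forum.example": "health",
    "pizza-delivery.example": "food delivery",
    "faith-community.example": "religion",
}


def category_counts(records):
    """Count visits per category from (time, host) records."""
    return Counter(SITE_CATEGORIES.get(host, "unknown") for _, host in records)


def night_share(records, start_hour=0, end_hour=5):
    """Fraction of visits made between start_hour and end_hour."""
    if not records:
        return 0.0
    night = sum(1 for when, _ in records if start_hour <= when.hour < end_hour)
    return night / len(records)


# Hypothetical records at the 'top level of website' granularity.
records = [
    (datetime(2015, 11, 2, 1, 15), "health-forum.example"),
    (datetime(2015, 11, 2, 2, 40), "health-forum.example"),
    (datetime(2015, 11, 3, 19, 5), "pizza-delivery.example"),
    (datetime(2015, 11, 4, 3, 30), "health-forum.example"),
]

print(category_counts(records))
print(night_share(records))
```

Four records are enough to ‘guess’ at a health concern and at disturbed nights, without any individual page ever being recorded.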

The problems with these inferences should not be underestimated. If they’re accurate, they represent major intrusions into people’s privacy – sometimes they allow the analysts to predict behaviour better than the people themselves can predict it – whilst if they’re inaccurate they can mean that terrible decisions are made about people. When this is confined to advertising the impact is rarely that significant (though it can be, as the non-apocryphal stories of revealed pregnancies and sexuality have shown) but if decisions are made on a similar basis by law enforcement or security services they could be hideous.

So we should not underplay the importance of Internet Connection Records. They matter a great deal – and gathering them is a major step in surveillance. What is more, asking communications service providers to gather and hold them adds a whole raft of new vulnerabilities. The TalkTalk hack – and TalkTalk are precisely the kind of company that would be forced to hold this kind of data – should make the vulnerability to hacking crystal clear. This kind of data is perfect for identity theft, scamming, blackmail (Ashley Madison style) and far more crimes, and the servers holding it might as well have big red signs on them saying ‘hack me please’. The chance of individual misuse of the information should also not be downplayed – in the initial draft of the Bill it looks as though access to the data will not be via warrant, but through the ‘Designated Person’. The past has shown how individuals can misuse systems for personal reasons – this kind of data can be very tempting.

The chance of ‘function creep’ is perhaps even more concerning. Where systems are built and data gathered for one purpose, it is hard to resist using it for another, seemingly obvious and sensible reason. That’s how RIPA ended up being used for dog fouling, fly-tipping and school catchment enforcement when it was intended for terrorism and serious crime. If you build it, it will be used, and not just for the original purpose.

None of this is to say that Internet Connection Records should definitely not be collected – but that the ‘mature debate’ that has been called for on surveillance should be about what they can really be used for, and the depth of the intrusion into people’s lives that they really represent. The bar should be set very high here, and the case to gather and hold this information needs to be a very good one indeed. The arguments put forward so far do not seem strong enough to me – perhaps more will be provided in the process through which the bill is scrutinised over the next few months. If not, this is a part of the bill that should be opposed very strongly.

18 thoughts on “A few words on ‘Internet Connection Records’”

  1. Yesterday in the House of Commons, the Home Secretary said,

    “Law enforcement agencies would not be able to make a request for the purpose of determining, for example, whether someone had visited a mental health website, a medical website or even a news website. They would only be able to make a request for the purpose of determining whether someone had accessed a communications website or an illegal website, or to resolve an internet protocol address where it is necessary and proportionate to do so in the course of a specific investigation.”

    This seems designed to fit with obliging CSPs to provide communications cleartext on receipt of a warrant. The police need to know where to send the warrant.

    If the Bill matches the letter of the Home Secretary’s statement, then there could be a privacy-preserving way of implementing this, which would be to limit ICR requests to the form “Did IP address ::x connect to communications website Y between dates 1 and 2, and if so, when.” The CSP’s response can then be limited to “No” or “Yes, at these dates and times.”
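
A minimal sketch of that limited request form, assuming the CSP holds records as (IP address, host, time) tuples. The function name `narrow_icr_query`, the record layout and the sample data are hypothetical; nothing here comes from the Bill itself.

```python
from datetime import datetime


def narrow_icr_query(records, ip, site, start, end):
    """Answer only: did `ip` connect to `site` between start and end, and when?

    Nothing about any other site the address connected to is returned.
    """
    times = sorted(
        when
        for rec_ip, host, when in records
        if rec_ip == ip and host == site and start <= when <= end
    )
    return {"connected": bool(times), "times": times}


# Hypothetical retained records (documentation-range addresses, invented hosts).
records = [
    ("2001:db8::1", "chat.example", datetime(2015, 11, 1, 10, 0)),
    ("2001:db8::1", "health-forum.example", datetime(2015, 11, 1, 11, 0)),
    ("2001:db8::2", "chat.example", datetime(2015, 11, 2, 9, 0)),
]

answer = narrow_icr_query(
    records,
    "2001:db8::1",
    "chat.example",
    datetime(2015, 10, 31),
    datetime(2015, 11, 3),
)
print(answer)
```

The visit to the health site never leaves the CSP, because the response is built only from rows matching the named site.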

    It would be even better if the ICR retention obligation only extended to communications websites, rather than all websites, though that might not scale well: it would require someone to maintain a central list of communications websites and distribute it to CSPs (cf. the IWF watchlist).

    In reality, I suspect this will work differently. I haven’t had time to check, but I expect the Bill allows that the method by which the police can check if an IP address contacted a communications website is to request their whole history for a given date range and then inspect it “for the purposes of determining”. The side-effect is that they get to see if that IP has “visited a mental health website, a medical website or even a news website” anyway, even though the Home Secretary has said that’s outside the purposes for which such requests could be authorised.

    • I don’t think the controls on access built into the bill are that bad, but the reality may be very different, and I think the function creep is almost inevitable, either with or without legal changes to allow it. It will be interesting to see how the debate goes on this – and if the risks come out into the open.

  2. RIPA was never really just intended for “terrorism and serious crime” – but it was always intended for purposes that were considered necessary and proportionate. And there are plenty of people who thought that it was proportionate to use it for enforcing school catchment rules – especially those parents who feared that other parents were cheating & ensuring that their sprogs were being allowed to attend local schools at the expense of other parents who lived closer to that school!

    • Never ‘really’ intended in that way, perhaps, but sold to the public to a great extent in those terms. That matters, doesn’t it? How things are sold to the public should matter a lot!

  3. I think this is a very good article. But I see some logical contradictions.
    They exist in the big picture and are reflected in matters of detail.
    The big picture is the porosity of the Internet and that it seems that “bulk capture” is underway.
    Paul, you seem to be saying that surveillance agencies will only do what is inside the law.
    But you outline the benefits of Big Data capture or, perhaps, I should say the temptations and the dangers.
    I find it difficult to believe that this is not underway and that this Bill is not a front to legitimise some low level aspects of it, such as police activity, where results would have to be brought before the courts.
    That, of course, requires a legal framework which this Bill would provide.
    You say
    “Not, it seems, beyond the top level of website (so that you’ve visited ‘www.bbc.co.uk’ but not each individual page within that website, nor what you have ‘done’ on that website).”
    Isn’t this because an ISP’s logs will retain IP but not page request information which is contained in the packet, not the header to the request?
    This answer on Quora describes the mechanism: https://www.quora.com/What-are-the-series-of-steps-that-happen-when-an-URL-is-requested-from-the-address-field-of-a-browser
    If the Bill were otherwise it would be requesting ISPs to retain information that they do not have, apart from in the transient packet. So it would be requiring them to introduce additional infrastructure and also a procedure that would, in itself, be intrusive or spying (or whatever euphemism is preferred). The very activity that this legislation seeks to circumscribe by a legal framework.
    ISPs would prefer not to: what would they do with all this data? Moreover there are further difficulties. This is the mechanics of requests once on a particular website. Since I assume we are not talking about packet sniffing by the ISP, what is available to the ISP is only the fact of repeated requests to a particular website, but nothing of the particular page visited on that website. (Not very useful if we were talking about Facebook.)
    However there are yet further issues, those of encryption:-
    On http://stackoverflow.com/questions/187655/are-https-headers-encrypted we find
    1. “the Server Name Identification (SNI) standard means that the hostname may not be encrypted if you’re using TLS. Also, whether you’re using SNI or not, the TCP and IP headers are never encrypted. (If they were, your packets would not be routable.) – mehaase Jun 7 ’12 at 18:59”
    Which seems to contradict this representative answer:
    2. “HTTP version 1.1 added a special HTTP method, CONNECT – intended to create the SSL tunnel, including the necessary protocol handshake and cryptographic setup.
    The regular requests thereafter all get sent wrapped in the SSL tunnel, headers and body inclusive.”

    I think this means that 2. is the general state of affairs but that when a connection is set up 1. pertains.
    There is more to this and, importantly, anyone wishing to hide their communications on the Internet has various means to do so, with various levels of sophistication.
    If we’re thinking about terrorists and their ideology is anti education then we may be in luck, but a moment’s thought will indicate these are not our real problems in this area.
    So here the issue is whether clumsy government legislation won’t just be driving more activity under the radar?
    But then ignorant and government which is far from straightforward make it very hard for us to assess.

    You also point out with a descriptive scenario, that this surveillance would be like following someone around monitoring their every action.
    But without this individual page information how would that be?

    If the definition in the quoted para 44. is all that is provided for what an ICR is then we are really totally in the dark.

    Paul, you then go on to explore the dangers of Big Data and data mining.
    You imply that the information in the ICR will be enough for this sort of activity. But will it?
    I believe (and I have reasons to so believe as I know that GCHQ work with Big Data mining applications) that GCHQ do “bulk collect” anyway, as I have mentioned.
    Aside from that the temptation is too great, and the competition with friend and foe may seem too imperative to resist the issues you raise are very legitimate.
    In particular there is the issue of the extent of such activity, and here extent no longer means how much data is collected, how or from whom. It means what sort of inferences can be made and how (this is the crucial bit) those inferences are deployed.
    It is here that a new technology can so easily cross the line into totalitarian control, the notion of which a free, or at least somewhat free, society should always be able to debate.

    • There are a few errors above:-
      ‘But then ignorant and government which … ‘ should read ‘But then an ignorant government which … ‘
      ‘If the definition in the quoted para 44. is all that .. ‘ should read ‘If the quoted para 44. definition is all that … ‘
      ‘ … imperative to resist the issues … ‘ should read ‘ … imperative to resist, the issues … ‘

      I’m sure there are others. There is no edit button at this stage!

    • Thanks for all that – lots to think about. Just one point now: I certainly don’t believe that the surveillance agencies will only do what is within the law. No-one who has studied even a cursory amount of the history could really believe that. This doesn’t, however, mean that the law has no significance, or that we should ignore it. It matters politically, for example, and it matters in terms of setting norms. It also matters to many of those within the surveillance agencies and we should never assume that there isn’t any internal conflict within those agencies: Edward Snowden was an insider once. Those internal conflicts can be shaped by many factors, one of which is the law….

  4. Reblogged this on Disturbing the Universe and commented:
    Clear and cogent thinking on what some of the implications of the IP Bill going through parliament really mean.

    It’s telling that MPs seem to have swallowed the ‘itemised phone bill’ analogy. It suggests that they either don’t realise what happens in an online world or are so out of touch they don’t care.

  5. Paul – what is your opinion on Subject Access Rights under S7 DPA to this data? It seems to me there is a lot of precedent to say the device IMEI, MAC, Phone Number, IP are all forms of PII.
