The enormous scale of GCHQ's surveillance was revealed on Friday by newly published Snowden documents. The files note the growth in capabilities enjoyed by the UK government's snoopers since they began intercepting communications in bulk in 2007.
These details were revealed in a series of documents published by The Intercept including one on the "flat data store" codenamed BLACK HOLE, and a document calling itself "the one-stop shop for Cyber Defence Operations legal and policy information."
When the slide on BLACK HOLE was composed in March 2009, the flat data store held more than 1.1 trillion events which GCHQ had collected since August 2007.
The store weighed in at 217TB when uncompressed, the largest share of which was HTTP data (41 per cent), which alongside web search (19 per cent) and SMTP data (12 per cent) accounted for almost three quarters of all that it held.
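The "almost three quarters" figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted from the slide (217TB and the three percentages). The per-category sizes it derives are approximations worked out from those rounded percentages and do not appear in the documents themselves.

```python
# Figures quoted from the BLACK HOLE slide (March 2009).
TOTAL_TB = 217
SHARES_PCT = {"HTTP": 41, "web search": 19, "SMTP": 12}


def combined_share(shares):
    """Sum the percentage shares of the listed categories."""
    return sum(shares.values())


def approx_size_tb(total_tb, pct):
    """Approximate size of one category, derived from a rounded percentage."""
    return total_tb * pct / 100


if __name__ == "__main__":
    print(f"Combined share: {combined_share(SHARES_PCT)} per cent")
    for name, pct in SHARES_PCT.items():
        # These sizes are estimates, not figures from the documents.
        print(f"{name}: roughly {approx_size_tb(TOTAL_TB, pct):.0f}TB")
```

The three categories sum to 72 per cent, which is where "almost three quarters" comes from.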
Additional data covered instant messenger records, hacking logs for Computer Network Exploitation (CNE) operations, and the use of "Anonymisers."
The collection began after Section 32 of the Terrorism Act 2006 had amended RIPA to extend interception warrants.
GCHQ has since "developed new population scale analytics for multi-petabyte cluster," which allows "population scale target discovery."
A vision document for 2013 set out the agency's aim to have created "the world's biggest SIGINT engine to run cyber operations and to enable IA, Effects and SIGINT ... [as well as] to perform CNE exfiltration, eAD, beaconry, and geo-location."
Eric King (@e3i5) tweeted on September 25, 2015: "There are 7 billion people in the world. GCHQ has 18 billion targeting identifiers for them." https://t.co/ttTIhq7K91
BLACK HOLE's recorded events contain only metadata, according to the "Events" page from the GCWiki, although it notes that "sometimes there are grey areas between events and content" citing how the subject of an email is generally transmitted in the header portion of the SMTP communication, despite being considered content.
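The grey area around email subjects is easy to see in the message format itself. The sketch below parses a made-up message with Python's standard email module. The addresses and text are invented for illustration, and it says nothing about how GCHQ's own systems handle mail. It only shows that the subject line travels among the headers while the body is carried separately.

```python
from email import message_from_string

# A made-up message; the addresses use reserved example domains.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Subject: Lunch on Friday?\n"
    "\n"
    "See you at noon.\n"
)


def split_message(raw):
    """Return (headers, body) as parsed by the standard library."""
    msg = message_from_string(raw)
    return dict(msg.items()), msg.get_payload()


headers, body = split_message(RAW)
print("Header fields:", sorted(headers))
print("Subject (from the headers):", headers["Subject"])
print("Body:", body.strip())
```

Anything that records header fields wholesale will pick up the subject along with the sender and recipient, even though it reads like content.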
Slides showing GCHQ's Content-Metadata Matrix suggest that the spooks' view of what counts as metadata extends to passwords, buddylists, and folders used to organize emails.
The majority of GCHQ's operational data is acquired through the agency's own activities, whether interception, computer network exploitation (CNE, or aggressive hacking), or JTRIG operations.
One new document also discloses a number of tools used to analyze the data stored in BLACK HOLE, which are complementary and provide an insight into the depth and breadth of GCHQ's surveillance practices. These tools all come under a portion of GCHQ's analysis project called BLAZING SADDLES.
It is worth noting that the word "target" here does not mean a person specified for investigation by a warrant, but merely a hypothetical identity which has had identifiers allocated to it.
AUTOASSOC provides information as to which Target Detection Identities (TDIs) have been seen at the same time and from the same IP addresses as other TDIs – allowing the spooks to enlarge the number of identifiers tied to a particular target.
HRMap provides information about host-referrer relationships, examining how internauts traverse the web, i.e., what route they have taken to a particular site, and where they proceed to.
INFINITE MONKEYS is a tool which targets vBulletin software, to reveal the forum accounts of targets and additionally to target particular forum users.
KARMA POLICE, which we have reported on, allows the spooks to know which websites a target visited, and when and where those visits occurred – all of which is additionally tied to IPs.
MARBLED GECKO provides information about the use of Google Earth and Google Maps, which combined with MUTANT BROTH allows the noseys to see who is looking at particular areas of the Earth.
MEMORY HOLE provides information on web searches made on engines such as Google's. It provides information on when, where, and from which IP addresses particular searches were made.
MUTANT BROTH is a tool to sift through BLACK HOLE data by TDIs, such as cookies. It allows the spooks to create a profile of any given target's online activities.
SAMUEL PEPYS is described as "a near real-time Internet diarisation tool. It enables powerful IP stream analysis/profiling by fusing all available traffic types in one place. It contains both unselected events and content."
SOCIAL ANIMAL provides information about how targets interact with other targets, and with files/pictures/video on the internet.
SOCIAL ANTHROPOID is a "converged comms events database" which enables the spooks to see who their targets have communicated with "via phone, internet, or using converged channels (e.g., sending emails from a phone or making voice calls over the internet)." This project is set to subsume SOCIAL ANIMAL.
GOLDEN AXE, which shares its name with a classic side-scrolling Sega game, is primarily for International Mobile Equipment Identity (IMEI) defeats – allowing the spooks to figure out whether particular mobile devices uniquely identify targets. The Register understands that some handsets may share identical IMEIs, as in India.
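The co-occurrence idea attributed to AUTOASSOC above, identifiers seen at the same time and from the same IP address, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not a reconstruction of GCHQ's tool: the function name, the 60-second window, the event layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_occurring(events, window_seconds=60):
    """Count pairs of identifiers seen from the same IP within a time window.

    events: iterable of (timestamp_seconds, ip, identifier) tuples.
    Returns {(id_a, id_b): count}, with each pair sorted alphabetically.
    The window size is an arbitrary assumption for illustration.
    """
    by_ip = defaultdict(list)
    for ts, ip, ident in events:
        by_ip[ip].append((ts, ident))

    pairs = defaultdict(int)
    for sightings in by_ip.values():
        for (ts_a, id_a), (ts_b, id_b) in combinations(sightings, 2):
            if id_a != id_b and abs(ts_b - ts_a) <= window_seconds:
                pairs[tuple(sorted((id_a, id_b)))] += 1
    return dict(pairs)


# Invented sample data; the IPs come from ranges reserved for documentation.
SAMPLE = [
    (0, "192.0.2.1", "cookie-A"),
    (10, "192.0.2.1", "login-B"),
    (500, "192.0.2.1", "cookie-C"),
    (5, "198.51.100.7", "cookie-A"),
    (20, "198.51.100.7", "login-B"),
]

print(co_occurring(SAMPLE))
```

In the sample, "cookie-A" and "login-B" appear together on two different IPs within the window, while "cookie-C" turns up too late to be paired with either. Repeated pairings of that kind are what would let an analyst tie additional identifiers to a single "target".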
These tools were being used in a Joint Collaboration Environment titled Innov8, which was testing large-scale analytics using both GCHQ and NSA data.
A sample search was provided, based on automatic TDIs, which showed visits to pornography site YouPorn, as well as Reuters, Facebook, Yahoo, and Google.
The Intercept noted that MUTANT BROTH's ability to identify cookies was integral to GCHQ's attack on Belgian telco Belgacom.
Cookies associated with the IPs revealed the Google, Yahoo, and LinkedIn accounts of three Belgacom engineers, whose computers were then targeted by the agency and infected with malware.
The hack, codenamed "Operation Socialist," gained access to Belgacom's Core GRX routers so the spooks could run man-in-the-middle attacks against targets roaming with smartphones.