Updated The “1.4 billion identity leak” that was hyped up before the weekend involved, no, not a database ransacking at Facebook, YouTube, or anything like that.
No, instead, a US-based spam-slinging operation accidentally spilled its treasure chest of email and postal addresses used to deluge people with special offers, marketing crap, and the like.
On Friday, Twitter user Chris Vickery teased world plus dog that he was going public on Monday with news of a massive data breach of 1.37 billion records. And that turned out to be as many as 1.37 billion contact details amassed by River City Media (RCM) – an internet marketing biz apparently based in Jackson, Wyoming, that claims to emit up to a billion emails a day.
The 200GB table includes real names, email addresses, IP addresses, and "often" physical addresses, it is claimed. Vickery said he "stumbled upon a suspicious, yet publicly exposed, collection of files," and discovered the database and documents related to RCM. Among the millions and millions of contact details were chat logs and files exposing the sprawling RCM empire. It turns out the spamming, er, marketing biz has many tentacles and affiliates, mostly acting as web service providers and advertising operations.
"Someone had forgotten to put a password on this repository," Vickery claimed. The data was found in a backup held in a poorly secured rsync-accessible system, it is alleged.
It is understood RCM gathers information from people applying for free gifts and online accounts, requesting credit checks, entering prize giveaways, and such things on the internet, or the information is bought from similar info-slurping outfits. Vickery said he managed to confirm that at least a few records were real, although the addresses tended to be out of date. He added that there are a "ton" of combinations of names, military email addresses, and IP addresses.
"I’m still struggling with the best software solution to handle such a voluminous collection, but I have looked up several people that I know and the entries are accurate," wrote Vickery. "The only saving grace is that some are outdated by a few years and the subject no longer lives at the same location."
It's not yet clear how much of the information in the backup is duplicated or accurate. It would be a considerable blow to people's online privacy if this data turns out to be valid and has fallen into the wrong hands via an insecure rsync system.
RCM did not respond to a request for comment on Vickery's findings. Meanwhile, anti-spam clearing house Spamhaus has blacklisted the organization's entire infrastructure. ®
Updated to add
Vickery has been in touch to clarify a few things. He said that each database row included an email address, a first and last name, the public IP address used to sign up for whatever got the person onto RCM's spam list, and the public IP address used to confirm the email address, which is almost always the signup IP address. A physical address is "included in large sections depending on the source, but not 100 per cent of entries."
He also answered our questions, thus:
The Reg: How many of these records are duplicated?
"Each row is unique, but a row consists of all fields: email address, full name, IP address, and sometimes physical address. If any of the fields are different, it is a unique row. Meaning that if someone was logged while on an IP at their home, but with the same email address, they may be logged a second time while on an IP at work with that email address. Ultimately the question of how many unique email addresses alone will be answered shortly as I hand over a limited copy of the database to Troy Hunt [of Have I Been Pwned] for notification purposes."
For what it's worth, Hunt reckons there are more like 393m unique email addresses in the database.
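To illustrate why the row count and the unique-email count can differ so much, here is a minimal Python sketch. The field names, the sample records, and the lowercasing of addresses before comparison are all assumptions made for illustration; nothing here reflects the actual layout of the RCM database or how Hunt arrived at his figure.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    # Hypothetical field names, loosely following Vickery's description.
    email: str
    full_name: str
    signup_ip: str
    physical_address: Optional[str] = None


# Made-up sample data using reserved documentation domains and IP ranges.
rows = [
    Row("alice@example.com", "Alice Example", "203.0.113.10", "1 Main St"),
    # Same person and email, logged again from a different (work) IP.
    Row("alice@example.com", "Alice Example", "198.51.100.7", "1 Main St"),
    Row("bob@example.com", "Bob Example", "192.0.2.44"),
    # Exact duplicate of the first row: every field matches.
    Row("alice@example.com", "Alice Example", "203.0.113.10", "1 Main St"),
]


def count_unique_rows(rows):
    """Rows are unique when any single field differs."""
    return len(set(rows))


def count_unique_emails(rows):
    """Count distinct email addresses, ignoring case and stray whitespace
    (an assumption about how addresses should be normalized)."""
    return len({row.email.strip().lower() for row in rows})


if __name__ == "__main__":
    print("unique rows:", count_unique_rows(rows))
    print("unique emails:", count_unique_emails(rows))
```

In this toy data set, one email address seen from two IP addresses produces two unique rows but only one unique email, which is the effect Vickery describes.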
How many of the rows are accurate and/or up to date?
"It would take immense resources to get a really accurate number on that. I can tell you that the data spans 2009 to 2017, and the more recent data appears to represent the more verifiable data (as one would expect). But having knowledge of an IP that was used at a given point in time for a given email address or person's name is still a powerful thing to consider from an operational security perspective."
How many of the physical addresses are simply GeoIP lookups?
"I have seen zero examples of the physical addresses simply being GeoIP lookup addresses. It appears that if the spammers were not given real addresses in a particular web form, they simply did not include them in the database. There does not appear to be 'guesswork' done when physical addresses are present."
How much of this is publicly accessible information that just goes to show that pretty much every corp knows, from your public IP and email addresses, your name and where you live?
"Tying an IP address to a name or a name to an email address is, at least here in the US, highly protected. The subpoena process must be carried out and a judge must order such information to be handed over. While true that some very large corporations have likely built their own IP-to-email-to-identity mappings, it is certainly not even close to public record or even available to the general public."
RCM eventually got in touch with us, and had this to say about the leaked info: "It's an opt-in email database in which user email addresses, and IP locations were legally obtained as a requirement by the FTC and Can Spam Act of 2003."