Misconfigured Big Data apps are leaking data like sieves
Bank and health info included in more than a petabyte of files left lying around
More than a petabyte of data lies exposed online because of weak default settings and other configuration problems involving enterprise technologies.
Swiss security firm BinaryEdge found that numerous instances of Redis cache and store archives can be accessed without authentication. Data on more than 39,000 MongoDB NoSQL databases is similarly exposed.
More than 118,000 instances of the Memcached general-purpose distributed memory caching system are also exposed to the web and leaking data, according to BinaryEdge. Finally, 8,000-plus Elasticsearch servers responded to probes.
BinaryEdge concludes that it found close to 1,175 terabytes (or 1.1 petabytes) of data exposed online, after looking into just four technologies as part of an online scan.
"Versions installed are quite often old and not updated, which means that, in some cases, not only is data exposed but even servers can be compromised," BinaryEdge says in a blog post on its research. "Companies are still figuring out how to use these technologies and by default they are not secure."
Tiago Henriques, chief exec at BinaryEdge, told El Reg that the problems it identified were almost always due to misconfiguration that exposed systems to the internet, rather than inherent flaws in the software itself. Firewalls and other defensive technologies were not deployed correctly to protect servers, leaving them open for BinaryEdge and others to probe.
"We haven't contacted the developers of these technologies. However, for example in the case of Redis and even some of the other technologies the developers clearly state that these services are not meant to be directly exposed, yet organisations keep ignoring these warnings and do it anyway," Henriques explained.
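To illustrate the kind of misconfiguration Henriques is describing, here is a minimal sketch of a check an administrator might run against their own Redis configuration file. It is not BinaryEdge's tooling: the function name `audit_redis_conf` and the warning wording are made up for this example, and it only looks at two real Redis directives, `bind` and `requirepass`. Whether a given setting is actually reachable from the internet still depends on firewalls and the Redis version in use.

```python
# Hypothetical helper, for illustration only: flags redis.conf settings
# that commonly leave an instance reachable without authentication.
LOOPBACK = {"127.0.0.1", "::1"}


def audit_redis_conf(text):
    """Return a list of warning strings for the given redis.conf contents."""
    bind_addrs = None
    has_password = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        directive = parts[0].lower()
        if directive == "bind":
            # Newer Redis versions allow a leading "-" on an address.
            bind_addrs = [addr.lstrip("-") for addr in parts[1:]]
        elif directive == "requirepass" and len(parts) > 1:
            has_password = True

    warnings = []
    if bind_addrs is None:
        warnings.append("no bind directive: server may listen on all interfaces")
    elif any(addr not in LOOPBACK for addr in bind_addrs):
        warnings.append("bind includes a non-loopback address")
    if not has_password:
        warnings.append("requirepass is not set")
    return warnings
```

A configuration that binds only to `127.0.0.1` and sets a password produces no warnings; one with both directives commented out produces two.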
Misconfigured installations were discovered in a wide range of organisations, from small businesses to large top-500 companies. "Some of these technologies are used as cache servers, so its data is always changing and a multitude of client/company data can be looked at, for example, auth[entication] sessions information," BinaryEdge added.
Pressed for a better idea of the type of data exposed, Henriques offered a more detailed explanation of what the security firm found.
"We obviously didn't look at the actual data at all. However, we did do a small analysis on database/keys names. What we did with each technology was write probes that would request service status, like versions used, and database metadata, like names and sizes," he said.
"There are also a lot of usernames and passwords and also session tokens which could be used to take over active sessions. We also have databases from pharmaceutical companies, hospitals which are named 'patient' and 'doctor-list' and to finish we have banks as well, with databases named 'coin' and 'money'," he added.
In another case, a firm in the robotics industry had exposed a database containing entries with names such as "blueprints", along with the names of projects. BinaryEdge only looked at metadata pertaining to exposed files, rather than their contents. Some of these files might be honeytraps designed to divert hackers, of course, but it's hard to see this applying to anything more than a minority of cases.
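The status-and-metadata probes Henriques describes could look something like the following for Redis. This is a hedged sketch rather than BinaryEdge's actual code: the function names are invented for the example, and it assumes the server's reply to the standard `INFO` command arrives in a single read, which a real tool would not rely on. It should only be pointed at servers you are authorised to test.

```python
import socket


def build_redis_command(*parts):
    """Encode a command as a Redis protocol (RESP) array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n" % len(data))
        out.append(data + b"\r\n")
    return b"".join(out)


def parse_info(payload):
    """Parse the text body of an INFO reply into a dict of key -> value."""
    fields = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, section headers and stray lines
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def probe_redis(host, port=6379, timeout=3.0):
    """Send INFO without authenticating; return parsed fields, or {} if refused."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_redis_command("INFO"))
        reply = sock.recv(65536)  # simplification: assumes one read is enough
    if not reply.startswith(b"$"):
        return {}  # an error reply such as "-NOAUTH ..." starts with "-"
    _header, _, body = reply.partition(b"\r\n")
    return parse_info(body.decode("utf-8", errors="replace"))
```

On an open instance, the returned dictionary would include fields such as `redis_version` and per-database key counts like `db0`, which is metadata of the sort the researchers say they collected without reading the stored values.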
BinaryEdge wants to use its research to build an "automated system that will alert companies of open technologies in their networks" which it intends to develop as a commercial service.
"We are going to warn companies for free when we do this type of publication. Business is important, but so is the safety of this data," Henriques said. "After we give them this warning, we will then offer them an optional service that we are developing called Timelines, where they can use our platform to scan and continuously monitor their perimeters." ®