
Building big data? Are you building a security headache too?

'I didn't mean to' isn't good enough

Companies have long understood how to classify certain data sets as sensitive or non-sensitive for privacy purposes. Information tagging, for example, is a well-understood technique here. But they don't always understand that seemingly non-sensitive data sets can become sensitive when combined.
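To make the combination risk concrete, here is a minimal Python sketch of a so-called linkage attack. The records, field names and the `link` helper are hypothetical, invented for illustration. Neither data set pairs a name with a sensitive attribute, yet joining them on shared quasi-identifiers (postcode, birth year, sex) reattaches names to purchases.

```python
# Hypothetical sample data: two sets that each look harmless on their own.
# Set A: a "de-identified" purchase log with no names, only quasi-identifiers.
purchases = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "sex": "F", "item": "insulin"},
    {"postcode": "XY9 8ZW", "birth_year": 1982, "sex": "M", "item": "running shoes"},
]

# Set B: a marketing list with names but nothing obviously sensitive.
mailing_list = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode": "XY9 8ZW", "birth_year": 1982, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link(records_a, records_b, keys=QUASI_IDENTIFIERS):
    """Join two record lists on shared quasi-identifier fields.

    Returns merged records only where the key combination matches exactly
    one record in records_b, i.e. where the join uniquely re-identifies someone.
    """
    index = {}
    for rec in records_b:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)

    linked = []
    for rec in records_a:
        matches = index.get(tuple(rec[k] for k in keys), [])
        if len(matches) == 1:
            linked.append({**matches[0], **rec})  # name + sensitive purchase
    return linked


for row in link(purchases, mailing_list):
    print(row["name"], "->", row["item"])
```

Neither list would be tagged sensitive on its own; the sensitivity only appears after the join, which is why classification has to account for what a data set may later be combined with.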

“The way to combat this is to understand and define the desired business outcome before you collect or process the data,” says Elmellas.

“Know the lifecycle. By doing this, you can use the same traditional classification techniques but vary them as required throughout the project.”

Understanding the context in which information is used is crucial to extracting what you need from the huge piles of data you have collected.

The spooks up in Maryland and at GCHQ know this, and context matters just as much to private enterprises that want to manipulate big data securely.

These challenges are difficult enough when dealing with big data inside your own domain. What about when you are shipping it out to third parties?

Don’t think you won’t. Logistics chains are a prime example, says Clive Longbottom, founder of analyst firm Quocirca.

Information may move from the retailer to the OEM and the fulfilment company, for example. This data enables these stakeholders to deliver a product efficiently and also lets the customer track progress through a self-service portal. But companies must make sure that the information is being used sensibly at every stage.

“The information can (and should) be hashed with an identifier, rather than being stored with the personal identifiable data [PID] as it moves along the chain,” says Longbottom.

Weak links in the chain

Any PID is stored in the company’s database in hashed, encrypted form, he says, and the reference is then matched with a certificate to create a public token. That token is used if any stakeholder needs to see the PID related to the customer order – and the customer has to agree first.

“This also makes the sending of data outside a legal jurisdiction easier,” says Longbottom.

"India can work against the data to their hearts’ content, but they do not see anything that has PID in it. It is only the work packages that get returned and then, through use of the hashed security, add value to the data stored.”
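The flow Longbottom describes can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not his or any vendor's implementation: it swaps in a keyed hash (HMAC-SHA256 from Python's standard library) for the hashing-plus-certificate scheme he mentions, and the names `tokenize`, `reveal`, `pid_vault` and `process_offshore` are hypothetical. The retailer keeps the key and the token-to-PID mapping in-house; partners only ever handle the token.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret key held only by the data owner (e.g. the retailer).
# In practice it would live in a key-management system, not in code.
TOKEN_KEY = secrets.token_bytes(32)

# Hypothetical in-house "vault" mapping tokens back to personal data (PID).
pid_vault: dict[str, dict] = {}


def tokenize(pid: dict) -> str:
    """Replace a customer's PID with a keyed-hash token and store the mapping."""
    canonical = f"{pid['name']}|{pid['email']}".encode("utf-8")
    token = hmac.new(TOKEN_KEY, canonical, hashlib.sha256).hexdigest()
    pid_vault[token] = pid
    return token


def reveal(token: str, customer_consented: bool) -> dict:
    """Resolve a token to PID, but only once the customer has agreed."""
    if not customer_consented:
        raise PermissionError("customer has not agreed to disclose PID")
    return pid_vault[token]


# The retailer strips PID before shipping an order down the chain.
order = {"name": "Alice Example", "email": "alice@example.com", "sku": "KETTLE-01", "qty": 2}
token = tokenize({"name": order["name"], "email": order["email"]})
work_package = {"customer_token": token, "sku": order["sku"], "qty": order["qty"]}


def process_offshore(pkg: dict) -> dict:
    """Hypothetical partner step: adds value without ever seeing PID."""
    return {**pkg, "estimated_days": 3}


result = process_offshore(work_package)

# Back in-house, the returned work is re-joined to the customer via the token.
customer = reveal(result["customer_token"], customer_consented=True)
print(customer["email"], result["estimated_days"])  # prints: alice@example.com 3
```

Because the HMAC is deterministic for a given key, the same customer always maps to the same token, so returned work packages can be re-joined. Without the key, a partner cannot recompute tokens from guessed names or email addresses, which an unkeyed hash would allow.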

So, you understand the subtle interplay of data classification, business process and risk? Good for you. But sometimes, a mere software bug can send things awry.

One of the biggest data sets of all is Facebook’s. It just ran foul of privacy rules after it accidentally divulged the personal information of six million loyal users.

When someone uses Facebook's Download Your Information feature, the social network spits out all of that user's data, including the phone numbers and email addresses of any contacts they have uploaded to its address book.

The bug caused the download to include additional contact details for those same people that had been uploaded by other users, enabling, say, an abusive ex-partner to obtain a person's new telephone number and email address.

Researchers found that uploading one public email address for an individual could harvest a dozen extra pieces of data about that person. The individual doesn’t even have to be a Facebook user.

Clearly, big data security leaks can come from many places. Misclassification of data is one, as is the ability to combine information from multiple sources. Simple software bugs are a third.

Companies collecting vast buckets of data about individuals may not intend to use it maliciously, but that doesn't mean others won't.

By all means, build massive data sets and use them to find answers to questions you don’t even understand yet. But make sure you get your legal and computational ducks in a row before you start down this road. ®
