
The lady (or man) vanishes: The thorny issue of GDPR coding

The Devil is in the enhanced data model

Europe's General Data Protection Regulation (GDPR) is now less than a year away, coming into effect in May 2018, and any legal or compliance department worth its salt should already have been making waves about what it means for your organisation.

As a technology pro, you know that these waves will lap up against the side of your boat. You're probably going to have to recode something somewhere. The question is what, why, and how bad is it going to be?

GDPR gives customers ("data subjects" in privacy lingo) new rights that experts say will create thorny technical problems for companies. Under the new regulations, subjects will be able to ask companies to delete their information, simply by withdrawing consent. That might sound simple on the surface, but there are typically a lot of moving parts underneath.

Mark Skilton, professor of practice in information systems, management and innovation at the University of Warwick's Business School, has spent decades grappling with those moving parts. He handled business analysis for Sky's new media business, and also dealt with information systems at companies ranging from Unigate to CSC.

"A lot of the large multinationals will have multiple databases either on premises or with multiple cloud providers," he says, adding that this is why those in the know are tearing their hair out. "They have to go and delete all the data where instances of you are held today, or historically."

That data isn't always instantly available, or even visible, because of the legacy and different data fiefdoms that have sprung up in most IT departments.

The ICO, which recently issued guidance on tackling GDPR, lists data mapping as one of 12 preparation steps. You can do it manually by talking to subject matter experts in the organisation, or you can bring in discovery tools from firms like Exonar or others to assist. You'll probably need to do both.

Companies won't just need to delete such data on demand. They must also export it into a machine-readable form upon request, and provide the file to customers who want to move to another service provider.

This raises other issues, warn experts. "What happens if a bunch of disgruntled ex-customers get together and all issue a regulatory request to export their data?" asks Christy Haragan, principal sales engineer at NoSQL database maker MarkLogic. Under GDPR, companies must service those requests within a month. They may be able to find and export that data manually, but that approach won't work at scale. You have to automate it, at least in part.
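Automating that means, at minimum, tracking each portability request against the statutory clock and producing a machine-readable export programmatically. A rough Python sketch, where the field names and JSON layout are illustrative rather than any mandated format:

```python
import json
from datetime import datetime, timedelta, timezone

def queue_export_request(customer_id: str, received: datetime) -> dict:
    """Track a portability request against GDPR's one-month response window."""
    return {
        "customer_id": customer_id,
        "received": received.isoformat(),
        "deadline": (received + timedelta(days=30)).isoformat(),
        "status": "queued",
    }

def export_bundle(records: list[dict]) -> str:
    """Bundle everything held on a subject into a machine-readable export."""
    return json.dumps({"records": records}, indent=2)
```

In a real system the queue would feed a batch job pulling from every system of record, but the shape of the problem is the same: a deadline, a status, and a structured export.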

Olivier Van Hoof, pre-sales manager at data governance firm Collibra, argues for the use of a master system that can act as a single point of control for various transactional systems. Companies will already have something like this in some regulated sectors such as finance, which already has to contend with know your customer (KYC) requirements, he suggests.

"You're already creating a master environment where you master customer data all in one area," he says. "Rather than delete it in 15 systems, I will flag it for deletion in the main database and drive it from there."
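Van Hoof's flag-and-propagate approach might look something like the following sketch, in which the downstream system names and record structure are purely illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical downstream systems holding copies of customer data
DOWNSTREAM_SYSTEMS = ["crm", "billing", "marketing", "analytics"]

@dataclass
class MasterRecord:
    customer_id: str
    flagged_for_deletion: bool = False
    pending_deletions: list = field(default_factory=list)

def flag_for_deletion(record: MasterRecord) -> MasterRecord:
    """Mark the master record; downstream deletions are driven from here."""
    record.flagged_for_deletion = True
    record.pending_deletions = [
        {"system": s, "requested_at": datetime.now(timezone.utc).isoformat()}
        for s in DOWNSTREAM_SYSTEMS
    ]
    return record

def confirm_deletion(record: MasterRecord, system: str) -> None:
    """Clear a system from the pending list once it reports the purge done."""
    record.pending_deletions = [
        d for d in record.pending_deletions if d["system"] != system
    ]
```

The point is that the master record, not the 15 transactional systems, owns the deletion state, and nothing is considered done until every pending entry has been confirmed.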

One system to bind them all

Building a "single source of truth" that indexes multiple transactional systems is likely to be a key requirement for many businesses under GDPR. But what should that system look like?

Dave Levy, associate partner at IT firm Citihub Consulting, posits two broad approaches. Using a single consolidated database that serves as the single source of information on data subjects does away with the whole gnarly problem of storing multiple records in different systems.

This would also help with another technology measure, which might otherwise throw an extra spanner into the works: pseudonymisation. Although not mandatory under GDPR, the ICO highlights it as one way to demonstrate compliance with the regulation's accountability principle. It replaces personally identifying information such as names and addresses with tokens that are referenced somewhere else.

"A consolidated solution will have the huge advantage that pseudonymisation becomes easy and cheap and that none of the transactional systems know who they're dealing with," Levy says.

The downside is that this approach would involve some significant changes to transactional systems so that they could support it. "It may make subject access and restriction of processing requests harder to perform," he adds.

The alternative is to use a data lake: a less structured pool of data that takes input from transactional systems.

"A data lake taking feed from the CRM/HRMS and other transactional systems may be less disruptive," says Levy, adding that the ultimate choice will depend in part on the size of the application portfolio.

Documenting consent and legal purpose

Whichever approach a company chooses, data lake or consolidated database, it will have to support not just the right to erasure and data portability, but several other processes mandated by GDPR. Data subjects can object to automated profiling using algorithms without human intervention, and can demand that the processing of their data is restricted while such complaints are sorted out, for example.

Under GDPR, these rights apply to data records based on the legal purpose of those data records, explains Levy.

"For any piece of personal information, you need to know your legal purpose," he says. "That's part of the enhanced data model, and not all rights are available to all pieces of information."

The legal purpose for storing and using a particular piece of information feeds into another major change to the rules under GDPR: consent.

Under the existing regime, companies can gather consent in a single agreement that covers many uses of the customer's data. When GDPR hits next year, consent will have to be more granular, explains William Long, data protection, privacy and information security partner at international law firm Sidley Austin.

This carries two significant ramifications. Firstly, companies will have to gather consent for specific uses of their data, which Long says could lead to a checkmark system for different kinds of consent.

A customer might check the boxes for your company to use their data for support and order fulfilment, but not for marketing or behaviour analysis. Software would have to store all of those choices, and many systems won't be set up for that today.

"The draft guidance that came out from UK ICO indicated that for consent to be valid you have to show evidence of when that consent was given," he says. "So you need a timestamp showing when the individual clicked 'I accept'." That entails even more code and data architecture tweaking.
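Put together, a granular consent record needs one flag per purpose plus the timestamp of the click. A minimal sketch, with purpose names that are assumptions for illustration:

```python
from datetime import datetime, timezone

# Illustrative purposes; a real system would draw these from a policy catalogue
CONSENT_PURPOSES = ("support", "order_fulfilment", "marketing", "behaviour_analysis")

def record_consent(customer_id: str, granted: set[str]) -> dict:
    """Store one explicit flag per purpose plus when the consent was given."""
    unknown = granted - set(CONSENT_PURPOSES)
    if unknown:
        raise ValueError(f"unknown purposes: {unknown}")
    return {
        "customer_id": customer_id,
        "consents": {p: (p in granted) for p in CONSENT_PURPOSES},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Note that refusals are stored explicitly rather than inferred from absence, so an auditor can distinguish "said no" from "was never asked".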

Don't overlook the need to store the specific text used to obtain consent from a customer at a given time, warns MarkLogic's Haragan. If the terms and conditions change on your site, are you sure that you'll be able to follow an audit trail showing exactly how you obtained consent, and what language you used?

This is important because GDPR reverses the burden of proof, placing the onus on the company to prove that it complied, rather than making the customer prove that it didn't, says Ashley Winton, a partner in the corporate practice at legal firm Paul Hastings and a former microelectronics engineer.

"That can be tricky. So an audit trail showing what version of the privacy policy is applicable to me is important to level the playing field," he says.

Haragan suggests using another central resource, but this time specifically addressing consent. This "consent hub" would be a go-to resource when fulfilling audit requirements, she says.

"The hub provides a central place to store 'consent entities' that attach timestamps for all granted consents, and any associated documentation that can be analysed to understand exactly what each individual has agreed to," she says.
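One way to build such a consent entity, sketched in Python with hypothetical field names: alongside the timestamp, it keeps the exact wording the customer saw, and hashing that wording gives auditors a cheap integrity check.

```python
import hashlib
from datetime import datetime, timezone

def consent_entity(customer_id: str, purpose: str, terms_text: str) -> dict:
    """One document per grant: when it happened and exactly what was agreed."""
    return {
        "customer_id": customer_id,
        "purpose": purpose,
        "granted_at": datetime.now(timezone.utc).isoformat(),
        "terms_text": terms_text,
        # The hash lets an auditor verify the stored wording is unaltered
        "terms_sha256": hashlib.sha256(terms_text.encode()).hexdigest(),
    }
```

Because each grant is a self-contained document, the hub keeps working even as the terms and conditions change: old entities retain the old wording.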

MarkLogic is a NoSQL company, so it's probably not very surprising that she should recommend moving away from relational tables for this stuff. You'll find yourself storing HTML snippets, PDFs, emails and other unstructured content in there, she argues, emphasising the need for a flexible schema. That's true; relational tables can be brittle and difficult to update compared to NoSQL schemas based on documents or key-value pairs.

Documenting the legal purpose and associated rights for data records will probably involve tagging data appropriately, say experts. That may not be as much of a concern when gathering future data using systems that have been configured with such tagging in mind. But what about all of your existing data?
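In code, tagging might reduce to something like the sketch below. The mapping of rights to legal bases here is illustrative only, and certainly not legal advice; the real mapping needs a lawyer.

```python
# Illustrative mapping: which subject rights attach to which legal basis.
# The actual availability of rights under GDPR depends on circumstances.
RIGHTS_BY_BASIS = {
    "consent": {"access", "erasure", "portability"},
    "contract": {"access", "portability"},
    "legal_obligation": {"access"},
}

def tag_record(record: dict, legal_basis: str) -> dict:
    """Stamp a data record with the legal basis for holding it."""
    record["legal_basis"] = legal_basis
    return record

def available_rights(record: dict) -> set[str]:
    """Derive which subject rights apply to a record from its tag."""
    return RIGHTS_BY_BASIS.get(record.get("legal_basis"), set())
```

Once records carry the tag, servicing a rights request becomes a query rather than an investigation, which is the whole point of Levy's "enhanced data model".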

"The elephant in the room is legacy data, which could be in a data lake, and trying to figure out how that data was collected might be difficult," says Winton.

There isn't much time left to do this stuff. Companies must put their master systems in place to index and tag all this data, and must then adapt transactional systems to support the execution of these new data subject rights. "This will be significant and should have been started last year," confirms Levy.

On the upside, if you do it properly, you'll be able to query systems to find out which parts of the business own which data records, whose servers they're running in, and how an individual's specific data is reflected across all of your systems.

That level of visibility puts you in good standing for other projects that might benefit from that information, ranging from customer relationship management to support. Let's hope so, because there has to be an upside to all this heavy lifting somewhere... right? ®
