The lady (or man) vanishes: The thorny issue of GDPR coding

The Devil is in the enhanced data model

Europe's General Data Protection Regulation (GDPR) comes into effect in May 2018, now less than a year away, and any legal or compliance department worth its salt should already have been making waves about what it means for your organisation.

As a technology pro, you know that these waves will lap up against the side of your boat. You're probably going to have to recode something somewhere. The question is what, why, and how bad is it going to be?

GDPR gives customers ("data subjects" in privacy lingo) new rights that experts say will create thorny technical problems for companies. Under the new regulations, subjects will be able to ask companies to delete their information, simply by withdrawing consent. That might sound simple on the surface, but there are typically a lot of moving parts underneath.

Mark Skilton, professor of practice in information systems, management and innovation at the University of Warwick's Business School, has spent decades grappling with those moving parts. He handled business analysis for Sky's new media business, and also dealt with information systems at companies ranging from Unigate to CSC.

"A lot of the large multinationals will have multiple databases either on premises or with multiple cloud providers," he says, adding that this is why those in the know are tearing their hair out. "They have to go and delete all the data where instances of you are held today, or historically."

That data isn't always instantly available, or even visible, because of legacy systems and the data fiefdoms that have sprung up in most IT departments.

The ICO, which recently issued guidance on tackling GDPR, lists data mapping as one of 12 preparation steps. You can do it manually by talking to subject matter experts in the organisation, or you can bring in discovery tools from firms like Exonar or others to assist. You'll probably need to do both.
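An automated discovery pass can complement those conversations with subject matter experts. The sketch below is a deliberately naive Python illustration, not a stand-in for a commercial discovery tool: it scans hypothetical system schemas for column names that suggest personal data. The system names, column names, and patterns are all illustrative assumptions.

```python
import re

# Naive data-mapping pass: flag columns whose names hint at personal data.
# The pattern list is illustrative and would need tuning for real schemas.
PII_PATTERN = re.compile(r"name|email|phone|address|dob|birth", re.IGNORECASE)

def map_personal_data(schemas):
    """schemas: {system_name: [column_name, ...]}
    Returns, per system, the columns that look like personal data."""
    return {
        system: [col for col in columns if PII_PATTERN.search(col)]
        for system, columns in schemas.items()
    }
```

A pass like this only surfaces candidates; a human still has to confirm what each flagged field actually holds, which is why both approaches are needed.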

Companies won't just need to delete such data on demand. They must also export it into a machine-readable form upon request, and provide the file to customers who want to move to another service provider.

This raises other issues, experts warn. "What happens if a bunch of disgruntled ex-customers get together and all issue a regulatory request to export their data?" asks Christy Haragan, principal sales engineer at NoSQL database maker MarkLogic. Under GDPR, companies must service those requests within one month. They may be able to find and export that data manually, but that approach won't work at scale. You have to automate it, at least in part.
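In outline, an automated portability export walks every system that holds the subject's data and serialises the results into a machine-readable format such as JSON. This is a minimal Python sketch; the per-system fetchers and their return values are hypothetical placeholders for real queries or APIs.

```python
import json
from datetime import datetime, timezone

# Hypothetical per-system fetchers; in practice each wraps a real query or API.
SOURCE_SYSTEMS = {
    "crm": lambda subject_id: {"name": "A. Customer", "email": "ac@example.com"},
    "orders": lambda subject_id: {"order_ids": [1001, 1002]},
}

def export_subject_data(subject_id):
    """Gather one data subject's records from every known system and
    serialise them as JSON -- a machine-readable, portable format."""
    return json.dumps({
        "subject_id": subject_id,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "records": {name: fetch(subject_id)
                    for name, fetch in SOURCE_SYSTEMS.items()},
    }, indent=2)
```

The hard part in practice is the registry of source systems itself, which is exactly what the data-mapping exercise is meant to produce.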

Olivier Van Hoof, pre-sales manager at data governance firm Collibra, argues for the use of a master system that can act as a single point of control for various transactional systems. Companies in some regulated sectors, such as finance, will already have something like this, since they must contend with know your customer (KYC) requirements, he suggests.

"You're already creating a master environment where you master customer data all in one area," he says. "Rather than delete it in 15 systems, I will flag it for deletion in the main database and drive it from there."
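That flag-once, delete-everywhere pattern can be sketched in a few lines of Python. The class below is an illustrative toy, assuming hypothetical system names and delete callbacks: the master index knows which systems hold each subject's data, and a single flag drives erasure in all of them while keeping an audit trail.

```python
class MasterIndex:
    """Single point of control over subject data held in many systems.
    System names and delete callbacks are illustrative placeholders."""
    def __init__(self):
        self.records = {}        # subject_id -> list of systems holding data
        self.flagged = set()     # subject_ids awaiting deletion
        self.audit_log = []      # evidence of what was erased, and where

    def register(self, subject_id, systems):
        self.records[subject_id] = list(systems)

    def flag_for_deletion(self, subject_id):
        # Flag once in the master database...
        self.flagged.add(subject_id)

    def propagate_deletions(self, delete_fns):
        """delete_fns: {system_name: callable(subject_id)} doing the erasure.
        ...then drive the actual deletion in every downstream system."""
        for subject_id in sorted(self.flagged):
            for system in self.records.get(subject_id, []):
                delete_fns[system](subject_id)
                self.audit_log.append((subject_id, system))
        self.flagged.clear()
```

The audit log matters as much as the deletion itself: under GDPR you may need to show a regulator that the erasure actually happened in every system.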

One system to bind them all

Building a "single source of truth" that indexes multiple transactional systems is likely to be a key requirement for many businesses under GDPR. But what should that system look like?

Dave Levy, associate partner at IT firm Citihub Consulting, posits two broad approaches. Using a single consolidated database that serves as the single source of information on data subjects does away with the whole gnarly problem of storing multiple records in different systems.

This would also help with another technology measure, which might otherwise throw an extra spanner into the works: pseudonymisation. Although not mandatory under GDPR, the ICO highlights this concept as one option for demonstrating compliance with the regulation's accountability principle. Pseudonymisation replaces personally identifying information, such as names and addresses, with tokens referenced somewhere else.

"A consolidated solution will have the huge advantage that pseudonymisation becomes easy and cheap and that none of the transactional systems know who they're dealing with," Levy says.
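The token-swapping idea can be illustrated in a few lines. This Python sketch, with assumed field names, keeps the token-to-PII mapping in a separate "vault" so that downstream transactional systems only ever see opaque tokens.

```python
import secrets

class PseudonymVault:
    """Holds the token -> PII mapping apart from transactional data, so
    only the vault can re-identify a subject. Field names are illustrative."""
    def __init__(self):
        self._mapping = {}

    def pseudonymise(self, record, pii_fields=("name", "address")):
        """Return a copy of the record with PII fields replaced by tokens."""
        out = dict(record)
        for field in pii_fields:
            if field in out:
                token = "tok_" + secrets.token_hex(8)
                self._mapping[token] = out[field]
                out[field] = token
        return out

    def reidentify(self, token):
        """Map a token back to the real value -- only the vault can do this."""
        return self._mapping[token]
```

A pleasant side effect for erasure requests: deleting the vault entry effectively anonymises every downstream copy that only holds the token.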

The downside is that this approach would involve some significant changes to transactional systems so that they could support it. "It may make subject access and restriction of processing requests harder to perform," he adds.

The alternative is to use a data lake: a less structured pool of data that takes input from transactional systems.

"A data lake taking feed from the CRM/HRMS and other transactional systems may be less disruptive," says Levy, adding that the ultimate choice will depend in part on the size of the application portfolio.

Documenting consent and legal purpose

Whichever approach a company chooses, data lake or consolidated database, it will have to support not just the right to erasure and data portability, but several other processes mandated by GDPR. Data subjects can object to automated profiling by algorithms without human intervention, for example, and can demand that the processing of their data is restricted while such complaints are resolved.

Under GDPR, these rights apply to data records based on the legal purpose of those data records, explains Levy.

"For any piece of personal information, you need to know your legal purpose," he says. "That's part of the enhanced data model, and not all rights are available to all pieces of information."

The legal purpose for storing and using a particular piece of information feeds into another major change to the rules under GDPR: consent.

Under the existing regime, companies can gather consent in a single agreement that covers many uses of the customer's data. When GDPR hits next year, consent will have to be more granular, explains William Long, data protection, privacy and information security partner at international law firm Sidley Austin.

This carries two significant ramifications. Firstly, companies will have to gather consent for each specific use of a customer's data, which Long says could lead to a checkbox system for different kinds of consent.

A customer might check the boxes allowing your company to use their data for support and order fulfilment, but not for marketing or behaviour analysis. Software would have to store all of those choices, and many systems won't be set up for that today.

"The draft guidance that came out from UK ICO indicated that for consent to be valid you have to show evidence of when that consent was given," he says. "So you need a timestamp showing when the individual clicked 'I accept'." That entails even more code and data architecture tweaking.

Don't overlook the need to store the specific text used to obtain consent from a customer at a given time, warns MarkLogic's Haragan. If the terms and conditions change on your site, are you sure that you'll be able to follow an audit trail showing exactly how you obtained consent, and what language you used?
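Pulling those requirements together, a consent record needs at least the granular per-purpose choices, a timestamp, and a snapshot of the exact wording shown. This is a minimal Python sketch of such a record; the field names and structure are assumptions, not a prescribed schema.

```python
from datetime import datetime, timezone

def record_consent(subject_id, purposes, policy_version, policy_text):
    """Capture the evidence an auditor will ask for: which purposes were
    ticked, when, and the exact consent language the customer saw."""
    return {
        "subject_id": subject_id,
        "purposes": dict(purposes),   # e.g. {"support": True, "marketing": False}
        "granted_at": datetime.now(timezone.utc).isoformat(),
        "policy_version": policy_version,
        "policy_text": policy_text,   # snapshot of the wording used
    }
```

Storing the policy text (or a reference to an immutable copy of it) alongside the version number is what makes the audit trail reconstructible after the terms and conditions change.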

This is important because GDPR reverses the burden of proof, placing the onus on the company to show that it complied, rather than making the customer prove that it didn't, says Ashley Winton, a partner in the corporate practice at legal firm Paul Hastings and a former microelectronics engineer.

"That can be tricky. So an audit trail showing what version of the privacy policy is applicable to me is important to level the playing field," he says.

Haragan suggests using another central resource, but this time specifically addressing consent. This "consent hub" would be a go-to resource when fulfilling audit requirements, she says.

"The hub provides a central place to store 'consent entities' that attach timestamps for all granted consents, and any associated documentation that can be analysed to understand exactly what each individual has agreed to," she says.

MarkLogic is a NoSQL company, so it's not surprising that she recommends moving away from relational tables for this. You'll find yourself storing HTML snippets, PDFs, emails and other fragments in there, she argues, emphasising the need for a flexible schema to support a wide variety of unstructured information. There's something to that: relational tables can be brittle and difficult to change compared with NoSQL schemas based on documents or JSON key-value pairs.

Documenting the legal purpose and associated rights for data records will probably involve tagging data appropriately, say experts. That may not be as much of a concern when gathering future data using systems that have been configured with such tagging in mind. But what about all of your existing data?
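In code, that tagging amounts to attaching a legal basis to each record and deriving the applicable rights from it. The Python sketch below uses an illustrative mapping from legal basis to rights; it is a sketch of the mechanism, not legal advice, and the actual rights per basis must come from the regulation.

```python
# Illustrative mapping from legal basis to subject rights -- an assumption
# for demonstration, not a statement of what the regulation grants.
RIGHTS_BY_BASIS = {
    "consent": {"erasure", "portability", "restriction", "objection"},
    "contract": {"portability", "restriction"},
    "legal_obligation": {"restriction"},
}

def tag_record(record, legal_basis):
    """Attach the legal purpose to a data record at collection time."""
    tagged = dict(record)
    tagged["legal_basis"] = legal_basis
    return tagged

def rights_for(record):
    """Which subject rights can be exercised over this record?
    Untagged records get no rights resolved -- the legacy-data problem."""
    return RIGHTS_BY_BASIS.get(record.get("legal_basis"), set())
```

The empty set returned for an untagged record is precisely the legacy-data elephant in the room: without a recorded basis, the system cannot tell which rights apply.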

"The elephant in the room is legacy data, which could be in a data lake, and trying to figure out how that data was collected might be difficult," says Winton.

There isn't much time left to do this stuff. Companies must put their master systems in place to index and tag all this data, and must then adapt transactional systems to support the execution of these new data subject rights. "This will be significant and should have been started last year," confirms Levy.

On the upside, if you do it properly, you'll be able to query systems to find out which parts of the business own which data records, whose servers they're running on, and how an individual's specific data is reflected across all of your systems.

That level of visibility puts you in good standing for other projects that might benefit from that information, ranging from customer relationship management to support. Let's hope so, because there has to be an upside to all this heavy lifting somewhere... right? ®

Biting the hand that feeds IT © 1998–2022