The glorious uncertainty: Backup world is having a GDPR moment
Many still unclear how 'right to erasure' will work
"The right to erasure is not absolute," the UK Information Commissioner's Office told us as the question of the backup tech industry's exposure to the EU's General Data Protection Regulation was raised in the week after it came into force.
The concerns
Just last week, Curtis Preston, chief technical architect at Druva, raised the issue that one cannot search backup files for right-to-be-forgotten data and that any organisation faced possible GDPR compliance issues if requested to erase personal information held in a backup file.
He blogged that "GDPR is not going to be able to force companies to 'forget' people in their backups – especially personal data found inside an RDBMS or spreadsheet."
He noted that backup software stores what it is told to back up without knowing what the data is. When told to back up a relational database:
"With few exceptions, backup products are handed an object and some metadata about that object. [Backup products] don't control the content or format of the object, nor do they have any knowledge what's inside it.
"In addition, the object that backup software is handed is often just a block or few that was changed in a file, VM, or database since the last time it was backed up. [Backup software] might not even know where that block fits inside the whole, nor do[es it] have the info to figure that out.
"Relational databases have the concept of referential integrity. When the database is open, this is not a problem when you delete record X. It will automatically delete any references to record X so there aren’t any referential integrity problems. It will also update any indices that reference those references. Easy peasy.
"That's impossible to do when the database is a bunch of objects in a backup. First, it requires the backup software to know much more about the format of the file than it needed to know before. It then would need to be able to delete a record, any references to that record, and any indices referencing that record, and it would need to do that for every RDBMS it supports. I just don’t see this being a good idea."
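Preston's point about a live database can be seen in a few lines. The following is a minimal sketch in Python using the standard sqlite3 module, not anything taken from his blog: the customers and orders tables are invented for illustration, and the automatic clean-up only happens because this hypothetical schema declares ON DELETE CASCADE and enables foreign key enforcement.

```python
import sqlite3

# Hypothetical two-table schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customers(id) ON DELETE CASCADE,
        item TEXT
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'disk'), (11, 1, 'tape'), (12, 2, 'cable');
""")

# "Delete record X": the engine removes the dependent rows and
# maintains the index itself, because the database is open.
conn.execute("DELETE FROM customers WHERE id = ?", (1,))
conn.commit()

remaining_customers = conn.execute(
    "SELECT id, name FROM customers ORDER BY id").fetchall()
remaining_orders = conn.execute(
    "SELECT id, customer_id FROM orders ORDER BY id").fetchall()
print(remaining_customers)
print(remaining_orders)
```

A backup product holding the same database as opaque blocks has no engine to do this work on its behalf, which is the gap Preston describes.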
Storage pundits and vendors alike have been concerned about the right to erasure requirements under GDPR's article 17 – also known as the "right to be forgotten" – as they relate to backup. Vendor software across the board doesn't know what it is backing up and won't necessarily easily or practicably find the data the subject has requested to be erased, according to industry sources.
The GDPR states that organisations must erase personal details under certain circumstances if requested, and businesses face fines if they fail to meet this obligation.
The ICO told The Reg: "The key point is that organisations should be clear with individuals as to what will happen to their data when their erasure request is fulfilled, including in respect of both production environments and backup systems. We will be providing more information on backups and the right to erasure soon."
The Restoration period...
The only practical thing to do is to detect and erase the information on restore, Preston suggested, which would be a big task but, in principle, doable.
Liran Zvibel, CEO of storage startup Weka IO, noted that snapshots raised another question for legislators: "Backups are one aspect. What about snapshots on the primary storage? A storage system for that RDBMS took snapshots, now someone requests to be forgotten. Does the organization need to erase all past snapshots with his data? Maybe [it] will be forced to alter history."
Preston responded: "Snapshots are read only. You would have to make them read/write to mod them, [t]hen mount the DB in question and run the delete – [all the w]hile placing your org at significant risk of data loss."
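What erase-on-restore might look like can be sketched roughly as follows. This is an assumption-laden illustration in Python, not a description of any vendor's product: the in-memory SQLite database stands in for a backup image, the customers table is hypothetical, and pending_erasures represents whatever record an organisation keeps of erasure requests received since the backup was taken.

```python
import sqlite3

def make_backup():
    """Stand-in for a backup image: an in-memory database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
        INSERT INTO customers VALUES
            (1, 'alice@example.com'), (2, 'bob@example.com');
    """)
    return db

def restore(backup_db):
    """Copy the backup into a fresh database; the backup itself is never modified."""
    restored = sqlite3.connect(":memory:")
    backup_db.backup(restored)
    return restored

def apply_erasures(db, erased_ids):
    """Replay recorded erasure requests against the restored copy."""
    removed = 0
    with db:  # commits on success, rolls back on error
        for subject_id in erased_ids:
            cur = db.execute("DELETE FROM customers WHERE id = ?", (subject_id,))
            removed += cur.rowcount
    return removed

backup_db = make_backup()
pending_erasures = [1]  # requests that arrived after the backup was taken

restored_db = restore(backup_db)
removed = apply_erasures(restored_db, pending_erasures)

restored_rows = restored_db.execute(
    "SELECT id, email FROM customers ORDER BY id").fetchall()
backup_rows = backup_db.execute(
    "SELECT id FROM customers ORDER BY id").fetchall()
print(removed, restored_rows, backup_rows)
```

The read-only copy keeps the subject's record, as Preston says snapshots must, while the restored database has it removed before going back into use.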
We asked the experts
Though the law has yet to be tested in a number of ways, a data protection lawyer told The Register that a possible solution might involve securing the backup data, making sure to avoid using it "to inform any decision in respect of any individual" and erasing it at a later date which has been outlined to the person requesting the deletion.
Robert Wassall, head of legal services at ThinkMarble, told The Register: "If the data is anonymised, then it ceases to become personal data and no longer falls under the remit of the GDPR. If the backed-up data is impossible to identify and can be considered anonymous, then the right to be forgotten cannot be enforced.
"If this is the case, then the ICO will also need to be satisfied that the information has been 'put beyond use', if not actually deleted, provided that the data controller holding it is not able to use the data in the following ways:
- Does not use the personal data to inform any decision in respect of any individual or in a manner that affects the individual in any way
- Does not give any other organisation access to the personal data
- Must surround the personal data with appropriate technical and organisational security
- Commits to the permanent deletion of the information if, or when, this becomes possible"
Peter Groucutt, managing director at Databarracks, a business continuity and IT disaster recovery provider, said: "This is an issue we've been discussing with software vendors, industry analysts and lawyers.
"We're already seeing some development from the software vendors, Asigra for instance, to address the issue. But the challenges of database backups, VM backups (without the option of granular recovery) aren't as easily fixed.
"GDPR does not supersede all other responsibilities. For instance, a customer may make a request to be forgotten and for a business to delete personal data, but some personal data may need to be held in order to deliver the service or for other compliance requirements. There are even cases where an organisation can refuse requests.
"The issue at the moment is the uncertainty. We'll see this clarified soon as the ICO begins enforcing GDPR."
Alex McDonald, Storage Networking Industry Association Europe board member, said: "We have a tension between what legislators imagine is possible, and what is feasible in the real world. One can only do what is feasible and practical.
"It is the outcome of using this data that's important and breaches personal privacy."
Linus Chang, CEO of BackupAssist and Scram Software, said: "If one person invokes their right to be forgotten, it would be unreasonable to expect that that person's data be deleted from backups – for two reasons:
(b) even if it is possible, deleting data from a backup is a terrible idea because it risks corrupting the backup, breaking referential integrity, breaking applications that were expecting that data to be present, and importantly, breaking any checksums on the data that would prove that a restore was successful."
Guy Bunker, SVP of products at infosec firm Clearswift, told The Reg: "When an RTBF request comes in, then there is the possibility to go through all the old backup and archive material, search for the individual making the request and delete the appropriate data.
"The search on disk would be relatively quick, on tape will be relatively slow and on WORM [write once read many] potentially impossible – as it is 'write once' so the goal is that it doesn't get altered, ever. This could happen when a request comes in, but the following day there might be another one, and then another one, so the issue could get worse over time.
"Even moving to an archive scenario – in which the information is indexed before being stored to make it easier to find – doesn't mean it is a perfect solution as these also use WORM storage.
"Is there an obvious solution? No and historical backups and archives could potentially fall foul of GDPR compliance. For many organisations, they do migrate backups (and archives) over time which is done as 'good practice' to ensure that the information is recoverable. It might be that a process to remove RTBF information as part of that migration is something the ICO will accept as best practice. However, backup migration is something that is ongoing and could potentially take years.
"Another alternative is to have a mechanism to remove the information on restore. Again, a formalised process as to how this happens which the ICO may accept as the way to implement RTBF requests – while creating minimal disruption. Both of these ideas will also have to overcome the challenge of 'being forgotten', while remembering that you have forgotten them. This can be done with a 'one-way hash' of the information – and is the way blockchain is going to overcome their challenges of encoding 'personal' information into transactions.
"With GDPR having just come in, there are still several dark corners of the regulation which need to be better explored, but [these] will only really become concrete when there are cases in law." ®