AI systems have weird memories. The machines desperately cling onto the data they’ve been trained on, making it difficult to delete bits of it. In fact, they often have to be completely retrained from scratch with the newer, smaller dataset.
That’s no good in an age where individuals can request their personal data be removed from company databases under privacy measures like Europe's GDPR rules. How do you remove a person’s sensitive information from a machine learning model that has already been trained? A 2017 research paper by law and policy academics hinted that it may even be impossible.
“Deletion is difficult because most machine learning models are complex black boxes so it is not clear how a data point or a set of data points is really being used,” James Zou, an assistant professor of biomedical data science at Stanford University, told The Register.
To leave out specific data, models often have to be retrained from scratch on the reduced dataset. That’s a pain, as it costs money and time.
New research led by Antonio Ginart, a PhD student at Stanford University, studied the problem of deleting data from machine learning models. The team crafted two “provably deletion efficient algorithms” and tested them on six different datasets for k-means clustering, an unsupervised machine learning method that groups similar data points together. The results were released in a paper on arXiv this week.
The trick is to assess the impact of deleting data from a trained model. In some cases, removing data can degrade the system’s performance.
“First, quickly check to see if deleting a data point would have any effect on the machine learning model at all - there are settings where there's no effect and so we can perform this check very efficiently. Second, see if the data to be deleted only affects some local component of the learning system and just update locally,” Zou explained.
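To make those two steps concrete, here is a minimal Python sketch for a toy k-means model. It is an illustration only, not the algorithms from the paper. It assumes centroids are stored rounded to a coarse grid, so that many deletions leave the model untouched; the function names (`delete_point`, `centroid_of`, `nearest`) and the `step` parameter are hypothetical. When a deletion would change which cluster any remaining point belongs to, the sketch gives up and signals that a full retrain is needed.

```python
from math import dist


def centroid_of(members, step):
    """Mean of the cluster members, rounded to a grid of width `step`.

    Rounding is an assumption of this sketch: it makes small shifts in the
    mean invisible, so some deletions provably leave the model unchanged.
    """
    mean = [sum(axis) / len(members) for axis in zip(*members)]
    return tuple(round(m / step) * step for m in mean)


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def delete_point(points, labels, centroids, idx, step=0.5):
    """Delete points[idx] from a fitted clustering.

    Returns (status, centroids), where status is one of:
      "unchanged"    - the deletion has no effect on the stored model
      "local_update" - only the affected cluster's centroid was recomputed
      "retrain"      - the cheap path is not safe; retrain from scratch
    """
    cluster = labels[idx]
    kept = [(p, l) for i, (p, l) in enumerate(zip(points, labels)) if i != idx]
    members = [p for p, l in kept if l == cluster]
    if not members:
        return "retrain", centroids  # the cluster would be left empty

    updated = list(centroids)
    updated[cluster] = centroid_of(members, step)

    # Step 1: does the deletion change the stored model at all?
    if updated[cluster] == centroids[cluster]:
        return "unchanged", centroids

    # Step 2: the change is local only if no remaining point switches cluster.
    if all(nearest(p, updated) == l for p, l in kept):
        return "local_update", updated
    return "retrain", centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (14.0, 12.0)]
labels = [0, 0, 0, 0, 1, 1, 1]
centroids = [(0.5, 0.5), (11.5, 11.0)]

print(delete_point(points, labels, centroids, 0))  # ('unchanged', [(0.5, 0.5), (11.5, 11.0)])
print(delete_point(points, labels, centroids, 6))  # ('local_update', [(0.5, 0.5), (10.0, 10.5)])
```

Both cheap paths touch only one cluster, which is why they can be far faster than retraining on the whole dataset. The coarser the grid, the more deletions fall into the "unchanged" case, at the price of less precise centroids.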
The approach seems to work okay for k-means clustering models under certain circumstances, when the data can be more easily separated. But when it comes to systems that aren’t deterministic, such as modern deep learning models, it’s incredibly difficult to delete data.
Zou said it isn’t entirely impossible, however. “We don't have tools just yet but we are hoping to develop these deletion tools in the next few months.” ®