
Machine needs more Learning: Google Drive dings single-character files for copyright infringement

If you're unable to share documents, this may be why

Google last month announced plans to restrict sharing of customer files stored in Google Drive when the web giant's automated scanning system decides they violate its abuse prevention rules.

"When [a file is] restricted, you may see a flag next to the filename, you won't be able to share it, and your file will no longer be publicly accessible, even to people who have the link," Google explained at the time.

That system is now up and running, just not very well: Google Drive's scanning system has been finding copyright violations where they do not exist and flagging innocuous files.

Dr Emily Dolson, an assistant professor at Michigan State University in the departments of Computer Science & Engineering and Ecology, Evolution, & Behavior, had a run-in with the errant scanner recently when she uploaded a file named "output04.txt" that consisted of a single character: the numeral one.

One wonders what exactly upset Google – the digit or the output04.txt filename? Certainly the number 1 turns up in all manner of copyrighted works. And no one tell the internet search giant that Microsoft's cloud storage service is called OneDrive.

"I'm currently teaching a graduate-level algorithms class where students need to write code that solves problems I give them," Dolson told The Register today via email. "I like to make the test cases I use to evaluate the code freely accessible to students to assist them with debugging.

"This issue occurred when I uploaded a large set of files to Drive containing inputs and expected outputs for these test cases. Among the expected output files, there were a few that contained just the character '1'. Shortly after uploading them, I received a string of emails from Google indicating that those files had been flagged for copyright infringement."

Dolson can still access the files, but she cannot share them, which she said was unfortunate because she created them to share with her students.

Others have reported similar experiences. Richard D. Morey, a Reader (a senior UK academic rank, roughly equivalent to a professor) in psychology at Cardiff University, responded to Dolson's Twitter post by noting, "I stopped using Google Drive professionally for this reason. It was flagging and pulling down documents I authored myself, and no students could access them!"

And other people responding to Dolson's post claim to have independently replicated the issue by getting small files flagged in Drive.
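For those inclined to test the behaviour themselves, the reported recipe is not complicated: a plain text file whose entire contents are a single digit. A minimal Python sketch along those lines (the file names, count, and digit below are illustrative, not Dolson's actual test cases) would be:

    # Generate plain text files whose entire contents are the single character "1",
    # mimicking the kind of test-case output files reportedly flagged by Drive.
    from pathlib import Path

    out_dir = Path("test_cases")
    out_dir.mkdir(exist_ok=True)

    for i in range(1, 6):
        # Produces output01.txt .. output05.txt, each containing just "1"
        (out_dir / f"output{i:02d}.txt").write_text("1")

    print(f"Wrote {i} single-character files to {out_dir}/")

Per those reports, uploading the resulting files to a Drive folder through the ordinary web interface was enough to trigger the automated flagging emails, though your mileage may vary as Google tinkers with the system.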

As has been pointed out by those participating in the Twitter discussion, Europe's GDPR gives people "the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." US privacy law, however, doesn't really do much for those subject to false algorithmic decisions.

The Register twice asked Google to confirm that Drive's content-flagging system is broken, but we've not heard back. However, a Google engineering manager responding to Dolson's Twitter thread acknowledged that the company is aware of the issue.

Google's post announcing its Drive abuse notification system depicts a sample message that includes a button labeled "Request a Review," to have a human check the violation scanner's decision.

But Dolson said the automated email notification she received offered no way to push back against the determination of Google's content vetting algorithm – the Request a Review button was not included in the message she received, as can be seen from the screenshot she posted.

Which, you know, is a bit worrying for people concerned about the dead hand of AI being used as arbiter in these matters.

"The e-mail explicitly said 'A review cannot be requested for this restriction,'" she explained. "I do think that it is problematic to automate processes like this without providing any mechanism for a manual override.

"In this case it's a fairly minor inconvenience (I can just tell my students that the answer is 1), but in a different context it might be a much bigger problem. It's totally normal and understandable for software to have bugs, but that's exactly why there needs to be a mechanism for communicating those bugs back to the developers."

Dolson also took issue with allowing social media to drive customer support.

"Relying on viral social media posts as a sort of backdoor communication channel to the developers should not be the only option – that opens up a heap of equity concerns," she said. "Your ability to receive support for software products should not depend on whether you are sufficiently well connected to technology Twitter."

Netizens reported problems with other numbers, including 0, while the wags over on Hacker News pointed to a mildly relevant Onion article, headlined: "Microsoft Patents Ones, Zeroes."

Because when automation drives swathes of the IT world beyond satire, there's always an Onion article for it. ®

Editor's note: Article updated to include quotes from Dr Emily Dolson.
