Don't scrape the faces of our citizens for recognition, Canada tells Clearview AI – delete those images
Plus: Check if your Flickr photos are in facial recognition engines, and the list of NSFW words for AI
Canada’s privacy watchdog has found Clearview AI in “clear violation” of the country’s privacy laws, and has told the facial-recognition startup to stop scraping images of Canadians and delete all existing photos it has on those citizens.
The Office of the Privacy Commissioner of Canada launched an official investigation into the upstart’s practices, and as a result Clearview stopped selling its software to Canadian police.
“Clearview's massive collection of millions of images without the consent or knowledge of individuals for the purpose of marketing facial recognition services does not comply with Quebec's privacy or biometric legislation,” said Diane Poitras, President of the Quebec Commission on Access to Information, a government organization involved in the investigation.
The startup was told to stop taking people’s photos to train its facial-recognition software, delete all the ones it has collected from people in Canada, and to not sell its services to any Canadian customers. New-York-based Clearview, however, argued that it does not have a “real and substantial connection” to the country so shouldn’t need to abide by its laws, and that consent was not needed to scrape the photos since they’re all publicly available anyway.
Have your Flickr photos been used to train a facial recognition model?
AI researchers have built an online tool that allows people to check if their selfies have been used to secretly train facial-recognition software.
Exposing.ai – built by developer and artist Adam Harvey, and Liz O’Sullivan, technology director at privacy rights group the Surveillance Technology Oversight Project – looked through AI training datasets built from scraping creative-commons-licensed photos on photo-sharing site Flickr. They tracked down the URL for each photo and put it into a database, and users can look through the data by searching for a specific URL, image hashtag, or Flickr username.
If there’s a hit, then the image is present in at least one of the six datasets used to teach machines how to identify faces. “People need to realize that some of their most intimate moments have been weaponized,” O’Sullivan told the NYT. “The potential for harm seemed too great.”
You can use the tool here.
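As a rough illustration of the lookup described above, here is a minimal sketch of an index that maps a photo URL, hashtag, or username to the datasets it appears in. The class name, field names, example URLs, and dataset labels are all hypothetical; nothing here reflects how Exposing.ai is actually implemented.

```python
from collections import defaultdict


class PhotoIndex:
    """Toy index: (kind, value) -> set of dataset names containing a match."""

    def __init__(self):
        self._datasets_by_key = defaultdict(set)

    @staticmethod
    def _normalize(kind, value):
        # URLs are matched exactly; usernames and hashtags ignore case and a leading '#'.
        return value if kind == "url" else value.lower().lstrip("#")

    def add(self, url, username, hashtags, dataset):
        """Record that one photo (hypothetical fields) appears in `dataset`."""
        keys = [("url", url), ("user", username)]
        keys += [("tag", tag) for tag in hashtags]
        for kind, value in keys:
            self._datasets_by_key[(kind, self._normalize(kind, value))].add(dataset)

    def lookup(self, kind, value):
        """Return the sorted dataset names for a URL, username, or hashtag."""
        key = (kind, self._normalize(kind, value))
        return sorted(self._datasets_by_key.get(key, set()))


# Made-up records for illustration only.
index = PhotoIndex()
index.add("https://example.com/photos/1.jpg", "Alice", ["#wedding"], "dataset_a")
index.add("https://example.com/photos/1.jpg", "Alice", ["#wedding"], "dataset_b")

hits = index.lookup("url", "https://example.com/photos/1.jpg")
print(hits if hits else "No match found")
```

An empty result means no match in the indexed datasets; a non-empty one corresponds to the "hit" described above.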
The List of Dirty, Naughty, Obscene, and Otherwise Bad Words AI researchers use to filter data
The best way to prevent machine-learning models from generating any text or images that are too racy and lewd is to not train the software on data that is, well, too racy or lewd.
One way that researchers do this is by automatically screening any data that contains or is related to x-rated subject areas that they want their models to avoid. Enter the List of Dirty, Naughty, Obscene, and Otherwise Bad Words, known as LDNOOBW, a handy checklist containing indecent words, and now shared on GitHub.
First created by folks over at Shutterstock, the stock image biz, the list contains hundreds of words in numerous languages so far, and is now employed by other tech companies like Slack and Google, Wired reported.
The Colossal Clean Crawled Corpus, the popular text dataset used to train large language models, uses LDNOOBW to filter out webpages containing those words. The idea is that words like ‘busty’ or ‘kinky’ are more likely to appear on pornographic sites, so pages containing them are blocked from the training data. But some critics believe censoring these words means the resulting models will have no knowledge of some human sexualities that are traditionally underrepresented.
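To make the filtering idea concrete, here is a minimal sketch of dropping any page that contains a blocklisted word. It assumes whole-word, case-insensitive matching; the function names and sample pages are hypothetical, the two-word list is just the pair quoted above, and this is not the actual pipeline used to build the corpus.

```python
import re


def build_blocklist_pattern(words):
    """Compile one regex that matches any blocklisted word as a whole word."""
    escaped = [re.escape(word.lower()) for word in words]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")


def filter_pages(pages, words):
    """Keep only the pages that contain none of the blocklisted words."""
    if not words:
        # An empty blocklist filters nothing.
        return list(pages)
    pattern = build_blocklist_pattern(words)
    return [page for page in pages if not pattern.search(page.lower())]


# Illustrative inputs only; the real list has hundreds of entries.
sample_blocklist = ["busty", "kinky"]
sample_pages = [
    "A recipe for sourdough bread.",
    "Kinky Boots is a musical about a shoe factory.",
    "Robust statistics for beginners.",
]

kept = filter_pages(sample_pages, sample_blocklist)
print(kept)
```

In this toy run the page about the musical is discarded even though it is not pornographic, which hints at the kind of collateral loss the critics are worried about.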
Do you need an AI algo to help you code at work?
Kite, a startup focused on building autocomplete tools for programmers using machine learning, now has support specifically for developers on the job. In other words, companies can now pay for an enterprise license to use the software at work.
It costs $40 per user per month, $10 more than its license for individuals. Students are allowed to use it for free.
The enterprise version, known as Kite Team Server, is more powerful and runs on GPU servers rather than CPU ones. The software can also be trained on a company’s proprietary codebase to come up with suggestions based on custom code.
CEO Adam Smith told The Register that people’s code is always kept private.
“Kite Team Server custom-trains ML models on a GPU behind the company's firewall. Kite Team Server ensures code stays private and secure by keeping it behind the firewall.” None of the inputs and outputs generated by its tools are stored on its servers or shared.
You can read more about it here. ®