Don't scrape the faces of our citizens for recognition, Canada tells Clearview AI – delete those images

Plus: Check if your Flickr photos are in facial recognition engines and the list of NSFW words for AI


Canada’s privacy watchdog has found Clearview AI in “clear violation” of the country’s privacy laws, and has told the facial-recognition startup to stop scraping images of Canadians and delete all existing photos it has on those citizens.

The Office of the Privacy Commissioner of Canada launched an official investigation into the upstart’s practices, and as a result Clearview stopped selling its software to Canadian police.

“Clearview's massive collection of millions of images without the consent or knowledge of individuals for the purpose of marketing facial recognition services does not comply with Quebec's privacy or biometric legislation,” said Diane Poitras, President of the Quebec Commission on Access to Information, a government organization involved in the investigation.

The startup was told to stop taking people’s photos to train its facial-recognition software, delete all the ones it has collected from people in Canada, and to not sell its services to any Canadian customers. New-York-based Clearview, however, argued that it does not have a “real and substantial connection” to the country so shouldn’t need to abide by its laws, and that consent was not needed to scrape the photos since they’re all publicly available anyway.

Have your Flickr photos been used to train a facial recognition model?

AI researchers have built an online tool that allows people to check if their selfies have been used to secretly train facial-recognition software.

Exposing.ai – built by developer and artist Adam Harvey, and Liz O’Sullivan, technology director at privacy rights group the Surveillance Technology Oversight Project – looked through AI training datasets built from scraping creative-commons-licensed photos on photo-sharing site Flickr. They tracked down the URL for each photo and put it into a database, and users can look through the data by searching for a specific URL, image hashtag, or Flickr username.

If there’s a hit, then the image is present in at least one of the six datasets used to teach machines how to identify faces. “People need to realize that some of their most intimate moments have been weaponized,” O’Sullivan told the NYT. “The potential for harm seemed too great.”

You can use the tool here.

The List of Dirty, Naughty, Obscene, and Otherwise Bad Words AI researchers use to filter data

The best way to prevent machine-learning models from generating any text or images that are too racy and lewd is to not train the software on data that is, well, too racy or lewd.

One way that researchers do this is by automatically screening out any data that contains or is related to x-rated subject areas that they want their models to avoid. Enter the List of Dirty, Naughty, Obscene, and Otherwise Bad Words, known as LDNOOBW, a handy checklist of indecent words that is now shared on GitHub.

Created first by folks over at Shutterstock, the stock image biz, the list contains hundreds of words in numerous languages so far, and is now employed by other tech companies like Slack and Google, Wired reported.

The Colossal Clean Crawled Corpus, the popular text dataset used to train large language models, uses LDNOOBW to filter out webpages containing those words. The idea is that words like ‘busty’ or ‘kinky’ are more likely to be associated with pornographic sites, so pages containing them are blocked from the training data. But some critics believe censoring these words means that the algorithms will have no knowledge of some human sexualities that are traditionally underrepresented.
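For the curious, word-list filtering of this kind can be sketched in a few lines of Python. This is a rough illustration only, not the actual pipeline used to build the corpus: the two-word blocklist is a stand-in taken from the examples above rather than the real list, and the function names and sample pages are made up for the purpose.

```python
import re

# Stand-in blocklist for illustration only; the real LDNOOBW list is far longer.
BLOCKLIST = {"busty", "kinky"}

# Crude tokenizer (an assumption): runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def contains_blocked_word(text, blocklist=BLOCKLIST):
    """Return True if any whole word in the text is on the blocklist."""
    words = set(WORD_RE.findall(text.lower()))
    return not words.isdisjoint(blocklist)


def filter_pages(pages, blocklist=BLOCKLIST):
    """Keep only the pages that contain no blocklisted word."""
    return [page for page in pages if not contains_blocked_word(page, blocklist)]


# Hypothetical sample pages.
pages = [
    "A recipe for sourdough bread.",
    "Kinky Boots: a review of the stage musical.",
    "Notes on network congestion control.",
]

kept = filter_pages(pages)
print(kept)
```

Note that the theatre review is thrown out along with anything genuinely pornographic, since the filter only sees the word and not the context. That is more or less the critics' complaint in miniature.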

Do you need an AI algo to help you code at work?

Kite, a startup focused on building autocomplete tools for programmers using machine learning, now has support specifically for developers on the job. Companies can now pay for an enterprise license to use the software at work, in other words.

It costs $40 per user per month, $10 more than its license for individuals. Students are allowed to use it for free.

The enterprise version, known as Kite Team Server, is more powerful and runs on GPU servers rather than CPU ones. The software can also be trained on a company’s proprietary codebase to come up with suggestions based on custom code.

CEO Adam Smith told The Register that people’s code is always kept private.

“Kite Team Server custom-trains ML models on a GPU behind the company's firewall. Kite Team Server ensures code stays private and secure by keeping it behind the firewall.” None of the inputs and outputs generated by its tools are stored on its servers or shared.

You can read more about it here. ®


Biting the hand that feeds IT © 1998–2021