Brouhaha over IBM using Flickr faces for AI training, big trouble in not-so-little China for Microsoft, and more

Plus: It's time to take AI to school

Roundup Hello, here's a quick lowdown on what's been going on in AI beyond what we've already reported lately.

Where did you get the data from? IBM was blasted this week for using millions of photos from Flickr, the popular image sharing website, to create a dataset aimed at making facial recognition systems less biased.

These machine-learning models need to be trained on millions of images, and it’s not easy to get your hands on that sort of data, so many developers just end up scraping them off the internet. Sometimes corner-cutting researchers take them from dating or social media websites without consent, and promise they’re processed in a way that keeps them private. And other times they’re lifted from platforms like Flickr with appropriate licensing rights.

IBM created its own Diversity in Faces dataset, which was made up of a million labelled Creative-Commons-licensed photos downloaded from Flickr. Under these conditions, the images could be used in any way as long as it was not for commercial purposes. IBM said its dataset was meant to be an academic resource, and was not publicly available for download or sale. You had to request access to the dataset.

Thus, the snaps were legally above board: they were uploaded to Flickr by users who marked them as available for non-commercial reuse, and IBM had reused them, as per the CC license, in its not-for-sale dataset.

Nevertheless, some people freaked out when they were told their photos were being used to train facial recognition systems, according to an NBC report. It seems people were happy for their photos to be shared, but didn't expect them to be used to train software. The photographers found it difficult to get their images removed from IBM's dataset, too, and it was impossible to delete them from copies that had already been given to eggheads.

The Creative Commons group responded here, stating that "fair use allows all types of content to be used freely."

The episode raises an interesting question: even if your data has been removed from a training dataset, is it possible to delete it from systems that have already been trained on it?

And it raises a legitimate concern: for example, the US government, academics, and businesses have used datasets of photos of immigrants, abused children, and dead people to test their facial recognition systems, all without consent, according to separate research.

AI business school: Ah yes, AI and business. It’s all anyone ever wants to talk about these days in Silicon Valley.

But do you need to use AI in your company? (Most likely, no.) Do you still want to use it anyway? (Probably, yes.) Well, here’s something that might be useful for you. Microsoft has released a free online course dubbed AI business school for “business leaders to lead with confidence in the age of AI.”

“There is a gap between what people want to do and the reality of what is going on in their organizations today, and the reality of whether their organization is ready,” Mitra Azizirad, corporate vice president for AI marketing at Microsoft in Redmond, Washington, said this week.

“Developing a strategy for AI extends beyond the business issues,” she explained. “It goes all the way to the leadership, behaviors and capabilities required to instill an AI-ready culture in your organization.”

It’s different from most AI courses. AI business school is aimed at non-techies who are more interested in learning how deploying these systems will affect a company’s strategy and structure, and it doesn’t teach you much about the technology itself.

You can start AI business school here.

Microsoft denies partnering with a Chinese surveillance org: Since we’re on the subject of Redmond, it has denied working with SenseTime, an Asian AI unicorn startup linked with the Chinese government's spying on China’s minority Muslim population.

SenseTime, valued at over a billion dollars, is one of the most successful computer vision companies in AI. It set up an office in Xinjiang, a region in the northwest of China bordering Tajikistan and Kyrgyzstan, home to the largest Uyghur population. The startup listed Microsoft as one of its partners on its website, leading the New Statesman to question the nature of their partnership.

Microsoft has denied it’s working with SenseTime, however. A spokesperson said that SenseTime was using Microsoft's logo on its website without permission, and that Microsoft has since asked for it to be removed.

And finally... In case you missed it, OpenAI, a leading San-Francisco-based AI non-profit research lab, has created a for-profit arm that hopes to raise enough capital from investors to continue its expensive work to develop artificial general intelligence in a way that doesn't harm humanity – a move that has proved divisive. And here's a deep look into DeepMind, Google's AI stablemate. ®


Biting the hand that feeds IT © 1998–2022