Meta trains data2vec neural network to grok speech, images, text so it can 'understand the world'

Whatever it takes, Mark


Researchers at Facebook's parent Meta have trained a single AI model capable of processing speech, images, and text in the hope that these so-called multi-modal systems will power the company’s augmented reality and metaverse products.

The model, known as data2vec, can perform different tasks. Given an audio snippet, it can recognize speech. If it’s fed an image, it can classify objects. And when faced with text, it can check the grammar or analyse the writing’s tone and emotions.

AI algorithms are typically trained on one type of data, whereas data2vec is trained on three different modalities. It still, however, processes each form, whether it’s speech, images, or text, separately.

Meta believes these multi-modal models will make computers more adaptable, helping to blend physical and digital environments into one. “People experience the world through a combination of sight, sound and words, and systems like this could one day understand the world the way we do,” Meta CEO Mark Zuckerberg said in a statement to El Reg.

“This will all eventually get built into AR glasses with an AI assistant so, for example, it could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”

Data2vec is a transformer-based neural network that uses self-supervised learning to learn common patterns in audio, computer vision, and natural language processing. The model learns to operate with different types of data by learning to predict representations of the data it’s given: it knows it has to guess the next group of pixels when given an image, or the next speech utterance in audio, or fill in the words in a sentence.
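To make the idea of self-supervised prediction concrete, here is a toy sketch in Python. It is not Meta's code: the function names are hypothetical, the "encoder" is a simple neighbourhood average standing in for a transformer, and we assume the targets are representations computed from the unmasked input. The point is only that the training signal comes from the data itself, with no human labels.

```python
import random

def mask_input(tokens, mask_ratio=0.5, mask_value=0.0, seed=0):
    """Hide a random subset of positions; return the masked copy and hidden indices."""
    rng = random.Random(seed)
    n_masked = max(1, int(len(tokens) * mask_ratio))
    hidden = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = [mask_value if i in hidden else t for i, t in enumerate(tokens)]
    return masked, hidden

def toy_encoder(values):
    """Stand-in for a transformer: each position's representation is
    the mean of itself and its immediate neighbours."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def masked_prediction_loss(tokens, mask_ratio=0.5, seed=0):
    """Mean squared error between representations of the masked view and
    representations of the full input, measured at the hidden positions only."""
    masked, hidden = mask_input(tokens, mask_ratio, seed=seed)
    targets = toy_encoder(tokens)       # computed from the complete input
    predictions = toy_encoder(masked)   # computed from the partial view
    return sum((predictions[i] - targets[i]) ** 2 for i in hidden) / len(hidden)

# The same loss applies whether the numbers came from pixels, audio, or words.
loss = masked_prediction_loss([1.0, 2.0, 3.0, 4.0])
```

A real system would adjust the encoder's weights to drive this loss down; the sketch stops at computing it.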

The researchers used a mix of 16 Nvidia V100 and A100 GPUs to train data2vec on 960 hours of speech audio, millions of words from books and Wikipedia pages, and images from ImageNet-1K.

"We train separate models for each modality but the process through which the models learn is identical," Alexei Baevski, a research engineer at Meta AI, told The Register.

"We hope that it will enable future work to build high performing self-supervised models that combine modalities and are more effective than specialized models. Different modalities can add additional information to the same piece of content - for example body language from video, prosodic information from audio, and text can combine into a richer representation of a dialog. The algorithms that currently try to combine multi-modal information exist but they do not yet perform well enough to replace specialized algorithms and we hope our work will help change that."
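Baevski's point about separate models sharing one learning process can be illustrated with another toy Python sketch. Everything here is an assumption for illustration: the featurizers are crude stand-ins for real modality-specific front ends, the "model" is a single weight, and none of the names come from Meta's code. What matters is that train() is identical for every modality while each modality ends up with its own model.

```python
def featurize_text(sentence):
    """Toy text front end: one number per word (its length)."""
    return [float(len(word)) for word in sentence.split()]

def featurize_speech(samples):
    """Toy audio front end: magnitude of each sample."""
    return [abs(s) for s in samples]

def featurize_image(pixels):
    """Toy image front end: mean brightness of each row."""
    return [sum(row) / len(row) for row in pixels]

FEATURIZERS = {
    "text": featurize_text,
    "speech": featurize_speech,
    "image": featurize_image,
}

def normalise(features):
    """Scale into [0, 1]; assumes at least one positive value."""
    peak = max(features)
    return [x / peak for x in features]

def neighbour_mean(xs, i):
    """Context for position i: the average of the adjacent positions."""
    ctx = [xs[j] for j in (i - 1, i + 1) if 0 <= j < len(xs)]
    return sum(ctx) / len(ctx)

def train(features, steps=200, lr=0.1):
    """Shared learning procedure: fit one weight w by gradient descent so
    that w * context predicts each position's value. No labels are used."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for i, target in enumerate(features):
            ctx = neighbour_mean(features, i)
            grad += 2 * (w * ctx - target) * ctx
        w -= lr * grad / len(features)
    return w

def train_all(examples):
    """One separately trained model per modality, same train() for all."""
    models = {}
    for modality, raw in examples.items():
        features = normalise(FEATURIZERS[modality](raw))
        models[modality] = train(features)
    return models

EXAMPLES = {
    "text": "the cat sat on the mat",
    "speech": [0.1, -0.4, 0.3, -0.2, 0.5],
    "image": [[0, 1], [1, 1], [0, 0], [1, 0]],
}
models = train_all(EXAMPLES)
```

Only the front end changes per modality; swapping in a new data type means adding a featurizer, not a new learning rule.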

Baevski said in the future multi-modal systems could incorporate a larger range of data to model concepts like smell, 3D objects, or videos. He referred back to the idea of AR glasses helping wearers cook.

"Imagine having a model that has been trained on recordings of thousands of hours of cooking activity from various restaurants and chefs. Then, when you are cooking in a kitchen wearing your AR glasses that have access to this model, it’s able to overlay visual cues for what you need to do next, point out potential mistakes, or explain how adding a particular ingredient will affect the taste of your dish," he told us.

Previous research on multi-modal systems has shown they can be prone to simple adversarial attacks. OpenAI's CLIP model, for example, which is trained on images and text, will incorrectly identify an image of an apple as an iPod if the word "iPod" is in the picture. It's unclear, however, whether data2vec suffers from similar weaknesses.

"We have not specifically analyzed how our models will react to adversarial examples but since our current models are trained separately for each modality, we believe that existing research on adversarial attack analysis for each modality would be applicable to our work as well," Baevski said.

"In the future, we hope to use our work to enable high performance algorithms that combine modalities in one model and we plan to study how susceptible they are to adversarial attacks."

When the researchers tested data2vec, it outperformed some top models that had each been trained on only one specific data type, across different types of tasks. The preliminary results are described in a paper [PDF], and the code has been published on GitHub.

“Data2vec demonstrates that the same self-supervised algorithm can work well in different modalities — and often better than the best existing algorithms,” the researchers explained in a blog post this week.

“This paves the way for more general self-supervised learning and brings us closer to a world where AI might use videos, articles, and audio recordings to learn about complicated subjects, such as the game of soccer or different ways to bake bread. We also hope data2vec will bring us closer to a world where computers need very little labeled data in order to accomplish tasks.” ®



Biting the hand that feeds IT © 1998–2022