Despite 'key' partnership with AWS, Meta taps up Microsoft Azure for AI work

Someone got Zuck'd

Meta’s AI business unit set up shop in Microsoft Azure this week and announced a strategic partnership it says will advance PyTorch development on the public cloud.

The deal [PDF] will see Mark Zuckerberg’s umbrella company deploy machine-learning workloads on thousands of Nvidia GPUs running in Azure. While a win for Microsoft, the partnership calls into question just how strong Meta’s commitment to Amazon Web Services (AWS) really is.

Back in those long-gone days of December, Meta named AWS as its “key long-term strategic cloud provider.” As part of that, Meta promised that if it bought any companies that used AWS, it would continue to support their use of Amazon's cloud, rather than force them off into its own private datacenters. The pact also included a vow to expand Meta’s consumption of Amazon’s cloud-based compute, storage, database, and security services.

The AWS-Meta team-up also included a collaboration to optimize workloads using the PyTorch machine-learning framework — which Meta, then Facebook, released in 2016 — for deployment in the cloud provider’s Elastic Compute Cloud and SageMaker services.

It appears Meta is more than happy to play the field, though, deploying workloads wherever it pleases. Guess that's what they call multi-cloud; it also demonstrates the difference between "key" and "exclusive."

The announcement this week revealed that Meta began deploying workloads on Azure’s Nvidia A100-accelerated instances to train AI models in 2021. Meta now plans to expand its Azure footprint to a dedicated cluster of 5,400 of Nvidia’s 80GB A100 GPUs to accelerate research and development on “cutting-edge ML training workloads” for its AI business unit.

In fact, the social media giant says it trained its 175-billion-parameter OPT-175B natural-language-processing transformer model, released in March, on Azure.

“With Azure’s compute power and 1.6TB/s of interconnect bandwidth per VM, we are able to accelerate our ever-growing training demands to better accommodate larger and more innovative AI models,” Meta’s VP of AI Jerome Pesenti said in a statement.

While neither Microsoft nor Meta provided specifics as to how exactly the massive GPU cluster will be used moving forward, there’s a fair chance that, much like the social media giant’s earlier AWS collab, it’ll involve PyTorch.

In addition to the infrastructure deal, Meta said it would collaborate with Microsoft to “scale PyTorch adoption on Azure.”

“We’re happy to work with Microsoft in extending our experience to their customers using PyTorch in their journey from research to production,” Pesenti said.

Later this year, Microsoft plans to roll out PyTorch development accelerators it says will make it easier to deploy the framework on Azure.

The Register reached out to Meta for further comment; we’ll let you know if we hear back. ®
