
Tesla's Dojo supercomputer is a billion-dollar bet to make AI better at driving than humans

More data means better neural net training, but it also means more cores


Tesla says it is spending upwards of $1 billion on its Dojo supercomputer between now and the end of 2024 to help develop autonomous vehicle software.

Dojo was first mentioned by CEO Elon Musk during a Tesla investor day in 2019. It was built specifically to train the machine learning models for video processing and recognition that underpin the vehicles' self-driving capability.

During Tesla's Q2 earnings call this week, Musk said Tesla was not going to be "open loop" on its Dojo expenditure, but the sum involved would certainly be "north of a billion through the end of next year."

"In order to copy us, you would also need to spend billions of dollars on training compute," Musk claimed, saying that developing a reliable autonomous driving system is "one of the hottest problems ever."

"You need the data and you need the training computers, the things needed to actually achieve this at scale toward a generalized solution for autonomy."

Musk pointed out that training complex machine learning models requires huge volumes of data, and the more the better. That is exactly what Tesla has access to, thanks to the telemetry streaming in from its vehicles.

"With respect to Autopilot and Dojo, in order to build autonomy, we obviously need to train our neural net with data from millions of vehicles. This has been proven over and over again, the more training data you have, the better the results," he said.

"It barely works at 2 million [training examples]. At 3 million, it's like, wow, OK, we're seeing something. But then, you get to, like, 10 million training examples, it becomes incredible. So there's just no substitute for massive amount of data. And obviously, Tesla has more vehicles on the road collecting this data than all of the other companies combined. I think maybe even an order of magnitude," Musk claimed.

On the Dojo system itself, Musk said it was designed to significantly reduce the cost of neural net training, and has been "somewhat optimized" for the kind of training that Tesla requires, which is video training.

"We see a demand for really vast training resources. And we think we may reach in-house neural net training capability of 100 exaFLOPS by the end of next year," Musk claimed, which is quite a lot of compute power, to put it mildly.

Dojo is based largely on Tesla's own technology, starting with the D1 chip, which comprises 354 custom CPU cores. Twenty-five of these D1 chips are interlinked in a 5x5 array inside a "training tile" module, for 8,850 cores per tile; six such tiles make up the base Dojo V1 configuration, featuring 53,100 D1 cores in total, according to our colleagues at The Next Platform.
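To put those numbers together, here is a minimal back-of-the-envelope sketch. The core and tile counts come from the figures above; the 362 TFLOPS (BF16) per-chip throughput is an assumption taken from Tesla's public AI Day material, not from this article, so treat the exaFLOPS estimate as rough:

```python
# Back-of-envelope Dojo arithmetic from the figures above.
# ASSUMPTION: 362 TFLOPS BF16 per D1 chip comes from Tesla's AI Day
# presentations, not this article.

CORES_PER_D1 = 354        # custom CPU cores per D1 chip
CHIPS_PER_TILE = 5 * 5    # D1 chips per "training tile" (5x5 array)
BASE_V1_CORES = 53_100    # base Dojo V1 configuration

cores_per_tile = CORES_PER_D1 * CHIPS_PER_TILE      # 8,850 cores
tiles_in_base_v1 = BASE_V1_CORES // cores_per_tile  # 6 tiles

D1_BF16_TFLOPS = 362                              # assumed per-chip rate
tile_tflops = D1_BF16_TFLOPS * CHIPS_PER_TILE     # ~9,050 TFLOPS per tile
tiles_for_100_eflops = 100_000_000 / tile_tflops  # 100 EFLOPS = 1e8 TFLOPS

print(f"cores/tile: {cores_per_tile:,}, tiles in base V1: {tiles_in_base_v1}")
print(f"tiles for 100 EFLOPS (assumed rate): {tiles_for_100_eflops:,.0f}")
```

At that assumed per-chip rate, Musk's 100 exaFLOPS target would take on the order of eleven thousand tiles, which gives a sense of how far beyond the base configuration Tesla would need to scale.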

Musk believes that with all of the training data and a "high-efficiency inference computer" in the car, Tesla's autonomous driving system will soon make its vehicles not just as proficient as a human driver, but eventually much better. When? He didn't say, and he has form when it comes to grand claims.

"To date, over 300 million miles have been driven using FSD [Full Self-Driving] Beta. That 300-million-mile number is going to seem very small, very quickly. And FSD will go from being as good as a human to then being vastly better than a human. We see a clear path to full self-driving being 10 times safer than the average human driver," he claimed.

This is important, Musk explained, because "right now, I believe there's something in the order of a million automotive deaths per year. And if you're 10 times better than a human, that would still mean 100,000 deaths. So, it's like, we'd rather be a hundred times better, and we want to achieve as perfect a safety as possible."

Dojo is not the only supercomputer Tesla has for video training. The company also built a compute cluster equipped with 5,760 Nvidia A100 GPUs, but Musk said they simply couldn't get enough GPUs for the task.
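For comparison, a quick sketch of what that GPU cluster adds up to. The 312 TFLOPS figure is Nvidia's published dense BF16 Tensor Core spec for the A100, an assumption not taken from this article:

```python
# Rough aggregate throughput of Tesla's A100 cluster.
# ASSUMPTION: 312 TFLOPS dense BF16 per A100 is Nvidia's published
# spec sheet number, not a figure from this article.

A100_BF16_TFLOPS = 312
NUM_GPUS = 5_760

cluster_eflops = A100_BF16_TFLOPS * NUM_GPUS / 1_000_000  # ~1.8 EFLOPS
print(f"A100 cluster: ~{cluster_eflops:.1f} EFLOPS BF16")
```

That works out to roughly 1.8 exaFLOPS at that precision, a small fraction of the 100 exaFLOPS Musk says Tesla may reach in-house, which helps explain the appetite for Dojo.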

"We'll actually take the hardware as fast as Nvidia will deliver it to us," he said, adding: "If they could deliver us enough GPUs, we might not need Dojo, but they can't because they've got so many customers." ®
