This image-recognition neural net can be trained from 1.2 million pictures in the time it takes to make a cup o' tea

Just 90 seconds, it's claimed, provided a) you have 512 Nvidia V100 GPUs and b) er, no need for accuracy


The shortest time for training a neural network using the popular ImageNet dataset has been slashed again, it is claimed, from the previously held record of four minutes to just one and a half.

Training is arguably the most important and tedious part of deep learning. A small mountain of data is fed into the software and filtered through multiple layers of intense matrix-math calculations, teaching the neural network to identify things, or otherwise make decisions, from future inputs. Developers long for snappy turnarounds on the order of minutes, rather than hours or days of waiting, so they can tweak their models for optimum performance, and test them, before the systems are deployed.

Shorter reeducation sessions also mean facial-recognition, voice-recognition, and similar systems can be rapidly updated, tweaked, or improved on the fly.

There are all sorts of tricks to shave off training time. A common tactic is to run through the dataset more quickly by increasing the batch size, so that the model processes more samples per iteration. Push the batch size too far, however, and overall accuracy suffers, so it's a bit of a balancing act.
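For the curious, here is a minimal sketch (ours, not the researchers' code) of what that batch-size dial looks like in practice, assuming PyTorch and stand-in data: a bigger batch means fewer, heavier iterations per pass over the dataset.

```python
# Minimal sketch, assuming PyTorch and stand-in data: a larger batch size
# means fewer iterations per epoch, so each pass over the dataset finishes
# sooner, but very large batches can hurt final accuracy without extra tuning.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(2_048, 3, 32, 32),      # stand-in images
                        torch.randint(0, 1000, (2_048,)))   # stand-in labels

small_batches = DataLoader(dataset, batch_size=32)    # 64 iterations per epoch
large_batches = DataLoader(dataset, batch_size=512)   # 4 iterations per epoch

print(len(small_batches), len(large_batches))   # fewer, heavier steps with the big batch
```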

Another tactic is to use a mix of half-precision floating point, aka FP16, and single-precision, FP32. This, for one thing, alleviates the memory-bandwidth pressure on the GPUs or whatever chips you're using to accelerate the machine-learning math in hardware, though you may face some loss of accuracy.
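As a rough illustration of that mixed-precision approach, here's a minimal sketch using PyTorch's torch.cuda.amp utilities; the model, data, and optimizer are placeholders rather than anything from the paper, and it assumes a CUDA-capable GPU.

```python
# Minimal mixed-precision sketch, assuming PyTorch's torch.cuda.amp on a CUDA GPU;
# the model, data, and optimizer are placeholders, not the researchers' code.
import torch

model = torch.nn.Linear(1024, 1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()           # scales the loss so FP16 gradients don't underflow

images = torch.randn(128, 1024, device="cuda")
labels = torch.randint(0, 1000, (128,), device="cuda")

with torch.cuda.amp.autocast():                # forward pass runs largely in FP16
    loss = torch.nn.functional.cross_entropy(model(images), labels)

scaler.scale(loss).backward()                  # backward pass on the scaled loss
scaler.step(optimizer)                         # gradients unscaled, update applied in FP32
scaler.update()
optimizer.zero_grad()
```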

Researchers at SenseTime, a Hong Kong-based computer-vision startup valued at over $1bn, and Nanyang Technological University in Singapore, say they used these techniques to train AlexNet, an image-recognition convolutional neural network, on ImageNet in just 1.5 minutes, albeit with 58.2 per cent accuracy.

It required 512 of Nvidia’s 2017-era Tesla Volta V100 accelerators, in two physical clusters connected using a 56Gbps network, to crank through more than 1.2 million images in the ImageNet-1K dataset in that time. Each chip can set you back about $10,000, so you may prefer to rent them instead from a cloud provider, if possible.

The team used FP16 for parameters and gradients during the forward and backward computation phases, and FP32 values during the model-update phase, balancing bandwidth against accuracy. The training run completed 95 epochs within those 90 seconds, using a per-GPU batch size of 128, which works out to a total batch of 65,536 across the full 512-GPU setup.
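For perspective, a back-of-the-envelope calculation (ours, not the paper's) using those figures and ImageNet-1K's roughly 1.28 million training images gives a sense of the throughput involved:

```python
# Rough throughput check using the figures quoted above; ImageNet-1K has
# roughly 1.28 million training images.
images_per_epoch = 1_281_167
epochs = 95
seconds = 90
gpus = 512

total_images = images_per_epoch * epochs     # ~121.7 million images processed
print(total_images / seconds)                # ~1.35 million images per second overall
print(total_images / seconds / gpus)         # ~2,600 images per second per GPU
```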

The team devised a software toolkit, dubbed GradientFlow, to slash training times on GPUs, as described in their arXiv-hosted paper, which was emitted earlier this month. Each GPU stores batches of data from ImageNet and crunches through their pixels using gradient descent. The resulting gradient values are then passed on to server nodes, which update the parameters of the overall model using a type of parallel-processing algorithm known as allreduce.
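As a rough illustration (ours, not GradientFlow), here's what the straightforward version of that gradient exchange looks like with torch.distributed, assuming the process group has already been initialised across the workers: one all-reduce per gradient tensor, every iteration.

```python
# Minimal sketch of the straightforward approach, assuming torch.distributed
# has already been initialised across the workers: every iteration, each GPU
# averages every gradient tensor with all the other GPUs, one transfer per tensor.
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)  # one network op per tensor
            param.grad /= world_size                           # average across workers
```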

Trying to ingest these values, or tensors, from hundreds of GPUs at a time will run into bottlenecks. GradientFlow, it is claimed, increases the efficiency of the code by allowing the GPUs to communicate and exchange gradients locally before final values are sent to the model.

“Instead of immediately transmitting generated gradients with allreduce, GradientFlow tries to fuse multiple sequential communication operations into a single one, avoiding sending a huge number of small tensors via network,” the researchers wrote.

"To reduce network traffic, we design coarse-grained sparse communication. Instead of transmitting all gradients in every iteration, GradientFlow only sends important gradients for allreduce at the level of chunk (for example, a chunk may consist of 32K gradients)."

The 90-second run is about 2.6 times faster than the previous record effort, from researchers at Tencent, a Chinese tech giant, and Hong Kong Baptist University, which took four minutes. ®
