The state of today's machine learning: Short, wide, deep but not high
A gentle guide to where we're at with AI
Remember that kid in middle school who was deeply into Dungeons & Dragons, and hadn't seen his growth spurt yet? Machine learning is sort of like that kid – deep, wide, and short – and not so tall.
But on the serious side, machine learning today is useful for a wide variety of pattern recognition problems, including the following:
- Image classification.
- Speech processing.
- Handwriting recognition.
- Text processing.
- Threat assessment.
- Fraud detection.
- Language translation.
- Self-driving vehicles.
- Medical diagnosis.
- Sentiment analysis.
- Stock trading.
Deep learning, a subset of machine learning, has progressed rapidly during the past decade due to:
- Big data – the increased availability of large data sets for training and deployment has driven the need for deeper nets.
- Deeper nets – deep neural nets have multiple layers, and often possess higher-order architecture (width) within a given layer.
- Clever training – researchers found that a large dose of unsupervised learning in the early stages of training lets the net do its own automated, lower-level feature recognition and extraction, then pass those features on to the next stage for higher-level feature recognition.
- High performance computing – clustered systems, enhanced with accelerator technology, have become essential to training large deep nets.
In deep learning, the key computational kernels involve linear algebra, including matrix and tensor arithmetic. A deep neural net can have millions or even billions of parameters because of its rich connectivity. While depth refers to the number of layers, the layers can also be quite wide – with hundreds to thousands of neurons in a given layer. The weights of these connections must be adjusted iteratively until a solution is reached in a space of very high dimensionality.
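To make the depth-times-width arithmetic concrete, here is a minimal sketch in plain Python. It assumes a simple fully connected net, which is only one of many architectures, and the layer widths are made up for illustration. The helper names `count_parameters` and `dense_layer` are hypothetical, not taken from any library.

```python
def count_parameters(layer_widths):
    """Count weights and biases in a fully connected net.

    layer_widths lists the number of neurons per layer, input first.
    """
    total = 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


def dense_layer(weights, biases, inputs):
    """One layer's kernel: matrix-vector product, add bias, apply ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, activation))
    return outputs


# Illustrative widths only: 784 inputs, three hidden layers of 2,048, 10 outputs.
widths = [784, 2048, 2048, 2048, 10]
print(count_parameters(widths))  # 10020874

# A tiny 2-input, 2-neuron layer to show the kernel itself.
print(dense_layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], [2.0, 3.0]))  # [0.0, 3.5]
```

Even this modest example lands at roughly 10 million parameters, and almost all of the work in `dense_layer` is multiply-and-add, which is exactly what accelerators are built to do in bulk.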
Because of the large number of parameters and the generally modest accuracy required for the final output – is this image a cat? or is this a fraudulent application? – low-precision arithmetic typically suffices. Training can be successful with half-precision floating point (16 bits) or with fixed point or integers (as low as 8 bits in some cases). This is the "short" aspect.
Yann LeCun, one of the pioneers of deep learning, has noted: "Getting excellent results on ImageNet is easily achieved with a convolutional net with something like 8- to 16-bit precision on the neuron states and weights."
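For a feel of what "short" arithmetic costs in accuracy, the sketch below rounds a handful of made-up weights to 16-bit floating point and to 8-bit integers, then measures the error. It uses only Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. The symmetric scale-by-largest-value scheme in `quantize_int8` is one simple assumption among many possible quantization schemes, and all function names here are hypothetical.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_int8(values):
    """Symmetric linear quantization to signed 8-bit codes.

    Assumes at least one value is non-zero, so the scale is not zero.
    """
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map 8-bit codes back to approximate real values."""
    return [c * scale for c in codes]


# Made-up weights, for illustration only.
weights = [0.8125, -0.3141, 0.0072, -1.27, 0.5]

half = [to_half(w) for w in weights]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

for w, h, r in zip(weights, half, restored):
    print(f"{w:+.4f}  fp16 error {abs(w - h):.2e}  int8 error {abs(w - r):.2e}")
```

The 8-bit error is bounded by half the quantization step, and the 16-bit relative error by about one part in 2,000. Whether that is tolerable depends on the model and the task, but it is small next to the yes-or-no decisions the article describes.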
The dominance of linear algebra kernels plus short precision indicates that accelerator hardware is extremely useful in deep learning. Overall, the class of problems being addressed is that of very high-order optimization problems with very large input data sets – it is thus natural that deep learning has entered the realm of high-performance computing.
Major requirements are highly scalable performance, high memory bandwidth, low power consumption, and excellent short arithmetic performance. The requisite computational resources are clusters whose nodes are populated with a sufficient number of accelerators. These provide the needed performance while keeping power consumption low. Nvidia GPUs are the most popular acceleration technology in deep learning today.
Since the advent of the Pascal version of its GPU and CUDA 7.5, Nvidia has offered half-precision support, specifically for deep learning. With half precision, or 16-bit floating point, the peak performance is double that obtained with 32-bit. In its marketing of the DGX-1 "deep learning supercomputer," Nvidia touts the higher 170 teraflops peak rate, which is based on FP16 half precision.
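A quick back-of-the-envelope check shows how the headline number breaks down. The sketch assumes the DGX-1 carries eight GPUs, a figure not stated in this article, and takes the two-to-one FP16-to-FP32 ratio from the paragraph above. The function name is hypothetical and the results are peak rates, not measured performance.

```python
DGX1_PEAK_FP16_TFLOPS = 170.0  # marketing figure quoted in the article
ASSUMED_GPU_COUNT = 8          # assumption: not stated in the article
FP16_TO_FP32_RATIO = 2.0       # "double that obtained with 32-bit"


def per_gpu_peaks(system_fp16_tflops, gpu_count, ratio):
    """Split a system-wide FP16 peak into per-GPU FP16 and FP32 peaks."""
    fp16 = system_fp16_tflops / gpu_count
    fp32 = fp16 / ratio
    return fp16, fp32


fp16, fp32 = per_gpu_peaks(DGX1_PEAK_FP16_TFLOPS, ASSUMED_GPU_COUNT, FP16_TO_FP32_RATIO)
print(f"per GPU: {fp16} TFLOPS FP16, {fp32} TFLOPS FP32")
# per GPU: 21.25 TFLOPS FP16, 10.625 TFLOPS FP32
```

Under those assumptions, quoting the system at FP32 instead would give 85 teraflops, which is why the choice of precision matters so much in the marketing.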
Other alternatives beyond GPUs are often based on ASICs and FPGAs. From Intel we have Altera FPGAs, Nervana Engines (being acquired), and Movidius VPUs (being acquired), as well as the Knights Mill (the next-generation 2017 version of Phi). From other companies, solutions include Alphabet's Google TPUs, Wave Computing DPUs, DeePhi Tech DPUs, and IBM's TrueNorth neuromorphic chips. All of these technologies have enhanced performance for reduced precision arithmetic.
So like that dweeby middle school kid, machine learning is deep, wide, and short. But for it to grow, it will continue to depend on flexible HPC compute solutions – particularly accelerators – be they GPUs, FPGAs, ASICs or some other brand new chippy solution. ®