Machine learning has become a buzzword. A branch of Artificial Intelligence, it adds marketing sparkle to everything from intrusion detection tools to business analytics. What is it, exactly, and how can you code it?
Programs historically produce precise results using linear instructions. Fetch that file, open it, alter this record in these specific ways, then write it back to disk. Machine learning is far less precise.
"In machine learning, the difference is that you show the computer a bunch of data, and it works out what to do with that data to get the result that you want it to get," says Adam Geitgey, formerly a researcher at Groupon and now a consultant in AI who provides some machine learning basics here.
Most machine learning today involves supervised learning. You teach the computer what to look for by tagging lots of examples as positive or negative. Cat pictures are a good example.
You want the computer to recognise pictures of cats to delete, so that you don't have to scroll through the reams of cat memes that your uncle sends you. You source thousands of images online. You identify all of the cat pictures in your collection, and all the pictures that aren't cats. Then you feed the computer the two sets of pictures, and let it produce a statistical model of the cat and not-cat images.
It can then use this model to analyse new pictures and tag them itself. At this point you don't usually see explicitly how the computer does it. It just uses the model from all the other pictures to get it right.
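As a rough illustration of that workflow, here is a minimal sketch in plain Python. It is not how a real image classifier works: each picture is reduced to two made-up numeric features, and the "statistical model" is simply the average feature vector for each label. The feature values, the `train` and `predict` helpers and the labels are all assumptions for the sake of the example.

```python
# Toy supervised learning: labelled examples go in, a statistical model comes out.
# The two features per picture are invented for illustration only.
from statistics import mean


def train(examples):
    """Build a model: the average feature vector (centroid) for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Tag a new example with the label whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hand-tagged training data: (features, label) pairs.
labelled = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.7, 0.7), "cat"),
    ((0.1, 0.2), "not-cat"),
    ((0.2, 0.1), "not-cat"),
    ((0.3, 0.3), "not-cat"),
]

model = train(labelled)

# New, untagged pictures are classified using the model alone.
print(predict(model, (0.85, 0.75)))
print(predict(model, (0.15, 0.25)))
```

The point to notice is that nobody writes a rule saying what a cat looks like. The tagged examples produce the model, and the model does the tagging from then on.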
Machine learning systems use several classification techniques, including decision trees, regression, instance-based and Bayesian algorithms. One class of algorithm – neural networks – has broken out into another specialty sub-branch of machine learning called deep learning.
"With machine learning, you'll typically have less data and the problems are less complex," says Alex Champandard, an expert in neural networks who previously worked on artificial intelligence algorithms in the gaming sector. Now he runs Creative.AI, a company developing AI tools to help creative professionals such as designers with decisions. "Deep learning is more about dealing with large quantities of data and more complex problems," he adds.
Deep learning applies many layers to a neural network, and each of them analyses a different aspect of the data. Lower layers analyse details, such as edges and curves in an image, for example, whereas higher layers aggregate this information to find higher-level features ("Is that an ear?" "They look like whiskers to me!").
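The layering idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not a trained network: the weights are hand-picked and hypothetical, the "image" is four numbers, and the `dense` and `forward` helpers are names invented for this example. It only shows how each layer's output becomes the next layer's input.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


def forward(inputs, layers):
    """Pass the data through each layer in turn; each output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical weights: 4 pixel values -> 3 low-level features
# -> 2 higher-level features -> 1 final score.
layers = [
    ([[0.5, -0.5, 0.5, -0.5],
      [0.2, 0.2, 0.2, 0.2],
      [-0.3, 0.6, -0.3, 0.6]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5],
      [-0.5, 0.5, 1.0]], [0.0, 0.0]),
    ([[1.5, -1.5]], [0.0]),
]

pixels = [0.9, 0.1, 0.8, 0.2]
score = forward(pixels, layers)[0]
print(f"cat score: {score:.3f}")
```

A real deep network has the same shape, only with many more layers and with weights learned from data rather than typed in by hand.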
Microsoft put 152 of these layers together to win the ImageNet large-scale visual recognition competition for the first time in 2015, beating Google, Intel, Qualcomm, startups and academics. The idea of that particular challenge was to correctly find and classify objects in 100,000 photographs from Flickr and search engines and put them in 1,000 object categories. Microsoft beat the competition with an error rate of 3.5 per cent and a localisation error rate of 9 per cent.
A couple of factors have accelerated interest in deep learning, according to Satya Mallick, founder of AI consulting firm Big Vision.
"First of all, we didn't have access to these large datasets. Deep learning algorithms are data-hungry," he says. Whereas many other machine learning algorithms saturate with too much data, deep learning neural networks can handle far more of it.
The second aspect is computational power. Neural networks need as much of this as they can get, and in the early part of this decade, it became available in the form of graphical processing units (GPUs). The double-precision floating point calculation in GPUs' multiple cores made them great for graphics, but also more recently for other high-performance operations. Nvidia produced CUDA, a platform for more general-purpose high-performance computing operations, in 2007. A specialised library for neural networks called cuDNN enabled deep learning researchers to use Nvidia's chips for neural network analysis.
Different ways of coding
All of which leads us to another question: how do you code this stuff? The average .Net programmer may balk at programming concurrent GPU operations for statistical modelling using C++. Not to worry, says Geitgey – you typically don't go down to that level.
"There's probably only a hundred people in the world that do, and care. That doesn't help you research and solve problems," he says. Instead, others have built machine learning and deep learning frameworks that aggregate the functions in such low-level programs, making them more digestible to programmers.
These frameworks mostly support common languages like Python, meaning that you don't have to learn new ones, argues Geitgey. Your IDE won't change, but some of the underlying skills and concepts will. Develop a basic understanding of statistics and linear algebra, says Geitgey. Know enough calculus to differentiate equations. "You can learn the basics pretty quickly, but there's a big difference between that and real-world results."
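To see where the calculus comes in, here is a minimal sketch of gradient descent, one common way of training a model. It assumes the simplest possible case, fitting y = w * x to four invented data points, and the `train` function, the data and the learning rates are illustrative choices rather than anything a particular framework prescribes.

```python
def train(learning_rate, steps, xs=(1.0, 2.0, 3.0, 4.0), ys=(2.0, 4.0, 6.0, 8.0)):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        # The derivative of mean((w*x - y)^2) with respect to w
        # is mean(2 * (w*x - y) * x).
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Step against the gradient; learning_rate sets the step size.
        w -= learning_rate * gradient
    return w


# Same data, same number of steps, three different learning rates.
for learning_rate in (0.001, 0.01, 0.2):
    print(learning_rate, train(learning_rate, steps=50))
```

The data was generated with w = 2. After 50 steps the smallest learning rate has not got there yet, the middle one lands very close, and the largest overshoots further on every step and blows up. Picking such settings is a matter of trial and error, even in a toy this small.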
Good AI programmers also tend to be good managers, muses Joanna J Bryson, a reader at the University of Bath and sabbatical fellow and affiliate at Princeton's Center for Information Technology Policy, who specialises in natural intelligence. AI isn't about getting clear, black-and-white results first time. You have to be able to go with the flow, and adapt on the fly.
"It isn't as deterministic. It's the cat-herding aspect of management," Bryson says. Training statistical AI models and neural networks is a repetitive process involving lots of trial and error. You have to understand what knobs to tweak during those iterations. "You have a bunch of things that may or may not discover things you were expecting, and you have to think about that as a process."
There's another problem, says Mallick. "Things have changed so fast that people haven't developed good workflows." Programmers may have to grapple with the processes for developing and testing these programs themselves.
Frameworks for AI
At least they won't have to grapple with the inner workings of GPU optimisation while they do it. There are several frameworks to help developers in their machine learning and deep learning work.
Theano, developed at the University of Montreal, is the venerable old dame of machine learning frameworks. It is a Python library with GPU support. One of its big benefits is approachability – it powers large computationally intensive projects, but can also be used in the classroom.
Theano may have preceded it, but the best-known framework at this point is likely Google's TensorFlow. The successor to Google's previous DistBelief system supports GPU and CPU operations. It may only have been released in 2015, but Google was smart in making it open source, which has led to significant adoption. With 1,500 or so projects mentioning it on GitHub, and with the community adding support for things like Hadoop, this has become the granddaddy of AI frameworks. Google recently updated TensorFlow to version 1.0, with a selection of new features, including faster operation and a new domain-specific compiler for hardware accelerators.
"It's a more general machine learning tool, and there's a steep learning curve," according to Mallick, who reckons it can be tough for beginners. Those struggling with TensorFlow (or even with Theano) might consider Keras, a high-level library for neural network manipulation that sits atop Theano and TensorFlow. "Instead of a thousand lines of code, you're writing one line of code," Mallick says of this tool.
Facebook relies heavily on Torch, a framework it developed in conjunction with many players including Twitter and Yandex. It built this MATLAB-style environment on the LuaJIT language. Because Lua is not a popular language, there's now a Python version of the Torch framework called PyTorch.
Amazon supports other frameworks on AWS, but MXNet is its horse for deep learning. A week after backing MXNet, Amazon followed up by announcing additional services: Polly, which tackles text-to-speech; Lex, for conversational interfaces; and an image analysis service called Rekognition.
Dan Kara, research director at ABI Research specialising in AI and robotics, argues that Amazon must still work hard to establish MXNet. "It isn't as cutting edge as Microsoft or Google in terms of what you're able to do with it," he says.
Conversely, with the AI services available on Azure, Microsoft has focused on the enterprise for a long time, he argues. Microsoft released a beta of its Microsoft Cognitive Toolkit (formerly CNTK) last October. Originally focused on speech, it has since expanded to more general purpose deep learning tasks.
IBM, which calls itself the "Red Hat of AI", focuses on cobbling together other frameworks in an enterprise implementation for its Power architecture. This package supports TensorFlow, Caffe, Torch and Theano. It runs on Ubuntu.
Microsoft has also focused on building API-based services designed for high-level access via its cloud services.
Karl Freund, consulting lead for HPC and Deep Learning at analyst firm Moor Insights & Strategy, said: "What Microsoft is saying is that most enterprises won't spend the time and effort to build a deep learning team. They want to take their existing business processes and supercharge them.
"That entails services like sentiment analysis. You can do all that without having to learn machine learning. You don't have to train neural networks, they've done it for you."
IBM has taken a similar approach, serving up APIs for developers who want to access high-level AI services under its Watson brand. There's a lot of stuff going on under the hood there that gets masked by its Bluemix cloud service.
There are many other frameworks and sets of libraries for AI. Caffe, produced by the Berkeley Vision and Learning Center, uses C++ for speed in its neural network manipulation and focuses mostly on computer vision applications. Chainer is a Python-based library that, like Theano, integrates nicely with NumPy, Python's scientific computing extension for handling large multi-dimensional arrays (which serves as a basis for neural network data structures).
Lesser known are Apache's Singa and its Horn project, still in incubation. UC Berkeley has also created SparkNet, a system for training neural networks on Apache's Spark cluster computing platform. Even Nvidia created DIGITS, a neural network tool for data scientists wanting to use Nvidia GPUs that IBM folded into PowerAI.
The encouraging thing about almost all of these projects is that they're open source, which encourages group collaboration on the technology. Collaboration is particularly important with machine and deep learning because it's all based on configured statistical models, which open-source collaborators can exchange.
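Exchanging a model can be as simple as passing around its parameters. The sketch below assumes a deliberately tiny model, one average feature vector per label, with hypothetical values standing in for whatever training produced. Real frameworks have their own model file formats, which this does not attempt to reproduce.

```python
import json


def predict(model, features):
    """Return the label whose stored centroid is nearest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical parameters, standing in for the result of someone's training run.
trained_model = {"cat": [0.8, 0.8], "not-cat": [0.2, 0.2]}

# The person who did the training publishes the model as text...
published = json.dumps(trained_model)

# ...and someone else loads it and classifies new data without retraining.
shared_model = json.loads(published)
print(predict(shared_model, [0.9, 0.7]))
```

The recipient never sees the tagged training pictures, only the numbers distilled from them.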
After all, once I've modelled my data set, there's no reason that you shouldn't use it, too. Only one person should ever have to classify that many cat pictures. ®