Is this a hotdog? What it takes for an AI to answer that might surprise you
Don't boil the ocean
Artificial intelligence isn't going away. Even if the hype abates, its presence will have succeeded in raising awareness of a smorgasbord of interlinking concepts, technologies and ideas – neural networks and machine learning, cognitive intelligence, recommendation engines, big data, statistics and analysis – that together let computers and software do more of the thinking and acting for us.
Being in tech means you are naturally fascinated by the new. The problem is, however, that you've spent the last few years programming distinctly unintelligent apps – enterprise servers for back office or web, systems that control batch transfer jobs or calculate energy models.
How big a leap will you have to make to break into the AI strata?
First things first: not all AI jobs are equal, and the skills you need as an AI practitioner will vary depending on role. The spectrum of roles runs from devising, producing and refining the machine learning and deep learning algorithms that often underpin AI frameworks to the actual raw coding.
The former will typically need low-level programming skills and PhD-level academic chops. A bit of neurobiology won't hurt, for example. That's why it's a highly rarefied area in which practitioners can write their own ticket. The latter will develop AI-capable software, which takes machine learning concepts and applies them to input data. In doing so, they must understand how to do two things, broadly: train AI algorithms on lots of data, and then deploy the trained models to go and analyse new things.
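To make those two phases concrete, here is a minimal sketch in plain Python. It assumes a toy one-feature logistic classifier rather than a real neural network, and the `train` and `predict` functions and the made-up "sausage-likeness" scores are illustrative inventions, not anything taken from a real AI framework.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Training phase: fit a weight and a bias with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(w * x + b) - y  # prediction minus the true label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Inference phase: apply the already-trained model to a new input."""
    w, b = model
    return sigmoid(w * x + b) >= 0.5


# Toy training data: one invented score per image, label 1 means "hot dog".
train_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 1, 1, 1]

model = train(train_x, train_y)      # slow part, done once
is_hotdog = predict(model, 0.95)     # fast part, done for every new picture
```

The point of the split is that `train` is expensive and runs ahead of time, while `predict` is cheap and is the only piece that has to ship with the finished app.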
One thing that quickly becomes clear, therefore, is that there's a lot of data handling. That doesn't mean that just being a data scientist will cut it, though. There's a big difference between data scientists and machine learning engineers in the AI space. The former focus on sourcing, scrubbing and understanding data, and then building stuff with it. The latter have stronger software development skills but also understand how to monitor and tweak machine learning systems in production.
So where does the software development fit? There isn't much to speak of, according to AI consultant Adam Geitgey, who maintains that the coding isn't the hardest part. "Usually you spend 70-80 per cent of your time getting the data, cleaning up the data, trying different settings on the model to get it to work with your data, until you get good results," he says. "When you're writing the code, usually there's not that much code."
To hotdog or not to hotdog?
We can see this in some examples of AI. Read this blow-by-blow account of developing an app designed to support a TV joke. The app in question – the Silicon Valley show's Not Hotdog software – just recognised when a picture was a hotdog and when it wasn't.
In spite of it being a skit in a TV show, the people behind it actually built it to work. It may have limited real-world value, but it's a good technical challenge nonetheless.
Tim Anglade, who acts as technology consultant on the series, hadn't done much AI at all before. A Ruby and Java programmer, he threw himself in at the deep end. "I don't think I even started taking lessons until after I built the first prototype," he says. "I didn't know the first thing about neural networks or back propagation or any of those core concepts that you need to know to use machine learning. If you're comfortable with math[s] or with programming, you really have a good chance to build an AI application. The trick is finding what to build."
He and his team built the initial prototype using two components: React Native (which is a mobile app framework for JavaScript and has little to do with AI) and the Google Cloud Vision API (which does). They had to abandon this because Google's image matching (which extends across tens of thousands of images of different things) wasn't as fast or as accurate as they needed it to be.
In any case, they decided to do the inference (the part of the AI that uses its training to look at new pictures and decide whether they are hot dogs) on the phone rather than in the cloud. That's when things got more complex.
This highlights a couple of levels of understanding in AI. Some AI problems can be black-boxed. If the problem you're trying to solve is simple enough, then you can just load a pre-trained model, call an API and use the result in your software app. If you have to go in and tinker with neural networks and signals with hundreds of input parameters, then you'll probably have to roll up your sleeves and lift the hood. That's what happened next with Anglade's app.
Suddenly this is more complicated than anticipated...
They switched from a simple cloud API to Google's TensorFlow framework, and then had to push many rocks uphill to get where they needed. Most of them were data-related. They had to retrain the Inception neural architecture used atop TensorFlow to make it excellent at recognising hot dogs rather than pretty good at recognising lots of different things.
They realised that there were some things they couldn't easily do with the retraining tool that they were using on top of TensorFlow. They had relatively few hot dog pictures but mountains of non-hotdog pictures (you need both to train the neural network). The disparity meant that they had to weight the network to favour hot dogs. To do that, they had to use Keras, which is an API that can plug into TensorFlow and other deep learning frameworks.
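One common way to compensate for that kind of imbalance is to weight each class inversely to how often it appears. The sketch below is an assumption about how such weights might be computed, not a description of what Anglade's team actually did. The image counts are invented for illustration, and `balanced_class_weights` is a hypothetical helper.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency.

    `counts` maps a class label to its number of training images.
    Uses total / (n_classes * count), a widely used "balanced" heuristic.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Hypothetical counts, invented for illustration only.
counts = {"hotdog": 1_000, "not_hotdog": 9_000}
weights = balanced_class_weights(counts)

# With these weights, each class contributes the same total weight to
# the training loss, so the rare class is no longer drowned out.
hotdog_total = weights["hotdog"] * counts["hotdog"]
not_hotdog_total = weights["not_hotdog"] * counts["not_hotdog"]
```

In Keras, weights of this kind can be handed to `Model.fit` through its `class_weight` argument, which takes a dictionary keyed by integer class index rather than by the string labels used here.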
"For most people who are starting out, the library that people are using is Keras," says data scientist Ben Lorica. The nice thing about it is that it's a high-level tool that supports different frameworks – the underlying libraries that do the guts of the work, such as setting up and running the neural networks. There are many different frameworks to choose from, each with their own capabilities. Along with TensorFlow, Keras supports CNTK and Theano.
Keras is a Python API. That's not unusual, says Karl Freund, consulting lead for HPC and deep learning at analyst firm Moor Insights & Strategy.
"The primary language that people are using [for coding AI] now is Python," he says. For example, although its core is written in C++ with internal support for NVIDIA's CUDA GPGPU-based parallelised computing platform, developers won't typically see any of that. Instead, it exposes a Python API that developers can use to express and manipulate neural network models.
Python's popularity for manipulating neural network and machine learning models doesn't mean that you can't express such models in Java or other languages, though. "A variety of languages can be used to program these frameworks, including PHP," adds Freund. You'll find C++, Haskell, Java, Go and Rust in there, and even support for R.
What this means is that you don't necessarily need to be a C++ expert to get started in AI. When you get to the point where you're deploying, say, microsecond-latency inference models in autonomous cars, you might need to redeploy higher-level code directly in those lower-level, high-performance languages, says Freund. Most of us won't ever need that, though.
One thing you won't find (at least not yet) in TensorFlow or many other frameworks is support for LISP or Prolog, yet these were the darling languages of AI, back in the day. "They were early innovators, but they weren't picked up as building blocks for these AI frameworks. They all moved well beyond the LISP and Prolog world," says Freund. "That was 25 years ago."
Pass the scalpel
After immersing itself in Python and Keras, Anglade's team had to try and squeeze its neural network model onto a phone. That's an important lesson for would-be AI programmers to understand: the training data may need a lot of manipulation, but you'll also find yourself involved in what amounts to digital brain surgery.
"Sometimes for some of these steps I'd use a network off the shelf, but usually you have to tinker with it quite a bit," Anglade explains.
The workflow for this tends to veer back and forth between the neural network and the data. The team had to switch away from Inception to another neural architecture called SqueezeNet, which produced far smaller models that could fit onto a phone. Then they had to spend time manipulating the training data to keep the accuracy levels up.
When they couldn't make that work, they hit on MobileNets, a family of on-device computer vision models that promised better accuracy with a lower footprint. Someone implemented it in Keras, and Anglade and his colleagues then had to adapt it to their own needs.
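Some back-of-the-envelope arithmetic shows where a smaller footprint can come from. MobileNets is built around depthwise separable convolutions, which split a standard convolution into a per-channel filter followed by a 1x1 mixing step. The sketch below counts weights for a single layer under that scheme. The layer sizes are assumed for illustration, biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in an ordinary convolution: every filter sees every channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights when the same layer is split into two cheaper steps."""
    depthwise = kernel * kernel * c_in   # one small filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 256 channels in, 256 channels out.
std = standard_conv_params(3, 256, 256)         # 589,824 weights
sep = depthwise_separable_params(3, 256, 256)   # 2,304 + 65,536 = 67,840
ratio = std / sep                               # roughly 8.7x fewer weights
```

Repeated across many layers, savings of this order are what let a model shrink enough to run on a phone, although the real figure depends on the whole architecture and not on one layer.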
What does that look like, and what level of programming is involved? Anglade's team was editing the new Keras implementation to structure a custom neural network that would handle the specific needs of its own data. The implementation was no more than 50 lines of Python code that described the neural architecture, but it did a lot.
"The end result ends up being very light, elegant code – but I can't tell you how many times we rewrote those 50 lines," he says. That can be painful. Every time the team made a change it had to retrain the data set, incurring an overhead of between 10-40 hours.
That neural network is just one part of the code. He also needed metadata to control it – he likens the neural net to a child's brain and the metadata to a curriculum that tells it how to learn. The metadata took 200 lines of code or so, stored in a Jupyter notebook.
So, a lot of the work involved trying various neural architectures that would recognise hot dogs properly and which could fit the inference model onto a phone. Another strand involved manipulating hot dog pictures to support the kinds of images they expected to see (hot dogs photographed from Google Images, snapped at weird angles from phones and so on).
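That kind of picture manipulation is usually called data augmentation. As a minimal sketch, assuming an image is just a grid of pixel values held in nested lists, the hypothetical helpers below produce flipped and rotated copies of one picture so the network sees it from several angles. Real pipelines work on full-colour images and use an imaging library, so treat this as an illustration of the idea only.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Turn the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original plus one flipped and three rotated variants."""
    variants = [image, flip_horizontal(image)]
    rotated = image
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


# A tiny 2x3 "image" whose pixel values are just the numbers 1 to 6.
image = [[1, 2, 3],
         [4, 5, 6]]

training_examples = augment(image)  # five pictures from one original
```

Each variant keeps the same label as the original, so one hot dog photo turns into several training examples at no extra photography cost.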
Where to go from here
If you want to start down the AI path, don't boil the ocean. Start with a simple app (like Not Hotdog), and work your way up. You should probably pick up some courses along the way. NVIDIA is training 100,000 developers, often for free. There is a boatload of online courses.
When you've filled your head with that lot, you can take your data off to the gym for some reinforcement learning, which Lorica says is cutting-edge stuff right now. ®