What's surprising about the announcement isn't so much that the ASF is accepting another face in the crowd into its ranks – it's hard to turn around in the software world these days without tripping over ML tools – but rather that MXNet developers, most of whom are from Amazon, believe the ASF is relevant.
MXNet is an open-source "deep learning" framework that allows you to define, train, and deploy so-called neural networks on a wide array of devices. It also happens to be the machine learning (ML) tool of choice at Amazon Web Services (AWS) and is available today via ready-to-deploy EC2 instances.
Deep learning is the currently very popular subset of ML that focuses on hierarchical algorithms with non-linearities, which help find patterns and learn representations within data sets. That's a fancy way of saying it learns as it finds. Deep learning tools are currently popular thanks to their success in applications like speech recognition, natural language understanding and recommendation systems (think Siri, Alexa and so on). Every time you sit on your couch yelling at Alexa you're employing a deep learning system.
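To make "hierarchical algorithms with non-linearities" a little more concrete, here is a minimal sketch in plain Python. It is not MXNet code: it simply stacks two layers, each a weighted sum followed (in the first layer) by a non-linear function, ReLU. The function names, layer sizes and weights are all made up for illustration; a real framework would learn the weights from data rather than have them typed in by hand.

```python
def relu(values):
    """Non-linearity: negative values become zero, positive values pass through."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs):
    """Stack two layers so the second works on the first layer's output."""
    # Hypothetical, hand-picked weights; training would normally set these.
    w1 = [[1.0, -1.0], [0.5, 0.5]]
    b1 = [0.0, -1.0]
    w2 = [[1.0, 2.0]]
    b2 = [0.5]
    hidden = relu(dense(inputs, w1, b1))  # layer 1 plus non-linearity
    return dense(hidden, w2, b2)          # layer 2 builds on layer 1


if __name__ == "__main__":
    print(forward([2.0, 1.0]))
```

The "hierarchy" is just that nesting: the second layer never sees the raw input, only what the first layer made of it, and the non-linearity in between is what stops the whole stack collapsing into a single weighted sum.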
What makes MXNet interesting at this stage is that Amazon claims it's the most scalable tool the company has, and Amazon is a company that knows a thing or two about what scales and what doesn't.
MXNet is far from the only kid on the deep-learning block. In fact, it's a bit late to the game. Other popular tools in the deep learning world include Torch – used at Facebook, Google and NYU – and Microsoft's Adam, but perhaps the biggest direct competitor is Google's TensorFlow. TensorFlow is open source, uses an Apache license, and reached version 1.0 this month with customers already picking it up.
Google has more than a year on AWS with TensorFlow. Already used in a stable of Google services such as Gmail and Google Photos, TensorFlow was released by Google in a stable version to eager testers in November 2015 – 12 months ahead of AWS picking up MXNet. In June last year, Google claimed 1,500 repositories on GitHub mentioned TensorFlow, of which just five came from Google.
If you're new to the world of open source – and ML tools and developers often are – you'd be forgiven for having no real idea what the ASF is.
Even if you're very familiar with the ASF, you might still wonder why a multi-billion dollar company like Amazon would be so excited to have its pet project adopted by an all-volunteer group that somehow manages to run the ASF on barely $500k a year.
In a word: community.
Founded in 1999 and funded entirely by donations, the ASF system first helped establish Apache HTTP Server as the web's most popular web server. The formula has been recognised and repeated with later hits including Hadoop, Spark, Tomcat and Struts. Wounded over its damaging control of OpenOffice, Oracle dropped OpenOffice into the ASF to help it win broader buy-in.
The purpose of the ASF incubator is to help external projects improve the quality of their code and participate in the larger community. It is a kind of seal of approval: a sign that a project is truly open source and follows the ASF voting procedures and all the rest of the quasi-democratic governance system the ASF has developed, known among the anointed as The Apache Way.
Given a choice between that sort of community and the TensorFlow community – which, while it's open source, is very heavily managed by Google – MXNet starts to look more appealing. And the more appeal it has, the more developers get involved and the better the code gets. If you want to think of it in terms of ML, the ASF is a learning network for developers.
It's worth noting that not every project that enters the ASF incubator manages to escape its parents. Officially, though, projects don't get to move past the incubation stage until they demonstrate independence from any one contributor or sponsoring entity.
Incubation is the first step for a project that wants to become an official ASF project, but there is no guarantee that a project will either succeed or end up under the auspices of the ASF.
Among the incubator's successes are Cassandra, CouchDB, Mesos, and many more. Then there's OpenOffice – another incubator graduate, but one that has largely been eclipsed by LibreOffice.
Now Amazon is hoping that MXNet can learn a few tricks from the ASF, and maybe build a community that can help it catch up to competitors.
As AWS general manager of Artificial Intelligence Matt Wood said, the reason the project wants to be part of the Apache Incubator is to "take advantage of the Apache Software Foundation's process, stewardship, outreach, and community events."
In short, it wants to use the ASF's clout to attract more developers.
It's tempting to see Amazon's move as entirely self-serving, and indeed it is, but that's just the beginning of the story.
The ASF may not be the household name it once was, but it still has considerable clout and its governance and so-called Apache Way really do turn out some impressive, well-developed community projects. With that behind MXNet, its odds of besting TensorFlow and others do go up considerably.
And of course, the ASF gets what's probably its best ML project to date. MXNet is certainly one of the easiest to deploy, given that there's already an AWS Deep Learning AMI available, complete with MXNet, and plenty of example code pre-compiled and ready to use.
That the server instance you just spun up happens to be closely tied into other AWS services, which you might want to invest in as well, is just coincidence, I'm sure. ®