Network monitoring is hard... If only there was some kind of machine that could learn to do it

*AI bursts through wall* 'OHHH YEAHHH!'

Comment It's difficult not to wish for the "good old days" when workloads stayed put, packets behaved, and firewalls just did basic port-level filtering. Admins knew where they stood.

The cloud has changed that. Workloads and the resources they run on have become malleable, virtual machines can be shunted from place to place, traffic volumes have ballooned while the types of traffic have multiplied, and users can increasingly provision resources for themselves.

Exacerbating the situation, new applications demand high volume and low latency – high-definition video and gaming deliver lots of data quickly and are sensitive to latency – while IoT confounds traditional network monitoring, with devices connecting and disconnecting frequently.

Shervin Shirmohammadi, professor with the school of electrical engineering and computer science at the University of Ottawa and a member of the IEEE MMTC Multimedia Cloud Computing Interest Group, sums up the situation. "Some of these applications are highly interactive, so in those cases network monitors need to catch and fix issues causing poor service quite fast, in the order of 100ms, which is very difficult with old methods and tools."

No wonder analysts think the market for network performance monitoring and diagnostics is set to explode. Gartner thinks it is worth $2.2bn and is growing at a compound annual rate of 15.9 per cent. Market research rival Network Analytics is even more bullish, envisioning a $3.1bn market by 2022.

Driving this is the sheer scale of cloud – the proliferation of devices, more data, more users and increasing uncertainty. Traditional network monitoring is backwards-looking, and manual dashboards will fail to scale as the number of devices, applications and packet routes rises.

If only there were some system that could learn from the present, respond accordingly and anticipate the challenges of the future – to head them off before they become challenges.

What is machine learning?

In network management, AI and machine learning have potential. Both thrive on data – they learn from it and grow – and there are few domains in IT that generate as much data as network management.

What they learn can be turned to our needs, and vendors in the field are beginning to add AI and ML capabilities to their products. These are being used to automate functions – detect performance and availability problems in real time, identify their root cause and speed up recovery times. It's early days, but you can expect more to come.

AI, meet software-defined networking

The advent of software-defined networking (SDN) should mean a greater role for this data-driven, AI-based approach to network management. SDN separates the control layer from the physical switches that forward the data, enabling administrators to configure network operations from a central console.

Abstracting network control into a separate software-based layer makes it easier to collect large amounts of data from the infrastructure and to configure the network quickly. This will provide more fuel for data-slurping AI algorithms. It could also make it easier for them to reconfigure the network directly via software APIs.

This puts the automation of network functions on the map. Shirmohammadi expects the mitigation of problems and optimisation of traffic flow to become more automatic over time as AI algorithms become more adept at analysing historical data and human administrators begin trusting them more.

"AI network analytics can pinpoint a problem that is causing poor service, or allocate resources more efficiently as nodes join and leave at a large scale, both faster than what is possible today," Shirmohammadi told us.

"In the case of a network problem, in addition to finding it, it can also suggest a solution to the network operator, or, in advanced systems, even go ahead and apply the solution without human intervention."

As AI network management gets to this point, the dream is that it should free up fleshbags to concentrate on better architecture design and network strategies.

ClAIrvoyant systems

Machine learning applied to network management isn't just supposed to be good at spotting present problems. The idea is it should be good at predicting events, too. How? By having algorithms process large amounts of historical data alongside current traffic and extrapolate patterns from them. This could happen in several ways in a network environment.

One scenario sees AI better understand how network parameters change over time, leading to more accurate capacity planning and enabling procurement teams to kick off provisioning processes at the right juncture.
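The simplest version of this kind of capacity forecast needs no machine learning at all: fit a straight line to past utilisation and see when it crosses a limit. The sketch below does that with ordinary least squares. The function names, the readings and the 80 per cent threshold are all invented for illustration, and a real system would use something richer than a straight line.

```python
def fit_trend(samples):
    """Least-squares line through (day, utilisation) pairs.

    Returns (slope, intercept). Assumes at least two distinct days.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_threshold_reached(samples, threshold):
    """Day number on which the fitted trend reaches the threshold.

    Returns None when the trend is flat or falling, because the
    threshold is then never reached by extrapolation.
    """
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical weekly readings: (day number, link utilisation in per cent)
history = [(0, 40.0), (7, 42.0), (14, 44.0), (21, 46.0)]

# Day on which the trend line would hit 80 per cent utilisation
forecast_day = day_threshold_reached(history, 80.0)
print(forecast_day)
```

A procurement team could compare the forecast day with its own lead time to decide when to start provisioning.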

Another could predict adverse events and anticipate their effects in advance, in a Minority Report-style precog scenario.

Experts think that this may happen because predictive machine-learning algorithms thrive on data, which in this case is the historical stuff. The more data they have from the past, the better they become at suggesting what may happen in the future. And computer networks are among the most data-rich resources in IT.

They constantly generate information about the devices and applications connecting to them and the information they are communicating. By collating that data and running it through a machine-learning model, you can potentially build up an accurate statistical model of how the network will react under certain conditions.

Training day, and tomorrow, and the day after

This all sounds great, but there are challenges – and one of the biggest is training.

AI and ML have an insatiable demand for data. There's no such thing as universal AI or ML, though: you need to pick a model or framework for specific cases. Moreover, the more you want to drill down into how different applications affect the network, the more specific the data for your chosen model or framework will need to be.

"Different applications such as OTT video, IoT, gaming and conferencing need to have different models," Shirmohammadi said. "A training model that works for OTT video will likely not work well enough for IoT. So, an AI network analytics system needs to consider each application differently while also considering cross traffic."

Algorithms may also need to be tweaked – you can't run and forget them. Data scientists have to repeatedly alter and test algorithms to make sure they work properly and are generating the correct outputs, which means near-continuous training. Training can, however, be done incrementally, updating an existing model with new data rather than retraining it from scratch, which can save time and effort.

"This would significantly speed up incremental additions to the model," Shirmohammadi added.
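To give a feel for what incremental updating means, here is a minimal sketch that keeps a running mean and variance of a single metric using Welford's online method, folding in each new sample without revisiting old ones. It is a stand-in for the idea, not the technique Shirmohammadi describes: the class name, the latency figures and the three-standard-deviation rule are assumptions made for illustration.

```python
import math


class RunningBaseline:
    """Running mean and variance of one metric, updated a sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        """Fold one new sample into the baseline (Welford's method)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        """Sample standard deviation; 0.0 until two samples have been seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_anomalous(self, value, k=3.0):
        """True when value lies more than k standard deviations from the mean."""
        sd = self.stddev()
        return sd > 0 and abs(value - self.mean) > k * sd


# Hypothetical round-trip latencies in milliseconds
baseline = RunningBaseline()
for latency_ms in [20.0, 22.0, 21.0, 19.0, 23.0]:
    baseline.update(latency_ms)

print(baseline.is_anomalous(100.0))
print(baseline.is_anomalous(21.5))
```

Each update costs the same no matter how much history has gone before, which is the property that makes incremental approaches attractive at network scale.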

This tweaking may take more work given the many different data types that a competent AI-based network monitoring model would consume. For each application, an AI model can gather traffic traces from many layers, ranging from the application layer through to the transport, network and physical layers, not to mention the node's own hardware and operating system.

Many metrics can be collected from each of these layers. At the transport layer alone you can collect more than 100 different types of metrics, such as the number of packets, window size, number of bytes in the payload and timestamp.

Developers and AI architects building such systems must therefore be selective about which of these metrics they feed to a model as features.
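One crude way to be selective is to score each candidate metric by how strongly it correlates with the thing you want to predict, then keep only the top few. The sketch below ranks metrics by absolute Pearson correlation. The metric names echo those mentioned above, but the values, the delay target and the function names are made up, and correlation only captures linear relationships, so treat this as a first filter rather than a recommendation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, since a constant
    metric carries no information about the target.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target, keep):
    """Names of the `keep` features most correlated (by magnitude) with target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical per-flow transport-layer metrics, one value per observation
metrics = {
    "packet_count": [100, 200, 300, 400, 500],
    "window_size": [64, 64, 64, 64, 64],
    "payload_bytes": [1500, 900, 1400, 1000, 1200],
}
# Hypothetical target: measured delay in milliseconds for the same observations
delay_ms = [10.0, 20.0, 31.0, 39.0, 50.0]

selected = rank_features(metrics, delay_ms, keep=2)
print(selected)
```

Here the constant window size scores zero and drops out, which is the sort of pruning that keeps a model from drowning in 100-plus metrics per layer.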

AI for network monitoring has promise but – you won't be surprised to learn – isn't a slam dunk. You'll need to dust off the marketing glitter and consider how much time and effort you want to invest in building, training and integrating these algorithms into large, complex cloud-based networks.

That said, AI does promise to deliver a healthy amount of insight and automation into what's happening and what could be about to happen in the increasingly big and varied networks of the cloud. ®


Biting the hand that feeds IT © 1998–2021