What's wrong with network monitoring tools? Where do I start...
That red screen? It's just embarrassment
5. Virtually comprehensible
I need my management package to understand the hypervisor layer so that I can do a packet capture on the virtual NIC of a virtual machine and the physical port to which it is eventually plumbed through the virtual infrastructure. Actually one of the switch vendors (Enterasys) has shown me this type of intelligence, but that's the exception rather than the norm.
6. Sensible discovery
Why are so many auto-discovery functions so bloody awful? I'd love a monitoring package that does some sensible discovery and presents you with something other than a single page of 950 overlapping icons. RiverSoft was one of the better offerings I've seen, though it was properly expensive, and Cabletron Spectrum had some clever bits in it, but the ones I've used lately are pretty awful.
7. Spanning Tree
Draw me a picture (a comprehensible one, please) of my Spanning Tree topology, highlight the root bridge(s), and report when changes take place. And don't tell me you can't do it because if the switches in the network know by talking to each other what their STP world looks like, you can use that traffic to figure it out too. Listen to the BPDUs and draw a picture.
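To show how little is needed, here is a minimal sketch (not any vendor's implementation) of reading the root bridge, path cost and topology-change flag out of a captured BPDU. It assumes you already have the raw BPDU payload, that is, the bytes after the 3-byte LLC header, and that it follows the standard 802.1D configuration BPDU layout. The names `Bpdu`, `parse_config_bpdu` and `summarise` are invented for illustration.

```python
import struct
from typing import NamedTuple


class Bpdu(NamedTuple):
    root_id: str          # "priority/mac" of the bridge the sender believes is root
    root_path_cost: int   # sender's cost to reach that root
    bridge_id: str        # "priority/mac" of the sending bridge
    port_id: int
    topology_change: bool


def _bridge_id(raw: bytes) -> str:
    # A bridge ID is a 2-byte priority followed by a 6-byte MAC address.
    priority = int.from_bytes(raw[:2], "big")
    mac = ":".join(f"{b:02x}" for b in raw[2:])
    return f"{priority}/{mac}"


def parse_config_bpdu(payload: bytes) -> Bpdu:
    """Parse the first 27 bytes of a configuration (or RST) BPDU payload."""
    if len(payload) < 27:
        raise ValueError("payload too short for a configuration BPDU")
    proto, _version, bpdu_type, flags, root, cost, bridge, port = struct.unpack(
        "!HBBB8sI8sH", payload[:27]
    )
    # Type 0x00 is a configuration BPDU, 0x02 is a rapid spanning tree BPDU.
    if proto != 0 or bpdu_type not in (0x00, 0x02):
        raise ValueError("not a configuration or RST BPDU")
    return Bpdu(_bridge_id(root), cost, _bridge_id(bridge), port, bool(flags & 0x01))


def summarise(bpdus):
    """Return the set of root bridges seen and each sender's cost to its root."""
    roots = {b.root_id for b in bpdus}
    costs = {b.bridge_id: b.root_path_cost for b in bpdus}
    return roots, costs
```

If `summarise` ever returns more than one root, or the set changes between polls, that is exactly the event worth reporting. Drawing the picture is then a matter of laying the bridges out by path cost.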
8. Idiotproof GUI
I'll be slightly kind to the monitoring software vendors and point out that they're certainly not alone in this one: I've been reviewing software packages since 1994 and have come across probably half a dozen GUIs – across all product types – that I've considered brilliant.
But why are so many monitoring screens so unintuitive? Remember that in many cases the people using these screens are junior, inexperienced, level-1 support staff: this means it needs to be simple and understandable. I should be able to group devices together by dragging them around, or by saying “Everything in subnet XXX”, or “Everything whose name matches this pattern”... and yet so many packages don't let you do this. And I want to be able to click on anything and go to a screen showing me the data I think I want to see.
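The grouping I'm asking for is not exotic. As a rough sketch, assuming the package holds its inventory as simple records with a name and an IP address (the `DEVICES` list and both function names below are made up for illustration), "everything in subnet XXX" and "everything whose name matches this pattern" are each a few lines of Python:

```python
import fnmatch
import ipaddress

# Hypothetical inventory; a real package would read this from its own database.
DEVICES = [
    {"name": "lon-core-sw1", "ip": "10.1.0.1"},
    {"name": "lon-edge-sw7", "ip": "10.1.4.17"},
    {"name": "nyc-core-sw1", "ip": "10.2.0.1"},
]


def in_subnet(devices, cidr):
    """Return the devices whose address falls inside the given subnet."""
    network = ipaddress.ip_network(cidr)
    return [d for d in devices if ipaddress.ip_address(d["ip"]) in network]


def name_matches(devices, pattern):
    """Return the devices whose name matches a shell-style wildcard pattern."""
    return [d for d in devices if fnmatch.fnmatchcase(d["name"], pattern)]
```

With this data, `in_subnet(DEVICES, "10.1.0.0/16")` picks out the two London switches and `name_matches(DEVICES, "*-core-*")` picks out the two core switches.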
9. Storage should understand storage
Show me IOPS counts on my iSCSI subnet, and picture my Fibre Channel switch zoning. And, for that matter, tell me I've buggered up the resilience in the zoning. With iSCSI increasingly high-performing (hardware-based iSCSI can be screamingly fast) it's becoming more popular, but monitoring screens are seldom over-burdened with storage information.
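The zoning check in particular need not be clever. Here is a rough sketch under some loud assumptions: a dual-fabric design, zoning already exported from each fabric as sets of member names, and a known list of which members are initiators and which are targets. The function name and data layout are invented for illustration and do not match any vendor's export format.

```python
from collections import defaultdict


def single_fabric_pairs(zones_by_fabric, initiators, targets):
    """Return (initiator, target) pairs that are zoned together in only one fabric.

    zones_by_fabric maps a fabric name to {zone name: set of member names}.
    """
    fabrics_seen = defaultdict(set)
    for fabric, zones in zones_by_fabric.items():
        for members in zones.values():
            # Record every initiator/target pair this zone allows to talk.
            for initiator in members & initiators:
                for target in members & targets:
                    fabrics_seen[(initiator, target)].add(fabric)
    return sorted(pair for pair, fabrics in fabrics_seen.items() if len(fabrics) < 2)
```

Any pair that comes back has a single path to its storage, which is the buggered-up resilience I want the screen to shout about.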
10. Application-centricity
The monitoring package must understand the applications. As the infrastructure guy I'm not actually responsible for keeping the infrastructure running; I'm responsible for providing the resources that the apps need in order to run. The sales director probably doesn't care that one of my WAN links is down, for instance, because I have resilient links and the failover was seamless when someone put a digger through the fibre. So by all means let me know that something's down, but report at multiple levels with different views for application owners, business managers and the like. So when a fibre breaks, turn my screen bright red but turn the sales director's a tasteful shade of pastel amber, so he knows something's wrong but his service is performing normally.
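The red-for-me, amber-for-them logic can be sketched in a few lines. This assumes the package knows which links each service depends on and whether each link is up; the link names and both function names are hypothetical.

```python
def infra_status(links):
    """Infrastructure view: any failed link is a red alert for the network team."""
    return "green" if all(links.values()) else "red"


def service_status(service_links, links):
    """Service view: amber while at least one of the service's links survives."""
    working = [name for name in service_links if links[name]]
    if len(working) == len(service_links):
        return "green"
    return "amber" if working else "red"


# Hypothetical state after the digger has gone through the primary fibre.
links = {"wan-primary": False, "wan-backup": True}
print(infra_status(links))                                    # infrastructure screen
print(service_status(["wan-primary", "wan-backup"], links))   # application owner's screen
```

The same event feeds both views; only the question being asked of it differs.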
How to do it
Achieving all of the above is non-trivial, but most of the tasks are do-able. For instance:
- As I've said, the Spanning Tree traffic on a network provides enough information to build a picture. So do it, please.
- You can improve your GUI by talking to people who use it and employing usability analysts.
- The APIs exist to allow you to figure out the physical path from A to B, even in a virtual environment. Enterasys's own management suite does it very well, for instance, so why don't we see it in mainstream monitoring apps?
- Some management packages are able to download configs from switches and routers and highlight changes, but they don't take the next step of actually parsing the config and understanding it. By definition these are structured files which provide an accurate representation of the config (the router wouldn't work otherwise!), so use them to understand the configuration. Add in some ISDP/CDP/LLDP information and you're starting to build the end-to-end picture.
- Time-limited packet capture should be trivial, particularly with the ability to do virtual span ports and the like in your VM infrastructure.
- If you're monitoring packets you can figure out (say) that a DNS query didn't get a response, or that it took 10 seconds. Ever seen a Bounce Diagram in Compuware's application analysis tools? Shouldn't be too hard to come close for a network equivalent.
- Yes, Syslog data is pretty unstructured but if you take (say) the top five vendors in each market you should be able to find enough patterns in the messages to make meaningful decisions. At least one vendor out there is doing natural language processing on Syslog messages, interestingly – I'm quite looking forward to seeing their package in the next week or two.
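To illustrate the Syslog point in the list above, here is a minimal sketch of pattern-based classification. The two patterns cover the familiar Cisco IOS link and line-protocol up/down messages; a real package would need a much larger table per vendor, and the `PATTERNS` table and `classify` function are illustrative names, not part of any product.

```python
import re

# Illustrative patterns only: two well-known Cisco IOS message formats.
PATTERNS = [
    ("link_state", re.compile(
        r"%LINK-\d-UPDOWN: Interface (?P<interface>\S+), "
        r"changed state to (?P<state>up|down)")),
    ("line_protocol", re.compile(
        r"%LINEPROTO-\d-UPDOWN: Line protocol on Interface (?P<interface>\S+), "
        r"changed state to (?P<state>up|down)")),
]


def classify(message):
    """Return a structured event for a recognised message, or None if unknown."""
    for event, pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return {"event": event, **match.groupdict()}
    return None
```

Anything that comes back as `None` is a candidate for the next pattern to add, which is how the table grows from the top five vendors outwards.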
Conclusion
Monitoring companies need to take a step back and consider whether their tools are really giving customers what they need. My feeling is: no, they're not – instead they're confining themselves to what they know and giving the infrastructure manager just enough to keep things running and react just about quickly enough.
But as I see it there's huge scope to add a number of user-oriented features that would place the first company to do it head and shoulders above the rest. ®