Event management kit can take a hammering these days: Use it well and it'll save your ass
Hunting the known unknowns
Analysis Who'd have thought it? Diagnostic event streams and log files are fashionable at last.
But, despite many advances, they're still as big a pain in the backside as they were 30 years ago as a tool for observing and reporting security issues – thanks both to their sheer volume and, increasingly, to the number of data types we're dealing with.
Logging has always been a paradox. Increase the log volume and you'll naturally raise the number of alarms, buried in millions (literally) of lines of data. Limit your log input to something manageable and you risk missing potentially important messages as well as losing the underlying data.
Given the recent rise in machine learning (ML) and artificial intelligence (AI), it's no surprise that AI types are trying to make sense of log files and come up with new ways of analysing them. The trend seems to be to talk about "behaviours".
Now, oldies like me are all used to "events". Speaking simplistically, an event is a single line in a log file or log data stream such as a password failure for an administrator account, or a new device being seen on the LAN. An event might mean something, but a sysadmin would have to look at what else is going on, either by checking other logs or systems, to see whether it's important. Sometimes you'd amalgamate events to take automated actions, but this would be pretty noddy – for example, by locking a user's account after three failed login attempts.
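To show just how noddy that sort of event amalgamation is, here's a minimal sketch of the three-strikes lockout in Python. The (user, outcome) tuple shape and the "fail"/"ok" strings are assumptions made for illustration; real log formats will differ.

```python
from collections import defaultdict

MAX_FAILURES = 3  # assumed threshold, from the "three failed attempts" example


def accounts_to_lock(events, max_failures=MAX_FAILURES):
    """Return users whose consecutive failed logins reached the threshold.

    `events` is an iterable of (user, outcome) pairs in time order, where
    outcome is "fail" or "ok" (a hypothetical format). A successful login
    resets that user's counter.
    """
    failures = defaultdict(int)
    locked = []
    for user, outcome in events:
        if user in locked:
            continue  # already locked; ignore anything further
        if outcome == "ok":
            failures[user] = 0
        else:
            failures[user] += 1
            if failures[user] >= max_failures:
                locked.append(user)
    return locked
```

Note that the rule looks at one event type from one stream and nothing else, which is exactly the limitation the behavioural approach sets out to address.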
"Behaviour" suggests a more long-term view, and that's exactly the point of today's algorithms. They take multiple streams of event data over time to build up the context in which systems are operating, and to make far more considered decisions than legacy monitoring tools. Think of it this way: a young baby crying is an event, which may be associated with accepted (and standard) behaviour if it's around feeding time. However, if the baby were crying immediately following feeding, that behaviour may be alarming.
The concept of User Behaviour Analytics (UBA) has been around for a few years, and it uses precisely this concept. It combines event streams to analyse what users do on systems, and identifies anomalies: the more unusual and potentially damaging the behaviour, the more urgent the alert. So when a 9-5 accountant suddenly logs in at 2am for the first time ever, that's more suspicious than a member of the 24/7 call centre doing the same thing.
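The accountant-versus-call-centre comparison can be sketched as a per-user baseline. This is a deliberately crude illustration, not how any particular UBA product works: it scores a login hour against the hours that user has logged in before, and the function name and scoring formula are my own assumptions.

```python
from collections import Counter


def login_hour_rarity(history_hours, hour):
    """Score how unusual `hour` is for one user: 0.0 (typical) to 1.0 (never seen).

    `history_hours` is the list of hours (0-23) at which this user has
    logged in previously. The score compares the hour's frequency with the
    frequency of the user's most common login hour.
    """
    if not history_hours:
        return 1.0  # no baseline yet, so treat as maximally unusual
    counts = Counter(history_hours)
    busiest = max(counts.values())
    return 1.0 - counts[hour] / busiest


# Hypothetical histories: a 9-5 accountant and a 24/7 call-centre worker.
accountant = [8] * 10 + [9] * 50
call_centre = list(range(24)) * 5

print(login_hour_rarity(accountant, 2))
print(login_hour_rarity(call_centre, 2))
```

With these made-up histories the accountant's 2am login scores as far more unusual than the call-centre worker's, because each user is compared only with their own past behaviour.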
UBA has now expanded into UEBA – User and Entity Behaviour Analytics – which is all about doing the same thing but also for devices ("entities") on the network, not just user activity. Then of course you get into analysing the network traffic itself, not just the network devices, and lo and behold, we have the concept of NTBA – Network Traffic Behaviour Analytics.
Helping the machines
The more data the machine learning software sees, the better it'll be at getting the right answer – but a helping hand is always welcome. Take, for example, one of my pet hates: staff off-boarding processes that aren't being followed properly. I'd like the software to tell me if it sees a login to Active Directory for an individual who was previously the subject of an "employment ceased" event in the HR system. No AI system's going to guess how to do this when you first fire it up, though, so you'll have to give it some clues.
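The off-boarding clue boils down to joining two event streams on the user. Here's a rough sketch, assuming both systems can export simple (user, timestamp) pairs; the field layout and sample names are hypothetical.

```python
from datetime import datetime


def logins_after_leaving(hr_events, ad_logins):
    """Return AD logins that happen after the same user's "employment ceased" event.

    hr_events: iterable of (user, ceased_at) pairs from the HR system.
    ad_logins: iterable of (user, logged_in_at) pairs from Active Directory.
    """
    ceased = {}
    for user, when in hr_events:
        # Keep the earliest cessation time recorded for each user.
        if user not in ceased or when < ceased[user]:
            ceased[user] = when
    return [
        (user, when)
        for user, when in ad_logins
        if user in ceased and when > ceased[user]
    ]


# Hypothetical sample data: alice left on 31 May but logs in on 3 June.
hr_events = [("alice", datetime(2024, 5, 31, 17, 0))]
ad_logins = [
    ("alice", datetime(2024, 5, 30, 9, 0)),
    ("alice", datetime(2024, 6, 3, 2, 14)),
    ("bob", datetime(2024, 6, 3, 9, 0)),
]
print(logins_after_leaving(hr_events, ad_logins))
```

Only alice's 3 June login is reported: her earlier login predates the HR event, and bob has no HR event at all.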
Thinking at a higher level
You'll also have to train yourself to think less technically. As IT specialists we're used to thinking about traffic streams, ports, user IDs, login events, and the like. Because the technology's now doing more of the low-level grunt work and – in real time – all the log correlation and deduction that we used to spend hours figuring out after the event, we can step back from that. We'll think more about how to configure and educate the software than about how it's working under the hood. And where we let the software go a step further and take action to, say, disable a user account, or kill the switch port of a virus-riddled PC, you'll need to get to grips with finding your systems configured differently from how you left them.
Picking your alerts
In the old days you'd be strapped for space on most of your technology so you'd have to be frugal with the log data you allowed the kit to generate. These days the management software vendors – particularly in Security Information and Event Management (SIEM) – are saying: "Just throw everything at us, we'll take it." So while you used to restrict the input to your management platform, you're now turning up the input to 11 and having to enforce limitations (or at least prioritisation) on the output instead. It's a never-ending cycle of set-observe-repeat, but one you must nonetheless stick to: regularly examine the alerts that are generated, re-prioritise them rigorously, and ensure that stuff you've not seen yet is set to alert you. Security alerting thresholds should be more sensitive than general system management alerting levels. If you don't get an alert to a system failure from the management platform then you'll probably get one by phone from the users who can't work, but in many security incidents the users will be blissfully unaware there's something wrong so you need to be sure the kit will tell you.
What about monitoring for a lack of alerts?
If you can't hear your children playing, they're probably up to no good. The same can apply in information security: alerting to something that appears in a log file is great, and putting it in context using other event streams is better – but what about alerting to something that hasn't happened?
Say a user ID is compromised and the attacker is able to throttle the logging level from a crucial system. Will the system react to this? Yes, it'll probably smell a rat because the entity (the server, switch, or whatever was compromised) will be behaving differently from normal.
But what about when an administrator logs in? It might be a perfectly normal activity for that user ID at that time of day. Hang about, though: that user's swipe card hasn't been used since he left yesterday, and there's no activity on the remote access server for him – because another user has compromised his password and is using it for their own ends. Can you configure systems to protect against this? Of course, though it usually won't be trivial. Will an AI-based monitoring system that's running out-of-the-box in its untrained state clock it? Probably not. Give some thought, then, to how the clever tech can correlate based on information that's not there: it's an area where I think we'll see more and more work taking place in the AI space.
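Correlating on what's missing might look something like the sketch below: flag any admin login that has neither a swipe-card event nor a remote access session to explain how the user got there. The day-level granularity and the (user, day) pairs are simplifying assumptions; a real rule would need proper time windows.

```python
def unexplained_logins(admin_logins, badge_swipes, vpn_sessions):
    """Return logins with no evidence that the user was on site or remote.

    admin_logins: iterable of (user, day) pairs for admin logins.
    badge_swipes, vpn_sessions: sets of (user, day) pairs seen in the
    door-access and remote-access logs respectively (hypothetical format).
    """
    return [
        (user, day)
        for user, day in admin_logins
        if (user, day) not in badge_swipes and (user, day) not in vpn_sessions
    ]
```

The alert here is driven entirely by the absence of entries in two other logs, which is why a tool looking only at the login stream would see nothing odd.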
And on a simpler level, you can monitor the files that users access – but where in the log does it tell you the files that they don't go anywhere near? (Answer: it doesn't.) Your AI monitoring package is dead clever, and is getting more clever the more data it consumes and the more you tune it. So, there's no reason why it can't decide that it can turn off access to a particular folder for user X because he has rights to it but hasn't used it for six months. Again, this is something you could script, but won't it be great if the AI figures it out for you and saves you the trouble? So expect to see your AI monitoring tool piping up and saying: "Hey, have you noticed this isn't happening?"
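For the record, the scripted version of the unused-rights check is short. This sketch assumes you can export the list of access grants and the last logged access per grant; both data structures, and the 180-day stand-in for "six months", are assumptions for illustration.

```python
from datetime import date, timedelta


def stale_grants(grants, last_access, today, max_idle=timedelta(days=180)):
    """Return (user, folder) grants that have not been used within max_idle.

    grants: set of (user, folder) pairs the user has rights to.
    last_access: dict mapping (user, folder) to the date of the last logged
                 access; a missing key means no access was ever logged.
    """
    stale = []
    for grant in sorted(grants):
        last = last_access.get(grant)
        if last is None or today - last > max_idle:
            stale.append(grant)
    return stale
```

The grants list is the important input: the access log alone can never reveal the folders nobody touched, so the check has to start from what users are allowed to do and subtract what they actually did.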
A final reminder
I mentioned earlier that modern SIEM products have the capacity to consume pretty much all the data you can throw at them. So, while they're chewing on the millions of event messages per day you're bombarding them with, make sure you're also sending them the non-existent event streams. By this I mean make sure you're forwarding the event streams that should be empty. If your firewall's configured not to permit inbound traffic, make sure you send the inbound traffic monitoring log to the SIEM platform. If all's well it won't cost you any bandwidth or storage (it should be empty, after all) but you'll be damned grateful to have it when someone internal misconfigures your firewall or someone outside hacks it.
Why? Well, take the true story of the sysadmin who'd been asked to run a report of all email accounts that were being auto-forwarded to external email addresses. The indignant cry came: "There can't be any, we disabled it when we installed the server." He ran it anyway, just to prove that all was well, and found that someone's entire inbound email stream was being forwarded to a hacker's Gmail account. Which would have been obvious if they'd been monitoring for stuff that shouldn't happen.
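Monitoring the streams that should be empty can be as simple as the following sketch. The stream names are invented, and it assumes the collector reports a per-stream event count (including zero) for each time window, which not every platform will do out of the box.

```python
def breached_invariants(expected_empty, counts):
    """Check streams that are supposed to contain no events.

    expected_empty: names of streams that should always be empty.
    counts: dict of stream name -> events received in the current window.
            A stream absent from `counts` is not being forwarded at all.

    Returns (non_empty, missing): streams that contain events when they
    shouldn't, and streams the platform isn't receiving.
    """
    non_empty = sorted(s for s in expected_empty if counts.get(s, 0) > 0)
    missing = sorted(s for s in expected_empty if s not in counts)
    return non_empty, missing
```

Either list being non-empty is worth an alert: the first means something that "can't happen" has happened, and the second means you wouldn't find out if it did.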
Maybe the moral of the story, then, is that monitoring what's not there is what'll save you in the long run. ®