Okay IT pros, change happens. But here's your Reg guide to staying in control
Has a client ever had you working for months on features they don't really want?
When I started my IT career, the organisations I worked with didn't really do formal change management. And that wasn't really a problem: either they were small enough for it not to matter (we just told the handful of users: “We're about to upgrade X”), or the departments I worked in were sufficiently small and autonomous that the same logic applied.
Over the years, though, it's become clear that formalising the change process isn't just useful: it's an absolute necessity. Not only that, but you absolutely need it in both the development and operational fields – and for different reasons.
Change control in development
Ever come across a thing called “scope creep”? Of course you have – you're not a proper IT person until you've worked on a project that, metaphorically, started out with the intention of building a sandcastle and ended up with the Burj Khalifa.
Techopedia says that scope creep: “Refers to a project that has seen its original goals expand while it's in progress.” Now, I don't quite agree with this definition: I prefer Wikipedia's, which says it's about “uncontrolled changes or continuous growth in a project's scope”.
The key word here is “uncontrolled”. Changes in the scope of a project are perfectly acceptable, as long as they're done sensibly and in a controlled fashion. And “sensibly” in this context means you need to understand all the implications of the change, communicate them to the team and the client, and get the latter to agree to the changes.
Adding features to a design generally means more cost, more development time and more testing, and so if you're already up against time, the client needs to agree one of two things: either (a) that new items will be inserted into the schedule and that displaced items will be delivered later than previously promised, or (b) that new items or changes to existing items will be done at the end in a separate phase.
If you decide to add another phase to mop up the changes, be clear what delivery of the original project looks like. I once worked on a project that was scheduled to take 13 months but in fact ran to 22 months; sounds terrible, but in fact the original project over-ran by precisely a week. The remainder of the time was down to the client asking for modifications over and above the original spec because what he'd received gave him inspiration for other stuff he could ask for.
If you add work into the schedule and put other stuff back, redefine what “finished” looks like: put in the new stuff and take out the items that have been de-scoped from the original deliverable.
Above all, whenever any change is agreed, document it to death and unambiguously, and get the client to sign off all changes (and a client could be a paying customer but it could equally be another division of your own company). If something gets de-scoped then document this and at the very least circulate it in the minutes of the meeting (you did take minutes, didn't you?). If you don't, I guarantee you'll get beaten up at the end for not delivering it. Oh, and don't change the scope unless the client representative asking for the change has the authority to do so.
Eventually you'll deliver something. With luck it'll be what the client wanted, but sometimes it won't. (I remember a case where I had to cover my ar*e by thoroughly documenting the discussion process where the correct behaviour described by the “design authority” changed with the wind.) But what matters is that it corresponds with the most recent spec you agreed with the client.
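To make the bookkeeping concrete, here is a minimal sketch of a scope change log. It is only an illustration: the class names, field names and example features are all made up for the purpose, and a real project would probably keep this in a document or tracker rather than in code. It shows the three rules from this section: refuse changes from anyone without authority, count only signed-off changes, and derive what “finished” means from the original spec plus those changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopeChange:
    """One requested change to the spec (all names here are hypothetical)."""
    description: str
    requested_by: str
    added: frozenset = frozenset()     # new items brought into scope
    removed: frozenset = frozenset()   # items de-scoped from the deliverable
    signed_off: bool = False           # has the client signed this change off?


@dataclass
class Project:
    original_scope: frozenset
    authorised: frozenset              # people allowed to change the scope
    changes: list = field(default_factory=list)

    def record(self, change: ScopeChange) -> None:
        # Refuse changes from anyone without the authority to ask for them.
        if change.requested_by not in self.authorised:
            raise PermissionError(f"{change.requested_by} cannot change the scope")
        self.changes.append(change)

    def agreed_scope(self) -> frozenset:
        """The original spec with every signed-off change applied in order."""
        scope = set(self.original_scope)
        for change in self.changes:
            if change.signed_off:
                scope |= change.added
                scope -= change.removed
        return frozenset(scope)


project = Project(
    original_scope=frozenset({"login", "reports", "export"}),
    authorised=frozenset({"client_lead"}),
)
project.record(ScopeChange(
    "Swap export for dashboard", "client_lead",
    added=frozenset({"dashboard"}), removed=frozenset({"export"}),
    signed_off=True,
))
# Recorded, but not yet signed off, so it does not count towards "finished".
project.record(ScopeChange(
    "Add mobile app", "client_lead", added=frozenset({"mobile app"}),
))
print(sorted(project.agreed_scope()))  # ['dashboard', 'login', 'reports']
```

The unsigned mobile app request stays in the log as a record of the conversation, but it is not part of the deliverable until the client signs it off.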
Change control in ops
Once something is delivered, it will change over time. New features may be added, or bugfix patches may be released, or the server it sits on may be upgraded. Unsurprisingly, whenever anything changes you need to control the process of this happening.
If you're an ITIL fan then you'll have a change manager backed up by a Change Advisory Board (CAB). The change manager has the authority to approve proposed changes, and the CAB provides expert advice and/or a sounding board to help with the decision. Often, and particularly in organisations with a wide variety of systems and concepts, you'll have the decision spread across a small number of individuals so as not to lumber one individual with the onus of every decision. Doesn't really matter which way you go, as long as you have some kind of change management “engine” and a process that sits around it.
The term “change management” isn't very descriptive, though: I consider it more as “cock-up prevention”. Consider the three rough categories of change and you'll see what I mean:
- Fixes: Someone asks for permission to carry out a change that will fix a problem – a failed disk in a server, perhaps, or the replacement of a dead firewall in a clustered pair
- Updates: The new version of (or a patch to) a software package is being installed to provide new features and/or to keep it up-to-date and hence within the vendor's maintenance regime
- Decommissioning: A system has reached the end of its life and the techies want to remove the kit and dispose of it
The best possible outcome of any of these types of activity is that the users don't notice: you swap out the dead disk and the replacement quietly rebuilds itself in the background, or you power down the obsolete system and the Service Desk phone stays quiet, or you run your Patch Tuesday Windows Update and none of your servers craps itself. And the immediate outcome of doing nothing is that nothing changes – with the caveat, of course, that this could mean that a system stays down or continues with reduced resilience rather than operating normally. Stuff seldom gets worse if you do nothing or succeed in the change, then – you're employing change management to protect against screwing something up – to mitigate and minimise risk.
When I'm a member of a CAB there are five core things that you need to convince me of if you want me to favour going ahead with a change:
- A test plan: If you're doing a change, you must be able to determine whether it succeeded
- A rollback plan: What are you going to do if it all goes tits-up? Sometimes it's not possible (or feasible) to roll back a change, in which case tell me how else you're going to get the service back to normal if the change fails
- Resources: If your change document says: “5:15am: Finance department to verify tax calculations”, I'd like to see that they've agreed to be available to do so
- Risk: High-risk changes are perfectly acceptable (and are often necessary) – so long as you're clear on the impact of something not going right, and what you're doing to mitigate and respond to that risk
- Communications: If users or customers will see a change or interruption to a service, what's your plan for talking to them? You may decide that you're not going to tell them about a potential interruption if it's unlikely – but I want to know that you've been through that decision process
And it's this last sentence that's the key thing about operational change management: just the action of making people go through a process of completing a change control template and submitting it into a formal process makes them think about the change.
Unless you're the OCD sufferer from Hell, the chances are that in the absence of a formal process you're tempted to wing stuff and figure out fixes on the hoof if something goes wrong. In every company I've worked with, systems were more reliable with change control in place than they were prior to its introduction – because people have had to make the effort to think about changes up-front in order to get them over the change control barrier.
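For illustration, the five things a CAB member wants to see can be sketched as a change control template with a completeness check. This is a rough sketch, not anyone's actual tooling: the field names and the example change are assumptions, and a non-empty field only proves that somebody wrote something down, not that the plan is any good.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """A change control template; the field names are hypothetical."""
    summary: str
    test_plan: str = ""
    rollback_plan: str = ""            # or how service is restored if rollback is impossible
    resources_confirmed: bool = False  # have the people named in the plan agreed to be there?
    risk_assessment: str = ""          # impact if it goes wrong, plus mitigation
    comms_decision: str = ""           # may be "no announcement", with the reasoning


def missing_items(cr: ChangeRequest) -> list:
    """Return the CAB criteria the request has not yet addressed."""
    checks = [
        ("test plan", bool(cr.test_plan.strip())),
        ("rollback plan", bool(cr.rollback_plan.strip())),
        ("resources", cr.resources_confirmed),
        ("risk", bool(cr.risk_assessment.strip())),
        ("communications", bool(cr.comms_decision.strip())),
    ]
    return [name for name, ok in checks if not ok]


cr = ChangeRequest(
    summary="Replace failed disk in file server",
    test_plan="Confirm the array reports healthy after the rebuild",
    rollback_plan="Refit the original disk; restore from backup if the array is lost",
    risk_assessment="Low: the array stays degraded if the swap fails",
)
print(missing_items(cr))  # ['resources', 'communications']
```

The check does not judge the change. It simply refuses to let a request reach the CAB until somebody has thought about each of the five points.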
Why you need change control in DevOps is pretty straightforward, then. In the development world, strict control over changes to the spec is the only way to ensure you deliver what you agreed with the client you would deliver – even if it turns out not to be what they actually wanted, it will be what they asked for.
And in the operations world, it's a simple fact that change control leads to fewer things breaking – and for no reason more complicated than the fact that it makes people think before acting. ®