Virtual domain controllers (VDCs) in Server 2012 – and now 2012 R2 – are awesome.
I have used domain controllers inside virtual machines since Virtual Server 2005 and have seen them fail in every way imaginable. VDCs address all of my issues and, considering the features they bring to the table, it is flat out nuts not to use this technology.
There are three primary scenarios where traditional domain controllers fail in a virtualised environment: restoring an individual domain controller from a backup into an existing environment; "oh damn"-class disaster recovery (where everything is coming from backups); and cloning.
To my delight and amazement VDCs cope with all three scenarios.
With the disaster recovery stuff Microsoft has created a new feature where version one is not crippled, half-assed or missing the features that made us want it in the first place.
If anyone of influence at Microsoft reads this, the people in charge of this project should be running the whole company. At the very least, buy them a tropical island. Maybe next to the one you should have bought the storage team by now.
I am a lot more skeptical about the cloning features. They strike me as being good for a narrow use case, while still missing the mark. Microsoft should not have used the word "cloning" here, as what most sysadmins think of as cloning and what Microsoft calls cloning intersect only tangentially.
Microsoft has some comprehensive documentation on the how and why of VDCs. This naturally includes some PowerShell examples for those who choose to script the process. It is worth bookmarking as you will need to reference it at some point.
Count the blessings
The heart and soul of VDCs is the VM-Generation ID. This is one of those "so simple it's brilliant" ideas that I wish we'd had years ago when virtualisation started to take off.
The simple version of the VM-Generation ID is a counter. One copy of the counter is kept in the virtual machine by the operating system and another is maintained by the host. Any time you do something that takes the virtual machine back in time or duplicates it – apply a snapshot, restore it from backup, clone it or what have you – the host's copy of the counter changes.
If the counter inside the virtual machine is different from the counter maintained by the host the virtual machine knows that something has occurred beyond normal operation.
Now that VM-Generation ID exists, in theory any operating system could make use of it. I suspect its wide adoption is a matter of time, given how useful the concept is to any number of applications.
In the context of a Server 2012 domain controller, VM-Generation ID is used by the Active Directory service to determine if it should trust the local copy of the Active Directory. If the value of VM-Generation ID inside the virtual machine does not match that of the host then the Active Directory will discard its RID pool and reset its invocation ID.
In other words, any pending changes on that domain controller are not sent out to other domain controllers on the network, and the domain controller that has discovered its database is out of sync will fetch a clean copy from an unaffected domain controller.
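For those who think better in code, the check can be sketched as a toy model in Python. This is emphatically not how Active Directory is implemented: the class, its field names and the RID range are all made up for illustration, and the only behaviour it borrows from the text is "compare the two values and, on a mismatch, stop trusting the local state and resync".

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy model of a virtualised DC (hypothetical names, not a real API)."""
    name: str
    stored_generation_id: str      # value the guest saved the last time it checked
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    rid_pool: range = range(1000, 1500)   # made-up block of RIDs
    needs_inbound_sync: bool = False

    def check_generation_id(self, host_generation_id: str) -> bool:
        """Return True if a mismatch was detected and the safeguards ran."""
        if host_generation_id == self.stored_generation_id:
            return False           # normal operation, nothing to do

        # Mismatch: the VM was rolled back or copied behind the guest's back.
        self.invocation_id = str(uuid.uuid4())   # present as a new database instance
        self.rid_pool = range(0)                 # drop RIDs that may already be in use
        self.needs_inbound_sync = True           # fetch a clean copy from a good DC
        self.stored_generation_id = host_generation_id
        return True
```

A restored snapshot would look like this: the guest still holds the old value, the host presents a new one, and `check_generation_id` returns `True` once. Calling it again with the same host value returns `False`, because the guest has recorded the new value.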
This is great for people like me who don't have the licences to burn on making my domain controllers just domain controllers. Mine are generally DHCP servers and print servers as well.
They have done this job quite well for more than a decade, but once every three years or so a printer driver update will go sideways. The ability to just restore from a snapshot would be really useful.
It is also useful for those instances where Patch Tuesday touches your week with a borked Windows update. Because you stagger your domain controller update days… right?
I emphasise the importance of staggered updates because the ability to recover a virtual machine from snapshot or backup in this manner is dependent on there being a "good" domain controller on the network from which to fetch a clean copy of the Active Directory. You have to deal with things differently if you break all your domain controllers at the same time.
You broke them all?
You should take a slightly different approach to getting things up and running if you break all your domain controllers at the same time. This is less difficult than it sounds, especially for companies with few domain controllers. The short version is "bring up the domain controllers that own flexible single master operations (FSMO) roles first."
The first domain controllers up should be the PDC emulator followed by the RID master. In many instances they are one and the same system but they could just as easily be two different ones as they are separate FSMO roles.
These need to come up before anything else so that you have the core infrastructure of an Active Directory network up and running, at least enough for the domain controllers to chat among themselves and determine who is boss.
Bring up any remaining FSMO role domain controllers and make sure you have at least one global catalogue (GC) server. (GCs end up being important for the smooth operation of anything and everything in a Windows network.)
Manually trigger replication between these servers to make sure they can all talk among themselves. If one of them gives you grief a restart should get it syncing with the rest.
By this point you have at least one domain controller up that believes it is authoritative, and the Active Directory infrastructure required to replicate to further domain controllers is online and waiting.
Any additional domain controllers you bring online will behave just as in the previous section: they will wake up, realise something is wrong and grab a clean copy of the directory from the rest of the network.
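The start-up order above can be written down as a small sorting routine. This is a sketch for planning purposes only: the role labels, the `Dc` class and the decision to rank global catalogue servers ahead of plain domain controllers are my own assumptions, and nothing here talks to a real directory.

```python
from dataclasses import dataclass, field

# PDC emulator first, then the RID master, as described in the text.
# The role labels are illustrative strings, not values read from AD.
PRIORITY_ROLES = ["pdc_emulator", "rid_master"]


@dataclass
class Dc:
    name: str
    fsmo_roles: set = field(default_factory=set)
    is_global_catalog: bool = False


def boot_rank(dc: Dc) -> int:
    """Lower rank means the domain controller is started earlier."""
    for rank, role in enumerate(PRIORITY_ROLES):
        if role in dc.fsmo_roles:
            return rank
    if dc.fsmo_roles:
        return len(PRIORITY_ROLES)        # holders of any other FSMO role
    if dc.is_global_catalog:
        return len(PRIORITY_ROLES) + 1    # assumption: GCs before plain DCs
    return len(PRIORITY_ROLES) + 2        # everything else comes up last


def recovery_order(dcs):
    """Return DC names in the order they should be brought online."""
    return [dc.name for dc in sorted(dcs, key=boot_rank)]


dcs = [
    Dc("dc4"),
    Dc("dc3", {"schema_master"}),
    Dc("dc2", {"rid_master"}),
    Dc("dc1", {"pdc_emulator"}, is_global_catalog=True),
    Dc("dc5", is_global_catalog=True),
]
print(recovery_order(dcs))  # ['dc1', 'dc2', 'dc3', 'dc5', 'dc4']
```

If one machine holds both the PDC emulator and RID master roles, it simply sorts to the front once, which matches the common small-shop layout.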
For the curious, VM-Generation ID is supported in Hyper-V 3.0 and later as well as VMware vSphere 5.0 Update 2 and later. Commits were added to both the Xen and KVM development chains well over a year ago. If support hasn't already been patched into your favourite distro, it will be soon.