Every pure solid-state disk (SSD) and hybrid storage vendor on earth would like you to know how brilliant it is at handling virtual desktop infrastructure (VDI) workloads.
VDI may be a niche, but it is a miserably difficult niche, in which storage plays a huge role. There is a lot more to making VDI work well than just throwing some SSDs at it.
Hybrid array vendors would have you believe that the secret is in their patented algorithm for determining what data should be on the SSD and what should be moved to traditional magnetic disk.
Pure SSD players want you to know all about their data efficiency technology, and how deduplication plus compression equals super powers.
Caching companies want you to know that SSDs belong in the server, not the array, and please use their super-deluxe software to cache things there so you can keep on buying spinning storage from your preferred vendors.
Every one of these marketing approaches is basically right, yet at the same time gets things wrong by omission.
Send in the clones
VDI is a small number of desktops and applications copied multiple times to serve identical copies to multiple users. As you can imagine, the result is a lot of redundant, identical data.
You can run your VDI in such a way that each child copy maintains only the changes that differentiate it from the parent image. Alternatively, you can run a VDI in which each user has a completely dedicated environment.
You can even break your VDI up so that the operating system lives in a separate storage container from the applications, and then attach copies (or "differenced" child copies) to the operating system as needed.
I won't go into the full details of all the possible configurations here, but I hope you are beginning to get an idea of how VDI workloads look.
It is not difficult to see that a whole bunch of identical, or mostly identical, copies of data deduplicate really well. They even compress pretty well. That means that with the right software you can do absolutely magical things.
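To see why cloned desktops deduplicate so well, consider a toy sketch (my own illustration, not any vendor's algorithm): hash fixed-size blocks across a set of cloned images and count how many distinct blocks actually need storing.

```python
import hashlib
import os

BLOCK = 4096  # deduplicate at 4KiB block granularity (a common choice)

def unique_blocks(images):
    """Return the set of content hashes across all images' blocks."""
    seen = set()
    for image in images:
        for i in range(0, len(image), BLOCK):
            seen.add(hashlib.sha256(image[i:i + BLOCK]).digest())
    return seen

# 100 "desktops" cloned from one 1MiB parent image, each with a small
# unique block appended -- a toy stand-in for VDI linked clones.
parent = os.urandom(1024 * 1024)
clones = [parent + str(n).encode().ljust(BLOCK, b"\x00") for n in range(100)]

total_blocks = sum(len(c) // BLOCK for c in clones)
stored_blocks = len(unique_blocks(clones))
print(f"logical blocks: {total_blocks}, stored after dedup: {stored_blocks}")
```

A hundred near-identical images collapse to one copy of the parent plus each clone's handful of unique blocks, which is exactly the shape of data a VDI deployment produces.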
Know the combination
Consider for a moment the humble hybrid hard drive. You can buy a 4TiB traditional magnetic hard drive with a seemingly paltry 8GiB of usable SSD cache built in.
At first glance, that seems ludicrous. What could 8GiB possibly do to accelerate 4TiB? As with most things in IT, use case is everything.
Seagate did research into the usage patterns of the average desktop user. It discovered that over the course of five days a total of 19.48GiB of data was read from the user's disk, with 9.59GiB of that data being unique.
In fact, Seagate found that if you could cache just 2.11GiB of data you would have cached 95 per cent of the unique data read by the average corporate desktop user across five days.
Taken in context then, the design of the hybrid hard drive makes perfect sense (though I'd still prefer one with 128GiB of SSD cache as I am anything but the average user).
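A quick back-of-envelope check, using the Seagate figures quoted above, shows why that seemingly tiny cache works:

```python
# Back-of-envelope check of the hybrid drive design, using the
# five-day usage figures cited above (all sizes in GiB).
drive_gib = 4 * 1024    # 4TiB of magnetic capacity
cache_gib = 8           # built-in SSD cache
unique_gib = 9.59       # unique data read over five days
hot_set_gib = 2.11      # covers 95 per cent of those unique reads

print(f"cache is {cache_gib / drive_gib:.2%} of the drive,")
print(f"but {cache_gib / hot_set_gib:.1f}x the 95 per cent working set")
```

The cache is a fifth of one per cent of the drive, yet nearly four times the working set that matters for the average desktop user.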
Tricks of the software trade
Now, put the ideas of VDI cloning, deduplication, compression and SSD caching together and roll them around in your mind for a bit.
You might realise that while an individual VDI instance might consist of a 20GiB operating system and a series of application disks amounting to 50GiB in total, you don't need to move 70GiB per user to the SSD tier to make VDI fast.
In fact, of the 9.59GiB of unique data accessed by each user over the course of five days, caching vendors have found that more than 80 per cent is common across all VDI instances.
That's 7.672GiB in common, on average, between user desktops. Let's round that to 8GiB, with 1.918GiB (rounded to 2GiB) of user-unique data.
That means I could get great performance for 100 VDI users if I could provide them with 208GiB of SSD read cache: 8GiB for the common data, cached once, plus 100 x 2GiB of user-unique data. Round that up to the next standard capacity and call it 256GiB for 100 VDI users. Hybrid storage starts to make sense for this workload now, doesn't it?
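The sizing arithmetic above can be sketched in a few lines. The per-user figures are the article's averages, not guarantees, so treat this as a rough planning model rather than a sizing tool:

```python
# Rough VDI read-cache sizing from the figures above: 9.59GiB of
# unique reads per user over five days, of which ~80 per cent is
# common across all VDI instances.
users = 100
unique_per_user_gib = 9.59
common_fraction = 0.80

common_gib = unique_per_user_gib * common_fraction   # ~7.67, cached once
per_user_gib = unique_per_user_gib - common_gib      # ~1.92 per desktop

# Round to the article's working numbers: 8GiB shared + 2GiB per user.
cache_gib = round(common_gib) + users * round(per_user_gib)
print(f"SSD read cache for {users} users: {cache_gib}GiB")
```

Note how the shared slice is paid for once no matter how many desktops you add, which is why the total grows at roughly 2GiB per user rather than 10GiB.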