VSANs choking on VMware's recommended components
Virtzilla's initial recipes for virtual SANs didn't work
VMware has changed the recipe for its virtual storage area networks (VSANs) after some components it recommended were found not to be up to the job.
Virtzilla's notification of the change says it is being made because some “low-end IO controllers” it once recommended “offer very low IO throughput”.
So low, in fact, that “the probability of the controller queue getting full is high. When the controller IO queue gets full, IO operations time out, and the VMs become unresponsive.”
VMware seems to have known about these problems for a while. As Marcus Van Den Berg's up2v blog points out, a VSAN user posted VMware support's response to his problems with a Dell H310 controller to Reddit.
That response, posted to Reddit nearly a month ago, says “While this controller was certified and is in our Hardware Compatibility List, its use means that your VSAN cluster was unable to cope with both a rebuild activity and running production workloads.”
The email from VMware goes on to say “to avoid issues such as this in the future, we are re-evaluating our Hardware Compatibility List. Our intention is to support hardware that meets a minimum threshold to work well during rebuild/resync scenarios.”
VMware has since listed 19 controllers that, as of July 1st, are no longer supported in VSAN rigs.
The explanation for their withdrawal follows:
“As part of VMware’s ongoing testing and certification efforts on Virtual SAN compatible hardware, VMware has decided to remove these controllers from the Virtual SAN compatibility list. While fully functional, these controllers offer too low IO throughput to sustain the performance requirements of most VMware environments. Because of the low queue depth offered by these controllers, even a moderate IO rate could result in IO operations timing out, especially during disk rebuild operations. In this event, the controller may be unable to cope with both a rebuild activity and running Virtual Machine IO causing elongated rebuild time and slow application responsiveness. To avoid issues, such as the one described above, VMware is removing these controllers from the Hardware Compatibility List.”
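For readers who want to see why a shallow queue bites hardest during a rebuild, here is a toy Python model of the mechanism VMware describes. It is a minimal sketch, not VMware's test method: the function `simulate`, its parameters and every number fed to it are made up for illustration and do not describe any real controller.

```python
def simulate(queue_depth, service_per_tick, arrivals_per_tick, ticks):
    """Toy model: count IOs that arrive while the controller queue is full.

    All parameters are hypothetical illustration values, not vendor specs.
    """
    queued = 0
    rejected = 0
    for _ in range(ticks):
        # New IOs arrive; anything beyond the free queue slots is turned away
        # (in a real system these are the IOs that would eventually time out).
        free = queue_depth - queued
        accepted = min(arrivals_per_tick, free)
        rejected += arrivals_per_tick - accepted
        queued += accepted
        # The controller completes at most a fixed number of IOs per tick.
        queued -= min(queued, service_per_tick)
    return rejected


# Hypothetical workload: production VMs alone, then production plus a rebuild.
production_only = simulate(queue_depth=25, service_per_tick=20,
                           arrivals_per_tick=15, ticks=100)
with_rebuild = simulate(queue_depth=25, service_per_tick=20,
                        arrivals_per_tick=30, ticks=100)
print("rejected, production only:", production_only)
print("rejected, production + rebuild:", with_rebuild)
```

In this sketch the production load fits comfortably, but once the assumed rebuild traffic is added the queue fills on the first tick and stays full, so IOs are turned away on every tick. A deeper queue would only delay that point if the controller's throughput stayed the same, which is consistent with VMware citing both low queue depth and low throughput.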
Might that message be translatable as “Ooops. We put some stuff on the list that we shouldn't have and/or should have tested more thoroughly”? Let us know in the comments, especially if you take up VMware's offer to contact customer care if you bought the kit it has banished from its lists. ®