July 24th, 2014
Hidden Relationships & Complexities of Virtual Storage
Let’s continue exploring the importance of relationships and dependencies, and look at some interesting virtual storage scenarios.
In a typical virtualized environment, VMs are stored on shared virtual storage devices often called “datastores”. What is a datastore? It is usually a storage LUN provisioned to a set of virtualized physical hosts and formatted with a virtual file system capable of storing multiple VM images. Effectively, a datastore is one shared “volume” easily accessible by all hosts. Easy to create and easy to use.
But as with any resource of limited capacity, bottlenecks happen when too many VMs use it. If several shared datastores are available to the hosts in a cluster, and VMs on some datastores start suffering from higher than usual storage I/O latencies, it is likely time to address this by spreading the load across the datastores. This is not a trivial task: the idea is to find the right combination of VMs and datastores where space, IOPS and latencies are all in the right state, so VMs get as much IOPS and space as they need while latencies stay minimal. Once the placement is clear (doing this manually is very difficult, but there are solutions that can do it for you), the VMs are moved to the appropriate datastores with lower latencies and the task is solved.
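The naive version of this placement logic can be sketched in a few lines. Everything here is hypothetical and for illustration only – the datastore names, capacities and the `best_target` helper are made up – but it captures the usual rule of thumb: move the VM to the datastore with the most headroom.

```python
# Hypothetical datastore inventory: capacity and current usage in IOPS.
datastores = {
    "ds1": {"iops_capacity": 5000, "iops_used": 4800},  # busy
    "ds2": {"iops_capacity": 5000, "iops_used": 300},   # nearly idle
}

def best_target(vm_iops, datastores):
    """Naive choice: the datastore with the most free IOPS that still fits the VM."""
    candidates = [
        (name, d["iops_capacity"] - d["iops_used"])
        for name, d in datastores.items()
        if d["iops_capacity"] - d["iops_used"] >= vm_iops
    ]
    if not candidates:
        return None  # nothing fits: time to provision a new datastore
    return max(candidates, key=lambda c: c[1])[0]

print(best_target(400, datastores))  # picks the emptiest datastore: ds2
```

Note that this sketch only looks at the datastore layer – which, as the next example shows, is exactly its weakness.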
Is it really? Let’s consider two datastores: one has 100 VMs and experiences visibly high latencies (300 ms per VM); the other is brand new, has only a few inactive VMs and practically zero latency – a perfect target for storage vMotion. But the story could be quite different once we explore how these datastores are implemented and related to each other.
A LUN could represent a standalone disk drive (like a local drive in a host) or be part of a large disk array mapped to some RAID group. A disk array is controlled by a storage controller – effectively a computer with multiple CPU cores – which performs many control and maintenance functions such as disk scrubbing, compression, de-duplication, write-back caching, inter-tier load distribution, etc. Multiple LUNs could be mapped to different virtual storage arrays, and several arrays could in turn be managed by the same storage controller.
So our two datastores could be mapped to exactly such LUNs. If one busy LUN experiences high I/O traffic, it could keep the controller very busy with scrubbing, de-duplication, snapshotting and so on. Because the controller is shared by both LUNs, the moment we move VMs to the second LUN they start experiencing the same latencies – caused not by the disk arrays, but by the controller itself.
So this is a case where it is extremely important to know the relationships not only between VMs, hosts and datastores, but also between datastores, disk arrays and virtual storage controllers. Otherwise the control actions could be inaccurate and won’t guarantee proper performance. A better solution here could be to identify a datastore managed by a less busy controller and move the load there – or, if there is none, recommend provisioning a new one.
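A controller-aware version of the earlier placement sketch would rank targets by controller utilization first and free capacity second. Again, the topology, names and utilization numbers below are hypothetical – the point is only that the emptiest LUN loses to a LUN behind a quieter controller.

```python
# Hypothetical topology: ds1 and ds2 share controller ctrl-A (as in the
# scenario above), while ds3 sits behind a much less busy ctrl-B.
topology = {
    "ds1": {"controller": "ctrl-A", "free_iops": 200},
    "ds2": {"controller": "ctrl-A", "free_iops": 4700},  # same controller as ds1!
    "ds3": {"controller": "ctrl-B", "free_iops": 3000},
}
controller_busy = {"ctrl-A": 0.95, "ctrl-B": 0.40}  # utilization, 0..1

def controller_aware_target(vm_iops, topology, controller_busy, source="ds1"):
    """Prefer datastores behind the least utilized controller, not the emptiest LUN."""
    candidates = [
        name for name, d in topology.items()
        if name != source and d["free_iops"] >= vm_iops
    ]
    if not candidates:
        return None  # no fit anywhere: recommend provisioning a new datastore
    # rank by controller utilization first, then by free IOPS (descending)
    return min(
        candidates,
        key=lambda n: (controller_busy[topology[n]["controller"]],
                       -topology[n]["free_iops"]),
    )

print(controller_aware_target(400, topology, controller_busy))  # ds3, not ds2
```

With only the datastore layer visible, ds2 looks perfect; once the controller relationship is in the model, ds3 is the correct move.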
This is important, but still a relatively simple case. The relationship and dependency matrix could be much more complex. Just consider NetApp’s Clustered Data ONTAP (cDOT): a big shared storage infrastructure that allows implementing so-called SVMs (Storage Virtual Machines) for multi-tenant storage service offerings. This is yet another layer of dependencies and relationships on top of volumes, one that can add business constraints and QoS requirements. Figuring out VM placements in such environments may become not just complex but practically impossible, as one needs to track resource consumption and constraints across multiple dimensions. And the cost of a wrong decision could be very high, as it may impact hundreds or thousands of applications.
How do you take these virtual storage dependencies into account today to guarantee workload performance? Are you even aware of them?