July 17th, 2014
Virtualized Applications and Software-Defined Networks – you talkin’ to me?
Previously we looked at some of the challenges that virtualization and clouds brought to traditional infrastructure management – what used to be a simple, static story of predictable CPU, memory and I/O usage became a management nightmare once concurrent, independent pieces of workload started demanding compute and storage resources at the same time.
But let’s assume we magically (and expensively) solved this by carefully examining and separating all these workload pieces into their independent quarters – clusters and even hosts. After all, the QoS of production load takes precedence and at some high cost we could isolate those critical pieces. But is isolation the ultimate solution? In the modern world no single VM is an island – today’s applications consist of multiple pieces interacting with each other constantly.
Let’s look at a typical 3-tier application, a cornerstone of many Web services – a load-balanced Web tier, an application server and a database backend. Web servers receive requests from end users and invoke business logic on the app server, which in turn performs SQL queries against the DB backend to fetch the data needed to fulfill each request. The amount of data exchanged between tiers depends on the nature of the application and on end-user demand, and it can be very significant at peak times (consider viewers rushing to their computer screens to start the webcast of the Super Bowl).
All these VMs talk to each other by sending information over networks. Usually the app and DB servers listen on specific ports, and to minimize network latencies these VMs should ideally communicate through the same device. This device could be a real network switch, or, now more often, a virtual switch running inside a hypervisor. In a traditional static network design these VMs would use the same VLAN implemented by a few tightly connected devices.
However, in cloud environments configuring multiple VLANs and switches becomes a management headache. Statically configured network devices that combine data and control planes don’t scale, and they limit workload mobility. So software-defined networking (SDN) provided a real breakthrough – one can now create one big VLAN crossing the boundaries of hosts, clusters and data centers, providing a foundation for much greater mobility. But SDN, while bringing all these benefits, creates some interesting new challenges.
Consider the following situation. To guarantee DB performance, administrators could create separate DB clusters where a small number of high-performing database VMs run on powerful servers and fast storage devices. The DB performance is stellar, and because it is isolated nothing can interfere with it. The same could be done with the application servers, and then a separate cluster could run the load-balanced web servers, whose number can fluctuate heavily depending on the load.
The entire data center implements a very fast network built on top of a so-called “spine” of fast non-blocking switches. But these spine switches need to deliver the network traffic to hosts via “leaf” devices – cheaper commodity switches with lower speeds.
Image courtesy of W.R. Koss at siwdt.com
So if a large number of web server VMs start sending a lot of traffic to an app server, their immediately-connected leaf switch could become saturated. While the spine has plenty of network capacity, the application will experience high latencies as the slow leaf switches struggle with the traffic. And what if, to address the burst in traffic and the limited capacity of the private cloud, you decide to start your web servers in the Amazon cloud? Then the traffic will cross the public internet, which is much less predictable in terms of network latency.
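To make the leaf-vs-spine bottleneck concrete, here is a back-of-the-envelope sketch in Python. All capacity and traffic numbers are made-up assumptions for illustration, not measurements from any real fabric:

```python
# Toy model: a leaf switch saturates long before the spine runs out of capacity.
LEAF_UPLINK_GBPS = 10       # assumed leaf-to-spine uplink capacity
SPINE_CAPACITY_GBPS = 400   # assumed aggregate non-blocking spine capacity

def leaf_is_saturated(vm_count, gbps_per_vm, uplink_gbps=LEAF_UPLINK_GBPS):
    """True if the VMs behind one leaf demand more than its uplink can carry."""
    return vm_count * gbps_per_vm > uplink_gbps

# 50 web VMs behind one leaf, each pushing ~0.3 Gbps toward the app tier:
demand = 50 * 0.3                       # roughly 15 Gbps of demand
print(leaf_is_saturated(50, 0.3))       # True: demand exceeds the 10 Gbps uplink
print(demand < SPINE_CAPACITY_GBPS)     # True: the spine itself has plenty of headroom
```

The point of the toy numbers: the bottleneck is local to the leaf, so adding spine capacity would not help at all.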
How do we avoid that? Bring the frequent talkers together and take advantage of virtual switches. If frequently-talking VMs run on the same host, network packets never cross the host boundary and avoid going through physical switches at all. How do we bring the frequent talkers together? Well, first one needs to know which VMs talk to which. Application owners can follow their design and tell datacenter admins to bring them together by implementing so-called affinity rules.
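As a rough sketch of what “knowing who talks to whom” could look like, assuming we can sample traffic rates per VM pair (all VM names, rates and the threshold below are hypothetical):

```python
# Hypothetical per-pair traffic samples in MB/s: (src_vm, dst_vm) -> observed rate
traffic = {
    ("web-1", "app-1"): 120.0,
    ("web-2", "app-1"): 95.0,
    ("app-1", "db-1"): 30.0,
    ("web-1", "db-1"): 0.5,   # web rarely talks to the DB directly
}

def frequent_talkers(traffic, threshold_mbps=50.0):
    """Return VM pairs whose observed rate exceeds the threshold;
    these are candidates for a host-level affinity rule."""
    return sorted(pair for pair, rate in traffic.items() if rate >= threshold_mbps)

print(frequent_talkers(traffic))
# [('web-1', 'app-1'), ('web-2', 'app-1')]
```

Each pair above would become an affinity rule asking the scheduler to co-locate the two VMs on one host, so their packets stay inside the virtual switch.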
In theory it looks simple. However, a large enterprise may run hundreds of such applications, and every application may require its own affinity rules, since its frequent talkers could be different. And by the way, the intensity of the talking really depends on the nature of the requests – sometimes many simple requests will saturate the web-app connection, or some data-intensive request may bring the app-db connection to its knees. So to be absolutely safe, let’s keep all 3 tiers together. The number of load-balanced Web VMs can vary, but again, they had better be kept near the app servers. Implement the appropriate affinity rules and bingo, the latencies are minimal.
But wait a minute. We just isolated the app servers and the DB servers into their own clusters. Not only should they run on separate hosts, they should also use their own fast storage and networks.
So this is an interesting challenge. The compute and storage infrastructure forces you to separate concurrently peaking loads as far from each other as possible, onto separate devices. Network saturation suggests bringing the frequent talkers together to minimize latencies. And the frequent talkers happen to be exactly the same VMs you want to separate. Which is better – latencies caused by ready queues and the I/O blender, or latencies caused by congested leaf switches? Obviously we want to minimize all of them. But how do you do that in the presence of hundreds of applications (and by the way, not all of them are simple 3-tier designs; they can have dozens of tiers), consisting of thousands of VMs sharing hundreds of hosts, data stores and network devices?
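The tension can be stated as a toy placement problem: splitting frequent talkers across hosts incurs a network cost, while co-locating concurrently peaking VMs incurs a contention cost. Here is a minimal sketch with made-up weights; a real placement engine would have to solve something like this across thousands of VMs, not three:

```python
import itertools

vms = ["web-1", "app-1", "db-1"]
# Hypothetical weights: traffic penalty if a talking pair is split across hosts,
# contention penalty if a concurrently peaking pair shares one host.
talk = {("web-1", "app-1"): 10, ("app-1", "db-1"): 8}
peak = {("app-1", "db-1"): 12}

def placement_cost(assignment):
    """assignment: vm -> host. Sum network cost for split talkers
    and contention cost for co-located concurrent peakers."""
    cost = 0
    for (a, b), w in talk.items():
        if assignment[a] != assignment[b]:
            cost += w   # this traffic must cross the leaf/spine fabric
    for (a, b), w in peak.items():
        if assignment[a] == assignment[b]:
            cost += w   # CPU/storage contention on the shared host
    return cost

hosts = ["h1", "h2"]
best = min(
    (dict(zip(vms, combo)) for combo in itertools.product(hosts, repeat=len(vms))),
    key=placement_cost,
)
print(best, placement_cost(best))
```

With these particular weights the cheapest placement keeps the web and app VMs together but moves the DB to its own host – and flipping a single weight can flip the answer, which is exactly why hand-written affinity rules stop scaling.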
You defined your application workload in software, you defined your networks in software, you defined your storage in software. But how do you drive application performance to guaranteed service levels? Are you ready for the software-defined datacenter? Is there a better solution?
About the Author
Before joining VMTurbo, Yuri Rabover managed the Advanced Solution group in EMC’s CTO office and worked closely with EMC’s Architecture and Applied Research teams in prototyping and innovating across a broad range of technology stacks and solutions. Yuri joined EMC with the acquisition of Smarts, where he had a long and diverse career: he served as a member of the founding team, managed engineering and product development, and was responsible for technology partnership development, managing relationships with strategic accounts.
Image source: Robert De Niro back in the day in the Oscar-nominated Taxi Driver