One of the perceptions around virtualisation seems to be that it automatically provides you with high availability (HA) and fault tolerance (FT). Certainly, as a platform, it paves the way for this capability, but additional tools are still required for this to be the case.
Whereas VMware has its own tools in VMware HA, both Citrix and Microsoft have been a little lacking in this area. However, this looks like it is changing, as the historical HA players come more to the fore.
For example, Marathon has just announced the results of its work with Citrix to build HA into the next version of XenServer. Its everRun product will integrate directly with the HA versions of XenServer to provide extra levels of FT.
Here, the base level of relatively simple HA, known as Level 1, will be free within all Enterprise and Platinum versions of XenServer. It will rapidly clone a whole physical server, plus all the virtual machines on it, to another physical server. As a Level 2 capability, everRun will add finer granularity, sensing when storage or network components fail and re-routing the virtual machines to make use of other storage or network assets. Eventually, in the early part of next year, Level 3 will be brought in to provide full system-level fault tolerance.
Similarly, Neverfail has not been resting on its laurels, and has launched tooling to manage hybrid environments. Here, the thinking is that many users are not yet ready to allow "business important" applications to run in a virtualised environment, but that an "N+M" redundancy system (one physical server mirrored with another local server for immediate failover, plus one or two remote servers for geographic redundancy) is neither cost effective nor good for an organisation's green credentials. Remembering that many Wintel servers are running at less than 10% utilisation, an "N+M" system can lead to overall utilisation rates of less than 2.5%.
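The 2.5% figure can be checked with some simple arithmetic. The sketch below is only an illustration: it assumes the standby servers do no useful work while they wait, and the function name and parameters are hypothetical rather than anything taken from a vendor tool.

```python
def overall_utilisation(primary_utilisation, standby_servers):
    """Average utilisation across one primary and its idle standbys.

    Assumption: standby servers contribute no useful work until failover.
    """
    total_servers = 1 + standby_servers
    return primary_utilisation / total_servers


# One primary at 10% utilisation, mirrored by one local server
# and two remote servers: four machines in total.
rate = overall_utilisation(0.10, standby_servers=3)
print(f"{rate:.1%}")
```

With one local mirror and two remote servers, the 10% of useful work is spread across four machines, which is where a figure of 2.5% comes from. A primary running at less than 10% pushes the overall rate lower still.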
Neverfail's approach is to use the physical server as the prime system, but use shared virtual servers as the failover systems. Therefore, many physical servers can be backed up by just a few other physical servers—or, in many cases, by just two. On the failure of the main system, a virtual image of the application is rapidly spun up on a virtual server and connections are re-made to storage and network links. Users see a little blip in initial connectivity, but then can carry on—albeit at a lower overall response rate. However, as in the Marathon FT case, not only does the application remain running, but transaction state and data fidelity are also kept.
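To see why sharing the failover capacity matters, it can help to compare server counts. The following is a rough sketch under stated assumptions: ten primaries each running at 10% utilisation, standby hosts that sit idle until needed, and a dedicated scheme that gives every primary three standbys of its own. The function and its parameters are hypothetical and are not drawn from Neverfail's product.

```python
def estate_utilisation(primaries, primary_utilisation, standby_hosts):
    """Average utilisation across all primaries plus idle standby hosts.

    Assumption: standby hosts do no useful work until a failover occurs.
    """
    useful_work = primaries * primary_utilisation
    return useful_work / (primaries + standby_hosts)


primaries = 10

# Dedicated "N+M": every primary has its own three standby servers.
dedicated = estate_utilisation(primaries, 0.10, standby_hosts=3 * primaries)

# Shared pool: all primaries fail over to the same two virtualised hosts.
shared = estate_utilisation(primaries, 0.10, standby_hosts=2)

print(f"dedicated: {dedicated:.1%}, shared: {shared:.1%}")
```

Under these assumptions the dedicated scheme needs 40 servers and the shared pool needs 12, so the same amount of useful work is spread over far fewer machines. The real ratio depends on how many primaries are expected to fail at the same time.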
As use of virtualisation becomes more widespread for the main runtime environments of many organisations, the need for HA and FT tools will become more apparent. The approaches being brought to the fore by the likes of Marathon and Neverfail are good steps in the right direction, and it is likely there will be more functionality and capability added as time goes on.
We can also expect to see the main system management vendors do more. The likes of IBM Tivoli, BMC, CA and HP cannot allow the virtual world to escape them. Their current offerings around physical HA and FT, along with their virtualisation capabilities, will have to be brought closer together and beefed up to cover these requirements. Otherwise, they will need to partner with the likes of Marathon and Neverfail to ensure that such capabilities are available.
It's likely to be an interesting few years, as virtualisation becomes mainstream and the problems of a mixed physical/logical environment become clear. Organisations that act now will minimise the likelihood of being caught unawares, and maximise their means of making the most of existing and new systems.