IT Crashes: March 2010

In most environments, 1+1 redundancy is enough for critical systems. 1+1 means that every protected system has an identical replacement running in parallel, which can take over immediately if the first system fails. Here, however, two distinctions need to be made.

The first distinction is whether the nodes operate in "Active-Active" or "Active-Passive" mode. Active-Active generally requires more setup and is more expensive, but it allows a node to fail without losing the state of the system - something called "stateful fail-over". For instance, in the case of network equipment, an Active-Active system preserves the existing connections that were going through the failed node by constantly sharing the working memory set and configuration between the two nodes. Such a setup goes by different names depending on the vendor: Live Failover, Fault Tolerance, Full HA, Stateful Load Balancing and many others. The main requirement for stateful Active-Active clustering is some form of shared synchronous storage that operates at the transaction level and ensures that each node is aware of every external condition the other is going through.
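As a rough illustration of the stateful fail-over described above, here is a minimal Python sketch. Everything in it is hypothetical: `SharedStore` stands in for the shared synchronous storage, and `Node`, `send` and the session fields are invented for the example rather than taken from any vendor's product. It only shows the idea that a session survives the loss of a node because its state is committed to storage both nodes can see.

```python
class SharedStore:
    """Hypothetical stand-in for shared synchronous storage."""

    def __init__(self):
        self._sessions = {}

    def commit(self, session_id, state):
        # In this sketch a commit is visible to every node as soon as it returns.
        self._sessions[session_id] = dict(state)

    def load(self, session_id):
        state = self._sessions.get(session_id)
        return dict(state) if state is not None else None


class Node:
    """One member of a hypothetical Active-Active pair."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.alive = True

    def handle(self, session_id, payload):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        # Pick up the session where any node left it, then commit the update.
        state = self.store.load(session_id) or {"count": 0}
        state["count"] += 1
        state["last"] = payload
        state["served_by"] = self.name
        self.store.commit(session_id, state)
        return state


def send(nodes, session_id, payload):
    """Try each node in turn; the first live one serves the request."""
    for node in nodes:
        try:
            return node.handle(session_id, payload)
        except ConnectionError:
            continue
    raise RuntimeError("no node available")


def demo():
    store = SharedStore()
    node_a, node_b = Node("node-a", store), Node("node-b", store)
    send([node_a, node_b], "s1", "hello")          # served by node-a
    node_a.alive = False                           # node-a fails
    return send([node_a, node_b], "s1", "again")   # node-b continues the session
```

In this sketch the second request is answered by the surviving node with the session counter intact, which is the property that separates stateful fail-over from a plain restart.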

Active-Passive clustering is a cheaper form that uses a standby system with exactly the same specifications, whose role is to watch the active system and take over only when the active system becomes unavailable. In general, Active-Passive systems do preserve the configuration but require the state to be reset, thus "kicking everyone offline" for a moment and requiring either manual or automatic reconnection.
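The Active-Passive behaviour above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any particular cluster product works: the `Server` and `Watcher` classes, the `max_missed` threshold and the heartbeat-by-polling approach are all made up for illustration. The point is that the configuration is already on the standby, while the in-memory sessions are not.

```python
class Server:
    """Hypothetical cluster member."""

    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)  # copied to both members ahead of time
        self.sessions = {}          # in-memory only, never replicated
        self.active = False
        self.alive = True


class Watcher:
    """Standby-side monitor: promotes the standby after missed heartbeats."""

    def __init__(self, primary, standby, max_missed=3):
        self.primary = primary
        self.standby = standby
        self.max_missed = max_missed  # assumed threshold, purely illustrative
        self.missed = 0
        primary.active = True

    def poll(self):
        """Run one heartbeat check and return the server taking traffic, if any."""
        if self.standby.active:
            return self.standby
        if self.primary.alive:
            self.missed = 0
            return self.primary
        self.missed += 1
        if self.missed >= self.max_missed:
            # Fail-over: the standby assumes the role with an empty session table.
            self.primary.active = False
            self.standby.active = True
            return self.standby
        return None  # outage window: nobody is serving yet


def demo():
    config = {"vip": "10.0.0.10", "port": 443}
    primary, standby = Server("primary", config), Server("standby", config)
    watcher = Watcher(primary, standby)
    primary.sessions["alice"] = {"cart": ["book"]}
    primary.alive = False
    served_by = [watcher.poll() for _ in range(3)]
    return primary, standby, served_by
```

While heartbeats are being missed, `poll` returns `None`, which models the moment when everyone is offline. After promotion the standby has the same configuration but no sessions, so clients have to reconnect.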

Using Microsoft server systems as an example, Network Load Balancing would be considered Active-Active clustering: several systems listen on the same set of virtual IP addresses, so that if one of them is down, the others still receive the traffic and respond. Failover Clustering, by contrast, is considered Active-Passive, since a failover requires a "virtual restart" of all services on the passive node. Both options should be examined carefully to find the right mix, since each has its strengths and weaknesses.
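To make the "several systems on the same virtual IP" idea concrete, here is a small Python sketch. It is a deliberate simplification and not Microsoft's actual NLB algorithm: the `owner` function, the SHA-256 based hashing and the host names are assumptions made for the example. Each host is imagined to see every incoming request and to answer only the clients that the shared rule assigns to it.

```python
import hashlib


def owner(client_ip, live_hosts):
    """Pick which live host answers a given client (illustrative rule only)."""
    if not live_hosts:
        raise RuntimeError("no hosts left in the cluster")
    ordered = sorted(live_hosts)
    # A stable hash, so every host reaches the same decision independently.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return ordered[int.from_bytes(digest[:4], "big") % len(ordered)]


def demo():
    hosts = {"web1", "web2", "web3"}
    clients = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
    before = {c: owner(c, hosts) for c in clients}
    hosts.discard("web2")  # one host goes down
    after = {c: owner(c, hosts) for c in clients}
    return before, after
```

After `web2` is removed, every client is still mapped to one of the remaining hosts, so the virtual IP keeps answering. Note that with this naive modulo rule some clients may move to a different host when the membership changes, which is one reason per-connection state is a separate concern from load balancing.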


Saturday, March 13, 2010

Unofficial Citrix Support channel on IRC

On Redundancy - Part 2

Every environment is different. Support should understand that before suggesting ideas.
