Fault Tolerance
Fault tolerance is related to
Clustering, whether you are planning a high-availability or a load-balanced implementation.
Fault tolerance has two components: hardware and software.
The vendors
who sell fault-tolerant hardware would like you to think that it is all you
need to think about. It is a very big part of the solution to application
crashes. Fault-tolerant hardware has redundant power supplies, CPUs, RAM,
video cards, network cards, RAID adapters, etc., all plugged into a dual-backplane
system where almost any hardware fault has a backup. Vendors
that sell these products spend huge amounts on research and development to make
all of these pieces work together. What you end up with is a server that
costs about 5-10 times as much as an equivalent (in performance)
non-fault-tolerant server. If your application is your worldwide
accounting database or your e-commerce site, it is probably OK to spend that much
on the one piece of hardware that you cannot afford to have fail. On all of these systems, the operating
system is a limited version with special patches to minimise the chance
of it crashing. About the only software that companies run on
such a platform is Oracle, which is also customized to run on that hardware
to minimise the chance of it crashing. That is pretty good for your
Oracle database, but not much use for any other
application.
This is where software
steps in to take over for hardware in the fault tolerance market.
Clustering software is one part of this and is discussed here.
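
As a rough illustration of software taking over where redundant hardware leaves off, the sketch below tries a request against each node of a small cluster in turn and moves on to the next node when one fails. It is only a minimal sketch: the names call_with_failover, AllNodesFailed, primary and standby are hypothetical, and real clustering software also handles health checks, shared state and failback, none of which is shown here.

```python
from typing import Callable, List, Sequence


class AllNodesFailed(Exception):
    """Raised when every node in the (hypothetical) cluster has failed."""


def call_with_failover(nodes: Sequence[Callable[[str], str]], request: str) -> str:
    """Send the request to each node in order and return the first success."""
    errors: List[Exception] = []
    for node in nodes:
        try:
            return node(request)
        except Exception as exc:  # any failure means: try the next node
            errors.append(exc)
    raise AllNodesFailed(f"{len(errors)} node(s) failed: {errors}")


# Two stand-in nodes (assumptions for illustration): one is down, one is healthy.
def primary(request: str) -> str:
    raise ConnectionError("primary is down")


def standby(request: str) -> str:
    return f"standby handled {request}"


result = call_with_failover([primary, standby], "order-42")
print(result)
```

The caller never needs to know which node answered, which is the same property fault-tolerant hardware provides, only implemented in software on ordinary servers.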