asian1 wrote:
Hello,
I have an idea for building a fault-tolerant server
using three mini-ITX A1Lite boards, real-time Linux
for PowerPC (hardened kernel), and a dual redundant
power supply unit (2 x 500 watt).
The first board is the main board, the second is
a backup, and the third is the supervisor board.
If the supervisor fails, the backup becomes the supervisor.
It would be possible to remove a board (after turning
it off) without shutting down the whole system.
Is this idea feasible?
Real Time
Hmm. A high-availability (fault-tolerant) cluster should certainly be possible with that hardware. However, you're barking up the wrong tree with "realtime": it refers to the timing guarantees of the kernel and its scheduler, à la AmigaOS or QNX (will a syscall or whatever be serviced within N µs/ms on X hardware?), *not* the fault tolerance of the system as a whole. Some of the realtime projects might have high-availability solutions, but it's a separate problem domain.
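To make the distinction concrete: the main/backup/supervisor handover from your post is really just heartbeat monitoring plus role reassignment, and none of it needs a realtime kernel. Here's a rough Python sketch of that logic. The board names, the 3-second timeout and the `Cluster` class are all made up for illustration, and it glosses over the hard parts (which board runs the check, split-brain, fencing), so treat it as a thought experiment rather than a design.

```python
from dataclasses import dataclass, field

# Assumed value: seconds without a heartbeat before a board is presumed dead.
HEARTBEAT_TIMEOUT = 3.0


@dataclass
class Cluster:
    # role -> board name; role names follow the original post, board names are hypothetical
    roles: dict = field(default_factory=lambda: {
        "main": "board1", "backup": "board2", "supervisor": "board3"})
    # board name -> timestamp of the last heartbeat received
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, board, now):
        """Record that `board` reported in at time `now`."""
        self.last_seen[board] = now

    def is_alive(self, board, now):
        seen = self.last_seen.get(board)
        return seen is not None and now - seen <= HEARTBEAT_TIMEOUT

    def check(self, now):
        """Hand the backup's board to any dead role; return a log of actions."""
        actions = []
        backup = self.roles.get("backup")
        for role in ("main", "supervisor"):
            holder = self.roles.get(role)
            if holder is None or self.is_alive(holder, now):
                continue
            if backup is not None and self.is_alive(backup, now):
                # The backup board takes over the failed role; no spare remains.
                self.roles[role] = backup
                self.roles["backup"] = None
                actions.append(f"{backup} takes over {role} from {holder}")
                backup = None
            else:
                self.roles[role] = None
                actions.append(f"{role} lost, no live backup")
        return actions
```

Note that once the backup has been promoted there is no spare left, so a second failure takes a role down for good until someone swaps in a new board.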
Aside from that, the topology you're talking about is a little suboptimal; with one supply (even a redundant one) and, presumably, shared disks, it only really protects you from mainboard failures, which are actually somewhat rare. So moving to independent machines, with independent supplies, would probably be equally or more maintainable. (Think about it; if half the redundant supply fails, and you've had to modify it to support 3 boards, that's going to be a bitch to swap out, right? While replacing one machine in its own rack or case is comparatively 'easy.')
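If you want to see why the shared parts dominate, here's a back-of-the-envelope calculation in Python. The availability figures are invented purely for illustration (they are not measurements of any real board, supply or disk), failures are assumed to be independent, and the shared supply assembly is treated as a single component.

```python
def series(*avail):
    """All components must be up: availabilities multiply."""
    total = 1.0
    for a in avail:
        total *= a
    return total


def parallel(*avail):
    """At least one component must be up: unavailabilities multiply."""
    down = 1.0
    for a in avail:
        down *= 1.0 - a
    return 1.0 - down


# Made-up availabilities, for illustration only.
BOARD, PSU, DISK = 0.999, 0.995, 0.99

# One box: two redundant boards, but a shared supply assembly and shared disk.
shared_box = series(parallel(BOARD, BOARD), PSU, DISK)

# Two independent machines, each with its own board, supply and disk.
machine = series(BOARD, PSU, DISK)
independent = parallel(machine, machine)

print(f"shared box:  {shared_box:.6f}")
print(f"independent: {independent:.6f}")
```

With any numbers of this general shape, doubling up the boards barely moves the result for the shared box, because the supply and disk sit in series with everything else.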
That said, I've seen projects that hook up an ATX supply to several boards in this manner, but I can't find them now. (You'll either want to give one machine control over the ATX power-on signal - oops, single point of failure again! - or wire it 'on' all the time. In either case, you lose the ability to shut down just a single machine if someone calls you up and tells you it's on fire.)
Check out the Linux High Availability Project for more information, and some examples of their idea of reliable designs. There's an old HOWTO with some example configurations, but if you follow them to the letter, keep in mind that the 'Y' topology used on their SCSI chain is really more of a 'V' - and that any modern/popular SCSI card that runs the bus 'through' the card (internal and external connectors) will probably create a nice big stub on the chain electrically, putting you at *greater* risk of data corruption. If you need to share disks like that, FC-AL is probably a better/more robust solution until SAS is out?
Nothing's perfect. Best of luck with it!