[SAGE] Linux automated installation/bare-metal recovery?



Hi,

$University is going to be replacing a few older Red Hat boxes with new CentOS 5 ones. Most of our older stuff was built as one-off boxes (hand-configured, etc.), and I'm trying hard to move everything new over to something more stable. Specifically, my group runs about two dozen core boxes, mostly unique (little duplication between services/configurations, at most 3 sets of primary/secondary boxes). At the moment (all inherited), most of the "backups" are done as manual tarballs of /etc and other relevant directories. So, since I'm going to be rebuilding a few of them, it's time to roll out a realistic plan for recovery. As we've mostly standardized on one physical platform (SunFire x4100), binary compatibility isn't a big concern, and we have a few spare identical machines.

What are you guys doing for installation/bare metal recovery? The idea of config management (Puppet/Cfengine/etc.) has already been vetoed by management given the small number of boxes we run (though our services support ~60k users).

So far, the two running theories are:
1) Use Kickstart for installation (and recovery for new boxes), and back up configs/non-package files with Bacula.
2) Use a gold master Kickstart file as a base, hand-configure machines past that, back up *everything* with Bacula, and make use of Bacula's Bare Metal Recovery method. Disk capacity is cheap, and we just purchased 6TB for backups.
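For illustration only, the "gold master Kickstart file as a base" idea in option 2 might look something like the minimal Python sketch below, which renders per-host Kickstart files from one shared template using the standard library's `string.Template`. Every directive value, hostname, address, and package name is a hypothetical placeholder (none of them come from this post), and the template has not been checked against CentOS 5's Anaconda.

```python
"""Render per-host Kickstart files from one shared "gold master" template.

Sketch only: the template and host values are hypothetical placeholders,
not a tested CentOS 5 configuration.
"""
from string import Template

# Hypothetical gold-master template; $-placeholders are filled in per host.
GOLD_MASTER = Template("""\
install
lang en_US.UTF-8
keyboard us
network --device=eth0 --bootproto=static --ip=$ip --netmask=$netmask --gateway=$gateway --hostname=$hostname
rootpw --iscrypted $root_hash
bootloader --location=mbr
clearpart --all --initlabel
autopart

%packages
@base
$extra_packages
""")


def render(host: dict) -> str:
    """Fill the template for one host.

    Raises KeyError if a required placeholder (ip, netmask, gateway,
    hostname, root_hash) is missing from `host`.
    """
    values = dict(host)
    # Role-specific packages are listed one per line under %packages.
    values["extra_packages"] = "\n".join(host.get("packages", []))
    return GOLD_MASTER.substitute(values)


if __name__ == "__main__":
    # Hypothetical host entry; a real setup might keep one of these per box.
    web1 = {
        "hostname": "web1.example.edu",
        "ip": "192.0.2.10",
        "netmask": "255.255.255.0",
        "gateway": "192.0.2.1",
        "root_hash": "HYPOTHETICAL_CRYPT_HASH",
        "packages": ["httpd", "mod_ssl"],
    }
    print(render(web1))
```

Because `substitute` (rather than `safe_substitute`) fails loudly on a missing value, a half-filled host entry can't silently produce a broken Kickstart file; the hand-configuration past that point would still need to be captured by the Bacula backups.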

Given the number of users we support, our main criterion is time from failure to recovery on replacement hardware, with ease of automation (at least easy enough that anyone can do it with a simple HowTo document) as a close second.

Any other ideas to throw out there that I may have missed?

Thanks for any suggestions,
Jason

PS - We're a 100% OSS shop with a pretty low software budget, so anything that's not both Free and free is out of the running.


