[Tfug] recommendations for installing a cluster..

Dean Jones dean.jones at gmail.com
Wed Dec 31 10:15:19 MST 2008


Hi,

I manage a few clusters currently (and have managed others).  I'll 
throw in my 2c here.

Jeremy D Rogers wrote:
> Hello all,
> 
> I'm looking for opinions or advice on reconfiguring and reinstalling my 
> lab's cluster.
> 
> History:
> My lab has a 32-node dell cluster for simulations and it recently 
> imploded during a facility cooling meltdown. It seems that no hardware 
> was damaged, but the disks on the master node had forgotten their 
> partition tables. I suspect it was unrelated to the cooling problem and 
> only showed up because we had to reboot for the first time in ages. I 
> wasn't too upset, because the system has been in a downward spiral for 
> some time and we've been looking for an excuse to reinstall everything 
> (redhat cruft + too many admins cycling through trying to do things 
> differently = bad times).
> 
> Hardware:
> 1 masternode with 6 swappable RAID drives (2 in a 73GB raid1 for /, 
> 4 in a 440GB raid5 originally for /home but currently unused)
> 32 dual-proc slave nodes with smallish (maybe 100GB) disks and CD drives
> 1 storage server added recently: dual quad-cores, 8GB RAM, and 7TB 
> software raid5 for serving /home
> 
> Plan:
> So far, I'm pretty well bent on debian/ubuntu because I would otherwise 
> go through apt withdrawal. And I haven't used redhat much since about 
> 2001, so I find myself doing things like spending 5 minutes remembering 
> that I should be editing /etc/sysconfig/network-scripts/ifcfg-eth0 
> instead of /etc/network/interfaces. We also have plans to add a second 
> cluster with newer hardware in an adjacent rack and probably have the 
> storage server serve /home to that as well. Now what I think I want to 
> do is diskless nodes booting from a chroot on the master node. But I 
> have questions:

Check out OSCAR or SystemImager for automating the system 
installations.  That way, adding the new nodes will take very little 
time.
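For example, with SystemImager the rough flow is: hand-install one 
node as a "golden client", pull an image of it to the master, then 
push that image out to the rest.  Something like this (hostnames and 
the image name are made up, and the exact si_* flags can vary by 
version):

    # on the golden client (a hand-installed node)
    si_prepareclient --server master

    # on the image server (the master): capture the image
    si_getimage --golden-client node01 --image node-image

    # nodes then reinstall over the network via PXE or autoinstall media

After that, adding node 33 is basically just a PXE boot away.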

If you have some money, Scyld is worth looking at.  It automates 
software installs and back-end cluster management, as well as 
scheduling and metrics.

These tools should not care which flavor of Linux you are using.

> 
> 1. Should I go diskless and use PXE boot from a chroot on the masternode 
> (I like this), or just install on the nodes' drives? It seems like it 
> will be easier to maintain and upgrade when all I have to do is work 
> with the chroot. Perhaps the only disadvantage is bootup time? These 
> should reboot infrequently, so I think that should be fine.

This really depends on how your jobs/applications behave.

Are you thinking of NFS-mounting root from the server?  That makes 
each host spend a lot of network cycles just doing normal OS 
operations, before you even count the programs you are actually 
running.  And as you say, booting is very slow, especially if you 
have to reboot every node.

It was slowing down one of our clusters terribly, but our workloads 
are hard on the network anyway and pull large amounts of data to 
crunch on.  A small mathematical model would not see as much of a 
delay and might not notice the overhead of NFS root.
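For what it's worth, a diskless setup like you describe usually boils 
down to a PXE config that points the kernel at an NFS root.  A 
minimal pxelinux sketch, assuming the master is 10.0.0.1 and the 
chroot lives in /srv/node-root (both placeholders):

    # /srv/tftp/pxelinux.cfg/default
    DEFAULT diskless
    LABEL diskless
      KERNEL vmlinuz
      APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.0.0.1:/srv/node-root ip=dhcp ro

Every read of a library or binary after boot then goes over the wire, 
which is where the overhead I mentioned comes from.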

> 2. I think it was purely a kludge because the storage was added later, 
> but the masternode was mounting homedirs and then serving those to the 
> nodes. It seems like the MN and the slaves should all just mount /home 
> from the storage server directly, right? Any reason to do it otherwise?

No, mount directly from the server that is actually serving the data; 
going through the master doubles the network load on it.  Perhaps 
there was some strange network-separation reason it was done this 
way, but that should be fixed.
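Concretely, every machine (master included) would just carry an fstab 
entry pointing straight at the storage box, e.g. (the hostname 
"storage" here is a placeholder):

    # /etc/fstab on the master and every node
    storage:/home  /home  nfs  rw,hard,intr  0  0

That also makes it trivial to point the second cluster at the same 
export later.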

> 
> 3. For queueing, I'm leaning towards torque/maui, which looks like the 
> newer version of openPBS. We were previously using SGE. Any 
> opinions/experience?
> 
> 4. If I leave the hardware raid config alone, I have 73GB raid1 as sda 
> and 440GB raid5 as sdb. I would plan to use sda as /. Since raid1 should 
> be faster than raid5, I thought I would put swap on sda as well. Any 
> reason to do otherwise?

You do not want the nodes swapping over NFS.  You at least want swap 
locally on their internal disks.

Root over NFS is bad enough, so keep at least something local.
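Setting that up is just a matter of carving a swap partition out of 
each node's internal disk, something like (the partition name is a 
guess; use whatever your layout gives you):

    mkswap /dev/sda2
    swapon /dev/sda2
    # make it permanent
    echo '/dev/sda2  none  swap  sw  0  0' >> /etc/fstab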

The cluster management tools I mentioned earlier all expect an 
internal disk for storing the OS, at least by default.

> 
> 5. Now this one is far less important, but if I go with diskless-boot 
> nodes and am using the storage server for /home, that leaves 32 nodes 
> with 100GB drives in them not doing anything. Also, that 440GB raid5 
> array on the masternode. Any clever use I should put these disks to?

At least put swap and /tmp (or something similar) onto the nodes' 
local disks.  Depending on what you are running, the jobs may want 
some actual fast disk for storing temporary data, and a local slice 
would be ideal.  Our applications were written to take advantage of a 
local slice, but as I mentioned before, they have to move a lot of 
data around.
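If you go that route, the usual pattern is a local scratch partition 
mounted at /tmp or /scratch on every node, e.g. (device name again 
hypothetical):

    # /etc/fstab on each node
    /dev/sda3  /scratch  ext3  defaults,noatime  0  2

Jobs then stage their temporary data there instead of beating on NFS.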

Personally, I do not think the payoff in management ease from a 
chroot/NFS root makes up for the performance loss involved.

The stability of our nodes has increased since moving away from NFS root 
as well.

If you are using a scheduler and a node dies, you can remove that node 
from the list of available ones and no one should notice.
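With torque, for instance, pulling a dead node out of rotation is a 
one-liner (node name made up):

    pbsnodes -o node17    # mark offline; the scheduler stops placing jobs there
    pbsnodes -c node17    # clear the offline flag once it's fixed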

Hopefully this helps, but cluster setups can be very specific to the 
jobs/applications they run.


> 
> I think there are other questions I may have as I get going on this, but 
> I welcome any comments or suggestions anyone has.
> Thanks,
> JDR
> 
> --
> Jeremy D. Rogers, Ph.D.
> Postdoctoral Fellow
> Biomedical Engineering
> Northwestern University




