[Tfug] recommendations for installing a cluster..

Jeremy D Rogers jdrogers at optics.arizona.edu
Thu Jan 1 08:03:38 MST 2009


[snip]
> on a supported version.  For chroot vs local install, if you can limit
> package installation to a small number of people, I'd really go with a
> local install, and just update everything monthly.  But it really
> depends on where the commonly used programs are run.  If say
> /usr/bin/gcc is infrequently used, and /home/someguy/mysim is always
> being used, then you could probably get away with setting things up
> diskless (just be aware there is some work in setting things up for N
> clients to use a "single" chroot for NFS, but bind mounts make this an
> easier to solve problem than it used to be).

I'm not 100% sure I understand the reason you say local install is
better. We had local installs previously, and for some reason, the
systems started getting inhomogeneous. We would reboot the nodes and
some would boot up while others did not. I think the most attractive
thing for me about diskless is the idea of only having two systems to
update and install on rather than 33. I'll have to look more carefully
at the NFS problem you mentioned though, so thanks for the heads up
there.

As far as usage, I assume gcc won't even need to be on the nodes'
chroot. All compiling should probably be done on the master. The nodes
would just run the code and use MPI, etc.

> > 5. Now this one is far less important, but if I go with diskless boot nodes
> > and am using the storage server for /home, that leaves 32 nodes with 100GB
> > drives in them not doing anything. Also, that 400GB raid5 array on the
> > masternode. Any clever use I should put these disks to?
>
> I'm really a fan of a local /scratch per node.  Just make sure it's
> known that this area is not backed up and of course, local.  I'd also

Scratch sounds like a great idea, but I'm not sure how it would get
used. The way this will be set up, I don't expect anyone to ever log in
to a node. Everything would hopefully be executed on the master and
the queue would just be sending jobs off to be executed on each node.
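
Thinking out loud, the only way I can picture scratch being used
without anyone logging in is the job itself staging through it. Here is
a rough Python sketch of that idea. The function name run_with_scratch,
the directory arguments, and the job callable are all made up for
illustration; none of this comes from our queue system.

```python
import os
import shutil
import tempfile


def run_with_scratch(job, scratch_root, result_dir):
    """Run job(workdir) in a private directory under scratch_root,
    then copy the files it produced into result_dir.

    scratch_root stands in for a node-local /scratch and result_dir
    for a directory on the shared /home (both hypothetical here).
    """
    os.makedirs(scratch_root, exist_ok=True)
    # Give each job its own directory so concurrent jobs do not collide.
    workdir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        job(workdir)
        os.makedirs(result_dir, exist_ok=True)
        # Copy only top-level regular files back to the shared area.
        for name in os.listdir(workdir):
            src = os.path.join(workdir, name)
            if os.path.isfile(src):
                shutil.copy2(src, os.path.join(result_dir, name))
    finally:
        # Scratch is not backed up, so clean up whether or not the job failed.
        shutil.rmtree(workdir, ignore_errors=True)
    return result_dir
```

If something like that wrapped each queued job, the heavy temporary I/O
would stay on the node's own disk and only the final files would cross
NFS.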

> reconfigure the master's raid1/5 ;)

Do you mean you would switch from raid1 to raid5 for everything?

Thanks for the ideas!
JDR



