[Tfug] raid help

Brian Murphy murphy+tfug at email.arizona.edu
Mon Mar 31 02:14:31 MST 2008


Ron, there is enough in your post to fill a book on OS theory and IT
best practices.  I'll try to briefly break out some of the core
concepts.

The purpose of all of these technologies is high availability.  In other
words, a system that can suffer a component failure and keep running
with minimum service impact.

I think most people here understand RAID 1.  Two drives mirror the same
data so if one fails, an exact copy is on the second drive.  The
computer can keep running on a single drive without losing any data.
Higher end systems have hot swap drives so you can replace the failed
disk without suffering any downtime.  The RAID subsystem can then
re-sync the mirror between the existing good drive and the new drive.

So the key to RAID 1: lose a drive, keep chugging along without any
significant slowdown.
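
For example, with Linux software RAID the whole lifecycle is just a few
mdadm commands.  This is only a rough sketch; the device names below
are placeholders for whatever your actual drives are:

    # build a two-disk mirror
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # a drive dies (or you mark it failed), so pull it out of the array
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # add the replacement and watch the array re-sync
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat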

Virtual memory splits system memory into small chunks called pages.
The virtual memory subsystem moves pages back and forth between main
memory and the swap area (be it a partition or a file).  Less
frequently used pages are moved to disk to make room in RAM for active
pages.  Since the OS manages memory in pages, parts of a process can be
in RAM while other parts sit on disk.
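
You can watch the paging activity yourself; vmstat's si/so columns show
how much is being swapped in from and out to disk:

    # print stats every 5 seconds; busy si/so columns mean the box is
    # actively paging between RAM and swap
    vmstat 5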

Now you can see it's not an all-or-nothing deal to be in RAM or in
swap, so it would be pretty bad if a disk failure caused swap to
disappear: any process with pages out on that disk loses part of its
memory.  This is why people use RAID to mirror swap.  It goes back to
the theory of keeping the system as available as possible.
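
A minimal sketch of mirrored swap with Linux software RAID (again, the
partition names are just placeholders):

    # mirror two partitions, then use the md device as swap
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md1
    swapon /dev/md1

    # /etc/fstab entry so it comes back after a reboot:
    # /dev/md1   none   swap   sw   0 0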

High availability (HA) setups cover the case of a service crash (e.g.
your database) and/or a full system crash.  A heartbeat is sent between
the two systems in your cluster.  If a system stops answering, the
other one will assume its services.  This is usually done with IP
aliasing to float a "service IP" between the two servers, with shared
storage on the backend.
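
With the Heartbeat package Ron mentions below, a v1-style setup boils
down to two small files on each node.  This is just a sketch; the node
names, address, interface and the mysql resource are all made-up
placeholders:

    # /etc/ha.d/ha.cf -- how the nodes talk to each other
    keepalive 2
    deadtime 30
    bcast eth1
    node node1
    node node2
    auto_failback on

    # /etc/ha.d/haresources -- what floats to the surviving node:
    # a service IP alias plus the service itself
    node1 IPaddr::10.0.0.50/24/eth0 mysql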

Heartbeat is a whole topic in itself.  It can be as simple as a ping
between the systems, or it can be a smarter protocol-aware probe that
can detect whether a specific service is running (e.g. a POP probe that
actually logs in and retrieves a test message within a preset amount of
time).
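
Such a probe can be as small as a shell script that the cluster
software runs on a schedule.  Here is a crude POP3 sketch; the host,
credentials and timeout are made up, and a real probe would parse each
response and actually retrieve a test message:

    #!/bin/sh
    # count the +OK responses: the banner, USER and PASS should each
    # answer +OK, so fewer than 3 means the probe failed
    OKS=`printf 'USER probe\r\nPASS secret\r\nQUIT\r\n' \
        | nc -w 10 mailhost 110 | grep -c '+OK'`
    test "$OKS" -ge 3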

Reliable storage underpins a good HA cluster.  The HA machines run the
service with their own CPU/memory/network, and the data floats to the
active machine in the cluster.  The service will only be active on one
machine at a time; bad things happen when the filesystem is mounted
read/write on both servers at once.  Cluster filesystems may change
this, but many active/passive HA clusters run on a traditional SAN or a
distributed filesystem.  How the storage floats depends on your storage
setup.
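
In a Heartbeat v1 setup, for instance, the filesystem is just another
resource in the haresources list, so the mount follows the service IP
and the service to whichever node is active.  Device, mount point and
service name below are placeholders:

    # /etc/ha.d/haresources -- mount the shared volume, bring up the
    # service IP, then start the database, all on one node at a time
    node1 Filesystem::/dev/san/vol1::/data::ext3 IPaddr::10.0.0.50/24/eth0 mysql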

So if an HA node were to go crazy and start crapping all over the
disk, you are screwed, because it still boils down to a single server
with the active filesystem attached to it.  Switching the service over
to the other clustered server won't uncrap the data.

Luckily the common case is a software crash or a simple hardware
failure.  Computers aren't good at automatically recovering from
hardware gone wild.  Wild hardware = recovery from backups.

People who spend big bucks to get HA clusters, servers with ECC memory
and reliable storage aren't too concerned about over-using a disk.
Disks are bought to be used and have hot swap RAID to fix failures
without suffering downtime.  IT departments pay extra for quick
warranty replacement to minimize the window of running in a degraded
state.

If the thought of over-swapping keeps you up at night, the best solution
is to buy more RAM.

Hope this helps,

Brian



Quoting Ronald Sutherland <ronald.sutherland at gmail.com>:
> I've got a lot of reading to do on HA, but I think the idea is not to
> transfer the computer core memory of one system to the other, but the data
> that makes it into a journaled hard drive, which has gone through a memory
> parity test and should be good (I think anyway). This is the data I would
> want transferred over in a timely fashion. That works (I hope) with an SQL
> server, because they make sure data is on disk (meaning the journal is
> updated) before marking the transaction as complete. So when the other
> computer launches and finds SQL data not marked as complete, it is rolled
> back. Anyway, I think the idea is to let the computer die as fast as
> possible when a memory compromise is detected and then have a plan to
> recover with what is most likely good data.
>
> If bad data gets to the hard drive then never mind, I vote backup, but I
> still do not see a good reason for mirroring the swap files (other than
> speed). Virtual memory is an extension of main memory, and anything going
> wrong in main memory is grounds for stopping; this is definitely true. If
> I assign importance to memory, I place my own bits as most valuable, system
> and service bits next, then main memory. I care least if main memory is
> lost. I care more if system or service bits are lost, and I will have a shit
> fit if my stuff is lost. So I try to separate these things physically: I
> have at least 3 drives, 2 mirrored for my stuff, and one for system and
> swap. If the system and swap drive dies I don't actually care very much, and
> I can recreate it in about 1.5 hr. The less the swap system is hammering on
> the stuff I care about, the better I feel (does that make sense?).
>
> On Sun, Mar 30, 2008 at 3:31 PM, Brian Murphy
> <murphy+tfug at email.arizona.edu> wrote:
>
>> Jumbled RAM is a game over situation.  Especially if the system writes a
>> portion of the jumble out to your disk.  Mirroring just ensures that
>> both drives get the jumbled data. :)
>>
>> Backups are a good idea and about your only recovery option if you write
>> garbage data over the good stuff.
>>
>> HA is good, but you typically have shared disk between the 2 servers.
>> If server 1 writes bad data, server 2 will be equally hosed if they
>> share disks.  The same is true if you have subtle file corruption and a
>> periodic sync between your two servers. (e.g. rsync from cron)  Keep
>> tape backups for anything you really don't want to lose.  Snapshots can
>> also be part of a good recovery strategy as long as you don't have
>> controller issues that scramble your entire disk.
>>
>> Brian
>>
>>
>> Quoting Ronald Sutherland <ronald.sutherland at gmail.com>:
>> > Last time I was trying to figure out what all I should mirror, I was
>> > having overheating issues (it was jumbling my RAM). I've also seen power
>> > line sags/spikes/noise, and power supplies go bad and jumble RAM. So for
>> > my needs I've decided that first I want data (SQL, FileServer, SVN/CVS...)
>> > mirrored, and second my system to be fully duplicated and/or real easy to
>> > build again (a setup that is scripted). Having seen memory get messed up
>> > for various reasons, I didn't see much advantage in adding redundancy to
>> > the virtual part of the memory system, although I guess mirroring gives a
>> > speed advantage during reading. I have the hardware but not yet the time
>> > to set up full redundancy. Many of the Linux rags have run articles on a
>> > service called "heartbeat" that allows the backup to have a clue if the
>> > main/master is alive and then take over if not; anyway, that's what I'm
>> > looking into.
>> >
>> > http://www.linux-ha.org/Heartbeat
>> >
>> > On S
>>




The opinions or statements expressed herein are my own and should not be
taken as a position, opinion, or endorsement of the University of
Arizona.





