[Tfug] NAS suggestions

Sat Oct 20 03:36:53 MST 2012

Hi Angus,

--- On Fri, 10/19/12, Angus Scott-Fleming <angussf at geoapps.com> wrote:

> On 14 Oct 2012 at 21:32, Zack Williams  wrote:
> 
> > Just an FYI - for large disks, there's a not-insignificant chance of
> > data corruption when doing RAID rebuilds:
> > 
> > http://queue.acm.org/detail.cfm?id=1670144
> 
> This is another interesting article on modern drives and RAID:
> 
> Data Storage: The Myth of Redundancy - Datamation
>  http://www.datamation.com/storage/data-storage-the-myth-of-redundancy-1.html

Yes, I think for RAID to survive, disk interfaces either need to
get a lot *wider* (which has not been the trend!) to dramatically
increase bandwidth to/from the medium *or* drives must internally
implement the redundancy (pray for no spindle failures -- redundant
motors??).

E.g., return to the days when disks could pull data off 100+
cylinders at the same time!  (actually makes sense if you
*know* the drive is destined for a RAID array and can, therefore,
be more predictive of access patterns)

Of course, adding complexity also runs the risk of making things
more brittle -- until that technology matures.

It just takes too long to rebuild a failed RAID array.  And RAID
assumes you can *spot* the failure quickly, and that you are
comfortable going through the steps to rebuild it (i.e., that you
have lots of experience/practice doing so!)
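
A back-of-the-envelope sketch of the rebuild problem.  Every number
here is an assumption picked for illustration, not a measurement:
2 TB drives, 100 MB/s sustained throughput, a four-drive RAID 5 set,
and a quoted unrecoverable-read-error rate of 1 per 1e14 bits.  The
function names are made up for this example.

```python
import math

def rebuild_hours(capacity_bytes, throughput_bytes_per_s):
    """Best-case time to stream one whole drive, in hours."""
    return capacity_bytes / throughput_bytes_per_s / 3600.0

def p_unrecoverable_read(bytes_read, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading
    bytes_read bytes, assuming errors are independent per bit."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed stably for a very small p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 10**12
MB = 10**6

# Assumed: one 2 TB replacement drive filled at 100 MB/s.
hours = rebuild_hours(2 * TB, 100 * MB)

# Assumed: four-drive RAID 5, so the rebuild reads the three survivors.
risk = p_unrecoverable_read(3 * 2 * TB, 1e-14)

print(f"rebuild takes at least {hours:.1f} h")
print(f"chance of hitting a read error on the way: {risk:.0%}")
```

That is the *best* case: it ignores the array still serving normal
traffic during the rebuild, and real error rates may well differ from
the datasheet figure.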

And, *how* you store the data also affects recoverability.
E.g., a bit flip in the middle of an ASCII file can *probably*
be recovered (by a sentient being) a lot more reliably than
a bit flip in that same file *gzip-ed*!
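
A small sketch of that difference, using Python's standard gzip
module.  The sample text and the helper names (flip_bit, try_gunzip)
are just for illustration; the 10-byte header and 8-byte trailer
sizes assume a gzip stream with no embedded file name, which is what
gzip.compress() produces.

```python
import gzip
import zlib

def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def try_gunzip(blob):
    """Return the decompressed bytes, or None if gzip rejects the blob."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return None

text = b"It just takes too long to rebuild a failed RAID array.\n" * 4

# Plain ASCII: one flipped bit damages exactly one character.
bad_text = flip_bit(text, 8 * 100)
damaged_chars = sum(a != b for a, b in zip(text, bad_text))

# gzip: flip each bit of the compressed body in turn, skipping the
# 10-byte header and the 8-byte CRC/length trailer.
packed = gzip.compress(text)
body_bits = range(8 * 10, 8 * (len(packed) - 8))
unrecovered = sum(try_gunzip(flip_bit(packed, i)) != text for i in body_bits)

print(damaged_chars, "character damaged in the plain file")
print(unrecovered, "of", len(body_bits), "flips lost the gzip-ed copy")
```

The upside of the gzip-ed copy is that its CRC at least *tells* you
something is wrong; the plain file just quietly has a typo in it.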

Finally, the consequences of particular errors can vary greatly.
E.g., a mailing address that changes from "1313 Mockingbird Ln"
to "1312 Mockingbird Ln" will probably never be noticed (i.e.,
your mail will still get delivered by the letter carrier).
Similarly, a "deposit" appearing to have been recorded as 
$1025.47 instead of $1.47 will likely be discoverable when
someone/thing notices the balance is off by $1024!
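
Both of those examples are consistent with a single flipped bit.  A
quick sketch of the arithmetic, assuming the house number is stored
as ASCII text and the whole-dollar part of the deposit as a plain
binary integer (the storage layout and the helper names are
assumptions for illustration):

```python
def toggle_bit(value, bit):
    """Invert one bit of a non-negative integer."""
    return value ^ (1 << bit)

def is_single_bit_error(expected, observed):
    """True if the two integers differ in exactly one bit position."""
    diff = expected ^ observed
    return diff != 0 and diff & (diff - 1) == 0

# ASCII '3' is 0x33; flipping its lowest bit gives 0x32, i.e. '2'.
house_number = "131" + chr(toggle_bit(ord("3"), 0))

# Whole dollars held as a binary integer (assumed): bit 10 is worth 1024.
dollars = toggle_bit(1, 10)

print(house_number, dollars, is_single_bit_error(1, dollars))
# prints: 1312 1025 True
```

A balance that is off by an exact power of two is a decent hint that
you are looking at a flipped bit rather than a keying error.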

To date, I've had one disk fail completely (i.e., fail to spin up)
and (just recently) another exhibit unrecoverable data error(s).
Since these were "working disks" (not archive disks), it was a
simple matter to replace them (I've not yet finished examining
this recent failure to see the extent of the problem).

OTOH, I did have an experience some years (20?) ago with an OS
bug *trashing* disks (superblock corruption?).

Moral of the story:  don't place much value on data!  :>
(Conversely, if you place value on the data, then be prepared
to put a corresponding amount of *money* on the line, as well!)

--don