[Tfug] ECC (was Re: Using a Laptop as a server)

Zack Williams zdwzdw at gmail.com
Thu Mar 14 14:39:07 MST 2013


On Thu, Mar 14, 2013 at 1:35 PM, Bexley Hall <bexley401 at yahoo.com> wrote:
>
> So, what does this tell you in terms of the quality/reliability of your
> system?  When do you start getting nervous?  Statistically, a device
> that throws an error is more likely to throw *more* errors in the
> future.  [Unless the source of the errors is the memory infrastructure
> and not the memory (device) itself.]

I chalk them up to environmental hazards: cosmic rays (btw, I've
always wanted to build a cloud chamber after seeing one at a science
museum), radon, etc.  Those are the kind of errors I'm seeing.  Unless
I see the same systematic, repetitive error 2 or more times, I don't
view it as a hardware flaw.

I've also had cases where I did need to replace memory that was
throwing ECC errors on a daily basis - that's where ECC is doing its
job: keeping the system functioning properly until a scheduled
replacement can happen (see also: RAID).
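
As a rough sketch of that rule of thumb (one-off errors are treated as
environmental, repeats from the same place are treated as a part to
replace), something like the following would do.  The location labels
and the function name are made up for illustration; a real log would
need its own parsing, and the threshold of 2 is just the number
mentioned above.

```python
from collections import Counter


def repeat_offenders(events, threshold=2):
    """Return the locations that logged at least `threshold` corrected errors.

    `events` is an iterable of location labels (hypothetical names such as
    "DIMM_A1"), one entry per corrected ECC error seen in the log.
    """
    counts = Counter(events)
    return {loc for loc, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical log: one entry per corrected error.
    log = ["DIMM_A1", "DIMM_B2", "DIMM_A1", "DIMM_C3"]
    # DIMM_A1 shows up twice, so it is the only one flagged for replacement.
    print(sorted(repeat_offenders(log)))
```

Anything not returned is a single event and gets treated as a stray
particle hit; anything returned goes on the replacement list.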

One interesting, tangentially related story: in the early 2000s Sun
shipped a bunch of processors that had radioactive casings on the
cache chips, which caused these sorts of errors.

http://www.sparcproductdirectory.com/artic-2001-dec-1.html
http://nighthacks.com/roller/jag/entry/at_the_mercy_of_suppliers

- Zack



