[Tfug] ECC (was Re: Using a Laptop as a server)

Bexley Hall bexley401 at yahoo.com
Thu Mar 14 13:35:37 MST 2013


Hi Zack,

On 3/14/2013 9:01 AM, Zack Williams wrote:
> On Thu, Mar 14, 2013 at 7:44 AM, Louis Taber <ltaber at gmail.com> wrote:
>> Does Linux, by default, log ECC errors?  If so, where?  If not, how can
>> logging be turned on?
>
> Shows up in the system log - the Linux kernel driver that reads error
> codes is named "edac". I've logged a fair number of main DRAM and L2
> and L3 cache ECC errors on my system, probably once every 2-3 months
> per 64GB of active memory across all systems.
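
[For anyone who wants the raw counts rather than grepping the system log:
the EDAC driver also exposes per-memory-controller counters in sysfs. Below
is a minimal sketch, assuming the driver for your chipset is loaded and the
counters live under /sys/devices/system/edac/mc. The function name and the
"root" parameter are made up for illustration; machines without EDAC
support simply yield an empty result.]

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    `root` is a parameter only so the function can be pointed at a test
    directory; the default is the usual sysfs location (an assumption
    about your system).
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())  # corrected (single-bit) errors
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())  # uncorrected errors
        except (OSError, ValueError):
            continue  # controller without readable counters; skip it
        counts[os.path.basename(mc_dir)] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

[Run from cron and appended to a file, this gives a history of counts over
time, which is easier to reason about than scattered log lines.]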

So, what does this tell you in terms of the quality/reliability of your
system?  When do you start getting nervous?  Statistically, a device
that throws an error is more likely to throw *more* errors in the
future.  [Unless the source of the errors is the memory infrastructure
and not the memory (device) itself.]

How do you develop/implement *policy* for dealing with these numbers?
Or, do they just serve a "blinkenlichten" role?
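
To make the question concrete, here is the sort of thing I have in mind
by "policy". This is only a sketch: the function name, the three verdicts,
and the one-per-month threshold are placeholders I made up, not
recommendations. It assumes you already sample corrected/uncorrected
counts periodically and can compute the change since the last sample.

```python
def assess(ce_delta, ue_delta, days, ce_per_month_limit=1.0):
    """Classify a memory controller from error-count changes over `days`.

    ce_delta / ue_delta: corrected / uncorrected errors since last sample.
    ce_per_month_limit: hypothetical threshold; tune it to your own fleet.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    if ue_delta > 0:
        return "replace"  # any uncorrected error is treated as actionable
    ce_per_month = ce_delta * 30.0 / days  # normalize to a 30-day rate
    if ce_per_month > ce_per_month_limit:
        return "watch"  # corrected errors arriving faster than the limit
    return "ok"


if __name__ == "__main__":
    # One corrected error in 90 days stays under the placeholder limit.
    print(assess(ce_delta=1, ue_delta=0, days=90))
```

Whether a fixed rate threshold is even the right shape for such a policy
is exactly what I'm asking; a rule that looks at whether the rate is
*rising* might fit the "more errors in the future" observation better.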

> I view ECC as a safety net.  Not needed if you're walking the high
> wire, but invaluable if you do need it.




