[Tfug] Small-ish (capacity + size) disk alternatives

Thu Jan 31 17:52:31 MST 2013

Hi Yan,

On 1/31/2013 2:57 PM, Yan wrote:
>> No!  My problem is *laptop* HDD's wear out -- not HDD's in general!
>> In the 30 years I've owned computers, I've had exactly three HDD's
>> wear out -- all laptop drives despite the fact that I rarely *use*
>> a laptop (one drive in a laptop, two more in this 365/24/7 situation).
>
> You got quite lucky. I haven't owned computers for anywhere near 30
> years (although for most of my computer-owning time, my gear's been
> bought used), but I've had at least a dozen hard drive failures. Here
> in the lab, we have somewhere around 100 computers with probably
> upwards of 250 hard drives. Those were all bought new, and most of
> them are enterprise-grade. We have something around a failure a month,
> and that's probably beating the statistics on these things.

AFR's for disks are in the 2-8% range.  So, if *all* of those drives
are seeing "regular, normal use", you would expect 5 to 20 of them to
fail per year.  Or, roughly 0.4 to 1.7 per month (depending on brand,
production lot, operating conditions, etc.).

I typically have ~1-3 machines running "normally" -- figure 2-10 drives.
So, in a year, I'd expect to see .04 to .8 drive failures.
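
A rough sketch of that arithmetic, assuming failures are independent
and that one AFR applies uniformly to every drive (neither is strictly
true in practice).  The function and variable names are just for
illustration:

```python
def expected_failures(drives, afr):
    """Expected number of drive failures per year for a group of drives."""
    return drives * afr

AFR_LOW, AFR_HIGH = 0.02, 0.08   # the 2-8% range mentioned above

# The lab: roughly 250 drives
lab_low = expected_failures(250, AFR_LOW)
lab_high = expected_failures(250, AFR_HIGH)
print(f"lab:  {lab_low:.0f}-{lab_high:.0f} per year, "
      f"{lab_low / 12:.2f}-{lab_high / 12:.2f} per month")

# Here: somewhere between 2 and 10 drives
home_low = expected_failures(2, AFR_LOW)
home_high = expected_failures(10, AFR_HIGH)
print(f"here: {home_low:.2f}-{home_high:.2f} per year")
```

At these rates a handful of drives can easily go years between
failures, while a lab-sized population should see one every month or
two.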

But, my computer usage hasn't been constant over those 30 years.
E.g., my first disk drive was a whopping 1MB drive (yes, "one").
I didn't keep it long (it was bigger than many of the early PC's).

My first practical drive was, I think, 10MB -- almost 10 whole
(8 inch!) floppies on a single drive!!  :>

I didn't keep it long, either!

My first IDE drives were 60MB units (from early 386 machines).  I kept
them running for many years before the machines were just taking up
more room than they were worth!  (I *threw* the drives onto the
concrete floor to crash their heads before discarding them -- since
they were still operable).

I still have a pair of 600MB IDE drives from 20 years back that are
now doing duty in a "Compaq 386 Portable" (that I keep for some
legacy hardware installed in that machine).

Currently, I have about 20TB here -- with the bulk of that in cheap
"consumer" external drives (that I have been using to snapshot my
development database/software on a daily basis... disposable once
this project is done).  About 6-7TB of that is spread out over
dozens of spindles (e.g., each of my Sun machines has a TB on
14 spindles; x86 servers have a similar amount on half as many
spindles; etc.).  I.e., I have preferred smaller drives and SCSI
drives for most of my media.

>> No!  But, neither does every "block worth" (which might be as much as
>> 500KB!) contain 100% "new data".  That's the point here -- that the
>> work you are asking the disk to do can be much less than the work
>> it actually ends up *doing*.
>>
>> E.g., if I update the "hours worked this week" for each employee
>> in a dataset and each employee's "record" resides in a different
>> FLASH block, then an entire block is erased for each employee in
>> the organization -- even if that's only 4 *actual* bytes per employee!
>
> Admittedly, this is a problem for SSDs (although I'd argue it doesn't
> preclude it from being used for the application you describe). There's
> some research being done in this area. Samsung and some South Korean
> university published a paper proposing a new filesystem optimized for
> (among other things) reducing erase cycles. The paper is at
> http://static.usenix.org/events/fast12/tech/full_papers/Min.pdf and
> the presentation at
> https://www.usenix.org/conference/fast12/sfs-random-write-considered-harmful-solid-state-drives
> . It doesn't look like they released the implementation itself,
> though, so this is more of a future argument.

The problem with this is that it forces the application to be cognizant
of the underlying hardware implementation.  At the same time, there is
increasing bloat and abstraction in OS's and application development
tools that obfuscates the view "down" to the hardware.  So, it is harder
to tune the application to fit the hardware.  If the hardware is
homogeneous and reliable, this isn't an issue.  But, if the hardware
is neither of these, then system reliability becomes suspect.  *Before*
the device is even deployed!
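
To put numbers on the "hours worked" example quoted above, here is a
back-of-the-envelope sketch.  It assumes the 500KB upper bound for an
erase block, a hypothetical 1000 employees, and a naive device that
rewrites a whole block for any change within it.  Real controllers
buffer and remap writes, so treat this as a worst-case illustration
rather than a model of any particular SSD:

```python
ERASE_BLOCK_BYTES = 500 * 1024   # "as much as 500KB", per the quoted text
FIELD_BYTES = 4                  # one "hours worked" value

def write_amplification(records, records_per_block):
    """Bytes the flash must rewrite divided by bytes the app changed."""
    blocks_touched = -(-records // records_per_block)  # ceiling division
    physical = blocks_touched * ERASE_BLOCK_BYTES
    logical = records * FIELD_BYTES
    return physical / logical

EMPLOYEES = 1000                 # hypothetical organization size

# Worst case: every employee record lives in a different erase block.
scattered = write_amplification(EMPLOYEES, 1)

# Hot fields packed together: all 1000 values fit in a single block.
packed = write_amplification(EMPLOYEES, EMPLOYEES)

print(f"scattered: {scattered:.0f}x   packed: {packed:.0f}x")
```

The three-orders-of-magnitude gap between the two layouts is exactly
the thing the application can't see (or control) through the layers of
abstraction.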

>> But it isn't.  Error frequencies go up which means more (hidden)
>> update cycles are incurred, etc.  Notice how "enterprise" SSDs
>> tend to stick to SLC technology -- trading capacity/speed for
>> endurance/reliability.
>
> I'm not sure if this is the case anymore. Most SSDs I've seen billed
> as enterprise are heading to MLC nowadays. This might be a temporary
> trend, though, for all I know.

I think you will either see a significant effort made to improve the
reliability of MLC technologies (i.e., redundancy -- which sort of
eliminates the advantage of MLC to begin with!) or a shift to some
other technology/tricks that sidesteps this issue.

Look at where the drives are currently being used, their pricepoints,
etc.  Technology's goal is always to move towards the ubiquity of
the "consumer" market (SSDs don't seem ready for prime time, there).

>>> So what. This is what they do. If you have stuff that is never being
>>> written (only read) the controller will move it to a frequently written
>>> cell. It will do this before it thinks that there is only one write left
>>> on that target cell. Even if the cell fails, so what. Cells fail. Those
>>> cells are marked as bad, and the drive uses other cells. ALL modern SSDs
>>> have spare area. They can (and do) handle cell failures.
>>
>> This is reflected to the interface (i.e., user) as indeterminism.
>> The application never *knows* that the data that the drive has
>> previously claimed to have "written" has actually been written.
>> It's an exaggeration of the write caching problem.  But, it is
>> brought about by the inherent "endurance" limitations of the
>> media.  As if you had a HDD that was inherently failure prone.
>
> It isn't much different from HD caching, or from the OS-level caching,
> or anything like that. If you want to complain about the disk
> equivalent of bufferbloat, that might be valid, but pretending that
> this only occurs with SSDs is very inaccurate.

The point is that the media has this as an *inherent* "flaw".
HDDs remap sectors when they are found to be bad or suspect.
They don't routinely move "safe" data around on the media
because other parts of the media are flakey.

> In fact, it's probably
> better on an SSD, because the amount of time that it takes to remap a
> block is (probably) faster than the amount of time data sits in an HD
> write cache waiting for the HD to spin to the right place.
>
>> Imagine a HDD that was designed to *randomly* pick a block of
>> otherwise stable data, copy it to a new location, verify that
>> the copy succeeded and then erase (not just overwrite) the original
>> all so that some NEW piece of data could be written in its place
>> (and, at the same time, updating the behind the scenes bookkeeping
>> that keeps track of all that shuffling around -- using media
>> with the same "characteristics" that it is trying to workaround)
>
> That'd be insanely slow, and that's because HDs are HDs, and are slow.
> While an SSD would be faster if it didn't do this, it's still quite
> fast while doing it.

But the reason it is doing this is because the technology/media isn't
up to the expectations that this *other* media has already championed!
It's like putting a 100 gallon fuel tank in your gas guzzler just
so folks won't be discouraged from purchasing it because it can
only travel 200 miles on a more traditional sized gas tank (when
other vehicles get 300+).

>>> I fail to see the problem. SSD controller have a complicated job to do,
>>> and they do it.
>>
>> They *try* to do it.  I saw a recent survey claiming 17% of respondents
>> had an SSD fail in the first *6* months!  (of course, a survey in which
>> respondents self-select will tend to skew the results -- people are
>> more likely to bitch about their experiences than praise them!)
>
> On top of the statistical issues with that poll (which had "600+"
> respondents), I would guess that those types of failures ("my SSD
> completely stopped working") are more likely caused by the disk
> controller giving out than the underlying medium. That's a guess, and
> it doesn't make things better for the end-user, but it might take some
> of the heat off of the wear-leveling debate.

If by disk controller you mean the (integrated) *flash* controller, it
just draws attention to the higher level of technology involved in
FLASH media vs. magnetic media.  How long do you wait for folks to
get the software, wear leveling technologies, etc. right?

> It does shine light on another issue, which is the fact that SSDs are
> just barely leaving the early adoption phase. I hopped on about a year
> ago, and haven't had any issues. People around the lab are getting
> SSDs more and more, and I haven't heard of any failing either, so at
> least my anecdotal evidence implies to me that the manufacturers are
> getting the hang of things better.

I've a friend who has already replaced his *replacement* SSD.
Consider the lifespan of most laptops and you've got to wonder
why he's had two fail, already.

>> So, you are suggesting I simply say, "Buy this particular SSD otherwise
>> the system won't work"?  Would you build a MythTV box if you were told
>> you had to use this disk (endurance), this motherboard (performance),
>> this fan (sound level), etc.?  Or, would you cut some corner and then
>> complain later to anyone who will listen?
>
> "Buy this particular SSD otherwise the system won't work" should
> really be "Buy this particular SSD otherwise the system will fail
> faster".
>
> If you're going to build a system, the quality of the components you
> choose is going to affect its performance and reliability. You can't
> get away from that. If you go with HDs, you'll have to say "Buy a
> non-laptop drive of this caliber or it'll fail faster." It's the same
> thing. If you go with a high-quality SSD, it doesn't mean that the box
> won't work with a low-quality one. It'll just work slower or less
> reliably. It's the same for every other component.

A year vs. 5-10 years is a huge difference!  As I said in a previous
message, I find it annoying that I have to replace the batteries in
the smoke detectors annually (and that is a *safety* issue, not just
a "convenience" one!).

People, for the most part, aren't proactive.  They wait for things
to break instead of monitoring/maintaining on an ongoing basis.

Imagine if you had gone to the trouble to put together a MythTV
box.  Then, less than a year later, discovered the disk drive
had died due to the usage patterns that application placed on it.
You *might* replace the drive (assuming you had kept good notes
on how to initialize the new drive).  How excited would you be
to replace it *again* a year later??

E.g., chances are, you'll throw together a MythTV box out of
"spare parts" to "explore" the features it presents.  It's
largely "free".  And, you'd replace that disk with another
"spare" you had on hand at the time.  But, faced with the
"replace every year" realization, would you run out and BUY a
more expensive drive?  Or, would you buy a COTS DVR that you
could replace in toto when *it* fails?

>> Passive cooling.  The inside of the enclosure has never been above 35C.
>> No one wants to listen to fans 24/7/365!
>
> Here's another plus for SSDs. They'll pump a lot less heat out, so
> your machine might work in less friendly environments (ie, locked in
> an entertainment center or something).

This was the appeal of the laptop drive -- far less power than
an equivalent amount of DRAM *or* a full-sized hard disk.  E.g.,
here, it sits in the bottom of a closet (no ventilation other than
the occasional opening of the closet door to access the other
contents of the closet).

>> When was the last time you replaced your thermostat?  Irrigation
>> controller?  Garage door opener?  Washer/dryer?  Doorbell?  TV?
>> Security camera(s)?  DVR?  "HiFi"?  Hot water heater?  Weatherstation?
>>
>> Then, ask yourself *why* you replaced it:  because you were tired
>> of "last year's model"?  Because it wasn't performing as well as
>> it should?  Because it *broke*?
>>
>> Chances are, most of these things did their job until they broke (or
>> were outpaced by other technological issues) and *then* were replaced.
>
> In the last year, I've replaced an irrigation controller (and
> irrigation pumps), had to repair a dryer, had to repair an AC unit,
> and had to replace a hard drive (not a laptop one, either) in a
> security camera system. That's not including regular filter changes
> and so forth, either.

You've been *un*lucky!  :>  We replaced the washer and dryer after
20 years because "she wanted something new" (though they were in
pristine condition).  I had to replace a start cap on the cooling
fan for the (outdoor) AC compressor many years ago but no other HVAC
issues (other than filters in the circulating fan).  Hot water heater
has 20+ years on it.  Same for refrigerator and stove (though the
latter to be replaced soon as part of remodel).

Never had a problem with a doorbell in 50 years.  Last TV lasted 25
years.  Threw away my HiFi VCR a few years ago after copying everything
onto DVD media.  Still have my 4 Nakamichi's (though they see routine
maintenance).  A pair of "integrated" bookshelf stereos have seen
the same CD changer repair a total of three times in the past 25 years
(one just being serviced last week -- par for the course).

Pinball machines see a fair bit of maintenance -- but that's the nature
of the beast.  Cars need oil changes, filter changes, etc.  Printers
rarely need "more toner".

The single biggest maintenance issue I've encountered, here, has been
the swamp cooler (which we haven't used in years) and the *roof*
itself!

I.e., we tend to invest in things that will "last" instead of
saving a few pennies here and there.

> Components fail. There's not going to be a
> magical HD that'll prevent that.

Components fail but at vastly different rates depending on
their quality and how well their usage is matched to that!
I've designed products that routinely see 20+ year lifetimes.
The difference is how you *approach* the design.

I watch the stuff that comes into World Care every day.  And
I see how *small* the difference is between failing *now* vs. still
being usable.  OTOH, those vendors aren't going to "waste"
money to increase the usable lifetime of a product that
their consumers have been "trained" to expect to replace
that often!  (who would design a consumer PC with more than
a 3 year expected lifetime?)

>>> You are avoiding one limitation (SSD finite erase/program cycle) but
>>> with HDDs you still suffer mechanical wear and tear. As you noted in
>>> your original email the HDDs you've been using "die pretty easily".
>>
>> But those have all been laptop HDD's!  E.g., I suspect moving to
>> a "real" disk drive will give me the same sorts of reliability
>> that I've seen in my other machines (though at a higher power
>> budget and cooling requirements).
>
> I think the common wisdom is that a good HD will have a longer
> lifespan. The argument is a) that this lifespan isn't indefinite and b)
> that an SSD's lifespan won't be unreasonably short. But it's your
> system; you'll ultimately have to decide whether it's worth upping the
> specs for it.

I'm not worried about *my* "instance" of the system -- it won't
run on COTS hardware (since COTS hardware doesn't give me the
physical size characteristics that I want, the power budget,
reliability, etc.).  But, I have been trying to keep the *design*
"mainstream" enough that others could replicate it, functionally.
(i.e., if you can afford the space, power and noise, you could
site a group of PC's in a "closet"/basement someplace and use
COTS components).

It could be that this is the fallacy in my approach -- trying to give
the benefits of a "proprietary" design to a COTS implementation.
I.e., maybe I just have to tell folks they have to build a particular
set of boards to replicate the design!  (this would sure make the
software easier as it wouldn't have to worry about hardware
abstractions!)

>>> If you don't fully understand your data access/update patterns it
>>> doesn't seem like you can say whether or not they will overly burden an
>>
>> I can look at data write *rates* (sector counters) and make conclusions
>> based solely on that!  The SSD won't give me any better guarantees than
>> total number of rewrites.  I.e., it doesn't care if I am writing
>> "AAAAAAAAX" or "AAAXAAAAAA" in place of "BBBBBBBBB" -- as long as either
>> write is a "sector".
>>
>> Knowing the access patterns (at the application level) IN DETAIL lets
>> me restructure tables so that the data that are often updated "as a
>> group" tend to be grouped in the same memory allocation units.
>>
>> E.g., if you're running a payroll application, then wages and taxes
>> are the hot items that see lots of use.  OTOH, if you are running
>> an application that tracks attendance (timeclock), then wage
>> information probably sees *less* activity than "hours worked"
>> (which would have to be updated daily).  In either case, employee
>> *name* is probably RARELY updated!
>
> If the data is in a database, ensuring this at the hardware level
> might be harder than you'd think, given filesystem abstraction,
> separate storage for indices, etc. If the data is in a DB-backed LDAP
> directory or something, you can probably forget about it. Of course,
> if you're using a specialized filesystem and so forth, it's possible,
> but that's a lot of effort for questionable gain.

For a *particular* DBMS, you can create your tables to exploit
the sorts of accesses and updates that you actually encounter.
With judicious use of tablespaces, you can move individual
tables into different spaces -- deliberately backed by "drives"
having the appropriate capabilities.

E.g., the imaginary employee database I've mentioned could separate
the static "employee" information from the "wage"/hours information
(two or more tables linked by a shared index).  The employee table
can be backed by a "write infrequently" medium while the wage table
is backed by a "write frequently" medium.
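
A minimal sketch of that split, using Python's sqlite3 module.  SQLite
has no tablespaces, so two separate database files joined with ATTACH
stand in for them here (a DBMS such as PostgreSQL offers CREATE
TABLESPACE for the real thing).  The directory names, table names and
the sample employee are all hypothetical; in practice each directory
would be a mount point on the appropriate medium:

```python
import os
import sqlite3
import tempfile

# Two directories stand in for two differently-backed "drives".
static_dir = tempfile.mkdtemp(prefix="write_rarely_")
hot_dir = tempfile.mkdtemp(prefix="write_often_")

# Static employee data lives in the main database file...
conn = sqlite3.connect(os.path.join(static_dir, "employee.db"))
# ...while the frequently updated data lives in a second, attached file.
conn.execute("ATTACH DATABASE ? AS hot",
             (os.path.join(hot_dir, "wage.db"),))

conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE hot.hours "
             "(employee_id INTEGER PRIMARY KEY, hours REAL)")

conn.execute("INSERT INTO employee VALUES (1, 'A. Example')")
conn.execute("INSERT INTO hot.hours VALUES (1, 0.0)")

# Daily timeclock updates only touch the file on the "hot" medium.
for _ in range(5):
    conn.execute("UPDATE hot.hours SET hours = hours + 8 "
                 "WHERE employee_id = 1")
conn.commit()

# The shared key still links the two tables for queries.
row = conn.execute(
    "SELECT e.name, h.hours FROM employee e "
    "JOIN hot.hours h ON h.employee_id = e.id"
).fetchone()
print(row)
conn.close()
```

The application sees one schema; only the person deploying it needs to
know which file sits on which medium.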

>> It won't be *me* that's rolling the dice!  :>   Rather, it will be
>> someone who tries to build and configure a similar system and
>> wonders why *his* choice of storage media proved "less than ideal".
>> Or, why the *identical* system ("as seen on TV") performs so
>> differently after he's made some "trivial" changes to the code.
>>
>> "Gee, I just changed the payroll program to update the wages on
>> a daily basis -- each time the timeclock recorded additional
>> hours for the employee.  Now, I'm seeing problems with the wage
>> data's reliability..."
>
> Given the abstraction involved in these things, I think the actual
> effect will be "Gee, I just changed the payroll program to update the
> wages on a daily basis -- each time the timeclock recorded additional
> hours for the employee. Now the disk failed a day faster.". They'll
> eventually see data issues ("bad sectors" and so forth) from the wear,
> but the wear-leveling will distribute it out of just the
> frequently-updated wage data.

That depends on what's backing the individual data.  E.g., if the
wage data had been on the "write infrequently" medium, suddenly,
that medium sees 5-7 times the usage rate that had been planned
for it.
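
As a rough illustration of what that does to the planned lifetime,
assume a block that is rewritten on every update with no wear leveling
to spread the load, and a purely hypothetical endurance rating of 3000
cycles (pick whatever figure your medium actually claims):

```python
WEEKS_PER_YEAR = 52
RATED_CYCLES = 3000   # hypothetical endurance rating, for illustration only

def years_to_rated_wear(rated_cycles, rewrites_per_week):
    """Years until a block rewritten on every update reaches its rating."""
    return rated_cycles / (rewrites_per_week * WEEKS_PER_YEAR)

# Planned usage: wages updated once per week.
planned = years_to_rated_wear(RATED_CYCLES, 1)

# After the "trivial" change: updated every workday, or every day.
for per_week in (5, 7):
    actual = years_to_rated_wear(RATED_CYCLES, per_week)
    print(f"{per_week} updates/week: {actual:.1f} years "
          f"(planned: {planned:.1f})")
```

Whatever the rating, the lifetime shrinks by the same factor of 5 to 7.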

Tracking information shows I should receive my gifted "laptop
disks" tomorrow.  So, I can reset the "life clock" and see how
long *they* last.  Meanwhile, I'll look into the availability
of physical RAM disks to replace the VM requirements of the
disk drive.  (this might be a viable option *if* a user can just
install their own "surplus" DIMMs to get capacity "for free")