[Tfug] Small-ish (capacity + size) disk alternatives

Bexley Hall bexley401 at yahoo.com
Wed Jan 30 20:17:30 MST 2013


Hi Yan,

On 1/30/2013 6:58 PM, Yan wrote:
> Hi guys,
>
>> E.g., John's mention of "685TB of data to a 40GB (drive)" as if that
>> was admirable.  Yet, that's just ~15,000 times the capacity of the
>> drive (suggesting each FLASH block is rated at ~15,000 erase cycles).
>>
>> Said another way, if you had 39GB of "executables" (i.e., data that
>> is never altered) on the drive, you might be limited to writing
>> ONLY ~15,000GB over the *life* of the drive.  An application that
>> wrote to the disk at 15MB/s would kill the drive in ~2 weeks! (!!)
>
> You're describing dynamic wear leveling, whereas I think pretty much any
> SSD worth its salt nowadays uses static wear leveling (see,
> http://thessdguy.com/how-controllers-maximize-ssd-life-better-wear-leveling/).
> Although for all I know, that might not apply to the smaller, cheaper ones.

It doesn't matter.  Would you expect your traditional (magnetic)
disk to wear out once (15,000 * capacity) of data had been written
to it?  Imagine if *RAM* had a similar *inherent* limitation!

"I've got to get some new RAM for my machine... the old stuff is
so WORN that you can see the GROOVES cut in the dies!"  :>

And, the actual amount of "work" that you are calling on the drive
to do (measured in terms of bytes "written" by the application) can
be considerably less than that (15,000 * capacity) number!  E.g., I
might want to update a single byte per sector/cluster/FLASH block,
yet this "costs" as much (in terms of endurance) as if I had
rewritten the entire sector/block!

[This is why DBMS applications are so hard on SSDs -- because they
might update a small portion of a "record" yet the disk sees this
as an artificially "large" update -- due to its internal architecture]
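
Just to put numbers on the above, a quick back-of-the-envelope in C
(the 128KB erase block size is only an assumption, for illustration --
substitute whatever your favorite part actually uses):

    /* endurance.c -- rough arithmetic behind the figures above.
       The 128KB erase block is an ASSUMED size, for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double capacity_gb = 40.0;    /* drive capacity              */
        double written_tb  = 685.0;   /* total written before death  */
        double rate_mb_s   = 15.0;    /* sustained app write rate    */
        double block_kb    = 128.0;   /* ASSUMED erase block size    */

        /* implied erase cycles per block = total written / capacity
           (the ~15,000 above is this figure, rounded down)          */
        double cycles = (written_tb * 1000.0) / capacity_gb;
        printf("implied erase cycles: ~%.0f\n", cycles);

        /* only 1GB rewritable (the other 39GB is "executables") */
        double budget_mb = cycles * 1.0 * 1000.0;
        printf("1GB writable:  dead in ~%.1f days\n",
               budget_mb / rate_mb_s / 86400.0);

        /* the whole drive rewritable "at will" */
        budget_mb = cycles * capacity_gb * 1000.0;
        printf("whole drive:   dead in ~%.0f weeks\n",
               budget_mb / rate_mb_s / 86400.0 / 7.0);

        /* update one byte, erase/rewrite a whole block anyway */
        printf("1 byte update costs %.0fKB of endurance (~%.0fx)\n",
               block_kb, block_kb * 1024.0);
        return 0;
    }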

As newer, multi-level cell (MLC) processes become more commonplace,
you'll see densities go up -- and endurance figures go *down*!

Static wear leveling also poses problems for caching and asynchronous
access, as the drive has to be able to remap data that is already
"safe and secure" to blocks that may *not* be!  I.e., attempting
to move "existing data" can fail.  So, the drive can't erase the
"good" instance of the data until/unless it has successfully
moved it.  As a result, freeing up blocks that one would *think*
still have some wear left in them can be problematic.  But, the drive
can't report this problem until long after the *need* to make that
space available has arisen (e.g., if data is residing in R/W cache
waiting to be committed to FLASH).  I.e., it makes the drive behave
as if it were "slow with a large cache" -- the application isn't
notified of writeback failures until long after the write operation
that was responsible for them has "completed".

Again, this is a consequence of the decoupling of the storage
media from the application (because of all the *bloat* that
exists in the region between the two).
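
As a host-side analogy of that decoupling (this is the OS's buffer
cache rather than the drive's internals, but the *effect* on the
application is the same flavor): a POSIX write() "succeeds" as soon
as the data lands in a cache; any failure getting it onto the media
may not surface until fsync() -- long after the code that did the
write has moved on.  A minimal sketch:

    /* deferred.c -- the write "succeeds" long before the data is
       really on the media; the failure (if any) surfaces at fsync(),
       well after the caller has moved on.  (POSIX; error handling
       trimmed for brevity.) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("datafile", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[] = "record update";
        if (write(fd, buf, sizeof buf) < 0)  /* just fills a cache... */
            perror("write");                 /* ...rarely fails here  */

        /* ... application carries on, believing the update "took" ... */

        if (fsync(fd) < 0)                   /* the writeback failure */
            perror("fsync");                 /* shows up here, later  */

        close(fd);
        return 0;
    }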

>> (similarly, assuming you could write to the *entire* media "at will",
>> you're looking at 80 weeks).
>
> With the price of SSDs nowadays (provided that they do support static wear
> leveling), that might not be too bad, and possibly not too much more
> expensive (and if trends continue, might even be cheaper soon).

You're missing the point.  Would you want to have <whatever>
require replacement/servicing in that short an interval?

E.g., would you want to replace/repair (labor cost$) your DVR
because "the disk wore out"?  (in that time frame)  Or, your
PC?  In my case, should the entire multimedia/automation system
grind to a halt because the disk "wore out"?

["Honey, why is the house so cold and all the plants wilted?"
"Never mind that -- why can't I watch TV???"  :> ]

Do I design something with a built-in/inherent replacement date?

Instead, you (I) look for technologies that let you avoid these
limitations.  This is a lot easier to do in hindsight; considerably
harder in foresight!  :-/

I.e., you can't quantify the data access/update patterns until
you can actually *measure* them.  And, until you've identified them,
you can't *alter* them to place less stress on the media.

Years ago, there were lots of fledgling "EAROM" (electrically
alterable ROM) technologies that predated FLASH (e.g., MNOS).  All
had incredibly limited endurance figures -- i.e., 10,000 writes per
cell (!!  not block!) was a *huge* number!  So, you didn't consider
using them as "RAM", as you would wear one out in a matter of
MINUTES (or seconds).

So, you implemented a shadow copy of each datum in normal RAM
and accessed this from your application.  Then, just prior to
the machine "going down" (orderly or unanticipated shutdown),
you quickly "burned" this image into the nonvolatile device.

But, they had incredibly long write times!  So, you had to design
early warning systems to watch for power FAILING so that you could
start the write operation with enough lead time for it to complete
before the power fell to a level at which the electronics (CPU, etc.)
could no longer reliably finish the operation.
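
For the curious, the shadow-copy trick looks roughly like this
(the names and the simulated EAROM are made up for illustration; a
real design hangs commit_to_nv() off the hardware power-fail early
warning interrupt and a device-specific driver):

    /* shadow.c -- sketch of the shadow-copy approach described
       above.  The "EAROM" here is just simulated with an array and
       a cycle counter. */
    #include <stdint.h>
    #include <stdio.h>

    #define NPARAMS 64

    /* simulated EAROM: every write burns one of its limited cycles */
    static uint16_t earom[NPARAMS];
    static unsigned long write_cycles;

    static void nv_write(unsigned a, uint16_t v)
    {
        earom[a] = v;
        write_cycles++;
    }

    static uint16_t nv_read(unsigned a)
    {
        return earom[a];
    }

    /* shadow copy in ordinary RAM: all day-to-day access goes here */
    static uint16_t shadow[NPARAMS];

    void param_init(void)
    {
        for (unsigned i = 0; i < NPARAMS; i++)
            shadow[i] = nv_read(i);
    }

    uint16_t param_get(unsigned i)
    {
        return shadow[i];
    }

    void param_set(unsigned i, uint16_t v)
    {
        shadow[i] = v;               /* no wear on the EAROM here! */
    }

    /* called ONCE, from the power-fail warning or orderly shutdown */
    void commit_to_nv(void)
    {
        for (unsigned i = 0; i < NPARAMS; i++)
            nv_write(i, shadow[i]);
    }

    int main(void)
    {
        param_init();

        /* hammer the parameters all day -- only RAM is touched */
        for (int i = 0; i < 100000; i++)
            param_set(i % NPARAMS, (uint16_t)i);

        commit_to_nv();    /* power is failing: burn the image once */

        printf("NV write cycles used: %lu (not 100000!)\n",
               write_cycles);
        return 0;
    }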

As the amount of data increased, this time became prohibitively long.
(remember, you don't want to prematurely signal power FAILING as
that will result in an extra write cycle on the device!)  This
approach quickly runs out of steam.

So, you find another storage medium that gives you nonvolatility
with faster update times (e.g., BBSRAM -- battery-backed SRAM).  And,
when you exceed the capabilities/economies of that technology, move
on to yet another (e.g., slow-cycling, battery-backed DRAM).

In each case, you exploit/bastardize technologies to get the
performance you want in *spite* of what is offered.  :>

I can do a fair bit of tweaking to the update patterns by redefining
tables and views.  And, try to group things into appropriate
tablespaces.  But, I still need a medium that folks can easily
acquire and that has long-term reliability ("months" just doesn't
cut it!).

Of course, it also has to be affordable!  :>

--don



