[Tfug] Small-ish (capacity + size) disk alternatives

Bexley Hall bexley401 at yahoo.com
Wed Jan 30 00:25:19 MST 2013


Hi John,

On 1/29/2013 8:48 PM, John Hubbard wrote:
> On 1/28/2013 10:46 PM, Bexley Hall wrote:
>> [snip]
>> SSD's would probably fail in short order due to the
>> inherent media life limitations (the read-write uses for
>> the disk involve lots of update cycles).
>
> Can you define "lots of update cycles"?

Replace your swap partition/tmp filesystem with an SSD.  Then,
throttle back the amount of RAM in your system until you just
start to *notice* performance consequences.

> There is a lot of fear around SSD lifespans that seems to be unfounded.
> This forum [1] looks at some of these issues. In most cases a 40~64 GB
> drive allows a total of hundreds of terabytes of data to be written
> before failing. One contributor reported writing 685TB of data to a 40GB
> Intel 320 before it failed. How much data are you going to write?!?

Apples and oranges.  If you look at a drive as a homogeneous storage
medium that is randomly accessed (i.e., no "precious" data), then you can
approach the longevity of the underlying media (though even this
continues to be threatened as we move to tri-level flash and higher
densities).

But, when you take some portion of the medium *out* of the equation,
it has a big impact on longevity.  E.g., any data that is seldom
(or never) rewritten "wastes" the endurance available in those
cells.  Sort of like having a rewritable CD/DVD medium and only
rewriting *part* of it (repeatedly).

Next, when you are treating disk as "memory", *any* write gets
magnified to the size of the allocation unit containing that
datum.

E.g., if I want to toggle a bit, I have to modify the entire *byte*.
When it comes to disks, a byte change means a *sector* change.

With FLASH (esp. NAND), the "allocation unit" isn't *just* a "few
hundred bytes".

You *write* to a flash in units of "pages" (typ. 512, 2K, 4K bytes).
However, *erasing* is done in units of *blocks* (typ. 32, 64, 128
*pages*... i.e., 16KB to 512KB!!).

But, we started out talking about a *bit*!  ("Remember Alice?
This is a song about Alice.")  Suddenly, we're manipulating
hundreds of thousands of bits for the sake of *one* that has
changed!
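
To put rough numbers on it, here's a little back-of-the-envelope
sketch in C (the page/block geometry is just one plausible pick from
the ranges above, not any particular part):

    #include <stdio.h>

    int main(void)
    {
        const unsigned page_bytes      = 4096;  /* one program ("page") unit */
        const unsigned pages_per_block = 128;   /* one erase ("block") unit  */
        const unsigned block_bytes     = page_bytes * pages_per_block;

        /* Bits that end up rewritten to change ONE logical bit, if the
         * whole block has to be erased and reprogrammed in place: */
        unsigned long long bits_touched = (unsigned long long)block_bytes * 8;

        printf("erase block size: %u KB\n", block_bytes / 1024);
        printf("bits rewritten for a 1-bit change (worst case): %llu\n",
               bits_touched);
        return 0;
    }

i.e., a 512KB erase block means some four *million* bits get
rewritten, worst case, for the sake of that one changed bit.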

Of course, you typically don't go poking at single bits in memory!
<grin>

But, in most operating environments, you have very little say
over how your data is stored, *where* it is stored, *when* it
is *actually* updated, etc.

If you know your data will be backed with NAND FLASH, you would
organize your data so that entire "pages" tended to change at
the same time.  E.g., 4000 bits in a contiguous 512B page -- and
definitely not one bit in each of 4000 RANDOMLY DISTRIBUTED pages!

Additionally, you consider *how* the data is represented.  E.g.,
instead of representing a counter (that *tends* to be used in a
monotonically increasing fashion) with the count sequence:
    1 (0001)
    2 (0010)
    3 (0011)
    4 (0100)
    5 (0101)
    6 (0110)
    7 (0111)
    8 (1000)
    ...
you would, instead, opt for a sequence more like:
    1 (0000)
    2 (0001)
    3 (0011)
    4 (0111)
    5 (1111)
    6 (1000)
    7 (1001)
    8 (1011)
    ...
because the first sequence requires an "erase" operation (or a data
migration) to proceed from 1 to 2, 3 to 4, 5 to 6, 7 to 8...  By
contrast, the second sequence only requires an erase at the 5 to 6
transition!

(handwaving:  assuming ones are writable but zeroes are only available
by erasure.  The opposite can be the case -- depending on "where"
you are looking at the data)
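
If you want to see the difference mechanically, here's a trivial
sketch (4-bit codes, and the "handwaving" convention above that a
0->1 change can be programmed in place while any 1->0 change needs
an erase):

    #include <stdio.h>
    #include <stdbool.h>

    /* An erase is needed iff some bit has to go from 1 back to 0, i.e.
     * the old code has a bit set that the new code clears. */
    static bool needs_erase(unsigned from, unsigned to)
    {
        return (from & ~to) != 0;
    }

    int main(void)
    {
        /* plain binary count vs. the erase-sparing sequence above,
         * both representing the counts 1..8 */
        unsigned binary[]  = { 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8 };
        unsigned sparing[] = { 0x0, 0x1, 0x3, 0x7, 0xF, 0x8, 0x9, 0xB };

        for (int i = 1; i < 8; i++) {
            printf("count %d->%d: binary %-7s  sparing %s\n", i, i + 1,
                   needs_erase(binary[i - 1],  binary[i])  ? "ERASE" : "program",
                   needs_erase(sparing[i - 1], sparing[i]) ? "ERASE" : "program");
        }
        return 0;
    }

Run it and the binary column shows an ERASE on every other increment;
the erase-sparing column shows exactly one (at the 5 to 6 transition).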

Each erase reduces endurance.  And eats up lots of "time" (making
the block containing those bits unavailable for the duration!)

In most programming environments, you have absolutely *zero* chance of
exercising this sort of control over your application's robustness and
performance (Shirley none with a 'U' and an 'X' in their name!  :> )

E.g., imagine you are implementing a database that tracks employees:
names, SSN's, addresses, pay, attendance/vacation, etc.  You create
a record for each employee.  Pretend it fits in a single page.

Each week, that record needs to be updated to reflect the wages-to-date
(at the very least).  The entire page is updated even though only
one datum on the page has changed (i.e., the employee's *name* hasn't!
Nor has his SSN, address, department, etc.).  *And*, the pages for
every *other* employee are also updated!  Each page takes a "hit".

If some other application later comes through to update the "vacation
accumulated" entry, then the page gets hit a *second* time.  Again,
this applies to all employees (i.e., each page)!

OTOH, if you reorganize the data store so that the "less frequently"
changed items are maintained separately from the "more frequently"
changed items, then a trivial caching strategy can often group the
"wages updates" for multiple employees into a single "page update".

In that case, fewer pages "wear" each week.  The medium appears to
last longer.
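
As a purely hypothetical illustration of that split (field names and
sizes made up for the example, not anything from a real schema):

    #include <stdint.h>

    struct employee_static {        /* written once, rarely touched again */
        char     name[64];
        char     ssn[12];
        char     address[128];
        uint16_t department;
    };

    struct employee_hot {           /* rewritten every pay period          */
        uint32_t employee_id;       /* key back into the static record     */
        uint32_t wages_to_date_cents;
        uint16_t vacation_hours;
    };

    /* With 4KB pages, several hundred employee_hot records pack into
     * one page, so the weekly "wages" pass rewrites a handful of pages
     * instead of one page per employee. */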

> Beyond the endurance that you get out of a stock drive, as someone else
> mentioned, more spare area (via empty space or over-provisioning) -->
> less write amplification --> longer life span. Additionally enterprise
> class drives also offer better endurance (in many cases this is done, or
> at least helped, by greater over-provisioning).

Many drives try to anticipate the "desktop/server" environments that
they are deployed in.  E.g., a database server will tend to deal with
data in 8KB chunks.  A "PC" typically deals with "load once"
applications where the executables represent a large part of
the medium (compared to, e.g., a DBMS).

You can also play games with *how* you map the pages and blocks
to the allocation units (e.g., sectors) that the user will see.
E.g., for performance, you want to match page size with whatever
the user tends to manipulate (since you can load up a *full* page
and then write it in one operation).  If the user's allocation
unit is a multiple of page size, you can ensure that "adjacent"
pages (adjacent from the user's perspective -- based on your
ASSUMPTIONS about how the drive will be formatted) reside in
different FLASH devices handled by different *controllers*.
So, you take the user's updated allocation unit and distribute
it to the N different controllers which can all update their
respective "chips" concurrently.

And, you can implement a mixed media solution whereby you
include a "RAM cache" in the SSD (which, in many cases, may actually
reside inside the FLASH "chips" themselves) so you eliminate the need
to update the "backing store" for every write.

[You could even include BBSRAM and/or NOR FLASH if you can identify
usage patterns (for your market) that can exploit it]
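
Here's a deliberately over-simplified sketch of that caching idea
(direct-mapped, no power-fail handling, nothing a real SSD controller
would ship) just to show why it saves flash programs:

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    #define PAGE_SIZE   4096u
    #define CACHE_PAGES 8u

    struct cache_line {
        unsigned long long page_no;          /* which flash page is cached */
        bool               valid, dirty;
        unsigned char      data[PAGE_SIZE];
    };

    static struct cache_line cache[CACHE_PAGES];
    static unsigned long long flash_writes;  /* counts actual page programs */

    /* Stand-in for the real flash program operation. */
    static void flash_program(unsigned long long page_no,
                              const unsigned char *data)
    {
        (void)page_no; (void)data;
        flash_writes++;
    }

    /* Write one byte "through" the cache; direct-mapped for brevity. */
    static void cached_write(unsigned long long byte_off, unsigned char value)
    {
        unsigned long long page_no = byte_off / PAGE_SIZE;
        struct cache_line *line = &cache[page_no % CACHE_PAGES];

        if (line->valid && line->page_no != page_no) {   /* evict */
            if (line->dirty)
                flash_program(line->page_no, line->data);
            line->valid = false;
        }
        if (!line->valid) {                              /* (re)fill */
            memset(line->data, 0xFF, PAGE_SIZE);         /* pretend read */
            line->page_no = page_no;
            line->valid = true;
            line->dirty = false;
        }
        line->data[byte_off % PAGE_SIZE] = value;
        line->dirty = true;
    }

    int main(void)
    {
        /* 1000 one-byte updates that all land in the same page... */
        for (unsigned i = 0; i < 1000; i++)
            cached_write(123, (unsigned char)i);
        printf("flash page programs so far: %llu\n", flash_writes); /* 0 */
        return 0;
    }

A thousand small updates to the same logical page cost *zero* flash
programs until the cached page is finally evicted (or flushed).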

But, real drives are far from "tuned" to their environments.  Rather,
they just try to be "not as dumb" as a naive implementation might be.

> I personally have SSDs in all of my personal machines: Intel X25-m
> (40GB), OCZ Solid (30GB), OCZ Vertex-Turbo (30GB), Intel 520 (120GB),
> Intel 330, and a 4GiB who-knows-what in my Eee PC 701 (4GB). And I'll
> admit that I don't write heavily to these systems (my worst offense is
> probably the new 20+MB firefox-nightly package that I update daily),
> but I have had no problems with any of them.

As I said above, try moving your swap onto that medium (and downsizing
the RAM so you *don't* have a "surplus" of the latter -- i.e., need to
hit on swap pretty regularly).

[Hint:  *don't* -- unless you want to replace the media in the next
few months]

> [1]
> http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm




