[Tfug] RDBMS reprise

Bexley Hall bexley401 at yahoo.com
Tue Jan 22 12:02:11 MST 2008


Hi, Judd,

--- Judd Pickell <pickell at gmail.com> wrote:

> In your original email I would typically choose 2,
> but I have to wonder. Why is the data set so
> sensitive to being changed? Typical application I
> would think would prefer to have as much updated
> information as is possible. Though I do suppose
> there is a situation or two where a change in the
> dataset would be bad if the app was meant
> to allow correction of the dataset. I don't
> suppose there are more clues to this mystery than
> you have given so far?

I'm looking for a *general* solution -- I don't want
to have to adopt a different strategy for each
dataset/query.

A PDA is a good model to look at:

You look through your "address book" and see a
list of names (because, among other things, the
screen is too small to show more than a list of
names at the "top level")

This list would be obtained by issuing a query
like: SELECT first, last FROM people

Then, the user *picks* a name that he is interested
in -- "John Doe".  Now, the list of names is
discarded (freeing the memory that it consumed)
and a new query is issued:

SELECT address, city, state, zip FROM people
WHERE first="John" AND last="Doe"

But, you (the application designer) have no idea
how long it will be between the time the first
query is issued (to build the list of names) and
the *second* query (to request more details on
a particular name).  E.g., maybe a *month* will
pass (this, of course, is silly in this particular
example; but the relative time frames can vary
depending on each application and dataset involved)

What if, in that intervening month, "John Doe" has
been purged from the dataset?  Or, what if it was
"*JANE* Doe" and she has married in the intervening
time (now known as "Jane Buck")?

So, the second query fails.

If, OTOH, the first query could gather up all of the
information that the user might want and just 
*filter* what it presents to the user at each level
in the application, then there is no possibility
of the dataset becoming inconsistent between
queries.
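A minimal sketch of that fetch-once-then-filter idea (using Python's
sqlite3 and a made-up "people" table, purely for illustration -- the
point is that ONE query caches everything, and each UI "level" just
filters the cached snapshot):

```python
import sqlite3

# Build a throwaway in-memory dataset (hypothetical "people" table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (first, last, address, city, state, zip)")
db.execute("INSERT INTO people VALUES "
           "('John','Doe','1 Main St','Tucson','AZ','85701')")
db.execute("INSERT INTO people VALUES "
           "('Jane','Doe','2 Elm St','Tucson','AZ','85702')")

# ONE query grabs everything the user could ever drill down into...
snapshot = db.execute(
    "SELECT first, last, address, city, state, zip FROM people").fetchall()

# ...then each "level" of the UI filters the cached snapshot.
names = [(r[0], r[1]) for r in snapshot]          # top-level name list
detail = [r[2:] for r in snapshot
          if (r[0], r[1]) == ("John", "Doe")]     # after the user picks

# Even if the row is purged from the database later, the detail
# lookup still succeeds -- it never re-queries the live dataset.
db.execute("DELETE FROM people WHERE first='John'")
assert detail == [("1 Main St", "Tucson", "AZ", "85701")]
```

Of course, this trades the consistency problem for a memory (and
staleness) problem -- which is the whole dilemma.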

There are a variety of other readers-writers type
problems inherent in any such design when multiple
clients/agents are involved.  There is no "perfect"
solution but you want to come up with a solution
that:
- is relatively easy to implement ROBUSTLY
- can be implemented consistently without
  foreknowledge of the nature of the dataset
  or the query
- results in the "least surprise" for the user

For a more concrete example, imagine storing DHCP
leases in the database.  What if the lease expires
in the time between query #1 and query #2 -- and,
a different client comes along in the intervening
timeframe and acquires the IP assigned to the
original host?

In some applications, you can come up with a
cheating way to work-around this race.  For example,
when decoding barcodes, I need to know the time
between successive black and white edges.  That is,
the wand produces a single bit output -- "black"
vs. "white" -- and I time the distance between
each contiguous pair of transitions (W->B, B->W).

These time intervals can be *very* short.  Some
bars (bars are white or black -- i.e. the "spaces"
between bars are really bars, too!) may be 0.007"
wide.  And, if the user is scanning them at 100"/sec
(which really is not very fast -- try it!) you
are looking at a time of tens of microseconds.
Add to this the fact that ink typically spreads
when the bars are printed (this is addressed in
barcode printing standards) which can thus reduce
the width of the "spaces" between them.
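Putting numbers on that (my arithmetic, not from any barcode spec):

```python
# Width of the narrowest bar (inches) and the wand speed (inches/sec)
bar_width = 0.007
scan_speed = 100.0

# Time the wand spends crossing that bar, in microseconds
edge_interval_us = bar_width / scan_speed * 1e6
print(edge_interval_us)   # 70.0 -- i.e., "tens of microseconds"
```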

As a result, you have to clock the counter (timer)
reasonably quickly to get fine enough resolution.
With inexpensive (read: "dirt cheap") hardware,
you often only have an 8bit counter/timer to
play with.  Clearly, you can't let the counter
overflow and generate an interrupt every < 250usec
(this would eat up all the processing power in a
small processor -- just "implementing the carry out"
of the timer).

So, you cascade two 8bit counter/timers.  But, you
have to read them in two separate accesses!

What if the timer overflows (carries out) between
reads?

A naive approach is:
- read one byte of timer
- read other byte
- reread first byte
- compare first byte read with its "reread" value
- if different, repeat

But, this is silly as:
- if you opt to read the LOW byte of the timer
  *first*, chances are, it *will* have changed
  on the reread!
- the time spent reading the timers is not bounded
  (if there is a difference between the two reads,
  then you spend more time RE-rereading)
- you can get *stuck* doing this forever (i.e. if
  the timer keeps overflowing -- which is possible
  if you allow interrupts to occur while this is
  happening)

A simple solution is:
- read HIGH byte of timers
- read LOW byte of timers
- reread HIGH byte
- if HIGH byte has changed, use the second value and
  pretend the LOW byte is "00" (this assumes the
  counter counts *up*)
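That high/low/high sequence can be sketched in Python -- a simulation
of the two 8-bit bus accesses, purely illustrative (real code would be
a handful of port reads in C or assembly):

```python
def read_cascaded_timer(read_high, read_low):
    """Assemble a 16-bit count from two 8-bit registers that
    may carry between the (separate) bus accesses.
    read_high/read_low model those two accesses."""
    high1 = read_high()
    low = read_low()
    high2 = read_high()
    if high1 != high2:
        # A carry happened mid-read: since the timer counts up,
        # the true LOW byte was near 0x00 when HIGH rolled over.
        return (high2 << 8) | 0x00
    return (high1 << 8) | low

# Simulate a timer that carries out between our accesses:
reads = iter([0x12,    # first HIGH read
              0xFF,    # LOW read (about to overflow...)
              0x13])   # reread HIGH: it changed!
bus = lambda: next(reads)
assert read_cascaded_timer(bus, bus) == 0x1300
```

Note the loop-free, bounded execution time -- exactly three reads,
every time -- unlike the naive reread-until-stable approach.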

You can make similar assumptions for some of the
datasets that you may encounter -- but not all.
So, you don't want to have to come up with a
clever *hack* for each particular query, etc.
I don't like having to solve the same problem
twice!  :>

--don


