[Tfug] RDBMS reprise

Tue Jan 22 09:54:11 MST 2008

Argh!  Sorry to revisit this issue but I keep
thinking myself in circles... :<

Recall I am trying to work with a (severely)
resource-starved client and databases of
potentially large size.  The problem is
handling queries which may return a large
number of results.

There are several strategies -- each with a
different set of tradeoffs/problems:

1* Client issues query and pulls down "all"
results.

This is the *ideal* solution as the snapshot
it provides of the data is known to be self
consistent (i.e., query is atomic operation).

The server-side resource requirments are
transient -- the data is not duplicated.  But,
the resource requirements on the client are
boundless.

2* Client issues query and uses server-side
cursor to step through the results.

This shifts the resource load to the server.
It runs the risk of requiring the entire
dataset to be duplicated on the server
(e.g., "SELECT *").  However, the results
are always self-consistent.

Client-side resource requirements are at
their minimum, here (?).  However, some
additional overhead is required to manipulate
the cursor.

3* Client issues query and pulls down some
number of results each time -- reissuing
the query until all results have been seen.

Server-side resources are reduced by taking
smaller "bites" instead of an entire snapshot.

This bounds client-side resource requirements
at the expense of some overhead in accessing
the dataset.

But, the data may contain inconsistencies as
the dataset -- as a whole -- is not accessed
atomicly (i.e., it can change between "bites").

4* Client locks the table(s) on the server
and accesses some number of results at a time.

This bounds server- and client-side resources
at the expense of locking the data for an
indeterminate length of time.

Inconsistencies are not a problem as the
query is effectively atomic (since dataset
is locked for the duration).

1* is out -- it won't work in my environment
due to resource limitations on the client.

3* seems to be the worst of both worlds -- it
is harder to implement and requires concious
handling of potential inconsistencies in the
data (in a way which depends on the nature of
the *specific* data -- i.e., inconsistencies
in one set of query results may need to be
handled differently than inconsistencies in
a different query's results... you can't
encapsulate/standardize this handling!)

4* is downright scary -- unless the compute-time
of the client can be bounded and *guaranteed*,
it seems like it is just one giant deadlock
waiting to happen!  (since other applications
will be accessing some or all of the dataset
concurrently).

(sigh) This sucks!  But, I guess it is the only
practical solution to my situation (2*)?  Have
I missed any other approaches?

*If* I adopt this approach, what do I have to
watch out for?

No doubt I have to balance the resources available
on the server against the possible concurrent
demands that can be made on it.  Clients will
have to treat query results as precious and
dispose of them as soon as practical (since holding
onto them ties up resources on the server that
impacts other clients).

Are there any tricks I can use to hint to the
server that it can *begin* reusing these resources?
E.g., if I always walk the cursor through the
result set from top to bottom, can the server
exploit this fact to release the resources used
by result records that I have already examined?

Thx,
--don

      ____________________________________________________________________________________
Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ