[Tfug] "Ordering" tuples in a table

Mon Oct 19 23:47:28 MST 2009

Hi Terence,

>> In particular, I am trying to figure out how to
>> adapt to the fact that tuples in the database are
>> not ordered in the same sense that I can explicitly
>> order them in a TEXT (or DATA) segment within my code.
>> (yes, I know you can order them "on output" but the
>> data themselves are not really "ordered" within the
>> table -- forget indices, etc)
>> 
>> E.g., the RDBMS doesn't see any difference in the
>> following two lists:
>>   Tom Martin
>>   Bob Jones
>>   Fred Williams
>> 
>> vs.
>> 
>>   Fred Williams
>>   Tom Martin
>>   Bob Jones
>> 
>> And, this is "A Good Thing"!  :-/
> 
> I think you miss the point,  The RDBMS holds the data, not
> the information.  An order implies information whicj is not
> in the realm of data storage.   In a broad definition of
> RDBMS the order of a set is indeterminate.    

That was *exactly* the point I was making!  The RDBMS *can't*
preserve the order (unless I artificially impose that with a
special field, etc. that *I* "interpret* as "order")

However, I am sure that other real world problems make 
assumptions regarding the order/presence of certain tuples 
*in* the database.  My question is:  how do you handle this
in those "other" applications?

> But, when building table driven code, you often
> implicitly (or explicitly) apply some ordering to
> the contents of your tables!  It allows you to
> later exploit that ordering to simplify the
> algorithms that are *driven* from those tables!
> 
> In one such table, I *guarantee* circular definitions
> can't exist by imposing the requirement that anything
> referenced by a definition must already be defined
> (i.e., must exist "earlier" in the table).  So, I
> can have:
>   x = 3
>   ...
>   y = x + 2
> but this rule prevents:
>   x = y - 2
>   ...
>   y = 5
> (i.e., because, at the time x is defined, y does not
> yet have a definition).  Otherwise, I could create
> anomalies inadvertently like:
>   x = y - 2
>   ...
>   y = X + 2
> 
> I can force new entries to the (RDBMS) table to observe this
> rule by tying some code to the INSERT method such that any
> references the code can't resolve at INSERT-time are
> rejected.
> 
> However, if the table is eventually *dumped*, there is
> no guarantee that rebuilding the table will cause the
> tuples to be re-INSERTed in the same order that was
> 
> imposed on the original INSERTions.
> If the INSERT order of the data effects
> the information then the model is wrong.  
> 
> 
> 
> 
> So, how do you do this?  Consider a record for a
> 
> "person" having fields that specify name,
> birthdate, etc.
> 
> In addition, fields that REFERENCE *other* records
> 
> defining the "mother" and "father" for
> that "person".
> 
> I.e., how do you refer to a mother who's record
> doesn't
> 
> yet exist??
> A CHILD cannot exits before it's PARENT conceived it.

Yes, but there is nothing that prevents the data for the
child from being entered into the database *before* the
data for its parent (other than my bogus INSERT method).

> Some times in data acquisition The
> CHILD is entered before it's parent, Thus deferred
> constraints.    

But, how do you ensure those constraints are *eventually* satisfied?
I.e., how can you rollback the offending INSERTs (i.e., remove those
now BOGUS tuples) when you "later" discover that key fields are
missing/incorrect?  Especially when there is never really a "commit"
point in the logic?

E.g., when restoring a dump, you can turn off these constraints
while loading the data.  There is then a point "at the end"
when you essentially say, "OK, that's everything!" and it *must*
be everything.

Returning to the parent/child example, how (other than the hacked
INSERT method I mentioned) do you force the *user* to specify the
data in a way that follows these rules?

>> (I figure *this* sort of thing has to come up "in the real world"!?)
>
> It does come up it.  I have seen in
> countless DBs I have had to 'fix' , but it has never
> been an issue in any DB I have designed, at least the ones I
> have had full information of the data.   There in lies the
> rub,  can you know enough about the data before you start
> the design.

Look at my two examples.  ("parent/child" and "X/Y").
I don't see any way of preventing this problem from coming up
without putting tests in the INSERT (and DROP -- think about it)
methods.