[Tfug] Database file import/consolidation

Thu Jul 15 16:41:22 MST 2004

First,  Paul's suggestion may be the easiest to implement.  But what you
are doing sounds like an UPSERT.  A feature new to SQL and in some
current db.  The SQL command is MERGE and has two branches one for the
insert of new  the other for update of existing records.  A simplefied
versio of the syntax is

MERGE INTO TABLE USING table/subquery ON  ( condition ) 
WHEN MATCHED update stmt
WHEN NOT MATCH insert stmt

Without MERGE as a SQL option I would do 

have the old table called REALDATA.  Load the new data into STAGE.  The
do

INSERT INTO realdata  
  ( SELECT * FROM stage  
         MINUS 
     SELECT * FROM realdata ) 

This will only work if the existing data if contained in the new
dataset.  Depending on the db this could take seconds or days YMMV. 
Both table should have simular indexs on them for as close to a PK as
you can get. 

TR 

On Thu, 2004-07-15 at 16:04, Paul Scott wrote:
> Choprboy wrote:
> 
> >Anybody doing regular imports into a database of a large set of partially 
> >updated data? I.e. you get an updated file regularly and need to accumulate 
> >the changes in the DB?
> >
> >I'm talking about large data sets here, ~500,000+ records... How do you do 
> >efficent updates (additions really) of the DB when there is no unique key 
> >across the rows (or a practical way to make one)???
> >
> >At the moment, to process updates, I have to take the current 500,000+ row 
> >table and copy/index all the data it into a second temporary table. Then 
> >import the 500,000+ rows of the update file (more than 95% of which is the 
> >same) into the temporary table. Then do a SELECT DISTINCT() across the entire  
> >secondary temp table and insert/index the results into a third table... After 
> >I'm done, I can then quickly R/W lock the original table, drop it, and swap 
> >in my newly created/consolidated table... But it obviously takes a while and 
> >severely hammers the server to be building/rebuilding/destroying multiple 
> >500,000+ row tables to integrate new updates.
> >
> >Is there an easier way to do this? Or am I out of luck?
> >  
> >
> Is the whole database ~500,000 records?  Possibly export the data and 
> remove the duplicate records with command line tools and/or scripts and 
> the then just update the actually changed records.  It would probably be 
> faster.
> 
> Paul Scott
> 
> _______________________________________________
> tfug mailing list
> tfug at tfug.org
> https://www.tfug.org/mailman/listinfo/tfug
>