[Tfug] Database file import/consolidation

Fri Jul 16 10:01:31 MST 2004

Hi Adrian,

This is a difficult situation.

I have several thoughts.  

Given a limited understanding of your data, I would suggest you define a set of rules to deal with this situation.

The other input I would give is, the built in functions are great for small record sets however, from my experience, they tend to be so generic that they are resourse intensive.  Given this I suggest writing your own script to process this data.  MySql has a C API that you can use, or you can use PERL.

-Using the data to be imported create a table.  
-Index that table using several fields that you find will allow for the greatest probability to determine the record can be matched (or unique) to one already in the system - I would use a master customer record (also like indexed) to match them.  Now all you need is a do loop with seek.
-As you process these records and find them to match I would fill in the unique key feild.
-Once complete you have two sets of data in the import table - the records that have a key and the ones who do not.
-You can import the matched records and go into a second process to try and id the umnatched records.
-To remove duplicate records from your import table all you have to do is a do while not end of file loop checking to see if the key field is populated and matches the prior record.  If so delete.

>From my experience if you build your own code your process will be both faster and will take up much less resources.  I have found (using another data engine) that SELECT DISTINCT is rather slow.

Given my understanding of your situation I would think a scripted process would take 5 to 30 minutes on a computer with >500mhz cpu/128mb ram/7200rpm drive.

I built a system a number of years ago that would process 4.5G of data in about 20 tables with the largest file containing (as I recall) over a million records.  It took about 4 to 8 hours to pull the data from a Novell server (depending on load) and about 6 hours to process.  The amazing thing was I was using a discarded 100mhz / 64M computer.  The cool part was I was able get an external ulta wide fast scsi HD (9.1G) which was very fast.  This was 1997.

I hope this helps,

Keith Smith

Choprboy <choprboy at dakotacom.net> wrote:
On Thursday 15 July 2004 18:56, keith smith wrote:
> Adrian,
> 
> Do you assign a unique key to each customer? Are they customer numbers?
> 

Well, yes and no. In other tables, each customer has a unique key and there is 
only one instance of each user (actually, there are multiple copies of the 
same customer, to be fixed later, but they are considered unique customers).

But I have to check our customers being processed for matches against this 
table provided by an external 3rd party, which may contain multiple records 
for the same customer, has no unique keys, and I can not get them to provide 
a "uniqueness". Add to that, I have to do "fuzzy-ish" query against it to 
catch name/etc. variations between the local and 3rd party tables.

Adrian
_______________________________________________
tfug mailing list
tfug at tfug.org
https://www.tfug.org/mailman/listinfo/tfug

---------------------------------
Do you Yahoo!?
Yahoo! Mail is new and improved - Check it out!