[Tfug] OT: Fairly simple relational database quick job bids...

Jim March 1.jim.march at gmail.com
Fri Oct 1 13:42:26 MST 2010


Zack,

Will this take into account entries in the main (larger) data that
don't have any corresponding "event entries"?  Will it also take into
account the "zero padding" present in the ID number of one of the
files?

If it will, cool, I'll try it :).
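
To make the two questions concrete, here is a rough sketch of what handling both cases might look like. It is only an illustration under assumptions, not Zack's script: the sub names (normalize_id, index_events, join_people), the sample records, and the position of the ID column are all made up for the example, and it splits on bare commas, so quoted fields containing commas would need a real CSV parser such as Text::CSV.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: strip leading zeros so "00123456" and "123456" match.
sub normalize_id {
    my ($id) = @_;
    $id =~ s/^\s+|\s+$//g;    # trim stray whitespace
    $id =~ s/^0+(?=\d)//;     # drop leading zeros, but keep a lone "0"
    return $id;
}

# Build a hash of normalized ID => list of event numbers from event lines.
sub index_events {
    my (@lines) = @_;
    my %events;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        my ($id, $event) = split /,/, $line;
        push @{ $events{ normalize_id($id) } }, $event;
    }
    return \%events;
}

# Append each person's events to their record. People with no event
# entries are still emitted, unchanged (a "left join" on the ID).
sub join_people {
    my ($events, $id_col, @people) = @_;
    my @out;
    for my $line (@people) {
        chomp $line;
        next unless length $line;
        my @fields = split /,/, $line, -1;
        my $list = $events->{ normalize_id($fields[$id_col]) } || [];
        push @out, join(",", $line, @$list);
    }
    return @out;
}

# Tiny in-memory demonstration with invented records (ID assumed in column 0).
my $events = index_events("123456,2", "123456,5", "1234543,7");
print "$_\n"
    for join_people($events, 0, "00123456,Alice", "1234543,Bob", "00000042,Carol");
```

In this sketch the padded ID picks up its events and the record with no events is still printed. On the real files the event hash would have to fit in memory, which seems plausible at roughly 100 megs but is untested.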

Jim

On Fri, Oct 1, 2010 at 12:35 PM, Zack Williams <zdwzdw at gmail.com> wrote:
> On Fri, Oct 1, 2010 at 10:48 AM, Jim March <1.jim.march at gmail.com> wrote:
>> Folks,
>>
>> I need a contractor for what we think is a fairly simple relational
>> database problem.
>>
>> What he's got is "raw data" for somebody to chew on.
>>
>> He has two .CSV files.
>>
>> The first is 112megs, listing people and details about them as a
>> single line (record).  There's a unique ID number in one field.
>>
>> The second is 100megs, with each record only three fields long.  The
>> first field contains the unique ID number, the second contains a
>> number for an event they participated in, the third is not really
>> relevant.  For each person there will be several lines (records).
>>
>> Example:
>>
>> 1234543,2
>> 1234543,5
>> 1234543,6
>> 1234543,7
>>
>> This tells us that voter 1234543 was involved in events 2, 5, 6 and 7.
>>
>> One complication: it's common for the data on events to contain an ID
>> number of "123456" where the main data on the people lists the ID
>> number as "00123456".  That'll have to be parsed somehow.
>
> Here you go:
>
> --
> #!/usr/bin/env perl
> use strict;
> use warnings;
> use diagnostics;
>
> # this is the name of the input file
> my $infile = "votes.csv";
>
> # sort the contents of the file numerically
> `sort -n $infile > $infile.sorted`;
>
> # open the sorted file read-only with a lexical filehandle
> open(my $in, "<", "$infile.sorted") or die "couldn't open input file: $!";
>
> my $current_voter = 0;
>
> while(<$in>){
>    chomp; # get rid of the newline
>    my ($voter,$vote)= split(","); # split on the comma
>    # if this isn't the same voter as the last line (numeric
>    # comparison), print a newline and new voter id
>    if($voter != $current_voter) {
>        print "\n$voter,";
>    }
>    print "$vote,"; # print the vote
>    $current_voter = $voter;
> }
> print "\n";
> ---
>
> - Zack
>
> _______________________________________________
> Tucson Free Unix Group - tfug at tfug.org
> Subscription Options:
> http://www.tfug.org/mailman/listinfo/tfug_tfug.org
>



