[Tfug] text file help

Choprboy choprboy at dakotacom.net
Thu Jun 26 10:07:40 MST 2008


On Thursday 26 June 2008 06:25, christopher floess wrote:
> Alright, I imagine this to be at least in part a sed-type problem. I'd
> like to try to figure
> it out on my own, but I'm on a time crunch here. I've got a 1300 page
> document that needs to be parsed and edited.
[snip]
>The problem is that if a location has 30 of one
> thing, that thing is
> listed 30 times.
[snip]
> 883230063751         BELT A REVERSIBLE        /RED     /L      402008
>     1       13.21       13.21
> 883230063805         BELT A REVERSIBLE        /RED/NVY /XL     402008
>     1       13.41       13.41
> 883230063805         BELT A REVERSIBLE        /RED/NVY /XL     402008
>     1       13.41       13.41
> 883230063751         BELT A REVERSIBLE        /RED     /L      402008
>     1       13.21       13.21
> 883230063751         BELT A REVERSIBLE        /RED     /L      402008
>     1       13.21       13.21
> 883230063751         BELT A REVERSIBLE        /RED     /L      402008
>     1       13.21       13.21
[snip]
> here you can see what's in fixture 122, area 13 and you can see item
> #883230063751
> is listed 6 times and in fact, the 6th column on any one of those
> lines is the quantity

So... this could be VERY simple to fix. How consistent is the data? If it is 
well formatted, you could just do a sort and uniq count. (Even if it isnt 
completely, it will clean it up and make the differences apparent.) Using 
your snippet of data:
[ajensen at tatertot ~]$ sort <inventory.txt |uniq -c
      1 883230063737         BELT A REVERSIBLE        /RED     /S      402008    
6       13.21       79.27
      1 883230063744         BELT A REVERSIBLE        /RED/NVY /M      402008    
6       13.41       80.48
      8 883230063751         BELT A REVERSIBLE        /RED     /L      402008    
1       13.21       13.21
      3 883230063805         BELT A REVERSIBLE        /RED/NVY /XL     402008    
1       13.41       13.41
      1 883230063805         BELT A REVERSIBLE        /RED/NVY /XL     402008    
6       13.41       80.48

From your 14 lines of data you end up with 5 lines of results showing you have 
8 of SKU 883230063751 in location 402008  listed as a single quantity. Since 
it is sorted by SKU/Location/Quantity (assuming the desc field is the same) 
other quantity counts of the same SKU would be in sequential lines.

Simple first step reduction in the data to group and count all the same 
entries. Afterward add in a couple awk and sed commands, perhaps multiplying 
the uniq count and the qty fields as a total field and then reordering the 
columns, lather, rinse, repeat to continue consolidating the data.

Adrian




More information about the tfug mailing list