[Tfug] grep question [was find | grep]

Rich r-lists at studiosprocket.com
Fri Oct 24 08:23:22 MST 2008


Hang on a sec...

Maxim #644: Use of temporary files with sed is an admission of failure.

So: (tested)

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g'

I'll assume you understand the "find" you were given earlier. And  
there's more after the regex explanation.

Explanation of regex:
  s = search & replace
  / = beginning of search pattern
  [^=]*=\([^@]*@[^)]*\))[^=]* = search pattern
	[^=]* = any number (*) of characters that are not (^) an equals sign '='
	=     = the '=' sign -- opening delimiter for email addresses
	\([^@]*@[^)]*\) = the bit we want to keep
		\( = opens the remembered group
		\) = closes the remembered group
		[^@]* = any number (*) of characters that are not (^) an at sign '@'
		@ = the at sign
		[^)]* = any number (*) of characters that are not (^) a right parenthesis ')'
	) = the right parenthesis, which is the closing delimiter for email addresses
	[^=]* = again, any number (*) of characters that are not (^) an equals sign '='
  / = end of search pattern and beginning of replace pattern
  \1, = replace pattern
	\1 = first remembered string in search pattern
	, = the comma sign ','
  / = end of replace pattern
  g = 'global': do this s&r more than once per line

So, all in all, pretty straightforward. It doesn't even get into the  
advanced features of sed.
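
One thing worth checking against your own data: [^@]* is free to run
across ')' and '=', and anything after the last address on a line is
not consumed by the pattern. Traced by hand on lines shaped like the
two samples quoted further down, the posted substitution leaves some
residue. The sketch below compares it with a stricter variant that
first puts every field on its own line. The addresses and the function
names (sample, extract_posted, extract_strict) are made up for
illustration; this is one possible approach, not the only one.

```shell
#!/bin/sh
# Hypothetical sample lines, modelled on the two quoted later in the thread.
sample() {
  printf '%s\n' \
    '(145=user1@example.com)(146=user1)(14D=438f81ad)(147=2f)(A1' \
    '(152=Some Name)(142=user2@example.org)(153=43919ff2)(143=2e'
}

# The substitution exactly as posted above.
extract_posted() {
  sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g'
}

# Stricter sketch: turn every parenthesis into a newline so each
# "key=value" field sits on its own line, then print only the values
# that contain an '@', with a comma appended.
extract_strict() {
  tr '()' '\n\n' | sed -n 's/^[^=]*=\([^=]*@[^=]*\)$/\1,/p'
}

sample | extract_posted
# user1@example.com,=user1)(14D=438f81ad)(147=2f)(A1
# Some Name)(142=user2@example.org,=43919ff2)(143=2e

sample | extract_strict
# user1@example.com,
# user2@example.org,
```

The strict variant assumes that field values never contain parentheses
or an equals sign; if yours do, it will need adjusting.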

Caveats:
Currently, the one-liner preserves line breaks (one output line per  
input line). If you want to squash everything onto a single line, use  
xargs:

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' | xargs
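
Why this works: with no command given, xargs runs echo on whatever it
reads, so the lines come back out joined by single spaces. A minimal
sketch with made-up addresses (join_lines is just an illustrative
name):

```shell
#!/bin/sh
# Join stdin lines into one space-separated line.
# xargs with no arguments runs echo on its input by default.
join_lines() {
  xargs
}

# Hypothetical addresses, one per line, as the sed step would emit them.
printf '%s\n' 'a@example.com,' 'b@example.org,' | join_lines
# prints: a@example.com, b@example.org,
```

Note that xargs gives quotes and backslashes special meaning, so an
apostrophe in the input could trip it up.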

And there's a trailing comma. Squish:

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' | xargs \
	  | sed -e 's/,$//'
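
The order of the last two stages matters: 's/,$//' removes a comma only
at the end of a line, so it has to run after xargs has joined
everything. A small sketch with made-up addresses (strip_last_comma is
an illustrative name) shows both orders:

```shell
#!/bin/sh
# Remove a single trailing comma from each input line.
strip_last_comma() {
  sed -e 's/,$//'
}

# Join first, then strip: only the final comma goes.
printf '%s\n' 'a@example.com,' 'b@example.org,' | xargs | strip_last_comma
# prints: a@example.com, b@example.org

# Strip first, then join: every line ended in a comma, so all are lost.
printf '%s\n' 'a@example.com,' 'b@example.org,' | strip_last_comma | xargs
# prints: a@example.com b@example.org
```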

And it doesn't check for dups. Splat:

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' \
	  | sort -u | xargs \
	  | sed -e 's/,$//'

Only problem here is that when a line holds more than one email, sort  
still sees it as a single line, so duplicates among those addresses  
slip through. I'm pretty sure GNU sed allows you to insert a newline  
with \n, but I'm on a Mac without GNU sed, and it's a nice challenge  
to work around this:

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,=/g' \
	  | tr = '\n' | sort -u | xargs \
	  | sed -e 's/,$//'

So I added an equals sign to the sed replace string, as a placeholder  
for where I want the newline to be. The output is piped to tr to  
convert '=' into newlines. Then it goes to sort, and everything's  
nice again.
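
To see the placeholder trick in isolation, here is a small sketch fed
with made-up text shaped like the output of the sed step (two addresses
on the first line, one of them repeated on the second).
split_placeholder is an illustrative name. The trailing '=' on each
line becomes an empty line, which xargs ignores.

```shell
#!/bin/sh
# Turn every '=' placeholder into a newline.
split_placeholder() {
  tr = '\n'
}

# Hypothetical output of the sed step with ',=' after each address.
printf '%s\n' 'a@example.com,=b@example.org,=' 'b@example.org,=' \
  | split_placeholder | sort -u | xargs | sed -e 's/,$//'
# prints: a@example.com, b@example.org
```

This relies on '=' never appearing inside the text that sed keeps; any
stray '=' would also be turned into a line break.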

Now that we're happy with the output, we can push it into a file:

	find . -type f -name abook.mab -exec grep @ {} \; \
	  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,=/g' \
	  | tr = '\n' | sort -u | xargs \
	  | sed -e 's/,$//' > spam_list.csv

The best thing about one-liners is that you don't have to fit them on  
one line :-)

R.


On Oct 23, 2008, at 9:21 pm, Jeff Breadner wrote:

>
>>
>>
>> ...ummm, now that all the terms/lines containing '@' are in one  
>> file (which is fine in itself), is there any way I could extract  
>> the string of characters containing the '@' symbol, but only print  
>> those characters in the string between the '=' symbol and the ')'  
>> symbol???
>>
>> and output that into a nice text file with commas in between?????
>>
>> I suppose I can write my own script someday.
>>
>> :|
>>
>> Here are two sample lines of output:
>>
>> (145=lasalledre at aol.com)(146=lasalledre)(14D=438f81ad)(147=2f)(A1
>> (152=marc diMinno)(142=marcdiminno at hotmail.com)(153=43919ff2)(143=2e
>>
>> which would be perfect to see like this:
>>
>> ...
>>
>> lasalledre at aol.com,
>> marcdimminno at hotmail.com,
>>
>> ...
>>
>>
> Another one, this one might be more reliable for your specific  
> situation, but less portable to other files (should anyone else be  
> looking for a similar solution).  This one replaces the bracket and  
> equals signs with newline characters, then just greps out any line  
> that has an @ symbol in it:
>
> cat inputfile | tr \(=\) \\\n\\\n\\\n | grep @
>
>
> cheers
>  Jeff
>
> _______________________________________________
> Tucson Free Unix Group - tfug at tfug.org
> Subscription Options:
> http://www.tfug.org/mailman/listinfo/tfug_tfug.org




