
Perl Golf: was mail log filtering



Steve wrote:
> #!/usr/bin/perl
> @foo=<>; for my $bar (@foo) {
>  if ($bar =~ /sendmail\[\d+\]:\s+(\w+):\s+.*\<([^\>]+)\>.*User unknown$/) {
>   my $id=$1; my $to=$2;
>   for my $baz (grep { /:\s+$id:/ } @foo) {
>    if ($baz =~ /:\s+$id:\s+from=\<([^\>]+)\>.*relay=[^\[]*\[([\d\.]+)\]$/) {
>     print "from=<$1> to=<$to> relay=<$2>\n"; last;
> }}}}

Someone asked: Why is this so slow?

Anyone up for Perl Golf?

Here's my explanation: Steve's algorithm is slow because it operates in O(N^2)
time in the worst case. It slurps all N lines into memory, then greps over all
N lines again for every "User unknown" line it finds.

So we start with 2 cups of caching, sprinkle liberally with regex optimization,
and simmer for a couple of seconds.

Here's the commented version:
#!/usr/bin/perl
my %id; # Our cache of interest: queue ID => unknown recipient
while (<>) {
 chomp;
 my $line = $_; # Copy the line so the foreach below can reuse $_
 study $line; # We will be matching this string a lot
 # Inspect the line to match any ID we've already found
 my $id;
 foreach (keys %id) {next unless $line =~ m/:\s+$_:/i; $id=$_; last;}
 if ($id) {
  # Do these as separate regexen as .* leads to lots of backtracking
  my ($from)  = $line =~ m/\s+from=\<([^\>]+)\>/io;
  my ($relay) = $line =~ m/\s+relay=[^\[]*\[([\d\.]+)\]$/io;
  # Spit out what we found
  print "FROM=<$from> TO=<$id{$id}> RELAY=<$relay>\n";
 }
 # Is it even possible for there to be a new ID in this line?
 next unless m/User unknown$/o;
 # Should we add it to our cache of interest?
 next unless $line =~ /sendmail\[\d+\]:\s+(\w+):\s+.*\<([^\>]+)\>/o;
 $id{$1}=$2; # Create our cache
}
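
One further idea, offered only as a sketch and not something from Steve's
script or mine: rather than testing every cached ID against each line, pull
the queue ID out of the line once and use it directly as a hash key. The
helper name unknown_user_report and the %sender / %pending hashes are made up
for illustration, and the log layout is assumed to be whatever the regexes
above already expect. It returns the report lines instead of printing them,
and it does not care whether the from= line or the "User unknown" line shows
up first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes log lines, returns report lines.
sub unknown_user_report {
    my @lines = @_;
    my (%sender, %pending, @report);
    for my $line (@lines) {
        chomp $line;
        # Extract the queue ID once; it becomes the hash key.
        next unless $line =~ /sendmail\[\d+\]:\s+(\w+):\s+(.*)$/;
        my ($id, $rest) = ($1, $2);
        if ($rest =~ /^from=<([^>]+)>.*relay=[^\[]*\[([\d.]+)\]$/) {
            $sender{$id} = [$1, $2];        # remember sender and relay IP
        }
        elsif ($rest =~ /<([^>]+)>.*User unknown$/) {
            push @{ $pending{$id} }, $1;    # remember the bounced recipient
        }
        else {
            next;                           # nothing of interest on this line
        }
        # Report as soon as both halves of a message have been seen.
        if ($sender{$id} && $pending{$id}) {
            my ($from, $relay) = @{ $sender{$id} };
            push @report,
                map { "from=<$from> to=<$_> relay=<$relay>" }
                @{ delete $pending{$id} };
        }
    }
    return @report;
}
```

To use it as a filter, something like
print "$_\n" for unknown_user_report(<>);
should do. The trade-off is memory: %sender keeps an entry for every message
in the log, not just the bounced ones.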

Here's my tee shot. Be prepared to show yours is shorter *and* faster.
Bring benchmarks. And stripping more whitespace doesn't count.

#!/usr/bin/perl
while(<>){chomp;$b=$_;study $b;$a='';for(keys %a){next unless $b=~m/:\s+$_:/i;
$a=$_;last}if($a){my($f)=$b=~m/\s+from=\<([^\>]+)\>/io;
my($r)=$b=~m/\s+relay=[^\[]*\[([\d\.]+)\]$/io;print
"FROM=<$f> TO=<$a{$a}> RELAY=<$r>\n";}next unless m/User unknown$/o;
next unless $b=~/sendmail\[\d+\]:\s+(\w+):\s+.*\<([^\>]+)\>/o;
$a{$1}=$2;}
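
For anyone bringing benchmarks, here is one possible harness, as a rough
sketch using the core Benchmark module. Everything in it is illustrative:
fake_log, nested_scan, cached_scan and run_benchmark are names I made up, the
two scan subs are adaptations of the approaches above (they return lists
instead of printing, and the cached one skips lines that carry no from= or
relay=), and the synthetic log assumes the "User unknown" line is logged
before the matching from= line, which is what the single-pass cache relies
on. No timings are claimed here; run it on your own logs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic log with made-up hosts and IDs; every $every-th message bounces.
sub fake_log {
    my ($messages, $every) = @_;
    my @log;
    for my $n (1 .. $messages) {
        my $id   = sprintf 'g01X%05d', $n;
        my $stat = $n % $every == 0 ? 'User unknown' : 'Sent';
        push @log,
            "Jan  1 00:00:00 mx sendmail[$n]: $id: to=<rcpt$n\@example.org>, stat=$stat",
            "Jan  1 00:00:01 mx sendmail[$n]: $id: from=<user$n\@example.com>, relay=out.example.com [192.0.2.1]";
    }
    return @log;
}

# Adapted from the nested-loop approach: rescan the whole log per bounce.
sub nested_scan {
    my @log = @_;
    my @out;
    for my $line (@log) {
        next unless $line =~ /sendmail\[\d+\]:\s+(\w+):\s+.*<([^>]+)>.*User unknown$/;
        my ($id, $to) = ($1, $2);
        for my $other (grep { /:\s+$id:/ } @log) {
            if ($other =~ /:\s+$id:\s+from=<([^>]+)>.*relay=[^\[]*\[([\d.]+)\]$/) {
                push @out, "from=<$1> to=<$to> relay=<$2>";
                last;
            }
        }
    }
    return @out;
}

# Adapted from the single-pass approach: cache bounced IDs, check later lines.
sub cached_scan {
    my @log = @_;
    my (%to_for, @out);
    for my $line (@log) {
        my $id;
        for my $known (keys %to_for) {
            next unless $line =~ /:\s+$known:/;
            $id = $known;
            last;
        }
        if (defined $id) {
            my ($from)  = $line =~ /\s+from=<([^>]+)>/;
            my ($relay) = $line =~ /\s+relay=[^\[]*\[([\d.]+)\]$/;
            push @out, "from=<$from> to=<$to_for{$id}> relay=<$relay>"
                if defined $from && defined $relay;
        }
        next unless $line =~ /User unknown$/;
        next unless $line =~ /sendmail\[\d+\]:\s+(\w+):\s+.*<([^>]+)>/;
        $to_for{$1} = $2;
    }
    return @out;
}

# Compare both on the same synthetic log for at least 2 CPU seconds each.
sub run_benchmark {
    my @log = fake_log(2000, 10);
    cmpthese(-2, {
        nested => sub { my @r = nested_scan(@log) },
        cached => sub { my @r = cached_scan(@log) },
    });
}
```

Calling run_benchmark() prints the comparison table. Note that the cached
version still loops over every cached ID for each line, so how it fares
depends on how many messages in the log actually bounce; try different values
for the second argument to fake_log.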

Mike808/


---------------------------------------------
http://www.valuenet.net


