Effective Perl Programming:
Without a map You Can Only grep Your Way Around
by Joseph N. Hall
<joseph@5sigma.com>
Joseph N. Hall is the
author of Effective Perl Programming (Addison-Wesley, 1998). He
teaches Perl classes, consults, and plays a lot of golf in his spare
time.
Welcome to Effective Perl Programming, the column. In this and coming
articles I'm going to discuss ways that you can use Perl more
effectively, whether by mastering Perl idioms, using Perl modules, or
finding new applications for Perl programs. I begin by covering two
very powerful but underused language features: Perl's map
operator and its cousin, the grep operator. I'll cover the
basics quickly, then show you some useful techniques and a few neat
tricks as well.
The Basics of map and grep
The map operator creates a transformed copy of a list by
evaluating a specified expression or code block for each element in the
list. Each time the expression or block is evaluated, $_
contains the value of the current element. The result from the
expression or block is appended to the list returned by map.
The syntax for map looks like this:
@result = map expression, list
@result = map { code } list
Rewritten without using map, the effect is like this:
@result = ();
foreach (list) { push @result, expression; }
For example:
@times_ten = map $_ * 10, 1..10;
returns the list (10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
and
@uppercased = map { ucfirst $_ } qw(george jane judy elroy);
returns the list ('George', 'Jane', 'Judy', 'Elroy'). The
transform expression (or block) is evaluated in a list context. It does
not have to return a single element: it can return two or more, or
none at all (an empty list). More on that later.
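To make the list-context behavior concrete, here is a minimal sketch using made-up sample data (the array @numbers is an assumption, not part of the original examples). It shows a block that returns two elements per input and another that returns an empty list for some inputs.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the article.
my @numbers = (1, 2, 3, 4);

# Return two elements per input element: the number and its square.
my @with_squares = map { ($_, $_ * $_) } @numbers;

# Return an empty list for odd numbers, which drops them from the result.
my @evens_doubled = map { $_ % 2 ? () : ($_ * 2) } @numbers;

print "@with_squares\n";    # prints "1 1 2 4 3 9 4 16"
print "@evens_doubled\n";   # prints "4 8"
```

The result list can therefore be longer or shorter than the input list.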
The grep operator resembles the map operator
syntactically:
@result = grep expression, list
@result = grep { code } list
However, unlike the map operator, which constructs a
transformed copy of a list, the grep operator selects items
from a list. The selection expression (or block) is evaluated for each
element of its argument, with $_ set to the current element.
If the result is true (anything other than undef, the number 0, the
empty string, or the string '0'), a copy of the element is appended to
the result from grep. For example:
# Returns (2, 4, 6, 8, 10).
@even = grep { not $_ % 2 } 1..10;
# Returns a list of text files in the current directory.
@text_files = grep -T, glob "*";
# Classic grep -- imitating Unix grep. Prints lines containing the
# word 'Joseph'.
print grep /\bJoseph\b/, <>;
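Because grep keeps an element whenever the test yields a true value, it helps to see which values Perl treats as true. The following sketch uses an invented list, @candidates, and simply tests each element for truth.

```perl
use strict;
use warnings;

# Hypothetical sample values. The strings '0.0' and '00' are true,
# because only '' and '0' are false as strings.
my @candidates = (0, 1, '', '0', '0.0', '00', 'a', undef);

# Keep only the elements that are true in a boolean test.
my @kept = grep { $_ } @candidates;

print join(', ', map("'$_'", @kept)), "\n";   # prints "'1', '0.0', '00', 'a'"
```

To select on some other criterion, such as definedness or string length, put that test in the block explicitly (for example, grep { defined } or grep { length }).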
The grep operator has been around for a long time, but the
map operator is new in Perl 5 (as much as anything that is
four years old can be called "new," anyway). The map operator
is more versatile and can do anything that grep can:
# Another way to get a list of text files.
@text_files = map { (-T) ? $_ : () } glob "*";
map and grep Idioms
The map operator is obviously useful for simple one-to-one
transformations:
# Print out the contents of a hash.
print map "$_: $hash{$_}\n", sort keys %hash;
Be careful, though: this approach creates a lot of temporary structures
in memory. For a very large hash it would be more appropriate to use an
each loop:
while (($key, $val) = each %hash) { print "$key: $val\n" }
Using map to construct hashes is an important idiom. You can
construct existence hashes that are used to test whether a particular
value has been seen; in this case, set all the values in the hash to 1
(or some other "true" value). You can also use map to
construct hashes where the value is computed from the key. To use
map to construct a hash, return two values for each original
element: the key and its corresponding value.
# Create keys for all the "words" in $text, so that we can test for
# a word later with if $words_seen{$word}.
%words_seen = map { $_ => 1 } split /\s+/, $text;
# After this, $file_size{$file} gives -s $file -- saves time if
# we need to use it more than once.
%file_size = map { $_ => -s } @files;
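Here is a self-contained sketch of both hash idioms: an existence hash and a hash whose values are computed from the keys. The sample sentence in $text and the %word_length hash are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text; "the" appears twice on purpose.
my $text = "the quick brown fox jumps over the lazy dog";

# Existence hash: every distinct word becomes a key with a true value.
my %words_seen = map { $_ => 1 } split /\s+/, $text;

# Computed-value hash: each word maps to its length.
my %word_length = map { $_ => length $_ } keys %words_seen;

print scalar(keys %words_seen), "\n";             # prints "8"
print $words_seen{fox} ? "seen\n" : "not seen\n"; # prints "seen"
print $word_length{quick}, "\n";                  # prints "5"
```

Duplicate words collapse into a single key, which is why the existence hash also works as a quick way to remove duplicates.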
The map operator is handy for "nesting" and "slicing"
multidimensional data structures. Using an anonymous array (or hash)
constructor inside map creates nested structures. For example,
you can blend parallel arrays into a single 2-d structure:
# Blend @x, @y, and @z into a single 2-d array @xyz ... $xyz[0][0]
# is $x[0], $xyz[0][1] is $y[0], and so on.
@xyz = map [$x[$_], $y[$_], $z[$_]], 0..$#x;
You can use the same technique to create a hash of arrays:
# Cache the results from stat into a hash of arrays ... then
# $info{'file'}[7] gives the size of 'file', $info{'file'}[4]
# gives the owner's uid, and so on.
%info = map { $_, [ stat $_ ] } @files;
Extracting a slice of a nested structure is just as easy. Just use a
subscript inside map:
# This will extract @x from @xyz (undoing what we did above) ...
# $x[0] is $xyz[0][0], $x[1] is $xyz[1][0], and so on.
@x = map $_->[0], @xyz;
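The nesting and slicing steps can be checked together in a short round trip. This sketch uses small invented arrays for @x, @y, and @z, and pulls the middle column back out under the assumed name @y_again.

```perl
use strict;
use warnings;

# Hypothetical parallel arrays of equal length.
my @x = (1, 2, 3);
my @y = (4, 5, 6);
my @z = (7, 8, 9);

# Nest: each element of @xyz is a reference to an [x, y, z] triple.
my @xyz = map { [ $x[$_], $y[$_], $z[$_] ] } 0 .. $#x;

# Slice: take column 1 of every row to recover the y values.
my @y_again = map { $_->[1] } @xyz;

print "@{ $xyz[2] }\n";   # prints "3 6 9"
print "@y_again\n";       # prints "4 5 6"
```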
The grep operator isn't as versatile as map, but it
is usually the most succinct way to select items from a list. Don't
forget that it can be used on complex structures:
# Select elements from @xyz whose "coordinates" are all > 0.
# @gt_zero is still a 2-d array with the same organization as @xyz.
@gt_zero = grep { $_->[0] > 0 and $_->[1] > 0 and $_->[2] > 0 } @xyz;
Cool Tricks with map
You can use map to read several lines of input at a time:
# Read 10 lines from STDIN.
@ten_lines = map scalar(<STDIN>), 1..10;
The "Schwartzian Transform" (named after fellow Perl trainer and author
Randal L. Schwartz) is a sort surrounded by maps. It
is generally preferred over other techniques when the sorting process
requires time-intensive key transformations:
# Sort files in descending order of size.
@files_by_size =
  map  { $_->[0] }              # 3. slice out the original list, now sorted
  sort { $b->[1] <=> $a->[1] }  # 2. sort the list of tuples
  map  { [$_, -s $_] }          # 1. create a list of tuples by nesting
  @files;                       #    the data to be sorted
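The same three-step pattern can be tried without touching the file system. In this sketch the word list and the expensive_key subroutine are stand-ins invented for illustration; the counter shows that the key is computed once per element rather than once per comparison.

```perl
use strict;
use warnings;

# Hypothetical sample data; the words have distinct lengths so the
# expected order does not depend on how ties are broken.
my @words = qw(banana fig kiwi apple cherries);

# Stand-in for an expensive key computation (an assumption for this
# sketch). The counter records how many times it runs.
my $key_calls = 0;
sub expensive_key {
    my ($word) = @_;
    $key_calls++;
    return length $word;
}

my @by_length_desc =
    map  { $_->[0] }                      # 3. recover the original values
    sort { $b->[1] <=> $a->[1] }          # 2. sort on the cached key
    map  { [ $_, expensive_key($_) ] }    # 1. pair each value with its key
    @words;

print "@by_length_desc\n";   # prints "cherries banana apple kiwi fig"
print "$key_calls\n";        # prints "5"
```

Calling expensive_key directly inside the sort block would instead run it twice for every comparison the sort performs.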
You can use map for some set operations. Here is an example of
using it to find the elements in one hash (%hash1) that are
not in another hash (%hash2). Depending on the relative sizes
of the hashes involved, this can be more efficient than other methods
(like using the delete operator):
# keys %result contains 2 4 6 7 8 9 (in no particular order) when
# this is done.
%hash1 = map { $_, 1 } 1..9; # some sample data
%hash2 = map { $_, 1 } 1, 3, 5; # more sample data
%result = map { $_, $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;
# Another way to do the same thing, with delete.
%result = %hash1;
delete @result{keys %hash2};
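As a rough sketch of how the same map/grep combination extends to another set operation, the following computes both the difference and the intersection of the two sample hashes. The names %difference and %intersection are chosen here for illustration; the keys are sorted only to make the output predictable.

```perl
use strict;
use warnings;

my %hash1 = map { $_ => 1 } 1 .. 9;      # same sample data as above
my %hash2 = map { $_ => 1 } 1, 3, 5;

# Keys of %hash1 that are absent from %hash2.
my %difference =
    map { $_ => $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;

# Keys of %hash1 that are also present in %hash2.
my %intersection =
    map { $_ => $hash1{$_} } grep { exists $hash2{$_} } keys %hash1;

print join(' ', sort { $a <=> $b } keys %difference), "\n";    # prints "2 4 6 7 8 9"
print join(' ', sort { $a <=> $b } keys %intersection), "\n";  # prints "1 3 5"
```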
Because map's transform expression is evaluated in a list
context, using map in combination with a pattern match that
contains some parentheses can produce unusually succinct code:
# Create a hash of user name vs. user id from lines in /etc/passwd.
open PASSWD, "/etc/passwd" or
die "couldn't open password file: $!\n";
%name_to_id = map /(.*?):.*?:(.*?):/, <PASSWD>;
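To see what the pattern match contributes, here is a sketch that runs the same regular expression over a few in-memory lines instead of a real password file. The sample lines are invented. A match in list context returns the two captured strings, and a failed match returns an empty list, so lines that do not fit the pattern are skipped.

```perl
use strict;
use warnings;

# Hypothetical passwd-style lines, plus one line that will not match.
my @lines = (
    "root:x:0:0:root:/root:/bin/sh\n",
    "# not a passwd entry\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
);

# Each successful match yields (name, uid); a failed match yields ().
my %name_to_id = map { /(.*?):.*?:(.*?):/ } @lines;

print "$name_to_id{root}\n";             # prints "0"
print "$name_to_id{daemon}\n";           # prints "1"
print scalar(keys %name_to_id), "\n";    # prints "2"
```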
The map operator can even be useful for some string
operations:
# Convert a string like 'ABC' into its
# hex equivalent, '\x41\x42\x43'.
$hexed = join '', map { sprintf "\\x%x", ord $_ } split //, $str;
# An alternative using s///, which is slightly slower
# for long strings.
($hexed = $str) =~ s/(.)/sprintf "\\x%x", ord $1/ge;
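The two approaches can be compared side by side. This sketch makes two small changes that are my own assumptions rather than part of the examples above: it uses %02x so that every character produces exactly two hex digits, and it adds the /s modifier so that the substitution also converts newlines.

```perl
use strict;
use warnings;

my $str = 'ABC';   # sample input

# map version: split into characters, convert each, join the pieces.
my $via_map = join '', map { sprintf "\\x%02x", ord $_ } split //, $str;

# s/// version: /e evaluates the replacement as code, /g repeats it,
# and /s lets "." match a newline as well.
(my $via_subst = $str) =~ s/(.)/sprintf "\\x%02x", ord $1/ges;

print "$via_map\n";     # prints "\x41\x42\x43"
print "$via_subst\n";   # prints "\x41\x42\x43"
```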
That should be enough for now. I hope you've enjoyed this little tour
of map and grep. My next column will be something of
a change of pace: I will introduce object-oriented programming in
Perl.