SAGE - Feature


Effective Perl Programming: Without a map You Can Only grep Your Way Around


by Joseph N. Hall
<joseph@5sigma.com>

Joseph N. Hall is the author of Effective Perl Programming (Addison-Wesley, 1998). He teaches Perl classes, consults, and plays a lot of golf in his spare time.


Welcome to Effective Perl Programming, the column. In this and coming articles I'm going to discuss ways that you can use Perl more effectively, whether by mastering Perl idioms, using Perl modules, or finding new applications for Perl programs. I begin by covering two very powerful but underused language features: Perl's map operator and its cousin, the grep operator. I'll cover the basics quickly, then show you some useful techniques and a few neat tricks as well.

The Basics of map and grep

The map operator creates a transformed copy of a list by evaluating a specified expression or code block for each element in the list. Each time the expression or block is evaluated, $_ contains the value of the current element. The result from the expression or block is appended to the list returned by map. The syntax for map looks like this:

@result = map expression, list
@result = map { code } list

Rewritten without using map, the effect is like this:

@result = ();
foreach (list) { push @result, expression; }

For example:

@times_ten = map $_ * 10, 1..10;

returns the list (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), and

@uppercased = map { ucfirst $_ } qw(george jane judy elroy);

returns the list ('George', 'Jane', 'Judy', 'Elroy'). The transform expression (or block) is evaluated in a list context. It does not have to return a single element; it can return two or more, or none at all (an empty list). More on that later.
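
As a minimal sketch of that last point (the variable names and the length cutoff are my own, chosen only for illustration), a block can hand back two elements per input element, or none at all:

```perl
use strict;
use warnings;

# Illustrative data; the names below are assumptions made for this sketch.
my @names = qw(george jane judy elroy);

# Two results per element: the name followed by its length.
my @name_and_length = map { ($_, length $_) } @names;

# Zero or one result per element: keep (and capitalize) only the
# names longer than four characters.
my @long_names = map { length($_) > 4 ? ucfirst $_ : () } @names;

print "@name_and_length\n";   # george 6 jane 4 judy 4 elroy 5
print "@long_names\n";        # George Elroy
```

The empty list () simply contributes nothing to the result, which is what lets map drop elements.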

The grep operator resembles the map operator syntactically:

@result = grep expression, list
@result = grep { code } list

However, unlike the map operator, which constructs a transformed copy of a list, the grep operator selects items from a list. The selection expression (or block) is evaluated for each element of its argument, with $_ set to the current element. If the result is true (anything other than undef, the number 0, the empty string, or the string '0'), a copy of the element is appended to the result from grep. For example:

# Returns (2, 4, 6, 8, 10).
@even = grep { not $_ % 2 } 1..10;

# Returns a list of text files in the current directory.
@text_files = grep -T, glob "*";

# Classic grep -- imitating Unix grep. Prints lines containing the
# word 'Joseph'.

print grep /\bJoseph\b/, <>;

The grep operator has been around for a long time, but the map operator is new in Perl 5 (as much as anything that is four years old can be called "new," anyway). The map operator is more versatile and can do anything that grep can:

# Another way to get a list of text files.
@text_files = map { (-T) ? $_ : () } glob "*";
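
To see the equivalence without touching the filesystem, here is a small sketch on in-memory data (the array names are hypothetical). Both statements should select the same elements:

```perl
use strict;
use warnings;

my @numbers = 1 .. 10;

# Selection with grep.
my @even_grep = grep { $_ % 2 == 0 } @numbers;

# The same selection with map: return the element itself when the
# test passes, or an empty list when it fails.
my @even_map = map { $_ % 2 == 0 ? $_ : () } @numbers;

print "@even_grep\n";   # 2 4 6 8 10
print "@even_map\n";    # 2 4 6 8 10
```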

map and grep Idioms

The map operator is obviously useful for simple one-to-one transformations:

# Print out the contents of a hash.

print map "$_: $hash{$_}\n", sort keys %hash;

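Be careful, though: this approach builds the complete list of sorted keys and the complete list of output strings in memory before anything is printed. For a very large hash where the order of the output doesn't matter, it would be more appropriate to use an each loop: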

while (($key, $val) = each %hash) { print "$key: $val\n" }

Using map to construct hashes is an important idiom. You can construct existence hashes that are used to test whether a particular value has been seen; in this case, set all the values in the hash to 1 (or some other "true" value). You can also use map to construct hashes where the value is computed from the key. To use map to construct a hash, return two values for each original element: the key and its corresponding value.

# Create keys for all the "words" in $text, so that we can test for
# a word later with if $words_seen{$word}.

%words_seen = map { $_ => 1 } split /\s+/, $text;

# After this, $file_size{$file} gives -s $file -- saves time if
# we need to use it more than once.

%file_size = map { $_ => -s } @files;
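
Here is a rough sketch of both kinds of hash in use. The sample text and the %word_length hash are assumptions added for illustration; word length stands in for any value you might compute from a key:

```perl
use strict;
use warnings;

# Sample text; an assumption made for this sketch.
my $text = "the quick brown fox jumps over the lazy dog";

# Existence hash: every word maps to 1.
my %words_seen = map { $_ => 1 } split /\s+/, $text;

# Computed-value hash: every word maps to its length.
my %word_length = map { $_ => length $_ } split /\s+/, $text;

for my $word (qw(fox cat)) {
    print "$word: ", ($words_seen{$word} ? "seen" : "not seen"), "\n";
}
print "quick has $word_length{quick} letters\n";
```

Note that "the" appears twice in the text but only once as a key, since later pairs overwrite earlier ones with the same key.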

The map operator is handy for "nesting" and "slicing" multidimensional data structures. Using an anonymous array (or hash) constructor inside map creates nested structures. For example, you can blend parallel arrays into a single 2-d structure:

# Blend @x, @y, and @z into a single 2-d array @xyz ... $xyz[0][0]
# is $x[0], $xyz[0][1] is $y[0], and so on.

@xyz = map [$x[$_], $y[$_], $z[$_]], 0..$#x;

You can use the same technique to create a hash of arrays:

# Cache the results from stat into a hash of arrays ... then
# $info{'file'}[7] gives the size of 'file', $info{'file'}[5]
# gives the owner's uid, and so on.

%info = map { $_, [ stat $_ ] } @files;

Extracting a slice of a nested structure is just as easy. Just use a subscript inside map:

# This will extract @x from @xyz (undoing what we did above) ...
# $x[0] is $xyz[0][0], $x[1] is $xyz[1][0], and so on.

@x = map $_->[0], @xyz;
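
A short round trip, using made-up coordinate values, shows nesting and slicing side by side:

```perl
use strict;
use warnings;

# Sample parallel arrays; the values are assumptions for illustration.
my @x = (1, 2, 3);
my @y = (10, 20, 30);
my @z = (100, 200, 300);

# Nest: build one [x, y, z] triple per index.
my @xyz = map { [ $x[$_], $y[$_], $z[$_] ] } 0 .. $#x;

# Slice: pull the second "column" back out of the nested structure.
my @y_again = map { $_->[1] } @xyz;

print "$xyz[2][0] $xyz[2][1] $xyz[2][2]\n";   # 3 30 300
print "@y_again\n";                           # 10 20 30
```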

The grep operator isn't as versatile as map, but it is usually the most succinct way to select items from a list. Don't forget that it can be used on complex structures:

# Select elements from @xyz whose "coordinates" are all >0.
# @gt_zero is still a 2-d array with the same organization as @xyz.

@gt_zero = grep {$_->[0] > 0 and $_->[1] > 0 and $_->[2] > 0} @xyz;

Cool Tricks with map

You can use map to read several lines of input at a time:

# Read 10 lines from STDIN.
@ten_lines = map scalar(<STDIN>), 1..10;
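
One thing to watch for: if the input runs out early, the remaining elements are undef. The sketch below uses an in-memory filehandle as a stand-in for STDIN (an assumption made so the example is self-contained) and asks for more lines than are available:

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN in this sketch.
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "couldn't open in-memory file: $!\n";

# Ask for five lines when only three are available.
my @lines = map { scalar(<$fh>) } 1 .. 5;

# Count the elements that actually hold a line.
my $defined = grep { defined $_ } @lines;
print scalar(@lines), " elements, $defined defined\n";   # 5 elements, 3 defined
```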

The "Schwartzian Transform" (named after fellow Perl trainer and author Randal L. Schwartz) is a sort surrounded by maps. It is generally preferred over other techniques when the sorting process requires time-intensive key transformations:

# Sort files in descending order of size.
@files_by_size =
    map  { $_->[0] }              # 3. slice out the original list, now sorted
    sort { $b->[1] <=> $a->[1] }  # 2. sort the list of tuples
    map  { [$_, -s $_] }          # 1. create a list of tuples by nesting
    @files;                       #    the data to be sorted
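
The same pattern can be tried without any files on hand. In this sketch, string length stands in for an expensive key such as -s, and the alphabetical tie-breaker is my own addition, there only to make the output order predictable:

```perl
use strict;
use warnings;

# Sample data; an assumption made for this sketch.
my @words = qw(pear fig banana kiwi apple);

my @by_length =
    map  { $_->[0] }                 # 3. recover the original strings
    sort { $b->[1] <=> $a->[1]       # 2. longest first ...
           or $a->[0] cmp $b->[0] }  #    ... ties broken alphabetically
    map  { [ $_, length $_ ] }       # 1. pair each string with its key
    @words;

print "@by_length\n";   # banana apple kiwi pear fig
```

Each key is computed once, in step 1, rather than every time sort compares two elements.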

You can use map for some set operations. Here is an example of using it to find the elements in one hash (%hash1) that are not in another hash (%hash2). Depending on the relative sizes of the hashes involved, this can be more efficient than other methods (like using the delete operator):

# keys %result contains 2 4 6 7 8 9 (in no particular order) when this is done.

%hash1 = map { $_, 1 } 1..9;    # some sample data
%hash2 = map { $_, 1 } 1, 3, 5; # more sample data
%result = map { $_, $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;

# Another way to do the same thing, with delete.

%result = %hash1;
delete @result{keys %hash2};
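
As a quick check, the sketch below runs both approaches on the same sample data and sorts the keys so they can be compared. The intersection at the end is an extra illustration of my own, not something from the examples above:

```perl
use strict;
use warnings;

my %hash1 = map { $_ => 1 } 1 .. 9;
my %hash2 = map { $_ => 1 } 1, 3, 5;

# Difference, built with map and grep.
my %diff_map = map { $_ => $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;

# Difference, built with a hash-slice delete.
my %diff_delete = %hash1;
delete @diff_delete{ keys %hash2 };

# Intersection: keys present in both hashes.
my @common = sort { $a <=> $b } grep { exists $hash2{$_} } keys %hash1;

print join(' ', sort { $a <=> $b } keys %diff_map), "\n";      # 2 4 6 7 8 9
print join(' ', sort { $a <=> $b } keys %diff_delete), "\n";   # 2 4 6 7 8 9
print "@common\n";                                             # 1 3 5
```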

Because map's transform expression is evaluated in a list context, using map in combination with a pattern match that contains some parentheses can produce unusually succinct code:

# Create a hash of user name vs. user id from lines in /etc/passwd.

open PASSWD, "/etc/passwd" or
die "couldn't open password file: $!\n";
%name_to_id = map /(.*?):.*?:(.*?):/, <PASSWD>;
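
To see why this works, here is a sketch that applies the same pattern to a few made-up lines in passwd format (not read from a real system). A line that matches contributes its two captured substrings; a line that doesn't match contributes an empty list and drops out:

```perl
use strict;
use warnings;

# Made-up lines in passwd format; an assumption for illustration.
my @lines = (
    "root:x:0:0:root:/root:/bin/sh\n",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/false\n",
    "this line has no colons and does not match\n",
);

# In list context the match returns ($1, $2), which become a
# key/value pair in the hash.
my %name_to_id = map /(.*?):.*?:(.*?):/, @lines;

print "$_ => $name_to_id{$_}\n" for sort keys %name_to_id;
```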

The map operator can even be useful for some string operations:

# Convert a string like 'ABC' into its
# hex equivalent, '\x41\x42\x43'.

$hexed = join '', map { sprintf "\\x%x", ord $_ } split //, $str;

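# An alternative using s///, which is slightly slower
# for long strings. The /s modifier lets . match newlines,
# so that every character is converted, as in the map version.

($hexed = $str) =~ s/(.)/sprintf "\\x%x", ord $1/gse;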
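
A small sketch (the variable names are my own) runs both versions on the same input so the results can be compared:

```perl
use strict;
use warnings;

my $str = "ABC";

# Split into characters, convert each one, and join the pieces.
my $via_map = join '', map { sprintf "\\x%x", ord $_ } split //, $str;

# Copy the string, then substitute every character in place.
(my $via_subst = $str) =~ s/(.)/sprintf "\\x%x", ord $1/gse;

print "$via_map\n";     # \x41\x42\x43
print "$via_subst\n";   # \x41\x42\x43
```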

That should be enough for now. I hope you've enjoyed this little tour of map and grep. My next column will be something of a change of pace: I will introduce object-oriented programming in Perl.


7th July 1998 efc
Last changed: 7th July 1998 efc