Out of the Starting Block
January 22, 2001
The Benchmark module provides a number of tools for timing and
comparing the execution time of code. Depending on how you use these
tools, you can time single statements, entire subroutines, or the
whole script. Often, you'll want to time all of these, depending
on the script and the course of your investigation. For starters,
let's just say that you want to time a simple statement. Perhaps
you have some variations in mind, and are wondering which would
be fastest.
Imagine a string, perhaps a filename, from which you want to extract
the extension, defined here as anything to the right of the period.
For example, given the string "filename.txt", we want the "txt"
portion.
The first solution that comes to mind uses a regular expression
pattern match, such as:
$match=qq/filename.txt/;
$match=~/.*\.(.*)/;
In the above example, the desired result will appear in the
special variable $1, since (.*) is the first (and
only) capture group in this regular expression. Then your mind
starts to wander ... what if you lopped off the right-hand portion
using a split rather than a regular expression? What
if you used a substr rather than a
split? What if you get pizza tonight and swear that
you'll cook something from scratch tomorrow?
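To try the regular-expression approach on its own before benchmarking it, here is a minimal runnable sketch with strict and warnings enabled. The helper name extension_by_regex is hypothetical. It checks that the match succeeded before reading $1, because a failed match leaves $1 holding whatever the last successful match captured.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything after the last period in
# $name, or undef if $name contains no period.
sub extension_by_regex {
    my ($name) = @_;
    # The greedy .* backs off only as far as the last period, and the
    # capture group (.*) puts the remainder in $1. Read $1 only when
    # the match succeeded.
    return $name =~ /.*\.(.*)/ ? $1 : undef;
}

my $match = qq/filename.txt/;
my $ext   = extension_by_regex($match);
print defined $ext ? "extension: $ext\n" : "no extension found\n";
# prints "extension: txt"
```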
Enter the Benchmark. Using the Benchmark module's timethis
function, we can measure the execution time of one of these
solutions.
use Benchmark;
&Benchmark::timethis(10,
'$match=qq/filename.txt/;$match=~/.*\.(.*)/');
Same code as before, but this time passed as a string to the
timethis function, which compiles and runs it. The first parameter
tells Benchmark how many times to run this code. The more times it
runs the code, the more reliable the measurement. In fact,
if we run the line above, Benchmark will warn sternly: too few
iterations for a reliable count. Indeed, we should supply a
much higher number of run-throughs. Nice and high, like 500000:
that's right, five hundred thousand!
&Benchmark::timethis(500000,
'$match=qq/filename.txt/;$match=~/.*\.(.*)/');
Once done, Benchmark reports:
timethis 500000: 1 wallclock secs ( 1.19 usr + 0.01 sys = 1.20 CPU) @ 415973.38/s (n=500000)
The results? Over 500,000 iterations, this snippet of code runs
at a rate of 415,973.38 times per second. Of course, this number
may vary if you try this test on your own machine, depending on
its resources, the version of Perl being used, and so on. These
particular examples were generated on a 900 MHz Athlon Thunderbird
running ActivePerl 5.6.0 under Windows 2000. Rather than get hung
up on the absolute execution time, what we're really
interested in is how this code fares relative to the alternative
solutions.
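So far the code under test has been handed to timethis as a string, which Benchmark compiles when the test runs. timethis also accepts a code reference in place of the string, which lets Perl compile the code along with the rest of the script and check it under strict. A minimal sketch of the regular-expression test written that way follows; Benchmark exports timethis by default, so the &Benchmark:: prefix is optional. Because this sub declares its variable with my rather than reusing the global $match, its rate may not match the string version's exactly, so compare code-reference tests with other code-reference tests.

```perl
use strict;
use warnings;
use Benchmark;    # timethis is in Benchmark's default export list

# Run the regex extraction 500,000 times. timethis prints its report
# (wallclock, usr, sys, CPU, rate) and returns a Benchmark object.
my $result = timethis(500000, sub {
    my $match = qq/filename.txt/;
    $match =~ /.*\.(.*)/;
});
```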
Another way to grab the filename extension is to use
split on the period and take the second item
of the resulting list:
$match=qq/filename.txt/;
$match=@{[split /\./,$match]}->[1];
Let's time it!
use Benchmark;
&Benchmark::timethis(500000,
'$match=qq/filename.txt/;
$match=@{[split /\./,$match]}->[1]');
The timesheet says:
timethis 500000: 3 wallclock secs ( 2.59 usr + 0.00 sys = 2.59 CPU) @ 192752.51/s (n=500000)
A fascinating result: between the tortoise and the hare,
the split function was clearly the tortoise. Over
500,000 iterations, this solution clocked in at only 192,752.51
times per second, less than half the speed of the regular
expression!
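One caveat about the split version: the @{[ ... ]}->[1] construct builds an anonymous array only to dereference it, and using an array as a reference this way was deprecated for years and became a fatal error in Perl 5.22, so that snippet will not compile on current Perls. A list slice picks out the second field directly, as in this minimal sketch. It is not the code that was timed above, so its speed would need to be measured separately.

```perl
use strict;
use warnings;

my $match = qq/filename.txt/;

# (LIST)[1] is a list slice: it takes element 1 (the second field)
# of the list split returns, with no anonymous array involved.
my $ext = (split /\./, $match)[1];
print "extension: $ext\n";    # prints "extension: txt"
```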
One candidate remains on our mind: using the
substr function to grab the text in question. Such a
solution might look like:
$match=qq/filename.txt/;
$match=substr ($match,index($match,qq/./)+1);
In this approach, we use index to find the position
of the period in the string, and then grab the text starting one
place to the right of the period, through to the end of the
string. Again, to the Benchmark:
use Benchmark;
&Benchmark::timethis(500000,
'$match=qq/filename.txt/;
$match=substr ($match,index($match,qq/./)+1)');
Yields:
timethis 500000: 0 wallclock secs ( 0.65 usr + 0.00 sys = 0.65 CPU) @ 768049.16/s (n=500000)
Whoa, nelly! Indeed, the substr solution whupped the
regular expression, running at nearly twice its speed and almost four
times the speed of the split function. In all three
of our cases, the code itself was quite simple. But exactly
how Perl implements these different functions internally
(regular expressions, split, and substr)
varies. While we shouldn't draw across-the-board
conclusions from this test, we can conclude that for
this particular case, the substr function is much
faster than the alternatives as measured across 500,000
iterations. Also, ordering pizza tonight is probably a good idea,
no matter how long it takes.
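Speed is not the only thing that separates these three one-liners. They agree on "filename.txt", but they disagree once a name has several periods or none at all: the greedy regular expression keeps what follows the last period, the split version keeps the text between the first and second periods, and the substr version keeps everything after the first period (and, because index returns -1 when there is no period, the whole string when there is none). A small sketch to check this, using hypothetical wrapper subs by_regex, by_split, and by_substr around the section's expressions:

```perl
use strict;
use warnings;

# Hypothetical wrapper subs around the three one-liners from this
# section, so their results can be compared side by side.
sub by_regex  { my ($s) = @_; return $s =~ /.*\.(.*)/ ? $1 : undef }
sub by_split  { my ($s) = @_; return (split /\./, $s)[1] }
sub by_substr { my ($s) = @_; return substr($s, index($s, qq/./) + 1) }

for my $name (qw(filename.txt archive.tar.gz README)) {
    # Call each sub in scalar context so a missing result is one undef.
    my @got = (scalar by_regex($name), scalar by_split($name),
               scalar by_substr($name));
    print join(' | ', $name, map { defined $_ ? $_ : 'undef' } @got), "\n";
}
# filename.txt | txt | txt | txt
# archive.tar.gz | gz | tar | tar.gz
# README | undef | undef | README
```

Before trusting a speed comparison, it is worth confirming that the candidates actually compute the same thing for the inputs you care about.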
The Perl You Need to Know