The Perl You Need to Know: Part 5 "Processing and Parsing Web Pages"
August 9, 1999
In The Perl You Need to Know we've explored a variety of means
to add content to web pages, but we've yet to see how to
retrieve information from a web page using Perl. Last month's
exploits featured the use of templates to easily insert
dynamic information into pre-structured pages such as the
Smallville Gazette. This month we extend this concept,
retrieving information from the web which will then be
dynamically included in a template-based output page. Our
partner in this scheme is the Perl library LWP which, like a
Swiss army knife, provides a number of tools for carving,
slicing, dicing and parsing web pages.
In The Perl You Need to Know we've explored a variety of
means to add content to web pages, but we've yet to see how
to retrieve information from a web page using Perl. In fact,
there are many reasons you might want to read and access
pages from your Perl scripts, instead of or in addition to
generating web pages as output.
Last month's exploits featured the use of templates to
easily insert dynamic information into pre-structured pages
such as the Smallville Gazette. This month we extend this
concept, retrieving information from the web which will then
be dynamically included in a template-based output page.
That's a mouthful of jargon, to be sure, but the results are
simple and elegant.
Our partner in this scheme is the library for WWW access in
Perl, thankfully also known simply as LWP. LWP encompasses a
set of Perl modules which, like a Swiss army knife, provide
a number of tools for chopping, carving, slicing, and dicing
web pages. Some of LWP's capabilities can be quite complex to
use, while others are graciously simple. We begin our look at
LWP and its simpler uses in combination with the template
technique seen in Part 4 of The Perl You Need to Know.
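To make the plan concrete before diving in, here is a minimal sketch of the idea: fetch a page, pull one piece of information out of it, and drop that into a small template. It is an illustration only, not the code developed in this article. The helper names fill_template and page_title, the %%name%% placeholder syntax, and the regular-expression title match are all assumptions made up for this example; the one real library call is get from LWP::Simple, which returns the page content or undef on failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace each %%name%% placeholder in a template
# with the matching value; unknown names become empty strings.
# (The %%name%% syntax is an assumption made for this sketch.)
sub fill_template {
    my ($template, %values) = @_;
    $template =~ s/%%(\w+)%%/defined $values{$1} ? $values{$1} : ''/ge;
    return $template;
}

# Hypothetical helper: grab the text of the first <title> element with a
# plain regular expression. Crude, but enough for a sketch.
sub page_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : '(no title)';
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    # Loaded here so the helpers above can be used without LWP installed.
    require LWP::Simple;

    my $url  = $ARGV[0];
    my $html = LWP::Simple::get($url);   # undef if the fetch fails
    die "Could not retrieve $url\n" unless defined $html;

    my $template = "<p>The page at %%url%% is titled: %%title%%</p>\n";
    print fill_template($template, url => $url, title => page_title($html));
}
```

Saved under a name of your choosing and run with a URL as its only argument, the script prints a one-line HTML fragment containing that page's title. Matching tags with a regular expression is fragile on real-world HTML, so treat that part as a stand-in.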
Contents:
A Simple Goal
Simply, LWP::Simple
Grasping for Tags
Pulling Tags Like Taffy: TokeParser
Parsing Attributes with Ease
The Proof is in the Parsing: A Web Page Summarizer
Conclusion
A Simple Goal