suPerlative: The Site Map
The site map is generated by cmap.pl.
It takes a list of all the index.html paths, created with the UNIX
'find' command. It uses a neat algorithm (i.e. I'm quite proud of it)
to lay out a table showing the hierarchy of the site directories.
As it only lists index files, the map is not fully comprehensive, but
with 1500 pages on the site, a full site map would be very large and
hard to use. Since we structure the site very thoroughly, with usually
only a handful of files per directory, the map actually gives quite a
good overview.
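As a rough illustration of the input step, here is a minimal Python sketch (not the original Perl, and not how cmap.pl itself gets its list) that gathers the same kind of listing a command along the lines of `find . -name index.html` would produce. The function name `index_paths` and the choice to report each directory with a leading and trailing '/' are assumptions made for this example.

```python
from pathlib import Path

def index_paths(root):
    """Return site-relative directory paths that contain an index.html.

    Hypothetical helper: the output format ('/Authoring/CGI/') is an
    assumption chosen to match the listing style used on this page.
    """
    root = Path(root)
    found = []
    # rglob walks the tree recursively, like 'find' with a -name test.
    for index in root.rglob("index.html"):
        rel = index.parent.relative_to(root).as_posix()
        # The site root itself comes back as '.', so show it as '/'.
        found.append("/" if rel == "." else f"/{rel}/")
    return sorted(found)
```

Calling `index_paths(".")` from the top of a site tree would give one entry per directory that has an index file; directories without one are simply not listed.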
To see how it works, consider this fragment of a directory listing:
/Authoring/
/Authoring/CGI/
/Authoring/CGI/Input/
/Authoring/CGI/Output/
/Authoring/CGI/Process/
/Authoring/DB/
In the map table, 'Authoring' should span at least 5 rows,
and 'CGI' should span 3: Input, Output, and Process.
The program reads each line and steps through the pathname components,
e.g. Authoring, CGI, Input. As it goes, it builds a name by sticking
the components back together (without the '/'), e.g. AuthoringCGI.
Each time it sees a name, it counts it (by incrementing an associative
array indexed on the name). So AuthoringCGI will receive a count of 4;
subtract one to get the number of rows to span.
This number is then used for the 'rowspan' attribute in the corresponding
table cell.
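The counting step can be tried out with a short Python sketch. This is a stand-in for the Perl associative array, not the code from cmap.pl; the function name `rowspans` is made up for the example, and the input is the listing fragment shown above.

```python
from collections import Counter

def rowspans(paths):
    """Count each glued-together prefix name, then subtract one."""
    counts = Counter()
    for path in paths:
        name = ""
        # "/Authoring/CGI/Input/" -> "Authoring", "CGI", "Input"
        for part in path.strip("/").split("/"):
            if not part:
                continue  # skip the empty piece left by a bare "/"
            name += part       # stick components together without the '/'
            counts[name] += 1  # one more sighting of this name
    return {name: n - 1 for name, n in counts.items()}

paths = [
    "/Authoring/",
    "/Authoring/CGI/",
    "/Authoring/CGI/Input/",
    "/Authoring/CGI/Output/",
    "/Authoring/CGI/Process/",
    "/Authoring/DB/",
]
spans = rowspans(paths)
print(spans["Authoring"], spans["AuthoringCGI"])
```

For this fragment the sketch gives 5 for Authoring and 3 for AuthoringCGI, matching the figures above. Names seen only once (the leaves, such as AuthoringCGIInput) come out as 0; how cmap.pl treats that case is not described here, so the sketch leaves it alone.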
Here is an outline of the algorithm:
- Open the input and output files.
  The input file is a list of paths to index.html files.
- Output an HTML header and start the table.
- Read in the directory of index.html files, line by line.
  - Skip over directories we don't want on the map.
  - Add the path to a list array.
- For each item in the list (after sorting):
  - Split the pathname into components.
  - If there are not more parts than in the previous entry,
    then the previous row is finished.
  - For each component in the previous entry:
    - Append to the name and add to its count.
  - Save some data for each item:
    - depth in hierarchy;
    - whether at end of row;
    - item name; etc.
- For each item except the first two:
  (Why two? The first is the spurious 'previous' entry added by the
  preceding loop; the second is the home page at the top level, which
  isn't really needed.)
  - Open the corresponding index.html file.
  - Get the Title from the file.
  - Print the table cell.
  - Print the row separator if it's the last item in the row.
- Print the end of the table; an empty cell lets the last row be
  completed validly.
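The "previous row is finished" rule from the outline can be sketched in Python as follows. This is only an illustration of that one rule, not a port of cmap.pl: the name `layout_rows` and the (depth, name) tuples are invented for the example, and it assumes every directory level has its own index.html, so an entry is never more than one level deeper than the one before it.

```python
def layout_rows(paths):
    """Group sorted paths into table rows of (depth, component) cells."""
    rows, current, prev_depth = [], [], 0
    for path in sorted(paths):
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue  # the site root has no component to show
        # Not more parts than the previous entry: that row is finished.
        if current and len(parts) <= prev_depth:
            rows.append(current)
            current = []
        # Each entry contributes one cell, named after its last component.
        current.append((len(parts), parts[-1]))
        prev_depth = len(parts)
    if current:
        rows.append(current)  # close the final row
    return rows

rows = layout_rows([
    "/Authoring/",
    "/Authoring/CGI/",
    "/Authoring/CGI/Input/",
    "/Authoring/CGI/Output/",
    "/Authoring/CGI/Process/",
    "/Authoring/DB/",
])
for row in rows:
    print(row)
```

With the listing fragment used earlier, the first row holds the Authoring, CGI and Input cells, and each of Output, Process and DB starts a row of its own. The real program also looks up each page's Title and writes the cells out as HTML, which this sketch does not attempt.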