Basic File Processing in Python
by Ralph Heimburger
September 28, 2009
In this article, we'll cover the basics of one of Python's
most powerful capabilities: file I/O processing.
Introduction
One of the most powerful capabilities Python provides is
its file I/O capability. Most sysadmins would agree that
file processing is essential in any language. Python makes
it very simple.
Quite commonly, I find myself writing Python
scripts to do just about anything with files and
directories: convert, translate, extract, dynamically create
folders, and so on.
Below is a basic reusable script that takes a filename as a
command-line parameter, reads the file's contents, and
prints some information about the file itself.
(Note that in this example, the if
__name__ == '__main__' test is not essential; it is just
good coding practice in case I want to add importable
functions in the future.)
#myfile.py
#usage: myfile.py <filename>
if __name__== '__main__':
    import os, sys
    #first, get commandline arguments
    scriptname=sys.argv[0]
    try:
        filename=sys.argv[1]
    except:
        filename='<nofile>'
    #verify that the file does exist
    if not os.path.isfile(filename):
        print '%s: filename not provided or does not exist!' %scriptname
        raise SystemExit
    lines,chars,words=0,0,0
    for data in file(filename):
        lines+=1 #count lines
        chars+=len(data) #count bytes
        words+=len(data.split()) #count words
    print '%s: %s %s %s %s' %(scriptname,filename,lines,chars,words)
Testing the code:
Test 1: running the script against itself
At the command prompt:
C:>myfile.py myfile.py
C:\myfile.py: myfile.py 24 638 57
Test 2: Error Handling
C:>myfile.py invalidfilename
C:\myfile.py: filename not provided or does not exist!
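The listing above uses Python 2 syntax (the print statement and the file() built-in). As a minimal sketch, assuming Python 3, the same counting loop might be wrapped in a function so it can be reused from other scripts. The name count_file and the temporary demo file are illustrative assumptions, not part of the original script.

```python
import os
import tempfile


def count_file(filename):
    """Return (lines, chars, words) for a text file (hypothetical helper)."""
    lines = chars = words = 0
    # The with statement closes the file when the loop finishes.
    with open(filename) as handle:
        for data in handle:
            lines += 1                  # count lines
            chars += len(data)          # count characters, newline included
            words += len(data.split())  # count whitespace-separated words
    return lines, chars, words


if __name__ == '__main__':
    # Demonstrate on a small throwaway file instead of a command-line argument.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write('one two\nthree\n')
    print(count_file(tmp.name))  # prints (2, 14, 3)
    os.remove(tmp.name)
```

Returning a tuple rather than printing keeps the function usable both from the command line and from other code.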
Because Python relies on indentation to delimit blocks, there
is no need for Begin/End constructs. Also, this example was
done with just the basic Python built-in modules (sys and
os). In addition to file processing, we can also detect
whether directories exist, create directories, and remove
directories.
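As a rough illustration of those three directory operations, here is a short sketch assuming Python 3. It works inside a temporary directory so nothing real is touched; the folder name 'reports' is purely hypothetical.

```python
import os
import tempfile

# Work inside a throwaway parent directory created just for this demo.
parent = tempfile.mkdtemp()
target = os.path.join(parent, 'reports')  # hypothetical folder name

existed_before = os.path.isdir(target)      # nothing created yet
os.mkdir(target)                            # create the directory
exists_after_mkdir = os.path.isdir(target)  # now it is there
os.rmdir(target)                            # remove it (must be empty)
exists_after_rmdir = os.path.isdir(target)  # gone again

print(existed_before, exists_after_mkdir, exists_after_rmdir)  # False True False
os.rmdir(parent)  # clean up the temporary parent as well
```

Note that os.rmdir() only removes empty directories; it raises an error if the directory still contains files.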
os.path.isdir(path) vs. os.path.isfile(path)
In this example you can see how to test for the existence
of a directory vs. the existence of a file and how Python
allows you to make that distinction.
>>> import os
>>> path="c:\\temp"
>>> os.path.isdir(path)
True
>>> os.path.isfile(path)
False
The path C:\temp is a folder, so Python returns True from
os.path.isdir() and False from os.path.isfile(). Next, let's
test to see if a path exists and, if it doesn't, create it.
import os
import sys
path=sys.argv[1]
if not os.path.isfile(path):
    if not os.path.isdir(path):
        os.mkdir(path)
        print 'path %s created' %path
    else:
        print 'path already exists'
else:
    print 'the path provided is an existing file'
In the above example, the pathname comes from a command-line
argument, so we also needed to import sys. I could have
simply assumed that the argument was a directory path and not
a filename, but some operating systems don't allow a folder
to share a name with an existing file, so the nested test
checks for a file first instead of making that assumption.
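The nested test can also be packaged as a function, which makes it easier to call from other scripts. This is only a sketch, assuming Python 3; the name ensure_dir and the demo folder names are hypothetical, and the messages are taken from the listing above.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path as a directory unless a file or directory already has that name."""
    if os.path.isfile(path):
        return 'the path provided is an existing file'
    if os.path.isdir(path):
        return 'path already exists'
    os.mkdir(path)
    return 'path %s created' % path


if __name__ == '__main__':
    base = tempfile.mkdtemp()                  # throwaway parent directory
    folder = os.path.join(base, 'newfolder')   # hypothetical folder name
    print(ensure_dir(folder))  # first call creates the folder
    print(ensure_dir(folder))  # second call reports that it already exists
    os.rmdir(folder)
    os.rmdir(base)
```

Checking for a file first and returning early flattens the nesting while keeping the same three outcomes.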
Creating Files
So far we have looked at how to open files and how to test
for and create folders. Now we will also create a file. Files
are created with the open function, which returns a file
object:
outputfile=open('myfilename', 'wb')
outputfile is the file object, which exposes the following methods and attributes:
close closed
encoding fileno
flush isatty
mode name
newlines next
read readinto
readline readlines
seek softspace
tell truncate
write writelines
xreadlines
We call the methods of the file object to perform various
operations. In the next example, we'll iterate over a range
of numbers and write each one to a file. We'll assume that
the file will be overwritten if it already exists.
#fileouttest
fileout=open('fileout.txt', 'w')
for n in range(1,1000):
    fileout.write('%s\n' %str(n))
fileout.close()
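A variation on the same idea, sketched here under the assumption of Python 3, uses the with statement so the file is closed even if a write fails, then reads the file back with readline, readlines, seek, and read. The temporary directory is an assumption added so the demo does not leave files behind.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                  # throwaway directory for the demo
path = os.path.join(workdir, 'fileout.txt')

# The with statement closes the file automatically, even if write() raises.
with open(path, 'w') as fileout:
    for n in range(1, 1000):                  # range stops before 1000
        fileout.write('%s\n' % n)

with open(path) as filein:
    first = filein.readline()    # the first line, newline included
    rest = filein.readlines()    # every remaining line as a list
    filein.seek(0)               # rewind to the start of the file
    everything = filein.read()   # the whole file as one string

print(repr(first), len(rest), rest[-1].strip())  # prints '1\n' 998 999
os.remove(path)
os.rmdir(workdir)
```

Because range(1, 1000) stops before 1000, the file holds 999 lines, numbered 1 through 999.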
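The second parameter passed to the open function is the
mode. The basic modes are 'r' (read) and 'w' (write, which
overwrites an existing file); there is also 'a' (append).
Adding the submode 'b', as in 'rb' and 'wb', opens the file
in binary mode. On Windows, text mode translates line endings
between '\n' and '\r\n', so binary mode is needed for
non-text data or whenever the bytes must be preserved
exactly, e.g. to keep Unix-style line endings.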
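To make the effect of the mode string concrete, the following sketch, assuming Python 3 (where binary mode reads and writes bytes objects), writes a small payload with 'wb', adds to it with the standard append mode 'ab', and reads it back with 'rb'. The file name and payload are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'modes.bin')   # hypothetical file name
payload = b'first\r\nsecond\n\xff'          # mixed line endings plus a non-ASCII byte

with open(path, 'wb') as out:    # 'w' overwrites; 'b' disables newline translation
    out.write(payload)
with open(path, 'ab') as out:    # 'a' appends to the end instead of overwriting
    out.write(b'!')
with open(path, 'rb') as src:    # read the raw bytes back
    raw = src.read()

print(raw == payload + b'!')     # True: the bytes come back exactly as written
os.remove(path)
os.rmdir(workdir)
```

In binary mode the '\r\n' and '\n' sequences survive unchanged on every platform, which is the point of the 'b' submode.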
Summary
Python provides a rich set of built-in functions for file
I/O and directory management. File I/O is an essential part
of any programming language. In Python, one can quickly
become productive writing even quite advanced file-processing
scripts.