Python File Object

Overview

Files are one of the core elements of working with a computer for users and programmers. Nearly all applications create or interact with files that are stored on a device, such as a computer, server, cloud device, mobile device, tablet, hard drive, or USB drive. Files are the primary tools that we use to store persistent data that we want to access again and again. As a programmer, it is critical to become very familiar with files and how they are created, manipulated, and stored. This includes developing an understanding of how operating systems handle files and storage. Python provides a full suite of tools for file handling, which we will explore in this chapter.

The Python File Object

In Python programming, the primary tool we use to work with files is the Python file object. The file object is what we use to open files, create files, read files, write files, and close files.



File Security & Permissions
One of the primary tasks of an operating system is providing security and permissions for the file system. Computer users are granted permission levels to directories and files based on their needs. On a computer that you own, you generally have access to everything. However, in businesses and other multi-user systems, System Administrators generally have full access, and other users are assigned more limited access levels. These user levels affect a user's ability to access, read, and/or write files in various locations on the computer, on a computer network, or on a cloud system. When we run our Python programs, these access levels apply to the program as well.

For example, say a user named Bob is running a Python program that we wrote that needs to open, read and write a file in the user's Documents directory. If Bob runs the program and he has a connection and access permissions to his Documents folder then the Python program will be able to interact with the file. This is because the Python program must adhere to the user permissions of the user running the program.
However, if Bob were to change the Python program to try to access Sally's Documents directory, and he does not have access permissions to Sally's directory, then the Python program will raise an error (a PermissionError in Python) because Bob does not have proper permissions to that directory and file.
As programmers then, we need to take into account file security and permissions when writing file handling code. I will discuss approaches to this a little later on this page.
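One possible way to account for missing files and missing permissions is to wrap the open() call in a try/except block. The following is only a minimal sketch: the function name read_text_or_none and the demo path are hypothetical names chosen for illustration, and a real program might respond to these errors differently.

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the text in the file at path, or None if it cannot be opened."""
    try:
        my_file = open(path, "r")
    except FileNotFoundError:
        # The file (or one of its parent directories) does not exist.
        print("File not found: " + path)
        return None
    except PermissionError:
        # The user running the program is not allowed to read this file.
        print("Permission denied: " + path)
        return None
    file_contents = my_file.read()
    my_file.close()
    return file_contents


# Hypothetical path that should not exist, used only to trigger the error path.
missing_path = os.path.join(tempfile.gettempdir(),
                            "no_such_folder_for_demo", "Sample.txt")
result = read_text_or_none(missing_path)
print(result)
```

Because the function returns None instead of stopping the program, the calling code can decide what to do next, such as asking the user for a different file.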

File Access Approaches
Another consideration when deciding how to handle files in our Python programs is whether we want to read and write files sequentially or using random access. Sequential access means that we read or write the file in order, typically one line at a time, starting at the top of the file. Random access is the ability to move around inside a file, locate specific points in the file, and interact with its contents non-sequentially. For now, I will focus on sequential access and address random access later in this eBook.
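To make the difference concrete, here is a small preview sketch of moving around inside a file with the file object's tell() and seek() methods. It is not a full treatment of random access. The temporary file, its name, and its three short lines are assumptions made up for this illustration.

```python
import os
import tempfile

# Create a small throwaway file to experiment with (hypothetical name and content).
path = os.path.join(tempfile.mkdtemp(), "Letters.txt")
my_file = open(path, "w")
my_file.write("first\nsecond\nthird\n")
my_file.close()

my_file = open(path, "r")
first_line = my_file.readline()    # sequential: read line 1
bookmark = my_file.tell()          # remember where line 2 begins
second_line = my_file.readline()   # sequential: read line 2
my_file.seek(bookmark)             # non-sequential: jump back to the bookmark
second_again = my_file.readline()  # reads line 2 a second time
my_file.seek(0)                    # jump to the very beginning of the file
first_again = my_file.readline()   # reads line 1 a second time
my_file.close()

print(second_again, end="")
print(first_again, end="")
```

In text mode, the value returned by tell() is best treated as a bookmark to hand back to seek() rather than as a character count.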

Opening a File
To interact with a file in our Python code, we must first open the file. Opening a file involves assigning the result of the open() function to a file object variable. Inside the open() function we specify the file we want to open and the mode to use, that is, the type of access (read, write, etc.) we want to the file. You can find a full list of file access modes here.

The general form of the open() function is as follows:

file_var = open([r][Path]file_name [, file_access_mode])
Example 1:

Let's say I have a file in my Documents directory called Sample.txt that I want to open to read in my Python code. If I open that file in a plain text editor it looks like this (notice the absolute path in the title bar of the screenshot window):


To open this file in Python I can write the following ...

my_file = open(r"C:\Users\John\Documents\Sample.txt", "r")
... and, if the file exists at that location, then the variable my_file will contain an object reference to the file, opened with mode "r", which means read-only access. This means I'll be able to read the file but not write to it. At this point, the file has only been opened; nothing has been read from it and its contents are unchanged.
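If you want to see what the file object knows about the file it refers to, you can inspect a few of its attributes. This is a small sketch that uses a throwaway temporary file (a hypothetical stand-in for Sample.txt) so that it runs on any computer.

```python
import os
import tempfile

# Create an empty throwaway file so there is something to open.
path = os.path.join(tempfile.mkdtemp(), "Sample.txt")
open(path, "w").close()

my_file = open(path, "r")
file_name = my_file.name      # the path that was passed to open()
file_mode = my_file.mode      # the access mode, "r" in this case
open_state = my_file.closed   # False while the file is open
my_file.close()
closed_state = my_file.closed # True after close() has been called

print(file_name, file_mode, open_state, closed_state)
```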

Reading from a File
Once I have opened the file using the open() function, I can start reading from the file. I have several options for reading from the file: I can read it all at once into a variable, or I can read it one line at a time (sequentially). For this first example, let's read it all at once and then print the contents on our screen.

my_file = open(r"C:\Users\John\Documents\Sample.txt", "r")
file_contents = my_file.read()
print(file_contents)
Output:

This is the first line of text in my file.
This is the second line of text in my file.
This is the third line of text in my file.
This is the fourth line of text in my file.
This is the fifth line of text in my file.
Now that I have read the contents of the file into a variable I can use that variable to work with that content as needed.
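As a brief illustration of working with that content, the sketch below uses a string that stands in for the file_contents variable (only the first two lines of Sample.txt are reproduced here) and applies a few ordinary string operations to it.

```python
# Stand-in for the text returned by my_file.read() in the example above.
file_contents = ("This is the first line of text in my file.\n"
                 "This is the second line of text in my file.\n")

lines = file_contents.splitlines()       # split the text into a list of lines
number_of_lines = len(lines)
has_second = "second" in file_contents   # search the text for a word

print("Lines:", number_of_lines)
print("First line:", lines[0])
print("Mentions 'second':", has_second)
```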

Closing a File
It is important to close external resources, such as files, as soon as we are finished with them. Leaving a file open can cause conflicts and unexpected results; for example, written data may not be saved to the file until it is closed, and other programs may be unable to use the file. To close a file simply use the file method close(), like this...

my_file.close()
So, if we combine all of the code so far we have this ...

my_file = open(r"C:\Users\John\Documents\Sample.txt", "r")
file_contents = my_file.read()
my_file.close()
print(file_contents)
Notice where I placed the my_file.close() statement, on Line 3 right after I captured the contents of the file into my file_contents variable. I no longer needed the file because the file_contents variable has all of it, so I close the file right away. I can then work with the contents using my variable after that, like printing it as shown in this code.
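A common alternative to calling close() yourself is Python's with statement, which closes the file automatically when the indented block ends, even if an error occurs inside the block. The sketch below uses a temporary file with made-up content so that it can run anywhere; the file name is a hypothetical stand-in.

```python
import os
import tempfile

# Throwaway file standing in for Sample.txt.
path = os.path.join(tempfile.mkdtemp(), "Sample.txt")
with open(path, "w") as my_file:
    my_file.write("This is the first line of text in my file.\n")

# The file is open only inside the indented block.
with open(path, "r") as my_file:
    file_contents = my_file.read()

# No close() call is needed; the with statement already closed the file.
closed_after_block = my_file.closed
print(file_contents, end="")
print("Closed:", closed_after_block)
```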

Writing to a File
Next, to write (add content) to a file we first need to determine if the file we intend to write to will be a new file or if it is an existing file. This is an important distinction because it will determine which access mode we will use. The two modes we can use to write to a file are ...

File Access Mode Description
w
Opens the specified file for writing only. If the file already exists at the location specified, its existing contents are erased as soon as the file is opened, and anything we write replaces them. If the file does not already exist, it is created with the name specified.
a
Opens the specified file for appending. If the file exists, the file pointer is positioned at the end of the file and the file is set to append mode, that is, any new data written to the file will be appended (added) to the end of the file. If the specified file does not already exist, then the file is created using the file name specified, the file pointer will be at the beginning of the empty file and the file will be set for writing.
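The difference between the two modes can be seen in a short experiment. This sketch uses a throwaway temporary file (the name ModeDemo.txt is hypothetical) rather than a real document, since "w" erases whatever the file held before.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ModeDemo.txt")

# "w" on a file that does not exist yet: the file is created.
my_file = open(path, "w")
my_file.write("first\n")
my_file.close()

# "w" on an existing file: the old content is erased.
my_file = open(path, "w")
my_file.write("second\n")
my_file.close()

# "a" on an existing file: the new text is added after the old content.
my_file = open(path, "a")
my_file.write("third\n")
my_file.close()

my_file = open(path, "r")
final_contents = my_file.read()
my_file.close()
print(final_contents, end="")
```

After the three steps the file holds only "second" and "third"; the text "first" was lost when the file was reopened in "w" mode.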

Examples of File Access Mode: "w":

Continuing with our Sample.txt file from above, since it is an existing file with multiple lines of content, the following code example will open the file for writing. Opening the file in this mode immediately erases its existing content, so any data we write afterward replaces what was there.

my_file = open(r"C:\Users\John\Documents\Sample.txt", "w")

If we use the same line of code with write mode, but with a file name that does not exist, that file will be created first, and then when we write data to it the data will be added beginning on the first line of the file.

my_file = open(r"C:\Users\John\Documents\NewFile.txt", "w")

Example of File Access Mode: "a":

Consider our Sample.txt file again. This time, if we intend to write content to it without overwriting it, that is, we want to add data to it without affecting the existing content, then we change the access mode to append "a".

my_file = open(r"C:\Users\John\Documents\Sample.txt", "a")

Full Code Examples:

Example 1:

my_file = open(r"C:\Users\John\Documents\Sample.txt", "w")
my_file.write("This is a line added to my Sample.txt file by my Python program.")
my_file.close()
Result: Notice in the screenshot below of our Sample.txt file there is only one line in the file now, the line from our Python program above. Since this file was an existing file and we used the write "w" access mode, the original file contents were erased when the file was opened and replaced by the text from the my_file.write() statement.


Example 2:

my_file = open(r"C:\Users\John\Documents\NewFile.txt", "w")
my_file.write("This is a line added to my NewFile.txt file by my Python program.\n")
my_file.write("This is another line added to my NewFile.txt file by my Python program.\n")
my_file.close()
Result: Notice in the screenshot below that in this code example, we created a new file and wrote two lines of text to it with the two write() statements.


Example 3:

my_file = open(r"C:\Users\John\Documents\Sample.txt", "a")
my_file.write("This is a line appended to my Sample.txt file by my Python program.\n")
my_file.write("This is another line appended to my Sample.txt file by my Python program.\n")
my_file.write("This is another line appended to my Sample.txt file by my Python program.\n")
my_file.write("This is another line appended to my Sample.txt file by my Python program.\n")
my_file.write("This is another line appended to my Sample.txt file by my Python program.\n")
my_file.write("This is yet another line appended to my Sample.txt file by my Python program.\n")
my_file.close()
Result: The screenshot below shows the lines appended to the Sample.txt file by the write() statements.



Reading & Writing Files with Loops

In all of the examples above, we read the entire contents of the file into one variable in one read() operation. Another common approach is to read a file one line at a time. This is frequently used when we need to process each line in a file in some way.

Example 1:

Using the Sample.txt file again, this time we'll read the file one line at a time and for each line, we'll print a line number, the length of the line, and the line itself. This is an example of reading a file sequentially so that we can do something with each line as we read them.

line_count = 0
my_file = open(r"C:\Users\John\Documents\Sample.txt", "r")
line = my_file.readline()
while line:
    line_count += 1
    print(str(line_count) + ":\t[Line Length: " + str(len(line)) + "]\t" + line, end="")
    line = my_file.readline()
my_file.close()
Output:

1: [Line Length: 43]	This is the first line of text in my file.
2: [Line Length: 44]	This is the second line of text in my file.
3: [Line Length: 43]	This is the third line of text in my file.
4: [Line Length: 44]	This is the fourth line of text in my file.
5: [Line Length: 43]	This is the fifth line of text in my file.
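The same kind of report can also be produced with a for loop, because a file object can be looped over directly, yielding one line per pass. The sketch below is one possible variation, not a replacement for the while loop above. It uses a temporary file holding the first three lines of Sample.txt so that it can run on any computer.

```python
import os
import tempfile

# Throwaway copy of the first three lines of Sample.txt.
path = os.path.join(tempfile.mkdtemp(), "Sample.txt")
my_file = open(path, "w")
my_file.write("This is the first line of text in my file.\n")
my_file.write("This is the second line of text in my file.\n")
my_file.write("This is the third line of text in my file.\n")
my_file.close()

line_count = 0
report_lines = []
my_file = open(path, "r")
for line in my_file:   # each pass through the loop reads the next line
    line_count += 1
    report_lines.append(str(line_count) + ":\t[Line Length: " +
                        str(len(line)) + "]\t" + line)
my_file.close()

for report_line in report_lines:
    print(report_line, end="")
```

The loop ends on its own when there are no more lines, so there is no need to call readline() before the loop and again at the bottom of it.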
Example 2:

In this example, we will write a new file sequentially, using input from the user to create each line and writing each line to the file one at a time. The result of this example is called a comma-delimited file; notice the file extension of .csv on Code Line 3. This file format is very common for transmitting data records between systems.

print("Enter first names and ages.\n"
      "Press Enter with no entries to finish.\n")
my_file = open(r"C:\Users\John\Documents\NamesAges.csv", "w")
while True:
    first_name = input("First Name: ")
    age = input("Current Age: ")
    if first_name == "" and age == "":
        break
    else:
        my_file.write(first_name + "," + age + "\n")
        print()
my_file.close()
print("\nDone!")
Sample Run & Output:

Enter first names and ages.
Press Enter with no entries to finish.

First Name: Bob
Current Age: 40

First Name: Sally
Current Age: 29

First Name: Raul
Current Age: 37

First Name:
Current Age:

Done!
Here is the resulting .csv file in a plain text editor:



Handling Records Stored in Files

In our last example above I mentioned that .csv files are a very common format for transmitting data records between systems.

Concept: Records

In programming and database terms, a record is a set of data elements that, together, represent some entity, such as a customer, an order, or a catalog product. Each record contains one or more attributes that describe the entity. For example, a customer record might include a customer ID, their first name, last name, address, city, state, zip code, phone number, and email address.

Using the Concept definition for the record above, a data file containing many customer records is often in the .csv file format where each line in the file is one record (one customer) and each of the record's attributes is separated by commas. Here's an example .csv file containing customer records:



Notice that every line is one customer and each customer has the same attributes of a customer ID, their first name, last name, address, city, state, zip code, phone number, and email address. Using a looping technique similar to those shown in the previous section above, we can read this .csv file and process the records, which might include producing a report, adding the records to a database, etc. In the following example, we'll read the file and produce a report of the customer records.

Example: This example is a bit more substantial than most we've seen thus far. It combines the current topic of reading records from a file with several other concepts we have covered previously. I recommend that you study this carefully and experiment with it in your IDE. Pay particular attention to how we are reading the records from the customer .csv file. I've included functions for the report header and footer, along with a global variable and techniques for handling slicing and concatenation that would be a good review for you as well.

This example also illustrates a pattern you will commonly see when working with files: we read the file one line at a time, and for each line we do something with the attributes in the record. In this case, we print each attribute and handle each one separately so that we can establish proper column widths or apply slicing and concatenation (to the phone number, for example). In addition, we use a counter to count the number of records in the file so that we can print the number as a summary in the footer of the report. Be sure to read through the Code Details under the sample output below.

# Global Variables
report_width = 120


# Functions
def print_header():
    report_title = "C u s t o m e r  R e p o r t"
    print("-" * report_width)
    print(" " * int((report_width / 2) - len(report_title) / 2), end="")
    print(report_title)
    print("-" * report_width)
    print("ID".ljust(8), end="")
    print("First".ljust(12), end="")
    print("Last".ljust(12), end="")
    print("Address".ljust(25), end="")
    print("City".ljust(15), end="")
    print("ST".ljust(5), end="")
    print("Zip".ljust(7), end="")
    print("Phone".ljust(17), end="")
    print("Email".ljust(30))
    print("-" * report_width)

def print_footer(counter):
    print("-" * report_width)
    print("Number of Customers: " + str(counter))
    print("-" * report_width)

# Main Program
customer_count = 0
print_header()
my_file = open(r"Customers.csv", "r")
for line in my_file:
    customer_record = line.rstrip().split(',')
    print(customer_record[0].ljust(8), end="")
    print(customer_record[1].ljust(12), end="")
    print(customer_record[2].ljust(12), end="")
    print(customer_record[3].ljust(25), end="")
    print(customer_record[4].ljust(15), end="")
    print(customer_record[5].ljust(5), end="")
    print(customer_record[6].ljust(7), end="")
    print("(" + customer_record[7][0:3] + ")" +
          customer_record[7][3:6] + "-" +
          customer_record[7][6:].ljust(8), end="")
    print(customer_record[8].ljust(30), end="")
    print()
    customer_count += 1
my_file.close()
print_footer(customer_count)

Output:

------------------------------------------------------------------------------------------------------------------------
                                    C u s t o m e r  R e p o r t
------------------------------------------------------------------------------------------------------------------------
ID      First       Last        Address                  City           ST   Zip    Phone            Email
------------------------------------------------------------------------------------------------------------------------
123456  Daffy       Duck        123 Quackville Road      Feathers       UT   84555  (222)333-4444    daff@quack.com
234567  Marvin      Martian     234 Crater Lane          Mars           UT   84777  (333)444-5555    marv@mars.org
345678  Tazmanian   Devil       345 Taz Street           Mania          UT   84222  (444)555-6666    taz@devdev.net
456789  Bugs        Bunny       456 Carrot Blvd.         Hopp           UT   84999  (555)777-2222    buggs@hoppy.com
567890  Space       Ghost       999 Space Lane           Orbit          UT   84333  (666)777-8888    ghostie@rocket.com
678901  Yogi        Bear        454 Bear Blvd.           Bearville      UT   84524  (777)888-9999    yogi@bear.net
789012  Fred        Flintstone  825 Rock Street          Bedrock        UT   84846  (888)999-0000    freddy@stone.com
890123  Scooby      Doo         444 Snacks Street        Doggo          UT   84000  (999)111-2222    scoob@snackers.com
901234  Mickey      Mouse       356 Squeek Lane          Cheese         UT   84567  (111)222-3333    mick@mouse.com
466577  Charlie     Brown       987 Snoopy Street        Chuck          UT   84575  (234)645-7737    chuck@peanuts.com
------------------------------------------------------------------------------------------------------------------------
Number of Customers: 10
------------------------------------------------------------------------------------------------------------------------

Code Details: After the print_footer() function completes, execution returns to the main program, which ends since there are no more code lines after the function call.

Libraries for File Handling

There are numerous Python library modules available for handling different types of files (csv, json, xml, etc.) and different file operations (reading, writing, parsing, etc.). In this section I demonstrate a few of the common libraries. Keep in mind, though, that this is only a small sample of what is available. You can find more file handling modules by searching the Python Package Index.

CSV Files

CSV (Comma Separated Values) is a file format used to store and exchange data between different software applications. CSV files are plain text files that store data in a tabular format, with each row representing a record and each column representing a field of the record. The fields are separated by commas, hence the name "comma-separated values". The first row of a CSV file typically contains the headers for each column, while subsequent rows contain the data.

CSV files are widely used because they are simple, lightweight, and can be easily imported into and exported from a variety of software applications, including spreadsheets, databases, and programming languages. They are also human-readable, making them easy to edit and understand.

The delimiter that separates fields is usually a comma. However, other characters such as semicolons or tabs can also be used as delimiters, depending on the software application used to create or read the file. To avoid problems when a delimiter character appears within a field, the field can be enclosed in quotes. Double quotes are typically used for this purpose.

CSV files are typically encoded in ASCII or UTF-8, which are widely supported and can be read by most software applications. Each row in a CSV file is separated by a line break, which can be represented using different characters depending on the operating system. For example, Windows uses a carriage return and line feed sequence ("\r\n"), while Unix-based systems use a line feed ("\n") character. CSV files usually have a ".csv" file extension, which helps identify them as CSV files.

In Python, we can use the csv library to work with csv files. You can find full documentation for this library here. Here is an example of using the csv library:

Code

import csv

with open('Customers.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['ID','FirstName', 'LastName', 'Address', 'City', 'State', 'Zip','Phone','Email'])
    writer.writerow(['123456','Daffy','Duck','123 Quackville Road','Feathers','UT','84555','(222)333-4444','daff@quack.com'])
    writer.writerow(['234567','Marvin','Martian','234 Crater Lane','Mars','UT','84777','(333)444-5555','marv@mars.org'])
    writer.writerow(['345678','Tazmanian','Devil','345 Taz Street','Mania','UT','84222','(444)555-6666','taz@devdev.net'])
    writer.writerow(['456789','Bugs','Bunny','456 Carrot Blvd.','Hopp','UT','84999','(555)777-2222','buggs@hoppy.com'])
    writer.writerow(['567890','Space','Ghost','999 Space Lane','Orbit','UT','84333','(666)777-8888','ghostie@rocket.com'])

with open('Customers.csv', 'a', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['678901','Yogi','Bear','454 Bear Blvd.','Bearville','UT','84524','(777)888-9999','yogi@bear.net'])
    writer.writerow(['789012','Fred','Flintstone','825 Rock Street','Bedrock','UT','84846','(888)999-0000','freddy@stone.com'])
    writer.writerow(['890123','Scooby','Doo','444 Snacks Street','Doggo','UT','84000','(999)111-2222','scoob@snackers.com'])
    writer.writerow(['901234','Mickey','Mouse','356 Squeek Lane','Cheese',' UT','84567','(111)222-3333','mick@mouse.com'])
    writer.writerow(['466577','Charlie','Brown','987 Snoopy Street','Chuck','UT','84575','(234)645-7737','chuck@peanuts.com'])

with open('Customers.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print("Row: ", row)
    print()
    print(row[0], "\t" + row[1] + " " + row[2] + "\n\t" + row[3] + "\n\t" +
          row[4] + ", " + row[5] + " " + row[6] + "\n\t" + row[7] + "\n\t" +
          row[8] + "\n")

Output

Row:  ['ID', 'FirstName', 'LastName', 'Address', 'City', 'State', 'Zip', 'Phone', 'Email']
Row:  ['123456', 'Daffy', 'Duck', '123 Quackville Road', 'Feathers', 'UT', '84555', '(222)333-4444', 'daff@quack.com']
Row:  ['234567', 'Marvin', 'Martian', '234 Crater Lane', 'Mars', 'UT', '84777', '(333)444-5555', 'marv@mars.org']
Row:  ['345678', 'Tazmanian', 'Devil', '345 Taz Street', 'Mania', 'UT', '84222', '(444)555-6666', 'taz@devdev.net']
Row:  ['456789', 'Bugs', 'Bunny', '456 Carrot Blvd.', 'Hopp', 'UT', '84999', '(555)777-2222', 'buggs@hoppy.com']
Row:  ['567890', 'Space', 'Ghost', '999 Space Lane', 'Orbit', 'UT', '84333', '(666)777-8888', 'ghostie@rocket.com']
Row:  ['678901', 'Yogi', 'Bear', '454 Bear Blvd.', 'Bearville', 'UT', '84524', '(777)888-9999', 'yogi@bear.net']
Row:  ['789012', 'Fred', 'Flintstone', '825 Rock Street', 'Bedrock', 'UT', '84846', '(888)999-0000', 'freddy@stone.com']
Row:  ['890123', 'Scooby', 'Doo', '444 Snacks Street', 'Doggo', 'UT', '84000', '(999)111-2222', 'scoob@snackers.com']
Row:  ['901234', 'Mickey', 'Mouse', '356 Squeek Lane', 'Cheese', ' UT', '84567', '(111)222-3333', 'mick@mouse.com']
Row:  ['466577', 'Charlie', 'Brown', '987 Snoopy Street', 'Chuck', 'UT', '84575', '(234)645-7737', 'chuck@peanuts.com']

466577  Charlie Brown
        987 Snoopy Street
        Chuck, UT 84575
        (234)645-7737
        chuck@peanuts.com
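The csv library can also return each row as a dictionary keyed by the header row, using csv.DictReader, which lets us refer to fields by name instead of by position. The sketch below is a small variation on the example above. It writes a throwaway file (CustomersDemo.csv is a hypothetical name) with only four of the columns, and the "Apt 2" in the first address is made up to show how a comma inside a field is handled.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "CustomersDemo.csv")

with open(path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["ID", "FirstName", "LastName", "Address"])
    # This address contains a comma, so the writer encloses it in double quotes.
    writer.writerow(["123456", "Daffy", "Duck", "123 Quackville Road, Apt 2"])
    writer.writerow(["234567", "Marvin", "Martian", "234 Crater Lane"])

# Look at the raw text to see the quotes that were added.
with open(path, "r", newline="") as csvfile:
    raw_text = csvfile.read()

# Read the records back as dictionaries keyed by the header names.
with open(path, "r", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    records = list(reader)

full_names = [record["FirstName"] + " " + record["LastName"] for record in records]
print(raw_text, end="")
print(full_names)
print(records[0]["Address"])
```

Reading by field name means the code keeps working if the columns are later reordered, as long as the header row is updated to match.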


JSON Files

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. JSON files are text files that contain data in the JSON format. JSON is widely used for transmitting data between a client and a server in web applications, and is also used as a data storage format in many applications.

JSON data is composed of key-value pairs, where each key is a string and the value can be a string, number, boolean, null, array, or another JSON object. JSON objects are enclosed in curly braces {} and consist of zero or more key-value pairs, separated by commas.

JSON files can be created and edited using a simple text editor or an integrated development environment (IDE). Many programming languages provide built-in support for working with JSON data, including parsing and generating JSON files.

In Python, we can use the json library to work with json files. You can find full documentation for this library here.
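As a minimal sketch of the json library, the code below writes one customer record to a JSON file with json.dump() and reads it back with json.load(). The key names and the temporary file name are assumptions chosen for this illustration; they are not a required layout.

```python
import json
import os
import tempfile

# One customer record as a Python dictionary (key names are illustrative).
customer = {
    "id": "123456",
    "first_name": "Daffy",
    "last_name": "Duck",
    "state": "UT",
    "phones": ["(222)333-4444"],
}

path = os.path.join(tempfile.mkdtemp(), "Customer.json")

# Write the dictionary to the file as JSON text.
with open(path, "w") as json_file:
    json.dump(customer, json_file, indent=4)

# Read the JSON text back into a Python dictionary.
with open(path, "r") as json_file:
    loaded_customer = json.load(json_file)

print(loaded_customer["first_name"], loaded_customer["last_name"])
print(loaded_customer["phones"][0])
```

A Python dictionary maps to a JSON object and a Python list maps to a JSON array, which is why the record survives the round trip unchanged.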

XML Files

XML (Extensible Markup Language) is a markup language that is widely used for data exchange and storage on the web. XML files are plain text files that contain data in a structured format. The data is enclosed in tags, which are similar to HTML tags, but have no predefined meaning. XML files can be used to represent a variety of data, including documents, configuration files, and data records. They are widely used in web services, as well as in software applications that require data exchange and interoperability between different systems.

XML files are hierarchical in nature, with each tag representing a node in a tree-like structure. The root node is the top-level node, and all other nodes are its descendants. Each node can have one or more child nodes, and may also have attributes that provide additional information about the node.

XML files typically start with an XML declaration, which identifies the version of the XML standard being used and any other special features of the document. After the XML declaration, the document typically contains a root element, which encloses all other elements in the document. Elements can contain other elements, as well as text data. They can also have attributes, which are enclosed in the opening tag and provide additional information about the element.

XML files can be created and manipulated using various programming languages, including Python. The Python standard library provides several modules for working with XML files, including xml.etree.ElementTree, which provides a lightweight and easy-to-use API for parsing and creating XML files.

In Python, we can use the modules in the standard library's xml package, such as xml.etree.ElementTree, to work with xml files. You can find full documentation for this library here.
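Here is a minimal sketch using xml.etree.ElementTree to build a small XML document, save it, and read it back. The tag names (customers, customer, first_name, last_name), the id attribute, and the temporary file name are assumptions made up for this illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small tree: a root element with one customer child.
root = ET.Element("customers")
customer = ET.SubElement(root, "customer", id="123456")
ET.SubElement(customer, "first_name").text = "Daffy"
ET.SubElement(customer, "last_name").text = "Duck"

# Save the tree to a file, including the XML declaration at the top.
path = os.path.join(tempfile.mkdtemp(), "Customers.xml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Parse the file and walk through the customer elements.
tree = ET.parse(path)
found = []
for element in tree.getroot().findall("customer"):
    found.append((element.get("id"),
                  element.find("first_name").text,
                  element.find("last_name").text))

print(found)
```

Each call to SubElement() adds a child node under its parent, which mirrors the tree-like structure described above.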

Practice Problems

Problem 1

Write a Python program that performs the following tasks based on a text file:

  • Read the contents of the given text file.
  • Report the number of words in the file.
  • Remove all punctuation from the content (do this after reading the file, not in the file itself).
  • Create a list of unique words from the file contents.
  • Report the longest word in the file and its length.
  • Report the number of unique words in the file.
  • Sort the list of unique words.
  • Print a columnar list of unique words with their number of occurrences.
Note 1: If you would like to follow along with the provided Solution to this Practice Problem you can download the text file I use for the solution code here.
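If you are unsure where to start, the sketch below shows one possible approach on a tiny made-up sentence. It is not the provided Solution. The temporary file, its one-line content, and the choice to convert words to lowercase before counting are all assumptions made for this illustration.

```python
import os
import string
import tempfile

# Tiny throwaway text file standing in for the practice file.
path = os.path.join(tempfile.mkdtemp(), "Practice.txt")
with open(path, "w") as my_file:
    my_file.write("The quick cat saw the dog. The dog ran!\n")

with open(path, "r") as my_file:
    file_contents = my_file.read()

word_count = len(file_contents.split())

# Remove punctuation from the content (not from the file itself).
cleaned = file_contents.translate(str.maketrans("", "", string.punctuation))
words = cleaned.lower().split()

# Count how many times each word occurs.
occurrences = {}
for word in words:
    occurrences[word] = occurrences.get(word, 0) + 1

unique_words = sorted(occurrences)
longest_word = max(unique_words, key=len)

print("Words:", word_count)
print("Unique words:", len(unique_words))
print("Longest word:", longest_word, "(" + str(len(longest_word)) + ")")
for word in unique_words:
    print(word.ljust(12) + str(occurrences[word]))
```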




 





© 2023 John Gordon
Cascade Street Publishing, LLC