Parsing Data
Overview
In Chapter 2, I introduced numerous concepts about data and ways in which we might acquire data to process in our programs. String data in the form of characters, words, phrases, sentences, paragraphs, etc. is very common in computing and often needs to be processed in many different ways. Examples of processing data include counting characters or words, locating instances of particular words or phrases, extracting key words, names, places, and dates (often called entities), validating important information in textual data, and many others. On this page we will explore parsing strings of several types.
While we know how to read the entire contents of a file into a variable and print it, we often need to analyze the contents more closely than as one whole string. To examine long strings in more detail, we use a technique called parsing.
One of the first decisions we need to make when parsing string data is what to do about punctuation. In some cases, we'll want to preserve punctuation during parsing, but most commonly we'll remove punctuation so that our parsed tokens are independent of punctuation that existed in the original string. There are a number of approaches we can use to remove punctuation from a string. For now, we'll use repetition to do this, and then later we'll learn more efficient approaches.
Let's use the string "To be or not to be, that is the question." as an example. In this string, there are two punctuation characters, the comma and the ending period. We could simply use the string replace() method to remove the two punctuation characters, like this:
s = "To be or not to be, that is the question."
print(s)
s = s.replace(",","")
print(s)
s = s.replace(".","")
print(s)
Output
To be or not to be, that is the question.
To be or not to be that is the question.
To be or not to be that is the question
While this works, it's really restricted to the one string example we have right now. What about other strings with different punctuation in them?
A more general approach is to keep a list of the punctuation characters we want to eliminate and use comparison and repetition (a loop) to remove them from any string. Here's an example:
shake_string = "To be or not to be, that is the question."
out_string = ""
punctuation = ["!", "\"", "#", "$", "%", "&", "\'",
               "(", ")", "*", "+", ",", "-", ".",
               "/", ":", ";", "<", "=", ">", "?",
               "@", "[", "\\", "]", "^", "_", "{",
               "|", "}", "~", "`", "."]
for s in shake_string:
    if s not in punctuation:
        out_string += s
print(shake_string)
print(out_string)
Output
To be or not to be, that is the question.
To be or not to be that is the question
Code Details
- Note:
- This code example combines several key concepts we have learned thus far: variables, strings, lists, for loops, decisions, indentation, nested indentation, and assignment operator combinations (+= in this example).
- Code Line 1: First we assign our string to a variable.
- Code Line 2: We'll also create another variable that we will use to create a new string without the punctuation.
- Code Lines 3 thru 7: Next, we'll create a reusable list containing all of the punctuation characters we want to remove from any string we process in this code, that is, any string we assign to our shake_string variable.
- Notes:
- There are special cases in which punctuation should be preserved, such as names like O'Reilly, so you can adjust the list of characters in the punctuation list as needed.
- Later in this chapter we will see a better approach to reusing segments of code, called Programmer Defined Functions.
- Code Line 8: Now we need our repetition code to loop through the string, character by character. The for loop is a good solution for this need since we can use the for in construct. The code in this example, for s in shake_string:, could be read as "for each character (s) in the string stored in the shake_string variable".
- Code Line 9: Then, inside the loop, we include a decision that checks whether the character (s) in each iteration of the loop is not in the list of punctuation characters stored in the punctuation list we defined on Code Line 3. Notice that this if construct is very similar to the for loop construct above. The in operator is very useful with iterables, like lists. Also, notice that we do not need the else part of our decision structure because we included the not in our decision condition.
- Code Line 10: When the decision condition indicates that the character (s) in the current iteration is not punctuation, then we add that character to our new string out_string. So, as the loop iterates, each non-punctuation character gets added to the out_string.
- Note:
- You can add a line of code here if you want to see each iteration of the loop to watch the out_string being built one character at a time during each iteration, like this:
for s in shake_string:
    if s not in punctuation:
        out_string += s
        print(out_string) # << Add this line to see each character added per iteration
print(shake_string)
print(out_string)
- Code Line 11: Then, after the loop completes (based on the length of the string), we print the original string so we can compare to the out_string.
- Code Line 12: Lastly, we print the out_string to see the results of our processing.
- Notes:
- There are more efficient approaches for removing punctuation from strings, such as regular expressions, the string translate method, the filter function, and list comprehensions, all of which we will explore later.
- I would recommend changing the original string to something else with other punctuation included and re-running this to see the results and test the processing.
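As a cross-check on the hand-typed list, here is a minimal sketch that runs the same loop against the string.punctuation constant from Python's standard library. It is not part of the example above, and the import statement is something we have not covered yet, so treat it as a preview rather than the method used on this page. Note that this constant includes the apostrophe, so a name like O'Reilly would lose it.

```python
import string

shake_string = "To be or not to be, that is the question."
out_string = ""

# string.punctuation is a ready-made string of ASCII punctuation characters,
# so the "in" operator works on it just as it does on our list.
for s in shake_string:
    if s not in string.punctuation:
        out_string += s

print(shake_string)
print(out_string)
```

The loop and the decision are unchanged; only the source of the punctuation characters differs.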
Depending on the reason we are parsing strings, in addition to dealing with punctuation, we may also need to consider whether to maintain the case of the objects in the original string or to make them all the same. In some circumstances, like the need to preserve capitalization of proper nouns, it is important to maintain the original case. In those circumstances, we would not need to do anything to the string after removing punctuation. However, in other circumstances, like the need to easily compare the objects in the original string with other objects, it may be best to set the entire string to the same case (upper or lower).
For our purposes here, we will set the case to all lower case as we will be comparing objects from our original string with search words and it will be simpler to do so if the cases are all the same. The following code is identical to the code in the punctuation removal section above, other than the addition of the lower() method to the variable s on Code Line 10.
shake_string = "To be or not to be, that is the question."
out_string = ""
punctuation = ["!", "\"", "#", "$", "%", "&", "\'",
               "(", ")", "*", "+", ",", "-", ".",
               "/", ":", ";", "<", "=", ">", "?",
               "@", "[", "\\", "]", "^", "_", "{",
               "|", "}", "~", "`", "."]
for s in shake_string:
    if s not in punctuation:
        out_string += s.lower() # << This is the change to lower case the output string.
print(shake_string)
print(out_string)
The decision to change case or not is often indicated in data processing requirements or becomes obvious as we develop our solutions.
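To see why case matters when comparing, here is a small sketch. The variable names word and search_word are just illustrative choices for this example.

```python
word = "To"
search_word = "to"

# A plain comparison is case-sensitive, so "To" and "to" do not match.
print(word == search_word)

# Lower-casing both sides first makes the comparison ignore case.
print(word.lower() == search_word.lower())
```

The first comparison is false and the second is true, which is the reason we lower-case the string before comparing tokens with search words.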
Now that we have learned the fundamentals of dealing with punctuation, we'll learn to parse strings. The primary string method we use to parse strings is the split() method. This method divides a string into tokens (substrings) based on a specified delimiter (separator) and stores the tokens in a list. By default, the delimiter is any run of whitespace (spaces, newlines, tabs, etc.); otherwise, split() will divide the string using the separator that we specify. The result of using the split() method is a Python list containing the divided (parsed) parts (tokens) of the original string.
Here are a few examples of using split() with the default delimiter and using a specified delimiter:
Example 1
# Using split() with the default delimiter (that is, none specified)
txt = "This is a string."
lst = txt.split()
print(lst)
Output
['This', 'is', 'a', 'string.']
Code & Output Details
- Code Line 2: On this line I initialize a variable txt to a string.
- Code Line 3: Next, I initialize a variable to receive the result of the string's split() method.
- Code Line 4: When I print the resulting list, we can see in the output that the string has been parsed (split) based on the default delimiter of whitespace, since I did not specify the delimiter.
- Notes:
- Note that each element of the resulting list contains one word from the original string.
- Also, the spaces in the original string are not part of the result; they have dropped out.
- And, note that punctuation remains in the string as part of the closest word. In this example, the ending period appears as part of the last word in the list. This is important to be aware of: depending on your application and reason for parsing, you may need to take steps to remove the punctuation.
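The default delimiter behaves a little differently from an explicit single space, which is easy to miss. The following sketch uses a made-up string containing a double space, a tab, and a newline to show the difference.

```python
txt = "one  two\tthree\nfour"

# No delimiter: split on any run of whitespace (spaces, tabs, newlines).
default_tokens = txt.split()

# Explicit single space: tabs and newlines are NOT treated as delimiters,
# and the double space produces an empty token.
space_tokens = txt.split(" ")

print(default_tokens)
print(space_tokens)
```

With the default, we get four clean tokens. With " " as the delimiter, we get an empty token for the double space, and the tab and newline stay inside the last token.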
Example 2
If we use the same code again, but this time include a specific delimiter, such as the letter s, we can see different results:
# Using split() and specifying a delimiter
txt = "This is a string."
lst = txt.split('s')
print(lst)
Output
['Thi', ' i', ' a ', 'tring.']
Code & Output Details
- Code Line 2: On this line I initialize a variable txt to a string.
- Code Line 3: Next, I initialize a variable to receive the result of the string's split() method; this time I included the delimiter 's'.
- Code Line 4: When I print the resulting list, we can see in the output that the string has been parsed (split) based on the letter s in the original string.
- Notes:
- The original string contained three s characters, so the resulting list contains four tokens.
- Since the split() delimiter can be any non-empty string, including a single letter, using 's' as our delimiter parsed the string on that character.
- Note that the 's' characters in the original string have dropped out of the tokens. This is because Python treats those characters as the delimiter, which is not considered part of the output.
Example 3
Another (more common) example of using split() with a delimiter is comma-separated data where there are words or strings separated by commas, like this:
# Using split() and specifying comma as the delimiter
txt = "Bob Smith, 1234 Nowhere Lane, Someplace, UT, 84999"
lst = txt.split(',')
print(lst)
Output
['Bob Smith', ' 1234 Nowhere Lane', ' Someplace', ' UT', ' 84999']
Code & Output Details
- Code Line 2: On this line I initialize a variable txt to a comma-separated string containing a name, address, city, state, and zip code.
- Code Line 3: Next, I initialize a variable to receive the result of the string's split() method, using the comma as the delimiter.
- Code Line 4: When I print the resulting list, we can see in the output that the string has been parsed (split) based on the commas in the original string.
- Notes:
- The original string contained four commas, so the resulting list contains five tokens.
- Since the split() delimiter is the comma, we now have a list containing each of the data elements for the name, address, city, state, and zip code.
- Note that the comma delimiters in the original string have dropped out of the tokens. This is because Python treats those characters as the delimiter, which is not considered part of the output.
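Notice in the output of Example 3 that every token after the first begins with a space, because only the comma was removed. One way to tidy this up, sketched here with a loop and the string strip() method, is to build a second list of cleaned tokens. The list name fields is just a name chosen for this sketch.

```python
txt = "Bob Smith, 1234 Nowhere Lane, Someplace, UT, 84999"
fields = []

# strip() removes whitespace from both ends of each token
for tok in txt.split(','):
    fields.append(tok.strip())

print(fields)
```

Each element of fields now holds just the data, without the leading space left behind by the split.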
Once we have parsed a string, and have a list of the tokens, we can use those tokens for various purposes. Here are some examples:
Use Case: Counting Words
A common task to perform with a parsed string is to count the number of words in the string. As with most things in programming, there's more than one way to count words in a string. Since we are focused on repetition structures in this chapter, we'll look at an approach using repetition for now, and will learn more approaches as we proceed. Since the tokens of a string are stored in a list as a result of using the split() method, counting the number of words in a string is actually counting tokens in a list. Given this, we can simply iterate through the list using the for in construct and use an accumulator variable to count the number of iterations through the list.
Here's an example:
# For the sake of simplicity and focus, we will assume that punctuation
# has already been removed from the string as described above.
txt = "to be or not to be that is the question"
accumulator = 0
lst = txt.split()
for tok in lst:
    accumulator += 1
print("Number of words (tokens) in the string:", accumulator)
Output
Number of words (tokens) in the string: 10
Code & Output Details
- Code Line 3: This line assigns our string to a variable.
- Code Line 4: We also create an accumulator variable that we use to keep a running total (accumulation) of the count of tokens in the list.
- Code Line 5: Next we'll use the split() method of the string variable to divide the string into individual words (tokens). The result of using split() is the creation of a list containing all of the tokens from the split string as individual objects.
- Code Line 6: On this line we write our repetition (loop) using the for in structure. We could read this as "for each token (tok) in the list (lst)".
- Code Line 7: Inside the loop, our code block accumulates the count of the number of tokens (words) in the list that came from the string.
- Code Line 8: After the loop finishes, we print a summary statement and the number of tokens counted by the loop. This is the count of the number of words parsed from the original string.
Alternative Approach Using a Python Function
Because the split() method stores the string's tokens in a list, there is a much simpler approach of counting the number of tokens in the list without the need for looping at all, that is, using the len() function. Here's an example:
txt = "To be or not to be that is the question"
lst = txt.split()
print("Number of words:", len(lst))
Note that in this alternative code, there is no need for the accumulator or the loop; in the print statement we can simply use the len() function instead. Since we are focused on repetition, however, we will use repetition in the following use cases as well.
Use Case: Counting Specific Words
In this use case, we want to count the number of times a particular word (token) is in a string. The code to do this is very similar to the code we wrote in the previous example use case for counting words. The key difference is that in the code block inside of the loop, we'll need a conditional decision to determine if each word in the list matches the specified search word. When it matches, we'll increment our accumulator. When the token does not match, then we'll skip to the next token. Here's the code to accomplish this:
txt = "To be or not to be that is the question"
search_word = "be"
accumulator = 0
lst = txt.split()
for tok in lst:
    if tok == search_word:
        accumulator += 1
print("Number of occurrences of the word '" + search_word + "' is", accumulator, end=".")
Output
Number of occurrences of the word 'be' is 2.
Code & Output Details
- Code Line 1: This line assigns our string to a variable.
- Code Line 2: On this line we create another variable that will hold the word we want to count in the string, we'll call it our search_word.
- Code Line 3: We also create an accumulator variable that we use to keep a running total (accumulation) of the count of tokens that match the search_word in the list.
- Code Line 4: Next we'll use the split() method of the string variable to divide the string into individual words (tokens). The result of using split() is the creation of a list containing all of the tokens from the split string as individual objects.
- Code Line 5: On this line we write our repetition (loop) using the for in structure. We could read this as "for each token (tok) in the list (lst)".
- Code Line 6: Inside the loop, our code block starts with a decision statement based on the condition of whether the token in each iteration matches the search_word.
- Code Line 7: If the token matches the search word, we increment the accumulator by 1.
- Code Line 8: After the loop finishes, we print a summary statement and the number of tokens counted by the loop. This is the number of tokens in the original string that match the search_word.
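The comparison on Code Line 6 is case-sensitive, which matters for a word like "to" that appears in this string both capitalized and not. The sketch below counts the same search word two ways so the difference is visible. The two counter names are chosen just for this illustration.

```python
txt = "To be or not to be that is the question"
search_word = "to"

exact_count = 0   # matches that respect case
lower_count = 0   # matches after lower-casing each token

for tok in txt.split():
    if tok == search_word:
        exact_count += 1
    if tok.lower() == search_word:
        lower_count += 1

print("Case-sensitive matches:", exact_count)
print("Case-insensitive matches:", lower_count)
```

The case-sensitive count misses the capitalized "To" at the start of the string, which is why we lower-case text before comparing it with search words.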
Use Case: Reversing a String by Word
Another use case that will add to your understanding of how to process strings that have been split() is reversing the string by word. Reversing a string usually means reversing all of its characters, last character first to first character last. For example, if we start with the string "to be or not to be" and reverse it, it would turn out to be "eb ot ton ro eb ot". This reverses the order of all characters; however, we want to reverse the order of the words in the string so that it reads "be to not or be to" instead. Since the split() method loads a list with the parsed tokens of the original string, there are a couple of ways in which we can reverse the word order.
Since we are focused on repetition in this chapter, we'll use a loop to reverse the order of the tokens in a list. Here's an example:
txt = "to be or not to be that is the question"
lst = txt.split()
rev_lst = []
for tok in lst:
    rev_lst.insert(0, tok)
print("Original List: ", lst)
print("Reversed List: ", rev_lst)
Output
Original List: ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
Reversed List: ['question', 'the', 'is', 'that', 'be', 'to', 'not', 'or', 'be', 'to']
Code & Output Details
- Code Line 1: This line assigns our string to a variable.
- Code Line 2: Next, we use the split() string method to divide the string into tokens and store them in a list.
- Code Line 3: We'll create another variable to hold an empty list that we'll use to store the reversed list.
- Code Line 4: On this line we write our repetition (loop) using the for in structure. We could read this as for each token (tok) in the list (lst).
- Code Line 5: The code block inside the loop uses the list insert() method to insert each token at the beginning of rev_lst, effectively reversing the order of the list.
- Code Line 6: To demonstrate the result, first we print the original list.
- Code Line 7: Then we print the reversed copy of the list.
Alternative Approach Using a List Method
In Python, the list data structure includes a method for reversing the list. Here's an example:
txt = "To be or not to be that is the question"
lst = txt.split()
lst.reverse()
print("Reversed List:", lst)
Note that in this alternative code, there is no need for a loop; we can simply use the list reverse() method. One detail that may be important depending on the application: the reverse() method reverses the original list in place. So, if the original order needs to be preserved, this approach of using reverse() on the original list may not be the best choice.
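If the original order does need to be preserved, one option is to reverse a copy of the list instead. The sketch below also uses the string join() method to turn the reversed tokens back into a single string. The join() step is an addition for illustration and is not part of the example above.

```python
txt = "to be or not to be"
lst = txt.split()

# list(lst) makes a separate copy, so reversing it leaves lst untouched
rev_lst = list(lst)
rev_lst.reverse()

# join() glues the tokens back together with a space between each one
rev_txt = " ".join(rev_lst)

print("Original List:", lst)
print("Reversed List:", rev_lst)
print("Reversed String:", rev_txt)
```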
As introduced in Chapter 2, data files are common resources we use to store and transmit data. There are many types of files we could engage with as programmers. For now, we will work with plain text files (files with a file extension of .txt) and comma-separated values files (files with a file extension of .csv). These two types of files are very common and learning to work with them programmatically is an important skill.
Text files (often referred to as plain text files) can contain any type of alphanumeric data in various forms. In Chapter 2 we saw that it is easy to create, read, and write the contents of a text file. Using the example found there as a guide, let's create a file, write a sentence to it, close the file, reopen it, read its contents, and then print the contents:
# Open a file to store some text. Since this file
# does not exist, it will be created by the open
# function in the following code:
with open('example.txt', 'w') as file:
    file.write("This is a sentence.\n") # < Note the use of the \n escape sequence
# Reopen the file and read it
with open('example.txt', 'r') as file:
    contents = file.read()
print(contents)
Output:
This is a sentence.
Now that the text file exists, let's reopen it and add more lines of text, like this:
# Append more text
with open('example.txt', 'a') as file:
    file.write("This is another sentence.\n") # < Note the use of the \n escape sequence
    file.write("And another.\n") # < in the data written to the file
    file.write("We can append as many lines of text as we want.\n")
    # We could have done all of these in one write() line since the
    # escape sequence will push each sentence to its own line, like this:
    file.write("More text\n... and more ...\nok, ok, that's enough.\n")
# Reopen the file and read it. The with statement closes the file
# for us, so no explicit close() call is needed.
with open('example.txt', 'r') as file:
    contents = file.read()
print(contents)
Output:
This is a sentence.
This is another sentence.
And another.
We can append as many lines of text as we want.
More text
... and more ...
ok, ok, that's enough.
In the code above we use the print(contents) statement after we've read the file contents. This works fine for our purposes here where we just want to print the entire contents of the file. However, in other instances we may want to work with the file contents one line at a time instead. To do this we can use repetition and the readline() method of the file object.
Here's a code example of reading each line of the example.txt file we created above.
line_count = 0
my_file = open("example.txt", "r")
line = my_file.readline()
while line:
    line_count += 1
    print("Line " + str(line_count) + ": Length: " + str(len(line)) + "\t" + line, end="")
    line = my_file.readline()
my_file.close()
Output:
Line 1: Length: 20 This is a sentence.
Line 2: Length: 26 This is another sentence.
Line 3: Length: 13 And another.
Line 4: Length: 48 We can append as many lines of text as we want.
Line 5: Length: 10 More text
Line 6: Length: 17 ... and more ...
Line 7: Length: 23 ok, ok, that's enough.
Code Details:
- Code Line 1: First we create a variable that will accumulate (count) the number of lines in the file.
- Code Line 2: Next, we use the open() function to open the file in read access mode. The file object reference is stored in the variable my_file.
- Code Line 3: Next, we use the readline() method of the file object to read one line, the first line, of the file. This method reads only one line, not the entire contents of the file as we did in the previous code examples.
- Code Line 4: Next, our while loop syntax is while line:, which takes advantage of the fact that in Python an empty string used in a condition is evaluated as false, and a string that contains content is evaluated as true.
- Code Line 5: Inside the while loop's code block, first we increment the counter variable. The first iteration it will be set to 1 since it was initialized to 0 on Code Line 1.
- Code Line 6: Next, we print one output line containing the line number of the file we just read, the length of the line, and the content of the line itself.
- Code Line 7: On the last line of the while loop's code block we use the readline() method again to read the next line in the file. Once there are no more lines to read, readline() returns an empty string, so the line variable will be empty (false), which will cause the while loop to terminate.
- Code Line 8: And lastly we close the file.
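Python file objects can also be looped over directly with the for in construct, which hands us one line per iteration without calling readline() ourselves. Here is a minimal sketch of that alternative. To keep it self-contained it first creates a small file named lines_demo.txt, which is a made-up name for this sketch only.

```python
# Create a small file to read (lines_demo.txt is a hypothetical file name)
with open("lines_demo.txt", "w") as file:
    file.write("This is a sentence.\n")
    file.write("And another.\n")

line_count = 0
with open("lines_demo.txt", "r") as file:
    # The file object gives us one line per iteration of the loop
    for line in file:
        line_count += 1
        print("Line " + str(line_count) + ": Length: " + str(len(line)) + "\t" + line, end="")
```

The while loop with readline() shows more of the mechanics, which is why we use it on this page, but both approaches read the file one line at a time.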
Another reason we might read a file one line at a time is that we might want to store each line in a data structure, such as a list, for further processing.
Here's a code example of reading each line of the example.txt file we created above and storing each one in a list.
lst = []
my_file = open("example.txt", "r")
line = my_file.readline()
while line:
    lst.append(line)
    line = my_file.readline()
my_file.close()
print(lst)
Output:
['This is a sentence.\n', 'This is another sentence.\n', 'And another.\n', 'We can append as many lines of text as we want.\n', 'More text\n', '... and more ...\n', "ok, ok, that's enough.\n"]
Code Details:
- Code Line 1: First we declare an empty list that we'll use to store the lines from the file.
- Code Line 2: Next, we use the open() function to open the file in read access mode. The file object reference is stored in the variable my_file.
- Code Line 3: Next, we use the readline() method of the file object to read one line, the first line, of the file. This method reads only one line, not the entire contents of the file as we did in the previous code examples.
- Code Line 4: Next, our while loop syntax is while line:, which takes advantage of the fact that in Python an empty string used in a condition is evaluated as false, and a string that contains content is evaluated as true.
- Code Line 5: Inside the while loop's code block, first we use the list's append method to add the current line from the file to the list.
- Code Line 6: On the last line of the while loop's code block we use the readline() method again to read the next line in the file. Once there are no more lines to read, readline() returns an empty string, so the line variable will be empty (false), which will cause the while loop to terminate.
- Code Line 7: Now that the loop is finished, we don't need the file open any longer, so we close it.
- Code Line 8: Then, to demonstrate that we now have the contents of the file stored in our list, we print the list.
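Notice in the output above that every element of the list still ends with the \n escape sequence. If that is not wanted, one option is to strip it off as each line is appended. The sketch below does this with the string rstrip() method. As an assumption for this sketch only, it creates its own small file named lines_demo.txt so it can be run on its own.

```python
# Create a small file to read (lines_demo.txt is a hypothetical file name)
with open("lines_demo.txt", "w") as file:
    file.write("This is a sentence.\nAnd another.\n")

lst = []
my_file = open("lines_demo.txt", "r")
line = my_file.readline()
while line:
    # rstrip("\n") removes the newline from the end of the line, if present
    lst.append(line.rstrip("\n"))
    line = my_file.readline()
my_file.close()
print(lst)
```

The loop condition still tests the unstripped line, so a blank line in the file (which is just "\n") does not end the loop early.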
In the previous examples we wrote individual lines to a text file, so we treated each line (sentence) individually. However, text files often contain paragraphs, that is, more than one sentence with no line breaks (\n). When we read text files containing paragraphs, we'll often read the entire contents and then use the split() method to parse paragraphs into sentences as needed.
In the next code example, we'll write a paragraph to a text file and then read it and print its contents. The text used here is generic placeholder text from a Lorem Ipsum generator, which can be a very useful tool to practice processing text in Python.
txt = """Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."""
with open('example.txt', 'w') as file:
    file.write(txt)
with open('example.txt', 'r') as file:
    contents = file.read()
print(contents)
Output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Code Details:
- Code Lines 1 thru 7: These lines assign a long string to the variable txt. Note the use of the triple-quote syntax that allows us to place the long string on multiple lines within our code.
- Code Lines 9 thru 14: The subsequent syntax is the same as our Chapter 2 example of reading the entire contents of a file and storing it in a variable. And we can then print the entire contents. This works well when we want to work with the entire contents of a file as a single string.
It is important to remember that the paragraph above is one long string. So when we store the contents of the file in the contents variable (Code Line 13 above), it contains the entire paragraph. If we need to, we can parse it into individual sentences using the split() method. Here's an example:
txt = """Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."""
with open('example.txt', 'w') as file:
    file.write(txt)
with open('example.txt', 'r') as file:
    contents = file.read()
print(contents)
# Now let's parse the paragraph stored in the contents variable
# into individual sentences using the split() method, and store
# the sentences in a list, and print the list
sentences = contents.split(". ")
# Confirm the data type of the sentences variable ...
print()
print(type(sentences))
print()
# Print the entire list...
print(sentences)
print()
# Now print the sentences one per line ...
for s in sentences:
    print(s)
Output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
< class 'list' >
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua', 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat', 'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur', 'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.']
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Code Details:
- Code Lines 1 thru 9: These lines are identical to the previous example.
- Note: Read the comments in the code.
- Code Line 13: Here we use the string split() method to parse the entire contents of the file based on the delimiter of ". " (a period and a space, which separates the sentences in the paragraph). This creates a list that contains each sentence in the paragraph as a separate element in the list. Note that the delimiter drops out of each sentence, while the last sentence keeps its ending period because no space follows it.
- Code Line 16: Here we print the data type of the sentences variable, to confirm and demonstrate that it is a list.
- Code Line 19: Next we print the entire list to inspect its contents, where we see one element per sentence.
- Code Lines 22 thru 23: Here we use a for loop to print each sentence on its own line.
- Note: Once we have read a file containing paragraph data like this, and have split() it into a list, we can then process each sentence individually as needed.
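As one illustration of processing each sentence individually, the sketch below splits a short made-up paragraph, removes the leftover period from the last sentence, and then counts the words in each sentence. The paragraph text and the cleaned list are assumptions for this example and are not taken from the file above.

```python
paragraph = "This is one sentence. Here is another one. And a third."
sentences = paragraph.split(". ")

# The last sentence still ends with a period because no space follows it,
# so rstrip(".") is used to make all of the sentences consistent.
cleaned = []
for s in sentences:
    cleaned.append(s.rstrip("."))

# Count the words in each sentence with split() and len()
for s in cleaned:
    print(len(s.split()), "words:", s)
```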
When we're working with text files, we often need to programmatically analyze the contents of a file as an act of discovery about those contents. Text Analysis is a speciality within programming and can be very sophisticated. There are some fundamental tasks within the subfield we can explore here as part of our exploration of repetition and parsing.
We'll use an example to learn approaches to complete the following tasks on a text file:
- Read the contents of the given text file.
- Report the number of words in the file.
- Remove all punctuation from the content (do this after reading the file, not in the file itself).
- Create a list of unique words from the file contents.
- Report the longest word in the file and its length
- Report the number of unique words in the file.
- Sort the list of unique words.
- Print a columnar list of unique words with their number of occurrences.
Note 1: If you would like to follow along with the provided Solution to this Practice Problem you can download the text file I use for this code example here.
Note 2: Study the code comments carefully for information about the code segments.
Code
print("-" * 80)
print("Simple Analysis of File: DeclarationOfIndependence.txt")
print("-" * 80)
# Create list of punctuation characters we'll use to remove punctuation from the content
punctuation = ['~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '`',
               '-', '=', '{', '}', '|', '[', ']', '\\', ':', '"', ';', '<', '>', '?',
               ',', '.', '/']
# Connect to our text file
file_contents = open(r"C:\Users\John\Documents\declarationofindependence.txt", "r")
# Read file contents into string variable
declaration = file_contents.read()
# Close the file now that its contents are stored in the variable
file_contents.close()
# Set everything to lower case
declaration = declaration.lower()
# Remove punctuation from string
for i in punctuation:
    declaration = declaration.replace(i, ' ')
# Split string into individual words into list
words = declaration.split()
# The number of words in the file is the length of the words list
print("Number of words in the file: " + str(len(words)))
# Gather list of unique words in the content
# Create an empty list to store unique words
unique_words = []
# Loop through the words list and add each word to
# the unique_words list just once
for word in words:
    if word not in unique_words:
        unique_words.append(word)
print("Number of unique words in the file: " + str(len(unique_words)))
# Sort the list
unique_words.sort()
# Set counter variables we will use to identify the list index,
# the longest word length and index
list_index = 0
longest_word_length = 0
longest_word_index = 0
for w in unique_words:
    # Determine the longest word in the unique list
    if len(w) > longest_word_length:
        longest_word_index = list_index
        longest_word_length = len(w)
    # Replace the word string with the string plus the number of
    # occurrences in the words list in parentheses
    unique_words[list_index] = w + " (" + str(words.count(w)) + ")"
    list_index += 1
print("-" * 80)
print("Longest Word: " + unique_words[longest_word_index] +
      " Length: " + str(longest_word_length))
print("-" * 80)
print("Unique Words and Number of Occurrences Each:")
print("-" * 120)
list_index = 0
line_count = 1
for w in unique_words:
    print(unique_words[list_index].ljust(20) + " ", end="")
    # Start a new line after every fifth word
    if line_count % 5 == 0:
        print()
    list_index += 1
    line_count += 1
print()
print("-" * 120)
Output
--------------------------------------------------------------------------------
Simple Analysis of File: DeclarationOfIndependence.txt
--------------------------------------------------------------------------------
Number of words in the file: 1326
Number of unique words in the file: 534
--------------------------------------------------------------------------------
Longest Word: representatives (1) Length: 15
--------------------------------------------------------------------------------
Unique Words and Number of Occurrences Each:
------------------------------------------------------------------------------------------------------------------------
a (16) abdicated (1) abolish (1) abolishing (3) absolute (3)
absolved (1) abuses (1) accommodation (1) accordingly (1) accustomed (1)
acquiesce (1) act (1) acts (2) administration (1) affected (1)
after (1) against (2) ages (2) all (10) allegiance (1)
alliances (1) alone (1) already (1) alter (2) altering (1)
america (1) among (5) amongst (1) amount (1) an (5)
and (57) annihilation (1) another (1) answered (1) any (2)
appealed (1) appealing (1) appropriations (1) arbitrary (1) are (9)
armed (1) armies (2) arms (1) as (4) assembled (1)
assent (4) assume (1) at (4) attempts (1) attend (1)
attentions (1) authority (1) away (1) bands (1) barbarous (1)
be (9) bear (1) become (1) becomes (2) been (4)
begun (1) benefits (1) between (1) beyond (1) bodies (2)
boundaries (1) brethren (2) bring (1) britain (2) british (2)
burnt (1) but (1) by (13) called (1) candid (1)
captive (1) cases (2) cause (1) causes (2) certain (1)
changed (1) character (1) charters (1) circumstances (2) citizens (1)
civil (1) civilized (1) coasts (1) colonies (4) combined (1)
commerce (1) commit (1) common (1) complete (1) compliance (1)
conclude (1) conditions (2) congress (1) conjured (1) connected (1)
connection (1) connections (1) consanguinity (1) consent (3) constitution (1)
constrained (1) constrains (1) contract (1) convulsions (1) correspondence (1)
country (1) course (1) created (1) creator (1) crown (1)
cruelty (1) cutting (1) dangers (1) deaf (1) death (1)
decent (1) declaration (2) declare (2) declaring (2) define (1)
denounces (1) dependent (1) depository (1) depriving (1) deriving (1)
design (1) desolation (1) despotism (1) destroyed (1) destruction (1)
destructive (1) dictate (1) direct (1) disavow (1) disposed (1)
dissolutions (1) dissolve (1) dissolved (2) distant (1) districts (1)
divine (1) do (3) domestic (1) duty (1) each (1)
earth (1) eat (1) effect (1) elected (1) emigration (1)
encourage (1) endeavored (2) endowed (1) ends (1) enemies (1)
english (1) enlarging (1) entitle (1) equal (2) erected (1)
establish (1) established (1) establishing (2) establishment (1) events (1)
every (2) evident (1) evils (1) evinces (1) example (1)
excited (1) executioners (1) exercise (1) experience (1) exposed (1)
extend (1) facts (1) fall (1) fatiguing (1) fellow (1)
firm (1) firmness (1) fit (1) for (29) forbidden (1)
foreign (2) foreigners (1) form (2) former (1) formidable (1)
forms (2) fortunes (1) foundation (1) free (4) friends (2)
from (6) frontiers (1) full (1) fundamentally (1) future (1)
general (1) giving (1) god (1) good (2) governed (1)
government (6) governments (3) governors (1) great (2) guards (1)
hands (1) happiness (2) harass (1) has (21) have (11)
having (1) he (19) head (1) here (2) high (1)
his (9) history (2) hither (2) hold (3) honor (1)
houses (1) human (1) humble (1) immediate (1) impel (1)
importance (1) imposing (1) in (19) incapable (1) indeed (1)
independence (1) independent (4) indian (1) inestimable (1) inevitably (1)
inhabitants (2) injuries (1) injury (1) institute (1) instituted (1)
instrument (1) insurrections (1) intentions (1) interrupt (1) into (2)
introducing (1) invariably (1) invasion (1) invasions (1) invested (1)
is (10) it (6) its (3) judge (1) judges (1)
judiciary (1) jurisdiction (2) jury (1) just (1) justice (3)
kept (1) kindred (1) king (1) known (1) lands (1)
large (4) laws (9) laying (1) legislate (1) legislation (1)
legislative (2) legislature (2) legislatures (2) let (1) levy (1)
liberty (1) life (1) light (1) likely (1) lives (2)
long (3) made (1) magnanimity (1) mankind (3) manly (1)
many (1) marked (1) may (2) meantime (1) measures (1)
men (2) mercenaries (1) merciless (1) migrations (1) military (1)
mock (1) more (1) most (5) multitude (1) murders (1)
must (1) mutually (1) name (1) nation (1) native (1)
naturalization (1) nature (1) nature’s (1) necessary (2) necessity (2)
neglected (1) neighboring (1) new (4) nor (1) not (1)
now (1) object (2) obstructed (1) obstructing (1) obtained (1)
of (77) off (2) offenses (1) officers (1) offices (2)
on (8) once (1) one (1) only (2) operation (1)
opinions (1) opposing (1) oppressions (1) or (2) organizing (1)
other (3) others (3) ought (2) our (26) out (2)
over (2) own (1) paralleled (1) parts (1) pass (3)
patient (1) payment (1) peace (3) people (10) perfidy (1)
petitioned (1) petitions (1) places (1) pledge (1) plundered (1)
political (2) population (1) power (3) powers (5) present (1)
pressing (1) pretended (2) prevent (1) prince (1) principles (1)
protecting (1) protection (2) prove (1) provide (1) providence (1)
province (1) prudence (1) public (2) publish (1) punishment (1)
purpose (2) pursuing (1) pursuit (1) quartering (1) raising (1)
ravaged (1) records (1) rectitude (1) redress (1) reduce (1)
refused (3) refusing (2) reliance (1) relinquish (1) remaining (1)
reminded (1) render (2) repeated (3) repeatedly (1) representation (1)
representative (1) representatives (1) requires (1) respect (1) rest (1)
returned (1) right (7) rights (3) rule (2) ruler (1)
sacred (1) safety (1) salaries (1) same (2) savages (1)
scarcely (1) seas (3) secure (1) security (1) seem (1)
self (1) sent (1) separate (1) separation (2) settlement (1)
sexes (1) shall (1) should (4) shown (1) so (2)
sole (1) solemnly (1) stage (1) standing (1) state (2)
states (7) station (1) subject (1) submitted (1) substance (1)
such (6) suffer (1) sufferable (1) sufferance (1) superior (1)
support (1) supreme (1) suspended (2) suspending (1) swarms (1)
system (1) systems (1) taken (1) taking (1) taxes (1)
tenure (1) terms (1) than (1) that (13) the (77)
their (20) them (15) themselves (3) therefore (2) therein (1)
these (13) they (7) things (1) this (3) those (1)
throw (1) thus (1) ties (1) till (1) time (4)
times (1) to (65) together (1) too (1) totally (2)
towns (1) trade (1) train (1) transient (1) transporting (2)
trial (2) tried (1) troops (1) truths (1) tyranny (2)
tyrant (1) tyrants (1) unacknowledged (1) unalienable (1) uncomfortable (1)
under (1) undistinguished (1) unfit (1) united (2) unless (2)
unusual (1) unwarrantable (1) unworthy (1) us (11) usurpations (3)
utterly (1) valuable (1) voice (1) waging (1) wanting (1)
war (3) warfare (1) warned (1) we (11) whatsoever (1)
when (3) whenever (1) whereby (1) which (10) while (1)
wholesome (1) whose (2) will (2) with (9) within (1)
without (3) works (1) world (3) would (2)
------------------------------------------------------------------------------------------------------------------------
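The example above calls words.count(w) once for every unique word, which rescans the whole words list each time. As an alternative sketch (not the approach used in the code above), the standard library's collections.Counter tallies all words in one pass, and str.translate() replaces all punctuation in one call. The sample text below is a stand-in for the file contents.

```python
import string
from collections import Counter

# Stand-in text; in the example above this would be the file contents
text = "To be or not to be, that is the question. To be!"

# Lower-case the text, then replace each ASCII punctuation character with a space
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
words = text.lower().translate(table).split()

# Counter tallies every word in a single pass over the list
counts = Counter(words)

# Longest word, measured by number of characters
longest = max(counts, key=len)

print("Number of words in the text: " + str(len(words)))
print("Number of unique words in the text: " + str(len(counts)))
print("Longest Word: " + longest + " Length: " + str(len(longest)))
for word in sorted(counts):
    print((word + " (" + str(counts[word]) + ")").ljust(20))
```

Note that string.punctuation covers ASCII punctuation only, so a curly apostrophe such as the one in nature’s would still be left in place.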
Another type of text file commonly used to store and transfer data is the Comma-Separated Values (CSV) file. CSV files are used to store and exchange data between different software applications. The content of a CSV file is plain text, like that of a text (TXT) file. The primary difference between a plain text file (above) and a CSV file is that plain text files often contain unstructured data, that is, words, phrases, paragraphs, etc. that are not in any standard format. The content of a CSV file, on the other hand, is in a tabular (rows and columns) format.
Figure 1 depicts the general format of a CSV file:

In Figure 1 we see that a CSV file is made up of one to many (n) rows and one to many (m) columns. Each row is called a record and each column is called a field. Each record is made up of fields that describe that record. For example, a customer record might be made up of a Customer ID number and that customer's first name, last name, address, phone number and email address. That record would then contain six fields in the CSV file, one for each of those attributes of the customer.
A CSV file containing customer records as described above might look like this:

Notice that this example file (named Customers.csv) contains ten customer records, one per row (or line) in the file. This file also contains a header row of column titles that help us discern what each column represents in the data. A header row in a CSV file is optional.
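To make the relationship between a header row and a record concrete, here is a small sketch that pairs each column title with the matching value in one record. The two lines of text and the variable names are hypothetical; they only imitate what a header row and a record might look like.

```python
# Hypothetical header row and record, written as they might appear in a CSV file
header_line = "CustomerID,FirstName,LastName,Phone"
record_line = "123456,Daffy,Duck,2223334444"

# Each line is split on the comma delimiter to get its fields
field_names = header_line.split(",")
field_values = record_line.split(",")

# Pair each column title with the matching value in the record
record = dict(zip(field_names, field_values))

for name in field_names:
    print(name.ljust(12) + record[name])
```

Without a header row, the fields can still be reached by position, but the program has to know in advance what each position means.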
Creating a Simple CSV File with One Field
We can create a simple CSV file by opening a new file with a file name and the .csv file extension, like this:
# Open (create) a new file we'll use to store names
with open('Customers.csv', 'w') as file:
    file.write("Daffy\n")  # < Note the use of the \n escape sequence
# The with statement closes the file automatically when its block ends
# Reopen the file and read it
with open('Customers.csv', 'r') as file:
    contents = file.read()
    print(contents)
Output:
Daffy
This simple CSV file contains one record with one field, with no header row in it. While this is very minimal, it is a valid CSV file. We could have included a header row by adding one more write statement, like this:
# Open (create) a new file we'll use to store names
with open('Customers.csv', 'w') as file:
    file.write("FirstName\n")  # < Added this write statement to include a column header line.
    file.write("Daffy\n")
# Reopen the file and read it
with open('Customers.csv', 'r') as file:
    contents = file.read()
    print(contents)
Output:
FirstName
Daffy
Next, let's reopen it in append mode and add more names, like this:
# Append more names
with open('Customers.csv', 'a') as file:
    file.write("Marvin\n")
    file.write("Tazmanian\n")
    file.write("Bugs\n")
    file.write("Space\n")
    file.write("Yogi\n")
    # We could have done all of these in one write() line since the
    # escape sequence will push each name to its own line, like this:
    file.write("Fred\nScooby\nMickey\nCharlie\n")
# Reopen the file and read it
with open('Customers.csv', 'r') as file:
    contents = file.read()
    print(contents)
Output:
FirstName
Daffy
Marvin
Tazmanian
Bugs
Space
Yogi
Fred
Scooby
Mickey
Charlie
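The difference between the 'w' and 'a' access modes is easy to overlook, so here is a small sketch that contrasts them. It writes to a temporary folder so that no existing file is overwritten; the file name names_demo.csv is made up for this illustration.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, "names_demo.csv")  # hypothetical file name

# 'w' creates the file, or empties it if it already exists
with open(path, "w") as file:
    file.write("FirstName\n")
    file.write("Daffy\n")

# 'a' keeps what is already there and adds to the end
with open(path, "a") as file:
    file.write("Marvin\n")

with open(path, "r") as file:
    after_append = file.read()

# Opening with 'w' again discards the earlier contents
with open(path, "w") as file:
    file.write("Bugs\n")

with open(path, "r") as file:
    after_rewrite = file.read()

print(after_append)
print(after_rewrite)
```

After the append, the file holds the header and two names; after the second 'w' open, it holds only the last name written.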
Creating a CSV File with More Than One Field
Most of the time CSV files contain more than one field per record. Here is an example, expanding on the example above:
with open('Customers.csv', 'w') as file:
    file.write("FirstName,LastName,Address\n")  # < Write the header row to the file
    file.write("Daffy,Duck,123 Quackville Road\n")
    file.write("Marvin,Martian,234 Crater Lane\n")
    file.write("Tazmanian,Devil,345 Taz Street\n")
    file.write("Bugs,Bunny,456 Carrot Blvd.\n")
    file.write("Space,Ghost,999 Space Lane\n")
    file.write("Yogi,Bear,454 Bear Blvd.\n")
    file.write("Fred,Flintstone,825 Rock Street\n")
    file.write("Scooby,Doo,444 Snacks Street\n")
    file.write("Mickey,Mouse,356 Squeek Lane\n")
    file.write("Charlie,Brown,987 Snoopy Street\n")
# Reopen the file and read it
with open('Customers.csv', 'r') as file:
    contents = file.read()
    print(contents)
Output:
FirstName,LastName,Address
Daffy,Duck,123 Quackville Road
Marvin,Martian,234 Crater Lane
Tazmanian,Devil,345 Taz Street
Bugs,Bunny,456 Carrot Blvd.
Space,Ghost,999 Space Lane
Yogi,Bear,454 Bear Blvd.
Fred,Flintstone,825 Rock Street
Scooby,Doo,444 Snacks Street
Mickey,Mouse,356 Squeek Lane
Charlie,Brown,987 Snoopy Street
Now we have a CSV file containing multiple records, each with more than one field.
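One caveat with writing comma-separated lines by hand: if a field value itself contains a comma, the line appears to have an extra field. The sketch below shows the problem and one possible remedy using Python's csv module (part of the standard library, discussed further down this page). The address with an apartment number is a made-up value for illustration.

```python
import csv
import io

# Hypothetical record whose address field contains a comma
fields = ["Daffy", "Duck", "123 Quackville Road, Apt 4"]

# Joining with commas by hand makes the address look like two fields
naive_line = ",".join(fields)
naive_fields = naive_line.split(",")
print(len(naive_fields))

# csv.writer wraps the address in double quotes so the comma is preserved
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
quoted_line = buffer.getvalue()
print(quoted_line, end="")

# csv.reader understands the quotes and returns the original three fields
parsed_fields = next(csv.reader(io.StringIO(quoted_line)))
print(parsed_fields)
```

Here io.StringIO stands in for a file so the sketch does not touch the disk.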
Problem: Create a CSV file from user data entry
Write a Python program that prompts the user for customer records and writes those records to a CSV file called Customers.csv. The CSV file should contain the following fields:
- Customer ID
- First Name
- Last Name
- Address
- City
- State
- Zip Code
- Phone Number
- Email Address
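One possible starting point for this problem is sketched below; it is not the only way to structure a solution. The function names prompt_for_record() and append_record() are invented for this sketch, and the scripted answers stand in for keyboard input so the code can run unattended. The file is written to a temporary folder here rather than the working folder.

```python
import os
import tempfile

FIELD_NAMES = ["Customer ID", "First Name", "Last Name", "Address", "City",
               "State", "Zip Code", "Phone Number", "Email Address"]


def prompt_for_record(ask=input):
    """Ask for each field in order and return the answers as a list."""
    record = []
    for name in FIELD_NAMES:
        record.append(ask(name + ": ").strip())
    return record


def append_record(path, record):
    """Append one record to the file as a comma-separated line."""
    with open(path, "a") as file:
        file.write(",".join(record) + "\n")


# Scripted answers stand in for the keyboard so the sketch runs unattended;
# calling prompt_for_record() with no argument would use input() instead.
answers = iter(["123456", "Daffy", "Duck", "123 Quackville Road", "Feathers",
                "UT", "84555", "2223334444", "daff@quack.com"])
record = prompt_for_record(ask=lambda prompt: next(answers))

# Written to a temporary folder here; the problem asks for Customers.csv
path = os.path.join(tempfile.mkdtemp(), "Customers.csv")
append_record(path, record)

with open(path, "r") as file:
    saved = file.read()
print(saved, end="")
```

A complete solution would also repeat the prompts in a loop until the user indicates there are no more customers to enter.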
Using Repetition to Read a CSV Line by Line
When we work with CSV files, it is often necessary to read the file one line at a time because each line in a CSV file is a record. Using our Customers.csv file as an example, we can use a loop to read each line of the file, that is, each customer record, one at a time. In our loop we can then process each record as needed.
The following example demonstrates a common pattern for working with files: we read the file one line at a time, then for each line we work with the attributes in the record. In this case, we print each attribute and handle each one separately so that we can set proper column widths or use slicing and concatenation (for the phone number, for example). In addition, we use a counter to count the number of records in the file so that we can print that number as a summary in the footer of the report. Be sure to read through the Code Details under the sample output below.
# Global Variables
report_width = 120


# Functions
def print_header():
    report_title = "C u s t o m e r R e p o r t"
    print("-" * report_width)
    print(" " * int((report_width / 2) - len(report_title) / 2), end="")
    print(report_title)
    print("-" * report_width)
    print("ID".ljust(8), end="")
    print("First".ljust(12), end="")
    print("Last".ljust(12), end="")
    print("Address".ljust(25), end="")
    print("City".ljust(15), end="")
    print("ST".ljust(5), end="")
    print("Zip".ljust(7), end="")
    print("Phone".ljust(17), end="")
    print("Email".ljust(30))
    print("-" * report_width)


def print_footer(counter):
    print("-" * report_width)
    print("Number of Customers: " + str(counter))
    print("-" * report_width)


# Main Program
customer_count = 0
print_header()
my_file = open(r"Customers.csv", "r")
for line in my_file:
    customer_record = line.rstrip().split(',')
    print(customer_record[0].ljust(8), end="")
    print(customer_record[1].ljust(12), end="")
    print(customer_record[2].ljust(12), end="")
    print(customer_record[3].ljust(25), end="")
    print(customer_record[4].ljust(15), end="")
    print(customer_record[5].ljust(5), end="")
    print(customer_record[6].ljust(7), end="")
    print("(" + customer_record[7][0:3] + ")" +
          customer_record[7][3:6] + "-" +
          customer_record[7][6:].ljust(8), end="")
    print(customer_record[8].ljust(30), end="")
    print()
    customer_count += 1
print_footer(customer_count)
Output:
------------------------------------------------------------------------------------------------------------------------
C u s t o m e r R e p o r t
------------------------------------------------------------------------------------------------------------------------
ID First Last Address City ST Zip Phone Email
------------------------------------------------------------------------------------------------------------------------
123456 Daffy Duck 123 Quackville Road Feathers UT 84555 (222)333-4444 daff@quack.com
234567 Marvin Martian 234 Crater Lane Mars UT 84777 (333)444-5555 marv@mars.org
345678 Tazmanian Devil 345 Taz Street Mania UT 84222 (444)555-6666 taz@devdev.net
456789 Bugs Bunny 456 Carrot Blvd. Hopp UT 84999 (555)777-2222 buggs@hoppy.com
567890 Space Ghost 999 Space Lane Orbit UT 84333 (666)777-8888 ghostie@rocket.com
678901 Yogi Bear 454 Bear Blvd. Bearville UT 84524 (777)888-9999 yogi@bear.net
789012 Fred Flintstone 825 Rock Street Bedrock UT 84846 (888)999-0000 freddy@stone.com
890123 Scooby Doo 444 Snacks Street Doggo UT 84000 (999)111-2222 scoob@snackers.com
901234 Mickey Mouse 356 Squeek Lane Cheese UT 84567 (111)222-3333 mick@mouse.com
466577 Charlie Brown 987 Snoopy Street Chuck UT 84575 (234)645-7737 chuck@peanuts.com
------------------------------------------------------------------------------------------------------------------------
Number of Customers: 10
------------------------------------------------------------------------------------------------------------------------
Code Details:
- Code Line 2: Declares a global variable for the report width.
- Code Line 31: First we declare a counter variable for the number of customer records (lines in the file) and initialize it to zero.
- Code Line 32: We call the print_header() function:
- Code Line 6: Defines the print_header() function
- Code Line 7: Declares a report title variable with a title string. This is set as a variable to demonstrate flexible centering of a string (title in this example) as shown on Code Line 9.
- Code Line 8: Prints a horizontal line based on the report width global variable.
- Code Line 9: Prints a calculated number of blank spaces to prepare the report title so it will be centered.
- Code Line 10: Prints the report title positioned by the calculated number of blank spaces on Code Line 9.
- Code Line 11: Prints a horizontal line based on the report width global variable.
- Code Lines 12 thru 20: Prints column headers for the attributes. Note the use of the .ljust() string method to position the headers appropriately for the report columns. The hardcoded values could be changed here to be calculated based on a programmatic evaluation of the data rather than hardcoded values.
- Code Line 21: Prints a horizontal line based on the report width global variable.
- Code Line 33: We declare a variable for the file and use the open() function to open the Customers.csv file, with read "r" access mode.
- Code Line 34: We declare a for loop based on a line variable and the my_file file variable. The loop will repeat until the end of the file is reached no matter how many records we have in the file.
- Code Line 35: We declare a list variable to store the current record from the file. This list is constructed by combining the rstrip() string method (which removes trailing whitespace, including the newline character, from the end of the line) with the split() string method (which splits the line based on a separator, which we set to "," since this is a comma-separated file (.csv)). The customer_record list then contains all of the attributes for the current record in any given iteration of the loop.
- Code Lines 36 thru 46: Prints each customer record attribute for the current record in any given iteration of the loop. Notice the use of the .ljust() string method used to position the attribute values appropriately.
- Code Lines 43 thru 45: Notice the use of slicing and concatenation to format the phone number from the all-digit version in the .csv file to the more user-friendly format for the report.
- Code Line 47: Note all of the print statements above use the end="" parameter to keep the cursor position on one line so all attributes for any given record remain on the same line. We use an empty print() statement on Line 47, after all of the attributes have been printed on one line, to push the cursor position down to the next line in preparation for the next record in the next iteration of the loop.
- Code Line 48: We increment the customer count that will be used in the report footer.
- Code Line 49: After all records have been read from the file and the loop terminates, we then call the print_footer function.
- Code Line 24: Defines the print_footer() function
- Code Line 25: Prints a horizontal line based on the report width global variable.
- Code Line 26: Prints a summary line to indicate the number of customer records found in the file, using the counter parameter, which receives the value of the customer_count variable passed as the argument in the print_footer() function call. This is an example of accumulating summary values during looping that we can use later, such as in report footers.
- Code Line 27: Prints a horizontal line based on the report width global variable.
- After the print_footer() function completes, execution returns to the main program, which ends since there are no more code lines after the function call.
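The note on Code Lines 12 thru 20 mentions that the hardcoded column widths could instead be calculated from the data. A minimal sketch of that idea follows; the three-column records list, the headers list and the padding value are assumptions made for this illustration, not part of the report program above.

```python
# Hypothetical records, already split into fields
records = [
    ["123456", "Daffy", "Duck"],
    ["789012", "Fred", "Flintstone"],
    ["890123", "Scooby", "Doo"],
]
headers = ["ID", "First", "Last"]

# Start each width at the header's length, then widen it for longer values
widths = []
for header in headers:
    widths.append(len(header))
for record in records:
    for i in range(len(record)):
        if len(record[i]) > widths[i]:
            widths[i] = len(record[i])

# Add two spaces of padding between columns
padding = 2
header_line = ""
for i in range(len(headers)):
    header_line += headers[i].ljust(widths[i] + padding)
print(header_line)

for record in records:
    line = ""
    for i in range(len(record)):
        line += record[i].ljust(widths[i] + padding)
    print(line)
```

The trade-off is that all records must be read before the first line of the report can be printed, since the widths depend on every value.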
In Python, we can use the csv library to work with csv files. You can find full documentation for this library here ↗. And here is an example of using the CSV library:
Code
import csv

with open('Customers.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['ID','FirstName', 'LastName', 'Address', 'City', 'State', 'Zip','Phone','Email'])
    writer.writerow(['123456','Daffy','Duck','123 Quackville Road','Feathers','UT','84555','(222)333-4444','daff@quack.com'])
    writer.writerow(['234567','Marvin','Martian','234 Crater Lane','Mars','UT','84777','(333)444-5555','marv@mars.org'])
    writer.writerow(['345678','Tazmanian','Devil','345 Taz Street','Mania','UT','84222','(444)555-6666','taz@devdev.net'])
    writer.writerow(['456789','Bugs','Bunny','456 Carrot Blvd.','Hopp','UT','84999','(555)777-2222','buggs@hoppy.com'])
    writer.writerow(['567890','Space','Ghost','999 Space Lane','Orbit','UT','84333','(666)777-8888','ghostie@rocket.com'])

with open('Customers.csv', 'a', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['678901','Yogi','Bear','454 Bear Blvd.','Bearville','UT','84524','(777)888-9999','yogi@bear.net'])
    writer.writerow(['789012','Fred','Flintstone','825 Rock Street','Bedrock','UT','84846','(888)999-0000','freddy@stone.com'])
    writer.writerow(['890123','Scooby','Doo','444 Snacks Street','Doggo','UT','84000','(999)111-2222','scoob@snackers.com'])
    writer.writerow(['901234','Mickey','Mouse','356 Squeek Lane','Cheese',' UT','84567','(111)222-3333','mick@mouse.com'])
    writer.writerow(['466577','Charlie','Brown','987 Snoopy Street','Chuck','UT','84575','(234)645-7737','chuck@peanuts.com'])

with open('Customers.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print("Row: ", row)
    print()
    print(row[0], "\t" + row[1] + " " + row[2] + "\n\t" + row[3] + "\n\t" +
          row[4] + ", " + row[5] + " " + row[6] + "\n\t" + row[7] + "\n\t" +
          row[8] + "\n")
Output
Row: ['ID', 'FirstName', 'LastName', 'Address', 'City', 'State', 'Zip', 'Phone', 'Email']
Row: ['123456', 'Daffy', 'Duck', '123 Quackville Road', 'Feathers', 'UT', '84555', '(222)333-4444', 'daff@quack.com']
Row: ['234567', 'Marvin', 'Martian', '234 Crater Lane', 'Mars', 'UT', '84777', '(333)444-5555', 'marv@mars.org']
Row: ['345678', 'Tazmanian', 'Devil', '345 Taz Street', 'Mania', 'UT', '84222', '(444)555-6666', 'taz@devdev.net']
Row: ['456789', 'Bugs', 'Bunny', '456 Carrot Blvd.', 'Hopp', 'UT', '84999', '(555)777-2222', 'buggs@hoppy.com']
Row: ['567890', 'Space', 'Ghost', '999 Space Lane', 'Orbit', 'UT', '84333', '(666)777-8888', 'ghostie@rocket.com']
Row: ['678901', 'Yogi', 'Bear', '454 Bear Blvd.', 'Bearville', 'UT', '84524', '(777)888-9999', 'yogi@bear.net']
Row: ['789012', 'Fred', 'Flintstone', '825 Rock Street', 'Bedrock', 'UT', '84846', '(888)999-0000', 'freddy@stone.com']
Row: ['890123', 'Scooby', 'Doo', '444 Snacks Street', 'Doggo', 'UT', '84000', '(999)111-2222', 'scoob@snackers.com']
Row: ['901234', 'Mickey', 'Mouse', '356 Squeek Lane', 'Cheese', ' UT', '84567', '(111)222-3333', 'mick@mouse.com']
Row: ['466577', 'Charlie', 'Brown', '987 Snoopy Street', 'Chuck', 'UT', '84575', '(234)645-7737', 'chuck@peanuts.com']
466577 Charlie Brown
987 Snoopy Street
Chuck, UT 84575
(234)645-7737
chuck@peanuts.com
Code Details
- Code Line 1: First we import the csv library. This library is part of the Python Standard Library, so it is available with any standard Python installation and does not need to be installed separately.
- Code Line 3: On this line we use the with statement to open a file called Customers.csv with the w specifier, which means write. Using that specifier, if the file does not already exist it will be created and if the file already exists it will be overwritten.
- Code Line 4: Here we instantiate a csv.writer object using the writer() method and pass the CSV file that we'll use to write to the file.
- Code Lines 5 thru 10: Inside of the with block we write a number of records to the file. A record in this context is a row of data about an entity; in this case, we're simulating customer records. Each record has the same columns, with each column separated by a comma. Note that the first row written (Code Line 5) is our column headers. Many CSV files store column headings as their first row. Each of the remaining lines of code then writes a customer record to the file.
- Code Line 12: On this line we use the with statement to open a file called Customers.csv with the a specifier, which means append. Using that specifier, if the file does not already exist it will be created and if the file already exists any writing to the file will append records to the bottom of the file.
- Code Line 13: Here we instantiate a csv.writer object using the writer() method and pass the CSV file that we'll use to write to the file.
- Code Lines 14 thru 18: Inside of the with block we write (append) a number of records to the file. A record in this context is a row of data about an entity; in this case, we're simulating customer records. Each record has the same columns, with each column separated by a comma. Each of these lines of code writes a customer record to the file.
- Code Line 20: On this line we use the with statement to open a file called Customers.csv with the r specifier, which means read. Using that specifier, the file can only be read, no writing will be allowed.
- Code Line 21: Here we instantiate a csv.reader object using the reader() function and pass it the CSV file that we'll read from.
- Code Line 22: On this line we establish a for loop that will iterate over the reader object so that we can work with each row in the file.
- Code Line 23: For this example, we simply print each row, which produces Output Lines 1 thru 11.
- Code Lines 25 thru 27: To demonstrate accessing columns on a row, these lines show printing a record in a formatted manner. We can access each column using the index values of the row.
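When a CSV file has a header row, the csv library's DictReader class offers another way to access columns: by column title rather than by index. The sketch below is an illustration only; it uses io.StringIO as a stand-in for a file and a shortened set of columns so that it can run without Customers.csv being present.

```python
import csv
import io

# Stand-in for a file: a header row plus two records from the example above
data = io.StringIO(
    "ID,FirstName,LastName,City\n"
    "123456,Daffy,Duck,Feathers\n"
    "234567,Marvin,Martian,Mars\n"
)

# DictReader uses the first row as the keys for every later row
reader = csv.DictReader(data)
customers = []
for row in reader:
    customers.append(row)
    print(row["FirstName"] + " " + row["LastName"] + " lives in " + row["City"])
```

Because the header row is consumed as the keys, it is not returned as a data row, so the loop sees only the two customer records.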