Concept: Validation
Data validation is the process of ensuring that the data entered into a system meets specified criteria and is both correct and useful. This process is critical in maintaining data quality and reliability. Validation can involve checking the data for accuracy, completeness, and consistency. For example, it might include verifying that a phone number contains the correct number of digits, or ensuring that a mandatory field is not left empty in a form. Data validation can be performed through various methods, such as using predefined rules, algorithms, or even manual review. It's an essential step in data processing and management, as it helps prevent errors, inconsistencies, and the entry of invalid data, which can lead to poor decision-making, inefficiencies, and other issues in systems that rely on accurate data. Data validation is widely used in programming, data entry, database management, and data analysis.
Data validation is a large, in-depth topic, so for now we'll keep it simple. As a programmer, it is your responsibility to validate all data that your program receives from any source as best you can. There's an old phrase that applies here: garbage in, garbage out. In our context, this means that if we allow invalid data into our programs, files, databases, etc., our results will be poor when we later use that data for important processes such as transaction processing, customer records, and financial calculations.
Let's take a look at the two problems identified above and discuss possible approaches to validate the data after the user enters it into our programs.
Invalid Data Types: The first problem we identified above with a user's entry for our age prompt is related to invalid data types. The input() function always returns whatever the user types as a string, but ultimately the age must be numeric, and more specifically an integer. Python provides some string methods to help us determine the contents of a string. You can find a full list of string methods on the String Methods Reference page. On that reference page, you'll notice a group of methods whose names start with "is", like isalpha(), isnumeric(), etc. These are often referred to as the is methods.
Each of these methods evaluates a string and returns True if the string contains the type of data indicated by the method name, and False otherwise. So, for example, if our string is s = "Bob", then s.isalpha() will return True because all characters in the string are alphabetic. If s were "Bob5", "Bob.Smith", or "Bob " (with a trailing space), then s.isalpha() would return False.
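The sketch below runs a few of the is methods against the sample strings above so you can compare their results side by side. The helper describe() is hypothetical, written only for this illustration; isalnum() and isspace() are two more of Python's built-in is methods.

```python
# Compare several "is" string methods on the sample strings from the text.
# describe() is a hypothetical helper written only for this illustration.

def describe(s):
    """Return a dict showing what several is methods report for s."""
    return {
        "isalpha": s.isalpha(),  # True only if every character is a letter
        "isdigit": s.isdigit(),  # True only if every character is a digit
        "isalnum": s.isalnum(),  # True only if every character is a letter or digit
        "isspace": s.isspace(),  # True only if every character is whitespace
    }

for sample in ["Bob", "Bob5", "Bob.Smith", "Bob ", "25"]:
    print(repr(sample), describe(sample))
```

All four of these methods return False for an empty string, which matters when a user simply presses Enter at a prompt.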
Returning to our age example, one aspect of a valid age is that it is an integer with no decimal portion or other characters. If we look at the list of string methods, we see isdigit(), which returns True if all characters in a string are digits (no decimal points, signs, spaces, or other characters). We can confirm this will work for our purposes with a simple prompt and decision statement:
Code Example:
age = input("Please enter your age: ")
if age.isdigit():
    print("The string contains all digits")
else:
    print("The string does not contain all digits")
Code Details
- In Line 1 we prompt for the age as usual.
- We then use age.isdigit() in Line 2 to evaluate the contents of the user's entry.
- If the value they entered contains only digits, age.isdigit() returns True and the first print() statement runs.
- If the value they entered contains anything other than some combination of 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 (or is empty), age.isdigit() returns False and the else block runs.
Output (after program runs 3 times):
Please enter your age: 25
The string contains all digits
Please enter your age: 25.5
The string does not contain all digits
Please enter your age: adsfasdf
The string does not contain all digits
Output Details
- Lines 1 & 2 demonstrate the first run of the program. In this case, the user entered 25, which is an integer and contains no non-digit characters, so isdigit() returns True.
- Lines 3 & 4 demonstrate a user's entry that is a decimal (float) value; isdigit() returns False because of the decimal point.
- Lines 5 & 6 demonstrate a user's entry made up of non-numeric characters, so isdigit() again returns False.
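Before relying on isdigit() for validation, it helps to know how it treats a few less obvious entries. The sketch below checks the three entries from the sample runs plus four extra cases chosen for illustration: an empty string (the user just pressed Enter), a negative number, a number with a leading space, and the superscript character ².

```python
# How isdigit() treats the sample entries plus a few extra edge cases.
entries = [
    "25",        # whole number
    "25.5",      # a decimal point is not a digit
    "adsfasdf",  # letters
    "",          # empty string: the user just pressed Enter
    "-5",        # a minus sign is not a digit
    " 25",       # a leading space is not a digit
    "\u00b2",    # superscript two, which Unicode classifies as a digit
]

results = {entry: entry.isdigit() for entry in entries}
for entry, is_all_digits in results.items():
    print(repr(entry), is_all_digits)
```

Two results are worth noting. isdigit() rejects " 25" even though int(" 25") returns 25, and it accepts ² even though int("²") raises a ValueError. Neither matters much for a typical age prompt, but they are reasons to convert with int() inside a try-except block, as the final solution in this section does.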
(Almost) Full Data Validation Solution Using isdigit()
Now, let's use isdigit() to add data validation to our age example. The user will be prompted for the age, and then we will use a loop to test their entry. If they entered a valid integer, we will continue; otherwise, we will repeatedly prompt the user until they provide a valid integer.
Code Example:
age = input("Please enter your age: ")
while not age.isdigit():  # repeat until the entry is all digits
    print("Error: Invalid age, please try again.")
    age = input("Please enter your age: ")
print("Your age is", age, sep=" ")
Code Details
- On Line 1 we prompt the user for the age.
- Line 2 initiates a while loop with the not age.isdigit() signature. At first glance, this might not seem intuitive, but if we convert this to an English sentence it could be read "while the age variable is not all digits repeat the following...".
- If execution enters the loop, it means the user's entry contains characters that are not digits (or is empty), so we display an error message and repeat the age input() prompt.
- Execution will stay inside the loop until the user enters a valid integer value.
- Once the user enters a valid integer value, execution exits the loop and prints the user age result.
Diagram Details
- The diagram depicts the flow of execution through the above code.
- Notice the manual input symbology for the age variable, as well as for the printing of the error and age messages.
Why can't we just use the int() function?
One question you may have at this point is why we can't just use the int() function to convert the user's input into an integer value, like this:
age = int(input("Please enter your age: "))
Without data validation, that will only work if the user enters a valid integer value. If the user types anything that cannot be converted to an integer (such as letters or a decimal value like 25.5), int() raises a ValueError and the program stops with a run-time error, which is something we as programmers must try to avoid. We do want to convert the user-entered age (a string) to an integer at some point; however, we should do this after it has been validated.
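To make the validate-then-convert order concrete, here is a minimal sketch that calls int() only after isdigit() has approved the text. The function name parse_age_text() and the choice to return None for invalid entries are assumptions made for this illustration.

```python
def parse_age_text(text):
    """Convert text to an int only after isdigit() confirms it is all digits.

    parse_age_text is a hypothetical helper used for illustration.
    Returns the integer, or None if the text fails validation.
    """
    if text.isdigit():
        return int(text)  # int() succeeds for the plain digits 0-9 a user normally types
    return None           # invalid entry: int() is never called, so no run-time error

print(parse_age_text("25"))    # 25
print(parse_age_text("25.5"))  # None
```

Returning None keeps the check and the conversion together; a caller can test for None and re-prompt instead of crashing.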
Now ... A More Complete Data Validation Solution
Even after everything covered above, we still have not addressed the other problem related to the age variable. What happens if we implement the data validation above but the user enters a value like 56465 as the age? It is an integer, but it is still invalid because no human lives that long. So, in addition to the data validation code we used earlier, we now need to add range boundary validation. This type of data validation is commonly referred to as boundary testing.
Using the age example, the boundaries of human age are commonly taken as greater than zero and no more than 125. The oldest known person in history was 122 years old, so an upper boundary of 125 is reasonable: it should cover all real cases while rejecting outrageously large age values. So, let's add a boundary test to our solution. This time I will introduce a couple of new concepts and while loop options that help us with this type of validation.
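Boundary testing means checking values right at the edges of the valid range, plus values just inside and just outside them, because off-by-one mistakes (for example, writing < where <= was intended) tend to hide there. The sketch below assumes the inclusive range 1 thru 125 used in the final age solution later in this section; the names MIN_AGE, MAX_AGE, and age_in_range() are hypothetical.

```python
MIN_AGE = 1    # lower boundary (inclusive), assumed from the final solution
MAX_AGE = 125  # upper boundary (inclusive)

def age_in_range(age):
    """Return True if age falls within MIN_AGE..MAX_AGE, inclusive."""
    return MIN_AGE <= age <= MAX_AGE

# Boundary test values: just outside, on, and just inside each edge,
# plus the outrageous 56465 from the text.
for value in [0, 1, 2, 124, 125, 126, 56465]:
    print(value, age_in_range(value))
```

Only 1, 2, 124, and 125 pass; 0, 126, and 56465 are rejected.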
Concept: Exceptions
When a program encounters an unexpected condition, the result is called an exception. If the programmer has not accounted for exceptions, then the program could crash, display confusing errors to the user, produce invalid results, etc. There are a large number of causes of exceptions, such as invalid data, missing variables, improper syntax, missing files, running out of memory, and many others. The process of writing code to capture, prevent, and/or resolve exceptions is called exception handling.
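In Python, each cause of an exception has its own named exception type. The sketch below deliberately triggers three of the causes listed above, plus division by zero, and records the name of each exception. It uses the try-except structure introduced in the next concept; the helper exception_name() and the contrived lambdas are illustrations only.

```python
def exception_name(action):
    """Run action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as err:  # catch any ordinary exception so we can inspect it
        return type(err).__name__
    return None

# Contrived actions chosen only to provoke each kind of error.
cases = {
    "invalid data": lambda: int("abc"),
    "missing variable": lambda: undefined_name,  # intentionally never defined
    "missing file": lambda: open("this_file_should_not_exist_12345.txt"),
    "division by zero": lambda: 1 / 0,
}

for cause, action in cases.items():
    print(cause, "->", exception_name(action))
```

On a typical system this reports ValueError, NameError, FileNotFoundError, and ZeroDivisionError, in that order.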
Concept: Try-Catch
Many programming languages include a try-catch structure, which allows programmers to capture errors and exceptions and respond to them more elegantly than just allowing their program to crash. In Python, this structure is written with the keywords try and except, so it is called try-except.
The general form of the try-except exception handling structure in Python is as follows:
try:
    Statement(s)
except:
    Statement(s)
else:
    Statement(s)
finally:
    Statement(s)
Code Details:
- The try section contains the block of code you want to run.
- The except section runs if an exception occurs with the code in the try code block. This is often where we place error messages and code to try to gracefully handle the exception. If no exceptions occur, this block of code will not run.
- The else section is the code that will run if no exception(s) occur. If exceptions occur, this block of code will not run.
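- The finally section is a block of code that always runs, whether or not there is an exception. We often place cleanup code here, such as closing a file that was opened in the try block.
- The else and finally sections are optional, but a try block must be followed by at least one except or finally section.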
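To see which sections run and in what order, the following sketch records each clause it enters in a list. The function name run() and the fixed numerator 10 are arbitrary choices for illustration.

```python
def run(divisor):
    """Divide 10 by divisor and record which try-except clauses execute."""
    trace = []
    try:
        trace.append("try")
        10 / divisor               # raises ZeroDivisionError when divisor is 0
    except ZeroDivisionError:
        trace.append("except")     # runs only when the try block raised
    else:
        trace.append("else")       # runs only when the try block did not raise
    finally:
        trace.append("finally")    # always runs
    return trace

print(run(2))  # ['try', 'else', 'finally']
print(run(0))  # ['try', 'except', 'finally']
```

Either except or else runs, never both, and finally runs in both cases.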
Example: Division by Zero
As we learned previously, division by zero is undefined; in Python, dividing by zero raises a ZeroDivisionError exception. A common use of try-except is to handle the possibility of division by zero. In this example, I'll demonstrate this issue by prompting the user for two values and then dividing the first by the second. Since I cannot control what the user enters, I'll use try-except to catch the division by zero exception.
The following code is my first attempt at a solution.
x = int(input("Please enter first integer: "))
y = int(input("Please enter second integer: "))
try:
    z = x / y  # raises ZeroDivisionError if y is 0
except ZeroDivisionError:
    print("Error: You cannot divide by zero,\nso your second integer cannot be zero.")
else:
    print("The result of dividing your first integer by the second is", z, sep=" ")
finally:
    print("Done.")
Code Details:
- Lines 1 & 2 prompt the user for two integer values. Note that these use the int() function to convert the user's entries into integers. I am assuming that the user will enter integers, but we know from the discussion above that this is problematic.
- On Line 3, I initiate the try: block.
- Line 4 contains the code we want to try. Note that it contains the division calculation, which is where the code could possibly fail if the user entered zero for the second integer input in Line 2.
- Line 5 initiates the except block and names the specific exception it handles, ZeroDivisionError.
- Line 6 contains the code that will run if, and only if, there is an exception (in this example, division by zero) on Line 4.
- Line 7 initiates the else block.
- Line 8 runs if, and only if, there are no exceptions; it displays the result of the division.
- Line 9 initiates the finally block.
- Line 10 runs regardless of whether there was an exception or not. In this example, I placed a simple print statement here just for demonstration purposes.
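As noted in the Code Details above, the int() conversions on Lines 1 & 2 sit outside the try block, so an entry such as ten would still crash the program. One way to close that gap, sketched here as an assumption rather than the author's final version, is to move the conversions inside the try block and add a second except clause for ValueError. The helper divide_entries() is hypothetical and takes the raw strings as arguments instead of calling input(), so it can be run without typing.

```python
def divide_entries(first_text, second_text):
    """Return a message for first_text / second_text, handling bad input.

    divide_entries is a hypothetical helper: it receives the strings a user
    would type, so the logic can be exercised without calling input().
    """
    try:
        x = int(first_text)   # ValueError if the text is not a whole number
        y = int(second_text)
        z = x / y             # ZeroDivisionError if y is 0
    except ValueError:
        return "Error: Please enter whole numbers only."
    except ZeroDivisionError:
        return "Error: Your second integer cannot be zero."
    else:
        return f"The result is {z}"

print(divide_entries("10", "4"))   # The result is 2.5
print(divide_entries("10", "0"))   # Error: Your second integer cannot be zero.
print(divide_entries("ten", "4"))  # Error: Please enter whole numbers only.
```

Python checks except clauses from top to bottom and runs only the first one whose exception type matches.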
And now ... Let's Finish the Age Data Validation Solution Using try-except
Also, we'll introduce the break and continue statements.
Now that I have introduced you to the try-except construct, we can add it to our evolving solution to the age variable problem from above. As a reminder, we identified two problems with prompting the user for their age: the data type and the range of human age. We solved the data type issue with the isdigit() method, and we began discussing the issue of accepting only a range of valid values for human age (1 thru 125). Now let's incorporate try-except to finish up our age data solution. In this final version, try-except takes over the data type check from isdigit().
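Before putting everything together, here is a tiny standalone sketch of break and continue inside a while loop. The list of entries is made up for illustration: negative entries are skipped with continue, and the first zero ends the loop with break.

```python
entries = [5, -2, 7, 0, 9]  # made-up sample values
total = 0
i = 0
while i < len(entries):
    n = entries[i]
    i += 1        # advance first, so continue cannot cause an infinite loop
    if n < 0:
        continue  # skip the rest of this pass and go back to the while condition
    if n == 0:
        break     # leave the loop immediately; 9 is never reached
    total += n

print(total)  # 12
```

Notice that the index is advanced before the continue check; if it were advanced at the bottom of the loop, continue would skip it and the loop would never end.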
Code Example:
while True:
    age = input("Please enter your age (1 thru 125): ")
    try:
        age = int(age)  # raises ValueError for entries like "abc" or "25.5"
    except ValueError:
        print('Error: Please use only digits 0 thru 9.')
        continue
    if age < 1 or age > 125:
        print('Error: Valid age range is 1 thru 125.')
        continue
    break
print('Your age is', age, sep=" ")
Code Details:
- Line 1 reads while True:. Interesting. If you remember from our discussion of while loops, this is an infinite loop. What's up with that? This is a common approach to handling an unknown number of iterations. In this example, we are forcing the user to enter a valid age, and we as programmers have no idea how many times the user will enter invalid values, so we set an "infinite" loop and control the exit from inside the loop instead of with a condition in the signature. We'll see how we do that shortly.
- Line 2 is our input() function prompt. When the user enters a value, it is assigned to the variable age.
- On Line 3 we add our try: statement, which begins our try-except block.
- Line 4 is where we attempt to convert the user-entered age value using the int() function. This is where an exception could occur: if the user entered a non-numeric value or a decimal value, int() would raise a ValueError.
- On Line 5 we add our except block.
- Line 6 will run if, and only if, there is an exception on Line 4. This will display an error message to the user.
- Line 7 is a continue statement. The continue statement skips the rest of the loop body and returns execution to the while loop signature. Since the signature is while True:, the loop will simply repeat and prompt the user for their age again.
- If there is no exception, then Line 8 will run (notice that this is outside of the try-except structure). This line is our age range validation: the if statement checks whether the age value entered by the user falls outside our range of 1 thru 125, as we decided earlier.
- If the user-entered age is outside of our range, an error message is printed on Line 9 for the user.
- Line 10 is another continue statement, which again sends execution back to the while loop signature so the user is prompted for their age again.
- We then place a break statement on Line 11. If execution reaches this line, it means that the user entered a valid integer that is also within our defined age range; in other words, a valid age. The break statement forces an exit from the while loop.
- Line 12 prints the age. Note that this is outside of the while loop.
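The final loop mixes input() with the validation logic, which makes it hard to try out without typing. A common refactoring, sketched here as an assumption rather than the author's version, wraps the same logic in a function get_age() (a hypothetical name) that accepts the input function as a parameter, so test code can feed it scripted answers. It names ValueError explicitly in the except clause, since that is the exception int() raises for text it cannot convert, and it uses return in place of break.

```python
def get_age(read=input):
    """Prompt until the user enters a whole number from 1 thru 125.

    read is an assumption added for testing: it defaults to the built-in
    input(), but any function that takes a prompt and returns a string works.
    """
    while True:
        entry = read("Please enter your age (1 thru 125): ")
        try:
            age = int(entry)
        except ValueError:
            print("Error: Please use only digits 0 thru 9.")
            continue
        if age < 1 or age > 125:
            print("Error: Valid age range is 1 thru 125.")
            continue
        return age  # return exits both the loop and the function, like break


# Usage: simulate a user who makes four mistakes before entering 42.
answers = iter(["abc", "25.5", "0", "200", "42"])
age = get_age(lambda prompt: next(answers))
print("Your age is", age)  # Your age is 42
```

In a real program you would simply call get_age() with no arguments, and it would prompt with input() as before.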