Data & Data Types

The word data is used in many contexts to mean different things. In the context of this study of Python, data are units of information. Often the words data and information are used interchangeably, but they are not the same thing. The common academic definition of information is that which is conveyed or represented by a particular arrangement or sequence of symbols. Notice that data is referred to as plural, "data are units." The singular form of data is datum.

Concept: Data

Data refers to raw facts, figures, and details collected from various sources, which, when processed and analyzed, become meaningful information. Data can exist in various forms, such as numbers, text, bits and bytes, images, and sounds. It is a fundamental element in computing and information sciences, used for analysis, decision-making, and creating strategies in fields like business, science, technology, and social studies. In its raw form, data might seem random or disconnected, but through processing and analysis, it can reveal patterns, insights, and trends. The significance of data has grown immensely in the digital age, where vast amounts of it are generated and used every day. This has led to the development of fields like data science and big data analytics, which focus on extracting meaningful information and knowledge from large and complex datasets.


The symbols that we use to represent information are things like alphabetic words, numbers, etc. For example, if someone asked you your age and you responded "I am 25 years old," you have conveyed information to them regarding the number of years since your birth. The key symbol here is the number 25. In programming, we would call the symbol 25 a datum. If five people were asked their ages and we wrote down each one, the set of five numbers we ended up with would be the data (plural) that represents the ages of that group of people.

As programmers, we work with data in most programs that we write. We need to understand what data is, how our programs will receive data (input), what form the data will be in, such as text, numbers, images, video, etc. (data types), what actions we need to perform on that data (algorithms & processing) and what we need to do with the results of those actions (output). This diagram represents the common flow of data through a program that processes data:



Input: When we write programs that work with data, we can receive that data in many different ways. For example, we can prompt the user to enter values (such as in a data entry application), we can read data from a file that is stored on the computer hard drive, we can query a database to select data from its tables, we can receive data from hardware devices, from the Internet, from the computer keyboard, mouse, trackpad, sensors, and the list goes on and on.

Data Types: Each item of data has a type, called its data type. The type informs us how we can work with that item of data. For example, an item of data can be a number, such as an integer (a counting number) or a decimal value with decimal digits that indicates fractional portions of a whole number. We can use numeric data to perform calculations. A data item can be alphanumeric, that is, a sequence of alphabetic letters and numbers, such as abc123. In Python we call these alphanumeric sequences strings. We cannot perform calculations with alphanumeric data. We can however manipulate strings in other ways, such as change their case (upper/lower), search them for substrings, concatenate them together to create longer strings, etc.

Algorithms: An algorithm is a set of steps to perform a calculation or conduct a process of solving a problem. Algorithms range from very simple (how to add two integers) to very complex, like those found in Machine Learning or Artificial Intelligence. When we process data between input and output, we often use one or more algorithms to use our input to answer questions or solve problems.

Processing: We use the term processing to indicate the actions we take with data we bring into our programs. As indicated above, that often involves using algorithms. However, other processing may also occur that is not algorithmic in nature. For example, we may read data and simply store that data in a database system, without performing any algorithmic steps on that data.

Output: Once our processing is complete there is often some form of output, such as printing, displaying on a screen, writing data to an output file, etc. Output is generally the end-goal of our processing steps. We often use output to confirm that our processing completed correctly. For example, if we generate an output report as a result of our processing, and that output report contains identifiable errors, it indicates that our processing may be flawed in some way.
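The flow of input, processing, and output described above can be sketched in a few lines of Python. This is only an illustration: the ages are made-up values typed directly into the program rather than read from a user or a file, and it uses variables and a list, which we have not covered yet.

```python
# Input: in a real program these values might come from a user, a file,
# or a database. Here they are hard-coded, hypothetical ages.
ages = [25, 31, 19, 42, 28]

# Processing: a very simple algorithm that computes the average age.
total = sum(ages)
average_age = total / len(ages)

# Output: display the result so we can confirm the processing worked.
print("Number of people:", len(ages))
print("Average age:", average_age)
```

If the printed average looked wrong, for example a negative number, that would tell us something in the processing step needs attention.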

For now, let's take a closer look at data types and we will return to algorithms, processing and output more in depth later.

Data Types

At the top of this page, I used the example of asking someone's age. The response, 25, is a datum that represents the age of that one person. That datum is a symbol that we are familiar with from mathematics that we call an integer. We know from math that integers are whole numbers (the counting numbers, their negatives, and zero), which have no fractional or decimal part. We use integers to represent data that we do not usually divide into fractional portions, such as age, number of purchases, number of children in a family, etc. As programmers, we can interpret any of these values as an integer, we know the types of calculations we can perform with it, and we subsequently store it in memory or in a database as an integer value. In these examples, the fact that each datum, such as an age, is an integer is called its data type.

Concept: Precision & Scale

When we talk about numeric data types we often need to know the precision and scale of the data types. Precision means the number of significant digits possible in a value. Scale refers to the number of digits to the right of the decimal point. So, for example, 12345.67 has a precision of 7 and a scale of 2.
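As a rough sketch, we can count precision and scale with Python's standard decimal module. The function name precision_and_scale is made up for this illustration, and the approach assumes the number is written in plain decimal notation (such as "12345.67"), not scientific notation.

```python
from decimal import Decimal

def precision_and_scale(text):
    """Return (precision, scale) for a number written as a plain decimal string."""
    sign, digits, exponent = Decimal(text).as_tuple()
    precision = len(digits)      # total count of digits in the value
    scale = max(-exponent, 0)    # digits to the right of the decimal point
    return precision, scale

print(precision_and_scale("12345.67"))  # (7, 2)
print(precision_and_scale("25"))        # (2, 0)
```

The number is passed in as text so that every digit is counted exactly as written.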

Why is this important? When we are setting up our programs to process data, we choose data types based on the values we expect we will need. So, for the age example above, we would most likely want to store that as an integer, that is, with no decimal values. Knowing that we will not store decimal values, we choose the integer data type. We can also decide on a reasonable range of values to expect in that data. If we prompted a user for their age and they entered 4592, which is much larger than any known human age, we would want to reject that value as invalid. That's an example of combining knowledge of the data we are working with and knowledge of what is available in Python to store values.
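Here is one possible way to express that kind of check in Python. The function name is_valid_age and the upper limit of 130 are assumptions chosen for this illustration, and the example uses features (functions, try/except) that we will cover later.

```python
MAX_REASONABLE_AGE = 130  # assumed upper limit, chosen only for illustration

def is_valid_age(text):
    """Return True if text is a whole number within a plausible age range."""
    try:
        age = int(text)          # fails for values like "25.5" or "abc"
    except ValueError:
        return False
    return 0 <= age <= MAX_REASONABLE_AGE

print(is_valid_age("25"))    # True
print(is_valid_age("4592"))  # False, far larger than any human age
print(is_valid_age("25.5"))  # False, not an integer
```

The check applies both ideas from the paragraph above: the value must fit the chosen data type (int), and it must fall inside a sensible range.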

Python supports many data types. Here is a list of the first few we will explore and use. We will expand our list further as we progress through this eBook.

Category | Python Syntax | Description

Numeric | int | Integers in Python have no limit to how many digits of precision they can have (other than the amount of memory available to the program in the computer). This means we can specify an integer of (virtually) any length.

Examples:
  • 10
  • 157
  • 24785
  • 12234987923847928374982739847928374687693589056890347

Numeric | float | Numeric values with a fractional part (scale greater than 0) are called float values in Python. Float values can be as large as about 1.8 × 10^308, but they hold only about 15 to 17 significant digits of precision, so a value written with more digits than that is rounded when it is stored.

Examples:
  • 0.1
  • 0.5774
  • 3.14
  • 493.67498793465827634823875934860985498234827

Boolean | bool | Boolean is a binary state of either true or false. Boolean values in Python can only be one of those two possible values, written True and False. We use booleans in many scenarios, such as repeating an action until some condition becomes true. We often associate boolean result values of true and false with yes or no questions.

Example statements that we could implement in Python that would rely on boolean values (we will learn how to actually do this a little later):
  • Is today Monday?
  • Repeat adding numbers until we run out of numbers.
  • Is this student's GPA greater than or equal to 3.5?
  • Is the user employed?

Text | str | In programming we very often work with alphanumeric (textual) data. In Python we call sequences of alphanumeric characters strings. Python has a wide range of tools for working with string data, as we will learn as we proceed.

Concept: Alphanumeric

Alphanumeric characters include the alphabetic letters A thru Z, both uppercase and lowercase, as well as the digits 0 thru 9. When a digit is within an alphanumeric value, it is not considered a number but rather a textual representation of the digit. For our purposes, alphanumeric also includes other characters found on keyboards, such as @ $ # & * ( ) { } [ ] , = - _ . + ; ' /. Also, a blank space (created on the keyboard by the spacebar) is considered alphanumeric.

Examples (each of the following 4 examples represents a string, one per line):
  • Bob
  • Bob Smith
  • Bob Allen Smith
  • One day, maybe tomorrow, I will talk to Mr. O'Leary about the $5.00 I owe him.


In the above table, the category column shows how we group data types together, such as all data types that are numeric. The syntax column shows the actual syntax we use in Python to specify each particular data type. The brief description is provided as a simple introduction.

Don't try to memorize this table. As we progress through this eBook we will work with each of these data types, and you'll learn each in turn in the context of learning Python.
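If you would like to see a few of the table's claims in action, the following short sketch may help. It borrows the standard sys module to report float limits, and it uses a variable called name purely for illustration. We will cover all of these features properly later.

```python
import sys

# int: no fixed limit on the number of digits
big = 2 ** 200
print(big)
print(type(big))

# float: a very large range, but limited precision
print(sys.float_info.max)   # largest representable float
print(sys.float_info.dig)   # decimal digits a float can reliably hold
print(0.1 + 0.2 == 0.3)     # False, because floats are rounded when stored

# bool: written with a capital T and F in Python
print(10 > 20)              # False

# str: concatenation, case changes, and substring search
name = "Bob" + " " + "Smith"
print(name.upper())         # BOB SMITH
print("Smith" in name)      # True
```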

Examples

Consider the following practical example of the data types listed above.

Let's say we need to write a program that processes data entered by a user. The user fills out a form and clicks a Save button. It might look like this:

This form represents information about the user, Bob Smith. How many data items does this form contain? It prompts the user for their first name, last name, age, employment status, job title and hourly wage, so 6 items. It may appear as if there are 7; however, the Employed? question would be stored as a single value based on the user's answer.


What are the data types of these items? Well, the first name, last name and job title are all expected to be alphanumeric (strings), so they would each be a str. As we saw earlier, age is an integer (a whole number with no decimal or fractional portion), or int in Python. The employment status is a yes or no value. As indicated in the table above, we call this boolean, or bool in Python. And the hourly wage is a number, but not an integer because it has scale. So the hourly wage is a decimal number, which in Python we call a floating-point number, or float.
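The six items from the form could be written in Python roughly as follows. Only the name Bob Smith comes from the example; the age, employment status, job title, and wage are invented values, and the variable names are assumptions made for this illustration.

```python
# Hypothetical form values; only the name Bob Smith comes from the text.
first_name = "Bob"       # str
last_name = "Smith"      # str
age = 25                 # int
employed = True          # bool
job_title = "Cashier"    # str
hourly_wage = 18.50      # float

# Show each value next to the name of its data type.
for value in (first_name, last_name, age, employed, job_title, hourly_wage):
    print(repr(value), "is of type", type(value).__name__)
```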

On this page we have learned some basic ideas about data and data types. There is much more to learn about these topics, which we will continue to explore as we learn Python. For now, our next step is to learn how to store data in variables.

The type() Function

The type() function in Python is used to determine the data type or class of an object. It returns the type of the given object. The syntax of the type function is as follows:

type(object)

The object parameter represents the object whose type needs to be determined. The return value of the function is called the type object, which reports the type or class of the given object. For now, we will use the type() function to check the data type of constant and literal values. We will expand the use of the type() function soon when we talk about variables.

Code Examples

print(type(10))
print(type(3.14))
print(type('A'))
print(type("Bob Smith"))
print(type(10 / 20))
print(type(10 > 20))

Output

<class 'int'>
<class 'float'>
<class 'str'>
<class 'str'>
<class 'float'>
<class 'bool'>
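A few more literal values are worth trying, as a small supplement to the examples above. This sketch shows that the type object returned by type() can be compared with a type name such as int, and that small changes in how a value is written can change its data type.

```python
print(type(10) is int)      # True: the type object for 10 is int
print(type(10).__name__)    # int: just the name of the type, as text
print(type(10 / 20))        # <class 'float'>: / always produces a float
print(type(10 // 20))       # <class 'int'>: // between two ints stays an int
print(type("10"))           # <class 'str'>: quotes make it text, not a number
```

Notice that "10" in quotes is a string even though it looks like a number, which matches the earlier point that a digit inside an alphanumeric value is only a textual representation of that digit.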

© 2023 John Gordon
Cascade Street Publishing, LLC