Python Across Disciplines
by John Gordon © 2023

About Data

Overview

Data is the foundation of most programming projects, regardless of discipline or purpose. In this chapter, we will explore what data is, the difference between data and information, the types of data, and how we work with data in our disciplines and in Python. When we engage with data in Python, we can think of data as units of information. The words data and information are often used interchangeably, but they are not the same. Here are general definitions of the two:

Concept: Data

Data refers to raw, unprocessed facts collected through observations, measurements, or responses. These facts can be in various forms, such as numbers, words, images, or sounds, and they often lack context or meaning when viewed in isolation. The quality of data is judged by its accuracy, reliability, and objectivity, and data serves as the foundational building block for analysis and interpretation. Data is the raw material that can be processed and analyzed to extract valuable insights. It can be quantitative (numerical) or qualitative (descriptive), and its collection is driven by the need to record and track information about phenomena, processes, or events.

Concept: Information

Information is data that is processed, organized, or structured to add context and meaning, making it useful and understandable to the person receiving it. Information arises from interpreting data, where the raw facts are analyzed to reveal patterns, relationships, or trends. This process involves sorting, aggregating, or transforming data to convey knowledge, solve problems, or make decisions. Information is meaningful and valuable because it provides insight, answers questions, or guides actions. For example, a collection of data points about weather patterns becomes information when analyzed and presented as a weather forecast, which people can use to plan their activities.

With these definitions in mind, we will learn features of Python that support the creation, collection, and storage of data. We use these features in preparation for the next step of the Input ⇨ Processing ⇨ Output cycle.

Examples of Data (Input) ⇨ Processing ⇨ Output

There are many different ways to create and collect data. We will explore a few examples of using Python to create or collect data in a Python program.

  • Hardcoding data values directly in our Python code. While this is not a common practice in production systems, we often hardcode values in our code to test our code and algorithms during development.
  • User Prompts are used in interactive Python programs. Prompts request data from the user by printing a prompt on the screen and waiting until the user has entered a response. When the user presses Enter, we can use their entry in our Python program.
  • Data files contain from one to many rows of data that we can read into our Python program for processing.
  • Databases are storage systems for large quantities of data that we can access using Python to request data based on specified criteria (called a query).
  • Web Scraping is an approach for collecting data by scraping (extracting) data from websites programmatically using Python.
  • APIs (Application Programming Interfaces) are specialized tools for communication between two computer programs. Our Python programs can use the APIs made available by various systems and web platforms (like Google Maps, the National Weather Service, and others) to request data from them.
  • Sensors & hardware devices often include the ability to transmit data to external programs, which allows us to access that data from our Python programs.
  • Online surveys & forms generate data when users complete these data collection tools online and submit their responses. We can use Python to process those responses for storage and output.
  • Computer, network, and web monitoring tools collect data used to manage systems. Python programs process that data in order to present information to system administrators.
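
To make the idea of a query a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module with a temporary in-memory database. The table name, column names, and rows are invented for this illustration and are not taken from any real system.

```python
import sqlite3

# Create a temporary database in memory (nothing is saved to disk).
connection = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this illustration.
connection.execute("CREATE TABLE students (name TEXT, major TEXT)")
connection.executemany(
    "INSERT INTO students (name, major) VALUES (?, ?)",
    [("Ana", "Biology"), ("Ben", "History"), ("Cy", "Biology")],
)

# The query: request only the rows that match a criterion.
cursor = connection.execute(
    "SELECT name FROM students WHERE major = ? ORDER BY name",
    ("Biology",),
)
biology_students = [row[0] for row in cursor.fetchall()]
print(biology_students)  # ['Ana', 'Cy']

connection.close()
```

The criterion here is "major equals Biology"; changing the value passed to the query changes which rows come back, without changing the stored data.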


Concept: Hardcode

In programming, the term "hardcode" refers to embedding data directly into the source code of a program rather than obtaining it from external sources or generating it dynamically. Hardcoded data is fixed and does not change unless the source code is modified. This approach is used for values unlikely to change, such as configuration settings, constants, or specific resource file paths. While hardcoding can simplify development by reducing complexity and dependencies, it also makes the program less flexible and more challenging to update. For instance, changing a hardcoded value requires modifying the code and redeploying the application, which can be inefficient for values needing frequent updates. Moreover, hardcoding sensitive information, like passwords or API keys, is considered bad practice from a security perspective.
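
As a rough illustration of the difference, the sketch below hardcodes one value and looks up another when the program runs. The tax rate and the environment variable name EXAMPLE_API_KEY are assumptions made up for this example.

```python
import os

# Hardcoded: this value lives in the source code and changes only
# when someone edits the code. (The rate is an invented example.)
TAX_RATE = 0.07

# Not hardcoded: this value is looked up each time the program runs.
# "EXAMPLE_API_KEY" is a made-up variable name used only for this sketch.
api_key = os.environ.get("EXAMPLE_API_KEY", "")


def total_with_tax(price):
    """Return the price plus tax, using the hardcoded rate."""
    return round(price * (1 + TAX_RATE), 2)


print(total_with_tax(100))
print("API key found" if api_key else "No API key set")
```

Keeping a sensitive value such as an API key outside the source code means the code can be shared without exposing the key.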

Concept: Black Box

In programming and software engineering, the term "black box" refers to a system or component whose internal workings are unknown or inaccessible to the user or developer. In a black box approach, the focus is on the inputs and outputs of the system without any concern for its internal implementation. This concept is widely used in testing (black box testing), where the tester evaluates the system based solely on its functionality and response to inputs, without any knowledge of the internal code structure. Black box systems are also common in software components or APIs, where the user interacts with a well-defined interface without access to or knowledge of the underlying code. This abstraction allows users to utilize complex systems without needing to understand or manage the intricate details of their operations, promoting modularity and ease of use. However, it also means that troubleshooting or optimizing such systems can be challenging since their internal mechanisms are hidden.
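
The following sketch shows one way to think about black box use and black box testing in Python. The function name circle_area is a hypothetical helper written for this example; the checks only look at inputs and outputs and never inspect how the function works inside.

```python
import math


def circle_area(radius):
    """Return the area of a circle. Callers do not need to read this body."""
    return math.pi * radius ** 2


# Black box testing: compare outputs against what we expect for known inputs.
assert circle_area(0) == 0
assert math.isclose(circle_area(1), math.pi)
# Doubling the radius should quadruple the area.
assert math.isclose(circle_area(2), 4 * circle_area(1))

print("All black box checks passed")
```

If the body of the function were later rewritten, these checks would still apply unchanged, because they depend only on the interface: a radius goes in and an area comes out.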

Concept: Data File

In programming, a data file is a file that primarily contains data used or generated by a software application. Unlike executable files, which contain code run by the computer, data files are designed to be read from or written to by programs. They come in various formats, depending on the type and use of the data, such as text files (.txt), comma-separated values files (.csv), JavaScript Object Notation files (.json), and eXtensible Markup Language files (.xml), among others. Data files can hold a wide array of information, ranging from simple text or numbers to complex structured data like user settings, program states, or large datasets used in data analysis. How these files are structured and accessed depends on the requirements of the application and the specific programming language in use. Efficient handling and manipulation of data files are often crucial in software development, especially in fields such as data science, database management, and web development.
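
As a small sketch of how format affects a data file, the example below writes the same records to a .csv file and a .json file in a temporary folder, then reads both back. The records, file names, and field names are invented for this illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical records, invented for this illustration.
records = [
    {"major": "Biology", "students": 8},
    {"major": "History", "students": 5},
]

# Work in a temporary folder so no existing files are touched.
folder = Path(tempfile.mkdtemp())
csv_path = folder / "majors.csv"
json_path = folder / "majors.json"

# Write the records as a CSV file (a header row, then one row per record).
with open(csv_path, "w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=["major", "students"])
    writer.writeheader()
    writer.writerows(records)

# Write the same records as a JSON file.
with open(json_path, "w", encoding="utf-8") as file:
    json.dump(records, file, indent=2)

# Read both files back into Python.
with open(csv_path, newline="", encoding="utf-8") as file:
    from_csv = list(csv.DictReader(file))
with open(json_path, encoding="utf-8") as file:
    from_json = json.load(file)

print(from_csv[0])   # CSV values come back as text: 'students' is '8'
print(from_json[0])  # JSON keeps numbers as numbers: 'students' is 8
```

The same data comes back slightly differently from each format, which is one reason the choice of file format depends on how the data will be used.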


Example of Hardcoded Data

As indicated above, hardcoding data within code is most often considered a bad programming practice. However, while learning to program and during development, we often use hardcoded data values, with the understanding that we minimize or eliminate hardcoding when we create live, production-level programs. Figure 1 depicts an Input ⇨ Processing ⇨ Output scenario in which we hardcode a data value into a Python program, the program processes that value, and an output is printed to the screen. In this example, we hardcode a data value (10), which is used as our input. The process performs a calculation (black boxed in this image) that produces an output: in this case, a statement that reports the area of a circle calculated from the hardcoded radius value.


Figure 1: Example of hardcoded data.
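
The figure keeps the processing step as a black box. One plausible way to write the scenario in Python is sketched here, assuming the standard formula for the area of a circle (pi times the radius squared); the wording of the printed statement is an assumption.

```python
import math

# Input: the radius is hardcoded directly in the source code.
radius = 10

# Processing: area of a circle = pi * radius squared.
area = math.pi * radius ** 2

# Output: report the result, rounded to two decimal places.
print(f"The area of a circle with radius {radius} is {area:.2f}")
# The area of a circle with radius 10 is 314.16
```

To calculate the area for a different radius, we would have to edit the code itself, which is exactly what makes this value hardcoded.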


Example of Data Entered by a User

In contrast to hardcoded data, prompting a user for a data value makes our programs more interactive and flexible. Figure 2 depicts the same scenario as seen in Figure 1, a program that calculates the area of a circle. In this example, though, the radius data value comes from a user entering a value. We use the Python input() function to prompt the user for values. The process performs a calculation (black boxed in this image) that produces an output, in this case, a statement that reports the area of a circle calculated based on the radius data value entered by the user.


Figure 2: Example of data entered by a user.
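
A minimal sketch of this scenario follows, again assuming the standard area formula. The function name report_circle_area and its read_input parameter are hypothetical additions for this example; the parameter defaults to Python's input() function and exists only so the function can be tried without typing at the keyboard.

```python
import math


def report_circle_area(read_input=input):
    """Prompt for a radius, calculate the area, and print a report."""
    # Input: input() always returns text, so convert it to a number.
    text = read_input("Enter the radius of the circle: ")
    radius = float(text)

    # Processing: area of a circle = pi * radius squared.
    area = math.pi * radius ** 2

    # Output: print and return the report.
    message = f"The area of a circle with radius {radius} is {area:.2f}"
    print(message)
    return message
```

Calling report_circle_area() with no arguments runs the program interactively: it prints the prompt and waits until the user types a value and presses Enter. Note that this sketch does not handle entries that are not numbers; float() raises a ValueError for those.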


Example of Data Read from a Data File

In this third example, we'll consider a different scenario. Another way we collect data is by reading a data file. Figure 3 depicts a data file containing a summary list of student academic majors in a class. Each line of the data file includes a major and a number, the number of students in the class who have declared that major. The black box process in our Python program produces a chart visually displaying each major and the number of students in each as a horizontal bar chart.


Figure 3: Example of data read from a data file.
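
The sketch below follows the same steps in a simplified form. It assumes each line of the data file looks like "major,count" (the actual layout of the file in the figure may differ), it creates its own small sample file with invented values, and it prints a text-based stand-in for the horizontal bar chart rather than a graphical one.

```python
import tempfile
from pathlib import Path

# Create a small sample data file so the sketch is self-contained.
# The majors, counts, and comma-separated layout are assumptions.
sample_text = "Biology,8\nHistory,5\nMusic,3\n"
data_path = Path(tempfile.mkdtemp()) / "majors.txt"
data_path.write_text(sample_text, encoding="utf-8")


def read_major_counts(path):
    """Read lines of the form 'major,count' into a dictionary."""
    counts = {}
    with open(path, encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            major, number = line.split(",")
            counts[major.strip()] = int(number)
    return counts


def text_bar_chart(counts):
    """Build one text bar per major, using '#' characters as the bar."""
    width = max(len(major) for major in counts)
    return [f"{major:<{width}} | {'#' * n} {n}" for major, n in counts.items()]


# Input: read the file. Processing: build the bars. Output: print them.
major_counts = read_major_counts(data_path)
for bar in text_bar_chart(major_counts):
    print(bar)
```

A plotting library could draw a graphical version of the same chart; for example, matplotlib provides a barh function for horizontal bars.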

As we proceed, we will work through examples such as these and others, hands-on, using Python.

Types of Data

In addition to understanding the sources of data, we also need to be aware of different types of data for proper data collection, analysis, and interpretation in various fields, ranging from research to programming to business intelligence.

  • Quantitative Data
    • Description: Quantitative data is numerical, allowing for measurement and comparison. It is typically used in statistical analysis and can be further classified into discrete and continuous data.
    • Examples:
      • Discrete Data: Counts of items, such as the number of students in a class (e.g., 30).
      • Continuous Data: Measurements that can take any value within a range, like the height of students in meters.
  • Qualitative Data
    • Description: This type of data is descriptive and characterizes attributes or properties that are not numerical. It can be observed but not measured.
    • Examples:
      • Nominal Data: Data without a natural order or ranking, such as types of cuisine (Italian, Chinese, Mexican).
      • Ordinal Data: Data with a set order or scale, but without a standard interval, like survey responses (poor, fair, good, very good, excellent).
  • Primary Data
    • Description: Primary data is collected firsthand by the researcher for a specific purpose. It is original and collected at the source.
    • Examples:
      • Survey responses collected by a marketer to assess customer satisfaction with a new product.
      • Laboratory experiment results in a scientific study.
  • Secondary Data
    • Description: This data is not collected directly by the user but obtained from existing sources. It was collected for another purpose but is being used for a different analysis.
    • Examples:
      • Census data used by a business for market analysis.
      • Historical sales data used for trend analysis.
  • Structured Data
    • Description: Structured data is highly organized and formatted in a way that is easily searchable in databases. It adheres to a specific format or schema.
    • Examples:
      • Database tables with rows and columns, like a student enrollment database.
      • Spreadsheets with defined data types for each column.
  • Unstructured Data
    • Description: This type of data lacks a predefined format or structure, making it more complex to analyze and process.
    • Examples:
      • Text files, such as emails or product reviews.
      • Multimedia content like images, audio, and video files.
  • Semi-structured Data
    • Description: Semi-structured data does not reside in a relational database but has some organizational properties that make it easier to analyze than unstructured data.
    • Examples:
      • JSON and XML files.
      • Emails that contain both structured elements (like sender, recipient, date) and unstructured text bodies.
  • Time-Series Data
    • Description: This data is a sequence of data points collected at consistent intervals over time. It is used to analyze trends and patterns and to forecast future values.
    • Examples:
      • Stock market prices recorded throughout the trading day.
      • Daily temperature readings.
  • Cross-sectional Data
    • Description: Data collected from multiple sources at a single point in time. It is used to analyze and compare different variables at a specific moment.
    • Examples:
      • A survey of consumer preferences taken at a particular date.
      • Data on the GDP of different countries for a given year.
  • Big Data
    • Description: Refers to extremely large datasets that traditional data processing software cannot handle efficiently. Big data is characterized by its volume, velocity, and variety.
    • Examples:
      • Social media data with millions of posts, likes, and comments.
      • Sensor data from Internet of Things (IoT) devices.
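
To connect a few of these categories to Python, the sketch below shows how some of them might look as Python values. All of the values (the class size, height, email, and temperature readings) are invented for illustration, and the helper name rating_rank is hypothetical.

```python
import json
from datetime import date

# Quantitative data
class_size = 30   # discrete: a count of items
height_m = 1.68   # continuous: a measurement within a range

# Qualitative data
cuisines = ["Italian", "Chinese", "Mexican"]  # nominal: no natural order
rating_scale = ["poor", "fair", "good", "very good", "excellent"]  # ordinal


def rating_rank(rating):
    """Return the position of a rating on the ordinal scale (0 = lowest)."""
    return rating_scale.index(rating)


# Semi-structured data: JSON text parsed into a Python dictionary.
email_json = '{"sender": "a@example.com", "date": "2023-05-01", "body": "See you at noon."}'
email = json.loads(email_json)

# Time-series data: values recorded at consistent intervals (here, daily).
temperatures = [
    (date(2023, 5, 1), 18.5),
    (date(2023, 5, 2), 20.0),
    (date(2023, 5, 3), 19.0),
]
average_temp = sum(value for _, value in temperatures) / len(temperatures)

print(rating_rank("good") < rating_rank("excellent"))  # True
print(email["sender"])                                 # a@example.com
print(round(average_temp, 2))                          # 19.17
```

Ordinal values can be compared by position on their scale, while nominal values such as the cuisines can only be checked for equality or membership.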

Conclusion

This page briefly introduced data, gave examples of how data is created or collected, and provided a list of different types of data we might encounter. Next, we will learn about each type of data we can work with and how to manage and manipulate that data in Python.







© 2023 John Gordon
Cascade Street Publishing, LLC