Python Across Disciplines
by John Gordon © 2023


Data Visualization



Overview

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In Python programming, numerous libraries make the process of visualizing data straightforward and effective. This section introduces you to the fundamental concepts of data visualization, discusses why it is important, and explores how you can use Python to create your own visual representations.

Why Data Visualization Matters

Basic Types of Data Visualization



Example Line Chart


Example Bar Chart


Example Pie Chart


Example Scatter Plot


Example Histogram


Example Heat Map

Best Practices for Data Visualization

Common Libraries Used in Data Visualization

Python offers a wide range of data visualization libraries, catering to various needs from simple plots to advanced interactive visualizations. Here's an overview of some of the most popular and widely used data visualization libraries in Python:

Each of these libraries has its strengths and is suited to different types of data visualization tasks, from simple plots to complex interactive web applications. The choice of library often depends on the specific requirements of the project, such as the complexity of the visualization, the need for interactivity, and the target audience (web, publication, exploratory analysis, etc.).

Line Graphs

A line graph is a type of chart used to display information as a series of data points connected by straight line segments. It is a fundamental tool in data visualization that is effective at showing trends over time or categories. Line graphs are particularly useful for illustrating the change of one or more variables, allowing viewers to see patterns, progress, or fluctuations within the data. They are favored for their clarity and simplicity, making it easy to compare multiple data sets or to track changes across different periods or groups. One might use a line graph to analyze business trends, such as sales revenue or customer growth over months or years, to forecast future performance, or to identify patterns that could inform strategic decisions. By visually representing data in a line graph, complex information becomes more accessible, enabling a straightforward interpretation of temporal relationships, trends, and potential outliers within the dataset.

Note: We'll use the matplotlib library for the line graph examples below. Full documentation for this library is available on the official matplotlib website.

Simple Line Graph Example

This example uses the matplotlib library to create a simple line graph of some data points. Read the code carefully, especially the comments, as a guide to how this program works.

Code

# First import the matplotlib library
import matplotlib.pyplot as plt

# Create some simple data to use with this example
# The values in the two lists can be any numeric values
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create a figure (container for the visualization) and axis (the x & y axis of the plot) for the plot
fig, ax = plt.subplots()

# Use the plot method to create a plot of the data
ax.plot(x, y)

# Set attributes of the plot for readability
ax.set_title('Simple Line Graph Plot')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')

# Show the plot
plt.show()

Output

Figure 1 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 1: Example Line Graph

Expand the Line Graph Example

Now let's copy the code above and make a few modifications to explore the ideas further. In this version, we'll add more values to the dataset. Try changing some of them and rerunning the program to see how they alter the appearance of the graph. We'll also add markers so that the data points are more prominent on the graph. This is an example of using parameters of the plot() method to control the appearance of the graph. See the matplotlib documentation for matplotlib.pyplot.plot for additional details.

Code

# First import the matplotlib library
import matplotlib.pyplot as plt

# In this example, we've added more values with more variability
# To experiment with the line graph, change a few values and 
# re-run this program to see the changes.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
y = [2, 20, 17, 12, 11, 13, 14, 9, 17, 19, 21, 31, 29 ,28, 32, 27, 28, 30, 31, 35]

# Create a figure (container for the visualization) and axis (the x & y axis of the plot) for the plot
fig, ax = plt.subplots()

# Use the plot method to create a plot of the data
# In this example, we've added the "marker='o'" attribute
# to demonstrate how we can adjust visual aspects of 
# the graph with additional parameters in the call to
# the plot() method.
ax.plot(x, y, marker='o')

# Set attributes of the plot for readability
ax.set_title('Simple Line Graph Plot')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')

# Show the plot
plt.show()

Output

Figure 2 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 2: Modified Line Graph Example
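
The marker argument is only one of several keyword arguments that plot() accepts. As a minimal sketch, the following reuses the small dataset from the first example and also sets linestyle, color, and linewidth. The specific style values and the output file name are arbitrary choices for illustration, and the figure is saved to a file here instead of being shown in a window.

```python
import matplotlib.pyplot as plt

# Same small dataset as the first line graph example
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

fig, ax = plt.subplots()

# Several appearance settings passed as keyword arguments:
#   marker='s'      square markers at each data point
#   linestyle='--'  dashed line between the points
#   color='green'   line and marker color
#   linewidth=2     line thickness in points
line, = ax.plot(x, y, marker='s', linestyle='--', color='green', linewidth=2)

ax.set_title('Styled Line Graph')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')

# Save the figure to an image file (the file name is an arbitrary choice).
# Calling plt.show() instead would open the graph in a window.
fig.savefig('styled_line_graph.png')
```

Changing one argument at a time and re-running the program is a quick way to see what each one controls.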

Adding a Trend Line to the Line Graph Example

Next we'll add a trend line to the line graph to help us see the underlying trend in the dataset.

Concept: Trend Lines

A trend line is a straight or curved line in a chart that represents the general pattern or direction of the data. It is used to illustrate the underlying trend in a dataset over a specific period. In Python, particularly with libraries like Matplotlib, Seaborn, or Plotly, trend lines can be generated through statistical methods, such as linear regression, which fit the line that best summarizes the relationship between two variables. The primary purpose of a trend line is to make it easier to identify and understand the underlying patterns in the data, whether the values are increasing, decreasing, or remaining constant over time. By adding a trend line to a scatter plot or any other kind of chart, analysts and data scientists can highlight the average movement, predict future values, and identify anomalies or deviations from the expected pattern. This makes trend lines a valuable tool for data analysis and forecasting in fields such as finance, economics, and environmental studies, where understanding the direction of trends is crucial for decision-making.
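
To make the idea of "fitting" a straight trend line a little more concrete, here is a minimal sketch of the least-squares calculation behind simple linear regression, written in plain Python. The function name fit_trend_line and the sample points are made up for this illustration. In practice a library such as numpy would normally do this calculation for you.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope = sum of (x - mean_x) * (y - mean_y), divided by
    #         sum of (x - mean_x) squared
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator

    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# These made-up points lie exactly on the line y = 2x + 1
slope, intercept = fit_trend_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # prints: 2.0 1.0
```

Because the sample points already lie on a straight line, the fitted slope and intercept match that line exactly. With real, noisy data the line passes through the middle of the points instead.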

Code

# In addition to the matplotlib library we also need the numpy library
# which we'll use to produce the trendline based on calculations provided
# by the numpy library
import matplotlib.pyplot as plt
import numpy as np

# In this example, we've added more values with more variability
# To experiment with the line graph, change a few values and 
# re-run this program to see the changes.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
y = [2, 20, 17, 12, 11, 13, 14, 9, 17, 19, 21, 31, 29 ,28, 32, 27, 28, 30, 31, 35]

# Use the plot method to create a plot of the dataset above. In addition to the
# "marker='o'" attribute that we included in the previous example, now we're 
# also adding a label as well to provide a legend of two different lines that 
# will appear on the output graph
plt.plot(x, y, marker="o", label='Original Data')

# Next we'll use the numpy library to calculate the trendline. At this point
# you don't need to understand the mathematics behind it, but numpy uses
# concepts called linear regression and polynomials to calculate the trend line
# and then we will add it to the graph. In the z calculation below, the 1 indicates
# a first-degree polynomial (linear) fit. If we change it to 2 or higher, the "fit" 
# will approach closer and closer to the plot of the dataset itself.
z = np.polyfit(x, y, 1)
p = np.poly1d(z)

# Now we can plot the trendline, including another label to add to the output legend
plt.plot(x, p(x), "r--", label='Trendline')

# Set attributes of the plot for readability
plt.title('Simple Line Plot with Trendline')
plt.xlabel('x axis')
plt.ylabel('y axis')

# Show the legend
plt.legend()

# Display the plot
plt.show()

Output

Figure 3 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 3: Line Graph with Trend Line Added

Experimentation

Once you have the code above working, you can experiment with this graph to learn more about how it operates. Try altering the dataset values (the x and y lists) and the "fit" value (the third argument in the call to np.polyfit()) by changing it to a number larger than 1, then re-run the code to see the changes in the line graph.
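
One way to see the effect of the "fit" value without looking at a graph is to measure how far the fitted curve is from the data. The sketch below, which assumes the same dataset as above, fits polynomials of degree 1, 2, and 3 and prints the sum of squared errors for each. The variable names are illustrative choices.

```python
import numpy as np

# Same dataset as the trend line example
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
              11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
y = np.array([2, 20, 17, 12, 11, 13, 14, 9, 17, 19,
              21, 31, 29, 28, 32, 27, 28, 30, 31, 35])

errors = {}
for degree in (1, 2, 3):
    # Fit a polynomial of the given degree to the data
    coefficients = np.polyfit(x, y, degree)

    # Evaluate the fitted polynomial at each x value
    fitted = np.polyval(coefficients, x)

    # Sum of squared differences between the data and the fitted curve
    errors[degree] = float(np.sum((y - fitted) ** 2))
    print(f"degree {degree}: sum of squared errors = {errors[degree]:.1f}")
```

The error does not increase as the degree goes up, which is the numeric counterpart of the curve hugging the data more closely. A closer fit is not automatically a better description of the trend, since a high-degree curve can end up following the noise in the data.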



Scatter Plots

Another common graphing tool is the scatter plot, a type of data visualization that displays values for (typically) two variables as points on a Cartesian coordinate system. Each point on the plot represents one individual data point. This kind of plot is particularly useful for examining the relationship between two numerical variables, allowing viewers to detect any correlations, trends, clusters, or outliers within the data. Analysts and researchers often use scatter plots to explore hypotheses about causal relationships or to investigate the distribution and grouping of data points in the exploratory phase of analysis. For instance, in the field of healthcare, scatter plots can be employed to study the relationship between patients' age and their response to a particular treatment. The intuitive and straightforward nature of scatter plots makes them a valuable tool for initial data analysis, helping to uncover underlying patterns or relationships that might warrant further investigation or statistical analysis.
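
A scatter plot lets you judge a correlation by eye, and the same relationship can also be summarized as a single number. As a rough sketch, the following computes the Pearson correlation coefficient in plain Python for the dataset used in the line graph examples. The function name pearson_r is a made-up helper for this illustration.

```python
from math import sqrt

# Same dataset as the line graph examples (x is simply 1 through 20)
x = list(range(1, 21))
y = [2, 20, 17, 12, 11, 13, 14, 9, 17, 19,
     21, 31, 29, 28, 32, 27, 28, 30, 31, 35]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # How the two variables vary together, and how each varies on its own
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    variation_x = sum((a - mean_x) ** 2 for a in xs)
    variation_y = sum((b - mean_y) ** 2 for b in ys)

    return covariation / sqrt(variation_x * variation_y)


r = pearson_r(x, y)
print(f"Correlation coefficient: {r:.2f}")
```

Values near 1 indicate that y tends to rise as x rises, values near -1 indicate that y tends to fall as x rises, and values near 0 indicate little linear relationship. For this dataset the result is about 0.87, a fairly strong positive relationship.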

Simple Scatter Plot Example

This example uses the matplotlib library to create a simple scatter plot of some data points. Read the code carefully, especially the comments, as a guide to how this program works.

Code

# First import the matplotlib library
import matplotlib.pyplot as plt

# We'll continue using this dataset that we constructed for the line graphs above
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
y = [2, 20, 17, 12, 11, 13, 14, 9, 17, 19, 21, 31, 29 ,28, 32, 27, 28, 30, 31, 35]

# Next, we'll use matplotlib to create a scatter plot. Notice the use of the
# parameters to set up the appearance of the plot
plt.scatter(x, y, c='blue', marker='o', edgecolor='blue', linewidth=1, alpha=0.75)

# Set attributes of the plot for readability
plt.title('Scatter Plot Example')
plt.xlabel('X values')
plt.ylabel('Y values')

# Optional: Adding a grid for better readability
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Show the plot
plt.show()

Output

Figure 4 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 4: Simple Scatter Plot

Adding a Trend Line to the Scatter Plot Example

Next we'll add a trend line to the scatter plot to help us see the underlying trend in the dataset.

Code

import matplotlib.pyplot as plt
import numpy as np  # Import NumPy for numerical calculations

# Dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
y = np.array([2, 20, 17, 12, 11, 13, 14, 9, 17, 19, 21, 31, 29 ,28, 32, 27, 28, 30, 31, 35])

# Creating a scatter plot
plt.scatter(x, y, c='blue', marker='o', edgecolor='black', linewidth=1, alpha=0.75, label='Data Points')

# Calculate coefficients for the trend line (linear regression)
m, b = np.polyfit(x, y, 1)

# Add the trend line to the plot
plt.plot(x, m*x + b, color='red', linewidth=1, label='Trend Line')

# Adding titles and labels
plt.title('Scatter Plot with Trend Line')
plt.xlabel('X values')
plt.ylabel('Y values')

# Optional: Adding a grid for better readability
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Adding legend to the plot to identify the trend line
plt.legend()

# Show the plot
plt.show()

Output

Figure 5 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 5: Scatter Plot with Trend Line Added

Experimentation

Once you have the code above working, you can experiment with this plot to learn a bit more about how it operates by altering dataset values, and then re-run the code to see the changes in the plot.
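
A trend line can also help identify points that deviate from the expected pattern. As one possible experiment, the sketch below fits the same linear trend line with np.polyfit() and then finds the data point that lies farthest from it. The variable names residuals and farthest are illustrative choices.

```python
import numpy as np

# Same dataset as the scatter plot examples
x = np.arange(1, 21)
y = np.array([2, 20, 17, 12, 11, 13, 14, 9, 17, 19,
              21, 31, 29, 28, 32, 27, 28, 30, 31, 35])

# Slope (m) and intercept (b) of the linear trend line
m, b = np.polyfit(x, y, 1)

# Residual = actual y value minus the value predicted by the trend line
residuals = y - (m * x + b)

# Index of the point with the largest distance above or below the line
farthest = int(np.argmax(np.abs(residuals)))

print(f"Trend line: y = {m:.2f}x + {b:.2f}")
print(f"Farthest point: x = {x[farthest]}, y = {y[farthest]}, "
      f"residual = {residuals[farthest]:.2f}")
```

For this dataset the farthest point is the second one (x = 2, y = 20), which sits well above the trend line. Try changing that value and re-running the code to see which point takes its place.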



Tabular Datasets and the Pandas Library

In Chapter 3 we discussed datasets, including a tabular dataset of high and low temperatures for November 2023 in Salt Lake City. The Chapter 3 examples represent that dataset as a list of lists, like this:

data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
    ["2023-11-07", 52, 38], ["2023-11-08", 49, 34], ["2023-11-09", 49, 31],
    ["2023-11-10", 53, 31], ["2023-11-11", 54, 31], ["2023-11-12", 63, 33],
    ["2023-11-13", 66, 42], ["2023-11-14", 66, 38], ["2023-11-15", 68, 44],
    ["2023-11-16", 59, 41], ["2023-11-17", 58, 36], ["2023-11-18", 52, 39],
    ["2023-11-19", 49, 35], ["2023-11-20", 46, 34], ["2023-11-21", 48, 30],
    ["2023-11-22", 49, 30], ["2023-11-23", 42, 32], ["2023-11-24", 36, 32],
    ["2023-11-25", 38, 29], ["2023-11-26", 36, 29], ["2023-11-27", 40, 25],
    ["2023-11-28", 41, 24], ["2023-11-29", 36, 25], ["2023-11-30", 34, 30]
]

Concept: Tabular Dataset

Tabular datasets organize data into rows and columns, much like a spreadsheet, where each row typically represents a unique record, observation, or entity, and each column corresponds to a specific attribute or variable associated with that record. This format is highly structured, making it straightforward to access, manipulate, and analyze data using various software tools, including relational databases and data analysis libraries in Python like Pandas. Tabular data is extensively used in almost every domain, from business and finance to science and healthcare, due to its intuitive organization and compatibility with analytical processes. It supports a wide range of analyses, including statistical analysis, data visualization, and machine learning tasks. For example, in a healthcare dataset, each row might represent a patient, while columns could include attributes like age, gender, diagnosis, and treatment outcome. The tabular format's clear and organized nature facilitates efficient data processing, querying, and insight generation, making it a cornerstone of data analysis and decision-making.
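
The row-and-column idea can be seen directly in a list of lists. The following sketch uses only the first six rows of the temperature dataset to keep it short: each inner list is a row (one day), and the same position in every row forms a column (date, high, or low).

```python
# First six rows of the temperature dataset: [date, high, low]
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
]

# Each row is one record: a single day's observation
for date, high, low in data:
    print(f"{date}: high {high}, low {low}")

# A column is the same position taken from every row
highs = [row[1] for row in data]
lows = [row[2] for row in data]
print("High temperatures:", highs)
print("Low temperatures:", lows)

# Find the row (the day) with the warmest high temperature
warmest = max(data, key=lambda row: row[1])
print("Warmest day:", warmest[0])
```

Working with columns this way requires remembering that position 1 means "high" and position 2 means "low", which becomes error-prone as a table grows.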

When our dataset is tabular, the pandas Python library is a very good choice for managing that tabular data. In the following example, we'll use the temperature dataset along with pandas to produce a line graph and a scatter plot to demonstrate the use of the pandas library with tabular data and data visualization libraries.

Concept: DataFrame

A DataFrame in Pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is one of the most widely used data structures in Pandas, providing a powerful and flexible tool to handle and analyze structured data. Each column in a DataFrame can be of a different data type, allowing it to mimic the functionality of a spreadsheet or SQL table but with the added advantages of being integrated into the Python ecosystem. DataFrames support a vast array of operations including data manipulation (such as sorting, grouping, and merging), filtering, and complex querying, making it exceptionally suited for data analysis tasks. Furthermore, DataFrames seamlessly interact with other Python libraries, including NumPy for numerical computations and Matplotlib for plotting, enabling comprehensive data analysis workflows. This versatility makes the Pandas DataFrame an indispensable component for data scientists and analysts working with data in Python, facilitating the easy exploration, cleaning, transformation, and visualization of complex datasets.
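
As a brief sketch of a few of these operations, the following builds a DataFrame from the first six rows of the temperature dataset, adds a calculated column, filters rows, and sorts. The column name 'Range' and the 65-degree threshold are arbitrary choices made for this illustration.

```python
import pandas as pd

# First six rows of the temperature dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
]

# Labeled columns replace the numeric positions of the list of lists
df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])

# Add a calculated column: the difference between high and low each day
df['Range'] = df['High Temp'] - df['Low Temp']

# Filter: keep only the days with a high of 65 degrees or more
warm_days = df[df['High Temp'] >= 65]

# Sort: largest daily temperature range first
by_range = df.sort_values('Range', ascending=False)

print(df)
print("Number of warm days:", len(warm_days))
print("Day with the largest range:", by_range.iloc[0]['Date'])
```

Each operation works on whole columns by name, so there are no explicit loops over the rows.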

Line Graph of Tabular Data

In the following example, we'll use the November 2023 Salt Lake City temperature dataset to produce a line graph of the tabular data.

Code

# In this example, we'll import the pandas, seaborn and matplotlib libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Tabular dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
    ["2023-11-07", 52, 38], ["2023-11-08", 49, 34], ["2023-11-09", 49, 31],
    ["2023-11-10", 53, 31], ["2023-11-11", 54, 31], ["2023-11-12", 63, 33],
    ["2023-11-13", 66, 42], ["2023-11-14", 66, 38], ["2023-11-15", 68, 44],
    ["2023-11-16", 59, 41], ["2023-11-17", 58, 36], ["2023-11-18", 52, 39],
    ["2023-11-19", 49, 35], ["2023-11-20", 46, 34], ["2023-11-21", 48, 30],
    ["2023-11-22", 49, 30], ["2023-11-23", 42, 32], ["2023-11-24", 36, 32],
    ["2023-11-25", 38, 29], ["2023-11-26", 36, 29], ["2023-11-27", 40, 25],
    ["2023-11-28", 41, 24], ["2023-11-29", 36, 25], ["2023-11-30", 34, 30]
]

# Convert the data to a pandas DataFrame
df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])

# Convert (cast) the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Create a line plot
sns.set_theme(style="whitegrid")  # Setting the theme
plt.figure(figsize=(10, 6))  # Adjusting the figure size

# Plotting both high and low temperatures as lines
sns.lineplot(x='Date', y='High Temp', data=df, color='red', label='High Temp')
sns.lineplot(x='Date', y='Low Temp', data=df, color='blue', label='Low Temp')

plt.title('High and Low Temperatures in November 2023')
plt.xlabel('Date')
plt.ylabel('Temperature (°F)')
plt.xticks(rotation=90)
plt.legend()

plt.show()

Output

Figure 6 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 6: Line Graph of Tabular Data using Pandas

Experimentation

Once you have the code above working, you can experiment with this plot to learn a bit more about how it operates by altering dataset values, and then re-run the code to see the changes in the plot.
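
As another experiment, a DataFrame can draw a line graph itself through its plot() method, which uses matplotlib behind the scenes. The sketch below is an alternative to the seaborn version, not a replacement for it. It uses only the first six rows of the dataset to stay short, and it saves the figure to a file whose name is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First six rows of the temperature dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
]

df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])
df['Date'] = pd.to_datetime(df['Date'])

# DataFrame.plot() draws one line per column listed in y and
# returns the matplotlib axes object it drew on
ax = df.plot(x='Date', y=['High Temp', 'Low Temp'],
             color=['red', 'blue'], figsize=(10, 6))

# The returned axes can be customized like any other matplotlib axes
ax.set_title('High and Low Temperatures (pandas plot)')
ax.set_xlabel('Date')
ax.set_ylabel('Temperature (°F)')

# Save to an image file; plt.show() would display it in a window instead
ax.figure.savefig('temps_pandas_plot.png')
```

This route is convenient for a quick look at a DataFrame, while seaborn offers more styling and statistical options.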

Adding Trend Lines to the Line Graph of Tabular Data

Next we'll add trend lines to the line graph to help us see the underlying trends in the tabular dataset.

Code

# In this example, we'll import the pandas, seaborn and matplotlib libraries,
# plus matplotlib.dates for converting dates to numbers and back
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Tabular dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
    ["2023-11-07", 52, 38], ["2023-11-08", 49, 34], ["2023-11-09", 49, 31],
    ["2023-11-10", 53, 31], ["2023-11-11", 54, 31], ["2023-11-12", 63, 33],
    ["2023-11-13", 66, 42], ["2023-11-14", 66, 38], ["2023-11-15", 68, 44],
    ["2023-11-16", 59, 41], ["2023-11-17", 58, 36], ["2023-11-18", 52, 39],
    ["2023-11-19", 49, 35], ["2023-11-20", 46, 34], ["2023-11-21", 48, 30],
    ["2023-11-22", 49, 30], ["2023-11-23", 42, 32], ["2023-11-24", 36, 32],
    ["2023-11-25", 38, 29], ["2023-11-26", 36, 29], ["2023-11-27", 40, 25],
    ["2023-11-28", 41, 24], ["2023-11-29", 36, 25], ["2023-11-30", 34, 30]
]

# Convert the data to a pandas DataFrame
df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])

# Convert the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Convert dates into a numeric format for regression
df['DateNum'] = mdates.date2num(df['Date'])

# Create a line plot
sns.set_theme(style="whitegrid")
plt.figure(figsize=(10, 6))

# Plotting both high and low temperatures as lines
sns.lineplot(x='Date', y='High Temp', data=df, color='red', label='High Temp')
sns.lineplot(x='Date', y='Low Temp', data=df, color='blue', label='Low Temp')

# Adding trend lines using regplot
# Note: regplot draws a scatter plot by default, but we can set scatter=False to only show the regression line
sns.regplot(x='DateNum', y='High Temp', data=df, color='red', scatter=False)
sns.regplot(x='DateNum', y='Low Temp', data=df, color='blue', scatter=False)

plt.title('High and Low Temperatures in November 2023 with Trend Lines')
plt.xlabel('Date')
plt.ylabel('Temperature (°F)')
plt.xticks(rotation=45)

# Convert the numeric x-ticks back to readable dates
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(locator)
plt.gca().xaxis.set_major_locator(locator)
plt.gca().xaxis.set_major_formatter(formatter)

plt.legend()
plt.show()

Output

Figure 7 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 7: Trend Lines on the Line Graph of Tabular Data using Pandas

Experimentation

Once you have the code above working, you can experiment with this plot to learn a bit more about how it operates by altering dataset values, and then re-run the code to see the changes in the plot.
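
The trend lines show the direction of the temperatures visually. To put a number on that direction, one option is to fit a straight line with np.polyfit(), as in the earlier trend line examples. The sketch below makes a simplifying assumption: it uses the day of the month (1 through 30) as the x value instead of the numeric dates used in the code above, so the slope reads as degrees per day. The high and low values are copied from the dataset.

```python
import numpy as np

# Day of the month, 1 through 30 (a simplification for this sketch)
days = np.arange(1, 31)

# High and low temperatures copied from the November 2023 dataset
highs = np.array([58, 64, 67, 67, 65, 71, 52, 49, 49, 53,
                  54, 63, 66, 66, 68, 59, 58, 52, 49, 46,
                  48, 49, 42, 36, 38, 36, 40, 41, 36, 34])
lows = np.array([32, 35, 44, 41, 45, 48, 38, 34, 31, 31,
                 31, 33, 42, 38, 44, 41, 36, 39, 35, 34,
                 30, 30, 32, 32, 29, 29, 25, 24, 25, 30])

# A degree-1 fit returns the slope and intercept of a straight line
high_slope, high_intercept = np.polyfit(days, highs, 1)
low_slope, low_intercept = np.polyfit(days, lows, 1)

print(f"High temperatures change by about {high_slope:.2f} °F per day")
print(f"Low temperatures change by about {low_slope:.2f} °F per day")
```

Both slopes come out negative, which matches the downward trend lines in the graph: temperatures fell over the course of the month.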

Scatter Plot of Tabular Data

Next we'll create a scatter plot of the tabular dataset.

Code

# In this example, we'll import the pandas, seaborn and matplotlib libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Tabular dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
    ["2023-11-07", 52, 38], ["2023-11-08", 49, 34], ["2023-11-09", 49, 31],
    ["2023-11-10", 53, 31], ["2023-11-11", 54, 31], ["2023-11-12", 63, 33],
    ["2023-11-13", 66, 42], ["2023-11-14", 66, 38], ["2023-11-15", 68, 44],
    ["2023-11-16", 59, 41], ["2023-11-17", 58, 36], ["2023-11-18", 52, 39],
    ["2023-11-19", 49, 35], ["2023-11-20", 46, 34], ["2023-11-21", 48, 30],
    ["2023-11-22", 49, 30], ["2023-11-23", 42, 32], ["2023-11-24", 36, 32],
    ["2023-11-25", 38, 29], ["2023-11-26", 36, 29], ["2023-11-27", 40, 25],
    ["2023-11-28", 41, 24], ["2023-11-29", 36, 25], ["2023-11-30", 34, 30]
]

# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])

# Convert (cast) the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Create a scatter plot
sns.set_theme(style="whitegrid")  # Setting the theme
plt.figure(figsize=(10, 6))  # Adjust the figure size

# Plotting both high and low temperatures
sns.scatterplot(x='Date', y='High Temp', data=df, color='red', label='High Temp')
sns.scatterplot(x='Date', y='Low Temp', data=df, color='blue', label='Low Temp')

# Set titles and labels
plt.title('High and Low Temperatures in November 2023')
plt.xlabel('Date')
plt.ylabel('Temperature (°F)')
plt.xticks(rotation=90)
plt.legend()

plt.show()

Output

Figure 8 shows the output of the code above. Study the code, its comments, and the output graph, paying particular attention to how elements of the code produce aspects of the graph.



Figure 8: Scatter Plot of Tabular Data using Pandas

Experimentation

Once you have the code above working, you can experiment with this plot to learn a bit more about how it operates by altering dataset values, and then re-run the code to see the changes in the plot.
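
Scatter plots are especially useful for examining the relationship between two numerical variables. As a further experiment, the sketch below plots each day's high temperature against its low temperature, instead of plotting both against the date, and uses the pandas corr() method to calculate the correlation between the two columns. The figure is saved to a file whose name is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tabular dataset
data = [
    ["2023-11-01", 58, 32], ["2023-11-02", 64, 35], ["2023-11-03", 67, 44],
    ["2023-11-04", 67, 41], ["2023-11-05", 65, 45], ["2023-11-06", 71, 48],
    ["2023-11-07", 52, 38], ["2023-11-08", 49, 34], ["2023-11-09", 49, 31],
    ["2023-11-10", 53, 31], ["2023-11-11", 54, 31], ["2023-11-12", 63, 33],
    ["2023-11-13", 66, 42], ["2023-11-14", 66, 38], ["2023-11-15", 68, 44],
    ["2023-11-16", 59, 41], ["2023-11-17", 58, 36], ["2023-11-18", 52, 39],
    ["2023-11-19", 49, 35], ["2023-11-20", 46, 34], ["2023-11-21", 48, 30],
    ["2023-11-22", 49, 30], ["2023-11-23", 42, 32], ["2023-11-24", 36, 32],
    ["2023-11-25", 38, 29], ["2023-11-26", 36, 29], ["2023-11-27", 40, 25],
    ["2023-11-28", 41, 24], ["2023-11-29", 36, 25], ["2023-11-30", 34, 30]
]

df = pd.DataFrame(data, columns=['Date', 'High Temp', 'Low Temp'])

# Correlation between the two temperature columns (Pearson by default)
r = df['High Temp'].corr(df['Low Temp'])
print(f"Correlation between high and low temperatures: {r:.2f}")

# One point per day: high temperature on the x axis, low on the y axis
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(df['High Temp'], df['Low Temp'], c='purple', alpha=0.75)
ax.set_title('Low vs. High Temperature, November 2023')
ax.set_xlabel('High Temp (°F)')
ax.set_ylabel('Low Temp (°F)')
ax.grid(True, linestyle='--', linewidth=0.5)

# Save to an image file; plt.show() would display it in a window instead
fig.savefig('high_vs_low_scatter.png')
```

The points drift upward from left to right and the correlation is positive, meaning days with warmer highs tended to have warmer lows as well.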



 






© 2023 John Gordon
Cascade Street Publishing, LLC