Programming Across Disciplines

Automation

Overview
Thinking in Terms of Automation
Persistent Automation Processes
Scheduling Tasks in Python
Implementation Examples

Overview

Automation programming is a discipline within software development that involves creating programs or scripts to automate repetitive tasks or processes. It aims to streamline workflows, increase efficiency, and reduce human error by replacing manual interventions with automated solutions. Automation programming is widely used across various industries and domains, ranging from system administration and software testing to data analysis and web scraping.

Python offers a wide range of tools and libraries that make automation programming efficient and straightforward. Here are some examples of how Python can be used for automation:

File and Folder Manipulation
Web Scraping
Data Parsing and Processing
GUI Automation
Automated Testing
Task Scheduling
Email Automation
Database Automation
Network Automation
Social Media Automation

Thinking in Terms of Automation

Thinking in terms of automation refers to adopting a mindset that seeks to identify opportunities for automating tasks, processes, and workflows to increase efficiency, productivity, and accuracy. It involves analyzing repetitive or time-consuming activities and considering how they can be streamlined or eliminated through the use of automated solutions. By embracing this mindset, individuals and organizations can optimize their operations and focus on more valuable and strategic work.

Automation programming differs from user-centric software. A fully automated application produces little to no on-screen output, so the traditional input-processing-output model becomes, in effect, input-processing. Because the application is intended to operate without human interaction, there is no one sitting in front of a computer screen to see error messages or any other output. The automation programmer therefore has to anticipate failures in advance and send anything worth knowing, including error messages, to a log file or similar record instead of the display. If you consider the enormous number of possible situations and errors that can occur in software, even seemingly simple tasks require a great deal of forethought and planning.
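As a rough illustration of running without anyone watching the screen, the sketch below records both success and failure in a file using Python's standard logging module. The names build_logger and run_unattended, and the logger name, are made up for this example; it shows one possible arrangement, not a required design.

```python
import logging

def build_logger(log_path):
    """Return a logger that writes to a file instead of the screen."""
    logger = logging.getLogger("automation_sketch")   # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()            # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False           # keep records away from the console
    return logger

def run_unattended(task, logger):
    """Run task(); record any failure in the log rather than printing it."""
    try:
        task()
        logger.info("task %s finished", task.__name__)
        return True
    except Exception:
        # logger.exception also writes the traceback to the log file
        logger.exception("task %s failed", task.__name__)
        return False
```

A scheduled job could be wrapped as run_unattended(job, logger) so that one failing run is recorded and the process carries on to the next run.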

To think in terms of automation, there are several key aspects to consider:

Identifying Automation Opportunities: The first step is to develop an awareness of tasks or processes that can be automated. This involves observing and analyzing repetitive activities that consume significant time and resources. It's essential to look for patterns, bottlenecks, and areas where human error is more likely to occur.
Evaluating Feasibility: Once potential automation opportunities are identified, it's crucial to assess their feasibility. This includes considering factors such as complexity, resources required, potential benefits, and the availability of tools or technologies that can automate the task effectively.
Considering the Return On Investment (ROI): Automation initiatives should be evaluated based on their ROI. Calculating the costs involved in implementing automation versus the expected benefits, such as time saved, increased accuracy, or improved productivity, helps determine the value of pursuing automation in a particular area.
Choosing the Right Automation Tools: Selecting appropriate tools or technologies is a crucial aspect of automation. It involves researching and identifying software, programming languages, libraries, or frameworks that best suit the task at hand. Factors to consider include functionality, ease of use, scalability, and compatibility with existing systems.
Designing Efficient Workflow: Thinking in terms of automation also requires envisioning optimized workflows. This involves breaking down processes into smaller, more manageable steps and identifying opportunities for parallelization or streamlining. It may also involve redefining roles and responsibilities to align with automated tasks.
Embracing Continuous Improvement: Automation is an ongoing journey. It's important to continuously evaluate and refine automated processes to ensure they remain effective and aligned with evolving needs. Regularly assessing the impact of automation, gathering feedback, and making necessary adjustments helps maximize its benefits.
Considering Human-Machine Collaboration: Automation does not mean replacing humans entirely. It's essential to recognize the value of human judgment, creativity, and adaptability. Thinking in terms of automation involves finding the right balance between automated and human-driven tasks, fostering collaboration between humans and machines, and leveraging the strengths of both.
Promoting a Culture of Automation: Embracing automation requires a cultural shift within organizations. Leaders should encourage employees to think critically about their work, identify automation opportunities, and provide the necessary resources and support for automation initiatives. Promoting a culture that values innovation, efficiency, and continuous improvement fosters a mindset of automation.
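The return-on-investment point in the list above can be made concrete with a little arithmetic. The sketch below counts only time; the function name automation_roi and every figure in the usage note are hypothetical, and a real evaluation would likely include other costs and benefits.

```python
def automation_roi(build_hours, minutes_saved_per_run, runs_per_month,
                   upkeep_hours_per_month=0.0):
    """Return (net_hours_saved_per_month, months_to_break_even).

    months_to_break_even is None when upkeep cancels out the time saved.
    """
    saved_hours = minutes_saved_per_run * runs_per_month / 60.0
    net_hours = saved_hours - upkeep_hours_per_month
    if net_hours <= 0:
        return net_hours, None
    return net_hours, build_hours / net_hours
```

For instance, a script that takes 20 hours to build, saves 15 minutes per run, runs 40 times a month, and needs 1 hour of upkeep per month saves a net 9 hours per month, so it pays for itself in a little over two months (20 / 9).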

By adopting a mindset of thinking in terms of automation, individuals and organizations can unlock significant benefits. Automation frees up time and resources, reduces errors, improves efficiency, and enables a focus on higher-value tasks. It empowers individuals to leverage technology to its fullest potential and embrace the transformative power of automation in a rapidly evolving world.

Persistent Automation Processes

Persistent automation processes in Python refer to long-running tasks or workflows that are designed to run continuously or repeatedly, typically in the background, without the need for manual intervention. These processes automate repetitive tasks, handle data processing, or perform other operations that require continuous execution.

Here is a detailed description of persistent automation processes in Python:

Long-Running Tasks: Persistent automation processes often involve executing long-running tasks, such as data analysis, data scraping, or running complex algorithms. These tasks may require a significant amount of time to complete, and running them persistently ensures that they continue even if the program or script is not actively being monitored.
Background Execution: Persistent automation processes are usually executed in the background, separate from the main program or script. This allows the main program to continue executing its primary functions while the automation process runs concurrently. Background execution can be achieved using techniques like threading, multiprocessing, or asynchronous programming.
Scheduling and Triggers: Automation processes can be scheduled to run at specific times or triggered by specific events. Third-party libraries such as schedule or APScheduler let you define time-based schedules for automation tasks, for example running a task every day at a certain time. Reacting to an event, like a new file arriving in a directory, usually means polling for the change or using a file-watching tool alongside the scheduler.
Error Handling and Logging: When dealing with persistent automation processes, it's important to handle errors and exceptions gracefully. Python provides built-in mechanisms for error handling, such as try-except blocks, which can be used to catch and handle exceptions that may occur during the execution of automation tasks. Additionally, logging mechanisms can be implemented to record information about the execution of tasks, including any errors or warnings encountered.
Persistence and State Management: Persistent automation processes often need to maintain their state between executions. This includes preserving data, configurations, or intermediate results across multiple runs. Python provides various methods to achieve persistence, such as storing data in databases, writing to files, or using caching mechanisms. By managing state effectively, automation processes can resume execution from where they left off and maintain continuity.
Monitoring and Reporting: It's crucial to monitor and track the execution of persistent automation processes. This can involve logging the progress, collecting metrics, and generating reports or notifications. Python libraries like logging or prometheus_client can assist in logging and monitoring various aspects of the automation process, allowing you to gain insights into its performance and detect any anomalies or issues.
Scalability and Parallelism: In scenarios where the workload of automation processes becomes substantial, it may be necessary to scale the execution to handle the increased demands. Python provides features like multiprocessing, distributed computing frameworks (e.g., Celery), or message queues (e.g., RabbitMQ) to enable parallelism and distributed processing, allowing tasks to be executed across multiple processes or machines.
Robustness and Error Recovery: Persistent automation processes should be designed to be robust and capable of recovering from failures. This involves implementing mechanisms to handle unexpected errors, restart failed tasks, or gracefully shut down the automation process when necessary. Python's exception handling capabilities and techniques like process supervision or container orchestration can help achieve fault tolerance and reliability.
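Several of the points above (background execution, error handling, and error recovery) can be sketched together with the standard threading and queue modules. This is a minimal illustration under stated assumptions: the worker and start_background_worker names, the retry count of three, and the use of None as a stop signal are all choices made for this example.

```python
import logging
import queue
import threading

log = logging.getLogger("worker_sketch")   # hypothetical logger name

def worker(tasks, results, max_attempts=3):
    """Process callables from `tasks` until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:                   # sentinel: shut down cleanly
            tasks.task_done()
            break
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(task())
                break                      # success, stop retrying
            except Exception:
                log.exception("attempt %d of %d failed", attempt, max_attempts)
        tasks.task_done()

def start_background_worker():
    """Start the worker in a background thread and return its handles."""
    tasks = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(tasks, results), daemon=True)
    thread.start()
    return tasks, results, thread
```

The main program keeps running after start_background_worker() returns; it hands work over with tasks.put(some_callable) and asks the worker to finish with tasks.put(None). A task that keeps failing is logged and dropped after the last attempt so that it cannot block the tasks behind it.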

Also, to ensure that a persistent automation process survives computer reboots, shutdowns, and restarts, you need to employ techniques for persistence and process management. Here's an explanation of how to achieve this:

Process Monitoring: To survive computer reboots and shutdowns, it's important to have a mechanism in place that monitors the automation process. This mechanism can be implemented using system tools or utilities that automatically start the process upon system boot or restart. For example, on Unix-based systems, you can use tools like systemd or init.d scripts, while on Windows, you can use services or the Windows Task Scheduler to start the process.
State Persistence: To ensure that the automation process resumes from where it left off after a computer restart, you need to persist its state. This involves storing critical information, such as the progress of tasks, configuration settings, or intermediate results, to a durable storage medium. Common approaches include writing data to databases, files, or using caching mechanisms. When the process restarts, it can retrieve the persisted state and continue execution based on that information.
Graceful Shutdown and Recovery: It's important to implement a mechanism for gracefully shutting down the automation process before a computer reboot or shutdown. This allows the process to save its state and perform any necessary cleanup operations. For example, you can catch system signals (e.g., SIGTERM or SIGINT on Unix) or use operating system-specific APIs to detect shutdown events and trigger the shutdown process. By properly shutting down the process, you can ensure that it can recover and resume execution correctly when the system restarts.
Logging and Error Handling: Robust logging and error handling mechanisms are crucial for the recovery of persistent automation processes. By logging relevant information about the process's execution, errors, and exceptions encountered, you can gain insights into any issues that may have caused the process to halt or fail. With detailed logs, you can analyze the cause of failure and implement appropriate recovery strategies. Additionally, implementing error handling techniques, such as exception handling, allows you to gracefully handle errors and recover from them when the process restarts.
Watchdog and Self-Recovery: In some cases, the automation process itself can monitor its own health and recover from failures automatically. This is achieved by implementing a watchdog mechanism that periodically checks if the process is still running and functional. If the process becomes unresponsive or crashes, the watchdog can take action, such as restarting the process or triggering recovery procedures. This self-recovery mechanism helps ensure the automation process remains persistent and continues running even in the face of unexpected failures.
External Coordination: In certain scenarios, it may be necessary to coordinate the execution of persistent automation processes with external systems or components. For example, you may need to communicate with a task queue or a message broker to retrieve work items or notify external systems about the progress of tasks. By integrating with external systems through appropriate APIs or protocols, you can synchronize the state and progress of the automation process, even across computer reboots or restarts.
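The state persistence idea in the list above might look like the following sketch, which checkpoints progress to a JSON file after every item so that a restarted process can pick up where it stopped. The function names, the next_index key, and the choice of JSON are assumptions for illustration; a database or another format would work equally well.

```python
import json
import os
import tempfile

def save_state(path, state):
    """Write state atomically: temp file first, then rename over the target."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())          # push the bytes to disk before renaming
    os.replace(tmp_path, path)

def load_state(path, default):
    """Return the saved state, or `default` if none exists or it is unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return default

def process_items(items, state_path, handle_item):
    """Process items in order, checkpointing after each one."""
    state = load_state(state_path, {"next_index": 0})
    for index in range(state["next_index"], len(items)):
        handle_item(items[index])
        state["next_index"] = index + 1
        save_state(state_path, state)      # checkpoint after every item
    return state
```

Writing to a temporary file and renaming it means a crash in the middle of a save leaves the previous checkpoint intact. Note that an item interrupted partway through is handled again after a restart, so handle_item should be safe to repeat.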

By combining these techniques, you can create a robust and persistent automation process in Python that survives computer reboots, shutdowns, and restarts. It ensures that the process resumes execution from its last state, handles errors gracefully, and recovers from failures to maintain continuity and reliability in automation workflows.
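A graceful shutdown, one of the techniques combined here, can be sketched with the standard signal module. The example assumes a Unix-like system where the operating system or service manager sends SIGTERM before stopping a process; the names request_stop, install_handlers, and main_loop are hypothetical, and do_work and save_state stand in for whatever the real process does.

```python
import signal
import threading

stop_requested = threading.Event()

def request_stop(signum, frame):
    """Signal handler: only set a flag; the main loop does the real cleanup."""
    stop_requested.set()

def install_handlers():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGINT, request_stop)    # Ctrl+C
    signal.signal(signal.SIGTERM, request_stop)   # typical termination request

def main_loop(do_work, save_state, poll_seconds=1.0):
    """Repeat do_work() until a stop is requested, then save state once."""
    cycles = 0
    while not stop_requested.is_set():
        do_work()
        cycles += 1
        stop_requested.wait(poll_seconds)  # sleeps, but wakes early on a stop request
    save_state()
    return cycles
```

Keeping the handler down to setting a flag avoids doing file or network work inside the signal handler itself; the loop notices the flag, finishes the current cycle, and saves its state before exiting.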

Scheduling Tasks in Python

There are numerous approaches to scheduling tasks in Python. A popular one is the schedule library (full documentation is available on PyPI and on the schedule library home page). This library provides an intuitive way to schedule tasks in Python code. Its every() method handles a variety of intervals, for example every n seconds, minutes, hours, days, or weeks, on a given day of the week, or at a specified second of every minute.

Schedule Library General Form

import schedule
schedule.every().[interval].at([time]).do([job])

Note: The general form presented here is simplified for our purposes. See the full documentation for all of the options.

Code Details

Code Line 1: In order to use the schedule library, it must be imported at the beginning of your code.
Code Line 2: Use the every() method along with the following:
- [interval]: The interval can be seconds, minutes, hours, days, weeks, or specified days of the week, like monday, tuesday, etc.
- at([time]): Optionally, you can add an at() value to specify the time of day, like 09:15:30.
- do([job]): The do() method is where we specify what should happen at that scheduled time indicated above. This is often a function or program call.
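To make the meaning of a weekday interval combined with an at() time more concrete, the sketch below works out when such a job would next be due, using only the standard datetime module. It is an illustration of the idea, not the schedule library's own implementation, and the function name next_weekly_run is made up for this example.

```python
import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def next_weekly_run(now, weekday, at):
    """Return the next datetime after `now` that falls on `weekday` at HH:MM."""
    hour, minute = (int(part) for part in at.split(":"))
    target = WEEKDAYS.index(weekday.lower())       # datetime uses Monday == 0
    days_ahead = (target - now.weekday()) % 7
    candidate = (now + datetime.timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:                           # today's slot has already passed
        candidate += datetime.timedelta(days=7)
    return candidate
```

Called on a Monday evening with weekday "monday" and at "09:00", the function returns 09:00 on the following Monday, because that day's 09:00 slot has already passed.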

Examples

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every(10).hours.do(job)
schedule.every(10).days.do(job)
schedule.every(10).weeks.do(job)
schedule.every().monday.at("09:00").do(job)
schedule.every().friday.at("23:59").do(job)

Note: In the above examples, job represents any callable function or code that would contain the logic you want to run when the schedule event is activated.

Code Details

Code Line 1: This is an example of using the schedule library to run job every 10 seconds.
Code Line 2: This is an example of using the schedule library to run job every 10 minutes.
Code Line 3: This is an example of using the schedule library to run job every 10 hours.
Code Line 4: This is an example of using the schedule library to run job every 10 days.
Code Line 5: This is an example of using the schedule library to run job every 10 weeks.
Code Line 6: This is an example of using the schedule library to run job every Monday at 9:00 am.
Code Line 7: This is an example of using the schedule library to run job every Friday at 11:59 pm.
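As one of the other approaches to scheduling, Python's standard library includes the sched module, which needs no installation. The sketch below repeats a job a fixed number of times; the function name run_repeating and its parameters are assumptions for this example, and a long-running process would reschedule without a limit.

```python
import sched
import time

def run_repeating(job, interval_seconds, repeats):
    """Run job() `repeats` times, `interval_seconds` apart, then return."""
    if repeats <= 0:
        return
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    remaining = {"count": repeats}

    def tick():
        job()
        remaining["count"] -= 1
        if remaining["count"] > 0:
            # schedule the next run before this event finishes
            scheduler.enter(interval_seconds, 1, tick)

    scheduler.enter(interval_seconds, 1, tick)     # arguments: delay, priority, action
    scheduler.run()                                # blocks until no events remain
```

Unlike the schedule library's every() syntax, sched works with one-off events, so a repeating job has to put itself back on the queue each time it runs.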

Implementation Examples

Logging Disk Space on a Schedule

In this example, we'll write a simple Python program that checks how much disk space a particular directory is consuming and writes that information to a log file. A system administrator could use this information to see the growth pattern of a directory over time. For demonstration purposes, the following code example checks a folder every thirty seconds, which would be very frequent for a real application. Using this short interval, I was able to demonstrate activity in a folder by moving files in and out of it over a short period of time. In a real implementation, the schedule would likely use a much longer interval.

Code

import os
import datetime
import schedule
import time

TEMP_FOLDER = "c:\\TempFolder\\"
LOG_FILE = 'TempFolderDiskSpace.log'

def get_folder_size():
    total_size = 0
    for path, _, files in os.walk(TEMP_FOLDER):
        for file in files:
            file_path = os.path.join(path, file)
            total_size += os.path.getsize(file_path)
    return total_size

def write_to_log_file(size):
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_entry = f"{timestamp} - Size: {size} bytes\n"
    with open(LOG_FILE, 'a') as file:
        file.write(log_entry)

def check_folder_size():
    folder_size = get_folder_size()
    write_to_log_file(folder_size)

schedule.every(30).seconds.do(check_folder_size)

while True:
    schedule.run_pending()
    time.sleep(1)

Log File Output Example

The code above builds a log file containing one entry for each run of the automation. Each entry indicates the time and cumulative size of all of the files in the directory being monitored, as shown below.

2023-02-20 21:40:21 - Size: 0 bytes
2023-02-20 21:40:52 - Size: 8845308 bytes
2023-02-20 21:41:22 - Size: 8845308 bytes
2023-02-20 21:41:52 - Size: 8845308 bytes
2023-02-20 21:42:22 - Size: 32913212 bytes
2023-02-20 21:42:52 - Size: 455509692 bytes
2023-02-20 21:43:23 - Size: 2007772428 bytes
2023-02-20 21:43:53 - Size: 2861877172 bytes
2023-02-20 21:44:23 - Size: 2861877172 bytes
2023-02-20 21:44:53 - Size: 2861877172 bytes
2023-02-20 21:45:24 - Size: 1946145612 bytes
2023-02-20 21:45:54 - Size: 1433494891 bytes
2023-02-20 21:46:24 - Size: 8845308 bytes
2023-02-20 21:46:54 - Size: 1739204948 bytes
2023-02-20 21:47:24 - Size: 823473388 bytes
2023-02-20 21:47:55 - Size: 2591 bytes
2023-02-20 21:48:25 - Size: 0 bytes
2023-02-20 21:48:55 - Size: 0 bytes

As we can see from the contents of the log file, the first iteration found the directory empty (Line 1). Files were then added to the folder (Lines 2 thru 10), removed from the folder (Lines 11 thru 13), and more added (Line 14). Finally, the remaining files were removed (Lines 15 and 16) and the folder returned to empty (Lines 17 and 18).
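Reading growth patterns out of a log like this one can itself be automated. The sketch below parses entries in the format shown above and reports the change in size between consecutive runs; the function names parse_log_line and size_changes are hypothetical.

```python
import datetime

def parse_log_line(line):
    """Split '2023-02-20 21:40:21 - Size: 0 bytes' into (datetime, int)."""
    stamp_text, _, rest = line.strip().partition(" - Size: ")
    stamp = datetime.datetime.strptime(stamp_text, "%Y-%m-%d %H:%M:%S")
    size = int(rest.split()[0])            # the number before the word "bytes"
    return stamp, size

def size_changes(lines):
    """Return (timestamp, change_in_bytes) for each entry after the first."""
    entries = [parse_log_line(line) for line in lines if line.strip()]
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(entries, entries[1:])]
```

A positive change means files were added since the previous run, a negative change means files were removed, and zero means the folder was unchanged.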

Code Details

Code Lines 1 thru 4: First we import the os, datetime, schedule, and time libraries. These libraries support the file, time, and scheduling needs of our program.
Code Line 6: On this line we declare a constant variable TEMP_FOLDER and initialize it with the path to the directory (folder) we want to monitor.
Code Line 7: Next we declare another constant variable LOG_FILE and initialize it to the name of the log file where the code will write the results of each run of the scheduled automation.
Code Lines 9 thru 15: This function, get_folder_size(), walks the TEMP_FOLDER directory tree and adds up the cumulative size of all the files it finds there.
Code Lines 17 thru 21: This function, write_to_log_file, writes a timestamped entry into the LOG_FILE.
Code Lines 23 thru 25: This function, check_folder_size, calls the get_folder_size() and write_to_log_file functions.
Code Line 27: This line contains the call to the schedule library that sets the time interval for the scheduler to run the job (check_folder_size).
Code Lines 29 thru 31: This while loop is intentionally set as an infinite loop, which means it will run until the process is terminated by an action outside of the loop itself. Each pass runs any jobs that are due and then sleeps for one second.
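In a folder where files come and go, a file can disappear between the moment it is listed and the moment its size is read, which would raise an error and stop an unattended process. The variation below is a sketch of one way to tolerate that; the function name folder_size, its folder parameter, and the skipped counter are additions for illustration and are not part of the program above.

```python
import os

def folder_size(folder):
    """Return (total_bytes, skipped_files) for everything under `folder`."""
    total = 0
    skipped = 0
    for path, _, files in os.walk(folder):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(path, name))
            except OSError:
                # the file vanished or is unreadable; count it and carry on
                skipped += 1
    return total, skipped
```

Returning the number of skipped files lets the caller write it to the log, so that an unexpectedly small total can be explained later.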

Purging Temp Files on a Schedule

In this example, we'll write a simple Python program that purges (deletes) the contents of a temp folder (directory) on a scheduled basis.

Code

import os
import datetime
import schedule
import time

TEMP_FOLDER = "c:\\TempFolder\\"    # This can be any folder where temp files are stored
LOG_FILE = 'TempFolderPurge.log'
DAYS_OLD = 7  # Age in days after which files will be purged

def get_folder_size():
    total_size = 0
    for path, _, files in os.walk(TEMP_FOLDER):
        for file in files:
            file_path = os.path.join(path, file)
            total_size += os.path.getsize(file_path)
    return total_size

def write_to_log_file(message):
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_entry = f"{timestamp} - {message}\n"
    with open(LOG_FILE, 'a') as file:
        file.write(log_entry)

def purge_old_files():
    now = time.time()
    cutoff = now - (DAYS_OLD * 86400)
    
    for path, _, files in os.walk(TEMP_FOLDER):
        for file in files:
            file_path = os.path.join(path, file)
            if os.path.getmtime(file_path) < cutoff:
                os.remove(file_path)
                write_to_log_file(f"Deleted {file_path}")

def check_and_purge_folder():
    folder_size_before = get_folder_size()
    write_to_log_file(f"Size before purge: {folder_size_before} bytes")
    
    purge_old_files()
    
    folder_size_after = get_folder_size()
    write_to_log_file(f"Size after purge: {folder_size_after} bytes")

# Schedule the folder check and purge to run daily at midnight
schedule.every().day.at("00:00").do(check_and_purge_folder)

while True:
    schedule.run_pending()
    time.sleep(1)

Output

Code Details

Code Lines 1 thru 4: First we import the os, datetime, schedule, and time libraries. These libraries support the file, time, and scheduling needs of our program.
Code Line 6: On this line we declare a constant variable TEMP_FOLDER and initialize it with the path to the directory (folder) we want to purge.
Code Line 7: Next we declare another constant variable LOG_FILE and initialize it to the name of the log file where the code will write the results of each run of the scheduled automation.
Code Line 8: Next we declare another constant variable DAYS_OLD and initialize it to the age, in days, after which files will be purged.
Code Lines 10 thru 16: This function, get_folder_size(), walks the TEMP_FOLDER directory tree and adds up the cumulative size of all the files it finds there.
Code Lines 18 thru 22: This function, write_to_log_file, writes a timestamped entry into the LOG_FILE.
Code Lines 24 thru 33: This function, purge_old_files, computes a cutoff time DAYS_OLD days in the past (86400 is the number of seconds in a day), walks the TEMP_FOLDER directory tree, deletes every file whose last-modified time is earlier than the cutoff, and logs each deletion.
Code Lines 35 thru 42: This function, check_and_purge_folder, calls the other functions to manage the execution flow of this program. It logs the folder size before and after the purge.
Code Line 45: This line contains the call to the schedule library that sets the time of day for the scheduler to run the job (check_and_purge_folder).
Code Lines 47 thru 49: This while loop is intentionally set as an infinite loop, which means it will run until the process is terminated by an action outside of the loop itself.
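Because deleting files cannot be undone, it can be worth rehearsing a purge before running it for real. The sketch below is a variation on the purge logic with a dry-run switch; the function name purge_older_than and its parameters are assumptions for illustration and are not part of the program above.

```python
import os
import time

def purge_older_than(folder, days_old, now=None, dry_run=True):
    """Return the files older than `days_old` days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - days_old * 86400        # 86400 seconds in a day
    selected = []
    for path, _, files in os.walk(folder):
        for name in files:
            file_path = os.path.join(path, name)
            try:
                if os.path.getmtime(file_path) < cutoff:
                    if not dry_run:
                        os.remove(file_path)
                    selected.append(file_path)
            except OSError:
                # file vanished or is locked; leave it for the next run
                continue
    return selected
```

With dry_run left at its default of True, the function only reports which files it would delete, so the list can be written to the log and reviewed before the switch is turned off.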

Get National Weather Service Forecast

Code

import requests

def get_weather_forecast():
    url = "https://api.weather.gov/gridpoints/SLC/100,175/forecast"
    headers = {
        'User-Agent': 'Weather application',
        'Accept': 'application/json',
    }
    graph_line = ""
    response = requests.get(url, headers=headers, timeout=10)  # timeout so an unattended run cannot hang forever
    if response.status_code == 200:
        forecast_data = response.json()
        print()
        print("Salt Lake City Weather Forecast")
        print("-" * 130)
        print("Period".ljust(17) + "Temp".ljust(8) + "Graph".ljust(70) + "Forecast")
        print("-" * 130)
        for period in forecast_data['properties']['periods'][:11]:  # Adjust slice to change periods
            graph_line = "\u2584" * period['temperature']
            print(f"{period['name'].ljust(17)}{period['temperature']}°{period['temperatureUnit'].ljust(5)}{graph_line.ljust(70)}{period['shortForecast']}")
        print("-" * 130)
        print()
    else:
        print("Failed to retrieve forecast data")

if __name__ == "__main__":
    get_weather_forecast()

Output

----------------------------------------------------------------------------------------------------------------------------------
Salt Lake City Weather Forecast
----------------------------------------------------------------------------------------------------------------------------------
Period           Temp    Graph                                                                 Forecast
----------------------------------------------------------------------------------------------------------------------------------
This Afternoon   57°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄             Mostly Sunny
Tonight          45°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                         Mostly Cloudy
Friday           59°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄           Chance Light Rain
Friday Night     42°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                            Chance Light Rain
Saturday         53°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                 Rain then Patchy Blowing Dust
Saturday Night   28°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                                          Snow
Sunday           38°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                                Snow Likely
Sunday Night     21°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                                                 Chance Light Snow
Monday           38°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                                Slight Chance Light Snow
Monday Night     21°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                                                 Mostly Cloudy
Tuesday          41°F    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄                             Mostly Sunny
----------------------------------------------------------------------------------------------------------------------------------

Code Details

Code Line 1: First we import the requests library, which handles connections between this program and Internet resources.
Code Lines 3 thru 24: The get_weather_forecast function drives the logic of this program.
Code Line 4: On this line we set the URL of the Weather Service's API. Note that the URL identifies the location: in this example, SLC is the forecast office and 100,175 is the grid point within that office's area. To get weather forecast information for other areas, see the National Weather Service API documentation.
Code Lines 5 thru 8: These code lines define a dictionary named headers containing HTTP headers to be sent with a request, specifying the client (User-Agent) as a 'Weather application' and indicating that the client expects a response in JSON format (Accept: 'application/json'). These headers are often used to provide essential information to the server about the client making the request and the type of response it can handle.
Code Line 9: Next we initialize a variable that will store the graph line for each iteration over the records of the forecast.
Code Line 10: This line calls the weather API using the requests.get() method by passing the URL and headers dictionary to the API.
Code Line 11: This line checks if the HTTP status code returned from a web server's response is 200, indicating that the request was successfully received, understood, and processed, typically meaning "OK". It's a common way to verify that an HTTP request made using libraries like requests in Python completed successfully without errors.
Code Line 12: If the response is valid, then we capture the data returned by the API via JSON into the forecast_data variable.
Code Lines 13 thru 17: These lines print an informative report header for the user.
Code Lines 18 thru 20: This for loop iterates through each of the returned results to form an output line to print for each result.
Code Line 24: If the status code is anything other than 200, this line prints a message indicating that the API call failed to return forecast data.
Code Lines 26 thru 27: These two lines are the Main program, which simply calls the get_weather_forecast() function which controls the logic of the program.
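The formatting step of this program can be separated from the network call, which makes it possible to check the report layout without contacting the API. The sketch below assumes the same dictionary keys the program above reads (name, temperature, temperatureUnit, shortForecast); the function names format_period and format_forecast are hypothetical, and clamping the bar to the column width is an added safeguard rather than something the original does.

```python
def format_period(period, bar_width=70):
    """Build one report row from a forecast period dictionary."""
    temperature = period["temperature"]
    # clamp so that negative or very high values still fit the graph column
    bar = "\u2584" * max(0, min(temperature, bar_width))
    temp_text = f"{temperature}°{period['temperatureUnit']}"
    return (f"{period['name'].ljust(17)}{temp_text.ljust(8)}"
            f"{bar.ljust(bar_width)}{period['shortForecast']}")

def format_forecast(payload, limit=11):
    """Return report rows for up to `limit` periods; empty list if none are present."""
    periods = payload.get("properties", {}).get("periods", [])
    return [format_period(period) for period in periods[:limit]]
```

Using .get() with empty defaults means a response that lacks the expected keys produces an empty report instead of raising a KeyError in an unattended run.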
