Programming Across Disciplines

The first consideration is the operating system. While the concept of working with files is essentially the same in the Windows, Macintosh, and Linux operating systems, there are some differences that programmers need to be aware of when working with the file systems in those operating systems. For example, if I save a Word document like the example above, where is it saved? Many software packages, like Microsoft Word, have a default location where they save files. On a Windows computer, it is often a folder called Documents. That sounds easy, but who owns that folder? The user who was logged in at the time the file was saved. What if a different user logs into that same computer and needs to access my document? They won't find it in their Documents folder because it was saved in my Documents folder. Confusing. All of this leads us to several important concepts to review.
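To make the "who owns that folder?" question concrete, here is a minimal Python sketch that asks where the current user's home folder is. It assumes the user's documents live in a folder named Documents directly under the home folder; that is a common default, but it is not guaranteed on every computer.

```python
from pathlib import Path

# Path.home() returns the home folder of the user running the script,
# for example C:\Users\John on Windows or /home/john on Linux.
home_folder = Path.home()

# Assumption: documents are kept in a "Documents" folder under home.
documents_folder = home_folder / "Documents"

print("Home folder:     ", home_folder)
print("Documents folder:", documents_folder)
print("Exists here?     ", documents_folder.exists())
```

If a different user runs the same script on the same computer, the printed paths will point to that user's folders, not mine.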

Example: Windows Computer

Let's take a Windows computer as an example. A user opens an application (Microsoft Word, for example). The application communicates with the operating system (Windows), which allocates memory (Random Access Memory, or RAM) and storage (hard drive) space to the application. When the user saves a document (file) and specifies a file name and the location where they want it stored, the application asks the operating system for storage space at that location under that file name. The operating system uses the file system to arrange the storage location of the document (file), which is on the hardware (a hard disk, for example). Later, after the user has closed the application and wants to open the document again, they open the application and use its file management tools (File - Open) to navigate to the location where the document is stored. The operating system uses the file system to locate the file and gives the application a connection to the file so the user can see and work with it.

In this scenario, inside our application (MS Word in this example), to initially save our document, we would use the Save As dialog, which might look like this:

From this dialog, let's focus on a few key concepts that will lead us into working with files in Python. Those concepts are file names, file paths, and file types.

File names are the labels we give to our files so that we can identify and locate them. Depending on the operating system in use, there are rules for file names: which characters you can use in the name, the length of the name, the file extensions, whether upper case and lower case letters are treated as different, etc. Also, since the file name is our identifier for an individual file, file names must be unique within a given folder; two files in the same folder cannot share a name.
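As a small preview, Python's standard pathlib module can split a file name into its parts and can illustrate how case rules differ between operating systems. This is only a sketch: the file names are made-up examples, and the "pure" path classes used here follow each system's naming conventions without touching a real disk.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

file_name = PurePath("ExampleDocument.docx")

print(file_name.name)    # ExampleDocument.docx (the full file name)
print(file_name.stem)    # ExampleDocument (the name without the extension)
print(file_name.suffix)  # .docx (the extension, including the dot)

# Windows-style paths compare without regard to upper or lower case ...
same_on_windows = PureWindowsPath("Hello.TXT") == PureWindowsPath("hello.txt")

# ... while Linux-style (POSIX) paths treat them as different names.
same_on_linux = PurePosixPath("Hello.TXT") == PurePosixPath("hello.txt")

print(same_on_windows)   # True
print(same_on_linux)     # False
```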

File paths indicate where a file is stored in the file system. Every file in the file system has a file path, like the above example, that we can follow to locate the file. Continuing with our example from above, we would write our file path to the Word document as C:\Users\John\Documents\Word\Example\ExampleDocument.docx. Notice that this is a full path, including the root (on a Windows computer, this is usually C:\). In more technical terms, we call this the absolute path. We can also see this visually like this ->

We can also consider the position of a file relative to the location where the application was installed or where its current working directory is located. The current working directory is the location our application focuses on at any given moment. For example, by default, Microsoft Word is focused on my Documents directory. So, instead of requiring the full path, Word can reference our example file by a path of Word\Example\ExampleDocument.docx because that is the relative path from Word's current working directory to the file.
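The relationship between the absolute path and the relative path can be sketched in Python with the standard pathlib module. The example reuses the path from above and assumes, purely for illustration, that the current working directory is John's Documents folder. PureWindowsPath is used so the sketch behaves the same on any operating system and never touches a real disk.

```python
from pathlib import PureWindowsPath

absolute_path = PureWindowsPath(
    r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx"
)

# Assumption for illustration: this is the current working directory.
working_directory = PureWindowsPath(r"C:\Users\John\Documents")

print(absolute_path.is_absolute())  # True, because it starts at the root
print(absolute_path.anchor)         # C:\

# The relative path is what remains once the working directory is removed.
relative_path = absolute_path.relative_to(working_directory)
print(relative_path)                # Word\Example\ExampleDocument.docx
print(relative_path.is_absolute())  # False

# Joining the working directory and the relative path rebuilds the full path.
rebuilt_path = working_directory / relative_path
print(rebuilt_path == absolute_path)  # True
```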

Shortly we will see that when writing code in Python, we will be able to work with files similarly, by their absolute path or relative paths. Before we get to that, though, let's take a look at the third concept from our screenshot above: file types.

A primary purpose of a file system is to provide a structured approach to storing, retrieving, and manipulating digital files efficiently. File storage technologies are the underlying mechanisms that dictate how files are physically stored and accessed on storage devices. Various technologies exist for storing digital files, such as hard disk drives, solid-state drives, network attached storage, cloud storage, and optical storage, each with its own characteristics and advantages.

Here is a list of some of the major file storage technologies:

Magnetic Hard Disk Drives (HDDs): Magnetic hard disk drives have been a dominant storage medium for decades. These drives utilize magnetic platters coated with a ferromagnetic material to store data. The read/write heads, positioned above the spinning platters, magnetically encode and retrieve data. HDDs offer large storage capacities, making them ideal for storing vast amounts of data. However, they are relatively slow compared to other storage technologies, and mechanical failures can occur.
Solid-State Drives (SSDs): Solid-state drives have revolutionized file storage by eliminating mechanical components. Instead of magnetic platters, SSDs utilize flash memory chips to store data. This technology provides lightning-fast read and write speeds, significantly improving performance compared to HDDs. SSDs are highly reliable, shock-resistant, and consume less power. Although they are more expensive per gigabyte than HDDs, their speed and reliability make them popular for personal computers and data centers.
Network Attached Storage (NAS): Network Attached Storage is a specialized file storage system for network environments. NAS devices are independent storage units connected to a local network, allowing multiple users to access and share files. They provide a centralized storage solution, offering data redundancy, access control, and advanced features like remote access and data backup. NAS devices are used in homes, small offices, and enterprise environments where data sharing and collaboration are crucial.
Cloud Storage: Cloud storage has gained immense popularity due to its convenience and scalability. It enables users to store and access their files remotely through an internet connection. Cloud storage providers maintain vast data centers, where data is securely stored and replicated across multiple servers. Users can access their files from any device with an internet connection, making it an ideal choice for seamless file sharing and synchronization. Cloud storage services often provide additional features like version control, data encryption, and integration with other applications.
Optical Storage: Although less prevalent in modern computing, optical storage still holds significance in certain areas. It utilizes optical discs, such as CDs, DVDs, and Blu-ray discs, for data storage. Optical storage offers high-capacity, non-volatile storage, making it suitable for archiving and distributing large volumes of data. However, it has limitations regarding read/write speeds and rewritability.

Understanding different types of file storage is essential for working with computer systems. Each storage technology has its advantages and limitations, catering to diverse needs and use cases. By selecting the appropriate file storage, programmers can optimize their applications' data management, performance, and reliability.

Concept: File

A file in a computer is a collection of information stored on your computer's hard drive or other storage devices, such as a USB flash drive. This information can include text, pictures, music, videos, or other data. Each file is given a name, making it easy to find and access the file later on. Think of it like a physical file folder where you can store documents, except on a computer it is all digital. Files can be created, modified, deleted, and moved around on your computer's storage devices, making them a fundamental part of computer use.

There are hundreds of different types of files in computing. We classify file types in several ways, such as proprietary (file types created and controlled by a specific company or group), open standard (those that anyone can create and share), binary (those whose contents are not human-readable and can only be read by specific software or systems), text (those that are human-readable, sometimes referred to as plain text or ASCII text), and many others.

Concept: Encoding & Decoding

Encoding is converting characters (alphabetic characters, numbers, and symbols) into a specialized format for efficient storage and transmission. Decoding is the opposite process: converting the stored format back into the original characters. There are many proprietary encoding schemes and formats, such as the encoding scheme of a Microsoft Word Document (.docx), an Adobe Photoshop image file (.psd), etc. Others are open standards, such as ASCII, UTF-8, XML, and JSON, which are used as common formats for sharing and saving data.
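In Python, encoding and decoding text can be sketched with the built-in encode and decode methods of strings and bytes. The example below uses UTF-8, one of the open standards mentioned above; the sample text is our own.

```python
text = "Hello"

# Encoding: turn characters into bytes using a named scheme (UTF-8 here).
encoded = text.encode("utf-8")
print(encoded)        # b'Hello'
print(list(encoded))  # [72, 101, 108, 108, 111]

# Decoding: turn the bytes back into the original characters.
decoded = encoded.decode("utf-8")
print(decoded == text)  # True

# Characters outside ASCII take more than one byte in UTF-8.
accented = "\u00e9".encode("utf-8")  # the letter e with an acute accent
print(len(accented))    # 2
```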

We often group files by type based on the type of application or use of those files. We often use file extensions as indicators of the type of file we're dealing with. Below is a brief list of common file types. For a more comprehensive list of file types, see the List of File Formats Wikipedia page.

Grouping	Common File Extensions & Formats
Text	ASCII, .html, .txt, UTF-8, .xml
Compressed	.7z, .gzip, .rar, .tar, .zip
Still Images	.bmp, .gif, .jpg, .png, .tiff
Moving Images	.avi, .mov, .mp4, .mpeg
Sounds	.aiff, .mp3, .mxf, .wav
Data Files	.csv, .json, .xml
Databases	.accdb, .db, .dbf, .mdb, .mdf
Source Code Files	.asm, .c, .cpp, .cs, .go, .java, .js, .php, .py, .r, .sh
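Using the extension as an indicator of the file type can be sketched in Python as a simple lookup. The dictionary and the file_group function below are hypothetical helpers written for this illustration; they cover only one extension from each row of the table.

```python
from pathlib import PurePath

# Hypothetical lookup built from a few entries of the table above.
FILE_GROUPS = {
    ".txt": "Text",
    ".zip": "Compressed",
    ".png": "Still Images",
    ".mp4": "Moving Images",
    ".mp3": "Sounds",
    ".csv": "Data Files",
    ".db": "Databases",
    ".py": "Source Code Files",
}

def file_group(file_name):
    """Return the grouping for a file name, or "Unknown" if it is not listed."""
    extension = PurePath(file_name).suffix.lower()
    return FILE_GROUPS.get(extension, "Unknown")

print(file_group("photo.PNG"))    # Still Images
print(file_group("report.docx"))  # Unknown
```

Keep in mind that the extension is only an indicator: renaming a file changes its extension but not its contents.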

Text vs Binary Files: One of the critical distinctions programmers need to understand is the difference between a text file and a binary file. The simple explanation is that text files contain human-readable characters while binary files do not. But what does that mean? Let's consider a Microsoft Word Document file with a file extension of .docx. That file type is a binary file; it contains data that is not readable by humans. This is because the Word application stores a great deal of information in the file, such as formatting information and user data.

Consider this example: I create a new document in MS Word, type Hello, and save it as Hello.docx. The only thing I put in that document is one word, which looks like this when I have the document open in MS Word:

However, if I were to open this document in a plain text editor, it would look like this:

The plain text editor used in the above example is Notepad++. When you open a file with it, it does its best to interpret the contents of the file. In this case, the MS Word .docx file is binary, as indicated above, so Notepad++ detects many characters in the file, but because the file is binary and not plain text, it displays what it can, which ends up looking like strange characters.

Remember, I only typed "Hello" in the Word document, but notice at the bottom of the Notepad++ screen there are 12,189 characters and 72 lines in the file. If all I did was type one word, what is all that? Word embeds a great deal of information in our documents to keep our formatting information (fonts, colors, etc.) as well as user information and other data. It stores all of that in the .docx format, which packages the data in a compressed (ZIP-based) container rather than as plain text. Plain text editors are not able to interpret all of that information.

If we reverse this and create a new file in Notepad++, type "Hello" and save it as Hello.txt, like this:

The result is a plain text file, not binary. So, if I open Hello.txt in any other editor, including MS Word, the only content is the word Hello; there are no additional characters in the file.
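We can check this claim with a short Python sketch that plays the role of the editor: it writes Hello to a text file and then reads the raw bytes back. The file is created in a temporary folder (our own choice for the example) so nothing is left behind on the computer.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    text_file = Path(folder) / "Hello.txt"

    # Text mode ("w"): Python encodes the characters to bytes as it writes.
    with open(text_file, "w", encoding="utf-8") as f:
        f.write("Hello")

    # Binary mode ("rb"): read the raw bytes exactly as they are stored.
    with open(text_file, "rb") as f:
        raw_bytes = f.read()

    # Text mode ("r"): Python decodes the bytes back into characters.
    with open(text_file, "r", encoding="utf-8") as f:
        text = f.read()

print(raw_bytes)       # b'Hello'
print(len(raw_bytes))  # 5
print(text)            # Hello
```

The file holds exactly five bytes, one for each character of Hello, with nothing else stored alongside them.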

As we explore file handling in Python, these concepts will be important because we will programmatically work with file systems, locations, and types directly in the programming language. Rather than relying on an application to handle files, we will replace the application and handle files ourselves.

Python Scripts

A Python script is a set of instructions written in the Python programming language that a computer can read and execute. Think of it like a recipe for a computer, telling it what to do and how to do it. Python scripts can perform various tasks, such as manipulating data, automating repetitive tasks, or creating complex applications. To run a Python script, you need a Python interpreter installed on your computer, which reads the instructions in the script and executes them. Python is a popular language for scripting because it is relatively easy to learn yet powerful enough to handle many tasks. Once you have written a Python script, you can save it as a file on your computer and execute it whenever you need it to perform its job.
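A very small script might look like the sketch below. The file name hello_script.py and the main function are hypothetical names chosen for this example.

```python
# hello_script.py (hypothetical file name)

def main():
    """The instructions the script carries out when it is run."""
    message = "Hello from a Python script"
    print(message)
    return message

# This check is true when the file is run directly by the interpreter,
# for example with the command: python hello_script.py
if __name__ == "__main__":
    main()
```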

Python Modules

A Python module is a file containing Python code that defines functions, variables, and classes used by other Python programs. Think of it like a toolbox that contains specific tools you can use to solve particular problems. By importing a module into your Python program, you gain access to all the functionality defined in the module without having to write it from scratch. Python has many built-in modules, such as "os" for interacting with the operating system, "datetime" for working with dates and times, and "math" for performing mathematical operations. Additionally, many third-party modules are available that you can install and use in your Python programs to extend their functionality. Utilizing modules allows you to write more efficient and organized code, since you don't have to reinvent the wheel whenever you need to solve a particular problem.
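The three built-in modules named above can be imported and used like this. It is a minimal sketch; the variable names are our own.

```python
import datetime
import math
import os

# os: ask the operating system for the current working directory.
current_directory = os.getcwd()

# datetime: get today's date.
today = datetime.date.today()

# math: compute a square root.
root = math.sqrt(16)

print(current_directory)
print(today)
print(root)  # 4.0
```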

Python Packages

In Python, a package is a way of organizing related modules together in a hierarchical manner. Think of it like a folder containing multiple Python files, each defining functionality that can be used in your program. A package can include other packages as well as modules, making it a powerful way to organize and structure your Python code. Python packages often have a unique name, such as "numpy" for numerical computing or "pandas" for data analysis. By importing a package into your Python program, you gain access to all the modules and functionality defined in the package. This allows you to write more complex programs and leverage the power of existing Python libraries without having to write everything from scratch. Overall, Python packages are a key language feature that makes it easy to organize and reuse your code.
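To see the difference between a package and a module without installing anything, the sketch below uses xml, a package that ships with Python, instead of numpy or pandas. The xml package contains a subpackage named etree, which in turn contains the module ElementTree.

```python
import xml                          # a package in the standard library
import xml.etree.ElementTree as ET  # a module inside the xml.etree subpackage

# Packages carry a __path__ attribute listing their folder(s);
# plain modules do not.
xml_is_package = hasattr(xml, "__path__")
et_is_package = hasattr(ET, "__path__")

print(xml_is_package)  # True
print(et_is_package)   # False

# Use functionality defined in the module.
root = ET.fromstring("<greeting>Hello</greeting>")
print(root.text)       # Hello
```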

Python Libraries

In Python, a library is a collection of modules, packages, and functions that provide specific functionality for your Python programs. Think of it like a set of tools that you can use to solve various problems. Python has a vast and diverse library ecosystem, with thousands of third-party libraries available. These libraries cover various domains, from scientific computing to web development to machine learning. By importing a library into your Python program, you can leverage the functionality provided by the library to write more powerful and efficient code. Some popular Python libraries include NumPy for numerical computing, Pandas for data analysis, and Matplotlib for data visualization. Overall, Python libraries are an essential part of the language, making it easy to solve complex problems and create sophisticated applications.

Python Files Summary

In summary, a script is a standalone program, a module is a single file that contains Python code, a package is a collection of related modules, and a library is a collection of code designed to be reused by multiple programs or scripts.
