Files & File Systems
Overview
Data is everywhere today, and as a programmer you need to be aware of how it is stored, organized, and managed. This chapter introduces you to the fundamental concepts of file systems, file storage, and files, and shows how these concepts are applied in Python. You may already be familiar with some of this information, but as a programmer, you will need to develop a more detailed understanding of these concepts.
There are several key concepts to consider related to files and file systems:
Operating systems (OS) are the backbone of our computing experiences. They are sophisticated software that manage a computer's hardware resources, provide an environment for software programs to run, and offer essential services for computer programs. An operating system acts as a bridge between the user and the computer's physical hardware, making it possible for us to interact with computers in a meaningful way.
When it comes to operating systems, three major players dominate the scene: Windows, macOS, and Linux. Each has its unique features and caters to different user needs.
Windows, developed by Microsoft, is the most widely used operating system in the world, especially among personal computer users. Known for its user-friendly interface, Windows supports a vast range of software applications, games, and utilities. It's popular in both home and office environments due to its versatility and ease of use. Over the years, Windows has evolved significantly, with versions like Windows XP, Windows 7, Windows 10, and the latest, Windows 11, each bringing new features and improvements. Windows is particularly favored in the business world for its compatibility with a wide range of business applications.
macOS is the operating system that powers Apple's line of Mac computers. It's known for its sleek interface, robust performance, and tight integration with Apple's ecosystem of services and products, like the iPhone, iPad, and Apple Watch. macOS excels in graphic design, video editing, and music production, making it a popular choice among creative professionals. The OS is praised for its stability and security features, which are deeply ingrained in its Unix-based architecture. Over the years, macOS has seen a series of updates, with names inspired by California landmarks, like Mavericks, Yosemite, Mojave, and Monterey.
Linux, unlike Windows and macOS, is an open-source operating system. This means its source code is freely available for anyone to view, modify, and distribute. Linux is known for its flexibility, security, and stability. It's used extensively in server environments and by professionals who prefer a customizable and robust operating system. There are many distributions (distros) of Linux, like Ubuntu, Fedora, and Debian, each offering different interfaces and user experiences. Linux is favored by developers and system administrators due to its powerful command-line interface and the vast array of tools available for programming, network management, and system administration.
Python is a cross-platform language, which means it can run on Windows, macOS, and Linux. Each operating system provides different environments and tools for Python development, but the core Python language remains largely the same across these platforms. This cross-platform nature makes Python a versatile language for developing a wide range of applications, from simple scripts to complex, large-scale applications. As you explore Python programming, your choice of operating system may depend on your personal preferences, the kind of software you want to develop, or the environment in which your software will run. Regardless of your choice, Python's flexibility allows you to develop and deploy your applications across any of these operating systems.
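As a small illustration of this cross-platform behavior, the sketch below asks Python which operating system it is running on and which path separator that system uses. The function name `describe_platform` is made up for this example; the calls it uses (`platform.system()`, `os.sep`, `Path.home()`) come from Python's standard library. The same script should run unchanged on Windows, macOS, and Linux, but the values it prints will differ.

```python
import os
import platform
from pathlib import Path


def describe_platform() -> dict:
    """Collect a few facts that can differ from one operating system to another."""
    return {
        "system": platform.system(),         # for example 'Windows', 'Darwin' (macOS), or 'Linux'
        "path_separator": os.sep,            # '\\' on Windows, '/' on macOS and Linux
        "home_directory": str(Path.home()),  # the current user's home folder
    }


platform_facts = describe_platform()
for name, value in platform_facts.items():
    print(f"{name}: {value}")
```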
When you create a document in Microsoft Word and you use the Save command, what actually happens? On the surface we'd say it saves the document. Right. But how? And where? And what does the file look like when it is saved? How do you find it on your computer? How do you change its name if you need to? And how do you send a file to someone else? And how do you back up a file to ensure you don't lose it? And, as a programmer, how do you use Python to read the contents of files? How do you create files and write content to them using Python? We must be able to answer all of these questions and, more importantly, handle all of these tasks in the programs we write.
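To give a first taste of how a few of these questions translate into code, here is a minimal sketch that creates a file, renames it, and makes a backup copy. It works inside a temporary folder so it does not touch your real documents, and the file names (`Draft.txt`, `Final.txt`, `Final.backup.txt`) are invented for the example. This is only one possible approach using the standard `pathlib` and `shutil` modules.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_folder:
    folder = Path(temp_folder)

    # Create a file and write content to it.
    original = folder / "Draft.txt"
    original.write_text("Hello", encoding="utf-8")

    # Change the file's name.
    renamed = original.rename(folder / "Final.txt")

    # Back up the file by copying it under a second name.
    backup = Path(shutil.copy2(renamed, folder / "Final.backup.txt"))

    # Look at what is now stored in the folder.
    names_in_folder = sorted(item.name for item in folder.iterdir())
    backup_text = backup.read_text(encoding="utf-8")

print(names_in_folder)  # ['Final.backup.txt', 'Final.txt']
print(backup_text)      # Hello
```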
While the concept of working with files is essentially the same in the Windows, macOS, and Linux operating systems, there are some differences that programmers need to be aware of when working with the file systems in those operating systems. For example, if I save a Word document like the example above, where is it saved? Many software applications, like Microsoft Word, have a default location where they will save files. On a Windows computer, it is often a folder called Documents. Easy. Well, it is easy, except who owns that folder? The user who was logged in at the time the file was saved. What if a different user logs into that same computer and needs to access my document? They won't find it in their Documents folder because it was saved in my Documents folder. Confusing. All of this leads us to several important concepts to review.
Example: Windows Computer
Let's take a Windows computer as an example. A user opens an application (Microsoft Word, for example). The application communicates with the operating system (Windows), which allocates memory (Random Access Memory, or RAM) and storage (hard drive) space to the application. When the user saves a document (file) and specifies a file name and the location where they want it stored, the application asks the operating system for storage space at that location under that file name. The operating system uses the file system to arrange the storage location of the document (file) on the hardware (a hard disk, for example). Later, when the user wants to open the document again after closing the application, they reopen the application and use its file management tools (File - Open) to navigate to the location where the document is stored. The operating system uses the file system to locate the file and gives the application a connection to it so the user can see and work with it.
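The same save-then-reopen cycle can be sketched in Python, with our program playing the role of the application. The helper names `save_document` and `open_document` are invented for this illustration, and a temporary folder stands in for the user's chosen location. Python's built-in `open()` is the call that asks the operating system for a connection to the file.

```python
import tempfile
from pathlib import Path


def save_document(folder: Path, file_name: str, text: str) -> Path:
    """Ask the operating system to store text in folder under file_name."""
    file_path = folder / file_name
    with open(file_path, "w", encoding="utf-8") as document:
        document.write(text)
    return file_path


def open_document(file_path: Path) -> str:
    """Ask the operating system to locate the file and return its contents."""
    with open(file_path, "r", encoding="utf-8") as document:
        return document.read()


with tempfile.TemporaryDirectory() as temp_folder:
    saved_path = save_document(Path(temp_folder), "ExampleDocument.txt", "Hello")
    reopened_text = open_document(saved_path)

print(saved_path.name)  # ExampleDocument.txt
print(reopened_text)    # Hello
```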

In this scenario, inside our application (MS Word in this example), to initially save our document, we would use the Save As dialog, which might look like this:

From this dialog, let's focus on a few key concepts that will lead us into working with files in Python. Those concepts are file names, file paths, and file types.
File names are the labels we give to our files so that we can identify and locate them. Depending on the operating system in use, there are rules for file names: which characters you can use in the name, the length of the name, the file extension, whether upper and lower case are treated as different, and so on. Also, since the file name is our identifier for an individual file, file names must be unique within a given folder.
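A brief sketch of how Python sees a file name, using the standard `pathlib` module. It splits the example name into its base name and extension, and then compares two names that differ only in case under Windows-style and POSIX-style (macOS/Linux) rules. The name `Report.TXT` is made up for the comparison. Note that this reflects how `pathlib` compares paths; the behavior of an actual disk can depend on how its file system is configured.

```python
from pathlib import PurePosixPath, PureWindowsPath

document = PureWindowsPath("ExampleDocument.docx")
print(document.name)    # ExampleDocument.docx
print(document.stem)    # ExampleDocument
print(document.suffix)  # .docx

# Windows-style paths are compared without regard to case ...
same_on_windows = PureWindowsPath("Report.TXT") == PureWindowsPath("report.txt")
# ... while POSIX-style paths treat upper and lower case as different.
same_on_posix = PurePosixPath("Report.TXT") == PurePosixPath("report.txt")

print(same_on_windows)  # True
print(same_on_posix)    # False
```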
File paths indicate where a file is stored in the file system. Every file in the file system has a file path, like the above example, that we can follow to locate the file. Continuing with our example from above, we would write the file path to the Word document as C:\Users\John\Documents\Word\Example\ExampleDocument.docx. Notice that this is a full path, including the root (on a Windows computer, this is usually C:\). In more technical terms, we call this the absolute path. We can also see this visually, like this:

We can also consider the position of a file relative to the location where the application was installed or where its current working directory is located. The current working directory is the location our application focuses on at any given moment. For example, by default, Microsoft Word is focused on my Documents directory. So, instead of requiring the full path, Word can reference our example file by a path of Word\Example\ExampleDocument.docx because that is the relative path from Word's current working directory to the file.
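The relationship between the absolute path and the relative path can be sketched with `pathlib`. `PureWindowsPath` is used here so the Windows-style example path behaves the same way no matter which operating system runs the code. The working directory `C:\Users\John\Documents` is an assumption taken from the example above. In a real program, `Path.cwd()` reports the actual current working directory.

```python
from pathlib import PureWindowsPath

absolute_path = PureWindowsPath(
    r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx"
)
print(absolute_path.is_absolute())  # True
print(absolute_path.anchor)         # C:\

# Assumed current working directory, as in the Word example.
working_directory = PureWindowsPath(r"C:\Users\John\Documents")

# The relative path is what remains after removing the working directory.
relative_path = absolute_path.relative_to(working_directory)
print(relative_path)                # Word\Example\ExampleDocument.docx
print(relative_path.is_absolute())  # False

# Joining the working directory and the relative path rebuilds the absolute path.
rebuilt_path = working_directory / relative_path
print(rebuilt_path == absolute_path)  # True
```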
Shortly we will see that when writing code in Python, we will be able to work with files similarly, by their absolute path or relative paths. Before we get to that, though, let's take a look at the third concept from our screenshot above: file types.

One of the primary purposes of a file system is to provide a structured approach to storing, retrieving, and manipulating digital files efficiently. File storage refers to the underlying mechanisms that dictate how files are physically stored and accessed on storage devices. Various technologies exist for storing digital files, such as hard disk drives, solid-state drives, network attached storage, cloud storage, and optical storage, each with its own characteristics and advantages.
Here is a list of some of the major file storage technologies:
- Magnetic Hard Disk Drives (HDDs): Magnetic hard disk drives have been a dominant storage medium for decades. These drives utilize magnetic platters coated with a ferromagnetic material to store data. The read/write heads, positioned above the spinning platters, magnetically encode and retrieve data. HDDs offer large storage capacities, making them ideal for storing vast amounts of data. However, they are relatively slow compared to other storage technologies, and mechanical failures can occur.
- Solid-State Drives (SSDs): Solid-state drives have revolutionized file storage by eliminating mechanical components. Instead of magnetic platters, SSDs utilize flash memory chips to store data. This technology provides lightning-fast read and write speeds, significantly improving performance compared to HDDs. SSDs are highly reliable, shock-resistant, and consume less power. Although they are more expensive per gigabyte than HDDs, their speed and reliability make them popular for personal computers and data centers.
- Network Attached Storage (NAS): Network Attached Storage is a specialized file storage system for network environments. NAS devices are independent storage units connected to a local network, allowing multiple users to access and share files. They provide a centralized storage solution, offering data redundancy, access control, and advanced features like remote access and data backup. NAS devices are used in homes, small offices, and enterprise environments where data sharing and collaboration are crucial.
- Cloud Storage: Cloud storage has gained immense popularity due to its convenience and scalability. It enables users to store and access their files remotely through an internet connection. Cloud storage providers maintain vast data centers, where data is securely stored and replicated across multiple servers. Users can access their files from any device with an internet connection, making it an ideal choice for seamless file sharing and synchronization. Cloud storage services often provide additional features like version control, data encryption, and integration with other applications.
- Optical Storage: Although less prevalent in modern computing, optical storage still holds significance in certain areas. It utilizes optical discs, such as CDs, DVDs, and Blu-ray discs, for data storage. Optical storage offers high-capacity, non-volatile storage, making it suitable for archiving and distributing large volumes of data. However, it has limitations regarding read/write speeds and rewritability.
Understanding different types of file storage is essential for working with computer systems. Each storage technology has its advantages and limitations, catering to diverse needs and use cases. By selecting the appropriate file storage, programmers can optimize their applications' data management, performance, and reliability.
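Whatever the underlying technology, a program usually sees storage through the operating system as a volume with a certain capacity. As a rough sketch, the standard library function `shutil.disk_usage()` reports the total, used, and free space for the volume that holds a given path. Using the root of the current working directory's drive is an arbitrary choice for this example.

```python
import shutil
from pathlib import Path

# The anchor is the root of the current drive: a drive letter root on Windows, '/' on macOS and Linux.
drive_root = Path.cwd().anchor
usage = shutil.disk_usage(drive_root)

gibibyte = 1024 ** 3  # number of bytes in one GiB
print(f"Volume: {drive_root}")
print(f"Total:  {usage.total / gibibyte:.1f} GiB")
print(f"Used:   {usage.used / gibibyte:.1f} GiB")
print(f"Free:   {usage.free / gibibyte:.1f} GiB")
```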
There are hundreds of different types of files in computing. We classify file types in several ways, such as proprietary (file types created and controlled by a specific company or group), open standard (those that anyone can create and share), binary (those whose contents are not human-readable and can only be read by specific software or systems), text (those that are human-readable, sometimes referred to as plain text or ASCII text), and many others.
We often group files by type based on the type of application or use of those files. We often use file extensions as indicators of the type of file we're dealing with. Below is a brief list of common file types. For a more comprehensive list of file types, see the List of File Formats Wikipedia page.
Grouping | Common File Extensions & Formats |
---|---|
Text | ASCII, .html, .txt, UTF-8, .xml |
Compressed | .7z, .gz, .rar, .tar, .zip |
Still Images | .bmp, .gif, .jpg, .png, .tiff |
Moving Images | .avi, .mov, .mp4, .mpeg |
Sounds | .aiff, .mp3, .mxf, .wav |
Data Files | .csv, .json, .xml |
Databases | .accdb, .db, .dbf, .mdb, .mdf |
Source Code Files | .asm, .c, .cpp, .cs, .go, .java, .js, .php, .py, .r, .sh |
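As a sketch of how a program might use file extensions as indicators of file type, the example below looks up a file's extension in a small dictionary built from a handful of rows in the table above. The dictionary `FILE_GROUPS` and the function `classify` are hypothetical names for this illustration, and the lookup is deliberately incomplete. Keep in mind that an extension is only a hint: renaming a file does not change what is inside it.

```python
from pathlib import PurePath

# A few extensions copied from the table above; this is not a complete list.
FILE_GROUPS = {
    ".txt": "Text",
    ".html": "Text",
    ".zip": "Compressed",
    ".jpg": "Still Images",
    ".png": "Still Images",
    ".mp4": "Moving Images",
    ".mp3": "Sounds",
    ".csv": "Data Files",
    ".json": "Data Files",
    ".db": "Databases",
    ".py": "Source Code Files",
}


def classify(file_name: str) -> str:
    """Return the grouping for a file name based only on its extension."""
    extension = PurePath(file_name).suffix.lower()  # '.JPG' and '.jpg' are treated alike
    return FILE_GROUPS.get(extension, "Unknown")


print(classify("Hello.txt"))             # Text
print(classify("PHOTO.JPG"))             # Still Images
print(classify("ExampleDocument.docx"))  # Unknown
```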
Text vs Binary Files: One of the critical distinctions programmers need to understand is the difference between a text file and a binary file. The simple explanation is that text files contain human-readable characters while binary files do not. But what does that mean? Let's consider a Microsoft Word document with a file extension of .docx. That file type is a binary file, containing data that is not readable by humans. This is because the Word application stores a great deal of information in the file, such as formatting and user data, and packs it into a compressed archive (a .docx file is a ZIP package of XML files).
Consider this example: I create a new document in MS Word, type Hello, and save it as Hello.docx. The only thing I put in that document is one word, which looks like this when I have the document open in MS Word:

However, if I were to open this document in a plain text editor, it would look like this:

The plain text editor used in the above example is Notepad++. When you open a file with it, it does its best to interpret the contents of the file. In this case, the MS Word .docx file is binary, as indicated above, so Notepad++ detects many characters in the file, but because the file is binary and not plain text, it displays what it can, which ends up looking like strange characters.
Remember, I only typed "Hello" in the Word document, but notice at the bottom of the Notepad++ screen there are 12,189 characters and 72 lines in the file. If all I did was type one word, what is all that? Because .docx is an application-specific format, Word embeds a great deal of information in our documents to keep our formatting information (fonts, colors, etc.) as well as user information and other data. It stores all of that in its own binary format. Plain text editors are not able to interpret all of that information.
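To get a feel for why the editor shows strange characters, the sketch below builds a tiny ZIP archive in memory and looks at its raw bytes. This is a stand-in, not a real Word document: the entry name `word/document.xml` and its contents are simplified assumptions, and a genuine .docx package contains many more parts. The point is only that the bytes of a compressed package are not plain text.

```python
import io
import zipfile

# Build a small ZIP archive in memory as a stand-in for a .docx package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    package.writestr("word/document.xml", "<w:t>Hello</w:t>")

raw_bytes = buffer.getvalue()

print(raw_bytes[:4])                   # b'PK\x03\x04', the signature at the start of a ZIP entry
print(len(raw_bytes) > len("Hello"))   # True: the package is much larger than the word itself
print(zipfile.is_zipfile(io.BytesIO(raw_bytes)))  # True
```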
Now let's reverse this: create a new file in Notepad++, type "Hello", and save it as Hello.txt, like this:

The result is a plain text file, not binary. So, if I open Hello.txt in any other editor, including MS Word, the only content is the word Hello; there are no additional characters in the file.
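The plain text case can be checked in Python with a short sketch. It writes the word Hello to a file named Hello.txt in a temporary folder, then reads the file back twice: once as text and once as raw bytes. With UTF-8 encoding, the file holds exactly five bytes, one per character.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_folder:
    text_file = Path(temp_folder) / "Hello.txt"
    text_file.write_text("Hello", encoding="utf-8")

    as_text = text_file.read_text(encoding="utf-8")  # decoded into a Python string
    as_bytes = text_file.read_bytes()                # the raw bytes stored on disk

print(as_text)        # Hello
print(as_bytes)       # b'Hello'
print(len(as_bytes))  # 5
```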

As we explore file handling in Python, these concepts will be important because we will programmatically work with file systems, locations, and types directly in the programming language. Rather than relying on an application to handle files, we will replace the application and handle files ourselves.
One of the primary tasks of an operating system is providing security and permissions for the file system. Computer users are granted permission levels to directories and files based on their needs. On a computer that you own, you generally have access to everything. However, in businesses and on other multi-user systems, system administrators generally have full access, while other users are assigned more limited access levels. These levels affect a user's ability to access, read, and write files in various locations on the computer, on a computer network, or in a cloud system. When we run our Python programs, these access levels apply to the program as well.
For example, let's say that a programmer named Bob writes a Python program that opens, reads, and writes a file in his Documents folder. If Bob runs the program and has a connection and access permissions to his Documents folder, then the Python program will be able to interact with the file. This is because the Python program runs with the permissions of the user running it.

However, if Bob were to change the Python program to try to access Sally's Documents directory, to which he does not have access permissions, the program would fail because Bob does not have the proper permissions for that directory and file.

As programmers, we need to take into account file security and permissions when writing file handling code. We will explore more details about permissions later.
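One way a program like Bob's might account for permissions is sketched below. When the operating system refuses access, Python raises the built-in `PermissionError`, which the program can catch instead of crashing. The function name `read_if_permitted` and the file names are invented for this example. The demonstration only exercises a readable file and a missing file, because reliably producing a permission failure depends on how the particular computer is set up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_if_permitted(file_path: Path) -> Optional[str]:
    """Return the file's text, or None if access is refused or the file is missing."""
    try:
        return file_path.read_text(encoding="utf-8")
    except PermissionError:
        # Raised when the user running the program lacks access rights.
        print(f"Permission denied: {file_path}")
    except FileNotFoundError:
        print(f"No such file: {file_path}")
    return None


with tempfile.TemporaryDirectory() as temp_folder:
    own_file = Path(temp_folder) / "notes.txt"  # hypothetical file the current user owns
    own_file.write_text("Hello", encoding="utf-8")

    own_result = read_if_permitted(own_file)
    missing_result = read_if_permitted(Path(temp_folder) / "missing.txt")

print(own_result)      # Hello
print(missing_result)  # None
```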