Files & File Systems

Overview

To begin learning Python programming, we need to establish a solid understanding of file systems, file storage, and files. You are likely at least partially familiar with these fundamental concepts. As a programmer, though, you need a more detailed grasp of them and of how to work with them directly. To that end, on this page we will invest some time looking at each, and on the next page we will focus on the Python-specific files that we will work with inside those file systems and storage.



File Systems, Storage & Files

What happens if you were to open Microsoft Word, create a new document, type some text and save the document? What does save mean? That may seem like an obvious question; it saves my document on my computer. Right? Sure, but how? Where? What does the file look like? How do I find my document after I close Word and want to find it again? How do I make a backup copy of my document somewhere else to avoid losing it? What if I want to share my document with someone else? There are many more considerations about my document than one might think. As programmers, we must be able to answer all of these questions and more importantly, handle all of these tasks in the programs we write.

Concept: File Systems

File systems control how data is stored and retrieved. There are various types of file systems, each designed for a specific purpose: local drive file systems (like those used on the hard drives of Windows, Mac, and Linux computers), network file systems, cloud-based file systems, and so on. File systems control file names, file sizes, directories (folders), and access to those resources.
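As a small sketch of what it means for the file system to keep track of names, sizes, directories, and access, the following Python code asks the file system about one file. The file name notes.txt and the use of a temporary folder are assumptions made only for this illustration.

```python
import os
import tempfile
from pathlib import Path

# Work in a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    directory = Path(folder)

    # Hypothetical file name, chosen only for this illustration.
    file_path = directory / "notes.txt"
    file_path.write_text("Hello", encoding="utf-8")

    # Ask the file system what it knows about the file and its folder.
    size_in_bytes = file_path.stat().st_size                # file size
    is_directory = directory.is_dir()                       # folder or file?
    entries = sorted(p.name for p in directory.iterdir())   # names in the folder
    can_read = os.access(file_path, os.R_OK)                # access check

print(size_in_bytes, is_directory, entries, can_read)
```

Every value printed here comes from the file system rather than from the file's contents.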

The first consideration is the operating system. While the concept of working with files is essentially the same in the Windows, Macintosh, and Linux operating systems, there are some differences that programmers need to be aware of when working with files on each of them. For example, if I save a Word document like the example above, where is it saved? Many software packages, like Microsoft Word, have a default location where they save files. On a Windows computer, it is often a folder called Documents. Easy. Well, it is easy, except who owns that folder? The user who was logged in at the time the file was saved. What if a different user logs into that same computer and needs to access my document? They won't find it in their Documents folder because it was saved in my Documents folder. Confusing. All of this leads us to several important concepts to review.
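The per-user nature of the Documents folder can be seen from Python. This is a minimal sketch that assumes the documents folder is named Documents and sits directly under the user's home directory, which is common but not guaranteed on every system.

```python
from pathlib import Path

# Path.home() returns the home directory of whoever is running the script,
# so two different users get two different answers.
home = Path.home()

# Assumption: the folder is called "Documents" and lives under the home directory.
documents = home / "Documents"

print("Home directory:  ", home)
print("Documents folder:", documents)
print("Folder exists:   ", documents.exists())
```

Run by two different users on the same computer, this prints two different Documents paths, which is why one user does not see the other's files there.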

Example: Windows Computer

Let's take a Windows computer as an example. A user opens an application (Microsoft Word, for example). The application communicates with the operating system (Windows), which allocates memory (Random Access Memory, or RAM) and storage (hard drive) space to the application. When the user saves a document (file) and specifies a file name and the location where they want it stored, the application asks the operating system for storage space at that location under that file name. The operating system uses the file system to arrange the storage location of the document (file) on the hardware (a hard disk, for example). Later, after closing the application, the user wants to open the document again. They open the application and use its file management tools (File - Open) to navigate to the location where the document is stored. The operating system uses the file system to locate the file and gives the application a connection to it so the user can see and work with it.
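The same save-then-open sequence can be sketched in Python, with our own code playing the role of the application. The file name and the temporary folder are assumptions for the sake of a self-contained example; a plain text file stands in for the Word document.

```python
import tempfile
from pathlib import Path

# A temporary folder stands in for the location the user picked.
with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name chosen by the user.
    file_path = Path(folder) / "ExampleDocument.txt"

    # "Save": ask the operating system to create the file and store the text.
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("Hello")

    # "Open": ask the operating system to locate the file and return its contents.
    with open(file_path, "r", encoding="utf-8") as f:
        contents = f.read()

print(contents)
```

In both steps the program only names the file and its location; the operating system and file system decide where the bytes physically live.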


In this scenario, inside our application (MS Word in this example), to initially save our document, we would use the Save As dialog, which might look like this:



From this dialog, let's focus on a few key concepts that will lead us into working with files in Python. Those concepts are file names, file paths, and file types.

File names are the labels we give to our files so that we can identify and locate them. Each operating system has its own rules for file names: which characters are allowed, how long a name can be, how file extensions are used, whether upper and lower case are treated as different, and so on. Also, since the file name is our identifier for an individual file, file names must be unique within a folder: two files in the same folder cannot share the same name.
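The parts of a file name, and one of the operating system differences, can be illustrated with Python's pathlib module. This is a sketch only; the REPORT.TXT name is hypothetical, and the "pure" path classes are used so the code behaves the same on any computer.

```python
from pathlib import PurePosixPath, PureWindowsPath

file_name = PureWindowsPath("ExampleDocument.docx")

name = file_name.name         # 'ExampleDocument.docx'
stem = file_name.stem         # 'ExampleDocument'
extension = file_name.suffix  # '.docx'

# Windows-style names ignore upper/lower case when compared...
same_on_windows = PureWindowsPath("REPORT.TXT") == PureWindowsPath("report.txt")

# ...while Linux-style (POSIX) names treat case as significant.
same_on_linux = PurePosixPath("REPORT.TXT") == PurePosixPath("report.txt")

print(name, stem, extension, same_on_windows, same_on_linux)
```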

File Paths indicate where a file was stored in the file system. Every file in the file system has a file path, like the above example, that we can follow to locate the file. Continuing with our example from above, we would write our file path to the Word document as C:\Users\John\Documents\Word\Example\ExampleDocument.docx. Notice that this is a full path, including the root (on a Windows computer, this is usually C:\). In more technical terms, we call this the absolute path. We can also see this visually like this ->
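To make the pieces of an absolute path concrete, here is a short sketch that takes the example path apart with pathlib. PureWindowsPath is used so that the Windows-style path is interpreted the same way on any operating system.

```python
from pathlib import PureWindowsPath

absolute = PureWindowsPath(
    r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx"
)

is_absolute = absolute.is_absolute()  # True: the path starts at the root
root = absolute.anchor                # 'C:\\'
folder = absolute.parent              # the directory that holds the file
parts = absolute.parts                # every step from the root to the file

print(is_absolute)
print(root)
print(folder)
print(parts)
```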

We can also consider the position of a file relative to the location where the application was installed or to its current working directory. The current working directory is the location our application is focused on at any given moment. For example, by default, Microsoft Word is focused on my Documents directory. So, instead of requiring the full path, Word can reference our example file by the path Word\Example\ExampleDocument.docx, because that is the relative path from Word's current working directory to the file.
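The relationship between the two kinds of path can be sketched as follows, assuming the working directory is John's Documents folder as in the example.

```python
from pathlib import PureWindowsPath

# Assumed current working directory, taken from the example above.
working_directory = PureWindowsPath(r"C:\Users\John\Documents")
absolute = PureWindowsPath(
    r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx"
)

# The relative path is what remains once the working directory is removed.
relative = absolute.relative_to(working_directory)
print(relative)  # Word\Example\ExampleDocument.docx

# Joining the working directory and the relative path gives back the absolute path.
rebuilt = working_directory / relative
print(rebuilt == absolute)  # True
```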







Shortly we will see that when writing code in Python, we will be able to work with files similarly, by their absolute path or relative paths. Before we get to that, though, let's take a look at the third concept from our screenshot above: file types.

File Storage

Concept: File Storage

File Storage is the process and mechanism of storing and organizing digital files on storage devices. It involves allocating space, managing file structures, and facilitating access to stored files. File storage systems determine how data is physically stored, retrieved, and manipulated, ensuring efficient and reliable management of files within a computer system.

A primary purpose of a file system is to provide a structured approach to storing, retrieving, and manipulating digital files efficiently. File storage is the underlying mechanism that dictates how files are physically stored and accessed on storage devices. Various technologies exist for storing digital files, such as hard disk drives, solid-state drives, network-attached storage, cloud storage, and optical storage, each with its own characteristics and advantages.

Here is a list of some of the major file storage technologies:

Understanding different types of file storage is essential for working with computer systems. Each storage technology has its advantages and limitations, catering to diverse needs and use cases. By selecting the appropriate file storage, programmers can optimize their applications' data management, performance, and reliability.
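Whatever technology sits underneath, a program can ask how much space the storage device offers. The sketch below uses shutil.disk_usage from Python's standard library; inspecting the device that holds the current working directory is an arbitrary choice made for this illustration.

```python
import shutil
from pathlib import Path

# Assumption: we inspect whichever storage device holds the current working directory.
usage = shutil.disk_usage(Path.cwd())

gib = 1024 ** 3  # bytes in one gibibyte
print(f"Total: {usage.total / gib:.1f} GiB")
print(f"Used:  {usage.used / gib:.1f} GiB")
print(f"Free:  {usage.free / gib:.1f} GiB")
```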

Files

Concept: File

A file in a computer is a collection of information stored on your computer's hard drive or another storage device, such as a USB flash drive. This information can include text, pictures, music, videos, or other data. Each file is given a name, making it easy to find and access the file later on. Think of it like a physical file folder where you can store documents, except that on a computer it is all digital. Files can be created, modified, deleted, and moved around on your computer's storage devices, making them a fundamental part of computer use.
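The four operations named above (create, modify, move, delete) can be sketched in a few lines of Python. The file name draft.txt and the archive folder are hypothetical, and everything happens inside a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    original = root / "draft.txt"  # hypothetical file name

    # Create the file with some initial text.
    original.write_text("Hello", encoding="utf-8")

    # Modify it by appending more text.
    with open(original, "a", encoding="utf-8") as f:
        f.write(", world")

    # Move it into a new subfolder.
    archive = root / "archive"
    archive.mkdir()
    moved = Path(shutil.move(str(original), str(archive / "draft.txt")))
    contents = moved.read_text(encoding="utf-8")
    original_still_there = original.exists()

    # Delete it.
    moved.unlink()
    exists_after_delete = moved.exists()

print(contents, original_still_there, exists_after_delete)
```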

There are hundreds of different types of files in computing. We classify file types in several ways, such as proprietary (file types created and controlled by a specific company or group), open standard (those that anyone can create and share), binary (those whose contents are not human-readable and can only be read by specific software or systems), text (those that are human-readable, sometimes referred to as plain text or ASCII text), and many others.

Concept: Encoding & Decoding

Encoding is the process of converting characters (alphabetic characters, numbers, and symbols) into a specialized format for efficient storage and transmission. Decoding is the opposite process: converting the encoded data back into the original characters. There are many proprietary encoding schemes and formats, such as those of a Microsoft Word document (.docx) or an Adobe Photoshop image file (.psd). Other schemes and formats are open standards, such as ASCII, UTF-8, XML, and JSON, which are used as common formats for sharing and saving data.
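For text, encoding and decoding can be tried directly in Python. The sketch below uses UTF-8, one of the open standards mentioned above; the sample words are arbitrary.

```python
text = "Hello"

# Encoding turns characters into bytes that can be stored or transmitted.
encoded = text.encode("utf-8")
byte_values = list(encoded)
print(byte_values)  # [72, 101, 108, 108, 111]

# Decoding turns those bytes back into the original characters.
decoded = encoded.decode("utf-8")
print(decoded == text)  # True

# Characters outside ASCII may need more than one byte each in UTF-8.
accented = "caf\u00e9"  # the word "cafe" with an accented final e
accented_length = len(accented.encode("utf-8"))
print(len(accented), accented_length)  # 4 5
```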

We often group files by type based on the type of application or use of those files. We often use file extensions as indicators of the type of file we're dealing with. Below is a brief list of common file types. For a more comprehensive list of file types, see the List of File Formats Wikipedia page.

Grouping | Common File Extensions & Formats
Text | ASCII, .html, .txt, UTF-8, .xml
Compressed | .7z, .gzip, .rar, .tar, .zip
Still Images | .bmp, .gif, .jpg, .png, .tiff
Moving Images | .avi, .mov, .mp4, .mpeg
Sounds | .aiff, .mp3, .mxf, .wav
Data Files | .csv, .json, .xml
Databases | .accdb, .db, .dbf, .mdb, .mdf
Source Code Files | .asm, .c, .cpp, .cs, .go, .java, .js, .php, .py, .r, .sh
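A program can use the extension as a hint in the same way. The sketch below copies a handful of rows from the table into a dictionary; the GROUPS mapping and the group_for function are hypothetical names made up for this illustration, and the mapping is far from complete.

```python
from pathlib import PurePath

# A few entries taken from the table above (illustrative, not exhaustive).
GROUPS = {
    ".txt": "Text",
    ".zip": "Compressed",
    ".png": "Still Images",
    ".mp4": "Moving Images",
    ".mp3": "Sounds",
    ".csv": "Data Files",
    ".db": "Databases",
    ".py": "Source Code Files",
}


def group_for(file_name):
    """Return the grouping suggested by the file's extension."""
    extension = PurePath(file_name).suffix.lower()
    return GROUPS.get(extension, "Unknown")


print(group_for("Report.TXT"))
print(group_for("photo.png"))
print(group_for("ExampleDocument.docx"))
```

Keep in mind that an extension is only an indicator: renaming a file changes its extension but not its contents.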


Text vs Binary Files: One of the critical distinctions programmers need to understand is the difference between a text file and a binary file. The simple explanation is that text files contain human-readable characters while binary files do not. But what does that mean? Let's consider a Microsoft Word document with a file extension of .docx. That file type is binary: it contains data that is not readable by humans. This is because the Word application stores a great deal of additional information in the file, such as formatting information and user data.

Consider this example: I create a new document in MS Word, type Hello, and save it as Hello.docx. The only thing I put in that document is one word, which looks like this when I have the document open in MS Word:



However, if I were to open this document in a plain text editor, it would look like this:



The plain text editor used in the above example is Notepad++. When you open a file with it, it does its best to interpret the contents of the file. In this case, the MS Word .docx file is binary, as indicated above, so Notepad++ finds many characters in the file, but because the file is binary rather than plain text, it displays what it can, which ends up looking like strange characters.

Remember, I only typed "Hello" in the Word document, but notice at the bottom of the Notepad++ screen that there are 12,189 characters and 72 lines in the file. If all I did was type one word, what is all that? Word embeds a great deal of information in our documents to keep our formatting information (fonts, colors, etc.) as well as user information and other data. It stores all of that in a compressed binary format (a .docx file is a ZIP package of XML parts). Plain text editors are not able to interpret all of that information.
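We do not have a real Hello.docx to inspect here, but because a .docx file is a ZIP package, a small ZIP archive built in memory makes a reasonable stand-in. In this sketch the part name document.txt is hypothetical and is not Word's real internal layout.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory that holds only the word "Hello".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    # Hypothetical part name; a real .docx contains several XML parts.
    archive.writestr("document.txt", "Hello")

raw = buffer.getvalue()

# The archive is much larger than the five characters we stored...
print(len(raw), "bytes for a", len("Hello"), "character word")

# ...and it begins with the ZIP signature rather than with our text.
print(raw[:4])
```

The extra bytes are the archive's own bookkeeping, which is the kind of content a plain text editor shows as strange characters.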

If we reverse this and create a new file in Notepad++, type "Hello" and save it as Hello.txt, like this:



The result is a plain text file, not binary. So, if I open Hello.txt in any other editor, including MS Word, the only content is the word Hello; there are no additional characters in the file.
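The same check can be made from Python. This sketch writes Hello.txt into a temporary folder (an assumption, so the example cleans up after itself) and then looks at the exact bytes stored in the file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    text_file = Path(folder) / "Hello.txt"
    text_file.write_text("Hello", encoding="utf-8")

    raw = text_file.read_bytes()                     # the bytes on disk
    as_text = text_file.read_text(encoding="utf-8")  # the same bytes decoded

print(raw)       # b'Hello'
print(len(raw))  # 5
print(as_text)   # Hello
```

Five characters went in and five bytes came out; nothing else was added to the file.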



As we explore file handling in Python, these concepts will be important because we will programmatically work with file systems, locations, and types directly in the programming language. Rather than relying on an application to handle files, we will replace the application and handle files ourselves.

Python Files

Python Scripts

A Python script is a set of instructions written in the Python programming language that a computer can read and execute. Think of it like a recipe for a computer, telling it what to do and how to do it. Python scripts can perform various tasks, such as manipulating data, automating repetitive tasks, or creating complex applications. To run a Python script, you need a Python interpreter installed on your computer, which reads the instructions in the script and executes them. Python is a popular language for scripting because it is relatively easy to learn yet powerful enough to handle many tasks. Once you have written a Python script, you can save it as a file on your computer and execute it whenever you need it to perform its job.
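As a minimal sketch, a complete script can be as small as the following. The file name hello_script.py and the main function are hypothetical choices for this illustration.

```python
# hello_script.py (hypothetical file name)


def main():
    """The instructions this script carries out."""
    message = "Hello from a Python script"
    print(message)
    return message


# Run main() only when the file is executed as a script,
# not when it is imported by another program.
if __name__ == "__main__":
    main()
```

Saved under that name, the script could be run from a terminal with python hello_script.py, assuming a Python interpreter is installed.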

Python Modules

A Python module is a file containing Python code that defines functions, variables, and classes used by other Python programs. Think of it like a toolbox that contains specific tools you can use to solve particular problems. By importing a module into your Python program, you gain access to all the functionality defined in the module without having to write it from scratch. Python has many built-in modules, such as "os" for interacting with the operating system, "datetime" for working with dates and times, and "math" for performing mathematical operations. Additionally, many third-party modules are available that you can install and use in your Python programs to extend their functionality. Utilizing modules allows you to write more efficient and organized code, since you don't have to reinvent the wheel whenever you need to solve a particular problem.
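The three built-in modules mentioned above can be imported and used like this. The specific calculations are arbitrary examples chosen for this sketch.

```python
import datetime
import math
import os

# math: mathematical operations
hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)
print(hypotenuse)  # 5.0

# datetime: working with dates and times
days = (datetime.date(2023, 12, 31) - datetime.date(2023, 1, 1)).days
print(days)  # 364

# os: interacting with the operating system
current_directory = os.getcwd()
print(current_directory)
```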

Python Packages

In Python, a package is a way of organizing related modules together in a hierarchical manner. Think of it like a folder containing multiple Python files, each defining functionality that can be used in your program. A package can include other packages as well as modules, making it a powerful way to organize and structure your Python code. Python packages often have a unique name, such as "numpy" for numerical computing or "pandas" for data analysis. By importing a package into your Python program, you gain access to all the modules and functionality defined in the package. This allows you to write more complex programs and leverage the power of existing Python libraries without having to write everything from scratch. Overall, Python packages are a key language feature that makes it easy to organize and reuse your code.
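To show that a package really is a folder of modules, the sketch below builds a tiny one in a temporary folder and imports from it. The names toolbox_demo, greetings, and hello are hypothetical, and adding the folder to sys.path is done only so this self-contained example can find the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # A package is a folder; the __init__.py file marks it as a package.
    package_dir = Path(folder) / "toolbox_demo"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("", encoding="utf-8")

    # A module inside the package.
    (package_dir / "greetings.py").write_text(
        "def hello(name):\n    return f'Hello, {name}'\n",
        encoding="utf-8",
    )

    # Tell Python where to look, import the module, then undo the change.
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        greetings = importlib.import_module("toolbox_demo.greetings")
        message = greetings.hello("John")
    finally:
        sys.path.remove(folder)

print(message)
```

In everyday code the package would already be installed or sit next to the script, and a plain import statement such as from toolbox_demo import greetings would do the same job.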

Python Libraries

In Python, a library is a collection of modules, packages, and functions that provide specific functionality for your Python programs. Think of it like a set of tools that you can use to solve various problems. Python has a vast and diverse library ecosystem, with thousands of third-party libraries available. These libraries cover various domains, from scientific computing to web development to machine learning. By importing a library into your Python program, you can leverage the functionality it provides to write more powerful and efficient code. Some popular Python libraries include NumPy for numerical computing, Pandas for data analysis, and Matplotlib for data visualization. Overall, Python libraries are an essential part of the language, making it easy to solve complex problems and create sophisticated applications.

Python Files Summary

In summary, a script is a standalone program, a module is a single file that contains Python code, a package is a collection of related modules, and a library is a collection of code designed to be reused by multiple programs or scripts.



 





© 2023 John Gordon
Cascade Street Publishing, LLC