Files & File Systems
Before we work with files in Python, we need a solid understanding of file systems, file storage, and files. You are likely at least partially familiar with these fundamental concepts. As a programmer, though, you need a more detailed grasp of them and of how to work with them directly. To that end, this page looks at each concept in turn, and the next page focuses on the Python-specific files we will work with inside those file systems.
What happens when you open Microsoft Word, create a new document, type some text, and save the document? What does "save" mean? That may seem like an obvious question: it saves my document on my computer. Right? Sure, but how? Where? What does the file look like? How do I find my document again after I close Word? How do I make a backup copy of my document somewhere else to avoid losing it? What if I want to share my document with someone else? There are many more considerations about my document than one might think. As programmers, we must be able to answer all of these questions and, more importantly, handle all of these tasks in the programs we write.
The first consideration is the operating system. While the concept of working with files is essentially the same in the Windows, Macintosh, and Linux operating systems, there are some differences that programmers need to be aware of when working with files on each of them. For example, if I save a Word document like the one above, where is it saved? Many software packages, like Microsoft Word, have a default location where they save files. On a Windows computer, it is often a folder called Documents. Easy. Well, it is easy, except who owns that folder? The user who was logged in at the time the file was saved. What if a different user logs into that same computer and needs to access my document? They won't find it in their Documents folder, because it was saved in my Documents folder. Confusing. All of this leads us to several important concepts to review.
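To make the per-user Documents folder concrete, here is a minimal sketch using Python's standard `pathlib` module. It assumes the folder is named `Documents` and sits directly inside the user's home directory, which is common but not guaranteed on every system.

```python
from pathlib import Path

# The home directory of whichever user is running this code.
home = Path.home()

# Assumption: the Documents folder sits directly inside the home directory.
documents = home / "Documents"

print(home)
print(documents)
print(documents.exists())  # False if this system has no such folder
```

Run by two different users on the same computer, this prints two different locations, which is why one user does not see another user's documents.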
File names are the labels we give to our files so that we can identify and locate them. Each operating system has its own rules for file names: which characters are allowed, how long a name can be, how file extensions are used, whether upper case and lower case letters are treated as different, and so on. Also, since the file name is our identifier for an individual file, each file name must be unique within the folder that contains it.
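As a small illustration of these naming rules, the sketch below uses `pathlib` from the standard library to split a name into its parts and to compare names under Windows-style and POSIX-style (Linux, macOS) rules. The file names are made up for the example.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# Hypothetical file name used only for illustration.
file_name = PurePath("ExampleDocument.docx")

stem = file_name.stem         # the name without its extension
extension = file_name.suffix  # the extension, including the leading dot
print(stem)
print(extension)

# Windows-style paths compare without regard to letter case ...
same_on_windows = PureWindowsPath("Report.TXT") == PureWindowsPath("report.txt")
# ... while POSIX-style paths treat case as significant.
same_on_posix = PurePosixPath("Report.TXT") == PurePosixPath("report.txt")
print(same_on_windows)
print(same_on_posix)
```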
File paths indicate where a file is stored in the file system. Every file in the file system has a file path that we can follow to locate the file. Continuing with our example from above, we would write the file path to the Word document as C:\Users\John\Documents\Word\Example\ExampleDocument.docx. Notice that this is a full path, including the root (on a Windows computer, this is usually C:\). In more technical terms, we call this the absolute path.
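The pieces of the absolute path above can be inspected with `pathlib.PureWindowsPath`, which understands Windows path syntax on any operating system. This is only a sketch: it parses the path as text and does not check whether the file exists.

```python
from pathlib import PureWindowsPath

# The example path from the text, written as a raw string so the
# backslashes are kept exactly as typed.
doc = PureWindowsPath(r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx")

print(doc.is_absolute())  # whether the path starts from a root
print(doc.anchor)         # the drive and root together
print(doc.parent)         # the folder that contains the file
print(doc.name)           # the file name itself
print(doc.parts)          # every component, from the root to the file
```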
We can also describe the position of a file relative to the location where the application was installed or to its current working directory. The current working directory is the location our application is focused on at any given moment. For example, by default, Microsoft Word is focused on my Documents directory. So, instead of requiring the full path, Word can reference our example file by the path Word\Example\ExampleDocument.docx, because that is the relative path from Word's current working directory to the file.
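The relationship between a working directory, a relative path, and an absolute path can be sketched with `pathlib`. The `working_dir` value below is an assumption that mirrors the Documents folder from the example; `Path.cwd()` at the end reports the real working directory of the running Python process.

```python
from pathlib import Path, PureWindowsPath

# Assumed working directory, matching the Documents folder in the example.
working_dir = PureWindowsPath(r"C:\Users\John\Documents")
doc = PureWindowsPath(r"C:\Users\John\Documents\Word\Example\ExampleDocument.docx")

# The relative path is what remains after removing the working directory.
relative = doc.relative_to(working_dir)
print(relative)
print(relative.is_absolute())

# Joining the working directory and the relative path gives the absolute path back.
rebuilt = working_dir / relative
print(rebuilt == doc)

# The actual current working directory of this Python process.
print(Path.cwd())
```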
Shortly we will see that when writing code in Python, we can work with files in the same way, using either absolute or relative paths. Before we get to that, though, let's take a look at the third concept from our screenshot above: file types.
A primary purpose of a file system is to provide a structured approach to storing, retrieving, and manipulating digital files efficiently. File storage refers to the underlying mechanisms that dictate how files are physically stored and accessed on storage devices. Various technologies exist for storing digital files, such as hard disk drives, solid-state drives, network-attached storage, cloud storage, and optical storage. Each is used to implement file storage solutions, and each has its own characteristics and advantages.
There are hundreds of different types of files in computing. We classify file types in several ways, such as proprietary (file types created and controlled by a specific company or group), open standard (those that anyone can create and share), binary (those whose contents are not human-readable and can only be read by specific software or systems), text (those that are human-readable, sometimes referred to as plain text or ASCII text), and many others.
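The difference between text and binary can be seen by reading the same file both ways. The sketch below writes a small file into a temporary folder (the name `note.txt` is made up), then reads it once as text and once as raw bytes.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"  # hypothetical file name
    note.write_text("Hello, files!", encoding="utf-8")

    as_text = note.read_text(encoding="utf-8")  # decoded into characters
    as_bytes = note.read_bytes()                # the raw bytes on disk

print(as_text)
print(as_bytes)
print(list(as_bytes[:5]))  # the numeric values of the first five bytes
```

Every file is stored as bytes. A text file is one whose bytes follow a character encoding such as ASCII or UTF-8, so they can be decoded into readable characters.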
We often group files according to the kind of application that uses them or the purpose they serve, and we use file extensions as indicators of the type of file we're dealing with. Below is a brief list of common file types. For a more comprehensive list of file types, see the List of File Formats Wikipedia page.
Grouping | Common File Extensions & Formats |
---|---|
Text | ASCII, .html, .txt, UTF-8, .xml |
Compressed | .7z, .gzip, .rar, .tar, .zip |
Still Images | .bmp, .gif, .jpg, .png, .tiff |
Moving Images | .avi, .mov, .mp4, .mpeg |
Sounds | .aiff, .mp3, .mxf, .wav |
Data Files | .csv, .json, .xml |
Databases | .accdb, .db, .dbf, .mdb, .mdf |
Source Code Files | .asm, .c, .cpp, .cs, .go, .java, .js, .php, .py, .r, .sh |
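A program can use the extension as a first hint about a file's type. The sketch below maps a handful of extensions from the table to their groupings. The `GROUP_BY_EXTENSION` dictionary and the `guess_group` function are names invented for this example, and the mapping is deliberately incomplete.

```python
from pathlib import PurePath

# A few extensions taken from the table above; not an exhaustive list.
GROUP_BY_EXTENSION = {
    ".txt": "Text",
    ".zip": "Compressed",
    ".png": "Still Images",
    ".mp4": "Moving Images",
    ".wav": "Sounds",
    ".csv": "Data Files",
    ".db": "Databases",
    ".py": "Source Code Files",
}


def guess_group(file_name):
    """Return the table grouping for a file name, or "Unknown"."""
    # Lowercase the extension so ".PNG" and ".png" are treated alike.
    extension = PurePath(file_name).suffix.lower()
    return GROUP_BY_EXTENSION.get(extension, "Unknown")


print(guess_group("photo.PNG"))
print(guess_group("script.py"))
print(guess_group("README"))
```

Keep in mind that an extension is only a label. Renaming a file does not change its contents, so this kind of lookup is a guess rather than a guarantee.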
A Python script is a set of instructions written in the Python programming language that a computer can read and execute. Think of it like a recipe for a computer, telling it what to do and how to do it. Python scripts can perform various tasks, such as manipulating data, automating repetitive tasks, or creating complex applications. To run a Python script, you need a Python interpreter installed on your computer, which reads the instructions in the script and executes them. Python is a popular language for scripting because it is relatively easy to learn yet powerful enough to handle many tasks. Once you have written a Python script, you can save it as a file on your computer and execute it whenever you need it to perform its job.
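As an illustration, a very small script might look like the sketch below. The function names are made up for this example. The `if __name__ == "__main__":` check is a common convention that runs `main()` only when the file is executed directly, for instance with `python hello_files.py`, where `hello_files.py` is a hypothetical file name.

```python
def build_greeting(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


def main():
    # The instructions the script carries out when it is run.
    print(build_greeting("world"))


if __name__ == "__main__":
    main()
```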
A Python module is a file containing Python code that defines functions, variables, and classes used by other Python programs. Think of it like a toolbox that contains specific tools you can use to solve particular problems. By importing a module into your Python program, you gain access to all the functionality defined in the module without having to write it from scratch. Python has many built-in modules, such as "os" for interacting with the operating system, "datetime" for working with dates and times, and "math" for performing mathematical operations. Additionally, many third-party modules are available that you can install and use in your Python programs to extend their functionality. Utilizing modules allows you to write more efficient and organized code, since you don't have to reinvent the wheel whenever you need to solve a particular problem.
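For instance, the three modules named above can be imported and used in a few lines. This is only a sketch; the input values are arbitrary.

```python
import math
import os
from datetime import date

# math: mathematical operations
root = math.sqrt(16)

# os: operating system helpers, here splitting a file name from its extension
name, extension = os.path.splitext("ExampleDocument.docx")

# datetime: dates and times, here formatting a date as text
stamp = date(2024, 1, 31).isoformat()

print(root)
print(name, extension)
print(stamp)
```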
In Python, a package is a way of organizing related modules together in a hierarchical manner. Think of it like a folder containing multiple Python files, each defining functionality that can be used in your program. A package can include modules as well as other packages, making it a powerful way to organize and structure your Python code. Each package has a name, such as "numpy" for numerical computing or "pandas" for data analysis. By importing a package, or a specific module inside it, you gain access to the functionality it defines. This allows you to write more complex programs and leverage the power of existing Python libraries without having to write everything from scratch. Overall, Python packages are a key language feature that makes it easy to organize and reuse your code.
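To show the "folder of modules" idea, the sketch below builds a tiny throwaway package in a temporary folder and imports a module from it. The package name `shapes_demo` and its `area` module are invented for this example. In a real project you would create these files by hand rather than from code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A package is a folder; __init__.py marks it as a regular package.
    package_dir = Path(tmp) / "shapes_demo"  # hypothetical package name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("", encoding="utf-8")

    # A module inside the package is simply a .py file.
    (package_dir / "area.py").write_text(
        "def rectangle(width, height):\n"
        "    return width * height\n",
        encoding="utf-8",
    )

    # Let Python search the temporary folder for packages while we import.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        area = importlib.import_module("shapes_demo.area")
        result = area.rectangle(3, 4)
    finally:
        sys.path.remove(tmp)

print(result)
```

The dotted name `shapes_demo.area` mirrors the folder structure: the package folder comes first, followed by the module file inside it.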
In Python, a library is a collection of modules, packages, and functions that provide specific functionality for your Python programs. Think of it like a set of tools that you can use to solve various problems. Python has a vast and diverse library ecosystem, with thousands of third-party libraries available. These libraries cover various domains, from scientific computing to web development to machine learning. By importing a library into your Python program, you can leverage the functionality it provides to write more powerful and efficient code. Some popular Python libraries include NumPy for numerical computing, pandas for data analysis, and Matplotlib for data visualization. Overall, libraries are an essential part of the Python ecosystem, making it easier to solve complex problems and create sophisticated applications.
In summary, a script is a standalone program, a module is a single file that contains Python code, a package is a collection of related modules, and a library is a collection of code designed to be reused by multiple programs or scripts.