Merged
105 changes: 70 additions & 35 deletions Python/Module5_OddsAndEnds/WorkingWithFiles.ipynb
@@ -5,7 +5,7 @@
"metadata": {},
"source": [
"# Working with Files\n",
"This section will discuss the best practices for writing Python code that involves reading from and writing to files. We will learn about the built-in `pathlib.Path` object, which will help to ensure that the code that we write is portable across operating systems (OS) (i.e. Windows vs MacOS vs Linux). We will also be introduced to a so-called context manager, which will permit us to read-from and write-to a file safely; by \"safely\" we mean that we will be assured that any file that we open will eventually be closed properly, so that it will not be corrupted even in the event that our code hits an error."
"This section will discuss the best practices for writing Python code that involves reading from and writing to files. We will learn about the built-in `pathlib.Path` object, which will help to ensure that the code that we write is portable across operating systems (OS) (e.g. Windows, MacOS, Linux). We will also be introduced to a *context manager*, `open`, which will permit us to read-from and write-to a file safely; by \"safely\" we mean that we will be assured that any file that we open will eventually be closed properly, so that it will not be corrupted even in the event that our code hits an error. Lastly, we will briefly encounter the `pickle` module which allows you to save (or \"pickle\") and load Python objects to and from your computer's file system. "
Collaborator
For consistent voice, this should say "which allows us to save (or "pickle") and load Python objects to and from our computer's file system"

]
},
{
@@ -26,7 +26,7 @@
"\n",
"### pathlib.Path\n",
"\n",
"The built-in [pathlib module](https://docs.python.org/3/library/pathlib.html) provides a number of classes that make it easy to work with file system paths across operating systems. We will limit our discussion to the `pathlib.Path` class, which will take care of all of our most pressing needs. This class allows us to write all of our path-related code in a single way, and it will convert the path to the operating system-appropriate format for us underneath the hood.\n",
"The standard library's [pathlib module](https://docs.python.org/3/library/pathlib.html) provides a number of classes that make it easy to work with file system paths across operating systems. We will limit our discussion to the `pathlib.Path` class, which will take care of all of our most pressing needs. This class allows us to write all of our path-related code in a single way, and it will convert the path to the operating system-appropriate format for us underneath the hood.\n",
"\n",
"Let's begin by creating a `Path` object that points to the directory containing the present notebook:\n",
"\n",
@@ -46,7 +46,7 @@
"If I were running on a Linux or MacOS machine, it would have formed a `PosixPath` object instead. Fortunately, we need not worry about these details as these classes handle them for us! The `Path` class has many useful methods for us to leverage. First, see that it conveniently overrides the `/` operator (by implementing a [special method](http://www.pythonlikeyoumeanit.com/Module4_OOP/Special_Methods.html)) so that we can create a path to a subsequent directory. Let's see this in action:\n",
"\n",
"```python\n",
"# creating a path to a file in a subdirectory\n",
"# creating a path to the file 'data1.txt' in the subdirectory 'data'\n",
">>> path_to_data1 = root / \"data\" / \"data1.txt\"\n",
">>> path_to_data1\n",
"WindowsPath('data/data1.txt')\n",
@@ -87,10 +87,13 @@
"# convert a path-object to a string formatted for the present OS\n",
">>> str(path_to_data1)\n",
"'data\\\\data1.txt'\n",
"```\n",
"\n",
"\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"**Takeaway**: \n",
@@ -127,7 +130,9 @@
"```python\n",
"# demonstrating the use of the `open` context manager\n",
"\n",
"path_to_file = Path(\".\") / \"file1.txt\"\n",
"# we will write to the file named \"file1.txt\", located \n",
"# in the present directory\n",
"path_to_file = Path(\"file1.txt\")\n",
Collaborator
yeah...I had originally wondered why you wanted to do ./file1.txt but I figured you had your reasons 😛
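
For reference, the two spellings appear to be interchangeable here, since `pathlib` drops the `.` component when paths are joined. A minimal check, assuming only the file name used in this hunk (the variable names `with_dot` and `without_dot` are illustrative and not part of the notebook):

```python
from pathlib import Path

# joining onto Path(".") collapses the "." component,
# so both spellings describe the same relative path
with_dot = Path(".") / "file1.txt"
without_dot = Path("file1.txt")

print(with_dot == without_dot)  # True
print(str(with_dot))            # file1.txt
```

So the shorter `Path("file1.txt")` loses nothing, and it reads more directly as "a file in the present directory".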

"with open(path_to_file, mode=\"w\") as f:\n",
" # The indented space enters the \"context\" of the open file.\n",
" # Leaving the indented space exist the context of the opened file, forcing\n",
@@ -176,7 +181,7 @@
"```\n",
"\n",
"### Working with the File Object\n",
"When we invoke `open` to open a file, we gain access to an opened file object. The methods of this file object allow us to write-to and read-from the opened file (assuming that we have utilized the appropriate mode when opening it).\n",
"When we invoke `open` to open a file, the context manager produces an opened file object. The methods of this file object allow us to write-to and read-from the opened file (assuming that we have utilized the appropriate mode when opening it).\n",
"\n",
"```python\n",
"# demonstrating the `read` method of the file object\n",
@@ -230,7 +235,7 @@
" my_open_file.write(some_text)\n",
"```\n",
"\n",
"Now read in each line of the file and append that line to the list `out`, but *only if that line starts with the letter 'A'*:\n",
"Now let's read in each line of the file and append that line to the list `out`, but *only if that line starts with the letter 'A'* (just to make things a little bit more involved):\n",
Collaborator
maybe say "append them to the list out..." instead? Read in each line and append that line sounds weird.
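
If it helps with settling the wording, the exercise amounts to collecting the lines that start with 'A' into `out`. A minimal sketch, assuming placeholder text rather than the notebook's actual poem, and writing to a temporary directory so that nothing is left on disk (`some_text` and `path_to_poem` are stand-ins chosen for this example):

```python
import tempfile
from pathlib import Path

# hypothetical stand-in for the notebook's poem text
some_text = "A bird sang.\nThe wind blew.\nA leaf fell.\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path_to_poem = Path(tmp_dir) / "a_poem.txt"

    with open(path_to_poem, mode="w") as my_open_file:
        my_open_file.write(some_text)

    # iterating over the file object yields one line at a time;
    # we keep only the lines that start with the letter 'A'
    with open(path_to_poem, mode="r") as my_open_file:
        out = [line for line in my_open_file if line.startswith("A")]

print(out)  # ['A bird sang.\n', 'A leaf fell.\n']
```

Each kept line retains its trailing newline character, which may be worth a brief mention in the text.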

"\n",
"```python\n",
"with open(\"a_poem.txt\", mode=\"r\") as my_open_file:\n",
@@ -250,54 +255,82 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example: Writing and Reading NumPy Arrays\n",
"First, note that NumPy's standard binary file type, used to store array data, is known as '.npy' file. The NumPy binary archive format, which stores multiple arrays in one file, is known as the '.npz' format.\n",
"\n",
"Let's save the array `x = np.array([1, 2, 3])` to the binary file (not a text file) \"my_array.npz\".\n",
"# Saving & Loading Python Objects: pickle\n",
"Suppose that you have just populated a dictionary that is serving as a grade book for a course that you are teaching:\n",
"```python\n",
">>> grades = {\"Albert\": 92, \"David\": 85, \"Emmy\": 98, \"Marie\": 79} \n",
"```\n",
"How do you save this dictionary so that you can revisit these grades at a later time? Python's standard library includes the [`pickle`](https://docs.python.org/3/library/pickle.html) module, which provides functions for saving and loading Python objects to disk. Let's \"pickle\" this dictionary, saving it to the file \"grades.pkl\" in our present directory:\n",
"\n",
"```python\n",
"import numpy as np\n",
"x = np.array([1, 2, 3])\n",
"import pickle\n",
"\n",
"# we aren't saving text thus we must open \n",
"# our file in binary-write mode\n",
"with open(\"my_array.npy\", mode=\"wb\") as f:\n",
" np.save(f, x)\n",
"# pickling a dictionary\n",
"with open(\"grades.pkl\", mode=\"wb\") as opened_file:\n",
" pickle.dump(grades, opened_file)\n",
"```\n",
"`pickle.dump` creates a serialized representation of our dictionary, which is then written to our opened file via the file object that we supplied. Note that we open the file in write-binary mode as we are writing binary data and not text data that first needs to be encoded to binary data. Also note that we use the \".pkl\" suffix to indicate that the file is binary data that was written using Python's pickle protocol. Using this suffix is not necessary but is good practice.\n",
Collaborator
Do you need to explain what a serialized representation is?
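
One possible way to make "serialized representation" concrete without a long detour: `pickle.dumps` returns the byte string directly, and `pickle.loads` rebuilds the object from it. A minimal sketch, reusing the `grades` dictionary from this hunk (the names `serialized` and `restored` are illustrative only):

```python
import pickle

grades = {"Albert": 92, "David": 85, "Emmy": 98, "Marie": 79}

# pickle.dumps returns the serialized representation as a bytes object;
# pickle.dump writes this kind of byte stream to an opened file instead
serialized = pickle.dumps(grades)
print(type(serialized))  # <class 'bytes'>

# pickle.loads rebuilds an equivalent dictionary from those bytes
restored = pickle.loads(serialized)
print(restored == grades)  # True
```

A one-sentence gloss along these lines ("a sequence of bytes from which the object can be reconstructed") may be enough for the notebook text.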

"\n",
"Now let's load this array from the binary file:\n",
"`pickle.load` will unpickle our Python object from disk, permitting us to resume work with our grade book.\n",
"\n",
"```python\n",
"# once again, the file must be opened to read\n",
"# binary data, not text data\n",
"with open(\"my_array.npy\", mode=\"rb\") as f:\n",
" y = np.load(f)\n",
"# unpickling a dictionary\n",
"with open(\"grades.pkl\", mode=\"rb\") as opened_file:\n",
" my_loaded_grades = pickle.load(opened_file)\n",
"```\n",
"\n",
"```python\n",
">>> my_loaded_grades\n",
"{'Albert': 92, 'David': 85, 'Emmy': 98, 'Marie': 79}\n",
"```\n",
"\n",
"`pickle.dump` and `pickle.load` cover the vast majority of our object-pickling needs. A wide range of Python objects can be saved in this way, including functions that we define and instances of custom classes. Please refer to [the official documentation](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled) for a discussion of the Python objects that can and cannot be pickled. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving and Loading NumPy Arrays\n",
"NumPy provides its own functions for saving and loading arrays. Although these arrays can be pickled, it is strongly advised to leverage NumPy's file-IO functions. NumPy's standard binary file type used to store array data is known as an '.npy' file. The NumPy binary archive format, which stores multiple arrays in one file, is known as the '.npz' format.\n",
"\n",
"Let's save the array `x = np.array([1, 2, 3])` to the binary file (not a text file) \"my_array.npz\". `numpy.save` and `numpy.load` will save and load arrays, handling all of the file opening and closing for you. Thus there is no need to use a context manager when using these functions.\n",
"\n",
"```python\n",
">>> import numpy as np\n",
">>> x = np.array([1, 2, 3])\n",
"\n",
"# save a numpy array to disk\n",
">>> np.save(\"my_array.npy\", x)\n",
"\n",
"# load the saved array from disk\n",
">>> y = np.load(\"my_array.npy\")\n",
"\n",
">>> y\n",
"array([1, 2, 3])\n",
"```\n",
"\n",
"Finally, let's save multiple arrays to a single archive file \"my_archive.npz\".\n",
"We can use `numpy.savez` to save multiple arrays to a single archive file \"my_archive.npz\". Here we will save three arrays to the archive. We can specify the names of these arrays, via the keyword arguments that we provide, so that we can distinguish them when loading the archive.\n",
"\n",
"```python\n",
"# save three arrays to a numpy archive file\n",
"a0 = np.array([1, 2, 3])\n",
"a1 = np.array([4, 5, 6])\n",
"a2 = np.array([7, 8, 9])\n",
"\n",
"with open(\"my_archive.npz\", mode=\"wb\") as my_archive_file:\n",
" # you can use keyword arguments to specify the names used\n",
" # to store the array in the archive\n",
" np.savez(my_archive_file, array_0=a0, array_1=a1, array_2=a2)\n",
"# we provide the keywords arguments `soil`, `crust`, and `bedrock`,\n",
"# as the names of the respective arrays in the archive.\n",
"np.savez(\"my_archive.npz\", soil=a0, crust=a1, bedrock=a2)\n",
"```\n",
"\n",
"`np.load` can be used as a context manager in lieu of `open`. The file-object that it produces is our archive of numpy arrays, and it provides a dictionary-like interface for accessing these arrays:\n",
"Loading arrays from an archive is slightly more involved than loading a single array; we will want to open our archive file using a context manager and then load the arrays as we see fit. `np.load` can be used as a context manager in lieu of `open`. The file-object that it produces is our archive of numpy arrays, and it provides a dictionary-like interface for accessing these arrays:\n",
"\n",
"```python\n",
"# opening the archive and accessing each array by name\n",
"with np.load(\"my_archive.npz\") as my_archive_file:\n",
" out0 = my_archive_file[\"array_0\"]\n",
" out1 = my_archive_file[\"array_1\"]\n",
" out2 = my_archive_file[\"array_2\"]\n",
" out0 = my_archive_file[\"soil\"]\n",
" out1 = my_archive_file[\"crust\"]\n",
" out2 = my_archive_file[\"bedrock\"]\n",
"```\n",
"```python\n",
">>> out0\n",
@@ -317,7 +350,9 @@
"\n",
"- [The 'pathlib' module](https://docs.python.org/3/library/pathlib.html)\n",
"- [The 'open' function](https://docs.python.org/3/library/functions.html#open)\n",
"- [Official tutorial: reading and writing files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)"
"- [Official tutorial: reading and writing files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)\n",
"- [The `pickle` module](https://docs.python.org/3/library/pickle.html)\n",
" - [What can and cannot be pickled?](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled)"
]
}
],