Input/Output, generally referred to as I/O, is a term that covers the ways that a computer interacts with the world. Screens, keyboards, files, and networks are all forms of I/O. Data from these devices is sent to and from programs as a stream of characters/bytes.
Unix-like systems treat all external devices as files. We can see these under the /dev directory. Read this list for a quick description of all the devices we might find under /dev for OS X.
For example (truncated for brevity):
$ tree /dev
/dev
├── disk0
├── fd
│   ├── 0
│   ├── 1
│   ├── 2
│   └── 3 [error opening dir]
├── null
├── stderr -> fd/2
├── stdin -> fd/0
├── stdout -> fd/1
├── tty
└── zero
I/O streams are located under the /dev/fd directory. Files there are given a number, known as a file descriptor. The operating system provides three streams by default. They are:
- Standard input (/dev/fd/0)
- Standard output (/dev/fd/1)
- Standard error (/dev/fd/2)
They are often abbreviated to stdin, stdout, and stderr respectively. Standard input defaults to reading from the keyboard, while standard output and standard error both default to writing to the terminal. As can be seen above, /dev/stdout, /dev/stdin, and /dev/stderr are just symlinks to the appropriate file descriptor.
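The descriptor numbers can also be checked from Ruby itself. A minimal sketch, using IO#fileno (which returns the integer file descriptor behind an IO object) on the STDIN, STDOUT, and STDERR constants introduced in the next section; the STANDARD_STREAMS constant is a name made up for this example:

```ruby
# Each standard stream is an IO object backed by a numbered file descriptor.
# STANDARD_STREAMS is a hypothetical constant used only for this illustration.
STANDARD_STREAMS = { stdin: STDIN, stdout: STDOUT, stderr: STDERR }.freeze

STANDARD_STREAMS.each do |name, io|
  puts "#{name} uses file descriptor #{io.fileno}"
end
```

The numbers printed should line up with the entries listed under /dev/fd above.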
The IO class
Ruby IO objects wrap Input/Output streams. The constants STDIN, STDOUT, and STDERR point to IO objects wrapping the standard streams. By default the global variables $stdin, $stdout, and $stderr point to their respective constants. While the constants should always point to the default streams, the globals can be overwritten to point to another I/O stream such as a file. IO objects can be written to via puts and print.
$stdout.puts 'Hello World'
We’ve all written the shorthand version of this program:
puts 'Hello World'
The bare puts method is provided by Ruby's Kernel module and simply calls $stdout.puts. Similarly, IO objects can be read from via gets. The bare gets provided by Kernel reads from $stdin when no file names are given in ARGV (otherwise it reads from those files).
$stdin is read-only while $stdout and $stderr are write-only.
[1] pry(main)> $stdin.puts 'foo'
IOError: not opened for writing
[2] pry(main)> $stdout.gets
IOError: not opened for reading
[3] pry(main)> $stderr.gets
IOError: not opened for reading
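Because the bare puts writes to whatever $stdout currently points to, reassigning the global redirects output without touching the STDOUT constant. A minimal sketch of that idea, assuming a StringIO (covered later in this article) as the replacement stream; capture_stdout is a hypothetical helper name, not something Ruby provides:

```ruby
require 'stringio'

# Temporarily point $stdout at another stream, run the block, then restore it.
# capture_stdout is a hypothetical helper name, not part of Ruby.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string            # everything written while the block ran
ensure
  $stdout = original        # always put the original stream back
end

captured = capture_stdout { puts 'Hello World' }

# STDOUT still points at the real standard output stream.
STDOUT.puts "captured: #{captured.inspect}"
```

Restoring the global in an ensure clause keeps later output working even if the block raises.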
To create a new IO object, we need a file descriptor. In this case, 1 (stdout).
[1] pry(main)> io = IO.new(1)
=> #<IO:fd 1>
[2] pry(main)> io.puts 'hello world'
hello world
=> nil
What about creating IOs to other streams? They don't have constant file descriptors, so we first need to get one via IO.sysopen.
[1] pry(main)> fd = IO.sysopen('/dev/null', 'w+')
=> 8
[2] pry(main)> dev_null = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> dev_null.puts 'hello'
=> nil
[4] pry(main)> dev_null.gets
=> nil
[5] pry(main)> dev_null.close
=> nil
/dev/null (sometimes referred to as the “bit bucket” or “black hole”) is the null device on Unix-like systems. Writing to it does nothing and attempting to read from it returns nothing (nil in Ruby).
First, we get a file descriptor for a read/write stream to the /dev/null device. Then we create an IO object for this stream so we can interact with it in Ruby. When writing to dev_null, the text no longer appears on the screen. When reading from dev_null, we get nil.
Since everything on a Unix-like system is a file, we can open an IO stream to a text file in the same way we would open a device. We just create a file descriptor with the path to our file and then create an IO object for that file descriptor. When we are done with it, we close the stream to flush Ruby’s buffer and release the file descriptor back to the operating system. Attempting to read from or write to a closed stream will raise an IOError.
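The steps above can be sketched end to end. This is an illustration under a few assumptions: round_trip is a hypothetical helper, the file lives in a throwaway directory created with Dir.mktmpdir, and the mode is passed to IO.new explicitly so it matches the descriptor.

```ruby
require 'tmpdir'

# Write a line through a raw file descriptor, read it back, then close.
# Returns the line that was read and the (now closed) IO object.
# round_trip is a hypothetical helper used only for illustration.
def round_trip(path)
  fd = IO.sysopen(path, 'w+')   # ask the OS for a read/write descriptor
  io = IO.new(fd, 'w+')         # wrap the descriptor in a Ruby IO object
  io.puts 'hello file'
  io.rewind                     # move back to the start before reading
  line = io.gets
  io.close                      # flush Ruby's buffer and release the descriptor
  [line, io]
end

Dir.mktmpdir do |dir|
  line, io = round_trip(File.join(dir, 'example.txt'))
  puts line

  begin
    io.gets                     # the stream is closed, so this raises
  rescue IOError => e
    puts "IOError: #{e.message}"
  end
end
```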
Position
When working with an IO, we have to keep position in mind. Given that we’ve opened a stream to the following file:
Lorem ipsum
dolor
sit amet...
and we call gets on it:
[1] pry(main)> IO.sysopen '/Users/joelquenneville/Desktop/lorem.txt'
=> 8
[2] pry(main)> lorem = IO.new(8)
=> #<IO:fd 8>
[3] pry(main)> lorem.gets
=> "Lorem ipsum\n"
it returns the first line of the file and moves the cursor to the next line. If we check the position of the cursor:
[4] pry(main)> lorem.pos
=> 12
If we call gets a few more times:
[5] pry(main)> lorem.gets
=> "dolor\n"
[6] pry(main)> lorem.gets
=> "sit amet...\n"
[7] pry(main)> lorem.pos
=> 30
we can see Ruby’s “cursor” has moved. Now that we have read the whole file, what happens if we try to call gets again?
[8] pry(main)> lorem.gets
=> nil
[9] pry(main)> lorem.eof?
=> true
We see that it returns nil. We can ask a stream if we have reached “end of file” via eof?. To return to the beginning of the stream, we can call rewind.
[10] pry(main)> lorem.rewind
=> 0
[11] pry(main)> lorem.pos
=> 0
Keeping track of the position matters even more when writing to a stream, where it can lead to surprises.
[1] pry(main)> fd = IO.sysopen '/Users/joelquenneville/Desktop/test.txt', 'w+'
=> 8
[2] pry(main)> io = IO.new(fd)
=> #<IO:fd 8>
[3] pry(main)> io.puts 'hello world'
=> nil
[4] pry(main)> io.puts 'goodbye world'
=> nil
This stream has the lines “hello world” and “goodbye world”. If we were to attempt to read:
[5] pry(main)> io.gets
=> nil
[6] pry(main)> io.eof?
=> true
Our cursor is currently at the end of the file. In order to read we would need to first rewind.
[7] pry(main)> io.rewind
=> 0
[8] pry(main)> io.gets
=> "hello world\n"
Any write operations in the middle of a stream will overwrite the existing data:
[9] pry(main)> io.pos
=> 12
[10] pry(main)> io.puts "middle"
=> nil
[11] pry(main)> io.rewind
=> 0
[12] pry(main)> io.read
=> "hello world\nmiddle\n world\n"
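One way to avoid overwriting data is to move the cursor to the end of the stream before writing. A minimal sketch, assuming a temporary file created with Tempfile.create as the stream; append_line is a hypothetical helper, not part of Ruby's IO API:

```ruby
require 'tempfile'

# Append to a read/write stream regardless of where the cursor currently is.
# append_line is a hypothetical helper, not part of Ruby's IO API.
def append_line(io, text)
  io.seek(0, IO::SEEK_END)   # jump to the end so nothing is overwritten
  io.puts text
end

Tempfile.create('position') do |io|
  io.puts 'hello world'
  io.puts 'goodbye world'
  io.rewind
  io.gets                    # cursor now sits after the first line
  append_line(io, 'middle')  # lands after "goodbye world", not on top of it
  io.rewind
  print io.read
end
```

Here "middle" ends up as the third line and "goodbye world" stays intact.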
This kind of behavior follows from the fact that streams are not loaded into memory all at once. Instead, only the data being operated on is read. This is very useful because some streams can point to very large files that would be expensive to load in memory in full. Streams can also be infinite. For example, $stdin has no end. We can always read more data from it (when it receives the message gets, it waits for the user to type something).
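Reading a stream one line at a time can be sketched with IO#each_line, which hands over a single line per iteration instead of the whole contents. The count_words helper below is hypothetical, and the sample text mirrors the lorem file used earlier:

```ruby
require 'tempfile'

# Count the words in a stream while holding only one line in memory at a time.
# count_words is a hypothetical helper used only for illustration.
def count_words(io)
  io.each_line.sum { |line| line.split.size }
end

Tempfile.create('lines') do |file|
  file.puts 'Lorem ipsum', 'dolor', 'sit amet...'
  file.rewind                # go back to the start before reading
  puts count_words(file)     # prints 5
end
```

The same method would work on a much larger file, since memory use depends on the longest line rather than the file size.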
Sub-classes and Duck-types
Ruby gives us a couple of subclasses of IO that are more specialized for a particular type of IO:
File
Probably the most well-known IO subclass. File allows us to read/write files without messing around with file descriptors. It also adds file-specific convenience methods such as File#size, File#chmod, and File.path.
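As a brief illustration, the block form of File.open hands us a File and closes it automatically when the block ends, so no descriptor handling is needed. The write_and_measure helper and the file name are made up for this sketch:

```ruby
require 'tmpdir'

# Write text to a file, then reopen it to read the first line and its size.
# write_and_measure is a hypothetical helper used only for illustration.
def write_and_measure(path, text)
  # The block form closes the file automatically, even if an error is raised.
  File.open(path, 'w') { |file| file.puts text }

  # With no mode given, File.open opens the file for reading.
  File.open(path) { |file| [file.gets, file.size] }
end

Dir.mktmpdir do |dir|
  line, size = write_and_measure(File.join(dir, 'greeting.txt'), 'hello world')
  puts line   # the first line of the file
  puts size   # size in bytes, including the trailing newline
end
```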
The Sockets
Socket docs:
Ruby’s various socket classes all ultimately inherit from IO.
For example, I have a server running on localhost:3000:
[1] pry(main)> require 'socket'
=> true
[2] pry(main)> socket = TCPSocket.new 'localhost', 3000
=> #<TCPSocket:fd 10>
[3] pry(main)> socket.puts 'GET "/"'
=> nil
[4] pry(main)> socket.gets
=> "HTTP/1.1 400 Bad Request \r\n"
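The session above needs a server listening on port 3000. For a version that runs on its own, the sketch below swaps in UNIXSocket.pair, which returns two already-connected sockets on Unix-like systems. This is a substitution for illustration, not what the session above uses, and ping_pong is a hypothetical helper:

```ruby
require 'socket'

# Exchange one line in each direction over a pair of connected sockets.
# ping_pong is a hypothetical helper used only for illustration.
def ping_pong
  client, server = UNIXSocket.pair   # two connected sockets, no server needed

  client.puts 'ping'                 # the same puts used for any IO
  request = server.gets              # the same gets used for any IO
  server.puts "pong (got #{request.chomp})"
  client.gets
ensure
  client&.close
  server&.close
end

puts ping_pong
puts UNIXSocket.ancestors.include?(IO)   # sockets are IO objects
```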
StringIO
StringIO allows strings to behave like IOs. This is useful when we want to pass strings into systems that consume streams. This is common in tests where we might inject a StringIO instead of reading an actual file from disk. Unlike previous classes showcased, StringIO does not inherit from IO.
[1] pry(main)> string_io = StringIO.new('hello world')
=> #<StringIO:0x007feacb0cd4e8>
[2] pry(main)> string_io.gets
=> "hello world"
[3] pry(main)> string_io.puts 'goodbye world'
=> nil
[4] pry(main)> string_io.rewind
=> 0
[5] pry(main)> string_io.read
=> "hello worldgoodbye world\n"
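To show what passing strings into systems that consume streams might look like, here is a small sketch. The shout method is hypothetical; it only relies on each_line and puts, so it would accept a File or a socket just as well as the StringIO objects used here:

```ruby
require 'stringio'

# Copy every line from input to output in upper case.
# shout is a hypothetical method; it relies only on each_line and puts.
def shout(input, output)
  input.each_line { |line| output.puts line.chomp.upcase }
end

input  = StringIO.new("hello world\ngoodbye world\n")
output = StringIO.new

shout(input, output)
print output.string   # StringIO#string exposes everything written so far
```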
Tempfile
Tempfile is another class that doesn’t inherit from IO. Instead, it implements File’s interface and deals with temporary files. As such, it can be passed to any object that consumes IO-like objects.
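A short sketch of a Tempfile standing in for an IO. The first_line method is a hypothetical consumer and the CSV-looking contents are invented for the example:

```ruby
require 'tempfile'

# first_line is a hypothetical consumer that relies only on IO-like methods.
def first_line(io)
  io.rewind
  io.gets
end

tempfile = Tempfile.new('report')
begin
  tempfile.puts 'id,total'
  tempfile.puts '1,42'

  puts first_line(tempfile)             # works like any other stream
  puts Tempfile.ancestors.include?(IO)  # false: Tempfile delegates to a File
ensure
  tempfile.close
  tempfile.unlink                       # delete the temporary file from disk
end
```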
Putting it all together
Say we have the following class for some command-line program:
class SystemTask
  def execute
    puts "preparing to execute"
    puts "starting first task"
    first_task
    puts "starting second task"
    second_task
    puts "execution complete"
  end
end
Testing this class causes all these messages to be output, cluttering our results. One approach to solving this problem would be to inject IO objects instead of calling Kernel#puts and to pass in a null object in tests.
class SystemTask
  def initialize(io=$stdout)
    @io = io
  end

  def execute
    @io.puts "preparing to execute"
    @io.puts "starting first task"
    first_task
    @io.puts "starting second task"
    second_task
    @io.puts "execution complete"
  end
end
In production, we can still call SystemTask.new.execute as before. Now we can pass in our own IO in tests. This could be a test double, a StringIO, or a stream to /dev/null:
describe SystemTask do
  # test double
  it "executes tasks" do
    io = double("io", puts: nil)
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # if we care about the messages, we can also expect on the double
    expect(io).to have_received(:puts).with("preparing to execute")
  end

  # StringIO
  it "executes tasks" do
    io = StringIO.new
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # if we care about the messages read from the string io
    io.rewind
    expect(io.read).to eq "preparing to execute\nstarting first task\nstarting second task\nexecution complete\n"
  end

  # /dev/null
  it "executes tasks" do
    io = File.open(File::NULL, 'w')
    system_task = SystemTask.new(io)
    system_task.execute
    # expect things to have happened
    # only use /dev/null if we don't care about the messages
  end
end
Working with disparate APIs
While working on a recent project that pulled reports from several APIs, we noticed that the responses came in different shapes: some were strings, others were CSV documents, and others only generated the report, so we had to make a request to another endpoint to download it.
The solution was to create an adapter for each API that would get the data and return it in a standard format wrapped in some type of IO-like object. A persistor object could then process and persist any of the reports as long as they were formatted the same way and were IO-like. For example:
class API1Report
  def fetch
    # fetch report (comes down as a CSV doc)
    # process it to get it in a standard format
    # return standardized report as a Tempfile object
  end
end

class API2Report
  def fetch
    # fetch report
    # returns it as a File object
  end
end

class Persistor
  def initialize(report)
    @report = report
  end

  def persist
    # process and persist the report
  end
end
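To make the outline above concrete, here is one possible runnable sketch. Everything in it is an assumption made for illustration: the adapter names (InlineReport, DownloadedReport), the "name,total" line format, the hard-coded data that stands in for real API calls, and a Persistor that simply collects rows in memory rather than writing to a database.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical adapters: each hides how its API delivers data and returns
# an IO-like object containing "name,total" lines.
class InlineReport
  def fetch
    StringIO.new("alpha,1\nbeta,2\n")   # stands in for a string response
  end
end

class DownloadedReport
  def fetch
    file = Tempfile.new('report')       # stands in for a downloaded document
    file.puts 'gamma,3'
    file.rewind                         # leave the cursor at the start for the reader
    file
  end
end

# The persistor depends only on each_line and close, not on the data's origin.
class Persistor
  def initialize(report)
    @report = report
  end

  # Parses every line into a hash; a real version would save the rows somewhere.
  def persist
    @report.each_line.map do |line|
      name, total = line.chomp.split(',')
      { name: name, total: Integer(total) }
    end
  ensure
    @report.close
  end
end

[InlineReport.new, DownloadedReport.new].each do |adapter|
  p Persistor.new(adapter.fetch).persist
end
```

Because the persistor only calls each_line and close, adding another API would mean writing one more adapter and leaving the persistor untouched.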
What’s next
Read an overview of 4.4 BSD’s I/O to develop a deeper understanding of Unix I/O, file descriptors, and devices.
Read the TTY system to understand the relationship between Unix jobs, processes, and I/O with the TTY device.
Practice Ruby I/O by cloning this repo.
Finally, go deeper into Ruby’s I/O in this chapter from Read Ruby.