This post is a work in progress.
Inspired by a recent purchase of the Red Book, which provides a curated list of important papers on database systems, I’ve decided to begin assembling a list of important papers in distributed systems. As in the Red Book, I’ve grouped the papers into categories, each tracing a progression of related ideas over time within a specific area of research.
In keeping with the tradition of the Red Book, I’ve included both papers that resulted in very successful systems and techniques and papers that introduced a concept that was either immediately dismissed or later proven incorrect. This emphasizes the progression of ideas that led to the development of these systems.
Consensus
The problems of establishing consensus in a distributed system.
Consistency
Types of consistency, and practical solutions for ensuring atomic operations across a set of replicas.
- Highly Available Transactions: Virtues and Limitations Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica 2013
- Consistency Tradeoffs in Modern Distributed Database System Design Daniel J. Abadi 2012
- CAP Twelve Years Later: How the “Rules” Have Changed Eric Brewer 2012
- Calvin: Fast Distributed Transactions for Partitioned Database Systems Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi 2012
- Optimistic Replication Yasushi Saito and Marc Shapiro 2005
- Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Seth Gilbert, Nancy Lynch 2002
- Harvest, Yield, and Scalable Tolerant Systems Armando Fox, Eric A. Brewer 1999
- Linearizability: A Correctness Condition for Concurrent Objects Maurice P. Herlihy, Jeannette M. Wing 1990
- Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport 1978
Conflict-free data structures
Studies on data structures which do not require coordination to ensure convergence to the correct value.
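To make "convergence without coordination" a little more concrete, here is a minimal sketch of a grow-only counter, one of the simplest conflict-free data structures. It is my own illustration rather than code from any particular paper, and the names (`GCounter`, `replica_id`, `merge`) are hypothetical. It assumes each replica has a unique identifier and only ever increments its own slot.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""
    replica_id: str  # assumed unique per replica (hypothetical naming)
    counts: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas can exchange state in any order, any number of times.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas accept increments independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging again changes nothing
```

After the exchange both replicas report the same value, even though neither one waited on the other before accepting an increment.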
Distributed programming
Languages aimed towards disorderly distributed programming as well as case studies on problems in distributed programming.
- Logic and Lattices for Distributed Programming Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier 2012
- Dedalus: Datalog in Time and Space Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears 2011
- MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean, Sanjay Ghemawat 2004
- A Note On Distributed Computing Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant 1994
Systems
Implemented and theoretical distributed systems.
- Spanner: Google’s Globally-Distributed Database James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford 2012
- ZooKeeper: Wait-free coordination for Internet-scale systems Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed 2010
- A History Of The Virtual Synchrony Replication Model Ken Birman 2010
- Cassandra — A Decentralized Structured Storage System Avinash Lakshman, Prashant Malik 2009
- Dynamo: Amazon’s Highly Available Key-Value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels 2007
- Stasis: Flexible Transactional Storage Russell Sears, Eric Brewer 2006
- Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber 2006
- The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 2003
- Lessons from Giant-Scale Services Eric A. Brewer 2001
- Towards Robust Distributed Systems Eric A. Brewer 2000
- Cluster-Based Scalable Network Services Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier 1997
- The Process Group Approach to Reliable Distributed Computing Ken Birman 1993
Books
Overviews and details covering many of the above papers and concepts compiled into single resources.
I’m hoping to make this into a living document, so please submit pull requests or leave comments!