My reviewers pointed out the "Bootstrapping an Infrastructure in 2025" article could use some clarification.
The first stage of setting up a cluster has four parts:
Version Control - CVS, track who made changes, backout
Gold Server - only require changes in one place
Host Install Tools - install hosts without human intervention
Ad Hoc Change Tools - 'expect', to recover from early or big problems
"Version Control" these days is Git.
"Host Install Tools" are tools that set up a newly booted computer with a base operating system, so it can become a functioning member of the cluster. In other words, PXE. In the cloud world, the closest equivalents are AMIs, Packer-built images, or Docker images.
A "Gold Server" is a server that's central to managing the cluster. Instead of making changes to each individual service machine, an admin registers the change centrally, then lets the cluster make the change happen.
"Ad Hoc Change Tools" today means ssh: manual changes made outside the standard path. Ad hoc changes are flexible but dangerous.
When the paper was written, computers were individual little snowflakes. To fix a database server, you'd ssh into the server, figure out what was wrong, then run commands or edit files on the server to fix the issue. This method is fun, effective, and flexible, but it breaks down almost instantly. You don't remember what you changed. Other people change things too, and they also forget. The system doesn't crash outright; it mostly works, which is worse. Every so often it acts really strangely, and tracking down the cause takes an enormous amount of effort.
The Bootstrapping paper recommends another way to make changes:
1) set up a change on the central "gold" server. Example:
database servers should have "postgres" process running
2) from the gold server, trigger some or all other servers to check for changes
3) when a database server checks the central server, it'll find the "make sure postgres is running" change, and execute that change.
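Here's a rough sketch of those three steps in Python. It is only a toy model, not how any real configuration tool works: the `GoldServer` and `Host` classes, the role names, and the host names are all made up for illustration, and "starting" a process just means adding its name to a set.

```python
from dataclasses import dataclass, field


@dataclass
class GoldServer:
    """Central place where changes are registered (hypothetical model)."""
    # Maps a role such as "database" to the processes that role should run.
    desired: dict[str, set[str]] = field(default_factory=dict)

    def register(self, role: str, process: str) -> None:
        self.desired.setdefault(role, set()).add(process)

    def changes_for(self, role: str) -> set[str]:
        return set(self.desired.get(role, set()))


@dataclass
class Host:
    name: str
    role: str
    running: set[str] = field(default_factory=set)

    def check_in(self, gold: GoldServer) -> list[str]:
        """Ask the gold server what should be running; start what is missing."""
        missing = sorted(gold.changes_for(self.role) - self.running)
        for process in missing:
            self.running.add(process)  # stand-in for really starting the process
        return missing


gold = GoldServer()
gold.register("database", "postgres")  # step 1: register the change centrally

hosts = [Host("db1", "database"), Host("web1", "web")]
# Steps 2 and 3: each host checks in and applies whatever matches its role.
started = {h.name: h.check_in(gold) for h in hosts}
print(started)  # {'db1': ['postgres'], 'web1': []}
```

Checking in a second time does nothing, because the host already matches what the gold server asked for. That property is what makes it safe to run the check over and over.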
This has a lot of advantages. The major one is "eventual consistency". Changes eventually make it out to all the correct machines.
In a medium or large cluster, changes fail very often. A server isn't up, or is too busy, or something else is going on, so a centrally pushed change reaches only a subset of servers.
In the "pull" style, each server periodically polls the central gold server for changes. Changes set up once, in the central server, eventually are applied to the appropriate machines.
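To make the difference concrete, here's a small simulation. The host names and the availability schedule are invented, and I'm assuming the push is attempted exactly once; a real push tool could retry, but then someone has to track which hosts were missed.

```python
CHANGE = "make sure postgres is running"

# Hypothetical availability: which hosts are reachable in each polling interval.
schedule = [
    {"db1": True, "db2": False, "db3": True},
    {"db1": True, "db2": True, "db3": False},
    {"db1": False, "db2": True, "db3": True},
]
hosts = ["db1", "db2", "db3"]

# Push: the central server tries once, at the moment the change is made.
pushed = {h for h in hosts if schedule[0][h]}

# Pull: every host that is up polls the gold server once per interval.
gold_changes = {CHANGE}
pulled = {h: set() for h in hosts}
for interval in schedule:
    for h in hosts:
        if interval[h]:
            pulled[h] |= gold_changes

push_missed = sorted(set(hosts) - pushed)
pull_missed = sorted(h for h in hosts if CHANGE not in pulled[h])
print("push missed:", push_missed)  # push missed: ['db2']
print("pull missed:", pull_missed)  # pull missed: []
```

db2 was down when the change was pushed, so the push never reached it. With polling, db2 picks up the change in the next interval it's up, without anyone having to remember that it was missed.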