<p>In addition to classic replication using <i>createHistoryStream</i>, some Scuttlebutt clients implement a more efficient form of replication known as <i>Epidemic broadcast tree replication</i>. This is often referred to by the abbreviation <i>EBT</i>. The implementation of <i>EBT</i> used in Scuttlebutt is loosely based on the push-lazy-push multicast tree protocol, more commonly known as the <i>Plumtree</i> protocol [1].</p>
<p>An <i>EBT</i> session may be initiated once two peers have completed the secret handshake and have established their respective box streams. The peer who acted as the client during the secret handshake takes on the role of the requester, sending an <i>["ebt", "replicate"]</i> request to the connected peer.</p>
<p>The peer who acted as the server during the secret handshake takes on the role of the responder. After receiving the replicate request, the responder first validates the arguments to ensure that the version is 3 and the format is "classic". If either of those values is incorrect, the responder terminates the stream with an error.</p>
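<p>The argument check performed by the responder can be sketched as follows. This is a minimal illustration, assuming the request arguments have already been parsed from JSON into a dictionary with <i>version</i> and <i>format</i> keys; the function name and the error type are hypothetical and are not taken from any particular implementation.</p>

```python
from typing import Any, Dict


def validate_replicate_args(args: Dict[str, Any]) -> None:
    """Raise ValueError unless the args request version 3 in "classic" format.

    Hypothetical helper: the dictionary shape is an assumption about how the
    parsed request arguments are represented locally.
    """
    if args.get("version") != 3:
        raise ValueError("unsupported EBT version: %r" % (args.get("version"),))
    if args.get("format") != "classic":
        raise ValueError("unsupported EBT format: %r" % (args.get("format"),))


# A valid request passes silently; an invalid one raises, at which point the
# responder would terminate the stream with an error.
validate_replicate_args({"version": 3, "format": "classic"})
```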
<h4 id="vector-clocks">Vector Clocks</h4>
<p>The responder then sends a vector clock (also known as a "note" or "control message") to the requester. The vector clock takes the form of a JSON object with one or more key-value pairs. The key of each pair specifies a Scuttlebutt feed identified by the @-prefixed public key of the author. The value of each pair is a signed integer encoding a replicate flag, a receive flag and a feed sequence number.</p>
<p>The requester terminates the stream with an error if any of the received feed identifiers or encoded values are malformed. If the received vector clock is valid, the requester can proceed with decoding the values.</p>
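<p>A validation pass over a received vector clock might look like the sketch below. It assumes that a feed identifier consists of <i>@</i>, the base64 encoding of a 32-byte Ed25519 public key and the suffix <i>.ed25519</i>, and that a well-formed value is any JSON integer. Both helper names are hypothetical.</p>

```python
import base64
import binascii
from typing import Any, Dict

FEED_SUFFIX = ".ed25519"  # assumed suffix of classic feed identifiers


def is_valid_feed_id(feed_id: Any) -> bool:
    """Return True if feed_id looks like '@<base64 of 32 bytes>.ed25519'."""
    if not isinstance(feed_id, str):
        return False
    if not (feed_id.startswith("@") and feed_id.endswith(FEED_SUFFIX)):
        return False
    encoded = feed_id[1:-len(FEED_SUFFIX)]
    try:
        key = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(key) == 32


def validate_clock(clock: Dict[Any, Any]) -> None:
    """Raise ValueError if any feed identifier or encoded value is malformed."""
    for feed_id, value in clock.items():
        if not is_valid_feed_id(feed_id):
            raise ValueError("malformed feed identifier: %r" % (feed_id,))
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError("malformed clock value: %r" % (value,))
```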
<p>The value in each key-value pair of a vector clock encodes a maximum of three data points: a replicate flag, a receive flag and a sequence number. A negative value (usually -1) signals that the responder does not wish to replicate the associated feed, neither sending nor receiving messages. In this scenario, the replicate flag is set to false and both the receive flag and sequence number are irrelevant.</p>
<p>A non-negative value (zero or greater) signals that the responder wishes to replicate the associated feed. Such a value is decoded as follows. First, the JSON number is parsed and converted to a signed integer. Then, the rightmost (lowest order) bit of the number is interpreted as a binary flag with 0 equal to true and 1 equal to false. This flag is referred to as the receive flag. Next, a sign-extending right shift (also called arithmetic right shift) by 1 bit is performed on the binary number, discarding the rightmost (lowest order) bit. The remaining number is then interpreted as a sequence number for the associated feed.</p>
<p>If the receive flag is set to true, the peer who sent the vector clock wishes to receive messages for the associated feed. The decoded sequence number defines the latest message held by the peer for that feed.</p>
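<p>The decoding steps can be sketched in a few lines. The <i>ClockEntry</i> type and the function name are hypothetical; the "irrelevant" fields of a non-replicated feed are represented here as <i>None</i>, which is a choice made for this illustration only.</p>

```python
from typing import NamedTuple, Optional


class ClockEntry(NamedTuple):
    replicate: bool
    receive: Optional[bool]   # None when replicate is False
    sequence: Optional[int]   # None when replicate is False


def decode_clock_value(value: int) -> ClockEntry:
    """Decode one vector clock value into its three data points."""
    if value < 0:
        # Negative (usually -1): the sender does not replicate this feed.
        return ClockEntry(False, None, None)
    receive = (value & 1) == 0   # lowest bit: 0 means true, 1 means false
    sequence = value >> 1        # arithmetic right shift drops the flag bit
    return ClockEntry(True, receive, sequence)


print(decode_clock_value(450))  # replicate=True, receive=True, sequence=225
print(decode_clock_value(3))    # replicate=True, receive=False, sequence=1
print(decode_clock_value(-1))   # replicate=False
```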
<p>Encoding of a vector clock value involves reversing the steps outlined above. If the peer does not wish to replicate a feed, the value is simply set to -1. Otherwise, the latest sequence number of the associated feed is stored as a signed integer and an arithmetic left shift by 1 bit is performed. The rightmost (lowest order) bit is then set according to the receive flag as described previously (0 for true, 1 for false).</p>
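<p>A matching encoder, again as a minimal sketch with a hypothetical function name and parameter order:</p>

```python
def encode_clock_value(replicate: bool, receive: bool = True, sequence: int = 0) -> int:
    """Encode the replicate flag, receive flag and sequence number as one integer."""
    if not replicate:
        return -1  # receive flag and sequence number are irrelevant
    if sequence < 0:
        raise ValueError("sequence number must not be negative")
    flag_bit = 0 if receive else 1   # 0 means "wants to receive"
    return (sequence << 1) | flag_bit


print(encode_clock_value(True, True, 225))  # 450
print(encode_clock_value(True, False, 1))   # 3
print(encode_clock_value(False))            # -1
```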
<table class="clock-values">
<thead>
<tr>
<th rowspan="2">Encoded</th>
<th colspan="3">Decoded</th>
</tr>
<tr>
<th>Replicate flag</th>
<th>Receive flag</th>
<th>Sequence</th>
</tr>
</thead>
<tbody>
<tr>
<td>-1</td>
<td>False</td>
<td>Irrelevant</td>
<td>Irrelevant</td>
</tr>
<tr>
<td>0</td>
<td>True</td>
<td>True</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>True</td>
<td>False</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>True</td>
<td>True</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>True</td>
<td>False</td>
<td>1</td>
</tr>
<tr>
<td>12</td>
<td>True</td>
<td>True</td>
<td>6</td>
</tr>
<tr>
<td>450</td>
<td>True</td>
<td>True</td>
<td>225</td>
</tr>
</tbody>
</table>
<p>The requester then sends their own vector clock to the responder. At this point, the initial exchange of vector clocks is complete and both peers may begin sending messages at will. Updated vector clocks may continue to be sent by both peers at any point during the session. These updated clocks may reference a subset of the feeds represented in the initial vector clock, or they may reference different feeds entirely. This provides a means for responding to state changes in the local database and follow graph.</p>
<p>An <i>EBT</i> session may be terminated by either peer at any point, either by sending an error response or by closing the stream. If no error has occurred, the stream is closed when a peer wishes to conclude the session (as described in the <b>Source example</b> of the <b>RPC protocol</b> section above).</p>
<h4 id="request-skipping">Request Skipping</h4>
<p><i>EBT</i> implementations rely on a mechanism known as <i>request skipping</i> to lower bandwidth overhead and increase replication efficiency. Each peer stores the vector clocks they receive from remote peers; these may be held in memory and persisted to disk to allow later retrieval. When a subsequent <i>EBT</i> session is initiated between peers, each peer first checks the stored vector clock of their remote peer before calculating an updated vector clock to be sent. If the latest locally available sequence number for a feed is the same as the sequence number recorded for that feed in the stored vector clock of the remote peer, that feed is left out of the new vector clock in the outgoing request (hence the name <i>request skipping</i>). This limits the total number of bytes sent over the wire.</p>
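<p>The skipping rule can be sketched as a function that builds the outgoing vector clock. This is an illustration under simplifying assumptions: the local state is a plain mapping from feed identifier to latest local sequence number, the stored remote clock holds encoded values as received, and every included feed is requested with the receive flag set to true. The function name is hypothetical.</p>

```python
from typing import Dict


def build_outgoing_clock(
    local_sequences: Dict[str, int],
    stored_remote_clock: Dict[str, int],
) -> Dict[str, int]:
    """Return the clock to send, skipping feeds both sides already agree on."""
    outgoing: Dict[str, int] = {}
    for feed_id, local_seq in local_sequences.items():
        remote_value = stored_remote_clock.get(feed_id)
        if remote_value is not None and remote_value >= 0:
            remote_seq = remote_value >> 1  # drop the receive flag bit
            if remote_seq == local_seq:
                continue  # same sequence as last time: skip this feed
        outgoing[feed_id] = local_seq << 1  # lowest bit 0: wants to receive
    return outgoing


# "@alice" is unchanged since the stored clock was saved, so only "@bob"
# appears in the outgoing clock. Feed identifiers are shortened placeholders.
print(build_outgoing_clock({"@alice": 6, "@bob": 10}, {"@alice": 12, "@bob": 16}))
```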
<aside>
<p>Approaches to tracking EBT session state may be gleaned from the JS and Go implementations.</p>
</aside>
<p>The stored vector clock for the remote peer may differ from their current vector clock. In that case, the remote peer will include the updated feed in their request and the local peer will respond by sending an additional partial vector clock including their sequence for that feed. Once both sides have exchanged their sequence for a particular feed, replication of messages in that feed may occur.</p>
<p>In order to further increase efficiency when connecting to multiple peers, feeds for which the local peer would like to receive updates are only sent to one peer at a time (in the outbound vector clock). A timeout may be used to request the feed from an alternate peer if no updates are available from the initial peer. In this way, the total set of requested feeds is spread across multiple peers.</p>
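<p>One way to spread requested feeds across peers is sketched below. The class, its method names and the selection policy (fewest assigned feeds first, then any alternate peer after a timeout) are all assumptions made for illustration; existing implementations may track this state quite differently.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeedAssignments:
    timeout: float  # seconds without updates before trying another peer
    # feed id -> (peer id, time of assignment or last update)
    _assigned: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def peer_for(self, feed_id: str, peers: List[str], now: float) -> Optional[str]:
        """Return the single peer from which feed_id should be requested."""
        if not peers:
            return None
        current = self._assigned.get(feed_id)
        if current is not None:
            peer, since = current
            if peer in peers and now - since < self.timeout:
                return peer  # still within the timeout: keep this peer
            # Timed out or disconnected: prefer an alternate peer.
            others = [p for p in peers if p != peer]
            choice = others[0] if others else peers[0]
        else:
            # New feed: pick the connected peer with the fewest assigned feeds.
            counts = {p: 0 for p in peers}
            for assigned_peer, _ in self._assigned.values():
                if assigned_peer in counts:
                    counts[assigned_peer] += 1
            choice = min(peers, key=lambda p: counts[p])
        self._assigned[feed_id] = (choice, now)
        return choice

    def note_update(self, feed_id: str, now: float) -> None:
        """Record that an update for feed_id arrived, resetting its timeout."""
        if feed_id in self._assigned:
            self._assigned[feed_id] = (self._assigned[feed_id][0], now)
```

<p>Only the feeds assigned to a given peer would then be sent with the receive flag set in the outbound vector clock for that peer.</p>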
<p><i>EBT</i> is the preferred means for peers to exchange messages. However, not all Scuttlebutt clients support <i>EBT</i> replication. If only one of two connected peers supports <i>EBT</i>, both peers may instead fall back to using <i>createHistoryStream</i> to exchange messages. There are several scenarios which may trigger initiation of <i>createHistoryStream</i> replication:</p>
<ul>
<li>If the requester attempts to initiate an <i>EBT</i> session but the session is terminated with an error by the responder</li>
<li>If a <i>createHistoryStream</i> request is immediately sent by the client upon successful connection</li>
<li>If the client doesn’t attempt to initiate an <i>EBT</i> session for a certain amount of time</li>
</ul>
<p>[1] Joao Leitao, Jose Pereira and Luis Rodrigues. 2007. Epidemic Broadcast Trees. In <i>2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007)</i>, 301-310. https://doi.org/10.1109/SRDS.2007.27</p>