1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
3 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
5<head>
6<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
7<meta name="generator" content="AsciiDoc 10.2.0" />
8<title>Bundle URIs</title>
9<style type="text/css">
10/* Shared CSS for AsciiDoc xhtml11 and html5 backends */
11
12/* Default font. */
13body {
14 font-family: Georgia,serif;
15}
16
17/* Title font. */
18h1, h2, h3, h4, h5, h6,
19div.title, caption.title,
20thead, p.table.header,
21#toctitle,
22#author, #revnumber, #revdate, #revremark,
23#footer {
24 font-family: Arial,Helvetica,sans-serif;
25}
26
27body {
28 margin: 1em 5% 1em 5%;
29}
30
31a {
32 color: blue;
33 text-decoration: underline;
34}
35a:visited {
36 color: fuchsia;
37}
38
39em {
40 font-style: italic;
41 color: navy;
42}
43
44strong {
45 font-weight: bold;
46 color: #083194;
47}
48
49h1, h2, h3, h4, h5, h6 {
50 color: #527bbd;
51 margin-top: 1.2em;
52 margin-bottom: 0.5em;
53 line-height: 1.3;
54}
55
56h1, h2, h3 {
57 border-bottom: 2px solid silver;
58}
59h2 {
60 padding-top: 0.5em;
61}
62h3 {
63 float: left;
64}
65h3 + * {
66 clear: left;
67}
68h5 {
69 font-size: 1.0em;
70}
71
72div.sectionbody {
73 margin-left: 0;
74}
75
76hr {
77 border: 1px solid silver;
78}
79
80p {
81 margin-top: 0.5em;
82 margin-bottom: 0.5em;
83}
84
85ul, ol, li > p {
86 margin-top: 0;
87}
88ul > li { color: #aaa; }
89ul > li > * { color: black; }
90
91.monospaced, code, pre {
92 font-family: "Courier New", Courier, monospace;
93 font-size: inherit;
94 color: navy;
95 padding: 0;
96 margin: 0;
97}
98pre {
99 white-space: pre-wrap;
100}
101
102#author {
103 color: #527bbd;
104 font-weight: bold;
105 font-size: 1.1em;
106}
107#email {
108}
109#revnumber, #revdate, #revremark {
110}
111
112#footer {
113 font-size: small;
114 border-top: 2px solid silver;
115 padding-top: 0.5em;
116 margin-top: 4.0em;
117}
118#footer-text {
119 float: left;
120 padding-bottom: 0.5em;
121}
122#footer-badges {
123 float: right;
124 padding-bottom: 0.5em;
125}
126
127#preamble {
128 margin-top: 1.5em;
129 margin-bottom: 1.5em;
130}
131div.imageblock, div.exampleblock, div.verseblock,
132div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
133div.admonitionblock {
134 margin-top: 1.0em;
135 margin-bottom: 1.5em;
136}
137div.admonitionblock {
138 margin-top: 2.0em;
139 margin-bottom: 2.0em;
140 margin-right: 10%;
141 color: #606060;
142}
143
144div.content { /* Block element content. */
145 padding: 0;
146}
147
148/* Block element titles. */
149div.title, caption.title {
150 color: #527bbd;
151 font-weight: bold;
152 text-align: left;
153 margin-top: 1.0em;
154 margin-bottom: 0.5em;
155}
156div.title + * {
157 margin-top: 0;
158}
159
160td div.title:first-child {
161 margin-top: 0.0em;
162}
163div.content div.title:first-child {
164 margin-top: 0.0em;
165}
166div.content + div.title {
167 margin-top: 0.0em;
168}
169
170div.sidebarblock > div.content {
171 background: #ffffee;
172 border: 1px solid #dddddd;
173 border-left: 4px solid #f0f0f0;
174 padding: 0.5em;
175}
176
177div.listingblock > div.content {
178 border: 1px solid #dddddd;
179 border-left: 5px solid #f0f0f0;
180 background: #f8f8f8;
181 padding: 0.5em;
182}
183
184div.quoteblock, div.verseblock {
185 padding-left: 1.0em;
186 margin-left: 1.0em;
187 margin-right: 10%;
188 border-left: 5px solid #f0f0f0;
189 color: #888;
190}
191
192div.quoteblock > div.attribution {
193 padding-top: 0.5em;
194 text-align: right;
195}
196
197div.verseblock > pre.content {
198 font-family: inherit;
199 font-size: inherit;
200}
201div.verseblock > div.attribution {
202 padding-top: 0.75em;
203 text-align: left;
204}
205/* DEPRECATED: Pre version 8.2.7 verse style literal block. */
206div.verseblock + div.attribution {
207 text-align: left;
208}
209
210div.admonitionblock .icon {
211 vertical-align: top;
212 font-size: 1.1em;
213 font-weight: bold;
214 text-decoration: underline;
215 color: #527bbd;
216 padding-right: 0.5em;
217}
218div.admonitionblock td.content {
219 padding-left: 0.5em;
220 border-left: 3px solid #dddddd;
221}
222
223div.exampleblock > div.content {
224 border-left: 3px solid #dddddd;
225 padding-left: 0.5em;
226}
227
228div.imageblock div.content { padding-left: 0; }
229span.image img { border-style: none; vertical-align: text-bottom; }
230a.image:visited { color: white; }
231
232dl {
233 margin-top: 0.8em;
234 margin-bottom: 0.8em;
235}
236dt {
237 margin-top: 0.5em;
238 margin-bottom: 0;
239 font-style: normal;
240 color: navy;
241}
242dd > *:first-child {
243 margin-top: 0.1em;
244}
245
246ul, ol {
247 list-style-position: outside;
248}
249ol.arabic {
250 list-style-type: decimal;
251}
252ol.loweralpha {
253 list-style-type: lower-alpha;
254}
255ol.upperalpha {
256 list-style-type: upper-alpha;
257}
258ol.lowerroman {
259 list-style-type: lower-roman;
260}
261ol.upperroman {
262 list-style-type: upper-roman;
263}
264
265div.compact ul, div.compact ol,
266div.compact p, div.compact p,
267div.compact div, div.compact div {
268 margin-top: 0.1em;
269 margin-bottom: 0.1em;
270}
271
272tfoot {
273 font-weight: bold;
274}
275td > div.verse {
276 white-space: pre;
277}
278
279div.hdlist {
280 margin-top: 0.8em;
281 margin-bottom: 0.8em;
282}
283div.hdlist tr {
284 padding-bottom: 15px;
285}
286dt.hdlist1.strong, td.hdlist1.strong {
287 font-weight: bold;
288}
289td.hdlist1 {
290 vertical-align: top;
291 font-style: normal;
292 padding-right: 0.8em;
293 color: navy;
294}
295td.hdlist2 {
296 vertical-align: top;
297}
298div.hdlist.compact tr {
299 margin: 0;
300 padding-bottom: 0;
301}
302
303.comment {
304 background: yellow;
305}
306
307.footnote, .footnoteref {
308 font-size: 0.8em;
309}
310
311span.footnote, span.footnoteref {
312 vertical-align: super;
313}
314
315#footnotes {
316 margin: 20px 0 20px 0;
317 padding: 7px 0 0 0;
318}
319
320#footnotes div.footnote {
321 margin: 0 0 5px 0;
322}
323
324#footnotes hr {
325 border: none;
326 border-top: 1px solid silver;
327 height: 1px;
328 text-align: left;
329 margin-left: 0;
330 width: 20%;
331 min-width: 100px;
332}
333
334div.colist td {
335 padding-right: 0.5em;
336 padding-bottom: 0.3em;
337 vertical-align: top;
338}
339div.colist td img {
340 margin-top: 0.3em;
341}
342
343@media print {
344 #footer-badges { display: none; }
345}
346
347#toc {
348 margin-bottom: 2.5em;
349}
350
351#toctitle {
352 color: #527bbd;
353 font-size: 1.1em;
354 font-weight: bold;
355 margin-top: 1.0em;
356 margin-bottom: 0.1em;
357}
358
359div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
360 margin-top: 0;
361 margin-bottom: 0;
362}
363div.toclevel2 {
364 margin-left: 2em;
365 font-size: 0.9em;
366}
367div.toclevel3 {
368 margin-left: 4em;
369 font-size: 0.9em;
370}
371div.toclevel4 {
372 margin-left: 6em;
373 font-size: 0.9em;
374}
375
376span.aqua { color: aqua; }
377span.black { color: black; }
378span.blue { color: blue; }
379span.fuchsia { color: fuchsia; }
380span.gray { color: gray; }
381span.green { color: green; }
382span.lime { color: lime; }
383span.maroon { color: maroon; }
384span.navy { color: navy; }
385span.olive { color: olive; }
386span.purple { color: purple; }
387span.red { color: red; }
388span.silver { color: silver; }
389span.teal { color: teal; }
390span.white { color: white; }
391span.yellow { color: yellow; }
392
393span.aqua-background { background: aqua; }
394span.black-background { background: black; }
395span.blue-background { background: blue; }
396span.fuchsia-background { background: fuchsia; }
397span.gray-background { background: gray; }
398span.green-background { background: green; }
399span.lime-background { background: lime; }
400span.maroon-background { background: maroon; }
401span.navy-background { background: navy; }
402span.olive-background { background: olive; }
403span.purple-background { background: purple; }
404span.red-background { background: red; }
405span.silver-background { background: silver; }
406span.teal-background { background: teal; }
407span.white-background { background: white; }
408span.yellow-background { background: yellow; }
409
410span.big { font-size: 2em; }
411span.small { font-size: 0.6em; }
412
413span.underline { text-decoration: underline; }
414span.overline { text-decoration: overline; }
415span.line-through { text-decoration: line-through; }
416
417div.unbreakable { page-break-inside: avoid; }
418
419
420/*
421 * xhtml11 specific
422 *
423 * */
424
425div.tableblock {
426 margin-top: 1.0em;
427 margin-bottom: 1.5em;
428}
429div.tableblock > table {
430 border: 3px solid #527bbd;
431}
432thead, p.table.header {
433 font-weight: bold;
434 color: #527bbd;
435}
436p.table {
437 margin-top: 0;
438}
439/* Because the table frame attribute is overridden by CSS in most browsers. */
440div.tableblock > table[frame="void"] {
441 border-style: none;
442}
443div.tableblock > table[frame="hsides"] {
444 border-left-style: none;
445 border-right-style: none;
446}
447div.tableblock > table[frame="vsides"] {
448 border-top-style: none;
449 border-bottom-style: none;
450}
451
452
453/*
454 * html5 specific
455 *
456 * */
457
458table.tableblock {
459 margin-top: 1.0em;
460 margin-bottom: 1.5em;
461}
462thead, p.tableblock.header {
463 font-weight: bold;
464 color: #527bbd;
465}
466p.tableblock {
467 margin-top: 0;
468}
469table.tableblock {
470 border-width: 3px;
471 border-spacing: 0px;
472 border-style: solid;
473 border-color: #527bbd;
474 border-collapse: collapse;
475}
476th.tableblock, td.tableblock {
477 border-width: 1px;
478 padding: 4px;
479 border-style: solid;
480 border-color: #527bbd;
481}
482
483table.tableblock.frame-topbot {
484 border-left-style: hidden;
485 border-right-style: hidden;
486}
487table.tableblock.frame-sides {
488 border-top-style: hidden;
489 border-bottom-style: hidden;
490}
491table.tableblock.frame-none {
492 border-style: hidden;
493}
494
495th.tableblock.halign-left, td.tableblock.halign-left {
496 text-align: left;
497}
498th.tableblock.halign-center, td.tableblock.halign-center {
499 text-align: center;
500}
501th.tableblock.halign-right, td.tableblock.halign-right {
502 text-align: right;
503}
504
505th.tableblock.valign-top, td.tableblock.valign-top {
506 vertical-align: top;
507}
508th.tableblock.valign-middle, td.tableblock.valign-middle {
509 vertical-align: middle;
510}
511th.tableblock.valign-bottom, td.tableblock.valign-bottom {
512 vertical-align: bottom;
513}
514
515
516/*
517 * manpage specific
518 *
519 * */
520
521body.manpage h1 {
522 padding-top: 0.5em;
523 padding-bottom: 0.5em;
524 border-top: 2px solid silver;
525 border-bottom: 2px solid silver;
526}
527body.manpage h2 {
528 border-style: none;
529}
530body.manpage div.sectionbody {
531 margin-left: 3em;
532}
533
534@media print {
535 body.manpage div#toc { display: none; }
536}
537
538
539</style>
540<script type="text/javascript">
541/*<![CDATA[*/
542var asciidoc = { // Namespace.
543
544/////////////////////////////////////////////////////////////////////
545// Table Of Contents generator
546/////////////////////////////////////////////////////////////////////
547
548/* Author: Mihai Bazon, September 2002
549 * http://students.infoiasi.ro/~mishoo
550 *
551 * Table Of Content generator
552 * Version: 0.4
553 *
554 * Feel free to use this script under the terms of the GNU General Public
555 * License, as long as you do not remove or alter this notice.
556 */
557
558 /* modified by Troy D. Hanson, September 2006. License: GPL */
559 /* modified by Stuart Rackham, 2006, 2009. License: GPL */
560
561// toclevels = 1..4.
562toc: function (toclevels) {
563
564 function getText(el) {
565 var text = "";
566 for (var i = el.firstChild; i != null; i = i.nextSibling) {
567 if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
568 text += i.data;
569 else if (i.firstChild != null)
570 text += getText(i);
571 }
572 return text;
573 }
574
575 function TocEntry(el, text, toclevel) {
576 this.element = el;
577 this.text = text;
578 this.toclevel = toclevel;
579 }
580
581 function tocEntries(el, toclevels) {
582 var result = new Array;
583 var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
584 // Function that scans the DOM tree for header elements (the DOM2
585 // nodeIterator API would be a better technique but not supported by all
586 // browsers).
587 var iterate = function (el) {
588 for (var i = el.firstChild; i != null; i = i.nextSibling) {
589 if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
590 var mo = re.exec(i.tagName);
591 if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
592 result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
593 }
594 iterate(i);
595 }
596 }
597 }
598 iterate(el);
599 return result;
600 }
601
602 var toc = document.getElementById("toc");
603 if (!toc) {
604 return;
605 }
606
607 // Delete existing TOC entries in case we're reloading the TOC.
608 var tocEntriesToRemove = [];
609 var i;
610 for (i = 0; i < toc.childNodes.length; i++) {
611 var entry = toc.childNodes[i];
612 if (entry.nodeName.toLowerCase() == 'div'
613 && entry.getAttribute("class")
614 && entry.getAttribute("class").match(/^toclevel/))
615 tocEntriesToRemove.push(entry);
616 }
617 for (i = 0; i < tocEntriesToRemove.length; i++) {
618 toc.removeChild(tocEntriesToRemove[i]);
619 }
620
621 // Rebuild TOC entries.
622 var entries = tocEntries(document.getElementById("content"), toclevels);
623 for (var i = 0; i < entries.length; ++i) {
624 var entry = entries[i];
625 if (entry.element.id == "")
626 entry.element.id = "_toc_" + i;
627 var a = document.createElement("a");
628 a.href = "#" + entry.element.id;
629 a.appendChild(document.createTextNode(entry.text));
630 var div = document.createElement("div");
631 div.appendChild(a);
632 div.className = "toclevel" + entry.toclevel;
633 toc.appendChild(div);
634 }
635 if (entries.length == 0)
636 toc.parentNode.removeChild(toc);
637},
638
639
640/////////////////////////////////////////////////////////////////////
641// Footnotes generator
642/////////////////////////////////////////////////////////////////////
643
644/* Based on footnote generation code from:
645 * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
646 */
647
648footnotes: function () {
649 // Delete existing footnote entries in case we're reloading the footnotes.
650 var i;
651 var noteholder = document.getElementById("footnotes");
652 if (!noteholder) {
653 return;
654 }
655 var entriesToRemove = [];
656 for (i = 0; i < noteholder.childNodes.length; i++) {
657 var entry = noteholder.childNodes[i];
658 if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
659 entriesToRemove.push(entry);
660 }
661 for (i = 0; i < entriesToRemove.length; i++) {
662 noteholder.removeChild(entriesToRemove[i]);
663 }
664
665 // Rebuild footnote entries.
666 var cont = document.getElementById("content");
667 var spans = cont.getElementsByTagName("span");
668 var refs = {};
669 var n = 0;
670 for (i=0; i<spans.length; i++) {
671 if (spans[i].className == "footnote") {
672 n++;
673 var note = spans[i].getAttribute("data-note");
674 if (!note) {
675 // Use [\s\S] in place of . so multi-line matches work.
676 // Because JavaScript has no s (dotall) regex flag.
677 note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
678 spans[i].innerHTML =
679 "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
680 "' title='View footnote' class='footnote'>" + n + "</a>]";
681 spans[i].setAttribute("data-note", note);
682 }
683 noteholder.innerHTML +=
684 "<div class='footnote' id='_footnote_" + n + "'>" +
685 "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
686 n + "</a>. " + note + "</div>";
687 var id =spans[i].getAttribute("id");
688 if (id != null) refs["#"+id] = n;
689 }
690 }
691 if (n == 0)
692 noteholder.parentNode.removeChild(noteholder);
693 else {
694 // Process footnoterefs.
695 for (i=0; i<spans.length; i++) {
696 if (spans[i].className == "footnoteref") {
697 var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
698 href = href.match(/#.*/)[0]; // Because IE returns the full URL.
699 n = refs[href];
700 spans[i].innerHTML =
701 "[<a href='#_footnote_" + n +
702 "' title='View footnote' class='footnote'>" + n + "</a>]";
703 }
704 }
705 }
706},
707
708install: function(toclevels) {
709 var timerId;
710
711 function reinstall() {
712 asciidoc.footnotes();
713 if (toclevels) {
714 asciidoc.toc(toclevels);
715 }
716 }
717
718 function reinstallAndRemoveTimer() {
719 clearInterval(timerId);
720 reinstall();
721 }
722
723 timerId = setInterval(reinstall, 500);
724 if (document.addEventListener)
725 document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
726 else
727 window.onload = reinstallAndRemoveTimer;
728}
729
730}
731asciidoc.install();
732/*]]>*/
733</script>
734</head>
735<body class="article">
736<div id="header">
737<h1>Bundle URIs</h1>
738<span id="revdate">2023-06-23</span>
739</div>
740<div id="content">
741<div id="preamble">
742<div class="sectionbody">
743<div class="paragraph"><p>Git bundles are files that store a pack-file along with some extra metadata,
744including a set of refs and a (possibly empty) set of necessary commits. See
745<a href="../git-bundle.html">git-bundle(1)</a> and <a href="../gitformat-bundle.html">gitformat-bundle(5)</a> for more information.</p></div>
746<div class="paragraph"><p>Bundle URIs are locations where Git can download one or more bundles in
747order to bootstrap the object database in advance of fetching the remaining
748objects from a remote.</p></div>
749<div class="paragraph"><p>One goal is to speed up clones and fetches for users with poor network
750connectivity to the origin server. Another benefit is to allow heavy users,
751such as CI build farms, to use local resources for the majority of Git data,
752thereby reducing the load on the origin server.</p></div>
753<div class="paragraph"><p>To enable the bundle URI feature, users can specify a bundle URI using
754command-line options or the origin server can advertise one or more URIs
755via a protocol v2 capability.</p></div>
756</div>
757</div>
758<div class="sect1">
759<h2 id="_design_goals">Design Goals</h2>
760<div class="sectionbody">
761<div class="paragraph"><p>The bundle URI standard aims to be flexible enough to satisfy multiple
762workloads. The bundle provider and the Git client have several choices in
763how they create and consume bundle URIs.</p></div>
764<div class="ulist"><ul>
765<li>
766<p>
767Bundles can have whatever name the server desires. This name could refer
768 to immutable data by using a hash of the bundle contents. However, this
769 means that a new URI will be needed after every update of the content.
770 This might be acceptable if the server is advertising the URI (and the
771 server is aware of new bundles being generated) but would not be
772 ergonomic for users using the command-line option.
773</p>
774</li>
775<li>
776<p>
777The bundles could be organized specifically for bootstrapping full
778 clones, but could also be organized with the intention of bootstrapping
779 incremental fetches. The bundle provider must decide on one of several
780 organization schemes to minimize client downloads during incremental
781 fetches, but the Git client can also choose whether to use bundles for
782 either of these operations.
783</p>
784</li>
785<li>
786<p>
787The bundle provider can choose to support full clones, partial clones,
788 or both. The client can detect which bundles are appropriate for the
789 repository&#8217;s partial clone filter, if any.
790</p>
791</li>
792<li>
793<p>
794The bundle provider can use a single bundle (for clones only), or a
795 list of bundles. When using a list of bundles, the provider can specify
796 whether or not the client needs <em>all</em> of the bundle URIs for a full
797 clone, or if <em>any</em> one of the bundle URIs is sufficient. This allows the
798 bundle provider to use different URIs for different geographies.
799</p>
800</li>
801<li>
802<p>
803The bundle provider can organize the bundles using heuristics, such as
804 creation tokens, to help the client avoid downloading bundles it does
805 not need. When the bundle provider does not provide these heuristics,
806 the client can use optimizations to minimize how much of the data is
807 downloaded.
808</p>
809</li>
810<li>
811<p>
812The bundle provider does not need to be associated with the Git server.
813 The client can choose to use the bundle provider without it being
814 advertised by the Git server.
815</p>
816</li>
817<li>
818<p>
819The client can choose to discover bundle providers that are advertised
820 by the Git server. This could happen during <code>git clone</code>, during
821 <code>git fetch</code>, both, or neither. The user can choose which combination
822 works best for them.
823</p>
824</li>
825<li>
826<p>
827The client can choose to configure a bundle provider manually at any
828 time. The client can also choose to specify a bundle provider manually
829 as a command-line option to <code>git clone</code>.
830</p>
831</li>
832</ul></div>
833<div class="paragraph"><p>Each repository is different and every Git server has different needs.
834Hopefully the bundle URI feature is flexible enough to satisfy all needs.
835If not, then the feature can be extended through its versioning mechanism.</p></div>
836</div>
837</div>
838<div class="sect1">
839<h2 id="_server_requirements">Server requirements</h2>
840<div class="sectionbody">
841<div class="paragraph"><p>To provide a server-side implementation of bundle servers, no other parts
842of the Git protocol are required. This allows server maintainers to use
843static content solutions such as CDNs in order to serve the bundle files.</p></div>
844<div class="paragraph"><p>At the current scope of the bundle URI feature, all URIs are expected to
845be HTTP(S) URLs where content is downloaded to a local file using a <code>GET</code>
846request to that URL. The server could include authentication requirements
847to those requests with the aim of triggering the configured credential
848helper for secure access. (Future extensions could use "file://" URIs or
849SSH URIs.)</p></div>
850<div class="paragraph"><p>Assuming a <code>200 OK</code> response from the server, the content at the URL is
851inspected. First, Git attempts to parse the file as a bundle file of
852version 2 or higher. If the file is not a bundle, then the file is parsed
853as a plain-text file using Git&#8217;s config parser. The key-value pairs in
854that config file are expected to describe a list of bundle URIs. If
855neither of these parse attempts succeeds, then Git will report an error to
856the user that the bundle URI provided erroneous data.</p></div>
857<div class="paragraph"><p>Any other data provided by the server is considered erroneous.</p></div>
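<div class="paragraph"><p>The parse order described above can be sketched as follows. This is a
minimal Python illustration, not Git's implementation: the function name
<code>classify_download</code> is hypothetical, and the search for a <code>[bundle</code> section
header is only a stand-in for Git's config parser. The signature line
(<code># v2 git bundle</code> or <code># v3 git bundle</code>) is the one described in
<a href="../gitformat-bundle.html">gitformat-bundle(5)</a>.</p></div>
<div class="listingblock">
<div class="content">
<pre>
```python
import re

# Bundle files begin with a signature line such as "# v2 git bundle".
BUNDLE_SIGNATURE = re.compile(rb"^# v(\d+) git bundle\n")


def classify_download(data: bytes) -> str:
    """Return "bundle" or "bundle-list", or raise ValueError for other data."""
    match = BUNDLE_SIGNATURE.match(data)
    if match and int(match.group(1)) >= 2:
        return "bundle"
    # Assumption: a real client would run Git's config parser here. This
    # sketch only looks for a [bundle] or [bundle "id"] section header.
    text = data.decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.strip().lower().startswith("[bundle"):
            return "bundle-list"
    raise ValueError("bundle URI provided erroneous data")
```
</pre>
</div></div>
<div class="paragraph"><p>The bundle check runs first, so a file that starts with a valid bundle
signature is never handed to the config parser.</p></div>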
858</div>
859</div>
860<div class="sect1">
861<h2 id="_bundle_lists">Bundle Lists</h2>
862<div class="sectionbody">
863<div class="paragraph"><p>The Git server can advertise bundle URIs using a set of <code>key=value</code> pairs.
864A bundle URI can also serve a plain-text file in the Git config format
865containing these same <code>key=value</code> pairs. In both cases, we consider this
866to be a <em>bundle list</em>. The pairs specify information about the bundles
867that the client can use to make decisions for which bundles to download
868and which to ignore.</p></div>
869<div class="paragraph"><p>A few keys focus on properties of the list itself.</p></div>
870<div class="dlist"><dl>
871<dt class="hdlist1">
872bundle.version
873</dt>
874<dd>
875<p>
876 (Required) This value provides a version number for the bundle
877 list. If a future Git change enables a feature that needs the Git
878 client to react to a new key in the bundle list file, then this version
879 will increment. The only current version number is 1, and if any other
880 value is specified then Git will fail to use this file.
881</p>
882</dd>
883<dt class="hdlist1">
884bundle.mode
885</dt>
886<dd>
887<p>
888 (Required) This key takes one of two values: <code>all</code> or <code>any</code>. When <code>all</code>
889 is specified, then the client should expect to need all of the listed
890 bundle URIs that match their repository&#8217;s requirements. When <code>any</code> is
891 specified, then the client should expect that any one of the bundle URIs
892 that match their repository&#8217;s requirements will suffice. Typically, the
893 <code>any</code> option is used to list a number of different bundle servers
894 located in different geographies.
895</p>
896</dd>
897<dt class="hdlist1">
898bundle.heuristic
899</dt>
900<dd>
901<p>
902 If this string-valued key exists, then the bundle list is designed to
903 work well with incremental <code>git fetch</code> commands. The heuristic signals
904 that there are additional keys available for each bundle that help
905 determine which subset of bundles the client should download. The only
906 heuristic currently planned is <code>creationToken</code>.
907</p>
908</dd>
909</dl></div>
910<div class="paragraph"><p>The remaining keys include an <code>&lt;id&gt;</code> segment which is a server-designated
911name for each available bundle. The <code>&lt;id&gt;</code> must contain only alphanumeric
912and <code>-</code> characters.</p></div>
913<div class="dlist"><dl>
914<dt class="hdlist1">
915bundle.&lt;id&gt;.uri
916</dt>
917<dd>
918<p>
919 (Required) This string value is the URI for downloading bundle <code>&lt;id&gt;</code>.
920 If the URI begins with a protocol (<code>http://</code> or <code>https://</code>) then the URI
921 is absolute. Otherwise, the URI is interpreted as relative to the URI
922 used for the bundle list. If the URI begins with <code>/</code>, then that relative
923 path is relative to the domain name used for the bundle list. (This use
924 of relative paths is intended to make it easier to distribute a set of
925 bundles across a large number of servers or CDNs with different domain
926 names.)
927</p>
928</dd>
929<dt class="hdlist1">
930bundle.&lt;id&gt;.filter
931</dt>
932<dd>
933<p>
934 This string value represents an object filter that should also appear in
935 the header of this bundle. The server uses this value to differentiate
936 different kinds of bundles from which the client can choose those that
937 match their object filters.
938</p>
939</dd>
940<dt class="hdlist1">
941bundle.&lt;id&gt;.creationToken
942</dt>
943<dd>
944<p>
945 This value is a nonnegative 64-bit integer used for sorting the bundles in
946 the list. It is used to download a subset of bundles during a fetch when
947 <code>bundle.heuristic=creationToken</code>.
948</p>
949</dd>
950<dt class="hdlist1">
951bundle.&lt;id&gt;.location
952</dt>
953<dd>
954<p>
955 This string value advertises a real-world location from where the bundle
956 URI is served. This can be used to present the user with an option for
957 which bundle URI to use or simply as an informative indicator of which
958 bundle URI was selected by Git. This is only valuable when
959 <code>bundle.mode</code> is <code>any</code>.
960</p>
961</dd>
962</dl></div>
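<div class="paragraph"><p>As an illustration of how a client might validate these keys, the
following Python sketch groups <code>key=value</code> pairs into list-level settings
and per-bundle settings. The function name <code>parse_bundle_list</code> and the
error messages are hypothetical. The checks only mirror the rules stated
above: version must be 1, mode must be <code>all</code> or <code>any</code>, each id may contain
only alphanumeric and <code>-</code> characters, and each bundle needs a <code>uri</code>.</p></div>
<div class="listingblock">
<div class="content">
<pre>
```python
import re

# Ids may contain only alphanumeric and "-" characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def parse_bundle_list(pairs):
    """Group (key, value) pairs into (settings, bundles) and validate them."""
    settings, bundles = {}, {}
    for key, value in pairs:
        parts = key.split(".")
        if parts[0] != "bundle" or len(parts) not in (2, 3):
            continue  # not a bundle-list key; ignored in this sketch
        if len(parts) == 2:
            settings[parts[1]] = value
        else:
            bundle_id, name = parts[1], parts[2]
            if not ID_PATTERN.fullmatch(bundle_id):
                raise ValueError(f"invalid bundle id: {bundle_id!r}")
            bundles.setdefault(bundle_id, {})[name] = value
    if settings.get("version") != "1":
        raise ValueError("unsupported bundle.version")
    if settings.get("mode") not in ("all", "any"):
        raise ValueError("bundle.mode must be 'all' or 'any'")
    for bundle_id, keys in bundles.items():
        if "uri" not in keys:
            raise ValueError(f"bundle {bundle_id!r} has no uri")
    return settings, bundles
```
</pre>
</div></div>
<div class="paragraph"><p>The sketch compares key names exactly as given. Git config variable
names are case-insensitive, so a real parser would normalize them first.</p></div>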
963<div class="paragraph"><p>Here is an example bundle list using the Git config format:</p></div>
964<div class="literalblock">
965<div class="content">
966<pre><code>[bundle]
967 version = 1
968 mode = all
969 heuristic = creationToken</code></pre>
970</div></div>
971<div class="literalblock">
972<div class="content">
973<pre><code>[bundle "2022-02-09-1644442601-daily"]
974 uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
975 creationToken = 1644442601</code></pre>
976</div></div>
977<div class="literalblock">
978<div class="content">
979<pre><code>[bundle "2022-02-02-1643842562"]
980 uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
981 creationToken = 1643842562</code></pre>
982</div></div>
983<div class="literalblock">
984<div class="content">
985<pre><code>[bundle "2022-02-09-1644442631-daily-blobless"]
986 uri = 2022-02-09-1644442631-daily-blobless.bundle
987 creationToken = 1644442631
988 filter = blob:none</code></pre>
989</div></div>
990<div class="literalblock">
991<div class="content">
992<pre><code>[bundle "2022-02-02-1643842568-blobless"]
993 uri = /git/git/2022-02-02-1643842568-blobless.bundle
994 creationToken = 1643842568
995 filter = blob:none</code></pre>
996</div></div>
997<div class="paragraph"><p>This example uses <code>bundle.mode=all</code> as well as the
998<code>bundle.&lt;id&gt;.creationToken</code> heuristic. It also uses the <code>bundle.&lt;id&gt;.filter</code>
999options to present two parallel sets of bundles: one for full clones and
1000another for blobless partial clones.</p></div>
1001<div class="paragraph"><p>Suppose that this bundle list was found at the URI
1002<code>https://bundles.example.com/git/git/</code>. The two blobless bundles then have
1003the following fully-expanded URIs:</p></div>
1004<div class="ulist"><ul>
1005<li>
1006<p>
1007<code>https://bundles.example.com/git/git/2022-02-09-1644442631-daily-blobless.bundle</code>
1008</p>
1009</li>
1010<li>
1011<p>
1012<code>https://bundles.example.com/git/git/2022-02-02-1643842568-blobless.bundle</code>
1013</p>
1014</li>
1015</ul></div>
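<div class="paragraph"><p>The expansion above follows ordinary relative-URL resolution. As a
rough check, the Python sketch below reproduces both expanded URIs with the
standard library's <code>urljoin</code>. The helper name <code>expand_bundle_uri</code> is
hypothetical, and <code>urljoin</code> is used here as a stand-in: Git's own resolution
code may differ in edge cases.</p></div>
<div class="listingblock">
<div class="content">
<pre>
```python
from urllib.parse import urljoin

# The URI where the example bundle list was found.
LIST_URI = "https://bundles.example.com/git/git/"


def expand_bundle_uri(list_uri: str, bundle_uri: str) -> str:
    """Resolve a bundle's uri value against the URI of its bundle list."""
    return urljoin(list_uri, bundle_uri)


# A plain relative path is resolved against the directory of the list URI.
daily = expand_bundle_uri(LIST_URI, "2022-02-09-1644442631-daily-blobless.bundle")

# A path starting with "/" is resolved against the domain of the list URI.
weekly = expand_bundle_uri(LIST_URI, "/git/git/2022-02-02-1643842568-blobless.bundle")
```
</pre>
</div></div>
<div class="paragraph"><p>An absolute <code>http://</code> or <code>https://</code> value passes through unchanged.</p></div>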
1016</div>
1017</div>
1018<div class="sect1">
1019<h2 id="_advertising_bundle_uris">Advertising Bundle URIs</h2>
1020<div class="sectionbody">
1021<div class="paragraph"><p>If a user knows a bundle URI for the repository they are cloning, then
1022they can specify that URI manually through a command-line option. However,
1023a Git host may want to advertise bundle URIs during the clone operation,
1024helping users unaware of the feature.</p></div>
1025<div class="paragraph"><p>The only thing required for this feature is that the server can advertise
1026one or more bundle URIs. This advertisement takes the form of a new
1027protocol v2 capability specifically for discovering bundle URIs.</p></div>
1028<div class="paragraph"><p>The client could choose an arbitrary bundle URI as an option <em>or</em> select
1029the URI with best performance by some exploratory checks. It is up to the
1030bundle provider to decide if having multiple URIs is preferable to a
1031single URI that is geodistributed through server-side infrastructure.</p></div>
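<div class="paragraph"><p>One simple client strategy is to try the advertised URIs in order and
fall back when a download fails. The Python sketch below assumes a
hypothetical <code>download</code> callable that returns the downloaded bytes or raises
<code>OSError</code>. It is one possible policy, not a description of what Git does.</p></div>
<div class="listingblock">
<div class="content">
<pre>
```python
def first_working_uri(uris, download):
    """Try each URI in order and return (uri, data) for the first success.

    download is a hypothetical callable supplied by the caller; it takes a
    URI and either returns the downloaded bytes or raises OSError.
    """
    errors = []
    for uri in uris:
        try:
            return uri, download(uri)
        except OSError as err:
            errors.append((uri, err))  # remember the failure and fall back
    raise OSError(f"no bundle URI could be downloaded: {errors}")
```
</pre>
</div></div>
<div class="paragraph"><p>A client that prefers the fastest URI could sort <code>uris</code> by a measured
latency before calling this function.</p></div>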
1032</div>
1033</div>
1034<div class="sect1">
1035<h2 id="_cloning_with_bundle_uris">Cloning with Bundle URIs</h2>
1036<div class="sectionbody">
1037<div class="paragraph"><p>The primary need for bundle URIs is to speed up clones. The Git client
1038will interact with bundle URIs according to the following flow:</p></div>
1039<div class="olist arabic"><ol class="arabic">
1040<li>
1041<p>
1042The user specifies a bundle URI with the <code>--bundle-uri</code> command-line
1043 option <em>or</em> the client discovers a bundle list advertised by the
1044 Git server.
1045</p>
1046</li>
1047<li>
1048<p>
1049If the downloaded data from a bundle URI is a bundle, then the client
1050 inspects the bundle headers to check that the prerequisite commit OIDs
1051 are present in the client repository. If some are missing, then the
1052 client delays unbundling until other bundles have been unbundled,
1053 making those OIDs present. When all required OIDs are present, the
1054 client unbundles that data using a refspec. The default refspec is
1055 <code>+refs/heads/*:refs/bundles/*</code>, but this can be configured. These refs
1056 are stored so that later <code>git fetch</code> negotiations can communicate each
1057 bundled ref as a <code>have</code>, reducing the size of the fetch over the Git
1058 protocol. To allow pruning refs from this ref namespace, Git may
1059 introduce a numbered namespace (such as <code>refs/bundles/&lt;i&gt;/*</code>) such that
1060 stale bundle refs can be deleted.
1061</p>
1062</li>
1063<li>
1064<p>
1065If the file is instead a bundle list, then the client inspects the
1066 <code>bundle.mode</code> to see if the list is of the <code>all</code> or <code>any</code> form.
1067</p>
1068<div class="olist loweralpha"><ol class="loweralpha">
1069<li>
1070<p>
1071If <code>bundle.mode=all</code>, then the client considers all bundle
1072 URIs. The list is reduced based on the <code>bundle.&lt;id&gt;.filter</code> options
1073 matching the client repository&#8217;s partial clone filter. Then, all
1074 bundle URIs are requested. If the <code>bundle.&lt;id&gt;.creationToken</code>
1075 heuristic is provided, then the bundles are downloaded in decreasing
1076 order by the creation token, stopping when a bundle has all required
1077 OIDs. The bundles can then be unbundled in increasing creation token
1078 order. The client stores the latest creation token as a heuristic
1079 for avoiding future downloads if the bundle list does not advertise
1080 bundles with larger creation tokens.
1081</p>
1082</li>
<li>
<p>
If <code>bundle.mode=any</code>, then the client can choose any one of the
 bundle URIs to inspect. The client can use a variety of ways to
 choose among these URIs. The client can also fall back to another URI
 if the initial choice fails to return a result.
</p>
</li>
</ol></div>
</li>
</ol></div>
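<div class="paragraph"><p>As an illustration of the <code>all</code> mode with the <code>creationToken</code> heuristic,
the following sketch shows a bundle list with three bundles. The bundle ids,
URIs, and token values are hypothetical; only the key names come from this
document. The comments assume that the <code>base</code> bundle has no prerequisite
commits.</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
 version = 1
 mode = all
 heuristic = creationToken

# Largest token: downloaded first, unbundled last.
[bundle "newest"]
 uri = https://bundles.example.com/newest.bundle
 creationToken = 3

# Downloaded second, because "newest" has prerequisites the client lacks.
[bundle "middle"]
 uri = https://bundles.example.com/middle.bundle
 creationToken = 2

# Smallest token: downloaded last, unbundled first.
[bundle "base"]
 uri = https://bundles.example.com/base.bundle
 creationToken = 1</code></pre>
</div></div>
<div class="paragraph"><p>During a fresh clone with this list, the client would download <code>newest</code>,
<code>middle</code>, and <code>base</code> in that order, unbundle them in the reverse order, and
could then record <code>3</code> as its latest creation token.</p></div>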
<div class="paragraph"><p>Note that during a clone we expect that all bundles will be required, and
heuristics such as <code>bundle.&lt;id&gt;.creationToken</code> can be used to download
bundles in chronological order or in parallel.</p></div>
<div class="paragraph"><p>If a given bundle URI is a bundle list with a <code>bundle.heuristic</code>
value, then the client can choose to store that URI as its chosen bundle
URI. The client can then navigate directly to that URI during later <code>git
fetch</code> calls.</p></div>
<div class="paragraph"><p>When downloading bundle URIs, the client can choose to inspect the initial
content before committing to downloading the entire content. This may
provide enough information to determine if the URI is a bundle list or
a bundle. In the case of a bundle, the client may inspect the bundle
header to determine that all advertised tips are already in the client
repository and cancel the remaining download.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_fetching_with_bundle_uris">Fetching with Bundle URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>When the client fetches new data, it can decide to fetch from bundle
servers before fetching from the origin remote. This could be done via a
command-line option, but it is more likely useful to use a config value
such as the one specified during the clone.</p></div>
<div class="paragraph"><p>The fetch operation follows the same procedure to download bundles from a
bundle list (although we do <em>not</em> want to use parallel downloads here). We
expect that the process will end when all prerequisite commit OIDs in a
thin bundle are already in the object database.</p></div>
<div class="paragraph"><p>When using the <code>creationToken</code> heuristic, the client can avoid downloading
any bundles if their creation tokens are not larger than the stored
creation token. After fetching new bundles, Git updates this local
creation token.</p></div>
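<div class="paragraph"><p>A minimal sketch of the client-side state this implies is shown below. It
assumes the <code>fetch.bundleURI</code> and <code>fetch.bundleCreationToken</code> config keys
used by later Git releases; these names are not defined in this document.
The URI and token value are borrowed from the example bundle list in the
bundle provider section.</p></div>
<div class="literalblock">
<div class="content">
<pre><code># Client repository config (key names are an assumption).
[fetch]
 bundleURI = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;
 bundleCreationToken = 1644442601</code></pre>
</div></div>
<div class="paragraph"><p>With this stored value, a bundle list whose largest <code>creationToken</code> is
<code>1644442601</code> would cause no downloads, while a list that also advertises
<code>1644770820</code> would cause only that newer bundle to be downloaded.</p></div>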
<div class="paragraph"><p>If the bundle provider does not provide a heuristic, then the client
should attempt to inspect the bundle headers before downloading the full
bundle data in case the bundle tips already exist in the client
repository.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_error_conditions">Error Conditions</h2>
<div class="sectionbody">
<div class="paragraph"><p>If the Git client discovers something unexpected while downloading
information according to a bundle URI or the bundle list found at that
location, then Git can ignore that data and continue as if it was not
given a bundle URI. The remote Git server is the ultimate source of truth,
not the bundle URI.</p></div>
<div class="paragraph"><p>Here are a few example error conditions:</p></div>
<div class="ulist"><ul>
<li>
<p>
The client fails to connect with a server at the given URI or a connection
 is lost without any chance to recover.
</p>
</li>
<li>
<p>
The client receives a 400-level response (such as <code>404 Not Found</code> or
 <code>401 Unauthorized</code>). The client should use the credential helper to
 find and provide a credential for the URI, but match the semantics of
 Git&#8217;s other HTTP protocols in terms of handling specific 400-level
 errors.
</p>
</li>
<li>
<p>
The server reports any other failure response.
</p>
</li>
<li>
<p>
The client receives data that is not parsable as a bundle or bundle list.
</p>
</li>
<li>
<p>
A bundle includes a filter that does not match expectations.
</p>
</li>
<li>
<p>
The client cannot unbundle the bundles because the prerequisite commit OIDs
 are not in the object database and there are no more bundles to download.
</p>
</li>
</ul></div>
<div class="paragraph"><p>There are also situations that could be seen as wasteful, but are not
error conditions:</p></div>
<div class="ulist"><ul>
<li>
<p>
The downloaded bundles contain more information than is requested by
 the clone or fetch request. A primary example is if the user requests
 a clone with <code>--single-branch</code> but downloads bundles that store every
 reachable commit from all <code>refs/heads/*</code> references. This might be
 initially wasteful, but perhaps these objects will become reachable by
 a later ref update that the client cares about.
</p>
</li>
<li>
<p>
A bundle download during a <code>git fetch</code> contains objects already in the
 object database. This is probably unavoidable if we are using bundles
 for fetches, since the client will almost always be slightly ahead of
 the bundle servers after performing its "catch-up" fetch to the remote
 server. This extra work is most wasteful when the client is fetching
 much more frequently than the server is computing bundles, such as if
 the client is using hourly prefetches with background maintenance, but
 the server is computing bundles weekly. For this reason, the client
 should not use bundle URIs for fetch unless the server has explicitly
 recommended it through a <code>bundle.heuristic</code> value.
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_example_bundle_provider_organization">Example Bundle Provider Organization</h2>
<div class="sectionbody">
<div class="paragraph"><p>The bundle URI feature is intentionally designed to be flexible to
different ways a bundle provider wants to organize the object data.
However, it can be helpful to have a complete organization model described
here so providers can start from that base.</p></div>
<div class="paragraph"><p>This example organization is a simplified model of what is used by the
GVFS Cache Servers (see section near the end of this document) which have
been beneficial in speeding up clones and fetches for very large
repositories, although using extra software outside of Git.</p></div>
<div class="paragraph"><p>The bundle provider deploys servers across multiple geographies. Each
server manages its own bundle set. The server can track a number of Git
repositories, but provides a bundle list for each based on a pattern. For
example, when mirroring a repository at <code>https://&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code>
the bundle server could have its bundle list available at
<code>https://&lt;server-url&gt;/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code>. The origin Git server can
list all of these servers under the "any" mode:</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
 version = 1
 mode = any</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "eastus"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "europe"]
 uri = https://europe.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "apac"]
 uri = https://apac.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="paragraph"><p>This "list of lists" is static and only changes if a bundle server is
added or removed.</p></div>
<div class="paragraph"><p>Each bundle server manages its own set of bundles. The initial bundle list
contains only a single bundle, containing all of the objects received from
cloning the repository from the origin server. The list uses the
<code>creationToken</code> heuristic and a <code>creationToken</code> is made for the bundle
based on the server&#8217;s timestamp.</p></div>
<div class="paragraph"><p>The bundle server runs regularly-scheduled updates for the bundle list,
such as once a day. During this task, the server fetches the latest
contents from the origin server and generates a bundle containing the
objects reachable from the latest origin refs, but not contained in a
previously-computed bundle. This bundle is added to the list, with care
that the <code>creationToken</code> is strictly greater than the previous maximum
<code>creationToken</code>.</p></div>
<div class="paragraph"><p>When the bundle list grows too large, say more than 30 bundles, then the
oldest "<em>N</em> minus 30" bundles are combined into a single bundle. This
bundle&#8217;s <code>creationToken</code> is equal to the maximum <code>creationToken</code> among the
merged bundles.</p></div>
<div class="paragraph"><p>An example bundle list is provided here, although it only has two daily
bundles and not a full list of 30:</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
 version = 1
 mode = all
 heuristic = creationToken</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-13-1644770820-daily"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-13-1644770820-daily.bundle
 creationToken = 1644770820</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-09-1644442601-daily"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-09-1644442601-daily.bundle
 creationToken = 1644442601</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-02-1643842562"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-02-1643842562.bundle
 creationToken = 1643842562</code></pre>
</div></div>
<div class="paragraph"><p>To avoid storing and serving object data in perpetuity despite becoming
unreachable in the origin server, this bundle merge can be more careful.
Instead of taking an absolute union of the old bundles, the bundle
can be created by looking at the newer bundles and ensuring that their
necessary commits are all available in this merged bundle (or in another
one of the newer bundles). This allows "expiring" object data that is not
being used by new commits in this window of time. That data could be
reintroduced by a later push.</p></div>
<div class="paragraph"><p>This data organization has two main goals. First, initial
clones of the repository become faster by downloading precomputed object
data from a closer source. Second, <code>git fetch</code> commands can be faster,
especially if the client has not fetched for a few days. However, if a
client does not fetch for 30 days, then the bundle list organization would
cause redownloading a large amount of object data.</p></div>
<div class="paragraph"><p>One way to make this organization more useful to users who fetch frequently
is to have more frequent bundle creation. For example, bundles could be
created every hour, and then once a day those "hourly" bundles could be
merged into a "daily" bundle. The daily bundles are merged into the
oldest bundle after 30 days.</p></div>
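<div class="paragraph"><p>A sketch of what such a list might look like partway through a day is shown
below. The bundle ids, file names, and token values are hypothetical, and the
sketch assumes that a merged daily bundle takes the maximum <code>creationToken</code>
of the hourly bundles it replaces, by analogy with the merge rule described
earlier in this section.</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
 version = 1
 mode = all
 heuristic = creationToken

# Oldest bundle: everything older than the 30-day window.
[bundle "base"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/base.bundle
 creationToken = 1000

# A daily bundle, merged from that day's hourly bundles.
[bundle "daily-01"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/daily-01.bundle
 creationToken = 2000

# Hourly bundles created since the last daily merge.
[bundle "hourly-01"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/hourly-01.bundle
 creationToken = 2001

[bundle "hourly-02"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/hourly-02.bundle
 creationToken = 2002</code></pre>
</div></div>
<div class="paragraph"><p>A client whose stored creation token is <code>2001</code> would only need to download
<code>hourly-02</code> from this list.</p></div>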
<div class="paragraph"><p>It is recommended that this bundle strategy be repeated with the <code>blob:none</code>
filter if clients of this repository are expecting to use blobless partial
clones. This list of blobless bundles stays in the same list as the full
bundles, but uses the <code>bundle.&lt;id&gt;.filter</code> key to separate the two groups.
For very large repositories, the bundle provider may want to <em>only</em> provide
blobless bundles.</p></div>
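<div class="paragraph"><p>For illustration, a combined list might look like the following sketch, in
which the bundle ids, file names, and token values are hypothetical. A client
using a <code>blob:none</code> partial clone would keep only the entries whose <code>filter</code>
matches, and a full clone would keep only the entries without a <code>filter</code>.</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
 version = 1
 mode = all
 heuristic = creationToken

# Full bundles: no filter key.
[bundle "full-base"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/full-base.bundle
 creationToken = 1

[bundle "full-daily"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/full-daily.bundle
 creationToken = 2

# Blobless bundles: same schedule, separated by the filter key.
[bundle "blobless-base"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/blobless-base.bundle
 filter = blob:none
 creationToken = 1

[bundle "blobless-daily"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/blobless-daily.bundle
 filter = blob:none
 creationToken = 2</code></pre>
</div></div>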
</div>
</div>
<div class="sect1">
<h2 id="_implementation_plan">Implementation Plan</h2>
<div class="sectionbody">
<div class="paragraph"><p>This design document is being submitted on its own as an aspirational
document, with the goal of implementing all of the mentioned client
features over the course of several patch series. Here is a potential
outline for submitting these features:</p></div>
<div class="olist arabic"><ol class="arabic">
<li>
<p>
Integrate bundle URIs into <code>git clone</code> with a <code>--bundle-uri</code> option.
 This will include a new <code>git fetch --bundle-uri</code> mode for use as the
 implementation underneath <code>git clone</code>. The initial version here will
 expect a single bundle at the given URI.
</p>
</li>
<li>
<p>
Implement the ability to parse a bundle list from a bundle URI and
 update the <code>git fetch --bundle-uri</code> logic to properly distinguish
 between <code>bundle.mode</code> options. Specifically design the feature so
 that the config format parsing feeds a list of key-value pairs into the
 bundle list logic.
</p>
</li>
<li>
<p>
Create the <code>bundle-uri</code> protocol v2 command so Git servers can advertise
 bundle URIs using the key-value pairs. Plug into the existing key-value
 input to the bundle list logic. Allow <code>git clone</code> to discover these
 bundle URIs and bootstrap the client repository from the bundle data.
 (This choice is an opt-in via a config option and a command-line
 option.)
</p>
</li>
<li>
<p>
Allow the client to understand the <code>bundle.heuristic</code> configuration key
 and the <code>bundle.&lt;id&gt;.creationToken</code> heuristic. When <code>git clone</code>
 discovers a bundle URI with <code>bundle.heuristic</code>, it configures the client
 repository to check that bundle URI during later <code>git fetch &lt;remote&gt;</code>
 commands.
</p>
</li>
<li>
<p>
Allow clients to discover bundle URIs during <code>git fetch</code> and configure
 a bundle URI for later fetches if <code>bundle.heuristic</code> is set.
</p>
</li>
<li>
<p>
Implement the "inspect headers" heuristic to reduce data downloads when
 the <code>bundle.&lt;id&gt;.creationToken</code> heuristic is not available.
</p>
</li>
</ol></div>
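<div class="paragraph"><p>As a rough sketch of how the advertisement step in this outline could be
configured, the fragments below assume the <code>uploadpack.advertiseBundleURIs</code>
and <code>transfer.bundleURI</code> config keys used by later Git releases. Neither name
is defined in this document, and the bundle list reuses one entry from the
example "any" list above.</p></div>
<div class="literalblock">
<div class="content">
<pre><code># Origin server repository config (key name is an assumption).
[uploadpack]
 advertiseBundleURIs = true

# Key-value pairs the server would advertise to clients.
[bundle]
 version = 1
 mode = any

[bundle "eastus"]
 uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code># Client config: opt in to advertised bundle URIs (key name is an assumption).
[transfer]
 bundleURI = true</code></pre>
</div></div>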
<div class="paragraph"><p>As these features are reviewed, this plan might be updated. We also expect
that new designs will be discovered and implemented as this feature
matures and becomes used in real-world scenarios.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_related_work_packfile_uris">Related Work: Packfile URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>The Git protocol already has a capability where the Git server can list
a set of URLs along with the packfile response when serving a client
request. The client is then expected to download the packfiles at those
locations in order to have a complete understanding of the response.</p></div>
<div class="paragraph"><p>This mechanism is used by the Gerrit server (implemented with JGit) and
has been effective at reducing CPU load and improving user performance for
clones.</p></div>
<div class="paragraph"><p>A major downside to this mechanism is that the origin server needs to know
<em>exactly</em> what is in those packfiles, and the packfiles need to be available
to the user for some time after the server has responded. This coupling
between the origin and the packfile data is difficult to manage.</p></div>
<div class="paragraph"><p>Further, this implementation is extremely hard to make work with fetches.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_related_work_gvfs_cache_servers">Related Work: GVFS Cache Servers</h2>
<div class="sectionbody">
<div class="paragraph"><p>The GVFS Protocol [2] is a set of HTTP endpoints designed independently of
the Git project before Git&#8217;s partial clone was created. One feature of this
protocol is the idea of a "cache server" which can be colocated with build
machines or developer offices to transfer Git data without overloading the
central server.</p></div>
<div class="paragraph"><p>The endpoint that VFS for Git is famous for is the <code>GET /gvfs/objects/{oid}</code>
endpoint, which allows downloading an object on-demand. This is a critical
piece of the filesystem virtualization of that product.</p></div>
<div class="paragraph"><p>However, a more subtle need is the <code>GET /gvfs/prefetch?lastPackTimestamp=&lt;t&gt;</code>
endpoint. Given an optional timestamp, the cache server responds with a list
of precomputed packfiles containing the commits and trees that were introduced
in those time intervals.</p></div>
<div class="paragraph"><p>The cache server computes these "prefetch" packfiles using the following
strategy:</p></div>
<div class="olist arabic"><ol class="arabic">
<li>
<p>
Every hour, an "hourly" pack is generated with a given timestamp.
</p>
</li>
<li>
<p>
Nightly, the previous 24 hourly packs are rolled up into a "daily" pack.
</p>
</li>
<li>
<p>
Nightly, all prefetch packs more than 30 days old are rolled up into
 one pack.
</p>
</li>
</ol></div>
<div class="paragraph"><p>When a user runs <code>gvfs clone</code> or <code>scalar clone</code> against a repo with cache
servers, the client requests all prefetch packfiles, which is at most
<code>24 + 30 + 1</code> packfiles downloading only commits and trees. The client
then follows with a request to the origin server for the references, and
attempts to check out that tip reference. (There is an extra endpoint that
helps get all reachable trees from a given commit, in case that commit
was not already in a prefetch packfile.)</p></div>
<div class="paragraph"><p>During a <code>git fetch</code>, a hook requests the prefetch endpoint using the
most-recent timestamp from a previously-downloaded prefetch packfile.
Only packfiles with later timestamps are downloaded. Most
users fetch hourly, so they get at most one hourly prefetch pack. Users
whose machines have been off or otherwise have not fetched in over 30 days
might redownload all prefetch packfiles. This is rare.</p></div>
<div class="paragraph"><p>It is important to note that the clients always contact the origin server
for the refs advertisement, so the refs are frequently "ahead" of the
prefetched pack data. The missing objects are downloaded on-demand using
the <code>GET /gvfs/objects/{oid}</code> requests, when needed by a command such as
<code>git checkout</code> or <code>git log</code>. Some Git optimizations disable checks that
would cause these on-demand downloads to be too aggressive.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_see_also">See Also</h2>
<div class="sectionbody">
<div class="paragraph"><p>[1] <a href="https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/">https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/</a>
 An earlier RFC for a bundle URI feature.</p></div>
<div class="paragraph"><p>[2] <a href="https://github.com/microsoft/VFSForGit/blob/master/Protocol.md">https://github.com/microsoft/VFSForGit/blob/master/Protocol.md</a>
 The GVFS Protocol</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
Last updated
 2023-02-22 15:29:29 PST
</div>
</div>
</body>
</html>