| <?xml version="1.0" encoding="UTF-8"?> | |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" | |
| "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> | |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> | |
| <head> | |
| <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /> | |
| <meta name="generator" content="AsciiDoc 8.6.10" /> | |
| <title>Concerning Git’s Packing Heuristics</title> | |
| <style type="text/css"> | |
| /* Shared CSS for AsciiDoc xhtml11 and html5 backends */ | |
| /* Default font. */ | |
| body { | |
| font-family: Georgia,serif; | |
| } | |
| /* Title font. */ | |
| h1, h2, h3, h4, h5, h6, | |
| div.title, caption.title, | |
| thead, p.table.header, | |
| #toctitle, | |
| #author, #revnumber, #revdate, #revremark, | |
| #footer { | |
| font-family: Arial,Helvetica,sans-serif; | |
| } | |
| body { | |
| margin: 1em 5% 1em 5%; | |
| } | |
| a { | |
| color: blue; | |
| text-decoration: underline; | |
| } | |
| a:visited { | |
| color: fuchsia; | |
| } | |
| em { | |
| font-style: italic; | |
| color: navy; | |
| } | |
| strong { | |
| font-weight: bold; | |
| color: #083194; | |
| } | |
| h1, h2, h3, h4, h5, h6 { | |
| color: #527bbd; | |
| margin-top: 1.2em; | |
| margin-bottom: 0.5em; | |
| line-height: 1.3; | |
| } | |
| h1, h2, h3 { | |
| border-bottom: 2px solid silver; | |
| } | |
| h2 { | |
| padding-top: 0.5em; | |
| } | |
| h3 { | |
| float: left; | |
| } | |
| h3 + * { | |
| clear: left; | |
| } | |
| h5 { | |
| font-size: 1.0em; | |
| } | |
| div.sectionbody { | |
| margin-left: 0; | |
| } | |
| hr { | |
| border: 1px solid silver; | |
| } | |
| p { | |
| margin-top: 0.5em; | |
| margin-bottom: 0.5em; | |
| } | |
| ul, ol, li > p { | |
| margin-top: 0; | |
| } | |
| ul > li { color: #aaa; } | |
| ul > li > * { color: black; } | |
| .monospaced, code, pre { | |
| font-family: "Courier New", Courier, monospace; | |
| font-size: inherit; | |
| color: navy; | |
| padding: 0; | |
| margin: 0; | |
| } | |
| pre { | |
| white-space: pre-wrap; | |
| } | |
| #author { | |
| color: #527bbd; | |
| font-weight: bold; | |
| font-size: 1.1em; | |
| } | |
| #email { | |
| } | |
| #revnumber, #revdate, #revremark { | |
| } | |
| #footer { | |
| font-size: small; | |
| border-top: 2px solid silver; | |
| padding-top: 0.5em; | |
| margin-top: 4.0em; | |
| } | |
| #footer-text { | |
| float: left; | |
| padding-bottom: 0.5em; | |
| } | |
| #footer-badges { | |
| float: right; | |
| padding-bottom: 0.5em; | |
| } | |
| #preamble { | |
| margin-top: 1.5em; | |
| margin-bottom: 1.5em; | |
| } | |
| div.imageblock, div.exampleblock, div.verseblock, | |
| div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock, | |
| div.admonitionblock { | |
| margin-top: 1.0em; | |
| margin-bottom: 1.5em; | |
| } | |
| div.admonitionblock { | |
| margin-top: 2.0em; | |
| margin-bottom: 2.0em; | |
| margin-right: 10%; | |
| color: #606060; | |
| } | |
| div.content { /* Block element content. */ | |
| padding: 0; | |
| } | |
| /* Block element titles. */ | |
| div.title, caption.title { | |
| color: #527bbd; | |
| font-weight: bold; | |
| text-align: left; | |
| margin-top: 1.0em; | |
| margin-bottom: 0.5em; | |
| } | |
| div.title + * { | |
| margin-top: 0; | |
| } | |
| td div.title:first-child { | |
| margin-top: 0.0em; | |
| } | |
| div.content div.title:first-child { | |
| margin-top: 0.0em; | |
| } | |
| div.content + div.title { | |
| margin-top: 0.0em; | |
| } | |
| div.sidebarblock > div.content { | |
| background: #ffffee; | |
| border: 1px solid #dddddd; | |
| border-left: 4px solid #f0f0f0; | |
| padding: 0.5em; | |
| } | |
| div.listingblock > div.content { | |
| border: 1px solid #dddddd; | |
| border-left: 5px solid #f0f0f0; | |
| background: #f8f8f8; | |
| padding: 0.5em; | |
| } | |
| div.quoteblock, div.verseblock { | |
| padding-left: 1.0em; | |
| margin-left: 1.0em; | |
| margin-right: 10%; | |
| border-left: 5px solid #f0f0f0; | |
| color: #888; | |
| } | |
| div.quoteblock > div.attribution { | |
| padding-top: 0.5em; | |
| text-align: right; | |
| } | |
| div.verseblock > pre.content { | |
| font-family: inherit; | |
| font-size: inherit; | |
| } | |
| div.verseblock > div.attribution { | |
| padding-top: 0.75em; | |
| text-align: left; | |
| } | |
| /* DEPRECATED: Pre version 8.2.7 verse style literal block. */ | |
| div.verseblock + div.attribution { | |
| text-align: left; | |
| } | |
| div.admonitionblock .icon { | |
| vertical-align: top; | |
| font-size: 1.1em; | |
| font-weight: bold; | |
| text-decoration: underline; | |
| color: #527bbd; | |
| padding-right: 0.5em; | |
| } | |
| div.admonitionblock td.content { | |
| padding-left: 0.5em; | |
| border-left: 3px solid #dddddd; | |
| } | |
| div.exampleblock > div.content { | |
| border-left: 3px solid #dddddd; | |
| padding-left: 0.5em; | |
| } | |
| div.imageblock div.content { padding-left: 0; } | |
| span.image img { border-style: none; vertical-align: text-bottom; } | |
| a.image:visited { color: white; } | |
| dl { | |
| margin-top: 0.8em; | |
| margin-bottom: 0.8em; | |
| } | |
| dt { | |
| margin-top: 0.5em; | |
| margin-bottom: 0; | |
| font-style: normal; | |
| color: navy; | |
| } | |
| dd > *:first-child { | |
| margin-top: 0.1em; | |
| } | |
| ul, ol { | |
| list-style-position: outside; | |
| } | |
| ol.arabic { | |
| list-style-type: decimal; | |
| } | |
| ol.loweralpha { | |
| list-style-type: lower-alpha; | |
| } | |
| ol.upperalpha { | |
| list-style-type: upper-alpha; | |
| } | |
| ol.lowerroman { | |
| list-style-type: lower-roman; | |
| } | |
| ol.upperroman { | |
| list-style-type: upper-roman; | |
| } | |
| div.compact ul, div.compact ol, | |
| div.compact p, div.compact p, | |
| div.compact div, div.compact div { | |
| margin-top: 0.1em; | |
| margin-bottom: 0.1em; | |
| } | |
| tfoot { | |
| font-weight: bold; | |
| } | |
| td > div.verse { | |
| white-space: pre; | |
| } | |
| div.hdlist { | |
| margin-top: 0.8em; | |
| margin-bottom: 0.8em; | |
| } | |
| div.hdlist tr { | |
| padding-bottom: 15px; | |
| } | |
| dt.hdlist1.strong, td.hdlist1.strong { | |
| font-weight: bold; | |
| } | |
| td.hdlist1 { | |
| vertical-align: top; | |
| font-style: normal; | |
| padding-right: 0.8em; | |
| color: navy; | |
| } | |
| td.hdlist2 { | |
| vertical-align: top; | |
| } | |
| div.hdlist.compact tr { | |
| margin: 0; | |
| padding-bottom: 0; | |
| } | |
| .comment { | |
| background: yellow; | |
| } | |
| .footnote, .footnoteref { | |
| font-size: 0.8em; | |
| } | |
| span.footnote, span.footnoteref { | |
| vertical-align: super; | |
| } | |
| #footnotes { | |
| margin: 20px 0 20px 0; | |
| padding: 7px 0 0 0; | |
| } | |
| #footnotes div.footnote { | |
| margin: 0 0 5px 0; | |
| } | |
| #footnotes hr { | |
| border: none; | |
| border-top: 1px solid silver; | |
| height: 1px; | |
| text-align: left; | |
| margin-left: 0; | |
| width: 20%; | |
| min-width: 100px; | |
| } | |
| div.colist td { | |
| padding-right: 0.5em; | |
| padding-bottom: 0.3em; | |
| vertical-align: top; | |
| } | |
| div.colist td img { | |
| margin-top: 0.3em; | |
| } | |
| @media print { | |
| #footer-badges { display: none; } | |
| } | |
| #toc { | |
| margin-bottom: 2.5em; | |
| } | |
| #toctitle { | |
| color: #527bbd; | |
| font-size: 1.1em; | |
| font-weight: bold; | |
| margin-top: 1.0em; | |
| margin-bottom: 0.1em; | |
| } | |
| div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 { | |
| margin-top: 0; | |
| margin-bottom: 0; | |
| } | |
| div.toclevel2 { | |
| margin-left: 2em; | |
| font-size: 0.9em; | |
| } | |
| div.toclevel3 { | |
| margin-left: 4em; | |
| font-size: 0.9em; | |
| } | |
| div.toclevel4 { | |
| margin-left: 6em; | |
| font-size: 0.9em; | |
| } | |
| span.aqua { color: aqua; } | |
| span.black { color: black; } | |
| span.blue { color: blue; } | |
| span.fuchsia { color: fuchsia; } | |
| span.gray { color: gray; } | |
| span.green { color: green; } | |
| span.lime { color: lime; } | |
| span.maroon { color: maroon; } | |
| span.navy { color: navy; } | |
| span.olive { color: olive; } | |
| span.purple { color: purple; } | |
| span.red { color: red; } | |
| span.silver { color: silver; } | |
| span.teal { color: teal; } | |
| span.white { color: white; } | |
| span.yellow { color: yellow; } | |
| span.aqua-background { background: aqua; } | |
| span.black-background { background: black; } | |
| span.blue-background { background: blue; } | |
| span.fuchsia-background { background: fuchsia; } | |
| span.gray-background { background: gray; } | |
| span.green-background { background: green; } | |
| span.lime-background { background: lime; } | |
| span.maroon-background { background: maroon; } | |
| span.navy-background { background: navy; } | |
| span.olive-background { background: olive; } | |
| span.purple-background { background: purple; } | |
| span.red-background { background: red; } | |
| span.silver-background { background: silver; } | |
| span.teal-background { background: teal; } | |
| span.white-background { background: white; } | |
| span.yellow-background { background: yellow; } | |
| span.big { font-size: 2em; } | |
| span.small { font-size: 0.6em; } | |
| span.underline { text-decoration: underline; } | |
| span.overline { text-decoration: overline; } | |
| span.line-through { text-decoration: line-through; } | |
| div.unbreakable { page-break-inside: avoid; } | |
| /* | |
| * xhtml11 specific | |
| * | |
| * */ | |
| div.tableblock { | |
| margin-top: 1.0em; | |
| margin-bottom: 1.5em; | |
| } | |
| div.tableblock > table { | |
| border: 3px solid #527bbd; | |
| } | |
| thead, p.table.header { | |
| font-weight: bold; | |
| color: #527bbd; | |
| } | |
| p.table { | |
| margin-top: 0; | |
| } | |
| /* Because the table frame attribute is overriden by CSS in most browsers. */ | |
| div.tableblock > table[frame="void"] { | |
| border-style: none; | |
| } | |
| div.tableblock > table[frame="hsides"] { | |
| border-left-style: none; | |
| border-right-style: none; | |
| } | |
| div.tableblock > table[frame="vsides"] { | |
| border-top-style: none; | |
| border-bottom-style: none; | |
| } | |
| /* | |
| * html5 specific | |
| * | |
| * */ | |
| table.tableblock { | |
| margin-top: 1.0em; | |
| margin-bottom: 1.5em; | |
| } | |
| thead, p.tableblock.header { | |
| font-weight: bold; | |
| color: #527bbd; | |
| } | |
| p.tableblock { | |
| margin-top: 0; | |
| } | |
| table.tableblock { | |
| border-width: 3px; | |
| border-spacing: 0px; | |
| border-style: solid; | |
| border-color: #527bbd; | |
| border-collapse: collapse; | |
| } | |
| th.tableblock, td.tableblock { | |
| border-width: 1px; | |
| padding: 4px; | |
| border-style: solid; | |
| border-color: #527bbd; | |
| } | |
| table.tableblock.frame-topbot { | |
| border-left-style: hidden; | |
| border-right-style: hidden; | |
| } | |
| table.tableblock.frame-sides { | |
| border-top-style: hidden; | |
| border-bottom-style: hidden; | |
| } | |
| table.tableblock.frame-none { | |
| border-style: hidden; | |
| } | |
| th.tableblock.halign-left, td.tableblock.halign-left { | |
| text-align: left; | |
| } | |
| th.tableblock.halign-center, td.tableblock.halign-center { | |
| text-align: center; | |
| } | |
| th.tableblock.halign-right, td.tableblock.halign-right { | |
| text-align: right; | |
| } | |
| th.tableblock.valign-top, td.tableblock.valign-top { | |
| vertical-align: top; | |
| } | |
| th.tableblock.valign-middle, td.tableblock.valign-middle { | |
| vertical-align: middle; | |
| } | |
| th.tableblock.valign-bottom, td.tableblock.valign-bottom { | |
| vertical-align: bottom; | |
| } | |
| /* | |
| * manpage specific | |
| * | |
| * */ | |
| body.manpage h1 { | |
| padding-top: 0.5em; | |
| padding-bottom: 0.5em; | |
| border-top: 2px solid silver; | |
| border-bottom: 2px solid silver; | |
| } | |
| body.manpage h2 { | |
| border-style: none; | |
| } | |
| body.manpage div.sectionbody { | |
| margin-left: 3em; | |
| } | |
| @media print { | |
| body.manpage div#toc { display: none; } | |
| } | |
| </style> | |
| <script type="text/javascript"> | |
| /*<+'])'); | |
| // Function that scans the DOM tree for header elements (the DOM2 | |
| // nodeIterator API would be a better technique but not supported by all | |
| // browsers). | |
| var iterate = function (el) { | |
| for (var i = el.firstChild; i != null; i = i.nextSibling) { | |
| if (i.nodeType == 1 /* Node.ELEMENT_NODE */) { | |
| var mo = re.exec(i.tagName); | |
| if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") { | |
| result[result.length] = new TocEntry(i, getText(i), mo[1]-1); | |
| } | |
| iterate(i); | |
| } | |
| } | |
| } | |
| iterate(el); | |
| return result; | |
| } | |
| var toc = document.getElementById("toc"); | |
| if (!toc) { | |
| return; | |
| } | |
| // Delete existing TOC entries in case we're reloading the TOC. | |
| var tocEntriesToRemove = []; | |
| var i; | |
| for (i = 0; i < toc.childNodes.length; i++) { | |
| var entry = toc.childNodes[i]; | |
| if (entry.nodeName.toLowerCase() == 'div' | |
| && entry.getAttribute("class") | |
| && entry.getAttribute("class").match(/^toclevel/)) | |
| tocEntriesToRemove.push(entry); | |
| } | |
| for (i = 0; i < tocEntriesToRemove.length; i++) { | |
| toc.removeChild(tocEntriesToRemove[i]); | |
| } | |
| // Rebuild TOC entries. | |
| var entries = tocEntries(document.getElementById("content"), toclevels); | |
| for (var i = 0; i < entries.length; ++i) { | |
| var entry = entries[i]; | |
| if (entry.element.id == "") | |
| entry.element.id = "_toc_" + i; | |
| var a = document.createElement("a"); | |
| a.href = "#" + entry.element.id; | |
| a.appendChild(document.createTextNode(entry.text)); | |
| var div = document.createElement("div"); | |
| div.appendChild(a); | |
| div.className = "toclevel" + entry.toclevel; | |
| toc.appendChild(div); | |
| } | |
| if (entries.length == 0) | |
| toc.parentNode.removeChild(toc); | |
| }, | |
| ///////////////////////////////////////////////////////////////////// | |
| // Footnotes generator | |
| ///////////////////////////////////////////////////////////////////// | |
| /* Based on footnote generation code from: | |
| * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html | |
| */ | |
| footnotes: function () { | |
| // Delete existing footnote entries in case we're reloading the footnodes. | |
| var i; | |
| var noteholder = document.getElementById("footnotes"); | |
| if (!noteholder) { | |
| return; | |
| } | |
| var entriesToRemove = []; | |
| for (i = 0; i < noteholder.childNodes.length; i++) { | |
| var entry = noteholder.childNodes[i]; | |
| if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote") | |
| entriesToRemove.push(entry); | |
| } | |
| for (i = 0; i < entriesToRemove.length; i++) { | |
| noteholder.removeChild(entriesToRemove[i]); | |
| } | |
| // Rebuild footnote entries. | |
| var cont = document.getElementById("content"); | |
| var spans = cont.getElementsByTagName("span"); | |
| var refs = {}; | |
| var n = 0; | |
| for (i=0; i<spans.length; i++) { | |
| if (spans[i].className == "footnote") { | |
| n++; | |
| var note = spans[i].getAttribute("data-note"); | |
| if (!note) { | |
| // Use [\s\S] in place of . so multi-line matches work. | |
| // Because JavaScript has no s (dotall) regex flag. | |
| note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1]; | |
| spans[i].innerHTML = | |
| "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n + | |
| "' title='View footnote' class='footnote'>" + n + "</a>]"; | |
| spans[i].setAttribute("data-note", note); | |
| } | |
| noteholder.innerHTML += | |
| "<div class='footnote' id='_footnote_" + n + "'>" + | |
| "<a href='#_footnoteref_" + n + "' title='Return to text'>" + | |
| n + "</a>. " + note + "</div>"; | |
| var id =spans[i].getAttribute("id"); | |
| if (id != null) refs["#"+id] = n; | |
| } | |
| } | |
| if (n == 0) | |
| noteholder.parentNode.removeChild(noteholder); | |
| else { | |
| // Process footnoterefs. | |
| for (i=0; i<spans.length; i++) { | |
| if (spans[i].className == "footnoteref") { | |
| var href = spans[i].getElementsByTagName("a")[0].getAttribute("href"); | |
| href = href.match(/#.*/)[0]; // Because IE return full URL. | |
| n = refs[href]; | |
| spans[i].innerHTML = | |
| "[<a href='#_footnote_" + n + | |
| "' title='View footnote' class='footnote'>" + n + "</a>]"; | |
| } | |
| } | |
| } | |
| }, | |
| install: function(toclevels) { | |
| var timerId; | |
| function reinstall() { | |
| asciidoc.footnotes(); | |
| if (toclevels) { | |
| asciidoc.toc(toclevels); | |
| } | |
| } | |
| function reinstallAndRemoveTimer() { | |
| clearInterval(timerId); | |
| reinstall(); | |
| } | |
| timerId = setInterval(reinstall, 500); | |
| if (document.addEventListener) | |
| document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false); | |
| else | |
| window.onload = reinstallAndRemoveTimer; | |
| } | |
| } | |
| asciidoc.install(); | |
| /*]]>*/ | |
| </script> | |
| </head> | |
| <body class="article"> | |
| <div id="header"> | |
| <h1>Concerning Git’s Packing Heuristics</h1> | |
| </div> | |
| <div id="content"> | |
| <div id="preamble"> | |
| <div class="sectionbody"> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Oh, here's a really stupid question:</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code> Where do I go | |
| to learn the details | |
| of Git's packing heuristics?</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Be careful what you ask!</p></div> | |
| <div class="paragraph"><p>Followers of the Git, please open the Git IRC Log and turn to | |
| February 10, 2006.</p></div> | |
| <div class="paragraph"><p>It’s a rare occasion, and we are joined by the King Git Himself, | |
| Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor | |
| and seeks enlightenment. Others are present, but silent.</p></div> | |
| <div class="paragraph"><p>Let’s listen in!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Oh, here's a really stupid question -- where do I go to | |
| learn the details of Git's packing heuristics? google avails | |
| me not, reading the source didn't help a lot, and wading | |
| through the whole mailing list seems less efficient than any | |
| of that.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>It is a bold start! A plea for help combined with a simultaneous | |
| tri-part attack on some of the tried and true mainstays in the quest | |
| for enlightenment. Brash accusations of google being useless. Hubris! | |
| Maligning the source. Heresy! Disdain for the mailing list archives. | |
| Woe.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><pasky> yes, the packing-related delta stuff is somewhat | |
| mysterious even for me ;)</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Ah! Modesty after all.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> njs, I don't think the docs exist. That's something where | |
| I don't think anybody else than me even really got involved. | |
| Most of the rest of Git others have been busy with (especially | |
| Junio), but packing nobody touched after I did it.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>It’s cryptic, yet vague. Linus in style for sure. Wise men | |
| interpret this as an apology. A few argue it is merely a | |
| statement of fact.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> I guess the next step is "read the source again", but I | |
| have to build up a certain level of gumption first :-)</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Indeed! On both points.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> The packing heuristic is actually really really simple.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Bait…</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> But strange.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And switch. That ought to do it!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Remember: Git really doesn't follow files. So what it does is | |
| - generate a list of all objects | |
| - sort the list according to magic heuristics | |
| - walk the list, using a sliding window, seeing if an object | |
| can be diffed against another object in the window | |
| - write out the list in recency order</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>The traditional understatement:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> I suspect that what I'm missing is the precise definition of | |
| the word "magic"</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>The traditional insight:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><pasky> yes</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And Babel-like confusion flowed.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> oh, hmm, and I'm not sure what this sliding window means either</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><pasky> iirc, it appeared to me to be just the sha1 of the object | |
| when reading the code casually ...</code></pre> | |
| </div></div> | |
| <div class="olist lowerroman"><ol class="lowerroman"> | |
| <li> | |
| <p> | |
| which simply doesn’t sound as a very good heuristics, though ;) | |
| </p> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> .....and recency order. okay, I think it's clear I didn't | |
| even realize how much I wasn't realizing :-)</code></pre> | |
| </div></div> | |
| </li> | |
| </ol></div> | |
| <div class="paragraph"><p>Ah, grasshopper! And thus the enlightenment begins anew.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> The "magic" is actually in theory totally arbitrary. | |
| ANY order will give you a working pack, but no, it's not | |
| ordered by SHA-1.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Before talking about the ordering for the sliding delta | |
| window, let's talk about the recency order. That's more | |
| important in one way.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Right, but if all you want is a working way to pack things | |
| together, you could just use cat and save yourself some | |
| trouble...</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Waaait for it….</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> The recency ordering (which is basically: put objects | |
| _physically_ into the pack in the order that they are | |
| "reachable" from the head) is important.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> okay</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> It's important because that's the thing that gives packs | |
| good locality. It keeps the objects close to the head (whether | |
| they are old or new, but they are _reachable_ from the head) | |
| at the head of the pack. So packs actually have absolutely | |
| _wonderful_ IO patterns.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Read that again, because it is important.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> But recency ordering is totally useless for deciding how | |
| to actually generate the deltas, so the delta ordering is | |
| something else.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>The delta ordering is (wait for it): | |
| - first sort by the "basename" of the object, as defined by | |
| the name the object was _first_ reached through when | |
| generating the object list | |
| - within the same basename, sort by size of the object | |
| - but always sort different types separately (commits first).</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>That's not exactly it, but it's very close.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> The "_first_ reached" thing is not too important, just you | |
| need some way to break ties since the same objects may be | |
| reachable many ways, yes?</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And as if to clarify:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> The point is that it's all really just any random | |
| heuristic, and the ordering is totally unimportant for | |
| correctness, but it helps a lot if the heuristic gives | |
| "clumping" for things that are likely to delta well against | |
| each other.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>It is an important point, so secretly, I did my own research and have | |
| included my results below. To be fair, it has changed some over time. | |
| And through the magic of Revisionistic History, I draw upon this entry | |
| from The Git IRC Logs on my father’s birthday, March 1:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><gitster> The quote from the above linus should be rewritten a | |
| bit (wait for it): | |
| - first sort by type. Different objects never delta with | |
| each other. | |
| - then sort by filename/dirname. hash of the basename | |
| occupies the top BITS_PER_INT-DIR_BITS bits, and bottom | |
| DIR_BITS are for the hash of leading path elements. | |
| - then if we are doing "thin" pack, the objects we are _not_ | |
| going to pack but we know about are sorted earlier than | |
| other objects. | |
| - and finally sort by size, larger to smaller.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>In one swell-foop, clarification and obscurification! Nonetheless, | |
| authoritative. Cryptic, yet concise. It even solicits notions of | |
| quotes from The Source Code. Clearly, more study is needed.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><gitster> That's the sort order. What this means is: | |
| - we do not delta different object types. | |
| - we prefer to delta the objects with the same full path, but | |
| allow files with the same name from different directories. | |
| - we always prefer to delta against objects we are not going | |
| to send, if there are some. | |
| - we prefer to delta against larger objects, so that we have | |
| lots of removals.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>The penultimate rule is for "thin" packs. It is used when | |
| the other side is known to have such objects.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>There it is again. "Thin" packs. I’m thinking to myself, "What | |
| is a <em>thin</em> pack?" So I ask:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><jdl> What is a "thin" pack?</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><gitster> Use of --objects-edge to rev-list as the upstream of | |
| pack-objects. The pack transfer protocol negotiates that.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Woo hoo! Cleared that <em>right</em> up!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><gitster> There are two directions - push and fetch.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>There! Did you see it? It is not <em>"push" and "pull"</em>! How often the | |
| confusion has started here. So casually mentioned, too!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><gitster> For push, git-send-pack invokes git-receive-pack on the | |
| other end. The receive-pack says "I have up to these commits". | |
| send-pack looks at them, and computes what are missing from | |
| the other end. So "thin" could be the default there.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>In the other direction, fetch, git-fetch-pack and | |
| git-clone-pack invokes git-upload-pack on the other end | |
| (via ssh or by talking to the daemon).</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>There are two cases: fetch-pack with -k and clone-pack is one, | |
| fetch-pack without -k is the other. clone-pack and fetch-pack | |
| with -k will keep the downloaded packfile without expanded, so | |
| we do not use thin pack transfer. Otherwise, the generated | |
| pack will have delta without base object in the same pack.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>But fetch-pack without -k will explode the received pack into | |
| individual objects, so we automatically ask upload-pack to | |
| give us a thin pack if upload-pack supports it.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>OK then.</p></div> | |
| <div class="paragraph"><p>Uh.</p></div> | |
| <div class="paragraph"><p>Let’s return to the previous conversation still in progress.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> and "basename" means something like "the tail of end of | |
| path of file objects and dir objects, as per basename(3), and | |
| we just declare all commit and tag objects to have the same | |
| basename" or something?</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Luckily, that too is a point that gitster clarified for us!</p></div> | |
| <div class="paragraph"><p>If I might add, the trick is to make files that <em>might</em> be similar be | |
| located close to each other in the hash buckets based on their file | |
| names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and | |
| "Makefile" all landed in the same bucket due to their common basename, | |
| "Makefile". However, now they land in "close" buckets.</p></div> | |
| <div class="paragraph"><p>The algorithm allows not just for the <em>same</em> bucket, but for <em>close</em> | |
| buckets to be considered delta candidates. The rationale is | |
| essentially that files, like Makefiles, often have very similar | |
| content no matter what directory they live in.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> I played around with different delta algorithms, and with | |
| making the "delta window" bigger, but having too big of a | |
| sliding window makes it very expensive to generate the pack: | |
| you need to compare every object with a _ton_ of other objects.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>There are a number of other trivial heuristics too, which | |
| basically boil down to "don't bother even trying to delta this | |
| pair" if we can tell before-hand that the delta isn't worth it | |
| (due to size differences, where we can take a previous delta | |
| result into account to decide that "ok, no point in trying | |
| that one, it will be worse").</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>End result: packing is actually very size efficient. It's | |
| somewhat CPU-wasteful, but on the other hand, since you're | |
| really only supposed to do it maybe once a month (and you can | |
| do it during the night), nobody really seems to care.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Nice Engineering Touch, there. Find when it doesn’t matter, and | |
| proclaim it a non-issue. Good style too!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> So, just to repeat to see if I'm following, we start by | |
| getting a list of the objects we want to pack, we sort it by | |
| this heuristic (basically lexicographically on the tuple | |
| (type, basename, size)).</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Then we walk through this list, and calculate a delta of | |
| each object against the last n (tunable parameter) objects, | |
| and pick the smallest of these deltas.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Vastly simplified, but the essence is there!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Correct.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> And then once we have picked a delta or fulltext to | |
| represent each object, we re-sort by recency, and write them | |
| out in that order.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Yup. Some other small details:</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And of course there is the "Other Shoe" Factor too.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> - We limit the delta depth to another magic value (right | |
| now both the window and delta depth magic values are just "10")</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Hrm, my intuition is that you'd end up with really _bad_ IO | |
| patterns, because the things you want are near by, but to | |
| actually reconstruct them you may have to jump all over in | |
| random ways.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> - When we write out a delta, and we haven't yet written | |
| out the object it is a delta against, we write out the base | |
| object first. And no, when we reconstruct them, we actually | |
| get nice IO patterns, because: | |
| - larger objects tend to be "more recent" (Linus' law: files grow) | |
| - we actively try to generate deltas from a larger object to a | |
| smaller one | |
| - this means that the top-of-tree very seldom has deltas | |
| (i.e. deltas in _practice_ are "backwards deltas")</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Again, we should reread that whole paragraph. Not just because | |
| Linus has slipped Linus’s Law in there on us, but because it is | |
| important. Let’s make sure we clarify some of the points here:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> So the point is just that in practice, delta order and | |
| recency order match each other quite well.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Yes. There's another nice side to this (and yes, it was | |
| designed that way ;): | |
| - the reason we generate deltas against the larger object is | |
| actually a big space saver too!</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Hmm, but your last comment (if "we haven't yet written out | |
| the object it is a delta against, we write out the base object | |
| first"), seems like it would make these facts mostly | |
| irrelevant because even if in practice you would not have to | |
| wander around much, in fact you just brute-force say that in | |
| the cases where you might have to wander, don't do that :-)</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Yes and no. Notice the rule: we only write out the base | |
| object first if the delta against it was more recent. That | |
| means that you can actually have deltas that refer to a base | |
| object that is _not_ close to the delta object, but that only | |
| happens when the delta is needed to generate an _old_ object.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> See?</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Yeah, no. I missed that on the first two or three readings myself.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> This keeps the front of the pack dense. The front of the | |
| pack never contains data that isn't relevant to a "recent" | |
| object. The size optimization comes from our use of xdelta | |
| (but is true for many other delta algorithms): removing data | |
| is cheaper (in size) than adding data.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>When you remove data, you only need to say "copy bytes n--m". | |
| In contrast, in a delta that _adds_ data, you have to say "add | |
| these bytes: 'actual data goes here'"</code></pre> | |
| </div></div> | |
| <div class="ulist"><ul> | |
| <li> | |
| <p> | |
| njs` has quit: Read error: 104 (Connection reset by peer) | |
| </p> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Uhhuh. I hope I didn't blow njs` mind.</code></pre> | |
| </div></div> | |
| </li> | |
| <li> | |
| <p> | |
| njs` has joined channel #git | |
| </p> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><pasky> :)</code></pre> | |
| </div></div> | |
| </li> | |
| </ul></div> | |
| <div class="paragraph"><p>The silent observers are amused. Of course.</p></div> | |
| <div class="paragraph"><p>And as if njs` was expected to be omniscient:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> njs - did you miss anything?</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>OK, I’ll spell it out. That’s Geek Humor. If njs` was not actually | |
| connected for a little bit there, how would he know if missed anything | |
| while he was disconnected? He’s a benevolent dictator with a sense of | |
| humor! Well noted!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Stupid router. Or gremlins, or whatever.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>It’s a cheap shot at Cisco. Take 'em when you can.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Yes and no. Notice the rule: we only write out the base | |
| object first if the delta against it was more recent.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>I'm getting lost in all these orders, let me re-read :-) | |
| So the write-out order is from most recent to least recent? | |
| (Conceivably it could be the opposite way too, I'm not sure if | |
| we've said) though my connection back at home is logging, so I | |
| can just read what you said there :-)</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And for those of you paying attention, the Omniscient Trick has just | |
| been detailed!</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Yes, we always write out most recent first</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> And, yeah, I got the part about deeper-in-history stuff | |
| having worse IO characteristics, one sort of doesn't care.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> With the caveat that if the "most recent" needs an older | |
| object to delta against (hey, shrinking sometimes does | |
| happen), we write out the old object with the delta.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> (if only it happened more...)</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Anyway, the pack-file could easily be denser still, but | |
| because it's used both for streaming (the Git protocol) and | |
| for on-disk, it has a few pessimizations.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Actually, it is a made-up word. But it is a made-up word being | |
| used as setup for a later optimization, which is a real word:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> In particular, while the pack-file is then compressed, | |
| it's compressed just one object at a time, so the actual | |
| compression factor is less than it could be in theory. But it | |
| means that it's all nice random-access with a simple index to | |
| do "object name->location in packfile" translation.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> I'm assuming the real win for delta-ing large->small is | |
| more homogeneous statistics for gzip to run over?</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>(You have to put the bytes in one place or another, but | |
| putting them in a larger blob wins on compression)</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Actually, what is the compression strategy -- each delta | |
| individually gzipped, the whole file gzipped, somewhere in | |
| between, no compression at all, ....?</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Right.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Reality IRC sets in. For example:</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><pasky> I'll read the rest in the morning, I really have to go | |
| sleep or there's no hope whatsoever for me at the today's | |
| exam... g'nite all.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Heh.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> pasky: g'nite</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> pasky: 'luck</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Right: large->small matters exactly because of compression | |
| behaviour. If it was non-compressed, it probably wouldn't make | |
| any difference.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> yeah</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Anyway: I'm not even trying to claim that the pack-files | |
| are perfect, but they do tend to have a nice balance of | |
| density vs ease-of use.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Gasp! OK, saved. That’s a fair Engineering trade off. Close call! | |
| In fact, Linus reflects on some Basic Engineering Fundamentals, | |
| design options, etc.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> More importantly, they allow Git to still _conceptually_ | |
| never deal with deltas at all, and be a "whole object" store.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Which has some problems (we discussed bad huge-file | |
| behaviour on the Git lists the other day), but it does mean | |
| that the basic Git concepts are really really simple and | |
| straightforward.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>It's all been quite stable.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Which I think is very much a result of having very simple | |
| basic ideas, so that there's never any confusion about what's | |
| going on.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code>Bugs happen, but they are "simple" bugs. And bugs that | |
| actually get some object store detail wrong are almost always | |
| so obvious that they never go anywhere.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> Yeah.</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>Nuff said.</p></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><linus> Anyway. I'm off for bed. It's not 6AM here, but I've got | |
| three kids, and have to get up early in the morning to send | |
| them off. I need my beauty sleep.</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> :-)</code></pre> | |
| </div></div> | |
| <div class="literalblock"> | |
| <div class="content"> | |
| <pre><code><njs`> appreciate the infodump, I really was failing to find the | |
| details on Git packs :-)</code></pre> | |
| </div></div> | |
| <div class="paragraph"><p>And now you know the rest of the story.</p></div> | |
| </div> | |
| </div> | |
| </div> | |
| <div id="footnotes"><hr /></div> | |
| <div id="footer"> | |
| <div id="footer-text"> | |
| Last updated | |
| 2018-01-26 15:11:04 PST | |
| </div> | |
| </div> | |
| </body> | |
| </html> |