Junio C Hamano | 3dac504 | 2007-12-15 08:40:54 | [diff] [blame] | 1 | Concerning Git's Packing Heuristics |
| 2 | =================================== |
| 3 | |
| 4 | Oh, here's a really stupid question: |
| 5 | |
| 6 | Where do I go |
| 7 | to learn the details |
| 8 | of git's packing heuristics? |
| 9 | |
| 10 | Be careful what you ask! |
| 11 | |
| 12 | Followers of the git, please open the git IRC Log and turn to |
| 13 | February 10, 2006. |
| 14 | |
| 15 | It's a rare occasion, and we are joined by the King Git Himself, |
| 16 | Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor |
| 17 | and seeks enlightenment. Others are present, but silent. |
| 18 | |
| 19 | Let's listen in! |
| 20 | |
| 21 | <njs`> Oh, here's a really stupid question -- where do I go to |
| 22 | learn the details of git's packing heuristics? google avails |
| 23 | me not, reading the source didn't help a lot, and wading |
| 24 | through the whole mailing list seems less efficient than any |
| 25 | of that. |
| 26 | |
| 27 | It is a bold start! A plea for help combined with a simultaneous |
| 28 | tri-part attack on some of the tried and true mainstays in the quest |
| 29 | for enlightenment. Brash accusations of google being useless. Hubris! |
| 30 | Maligning the source. Heresy! Disdain for the mailing list archives. |
| 31 | Woe. |
| 32 | |
| 33 | <pasky> yes, the packing-related delta stuff is somewhat |
| 34 | mysterious even for me ;) |
| 35 | |
| 36 | Ah! Modesty after all. |
| 37 | |
| 38 | <linus> njs, I don't think the docs exist. That's something where |
| 39 | I don't think anybody else than me even really got involved. |
| 40 | Most of the rest of git others have been busy with (especially |
| 41 | Junio), but packing nobody touched after I did it. |
| 42 | |
| 43 | It's cryptic, yet vague. Linus in style for sure. Wise men |
| 44 | interpret this as an apology. A few argue it is merely a |
| 45 | statement of fact. |
| 46 | |
| 47 | <njs`> I guess the next step is "read the source again", but I |
| 48 | have to build up a certain level of gumption first :-) |
| 49 | |
| 50 | Indeed! On both points. |
| 51 | |
| 52 | <linus> The packing heuristic is actually really really simple. |
| 53 | |
| 54 | Bait... |
| 55 | |
| 56 | <linus> But strange. |
| 57 | |
| 58 | And switch. That ought to do it! |
| 59 | |
| 60 | <linus> Remember: git really doesn't follow files. So what it does is |
| 61 | - generate a list of all objects |
| 62 | - sort the list according to magic heuristics |
| 63 | - walk the list, using a sliding window, seeing if an object |
| 64 | can be diffed against another object in the window |
| 65 | - write out the list in recency order |
| 66 | |
| 67 | The traditional understatement: |
| 68 | |
| 69 | <njs`> I suspect that what I'm missing is the precise definition of |
| 70 | the word "magic" |
| 71 | |
| 72 | The traditional insight: |
| 73 | |
| 74 | <pasky> yes |
| 75 | |
| 76 | And Babel-like confusion flowed. |
| 77 | |
| 78 | <njs`> oh, hmm, and I'm not sure what this sliding window means either |
| 79 | |
| 80 | <pasky> iirc, it appeared to me to be just the sha1 of the object |
| 81 | when reading the code casually ... |
| 82 | |
| 83 | ... which simply doesn't sound as a very good heuristics, though ;) |
| 84 | |
| 85 | <njs`> .....and recency order. okay, I think it's clear I didn't |
| 86 | even realize how much I wasn't realizing :-) |
| 87 | |
| 88 | Ah, grasshopper! And thus the enlightenment begins anew. |
| 89 | |
| 90 | <linus> The "magic" is actually in theory totally arbitrary. |
| 91 | ANY order will give you a working pack, but no, it's not |
| 92 | ordered by SHA1. |
| 93 | |
| 94 | Before talking about the ordering for the sliding delta |
| 95 | window, let's talk about the recency order. That's more |
| 96 | important in one way. |
| 97 | |
| 98 | <njs`> Right, but if all you want is a working way to pack things |
| 99 | together, you could just use cat and save yourself some |
| 100 | trouble... |
| 101 | |
| 102 | Waaait for it.... |
| 103 | |
| 104 | <linus> The recency ordering (which is basically: put objects |
| 105 | _physically_ into the pack in the order that they are |
| 106 | "reachable" from the head) is important. |
| 107 | |
| 108 | <njs`> okay |
| 109 | |
| 110 | <linus> It's important because that's the thing that gives packs |
| 111 | good locality. It keeps the objects close to the head (whether |
| 112 | they are old or new, but they are _reachable_ from the head) |
| 113 | at the head of the pack. So packs actually have absolutely |
| 114 | _wonderful_ IO patterns. |
| 115 | |
| 116 | Read that again, because it is important. |
| 117 | |
| 118 | <linus> But recency ordering is totally useless for deciding how |
| 119 | to actually generate the deltas, so the delta ordering is |
| 120 | something else. |
| 121 | |
| 122 | The delta ordering is (wait for it): |
| 123 | - first sort by the "basename" of the object, as defined by |
| 124 | the name the object was _first_ reached through when |
| 125 | generating the object list |
| 126 | - within the same basename, sort by size of the object |
| 127 | - but always sort different types separately (commits first). |
| 128 | |
| 129 | That's not exactly it, but it's very close. |
| 130 | |
| 131 | <njs`> The "_first_ reached" thing is not too important, just you |
| 132 | need some way to break ties since the same objects may be |
| 133 | reachable many ways, yes? |
| 134 | |
| 135 | And as if to clarify: |
| 136 | |
| 137 | <linus> The point is that it's all really just any random |
| 138 | heuristic, and the ordering is totally unimportant for |
| 139 | correctness, but it helps a lot if the heuristic gives |
| 140 | "clumping" for things that are likely to delta well against |
| 141 | each other. |
| 142 | |
| 143 | It is an important point, so secretly, I did my own research and have |
| 144 | included my results below. To be fair, it has changed some over time. |
| 145 | And through the magic of Revisionistic History, I draw upon this entry |
| 146 | from The Git IRC Logs on my father's birthday, March 1: |
| 147 | |
| 148 | <gitster> The quote from the above linus should be rewritten a |
| 149 | bit (wait for it): |
| 150 | - first sort by type. Different objects never delta with |
| 151 | each other. |
| 152 | - then sort by filename/dirname. hash of the basename |
| 153 | occupies the top BITS_PER_INT-DIR_BITS bits, and bottom |
| 154 | DIR_BITS are for the hash of leading path elements. |
| 155 | - then if we are doing "thin" pack, the objects we are _not_ |
| 156 | going to pack but we know about are sorted earlier than |
| 157 | other objects. |
| 158 | - and finally sort by size, larger to smaller. |
| 159 | |
| 160 | In one swell-foop, clarification and obscurification! Nonetheless, |
| 161 | authoritative. Cryptic, yet concise. It even solicits notions of |
| 162 | quotes from The Source Code. Clearly, more study is needed. |
| 163 | |
| 164 | <gitster> That's the sort order. What this means is: |
| 165 | - we do not delta different object types. |
| 166 | - we prefer to delta the objects with the same full path, but |
| 167 | allow files with the same name from different directories. |
| 168 | - we always prefer to delta against objects we are not going |
| 169 | to send, if there are some. |
| 170 | - we prefer to delta against larger objects, so that we have |
| 171 | lots of removals. |
| 172 | |
| 173 | The penultimate rule is for "thin" packs. It is used when |
| 174 | the other side is known to have such objects. |
| 175 | |
| 176 | There it is again. "Thin" packs. I'm thinking to myself, "What |
| 177 | is a 'thin' pack?" So I ask: |
| 178 | |
| 179 | <jdl> What is a "thin" pack? |
| 180 | |
| 181 | <gitster> Use of --objects-edge to rev-list as the upstream of |
| 182 | pack-objects. The pack transfer protocol negotiates that. |
| 183 | |
| 184 | Woo hoo! Cleared that _right_ up! |
| 185 | |
| 186 | <gitster> There are two directions - push and fetch. |
| 187 | |
| 188 | There! Did you see it? It is not '"push" and "pull"'! How often the |
| 189 | confusion has started here. So casually mentioned, too! |
| 190 | |
| 191 | <gitster> For push, git-send-pack invokes git-receive-pack on the |
| 192 | other end. The receive-pack says "I have up to these commits". |
| 193 | send-pack looks at them, and computes what are missing from |
| 194 | the other end. So "thin" could be the default there. |
| 195 | |
| 196 | In the other direction, fetch, git-fetch-pack and |
| 197 | git-clone-pack invokes git-upload-pack on the other end |
| 198 | (via ssh or by talking to the daemon). |
| 199 | |
| 200 | There are two cases: fetch-pack with -k and clone-pack is one, |
| 201 | fetch-pack without -k is the other. clone-pack and fetch-pack |
| 202 | with -k will keep the downloaded packfile without expanded, so |
| 203 | we do not use thin pack transfer. Otherwise, the generated |
| 204 | pack will have delta without base object in the same pack. |
| 205 | |
| 206 | But fetch-pack without -k will explode the received pack into |
| 207 | individual objects, so we automatically ask upload-pack to |
| 208 | give us a thin pack if upload-pack supports it. |
| 209 | |
| 210 | OK then. |
| 211 | |
| 212 | Uh. |
| 213 | |
| 214 | Let's return to the previous conversation still in progress. |
| 215 | |
| 216 | <njs`> and "basename" means something like "the tail of end of |
| 217 | path of file objects and dir objects, as per basename(3), and |
| 218 | we just declare all commit and tag objects to have the same |
| 219 | basename" or something? |
| 220 | |
| 221 | Luckily, that too is a point that gitster clarified for us! |
| 222 | |
| 223 | If I might add, the trick is to make files that _might_ be similar be |
| 224 | located close to each other in the hash buckets based on their file |
| 225 | names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and |
| 226 | "Makefile" all landed in the same bucket due to their common basename, |
| 227 | "Makefile". However, now they land in "close" buckets. |
| 228 | |
| 229 | The algorithm allows not just for the _same_ bucket, but for _close_ |
| 230 | buckets to be considered delta candidates. The rationale is |
| 231 | essentially that files, like Makefiles, often have very similar |
| 232 | content no matter what directory they live in. |
| 233 | |
| 234 | <linus> I played around with different delta algorithms, and with |
| 235 | making the "delta window" bigger, but having too big of a |
| 236 | sliding window makes it very expensive to generate the pack: |
| 237 | you need to compare every object with a _ton_ of other objects. |
| 238 | |
| 239 | There are a number of other trivial heuristics too, which |
| 240 | basically boil down to "don't bother even trying to delta this |
| 241 | pair" if we can tell before-hand that the delta isn't worth it |
| 242 | (due to size differences, where we can take a previous delta |
| 243 | result into account to decide that "ok, no point in trying |
| 244 | that one, it will be worse"). |
| 245 | |
| 246 | End result: packing is actually very size efficient. It's |
| 247 | somewhat CPU-wasteful, but on the other hand, since you're |
| 248 | really only supposed to do it maybe once a month (and you can |
| 249 | do it during the night), nobody really seems to care. |
| 250 | |
| 251 | Nice Engineering Touch, there. Find when it doesn't matter, and |
| 252 | proclaim it a non-issue. Good style too! |
| 253 | |
| 254 | <njs`> So, just to repeat to see if I'm following, we start by |
| 255 | getting a list of the objects we want to pack, we sort it by |
| 256 | this heuristic (basically lexicographically on the tuple |
| 257 | (type, basename, size)). |
| 258 | |
| 259 | Then we walk through this list, and calculate a delta of |
| 260 | each object against the last n (tunable parameter) objects, |
| 261 | and pick the smallest of these deltas. |
| 262 | |
| 263 | Vastly simplified, but the essence is there! |
| 264 | |
| 265 | <linus> Correct. |
| 266 | |
| 267 | <njs`> And then once we have picked a delta or fulltext to |
| 268 | represent each object, we re-sort by recency, and write them |
| 269 | out in that order. |
| 270 | |
| 271 | <linus> Yup. Some other small details: |
| 272 | |
| 273 | And of course there is the "Other Shoe" Factor too. |
| 274 | |
| 275 | <linus> - We limit the delta depth to another magic value (right |
| 276 | now both the window and delta depth magic values are just "10") |
| 277 | |
| 278 | <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO |
| 279 | patterns, because the things you want are near by, but to |
| 280 | actually reconstruct them you may have to jump all over in |
| 281 | random ways. |
| 282 | |
| 283 | <linus> - When we write out a delta, and we haven't yet written |
| 284 | out the object it is a delta against, we write out the base |
| 285 | object first. And no, when we reconstruct them, we actually |
| 286 | get nice IO patterns, because: |
| 287 | - larger objects tend to be "more recent" (Linus' law: files grow) |
| 288 | - we actively try to generate deltas from a larger object to a |
| 289 | smaller one |
| 290 | - this means that the top-of-tree very seldom has deltas |
| 291 | (i.e. deltas in _practice_ are "backwards deltas") |
| 292 | |
| 293 | Again, we should reread that whole paragraph. Not just because |
| 294 | Linus has slipped Linus's Law in there on us, but because it is |
| 295 | important. Let's make sure we clarify some of the points here: |
| 296 | |
| 297 | <njs`> So the point is just that in practice, delta order and |
| 298 | recency order match each other quite well. |
| 299 | |
| 300 | <linus> Yes. There's another nice side to this (and yes, it was |
| 301 | designed that way ;): |
| 302 | - the reason we generate deltas against the larger object is |
| 303 | actually a big space saver too! |
| 304 | |
| 305 | <njs`> Hmm, but your last comment (if "we haven't yet written out |
| 306 | the object it is a delta against, we write out the base object |
| 307 | first"), seems like it would make these facts mostly |
| 308 | irrelevant because even if in practice you would not have to |
| 309 | wander around much, in fact you just brute-force say that in |
| 310 | the cases where you might have to wander, don't do that :-) |
| 311 | |
| 312 | <linus> Yes and no. Notice the rule: we only write out the base |
| 313 | object first if the delta against it was more recent. That |
| 314 | means that you can actually have deltas that refer to a base |
| 315 | object that is _not_ close to the delta object, but that only |
| 316 | happens when the delta is needed to generate an _old_ object. |
| 317 | |
| 318 | <linus> See? |
| 319 | |
| 320 | Yeah, no. I missed that on the first two or three readings myself. |
| 321 | |
| 322 | <linus> This keeps the front of the pack dense. The front of the |
| 323 | pack never contains data that isn't relevant to a "recent" |
| 324 | object. The size optimization comes from our use of xdelta |
| 325 | (but is true for many other delta algorithms): removing data |
| 326 | is cheaper (in size) than adding data. |
| 327 | |
| 328 | When you remove data, you only need to say "copy bytes n--m". |
| 329 | In contrast, in a delta that _adds_ data, you have to say "add |
| 330 | these bytes: 'actual data goes here'" |
| 331 | |
| 332 | *** njs` has quit: Read error: 104 (Connection reset by peer) |
| 333 | |
| 334 | <linus> Uhhuh. I hope I didn't blow njs` mind. |
| 335 | |
| 336 | *** njs` has joined channel #git |
| 337 | |
| 338 | <pasky> :) |
| 339 | |
| 340 | The silent observers are amused. Of course. |
| 341 | |
| 342 | And as if njs` was expected to be omniscient: |
| 343 | |
| 344 | <linus> njs - did you miss anything? |
| 345 | |
| 346 | OK, I'll spell it out. That's Geek Humor. If njs` was not actually |
| 347 | connected for a little bit there, how would he know if missed anything |
| 348 | while he was disconnected? He's a benevolent dictator with a sense of |
| 349 | humor! Well noted! |
| 350 | |
| 351 | <njs`> Stupid router. Or gremlins, or whatever. |
| 352 | |
| 353 | It's a cheap shot at Cisco. Take 'em when you can. |
| 354 | |
| 355 | <njs`> Yes and no. Notice the rule: we only write out the base |
| 356 | object first if the delta against it was more recent. |
| 357 | |
| 358 | I'm getting lost in all these orders, let me re-read :-) |
| 359 | So the write-out order is from most recent to least recent? |
| 360 | (Conceivably it could be the opposite way too, I'm not sure if |
| 361 | we've said) though my connection back at home is logging, so I |
| 362 | can just read what you said there :-) |
| 363 | |
| 364 | And for those of you paying attention, the Omniscient Trick has just |
| 365 | been detailed! |
| 366 | |
| 367 | <linus> Yes, we always write out most recent first |
| 368 | |
| 369 | For the other record: |
| 370 | |
| 371 | <pasky> njs`: http://pastebin.com/547965 |
| 372 | |
| 373 | The 'net never forgets, so that should be good until the end of time. |
| 374 | |
| 375 | <njs`> And, yeah, I got the part about deeper-in-history stuff |
| 376 | having worse IO characteristics, one sort of doesn't care. |
| 377 | |
| 378 | <linus> With the caveat that if the "most recent" needs an older |
| 379 | object to delta against (hey, shrinking sometimes does |
| 380 | happen), we write out the old object with the delta. |
| 381 | |
| 382 | <njs`> (if only it happened more...) |
| 383 | |
| 384 | <linus> Anyway, the pack-file could easily be denser still, but |
| 385 | because it's used both for streaming (the git protocol) and |
| 386 | for on-disk, it has a few pessimizations. |
| 387 | |
| 388 | Actually, it is a made-up word. But it is a made-up word being |
| 389 | used as setup for a later optimization, which is a real word: |
| 390 | |
| 391 | <linus> In particular, while the pack-file is then compressed, |
| 392 | it's compressed just one object at a time, so the actual |
| 393 | compression factor is less than it could be in theory. But it |
| 394 | means that it's all nice random-access with a simple index to |
| 395 | do "object name->location in packfile" translation. |
| 396 | |
| 397 | <njs`> I'm assuming the real win for delta-ing large->small is |
| 398 | more homogeneous statistics for gzip to run over? |
| 399 | |
| 400 | (You have to put the bytes in one place or another, but |
| 401 | putting them in a larger blob wins on compression) |
| 402 | |
| 403 | Actually, what is the compression strategy -- each delta |
| 404 | individually gzipped, the whole file gzipped, somewhere in |
| 405 | between, no compression at all, ....? |
| 406 | |
| 407 | Right. |
| 408 | |
| 409 | Reality IRC sets in. For example: |
| 410 | |
| 411 | <pasky> I'll read the rest in the morning, I really have to go |
| 412 | sleep or there's no hope whatsoever for me at the today's |
| 413 | exam... g'nite all. |
| 414 | |
| 415 | Heh. |
| 416 | |
| 417 | <linus> pasky: g'nite |
| 418 | |
| 419 | <njs`> pasky: 'luck |
| 420 | |
| 421 | <linus> Right: large->small matters exactly because of compression |
| 422 | behaviour. If it was non-compressed, it probably wouldn't make |
| 423 | any difference. |
| 424 | |
| 425 | <njs`> yeah |
| 426 | |
| 427 | <linus> Anyway: I'm not even trying to claim that the pack-files |
| 428 | are perfect, but they do tend to have a nice balance of |
| 429 | density vs ease-of use. |
| 430 | |
| 431 | Gasp! OK, saved. That's a fair Engineering trade off. Close call! |
| 432 | In fact, Linus reflects on some Basic Engineering Fundamentals, |
| 433 | design options, etc. |
| 434 | |
| 435 | <linus> More importantly, they allow git to still _conceptually_ |
| 436 | never deal with deltas at all, and be a "whole object" store. |
| 437 | |
| 438 | Which has some problems (we discussed bad huge-file |
| 439 | behaviour on the git lists the other day), but it does mean |
| 440 | that the basic git concepts are really really simple and |
| 441 | straightforward. |
| 442 | |
| 443 | It's all been quite stable. |
| 444 | |
| 445 | Which I think is very much a result of having very simple |
| 446 | basic ideas, so that there's never any confusion about what's |
| 447 | going on. |
| 448 | |
| 449 | Bugs happen, but they are "simple" bugs. And bugs that |
| 450 | actually get some object store detail wrong are almost always |
| 451 | so obvious that they never go anywhere. |
| 452 | |
| 453 | <njs`> Yeah. |
| 454 | |
| 455 | Nuff said. |
| 456 | |
| 457 | <linus> Anyway. I'm off for bed. It's not 6AM here, but I've got |
| 458 | three kids, and have to get up early in the morning to send |
| 459 | them off. I need my beauty sleep. |
| 460 | |
| 461 | <njs`> :-) |
| 462 | |
| 463 | <njs`> appreciate the infodump, I really was failing to find the |
| 464 | details on git packs :-) |
| 465 | |
| 466 | And now you know the rest of the story. |