Junio C Hamano | 3dac504 | 2007-12-15 08:40:54 | [diff] [blame] | 1 | GIT pack format |
===============

== pack-*.pack files have the following format:

   - A header appears at the beginning and consists of the following:

     4-byte signature:
         The signature is: {'P', 'A', 'C', 'K'}

     4-byte version number (network byte order):
         GIT currently accepts version number 2 or 3 but
         generates version 2 only.

     4-byte number of objects contained in the pack (network byte order)

     Observation: we cannot have more than 4G versions ;-) and
     more than 4G objects in a pack.

   - The header is followed by that many object entries, each of
     which looks like this:

     (undeltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     compressed data

     (deltified representation)
     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
     20-byte base object name
     compressed delta data

     Observation: the length of each object is encoded in a
     variable length format and is not constrained to 32-bit or
     anything.

   - The trailer records a 20-byte SHA1 checksum of all of the above.

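As an illustration of the header and trailer layout above, here is a minimal
Python sketch. The function names are hypothetical and are not part of GIT;
the sketch assumes the whole pack has been read into memory as bytes.

```python
import hashlib
import struct


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the 12-byte pack header."""
    # ">4sII" = 4-byte signature plus two 4-byte network byte order integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count


def trailer_is_valid(data: bytes) -> bool:
    """Compare the 20-byte trailer with the SHA1 of everything before it."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]


# Smallest well-formed example: a version 2 pack holding zero objects,
# that is, a header followed directly by the trailer.
header = b"PACK" + struct.pack(">II", 2, 0)
empty_pack = header + hashlib.sha1(header).digest()
```

Calling parse_pack_header(empty_pack) yields the version and object count,
and trailer_is_valid(empty_pack) confirms the checksum.
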
== Original (version 1) pack-*.idx files have the following format:

   - The header consists of 256 4-byte network byte order
     integers.  The N-th entry of this table records the number of
     objects in the corresponding pack whose object name has a
     first byte less than or equal to N.  This is called the
     'first-level fan-out' table.

   - The header is followed by 24-byte entries sorted by object
     name, one entry per object in the pack.  Each entry is:

     4-byte network byte order integer, recording where the
     object is stored in the packfile as the offset from the
     beginning.

     20-byte object name.

   - The file is concluded with a trailer:

     A copy of the 20-byte SHA1 checksum at the end of the
     corresponding packfile.

     20-byte SHA1-checksum of all of the above.

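The fan-out table narrows a lookup to the entries sharing the first byte of
the object name, after which a binary search finds the entry. The following
Python sketch shows one way to do this for a version 1 idx file. The helper
names and the sample object names are made up for illustration, and the
builder omits the trailer.

```python
import struct


def build_idx_v1(entries):
    """Build the fan-out and main tables from (name, offset) pairs."""
    entries = sorted(entries)
    fanout = [sum(1 for name, _ in entries if name[0] <= n)
              for n in range(256)]
    table = b"".join(struct.pack(">I", off) + name for name, off in entries)
    return struct.pack(">256I", *fanout) + table


def find_offset_v1(idx: bytes, name: bytes):
    """Return the pack offset recorded for a 20-byte name, or None."""
    fanout = struct.unpack(">256I", idx[:1024])
    # Entries whose first byte equals name[0] occupy [lo, hi).
    lo = fanout[name[0] - 1] if name[0] > 0 else 0
    hi = fanout[name[0]]
    while lo < hi:
        mid = (lo + hi) // 2
        entry = idx[1024 + 24 * mid: 1024 + 24 * (mid + 1)]
        if entry[4:] == name:
            return struct.unpack(">I", entry[:4])[0]
        if entry[4:] < name:
            lo = mid + 1
        else:
            hi = mid
    return None


# Hypothetical object names, chosen only for their first byte.
name_a = b"\x00" + b"a" * 19
name_b = b"\x01" + b"b" * 19
name_c = b"\xff" + b"c" * 19
sample_idx = build_idx_v1([(name_b, 345), (name_c, 6789), (name_a, 12)])
```
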
Pack Idx file:

        --  +--------------------------------+
fanout      | fanout[0] = 2 (for example)    |-.
table       +--------------------------------+ |
            | fanout[1]                      | |
            +--------------------------------+ |
            | fanout[2]                      | |
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
            | fanout[255] = total objects    |---.
        --  +--------------------------------+ | |
main        | offset                         | | |
index       | object name 00XXXXXXXXXXXXXXXX | | |
table       +--------------------------------+ | |
            | offset                         | | |
            | object name 00XXXXXXXXXXXXXXXX | | |
            +--------------------------------+<+ |
          .-| offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | +--------------------------------+   |
          | | offset                         |   |
          | | object name 01XXXXXXXXXXXXXXXX |   |
          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
          | | offset                         |   |
          | | object name FFXXXXXXXXXXXXXXXX |   |
        --| +--------------------------------+<--+
trailer   | | packfile checksum              |
          | +--------------------------------+
          | | idxfile checksum               |
          | +--------------------------------+
          .-------.
                  |
Pack file entry: <+

     packed object header:
        1-byte size extension bit (MSB)
               type (next 3 bits)
               size0 (lower 4 bits)
        n-byte sizeN (as long as MSB is set, each 7 bits)
                size0..sizeN form a 4+7+7+..+7 bit integer, size0
                is the least significant part, and sizeN is the
                most significant part.
     packed object data:
        If it is not DELTA, then deflated bytes (the size above
                is the size before compression).
        If it is REF_DELTA, then
          20-byte base object name SHA1 (the size above is the
                size of the delta data that follows).
          delta data, deflated.
        If it is OFS_DELTA, then
          n-byte offset (see below) interpreted as a negative
                offset from the type-byte of the header of the
                ofs-delta entry (the size above is the size of
                the delta data that follows).
          delta data, deflated.

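The packed object header described above can be decoded with a short loop.
This is a sketch in Python rather than GIT's own code; the function name is
hypothetical, and the type is returned as the raw 3-bit number because the
numbering of types is not given here.

```python
def decode_entry_header(data: bytes, pos: int = 0):
    """Return (type, size, next_pos) for the entry header at data[pos]."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # the 3 bits after the MSB
    size = byte & 0x0F              # size0: the lower 4 bits
    shift = 4
    while byte & 0x80:              # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift  # each sizeN is more significant
        shift += 7
    return obj_type, size, pos
```

For example, the bytes 0x95 0x0A carry type 1, size0 = 5 and size1 = 10, so
the size is 5 + (10 << 4) = 165. Because the loop has no fixed bound, the
size is not limited to 32 bits.
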
     offset encoding:
          n bytes with MSB set in all but the last one.
          The offset is then the number constructed by
          concatenating the lower 7 bits of each byte (first
          byte most significant), and for n >= 2 adding
          2^7 + 2^14 + ... + 2^(7*(n-1)) to the result.

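The offset encoding above can be decoded incrementally: adding 1 before each
7-bit shift contributes exactly the 2^7 + 2^14 + ... terms. The Python
sketch below uses a hypothetical function name and is meant only to
illustrate the arithmetic.

```python
def decode_ofs_delta_offset(data: bytes, pos: int = 0):
    """Return (offset, next_pos) for the OFS_DELTA offset at data[pos]."""
    byte = data[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:              # MSB set in all but the last byte
        byte = data[pos]
        pos += 1
        # The +1 is shifted left once per remaining byte, which adds up
        # to 2^7 + 2^14 + ... + 2^(7*(n-1)) over an n-byte encoding.
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos
```

With this bias the largest 1-byte value is 127 and the smallest 2-byte value
(0x80 0x00) is 128, so no offset has two different encodings. The base
object is then found at the position of the ofs-delta entry's type-byte
minus the decoded offset.
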
== Version 2 pack-*.idx files support packs larger than 4 GiB, and
   have some other reorganizations.  They have the format:

   - A 4-byte magic number '\377tOc' which is an unreasonable
     fanout[0] value.

   - A 4-byte version number (= 2)

   - A 256-entry fan-out table just like v1.

   - A table of sorted 20-byte SHA1 object names.  These are
     packed together without offset values to reduce the cache
     footprint of the binary search for a specific object name.

   - A table of 4-byte CRC32 values of the packed object data.
     This is new in v2 so compressed data can be copied directly
     from pack to pack during repacking without undetected
     data corruption.

   - A table of 4-byte offset values (in network byte order).
     These are usually 31-bit pack file offsets, but large
     offsets are encoded as an index into the next table with
     the most significant bit set.

   - A table of 8-byte offset entries (empty for pack files less
     than 2 GiB).  Pack files are organized with heavily used
     objects toward the front, so most object references should
     not need to refer to this table.

   - The same trailer as a v1 idx file:

     A copy of the 20-byte SHA1 checksum at the end of the
     corresponding packfile.

     20-byte SHA1-checksum of all of the above.
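
To show how the version 2 tables fit together, here is a Python sketch that
locates an object name and resolves its offset, following an index into the
8-byte table when the most significant bit is set. The helper names, sample
object names and CRC values are invented for illustration, and the builder
leaves out the trailer.

```python
import struct

V2_MAGIC = b"\377tOc"


def build_idx_v2(entries):
    """Build a v2 idx (without trailer) from (name, crc32, offset) tuples."""
    entries = sorted(entries)
    fanout = [sum(1 for e in entries if e[0][0] <= n) for n in range(256)]
    small, large = [], []
    for _, _, off in entries:
        if off < 0x80000000:
            small.append(off)
        else:                       # store an index with the MSB set
            small.append(0x80000000 | len(large))
            large.append(off)
    return (V2_MAGIC + struct.pack(">I", 2)
            + struct.pack(">256I", *fanout)
            + b"".join(e[0] for e in entries)
            + b"".join(struct.pack(">I", e[1]) for e in entries)
            + struct.pack(f">{len(small)}I", *small)
            + struct.pack(f">{len(large)}Q", *large))


def find_offset_v2(idx: bytes, name: bytes):
    """Return the pack offset recorded for a 20-byte name, or None."""
    magic, version = struct.unpack(">4sI", idx[:8])
    if magic != V2_MAGIC or version != 2:
        raise ValueError("not a version 2 idx file")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    total = fanout[255]
    names_at = 8 + 1024
    offsets_at = names_at + 20 * total + 4 * total   # skip names and CRCs
    large_at = offsets_at + 4 * total
    lo = fanout[name[0] - 1] if name[0] > 0 else 0
    hi = fanout[name[0]]
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = idx[names_at + 20 * mid: names_at + 20 * (mid + 1)]
        if candidate == name:
            break
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid
    else:
        return None                 # range exhausted without a match
    (offset,) = struct.unpack_from(">I", idx, offsets_at + 4 * mid)
    if offset & 0x80000000:
        index = offset & 0x7FFFFFFF
        (offset,) = struct.unpack_from(">Q", idx, large_at + 8 * index)
    return offset


near = b"\x10" + b"n" * 19          # hypothetical object names
far = b"\xf0" + b"f" * 19
sample_idx = build_idx_v2([(far, 0x1234, 5 * 2**30), (near, 0x5678, 12)])
```
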