Git index format
================

== The Git index file has the following format

  All binary numbers are in network byte order. Version 2 is described
  here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The currently supported versions are 2, 3 and 4.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if Git does not understand them.

     The extensions Git currently supports are described in the
     "Extensions" section below.

     4-byte extension signature. If the first byte is 'A'..'Z' the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - 160-bit SHA-1 over the content of the index file before this
     checksum.

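The header and trailing checksum described above can be read with a few
lines of code. The following is a minimal sketch, not Git's own
implementation; the function names (parse_index_header, checksum_ok,
build_empty_index) are made up for illustration.

```python
import hashlib
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the 12-byte header."""
    # ">4sII": network byte order, 4-byte signature, two 32-bit numbers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, entry_count


def checksum_ok(data: bytes) -> bool:
    """Compare the trailing 160-bit SHA-1 with the hash of all earlier bytes."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]


def build_empty_index(version: int = 2) -> bytes:
    """Hypothetical helper: an index with no entries and no extensions."""
    body = struct.pack(">4sII", b"DIRC", version, 0)
    return body + hashlib.sha1(body).digest()
```

An index with no entries and no extensions is then exactly 32 bytes:
the 12-byte header followed by the 20-byte checksum.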
== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.

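As an illustration of this ordering, the sketch below sorts a handful
of made-up (name, stage) pairs. It relies on Python comparing bytes
objects byte by byte as unsigned values, which gives the same result as
the ordering described above; the sample paths are hypothetical.

```python
def entry_sort_key(name: bytes, stage: int):
    """Sort key: raw name bytes first, then the stage number."""
    return (name, stage)


# Hypothetical entries as (name, stage) pairs.
entries = [(b"a/b", 0), (b"a.txt", 0), (b"a", 2), (b"a", 1), (b"a-b", 0)]
ordered = sorted(entries, key=lambda e: entry_sort_key(*e))
# '-' (0x2d) < '.' (0x2e) < '/' (0x2f), so "a/b" sorts after "a.txt":
# the directory separator gets no special treatment.
```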
  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32-bit.

  160-bit SHA-1 for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3 or later) A 16-bit field, only applicable if the
  "extended flag" above is 1, split into (high to low bits).

    1-bit reserved for future

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero

  Entry path name (variable length) relative to top level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    Trailing slash is also disallowed.

    The exact encoding is undefined, but the '.' and '/' characters
    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
    byte (iow, this is a UNIX pathname).

  (Version 4) In version 4, the entry path name is prefix-compressed
    relative to the path name for the previous entry (the very first
    entry is encoded as if the path name for the previous entry is an
    empty string). At the beginning of an entry, an integer N in the
    variable width encoding (the same encoding as the offset is encoded
    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
    by a NUL-terminated string S. Removing N bytes from the end of the
    path name for the previous entry, and replacing it with the string S
    yields the path name for this entry.

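The version 4 path reconstruction can be sketched as follows. This is
an illustrative reading of the paragraph above, not Git's code: the
names decode_varint and decode_v4_path are invented here, and the
varint loop follows the OFS_DELTA offset encoding (7 data bits per
byte, high bit as a continuation marker, plus one added for every
continuation byte).

```python
def decode_varint(buf: bytes, pos: int):
    """Decode an OFS_DELTA-style variable width integer starting at pos."""
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:  # high bit set: another byte follows
        value += 1
        byte = buf[pos]
        pos += 1
        value = (value << 7) + (byte & 0x7F)
    return value, pos


def decode_v4_path(previous: bytes, buf: bytes, pos: int):
    """Return (path, next_pos) for a prefix-compressed path name."""
    strip, pos = decode_varint(buf, pos)  # N: bytes to drop from previous
    if strip > len(previous):
        raise ValueError("cannot strip more bytes than the previous path has")
    end = buf.index(b"\0", pos)  # S is NUL-terminated
    path = previous[: len(previous) - strip] + buf[pos:end]
    return path, end + 1
```

For example, with a previous path of "Documentation/git-add.txt", an
entry storing N = 6 and S = "m.txt" yields "Documentation/git-am.txt".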
  1-8 NUL bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.

  (Version 4) In version 4, the padding after the pathname does not
  exist.

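The padding rule for versions 2 and 3 comes down to a small piece of
arithmetic. The sketch below assumes the fixed-width part of an entry
is 62 bytes, which is the sum of the fields listed above (ten 32-bit
fields, the 160-bit SHA-1 and the 16-bit flags), plus 2 bytes when the
extended flags field is present. The function names are hypothetical.

```python
def entry_size(name_len: int, extended: bool = False) -> int:
    """On-disk size of a version 2/3 entry, including NUL padding."""
    fixed = 62 + (2 if extended else 0)
    # Adding 8 and clearing the low three bits rounds up to a multiple
    # of eight while always leaving room for at least one NUL.
    return (fixed + name_len + 8) & ~7


def padding_len(name_len: int, extended: bool = False) -> int:
    """Number of NUL bytes that follow the path name (always 1 to 8)."""
    fixed = 62 + (2 if extended else 0)
    return entry_size(name_len, extended) - fixed - name_len
```

A one-byte name needs a single NUL (62 + 1 + 1 = 64), while a two-byte
name already ends on a multiple of eight and therefore takes the full
eight NULs.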
  Interpretation of index entries in split index mode is completely
  different. See below for details.

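The bit layouts of the mode, flags and extended flags fields described
in this section can be unpacked with shifts and masks. The sketch below
is only an illustration; the function names and the returned
dictionaries are not part of any Git interface.

```python
OBJECT_TYPES = {0b1000: "regular file", 0b1010: "symbolic link", 0b1110: "gitlink"}


def decode_mode(mode: int):
    """Split the mode field into (object type name, permission bits)."""
    object_type = (mode >> 12) & 0xF  # 4-bit object type
    permissions = mode & 0o777        # low 9 bits
    return OBJECT_TYPES[object_type], permissions


def decode_flags(flags: int):
    """Split the 16-bit flags field, high to low bits."""
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": (flags >> 12) & 0x3,
        "name_length": flags & 0xFFF,  # 0xFFF means "0xFFF or longer"
    }


def decode_extended_flags(ext: int):
    """Split the version 3+ extended flags field, high to low bits."""
    return {
        "reserved": bool(ext & 0x8000),
        "skip_worktree": bool(ext & 0x4000),
        "intent_to_add": bool(ext & 0x2000),
    }
```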
== Extensions

=== Cached tree

  Cached tree extension contains pre-computed hashes for trees that can
  be derived from the index. It helps speed up tree object generation
  from index for a new commit.

  When a path is updated in index, the path must be invalidated and
  removed from tree cache.

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fills the entire extension; each entry
  consists of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that are covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - 160-bit object name for the object that would result from writing
    this span of index as a tree.

  An entry can be in an invalidated state and is represented by having
  a negative number in the entry_count field. In this case, there is no
  object name and the next entry starts immediately after the newline.
  When writing an invalid entry, -1 should always be used as entry_count.

  The entries are written out in the top-down, depth-first order. The
  first entry represents the root level of the repository, followed by the
  first subtree--let's call this A--of the root level (with its name
  relative to the root level), followed by the first subtree of A (with
  its name relative to A), ...

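A reader for this layout might look like the sketch below. It is a
simplified illustration under the rules stated above: it returns the
entries as a flat list in file order and leaves rebuilding the tree
from the subtree counts to the caller. The name parse_cached_tree is
hypothetical.

```python
def parse_cached_tree(data: bytes):
    """Parse TREE extension data into (path, entry_count, subtrees, oid) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)
        path = data[pos:nul]  # empty for the root level
        space = data.index(b" ", nul + 1)
        entry_count = int(data[nul + 1:space].decode("ascii"))
        newline = data.index(b"\n", space + 1)
        subtree_count = int(data[space + 1:newline].decode("ascii"))
        pos = newline + 1
        object_name = None
        if entry_count >= 0:  # invalidated entries carry no object name
            object_name = data[pos:pos + 20]
            pos += 20
        entries.append((path, entry_count, subtree_count, object_name))
    return entries
```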
=== Resolve undo

  A conflict is represented in the index as a set of higher stage entries.
  When a conflict is resolved (e.g. with "git add path"), these higher
  stage entries will be removed and a stage-0 entry with proper resolution
  is added.

  When these higher stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fills the entire extension; each entry
  consists of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
    stage 1 to 3 (a missing stage is represented by "0" in this field);
    and

  - At most three 160-bit object names of the entry in stages from 1 to 3
    (nothing is written for a missing stage).

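The entry layout above can be decoded as in the following sketch. It is
an illustration only; parse_resolve_undo is a made-up name, and missing
stages are reported as None in the list of object names.

```python
def parse_resolve_undo(data: bytes):
    """Parse REUC extension data into (path, modes, object_names) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)
        path = data[pos:nul]
        pos = nul + 1
        modes = []
        for _ in range(3):  # stages 1 to 3, ASCII octal
            nul = data.index(b"\0", pos)
            modes.append(int(data[pos:nul].decode("ascii"), 8))
            pos = nul + 1
        names = []
        for mode in modes:
            if mode == 0:  # missing stage: no object name is stored
                names.append(None)
            else:
                names.append(data[pos:pos + 20])
                pos += 20
        entries.append((path, modes, names))
    return entries
```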
=== Split index

  In split index mode, the majority of index entries could be stored
  in a separate file. This extension records the changes to be made on
  top of that to produce the final index.

  The signature for this extension is { 'l', 'i', 'n', 'k' }.

  The extension consists of:

  - 160-bit SHA-1 of the shared index file. The shared index file path
    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
    index does not require a shared index file.

  - An ewah-encoded delete bitmap, each bit represents an entry in the
    shared index. If a bit is set, its corresponding entry in the
    shared index will be removed from the final index. Note, because
    a delete operation changes index entry positions, but we do need
    original positions in replace phase, it's best to just mark
    entries for removal, then do a mass deletion after replacement.

  - An ewah-encoded replace bitmap, each bit represents an entry in
    the shared index. If a bit is set, its corresponding entry in the
    shared index will be replaced with an entry in this index
    file. All replaced entries are stored in sorted order in this
    index. The first "1" bit in the replace bitmap corresponds to the
    first index entry, the second "1" bit to the second entry and so
    on. Replaced entries may have empty path names to save space.

  The remaining index entries after replaced ones will be added to the
  final index. These added entries are also sorted by entry name then
  stage.

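The replace-then-delete-then-add sequence can be sketched as below.
This is a simplified model, not Git's implementation: the two bitmaps
are assumed to be already decoded into Python sets of set bit
positions (EWAH decoding is left out), and entries are modelled as
(name, stage, payload) tuples. All names are hypothetical.

```python
def apply_split_index(shared, delete_bits, replace_bits, own_entries):
    """Combine shared index entries with this index's entries.

    shared and own_entries are lists of (name, stage, payload) tuples;
    delete_bits and replace_bits are sets of positions in shared.
    """
    merged = list(shared)
    used = 0
    # Replace phase: the n-th set bit takes the n-th entry of this index.
    for position in sorted(replace_bits):
        name, stage, payload = own_entries[used]
        if not name:  # an empty path name means "same name as before"
            name = shared[position][0]
        merged[position] = (name, stage, payload)
        used += 1
    # Mass deletion after replacement, using the original positions.
    kept = [e for i, e in enumerate(merged) if i not in delete_bits]
    # Entries left over after the replacements are additions.
    added = own_entries[used:]
    return sorted(kept + added, key=lambda e: (e[0], e[1]))
```

Doing the replacement first keeps the bit positions meaningful, which
is the reason the text above recommends marking entries for removal
and deleting them in one pass afterwards.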
=== Untracked cache

  Untracked cache saves the untracked file list and necessary data to
  verify the cache. The signature for this extension is { 'U', 'N',
  'T', 'R' }.

  The extension starts with

  - A sequence of NUL-terminated strings, preceded by the size of the
    sequence in variable width encoding. Each string describes the
    environment where the cache can be used.

  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
    ctime field until "file size".

  - Stat data of core.excludesfile

  - 32-bit dir_flags (see struct dir_struct)

  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
    does not exist.

  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
    not exist.

  - NUL-terminated string of per-dir exclude file name. This usually
    is ".gitignore".

  - The number of following directory blocks, variable width
    encoding. If this number is zero, the extension ends here with a
    following NUL.

  - A number of directory blocks in depth-first-search order, each
    consists of

    - The number of untracked entries, variable width encoding.

    - The number of sub-directory blocks, variable width encoding.

    - The directory name terminated by NUL.

    - A number of untracked file/dir names terminated by NUL.

  The remaining data of each directory block is grouped by type:

  - An ewah bitmap, the n-th bit marks whether the n-th directory has
    valid untracked cache entries.

  - An ewah bitmap, the n-th bit records "check-only" bit of
    read_directory_recursive() for the n-th directory.

  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
    is valid for the n-th directory and exists in the next data.

  - An array of stat data. The n-th data corresponds with the n-th
    "one" bit in the previous ewah bitmap.

  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
    in the previous ewah bitmap.

  - One NUL.

=== File System Monitor cache

  The file system monitor cache tracks files for which the core.fsmonitor
  hook has told us about changes. The signature for this extension is
  { 'F', 'S', 'M', 'N' }.

  The extension starts with

  - 32-bit version number: the current supported version is 1.

  - 64-bit time: the extension data reflects all changes through the given
    time which is stored as the nanoseconds elapsed since midnight,
    January 1, 1970.

  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.

  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
    is not CE_FSMONITOR_VALID.

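The three fixed-width fields at the start of this extension can be read
as in the sketch below. It stops before the bitmap itself, because the
EWAH encoding is not described here; the bytes after the fixed fields
are returned undecoded. The function name is hypothetical.

```python
import struct


def parse_fsmonitor_header(data: bytes):
    """Return (timestamp_ns, bitmap_size, rest) from FSMN extension data."""
    # ">IQI": 32-bit version, 64-bit time, 32-bit bitmap size (16 bytes).
    version, timestamp_ns, bitmap_size = struct.unpack_from(">IQI", data, 0)
    if version != 1:
        raise ValueError(f"unsupported FSMN version {version}")
    return timestamp_ns, bitmap_size, data[16:]
```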
=== End of Index Entry

  The End of Index Entry (EOIE) is used to locate the end of the variable
  length index entries and the beginning of the extensions. Code can take
  advantage of this to quickly locate the index extensions without having
  to parse through all of the index entries.

  Because it must be able to be loaded before the variable length cache
  entries and other index extensions, this extension must be written last.
  The signature for this extension is { 'E', 'O', 'I', 'E' }.

  The extension consists of:

  - 32-bit offset to the end of the index entries

  - 160-bit SHA-1 over the extension types and their sizes (but not
    their contents). E.g. if we have "TREE" extension that is N-bytes
    long, "REUC" extension that is M-bytes long, followed by "EOIE",
    then the hash would be:

    SHA-1("TREE" + <binary representation of N> +
          "REUC" + <binary representation of M>)

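The hash in the example above can be computed as in the following
sketch. It assumes that the "binary representation" of a size is the
same 32-bit network byte order number stored in each extension header;
the function names are made up for illustration.

```python
import hashlib
import struct


def eoie_hash(extensions):
    """SHA-1 over (signature, size) pairs of the extensions before EOIE."""
    digest = hashlib.sha1()
    for signature, size in extensions:
        # Assumption: size is hashed as a 32-bit network byte order number.
        digest.update(signature + struct.pack(">I", size))
    return digest.digest()


def build_eoie_data(entries_end_offset: int, extensions) -> bytes:
    """Extension data: 32-bit offset followed by the 160-bit hash."""
    return struct.pack(">I", entries_end_offset) + eoie_hash(extensions)
```

A reader can recompute the same hash while walking the extension
headers from the recorded offset and compare it with the stored value
before trusting the offset.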
=== Index Entry Offset Table

  The Index Entry Offset Table (IEOT) is used to help address the CPU
  cost of loading the index by enabling multi-threading the process of
  converting cache entries from the on-disk format to the in-memory format.
  The signature for this extension is { 'I', 'E', 'O', 'T' }.

  The extension consists of:

  - 32-bit version (currently 1)

  - A number of index offset entries each consisting of:

    - 32-bit offset from the beginning of the file to the first cache entry
      in this block of entries.

    - 32-bit count of cache entries in this block

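Since every offset entry has a fixed size, the table can be decoded in
a single pass. The sketch below is an illustration with a hypothetical
function name; it returns one (offset, count) pair per block, which a
loader could hand to separate worker threads.

```python
import struct


def parse_ieot(data: bytes):
    """Return a list of (offset, entry_count) pairs from IEOT extension data."""
    (version,) = struct.unpack_from(">I", data, 0)
    if version != 1:
        raise ValueError(f"unsupported IEOT version {version}")
    if (len(data) - 4) % 8 != 0:
        raise ValueError("truncated IEOT extension")
    # Each block descriptor is two 32-bit numbers in network byte order.
    return [struct.unpack_from(">II", data, pos) for pos in range(4, len(data), 8)]
```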