Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 1 | gitpacking(7) |
| 2 | ============= |
| 3 | |
| 4 | NAME |
| 5 | ---- |
| 6 | gitpacking - Advanced concepts related to packing in Git |
| 7 | |
| 8 | SYNOPSIS |
| 9 | -------- |
| 10 | gitpacking |
| 11 | |
| 12 | DESCRIPTION |
| 13 | ----------- |
| 14 | |
| 15 | This document aims to describe some advanced concepts related to packing |
| 16 | in Git. |
| 17 | |
| 18 | Many concepts are currently described scattered between manual pages of |
| 19 | various Git commands, including linkgit:git-pack-objects[1], |
| 20 | linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5], |
| 21 | and parts of the `Documentation/technical` tree. |
| 22 | |
| 23 | There are many aspects of packing in Git that are not covered in this |
| 24 | document that instead live in the aforementioned areas. Over time, those |
| 25 | scattered bits may coalesce into this document. |
| 26 | |
| 27 | == Pseudo-merge bitmaps |
| 28 | |
| 29 | NOTE: Pseudo-merge bitmaps are considered an experimental feature, so |
| 30 | the configuration and many of the ideas are subject to change. |
| 31 | |
| 32 | === Background |
| 33 | |
| 34 | Reachability bitmaps are most efficient when we have on-disk stored |
| 35 | bitmaps for one or more of the starting points of a traversal. For this |
| 36 | reason, Git prefers storing bitmaps for commits at the tips of refs, |
| 37 | because traversals tend to start with those points. |
| 38 | |
| 39 | But if you have a large number of refs, it's not feasible to store a |
| 40 | bitmap for _every_ ref tip. It takes up space, and just OR-ing all of |
| 41 | those bitmaps together is expensive. |
| 42 | |
| 43 | One way we can deal with that is to create bitmaps that represent |
| 44 | _groups_ of refs. When a traversal asks about the entire group, then we |
| 45 | can use this single bitmap instead of considering each ref individually. |
| 46 | Because these bitmaps represent the set of objects which would be |
| 47 | reachable in a hypothetical merge of all of the commits, we call them |
| 48 | pseudo-merge bitmaps. |
| 49 | |
| 50 | === Overview |
| 51 | |
| 52 | A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as |
| 53 | follows: |
| 54 | |
| 55 | Commit bitmap:: |
| 56 | |
| 57 | A bitmap whose set bits describe the set of commits included in the |
| 58 | pseudo-merge's "merge" bitmap (as below). |
| 59 | |
| 60 | Merge bitmap:: |
| 61 | |
| 62 | A bitmap whose set bits describe the reachability closure over the set |
| 63 | of commits in the pseudo-merge's "commits" bitmap (as above). An |
| 64 | identical bitmap would be generated for an octopus merge with the same |
| 65 | set of parents as described in the commits bitmap. |
| 66 | |
| 67 | Pseudo-merge bitmaps can accelerate bitmap traversals when all commits |
| 68 | for a given pseudo-merge are listed on either side of the traversal, |
| 69 | either directly (by explicitly asking for them as part of the `HAVES` |
| 70 | or `WANTS`) or indirectly (by encountering them during a fill-in |
| 71 | traversal). |
| 72 | |
| 73 | === Use-cases |
| 74 | |
| 75 | For example, suppose there exists a pseudo-merge bitmap with a large |
| 76 | number of commits, all of which are listed in the `WANTS` section of |
| 77 | some bitmap traversal query. When pseudo-merge bitmaps are enabled, the |
| 78 | bitmap machinery can quickly determine there is a pseudo-merge which |
| 79 | satisfies some subset of the wanted objects on either side of the query. |
| 80 | Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the |
| 81 | resulting bitmap. By contrast, without pseudo-merge bitmaps, we would |
| 82 | have to repeat the decompression and `OR`-ing step over a potentially |
| 83 | large number of individual bitmaps, which can take proportionally more |
| 84 | time. |
| 85 | |
| 86 | Another benefit of pseudo-merges arises when there is some combination |
| 87 | of (a) a large number of references, with (b) poor bitmap coverage, and |
| 88 | (c) deep, nested trees, making fill-in traversal relatively expensive. |
| 89 | For example, suppose that there are a large enough number of tags where |
| 90 | bitmapping each of the tags individually is infeasible. Without |
| 91 | pseudo-merge bitmaps, computing the result of, say, `git rev-list |
| 92 | --use-bitmap-index --count --objects --tags` would likely require a |
| 93 | large amount of fill-in traversal. But when a large quantity of those |
| 94 | tags are stored together in a pseudo-merge bitmap, the bitmap machinery |
| 95 | can take advantage of the fact that we only care about the union of |
| 96 | objects reachable from all of those tags, and answer the query much |
| 97 | faster. |
| 98 | |
| 99 | === Configuration |
| 100 | |
| 101 | Reference tips are grouped into different pseudo-merge groups according |
| 102 | to two criteria. A reference name matches one or more of the defined |
| 103 | pseudo-merge patterns, and optionally one or more capture groups within |
| 104 | that pattern which further partition the group. |
| 105 | |
| 106 | Within a group, commits may be considered "stable", or "unstable" |
| 107 | depending on their age. These are adjusted by setting the |
| 108 | `bitmapPseudoMerge.<name>.stableThreshold` and |
| 109 | `bitmapPseudoMerge.<name>.threshold` configuration values, respectively. |
| 110 | |
| 111 | All stable commits are grouped into pseudo-merges of equal size |
| 112 | (`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize` |
| 113 | configuration is set to, say, 100, then the first 100 commits (ordered |
| 114 | by committer date) which are older than the `stableThreshold` value will |
| 115 | form one group, the next 100 commits will form another group, and so on. |
| 116 | |
| 117 | Among unstable commits, the pseudo-merge machinery will attempt to |
| 118 | combine older commits into large groups as opposed to newer commits |
| 119 | which will appear in smaller groups. This is based on the heuristic that |
| 120 | references whose tip commit is older are less likely to be modified to |
| 121 | point at a different commit than a reference whose tip commit is newer. |
| 122 | |
| 123 | The size of groups is determined by a power-law decay function, and the |
| 124 | decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`, |
| 125 | where `f(n)` describes the size of the `n`-th pseudo-merge group. The |
| 126 | sample rate controls what percentage of eligible commits are considered |
| 127 | as candidates. The threshold parameter indicates the minimum age (so as |
| 128 | to avoid including too-recent commits in a pseudo-merge group, making it |
| 129 | less likely to be valid). The "maxMerges" parameter sets an upper-bound |
| 130 | on the number of pseudo-merge commits an individual group |
| 131 | |
| 132 | The "stable"-related parameters control "stable" pseudo-merge groups, |
| 133 | comprised of a fixed number of commits which are older than the |
| 134 | configured "stable threshold" value and may be grouped together in |
| 135 | chunks of "stableSize" in order of age. |
| 136 | |
| 137 | The exact configuration for pseudo-merges is as follows: |
| 138 | |
| 139 | include::config/bitmap-pseudo-merge.txt[] |
| 140 | |
| 141 | === Examples |
| 142 | |
| 143 | Suppose that you have a repository with a large number of references, |
| 144 | and you want a bare-bones configuration of pseudo-merge bitmaps that |
| 145 | will enhance bitmap coverage of the `refs/` namespace. You may start |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 146 | with a configuration like so: |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 147 | |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 148 | ---- |
| 149 | [bitmapPseudoMerge "all"] |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 150 | pattern = "refs/" |
| 151 | threshold = now |
| 152 | stableThreshold = never |
| 153 | sampleRate = 100 |
| 154 | maxMerges = 64 |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 155 | ---- |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 156 | |
| 157 | This will create pseudo-merge bitmaps for all references, regardless of |
| 158 | their age, and group them into 64 pseudo-merge commits. |
| 159 | |
| 160 | If you wanted to separate tags from branches when generating |
| 161 | pseudo-merge commits, you would instead define the pattern with a |
| 162 | capture group, like so: |
| 163 | |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 164 | ---- |
| 165 | [bitmapPseudoMerge "all"] |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 166 | pattern = "refs/(heads/tags)/" |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 167 | ---- |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 168 | |
| 169 | Suppose instead that you are working in a fork-network repository, with |
| 170 | each fork specified by some numeric ID, and whose refs reside in |
| 171 | `refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some |
| 172 | fork) in the network. In this instance, you may instead write something |
| 173 | like: |
| 174 | |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 175 | ---- |
| 176 | [bitmapPseudoMerge "all"] |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 177 | pattern = "refs/virtual/([0-9]+)/(heads|tags)/" |
| 178 | threshold = now |
| 179 | stableThreshold = never |
| 180 | sampleRate = 100 |
| 181 | maxMerges = 64 |
Junio C Hamano | 84f1873 | 2024-07-18 16:54:26 | [diff] [blame] | 182 | ---- |
Junio C Hamano | d5838b4 | 2024-06-25 00:34:33 | [diff] [blame] | 183 | |
| 184 | Which would generate pseudo-merge group identifiers like "1234-heads", |
| 185 | and "5678-tags" (for branches in fork "1234", and tags in remote "5678", |
| 186 | respectively). |
| 187 | |
| 188 | SEE ALSO |
| 189 | -------- |
| 190 | linkgit:git-pack-objects[1] |
| 191 | linkgit:git-repack[1] |
| 192 | |
| 193 | GIT |
| 194 | --- |
| 195 | Part of the linkgit:git[1] suite |