blob: 321154d4e65dbcf14b5454c1129e6127f059a143 [file] [log] [blame]
Junio C Hamanod5838b42024-06-25 00:34:331gitpacking(7)
2=============
3
4NAME
5----
6gitpacking - Advanced concepts related to packing in Git
7
8SYNOPSIS
9--------
10gitpacking
11
12DESCRIPTION
13-----------
14
15This document aims to describe some advanced concepts related to packing
16in Git.
17
18Many concepts are currently described scattered between manual pages of
19various Git commands, including linkgit:git-pack-objects[1],
20linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
21and parts of the `Documentation/technical` tree.
22
23There are many aspects of packing in Git that are not covered in this
24document that instead live in the aforementioned areas. Over time, those
25scattered bits may coalesce into this document.
26
27== Pseudo-merge bitmaps
28
29NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
30the configuration and many of the ideas are subject to change.
31
32=== Background
33
34Reachability bitmaps are most efficient when we have on-disk stored
35bitmaps for one or more of the starting points of a traversal. For this
36reason, Git prefers storing bitmaps for commits at the tips of refs,
37because traversals tend to start with those points.
38
39But if you have a large number of refs, it's not feasible to store a
40bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
41those bitmaps together is expensive.
42
43One way we can deal with that is to create bitmaps that represent
44_groups_ of refs. When a traversal asks about the entire group, then we
45can use this single bitmap instead of considering each ref individually.
46Because these bitmaps represent the set of objects which would be
47reachable in a hypothetical merge of all of the commits, we call them
48pseudo-merge bitmaps.
49
50=== Overview
51
52A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
53follows:
54
55Commit bitmap::
56
57 A bitmap whose set bits describe the set of commits included in the
58 pseudo-merge's "merge" bitmap (as below).
59
60Merge bitmap::
61
62 A bitmap whose set bits describe the reachability closure over the set
63 of commits in the pseudo-merge's "commits" bitmap (as above). An
64 identical bitmap would be generated for an octopus merge with the same
65 set of parents as described in the commits bitmap.
66
67Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
68for a given pseudo-merge are listed on either side of the traversal,
69either directly (by explicitly asking for them as part of the `HAVES`
70or `WANTS`) or indirectly (by encountering them during a fill-in
71traversal).
72
73=== Use-cases
74
75For example, suppose there exists a pseudo-merge bitmap with a large
76number of commits, all of which are listed in the `WANTS` section of
77some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
78bitmap machinery can quickly determine there is a pseudo-merge which
79satisfies some subset of the wanted objects on either side of the query.
80Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
81resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
82have to repeat the decompression and `OR`-ing step over a potentially
83large number of individual bitmaps, which can take proportionally more
84time.
85
86Another benefit of pseudo-merges arises when there is some combination
87of (a) a large number of references, with (b) poor bitmap coverage, and
88(c) deep, nested trees, making fill-in traversal relatively expensive.
89For example, suppose that there are a large enough number of tags where
90bitmapping each of the tags individually is infeasible. Without
91pseudo-merge bitmaps, computing the result of, say, `git rev-list
92--use-bitmap-index --count --objects --tags` would likely require a
93large amount of fill-in traversal. But when a large quantity of those
94tags are stored together in a pseudo-merge bitmap, the bitmap machinery
95can take advantage of the fact that we only care about the union of
96objects reachable from all of those tags, and answer the query much
97faster.
98
99=== Configuration
100
101Reference tips are grouped into different pseudo-merge groups according
102to two criteria. A reference name matches one or more of the defined
103pseudo-merge patterns, and optionally one or more capture groups within
104that pattern which further partition the group.
105
106Within a group, commits may be considered "stable", or "unstable"
107depending on their age. These are adjusted by setting the
108`bitmapPseudoMerge.<name>.stableThreshold` and
109`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.
110
111All stable commits are grouped into pseudo-merges of equal size
112(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
113configuration is set to, say, 100, then the first 100 commits (ordered
114by committer date) which are older than the `stableThreshold` value will
115form one group, the next 100 commits will form another group, and so on.
116
117Among unstable commits, the pseudo-merge machinery will attempt to
118combine older commits into large groups as opposed to newer commits
119which will appear in smaller groups. This is based on the heuristic that
120references whose tip commit is older are less likely to be modified to
121point at a different commit than a reference whose tip commit is newer.
122
123The size of groups is determined by a power-law decay function, and the
124decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
125where `f(n)` describes the size of the `n`-th pseudo-merge group. The
126sample rate controls what percentage of eligible commits are considered
127as candidates. The threshold parameter indicates the minimum age (so as
128to avoid including too-recent commits in a pseudo-merge group, making it
129less likely to be valid). The "maxMerges" parameter sets an upper-bound
130on the number of pseudo-merge commits an individual group
131
132The "stable"-related parameters control "stable" pseudo-merge groups,
133comprised of a fixed number of commits which are older than the
134configured "stable threshold" value and may be grouped together in
135chunks of "stableSize" in order of age.
136
137The exact configuration for pseudo-merges is as follows:
138
139include::config/bitmap-pseudo-merge.txt[]
140
141=== Examples
142
143Suppose that you have a repository with a large number of references,
144and you want a bare-bones configuration of pseudo-merge bitmaps that
145will enhance bitmap coverage of the `refs/` namespace. You may start
Junio C Hamano84f18732024-07-18 16:54:26146with a configuration like so:
Junio C Hamanod5838b42024-06-25 00:34:33147
Junio C Hamano84f18732024-07-18 16:54:26148----
149[bitmapPseudoMerge "all"]
Junio C Hamanod5838b42024-06-25 00:34:33150pattern = "refs/"
151threshold = now
152stableThreshold = never
153sampleRate = 100
154maxMerges = 64
Junio C Hamano84f18732024-07-18 16:54:26155----
Junio C Hamanod5838b42024-06-25 00:34:33156
157This will create pseudo-merge bitmaps for all references, regardless of
158their age, and group them into 64 pseudo-merge commits.
159
160If you wanted to separate tags from branches when generating
161pseudo-merge commits, you would instead define the pattern with a
162capture group, like so:
163
Junio C Hamano84f18732024-07-18 16:54:26164----
165[bitmapPseudoMerge "all"]
Junio C Hamanod5838b42024-06-25 00:34:33166pattern = "refs/(heads/tags)/"
Junio C Hamano84f18732024-07-18 16:54:26167----
Junio C Hamanod5838b42024-06-25 00:34:33168
169Suppose instead that you are working in a fork-network repository, with
170each fork specified by some numeric ID, and whose refs reside in
171`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
172fork) in the network. In this instance, you may instead write something
173like:
174
Junio C Hamano84f18732024-07-18 16:54:26175----
176[bitmapPseudoMerge "all"]
Junio C Hamanod5838b42024-06-25 00:34:33177pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
178threshold = now
179stableThreshold = never
180sampleRate = 100
181maxMerges = 64
Junio C Hamano84f18732024-07-18 16:54:26182----
Junio C Hamanod5838b42024-06-25 00:34:33183
184Which would generate pseudo-merge group identifiers like "1234-heads",
185and "5678-tags" (for branches in fork "1234", and tags in remote "5678",
186respectively).
187
188SEE ALSO
189--------
190linkgit:git-pack-objects[1]
191linkgit:git-repack[1]
192
193GIT
194---
195Part of the linkgit:git[1] suite