<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> | |
<head> | |
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> | |
<meta name="generator" content="AsciiDoc 7.0.2" /> | |
<style type="text/css"> | |
/* Debug borders */ | |
p, li, dt, dd, div, pre, h1, h2, h3, h4, h5, h6 { | |
/* | |
border: 1px solid red; | |
*/ | |
} | |
body { | |
margin: 1em 5% 1em 5%; | |
} | |
a { color: blue; } | |
a:visited { color: fuchsia; } | |
em { | |
font-style: italic; | |
} | |
strong { | |
font-weight: bold; | |
} | |
tt { | |
color: navy; | |
} | |
h1, h2, h3, h4, h5, h6 { | |
color: #527bbd; | |
font-family: sans-serif; | |
margin-top: 1.2em; | |
margin-bottom: 0.5em; | |
line-height: 1.3; | |
} | |
h1 { | |
border-bottom: 2px solid silver; | |
} | |
h2 { | |
border-bottom: 2px solid silver; | |
padding-top: 0.5em; | |
} | |
div.sectionbody { | |
font-family: serif; | |
margin-left: 0; | |
} | |
hr { | |
border: 1px solid silver; | |
} | |
p { | |
margin-top: 0.5em; | |
margin-bottom: 0.5em; | |
} | |
pre { | |
padding: 0; | |
margin: 0; | |
} | |
span#author { | |
color: #527bbd; | |
font-family: sans-serif; | |
font-weight: bold; | |
font-size: 1.2em; | |
} | |
span#email { | |
} | |
span#revision { | |
font-family: sans-serif; | |
} | |
div#footer { | |
font-family: sans-serif; | |
font-size: small; | |
border-top: 2px solid silver; | |
padding-top: 0.5em; | |
margin-top: 4.0em; | |
} | |
div#footer-text { | |
float: left; | |
padding-bottom: 0.5em; | |
} | |
div#footer-badges { | |
float: right; | |
padding-bottom: 0.5em; | |
} | |
div#preamble, | |
div.tableblock, div.imageblock, div.exampleblock, div.verseblock, | |
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock, | |
div.admonitionblock { | |
margin-right: 10%; | |
margin-top: 1.5em; | |
margin-bottom: 1.5em; | |
} | |
div.admonitionblock { | |
margin-top: 2.5em; | |
margin-bottom: 2.5em; | |
} | |
div.content { /* Block element content. */ | |
padding: 0; | |
} | |
/* Block element titles. */ | |
div.title, caption.title { | |
font-family: sans-serif; | |
font-weight: bold; | |
text-align: left; | |
margin-top: 1.0em; | |
margin-bottom: 0.5em; | |
} | |
div.title + * { | |
margin-top: 0; | |
} | |
td div.title:first-child { | |
margin-top: 0.0em; | |
} | |
div.content div.title:first-child { | |
margin-top: 0.0em; | |
} | |
div.content + div.title { | |
margin-top: 0.0em; | |
} | |
div.sidebarblock > div.content { | |
background: #ffffee; | |
border: 1px solid silver; | |
padding: 0.5em; | |
} | |
div.listingblock > div.content { | |
border: 1px solid silver; | |
background: #f4f4f4; | |
padding: 0.5em; | |
} | |
div.quoteblock > div.content { | |
padding-left: 2.0em; | |
} | |
div.quoteblock .attribution { | |
text-align: right; | |
} | |
div.admonitionblock .icon { | |
vertical-align: top; | |
font-size: 1.1em; | |
font-weight: bold; | |
text-decoration: underline; | |
color: #527bbd; | |
padding-right: 0.5em; | |
} | |
div.admonitionblock td.content { | |
padding-left: 0.5em; | |
border-left: 2px solid silver; | |
} | |
div.exampleblock > div.content { | |
border-left: 2px solid silver; | |
padding: 0.5em; | |
} | |
div.verseblock div.content { | |
white-space: pre; | |
} | |
div.imageblock div.content { padding-left: 0; } | |
div.imageblock img { border: 1px solid silver; } | |
span.image img { border-style: none; } | |
dl { | |
margin-top: 0.8em; | |
margin-bottom: 0.8em; | |
} | |
dt { | |
margin-top: 0.5em; | |
margin-bottom: 0; | |
font-style: italic; | |
} | |
dd > *:first-child { | |
margin-top: 0; | |
} | |
ul, ol { | |
list-style-position: outside; | |
} | |
ol.olist2 { | |
list-style-type: lower-alpha; | |
} | |
div.tableblock > table { | |
border-color: #527bbd; | |
border-width: 3px; | |
} | |
thead { | |
font-family: sans-serif; | |
font-weight: bold; | |
} | |
tfoot { | |
font-weight: bold; | |
} | |
div.hlist { | |
margin-top: 0.8em; | |
margin-bottom: 0.8em; | |
} | |
td.hlist1 { | |
vertical-align: top; | |
font-style: italic; | |
padding-right: 0.8em; | |
} | |
td.hlist2 { | |
vertical-align: top; | |
} | |
@media print { | |
div#footer-badges { display: none; } | |
} | |
/* Workarounds for IE6's broken and incomplete CSS2. */ | |
div.sidebar-content { | |
background: #ffffee; | |
border: 1px solid silver; | |
padding: 0.5em; | |
} | |
div.sidebar-title, div.image-title { | |
font-family: sans-serif; | |
font-weight: bold; | |
margin-top: 0.0em; | |
margin-bottom: 0.5em; | |
} | |
div.listingblock div.content { | |
border: 1px solid silver; | |
background: #f4f4f4; | |
padding: 0.5em; | |
} | |
div.quoteblock-content { | |
padding-left: 2.0em; | |
} | |
div.exampleblock-content { | |
border-left: 2px solid silver; | |
padding-left: 0.5em; | |
} | |
</style> | |
<title>git-fast-import(1)</title> | |
</head> | |
<body> | |
<div id="header"> | |
<h1> | |
git-fast-import(1) Manual Page | |
</h1> | |
<h2>NAME</h2> | |
<div class="sectionbody"> | |
<p>git-fast-import - | |
Backend for fast Git data importers | |
</p> | |
</div> | |
</div> | |
<h2>SYNOPSIS</h2> | |
<div class="sectionbody"> | |
<p>frontend | <em>git-fast-import</em> [options]</p> | |
</div> | |
<h2>DESCRIPTION</h2> | |
<div class="sectionbody"> | |
<p>This program is usually not what the end user wants to run directly. | |
Most end users want to use one of the existing frontend programs, | |
which parses a specific type of foreign source and feeds the contents | |
stored there to git-fast-import.</p> | |
<p>fast-import reads a mixed command/data stream from standard input and | |
writes one or more packfiles directly into the current repository. | |
When EOF is received on standard input, fast-import writes out
updated branch and tag refs, fully updating the current repository | |
with the newly imported data.</p> | |
<p>The fast-import backend itself can import into an empty repository (one that | |
has already been initialized by <a href="git-init.html">git-init(1)</a>) or incrementally | |
update an existing populated repository. Whether or not incremental | |
imports are supported from a particular foreign source depends on | |
the frontend program in use.</p> | |
</div> | |
<h2>OPTIONS</h2> | |
<div class="sectionbody"> | |
<dl> | |
<dt> | |
--date-format=<fmt> | |
</dt> | |
<dd> | |
<p> | |
Specify the type of dates the frontend will supply to | |
fast-import within <tt>author</tt>, <tt>committer</tt> and <tt>tagger</tt> commands. | |
See “Date Formats” below for details about which formats | |
are supported, and their syntax. | |
</p> | |
</dd> | |
<dt> | |
--force | |
</dt> | |
<dd> | |
<p> | |
Force updating modified existing branches, even if doing | |
so would cause commits to be lost (as the new commit does | |
not contain the old commit). | |
</p> | |
</dd> | |
<dt> | |
--max-pack-size=<n> | |
</dt> | |
<dd> | |
<p> | |
Maximum size of each output packfile, expressed in MiB. | |
The default is 4096 (4 GiB) as that is the maximum allowed | |
packfile size (due to file format limitations). Some | |
importers may wish to lower this, such as to ensure the | |
resulting packfiles fit on CDs. | |
</p> | |
</dd> | |
<dt> | |
--depth=<n> | |
</dt> | |
<dd> | |
<p> | |
Maximum delta depth, for blob and tree deltification. | |
Default is 10. | |
</p> | |
</dd> | |
<dt> | |
--active-branches=<n> | |
</dt> | |
<dd> | |
<p> | |
Maximum number of branches to maintain active at once. | |
See “Memory Utilization” below for details. Default is 5. | |
</p> | |
</dd> | |
<dt> | |
--export-marks=<file> | |
</dt> | |
<dd> | |
<p> | |
Dumps the internal marks table to <file> when complete. | |
Marks are written one per line as <tt>:markid SHA-1</tt>. | |
Frontends can use this file to validate imports after they | |
have been completed, or to save the marks table across | |
incremental runs. As <file> is only opened and truncated | |
at checkpoint (or completion) the same path can also be | |
safely given to --import-marks. | |
</p> | |
</dd> | |
<dt> | |
--import-marks=<file> | |
</dt> | |
<dd> | |
<p> | |
Before processing any input, load the marks specified in | |
<file>. The input file must exist, must be readable, and | |
must use the same format as produced by --export-marks. | |
Multiple options may be supplied to import more than one | |
set of marks. If a mark is defined to different values, | |
the last file wins. | |
</p> | |
</dd> | |
<dt> | |
--export-pack-edges=<file> | |
</dt> | |
<dd> | |
<p> | |
After creating a packfile, print a line of data to | |
<file> listing the filename of the packfile and the last | |
commit on each branch that was written to that packfile. | |
This information may be useful after importing projects | |
whose total object set exceeds the 4 GiB packfile limit, | |
as these commits can be used as edge points during calls | |
to <a href="git-pack-objects.html">git-pack-objects(1)</a>. | |
</p> | |
</dd> | |
<dt> | |
--quiet | |
</dt> | |
<dd> | |
<p> | |
Disable all non-fatal output, making fast-import silent when it | |
is successful. This option disables the output shown by | |
--stats. | |
</p> | |
</dd> | |
<dt> | |
--stats | |
</dt> | |
<dd> | |
<p> | |
Display some basic statistics about the objects fast-import has | |
created, the packfiles they were stored into, and the | |
memory used by fast-import during this run. Showing this output | |
is currently the default, but can be disabled with --quiet. | |
</p> | |
</dd> | |
</dl> | |
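<p>As a rough illustration of the marks file described under <tt>--export-marks</tt> and <tt>--import-marks</tt>, the Python sketch below parses <tt>:markid SHA-1</tt> lines and merges several tables so that the last one wins. The function names <tt>parse_marks</tt> and <tt>merge_marks</tt> are hypothetical and not part of Git, and skipping blank lines is an assumption made for robustness. This is not a description of how fast-import itself reads the file.</p>

```python
def parse_marks(text):
    """Parse marks-table text (':markid SHA-1' per line) into {mark_id: sha1}."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: tolerate blank lines
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError(f"not a mark line: {line!r}")
        marks[int(mark[1:])] = sha1.strip()
    return marks


def merge_marks(*tables):
    """Combine tables in order; a mark defined twice takes the last value."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged


sample = ":1 " + "a" * 40 + "\n:2 " + "b" * 40 + "\n"
first = parse_marks(sample)
second = parse_marks(":2 " + "c" * 40 + "\n")
combined = merge_marks(first, second)
print(sorted(combined))
```

<p>A frontend doing incremental imports could load such a table at startup to learn which marks are already bound to objects.</p>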
</div> | |
<h2>Performance</h2> | |
<div class="sectionbody"> | |
<p>The design of fast-import allows it to import large projects with a minimum
of memory usage and processing time. Assuming the frontend
is able to keep up with fast-import and feed it a constant stream of data,
imports of projects holding 10+ years of history and containing
100,000+ individual commits generally complete in just 1-2
hours on quite modest (~$2,000 USD) hardware.</p>
<p>Most bottlenecks appear to be in foreign source data access (the | |
source just cannot extract revisions fast enough) or disk IO (fast-import | |
writes as fast as the disk will take the data). Imports will run | |
faster if the source data is stored on a different drive than the | |
destination Git repository (due to less IO contention).</p> | |
</div> | |
<h2>Development Cost</h2> | |
<div class="sectionbody"> | |
<p>A typical frontend for fast-import tends to weigh in at approximately 200 | |
lines of Perl/Python/Ruby code. Most developers have been able to | |
create working importers in just a couple of hours, even though it | |
is their first exposure to fast-import, and sometimes even to Git. This is | |
an ideal situation, given that most conversion tools are throw-away | |
(use once, and never look back).</p> | |
</div> | |
<h2>Parallel Operation</h2> | |
<div class="sectionbody"> | |
<p>Like <tt>git-push</tt> or <tt>git-fetch</tt>, imports handled by fast-import are safe to | |
run alongside parallel <tt>git repack -a -d</tt> or <tt>git gc</tt> invocations, | |
or any other Git operation (including <tt>git prune</tt>, as loose objects | |
are never used by fast-import).</p> | |
<p>fast-import does not lock the branch or tag refs it is actively importing. | |
After the import, during its ref update phase, fast-import tests each | |
existing branch ref to verify the update will be a fast-forward | |
update (the commit stored in the ref is contained in the new | |
history of the commit to be written). If the update is not a | |
fast-forward update, fast-import will skip updating that ref and instead
print a warning message. fast-import will always attempt to update all
branch refs, and does not stop on the first failure.</p> | |
<p>Branch updates can be forced with --force, but it's recommended that
this only be used on an otherwise quiet repository. Using --force | |
is not necessary for an initial import into an empty repository.</p> | |
</div> | |
<h2>Technical Discussion</h2> | |
<div class="sectionbody"> | |
<p>fast-import tracks a set of branches in memory. Any branch can be created | |
or modified at any point during the import process by sending a | |
<tt>commit</tt> command on the input stream. This design allows a frontend | |
program to process an unlimited number of branches simultaneously, | |
generating commits in the order they are available from the source | |
data. It also simplifies the frontend programs considerably.</p> | |
<p>fast-import does not use or alter the current working directory, or any | |
file within it. (It does however update the current Git repository, | |
as referenced by <tt>GIT_DIR</tt>.) Therefore an import frontend may use | |
the working directory for its own purposes, such as extracting file | |
revisions from the foreign source. This ignorance of the working | |
directory also allows fast-import to run very quickly, as it does not | |
need to perform any costly file update operations when switching | |
between branches.</p> | |
</div> | |
<h2>Input Format</h2> | |
<div class="sectionbody"> | |
<p>With the exception of raw file data (which Git does not interpret) | |
the fast-import input format is text (ASCII) based. This text based | |
format simplifies development and debugging of frontend programs, | |
especially when a higher level language such as Perl, Python or | |
Ruby is being used.</p> | |
<p>fast-import is very strict about its input. Where we say SP below we mean | |
<strong>exactly</strong> one space. Likewise LF means one (and only one) linefeed. | |
Supplying additional whitespace characters will cause unexpected | |
results, such as branch names or file names with leading or trailing | |
spaces in their name, or early termination of fast-import when it encounters | |
unexpected input.</p> | |
<h3>Stream Comments</h3> | |
<p>To aid in debugging frontends, fast-import ignores any line that
begins with <tt>#</tt> (ASCII pound/hash) up to and including the line | |
ending <tt>LF</tt>. A comment line may contain any sequence of bytes | |
that does not contain an LF and therefore may be used to include | |
any detailed debugging information that might be specific to the | |
frontend and useful when inspecting a fast-import data stream.</p> | |
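<p>Because a comment cannot contain an LF, multi-line debugging text has to be split into one comment per line. A minimal Python sketch follows; the helper name <tt>comment_lines</tt> and the space after <tt>#</tt> are illustrative choices, not requirements of fast-import.</p>

```python
def comment_lines(text):
    """Turn free-form debug text into fast-import comment lines.

    Each output line starts with '#' and ends with exactly one LF.
    """
    return "".join(f"# {line}\n" for line in text.split("\n"))


debug = comment_lines("source revision 1.42\nfile: src/main.c")
print(debug, end="")
```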
<h3>Date Formats</h3> | |
<p>The following date formats are supported. A frontend should select | |
the format it will use for this import by passing the format name | |
in the --date-format=<fmt> command line option.</p> | |
<dl> | |
<dt> | |
<tt>raw</tt> | |
</dt> | |
<dd> | |
<p> | |
This is the Git native format and is <tt><time> SP <offutc></tt>. | |
It is also fast-import's default format, if --date-format was | |
not specified. | |
</p> | |
<p>The time of the event is specified by <tt><time></tt> as the number of | |
seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is | |
written as an ASCII decimal integer.</p> | |
<p>The local offset is specified by <tt>&lt;offutc&gt;</tt> as a positive or negative
offset from UTC. For example EST (which is 5 hours behind UTC)
would be expressed in <tt>&lt;offutc&gt;</tt> by “-0500” while UTC is “+0000”.
The local offset does not affect <tt><time></tt>; it is used only as an | |
advisement to help formatting routines display the timestamp.</p> | |
<p>If the local offset is not available in the source material, use | |
“+0000”, or the most common local offset. For example many | |
organizations have a CVS repository which has only ever been accessed | |
by users who are located in the same location and timezone. In this | |
case a reasonable offset from UTC could be assumed.</p> | |
<p>Unlike the <tt>rfc2822</tt> format, this format is very strict. Any | |
variation in formatting will cause fast-import to reject the value.</p> | |
</dd> | |
<dt> | |
<tt>rfc2822</tt> | |
</dt> | |
<dd> | |
<p> | |
This is the standard email format as described by RFC 2822. | |
</p> | |
<p>An example value is “Tue Feb 6 11:22:18 2007 -0500”. The Git | |
parser is accurate, but a little on the lenient side. It is the | |
same parser used by <a href="git-am.html">git-am(1)</a> when applying patches | |
received from email.</p> | |
<p>Some malformed strings may be accepted as valid dates. In some of | |
these cases Git will still be able to obtain the correct date from | |
the malformed string. There are also some types of malformed | |
strings which Git will parse wrong, and yet consider valid. | |
Seriously malformed strings will be rejected.</p> | |
<p>Unlike the <tt>raw</tt> format above, the timezone/UTC offset information | |
contained in an RFC 2822 date string is used to adjust the date | |
value to UTC prior to storage. Therefore it is important that | |
this information be as accurate as possible.</p> | |
<p>If the source material uses RFC 2822 style dates, | |
the frontend should let fast-import handle the parsing and conversion | |
(rather than attempting to do it itself) as the Git parser has | |
been well tested in the wild.</p> | |
<p>Frontends should prefer the <tt>raw</tt> format if the source material | |
already uses UNIX-epoch format, can be coaxed to give dates in that | |
format, or its format is easily convertible to it, as there is no | |
ambiguity in parsing.</p> | |
</dd> | |
<dt> | |
<tt>now</tt> | |
</dt> | |
<dd> | |
<p> | |
Always use the current time and timezone. The literal | |
<tt>now</tt> must always be supplied for <tt><when></tt>. | |
</p> | |
<p>This is a toy format. The current time and timezone of this system | |
is always copied into the identity string at the time it is being | |
created by fast-import. There is no way to specify a different time or | |
timezone.</p> | |
<p>This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit | |
right now, without needing to use a working directory or | |
<a href="git-update-index.html">git-update-index(1)</a>.</p> | |
<p>If separate <tt>author</tt> and <tt>committer</tt> commands are used in a <tt>commit</tt> | |
the timestamps may not match, as the system clock will be polled | |
twice (once for each command). The only way to ensure that both | |
author and committer identity information has the same timestamp | |
is to omit <tt>author</tt> (thus copying from <tt>committer</tt>) or to use a | |
date format other than <tt>now</tt>.</p> | |
</dd> | |
</dl> | |
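<p>To make the <tt>raw</tt> format concrete, the Python sketch below renders a timezone-aware <tt>datetime</tt> as <tt>&lt;time&gt; SP &lt;offutc&gt;</tt>. The function name <tt>raw_date</tt> is hypothetical. The sample value is the same instant as the RFC 2822 example “Tue Feb 6 11:22:18 2007 -0500” given above.</p>

```python
from datetime import datetime, timedelta, timezone


def raw_date(dt):
    """Format an aware datetime as '<time> SP <offutc>' (Git's raw format)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("need a timezone-aware datetime")
    seconds = int(dt.timestamp())  # seconds since the UNIX epoch, in UTC
    offset_min = int(offset.total_seconds()) // 60
    sign = "-" if offset_min < 0 else "+"
    hours, minutes = divmod(abs(offset_min), 60)
    return f"{seconds} {sign}{hours:02d}{minutes:02d}"


est = timezone(timedelta(hours=-5))
stamp = raw_date(datetime(2007, 2, 6, 11, 22, 18, tzinfo=est))
print(stamp)  # 1170778938 -0500
```

<p>Note that the offset only annotates the value. The integer part is the same for any timezone describing the same instant.</p>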
<h3>Commands</h3> | |
<p>fast-import accepts several commands to update the current repository | |
and control the current import process. More detailed discussion | |
(with examples) of each command follows later.</p> | |
<dl> | |
<dt> | |
<tt>commit</tt> | |
</dt> | |
<dd> | |
<p> | |
Creates a new branch or updates an existing branch by | |
creating a new commit and updating the branch to point at | |
the newly created commit. | |
</p> | |
</dd> | |
<dt> | |
<tt>tag</tt> | |
</dt> | |
<dd> | |
<p> | |
Creates an annotated tag object from an existing commit or | |
branch. Lightweight tags are not supported by this command, | |
as they are not recommended for recording meaningful points | |
in time. | |
</p> | |
</dd> | |
<dt> | |
<tt>reset</tt> | |
</dt> | |
<dd> | |
<p> | |
Reset an existing branch (or a new branch) to a specific | |
revision. This command must be used to change a branch to | |
a specific revision without making a commit on it. | |
</p> | |
</dd> | |
<dt> | |
<tt>blob</tt> | |
</dt> | |
<dd> | |
<p> | |
Convert raw file data into a blob, for future use in a | |
<tt>commit</tt> command. This command is optional and is not | |
needed to perform an import. | |
</p> | |
</dd> | |
<dt> | |
<tt>checkpoint</tt> | |
</dt> | |
<dd> | |
<p> | |
Forces fast-import to close the current packfile, generate its | |
unique SHA-1 checksum and index, and start a new packfile. | |
This command is optional and is not needed to perform | |
an import. | |
</p> | |
</dd> | |
<dt> | |
<tt>progress</tt> | |
</dt> | |
<dd> | |
<p> | |
Causes fast-import to echo the entire line to its own | |
standard output. This command is optional and is not needed | |
to perform an import. | |
</p> | |
</dd> | |
</dl> | |
<h3><tt>commit</tt></h3> | |
<p>Create or update a branch with a new commit, recording one logical | |
change to the project.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'commit' SP &lt;ref&gt; LF
 mark?
 ('author' SP &lt;name&gt; SP LT &lt;email&gt; GT SP &lt;when&gt; LF)?
 'committer' SP &lt;name&gt; SP LT &lt;email&gt; GT SP &lt;when&gt; LF
 data
 ('from' SP &lt;committish&gt; LF)?
 ('merge' SP &lt;committish&gt; LF)*
 (filemodify | filedelete | filecopy | filerename | filedeleteall)*
 LF?</tt></pre>
</div></div> | |
<p>where <tt><ref></tt> is the name of the branch to make the commit on. | |
Typically branch names are prefixed with <tt>refs/heads/</tt> in | |
Git, so importing the CVS branch symbol <tt>RELENG-1_0</tt> would use | |
<tt>refs/heads/RELENG-1_0</tt> for the value of <tt><ref></tt>. The value of | |
<tt><ref></tt> must be a valid refname in Git. As <tt>LF</tt> is not valid in | |
a Git refname, no quoting or escaping syntax is supported here.</p> | |
<p>A <tt>mark</tt> command may optionally appear, requesting fast-import to save a | |
reference to the newly created commit for future use by the frontend | |
(see below for format). It is very common for frontends to mark | |
every commit they create, thereby allowing future branch creation | |
from any imported commit.</p> | |
<p>The <tt>data</tt> command following <tt>committer</tt> must supply the commit | |
message (see below for <tt>data</tt> command syntax). To import an empty | |
commit message use a 0 length data. Commit messages are free-form | |
and are not interpreted by Git. Currently they must be encoded in | |
UTF-8, as fast-import does not permit other encodings to be specified.</p> | |
<p>Zero or more <tt>filemodify</tt>, <tt>filedelete</tt>, <tt>filecopy</tt>, <tt>filerename</tt> | |
and <tt>filedeleteall</tt> commands | |
may be included to update the contents of the branch prior to | |
creating the commit. These commands may be supplied in any order. | |
However it is recommended that a <tt>filedeleteall</tt> command precede | |
all <tt>filemodify</tt>, <tt>filecopy</tt> and <tt>filerename</tt> commands in the same | |
commit, as <tt>filedeleteall</tt> | |
wipes the branch clean (see below).</p> | |
<p>The <tt>LF</tt> after the command is optional (it used to be required).</p> | |
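<p>The ordering of the parts of a <tt>commit</tt> command can be sketched in Python as below. Two things here are assumptions taken from later parts of this manual rather than from the text above: the exact-byte-count form of <tt>data</tt> (<tt>'data' SP &lt;count&gt; LF &lt;raw&gt; LF</tt>) and the form of <tt>mark</tt> (<tt>'mark' SP ':' &lt;idnum&gt; LF</tt>). The helper names <tt>data_cmd</tt> and <tt>commit_cmd</tt> are hypothetical.</p>

```python
def data_cmd(payload):
    """Assumed exact-byte-count form: 'data' SP <count> LF <raw> LF."""
    return b"data %d\n%s\n" % (len(payload), payload)


def commit_cmd(ref, committer, message, mark=None, parent=None, file_cmds=()):
    """Assemble one commit command as bytes, in the order the grammar requires."""
    if "\n" in ref:
        raise ValueError("LF is not valid in a refname")
    out = [f"commit {ref}\n".encode("utf-8")]
    if mark is not None:
        out.append(f"mark :{mark}\n".encode("ascii"))  # assumed mark syntax
    out.append(f"committer {committer}\n".encode("utf-8"))
    out.append(data_cmd(message.encode("utf-8")))  # commit message, UTF-8
    if parent is not None:
        out.append(f"from {parent}\n".encode("utf-8"))
    out.extend(cmd.encode("utf-8") for cmd in file_cmds)  # each ends in LF
    out.append(b"\n")  # optional trailing LF
    return b"".join(out)


stream = commit_cmd(
    "refs/heads/RELENG-1_0",
    "Com M Itter <cm@example.com> 1170778938 -0500",
    "initial import\n",
    mark=1,
    file_cmds=["D obsolete.txt\n"],
)
print(stream.decode("utf-8"), end="")
```

<p>Working in bytes keeps the <tt>data</tt> count honest: it is the length of the encoded message, not the number of characters.</p>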
<h4><tt>author</tt></h4> | |
<p>An <tt>author</tt> command may optionally appear, if the author information | |
might differ from the committer information. If <tt>author</tt> is omitted | |
then fast-import will automatically use the committer's information for | |
the author portion of the commit. See below for a description of | |
the fields in <tt>author</tt>, as they are identical to <tt>committer</tt>.</p> | |
<h4><tt>committer</tt></h4> | |
<p>The <tt>committer</tt> command indicates who made this commit, and when | |
they made it.</p> | |
<p>Here <tt><name></tt> is the person's display name (for example | |
“Com M Itter”) and <tt><email></tt> is the person's email address | |
(“cm@example.com”). <tt>LT</tt> and <tt>GT</tt> are the literal less-than (\x3c) | |
and greater-than (\x3e) symbols. These are required to delimit | |
the email address from the other fields in the line. Note that | |
<tt><name></tt> is free-form and may contain any sequence of bytes, except | |
<tt>LT</tt> and <tt>LF</tt>. It is typically UTF-8 encoded.</p> | |
<p>The time of the change is specified by <tt><when></tt> using the date format | |
that was selected by the --date-format=<fmt> command line option. | |
See “Date Formats” above for the set of supported formats, and | |
their syntax.</p> | |
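<p>A frontend might build and sanity-check an identity line along the following lines. This is only a sketch: <tt>identity_line</tt> is a hypothetical helper, and the checks on the email address are an assumption (the text above only restricts <tt>&lt;name&gt;</tt>).</p>

```python
def identity_line(command, name, email, when):
    """Build "<command> SP <name> SP LT <email> GT SP <when> LF"."""
    if command not in ("author", "committer"):
        raise ValueError(f"unexpected command: {command!r}")
    if "<" in name or "\n" in name:
        raise ValueError("<name> must not contain LT or LF")
    if any(ch in email for ch in "<>\n"):
        # Assumption: reject delimiters inside the address as well.
        raise ValueError("<email> must not contain LT, GT or LF")
    return f"{command} {name} <{email}> {when}\n"


line = identity_line("committer", "Com M Itter", "cm@example.com",
                     "1170778938 -0500")
print(line, end="")
```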
<h4><tt>from</tt></h4> | |
<p>The <tt>from</tt> command is used to specify the commit to initialize | |
this branch from. This revision will be the first ancestor of the | |
new commit.</p> | |
<p>Omitting the <tt>from</tt> command in the first commit of a new branch | |
will cause fast-import to create that commit with no ancestor. This | |
tends to be desired only for the initial commit of a project. | |
Omitting the <tt>from</tt> command on existing branches is usually desired, | |
as the current commit on that branch is automatically assumed to | |
be the first ancestor of the new commit.</p> | |
<p>As <tt>LF</tt> is not valid in a Git refname or SHA-1 expression, no | |
quoting or escaping syntax is supported within <tt><committish></tt>.</p> | |
<p>Here <tt><committish></tt> is any of the following:</p> | |
<ul> | |
<li> | |
<p> | |
The name of an existing branch already in fast-import's internal branch | |
table. If fast-import doesn't know the name, it's treated as a SHA-1
expression. | |
</p> | |
</li> | |
<li> | |
<p> | |
A mark reference, <tt>:<idnum></tt>, where <tt><idnum></tt> is the mark number. | |
</p> | |
<p>The reason fast-import uses <tt>:</tt> to denote a mark reference is that this character
is not legal in a Git branch name. The leading <tt>:</tt> makes it easy | |
to distinguish between the mark 42 (<tt>:42</tt>) and the branch 42 (<tt>42</tt> | |
or <tt>refs/heads/42</tt>), or an abbreviated SHA-1 which happened to | |
consist only of base-10 digits.</p> | |
<p>Marks must be declared (via <tt>mark</tt>) before they can be used.</p> | |
</li> | |
<li> | |
<p> | |
A complete 40 byte or abbreviated commit SHA-1 in hex. | |
</p> | |
</li> | |
<li> | |
<p> | |
Any valid Git SHA-1 expression that resolves to a commit. See | |
“SPECIFYING REVISIONS” in <a href="git-rev-parse.html">git-rev-parse(1)</a> for details. | |
</p> | |
</li> | |
</ul> | |
<p>The special case of restarting an incremental import from the | |
current branch value should be written as:</p> | |
<div class="listingblock"> | |
<div class="content"> | |
<pre><tt> from refs/heads/branch^0</tt></pre> | |
</div></div> | |
<p>The <tt>^0</tt> suffix is necessary as fast-import does not permit a branch to | |
start from itself, and the branch is created in memory before the | |
<tt>from</tt> command is even read from the input. Adding <tt>^0</tt> will force | |
fast-import to resolve the commit through Git's revision parsing library, | |
rather than its internal branch table, thereby loading in the | |
existing value of the branch.</p> | |
<h4><tt>merge</tt></h4> | |
<p>Includes one additional ancestor commit, and makes the current | |
commit a merge commit. An unlimited number of <tt>merge</tt> commands per | |
commit are permitted by fast-import, thereby establishing an n-way merge. | |
However Git's other tools never create commits with more than 15 | |
additional ancestors (forming a 16-way merge). For this reason | |
it is suggested that frontends do not use more than 15 <tt>merge</tt> | |
commands per commit.</p> | |
<p>Here <tt><committish></tt> is any of the commit specification expressions | |
also accepted by <tt>from</tt> (see above).</p> | |
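<p>The ancestry lines of a commit could be emitted as in the Python sketch below, which also enforces the suggested limit of 15 <tt>merge</tt> commands. The name <tt>parent_lines</tt> and the constant <tt>MAX_MERGES</tt> are hypothetical. fast-import itself does not impose this limit, so treating it as an error is a design choice of the sketch.</p>

```python
MAX_MERGES = 15  # suggested ceiling; fast-import itself accepts more


def parent_lines(first_parent=None, merge_parents=()):
    """Return the optional 'from' line followed by zero or more 'merge' lines."""
    merge_parents = list(merge_parents)
    if len(merge_parents) > MAX_MERGES:
        raise ValueError("more than 15 merge commands in one commit")
    for committish in [first_parent, *merge_parents]:
        if committish is not None and "\n" in committish:
            raise ValueError("LF cannot appear in <committish>")
    lines = []
    if first_parent is not None:
        lines.append(f"from {first_parent}\n")
    lines.extend(f"merge {c}\n" for c in merge_parents)
    return "".join(lines)


text = parent_lines(":42", ["refs/heads/topic", ":7"])
print(text, end="")
```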
<h4><tt>filemodify</tt></h4> | |
<p>Included in a <tt>commit</tt> command to add a new file or change the | |
content of an existing file. This command has two different means | |
of specifying the content of the file.</p> | |
<dl> | |
<dt> | |
External data format | |
</dt> | |
<dd> | |
<p> | |
The data content for the file was already supplied by a prior | |
<tt>blob</tt> command. The frontend just needs to connect it. | |
</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'M' SP &lt;mode&gt; SP &lt;dataref&gt; SP &lt;path&gt; LF</tt></pre>
</div></div> | |
<p>Here <tt><dataref></tt> can be either a mark reference (<tt>:<idnum></tt>) | |
set by a prior <tt>blob</tt> command, or a full 40-byte SHA-1 of an | |
existing Git blob object.</p> | |
</dd> | |
<dt> | |
Inline data format | |
</dt> | |
<dd> | |
<p> | |
The data content for the file has not been supplied yet. | |
The frontend wants to supply it as part of this modify | |
command. | |
</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'M' SP &lt;mode&gt; SP 'inline' SP &lt;path&gt; LF
 data</tt></pre>
</div></div> | |
<p>See below for a detailed description of the <tt>data</tt> command.</p> | |
</dd> | |
</dl> | |
<p>In both formats <tt><mode></tt> is the type of file entry, specified | |
in octal. Git only supports the following modes:</p> | |
<ul> | |
<li> | |
<p> | |
<tt>100644</tt> or <tt>644</tt>: A normal (not-executable) file. The majority | |
of files in most projects use this mode. If in doubt, this is | |
what you want. | |
</p> | |
</li> | |
<li> | |
<p> | |
<tt>100755</tt> or <tt>755</tt>: A normal, but executable, file. | |
</p> | |
</li> | |
<li> | |
<p> | |
<tt>120000</tt>: A symlink, the content of the file will be the link target. | |
</p> | |
</li> | |
</ul> | |
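<p>For the external data format, a frontend could validate the mode and data reference before writing the line, roughly as follows. The helper <tt>filemodify_line</tt> is hypothetical, and expanding the short modes <tt>644</tt> and <tt>755</tt> to their long forms is a choice made by this sketch, not something fast-import requires. The path is passed through unchanged here.</p>

```python
import re

# Accepted spellings mapped to the long form this sketch chooses to emit.
MODES = {
    "100644": "100644", "644": "100644",   # normal file
    "100755": "100755", "755": "100755",   # executable file
    "120000": "120000",                    # symlink
}

# A mark reference (:<idnum>) or a full 40-character hex SHA-1.
DATAREF = re.compile(r":[0-9]+|[0-9a-f]{40}")


def filemodify_line(mode, dataref, path):
    """Build "'M' SP <mode> SP <dataref> SP <path> LF" (external data format)."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if not DATAREF.fullmatch(dataref):
        raise ValueError(f"not a mark or full SHA-1: {dataref!r}")
    return f"M {MODES[mode]} {dataref} {path}\n"


print(filemodify_line("644", ":3", "src/main.c"), end="")
```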
<p>In both formats <tt><path></tt> is the complete path of the file to be added | |
(if not already existing) or modified (if already existing).</p> | |
<p>A <tt><path></tt> string must use UNIX-style directory separators (forward | |
slash <tt>/</tt>), may contain any byte other than <tt>LF</tt>, and must not | |
start with double quote (<tt>"</tt>).</p> | |
<p>If an <tt>LF</tt> or double quote must be encoded into <tt><path></tt> shell-style | |
quoting should be used, e.g. <tt>"path/with\n and \" in it"</tt>.</p> | |
<p>The value of <tt><path></tt> must be in canonical form. That is it must not:</p> | |
<ul> | |
<li> | |
<p> | |
contain an empty directory component (e.g. <tt>foo//bar</tt> is invalid), | |
</p> | |
</li> | |
<li> | |
<p> | |
end with a directory separator (e.g. <tt>foo/</tt> is invalid), | |
</p> | |
</li> | |
<li> | |
<p> | |
start with a directory separator (e.g. <tt>/foo</tt> is invalid), | |
</p> | |
</li> | |
<li> | |
<p> | |
contain the special component <tt>.</tt> or <tt>..</tt> (e.g. <tt>foo/./bar</tt> and | |
<tt>foo/../bar</tt> are invalid). | |
</p> | |
</li> | |
</ul> | |
<p>It is recommended that <tt><path></tt> always be encoded using UTF-8.</p> | |
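<p>The path rules above can be checked mechanically. The Python sketch below tests for canonical form and applies quoting when a path contains an LF or starts with a double quote. Both function names are hypothetical, and doubling backslashes inside a quoted path is an assumption: the text above only shows <tt>\n</tt> and <tt>\"</tt>.</p>

```python
def is_canonical(path):
    """True if no component is empty, '.' or '..'.

    An empty component covers 'foo//bar', a trailing '/', and a leading '/'.
    """
    return all(part not in ("", ".", "..") for part in path.split("/"))


def quote_path(path):
    """Quote a path only when the format requires it."""
    if "\n" in path or path.startswith('"'):
        escaped = (path.replace("\\", "\\\\")   # assumption: escape backslashes
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        return f'"{escaped}"'
    return path


for candidate in ("foo/bar", "foo//bar", "foo/", "/foo", "foo/../bar"):
    print(candidate, is_canonical(candidate))
print(quote_path('path/with\n and " in it'))
```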
<h4><tt>filedelete</tt></h4> | |
<p>Included in a <tt>commit</tt> command to remove a file or recursively | |
delete an entire directory from the branch. If the file or directory | |
removal makes its parent directory empty, the parent directory will | |
be automatically removed too. This cascades up the tree until the | |
first non-empty directory or the root is reached.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'D' SP &lt;path&gt; LF</tt></pre>
</div></div> | |
<p>Here <tt>&lt;path&gt;</tt> is the complete path of the file or subdirectory to
be removed from the branch.
See <tt>filemodify</tt> above for a detailed description of <tt>&lt;path&gt;</tt>.</p>
<h4><tt>filecopy</tt></h4> | |
<p>Recursively copies an existing file or subdirectory to a different | |
location within the branch. The source file or directory must
exist. If the destination exists it will be completely replaced
by the content copied from the source.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'C' SP &lt;path&gt; SP &lt;path&gt; LF</tt></pre>
</div></div> | |
<p>Here the first <tt>&lt;path&gt;</tt> is the source location and the second
<tt>&lt;path&gt;</tt> is the destination. See <tt>filemodify</tt> above for a detailed
description of what <tt>&lt;path&gt;</tt> may look like. To use a source path
that contains SP the path must be quoted.</p>
<p>A <tt>filecopy</tt> command takes effect immediately. Once the source | |
location has been copied to the destination any future commands | |
applied to the source location will not impact the destination of | |
the copy.</p> | |
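<p>Since the source path of a <tt>filecopy</tt> is followed by SP and a second path, a frontend has to quote a source that itself contains SP. One way to sketch this in Python is shown below; <tt>quote</tt> and <tt>filecopy_line</tt> are hypothetical helpers, and escaping backslashes inside quotes is an assumption.</p>

```python
def quote(path, is_source=False):
    """Quote when needed: LF, a leading double quote, or SP in a source path."""
    needs_quotes = (
        "\n" in path
        or path.startswith('"')
        or (is_source and " " in path)
    )
    if not needs_quotes:
        return path
    escaped = (path.replace("\\", "\\\\")   # assumption: escape backslashes
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def filecopy_line(src, dst):
    """Build "'C' SP <path> SP <path> LF"."""
    return f"C {quote(src, is_source=True)} {quote(dst)}\n"


print(filecopy_line("docs/old name.txt", "docs/new name.txt"), end="")
```

<p>The destination is left unquoted in this example because it runs to the end of the line, so an embedded SP is not ambiguous there.</p>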
<h4><tt>filerename</tt></h4> | |
<p>Renames an existing file or subdirectory to a different location | |
within the branch. The source file or directory must exist. If
the destination exists it will be replaced by the source.</p>
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'R' SP &lt;path&gt; SP &lt;path&gt; LF</tt></pre>
</div></div> | |
<p>here the first <tt><path></tt> is the source location and the second | |
<tt><path></tt> is the destination. See <tt>filemodify</tt> above for a detailed | |
description of what <tt><path></tt> may look like. To use a source path | |
that contains SP the path must be quoted.</p> | |
<p>A <tt>filerename</tt> command takes effect immediately. Once the source | |
location has been renamed to the destination any future commands | |
applied to the source location will create new files there and not | |
impact the destination of the rename.</p> | |
<p>Note that a <tt>filerename</tt> is the same as a <tt>filecopy</tt> followed by a | |
<tt>filedelete</tt> of the source location. There is a slight performance | |
advantage to using <tt>filerename</tt>, but the advantage is so small | |
that it is never worth trying to convert a delete/add pair in | |
source material into a rename for fast-import. This <tt>filerename</tt> | |
command is provided just to simplify frontends that already have | |
rename information and don't want to bother with decomposing it into a
<tt>filecopy</tt> followed by a <tt>filedelete</tt>.</p> | |
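<p>The quoting rule for source paths can be sketched in Python as follows.
This is a minimal sketch, not part of fast-import: the helper names
(<tt>quote_path</tt>, <tt>format_path</tt>, <tt>filecopy</tt>, <tt>filerename</tt>,
<tt>filedelete</tt>) are hypothetical, and the escaping assumes the quoting
described under <tt>filemodify</tt> (backslash-escaped double quote,
backslash and LF).</p>

```python
def quote_path(path):
    """Wrap a path in double quotes, escaping backslash, double quote and LF."""
    escaped = (
        path.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def format_path(path, is_source=False):
    # Assumption: quoting is needed for a leading double quote or any LF,
    # and for SP only when the path is the source of a copy or rename.
    must_quote = (
        path.startswith('"')
        or "\n" in path
        or (is_source and " " in path)
    )
    return quote_path(path) if must_quote else path


def filecopy(src, dst):
    return "C " + format_path(src, is_source=True) + " " + format_path(dst) + "\n"


def filerename(src, dst):
    return "R " + format_path(src, is_source=True) + " " + format_path(dst) + "\n"


def filedelete(path):
    return "D " + format_path(path) + "\n"
```

<p>Under these assumptions, <tt>filerename("old name.txt", "new.txt")</tt>
quotes only the source path, because only the source path is ambiguous
when it contains SP.</p>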
<h4><tt>filedeleteall</tt></h4> | |
<p>Included in a <tt>commit</tt> command to remove all files (and also all | |
directories) from the branch. This command resets the internal | |
branch structure to have no files in it, allowing the frontend | |
to subsequently add all interesting files from scratch.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'deleteall' LF</tt></pre> | |
</div></div> | |
<p>This command is extremely useful if the frontend does not know | |
(or does not care to know) what files are currently on the branch, | |
and therefore cannot generate the proper <tt>filedelete</tt> commands to | |
update the content.</p> | |
<p>Issuing a <tt>filedeleteall</tt> followed by the needed <tt>filemodify</tt> | |
commands to set the correct content will produce the same results | |
as sending only the needed <tt>filemodify</tt> and <tt>filedelete</tt> commands. | |
The <tt>filedeleteall</tt> approach may however require fast-import to use slightly | |
more memory per active branch (less than 1 MiB for even most large | |
projects); so frontends that can easily obtain only the affected | |
paths for a commit are encouraged to do so.</p> | |
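<p>A frontend that only has a full snapshot of each revision might emit
its file commands along these lines. This is a sketch under stated
assumptions: the function name and the shape of <tt>files</tt> are
hypothetical, paths are assumed not to need quoting, and the <tt>M</tt>
line follows the <tt>filemodify</tt> syntax described above.</p>

```python
def snapshot_commands(files):
    """Return file commands that replace the branch content with `files`.

    `files` maps a path to a (mode, dataref) pair, e.g. ("100644", ":12").
    """
    lines = ["deleteall\n"]
    # Sorting is not required by fast-import; it only makes the stream
    # reproducible and easier to compare between runs.
    for path in sorted(files):
        mode, dataref = files[path]
        lines.append("M " + mode + " " + dataref + " " + path + "\n")
    return "".join(lines)
```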
<h3><tt>mark</tt></h3> | |
<p>Arranges for fast-import to save a reference to the current object, allowing | |
the frontend to recall this object at a future point in time, without | |
knowing its SHA-1. Here the current object is the object creation | |
command the <tt>mark</tt> command appears within. This can be <tt>commit</tt>, | |
<tt>tag</tt>, or <tt>blob</tt>, but <tt>commit</tt> is the most common usage.</p>
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'mark' SP ':' <idnum> LF</tt></pre> | |
</div></div> | |
<p>where <tt><idnum></tt> is the number assigned by the frontend to this mark. | |
The value of <tt><idnum></tt> is expressed as an ASCII decimal integer. | |
The value 0 is reserved and cannot be used as | |
a mark. Only values greater than or equal to 1 may be used as marks.</p> | |
<p>New marks are created automatically. Existing marks can be moved | |
to another object simply by reusing the same <tt><idnum></tt> in another | |
<tt>mark</tt> command.</p> | |
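<p>One way for a frontend to hand out marks is a small table that maps
its own keys (for example source revision identifiers) to consecutive
numbers starting at 1. The class below is a hypothetical sketch, not
something fast-import provides.</p>

```python
class MarkTable:
    """Hands out dense marks (1, 2, 3, ...) for frontend-side keys."""

    def __init__(self):
        self._marks = {}

    def mark_for(self, key):
        # The first unseen key gets 1; 0 is never produced because it
        # is reserved and cannot be used as a mark.
        if key not in self._marks:
            self._marks[key] = len(self._marks) + 1
        return self._marks[key]

    def command(self, key):
        """Return the mark command line for the object identified by key."""
        return "mark :%d\n" % self.mark_for(key)
```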
<h3><tt>tag</tt></h3> | |
<p>Creates an annotated tag referring to a specific commit. To create | |
lightweight (non-annotated) tags see the <tt>reset</tt> command below.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'tag' SP <name> LF | |
'from' SP <committish> LF | |
'tagger' SP <name> SP LT <email> GT SP <when> LF | |
data</tt></pre> | |
</div></div> | |
<p>where <tt><name></tt> is the name of the tag to create.</p> | |
<p>Tag names are automatically prefixed with <tt>refs/tags/</tt> when stored | |
in Git, so importing the CVS branch symbol <tt>RELENG-1_0-FINAL</tt> would | |
use just <tt>RELENG-1_0-FINAL</tt> for <tt><name></tt>, and fast-import will write the | |
corresponding ref as <tt>refs/tags/RELENG-1_0-FINAL</tt>.</p> | |
<p>The value of <tt><name></tt> must be a valid refname in Git and therefore | |
may contain forward slashes. As <tt>LF</tt> is not valid in a Git refname, | |
no quoting or escaping syntax is supported here.</p> | |
<p>The <tt>from</tt> command is the same as in the <tt>commit</tt> command; see | |
above for details.</p> | |
<p>The <tt>tagger</tt> command uses the same format as <tt>committer</tt> within | |
<tt>commit</tt>; again see above for details.</p> | |
<p>The <tt>data</tt> command following <tt>tagger</tt> must supply the annotated tag | |
message (see below for <tt>data</tt> command syntax). To import an empty | |
tag message use a 0 length data. Tag messages are free-form and are | |
not interpreted by Git. Currently they must be encoded in UTF-8, | |
as fast-import does not permit other encodings to be specified.</p> | |
<p>Signing annotated tags during import from within fast-import is not | |
supported. Trying to include your own PGP/GPG signature is not | |
recommended, as the frontend does not (easily) have access to the | |
complete set of bytes which normally goes into such a signature. | |
If signing is required, create lightweight tags from within fast-import with | |
<tt>reset</tt>, then create the annotated versions of those tags offline | |
with the standard <a href="git-tag.html">git-tag(1)</a> process.</p> | |
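<p>Putting the pieces together, a frontend could build a complete
<tt>tag</tt> command as sketched below. The function and parameter names
are hypothetical, and <tt>when</tt> is assumed to already be a date
string in a format fast-import accepts (see <tt>committer</tt> above),
such as seconds since the epoch followed by a UTC offset.</p>

```python
def tag_command(name, committish, tagger_name, tagger_email, when, message):
    """Return the bytes of one annotated tag command."""
    # The count in the data command is a byte count, so encode first.
    body = message.encode("utf-8")
    header = (
        f"tag {name}\n"
        f"from {committish}\n"
        f"tagger {tagger_name} <{tagger_email}> {when}\n"
        f"data {len(body)}\n"
    )
    # The trailing LF after the data is optional but recommended.
    return header.encode("utf-8") + body + b"\n"
```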
<h3><tt>reset</tt></h3> | |
<p>Creates (or recreates) the named branch, optionally starting from | |
a specific revision. The reset command allows a frontend to issue | |
a new <tt>from</tt> command for an existing branch, or to create a new | |
branch from an existing commit without creating a new commit.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'reset' SP <ref> LF | |
('from' SP <committish> LF)? | |
LF?</tt></pre> | |
</div></div> | |
<p>For a detailed description of <tt><ref></tt> and <tt><committish></tt> see above | |
under <tt>commit</tt> and <tt>from</tt>.</p> | |
<p>The <tt>LF</tt> after the command is optional (it used to be required).</p> | |
<p>The <tt>reset</tt> command can also be used to create lightweight | |
(non-annotated) tags. For example:</p> | |
<div class="exampleblock"> | |
<div class="exampleblock-content"> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt>reset refs/tags/938 | |
from :938</tt></pre> | |
</div></div> | |
</div></div> | |
<p>would create the lightweight tag <tt>refs/tags/938</tt> referring to | |
whatever commit mark <tt>:938</tt> references.</p> | |
<h3><tt>blob</tt></h3> | |
<p>Requests writing one file revision to the packfile. The revision | |
is not connected to any commit; this connection must be formed in | |
a subsequent <tt>commit</tt> command by referencing the blob through an | |
assigned mark.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'blob' LF | |
mark? | |
data</tt></pre> | |
</div></div> | |
<p>The mark command is optional here as some frontends have chosen | |
to generate the Git SHA-1 for the blob on their own, and feed that | |
directly to <tt>commit</tt>. This is typically more work than it's worth
however, as marks are inexpensive to store and easy to use.</p> | |
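<p>The link between a blob and a later commit can be sketched as follows.
Both helpers are hypothetical; the blob uses the exact byte count form of
<tt>data</tt>, and the <tt>M</tt> line follows the <tt>filemodify</tt>
syntax described above, with a default mode of <tt>100644</tt> chosen
purely for illustration.</p>

```python
def blob_with_mark(mark, content):
    """Return a blob command that stores `content` under mark `mark`."""
    return b"blob\nmark :%d\ndata %d\n" % (mark, len(content)) + content + b"\n"


def modify_line(mark, path, mode="100644"):
    """Return the filemodify line that attaches the marked blob to `path`."""
    return "M %s :%d %s\n" % (mode, mark, path)
```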
<h3><tt>data</tt></h3> | |
<p>Supplies raw data (for use as blob/file content, commit messages, or | |
annotated tag messages) to fast-import. Data can be supplied using an exact | |
byte count or delimited with a terminating line. Real frontends | |
intended for production-quality conversions should always use the | |
exact byte count format, as it is more robust and performs better. | |
The delimited format is intended primarily for testing fast-import.</p> | |
<p>Comment lines appearing within the <tt><raw></tt> part of <tt>data</tt> commands | |
are always taken to be part of the body of the data and are therefore | |
never ignored by fast-import. This makes it safe to import any | |
file/message content whose lines might start with <tt>#</tt>.</p> | |
<dl> | |
<dt> | |
Exact byte count format | |
</dt> | |
<dd> | |
<p> | |
The frontend must specify the number of bytes of data. | |
</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'data' SP <count> LF | |
<raw> LF?</tt></pre> | |
</div></div> | |
<p>where <tt><count></tt> is the exact number of bytes appearing within | |
<tt><raw></tt>. The value of <tt><count></tt> is expressed as an ASCII decimal | |
integer. The <tt>LF</tt> on either side of <tt><raw></tt> is not | |
included in <tt><count></tt> and will not be included in the imported data.</p> | |
<p>The <tt>LF</tt> after <tt><raw></tt> is optional (it used to be required) but | |
recommended. Always including it makes debugging a fast-import | |
stream easier as the next command always starts in column 0 | |
of the next line, even if <tt><raw></tt> did not end with an <tt>LF</tt>.</p> | |
</dd> | |
<dt> | |
Delimited format | |
</dt> | |
<dd> | |
<p> | |
A delimiter string is used to mark the end of the data. | |
fast-import will compute the length by searching for the delimiter. | |
This format is primarily useful for testing and is not | |
recommended for real data. | |
</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'data' SP '<<' <delim> LF | |
<raw> LF | |
<delim> LF | |
LF?</tt></pre> | |
</div></div> | |
<p>where <tt><delim></tt> is the chosen delimiter string. The string <tt><delim></tt> | |
must not appear on a line by itself within <tt><raw></tt>, as otherwise | |
fast-import will think the data ends earlier than it really does. The <tt>LF</tt> | |
immediately trailing <tt><raw></tt> is part of <tt><raw></tt>. This is one of | |
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an LF as its last byte.</p>
<p>The <tt>LF</tt> after <tt><delim> LF</tt> is optional (it used to be required).</p> | |
</dd> | |
</dl> | |
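<p>The two formats can be compared with a short sketch. The function
names and the default delimiter <tt>EOT</tt> are hypothetical. Note that
the exact byte count is taken from the encoded bytes, not from the number
of characters, which matters for non-ASCII content.</p>

```python
def data_exact(raw):
    """Exact byte count format; `raw` must already be bytes."""
    return b"data %d\n" % len(raw) + raw + b"\n"


def delimiter_is_safe(raw, delim):
    """True if `raw` ends with LF and no line of it equals `delim`."""
    return raw.endswith(b"\n") and delim not in raw.split(b"\n")


def data_delimited(raw, delim=b"EOT"):
    """Delimited format; refuses content the format cannot represent."""
    if not delimiter_is_safe(raw, delim):
        raise ValueError("content cannot be sent in delimited format")
    return b"data <<" + delim + b"\n" + raw + delim + b"\n"
```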
<h3><tt>checkpoint</tt></h3> | |
<p>Forces fast-import to close the current packfile, start a new one, and to | |
save out all current branch refs, tags and marks.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'checkpoint' LF | |
LF?</tt></pre> | |
</div></div> | |
<p>Note that fast-import automatically switches packfiles when the current | |
packfile reaches --max-pack-size, or 4 GiB, whichever limit is | |
smaller. During an automatic packfile switch fast-import does not update | |
the branch refs, tags or marks.</p> | |
<p>As a <tt>checkpoint</tt> can require a significant amount of CPU time and | |
disk IO (to compute the overall pack SHA-1 checksum, generate the | |
corresponding index file, and update the refs) it can easily take | |
several minutes for a single <tt>checkpoint</tt> command to complete.</p> | |
<p>Frontends may choose to issue checkpoints during extremely large | |
and long running imports, or when they need to allow another Git | |
process access to a branch. However given that a 30 GiB Subversion | |
repository can be loaded into Git through fast-import in about 3 hours, | |
explicit checkpointing may not be necessary.</p> | |
<p>The <tt>LF</tt> after the command is optional (it used to be required).</p> | |
<h3><tt>progress</tt></h3> | |
<p>Causes fast-import to print the entire <tt>progress</tt> line unmodified to | |
its standard output channel (file descriptor 1) when the command is | |
processed from the input stream. The command otherwise has no impact | |
on the current import, or on any of fast-import's internal state.</p> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt> 'progress' SP <any> LF | |
LF?</tt></pre> | |
</div></div> | |
<p>The <tt><any></tt> part of the command may contain any sequence of bytes | |
that does not contain <tt>LF</tt>. The <tt>LF</tt> after the command is optional. | |
Callers may wish to process the output through a tool such as sed to | |
remove the leading part of the line, for example:</p> | |
<div class="exampleblock"> | |
<div class="exampleblock-content"> | |
<div class="literalblock"> | |
<div class="content"> | |
<pre><tt>frontend | git-fast-import | sed 's/^progress //'</tt></pre> | |
</div></div> | |
</div></div> | |
<p>Placing a <tt>progress</tt> command immediately after a <tt>checkpoint</tt> will | |
inform the reader when the <tt>checkpoint</tt> has been completed and it | |
can safely access the refs that fast-import updated.</p> | |
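<p>Both halves of this pattern can be sketched in Python. The function
names are hypothetical: the first builds the commands a frontend would
write, and the second mirrors the <tt>sed</tt> expression above for a
reader of fast-import's standard output.</p>

```python
def checkpoint_with_progress(label):
    """Return a checkpoint command followed by a progress command."""
    if "\n" in label:
        raise ValueError("progress text must not contain LF")
    return "checkpoint\nprogress " + label + "\n"


def strip_progress(line):
    """Remove a leading 'progress ' from one line of output, if present."""
    prefix = "progress "
    return line[len(prefix):] if line.startswith(prefix) else line
```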
</div> | |
<h2>Tips and Tricks</h2> | |
<div class="sectionbody"> | |
<p>The following tips and tricks have been collected from various | |
users of fast-import, and are offered here as suggestions.</p> | |
<h3>Use One Mark Per Commit</h3> | |
<p>When doing a repository conversion, use a unique mark per commit | |
(<tt>mark :<n></tt>) and supply the --export-marks option on the command | |
line. fast-import will dump a file which lists every mark and the Git | |
object SHA-1 that corresponds to it. If the frontend can tie | |
the marks back to the source repository, it is easy to verify the | |
accuracy and completeness of the import by comparing each Git | |
commit to the corresponding source revision.</p> | |
<p>Coming from a system such as Perforce or Subversion this should be | |
quite simple, as the fast-import mark can also be the Perforce changeset | |
number or the Subversion revision number.</p> | |
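<p>A verification script might load the exported marks as sketched
below. This assumes each line of the marks file has the form of a colon,
the mark number, a space and the 40 character SHA-1; the function name
is hypothetical.</p>

```python
def parse_marks(text):
    """Map each mark number to its SHA-1 from the text of a marks file."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        mark, sha1 = line.split()
        marks[int(mark.lstrip(":"))] = sha1
    return marks
```

<p>If the mark doubles as the Subversion revision number, the resulting
dictionary maps each source revision directly to its Git commit.</p>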
<h3>Freely Skip Around Branches</h3> | |
<p>Don't bother trying to optimize the frontend to stick to one branch | |
at a time during an import. Although doing so might be slightly | |
faster for fast-import, it tends to increase the complexity of the frontend | |
code considerably.</p> | |
<p>The branch LRU built into fast-import tends to behave very well, and the
cost of activating an inactive branch is so low that bouncing around | |
between branches has virtually no impact on import performance.</p> | |
<h3>Handling Renames</h3> | |
<p>When importing a renamed file or directory, simply delete the old | |
name(s) and modify the new name(s) during the corresponding commit. | |
Git performs rename detection after-the-fact, rather than explicitly | |
during a commit.</p> | |
<h3>Use Tag Fixup Branches</h3> | |
<p>Some other SCM systems let the user create a tag from multiple
files which are not from the same commit/changeset, or create
tags which are a subset of the files available in the repository.</p>
<p>Importing these tags as-is in Git is impossible without making at | |
least one commit which “fixes up” the files to match the content | |
of the tag. Use fast-import's <tt>reset</tt> command to reset a dummy branch | |
outside of your normal branch space to the base commit for the tag, | |
then commit one or more file fixup commits, and finally tag the | |
dummy branch.</p> | |
<p>For example, since all normal branches are stored under <tt>refs/heads/</tt>,
name the tag fixup branch <tt>TAG_FIXUP</tt>. This way it is impossible for
the fixup branch used by the importer to have namespace conflicts | |
with real branches imported from the source (the name <tt>TAG_FIXUP</tt> | |
is not <tt>refs/heads/TAG_FIXUP</tt>).</p> | |
<p>When committing fixups, consider using <tt>merge</tt> to connect the | |
commit(s) which are supplying file revisions to the fixup branch. | |
Doing so will allow tools such as <a href="git-blame.html">git-blame(1)</a> to track | |
through the real commit history and properly annotate the source | |
files.</p> | |
<p>After fast-import terminates, the frontend will need to run <tt>rm .git/TAG_FIXUP</tt>
to remove the dummy branch.</p>
<h3>Import Now, Repack Later</h3> | |
<p>As soon as fast-import completes, the Git repository is completely valid
and ready for use. Typically this takes only a very short time, | |
even for considerably large projects (100,000+ commits).</p> | |
<p>However, repacking the repository is necessary to improve data
locality and access performance. It can also take hours on extremely
large projects (especially if -f and a large --window parameter are
used). Since repacking is safe to run alongside readers and writers, | |
run the repack in the background and let it finish when it finishes. | |
There is no reason to wait to explore your new Git project!</p> | |
<p>If you choose to wait for the repack, don't try to run benchmarks | |
or performance tests until repacking is completed. fast-import outputs | |
suboptimal packfiles that are simply never seen in real use | |
situations.</p> | |
<h3>Repacking Historical Data</h3> | |
<p>If you are repacking very old imported data (e.g. older than the | |
last year), consider expending some extra CPU time and supplying | |
--window=50 (or higher) when you run <a href="git-repack.html">git-repack(1)</a>. | |
This will take longer, but will also produce a smaller packfile. | |
You only need to expend the effort once, and everyone using your | |
project will benefit from the smaller repository.</p> | |
<h3>Include Some Progress Messages</h3> | |
<p>Every once in a while have your frontend emit a <tt>progress</tt> message | |
to fast-import. The contents of the messages are entirely free-form, | |
so one suggestion would be to output the current month and year | |
each time the current commit date moves into the next month. | |
Your users will feel better knowing how much of the data stream | |
has been processed.</p> | |
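<p>The month-based suggestion could look like the sketch below. It
assumes commit dates are available as seconds since the epoch and are
interpreted in UTC; the generator name and the message layout are
hypothetical, since the message content is free-form.</p>

```python
import time


def monthly_progress(commit_times):
    """Yield a progress command each time the commit date enters a new month."""
    last = None
    for seconds in commit_times:
        t = time.gmtime(seconds)
        month = (t.tm_year, t.tm_mon)
        if month != last:
            last = month
            yield "progress %04d-%02d\n" % month
```

<p>A real frontend would interleave these lines with its <tt>commit</tt>
commands rather than generate them separately.</p>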
</div> | |
<h2>Packfile Optimization</h2> | |
<div class="sectionbody"> | |
<p>When packing a blob fast-import always attempts to deltify against the last | |
blob written. Unless specifically arranged for by the frontend, | |
this will probably not be a prior version of the same file, so the | |
generated delta will not be the smallest possible. The resulting | |
packfile will be compressed, but will not be optimal.</p> | |
<p>Frontends which have efficient access to all revisions of a | |
single file (for example reading an RCS/CVS ,v file) can choose | |
to supply all revisions of that file as a sequence of consecutive | |
<tt>blob</tt> commands. This allows fast-import to deltify the different file | |
revisions against each other, saving space in the final packfile. | |
Marks can be used to later identify individual file revisions during | |
a sequence of <tt>commit</tt> commands.</p> | |
<p>The packfile(s) created by fast-import do not encourage good disk access | |
patterns. This is caused by fast-import writing the data in the order | |
it is received on standard input, while Git typically organizes | |
data within packfiles to make the most recent (current tip) data | |
appear before historical data. Git also clusters commits together, | |
speeding up revision traversal through better cache locality.</p> | |
<p>For this reason it is strongly recommended that users repack the | |
repository with <tt>git repack -a -d</tt> after fast-import completes, allowing | |
Git to reorganize the packfiles for faster data access. If blob | |
deltas are suboptimal (see above) then also adding the <tt>-f</tt> option | |
to force recomputation of all deltas can significantly reduce the | |
final packfile size (30-50% smaller can be quite typical).</p> | |
</div> | |
<h2>Memory Utilization</h2> | |
<div class="sectionbody"> | |
<p>There are a number of factors which affect how much memory fast-import | |
requires to perform an import. Like critical sections of core | |
Git, fast-import uses its own memory allocators to amortize any overheads | |
associated with malloc. In practice fast-import tends to amortize any | |
malloc overheads to 0, due to its use of large block allocations.</p> | |
<h3>per object</h3> | |
<p>fast-import maintains an in-memory structure for every object written in | |
this execution. On a 32 bit system the structure is 32 bytes, | |
on a 64 bit system the structure is 40 bytes (due to the larger | |
pointer sizes). Objects in the table are not deallocated until | |
fast-import terminates. Importing 2 million objects on a 32 bit system | |
will require approximately 64 MiB of memory.</p> | |
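<p>The estimate is simple multiplication, sketched here with hypothetical
function names and the per-object sizes quoted above. Note that 2 million
objects at 32 bytes each come to 64,000,000 bytes, which is closer to
61 MiB when a MiB is taken as 1,048,576 bytes.</p>

```python
def object_table_bytes(num_objects, pointer_bits=32):
    """Estimate object table size using the per-object sizes quoted above."""
    per_object = {32: 32, 64: 40}[pointer_bits]
    return num_objects * per_object


def to_mib(num_bytes):
    return num_bytes / (1024 * 1024)
```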
<p>The object table is actually a hashtable keyed on the object name | |
(the unique SHA-1). This storage configuration allows fast-import to reuse | |
an existing or already written object and avoid writing duplicates | |
to the output packfile. Duplicate blobs are surprisingly common | |
in an import, typically due to branch merges in the source.</p> | |
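<p>The effect of keying the table on the object name can be illustrated
with a toy version. This is not fast-import's implementation: the class
is hypothetical and only counts writes, and the name is computed the way
Git names a blob (SHA-1 over a <tt>blob</tt> header, the length, a NUL
byte and the content).</p>

```python
import hashlib


def git_blob_sha1(content):
    """Return the Git object name of a blob with the given bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectTable:
    """Toy table that skips content whose name has been seen before."""

    def __init__(self):
        self._seen = set()
        self.written = 0

    def store(self, content):
        name = git_blob_sha1(content)
        if name not in self._seen:
            self._seen.add(name)
            self.written += 1  # a real importer would write to the pack here
        return name
```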
<h3>per mark</h3> | |
<p>Marks are stored in a sparse array, using 1 pointer (4 bytes or 8 | |
bytes, depending on pointer size) per mark. Although the array | |
is sparse, frontends are still strongly encouraged to use marks | |
between 1 and n, where n is the total number of marks required for | |
this import.</p> | |
<h3>per branch</h3> | |
<p>Branches are classified as active and inactive. The memory usage | |
of the two classes is significantly different.</p> | |
<p>Inactive branches are stored in a structure which uses 96 or 120 | |
bytes (32 bit or 64 bit systems, respectively), plus the length of | |
the branch name (typically under 200 bytes), per branch. fast-import will | |
easily handle as many as 10,000 inactive branches in under 2 MiB | |
of memory.</p> | |
<p>Active branches have the same overhead as inactive branches, but | |
also contain copies of every tree that has been recently modified on | |
that branch. If subtree <tt>include</tt> has not been modified since the | |
branch became active, its contents will not be loaded into memory, | |
but if subtree <tt>src</tt> has been modified by a commit since the branch | |
became active, then its contents will be loaded in memory.</p> | |
<p>As active branches store metadata about the files contained on that | |
branch, their in-memory storage size can grow to a considerable size | |
(see below).</p> | |
<p>fast-import automatically moves active branches to inactive status based on | |
a simple least-recently-used algorithm. The LRU chain is updated on | |
each <tt>commit</tt> command. The maximum number of active branches can be | |
increased or decreased on the command line with --active-branches=.</p> | |
<h3>per active tree</h3> | |
<p>Trees (aka directories) use just 12 bytes of memory on top of the | |
memory required for their entries (see “per active file” below). | |
The cost of a tree is virtually 0, as its overhead amortizes out | |
over the individual file entries.</p> | |
<h3>per active file entry</h3> | |
<p>Files (and pointers to subtrees) within active trees require 52 or 64 | |
bytes (32/64 bit platforms) per entry. To conserve space, file and | |
tree names are pooled in a common string table, allowing the filename | |
“Makefile” to use just 16 bytes (after including the string header | |
overhead) no matter how many times it occurs within the project.</p> | |
<p>The active branch LRU, when coupled with the filename string pool | |
and lazy loading of subtrees, allows fast-import to efficiently import | |
projects with 2,000+ branches and 45,114+ files in a very limited | |
memory footprint (less than 2.7 MiB per active branch).</p> | |
</div> | |
<h2>Author</h2> | |
<div class="sectionbody"> | |
<p>Written by Shawn O. Pearce <spearce@spearce.org>.</p> | |
</div> | |
<h2>Documentation</h2> | |
<div class="sectionbody"> | |
<p>Documentation by Shawn O. Pearce <spearce@spearce.org>.</p> | |
</div> | |
<h2>GIT</h2> | |
<div class="sectionbody"> | |
<p>Part of the <a href="git.html">git(7)</a> suite</p> | |
</div> | |
<div id="footer"> | |
<div id="footer-text"> | |
Last updated 25-Aug-2007 03:53:08 UTC | |
</div> | |
</div> | |
</body> | |
</html> |