At the core level, git is character encoding agnostic.

 - The pathnames recorded in the index and in the tree objects
   are treated as uninterpreted sequences of non-NUL bytes.
   What readdir(2) returns is what is recorded and compared
   with the data git keeps track of, which in turn is expected
   to be what lstat(2) and creat(2) accept.  There is no such
   thing as pathname encoding translation.

 - The contents of the blob objects are uninterpreted sequences
   of bytes.  There is no encoding translation at the core
   level.

 - The commit log messages are uninterpreted sequences of non-NUL
   bytes.
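
To illustrate what "uninterpreted bytes" means for pathnames, here
is a minimal Python sketch.  It is not part of git, and the file
name is hypothetical.  The same visible name encoded in two ways
gives two different byte sequences, so a byte-for-byte comparison
treats them as two different paths.

```python
# Illustration only: pathnames are compared byte for byte.
name = "r\u00e9sum\u00e9.txt"  # hypothetical file name "résumé.txt"

utf8_path = name.encode("utf-8")
latin1_path = name.encode("iso-8859-1")

# Same characters, different bytes: these are distinct pathnames.
paths_differ = utf8_path != latin1_path

print(utf8_path)     # b'r\xc3\xa9sum\xc3\xa9.txt'
print(latin1_path)   # b'r\xe9sum\xe9.txt'
print(paths_differ)  # True
```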

Although we encourage encoding commit log messages in UTF-8,
both the core and git Porcelain are designed not to force UTF-8
on projects.  If all participants of a particular project find
it more convenient to use legacy encodings, git does not forbid
it.  However, there are a few things to keep in mind.

. 'git-commit' and 'git-commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding.  The way to say this is to
  have `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header.  This is to
help other people who look at them later.  Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git-log', 'git-show', 'git-blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified.  You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.
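
The two behaviours described in the list above can be approximated
in a few lines of Python.  This is a sketch under stated
assumptions, not git's implementation.  The helper names
`looks_like_utf8` and `recode_for_output` are hypothetical, and
git's own validity check and conversion may differ in detail.

```python
from typing import Optional


def looks_like_utf8(message: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recode_for_output(message: bytes,
                      commit_encoding: Optional[str],
                      output_encoding: str = "utf-8") -> bytes:
    """Re-code a log message for display (hypothetical helper).

    commit_encoding stands for the commit's `encoding` header;
    None models a missing header, which implies UTF-8.
    """
    source = commit_encoding or "utf-8"
    return message.decode(source).encode(output_encoding)


# A message written in a legacy encoding: "Fix naïve parser".
latin1_message = "Fix na\u00efve parser".encode("iso-8859-1")

print(looks_like_utf8(latin1_message))                  # False
print(recode_for_output(latin1_message, "ISO-8859-1"))  # b'Fix na\xc3\xafve parser'
```

The stored bytes are never modified here.  The conversion happens
only on the way to the display, which matches the division of
labour described above.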

Note that we deliberately chose not to re-code the commit log
message when a commit is made to force UTF-8 at the commit
object level, because re-coding to UTF-8 is not necessarily a
reversible operation.
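
One way such a round trip can fail to be reversible is shown in the
Python sketch below.  It assumes Python's `iso2022_jp` codec, and
the sample bytes are made up for illustration.  git relies on its
own conversion routines, which may behave differently.  In
ISO-2022-JP the same text can be written with different escape
sequences, so converting to UTF-8 and back need not reproduce the
original bytes.

```python
# Two kana written with a redundant switch to ASCII and back between them.
original = b'\x1b$B$"\x1b(B\x1b$B$$\x1b(B'

# Re-code to UTF-8, as forcing UTF-8 at commit time would do.
text = original.decode("iso2022_jp")
as_utf8 = text.encode("utf-8")

# Convert back to the legacy encoding.
round_trip = as_utf8.decode("utf-8").encode("iso2022_jp")

print(round_trip == original)                     # False
print(round_trip.decode("iso2022_jp") == text)    # True
```

The characters survive, but the exact byte sequence the author
recorded does not.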