At the core level, git is character encoding agnostic.

 - The pathnames recorded in the index and in the tree objects
   are treated as uninterpreted sequences of non-NUL bytes.
   What readdir(2) returns is what is recorded and compared
   with the data git keeps track of, which in turn is expected
   to be what lstat(2) and creat(2) accept.  There is no such
   thing as pathname encoding translation.

 - The contents of the blob objects are uninterpreted sequences
   of bytes.  There is no encoding translation at the core
   level.

 - The commit log messages are uninterpreted sequences of non-NUL
   bytes.
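
To illustrate what "uninterpreted bytes" means for pathnames, here is a minimal sketch in Python (not part of git). The file name and the helper `same_pathname` are made up for illustration; the point is only that one human-readable name can have several byte spellings, and a byte-wise comparison treats each spelling as a different pathname.

```python
import unicodedata

# Hypothetical file name, chosen because it contains non-ASCII characters.
name = "r\u00e9sum\u00e9.txt"

# Three byte spellings of the "same" name, as different systems might produce them.
candidates = {
    "UTF-8 (NFC)": name.encode("utf-8"),
    "UTF-8 (NFD)": unicodedata.normalize("NFD", name).encode("utf-8"),
    "ISO-8859-1": name.encode("iso-8859-1"),
}


def same_pathname(a: bytes, b: bytes) -> bool:
    """Compare two pathnames byte for byte, with no encoding translation."""
    return a == b


if __name__ == "__main__":
    for label, raw in candidates.items():
        print(f"{label:12} {raw!r}")
```

Because no translation takes place, all participants in a project need to agree on how pathnames are spelled at the byte level.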

Although we encourage encoding commit log messages in UTF-8,
both the core and git Porcelain are designed not to
force UTF-8 on projects.  If all participants of a particular
project find it more convenient to use legacy encodings, git
does not forbid it.  However, there are a few things to keep in
mind.

. 'git-commit' and 'git-commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding.  The way to say this is to
  set `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header.  This is to
help other people who look at them later.  Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git-log', 'git-show', 'git-blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified.  You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.
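
The two items above can be sketched in Python as follows. This is not git's implementation, only an illustration of the rules as described: the function names, the dictionary standing in for `.git/config`, and the sample commit are all assumptions made for the example. It also uses Python's codec names, which may differ from the names git accepts.

```python
def looks_like_utf8(message: bytes) -> bool:
    """Rough stand-in for the check behind the commit-time warning."""
    try:
        message.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def split_commit(raw: bytes):
    """Split the body of a raw commit object into (headers, message)."""
    head, _, message = raw.partition(b"\n\n")
    headers = {}
    for line in head.split(b"\n"):
        if line.startswith(b" "):
            continue  # continuation line of a multi-line header
        key, _, value = line.partition(b" ")
        headers.setdefault(key.decode("ascii"), value)
    return headers, message


def output_encoding(config: dict) -> str:
    """i18n.logoutputencoding, else i18n.commitencoding, else UTF-8."""
    return (config.get("i18n.logoutputencoding")
            or config.get("i18n.commitencoding")
            or "UTF-8")


def recode_message(raw: bytes, config: dict) -> bytes:
    """Re-code a commit's log message for display."""
    headers, message = split_commit(raw)
    # A missing `encoding` header implies the message is UTF-8.
    source = headers.get("encoding", b"UTF-8").decode("ascii")
    return message.decode(source).encode(output_encoding(config))


# Made-up commit whose message is stored as ISO-8859-1.
EXAMPLE_COMMIT = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1167527954 +0000\n"
    b"committer A U Thor <author@example.com> 1167527954 +0000\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"caf\xe9\n"
)

if __name__ == "__main__":
    print(recode_message(EXAMPLE_COMMIT, {}))
```

With an empty configuration the message is displayed as UTF-8; passing `{"i18n.commitencoding": "ISO-8859-1"}` would leave the stored bytes unchanged on output.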

Note that we deliberately chose not to re-code the commit log
message when a commit is made to force UTF-8 at the commit
object level, because re-coding to UTF-8 is not necessarily a
reversible operation.
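
As a small illustration of why re-coding may not be reversible, the sketch below uses Python's codecs (an assumption; git itself does not use them) to convert a message to UTF-8 and back. The helper name and both sample messages are made up. In a stateful encoding such as ISO-2022-JP, more than one byte sequence can represent the same text, so the original bytes cannot always be recovered.

```python
def survives_utf8_round_trip(message: bytes, encoding: str) -> bool:
    """Convert to UTF-8 and back; report whether the original bytes return."""
    as_utf8 = message.decode(encoding).encode("utf-8")
    restored = as_utf8.decode("utf-8").encode(encoding)
    return restored == message


# ISO-8859-1 maps each byte to exactly one character, so it round-trips.
latin1_message = b"caf\xe9"

# ISO-2022-JP text that begins with a redundant "switch to ASCII" escape.
# The escape carries no text, so it is not reproduced when encoding back.
iso2022_message = b"\x1b(Babc"

if __name__ == "__main__":
    print(survives_utf8_round_trip(latin1_message, "iso-8859-1"))
    print(survives_utf8_round_trip(iso2022_message, "iso2022_jp"))
```

Storing the message as given and recording its encoding in the header keeps the original bytes intact, and conversion is left to the tools that display it.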