At the core level, git is character encoding agnostic.

 - The pathnames recorded in the index and in the tree objects
   are treated as uninterpreted sequences of non-NUL bytes.
   Whatever readdir(2) returns is recorded and compared
   with the data git keeps track of, which in turn is expected
   to be what lstat(2) and creat(2) accept.  There is no such
   thing as pathname encoding translation.

 - The contents of the blob objects are uninterpreted sequences
   of bytes.  There is no encoding translation at the core
   level.

 - The commit log messages are uninterpreted sequences of non-NUL
   bytes.

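The byte-wise treatment of pathnames described above can be illustrated
with a short Python sketch.  It is not git's implementation: the file
name and the helper `same_path` are hypothetical, and the only
assumption carried over from the text is that two paths are the same
exactly when their byte sequences are equal.

```python
import unicodedata

# One human-readable name, three different byte sequences.
name = "caf\u00e9.txt"
nfc = unicodedata.normalize("NFC", name).encode("utf-8")   # precomposed e-acute
nfd = unicodedata.normalize("NFD", name).encode("utf-8")   # 'e' + combining accent
latin1 = name.encode("iso-8859-1")                         # legacy single-byte form


def same_path(a: bytes, b: bytes) -> bool:
    """Hypothetical comparison: plain byte equality, no encoding translation."""
    return a == b


for label, raw in (("NFC", nfc), ("NFD", nfd), ("ISO-8859-1", latin1)):
    print(label, raw)
print(same_path(nfc, nfd), same_path(nfc, latin1))
```

Under this model the three byte strings name three different paths,
even though a reader may consider them the same file name.
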
Although we encourage encoding commit log messages
in UTF-8, both the core and git Porcelain are designed not to
force UTF-8 on projects.  If all participants of a particular
project find it more convenient to use legacy encodings, git
does not forbid it.  However, there are a few things to keep in
mind.

. 'git-commit' and 'git-commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding.  The way to say this is to
  set `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header.  This is to
help other people who look at them later.  Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git-log', 'git-show', 'git-blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified.  You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.

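How the `encoding` header and the output encoding fit together might be
sketched as follows.  This is a minimal Python sketch, assuming the raw
content of a commit object is available as bytes; `split_commit` and
`recode_message` are hypothetical helpers, not part of git, and the
sample commit is made up.  The real commands deal with more cases, such
as conversion failures.

```python
def split_commit(raw: bytes):
    """Split raw commit content into a header dict and the log message."""
    header_block, _, message = raw.partition(b"\n\n")
    headers = {}
    for line in header_block.split(b"\n"):
        key, _, value = line.partition(b" ")
        # Keep the first value per key; continuation lines (which start
        # with a space) end up under the empty key and are ignored here.
        headers.setdefault(key, value)
    return headers, message


def recode_message(raw: bytes, output_encoding: str = "utf-8") -> bytes:
    """Re-code the log message from its recorded encoding to output_encoding."""
    headers, message = split_commit(raw)
    # A missing `encoding` header implies the message is UTF-8.
    source = headers.get(b"encoding", b"utf-8").decode("ascii")
    return message.decode(source).encode(output_encoding)


# Made-up commit whose message was written in ISO-8859-1.
legacy = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1167527954 +0000\n"
    b"committer A U Thor <author@example.com> 1167527954 +0000\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"caf\xe9\n"
)

print(recode_message(legacy))                # message re-coded to UTF-8
print(recode_message(legacy, "iso-8859-1"))  # message left in ISO-8859-1
```

The stored bytes are never changed in this sketch; only the copy handed
to the reader is re-coded.
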
Note that we deliberately chose not to force UTF-8 at the commit
object level by re-coding the commit log message when a commit is
made, because re-coding to UTF-8 is not necessarily a
reversible operation.
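
One way such a round trip can fail is sketched below in Python.  The
functions `force_utf8` and `restore` are hypothetical and do not
describe anything git does; the sketch assumes a commit-time converter
that substitutes a replacement character for bytes it cannot decode.
Other failure modes, such as legacy encodings that allow several byte
sequences for the same character, are not shown.

```python
def force_utf8(message: bytes, assumed_encoding: str) -> bytes:
    """Hypothetical commit-time re-coding; undecodable bytes become U+FFFD."""
    return message.decode(assumed_encoding, errors="replace").encode("utf-8")


def restore(message_utf8: bytes, original_encoding: str) -> bytes:
    """Attempt to get the original bytes back from the UTF-8 form."""
    return message_utf8.decode("utf-8").encode(original_encoding, errors="replace")


original = b"caf\xe9"  # "cafe" with e-acute, written in ISO-8859-1

# Source encoding known correctly: the round trip succeeds.
ok = restore(force_utf8(original, "iso-8859-1"), "iso-8859-1")

# Source encoding assumed wrongly: the byte 0xe9 is gone for good.
forced = force_utf8(original, "utf-8")
lost = restore(forced, "iso-8859-1")

print(ok == original, lost == original)
```

Storing the message bytes untouched, together with the `encoding`
header, avoids this loss: a wrong guess can still be corrected later.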