Git is to some extent character encoding agnostic.

 - The contents of the blob objects are uninterpreted sequences
   of bytes. There is no encoding translation at the core
   level.

 - Path names are encoded in UTF-8 normalization form C. This
   applies to tree objects, the index file, ref names, as well as
   path names in command line arguments, environment variables
   and config files (`.git/config` (see linkgit:git-config[1]),
   linkgit:gitignore[5], linkgit:gitattributes[5] and
   linkgit:gitmodules[5]).
+
Note that Git at the core level treats path names simply as
sequences of non-NUL bytes; there are no path name encoding
conversions (except on Mac and Windows). Therefore, using
non-ASCII path names will mostly work even on platforms and file
systems that use legacy extended ASCII encodings. However,
repositories created on such systems will not work properly on
UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
Additionally, many Git-based tools simply assume path names to
be UTF-8 and will fail to display other encodings correctly.

 - Commit log messages are typically encoded in UTF-8, but other
   extended ASCII encodings are also supported. This includes
   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
   EUC-x, CP9xx etc.).

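
To see that blob contents really are just bytes, here is a minimal Python sketch that computes a blob's object name the way Git hashes it: SHA-1 over a `blob <size>` header, a NUL byte, and the raw content. The helper name `blob_id` is made up for illustration, and the sketch assumes a repository using the default SHA-1 object format (SHA-256 repositories hash differently). Because nothing is re-coded, the same text saved as UTF-8 and as ISO-8859-1 produces two different blobs.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Object name Git gives a blob with this content (SHA-1 object format).

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    content exactly as given; no character encoding is applied.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


text = "caf\u00e9\n"
print(blob_id(text.encode("utf-8")) == blob_id(text.encode("iso-8859-1")))  # False
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The empty-blob name is the same one `git hash-object` reports for an empty file.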
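
The path name rule can be illustrated with Python's `unicodedata` module. The sketch below (helper names are hypothetical) normalizes a name to NFC before encoding it as UTF-8, so a decomposed `e` plus combining acute accent, the form HFS+ uses on macOS, and a precomposed `é` end up as the same bytes. On macOS, Git's `core.precomposeUnicode` setting performs this kind of conversion; the sketch only illustrates the idea and is not Git's implementation. The last line shows why a name written by a Windows-1252 system cannot be read back as UTF-8.

```python
import unicodedata


def to_git_path(name: str) -> bytes:
    """Encode a path name as UTF-8 in normalization form C."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def is_utf8(raw: bytes) -> bool:
    """Report whether raw path bytes decode as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


decomposed = "e\u0301.txt"   # 'e' + COMBINING ACUTE ACCENT (NFD)
precomposed = "\u00e9.txt"   # single code point U+00E9 (NFC)
print(decomposed.encode("utf-8"))   # b'e\xcc\x81.txt'
print(to_git_path(decomposed))      # b'\xc3\xa9.txt'
print(to_git_path(precomposed))     # b'\xc3\xa9.txt'

# A name written by a legacy Windows-1252 system is not valid UTF-8.
print(is_utf8("caf\u00e9.txt".encode("cp1252")))  # False
```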
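
One way to see what the unsupported commit message encodings have in common is a small heuristic. The hypothetical `ascii_safe` helper below checks that ASCII characters keep their ASCII bytes and that no other character produces a byte below 0x80. UTF-16 fails because of its NUL bytes, EBCDIC because `A` is not 0x41, and Shift-JIS because the second byte of some characters (such as 表, 0x95 0x5C) collides with ASCII `\`. This is an illustration, not a check Git performs, and encoding one character at a time is only an approximation for stateful encodings.

```python
def ascii_safe(text: str, encoding: str) -> bool:
    """Heuristic: do bytes below 0x80 in the encoded text always mean ASCII?

    Each ASCII character must encode to its ASCII byte, and every other
    character must encode only to bytes >= 0x80. Characters are encoded one
    at a time, which is only an approximation for stateful encodings.
    """
    for ch in text:
        encoded = ch.encode(encoding)
        if ch.isascii():
            if encoded != ch.encode("ascii"):
                return False
        elif any(byte < 0x80 for byte in encoded):
            return False
    return True


print(ascii_safe("caf\u00e9", "iso-8859-1"))  # True
print(ascii_safe("\u20ac", "cp1252"))         # True  (euro sign is 0x80)
print(ascii_safe("A", "utf-16-le"))           # False (b'A\x00')
print(ascii_safe("A", "cp037"))               # False (EBCDIC 'A' is 0xC1)
print(ascii_safe("\u8868", "shift_jis"))      # False (b'\x95\\')
```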
Although we encourage encoding commit log messages in UTF-8,
both the core and Git Porcelain are designed not to
force UTF-8 on projects. If all participants of a particular
project find it more convenient to use legacy encodings, Git
does not forbid it. However, there are a few things to keep in
mind.

. 'git commit' and 'git commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding. The way to say this is to
  set `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header. This is to
help other people who look at them later. Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git log', 'git show', 'git blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified. You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
	logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.

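
To make the `encoding` header concrete, the following sketch parses a raw commit object, in the form printed by `git cat-file commit <rev>`, and decodes its message. The helper names and the sample commit are made up for illustration. Real commits can carry more headers (e.g. `parent`, `gpgsig`), which the loop simply skips, and a missing header falls back to UTF-8 as described above.

```python
def commit_encoding(raw_commit: bytes) -> str:
    """Return the `encoding` header of a raw commit, or UTF-8 if absent."""
    headers, _, _message = raw_commit.partition(b"\n\n")
    for line in headers.split(b"\n"):
        # Continuation lines of multi-line headers start with a space,
        # so they never match this prefix.
        if line.startswith(b"encoding "):
            return line[len(b"encoding "):].decode("ascii")
    return "UTF-8"


def message_text(raw_commit: bytes) -> str:
    """Decode the log message using the commit's declared encoding."""
    _headers, _, message = raw_commit.partition(b"\n\n")
    return message.decode(commit_encoding(raw_commit))


# A made-up commit recorded with i18n.commitencoding = ISO-8859-1.
raw = (b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
       b"author A U Thor <author@example.com> 1112911993 -0700\n"
       b"committer A U Thor <author@example.com> 1112911993 -0700\n"
       b"encoding ISO-8859-1\n"
       b"\n"
       b"caf\xe9\n")
print(commit_encoding(raw))    # ISO-8859-1
print(repr(message_text(raw)))  # 'café\n'
```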
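
The output encoding used by 'git log' and friends thus follows a fallback chain: `i18n.logoutputencoding` if set, else `i18n.commitencoding`, else UTF-8. The Python sketch below models that chain and the re-coding step. The function names are hypothetical and this is a simplified model, not Git's code; returning the original bytes when conversion fails is a choice made for the sketch.

```python
import codecs
from typing import Optional


def log_output_encoding(logoutputencoding: Optional[str],
                        commitencoding: Optional[str]) -> str:
    """Return i18n.logoutputencoding, else i18n.commitencoding, else UTF-8."""
    return logoutputencoding or commitencoding or "UTF-8"


def recode_message(message: bytes, header_encoding: Optional[str],
                   output_encoding: str) -> bytes:
    """Re-code a log message from its `encoding` header into output_encoding.

    A missing header means the message is UTF-8. If conversion fails, the
    original bytes are returned unchanged (a choice made for this sketch).
    """
    source = header_encoding or "UTF-8"
    if codecs.lookup(source).name == codecs.lookup(output_encoding).name:
        return message  # same encoding: nothing to do
    try:
        return message.decode(source).encode(output_encoding)
    except UnicodeError:
        return message


# Repository configured with only i18n.commitencoding = ISO-8859-1:
out = log_output_encoding(None, "ISO-8859-1")
print(out)  # ISO-8859-1
# A commit without an encoding header is UTF-8 and gets re-coded:
print(recode_message(b"caf\xc3\xa9\n", None, out))  # b'caf\xe9\n'
```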
Note that we deliberately chose not to re-code the commit log
message when a commit is made to force UTF-8 at the commit
object level, because re-coding to UTF-8 is not necessarily a
reversible operation.
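
One concrete way re-coding can lose information is a message containing bytes that are invalid in its declared encoding. In the sketch below (the helper name is hypothetical), a message is converted to UTF-8 and back using Python's `errors="replace"` handlers, which substitute placeholders instead of failing. ISO-8859-1 maps every byte, so it survives. Python's `cp1252` codec leaves byte 0x81 undefined, and Latin-1 bytes mislabeled as UTF-8 are not valid UTF-8, so both of those come back altered.

```python
def survives_round_trip(raw: bytes, encoding: str) -> bool:
    """Re-code `raw` from `encoding` to UTF-8 and back; True if unchanged.

    errors="replace" substitutes U+FFFD (when decoding) or "?" (when
    encoding) instead of raising, so lossy input yields different bytes.
    """
    as_utf8 = raw.decode(encoding, errors="replace").encode("utf-8")
    back = as_utf8.decode("utf-8").encode(encoding, errors="replace")
    return back == raw


print(survives_round_trip(b"caf\xe9", "iso-8859-1"))  # True
print(survives_round_trip(b"caf\x81", "cp1252"))      # False (0x81 undefined)
print(survives_round_trip(b"caf\xe9", "utf-8"))       # False (not valid UTF-8)
```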