Blame - technical/rerere.txt - external/gitster/git-htmldocs

blob: aa22d7ace8930fab0d54adc0e613055ea1aca0f4 [file] [log] [blame]

Junio C Hamano	980e61e	2018-09-17 22:45:52	[diff] [blame]	1	Rerere
				2	======
				3
				4	This document describes the rerere logic.
				5
				6	Conflict normalization
				7	----------------------
				8
				9	To ensure recorded conflict resolutions can be looked up in the rerere
				10	database, even when branches are merged in a different order,
				11	different branches are merged that result in the same conflict, or
				12	when different conflict style settings are used, rerere normalizes the
				13	conflicts before writing them to the rerere database.
				14
				15	Different conflict styles and branch names are normalized by stripping
				16	the labels from the conflict markers, and removing the common ancestor
				17	version from the `diff3` conflict style. Branches that are merged
				18	in different order are normalized by sorting the conflict hunks. More
				19	on each of those steps in the following sections.
				20
				21	Once these two normalization operations are applied, a conflict ID is
				22	calculated based on the normalized conflict, which is later used by
				23	rerere to look up the conflict in the rerere database.
				24
				25	Removing the common ancestor version
				26	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				27
				28	Say we have three branches AB, AC and AC2. The common ancestor of
				29	these branches has a file with a line containing the string "A" (for
				30	brevity this is called "line A" in the rest of the document). In
				31	branch AB this line is changed to "B", in AC, this line is changed to
				32	"C", and branch AC2 is forked off of AC, after the line was changed to
				33	"C".
				34
				35	Forking a branch ABAC off of branch AB and then merging AC into it, we
				36	get a conflict like the following:
				37
				38	<<<<<<< HEAD
				39	B
				40	=======
				41	C
				42	>>>>>>> AC
				43
				44	Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
				45	and then merging branch AC2 into it), using the diff3 conflict style,
				46	we get a conflict like the following:
				47
				48	<<<<<<< HEAD
				49	B
				50	\|\|\|\|\|\|\| merged common ancestors
				51	A
				52	=======
				53	C
				54	>>>>>>> AC2
				55
				56	By resolving this conflict, to leave line D, the user declares:
				57
				58	After examining what branches AB and AC did, I believe that making
				59	line A into line D is the best thing to do that is compatible with
				60	what AB and AC wanted to do.
				61
				62	As branch AC2 refers to the same commit as AC, the above implies that
				63	this is also compatible what AB and AC2 wanted to do.
				64
				65	By extension, this means that rerere should recognize that the above
				66	conflicts are the same. To do this, the labels on the conflict
				67	markers are stripped, and the common ancestor version is removed. The above
				68	examples would both result in the following normalized conflict:
				69
				70	<<<<<<<
				71	B
				72	=======
				73	C
				74	>>>>>>>
				75
				76	Sorting hunks
				77	~~~~~~~~~~~~~
				78
				79	As before, lets imagine that a common ancestor had a file with line A
				80	its early part, and line X in its late part. And then four branches
				81	are forked that do these things:
				82
				83	- AB: changes A to B
				84	- AC: changes A to C
				85	- XY: changes X to Y
				86	- XZ: changes X to Z
				87
				88	Now, forking a branch ABAC off of branch AB and then merging AC into
				89	it, and forking a branch ACAB off of branch AC and then merging AB
				90	into it, would yield the conflict in a different order. The former
				91	would say "A became B or C, what now?" while the latter would say "A
				92	became C or B, what now?"
				93
				94	As a reminder, the act of merging AC into ABAC and resolving the
				95	conflict to leave line D means that the user declares:
				96
				97	After examining what branches AB and AC did, I believe that
				98	making line A into line D is the best thing to do that is
				99	compatible with what AB and AC wanted to do.
				100
				101	So the conflict we would see when merging AB into ACAB should be
				102	resolved the same way---it is the resolution that is in line with that
				103	declaration.
				104
				105	Imagine that similarly previously a branch XYXZ was forked from XY,
				106	and XZ was merged into it, and resolved "X became Y or Z" into "X
				107	became W".
				108
				109	Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
				110	would have line B in its early part and line Y in its later part.
				111	Such a merge would be quite clean. We can construct 4 combinations
				112	using these four branches ((AB, AC) x (XY, XZ)).
				113
				114	Merging ABXY and ACXZ would make "an early A became B or C, a late X
				115	became Y or Z" conflict, while merging ACXY and ABXZ would make "an
				116	early A became C or B, a late X became Y or Z". We can see there are
				117	4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
				118
				119	By sorting, the conflict is given its canonical name, namely, "an
				120	early part became B or C, a late part becames X or Y", and whenever
				121	any of these four patterns appear, and we can get to the same conflict
				122	and resolution that we saw earlier.
				123
				124	Without the sorting, we'd have to somehow find a previous resolution
				125	from combinatorial explosion.
				126
				127	Conflict ID calculation
				128	~~~~~~~~~~~~~~~~~~~~~~~
				129
				130	Once the conflict normalization is done, the conflict ID is calculated
				131	as the sha1 hash of the conflict hunks appended to each other,
				132	separated by <NUL> characters. The conflict markers are stripped out
				133	before the sha1 is calculated. So in the example above, where we
				134	merge branch AC which changes line A to line C, into branch AB, which
				135	changes line A to line C, the conflict ID would be
				136	SHA1('B<NUL>C<NUL>').
				137
				138	If there are multiple conflicts in one file, the sha1 is calculated
				139	the same way with all hunks appended to each other, in the order in
				140	which they appear in the file, separated by a <NUL> character.
				141
				142	Nested conflicts
				143	~~~~~~~~~~~~~~~~
				144
				145	Nested conflicts are handled very similarly to "simple" conflicts.
				146	Similar to simple conflicts, the conflict is first normalized by
				147	stripping the labels from conflict markers, stripping the common ancestor
				148	version, and the sorting the conflict hunks, both for the outer and the
				149	inner conflict. This is done recursively, so any number of nested
				150	conflicts can be handled.
				151
				152	Note that this only works for conflict markers that "cleanly nest". If
				153	there are any unmatched conflict markers, rerere will fail to handle
				154	the conflict and record a conflict resolution.
				155
				156	The only difference is in how the conflict ID is calculated. For the
				157	inner conflict, the conflict markers themselves are not stripped out
				158	before calculating the sha1.
				159
				160	Say we have the following conflict for example:
				161
				162	<<<<<<< HEAD
				163	1
				164	=======
				165	<<<<<<< HEAD
				166	3
				167	=======
				168	2
				169	>>>>>>> branch-2
				170	>>>>>>> branch-3~
				171
				172	After stripping out the labels of the conflict markers, and sorting
				173	the hunks, the conflict would look as follows:
				174
				175	<<<<<<<
				176	1
				177	=======
				178	<<<<<<<
				179	2
				180	=======
				181	3
				182	>>>>>>>
				183	>>>>>>>
				184
				185	and finally the conflict ID would be calculated as:
				186	`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`