git.vger.kernel.org archive mirror
* [RFC/PATCH 0/7] rerere: handle nested conflicts
@ 2018-05-20 21:12 Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
                   ` (7 more replies)
  0 siblings, 8 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

I started this whole patch series when I did a git rebase, and was too
lazy to resolve a conflict and just added the file with the conflict
markers and continued.  Once I got nested conflicts in the file, I
decided to abort the rebase with 'git rebase --abort' and got a
segfault in 'git rerere clear'.

Even if we can't handle the conflict, 'git rerere clear' shouldn't
crash.  While trying to understand how 'git rerere' works internally,
I noticed some other improvements that could be made, such as marking
the strings for translation and adding some docs on how rerere works;
I had to find that out from the code, and some documentation would
definitely have been helpful.

The next patches are more related to the actual problem I encountered,
first fixing the possible crash of 'git rerere clear' when we
can't handle conflicts in a file, and then actually trying to handle
nested conflicts.

I don't know if it's actually worth trying to handle nested conflicts,
as they are more than likely a very rare use-case, but on the other
hand resolving such conflicts is especially painful, so only having to
do it once would be much nicer.

This whole patch series is marked as RFC/PATCH, as this is my first
time touching the rerere code, so I may well misunderstand some bits
of the code.

Thomas Gummerer (7):
  rerere: unify error message when read_cache fails
  rerere: mark strings for translation
  rerere: add some documentation
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: teach rerere to handle nested conflicts

 Documentation/technical/rerere.txt |  43 +++++
 rerere.c                           | 244 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  25 +++
 3 files changed, 186 insertions(+), 126 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 1/7] rerere: unify error message when read_cache fails
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-21 19:00   ` Stefan Beller
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

"index file corrupt" is also what Stefan chose for his series unifying
these error messages (and 'die'ing, which I'm not sure is the right
thing to do here as also mentioned in my reply to [1]).  I'm happy to
drop this if we decide to go with that series.

[1]: <20180516222118.233868-8-sbeller@google.com>

 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index 18cae2d11c..4b4869662d 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 2/7] rerere: mark strings for translation
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  7:20   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a porcelain command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on its output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
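
For readers unfamiliar with the pattern, the change below can be
sketched in a self-contained way.  This is only an illustration under
stated assumptions: `_()` here is an identity stand-in for the gettext
wrapper, and `sketch_fprintf_ln()` is a hypothetical local helper that
mimics the role `fprintf_ln()` plays in the diff, namely keeping the
trailing newline out of the translatable string.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for _(): a real build would look the msgid up via gettext. */
static const char *_(const char *msgid)
{
	return msgid;
}

/*
 * Hypothetical helper: print a formatted message followed by a newline,
 * so that translators never have to carry the "\n" in the msgid.
 * Returns the number of bytes written, or -1 on error.
 */
static int sketch_fprintf_ln(FILE *fp, const char *fmt, ...)
{
	int ret;
	va_list ap;

	va_start(ap, fmt);
	ret = vfprintf(fp, fmt, ap);
	va_end(ap);
	if (ret < 0 || putc('\n', fp) == EOF)
		return -1;
	return ret + 1;
}

/* Report a recorded resolution with a translatable, newline-free msgid. */
static int report_recorded(FILE *fp, const char *path)
{
	return sketch_fprintf_ln(fp, _("Recorded resolution for '%s'."), path);
}
```

Calling `report_recorded(stderr, "some/path")` would print the message
and the newline separately, which is why the diff can drop the "\n"
from each string it marks.
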
 rerere.c | 68 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/rerere.c b/rerere.c
index 4b4869662d..af5e6179a9 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno(_("Could not open %s"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno(_("Could not write %s"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error(_("There were errors while writing %s (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error(_("Could not parse conflict hunks in %s"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno(_("failed utime() on %s"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno(_("Could not open %s"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno(_("Could not write %s"), path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno(_("Writing %s failed"), path);
 
 out:
 	free(cur.ptr);
@@ -715,13 +715,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die(_("Unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -753,7 +753,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -787,9 +787,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -803,11 +803,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die(_("Could not create directory %s"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error(_("Could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1050,7 +1050,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error(_("Failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1061,16 +1061,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error(_("no remembered resolution for %s"), path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno(_("cannot unlink %s"), filename);
 		goto fail_exit;
 	}
 
@@ -1080,7 +1080,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf_ln(stderr, _("Forgot resolution for %s"), path);
 	return 0;
 
 fail_exit:
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1192,7 +1192,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  9:20   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.  Also describe a bug that happens because we just linearly
scan for conflict markers.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

This documents my understanding of the rerere conflict normalization
and conflict ID computation logic.  Writing this down helped me
understand the logic, and I thought it may be useful to have this as
documentation in Documentation/technical as well.  Junio: as you wrote
the original NEEDSWORK comment, did you have something more in mind
here that should be documented?
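
As an illustration of the normalization the new document describes, a
minimal sketch follows.  It is not the code in rerere.c:
`conflict_hash_input()` is a hypothetical helper that takes the two
sides as plain strings and only builds the byte sequence that would be
hashed (the earlier-sorting side, NUL, the other side, NUL); the SHA-1
step itself is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes that would be fed to the hash for
 * one conflict into `out`: the alphabetically earlier side, NUL, the
 * other side, NUL.  Returns the number of bytes written, or 0 if `out`
 * is too small.
 */
static size_t conflict_hash_input(const char *ours, const char *theirs,
				  char *out, size_t outsz)
{
	const char *first = ours, *second = theirs;
	size_t len1, len2;

	/* Order the sides so that the merge direction does not matter. */
	if (strcmp(first, second) > 0) {
		first = theirs;
		second = ours;
	}
	len1 = strlen(first) + 1;	/* include the NUL separator */
	len2 = strlen(second) + 1;
	if (len1 + len2 > outsz)
		return 0;
	memcpy(out, first, len1);
	memcpy(out + len1, second, len2);
	return len1 + len2;
}
```

With the sides "foo" and "bar" passed in either order, the buffer ends
up holding the same 'bar<NUL>foo<NUL>' bytes, which is what lets both
merge directions map to one conflict ID.
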

 Documentation/technical/rerere.txt | 43 ++++++++++++++++++++++++++++++
 rerere.c                           |  4 ---
 2 files changed, 43 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..94cc6a7ef0
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,43 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To try and re-do a conflict resolution, even when different merge
+strategies are used, 'rerere' computes a conflict ID for each
+conflict in the file.
+
+This is done by discarding the common ancestor version in the
+diff3-style, and re-ordering the two sides of the conflict, in
+alphabetic order.
+
+Using this technique a conflict that looks as follows when for example
+'master' was merged into a topic branch:
+
+    <<<<<<< HEAD
+    foo
+    =======
+    bar
+    >>>>>>> master
+
+and the opposite way when the topic branch is merged into 'master':
+
+    <<<<<<< HEAD
+    bar
+    =======
+    foo
+    >>>>>>> topic
+
+can be recognized as the same conflict, and can automatically be
+re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
+be calculated from 'bar<NUL>foo<NUL>' in both cases.
+
+If there are multiple conflicts in one file, they are all appended to
+one another, both in the 'preimage' file as well as in the conflict
+ID.
+
+This is currently implemented by simply scanning through the file and
+looking for conflict markers.
diff --git a/rerere.c b/rerere.c
index af5e6179a9..a02a38e072 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (2 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  9:47   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>', will fail or, even worse in the case of
'rerere clear', segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in case we
can't handle it, and remove the folder for the ID.  Removing it
unconditionally is fine here: if the user had resolved the conflict
and run rerere, the entry would no longer be in the MERGE_RR file, so
we wouldn't have this problem in the first place, while if the
conflict was not resolved, the only thing that's left in the folder is
the 'preimage', which by itself will be regenerated by git if
necessary, so the user won't lose any work.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I realize the test here may not be as complete as we would want it to
be.  But I first wanted to get some feedback on the approach, before
spending too much time on a proper test (I did test it manually, and
the test does show that the original problem is fixed, but it probably
deserves some cleanup).
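
The control flow of the fix can be modelled in isolation.  The sketch
below is a simplified stand-in, not the rerere.c code: `struct rr_list`
and its helpers are hypothetical, and `scan_result` plays the role of
the value returned by handle_file() (-1 on a parse error, 0 for no
conflicts, 1 for conflicts).

```c
#include <string.h>

#define MAX_ENTRIES 8

/* Simplified stand-in for the in-memory MERGE_RR list of tracked paths. */
struct rr_list {
	const char *paths[MAX_ENTRIES];
	int nr;
};

/* Return the index of `path` in the list, or -1 if it is not tracked. */
static int rr_find(const struct rr_list *rr, const char *path)
{
	int i;

	for (i = 0; i < rr->nr; i++)
		if (!strcmp(rr->paths[i], path))
			return i;
	return -1;
}

/* Remove the entry at `pos` by moving the last entry into its slot. */
static void rr_remove(struct rr_list *rr, int pos)
{
	rr->paths[pos] = rr->paths[--rr->nr];
}

/*
 * Decide what to do with one conflicted path.
 * Returns 1 if a new entry was recorded, 0 otherwise.
 */
static int process_path(struct rr_list *rr, const char *path, int scan_result)
{
	int pos = rr_find(rr, path);

	/* Unparseable file with a stale entry: drop the entry. */
	if (scan_result < 0 && pos >= 0) {
		rr_remove(rr, pos);
		return 0;
	}
	/* Nothing to record, or already tracked: leave the list alone. */
	if (scan_result < 1 || pos >= 0)
		return 0;
	if (rr->nr == MAX_ENTRIES)
		return 0;
	rr->paths[rr->nr++] = path;
	return 1;
}
```

The point of the ordering is that the scan happens before the "already
tracked" check, so a path that can no longer be parsed loses its stale
entry instead of being skipped.
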

 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index a02a38e072..49ace8e108 100644
--- a/rerere.c
+++ b/rerere.c
@@ -824,10 +824,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -835,7 +832,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index eaf18c81cb..27f8afc0b4 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,29 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with extra conflict markers keeps working' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge branch-1 &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (3 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24 10:02   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_path' function.  However, all of its callers only
care whether there are conflicts or not, or whether there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
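
To make the 1 / 0 / -1 convention concrete, here is a small standalone
scanner.  It is a sketch under simplifying assumptions, not the
function being patched: `scan_conflicts()` and `starts_with_marker()`
are hypothetical names, lines are passed as an array of strings, and
nothing is hashed or written out.

```c
#include <ctype.h>
#include <stddef.h>

/* Simplified marker check: `size` copies of `ch`, then whitespace or end. */
static int starts_with_marker(const char *line, int ch, int size)
{
	while (size--)
		if (*line++ != ch)
			return 0;
	return *line == '\0' || isspace((unsigned char)*line);
}

/*
 * Scan `nr` lines.  Return 1 if at least one well-formed conflict was
 * seen, 0 if there were none, and -1 if the markers are malformed
 * (out of order, nested, or unterminated).
 */
static int scan_conflicts(const char **lines, size_t nr, int marker_size)
{
	enum { CONTEXT, SIDE_1, ORIGINAL, SIDE_2 } state = CONTEXT;
	int has_conflicts = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const char *l = lines[i];

		if (starts_with_marker(l, '<', marker_size)) {
			if (state != CONTEXT)
				return -1;
			state = SIDE_1;
		} else if (starts_with_marker(l, '|', marker_size)) {
			if (state != SIDE_1)
				return -1;
			state = ORIGINAL;
		} else if (starts_with_marker(l, '=', marker_size)) {
			if (state != SIDE_1 && state != ORIGINAL)
				return -1;
			state = SIDE_2;
		} else if (starts_with_marker(l, '>', marker_size)) {
			if (state != SIDE_2)
				return -1;
			has_conflicts = 1;
			state = CONTEXT;
		}
	}
	return state == CONTEXT ? has_conflicts : -1;
}
```

A caller only has to distinguish three cases (`< 0`, `== 0`, `> 0`),
which is all the existing callers in the diff below do with the value.
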
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 49ace8e108..f3e658e374 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("Could not parse conflict hunks in %s"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -955,7 +956,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1009,11 +1010,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 6/7] rerere: factor out handle_conflict function
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (4 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  7 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is a preparation for the next step, where
this function will be re-used.  No functional changes intended.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 143 +++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 78 deletions(-)

diff --git a/rerere.c b/rerere.c
index f3e658e374..f3cfd1c09b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,37 +352,25 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
-
+	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
-		} else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size))
+			goto bad;
+		else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
@@ -427,42 +383,73 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			has_conflicts = 1;
-			hunk = RR_CONTEXT;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			goto out;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
+bad:
+	has_conflicts = -1;
+out:
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents of a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occurred.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf out = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(&out, io, marker_size,
+							    sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+	strbuf_release(&out);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.588.g4d217cdf8e.dirty



* [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (5 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24 10:21   ` Junio C Hamano
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  7 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Teach it to handle them by recursively
calling the 'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID is however only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  For example, say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be reordered as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as

    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Whether the conflict markers are stripped out or left in place should
have no practical impact, but leaving them in simplifies the
implementation.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

No automated test for this yet.  As mentioned in the cover letter as
well, I'm not sure if this is common enough for us to actually
consider this use case.  I don't know how nested conflicts could
actually be created apart from committing a file with conflict
markers, but maybe I'm just lacking imagination, so if someone has an
example for that I would be very grateful :)  If we decide to do this,
it probably also merits a mention in
Documentation/technical/rerere.txt.
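
To make the nested case concrete, here is a minimal, self-contained
sketch of the normalization described in the commit message above.  It
is not the rerere.c code: it works on an in-memory string with
fixed-size buffers instead of 'struct rerere_io' and strbufs, it
accepts markers with or without a label, and every name in it
(is_marker, append, append_marker, next_line, normalize_hunk,
normalize_conflict) is made up for illustration.

```c
#include <string.h>

#define MARKER_SIZE 7	/* default conflict marker length */
#define CAP 1024	/* fixed buffer size, enough for this sketch */

/* Does line start with MARKER_SIZE copies of ch, then space or end of line? */
static int is_marker(const char *line, char ch)
{
	int i;

	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return line[i] == ' ' || line[i] == '\n' || line[i] == '\0';
}

/* Append src to the CAP-sized buffer dst; return -1 if it does not fit. */
static int append(char *dst, const char *src)
{
	if (strlen(dst) + strlen(src) >= CAP)
		return -1;
	strcat(dst, src);
	return 0;
}

/* Append a bare marker line (no branch label) to dst. */
static int append_marker(char *dst, char ch)
{
	char m[MARKER_SIZE + 2] = "";

	memset(m, ch, MARKER_SIZE);
	m[MARKER_SIZE] = '\n';
	return append(dst, m);
}

/* Copy the next line (including its newline) from *pos into line. */
static int next_line(const char **pos, char *line)
{
	const char *nl = strchr(*pos, '\n');
	size_t len = nl ? (size_t)(nl - *pos) + 1 : strlen(*pos);

	if (!len || len >= CAP)
		return 0;
	memcpy(line, *pos, len);
	line[len] = '\0';
	*pos += len;
	return 1;
}

/*
 * Normalize one hunk whose opening '<' line has already been consumed
 * and append the result to out.  Return 0 on success, -1 if malformed.
 */
static int normalize_hunk(const char **pos, char *out)
{
	char one[CAP] = "", two[CAP] = "", orig[CAP] = "", line[CAP];
	char *side = one;

	while (next_line(pos, line)) {
		if (is_marker(line, '<')) {
			/* nested conflict: normalize it into the current side */
			if (normalize_hunk(pos, side) < 0)
				return -1;
		} else if (is_marker(line, '|')) {
			if (side != one)
				return -1;
			side = orig;	/* common ancestor, discarded */
		} else if (is_marker(line, '=')) {
			if (side == two)
				return -1;
			side = two;
		} else if (is_marker(line, '>')) {
			int swap = strcmp(one, two) > 0;

			if (side != two ||
			    append_marker(out, '<') ||
			    append(out, swap ? two : one) ||
			    append_marker(out, '=') ||
			    append(out, swap ? one : two) ||
			    append_marker(out, '>'))
				return -1;
			return 0;
		} else if (append(side, line) < 0) {
			return -1;
		}
	}
	return -1;	/* ran out of input before the closing marker */
}

/* Entry point: text must start with a '<' marker line, out holds CAP bytes. */
static int normalize_conflict(const char *text, char *out)
{
	char line[CAP];

	out[0] = '\0';
	if (!next_line(&text, line) || !is_marker(line, '<'))
		return -1;
	return normalize_hunk(&text, out);
}
```

Calling normalize_conflict() on the example conflict from the commit
message yields the reordered preimage shown there.  The conflict ID
would then be the SHA-1 over the two top-level sides, each followed by
a NUL, which this sketch leaves out.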

 rerere.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f3cfd1c09b..45e2bd6ff1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size))
-			goto bad;
-		else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				goto bad;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
-- 
2.17.0.588.g4d217cdf8e.dirty



* Re: [RFC/PATCH 1/7] rerere: unify error message when read_cache fails
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
@ 2018-05-21 19:00   ` Stefan Beller
  0 siblings, 0 replies; 84+ messages in thread
From: Stefan Beller @ 2018-05-21 19:00 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Junio C Hamano

On Sun, May 20, 2018 at 2:12 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> We have multiple different variants of the error message we show to
> the user if 'read_cache' fails.  The "Could not read index" variant we
> are using in 'rerere.c' is currently not used anywhere in translated
> form.
>
> As a subsequent commit will mark all output that comes from 'rerere.c'
> for translation, make the life of the translators a little bit easier
> by using a string that is used elsewhere, and marked for translation
> there, and thus most likely already translated.
>
> "index file corrupt" seems to be the most common error message we show
> when 'read_cache' fails, so use that here as well.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>
> "index file corrupt" is also what Stefan chose for his series unifying
> these error messages (and 'die'ing, which I'm not sure is the right
> thing to do here as also mentioned in my reply to [1]).  I'm happy to
> drop this if we decide to go with that series.

Acked-by: <me>

I'd happily have this patch instead of the one in my series.

I was about to ask for translation, but the commit message hints
at a follow up patch marking this for translation, so I'll read on.

Stefan


* Re: [RFC/PATCH 2/7] rerere: mark strings for translation
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
@ 2018-05-24  7:20   ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-05-24  7:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

>  		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
> -			die("unable to write rerere record");
> +			die(_("unable to write rerere record"));

As we'd be adding these new strings to the .po file, perhaps we
would want to downcase the first letter in the message to match the
convention?

> ...
> -		return error_errno("Could not open %s", path);
> +		return error_errno(_("Could not open %s"), path);
> ...
> -		error("There were errors while writing %s (%s)",
> +		error(_("There were errors while writing %s (%s)"),
> ...
> -		io.io.wrerror = error_errno("Failed to flush %s", path);
> +		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
> ...
> -		return error("Could not parse conflict hunks in %s", path);
> +		return error(_("Could not parse conflict hunks in %s"), path);


* Re: [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
@ 2018-05-24  9:20   ` Junio C Hamano
  2018-06-03 11:41     ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-05-24  9:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> +Conflict normalization
> +----------------------
> +
> +To try and re-do a conflict resolution, even when different merge
> +strategies are used, 'rerere' computes a conflict ID for each
> +conflict in the file.
> +
> +This is done by discarding the common ancestor version in the
> +diff3-style, and re-ordering the two sides of the conflict, in
> +alphabetic order.

s/discarding.*-style/normalising the conflicted section to 'merge' style/

The motivation behind the normalization should probably be given
upfront in the first paragraph.  It is to ensure the recorded
resolutions can be looked up from the rerere database for
application, even when branches are merged in different order.  I am
not sure what you meant by even when different merge strategies are
used; I'd drop that if I were writing the paragraph.

> +Using this technique a conflict that looks as follows when for example
> +'master' was merged into a topic branch:
> +
> +    <<<<<<< HEAD
> +    foo
> +    =======
> +    bar
> +    >>>>>>> master
> +
> +and the opposite way when the topic branch is merged into 'master':
> +
> +    <<<<<<< HEAD
> +    bar
> +    =======
> +    foo
> +    >>>>>>> topic
> +
> +can be recognized as the same conflict, and can automatically be
> +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> +be calculated from 'bar<NUL>foo<NUL>' in both cases.

You earlier talked about normalizing and reordering, but did not
talk about "concatenate both with NUL in between and hash", so the
explanation in the last two lines is not quite understandable by
mere mortals, even though I know which part of the code you are
referring to.  When you talk about hashing, you may want to make sure
the readers understand that the branch labels on <<< and >>> lines
are ignored.

> +If there are multiple conflicts in one file, they are all appended to
> +one another, both in the 'preimage' file as well as in the conflict
> +ID.

In case it was not clear (and I do not think it is to those who only
read your description and haven't thought things through
themselves), this concatenation is why the normalization by
reordering is helpful.  Imagine that a common ancestor had a file
with a line with string "A" on it (I'll call such a line "line A"
for brevity in the following) in its early part, and line X in its
late part.  And then you fork four branches that do these things:

    - AB: changes A to B
    - AC: changes A to C
    - XY: changes X to Y
    - XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order.  The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

But the act of merging AC into ABAC and resolving the conflict to
leave line D means that you declare: 

    After examining what branches AB and AC did, I believe that
    making line A into line D is the best thing to do that is
    compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way---it is the resolution that is in line with
that declaration.

Imagine that similarly you had previously forked branch XYXZ from
XY, merged XZ into it, and resolved "X became Y or Z" into "X became
W".

Now, if you forked a branch ABXY from AB and then merged XY, then
ABXY would have line B in its early part and line Y in its later
part.  Such a merge would be quite clean.  We can construct
4 combinations using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would make "an early A became B or C, a late X
became Y or Z" conflict, while merging ACXY and ABXZ would make "an
early A became C or B, a late X became Y or Z".  We can see there
are 4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").

By sorting, we can give the conflict its canonical name, namely, "an
early part became B or C, a late part became Y or Z", and whenever
any of these four patterns appear, we can get to the same conflict
and resolution that we saw earlier.  Without the sorting, we will
have to somehow find a previous resolution from combinatorial
explosion ;-)
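
The normalization described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions, not the code in
rerere.c: 'struct hunk' and 'id_input' are hypothetical names, the two
sides are passed in with markers and branch labels already stripped,
and strcmp() stands in for strbuf_cmp().  The function only assembles
the bytes that would be fed to SHA-1 (for each hunk in turn: the side
that sorts first, NUL, the other side, NUL) and leaves the hashing
itself out.

```c
#include <string.h>

/* One conflict hunk; both sides have markers and labels stripped. */
struct hunk {
	const char *ours;
	const char *theirs;
};

/*
 * Append the normalized form of every hunk to out: the side that
 * sorts first, NUL, the other side, NUL.  Return the number of bytes
 * written, or 0 if the result does not fit in cap bytes.
 */
static size_t id_input(const struct hunk *hunks, size_t nr,
		       unsigned char *out, size_t cap)
{
	size_t len = 0, i;

	for (i = 0; i < nr; i++) {
		const char *one = hunks[i].ours, *two = hunks[i].theirs;
		size_t l1, l2;

		if (strcmp(one, two) > 0) {
			/* reorder so that the result does not depend on merge direction */
			const char *tmp = one;
			one = two;
			two = tmp;
		}
		l1 = strlen(one) + 1;	/* + 1 keeps the terminating NUL */
		l2 = strlen(two) + 1;
		if (len + l1 + l2 > cap)
			return 0;
		memcpy(out + len, one, l1);
		memcpy(out + len + l1, two, l2);
		len += l1 + l2;
	}
	return len;
}
```

With the hunks { "B\n", "C\n" }, { "Y\n", "Z\n" } and the same hunks
with both pairs reversed, id_input() produces the identical 12 bytes,
so both merges would map to the same conflict ID.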

These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
2016-05-23), the conflict ID can safely collide, i.e. hash
collisions that drop completely different conflicts and their
resolutions into the same .git/rr-cache/$id directory will not
interfere with proper operation of the system, thanks to that
rerere-multi topic that allows us to store multiple preimage
conflicts that happen to share the same conflict ID with their
corresponding postimage resolutions.

In theory, we *should* be able to stub out the SHA-1 computation and
give every conflict the same ID and rerere should still operate
correctly, even though I haven't tried it yet myself.



* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-05-24  9:47   ` Junio C Hamano
  2018-05-24 18:54     ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-05-24  9:47 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> To fix this, remove the rerere ID from the MERGE_RR file in case we
> can't handle it, and remove the folder for the ID.  Removing it
> unconditionally is fine here, because if the user would have resolved
> the conflict and ran rerere, the entry would no longer be in the
> MERGE_RR file, so we wouldn't have this problem in the first place,

I do not think removing the directory and losing _other_ conflicts
and their resolutions, if they exist, is fine in the modern world
order post rerere-multi update in 2016.  Well, it is just as safe as
"rm -rf .git/rr-cache/" in the sense that it won't make Git start
segfaulting, but it is not fine as it is discarding information of
conflicts that have nothing to do with the current one that is
problematic.



* Re: [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-05-24 10:02   ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-05-24 10:02 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> We currently return the exact number of conflict hunks a certain path
> has from the 'handle_path' function.  However all of its callers only
> care whether there are conflicts or not or if there is an error.
> Return only that information, and document that only that information
> is returned.  This will simplify the code in the subsequent steps.

Makes sense.


* Re: [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-05-24 10:21   ` Junio C Hamano
  2018-05-24 19:07     ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-05-24 10:21 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> No automated test for this yet.  As mentioned in the cover letter as
> well, I'm not sure if this is common enough for us to actually
> consider this use case.  I don't know how nested conflicts could
> actually be created apart from committing a file with conflict
> markers,

Recursive merge whose inner merge leaves conflict markers?

One thing that makes me wonder is that the conflict markers may not
"nest" so nicely.  For example, if inner merges had two conflicts
like these:

<<<
 <<<<<
 A
 =====
 B
 >>>>>
===
 <<<<<
 A
 =====
 C
 >>>>>
>>>

where one side made something to A or B, while the other side made
something (or something else) to A or C, I would imagine that the
outer conflict could be "optimized" to produce this instead:


 <<<<<
 A
 =====
<<<
 B
===
 C
>>>
 >>>>>



* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-24  9:47   ` Junio C Hamano
@ 2018-05-24 18:54     ` Thomas Gummerer
  2018-05-25  1:20       ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-24 18:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > To fix this, remove the rerere ID from the MERGE_RR file in case we
> > can't handle it, and remove the folder for the ID.  Removing it
> > unconditionally is fine here, because if the user would have resolved
> > the conflict and ran rerere, the entry would no longer be in the
> > MERGE_RR file, so we wouldn't have this problem in the first place,
> 
> I do not think removing the directory and losing _other_ conflicts
> and their resolutions, if they exist, is fine in the modern world
> order post rerere-multi update in 2016.  Well, it is just as safe as
> "rm -rf .git/rr-cache/" in the sense that it won't make Git start
> segfaulting, but it is not fine as it is discarding information of
> conflicts that have nothing to do with the current one that is
> problematic.

Sorry I botched the description here, and failed to describe what the
code is actually doing.  We're actually only removing the variant in
the MERGE_RR file, whose path we are now no longer able to handle.
And I think that's fine to do, because if it is still in the MERGE_RR
file the conflict hasn't been resolved yet, afaiu.

Will update the commit message.


* Re: [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-24 10:21   ` Junio C Hamano
@ 2018-05-24 19:07     ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-05-24 19:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > No automated test for this yet.  As mentioned in the cover letter as
> > well, I'm not sure if this is common enough for us to actually
> > consider this use case.  I don't know how nested conflicts could
> > actually be created apart from committing a file with conflict
> > markers,
> 
> Recursive merge whose inner merge leaves conflict markers?

Thanks, lots of stuff in Git I still have to learn :)

> One thing that makes me wonder is that the conflict markers may not
> "nest" so nicely.  For example, if inner merges had two conflicts
> like these:
> 
> <<<
>  <<<<<
>  A
>  =====
>  B
>  >>>>>
> ===
>  <<<<<
>  A
>  =====
>  C
>  >>>>>
> >>>
> 
> where one side made something to A or B, while the other side made
> something (or something else) to A or C, I would imagine that the
> outer conflict could be "optimized" to produce this instead:
> 
> 
>  <<<<<
>  A
>  =====
> <<<
>  B
> ===
>  C
> >>>
>  >>>>>

Yeah, I do think that would be a nicer merge conflict to solve.  But I
think that should be done in a separate patch series if we decide to
do so.  When this one lands rerere will be able to handle the conflict
either way :)


* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-24 18:54     ` Thomas Gummerer
@ 2018-05-25  1:20       ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-05-25  1:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Sorry I botched the description here, and failed to describe what the
> code is actually doing.  We're actually only removing the variant in
> the MERGE_RR file, whose path we are now no longer able to handle.

Oh, that's absolutely fine, then.  Thanks for a prompt update.


* Re: [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-24  9:20   ` Junio C Hamano
@ 2018-06-03 11:41     ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-03 11:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > +Conflict normalization
> > +----------------------
> > +
> > +To try and re-do a conflict resolution, even when different merge
> > +strategies are used, 'rerere' computes a conflict ID for each
> > +conflict in the file.
> > +
> > +This is done by discarding the common ancestor version in the
> > +diff3-style, and re-ordering the two sides of the conflict, in
> > +alphabetic order.
> 
> s/discarding.*-style/normalising the conflicted section to 'merge' style/
> 
> The motivation behind the normalization should probably be given
> upfront in the first paragraph.  It is to ensure the recorded
> resolutions can be looked up from the rerere database for
> application, even when branches are merged in different order.  I am
> not sure what you meant by even when different merge strategies are
> used; I'd drop that if I were writing the paragraph.

What I meant was when different conflict styles are used, and when the
branches are merged in different orders.  But merge strategies is
obviously not a good word for that.  Will rephrase this.

> > +Using this technique a conflict that looks as follows when for example
> > +'master' was merged into a topic branch:
> > +
> > +    <<<<<<< HEAD
> > +    foo
> > +    =======
> > +    bar
> > +    >>>>>>> master
> > +
> > +and the opposite way when the topic branch is merged into 'master':
> > +
> > +    <<<<<<< HEAD
> > +    bar
> > +    =======
> > +    foo
> > +    >>>>>>> topic
> > +
> > +can be recognized as the same conflict, and can automatically be
> > +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> > +be calculated from 'bar<NUL>foo<NUL>' in both cases.
> 
> You earlier talked about normalizing and reordering, but did not
> talk about "concatenate both with NUL in between and hash", so the
> explanation in the last two lines is not quite understandable by
> mere mortals, even though I know which part of the code you are
> referring to.  When you talk about hashing, you may want to make sure
> the readers understand that the branch labels on <<< and >>> lines
> are ignored.
> 
> > +If there are multiple conflicts in one file, they are all appended to
> > +one another, both in the 'preimage' file as well as in the conflict
> > +ID.
> 
> In case it was not clear (and I do not think it is to those who only
> read your description and haven't thought things through
> themselves), this concatenation is why the normalization by
> reordering is helpful.  Imagine that a common ancestor had a file
> with a line with string "A" on it (I'll call such a line "line A"
> for brevity in the following) in its early part, and line X in its
> late part.  And then you fork four branches that do these things:
> 
>     - AB: changes A to B
>     - AC: changes A to C
>     - XY: changes X to Y
>     - XZ: changes X to Z
> 
> Now, forking a branch ABAC off of branch AB and then merging AC into
> it, and forking a branch ACAB off of branch AC and then merging AB
> into it, would yield the conflict in a different order.  The former
> would say "A became B or C, what now?" while the latter would say "A
> became C or B, what now?"
> 
> But the act of merging AC into ABAC and resolving the conflict to
> leave line D means that you declare: 
> 
>     After examining what branches AB and AC did, I believe that
>     making line A into line D is the best thing to do that is
>     compatible with what AB and AC wanted to do.
> 
> So the conflict we would see when merging AB into ACAB should be
> resolved the same way---it is the resolution that is in line with
> that declaration.
> 
> Imagine that similarly you had previously forked branch XYXZ from
> XY, merged XZ into it, and resolved "X became Y or Z" into "X became
> W".
> 
> Now, if you forked a branch ABXY from AB and then merged XY, then
> ABXY would have line B in its early part and line Y in its later
> part.  Such a merge would be quite clean.  We can construct
> 4 combinations using these four branches ((AB, AC) x (XY, XZ)).
> 
> Merging ABXY and ACXZ would make "an early A became B or C, a late X
> became Y or Z" conflict, while merging ACXY and ABXZ would make "an
> early A became C or B, a late X became Y or Z".  We can see there
> are 4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").
> 
> By sorting, we can give the conflict its canonical name, namely, "an
> early part became B or C, a late part became Y or Z", and whenever
> any of these four patterns appear, we can get to the same conflict
> and resolution that we saw earlier.  Without the sorting, we will
> have to somehow find a previous resolution from combinatorial
> explosion ;-)

Thanks for the in depth explanation!  I'll incorporate this into the
document.

> These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
> 2016-05-23), the conflict ID can safely collide, i.e. hash
> collisions that drop completely different conflicts and their
> resolutions into the same .git/rr-cache/$id directory will not
> interfere with proper operation of the system, thanks to that
> rerere-multi topic that allows us to store multiple preimage
> conflicts that happen to share the same conflict ID with their
> corresponding postimage resolutions.
> 
> In theory, we *should* be able to stub out the SHA-1 computation and
> give every conflict the same ID and rerere should still operate
> correctly, even though I haven't tried it yet myself.

I gave this a quick try, and the test suite seems to pass with the
hash computation giving the same ID to all conflicts.


* [PATCH v2 00/10] rerere: handle nested conflicts
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (6 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-06-05 21:52 ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
                     ` (11 more replies)
  7 siblings, 12 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

The previous round was at
<20180520211210.1248-1-t.gummerer@gmail.com>.

Thanks Junio for the comments on the previous round.

Changes since the previous round:
 - lowercase the first letter in some error/warning messages before
   marking them for translation
 - wrap paths in output in single quotes, for consistency, and to make
   some of the messages the same as ones that are already translated
 - mark messages in builtin/rerere.c for translation as well, which I
   had previously forgotten.
 - expanded the technical documentation on rerere.  The entire
   document is basically rewritten.
 - changed the test in 6/10 to just fake a conflict marker inside of
   one of the hunks instead of using an inner conflict created by a
   merge.  This is to make sure the codepath is still hit after we
   handle inner conflicts properly.
 - added tests for handling inner conflict markers
 - added one commit to recalculate the conflict ID when an unresolved
   conflict is committed, and the subsequent operation conflicts again
   in the same file.  More explanation in the commit message of that
   commit.

range-diff below.  A few commits changed enough for range-diff
to give up showing the differences in those, they are probably best
reviewed as the whole patch anyway:

1:  901b638400 ! 1:  2825342cc2 rerere: unify error message when read_cache fails
    @@ -1,6 +1,6 @@
     Author: Thomas Gummerer <t.gummerer@gmail.com>
     
    -    rerere: unify error message when read_cache fails
    +    rerere: unify error messages when read_cache fails
     
         We have multiple different variants of the error message we show to
         the user if 'read_cache' fails.  The "Could not read index" variant we
-:  ---------- > 2:  d1500028aa rerere: lowercase error messages
-:  ---------- > 3:  ed3601ee71 rerere: wrap paths in output in sq
2:  c48ffededd ! 4:  6ead84a199 rerere: mark strings for translation
    @@ -9,6 +9,28 @@
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    +diff --git a/builtin/rerere.c b/builtin/rerere.c
    +--- a/builtin/rerere.c
    ++++ b/builtin/rerere.c
    +@@
    + 	if (!strcmp(argv[0], "forget")) {
    + 		struct pathspec pathspec;
    + 		if (argc < 2)
    +-			warning("'git rerere forget' without paths is deprecated");
    ++			warning(_("'git rerere forget' without paths is deprecated"));
    + 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
    + 			       prefix, argv + 1);
    + 		return rerere_forget(&pathspec);
    +@@
    + 			const char *path = merge_rr.items[i].string;
    + 			const struct rerere_id *id = merge_rr.items[i].util;
    + 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
    +-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
    ++				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
    + 		}
    + 	} else
    + 		usage_with_options(rerere_usage, options);
    +
     diff --git a/rerere.c b/rerere.c
     --- a/rerere.c
     +++ b/rerere.c
    @@ -53,14 +75,14 @@
      	io.input = fopen(path, "r");
      	io.io.wrerror = 0;
      	if (!io.input)
    --		return error_errno("Could not open %s", path);
    -+		return error_errno(_("Could not open %s"), path);
    +-		return error_errno("could not open '%s'", path);
    ++		return error_errno(_("could not open '%s'"), path);
      
      	if (output) {
      		io.io.output = fopen(output, "w");
      		if (!io.io.output) {
    --			error_errno("Could not write %s", output);
    -+			error_errno(_("Could not write %s"), output);
    +-			error_errno("could not write '%s'", output);
    ++			error_errno(_("could not write '%s'"), output);
      			fclose(io.input);
      			return -1;
      		}
    @@ -68,18 +90,18 @@
      
      	fclose(io.input);
      	if (io.io.wrerror)
    --		error("There were errors while writing %s (%s)",
    -+		error(_("There were errors while writing %s (%s)"),
    +-		error("there were errors while writing '%s' (%s)",
    ++		error(_("there were errors while writing '%s' (%s)"),
      		      path, strerror(io.io.wrerror));
      	if (io.io.output && fclose(io.io.output))
    --		io.io.wrerror = error_errno("Failed to flush %s", path);
    -+		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
    +-		io.io.wrerror = error_errno("failed to flush '%s'", path);
    ++		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
      
      	if (hunk_no < 0) {
      		if (output)
      			unlink_or_warn(output);
    --		return error("Could not parse conflict hunks in %s", path);
    -+		return error(_("Could not parse conflict hunks in %s"), path);
    +-		return error("could not parse conflict hunks in '%s'", path);
    ++		return error(_("could not parse conflict hunks in '%s'"), path);
      	}
      	if (io.io.wrerror)
      		return -1;
    @@ -105,21 +127,21 @@
      	 * Mark that "postimage" was used to help gc.
      	 */
      	if (utime(rerere_path(id, "postimage"), NULL) < 0)
    --		warning_errno("failed utime() on %s",
    -+		warning_errno(_("failed utime() on %s"),
    +-		warning_errno("failed utime() on '%s'",
    ++		warning_errno(_("failed utime() on '%s'"),
      			      rerere_path(id, "postimage"));
      
      	/* Update "path" with the resolution */
      	f = fopen(path, "w");
      	if (!f)
    --		return error_errno("Could not open %s", path);
    -+		return error_errno(_("Could not open %s"), path);
    +-		return error_errno("could not open '%s'", path);
    ++		return error_errno(_("could not open '%s'"), path);
      	if (fwrite(result.ptr, result.size, 1, f) != 1)
    --		error_errno("Could not write %s", path);
    -+		error_errno(_("Could not write %s"), path);
    +-		error_errno("could not write '%s'", path);
    ++		error_errno(_("could not write '%s'"), path);
      	if (fclose(f))
    --		return error_errno("Writing %s failed", path);
    -+		return error_errno(_("Writing %s failed"), path);
    +-		return error_errno("writing '%s' failed", path);
    ++		return error_errno(_("writing '%s' failed"), path);
      
      out:
      	free(cur.ptr);
    @@ -134,8 +156,8 @@
      
      	if (write_locked_index(&the_index, &index_lock,
      			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
    --		die("Unable to write new index file");
    -+		die(_("Unable to write new index file"));
    +-		die("unable to write new index file");
    ++		die(_("unable to write new index file"));
      }
      
      static void remove_variant(struct rerere_id *id)
    @@ -179,8 +201,8 @@
      		return rr_cache_exists;
      
      	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
    --		die("Could not create directory %s", git_path_rr_cache());
    -+		die(_("Could not create directory %s"), git_path_rr_cache());
    +-		die("could not create directory '%s'", git_path_rr_cache());
    ++		die(_("could not create directory '%s'"), git_path_rr_cache());
      	return 1;
      }
      
    @@ -188,8 +210,8 @@
      	 */
      	ret = handle_cache(path, sha1, NULL);
      	if (ret < 1)
    --		return error("Could not parse conflict hunks in '%s'", path);
    -+		return error(_("Could not parse conflict hunks in '%s'"), path);
    +-		return error("could not parse conflict hunks in '%s'", path);
    ++		return error(_("could not parse conflict hunks in '%s'"), path);
      
      	/* Nuke the recorded resolution for the conflict */
      	id = new_rerere_id(sha1);
    @@ -214,11 +236,11 @@
      	filename = rerere_path(id, "postimage");
      	if (unlink(filename)) {
      		if (errno == ENOENT)
    --			error("no remembered resolution for %s", path);
    -+			error(_("no remembered resolution for %s"), path);
    +-			error("no remembered resolution for '%s'", path);
    ++			error(_("no remembered resolution for '%s'"), path);
      		else
    --			error_errno("cannot unlink %s", filename);
    -+			error_errno(_("cannot unlink %s"), filename);
    +-			error_errno("cannot unlink '%s'", filename);
    ++			error_errno(_("cannot unlink '%s'"), filename);
      		goto fail_exit;
      	}
      
    @@ -235,8 +257,8 @@
      	item = string_list_insert(rr, path);
      	free_rerere_id(item);
      	item->util = id;
    --	fprintf(stderr, "Forgot resolution for %s\n", path);
    -+	fprintf_ln(stderr, _("Forgot resolution for %s"), path);
    +-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
    ++	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
      	return 0;
      
      fail_exit:
3:  e29449406f < -:  ---------- rerere: add some documentation
-:  ---------- > 5:  caad276aca rerere: add some documentation
4:  3b41520b28 ! 6:  ad88a6b8a8 rerere: fix crash when conflict goes unresolved
    @@ -23,14 +23,18 @@
         Now when 'rerere clear' for example is run, it will segfault in
         'has_rerere_resolution', because status is NULL.
     
    -    To fix this, remove the rerere ID from the MERGE_RR file in case we
    -    can't handle it, and remove the folder for the ID.  Removing it
    -    unconditionally is fine here, because if the user would have resolved
    -    the conflict and ran rerere, the entry would no longer be in the
    -    MERGE_RR file, so we wouldn't have this problem in the first place,
    -    while if the conflict was not resolved, the only thing that's left in
    -    the folder is the 'preimage', which by itself will be regenerated by
    -    git if necessary, so the user won't lose any work.
    +    To fix this, remove the rerere ID from the MERGE_RR file in the case
    +    when we can't handle it, and remove the corresponding variant from
    +    .git/rr-cache/.  Removing it unconditionally is fine here, because if
    +    the user would have resolved the conflict and ran rerere, the entry
    +    would no longer be in the MERGE_RR file, so we wouldn't have this
    +    problem in the first place, while if the conflict was not resolved,
    +    the only thing that's left in the folder is the 'preimage', which by
    +    itself will be regenerated by git if necessary, so the user won't
    +    lose any work.
    +
    +    Note that other variants that have the same conflict ID will not be
    +    touched.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -71,16 +75,13 @@
      	count_pre_post 0 0
      '
      
    -+test_expect_success 'rerere with extra conflict markers keeps working' '
    ++test_expect_success 'rerere with unexpected conflict markers does not crash' '
     +	git reset --hard &&
     +
     +	git checkout -b branch-1 master &&
     +	echo "bar" >test &&
     +	git add test &&
     +	git commit -q -m two &&
    -+	echo "baz" >test &&
    -+	git add test &&
    -+	git commit -q -m three &&
     +
     +	git reset --hard &&
     +	git checkout -b branch-2 master &&
    @@ -88,10 +89,10 @@
     +	git add test &&
     +	git commit -q -a -m one &&
     +
    -+	test_must_fail git merge branch-1~ &&
    -+	git add test &&
    -+	git commit -q -m "will solve conflicts later" &&
     +	test_must_fail git merge branch-1 &&
    ++	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
    ++	mv test.tmp test &&
    ++	git rerere &&
     +
     +	git rerere clear
     +'
5:  411a4ee37e ! 7:  15f9efcba6 rerere: only return whether a path has conflicts or not
    @@ -67,13 +67,13 @@
      	if (io.io.wrerror)
     @@
      	if (io.io.output && fclose(io.io.output))
    - 		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
    + 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
      
     -	if (hunk_no < 0) {
     +	if (has_conflicts < 0) {
      		if (output)
      			unlink_or_warn(output);
    - 		return error(_("Could not parse conflict hunks in %s"), path);
    + 		return error(_("could not parse conflict hunks in '%s'"), path);
      	}
      	if (io.io.wrerror)
      		return -1;
6:  fc9f715913 = 8:  1490efaad3 rerere: factor out handle_conflict function
7:  f7dea09a0a < -:  ---------- rerere: teach rerere to handle nested conflicts
-:  ---------- > 9:  6619650c42 rerere: teach rerere to handle nested conflicts
-:  ---------- > 10:  4b11dce7dd rerere: recalculate conflict ID when unresolved conflict is committed

Thomas Gummerer (10):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add some documentation
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
 builtin/rerere.c                   |   4 +-
 rerere.c                           | 246 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  67 ++++++++
 4 files changed, 372 insertions(+), 127 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

-- 
2.18.0.rc1.242.g61856ae69



* [PATCH v2 01/10] rerere: unify error messages when read_cache fails
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index 18cae2d11c..4b4869662d 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 02/10] rerere: lowercase error messages
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation, follow that pattern
in rerere as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 4b4869662d..eca182023f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno("could not write %s", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error("there were errors while writing %s (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush %s", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in %s", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char *path)
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno("could not write %s", path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno("writing %s failed", path);
 
 out:
 	free(cur.ptr);
@@ -721,7 +721,7 @@ static void update_paths(struct string_list *update)
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die("could not create directory %s", git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error("could not parse conflict hunks in '%s'", path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 03/10] rerere: wrap paths in output in sq
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  2 +-
 rerere.c         | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for %s", rerere_path(id, NULL));
+				die("unable to generate diff for '%s'", rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index eca182023f..0e5956a51c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write %s", output);
+			error_errno("could not write '%s'", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing %s (%s)",
+		error("there were errors while writing '%s' (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush '%s'", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in '%s'", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno("failed utime() on '%s'",
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write %s", path);
+		error_errno("could not write '%s'", path);
 	if (fclose(f))
-		return error_errno("writing %s failed", path);
+		return error_errno("writing '%s' failed", path);
 
 out:
 	free(cur.ptr);
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory %s", git_path_rr_cache());
+		die("could not create directory '%s'", git_path_rr_cache());
 	return 1;
 }
 
@@ -1068,9 +1068,9 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error("no remembered resolution for '%s'", path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno("cannot unlink '%s'", filename);
 		goto fail_exit;
 	}
 
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf(stderr, "Forgot resolution for '%s'\n", path);
 	return 0;
 
 fail_exit:
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 04/10] rerere: mark strings for translation
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (2 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a porcelain command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on its output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  4 +--
 rerere.c         | 68 ++++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[0], "forget")) {
 		struct pathspec pathspec;
 		if (argc < 2)
-			warning("'git rerere forget' without paths is deprecated");
+			warning(_("'git rerere forget' without paths is deprecated"));
 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
 			       prefix, argv + 1);
 		return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
+				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index 0e5956a51c..74ce422634 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write '%s'", output);
+			error_errno(_("could not write '%s'"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing '%s' (%s)",
+		error(_("there were errors while writing '%s' (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush '%s'", path);
+		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on '%s'",
+		warning_errno(_("failed utime() on '%s'"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write '%s'", path);
+		error_errno(_("could not write '%s'"), path);
 	if (fclose(f))
-		return error_errno("writing '%s' failed", path);
+		return error_errno(_("writing '%s' failed"), path);
 
 out:
 	free(cur.ptr);
@@ -715,13 +715,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("unable to write new index file");
+		die(_("unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -753,7 +753,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -787,9 +787,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -803,11 +803,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory '%s'", git_path_rr_cache());
+		die(_("could not create directory '%s'"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1050,7 +1050,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error(_("Failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1061,16 +1061,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for '%s'", path);
+			error(_("no remembered resolution for '%s'"), path);
 		else
-			error_errno("cannot unlink '%s'", filename);
+			error_errno(_("cannot unlink '%s'"), filename);
 		goto fail_exit;
 	}
 
@@ -1080,7 +1080,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
+	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
 	return 0;
 
 fail_exit:
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1192,7 +1192,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 05/10] rerere: add some documentation
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (3 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
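
As a rough illustration of the marker-stripping step described in the
new document (not part of the patch), the sketch below uses a
hypothetical shell helper, `normalize_conflict`, that is not part of
git.  It assumes 7-character conflict markers, drops the labels after
them, and discards the diff3 common-ancestor section.  It does not
sort the two sides of each hunk, which the document also describes.

```shell
# Hypothetical helper for illustration only; rerere does this in C.
# Reads a conflicted file on stdin and writes the stripped form to stdout.
normalize_conflict () {
	awk '
	# True if the current line starts with seven copies of "ch",
	# followed either by end of line or by a space and a label.
	function marker(ch,    m, i) {
		m = ""
		for (i = 0; i < 7; i++)
			m = m ch
		return substr($0, 1, 7) == m &&
		       (length($0) == 7 || substr($0, 8, 1) == " ")
	}
	marker("<") { print "<<<<<<<"; in_base = 0; next }
	# diff3 style only: start of the common ancestor section.
	marker("|") { in_base = 1; next }
	marker("=") { print "======="; in_base = 0; next }
	marker(">") { print ">>>>>>>"; next }
	# Keep every other line unless it belongs to the ancestor section.
	!in_base { print }
	'
}
```

Feeding it the merge-style conflict labelled AC and the diff3-style
conflict labelled AC2 from the document should give the same text,
which is the property that lets both conflicts share one conflict ID.
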
 Documentation/technical/rerere.txt | 142 +++++++++++++++++++++++++++++
 rerere.c                           |   4 -
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..2c517fe0fc
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,142 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are dealt with by stripping
+that information from the conflict markers, and removing extraneous
+information from the `diff3` conflict style.
+
+Branches being merged in different order are dealt with by sorting the
+conflict hunks.  More on each of those parts in the following
+sections.
+
+Once these two normalization operations are applied, a conflict ID is
+created based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this line is called "line A" in the following).  In branch
+AB this line is changed to "B", in AC this line is changed to "C",
+and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Now forking a branch ABAC off of branch AB and then merging AC into it,
+we'd get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Now doing the analogous with AC2 (forking a branch ABAC2 off of branch
+AB and then merging branch AC2 into it), maybe using the diff3
+conflict style, we'd get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible with what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, let's imagine that a common ancestor had a file with line A
+in its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part became Y or Z", and whenever
+any of these four patterns appears, we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line B, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index 74ce422634..ef23abe4dd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (4 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>', will fail or, even worse in the case of
'rerere clear', segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, and remove the corresponding variant from
.git/rr-cache/.  Removing it unconditionally is fine here, because if
the user would have resolved the conflict and ran rerere, the entry
would no longer be in the MERGE_RR file, so we wouldn't have this
problem in the first place, while if the conflict was not resolved,
the only thing that's left in the folder is the 'preimage', which by
itself will be regenerated by git if necessary, so the user won't
loose any work.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index ef23abe4dd..220020187b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -824,10 +824,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -835,7 +832,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index eaf18c81cb..5ce411b70d 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1 &&
+	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
+	mv test.tmp test &&
+	git rerere &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 07/10] rerere: only return whether a path has conflicts or not
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (5 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_path' function.  However, all of its callers only
care whether there are conflicts, no conflicts, or an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 220020187b..da3744b86b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -955,7 +956,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1009,11 +1010,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 08/10] rerere: factor out handle_conflict function
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (6 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is a preparation for the next step, where
this function will be re-used.  No functional changes intended.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 143 +++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 78 deletions(-)

diff --git a/rerere.c b/rerere.c
index da3744b86b..fac90663b0 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,37 +352,25 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
-
+	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
-		} else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size))
+			goto bad;
+		else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
@@ -427,42 +383,73 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			has_conflicts = 1;
-			hunk = RR_CONTEXT;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			goto out;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
+bad:
+	has_conflicts = -1;
+out:
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf out = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(&out, io, marker_size,
+							    sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+	strbuf_release(&out);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (7 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Teach it to handle them by recursively
calling the 'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID is however only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  e.g. say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be reordered as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as
    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place should have no
practical impact, but it simplifies the implementation.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I couldn't actually get the conflict markers the right way just using
merge-recursive.  But I think that would be fixed either way by
d694a17986 ("ll-merge: use a longer conflict marker for internal
merge", 2016-04-14), if I read that correctly.

Either way I still think this can be an improvement for when the user
commits merge conflicts (even though they shouldn't do that in the
first place), and for possible other edge cases that I'm not able to
produce right now, but I may just not be creative enough for those.
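
For illustration only (this is not the code in the patch): a small,
self-contained sketch of the recursive normalization described in the
commit message.  The names `handle_conflict`, `is_marker`, `append` and
`put_marker` as written here, the fixed-size buffers, and the `id` /
`id_len` out-parameters are hypothetical stand-ins; the real code uses
strbufs and feeds a SHA-1 context instead of filling a byte buffer.
Markers are assumed to be 7 characters long and strcmp() stands in for
strbuf_cmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER_SIZE 7	/* assumed default conflict marker length */
#define SIDE_MAX 512	/* arbitrary limit for this sketch */

/* True if `line` starts with MARKER_SIZE copies of `ch`, ignoring labels. */
static int is_marker(const char *line, char ch)
{
	size_t i;

	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return line[i] == ' ' || line[i] == '\n' || line[i] == '\0';
}

/* Append `n` bytes of `src` to the NUL-terminated buffer `dst`. */
static void append(char *dst, size_t cap, const char *src, size_t n)
{
	size_t len = strlen(dst);

	assert(len + n < cap);	/* sketch only: no dynamic growth */
	memcpy(dst + len, src, n);
	dst[len + n] = '\0';
}

/* Append a label-free marker line such as "<<<<<<<\n". */
static void put_marker(char *dst, size_t cap, char ch)
{
	char m[MARKER_SIZE + 1];

	memset(m, ch, MARKER_SIZE);
	m[MARKER_SIZE] = '\n';
	append(dst, cap, m, sizeof(m));
}

/*
 * Called after an opening "<<<<<<<" line was consumed from *in.
 * Appends the normalized conflict to `out`.  When `id` is non-NULL
 * (top level only) it receives "side NUL side NUL", the bytes that
 * would be hashed; it must have room for 2 * SIDE_MAX bytes.
 * Returns 1 on success, -1 on malformed input.
 */
static int handle_conflict(const char **in, char *out, size_t cap,
			   char *id, size_t *id_len)
{
	enum { SIDE_1, ORIGINAL, SIDE_2 } hunk = SIDE_1;
	char one[SIDE_MAX] = "", two[SIDE_MAX] = "";

	while (**in) {
		const char *line = *in;
		const char *eol = strchr(line, '\n');
		size_t n = eol ? (size_t)(eol - line) + 1 : strlen(line);

		*in += n;
		if (is_marker(line, '<')) {
			char inner[SIDE_MAX] = "";

			/* nested conflict: normalize it, keep its markers */
			if (handle_conflict(in, inner, sizeof(inner), NULL, NULL) < 0)
				return -1;
			append(hunk == SIDE_1 ? one : two, SIDE_MAX,
			       inner, strlen(inner));
		} else if (is_marker(line, '|')) {
			if (hunk != SIDE_1)
				return -1;
			hunk = ORIGINAL;
		} else if (is_marker(line, '=')) {
			if (hunk != SIDE_1 && hunk != ORIGINAL)
				return -1;
			hunk = SIDE_2;
		} else if (is_marker(line, '>')) {
			const char *first = strcmp(one, two) > 0 ? two : one;
			const char *second = first == one ? two : one;
			size_t a = strlen(first) + 1, b = strlen(second) + 1;

			if (hunk != SIDE_2)
				return -1;
			put_marker(out, cap, '<');
			append(out, cap, first, a - 1);
			put_marker(out, cap, '=');
			append(out, cap, second, b - 1);
			put_marker(out, cap, '>');
			if (id) {
				memcpy(id, first, a);	/* includes the NUL */
				memcpy(id + a, second, b);
				*id_len = a + b;
			}
			return 1;
		} else if (hunk == SIDE_1) {
			append(one, sizeof(one), line, n);
		} else if (hunk == SIDE_2) {
			append(two, sizeof(two), line, n);
		}	/* ORIGINAL (diff3 ancestor) lines are dropped */
	}
	return -1;	/* input ended before the closing marker */
}
```

The caller passes an empty `out` buffer and an input positioned just
after the outer "<<<<<<<" line.  For the example conflict above, the
inner hunk comes out sorted as 2 then 3, and the bytes collected in
`id` contain the inner markers, matching the ID described in the commit
message.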

 Documentation/technical/rerere.txt | 40 ++++++++++++++++++++++++++++++
 rerere.c                           | 14 ++++++++---
 t/t4200-rerere.sh                  | 38 ++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 2c517fe0fc..7077ab4a08 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -140,3 +140,43 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Same as before, labels on conflict markers and diff3 output are
+stripped, and the conflict hunks are sorted, for both the outer and
+the inner conflict.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers and sorting
+the hunks, the conflict would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    2
+    =======
+    3
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n2\n=======\n3\n>>>>>>><NUL>')`
diff --git a/rerere.c b/rerere.c
index fac90663b0..f611db7873 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size))
-			goto bad;
-		else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				goto bad;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 5ce411b70d..f433848ccb 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -602,4 +602,42 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
 	git rerere clear
 '
 
+test_expect_success 'rerere with inner conflict markers' '
+	git reset --hard &&
+
+	git checkout -b A master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b B master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+
+	echo "resolved" >test &&
+	git add test &&
+	git commit -q -m "solved conflict" &&
+
+	echo "resolved" >expect &&
+
+	git reset --hard HEAD~~ &&
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+	cat test >actual &&
+	test_cmp expect actual
+'
+
+
 test_done
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (8 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  11 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 7 +++----
 t/t4200-rerere.sh | 7 +++++++
 2 files changed, 10 insertions(+), 4 deletions(-)
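
As an aid for reviewing the change to do_plain_rerere(), here is a
hedged sketch (not code from the patch) that reduces the two versions
of the condition to pure functions.  `ret` is the handle_file() result
(1: conflicts found, 0: none, -1: error) and `has_string` says whether
the path already has an entry in MERGE_RR.  The `ACT_*` names and both
function names are made up for this illustration.

```c
#include <assert.h>

enum {
	ACT_NONE = 0,
	ACT_DROP_OLD_ID = 1,	/* remove_variant() + string_list_remove() */
	ACT_ASSIGN_NEW_ID = 2	/* fall through to new_rerere_id() */
};

/* Logic as of 06/10: a stale ID is dropped only when parsing failed. */
static int actions_before(int ret, int has_string)
{
	int act = ACT_NONE;

	if (ret < 0 && has_string)
		act |= ACT_DROP_OLD_ID;
	if (ret < 1 || has_string)
		return act;		/* the loop would 'continue' here */
	return act | ACT_ASSIGN_NEW_ID;
}

/* Logic after this patch: any remaining conflict gets a fresh ID. */
static int actions_after(int ret, int has_string)
{
	int act = ACT_NONE;

	if (ret != 0 && has_string)
		act |= ACT_DROP_OLD_ID;
	if (ret < 1)
		return act;		/* the loop would 'continue' here */
	return act | ACT_ASSIGN_NEW_ID;
}
```

The only input for which the two differ is `ret == 1` with an existing
MERGE_RR entry: previously nothing happened and the old ID was reused,
now the old entry is dropped and a new ID is assigned.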

diff --git a/rerere.c b/rerere.c
index f611db7873..644f185180 100644
--- a/rerere.c
+++ b/rerere.c
@@ -818,7 +818,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret, has_string;
+		int ret;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -826,12 +826,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		has_string = string_list_has_string(rr, path);
-		if (ret < 0 && has_string) {
+		if (ret != 0 && string_list_has_string(rr, path)) {
 			remove_variant(string_list_lookup(rr, path)->util);
 			string_list_remove(rr, path, 1);
 		}
-		if (ret < 1 || has_string)
+		if (ret < 1)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index f433848ccb..9578215ff2 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -636,6 +636,13 @@ test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 	cat test >actual &&
+	test_cmp expect actual &&
+
+	git add test &&
+	git commit -m "rerere solved conflict" &&
+	git reset --hard HEAD~ &&
+	test_must_fail git merge A &&
+	cat test >actual &&
 	test_cmp expect actual
 '
 
-- 
2.17.0.410.g65aef3a6c4



* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (9 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
@ 2018-07-03 21:05   ` Thomas Gummerer
  2018-07-06 17:56     ` Junio C Hamano
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  11 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-03 21:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On 06/05, Thomas Gummerer wrote:
> The previous round was at
> <20180520211210.1248-1-t.gummerer@gmail.com>.
> 
> Thanks Junio for the comments on the previous round.
> 
> Changes since v2:
>  - lowercase the first letter in some error/warning messages before
>    marking them for translation
>  - wrap paths in output in single quotes, for consistency, and to make
>    some of the messages the same as ones that are already translated
>  - mark messages in builtin/rerere.c for translation as well, which I
>    had previously forgotten.
>  - expanded the technical documentation on rerere.  The entire
>    document is basically rewritten.
>  - changed the test in 6/10 to just fake a conflict marker inside of
>    one of the hunks instead of using an inner conflict created by a
>    merge.  This is to make sure the codepath is still hit after we
>    handle inner conflicts properly.
>  - added tests for handling inner conflict markers
>  - added one commit to recalculate the conflict ID when an unresolved
>    conflict is committed, and the subsequent operation conflicts again
>    in the same file.  More explanation in the commit message of that
>    commit.

Now that 2.18 is out (and I'm caught up on the list after being away
from it for a few days), is there any interest in this series? I guess
it was overlooked as it's been sent in the rc phase for 2.18.

I think the most important bit here is 6/10 which fixes a crash that
can happen in "normal" usage of git.  The translation bits are also
nice to have I think, but I could send them in a different series if
that's preferred.

The other patches would be nice to have, but are arguably less
important.

> range-diff below.  A few commits changed enough for range-diff
> to give up showing the differences in those, they are probably best
> reviewed as the whole patch anyway:
>
> [snip]
> 
> Thomas Gummerer (10):
>   rerere: unify error messages when read_cache fails
>   rerere: lowercase error messages
>   rerere: wrap paths in output in sq
>   rerere: mark strings for translation
>   rerere: add some documentation
>   rerere: fix crash when conflict goes unresolved
>   rerere: only return whether a path has conflicts or not
>   rerere: factor out handle_conflict function
>   rerere: teach rerere to handle nested conflicts
>   rerere: recalculate conflict ID when unresolved conflict is committed
> 
>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
>  builtin/rerere.c                   |   4 +-
>  rerere.c                           | 246 ++++++++++++++---------------
>  t/t4200-rerere.sh                  |  67 ++++++++
>  4 files changed, 372 insertions(+), 127 deletions(-)
>  create mode 100644 Documentation/technical/rerere.txt
> 
> -- 
> 2.18.0.rc1.242.g61856ae69
> 


* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
@ 2018-07-06 17:56     ` Junio C Hamano
  2018-07-10 21:37       ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-06 17:56 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> On 06/05, Thomas Gummerer wrote:
>> The previous round was at
>> <20180520211210.1248-1-t.gummerer@gmail.com>.
>> 
>> Thanks Junio for the comments on the previous round.
>> 
>> Changes since v1:
>>  - lowercase the first letter in some error/warning messages before
>>    marking them for translation
>>  - wrap paths in output in single quotes, for consistency, and to make
>>    some of the messages the same as ones that are already translated
>>  - mark messages in builtin/rerere.c for translation as well, which I
>>    had previously forgotten.
>>  - expanded the technical documentation on rerere.  The entire
>>    document is basically rewritten.
>>  - changed the test in 6/10 to just fake a conflict marker inside of
>>    one of the hunks instead of using an inner conflict created by a
>>    merge.  This is to make sure the codepath is still hit after we
>>    handle inner conflicts properly.
>>  - added tests for handling inner conflict markers
>>  - added one commit to recalculate the conflict ID when an unresolved
>>    conflict is committed, and the subsequent operation conflicts again
>>    in the same file.  More explanation in the commit message of that
>>    commit.
>
> Now that 2.18 is out (and I'm caught up on the list after being away
> from it for a few days), is there any interest in this series? I guess
> it was overlooked as it's been sent in the rc phase for 2.18.

I deliberately ignored it, not because I wasn't interested in it, but
because I'd be distracted during the pre-release feature freeze as
I'd be heavily interested in it.

Now is a good time to repost to stir/re-ignite the interest from
others, possibly after rebasing on v2.18.0 and polishing further.

Thanks.

>
> I think the most important bit here is 6/10 which fixes a crash that
> can happen in "normal" usage of git.  The translation bits are also
> nice to have I think, but I could send them in a different series if
> that's preferred.
>
> The other patches would be nice to have, but are arguably less
> important.
>
>> range-diff below.  A few commits changed enough for range-diff
>> to give up showing the differences in those, they are probably best
>> reviewed as the whole patch anyway:
>>
>> [snip]
>> 
>> Thomas Gummerer (10):
>>   rerere: unify error messages when read_cache fails
>>   rerere: lowercase error messages
>>   rerere: wrap paths in output in sq
>>   rerere: mark strings for translation
>>   rerere: add some documentation
>>   rerere: fix crash when conflict goes unresolved
>>   rerere: only return whether a path has conflicts or not
>>   rerere: factor out handle_conflict function
>>   rerere: teach rerere to handle nested conflicts
>>   rerere: recalculate conflict ID when unresolved conflict is committed
>> 
>>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
>>  builtin/rerere.c                   |   4 +-
>>  rerere.c                           | 246 ++++++++++++++---------------
>>  t/t4200-rerere.sh                  |  67 ++++++++
>>  4 files changed, 372 insertions(+), 127 deletions(-)
>>  create mode 100644 Documentation/technical/rerere.txt
>> 
>> -- 
>> 2.18.0.rc1.242.g61856ae69
>> 


* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-07-06 17:56     ` Junio C Hamano
@ 2018-07-10 21:37       ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-10 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/06, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > On 06/05, Thomas Gummerer wrote:
> >> The previous round was at
> >> <20180520211210.1248-1-t.gummerer@gmail.com>.
> >> 
> >> Thanks Junio for the comments on the previous round.
> >> 
> >> Changes since v1:
> >>  - lowercase the first letter in some error/warning messages before
> >>    marking them for translation
> >>  - wrap paths in output in single quotes, for consistency, and to make
> >>    some of the messages the same as ones that are already translated
> >>  - mark messages in builtin/rerere.c for translation as well, which I
> >>    had previously forgotten.
> >>  - expanded the technical documentation on rerere.  The entire
> >>    document is basically rewritten.
> >>  - changed the test in 6/10 to just fake a conflict marker inside of
> >>    one of the hunks instead of using an inner conflict created by a
> >>    merge.  This is to make sure the codepath is still hit after we
> >>    handle inner conflicts properly.
> >>  - added tests for handling inner conflict markers
> >>  - added one commit to recalculate the conflict ID when an unresolved
> >>    conflict is committed, and the subsequent operation conflicts again
> >>    in the same file.  More explanation in the commit message of that
> >>    commit.
> >
> > Now that 2.18 is out (and I'm caught up on the list after being away
> > from it for a few days), is there any interest in this series? I guess
> > it was overlooked as it's been sent in the rc phase for 2.18.
> 
> I deliberately ignored it, not because I wasn't interested in it, but
> because I'd be distracted during the pre-release feature freeze as
> I'd be heavily interested in it.
> 
> Now is a good time to repost to stir/re-ignite the interest from
> others, possibly after rebasing on v2.18.0 and polishing further.

I sometimes find it hard to gauge whether there are no replies because
nobody is interested in the series, or if it is because it was ignored
or slipped through the cracks.  I guess I could have inferred it from your
replies to the previous iteration though :)

I'll go back and polish my patches, and then send a new iteration,
thanks! 

> Thanks.
> 
> >
> > I think the most important bit here is 6/10 which fixes a crash that
> > can happen in "normal" usage of git.  The translation bits are also
> > nice to have I think, but I could send them in a different series if
> > that's preferred.
> >
> > The other patches would be nice to have, but are arguably less
> > important.
> >
> >> range-diff below.  A few commits changed enough for range-diff
> >> to give up showing the differences in those, they are probably best
> >> reviewed as the whole patch anyway:
> >>
> >> [snip]
> >> 
> >> Thomas Gummerer (10):
> >>   rerere: unify error messages when read_cache fails
> >>   rerere: lowercase error messages
> >>   rerere: wrap paths in output in sq
> >>   rerere: mark strings for translation
> >>   rerere: add some documentation
> >>   rerere: fix crash when conflict goes unresolved
> >>   rerere: only return whether a path has conflicts or not
> >>   rerere: factor out handle_conflict function
> >>   rerere: teach rerere to handle nested conflicts
> >>   rerere: recalculate conflict ID when unresolved conflict is committed
> >> 
> >>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
> >>  builtin/rerere.c                   |   4 +-
> >>  rerere.c                           | 246 ++++++++++++++---------------
> >>  t/t4200-rerere.sh                  |  67 ++++++++
> >>  4 files changed, 372 insertions(+), 127 deletions(-)
> >>  create mode 100644 Documentation/technical/rerere.txt
> >> 
> >> -- 
> >> 2.18.0.rc1.242.g61856ae69
> >> 


* [PATCH v3 00/11] rerere: handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (10 preceding siblings ...)
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
@ 2018-07-14 21:44   ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
                       ` (12 more replies)
  11 siblings, 13 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

The previous rounds were at
<20180520211210.1248-1-t.gummerer@gmail.com> and
<20180605215219.28783-1-t.gummerer@gmail.com>.

This round is a more polished version of the previous round, as
suggested by Junio in <xmqq1scgmemy.fsf@gitster-ct.c.googlers.com>.
It's also rebased on v2.18.

The series grew by one patch because 8/10 has been split into two
patches, hopefully making it easier to follow.

range-diff below:

1:  2825342cc2 = 1:  018bd68a8a rerere: unify error messages when read_cache fails
2:  d1500028aa ! 2:  281fcbf24f rerere: lowercase error messages
    @@ -4,7 +4,8 @@
     
         Documentation/CodingGuidelines mentions that error messages should be
         lowercase.  Prior to marking them for translation follow that pattern
    -    in rerere as well.
    +    in rerere as well, so translators won't have to translate messages
    +    that don't conform to our guidelines.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -87,3 +88,12 @@
      
      	/* Nuke the recorded resolution for the conflict */
      	id = new_rerere_id(sha1);
    +@@
    + 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
    + 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
    + 			free(cur.ptr);
    +-			error("Failed to update conflicted state in '%s'", path);
    ++			error("failed to update conflicted state in '%s'", path);
    + 			goto fail_exit;
    + 		}
    + 		cleanly_resolved = !try_merge(id, path, &cur, &result);
3:  ed3601ee71 = 3:  b6d5e2e26d rerere: wrap paths in output in sq
4:  6ead84a199 ! 4:  45f0d7a99f rerere: mark strings for translation
    @@ -4,7 +4,7 @@
     
         'git rerere' is considered a plumbing command and as such its output
         should be translated.  Its functionality is also only enabled through
    -    a config setting, so scripts really shouldn't rely on its output
    +    a config setting, so scripts really shouldn't rely on the output
         either way.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
    @@ -219,8 +219,8 @@
      		handle_cache(path, sha1, rerere_path(id, "thisimage"));
      		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
      			free(cur.ptr);
    --			error("Failed to update conflicted state in '%s'", path);
    -+			error(_("Failed to update conflicted state in '%s'"), path);
    +-			error("failed to update conflicted state in '%s'", path);
    ++			error(_("failed to update conflicted state in '%s'"), path);
      			goto fail_exit;
      		}
      		cleanly_resolved = !try_merge(id, path, &cur, &result);
5:  caad276aca ! 5:  993857a816 rerere: add some documentation
    @@ -1,6 +1,6 @@
     Author: Thomas Gummerer <t.gummerer@gmail.com>
     
    -    rerere: add some documentation
    +    rerere: add documentation for conflict normalization
     
         Add some documentation for the logic behind the conflict normalization
         in rerere.
    @@ -27,30 +27,28 @@
     +when different conflict style settings are used, rerere normalizes the
     +conflicts before writing them to the rerere database.
     +
    -+Differnt conflict styles and branch names are dealt with by stripping
    -+that information from the conflict markers, and removing extraneous
    -+information from the `diff3` conflict style.
    -+
    -+Branches being merged in different order are dealt with by sorting the
    -+conflict hunks.  More on each of those parts in the following
    -+sections.
    ++Different conflict styles and branch names are normalized by stripping
    ++the labels from the conflict markers, and removing extraneous
    ++information from the `diff3` conflict style. Branches that are merged
    ++in different order are normalized by sorting the conflict hunks.  More
    ++on each of those steps in the following sections.
     +
     +Once these two normalization operations are applied, a conflict ID is
    -+created based on the normalized conflict, which is later used by
    ++calculated based on the normalized conflict, which is later used by
     +rerere to look up the conflict in the rerere database.
     +
     +Stripping extraneous information
     +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +
     +Say we have three branches AB, AC and AC2.  The common ancestor of
    -+these branches has a file with with a line with the string "A" (for
    -+brevity this line is called "line A" for brevity in the following) in
    -+it.  In branch AB this line is changed to "B", in AC, this line is
    -+changed to C, and branch AC2 is forked off of AC, after the line was
    -+changed to C.
    ++these branches has a file with a line containing the string "A" (for
    ++brevity this is called "line A" in the rest of the document).  In
    ++branch AB this line is changed to "B", in AC, this line is changed to
    ++"C", and branch AC2 is forked off of AC, after the line was changed to
    ++"C".
     +
    -+Now forking a branch ABAC off of branch AB and then merging AC into it,
    -+we'd get a conflict like the following:
    ++Forking a branch ABAC off of branch AB and then merging AC into it, we
    ++get a conflict like the following:
     +
     +    <<<<<<< HEAD
     +    B
    @@ -58,9 +56,9 @@
     +    C
     +    >>>>>>> AC
     +
    -+Now doing the analogous with AC2 (forking a branch ABAC2 off of branch
    -+AB and then merging branch AC2 into it), maybe using the diff3
    -+conflict style, we'd get a conflict like the following:
    ++Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
    ++and then merging branch AC2 into it), using the diff3 conflict style,
    ++we get a conflict like the following:
     +
     +    <<<<<<< HEAD
     +    B
6:  ad88a6b8a8 = 6:  a7a0f657f3 rerere: fix crash when conflict goes unresolved
7:  15f9efcba6 = 7:  f1afd4b9a4 rerere: only return whether a path has conflicts or not
8:  1490efaad3 ! 8:  1f5cef506a rerere: factor out handle_conflict function
    @@ -3,53 +3,14 @@
         rerere: factor out handle_conflict function
     
         Factor out the handle_conflict function, which handles a single
    -    conflict in a path.  This is a preparation for the next step, where
    -    this function will be re-used.  No functional changes intended.
    +    conflict in a path.  This is in preparation for a subsequent commit,
    +    where this function will be re-used.  No functional changes intended.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
     diff --git a/rerere.c b/rerere.c
     --- a/rerere.c
     +++ b/rerere.c
    -@@
    - 		ferr_puts(str, io->output, &io->wrerror);
    - }
    - 
    --/*
    -- * Write a conflict marker to io->output (if defined).
    -- */
    --static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
    --{
    --	char buf[64];
    --
    --	while (size) {
    --		if (size <= sizeof(buf) - 2) {
    --			memset(buf, ch, size);
    --			buf[size] = '\n';
    --			buf[size + 1] = '\0';
    --			size = 0;
    --		} else {
    --			int sz = sizeof(buf) - 1;
    --
    --			/*
    --			 * Make sure we will not write everything out
    --			 * in this round by leaving at least 1 byte
    --			 * for the next round, giving the next round
    --			 * a chance to add the terminating LF.  Yuck.
    --			 */
    --			if (size <= sz)
    --				sz -= (sz - size) + 1;
    --			memset(buf, ch, sz);
    --			buf[sz] = '\0';
    --			size -= sz;
    --		}
    --		rerere_io_putstr(buf, io);
    --	}
    --}
    --
    - static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
    - {
    - 	if (io->output)
     @@
      	return isspace(*buf);
      }
    @@ -67,14 +28,7 @@
     - * hunks and -1 if an error occured.
     - */
     -static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
    -+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
    -+{
    -+	strbuf_addchars(buf, ch, size);
    -+	strbuf_addch(buf, '\n');
    -+}
    -+
    -+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
    -+			   int marker_size, git_SHA_CTX *ctx)
    ++static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
      {
     -	git_SHA_CTX ctx;
     -	int has_conflicts = 0;
    @@ -88,38 +42,39 @@
     -
     -	if (sha1)
     -		git_SHA1_Init(&ctx);
    --
    -+	int has_conflicts = 1;
    ++	int has_conflicts = -1;
    + 
      	while (!io->getline(&buf, io)) {
    --		if (is_cmarker(buf.buf, '<', marker_size)) {
    + 		if (is_cmarker(buf.buf, '<', marker_size)) {
     -			if (hunk != RR_CONTEXT)
     -				goto bad;
     -			hunk = RR_SIDE_1;
    --		} else if (is_cmarker(buf.buf, '|', marker_size)) {
    -+		if (is_cmarker(buf.buf, '<', marker_size))
    -+			goto bad;
    -+		else if (is_cmarker(buf.buf, '|', marker_size)) {
    ++			break;
    + 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
      			if (hunk != RR_SIDE_1)
    - 				goto bad;
    +-				goto bad;
    ++				break;
      			hunk = RR_ORIGINAL;
    -@@
    - 				goto bad;
    + 		} else if (is_cmarker(buf.buf, '=', marker_size)) {
    + 			if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
    +-				goto bad;
    ++				break;
    + 			hunk = RR_SIDE_2;
    + 		} else if (is_cmarker(buf.buf, '>', marker_size)) {
    + 			if (hunk != RR_SIDE_2)
    +-				goto bad;
    ++				break;
      			if (strbuf_cmp(&one, &two) > 0)
      				strbuf_swap(&one, &two);
    --			has_conflicts = 1;
    + 			has_conflicts = 1;
     -			hunk = RR_CONTEXT;
    --			rerere_io_putconflict('<', marker_size, io);
    --			rerere_io_putmem(one.buf, one.len, io);
    --			rerere_io_putconflict('=', marker_size, io);
    --			rerere_io_putmem(two.buf, two.len, io);
    --			rerere_io_putconflict('>', marker_size, io);
    + 			rerere_io_putconflict('<', marker_size, io);
    + 			rerere_io_putmem(one.buf, one.len, io);
    + 			rerere_io_putconflict('=', marker_size, io);
    + 			rerere_io_putmem(two.buf, two.len, io);
    + 			rerere_io_putconflict('>', marker_size, io);
     -			if (sha1) {
     -				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
    -+			rerere_strbuf_putconflict(out, '<', marker_size);
    -+			strbuf_addbuf(out, &one);
    -+			rerere_strbuf_putconflict(out, '=', marker_size);
    -+			strbuf_addbuf(out, &two);
    -+			rerere_strbuf_putconflict(out, '>', marker_size);
     +			if (ctx) {
     +				git_SHA1_Update(ctx, one.buf ? one.buf : "",
      					    one.len + 1);
    @@ -129,7 +84,7 @@
      			}
     -			strbuf_reset(&one);
     -			strbuf_reset(&two);
    -+			goto out;
    ++			break;
      		} else if (hunk == RR_SIDE_1)
      			strbuf_addbuf(&one, &buf);
      		else if (hunk == RR_ORIGINAL)
    @@ -143,9 +98,6 @@
     -		hunk = 99; /* force error exit */
     -		break;
      	}
    -+bad:
    -+	has_conflicts = -1;
    -+out:
      	strbuf_release(&one);
      	strbuf_release(&two);
      	strbuf_release(&buf);
    @@ -169,24 +121,20 @@
     +{
     +	git_SHA_CTX ctx;
     +	struct strbuf buf = STRBUF_INIT;
    -+	struct strbuf out = STRBUF_INIT;
     +	int has_conflicts = 0;
     +	if (sha1)
     +		git_SHA1_Init(&ctx);
     +
     +	while (!io->getline(&buf, io)) {
     +		if (is_cmarker(buf.buf, '<', marker_size)) {
    -+			has_conflicts = handle_conflict(&out, io, marker_size,
    -+							    sha1 ? &ctx : NULL);
    ++			has_conflicts = handle_conflict(io, marker_size,
    ++							sha1 ? &ctx : NULL);
     +			if (has_conflicts < 0)
     +				break;
    -+			rerere_io_putmem(out.buf, out.len, io);
    -+			strbuf_reset(&out);
     +		} else
     +			rerere_io_putstr(buf.buf, io);
     +	}
     +	strbuf_release(&buf);
    -+	strbuf_release(&out);
     +
      	if (sha1)
      		git_SHA1_Final(sha1, &ctx);
-:  ---------- > 9:  8ac0d3e903 rerere: return strbuf from handle path
9:  6619650c42 ! 10:  ef84fdc201 rerere: teach rerere to handle nested conflicts
    @@ -27,7 +27,7 @@
             >>>>>>> branch-2
             >>>>>>> branch-3~
     
    -    it would be reordered as follows in the preimage:
    +    it would be recorded as follows in the preimage:
     
             <<<<<<<
             1
    @@ -40,14 +40,16 @@
             >>>>>>>
     
         and the conflict ID would be calculated as
    +
             sha1(1<NUL><<<<<<<
             2
             =======
             3
             >>>>>>><NUL>)
     
    -    Stripping out vs. leaving the conflict markers in place should have no
    -    practical impact, but it simplifies the implementation.
    +    Stripping out vs. leaving the conflict markers in place in the inner
    +    conflict should have no practical impact, but it simplifies the
    +    implementation.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -63,9 +65,11 @@
     +~~~~~~~~~~~~~~~~
     +
     +Nested conflicts are handled very similarly to "simple" conflicts.
    -+Same as before, labels on conflict markers and diff3 output is
    -+stripped, and the conflict hunks are sorted, for both the outer and
    -+the inner conflict.
    ++Similar to simple conflicts, the conflict is first normalized by
    ++stripping the labels from conflict markers, stripping the diff3
    ++output, and then sorting the conflict hunks, both for the outer and the
    ++inner conflict.  This is done recursively, so any number of nested
    ++conflicts can be handled.
     +
     +The only difference is in how the conflict ID is calculated.  For the
     +inner conflict, the conflict markers themselves are not stripped out
    @@ -83,16 +87,16 @@
     +    >>>>>>> branch-2
     +    >>>>>>> branch-3~
     +
    -+After stripping out the labels of the conflict markers, the conflict
    -+would look as follows:
    ++After stripping out the labels of the conflict markers, and sorting
    ++the hunks, the conflict would look as follows:
     +
     +    <<<<<<<
     +    1
     +    =======
     +    <<<<<<<
    -+    3
    -+    =======
     +    2
    ++    =======
    ++    3
     +    >>>>>>>
     +    >>>>>>>
     +
    @@ -108,23 +112,21 @@
      	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
     -	struct strbuf buf = STRBUF_INIT;
     +	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
    - 	int has_conflicts = 1;
    + 	int has_conflicts = -1;
    + 
      	while (!io->getline(&buf, io)) {
    --		if (is_cmarker(buf.buf, '<', marker_size))
    --			goto bad;
    --		else if (is_cmarker(buf.buf, '|', marker_size)) {
    -+		if (is_cmarker(buf.buf, '<', marker_size)) {
    + 		if (is_cmarker(buf.buf, '<', marker_size)) {
    +-			break;
     +			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
    -+				goto bad;
    ++				break;
     +			if (hunk == RR_SIDE_1)
     +				strbuf_addbuf(&one, &conflict);
     +			else
     +				strbuf_addbuf(&two, &conflict);
     +			strbuf_release(&conflict);
    -+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
    + 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
      			if (hunk != RR_SIDE_1)
    - 				goto bad;
    - 			hunk = RR_ORIGINAL;
    + 				break;
     
     diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
     --- a/t/t4200-rerere.sh
    @@ -169,6 +171,5 @@
     +	cat test >actual &&
     +	test_cmp expect actual
     +'
    -+
     +
      test_done
10:  4b11dce7dd = 11:  35a826908f rerere: recalculate conflict ID when unresolved conflict is committed
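
To make the normalization described in the documentation patch (5/11)
easier to picture, here is a minimal standalone sketch for simple,
non-nested conflicts.  It is not the code from rerere.c: the names
normalize_conflicts(), is_marker() and append() are hypothetical, the
marker size is assumed to be fixed at 7, fixed-size buffers stand in
for strbufs, and the marker check is looser than the real is_cmarker().
It strips the labels from the markers, drops the diff3 "original"
section, and sorts the two sides of each hunk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER_SIZE 7	/* assumption: default conflict marker size */
#define SIDE_MAX 1024	/* assumption: fixed buffers instead of strbufs */

enum hunk_state { CONTEXT, SIDE_1, ORIGINAL, SIDE_2 };

/* Simplified marker test: MARKER_SIZE copies of ch, then ' ', '\n' or end. */
static int is_marker(const char *line, size_t len, char ch)
{
	size_t i;

	if (len < MARKER_SIZE)
		return 0;
	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return len == MARKER_SIZE ||
	       line[MARKER_SIZE] == ' ' || line[MARKER_SIZE] == '\n';
}

/* Append n bytes of src to the NUL-terminated dst; -1 if it does not fit. */
static int append(char *dst, size_t dstsz, const char *src, size_t n)
{
	size_t used = strlen(dst);

	if (used + n + 1 > dstsz)
		return -1;
	memcpy(dst + used, src, n);
	dst[used + n] = '\0';
	return 0;
}

/* Returns 0 on success, -1 on malformed input or when a buffer is too small. */
int normalize_conflicts(const char *in, char *out, size_t outsz)
{
	char one[SIDE_MAX] = "", two[SIDE_MAX] = "";
	enum hunk_state state = CONTEXT;
	const char *p = in;

	assert(in && out && outsz > 0);
	out[0] = '\0';
	while (*p) {
		const char *nl = strchr(p, '\n');
		size_t len = nl ? (size_t)(nl - p) + 1 : strlen(p);
		int rc = 0;

		if (is_marker(p, len, '<')) {
			if (state != CONTEXT)
				return -1;	/* nested conflicts are out of scope here */
			state = SIDE_1;
		} else if (is_marker(p, len, '|')) {
			if (state != SIDE_1)
				return -1;
			state = ORIGINAL;
		} else if (is_marker(p, len, '=')) {
			if (state != SIDE_1 && state != ORIGINAL)
				return -1;
			state = SIDE_2;
		} else if (is_marker(p, len, '>')) {
			const char *first = one, *second = two;

			if (state != SIDE_2)
				return -1;
			if (strcmp(one, two) > 0) {	/* sort the two sides */
				first = two;
				second = one;
			}
			/* Emit label-free markers around the sorted sides. */
			if (append(out, outsz, "<<<<<<<\n", 8) ||
			    append(out, outsz, first, strlen(first)) ||
			    append(out, outsz, "=======\n", 8) ||
			    append(out, outsz, second, strlen(second)) ||
			    append(out, outsz, ">>>>>>>\n", 8))
				return -1;
			one[0] = two[0] = '\0';
			state = CONTEXT;
		} else if (state == SIDE_1) {
			rc = append(one, sizeof(one), p, len);
		} else if (state == ORIGINAL) {
			/* diff3 common-ancestor text is dropped */
		} else if (state == SIDE_2) {
			rc = append(two, sizeof(two), p, len);
		} else {
			rc = append(out, outsz, p, len);
		}
		if (rc)
			return -1;
		p += len;
	}
	return state == CONTEXT ? 0 : -1;
}
```

With this sketch, a "merge"-style conflict between B and C and a
diff3-style conflict with the sides in the opposite order both come out
as the same label-free text, which is the property that lets the same
recorded resolution be looked up for either of them.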


Thomas Gummerer (11):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add documentation for conflict normalization
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: return strbuf from handle path
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
 builtin/rerere.c                   |   4 +-
 rerere.c                           | 243 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  66 ++++++++
 4 files changed, 366 insertions(+), 129 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt
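
For the conflict ID calculation quoted in the range-diff for the nested
conflicts patch, the following sketch shows which bytes would be fed to
the hash for one outer conflict.  It is only an illustration under
stated assumptions: conflict_id_input() is a hypothetical helper, each
side is assumed to be a NUL-terminated string that ends in a newline,
and the SHA-1 step itself is left out because the C standard library
has no hash function.  As in the patch, each side is hashed together
with its trailing NUL, and the sides are sorted first:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes that would be hashed for one
 * conflict (smaller side, NUL, larger side, NUL) into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t conflict_id_input(const char *one, const char *two,
			 unsigned char *buf, size_t bufsz)
{
	size_t len1, len2;

	assert(one && two && buf);
	if (strcmp(one, two) > 0) {	/* sort the two sides */
		const char *tmp = one;
		one = two;
		two = tmp;
	}
	len1 = strlen(one) + 1;		/* include the trailing NUL */
	len2 = strlen(two) + 1;
	if (len1 + len2 > bufsz)
		return 0;
	memcpy(buf, one, len1);
	memcpy(buf + len1, two, len2);
	return len1 + len2;
}
```

Passing the normalized inner conflict (markers left in place) as one
side and "1\n" as the other yields the side "1" first, followed by the
inner conflict, matching the ordering shown in the commit message.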

-- 
2.18.0.233.g985f88cf7e


* [PATCH v3 01/11] rerere: unify error messages when read_cache fails
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index e0862e2778..473d32a5cd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v3 02/11] rerere: lowercase error messages
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation, follow that pattern
in rerere as well, so translators won't have to translate messages
that don't conform to our guidelines.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/rerere.c b/rerere.c
index 473d32a5cd..c5d9ea171f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno("could not write %s", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error("there were errors while writing %s (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush %s", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in %s", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char *path)
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno("could not write %s", path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno("writing %s failed", path);
 
 out:
 	free(cur.ptr);
@@ -720,7 +720,7 @@ static void update_paths(struct string_list *update)
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die("could not create directory %s", git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error("could not parse conflict hunks in '%s'", path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error("failed to update conflicted state in '%s'", path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
-- 
2.17.0.410.g65aef3a6c4


* [PATCH v3 03/11] rerere: wrap paths in output in sq
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  2 +-
 rerere.c         | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for %s", rerere_path(id, NULL));
+				die("unable to generate diff for '%s'", rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index c5d9ea171f..cde1f6e696 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write %s", output);
+			error_errno("could not write '%s'", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing %s (%s)",
+		error("there were errors while writing '%s' (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush '%s'", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in '%s'", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno("failed utime() on '%s'",
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write %s", path);
+		error_errno("could not write '%s'", path);
 	if (fclose(f))
-		return error_errno("writing %s failed", path);
+		return error_errno("writing '%s' failed", path);
 
 out:
 	free(cur.ptr);
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory %s", git_path_rr_cache());
+		die("could not create directory '%s'", git_path_rr_cache());
 	return 1;
 }
 
@@ -1067,9 +1067,9 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error("no remembered resolution for '%s'", path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno("cannot unlink '%s'", filename);
 		goto fail_exit;
 	}
 
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf(stderr, "Forgot resolution for '%s'\n", path);
 	return 0;
 
 fail_exit:
-- 
2.17.0.410.g65aef3a6c4


* [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (2 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-15 13:24       ` Simon Ruderich
  2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
                       ` (8 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a porcelain command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on the output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
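Not part of the commit message: most of the messages converted here
also move from fprintf() with a trailing "\n" to fprintf_ln(), which
keeps the newline out of the string handed to translators.  Below is a
minimal, self-contained sketch of that idea.  It is not git's
implementation: format_ln() is a hypothetical helper that writes into
a caller-supplied buffer, and _() is a stand-in macro that returns its
argument unchanged instead of calling gettext.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for gettext's _(): returns the message untranslated. */
#define _(msgid) (msgid)

/*
 * Hypothetical helper: format into buf and append exactly one newline.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the result does not fit in "size" bytes.
 */
static int format_ln(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	/* Need room for n bytes of text, the newline and the NUL. */
	if (n < 0 || (size_t)n + 1 >= size)
		return -1;
	buf[n] = '\n';
	buf[n + 1] = '\0';
	return n + 1;
}
```

Calling format_ln(buf, sizeof(buf), _("Recorded preimage for '%s'"),
"a.c") leaves the newline-terminated message in buf, while the string
marked for translation carries no "\n" of its own.
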
 builtin/rerere.c |  4 +--
 rerere.c         | 68 ++++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[0], "forget")) {
 		struct pathspec pathspec;
 		if (argc < 2)
-			warning("'git rerere forget' without paths is deprecated");
+			warning(_("'git rerere forget' without paths is deprecated"));
 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
 			       prefix, argv + 1);
 		return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
+				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index cde1f6e696..be98c0afcb 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write '%s'", output);
+			error_errno(_("could not write '%s'"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing '%s' (%s)",
+		error(_("there were errors while writing '%s' (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush '%s'", path);
+		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on '%s'",
+		warning_errno(_("failed utime() on '%s'"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write '%s'", path);
+		error_errno(_("could not write '%s'"), path);
 	if (fclose(f))
-		return error_errno("writing '%s' failed", path);
+		return error_errno(_("writing '%s' failed"), path);
 
 out:
 	free(cur.ptr);
@@ -714,13 +714,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("unable to write new index file");
+		die(_("unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -752,7 +752,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -786,9 +786,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -802,11 +802,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory '%s'", git_path_rr_cache());
+		die(_("could not create directory '%s'"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("failed to update conflicted state in '%s'", path);
+			error(_("failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1060,16 +1060,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for '%s'", path);
+			error(_("no remembered resolution for '%s'"), path);
 		else
-			error_errno("cannot unlink '%s'", filename);
+			error_errno(_("cannot unlink '%s'"), filename);
 		goto fail_exit;
 	}
 
@@ -1079,7 +1079,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
+	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
 	return 0;
 
 fail_exit:
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1191,7 +1191,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.410.g65aef3a6c4


* [PATCH v3 05/11] rerere: add documentation for conflict normalization
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (3 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:50       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                       ` (7 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
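Not part of the commit message: a minimal, self-contained sketch of
the normalization that the new document describes, in case it helps
review.  It is not the code in rerere.c.  normalize(), append() and
is_marker() are hypothetical helpers, the marker length is fixed at 7,
the buffers are fixed-size, the caller must pass an output buffer of
at least one byte, and nested conflicts are simply rejected.

```c
#include <string.h>

#define MARKER_SIZE 7	/* assumed fixed; git lets this vary per path */

enum state { CONTEXT, SIDE_1, ORIGINAL, SIDE_2 };

/* Simplified marker test: MARKER_SIZE copies of ch, then space/newline/end. */
static int is_marker(const char *line, char ch)
{
	int i;

	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return line[i] == ' ' || line[i] == '\n' || line[i] == '\0';
}

/* Append n bytes of src to the string in dst; -1 if it does not fit. */
static int append(char *dst, size_t dstsz, const char *src, size_t n)
{
	size_t len = strlen(dst);

	if (len + n + 1 > dstsz)
		return -1;
	memcpy(dst + len, src, n);
	dst[len + n] = '\0';
	return 0;
}

/*
 * Write a normalized copy of "in" to "out".  Returns 1 if a conflict
 * was found, 0 if there was none, -1 on malformed or oversized input.
 */
static int normalize(const char *in, char *out, size_t outsz)
{
	char one[512] = "", two[512] = "";
	enum state st = CONTEXT;
	int found = 0;

	out[0] = '\0';
	while (*in) {
		const char *eol = strchr(in, '\n');
		size_t n = eol ? (size_t)(eol - in) + 1 : strlen(in);
		int err = 0;

		if (is_marker(in, '<')) {
			if (st != CONTEXT)
				return -1;	/* nested conflict */
			st = SIDE_1;
		} else if (is_marker(in, '|')) {
			if (st != SIDE_1)
				return -1;
			st = ORIGINAL;
		} else if (is_marker(in, '=')) {
			if (st != SIDE_1 && st != ORIGINAL)
				return -1;
			st = SIDE_2;
		} else if (is_marker(in, '>')) {
			/* Sorting the sides makes "B or C" and "C or B" identical. */
			int swap = strcmp(one, two) > 0;
			const char *a = swap ? two : one;
			const char *b = swap ? one : two;

			if (st != SIDE_2)
				return -1;
			/* Markers are re-emitted without their labels. */
			err |= append(out, outsz, "<<<<<<<\n", 8);
			err |= append(out, outsz, a, strlen(a));
			err |= append(out, outsz, "=======\n", 8);
			err |= append(out, outsz, b, strlen(b));
			err |= append(out, outsz, ">>>>>>>\n", 8);
			one[0] = two[0] = '\0';
			st = CONTEXT;
			found = 1;
		} else if (st == SIDE_1) {
			err = append(one, sizeof(one), in, n);
		} else if (st == SIDE_2) {
			err = append(two, sizeof(two), in, n);
		} else if (st == CONTEXT) {
			err = append(out, outsz, in, n);
		}
		/* Lines in the ORIGINAL (diff3) section are dropped. */
		if (err)
			return -1;
		in += n;
	}
	return st == CONTEXT ? found : -1;
}
```

With this sketch, the plain "HEAD"/"AC" conflict and the diff3-style
"HEAD"/"AC2" conflict from the document normalize to the same text,
and so does the same conflict with its two sides swapped.  A stray
closing marker outside a hunk makes normalize() return -1.
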
 Documentation/technical/rerere.txt | 140 +++++++++++++++++++++++++++++
 rerere.c                           |   4 -
 2 files changed, 140 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..4102cce7aa
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing extraneous
+information from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible with what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, let's imagine that a common ancestor had a file with line A
+in its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part became Y or Z", and whenever
+any of these four patterns appears, we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line B, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index be98c0afcb..da1ab54027 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.410.g65aef3a6c4


* [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (4 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:50       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
                       ` (6 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>' will fail, or even worse in the case of 'rerere
clear' segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, and remove the corresponding variant from
.git/rr-cache/.  Removing it unconditionally is fine here, because if
the user had resolved the conflict and run rerere, the entry would no
longer be in the MERGE_RR file, so we wouldn't have this problem in
the first place, while if the conflict was not resolved, the only
thing that's left in the folder is the 'preimage', which git will
regenerate if necessary, so the user won't lose any work.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
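Not part of the commit message: the new control flow in
do_plain_rerere() can be summarized as a small decision function.
This is only a sketch for review; classify() and the enum names are
hypothetical and do not exist in rerere.c.  "ret" stands for the
result of handle_file() and "has_string" for whether MERGE_RR already
lists the path.

```c
enum action {
	SKIP,			/* nothing to do for this path */
	DROP_STALE_ENTRY,	/* remove the variant and the MERGE_RR entry, then skip */
	ASSIGN_NEW_ID		/* record a fresh conflict ID for the path */
};

/*
 * ret:        < 0 on a parse error, 0 for no conflict, > 0 for a conflict
 * has_string: non-zero if MERGE_RR already has an entry for the path
 */
static enum action classify(int ret, int has_string)
{
	if (ret < 0 && has_string)
		return DROP_STALE_ENTRY;
	if (ret < 1 || has_string)
		return SKIP;
	return ASSIGN_NEW_ID;
}
```

The case that used to crash is classify(-1, 1): the path is already in
MERGE_RR, but its conflict can no longer be parsed.  Before this patch
such a path was skipped early, so its stale entry stayed behind.
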
 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index da1ab54027..895ad80c0c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 8417e5a4b1..34f0518a5e 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1 &&
+	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
+	mv test.tmp test &&
+	git rerere &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4


* [PATCH v3 07/11] rerere: only return whether a path has conflicts or not
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (5 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:50       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
                       ` (5 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_path' function.  However, all of its callers only
care whether there are conflicts or not, or whether there was an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
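Not part of the commit message: a small sketch of why collapsing the
hunk count is safe.  The callers visible in this series only test the
result with "< 0", "< 1" or "!", and each of those tests gives the
same answer for a hunk count as for the collapsed value.  The function
names below are hypothetical and only serve the illustration.

```c
/* Collapse a hunk count (or a negative error) into -1, 0 or 1. */
static int to_tristate(int hunk_no)
{
	if (hunk_no < 0)
		return -1;
	return hunk_no > 0;
}

/* The three kinds of test that callers apply to the result. */
static int is_error(int ret)      { return ret < 0; }
static int nothing_to_do(int ret) { return ret < 1; }
static int no_conflicts(int ret)  { return !ret; }
```

For any n, is_error(n) equals is_error(to_tristate(n)), and the same
holds for nothing_to_do() and no_conflicts().
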
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 895ad80c0c..bf803043e2 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occurred.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -954,7 +955,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1008,11 +1009,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v3 08/11] rerere: factor out handle_conflict function
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (6 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:51       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
                       ` (4 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is in preparation for a subsequent commit,
where this function will be re-used.  No functional changes intended.
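
The split described above can be sketched outside of git as follows.
This is only an illustration under stated assumptions: it scans an
in-memory, NULL-terminated array of lines instead of going through
struct rerere_io, it assumes 7-character markers, and it neither writes
output nor hashes anything.  The names line_reader, next_line,
starts_marker, scan_conflict and scan_path are hypothetical and do not
exist in rerere.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an in-memory, NULL-terminated array of lines. */
struct line_reader {
	const char **lines;
	size_t pos;
};

static const char *next_line(struct line_reader *r)
{
	const char *line = r->lines[r->pos];
	if (line)
		r->pos++;
	return line;
}

/* Simplified marker test: seven marker characters at the start of the line. */
static int starts_marker(const char *line, char ch)
{
	size_t i;
	for (i = 0; i < 7; i++)
		if (line[i] != ch)
			return 0;
	return 1;
}

/*
 * Consume one conflict whose '<' line has already been read.
 * Returns 1 once the closing '>' is reached in the right state,
 * and -1 if the markers are out of order or the input ends early.
 */
static int scan_conflict(struct line_reader *r)
{
	enum { SIDE_1, ORIGINAL, SIDE_2 } state = SIDE_1;
	const char *line;

	while ((line = next_line(r))) {
		if (starts_marker(line, '<')) {
			return -1; /* nesting is not handled at this step */
		} else if (starts_marker(line, '|')) {
			if (state != SIDE_1)
				return -1;
			state = ORIGINAL;
		} else if (starts_marker(line, '=')) {
			if (state != SIDE_1 && state != ORIGINAL)
				return -1;
			state = SIDE_2;
		} else if (starts_marker(line, '>')) {
			return state == SIDE_2 ? 1 : -1;
		}
	}
	return -1;
}

/* Outer loop: context lines are passed over, each '<' starts a conflict. */
static int scan_path(const char **lines)
{
	struct line_reader r = { lines, 0 };
	const char *line;
	int has_conflicts = 0;

	assert(lines != NULL);
	while ((line = next_line(&r))) {
		if (starts_marker(line, '<')) {
			has_conflicts = scan_conflict(&r);
			if (has_conflicts < 0)
				break;
		}
	}
	return has_conflicts;
}
```

The per-conflict helper starts out pessimistic and only reports success
once it has seen a well-formed '>' line, while the outer loop starts at
0 and stops at the first conflict it cannot parse.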

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 87 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index bf803043e2..2d62251943 100644
--- a/rerere.c
+++ b/rerere.c
@@ -384,85 +384,92 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
+	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
+			break;
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
-				goto bad;
+				break;
 			hunk = RR_ORIGINAL;
 		} else if (is_cmarker(buf.buf, '=', marker_size)) {
 			if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
-				goto bad;
+				break;
 			hunk = RR_SIDE_2;
 		} else if (is_cmarker(buf.buf, '>', marker_size)) {
 			if (hunk != RR_SIDE_2)
-				goto bad;
+				break;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
 			rerere_io_putconflict('=', marker_size, io);
 			rerere_io_putmem(two.buf, two.len, io);
 			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			break;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(io, marker_size,
+							sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v3 09/11] rerere: return strbuf from handle path
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (7 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:51       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
                       ` (3 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently we write the conflict to disk directly in the handle_path
function.  To make it re-usable for nested conflicts, instead of
writing the conflict out directly, store it in a strbuf and let the
caller write it out.

This does mean some slight increase in memory usage, however that
increase is limited to the size of the largest conflict we've
currently processed.  We already keep one copy of the conflict in
memory, and it shouldn't be too large, so the increase in memory usage
seems acceptable.

As a bonus this lets us replace the rerere_io_putconflict function
with a trivial two line function.
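
To illustrate why appending to a growable buffer makes the marker
helper so short, here is a minimal sketch that does not use git's
strbuf.  The type textbuf and the functions textbuf_grow and
textbuf_put_marker are hypothetical stand-ins written for this example
only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for a growable string buffer. */
struct textbuf {
	char *buf;
	size_t len, alloc;
};

/* Make room for 'extra' more bytes plus a trailing NUL. */
static void textbuf_grow(struct textbuf *tb, size_t extra)
{
	size_t need = tb->len + extra + 1;

	if (need > tb->alloc) {
		size_t alloc = tb->alloc ? tb->alloc * 2 : 64;
		char *p;

		if (alloc < need)
			alloc = need;
		p = realloc(tb->buf, alloc);
		if (!p)
			abort();
		tb->buf = p;
		tb->alloc = alloc;
	}
}

/* Append 'size' copies of 'ch' followed by a newline. */
static void textbuf_put_marker(struct textbuf *tb, int ch, size_t size)
{
	assert(tb != NULL);
	textbuf_grow(tb, size + 1);
	memset(tb->buf + tb->len, ch, size);
	tb->len += size;
	tb->buf[tb->len++] = '\n';
	tb->buf[tb->len] = '\0';
}
```

Because the buffer grows as needed, there is no fixed 64-byte scratch
array to work around, so markers of any length are written in one go.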

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 58 ++++++++++++++++++--------------------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index 2d62251943..a35b88916c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,7 +352,14 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
 	enum {
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
@@ -410,11 +385,11 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
 			if (ctx) {
 				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
@@ -451,21 +426,24 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT;
 	int has_conflicts = 0;
 	if (sha1)
 		git_SHA1_Init(&ctx);
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			has_conflicts = handle_conflict(io, marker_size,
+			has_conflicts = handle_conflict(&out, io, marker_size,
 							sha1 ? &ctx : NULL);
 			if (has_conflicts < 0)
 				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
 		} else
 			rerere_io_putstr(buf.buf, io);
 	}
 	strbuf_release(&buf);
+	strbuf_release(&out);
 
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (8 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:45       ` Junio C Hamano
  2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
                       ` (2 subsequent siblings)
  12 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Teach it to handle them by recursively
calling the 'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID, however, is only calculated in the top-level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  E.g., say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be recorded as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as

    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place in the inner
conflict should have no practical impact, but it simplifies the
implementation.
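
The recursive normalization can be sketched in isolation as follows.
This is a simplified illustration, not the code in rerere.c: it reads
from an in-memory array of lines, assumes 7-character markers and
content without embedded NUL bytes, and leaves out the SHA-1 step
entirely.  The names text, cursor, text_add, text_add_marker, text_str,
is_marker and normalize_conflict are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MARKER_SIZE 7 /* assumed marker length */

struct text { char *buf; size_t len; };            /* hypothetical mini buffer */
struct cursor { const char **lines; size_t pos; }; /* NULL-terminated input */

static void text_add(struct text *t, const char *s, size_t n)
{
	char *p = realloc(t->buf, t->len + n + 1);
	if (!p)
		abort();
	if (n)
		memcpy(p + t->len, s, n);
	t->buf = p;
	t->len += n;
	t->buf[t->len] = '\0';
}

static void text_add_marker(struct text *t, char ch)
{
	char m[MARKER_SIZE + 1];
	memset(m, ch, MARKER_SIZE);
	m[MARKER_SIZE] = '\n';
	text_add(t, m, sizeof(m));
}

static const char *text_str(const struct text *t)
{
	return t->buf ? t->buf : "";
}

static int is_marker(const char *line, char ch)
{
	int i;
	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return 1;
}

/*
 * Normalize the conflict whose '<' line was just consumed and append it
 * to out.  Returns 1 on success, -1 on malformed or truncated input.
 */
static int normalize_conflict(struct cursor *in, struct text *out)
{
	enum { SIDE_1, ORIGINAL, SIDE_2 } state = SIDE_1;
	struct text one = { NULL, 0 }, two = { NULL, 0 };
	const char *line;
	int ret = -1;

	assert(in && out);
	while ((line = in->lines[in->pos]) != NULL) {
		in->pos++;
		if (is_marker(line, '<')) {
			/* nested conflict: normalize it, then treat it as content */
			struct text inner = { NULL, 0 };
			if (normalize_conflict(in, &inner) < 0)
				break;
			/* as in the posted patch: anything but side 1 goes to side 2 */
			text_add(state == SIDE_1 ? &one : &two, inner.buf, inner.len);
			free(inner.buf);
		} else if (is_marker(line, '|')) {
			if (state != SIDE_1)
				break;
			state = ORIGINAL;
		} else if (is_marker(line, '=')) {
			if (state == SIDE_2)
				break;
			state = SIDE_2;
		} else if (is_marker(line, '>')) {
			if (state != SIDE_2)
				break;
			if (strcmp(text_str(&one), text_str(&two)) > 0) {
				struct text tmp = one;
				one = two;
				two = tmp;
			}
			text_add_marker(out, '<');
			text_add(out, one.buf, one.len);
			text_add_marker(out, '=');
			text_add(out, two.buf, two.len);
			text_add_marker(out, '>');
			ret = 1;
			break;
		} else if (state == SIDE_1) {
			text_add(&one, line, strlen(line));
		} else if (state == SIDE_2) {
			text_add(&two, line, strlen(line));
		} /* ORIGINAL: common ancestor lines are discarded */
	}
	free(one.buf);
	free(two.buf);
	return ret;
}
```

Fed the lines of the example conflict above (starting after the first
'<' line), this sketch produces the preimage shown above: the inner
conflict is sorted and stripped of labels first, and the result is then
treated as ordinary content of the outer conflict's second side.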

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
 rerere.c                           | 10 +++++--
 t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 4102cce7aa..60d48dc4fe 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,3 +138,45 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the diff3
+output, and the sorting the conflict hunks, both for the outer and the
+inner conflict.  This is done recursively, so any number of nested
+conflicts can be handled.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    2
+    =======
+    3
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
diff --git a/rerere.c b/rerere.c
index a35b88916c..f78bef80b1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			break;
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				break;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				break;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 34f0518a5e..d63fe2b33b 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -602,4 +602,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
 	git rerere clear
 '
 
+test_expect_success 'rerere with inner conflict markers' '
+	git reset --hard &&
+
+	git checkout -b A master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b B master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+
+	echo "resolved" >test &&
+	git add test &&
+	git commit -q -m "solved conflict" &&
+
+	echo "resolved" >expect &&
+
+	git reset --hard HEAD~~ &&
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+	cat test >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4



* [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (9 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-30 17:50     ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
  12 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.
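
The change in behaviour can be summarized as a small decision table.
The sketch below is only a model of the two conditions touched by this
patch: 'ret' stands for the return value of handle_file() and
'has_entry' for whether the path is already listed in MERGE_RR.  The
names RR_DROP_OLD, RR_ASSIGN_NEW, plan_before_patch and
plan_after_patch are hypothetical.

```c
#include <assert.h>

#define RR_DROP_OLD   (1 << 0) /* forget the ID recorded earlier for the path */
#define RR_ASSIGN_NEW (1 << 1) /* compute and record a fresh conflict ID */

/* Old behaviour: an existing entry is kept whenever parsing succeeds. */
static int plan_before_patch(int ret, int has_entry)
{
	int plan = 0;

	assert(has_entry == 0 || has_entry == 1);
	if (ret < 0 && has_entry)
		plan |= RR_DROP_OLD;
	if (ret >= 1 && !has_entry)
		plan |= RR_ASSIGN_NEW;
	return plan;
}

/* New behaviour: a file that still has conflicts always gets a new ID. */
static int plan_after_patch(int ret, int has_entry)
{
	int plan = 0;

	assert(has_entry == 0 || has_entry == 1);
	if (ret != 0 && has_entry)
		plan |= RR_DROP_OLD;
	if (ret >= 1)
		plan |= RR_ASSIGN_NEW;
	return plan;
}
```

The only case that differs is ret >= 1 with an existing entry: before
the patch nothing happens and the stale ID is reused, afterwards the
old ID is dropped and a new one is assigned.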

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 7 +++----
 t/t4200-rerere.sh | 7 +++++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f78bef80b1..dd81d09e19 100644
--- a/rerere.c
+++ b/rerere.c
@@ -815,7 +815,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret, has_string;
+		int ret;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -823,12 +823,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		has_string = string_list_has_string(rr, path);
-		if (ret < 0 && has_string) {
+		if (ret != 0 && string_list_has_string(rr, path)) {
 			remove_variant(string_list_lookup(rr, path)->util);
 			string_list_remove(rr, path, 1);
 		}
-		if (ret < 1 || has_string)
+		if (ret < 1)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index d63fe2b33b..bfb37ed4fc 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -636,6 +636,13 @@ test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 	cat test >actual &&
+	test_cmp expect actual &&
+
+	git add test &&
+	git commit -m "rerere solved conflict" &&
+	git reset --hard HEAD~ &&
+	test_must_fail git merge A &&
+	cat test >actual &&
 	test_cmp expect actual
 '
 
-- 
2.17.0.410.g65aef3a6c4



* Re: [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
@ 2018-07-15 13:24       ` Simon Ruderich
  2018-07-16 20:40         ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Simon Ruderich @ 2018-07-15 13:24 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Junio C Hamano

On Sat, Jul 14, 2018 at 10:44:36PM +0100, Thomas Gummerer wrote:
> 'git rerere' is considered a plumbing command and as such its output

s/plumbing/porcelain/?

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9


* Re: [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-15 13:24       ` Simon Ruderich
@ 2018-07-16 20:40         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-16 20:40 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: git, Junio C Hamano

On 07/15, Simon Ruderich wrote:
> On Sat, Jul 14, 2018 at 10:44:36PM +0100, Thomas Gummerer wrote:
> > 'git rerere' is considered a plumbing command and as such its output
> 
> s/plumbing/porcelain/?

Ah yes indeed.  Thanks for catching!

> Regards
> Simon
> -- 
> + privacy is necessary
> + using gnupg http://gnupg.org
> + public key id: 0x92FEFDB7E44C32F9


* Re: [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts
  2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-07-30 17:45       ` Junio C Hamano
  2018-07-30 20:20         ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:45 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Currently rerere can't handle nested conflicts and will error out when
> it encounters such conflicts.  Teach it to handle them by recursively
> calling the 'handle_conflict' function to normalize the conflict.
>
> The conflict ID calculation here deserves some explanation:
>
> As we are using the same handle_conflict function, the nested conflict
> is normalized the same way as for non-nested conflicts, which means
> the ancestor in the diff3 case is stripped out, and the parts of the
> conflict are ordered alphabetically.
>
> The conflict ID, however, is only calculated in the top-level
> handle_conflict call, so it will include the markers that 'rerere'
> adds to the output.  E.g., say there's the following conflict:
>
>     <<<<<<< HEAD
>     1
>     =======
>     <<<<<<< HEAD
>     3
>     =======
>     2
>     >>>>>>> branch-2
>     >>>>>>> branch-3~

Hmph, I vaguely recall that I made inner merges use conflict
markers automatically lengthened (by two, if I recall correctly)
relative to their immediate outer merge.  Wouldn't the above look more like

     <<<<<<< HEAD
     1
     =======
     <<<<<<<<< HEAD
     3
     =========
     2
     >>>>>>>>> branch-2
     >>>>>>> branch-3~
    
Perhaps I am not recalling it correctly.
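
To make the concern concrete: if inner markers were indeed longer, a
check that looks for exactly marker_size marker characters would not
recognize them and they would be treated as ordinary content.  The
function below is a hypothetical sketch of such a check, written for
this illustration under the assumption that '<' and '>' markers must be
followed by a space and a label; it is not a copy of is_cmarker().

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical marker test: exactly marker_size copies of marker_char,
 * followed by whitespace ('<' and '>' additionally require a space,
 * since they are assumed to carry a label).
 */
static int looks_like_marker(const char *line, int marker_char, int marker_size)
{
	int want_sp = (marker_char == '<' || marker_char == '>');

	assert(marker_size > 0);
	while (marker_size--)
		if (*line++ != marker_char)
			return 0;
	if (want_sp && *line != ' ')
		return 0;
	return isspace((unsigned char)*line) != 0;
}
```

With marker_size 7, a line starting with nine '<' characters fails the
test because the eighth character is another '<' rather than a space.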

> it would be recorded as follows in the preimage:
>
>     <<<<<<<
>     1
>     =======
>     <<<<<<<
>     2
>     =======
>     3
>     >>>>>>>
>     >>>>>>>
>
> and the conflict ID would be calculated as
>
>     sha1(1<NUL><<<<<<<
>     2
>     =======
>     3
>     >>>>>>><NUL>)
>
> Stripping out vs. leaving the conflict markers in place in the inner
> conflict should have no practical impact, but it simplifies the
> implementation.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
>  rerere.c                           | 10 +++++--
>  t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
> index 4102cce7aa..60d48dc4fe 100644
> --- a/Documentation/technical/rerere.txt
> +++ b/Documentation/technical/rerere.txt
> @@ -138,3 +138,45 @@ SHA1('B<NUL>C<NUL>').
>  If there are multiple conflicts in one file, the sha1 is calculated
>  the same way with all hunks appended to each other, in the order in
>  which they appear in the file, separated by a <NUL> character.
> +
> +Nested conflicts
> +~~~~~~~~~~~~~~~~
> +
> +Nested conflicts are handled very similarly to "simple" conflicts.
> +Similar to simple conflicts, the conflict is first normalized by
> +stripping the labels from conflict markers, stripping the diff3
> +output, and the sorting the conflict hunks, both for the outer and the
> +inner conflict.  This is done recursively, so any number of nested
> +conflicts can be handled.
> +
> +The only difference is in how the conflict ID is calculated.  For the
> +inner conflict, the conflict markers themselves are not stripped out
> +before calculating the sha1.
> +
> +Say we have the following conflict for example:
> +
> +    <<<<<<< HEAD
> +    1
> +    =======
> +    <<<<<<< HEAD
> +    3
> +    =======
> +    2
> +    >>>>>>> branch-2
> +    >>>>>>> branch-3~
> +
> +After stripping out the labels of the conflict markers, and sorting
> +the hunks, the conflict would look as follows:
> +
> +    <<<<<<<
> +    1
> +    =======
> +    <<<<<<<
> +    2
> +    =======
> +    3
> +    >>>>>>>
> +    >>>>>>>
> +
> +and finally the conflict ID would be calculated as:
> +`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
> diff --git a/rerere.c b/rerere.c
> index a35b88916c..f78bef80b1 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
>  		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
>  	} hunk = RR_SIDE_1;
>  	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
> -	struct strbuf buf = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
>  	int has_conflicts = -1;
>  
>  	while (!io->getline(&buf, io)) {
>  		if (is_cmarker(buf.buf, '<', marker_size)) {
> -			break;
> +			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
> +				break;
> +			if (hunk == RR_SIDE_1)
> +				strbuf_addbuf(&one, &conflict);
> +			else
> +				strbuf_addbuf(&two, &conflict);

Hmph, do we ever see the inner conflict block while we are skipping
and ignoring the common ancestor version, or is it guaranteed that
we see '<' only while processing either our or their side?

> +			strbuf_release(&conflict);
>  		} else if (is_cmarker(buf.buf, '|', marker_size)) {
>  			if (hunk != RR_SIDE_1)
>  				break;
> diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> index 34f0518a5e..d63fe2b33b 100755
> --- a/t/t4200-rerere.sh
> +++ b/t/t4200-rerere.sh
> @@ -602,4 +602,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
>  	git rerere clear
>  '
>  
> +test_expect_success 'rerere with inner conflict markers' '
> +	git reset --hard &&
> +
> +	git checkout -b A master &&
> +	echo "bar" >test &&
> +	git add test &&
> +	git commit -q -m two &&
> +	echo "baz" >test &&
> +	git add test &&
> +	git commit -q -m three &&
> +
> +	git reset --hard &&
> +	git checkout -b B master &&
> +	echo "foo" >test &&
> +	git add test &&
> +	git commit -q -a -m one &&
> +
> +	test_must_fail git merge A~ &&
> +	git add test &&
> +	git commit -q -m "will solve conflicts later" &&
> +	test_must_fail git merge A &&
> +
> +	echo "resolved" >test &&
> +	git add test &&
> +	git commit -q -m "solved conflict" &&
> +
> +	echo "resolved" >expect &&
> +
> +	git reset --hard HEAD~~ &&
> +	test_must_fail git merge A~ &&
> +	git add test &&
> +	git commit -q -m "will solve conflicts later" &&
> +	test_must_fail git merge A &&
> +	cat test >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done


* Re: [PATCH v3 00/11] rerere: handle nested conflicts
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (10 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
@ 2018-07-30 17:50     ` Junio C Hamano
  2018-07-30 20:49       ` Thomas Gummerer
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
  12 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:50 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Thomas Gummerer (11):
>   rerere: unify error messages when read_cache fails
>   rerere: lowercase error messages
>   rerere: wrap paths in output in sq
>   rerere: mark strings for translation
>   rerere: add documentation for conflict normalization
>   rerere: fix crash when conflict goes unresolved
>   rerere: only return whether a path has conflicts or not
>   rerere: factor out handle_conflict function
>   rerere: return strbuf from handle path
>   rerere: teach rerere to handle nested conflicts
>   rerere: recalculate conflict ID when unresolved conflict is committed

Even though I am not certain about the last two steps, everything
before them looked like trivially correct and good changes (well, the
"strbuf" one's goodness obviously depends on the goodness of the
last two, which are helped by it).

Sorry for taking so long before getting to the series.


* Re: [PATCH v3 05/11] rerere: add documentation for conflict normalization
  2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
@ 2018-07-30 17:50       ` Junio C Hamano
  2018-07-30 20:21         ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:50 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> +Different conflict styles and branch names are normalized by stripping
> +the labels from the conflict markers, and removing extraneous
> +information from the `diff3` conflict style. Branches that are merged

s/extraneous information/common ancestor version/ perhaps, to be
fact-based without passing value judgment?

We drop the common ancestor version only because we cannot normalize
from `merge` style to `diff3` style by adding one, and not because
it is extraneous.  It does help humans understand the conflict a lot
better to have that section.

> +By extension, this means that rerere should recognize that the above
> +conflicts are the same.  To do this, the labels on the conflict
> +markers are stripped, and the diff3 output is removed.  The above

s/diff3 output/common ancestor version/, as "diff3 output" would
mean the whole thing between <<< and >>> to readers.

> diff --git a/rerere.c b/rerere.c
> index be98c0afcb..da1ab54027 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
>   * and NUL concatenated together.
>   *
>   * Return the number of conflict hunks found.
> - *
> - * NEEDSWORK: the logic and theory of operation behind this conflict
> - * normalization may deserve to be documented somewhere, perhaps in
> - * Documentation/technical/rerere.txt.
>   */
>  static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
>  {

Thanks for finally removing this age-old NEEDSWORK comment.


* Re: [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved
  2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-07-30 17:50       ` Junio C Hamano
  2018-07-30 20:45         ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:50 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Currently when a user doesn't resolve a conflict in a file, but
> commits the file with the conflict markers, and later the file ends up
> in a state in which rerere can't handle it, subsequent rerere
> operations that are interested in that path, such as 'rerere clear' or
> 'rerere forget <path>' will fail, or even worse in the case of 'rerere
> clear' segfault.
>
> Such states include nested conflicts, or an extra conflict marker that
> doesn't have any match.
>
> This is because the first 'git rerere' when there was only one
> conflict in the file leaves an entry in the MERGE_RR file behind.  The

I find this sentence, especially the "only one conflict in the file"
part, a bit unclear.  What does the sentence count as one conflict?
One block of lines enclosed inside "<<<"...">>>" pair?  The command
behaves differently when there are two such blocks instead?

> next 'git rerere' will then pick the rerere ID for that file up, and
> not assign a new ID as it can't successfully calculate one.  It will
> however still try to do the rerere operation, because of the existing
> ID.  As the handle_file function fails, it will remove the 'preimage'
> for the ID in the process, while leaving the ID in the MERGE_RR file.
>
> Now when 'rerere clear' for example is run, it will segfault in
> 'has_rerere_resolution', because status is NULL.

I think this "status" refers to the collection->status[].  How do we
get into that state, though?

new_rerere_id() and new_rerere_id_hex() fill id->collection by
calling find_rerere_dir(), which either finds an existing rerere_dir
instance or manufactures one with .status==NULL.  The .status[]
array is later grown by calling fit_variant as we scan and find the
pre/post images, but because there is no pre/post image for a file
with unparseable conflicts, it is left NULL.

So another possible fix could be to make sure that .status[] is only
read when .status_nr says there is something worth reading.  I am
not saying that would be a better fix---I am just thinking out loud
to make sure I understand the issue correctly.
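
To make the alternative concrete, here is a minimal sketch of a guarded
status[] read.  It is not the actual rerere.c code: the struct, the flag
names and the function name are hypothetical, simplified stand-ins, and
only the idea of consulting .status_nr before touching .status[] comes
from the paragraph above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the rerere directory record. */
struct sketch_rerere_dir {
	int status_nr;         /* number of valid entries in status[] */
	unsigned char *status; /* stays NULL until an image is found */
};

/* Hypothetical flag values, chosen only for this sketch. */
#define SKETCH_HAS_PREIMAGE 1
#define SKETCH_HAS_POSTIMAGE 2

/*
 * Return 1 only if the variant has both a preimage and a postimage.
 * A NULL status[] or an out-of-range variant means "no resolution",
 * so the array is never dereferenced in those cases.
 */
static int sketch_has_resolution(const struct sketch_rerere_dir *dir,
				 int variant)
{
	const int both = SKETCH_HAS_PREIMAGE | SKETCH_HAS_POSTIMAGE;

	if (!dir->status || variant < 0 || variant >= dir->status_nr)
		return 0;
	return (dir->status[variant] & both) == both;
}
```

With such a guard, a record that never had any image scanned simply
reports "no resolution" instead of crashing.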

> To fix this, remove the rerere ID from the MERGE_RR file in the case
> when we can't handle it, and remove the corresponding variant from
> .git/rr-cache/.  Removing it unconditionally is fine here, because if
> the user would have resolved the conflict and ran rerere, the entry
> would no longer be in the MERGE_RR file, so we wouldn't have this
> problem in the first place, while if the conflict was not resolved,
> the only thing that's left in the folder is the 'preimage', which by
> itself will be regenerated by git if necessary, so the user won't
> loose any work.

s/loose/lose/

> Note that other variants that have the same conflict ID will not be
> touched.

Nice.  Thanks for a fix.

>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  rerere.c          | 12 +++++++-----
>  t/t4200-rerere.sh | 22 ++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/rerere.c b/rerere.c
> index da1ab54027..895ad80c0c 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
>  		struct rerere_id *id;
>  		unsigned char sha1[20];
>  		const char *path = conflict.items[i].string;
> -		int ret;
> -
> -		if (string_list_has_string(rr, path))
> -			continue;
> +		int ret, has_string;
>  
>  		/*
>  		 * Ask handle_file() to scan and assign a
> @@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
>  		 * yet.
>  		 */
>  		ret = handle_file(path, sha1, NULL);
> -		if (ret < 1)
> +		has_string = string_list_has_string(rr, path);
> +		if (ret < 0 && has_string) {
> +			remove_variant(string_list_lookup(rr, path)->util);
> +			string_list_remove(rr, path, 1);
> +		}
> +		if (ret < 1 || has_string)
>  			continue;

We used to say "if we know about the path we do not do anything
here, if we do not see any conflict in the file we do nothing,
otherwise we assign a new id"; we now say "see if we can parse
and also see if we have conflict(s); if we know about the path and
we cannot parse, drop it from the rr database (because otherwise the
entry will cause us trouble elsewhere later).  Otherwise, if we do
not have any conflict or we already know about the path, no need to
do anything. Otherwise, i.e. a newly discovered path with conflicts
gets a new id".

Makes sense.  "A known path with unparseable conflict gets dropped"
is the important change in this hunk.
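
The new control flow can be sketched as a small decision function.
This is only an illustration under assumptions: the enum and function
names are hypothetical, "ret" stands for the handle_file() result
(negative on a parse failure, zero for no conflicts, positive for
conflicts) and "known" for the string_list_has_string() result.

```c
/* Hypothetical outcome names, used only in this sketch. */
enum sketch_action {
	SKETCH_SKIP,         /* nothing to do for this path */
	SKETCH_DROP_ENTRY,   /* known path that cannot be parsed */
	SKETCH_ASSIGN_NEW_ID /* newly discovered path with conflicts */
};

/* Mirror the order of the two "if" statements in the hunk above. */
static enum sketch_action sketch_decide(int ret, int known)
{
	if (ret < 0 && known)
		return SKETCH_DROP_ENTRY;
	if (ret < 1 || known)
		return SKETCH_SKIP;
	return SKETCH_ASSIGN_NEW_ID;
}
```

In the hunk itself the "drop" case also falls into the following
"continue", so a dropped path does not get a new ID in the same pass.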

> diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> index 8417e5a4b1..34f0518a5e 100755
> --- a/t/t4200-rerere.sh
> +++ b/t/t4200-rerere.sh
> @@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
>  	count_pre_post 0 0
>  '
>  
> +test_expect_success 'rerere with unexpected conflict markers does not crash' '
> +	git reset --hard &&
> +
> +	git checkout -b branch-1 master &&
> +	echo "bar" >test &&
> +	git add test &&
> +	git commit -q -m two &&
> +
> +	git reset --hard &&
> +	git checkout -b branch-2 master &&
> +	echo "foo" >test &&
> +	git add test &&
> +	git commit -q -a -m one &&
> +
> +	test_must_fail git merge branch-1 &&
> +	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
> +	mv test.tmp test &&

OK, so the "only one conflict" in the log message meant just one
side of the conflict marker.  More generally, the troublesome case is
having "conflict marker(s) that cannot be parsed" in the file.

> +	git rerere &&
> +
> +	git rerere clear
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 07/11] rerere: only return whether a path has conflicts or not
  2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-07-30 17:50       ` Junio C Hamano
  2018-07-30 20:47         ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:50 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> We currently return the exact number of conflict hunks a certain path
> has from the 'handle_paths' function.  However all of its callers only
> care whether there are conflicts or not or if there is an error.
> Return only that information, and document that only that information
> is returned.  This will simplify the code in the subsequent steps.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  rerere.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)

I do recall writing this code without knowing if the actual number
of conflicts would be useful to callers, but it is apparent that it
wasn't.  I won't mind losing that bit of info at all.  Besides, we
won't risk mistaking a file with 2 billion conflicts with a file
whose conflicts cannot be parsed ;-).

The patch looks good.  Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 08/11] rerere: factor out handle_conflict function
  2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-07-30 17:51       ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:51 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Factor out the handle_conflict function, which handles a single
> conflict in a path.  This is in preparation for a subsequent commit,
> where this function will be re-used.  No functional changes intended.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  rerere.c | 87 ++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 47 insertions(+), 40 deletions(-)

Renumbering of the enum made me raise my eyebrow a bit briefly but
it is merely to keep track of the state locally and invisible from
the outside, so it is perfectly fine.

> -	git_SHA_CTX ctx;
> -	int has_conflicts = 0;
>  	enum {
> -		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
> -	} hunk = RR_CONTEXT;
> +		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
> +	} hunk = RR_SIDE_1;

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 09/11] rerere: return strbuf from handle path
  2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
@ 2018-07-30 17:51       ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-07-30 17:51 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Currently we write the conflict to disk directly in the handle_path
> function.  To make it re-usable for nested conflicts, instead of
> writing the conflict out directly, store it in a strbuf and let the
> caller write it out.
>
> This does mean some slight increase in memory usage, however that
> increase is limited to the size of the largest conflict we've
> currently processed.  We already keep one copy of the conflict in
> memory, and it shouldn't be too large, so the increase in memory usage
> seems acceptable.
>
> As a bonus this lets us replace the rerere_io_putconflict function
> with a trivial two line function.

Makes sense.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts
  2018-07-30 17:45       ` Junio C Hamano
@ 2018-07-30 20:20         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-30 20:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Currently rerere can't handle nested conflicts and will error out when
> > it encounters such conflicts.  Handle them by recursively calling the
> > 'handle_conflict' function to normalize the conflict.
> >
> > The conflict ID calculation here deserves some explanation:
> >
> > As we are using the same handle_conflict function, the nested conflict
> > is normalized the same way as for non-nested conflicts, which means
> > the ancestor in the diff3 case is stripped out, and the parts of the
> > conflict are ordered alphabetically.
> >
> > The conflict ID is however only calculated in the top level
> > handle_conflict call, so it will include the markers that 'rerere'
> > adds to the output.  e.g. say there's the following conflict:
> >
> >     <<<<<<< HEAD
> >     1
> >     =======
> >     <<<<<<< HEAD
> >     3
> >     =======
> >     2
> >     >>>>>>> branch-2
> >     >>>>>>> branch-3~
> 
> Hmph, I vaguely recall that I made inner merges to use the conflict
> markers automatically lengthened (by two, if I recall correctly)
> than its immediate outer merge.  Wouldn't the above look more like
> 
>      <<<<<<< HEAD
>      1
>      =======
>      <<<<<<<<< HEAD
>      3
>      =========
>      2
>      >>>>>>>>> branch-2
>      >>>>>>> branch-3~
>     
> Perhaps I am not recalling it correctly.

The only way I could reproduce this is by not resolving a conflict
(just leaving the conflict markers in place, but running 'git add
conflicted'), and then merging something else, which produces another
conflict, where one of the sides was the one with conflict markers
already in the file, same as what I did in the test.

So in that case, the conflict markers of the already existing conflict
would just be treated as normal text during the merge I believe, and
thus the new conflict markers would be the same length.

The usage of git is really a bit wrong here, so I don't know if it's
actually worth helping the users at this point.  But trying to
understand how rerere exactly works, I had this written up already, so
I thought I would include it in this series anyway in case it helps
somebody :)
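
To illustrate why the marker length matters here, below is a rough
sketch of a length-sensitive marker check.  It is a simplified,
hypothetical helper and not the is_cmarker() function from rerere.c:
it only assumes that a marker line consists of exactly marker_size
copies of the marker character, followed by whitespace or the end of
the string.

```c
#include <ctype.h>

/*
 * Hypothetical, simplified marker check: the line must start with
 * exactly marker_size copies of marker_char, followed by whitespace
 * or the end of the string.
 */
static int sketch_is_marker(const char *line, char marker_char,
			    int marker_size)
{
	while (marker_size-- > 0)
		if (*line++ != marker_char)
			return 0;
	return *line == '\0' || isspace((unsigned char)*line);
}
```

Under this check a lengthened inner marker such as "<<<<<<<<< HEAD"
is not a marker when scanning with a size of 7, while an inner marker
of the same length as the outer one is, which is the situation the
nested conflict handling has to deal with.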

> > it would be recorded as follows in the preimage:
> >
> >     <<<<<<<
> >     1
> >     =======
> >     <<<<<<<
> >     2
> >     =======
> >     3
> >     >>>>>>>
> >     >>>>>>>
> >
> > and the conflict ID would be calculated as
> >
> >     sha1(1<NUL><<<<<<<
> >     2
> >     =======
> >     3
> >     >>>>>>><NUL>)
> >
> > Stripping out vs. leaving the conflict markers in place in the inner
> > conflict should have no practical impact, but it simplifies the
> > implementation.
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
> >  rerere.c                           | 10 +++++--
> >  t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
> >  3 files changed, 87 insertions(+), 2 deletions(-)
> >
> > [..snip..]
> > 
> > diff --git a/rerere.c b/rerere.c
> > index a35b88916c..f78bef80b1 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
> >  		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
> >  	} hunk = RR_SIDE_1;
> >  	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
> > -	struct strbuf buf = STRBUF_INIT;
> > +	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
> >  	int has_conflicts = -1;
> >  
> >  	while (!io->getline(&buf, io)) {
> >  		if (is_cmarker(buf.buf, '<', marker_size)) {
> > -			break;
> > +			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
> > +				break;
> > +			if (hunk == RR_SIDE_1)
> > +				strbuf_addbuf(&one, &conflict);
> > +			else
> > +				strbuf_addbuf(&two, &conflict);
> 
> Hmph, do we ever see the inner conflict block while we are skipping
> and ignoring the common ancestor version, or it is impossible that
> we see '<' only while processing either our or their side?

As mentioned above, I haven't been able to reproduce creating an inner
conflict block outside of the case mentioned above, where the user
committed conflict markers, and then did another merge.

I don't think it can appear outside of that case in "normal"
operation.

> > +			strbuf_release(&conflict);
> >  		} else if (is_cmarker(buf.buf, '|', marker_size)) {
> >  			if (hunk != RR_SIDE_1)
> >  				break;
> > diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> > index 34f0518a5e..d63fe2b33b 100755
> > --- a/t/t4200-rerere.sh
> > +++ b/t/t4200-rerere.sh
> > @@ -602,4 +602,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
> >  	git rerere clear
> >  '
> >  
> > +test_expect_success 'rerere with inner conflict markers' '
> > +	git reset --hard &&
> > +
> > +	git checkout -b A master &&
> > +	echo "bar" >test &&
> > +	git add test &&
> > +	git commit -q -m two &&
> > +	echo "baz" >test &&
> > +	git add test &&
> > +	git commit -q -m three &&
> > +
> > +	git reset --hard &&
> > +	git checkout -b B master &&
> > +	echo "foo" >test &&
> > +	git add test &&
> > +	git commit -q -a -m one &&
> > +
> > +	test_must_fail git merge A~ &&
> > +	git add test &&
> > +	git commit -q -m "will solve conflicts later" &&
> > +	test_must_fail git merge A &&
> > +
> > +	echo "resolved" >test &&
> > +	git add test &&
> > +	git commit -q -m "solved conflict" &&
> > +
> > +	echo "resolved" >expect &&
> > +
> > +	git reset --hard HEAD~~ &&
> > +	test_must_fail git merge A~ &&
> > +	git add test &&
> > +	git commit -q -m "will solve conflicts later" &&
> > +	test_must_fail git merge A &&
> > +	cat test >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> >  test_done

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 05/11] rerere: add documentation for conflict normalization
  2018-07-30 17:50       ` Junio C Hamano
@ 2018-07-30 20:21         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-30 20:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > +Different conflict styles and branch names are normalized by stripping
> > +the labels from the conflict markers, and removing extraneous
> > +information from the `diff3` conflict style. Branches that are merged
> 
> s/extraneous information/common ancestor version/ perhaps, to be
> fact-based without passing value judgment?

Yeah, I meant "extraneous information for rerere", but common ancestor
version is better.

> We drop the common ancestor version only because we cannot normalize
> from `merge` style to `diff3` style by adding one, and not because
> it is extraneous.  It does help humans understand the conflict a lot
> better to have that section.
> 
> > +By extension, this means that rerere should recognize that the above
> > +conflicts are the same.  To do this, the labels on the conflict
> > +markers are stripped, and the diff3 output is removed.  The above
> 
> s/diff3 output/common ancestor version/, as "diff3 output" would
> mean the whole thing between <<< and >>> to readers.

Makes sense, will fix in the re-roll, thanks!

> > diff --git a/rerere.c b/rerere.c
> > index be98c0afcb..da1ab54027 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
> >   * and NUL concatenated together.
> >   *
> >   * Return the number of conflict hunks found.
> > - *
> > - * NEEDSWORK: the logic and theory of operation behind this conflict
> > - * normalization may deserve to be documented somewhere, perhaps in
> > - * Documentation/technical/rerere.txt.
> >   */
> >  static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
> >  {
> 
> Thanks for finally removing this age-old NEEDSWORK comment.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved
  2018-07-30 17:50       ` Junio C Hamano
@ 2018-07-30 20:45         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-30 20:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Currently when a user doesn't resolve a conflict in a file, but
> > commits the file with the conflict markers, and later the file ends up
> > in a state in which rerere can't handle it, subsequent rerere
> > operations that are interested in that path, such as 'rerere clear' or
> > 'rerere forget <path>' will fail, or even worse in the case of 'rerere
> > clear' segfault.
> >
> > Such states include nested conflicts, or an extra conflict marker that
> > doesn't have any match.
> >
> > This is because the first 'git rerere' when there was only one
> > conflict in the file leaves an entry in the MERGE_RR file behind.  The
> 
> I find this sentence, especially the "only one conflict in the file"
> part, a bit unclear.  What does the sentence count as one conflict?
> One block of lines enclosed inside "<<<"...">>>" pair?  The command
> behaves differently when there are two such blocks instead?

Yeah, as you mentioned below, describing this as "conflict marker(s)
that cannot be parsed" would make more sense.  Will adjust the commit
message.

> > next 'git rerere' will then pick the rerere ID for that file up, and
> > not assign a new ID as it can't successfully calculate one.  It will
> > however still try to do the rerere operation, because of the existing
> > ID.  As the handle_file function fails, it will remove the 'preimage'
> > for the ID in the process, while leaving the ID in the MERGE_RR file.
> >
> > Now when 'rerere clear' for example is run, it will segfault in
> > 'has_rerere_resolution', because status is NULL.
> 
> I think this "status" refers to the collection->status[].  How do we
> get into that state, though?
> 
> new_rerere_id() and new_rerere_id_hex() fill id->collection by
> calling find_rerere_dir(), which either finds an existing rerere_dir
> instance or manufactures one with .status==NULL.  The .status[]
> array is later grown by calling fit_variant as we scan and find the
> pre/post images, but because there is no pre/post image for a file
> with unparseable conflicts, it is left NULL.
> 
> So another possible fix could be to make sure that .status[] is only
> read when .status_nr says there is something worth reading.  I am
> not saying that would be a better fix---I am just thinking out loud
> to make sure I understand the issue correctly.

Yeah, what you are writing above matches my understanding, and that
should fix the issue as well.  I haven't actually tried what you're
proposing above, but I find it nicer to just remove the entry we can't
do anything with anyway.

> > To fix this, remove the rerere ID from the MERGE_RR file in the case
> > when we can't handle it, and remove the corresponding variant from
> > .git/rr-cache/.  Removing it unconditionally is fine here, because if
> > the user would have resolved the conflict and ran rerere, the entry
> > would no longer be in the MERGE_RR file, so we wouldn't have this
> > problem in the first place, while if the conflict was not resolved,
> > the only thing that's left in the folder is the 'preimage', which by
> > itself will be regenerated by git if necessary, so the user won't
> > loose any work.
> 
> s/loose/lose/
> 
> > Note that other variants that have the same conflict ID will not be
> > touched.
> 
> Nice.  Thanks for a fix.
> 
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  rerere.c          | 12 +++++++-----
> >  t/t4200-rerere.sh | 22 ++++++++++++++++++++++
> >  2 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/rerere.c b/rerere.c
> > index da1ab54027..895ad80c0c 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
> >  		struct rerere_id *id;
> >  		unsigned char sha1[20];
> >  		const char *path = conflict.items[i].string;
> > -		int ret;
> > -
> > -		if (string_list_has_string(rr, path))
> > -			continue;
> > +		int ret, has_string;
> >  
> >  		/*
> >  		 * Ask handle_file() to scan and assign a
> > @@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
> >  		 * yet.
> >  		 */
> >  		ret = handle_file(path, sha1, NULL);
> > -		if (ret < 1)
> > +		has_string = string_list_has_string(rr, path);
> > +		if (ret < 0 && has_string) {
> > +			remove_variant(string_list_lookup(rr, path)->util);
> > +			string_list_remove(rr, path, 1);
> > +		}
> > +		if (ret < 1 || has_string)
> >  			continue;
> 
> We used to say "if we know about the path we do not do anything
> here, if we do not see any conflict in the file we do nothing,
> otherwise we assign a new id"; we now say "see if we can parse
> and also see if we have conflict(s); if we know about the path and
> we cannot parse, drop it from the rr database (because otherwise the
> entry will cause us trouble elsewhere later).  Otherwise, if we do
> not have any conflict or we already know about the path, no need to
> do anything. Otherwise, i.e. a newly discovered path with conflicts
> gets a new id".
> 
> Makes sense.  "A known path with unparseable conflict gets dropped"
> is the important change in this hunk.
> 
> > diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
> > index 8417e5a4b1..34f0518a5e 100755
> > --- a/t/t4200-rerere.sh
> > +++ b/t/t4200-rerere.sh
> > @@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
> >  	count_pre_post 0 0
> >  '
> >  
> > +test_expect_success 'rerere with unexpected conflict markers does not crash' '
> > +	git reset --hard &&
> > +
> > +	git checkout -b branch-1 master &&
> > +	echo "bar" >test &&
> > +	git add test &&
> > +	git commit -q -m two &&
> > +
> > +	git reset --hard &&
> > +	git checkout -b branch-2 master &&
> > +	echo "foo" >test &&
> > +	git add test &&
> > +	git commit -q -a -m one &&
> > +
> > +	test_must_fail git merge branch-1 &&
> > +	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
> > +	mv test.tmp test &&
> 
> OK, so the "only one conflict" in the log message meant just one
> side of the conflict marker.  More generally, the troublesome case is
> having "conflict marker(s) that cannot be parsed" in the file.
> 
> > +	git rerere &&
> > +
> > +	git rerere clear
> > +'
> > +
> >  test_done

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 07/11] rerere: only return whether a path has conflicts or not
  2018-07-30 17:50       ` Junio C Hamano
@ 2018-07-30 20:47         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-30 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > We currently return the exact number of conflict hunks a certain path
> > has from the 'handle_paths' function.  However all of its callers only
> > care whether there are conflicts or not or if there is an error.
> > Return only that information, and document that only that information
> > is returned.  This will simplify the code in the subsequent steps.
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  rerere.c | 23 ++++++++++++-----------
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> I do recall writing this code without knowing if the actual number
> of conflicts would be useful to callers, but it is apparent that it
> wasn't.  I won't mind losing that bit of info at all.  Besides, we
> won't risk mistaking a file with 2 billion conflicts with a file
> whose conflicts cannot be parsed ;-).

Hah, I would love to see someone actually achieve that ;)

> The patch looks good.  Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 00/11] rerere: handle nested conflicts
  2018-07-30 17:50     ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
@ 2018-07-30 20:49       ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-07-30 20:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Thomas Gummerer (11):
> >   rerere: unify error messages when read_cache fails
> >   rerere: lowercase error messages
> >   rerere: wrap paths in output in sq
> >   rerere: mark strings for translation
> >   rerere: add documentation for conflict normalization
> >   rerere: fix crash when conflict goes unresolved
> >   rerere: only return whether a path has conflicts or not
> >   rerere: factor out handle_conflict function
> >   rerere: return strbuf from handle path
> >   rerere: teach rerere to handle nested conflicts
> >   rerere: recalculate conflict ID when unresolved conflict is committed
> 
> Even though I am not certain about the last two steps, everything
> before them looked trivially correct and good changes (well, the
> "strbuf" one's goodness obviously depends on the goodness of the
> last two, which are helped by it).
> 
> Sorry for taking so long before getting to the series.

No worries, I realize you are busy with a lot of other things.  Thanks
a lot for your review!

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v4 00/11] rerere: handle nested conflicts
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (11 preceding siblings ...)
  2018-07-30 17:50     ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
@ 2018-08-05 17:20     ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
                         ` (10 more replies)
  12 siblings, 11 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

The previous rounds were at
<20180520211210.1248-1-t.gummerer@gmail.com>,
<20180605215219.28783-1-t.gummerer@gmail.com> and
<20180714214443.7184-1-t.gummerer@gmail.com>.

Thanks Junio for the review and Simon for pointing out an error in my
commit message.

The changes in this round are mainly improving the commit messages,
and polishing the documentation.

It also simplifies one test case in patch 6/11.

Patches 10 and 11 are still included, however I'm not going to be too
sad if we decide to not include them, as they really only help in an
obscure case, which could be considered using git "wrong".

I also realized that while I wrote "no functional changes intended" in
7/11, and functional changes were in fact not intended, there still is
a slight functional change.  As I think that's a good change, I
documented it in the commit message.

Thomas Gummerer (11):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add documentation for conflict normalization
  rerere: fix crash with files rerere can't handle
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: return strbuf from handle path
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
 builtin/rerere.c                   |   4 +-
 rerere.c                           | 243 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  65 ++++++++
 4 files changed, 365 insertions(+), 129 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

Range diff below:

 1:  ce876f1b6b =  1:  018bd68a8a rerere: unify error messages when read_cache fails
 2:  0326503c4a =  2:  281fcbf24f rerere: lowercase error messages
 3:  a33211e3d3 =  3:  b6d5e2e26d rerere: wrap paths in output in sq
 4:  3da84604f0 !  4:  6ed390c8f5 rerere: mark strings for translation
    @@ -2,7 +2,7 @@
     
         rerere: mark strings for translation
     
    -    'git rerere' is considered a plumbing command and as such its output
    +    'git rerere' is considered a porcelain command and as such its output
         should be translated.  Its functionality is also only enabled through
         a config setting, so scripts really shouldn't rely on the output
         either way.
 5:  749d49a625 !  5:  3cef1d57bc rerere: add documentation for conflict normalization
    @@ -28,8 +28,8 @@
     +conflicts before writing them to the rerere database.
     +
     +Different conflict styles and branch names are normalized by stripping
    -+the labels from the conflict markers, and removing extraneous
    -+information from the `diff3` conflict style. Branches that are merged
    ++the labels from the conflict markers, and removing the common ancestor
    ++version from the `diff3` conflict style. Branches that are merged
     +in different order are normalized by sorting the conflict hunks.  More
     +on each of those steps in the following sections.
     +
    @@ -37,8 +37,8 @@
     +calculated based on the normalized conflict, which is later used by
     +rerere to look up the conflict in the rerere database.
     +
    -+Stripping extraneous information
    -+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ++Removing the common ancestor version
    ++~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +
     +Say we have three branches AB, AC and AC2.  The common ancestor of
     +these branches has a file with a line containing the string "A" (for
    @@ -79,7 +79,7 @@
     +
     +By extension, this means that rerere should recognize that the above
     +conflicts are the same.  To do this, the labels on the conflict
    -+markers are stripped, and the diff3 output is removed.  The above
    ++markers are stripped, and the common ancestor version is removed.  The above
     +examples would both result in the following normalized conflict:
     +
     +    <<<<<<<
 6:  d465bd087e !  6:  a02d90157d rerere: fix crash when conflict goes unresolved
    @@ -1,37 +1,42 @@
     Author: Thomas Gummerer <t.gummerer@gmail.com>
     
    -    rerere: fix crash when conflict goes unresolved
    +    rerere: fix crash with files rerere can't handle
     
    -    Currently when a user doesn't resolve a conflict in a file, but
    -    commits the file with the conflict markers, and later the file ends up
    -    in a state in which rerere can't handle it, subsequent rerere
    -    operations that are interested in that path, such as 'rerere clear' or
    -    'rerere forget <path>' will fail, or even worse in the case of 'rerere
    -    clear' segfault.
    +    Currently when a user does a conflict resolution and ends it (in any
    +    way that calls 'git rerere' again) with a file 'rerere' can't handle,
    +    subsequent rerere operations that are interested in that path, such as
    +    'rerere clear' or 'rerere forget <path>' will fail, or even worse in
    +    the case of 'rerere clear' segfault.
     
    -    Such states include nested conflicts, or an extra conflict marker that
    +    Such states include nested conflicts, or a conflict marker that
         doesn't have any match.
     
    -    This is because the first 'git rerere' when there was only one
    -    conflict in the file leaves an entry in the MERGE_RR file behind.  The
    -    next 'git rerere' will then pick the rerere ID for that file up, and
    -    not assign a new ID as it can't successfully calculate one.  It will
    -    however still try to do the rerere operation, because of the existing
    -    ID.  As the handle_file function fails, it will remove the 'preimage'
    -    for the ID in the process, while leaving the ID in the MERGE_RR file.
    +    This is because 'git rerere' calculates a conflict file and writes it
    +    to the MERGE_RR file.  When the user then changes the file in any way
    +    rerere can't handle, and then calls 'git rerere' on it again to record
    +    the conflict resolution, the handle_file function fails, and removes
    +    the 'preimage' file in the rr-cache in the process, while leaving the
    +    ID in the MERGE_RR file.
     
    -    Now when 'rerere clear' for example is run, it will segfault in
    -    'has_rerere_resolution', because status is NULL.
    +    Now when 'rerere clear' is run, it reads the ID from the MERGE_RR
    +    file, however the 'fit_variant' function for the ID is never called as
    +    the 'preimage' file does not exist anymore.  This means
    +    'collection->status' in 'has_rerere_resolution' is NULL, and the
    +    command will crash.
     
         To fix this, remove the rerere ID from the MERGE_RR file in the case
    -    when we can't handle it, and remove the corresponding variant from
    -    .git/rr-cache/.  Removing it unconditionally is fine here, because if
    -    the user would have resolved the conflict and ran rerere, the entry
    -    would no longer be in the MERGE_RR file, so we wouldn't have this
    -    problem in the first place, while if the conflict was not resolved,
    -    the only thing that's left in the folder is the 'preimage', which by
    -    itself will be regenerated by git if necessary, so the user won't
    -    loose any work.
    +    when we can't handle it, just after the 'preimage' file was removed,
    +    and remove the corresponding variant from .git/rr-cache/.  Removing it
    +    unconditionally is fine here, because if the user had resolved the
    +    conflict and run rerere, the entry would no longer be in the
    +    MERGE_RR file, so we wouldn't have this problem in the first place.
    +    The case where the conflict was not resolved is covered below.
    +
    +    Currently there is nothing left in this folder, as the 'preimage'
    +    was already deleted by the 'handle_file' function, so 'remove_variant'
    +    is a no-op.  Still call the function, to make sure we clean everything
    +    up, in case we add some other files corresponding to a variant in the
    +    future.
     
         Note that other variants that have the same conflict ID will not be
         touched.
    @@ -90,8 +95,7 @@
     +	git commit -q -a -m one &&
     +
     +	test_must_fail git merge branch-1 &&
    -+	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
    -+	mv test.tmp test &&
    ++	echo "<<<<<<< a" >test &&
     +	git rerere &&
     +
     +	git rerere clear
 7:  fac2b79245 =  7:  49815bee02 rerere: only return whether a path has conflicts or not
 8:  b5892c1861 !  8:  0c51696d10 rerere: factor out handle_conflict function
    @@ -4,7 +4,13 @@
     
         Factor out the handle_conflict function, which handles a single
         conflict in a path.  This is in preparation for a subsequent commit,
    -    where this function will be re-used.  No functional changes intended.
    +    where this function will be re-used.
    +
    +    Note that this does change the behaviour of 'git rerere' slightly.
    +    Where previously we'd consider all files where an unmatched conflict
    +    marker is found as invalid, we now only consider files invalid when
    +    the "ours" conflict marker ("<<<<<<< <text>") is unmatched, not when
    +    other conflict markers (e.g. "=======") are unmatched.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
 9:  e8e0ca4db9 =  9:  f604efe05d rerere: return strbuf from handle path
10:  1fc106ffaa ! 10:  a2393d3424 rerere: teach rerere to handle nested conflicts
    @@ -6,6 +6,10 @@
         it encounters such conflicts.  Do that by recursively calling the
         'handle_conflict' function to normalize the conflict.
     
    +    Note that a conflict like this would only be produced if a user
    +    commits a file with conflict markers, and gets a conflict including
    +    that in a subsequent operation.
    +
         The conflict ID calculation here deserves some explanation:
     
         As we are using the same handle_conflict function, the nested conflict
    @@ -66,8 +70,8 @@
     +
     +Nested conflicts are handled very similarly to "simple" conflicts.
     +Similar to simple conflicts, the conflict is first normalized by
    -+stripping the labels from conflict markers, stripping the diff3
    -+output, and the sorting the conflict hunks, both for the outer and the
    ++stripping the labels from conflict markers, stripping the common ancestor
    ++version, and sorting the conflict hunks, both for the outer and the
     +inner conflict.  This is done recursively, so any number of nested
     +conflicts can be handled.
     +
11:  4463aed2f8 = 11:  371af30766 rerere: recalculate conflict ID when unresolved conflict is committed

-- 
2.18.0.720.gf7a957e2e7


* [PATCH v4 01/11] rerere: unify error messages when read_cache fails
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 02/11] rerere: lowercase error messages Thomas Gummerer
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index e0862e2778..473d32a5cd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.18.0.720.gf7a957e2e7



* [PATCH v4 02/11] rerere: lowercase error messages
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 03/11] rerere: wrap paths in output in sq Thomas Gummerer
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation follow that pattern
in rerere as well, so translators won't have to translate messages
that don't conform to our guidelines.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/rerere.c b/rerere.c
index 473d32a5cd..c5d9ea171f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno("could not write %s", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error("there were errors while writing %s (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush %s", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in %s", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char *path)
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno("could not write %s", path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno("writing %s failed", path);
 
 out:
 	free(cur.ptr);
@@ -720,7 +720,7 @@ static void update_paths(struct string_list *update)
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die("could not create directory %s", git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error("could not parse conflict hunks in '%s'", path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error("failed to update conflicted state in '%s'", path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
-- 
2.18.0.720.gf7a957e2e7



* [PATCH v4 03/11] rerere: wrap paths in output in sq
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 02/11] rerere: lowercase error messages Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 04/11] rerere: mark strings for translation Thomas Gummerer
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  2 +-
 rerere.c         | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for %s", rerere_path(id, NULL));
+				die("unable to generate diff for '%s'", rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index c5d9ea171f..cde1f6e696 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write %s", output);
+			error_errno("could not write '%s'", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing %s (%s)",
+		error("there were errors while writing '%s' (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush '%s'", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in '%s'", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno("failed utime() on '%s'",
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write %s", path);
+		error_errno("could not write '%s'", path);
 	if (fclose(f))
-		return error_errno("writing %s failed", path);
+		return error_errno("writing '%s' failed", path);
 
 out:
 	free(cur.ptr);
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory %s", git_path_rr_cache());
+		die("could not create directory '%s'", git_path_rr_cache());
 	return 1;
 }
 
@@ -1067,9 +1067,9 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error("no remembered resolution for '%s'", path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno("cannot unlink '%s'", filename);
 		goto fail_exit;
 	}
 
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf(stderr, "Forgot resolution for '%s'\n", path);
 	return 0;
 
 fail_exit:
-- 
2.18.0.720.gf7a957e2e7



* [PATCH v4 04/11] rerere: mark strings for translation
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (2 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 03/11] rerere: wrap paths in output in sq Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a porcelain command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on the output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  4 +--
 rerere.c         | 68 ++++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)
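
As an illustration only (this is not code from the patch), the reason
most messages below move from fprintf() with a trailing "\n" to
fprintf_ln() is that the newline then stays out of the string handed to
translators.  A minimal sketch of such a helper follows.  The name
sketch_fprintf_ln() is made up for this example and is only assumed to
behave like git's fprintf_ln(), and _() is replaced by an identity macro
so that the sketch builds without gettext.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for git's _() so this sketch builds without gettext. */
#define _(msgid) (msgid)

/*
 * Hypothetical helper: print a formatted message followed by a newline.
 * The format string itself carries no "\n", so translators never see it.
 * Returns the number of bytes written, or -1 on error.
 */
static int sketch_fprintf_ln(FILE *fp, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vfprintf(fp, fmt, ap);
	va_end(ap);

	if (ret < 0 || putc('\n', fp) == EOF)
		return -1;
	return ret + 1;
}
```

A call such as sketch_fprintf_ln(stderr, _("Recorded preimage for '%s'"),
path) then prints the message and the line terminator separately.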

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[0], "forget")) {
 		struct pathspec pathspec;
 		if (argc < 2)
-			warning("'git rerere forget' without paths is deprecated");
+			warning(_("'git rerere forget' without paths is deprecated"));
 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
 			       prefix, argv + 1);
 		return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
+				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index cde1f6e696..be98c0afcb 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write '%s'", output);
+			error_errno(_("could not write '%s'"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing '%s' (%s)",
+		error(_("there were errors while writing '%s' (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush '%s'", path);
+		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on '%s'",
+		warning_errno(_("failed utime() on '%s'"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write '%s'", path);
+		error_errno(_("could not write '%s'"), path);
 	if (fclose(f))
-		return error_errno("writing '%s' failed", path);
+		return error_errno(_("writing '%s' failed"), path);
 
 out:
 	free(cur.ptr);
@@ -714,13 +714,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("unable to write new index file");
+		die(_("unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -752,7 +752,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -786,9 +786,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -802,11 +802,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory '%s'", git_path_rr_cache());
+		die(_("could not create directory '%s'"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("failed to update conflicted state in '%s'", path);
+			error(_("failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1060,16 +1060,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for '%s'", path);
+			error(_("no remembered resolution for '%s'"), path);
 		else
-			error_errno("cannot unlink '%s'", filename);
+			error_errno(_("cannot unlink '%s'"), filename);
 		goto fail_exit;
 	}
 
@@ -1079,7 +1079,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
+	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
 	return 0;
 
 fail_exit:
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1191,7 +1191,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.18.0.720.gf7a957e2e7



* [PATCH v4 05/11] rerere: add documentation for conflict normalization
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (3 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 04/11] rerere: mark strings for translation Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 06/11] rerere: fix crash with files rerere can't handle Thomas Gummerer
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 140 +++++++++++++++++++++++++++++
 rerere.c                           |   4 -
 2 files changed, 140 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt
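
As an illustration only (this is not code from rerere.c), the label
stripping and common ancestor removal that the new document describes
could be sketched as below.  The names is_marker() and
normalize_conflict() are made up for this sketch, the conflict marker
size is assumed to be the default of 7, and the hunk sorting step is
left out entirely.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the line (without its newline) is a marker made of 'ch'. */
static int is_marker(const char *line, size_t len, char ch)
{
	size_t i;

	if (len < 7)
		return 0;
	for (i = 0; i < 7; i++)
		if (line[i] != ch)
			return 0;
	/* A marker ends the line or is followed by " <label>". */
	return len == 7 || line[7] == ' ';
}

/*
 * Hypothetical helper: copy the conflict in 'in' to 'out', stripping
 * the labels from "<<<<<<<" and ">>>>>>>" and dropping everything from
 * "|||||||" up to (not including) "=======".
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int normalize_conflict(const char *in, char *out, size_t outsz)
{
	size_t used = 0;
	int in_ancestor = 0;

	if (!outsz)
		return -1;
	while (*in) {
		const char *eol = strchr(in, '\n');
		size_t len = eol ? (size_t)(eol - in) : strlen(in);
		size_t keep = len;

		if (is_marker(in, len, '|'))
			in_ancestor = 1;	/* diff3 section starts */
		else if (is_marker(in, len, '='))
			in_ancestor = 0;	/* diff3 section ends */
		else if (is_marker(in, len, '<') || is_marker(in, len, '>'))
			keep = 7;		/* drop the label */

		if (!in_ancestor) {
			/* room for the line, its newline and the final NUL */
			if (used + keep + 2 > outsz)
				return -1;
			memcpy(out + used, in, keep);
			used += keep;
			out[used++] = '\n';
		}
		in += len + (eol ? 1 : 0);
	}
	out[used] = '\0';
	return 0;
}
```

Fed the two example conflicts from the document (the plain one ending
in ">>>>>>> AC" and the diff3 one ending in ">>>>>>> AC2"), this sketch
produces the same label-free text for both, which is the property the
conflict ID calculation relies on.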

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..3d10dbfa67
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing the common ancestor
+version from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Removing the common ancestor version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible with what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the common ancestor version is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, let's imagine that a common ancestor had a file with line A
+in its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part became Y or Z", and whenever
+any of these four patterns appears, we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC, which changes line A to line C, into branch AB, which
+changes line A to line B, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index be98c0afcb..da1ab54027 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 06/11] rerere: fix crash with files rerere can't handle
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (4 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user does a conflict resolution and ends it (in any
way that calls 'git rerere' again) with a file 'rerere' can't handle,
subsequent rerere operations that are interested in that path, such as
'rerere clear' or 'rerere forget <path>' will fail, or even worse in
the case of 'rerere clear' segfault.

Such states include nested conflicts, or a conflict marker that
doesn't have any match.

This is because 'git rerere' calculates a conflict ID and writes it
to the MERGE_RR file.  When the user then changes the file in any way
rerere can't handle, and then calls 'git rerere' on it again to record
the conflict resolution, the handle_file function fails, and removes
the 'preimage' file in the rr-cache in the process, while leaving the
ID in the MERGE_RR file.

Now when 'rerere clear' is run, it reads the ID from the MERGE_RR
file, however the 'fit_variant' function for the ID is never called as
the 'preimage' file does not exist anymore.  This means
'collection->status' in 'has_rerere_resolution' is NULL, and the
command will crash.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, just after the 'preimage' file was removed
and remove the corresponding variant from .git/rr-cache/.  Removing it
unconditionally is fine here, because if the user had resolved
the conflict and run rerere, the entry would no longer be in the
MERGE_RR file, so we wouldn't have this problem in the first place,
while if the conflict was not resolved.

Currently there is nothing left in this folder, as the 'preimage'
was already deleted by the 'handle_file' function, so 'remove_variant'
is a no-op.  Still call the function, to make sure we clean everything
up, in case we add some other files corresponding to a variant in the
future.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index da1ab54027..895ad80c0c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 8417e5a4b1..23f9c0ca45 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,25 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1 &&
+	echo "<<<<<<< a" >test &&
+	git rerere &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 07/11] rerere: only return whether a path has conflicts or not
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (5 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 06/11] rerere: fix crash with files rerere can't handle Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 08/11] rerere: factor out handle_conflict function Thomas Gummerer
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_path' function.  However all of its callers only
care whether there are conflicts or not or if there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 895ad80c0c..bf803043e2 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -954,7 +955,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1008,11 +1009,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 08/11] rerere: factor out handle_conflict function
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (6 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 09/11] rerere: return strbuf from handle path Thomas Gummerer
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is in preparation for a subsequent commit,
where this function will be re-used.

Note that this does change the behaviour of 'git rerere' slightly.
Where previously we'd consider all files where an unmatched conflict
marker is found as invalid, we now only consider files invalid when
the "ours" conflict marker ("<<<<<<< <text>") is unmatched, not when
other conflict markers (e.g. "=======") are unmatched.
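
To illustrate the behaviour change described above, here is a minimal
standalone sketch.  It is not the rerere.c code: the names
'starts_marker', 'scan_conflict' and 'scan_file' are hypothetical, the
marker size is fixed at 7, and the marker check is simpler than
'is_cmarker'.  Markers are only interpreted once a "<<<<<<<" line has
opened a conflict; anything else outside a conflict is treated as
ordinary content.

```c
#include <assert.h>
#include <stddef.h>

enum side { SIDE_1, ORIGINAL, SIDE_2 };

/* Hypothetical marker check: the line starts with seven copies of 'ch'. */
static int starts_marker(const char *line, char ch)
{
	size_t i;

	for (i = 0; i < 7; i++)
		if (line[i] != ch)
			return 0;	/* also stops at the end of a short line */
	return 1;
}

/*
 * Scan the rest of a conflict whose "<<<<<<<" line was already consumed.
 * Return 1 if the closing ">>>>>>>" is reached in a valid state, and -1
 * if a marker shows up out of order or the input ends first.
 */
static int scan_conflict(const char **lines, size_t n, size_t *pos)
{
	enum side hunk = SIDE_1;

	assert(lines && pos);
	while (*pos < n) {
		const char *line = lines[(*pos)++];

		if (starts_marker(line, '<'))
			return -1;	/* nested conflict, not handled in this sketch */
		if (starts_marker(line, '|')) {
			if (hunk != SIDE_1)
				return -1;
			hunk = ORIGINAL;
		} else if (starts_marker(line, '=')) {
			if (hunk != SIDE_1 && hunk != ORIGINAL)
				return -1;
			hunk = SIDE_2;
		} else if (starts_marker(line, '>')) {
			return hunk == SIDE_2 ? 1 : -1;
		}
	}
	return -1;	/* the opening marker was never matched */
}

/* Return 1 if conflicts were found, 0 if none, -1 on a parse error. */
static int scan_file(const char **lines, size_t n)
{
	size_t pos = 0;
	int has_conflicts = 0;

	while (pos < n) {
		if (starts_marker(lines[pos++], '<')) {
			has_conflicts = scan_conflict(lines, n, &pos);
			if (has_conflicts < 0)
				break;
		}
		/* any other line, including a stray "=======", is content */
	}
	return has_conflicts;
}
```

With this sketch, a file containing only a stray "=======" line is
reported as having no conflicts, while a file with an unclosed
"<<<<<<<" line is reported as an error.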

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 87 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index bf803043e2..2d62251943 100644
--- a/rerere.c
+++ b/rerere.c
@@ -384,85 +384,92 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
+	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
+			break;
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
-				goto bad;
+				break;
 			hunk = RR_ORIGINAL;
 		} else if (is_cmarker(buf.buf, '=', marker_size)) {
 			if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
-				goto bad;
+				break;
 			hunk = RR_SIDE_2;
 		} else if (is_cmarker(buf.buf, '>', marker_size)) {
 			if (hunk != RR_SIDE_2)
-				goto bad;
+				break;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
 			rerere_io_putconflict('=', marker_size, io);
 			rerere_io_putmem(two.buf, two.len, io);
 			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			break;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(io, marker_size,
+							sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 09/11] rerere: return strbuf from handle path
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (7 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 08/11] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
  2018-08-05 17:20       ` [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently we write the conflict to disk directly in the handle_path
function.  To make it re-usable for nested conflicts, instead of
writing the conflict out directly, store it in a strbuf and let the
caller write it out.

This does mean some slight increase in memory usage, however that
increase is limited to the size of the largest conflict we've
currently processed.  We already keep one copy of the conflict in
memory, and it shouldn't be too large, so the increase in memory usage
seems acceptable.

As a bonus this lets us replace the rerere_io_putconflict function
with a trivial two line function.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 58 ++++++++++++++++++--------------------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index 2d62251943..a35b88916c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,7 +352,14 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
 	enum {
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
@@ -410,11 +385,11 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
 			if (ctx) {
 				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
@@ -451,21 +426,24 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT;
 	int has_conflicts = 0;
 	if (sha1)
 		git_SHA1_Init(&ctx);
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			has_conflicts = handle_conflict(io, marker_size,
+			has_conflicts = handle_conflict(&out, io, marker_size,
 							sha1 ? &ctx : NULL);
 			if (has_conflicts < 0)
 				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
 		} else
 			rerere_io_putstr(buf.buf, io);
 	}
 	strbuf_release(&buf);
+	strbuf_release(&out);
 
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (8 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 09/11] rerere: return strbuf from handle path Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  2018-08-22 11:00         ` Ævar Arnfjörð Bjarmason
  2018-08-05 17:20       ` [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
  10 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Teach it to handle them by recursively
calling the 'handle_conflict' function to normalize the conflict.

Note that a conflict like this would only be produced if a user
commits a file with conflict markers, and gets a conflict including
that in a subsequent operation.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID, however, is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  e.g. say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be recorded as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as

    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place in the inner
conflict should have no practical impact, but it simplifies the
implementation.
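
As a rough illustration of the ID calculation described above, the
following standalone sketch builds the byte string that would be fed to
SHA-1 for one normalized conflict.  It is not the rerere.c code: the
helper name 'conflict_id_input' is hypothetical, the hashing step is
left out, and strcmp() stands in for strbuf_cmp(), which assumes the
sides contain no NUL bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of rerere.c: write the bytes that would
 * be hashed for a single conflict into 'out' and return their count.
 * The two sides are ordered so that the result does not depend on which
 * side was "ours", and each side is followed by a NUL.
 */
static size_t conflict_id_input(const char *side_a, const char *side_b,
				char *out, size_t out_size)
{
	const char *one = side_a, *two = side_b;
	size_t len_one, len_two;

	if (strcmp(one, two) > 0) {
		one = side_b;
		two = side_a;
	}
	len_one = strlen(one) + 1;	/* include the trailing NUL */
	len_two = strlen(two) + 1;
	assert(len_one + len_two <= out_size);

	memcpy(out, one, len_one);
	memcpy(out + len_one, two, len_two);
	return len_one + len_two;
}
```

For the nested example, the outer sides are "1\n" and the already
normalized inner conflict, markers included.  Passing them in either
order yields "1\n", NUL, the inner conflict text, NUL.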

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
 rerere.c                           | 10 +++++--
 t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 3d10dbfa67..e65ba9b0c6 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,3 +138,45 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the common ancestor
+version, and then sorting the conflict hunks, both for the outer and the
+inner conflict.  This is done recursively, so any number of nested
+conflicts can be handled.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    2
+    =======
+    3
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n2\n=======\n3\n>>>>>>><NUL>')`
diff --git a/rerere.c b/rerere.c
index a35b88916c..f78bef80b1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			break;
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				break;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				break;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 23f9c0ca45..afaf085e42 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -601,4 +601,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
 	git rerere clear
 '
 
+test_expect_success 'rerere with inner conflict markers' '
+	git reset --hard &&
+
+	git checkout -b A master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b B master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+
+	echo "resolved" >test &&
+	git add test &&
+	git commit -q -m "solved conflict" &&
+
+	echo "resolved" >expect &&
+
+	git reset --hard HEAD~~ &&
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+	cat test >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.18.0.720.gf7a957e2e7


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed
  2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
                         ` (9 preceding siblings ...)
  2018-08-05 17:20       ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-08-05 17:20       ` Thomas Gummerer
  10 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-05 17:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 7 +++----
 t/t4200-rerere.sh | 7 +++++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f78bef80b1..dd81d09e19 100644
--- a/rerere.c
+++ b/rerere.c
@@ -815,7 +815,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret, has_string;
+		int ret;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -823,12 +823,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		has_string = string_list_has_string(rr, path);
-		if (ret < 0 && has_string) {
+		if (ret != 0 && string_list_has_string(rr, path)) {
 			remove_variant(string_list_lookup(rr, path)->util);
 			string_list_remove(rr, path, 1);
 		}
-		if (ret < 1 || has_string)
+		if (ret < 1)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index afaf085e42..819f6dd672 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -635,6 +635,13 @@ test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 	cat test >actual &&
+	test_cmp expect actual &&
+
+	git add test &&
+	git commit -m "rerere solved conflict" &&
+	git reset --hard HEAD~ &&
+	test_must_fail git merge A &&
+	cat test >actual &&
 	test_cmp expect actual
 '
 
-- 
2.18.0.720.gf7a957e2e7



* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-05 17:20       ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-08-22 11:00         ` Ævar Arnfjörð Bjarmason
  2018-08-22 16:06           ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-22 11:00 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List, Junio C Hamano

On Sun, Aug 5, 2018 at 7:23 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:

Late reply since I just saw this in next.

> Currently rerere can't handle nested conflicts and will error out when
> it encounters such conflicts.  Do that by recursively calling the
> 'handle_conflict' function to normalize the conflict.
> [...]

Makes sense.

> --- a/Documentation/technical/rerere.txt
> +++ b/Documentation/technical/rerere.txt

But why not add this to the git-rerere manpage? These technical docs
get way less exposure, and in this case we're not describing some
internal implementation detail, which the technical docs are for, but
something that's user-visible, let's put that in the user-visible
docs.


* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-22 11:00         ` Ævar Arnfjörð Bjarmason
@ 2018-08-22 16:06           ` Junio C Hamano
  2018-08-22 20:34             ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-08-22 16:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Thomas Gummerer, Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But why not add this to the git-rerere manpage? These technical docs
> get way less exposure, and in this case we're not describing some
> internal implementation detail, which the technical docs are for, but
> something that's user-visible, let's put that in the user-visible
> docs.

I actually consider that the documentation describes low-level
internal implementation detail, which the end users do not care nor
need to know in order to make use of "rerere".  How would it help
the end-users to know that the common ancestor portion of diff3
style conflict does not participate in conflict identification,
sides of conflicts sometimes get swapped for easier indexing of
conflicts, or conflict shapes are hashed via SHA-1 to determine
which subdirectory of $GIT_DIR/rr-cache/ to use to store it, etc.?
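
To make the "sides get swapped" point concrete, here is a minimal
sketch, not the code in rerere.c.  It assumes the two sides of one hunk
have already been stripped of marker lines, labels and the common
ancestor part, and it only builds the bytes that would be fed to the
hash ("<first side><NUL><second side><NUL>"); the hashing itself is left
out.  The function name hunk_id_input() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write "<smaller side>\0<larger side>\0" into out
 * and return the number of bytes written, or 0 if out is too small.
 * Ordering the sides makes the result independent of which branch was
 * "ours" and which was "theirs".
 */
size_t hunk_id_input(const char *ours, const char *theirs,
		     char *out, size_t outsz)
{
	const char *first = ours, *second = theirs;
	size_t len1, len2;

	if (strcmp(first, second) > 0) {
		first = theirs;
		second = ours;
	}
	len1 = strlen(first) + 1;	/* include the NUL separator */
	len2 = strlen(second) + 1;
	if (len1 + len2 > outsz)
		return 0;
	memcpy(out, first, len1);
	memcpy(out + len1, second, len2);
	return len1 + len2;
}
```

Because of the ordering step, merging A into B and merging B into A
yield the same bytes, and so would map to the same conflict ID.
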

By the way, I just noticed that what the last section (i.e. nested
conflicts) says is completely bogus.  Nested conflicts are handled
by lengthening markers for conflict in inner-merge and paying
attention only to the outermost merge.  The only case where the
conflict markers can appear in the way depicted in the section is
when the contents from branches being merged had these conflict
marker looking strings from the beginning---that's "doctor it hurts
when I do this---don't do it then" situation.  The section may
describe correctly what the code happens to do when such garbage
is thrown at it, but I do not think it is a useful piece of
information about a designed behaviour.



* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-22 16:06           ` Junio C Hamano
@ 2018-08-22 20:34             ` Thomas Gummerer
  2018-08-22 21:07               ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-22 20:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On 08/22, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> > But why not add this to the git-rerere manpage? These technical docs
> > get way less exposure, and in this case we're not describing some
> > internal implementation detail, which the technical docs are for, but
> > something that's user-visible, let's put that in the user-visible
> > docs.
> 
> I actually consider that the documentation describes low-level
> internal implementation detail, which the end users do not care nor
> need to know in order to make use of "rerere".  How would it help
> the end-users to know that the common ancestor portion of diff3
> style conflict does not participate in conflict identification,
> sides of conflicts sometimes get swapped for easier indexing of
> conflicts, or conflict shapes are hashed via SHA-1 to determine
> which subdirectory of $GIT_DIR/rr-cache/ to use to store it, etc.?

Agreed, I don't think this would be very helpful for users.

> By the way, I just noticed that what the last section (i.e. nested
> conflicts) says is completely bogus.  Nested conflicts are handled
> by lengthening markers for conflict in inner-merge and paying
> attention only to the outermost merge.  The only case where the
> conflict markers can appear in the way depicted in the section is
> when the contents from branches being merged had these conflict
> marker looking strings from the beginning---that's "doctor it hurts
> when I do this---don't do it then" situation.  The section may
> describe correctly what the code happens to do when such garbage
> is thrown at it, but I do not think it is a useful piece of
> information about a designed behaviour.

Hmm, it does describe what happens in the code, which is what this
patch implements.  Maybe we should rephrase the title here?

Or are you suggesting dropping this patch (and the next one)
completely, as we don't want to try and handle the case where this
kind of garbage is thrown at 'rerere'?  I don't think it would make
sense to drop this documentation without dropping the patch itself, as
it does document how rerere handles this case.  Without this bit of
documentation (but with the code in this patch), the technical
'rerere' documentation feels incomplete to me.


* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-22 20:34             ` Thomas Gummerer
@ 2018-08-22 21:07               ` Junio C Hamano
  2018-08-24 21:56                 ` Thomas Gummerer
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-08-22 21:07 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Hmm, it does describe what happens in the code, which is what this
> patch implements.  Maybe we should rephrase the title here?
>
> Or are you suggesting dropping this patch (and the next one)
> completely, as we don't want to try and handle the case where this
> kind of garbage is thrown at 'rerere'?

I consider these two patches as merely attempting to punt a bit
better.  Once users start committing conflict-marker-looking lines
in the contents, and getting them involved in actual conflicts, I do
not think any approach (including what the original rerere uses
before this patch) that assumes the markers will neatly form a set of
blocks of text enclosed in << == >> will reliably step around such
broken contents.  E.g. it is entirely conceivable both branches have
the <<< beginning of conflict marker plus contents from the HEAD
before they recorded the marker that are identical, that diverge as
you scan the text down and get closer to ===, something like:

        side A                  side B
        --------------------    --------------------

        shared                  shared
        <<<<<<<                 <<<<<<<
        version before          version before
        these guys merged       these guys merged
        their ancestor          their ancestor
        versions                versions.
        but some                now some
        lines are different     lines are different
        =======                 ========
        and other               totally different
        contents                contents
        ...                     ...

And a merge of these may make <<< part shared (i.e. outside the
conflicted region) while lines near and below ==== part of conflict,
which would give us something like

        merge of side A & B
        -------------------

        shared                  
        <<<<<<<                 (this is part of contents)
        version before          
        these guys merged       
        their ancestor          
        <<<<<<< HEAD            (conflict marker)
        versions
        but some
        lines are different
        =======                 (this is part of contents)
        and other
        contents
        ...
        =======                 (conflict marker)
        versions.
        now some
        lines are different
        =======                 (this is part of contents)
        totally different
        contents
        ...
        >>>>>>> theirs          (conflict marker)

Depending on the shape of the original conflict that was committed,
we may have two versions of <<<, together with the real conflict
marker, but shared closing >>> marker.  With contents like that,
there is no way for us to split these lines into two groups at a
line '=====' (which one?) and swap to come up with the normalized
shape.
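
As a rough illustration of why the example above cannot be split, here
is a minimal sketch of a "do the markers cleanly nest?" check.  It is
not how rerere.c is written; the function names, the fixed marker size
of 7 and the depth limit are assumptions made for this example, and
diff3-style "|||||||" lines are ignored.

```c
#include <stddef.h>

#define MARKER_SIZE 7
#define MAX_DEPTH 64

/* 1 if line starts with exactly MARKER_SIZE copies of ch, then space/EOL. */
static int is_marker(const char *line, char ch)
{
	size_t i;

	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return line[i] == '\0' || line[i] == ' ' || line[i] == '\n';
}

/*
 * Hypothetical check: return 1 if every "<<<<<<<" is followed by exactly
 * one "=======" and one ">>>>>>>" at its own nesting level, 0 otherwise.
 */
int markers_cleanly_nest(const char **lines, size_t n)
{
	int seen_sep[MAX_DEPTH];	/* per level: has "=======" been seen? */
	int depth = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (is_marker(lines[i], '<')) {
			if (depth == MAX_DEPTH)
				return 0;
			seen_sep[depth++] = 0;
		} else if (is_marker(lines[i], '=')) {
			if (!depth || seen_sep[depth - 1])
				return 0;	/* stray or second separator */
			seen_sep[depth - 1] = 1;
		} else if (is_marker(lines[i], '>')) {
			if (!depth || !seen_sep[depth - 1])
				return 0;	/* close without open/separator */
			depth--;
		}
	}
	return depth == 0;
}
```

Fed the marker lines of the merged file above, this check fails at the
second "=======" line, which is the "which one?" problem: nothing in the
file says which separator belongs to the real conflict.
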

The original rerere algorithm would punt when such unmatched
markers are found, and deals with the "nested conflict" situation by
avoiding creating such a thing altogether.  I am sure your two
patches may make the code punt less, but I suspect that is not a
foolproof "solution" but more of a workaround, as I do not think it
is solvable, once you allow users to commit conflict-marker looking
strings in contents.  As the heuristics used in such a workaround
are very likely to change, and something the end-users should not
even rely on, I'd rather not document and promise the exact
behaviour---perhaps we should stress "don't do that" even stronger
instead.


* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-22 21:07               ` Junio C Hamano
@ 2018-08-24 21:56                 ` Thomas Gummerer
  2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
                                     ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-24 21:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On 08/22, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Hmm, it does describe what happens in the code, which is what this
> > patch implements.  Maybe we should rephrase the title here?
> >
> > Or are you suggesting dropping this patch (and the next one)
> > completely, as we don't want to try and handle the case where this
> > kind of garbage is thrown at 'rerere'?
> 
> I consider these two patches as merely attempting to punt a bit
> better.  Once users start committing conflict-marker-looking lines
> in the contents, and getting them involved in actual conflicts, I do
> not think any approach (including what the original rerere uses
> before this patch) that assumes the markers will neatly form a set of
> blocks of text enclosed in << == >> will reliably step around such
> broken contents.  E.g. it is entirely conceivable both branches have
> the <<< beginning of conflict marker plus contents from the HEAD
> before they recorded the marker that are identical, that diverge as
> you scan the text down and get closer to ===, something like:
> 
>         side A                  side B
>         --------------------    --------------------
> 
>         shared                  shared
>         <<<<<<<                 <<<<<<<
>         version before          version before
>         these guys merged       these guys merged
>         their ancestor          their ancestor
>         versions                versions.
>         but some                now some
>         lines are different     lines are different
>         =======                 ========
>         and other               totally different
>         contents                contents
>         ...                     ...
> 
> And a merge of these may make <<< part shared (i.e. outside the
> conflicted region) while lines near and below ==== part of conflict,
> which would give us something like
> 
>         merge of side A & B
>         -------------------
> 
>         shared                  
>         <<<<<<<                 (this is part of contents)
>         version before          
>         these guys merged       
>         their ancestor          
>         <<<<<<< HEAD            (conflict marker)
>         versions
>         but some
>         lines are different
>         =======                 (this is part of contents)
>         and other
>         contents
>         ...
>         =======                 (conflict marker)
>         versions.
>         now some
>         lines are different
>         =======                 (this is part of contents)
>         totally different
>         contents
>         ...
>         >>>>>>> theirs          (conflict marker)
> 
> Depending on the shape of the original conflict that was committed,
> we may have two versions of <<<, together with the real conflict
> marker, but shared closing >>> marker.  With contents like that,
> there is no way for us to split these lines into two groups at a
> line '=====' (which one?) and swap to come up with the normalized
> shape.
> 
> The original rerere algorithm would punt when such unmatched
> markers are found, and deals with the "nested conflict" situation by
> avoiding creating such a thing altogether.  I am sure your two
> patches may make the code punt less, but I suspect that is not a
> foolproof "solution" but more of a workaround, as I do not think it
> is solvable, once you allow users to commit conflict-marker looking
> strings in contents.

Agreed.  I think it may be solvable if we'd actually get the
information about what belongs to which side from the merge algorithm
directly.  But that sounds way more involved than what I'm able to
commit to for something that I don't foresee running into myself :)

>                       As the heuristics used in such a workaround
> are very likely to change, and something the end-users should not
> even rely on, I'd rather not document and promise the exact
> behaviour---perhaps we should stress "don't do that" even stronger
> instead.

Fair enough.  I thought of the technical documentation as something
that doesn't promise users anything, but rather describes how the
internals work right now, which is what this bit of documentation
attempted to write down.  But if we are worried about this giving end
users ideas then I definitely agree and we should get rid of this bit
of documentation.  I'll send a patch for that, and for adding a note
about "don't do that" in the man page.


* [PATCH 1/2] rerere: remove documentation for "nested conflicts"
  2018-08-24 21:56                 ` Thomas Gummerer
@ 2018-08-24 22:10                   ` Thomas Gummerer
  2018-08-24 22:10                     ` [PATCH 2/2] rerere: add note about files with existing conflict markers Thomas Gummerer
  2018-08-28 21:27                     ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
  2018-08-27 17:33                   ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
  2018-08-27 19:36                   ` Junio C Hamano
  2 siblings, 2 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-24 22:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Thomas Gummerer

4af32207bc ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.
However this is just a heuristic to punt on such conflicts better, and
the documentation might be misleading to users, in case we change the
heuristic in the future.

Remove this documentation to avoid potentially misleading users.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

The original series already made it into 'next', so these patches are
on top of that.  I also see it is marked as "will merge to master" in
the "What's cooking" email, so these two patches would be on top of
that.  If you are not planning to merge the series down to master
before 2.19, we could squash this into 10/11, otherwise I'm happy with
the patches on top.

 Documentation/technical/rerere.txt | 42 ------------------------------
 1 file changed, 42 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index e65ba9b0c6..3d10dbfa67 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,45 +138,3 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
-
-Nested conflicts
-~~~~~~~~~~~~~~~~
-
-Nested conflicts are handled very similarly to "simple" conflicts.
-Similar to simple conflicts, the conflict is first normalized by
-stripping the labels from conflict markers, stripping the common ancestor
-version, and the sorting the conflict hunks, both for the outer and the
-inner conflict.  This is done recursively, so any number of nested
-conflicts can be handled.
-
-The only difference is in how the conflict ID is calculated.  For the
-inner conflict, the conflict markers themselves are not stripped out
-before calculating the sha1.
-
-Say we have the following conflict for example:
-
-    <<<<<<< HEAD
-    1
-    =======
-    <<<<<<< HEAD
-    3
-    =======
-    2
-    >>>>>>> branch-2
-    >>>>>>> branch-3~
-
-After stripping out the labels of the conflict markers, and sorting
-the hunks, the conflict would look as follows:
-
-    <<<<<<<
-    1
-    =======
-    <<<<<<<
-    2
-    =======
-    3
-    >>>>>>>
-    >>>>>>>
-
-and finally the conflict ID would be calculated as:
-`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
-- 
2.18.0.1088.ge017bf2cd1



* [PATCH 2/2] rerere: add note about files with existing conflict markers
  2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
@ 2018-08-24 22:10                     ` Thomas Gummerer
  2018-08-28 21:27                     ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
  1 sibling, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-24 22:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Thomas Gummerer

When a file contains lines that look like conflict markers, 'git
rerere' may not be able to record a conflict resolution.
Emphasize that in the man page.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

Not sure if there may be a better place in the man page for this, but
this is the best I could come up with.

 Documentation/git-rerere.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-rerere.txt b/Documentation/git-rerere.txt
index 031f31fa47..036ea11528 100644
--- a/Documentation/git-rerere.txt
+++ b/Documentation/git-rerere.txt
@@ -211,6 +211,12 @@ would conflict the same way as the test merge you resolved earlier.
 'git rerere' will be run by 'git rebase' to help you resolve this
 conflict.
 
+[NOTE]
+'git rerere' relies on the conflict markers in the file to detect the
+conflict.  If the file already contains lines that look the same as
+lines with conflict markers, 'git rerere' may fail to record a
+conflict resolution.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.18.0.1088.ge017bf2cd1



* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-24 21:56                 ` Thomas Gummerer
  2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
@ 2018-08-27 17:33                   ` Junio C Hamano
  2018-08-28 22:05                     ` Thomas Gummerer
  2018-08-27 19:36                   ` Junio C Hamano
  2 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-08-27 17:33 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Agreed.  I think it may be solvable if we'd actually get the
> information about what belongs to which side from the merge algorithm
> directly.

The merge machinery may (eh, rather, "does") know, but we do not
have a way to express that in the working tree file that becomes the
input to the rerere algorithm, without making backward-incompatible
changes to the output format.

In a sense, that is already a solved problem, even though the
solution was done a bit differently ;-) If the end users need to
commit a half-resolved result with conflict markers (perhaps they
want to share it among themselves and work on resolving further),
what they can do is to also say that these are now part of contents,
not conflict markers, with conflict-marker-size attribute.  Perhaps
they prepare such a half-resolved result with unusual value of the
attribute, so that later merge of these with standard conflict
marker size will not get confused.
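
A minimal sketch of why a different marker size helps, assuming a
marker line is "exactly N copies of the marker character, followed by a
space or the end of the line".  The helper name looks_like_marker() is
hypothetical; it is not the function rerere.c uses.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if line begins with exactly marker_size
 * copies of ch, followed by a space, a newline or the end of the string.
 */
int looks_like_marker(const char *line, char ch, size_t marker_size)
{
	size_t i;

	for (i = 0; i < marker_size; i++)
		if (line[i] != ch)
			return 0;
	return line[i] == '\0' || line[i] == ' ' || line[i] == '\n';
}
```

With the default size of 7, a content line made of seven "=" characters
is indistinguishable from a separator.  With the size set to 32 for that
path, the same line is plain content, and only the 32-character lines
written by the merge count as markers.
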

That reminds me of another thing.  I've been running with these in
my $GIT_DIR/info/attributes file for the past few years.  Perhaps we
should add them to Documentation/.gitattributes and t/.gitattributes
so that project participants would all benefit?

Documentation/git-merge.txt	conflict-marker-size=32
Documentation/user-manual.txt	conflict-marker-size=32
t/t????-*.sh			conflict-marker-size=32


* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-24 21:56                 ` Thomas Gummerer
  2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
  2018-08-27 17:33                   ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
@ 2018-08-27 19:36                   ` Junio C Hamano
  2 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2018-08-27 19:36 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Fair enough.  I thought of the technical documentation as something
> that doesn't promise users anything, but rather describes how the
> internals work right now, which is what this bit of documentation
> attempted to write down.

That's fine.  I'd rather keep it but perhaps add a reminder to tell
readers that it works only when the merge of contents that already
record conflict markers happens to "cleanly nest".

Thanks.


* [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers
  2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
  2018-08-24 22:10                     ` [PATCH 2/2] rerere: add note about files with existing conflict markers Thomas Gummerer
@ 2018-08-28 21:27                     ` Thomas Gummerer
  2018-08-28 21:27                       ` [PATCH v2 2/2] rerere: add note about files with existing " Thomas Gummerer
  2018-08-29 16:04                       ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
  1 sibling, 2 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-28 21:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Thomas Gummerer

4af3220 ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.

However this is just a heuristic to punt on such conflicts better, and
doesn't deal with any unmatched conflict markers.  Make that clearer
in the documentation.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

> That's fine.  I'd rather keep it but perhaps add a reminder to tell
> readers that it works only when the merge of contents that already
> record conflict markers happens to "cleanly nest".

Yeah that makes sense.  Maybe something like this?

(replying to <xmqq4lffk3ez.fsf@gitster-ct.c.googlers.com> here to keep
the patches in one thread)

 Documentation/technical/rerere.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index e65ba9b0c6..8fefe51b00 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the outer and the
 inner conflict.  This is done recursively, so any number of nested
 conflicts can be handled.
 
+Note that this only works for conflict markers that "cleanly nest".  If
+there are any unmatched conflict markers, rerere will fail to handle
+the conflict and record a conflict resolution.
+
 The only difference is in how the conflict ID is calculated.  For the
 inner conflict, the conflict markers themselves are not stripped out
 before calculating the sha1.
-- 
2.18.0.1088.ge017bf2cd1



* [PATCH v2 2/2] rerere: add note about files with existing conflict markers
  2018-08-28 21:27                     ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
@ 2018-08-28 21:27                       ` Thomas Gummerer
  2018-08-29 16:04                       ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-28 21:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Thomas Gummerer

When a file contains lines that look like conflict markers, 'git
rerere' may not be able to record a conflict resolution.
Emphasize that in the man page, and mention a possible workaround for
the issue.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

Compared to v1, this now mentions the workaround of setting the
'conflict-marker-size', as mentioned in
<xmqqk1oblnor.fsf@gitster-ct.c.googlers.com>

 Documentation/git-rerere.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-rerere.txt b/Documentation/git-rerere.txt
index 031f31fa47..df310d2a58 100644
--- a/Documentation/git-rerere.txt
+++ b/Documentation/git-rerere.txt
@@ -211,6 +211,12 @@ would conflict the same way as the test merge you resolved earlier.
 'git rerere' will be run by 'git rebase' to help you resolve this
 conflict.
 
+[NOTE] 'git rerere' relies on the conflict markers in the file to
+detect the conflict.  If the file already contains lines that look the
+same as lines with conflict markers, 'git rerere' may fail to record a
+conflict resolution.  To work around this, the `conflict-marker-size`
+setting in linkgit:gitattributes[5] can be used.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.18.0.1088.ge017bf2cd1



* Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts
  2018-08-27 17:33                   ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
@ 2018-08-28 22:05                     ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-08-28 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On 08/27, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Agreed.  I think it may be solvable if we'd actually get the
> > information about what belongs to which side from the merge algorithm
> > directly.
> 
> The merge machinery may (eh, rather, "does") know, but we do not
> have a way to express that in the working tree file that becomes the
> input to the rerere algorithm, without making backward-incompatible
> changes to the output format.

Right, I was more thinking along the lines of using the stages in the
index to redo the merge and get the information that way.  But that
may not work as well with using 'git rerere' from the command line,
and have other backwards compatibility woes, that I didn't quite think
through yet :)

> In a sense, that is already a solved problem, even though the
> solution was done a bit differently ;-) If the end users need to
> commit a half-resolved result with conflict markers (perhaps they
> want to share it among themselves and work on resolving further),
> what they can do is to also say that these are now part of contents,
> not conflict markers, with conflict-marker-size attribute.  Perhaps
> they prepare such a half-resolved result with unusual value of the
> attribute, so that later merge of these with standard conflict
> marker size will not get confused.

Right, I wasn't aware of the conflict-marker-size attribute.  Thanks
for mentioning it!

> That reminds me of another thing.  I've been running with these in
> my $GIT_DIR/info/attributes file for the past few years.  Perhaps we
> should add them to Documentation/.gitattributes and t/.gitattributes
> so that project participants would all benefit?
> 
> Documentation/git-merge.txt	conflict-marker-size=32
> Documentation/user-manual.txt	conflict-marker-size=32
> t/t????-*.sh			conflict-marker-size=32

I do think that would be a good idea.  I am wondering what the right
value is though.  Seeing such a long conflict marker before I knew
about this setting would have struck me as odd, and probably made me
try and track down where it is coming from.  But on the other hand it
makes the conflict markers very easy to tell apart from the rest of
the lines that kind of look like conflict markers.

I think these tradeoffs probably make it worth setting them to a value
this large.

One other file that I see needs such a treatment is
Documentation/gitk.txt, where the first header is 7 "="s, and
therefore could confuse 'git rerere' as well.  Arguably that's less
important, as there's unlikely to be a conflict containing that line,
but it may be worth including for completeness' sake.

Maybe something like this?  Though it may be good for others to chime
in if they find this helpful or whether they find the long conflict
markers distracting.

--- >8 ---
Subject: [PATCH] .gitattributes: add conflict-marker-size for relevant files

Some files in git.git contain lines that look like conflict markers,
either in examples or tests, or in the case of Documentation/gitk.txt
because of the asciidoc heading.

Having conflict markers the same length as the actual content can be
confusing for humans, and is impossible to handle for tools like 'git
rerere'.  Work around that by setting the 'conflict-marker-size'
attribute for those files to 32, which makes the conflict markers
unambiguous.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 .gitattributes | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.gitattributes b/.gitattributes
index 1bdc91e282..49b3051641 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -9,3 +9,7 @@
 /command-list.txt eol=lf
 /GIT-VERSION-GEN eol=lf
 /mergetools/* eol=lf
+/Documentation/git-merge.txt conflict-marker-size=32
+/Documentation/gitk.txt conflict-marker-size=32
+/Documentation/user-manual.txt conflict-marker-size=32
+/t/t????-*.sh conflict-marker-size=32
-- 
2.18.0.1088.ge017bf2cd1

* Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers
  2018-08-28 21:27                     ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
  2018-08-28 21:27                       ` [PATCH v2 2/2] rerere: add note about files with existing " Thomas Gummerer
@ 2018-08-29 16:04                       ` Junio C Hamano
  2018-09-01  9:00                         ` Thomas Gummerer
  1 sibling, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2018-08-29 16:04 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Ævar Arnfjörð Bjarmason

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Yeah that makes sense.  Maybe something like this?
>
> (replying to <xmqq4lffk3ez.fsf@gitster-ct.c.googlers.com> here to keep
> the patches in one thread)
>
>  Documentation/technical/rerere.txt | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
> index e65ba9b0c6..8fefe51b00 100644
> --- a/Documentation/technical/rerere.txt
> +++ b/Documentation/technical/rerere.txt
> @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the outer and the
>  inner conflict.  This is done recursively, so any number of nested
>  conflicts can be handled.
>  
> +Note that this only works for conflict markers that "cleanly nest".  If
> +there are any unmatched conflict markers, rerere will fail to handle
> +the conflict and record a conflict resolution.
> +
>  The only difference is in how the conflict ID is calculated.  For the
>  inner conflict, the conflict markers themselves are not stripped out
>  before calculating the sha1.

Looks good to me except for the line count on the @@ line.  The
preimage ought to have 6 (not 7) lines and adding 4 new lines makes
it a 10 line postimage.  I wonder who miscounted the hunk---it is
immediately followed by the signature cut mark "-- \n" and some
tools (including Emacs's patch editing mode) are known to
misinterpret it as a preimage line that was removed.
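
The arithmetic can be checked with a small sketch in Python.  The
helper name hunk_counts is made up for illustration and this is not how
any particular tool computes the header; it simply counts context,
removed and added lines in the hunk body quoted above, and then shows
what happens if the "-- " signature line is counted as a removed line.

```python
def hunk_counts(body_lines):
    """Count (preimage, postimage) lines in a unified-diff hunk body."""
    pre = post = 0
    for line in body_lines:
        tag = line[:1]
        if tag == "-":
            pre += 1          # removed line: preimage only
        elif tag == "+":
            post += 1         # added line: postimage only
        elif tag == "\\":
            continue          # "\ No newline at end of file" note
        else:
            pre += 1          # context line: present on both sides
            post += 1
    return pre, post


# The hunk body from the patch under discussion.
HUNK = [
    " inner conflict.  This is done recursively, so any number of nested",
    " conflicts can be handled.",
    " ",
    '+Note that this only works for conflict markers that "cleanly nest".  If',
    "+there are any unmatched conflict markers, rerere will fail to handle",
    "+the conflict and record a conflict resolution.",
    "+",
    " The only difference is in how the conflict ID is calculated.  For the",
    " inner conflict, the conflict markers themselves are not stripped out",
    " before calculating the sha1.",
]

correct = hunk_counts(HUNK)
# Treating the signature cut mark as part of the hunk adds one
# bogus "removed" line to the preimage count.
miscounted = hunk_counts(HUNK + ["-- "])

print("@@ -149,%d +149,%d @@" % correct)
print("@@ -149,%d +149,%d @@" % miscounted)
```

The second header matches the one in the patch as sent, which is
consistent with the signature line having been counted as a removed
preimage line.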

What is curious is that your 2/2 counts the preimage lines
correctly.

In any case, both patches look good.  Will apply.

Thanks.

* Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers
  2018-08-29 16:04                       ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
@ 2018-09-01  9:00                         ` Thomas Gummerer
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Gummerer @ 2018-09-01  9:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð Bjarmason

On 08/29, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Yeah that makes sense.  Maybe something like this?
> >
> > (replying to <xmqq4lffk3ez.fsf@gitster-ct.c.googlers.com> here to keep
> > the patches in one thread)
> >
> >  Documentation/technical/rerere.txt | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
> > index e65ba9b0c6..8fefe51b00 100644
> > --- a/Documentation/technical/rerere.txt
> > +++ b/Documentation/technical/rerere.txt
> > @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the outer and the
> >  inner conflict.  This is done recursively, so any number of nested
> >  conflicts can be handled.
> >  
> > +Note that this only works for conflict markers that "cleanly nest".  If
> > +there are any unmatched conflict markers, rerere will fail to handle
> > +the conflict and record a conflict resolution.
> > +
> >  The only difference is in how the conflict ID is calculated.  For the
> >  inner conflict, the conflict markers themselves are not stripped out
> >  before calculating the sha1.
> 
> Looks good to me except for the line count on the @@ line.  The
> preimage ought to have 6 (not 7) lines and adding 4 new lines makes
> it a 10 line postimage.  I wonder who miscounted the hunk---it is
> immediately followed by the signature cut mark "-- \n" and some
> tools (including Emacs's patch editing mode) are known to
> misinterpret it as a preimage line that was removed.

Sorry about that.  Yeah, Emacs's patch editing mode doing that would
explain it.  I did a round of proof-reading in my editor, and spotted
a typo.  Since it was trivial to fix I just edited the patch
directly, and Emacs changed the line count.  I'll be more careful
about this in the future.

> What is curious is that your 2/2 counts the preimage lines
> correctly.

I only added some text after the '---' line in 2/2, but did not edit
the patch directly.  Emacs's patch editing mode only seems to update
the line counts when the hunk itself is edited, not when anything
surrounding it is changed, so the line count stayed the same as what
format-patch put in the file in the first place.

> In any case, both patches look good.  Will apply.

Thanks!

> Thanks.

Thread overview: 84+ messages
2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
2018-05-21 19:00   ` Stefan Beller
2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
2018-05-24  7:20   ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
2018-05-24  9:20   ` Junio C Hamano
2018-06-03 11:41     ` Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-05-24  9:47   ` Junio C Hamano
2018-05-24 18:54     ` Thomas Gummerer
2018-05-25  1:20       ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-05-24 10:02   ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-05-24 10:21   ` Junio C Hamano
2018-05-24 19:07     ` Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
2018-07-06 17:56     ` Junio C Hamano
2018-07-10 21:37       ` Thomas Gummerer
2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
2018-07-15 13:24       ` Simon Ruderich
2018-07-16 20:40         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-07-30 17:50       ` Junio C Hamano
2018-07-30 20:21         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-07-30 17:50       ` Junio C Hamano
2018-07-30 20:45         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-07-30 17:50       ` Junio C Hamano
2018-07-30 20:47         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-07-30 17:51       ` Junio C Hamano
2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-07-30 17:51       ` Junio C Hamano
2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-07-30 17:45       ` Junio C Hamano
2018-07-30 20:20         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-30 17:50     ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
2018-07-30 20:49       ` Thomas Gummerer
2018-08-05 17:20     ` [PATCH v4 " Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 02/11] rerere: lowercase error messages Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 04/11] rerere: mark strings for translation Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 06/11] rerere: fix crash with files rerere can't handle Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-08-05 17:20       ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-08-22 11:00         ` Ævar Arnfjörð Bjarmason
2018-08-22 16:06           ` Junio C Hamano
2018-08-22 20:34             ` Thomas Gummerer
2018-08-22 21:07               ` Junio C Hamano
2018-08-24 21:56                 ` Thomas Gummerer
2018-08-24 22:10                   ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
2018-08-24 22:10                     ` [PATCH 2/2] rerere: add not about files with existing conflict markers Thomas Gummerer
2018-08-28 21:27                     ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
2018-08-28 21:27                       ` [PATCH v2 2/2] rerere: add note about files with existing " Thomas Gummerer
2018-08-29 16:04                       ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
2018-09-01  9:00                         ` Thomas Gummerer
2018-08-27 17:33                   ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
2018-08-28 22:05                     ` Thomas Gummerer
2018-08-27 19:36                   ` Junio C Hamano
2018-08-05 17:20       ` [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
