All of lore.kernel.org
 help / color / mirror / Atom feed
* combined diff does not detect binary files and ignores -diff attribute
@ 2011-05-22 20:12 Jay Soffian
  2011-05-23 13:30 ` Michael J Gruber
  0 siblings, 1 reply; 37+ messages in thread
From: Jay Soffian @ 2011-05-22 20:12 UTC (permalink / raw)
  To: git

In a merge commit, I added some PNGs (so they were new to both
parents). I was surprised when I did a "git show" on the commit and
had the binary bits spewed to the terminal.

I thought it was just git not detecting the PNG as a binary file, but
adding "*.png -diff" to .git/info/attributes made no difference.

Just reporting this for now, as I don't have time to investigate.

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-22 20:12 combined diff does not detect binary files and ignores -diff attribute Jay Soffian
@ 2011-05-23 13:30 ` Michael J Gruber
  2011-05-23 15:17   ` Jay Soffian
  0 siblings, 1 reply; 37+ messages in thread
From: Michael J Gruber @ 2011-05-23 13:30 UTC (permalink / raw)
  To: Jay Soffian; +Cc: git

Jay Soffian venit, vidit, dixit 22.05.2011 22:12:
> In a merge commit, I added some PNGs (so they were new to both
> parents). I was surprised when I did a "git show" on the commit and
> had the binary bits spewed to the terminal.
> 
> I thought it was just git not detecting the PNG as a binary file, but
> adding "*.png -diff" to .git/info/attributes made no difference.
> 
> Just reporting this for now, as I don't have time to investigate.
> 
> j.

We know, but we somehow got stuck, see:

http://permalink.gmane.org/gmane.comp.version-control.git/171613

I don't have the time to follow up on this currently, it got out of
proportion.

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 13:30 ` Michael J Gruber
@ 2011-05-23 15:17   ` Jay Soffian
  2011-05-23 17:07     ` Junio C Hamano
  2011-05-23 18:11     ` Jeff King
  0 siblings, 2 replies; 37+ messages in thread
From: Jay Soffian @ 2011-05-23 15:17 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

On Mon, May 23, 2011 at 9:30 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> Jay Soffian venit, vidit, dixit 22.05.2011 22:12:
>> In a merge commit, I added some PNGs (so they were new to both
>> parents). I was surprised when I did a "git show" on the commit and
>> had the binary bits spewed to the terminal.
>>
>> I thought it was just git not detecting the PNG as a binary file, but
>> adding "*.png -diff" to .git/info/attributes made no difference.
>>
>> Just reporting this for now, as I don't have time to investigate.
>>
>> j.
>
> We know, but we somehow got stuck, see:
>
> http://permalink.gmane.org/gmane.comp.version-control.git/171613
>
> I don't have the time to follow up on this currently, it got out of
> proportion.

Drat, seems like the perfect being the enemy of the good there. I
wonder if Junio's patch in that thread isn't good enough for now.

Thanks,

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 15:17   ` Jay Soffian
@ 2011-05-23 17:07     ` Junio C Hamano
  2011-05-23 18:11     ` Jeff King
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2011-05-23 17:07 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git

Here is a quick summary of what I have in the "talked about, sounded
interesting, and never got completed" list.


gmane=http://thread.gmane.org/gmane.comp.version-control.git/

* Teach pack protocol to transfer estimated pack size and history
  depth to allow receiving end make more intelligent decision between
  unpack-objects and index-pack.

  $gmane/173610

* The "combined" diff always assumes it deals with text files. Teach it
  to punt on binary and also use the textconv.

  $gname/171613

* Audit use of symbolic-ref without -m in our scripts and for each
  case decide if leaving a reflog entry for the HEAD is desirable.
  If so, add them.

  $gmane/172516

* "git status" on intent-to-add index entries (say "I" in the first
  column instead of "A" for short status, add "(needs 'git add')" at the
  end of "new file: $path " in long status).

  $gmane/170658

* synopsys: use {} instead of () for grouping alternatives (Jari Aalto)
  $gmane/72243

* "[alias] st = status" and "cd .git && git st" (Jeff King)
  $gmane/72327

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 15:17   ` Jay Soffian
  2011-05-23 17:07     ` Junio C Hamano
@ 2011-05-23 18:11     ` Jeff King
  2011-05-23 20:15       ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2011-05-23 18:11 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git

On Mon, May 23, 2011 at 11:17:14AM -0400, Jay Soffian wrote:

> > We know, but we somehow got stuck, see:
> >
> > http://permalink.gmane.org/gmane.comp.version-control.git/171613
> >
> > I don't have the time to follow up on this currently, it got out of
> > proportion.
> 
> Drat, seems like the perfect being the enemy of the good there. I
> wonder if Junio's patch in that thread isn't good enough for now.

I'll have a patch series in a few minutes that at least handles the
binary case. I'll see how painful the textconv bit is on top of that.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 18:11     ` Jeff King
@ 2011-05-23 20:15       ` Jeff King
  2011-05-23 20:16         ` [PATCH 1/5] combine-diff: split header printing into its own function Jeff King
                           ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:15 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 02:11:47PM -0400, Jeff King wrote:

> On Mon, May 23, 2011 at 11:17:14AM -0400, Jay Soffian wrote:
> 
> > > We know, but we somehow got stuck, see:
> > >
> > > http://permalink.gmane.org/gmane.comp.version-control.git/171613
> > >
> > > I don't have the time to follow up on this currently, it got out of
> > > proportion.
> > 
> > Drat, seems like the perfect being the enemy of the good there. I
> > wonder if Junio's patch in that thread isn't good enough for now.
> 
> I'll have a patch series in a few minutes that at least handles the
> binary case. I'll see how painful the textconv bit is on top of that.

It turned out not too bad:

  [1/5]: combine-diff: split header printing into its own function
  [2/5]: combine-diff: calculate mode_differs earlier
  [3/5]: combine-diff: handle binary files as binary
  [4/5]: refactor get_textconv to not require diff_filespec
  [5/5]: combine-diff: respect textconv attributes

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/5] combine-diff: split header printing into its own function
  2011-05-23 20:15       ` Jeff King
@ 2011-05-23 20:16         ` Jeff King
  2011-05-23 20:16         ` [PATCH 2/5] combine-diff: calculate mode_differs earlier Jeff King
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:16 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

This is a pretty big logical chunk, so it makes the function
a bit more readable to have it split out. In addition, it
will make it easier to add an alternate code path for binary
diffs in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c |  135 ++++++++++++++++++++++++++++++-------------------------
 1 files changed, 74 insertions(+), 61 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 655fa89..309dc6c 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -681,6 +681,78 @@ static void dump_quoted_path(const char *head,
 	puts(buf.buf);
 }
 
+static void show_combined_header(struct combine_diff_path *elem,
+				 int num_parent,
+				 int dense,
+				 struct rev_info *rev,
+				 int mode_differs)
+{
+	struct diff_options *opt = &rev->diffopt;
+	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? 40 : DEFAULT_ABBREV;
+	const char *a_prefix = opt->a_prefix ? opt->a_prefix : "a/";
+	const char *b_prefix = opt->b_prefix ? opt->b_prefix : "b/";
+	int use_color = DIFF_OPT_TST(opt, COLOR_DIFF);
+	const char *c_meta = diff_get_color(use_color, DIFF_METAINFO);
+	const char *c_reset = diff_get_color(use_color, DIFF_RESET);
+	const char *abb;
+	int added = 0;
+	int deleted = 0;
+	int i;
+
+	if (rev->loginfo && !rev->no_commit_id)
+		show_log(rev);
+
+	dump_quoted_path(dense ? "diff --cc " : "diff --combined ",
+			 "", elem->path, c_meta, c_reset);
+	printf("%sindex ", c_meta);
+	for (i = 0; i < num_parent; i++) {
+		abb = find_unique_abbrev(elem->parent[i].sha1,
+					 abbrev);
+		printf("%s%s", i ? "," : "", abb);
+	}
+	abb = find_unique_abbrev(elem->sha1, abbrev);
+	printf("..%s%s\n", abb, c_reset);
+
+	if (mode_differs) {
+		deleted = !elem->mode;
+
+		/* We say it was added if nobody had it */
+		added = !deleted;
+		for (i = 0; added && i < num_parent; i++)
+			if (elem->parent[i].status !=
+			    DIFF_STATUS_ADDED)
+				added = 0;
+		if (added)
+			printf("%snew file mode %06o",
+			       c_meta, elem->mode);
+		else {
+			if (deleted)
+				printf("%sdeleted file ", c_meta);
+			printf("mode ");
+			for (i = 0; i < num_parent; i++) {
+				printf("%s%06o", i ? "," : "",
+				       elem->parent[i].mode);
+			}
+			if (elem->mode)
+				printf("..%06o", elem->mode);
+		}
+		printf("%s\n", c_reset);
+	}
+
+	if (added)
+		dump_quoted_path("--- ", "", "/dev/null",
+				 c_meta, c_reset);
+	else
+		dump_quoted_path("--- ", a_prefix, elem->path,
+				 c_meta, c_reset);
+	if (deleted)
+		dump_quoted_path("+++ ", "", "/dev/null",
+				 c_meta, c_reset);
+	else
+		dump_quoted_path("+++ ", b_prefix, elem->path,
+				 c_meta, c_reset);
+}
+
 static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			    int dense, struct rev_info *rev)
 {
@@ -692,13 +764,9 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	int mode_differs = 0;
 	int i, show_hunks;
 	int working_tree_file = is_null_sha1(elem->sha1);
-	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? 40 : DEFAULT_ABBREV;
-	const char *a_prefix, *b_prefix;
 	mmfile_t result_file;
 
 	context = opt->context;
-	a_prefix = opt->a_prefix ? opt->a_prefix : "a/";
-	b_prefix = opt->b_prefix ? opt->b_prefix : "b/";
 
 	/* Read the result of merge first */
 	if (!working_tree_file)
@@ -832,63 +900,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	show_hunks = make_hunks(sline, cnt, num_parent, dense);
 
 	if (show_hunks || mode_differs || working_tree_file) {
-		const char *abb;
-		int use_color = DIFF_OPT_TST(opt, COLOR_DIFF);
-		const char *c_meta = diff_get_color(use_color, DIFF_METAINFO);
-		const char *c_reset = diff_get_color(use_color, DIFF_RESET);
-		int added = 0;
-		int deleted = 0;
-
-		if (rev->loginfo && !rev->no_commit_id)
-			show_log(rev);
-		dump_quoted_path(dense ? "diff --cc " : "diff --combined ",
-				 "", elem->path, c_meta, c_reset);
-		printf("%sindex ", c_meta);
-		for (i = 0; i < num_parent; i++) {
-			abb = find_unique_abbrev(elem->parent[i].sha1,
-						 abbrev);
-			printf("%s%s", i ? "," : "", abb);
-		}
-		abb = find_unique_abbrev(elem->sha1, abbrev);
-		printf("..%s%s\n", abb, c_reset);
-
-		if (mode_differs) {
-			deleted = !elem->mode;
-
-			/* We say it was added if nobody had it */
-			added = !deleted;
-			for (i = 0; added && i < num_parent; i++)
-				if (elem->parent[i].status !=
-				    DIFF_STATUS_ADDED)
-					added = 0;
-			if (added)
-				printf("%snew file mode %06o",
-				       c_meta, elem->mode);
-			else {
-				if (deleted)
-					printf("%sdeleted file ", c_meta);
-				printf("mode ");
-				for (i = 0; i < num_parent; i++) {
-					printf("%s%06o", i ? "," : "",
-					       elem->parent[i].mode);
-				}
-				if (elem->mode)
-					printf("..%06o", elem->mode);
-			}
-			printf("%s\n", c_reset);
-		}
-		if (added)
-			dump_quoted_path("--- ", "", "/dev/null",
-					 c_meta, c_reset);
-		else
-			dump_quoted_path("--- ", a_prefix, elem->path,
-					 c_meta, c_reset);
-		if (deleted)
-			dump_quoted_path("+++ ", "", "/dev/null",
-					 c_meta, c_reset);
-		else
-			dump_quoted_path("+++ ", b_prefix, elem->path,
-					 c_meta, c_reset);
+		show_combined_header(elem, num_parent, dense, rev,
+				     mode_differs);
 		dump_sline(sline, cnt, num_parent,
 			   DIFF_OPT_TST(opt, COLOR_DIFF), result_deleted);
 	}
-- 
1.7.5.2.4.g43415

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/5] combine-diff: calculate mode_differs earlier
  2011-05-23 20:15       ` Jeff King
  2011-05-23 20:16         ` [PATCH 1/5] combine-diff: split header printing into its own function Jeff King
@ 2011-05-23 20:16         ` Jeff King
  2011-05-23 20:27         ` [PATCH 3/5] combine-diff: handle binary files as binary Jeff King
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:16 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

One loop combined both the patch generation and checking
whether there was any mode change to report. Let's factor
that into two separate loops, as we may care about the mode
change even if we are not generating patches (e.g., because
we are showing a binary diff, which will come in a future
patch).

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 309dc6c..2183184 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -845,6 +845,13 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			close(fd);
 	}
 
+	for (i = 0; i < num_parent; i++) {
+		if (elem->parent[i].mode != elem->mode) {
+			mode_differs = 1;
+			break;
+		}
+	}
+
 	for (cnt = 0, cp = result; cp < result + result_size; cp++) {
 		if (*cp == '\n')
 			cnt++;
@@ -893,8 +900,6 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 				     elem->parent[i].mode,
 				     &result_file, sline,
 				     cnt, i, num_parent, result_deleted);
-		if (elem->parent[i].mode != elem->mode)
-			mode_differs = 1;
 	}
 
 	show_hunks = make_hunks(sline, cnt, num_parent, dense);
-- 
1.7.5.2.4.g43415

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-23 20:15       ` Jeff King
  2011-05-23 20:16         ` [PATCH 1/5] combine-diff: split header printing into its own function Jeff King
  2011-05-23 20:16         ` [PATCH 2/5] combine-diff: calculate mode_differs earlier Jeff King
@ 2011-05-23 20:27         ` Jeff King
  2011-05-23 23:02           ` Junio C Hamano
  2011-05-30  6:33           ` Junio C Hamano
  2011-05-23 20:30         ` [PATCH 4/5] refactor get_textconv to not require diff_filespec Jeff King
                           ` (2 subsequent siblings)
  5 siblings, 2 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:27 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

The combined diff code path is totally different from the
regular diff code path, and didn't handle binary files at
all. The results of a combined diff on a binary file could
range from annoying (since we spewed binary garbage,
possibly upsetting the user's terminal), to wrong (embedded
NULs caused us to show incorrect diffs, with lines truncated
at the NUL character), to potential security problems
(embedded NULs could interfere with "-z" output, possibly
defeating policy hooks which parse diff output).

Instead, we consider a combined diff to be binary if any of
the input blobs is binary. To show a binary combined diff,
we indicate "Binary blobs differ"; the "index" meta line
will show which parents had which blob.

Signed-off-by: Jeff King <peff@peff.net>
---
The big question here is what we want the output to look like. I chose
this:

  diff --combined foo
  index 7ea6ded,6197570..9563691
  Binary files differ

which contains all of the information we have: for file "foo" there were
two parents (at 7ea6ded and 6197570 respectively), and the resolution
was 9563691. If you want to see full sha1s, you can use --full-index.
This format maps very closely to the non-combined case, which looks
like:

  diff --git a/foo b/foo
  index 6197570..9563691 100644
  Binary files a/foo and b/foo differ

Two other obvious options for format would be:

  1. A combined diff of fake text, like:

     - Binary blob 7ea6ded
      -Binary blob 6197570
     ++Binary blob 9563691

    But this contains no additional information over the index line, and
    I worry that it will be less obvious to readers what is going on.

  2. The --raw format, which looks like:

     ::100644 100644 100644 7ea6ded... 6197570... 9563691... MM binary

     but again, there's really no extra information there (the modes
     are there, but in the case of changed modes, we already produce a
     separate mode line like "mode 100644,100755 100755". And I
     personally find it way less readable (it's more compact, which is
     good for actual --raw mode, but in patch mode we are already
     spending quite a few lines per file).

So I think what I have is succint, complete, and matches the
non-combined case as well as we can.

 combine-diff.c                  |   37 ++++++++++++-
 t/t4048-diff-combined-binary.sh |  113 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 2 deletions(-)
 create mode 100755 t/t4048-diff-combined-binary.sh

diff --git a/combine-diff.c b/combine-diff.c
index 2183184..94a207f 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -7,6 +7,7 @@
 #include "xdiff-interface.h"
 #include "log-tree.h"
 #include "refs.h"
+#include "userdiff.h"
 
 static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent)
 {
@@ -685,7 +686,8 @@ static void show_combined_header(struct combine_diff_path *elem,
 				 int num_parent,
 				 int dense,
 				 struct rev_info *rev,
-				 int mode_differs)
+				 int mode_differs,
+				 int show_file_header)
 {
 	struct diff_options *opt = &rev->diffopt;
 	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? 40 : DEFAULT_ABBREV;
@@ -739,6 +741,9 @@ static void show_combined_header(struct combine_diff_path *elem,
 		printf("%s\n", c_reset);
 	}
 
+	if (!show_file_header)
+		return;
+
 	if (added)
 		dump_quoted_path("--- ", "", "/dev/null",
 				 c_meta, c_reset);
@@ -765,8 +770,13 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	int i, show_hunks;
 	int working_tree_file = is_null_sha1(elem->sha1);
 	mmfile_t result_file;
+	struct userdiff_driver *userdiff;
+	int is_binary;
 
 	context = opt->context;
+	userdiff = userdiff_find_by_path(elem->path);
+	if (!userdiff)
+		userdiff = userdiff_find_by_name("default");
 
 	/* Read the result of merge first */
 	if (!working_tree_file)
@@ -852,6 +862,29 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		}
 	}
 
+	if (userdiff->binary != -1)
+		is_binary = userdiff->binary;
+	else {
+		is_binary = buffer_is_binary(result, result_size);
+		for (i = 0; !is_binary && i < num_parent; i++) {
+			char *buf;
+			unsigned long size;
+			buf = grab_blob(elem->parent[i].sha1,
+					elem->parent[i].mode,
+					&size);
+			if (buffer_is_binary(buf, size))
+				is_binary = 1;
+			free(buf);
+		}
+	}
+	if (is_binary) {
+		show_combined_header(elem, num_parent, dense, rev,
+				     mode_differs, 0);
+		printf("Binary files differ\n");
+		free(result);
+		return;
+	}
+
 	for (cnt = 0, cp = result; cp < result + result_size; cp++) {
 		if (*cp == '\n')
 			cnt++;
@@ -906,7 +939,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 
 	if (show_hunks || mode_differs || working_tree_file) {
 		show_combined_header(elem, num_parent, dense, rev,
-				     mode_differs);
+				     mode_differs, 1);
 		dump_sline(sline, cnt, num_parent,
 			   DIFF_OPT_TST(opt, COLOR_DIFF), result_deleted);
 	}
diff --git a/t/t4048-diff-combined-binary.sh b/t/t4048-diff-combined-binary.sh
new file mode 100755
index 0000000..a943994
--- /dev/null
+++ b/t/t4048-diff-combined-binary.sh
@@ -0,0 +1,113 @@
+#!/bin/sh
+
+test_description='combined and merge diff handle binary files and textconv'
+. ./test-lib.sh
+
+test_expect_success 'setup binary merge conflict' '
+	echo oneQ1 | q_to_nul >binary &&
+	git add binary &&
+	git commit -m one &&
+	echo twoQ2 | q_to_nul >binary &&
+	git commit -a -m two &&
+	git checkout -b branch-binary HEAD^ &&
+	echo threeQ3 | q_to_nul >binary &&
+	git commit -a -m three &&
+	test_must_fail git merge master &&
+	echo resolvedQhooray | q_to_nul >binary &&
+	git commit -a -m resolved
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --git a/binary b/binary
+index 7ea6ded..9563691 100644
+Binary files a/binary and b/binary differ
+resolved
+
+diff --git a/binary b/binary
+index 6197570..9563691 100644
+Binary files a/binary and b/binary differ
+EOF
+test_expect_success 'diff -m indicates binary-ness' '
+	git show --format=%s -m >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --combined binary
+index 7ea6ded,6197570..9563691
+Binary files differ
+EOF
+test_expect_success 'diff -c indicates binary-ness' '
+	git show --format=%s -c >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --cc binary
+index 7ea6ded,6197570..9563691
+Binary files differ
+EOF
+test_expect_success 'diff --cc indicates binary-ness' '
+	git show --format=%s --cc >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup non-binary with binary attribute' '
+	git checkout master &&
+	test_commit one text &&
+	test_commit two text &&
+	git checkout -b branch-text HEAD^ &&
+	test_commit three text &&
+	test_must_fail git merge master &&
+	test_commit resolved text &&
+	echo text -diff >.gitattributes
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --git a/text b/text
+index 2bdf67a..2ab19ae 100644
+Binary files a/text and b/text differ
+resolved
+
+diff --git a/text b/text
+index f719efd..2ab19ae 100644
+Binary files a/text and b/text differ
+EOF
+test_expect_success 'diff -m respects binary attribute' '
+	git show --format=%s -m >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --combined text
+index 2bdf67a,f719efd..2ab19ae
+Binary files differ
+EOF
+test_expect_success 'diff -c respects binary attribute' '
+	git show --format=%s -c >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --cc text
+index 2bdf67a,f719efd..2ab19ae
+Binary files differ
+EOF
+test_expect_success 'diff --cc respects binary attribute' '
+	git show --format=%s --cc >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
1.7.5.2.4.g43415

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/5] refactor get_textconv to not require diff_filespec
  2011-05-23 20:15       ` Jeff King
                           ` (2 preceding siblings ...)
  2011-05-23 20:27         ` [PATCH 3/5] combine-diff: handle binary files as binary Jeff King
@ 2011-05-23 20:30         ` Jeff King
  2011-05-23 20:31         ` [PATCH 5/5] combine-diff: respect textconv attributes Jeff King
  2011-05-23 22:55         ` combined diff does not detect binary files and ignores -diff attribute Jay Soffian
  5 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:30 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

This function actually does two things:

  1. Load the userdiff driver for the filespec.

  2. Decide whether the driver has a textconv component, and
     initialize the textconv cache if applicable.

Only part (1) requires the filespec object, and some callers
may not have a filespec at all. So let's split them it into
two functions, and put part (2) with the userdiff code,
which is a better fit.

Signed-off-by: Jeff King <peff@peff.net>
---
It's a little funny to have "userdiff_get_textconv(foo)" return either
NULL or "foo"; you are not really getting the textconv but rather seeing
if there is a textconv available, and if so, preparing it. But that is
not a new thing; get_textconv was already like that.

One possible refactoring would be that the textconv (including the
cache) should be a sub-structure of the userdiff driver. I chose not to
do that refactoring because of the amount of noise it would introduce in
diff.c. But I can do it if we care.

 diff.c     |   14 +-------------
 userdiff.c |   17 +++++++++++++++++
 userdiff.h |    2 ++
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index ba5f7aa..3f538f5 100644
--- a/diff.c
+++ b/diff.c
@@ -1976,19 +1976,7 @@ struct userdiff_driver *get_textconv(struct diff_filespec *one)
 		return NULL;
 
 	diff_filespec_load_driver(one);
-	if (!one->driver->textconv)
-		return NULL;
-
-	if (one->driver->textconv_want_cache && !one->driver->textconv_cache) {
-		struct notes_cache *c = xmalloc(sizeof(*c));
-		struct strbuf name = STRBUF_INIT;
-
-		strbuf_addf(&name, "textconv/%s", one->driver->name);
-		notes_cache_init(c, name.buf, one->driver->textconv);
-		one->driver->textconv_cache = c;
-	}
-
-	return one->driver;
+	return userdiff_get_textconv(one->driver);
 }
 
 static void builtin_diff(const char *name_a,
diff --git a/userdiff.c b/userdiff.c
index 1ff4797..5d62e79 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -267,3 +267,20 @@ struct userdiff_driver *userdiff_find_by_path(const char *path)
 		return NULL;
 	return userdiff_find_by_name(check.value);
 }
+
+struct userdiff_driver *userdiff_get_textconv(struct userdiff_driver *driver)
+{
+	if (!driver->textconv)
+		return NULL;
+
+	if (driver->textconv_want_cache && !driver->textconv_cache) {
+		struct notes_cache *c = xmalloc(sizeof(*c));
+		struct strbuf name = STRBUF_INIT;
+
+		strbuf_addf(&name, "textconv/%s", driver->name);
+		notes_cache_init(c, name.buf, driver->textconv);
+		driver->textconv_cache = c;
+	}
+
+	return driver;
+}
diff --git a/userdiff.h b/userdiff.h
index 942d594..4a7e78f 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -23,4 +23,6 @@ int userdiff_config(const char *k, const char *v);
 struct userdiff_driver *userdiff_find_by_name(const char *name);
 struct userdiff_driver *userdiff_find_by_path(const char *path);
 
+struct userdiff_driver *userdiff_get_textconv(struct userdiff_driver *driver);
+
 #endif /* USERDIFF */
-- 
1.7.5.2.4.g43415

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/5] combine-diff: respect textconv attributes
  2011-05-23 20:15       ` Jeff King
                           ` (3 preceding siblings ...)
  2011-05-23 20:30         ` [PATCH 4/5] refactor get_textconv to not require diff_filespec Jeff King
@ 2011-05-23 20:31         ` Jeff King
  2011-05-23 22:47           ` Junio C Hamano
  2011-05-24 16:20           ` Junio C Hamano
  2011-05-23 22:55         ` combined diff does not detect binary files and ignores -diff attribute Jay Soffian
  5 siblings, 2 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 20:31 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

When doing a combined diff, we did not respect textconv
attributes at all. This generally lead to us printing
"Binary files differ" when we could show a combined diff of
the converted text.

This patch converts file contents according to textconv
attributes. The implementation is slightly ugly; because the
textconv code is tightly linked with the diff_filespec code,
we temporarily create a diff_filespec during conversion. In
practice, though, this should not create a performance
problem.

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c                  |   41 +++++++++++++----
 t/t4048-diff-combined-binary.sh |   99 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 94a207f..fbed374 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -93,7 +93,8 @@ struct sline {
 	unsigned long *p_lno;
 };
 
-static char *grab_blob(const unsigned char *sha1, unsigned int mode, unsigned long *size)
+static char *grab_blob(const unsigned char *sha1, unsigned int mode,
+		       unsigned long *size, struct userdiff_driver *textconv)
 {
 	char *blob;
 	enum object_type type;
@@ -106,6 +107,13 @@ static char *grab_blob(const unsigned char *sha1, unsigned int mode, unsigned lo
 		/* deleted blob */
 		*size = 0;
 		return xcalloc(1, 1);
+	} else if (textconv) {
+		/* yuck, the textconv code is linked heavily with
+		 * filespecs */
+		struct diff_filespec *df = alloc_filespec("");
+		fill_filespec(df, sha1, mode);
+		*size = fill_textconv(textconv, df, &blob);
+		free_filespec(df);
 	} else {
 		blob = read_sha1_file(sha1, &type, size);
 		if (type != OBJ_BLOB)
@@ -205,7 +213,8 @@ static void consume_line(void *state_, char *line, unsigned long len)
 static void combine_diff(const unsigned char *parent, unsigned int mode,
 			 mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
-			 int num_parent, int result_deleted)
+			 int num_parent, int result_deleted,
+			 struct userdiff_driver *textconv)
 {
 	unsigned int p_lno, lno;
 	unsigned long nmask = (1UL << n);
@@ -218,7 +227,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	if (result_deleted)
 		return; /* result deleted */
 
-	parent_file.ptr = grab_blob(parent, mode, &sz);
+	parent_file.ptr = grab_blob(parent, mode, &sz, textconv);
 	parent_file.size = sz;
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
@@ -771,16 +780,20 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	int working_tree_file = is_null_sha1(elem->sha1);
 	mmfile_t result_file;
 	struct userdiff_driver *userdiff;
+	struct userdiff_driver *textconv = NULL;
 	int is_binary;
 
 	context = opt->context;
 	userdiff = userdiff_find_by_path(elem->path);
 	if (!userdiff)
 		userdiff = userdiff_find_by_name("default");
+	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV))
+		textconv = userdiff_get_textconv(userdiff);
 
 	/* Read the result of merge first */
 	if (!working_tree_file)
-		result = grab_blob(elem->sha1, elem->mode, &result_size);
+		result = grab_blob(elem->sha1, elem->mode, &result_size,
+				   textconv);
 	else {
 		/* Used by diff-tree to read from the working tree */
 		struct stat st;
@@ -803,9 +816,16 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		} else if (S_ISDIR(st.st_mode)) {
 			unsigned char sha1[20];
 			if (resolve_gitlink_ref(elem->path, "HEAD", sha1) < 0)
-				result = grab_blob(elem->sha1, elem->mode, &result_size);
+				result = grab_blob(elem->sha1, elem->mode,
+						   &result_size, NULL);
 			else
-				result = grab_blob(sha1, elem->mode, &result_size);
+				result = grab_blob(sha1, elem->mode,
+						   &result_size, NULL);
+		} else if (textconv) {
+			struct diff_filespec *df = alloc_filespec(elem->path);
+			fill_filespec(df, null_sha1, st.st_mode);
+			result_size = fill_textconv(textconv, df, &result);
+			free_filespec(df);
 		} else if (0 <= (fd = open(elem->path, O_RDONLY))) {
 			size_t len = xsize_t(st.st_size);
 			ssize_t done;
@@ -862,7 +882,9 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		}
 	}
 
-	if (userdiff->binary != -1)
+	if (textconv)
+		is_binary = 0;
+	else if (userdiff->binary != -1)
 		is_binary = userdiff->binary;
 	else {
 		is_binary = buffer_is_binary(result, result_size);
@@ -871,7 +893,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			unsigned long size;
 			buf = grab_blob(elem->parent[i].sha1,
 					elem->parent[i].mode,
-					&size);
+					&size, NULL);
 			if (buffer_is_binary(buf, size))
 				is_binary = 1;
 			free(buf);
@@ -932,7 +954,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			combine_diff(elem->parent[i].sha1,
 				     elem->parent[i].mode,
 				     &result_file, sline,
-				     cnt, i, num_parent, result_deleted);
+				     cnt, i, num_parent, result_deleted,
+				     textconv);
 	}
 
 	show_hunks = make_hunks(sline, cnt, num_parent, dense);
diff --git a/t/t4048-diff-combined-binary.sh b/t/t4048-diff-combined-binary.sh
index a943994..87a8949 100755
--- a/t/t4048-diff-combined-binary.sh
+++ b/t/t4048-diff-combined-binary.sh
@@ -110,4 +110,103 @@ test_expect_success 'diff --cc respects binary attribute' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup textconv attribute' '
+	echo "text diff=upcase" >.gitattributes &&
+	git config diff.upcase.textconv "tr a-z A-Z <"
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --git a/text b/text
+index 2bdf67a..2ab19ae 100644
+--- a/text
++++ b/text
+@@ -1 +1 @@
+-THREE
++RESOLVED
+resolved
+
+diff --git a/text b/text
+index f719efd..2ab19ae 100644
+--- a/text
++++ b/text
+@@ -1 +1 @@
+-TWO
++RESOLVED
+EOF
+test_expect_success 'diff -m respects textconv attribute' '
+	git show --format=%s -m >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --combined text
+index 2bdf67a,f719efd..2ab19ae
+--- a/text
++++ b/text
+@@@ -1,1 -1,1 +1,1 @@@
+- THREE
+ -TWO
+++RESOLVED
+EOF
+test_expect_success 'diff -c respects textconv attribute' '
+	git show --format=%s -c >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+resolved
+
+diff --cc text
+index 2bdf67a,f719efd..2ab19ae
+--- a/text
++++ b/text
+@@@ -1,1 -1,1 +1,1 @@@
+- THREE
+ -TWO
+++RESOLVED
+EOF
+test_expect_success 'diff --cc respects textconv attribute' '
+	git show --format=%s --cc >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+diff --combined text
+index 2bdf67a,f719efd..2ab19ae
+--- a/text
++++ b/text
+@@@ -1,1 -1,1 +1,1 @@@
+- three
+ -two
+++resolved
+EOF
+test_expect_success 'diff-tree plumbing does not respect textconv' '
+	git diff-tree HEAD -c -p >full &&
+	tail -n +2 full >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+diff --cc text
+index 2bdf67a,f719efd..0000000
+--- a/text
++++ b/text
+@@@ -1,1 -1,1 +1,5 @@@
+++<<<<<<< HEAD
+ +THREE
+++=======
++ TWO
+++>>>>>>> MASTER
+EOF
+test_expect_success 'diff --cc respects textconv on worktree file' '
+	git reset --hard HEAD^ &&
+	test_must_fail git merge master &&
+	git diff >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.7.5.2.4.g43415

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/5] combine-diff: respect textconv attributes
  2011-05-23 20:31         ` [PATCH 5/5] combine-diff: respect textconv attributes Jeff King
@ 2011-05-23 22:47           ` Junio C Hamano
  2011-05-23 23:39             ` Jeff King
  2011-05-24 16:20           ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-05-23 22:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

> +	} else if (textconv) {
> +		/* yuck, the textconv code is linked heavily with
> +		 * filespecs */
> +		struct diff_filespec *df = alloc_filespec("");
> +		fill_filespec(df, sha1, mode);
> +		*size = fill_textconv(textconv, df, &blob);

I thought your 4/5 talked something about not requiring filespec for
textconv handling...?  Is it still yuck?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 20:15       ` Jeff King
                           ` (4 preceding siblings ...)
  2011-05-23 20:31         ` [PATCH 5/5] combine-diff: respect textconv attributes Jeff King
@ 2011-05-23 22:55         ` Jay Soffian
  2011-05-23 23:31           ` Jay Soffian
  2011-05-23 23:41           ` Jeff King
  5 siblings, 2 replies; 37+ messages in thread
From: Jay Soffian @ 2011-05-23 22:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 4:15 PM, Jeff King <peff@peff.net> wrote:
> It turned out not too bad:
>
>  [1/5]: combine-diff: split header printing into its own function
>  [2/5]: combine-diff: calculate mode_differs earlier
>  [3/5]: combine-diff: handle binary files as binary

Tested-by: Jay Soffian <jaysoffian@gmail.com>

In a real-world merge, png's were correctly shown as "Binary files
differ". I also tested with "*.xib -diff" and that worked as expected.

However, custom diff drivers (still) don't work. :-)

Also read the patches and they LGTM.

>  [4/5]: refactor get_textconv to not require diff_filespec
>  [5/5]: combine-diff: respect textconv attributes

Didn't test explicitly, but don't see anything obviously wrong.

Thanks Peff,

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-23 20:27         ` [PATCH 3/5] combine-diff: handle binary files as binary Jeff King
@ 2011-05-23 23:02           ` Junio C Hamano
  2011-05-23 23:50             ` Jeff King
  2011-05-30  6:33           ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-05-23 23:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

> The big question here is what we want the output to look like. I chose
> this:
>
>   diff --combined foo
>   index 7ea6ded,6197570..9563691
>   Binary files differ
>
> which contains all of the information we have: for file "foo" there were
> two parents (at 7ea6ded and 6197570 respectively), and the resolution
> was 9563691.

Yeah, I think the above is sane and matches more-or-less the conclusion of
the previous discussion Michael quoted.

I've only read up to this patch (I wanted to conclude the first round of
today's integration round before 17:00 local) and they all looked good;
will queue the whole thing to 'pu' in the meantime.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 22:55         ` combined diff does not detect binary files and ignores -diff attribute Jay Soffian
@ 2011-05-23 23:31           ` Jay Soffian
  2011-05-23 23:49             ` Jeff King
  2011-05-23 23:41           ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Jay Soffian @ 2011-05-23 23:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 6:55 PM, Jay Soffian <jaysoffian@gmail.com> wrote:
>>  [4/5]: refactor get_textconv to not require diff_filespec
>>  [5/5]: combine-diff: respect textconv attributes
>
> Didn't test explicitly, but don't see anything obviously wrong.

Okay, tested these now. I noticed two tiny issues unrelated to this
series, but rather since this is the first time I've used textconv.

1. The temporary path git generates loses the original file extension.
At least in the case of Mac OS X, ibtool gets angry when the ".xib"
isn' there. I worked around this with a wrapper, but maybe git could
preserve the extension.

2. It's not really clear to me when one should use textconv vs a
custom diff driver. (Also, --no-textconv isn't documented...).

Thanks,

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/5] combine-diff: respect textconv attributes
  2011-05-23 22:47           ` Junio C Hamano
@ 2011-05-23 23:39             ` Jeff King
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 23:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jay Soffian, Michael J Gruber, git

On Mon, May 23, 2011 at 03:47:40PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +	} else if (textconv) {
> > +		/* yuck, the textconv code is linked heavily with
> > +		 * filespecs */
> > +		struct diff_filespec *df = alloc_filespec("");
> > +		fill_filespec(df, sha1, mode);
> > +		*size = fill_textconv(textconv, df, &blob);
> 
> I thought your 4/5 talked something about not requiring filespec for
> textconv handling...?  Is it still yuck?

Yeah, I removed the requirement for get_textconv to need a filespec.
Which is nice, because it means the non-textconv codepath doesn't deal
with this at all.

But removing the filespec from fill_textconv is much nastier, because it
relies on diff.c's prepare_temp_file, which also uses the filespec.

So it's still yuck, but limited to this little conditional. It would be
nice if we used filespecs all the way through the combine-diff code
path, but that was a bit more refactoring than I was up for today.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 22:55         ` combined diff does not detect binary files and ignores -diff attribute Jay Soffian
  2011-05-23 23:31           ` Jay Soffian
@ 2011-05-23 23:41           ` Jeff King
  2011-05-24  4:46             ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2011-05-23 23:41 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 06:55:07PM -0400, Jay Soffian wrote:

> On Mon, May 23, 2011 at 4:15 PM, Jeff King <peff@peff.net> wrote:
> > It turned out not too bad:
> >
> >  [1/5]: combine-diff: split header printing into its own function
> >  [2/5]: combine-diff: calculate mode_differs earlier
> >  [3/5]: combine-diff: handle binary files as binary
> 
> Tested-by: Jay Soffian <jaysoffian@gmail.com>
> 
> In a real-world merge, png's were correctly shown as "Binary files
> differ". I also tested with "*.xib -diff" and that worked as expected.
> 
> However, custom diff drivers (still) don't work. :-)

Yeah, I didn't add any support for that. I'm not sure what it should do;
custom diff drivers don't know how to handle combined diff, do they?

If you write me a test case that explains what _should_ happen, I'll see
what I can do. :)

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 23:31           ` Jay Soffian
@ 2011-05-23 23:49             ` Jeff King
  2011-05-24  0:59               ` Jay Soffian
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2011-05-23 23:49 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 07:31:02PM -0400, Jay Soffian wrote:

> 1. The temporary path git generates loses the original file extension.
> At least in the case of Mac OS X, ibtool gets angry when the ".xib"
> isn' there. I worked around this with a wrapper, but maybe git could
> preserve the extension.

Hmm, I thought we did preserve the extension. I think it may actually be
related to me making a fake filespec and not passing the path in.

Does the (totally untested) patch below fix it for you?

> 2. It's not really clear to me when one should use textconv vs a
> custom diff driver. (Also, --no-textconv isn't documented...).

The motivation for textconv was originally a combination of "when you
are too lazy to write a full diff driver" and "when you like how git
formats the diff with pretty colors and word-diff". But now I think we
can add to that:

  1. When you want something fancy and git-ish like combined diff.

  2. When you want free caching to speed up repeated log viewing (try
     setting diff.*.cachetextconv to true).

Maybe it is worth adding an "advantages of textconv" section to
gitattributes(5) (the advantage of a full external diff command is that
it can be more flexible, or even graphical in nature).

Anyway, here's the patch.

---
diff --git a/combine-diff.c b/combine-diff.c
index fbed374..a4f2ff5 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -94,7 +94,8 @@ struct sline {
 };
 
 static char *grab_blob(const unsigned char *sha1, unsigned int mode,
-		       unsigned long *size, struct userdiff_driver *textconv)
+		       unsigned long *size, struct userdiff_driver *textconv,
+		       const char *path)
 {
 	char *blob;
 	enum object_type type;
@@ -110,7 +111,7 @@ static char *grab_blob(const unsigned char *sha1, unsigned int mode,
 	} else if (textconv) {
 		/* yuck, the textconv code is linked heavily with
 		 * filespecs */
-		struct diff_filespec *df = alloc_filespec("");
+		struct diff_filespec *df = alloc_filespec(path);
 		fill_filespec(df, sha1, mode);
 		*size = fill_textconv(textconv, df, &blob);
 		free_filespec(df);
@@ -214,7 +215,8 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 			 mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
 			 int num_parent, int result_deleted,
-			 struct userdiff_driver *textconv)
+			 struct userdiff_driver *textconv,
+			 const char *path)
 {
 	unsigned int p_lno, lno;
 	unsigned long nmask = (1UL << n);
@@ -227,7 +229,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	if (result_deleted)
 		return; /* result deleted */
 
-	parent_file.ptr = grab_blob(parent, mode, &sz, textconv);
+	parent_file.ptr = grab_blob(parent, mode, &sz, textconv, path);
 	parent_file.size = sz;
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
@@ -793,7 +795,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	/* Read the result of merge first */
 	if (!working_tree_file)
 		result = grab_blob(elem->sha1, elem->mode, &result_size,
-				   textconv);
+				   textconv, elem->path);
 	else {
 		/* Used by diff-tree to read from the working tree */
 		struct stat st;
@@ -817,10 +819,10 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			unsigned char sha1[20];
 			if (resolve_gitlink_ref(elem->path, "HEAD", sha1) < 0)
 				result = grab_blob(elem->sha1, elem->mode,
-						   &result_size, NULL);
+						   &result_size, NULL, NULL);
 			else
 				result = grab_blob(sha1, elem->mode,
-						   &result_size, NULL);
+						   &result_size, NULL, NULL);
 		} else if (textconv) {
 			struct diff_filespec *df = alloc_filespec(elem->path);
 			fill_filespec(df, null_sha1, st.st_mode);
@@ -893,7 +895,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			unsigned long size;
 			buf = grab_blob(elem->parent[i].sha1,
 					elem->parent[i].mode,
-					&size, NULL);
+					&size, NULL, NULL);
 			if (buffer_is_binary(buf, size))
 				is_binary = 1;
 			free(buf);
@@ -955,7 +957,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 				     elem->parent[i].mode,
 				     &result_file, sline,
 				     cnt, i, num_parent, result_deleted,
-				     textconv);
+				     textconv, elem->path);
 	}
 
 	show_hunks = make_hunks(sline, cnt, num_parent, dense);

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-23 23:02           ` Junio C Hamano
@ 2011-05-23 23:50             ` Jeff King
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-23 23:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jay Soffian, Michael J Gruber, git

On Mon, May 23, 2011 at 04:02:58PM -0700, Junio C Hamano wrote:

> Yeah, I think the above is sane and matches more-or-less the conclusion of
> the previous discussion Michael quoted.
> 
> I've only read up to this patch (I wanted to conclude the first round of
> today's integration round before 17:00 local) and they all looked good;
> will queue the whole thing to 'pu' in the meantime.

Thanks. I think there is a minor tweak to 5/5 that I posted elsewhere in
the thread. I'll do more testing with it and post tomorrow. I'm done
gitting for today.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 23:49             ` Jeff King
@ 2011-05-24  0:59               ` Jay Soffian
  0 siblings, 0 replies; 37+ messages in thread
From: Jay Soffian @ 2011-05-24  0:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, git, Junio C Hamano

On Mon, May 23, 2011 at 7:49 PM, Jeff King <peff@peff.net> wrote:
> Hmm, I thought we did preserve the extension. I think it may actually be
> related to me making a fake filespec and not passing the path in.
>
> Does the (totally untested) patch below fix it for you?

Yes.

>> 2. It's not really clear to me when one should use textconv vs a
>> custom diff driver. (Also, --no-textconv isn't documented...).
>
> The motivation for textconv was originally a combination of "when you
> are too lazy to write a full diff driver" and "when you like how git
> formats the diff with pretty colors and word-diff". But now I think we
> can add to that:
>
>  1. When you want something fancy and git-ish like combined diff.
>
>  2. When you want free caching to speed up repeated log viewing (try
>     setting diff.*.cachetextconv to true).

Gotcha.

> Maybe it is worth adding an "advantages of textconv" section to
> gitattributes(5) (the advantage of a full external diff command is that
> it can be more flexible, or even graphical in nature).

Okay.

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-23 23:41           ` Jeff King
@ 2011-05-24  4:46             ` Junio C Hamano
  2011-05-24  7:19               ` Michael J Gruber
  2011-05-24 14:40               ` Jay Soffian
  0 siblings, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2011-05-24  4:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

>> However, custom diff drivers (still) don't work. :-)
>
> Yeah, I didn't add any support for that. I'm not sure what it should do;
> custom diff drivers don't know how to handle combined diff, do they?
>
> If you write me a test case that explains what _should_ happen, I'll see
> what I can do. :)

I do not think it is sensible to expect anybody to come up with a sane
semantics for combined diff to work with GIT_EXTERNAL_DIFF (and external
diff driver that can be specified via the attributes mechanism) in any
meaningful way.

The whole point of the external diff mechanism is that an external command
can take _two_ files and represent the change between them in a way that
is more suited for the need of the user than the patch form. The output
from such an external command does not have any obligation to even follow
the convention used by the patch output, namely:

  @@ from here to there things have changed @@
   this is common
  -this was the removed content
  +this is the new content

as the _whole_ point of the external diff mechanism is to give something
that is _different_ from the patch form, in the hope that it is in a more
appropriate form for whoever consumes the output.

On the other hand, combined diff is all about combining multiple patches
show them side-by-side in a combined fashion. Without the above four kinds
of cues, there is no way to even _align_ the change outputs from two
parents, let alone _combining_ them.

Anybody interested can check "compare-cooking.perl" in the todo branch,
which is used as an external diff driver to view the differences between
"What's cooking" postings via these:

    [diff "whatscooking"]
            xfuncname = "^\\[(.*)\\]$"
            command = ./compare-cooking.perl

in the .git/config file, together with

    whats-cooking.txt diff=whatscooking

in the .gitattributes file. Running

    $ git log -p --ext-diff todo -- whats-cooking.txt

would give a sample output.

It is conceivable that we _could_ newly define a "combined external diff
driver" that would take 3 or more files, and compute and show the combined
result by itself, but that will certainly not go through the codepath you
touched with the textconv patch. Calling out to such a new type of
external diff driver would have to happen at the level where we have 1+N
blob object names for a N-parent commit, namely, at the beginning of
show_patch_diff(), bypassing the entire contents of that function and
instead letting the new n-way external diff driver do everything.

I however highly doubt that such an interface would make sense. For
example, what would be the desirable format to compare three versions of
"What's cooking" postings, and how would the updated compare-cooking.perl
script would look like?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24  4:46             ` Junio C Hamano
@ 2011-05-24  7:19               ` Michael J Gruber
  2011-05-24 15:36                 ` Junio C Hamano
  2011-05-24 19:13                 ` Jeff King
  2011-05-24 14:40               ` Jay Soffian
  1 sibling, 2 replies; 37+ messages in thread
From: Michael J Gruber @ 2011-05-24  7:19 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Jay Soffian, git

First of all:

Jeff, thanks a bunch for taking this up again! That's a great
improvement. (I'm not sure I can devote enough time to reviewing, but
I'll see.)

Junio C Hamano venit, vidit, dixit 24.05.2011 06:46:
> Jeff King <peff@peff.net> writes:
> 
>>> However, custom diff drivers (still) don't work. :-)
>>
>> Yeah, I didn't add any support for that. I'm not sure what it should do;
>> custom diff drivers don't know how to handle combined diff, do they?
>>
>> If you write me a test case that explains what _should_ happen, I'll see
>> what I can do. :)
> 
> I do not think it is sensible to expect anybody to come up with a sane
> semantics for combined diff to work with GIT_EXTERNAL_DIFF (and external
> diff driver that can be specified via the attributes mechanism) in any
> meaningful way.
> 
> The whole point of the external diff mechanism is that an external command
> can take _two_ files and represent the change between them in a way that
> is more suited for the need of the user than the patch form. The output
> from such an external command does not have any obligation to even follow
> the convention used by the patch output, namely:
> 
>   @@ from here to there things have changed @@
>    this is common
>   -this was the removed content
>   +this is the new content
> 
> as the _whole_ point of the external diff mechanism is to give something
> that is _different_ from the patch form, in the hope that it is in a more
> appropriate form for whoever consumes the output.
> 
> On the other hand, combined diff is all about combining multiple patches
> show them side-by-side in a combined fashion. Without the above four kinds
> of cues, there is no way to even _align_ the change outputs from two
> parents, let alone _combining_ them.
> 
> Anybody interested can check "compare-cooking.perl" in the todo branch,
> which is used as an external diff driver to view the differences between
> "What's cooking" postings via these:
> 
>     [diff "whatscooking"]
>             xfuncname = "^\\[(.*)\\]$"
>             command = ./compare-cooking.perl
> 
> in the .git/config file, together with
> 
>     whats-cooking.txt diff=whatscooking
> 
> in the .gitattributes file. Running
> 
>     $ git log -p --ext-diff todo -- whats-cooking.txt
> 
> would give a sample output.
> 
> It is conceivable that we _could_ newly define a "combined external diff
> driver" that would take 3 or more files, and compute and show the combined
> result by itself, but that will certainly not go through the codepath you
> touched with the textconv patch. Calling out to such a new type of
> external diff driver would have to happen at the level where we have 1+N
> blob object names for a N-parent commit, namely, at the beginning of
> show_patch_diff(), bypassing the entire contents of that function and
> instead letting the new n-way external diff driver do everything.
> 
> I however highly doubt that such an interface would make sense. For
> example, what would be the desirable format to compare three versions of
> "What's cooking" postings, and how would the updated compare-cooking.perl
> script would look like?

Yeah, currently --cc with external makes no sense, but there are several
external tools which could present a 3-way diff in a useful way (or even
n-way with n>3), e.g. vimdiff, kdiff3, meld.

When the --cc/textconv issue came up I looked into this, and maybe
difftool is a place where one could plug this in first in the sense of
refactoring that even more and providing a diff3tool or such to view a
merge commit (or compare any 3 versions), or/and provide "git diff3 A B
C" which creates a fake merge (A+B -> C). Now, imagine we also have
INDEX and WORKTREE pseudorevs and can do

git diff3[tool] HEAD INDEX WORKTREE

:)

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24  4:46             ` Junio C Hamano
  2011-05-24  7:19               ` Michael J Gruber
@ 2011-05-24 14:40               ` Jay Soffian
  1 sibling, 0 replies; 37+ messages in thread
From: Jay Soffian @ 2011-05-24 14:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, git

On Tue, May 24, 2011 at 12:46 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>>> However, custom diff drivers (still) don't work. :-)
>>
>> Yeah, I didn't add any support for that. I'm not sure what it should do;
>> custom diff drivers don't know how to handle combined diff, do they?
>>
>> If you write me a test case that explains what _should_ happen, I'll see
>> what I can do. :)
>
> I do not think it is sensible to expect anybody to come up with a sane
> semantics for combined diff to work with GIT_EXTERNAL_DIFF (and external
> diff driver that can be specified via the attributes mechanism) in any
> meaningful way.

Indeed, it was a thinko that I even considered it. It turns out I had
been using an external diff command where I should have been using
textconv all along.

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24  7:19               ` Michael J Gruber
@ 2011-05-24 15:36                 ` Junio C Hamano
  2011-05-24 16:38                   ` Michael J Gruber
  2011-05-24 19:13                 ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-05-24 15:36 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Jeff King, Jay Soffian, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> Jeff, thanks a bunch for taking this up again! That's a great
> improvement. (I'm not sure I can devote enough time to reviewing, but
> I'll see.)

Thanks. I've given a cursory look at the rest of the series (including the
addendum to be squashed in) and they seemed Ok. Will give it another round
of eyeballing before merging to "next".

>> I however highly doubt that such an interface would make sense. For
>> example, what would be the desirable format to compare three versions of
>> "What's cooking" postings, and how would the updated compare-cooking.perl
>> script would look like?
>
> Yeah, currently --cc with external makes no sense, but there are several
> external tools which could present a 3-way diff in a useful way (or even
> n-way with n>3), e.g. vimdiff, kdiff3, meld.

With an external command that can make 3-way diffs of arbitrary three
directories, you can trivially do something like:

	#!/bin/sh
	mkdir tmp-head tmp-index &&
        ( cd tmp-head &&
          GIT_DIR=../.git GIT_INDEX_FILE=$GIT_DIR/.index-head &&
          git checkout -f HEAD
	) &&
        ( cd tmp-index &&
          GIT_DIR=../.git &&
          git checkout -f .
	) &&
        $ext_diff3_cmd tmp-head/ tmp-index/ . &
        wait
        rm -fr tmp-index tmp-head

But that seems totally offtopic and has nothing to do with the "combined
diff" discussion, no?

If you want to plug in an external command that can make n-way diff of n
files when some paths are still shown using the usual --cc codepath, then
you would need an interface totally different from the diff.<driver>.cmd
interface for two-way diff to the external diff.  I pointed at where to
plug such a thing, but I do not think it would be of much use unless you
are handing the whole n-trees to the external command (which essentially
is what the above outline does). How would the user read the output that
comes out mixed from different codepaths, some from our own --cc while
others come from the external command, possibly opening separate windows
and even worse grabbing control and getting the caller stuck until the
user closes that window?

	Side note: about getting stuck, will we see an update to the
	diffstat count series by the end of this cycle? I do not mind
	carrying it over to the next cycle at all, but I'd rather see
	something already started gets finished.

> When the --cc/textconv issue came up I looked into this, and maybe
> difftool is a place where one could plug this in first in the sense of
> refactoring that even more and providing a diff3tool or such to view a
> merge commit (or compare any 3 versions), or/and provide "git diff3 A B
> C" which creates a fake merge (A+B -> C).

You do not need "git diff3 A B C" for a fake merge.

	$ git diff 61d7503d 2d22086 5bf29b9

already is a way to show you how the commit 61d7503d was created by
merging the other two (the merge result comes first and then its parents).
You could put the index into the mix by doing something like:

	$ git diff next master $(git write-tree)

Trying to show combined diff to merge the index and the working tree into
the current HEAD (which may be an example that does not make much sense)
would look like this:

	$ git diff HEAD $(git write-tree) $(
		git read-tree --index-output=.tmp-index HEAD &&
		GIT_INDEX_FILE=.tmp-index git add -A :/ &&
                GIT_INDEX_FILE=.tmp-index git write-tree
	)

But for the "working tree" set, which paths should be included? The same
set as what is in the index? Or would we use the set that is the union of
other tree-like things that are being compared, including the ones that
are not in the index? Or everything in the working tree, as we are
interested in what the user _could_ add?  That is one of the reasons why I
do not think it makes much sense trying to throw the working tree into the
picture, as it would have to open a large can of worms.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/5] combine-diff: respect textconv attributes
  2011-05-23 20:31         ` [PATCH 5/5] combine-diff: respect textconv attributes Jeff King
  2011-05-23 22:47           ` Junio C Hamano
@ 2011-05-24 16:20           ` Junio C Hamano
  2011-05-24 18:52             ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-05-24 16:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

> This patch converts file contents according to textconv attributes. The
> implementation is slightly ugly; because the textconv code is tightly
> linked with the diff_filespec code, we temporarily create a
> diff_filespec during conversion.

After reading this patch again, I think this aversion to diff_filespec is
probably unjustified.  It is primarily a structure that records what path
has which blob by recording its object name and what mode, and also
in-core data when we make the contents available. There are small diff
specific data associated with it, but that could be separated out if you
really wanted to, perhaps like:

	struct diff_filespec {
        	struct filespec {
			... the generic and essential part ...
                } spec;
                /* diff specific part follows */
                const char *funcname_pattern_ident;
                int xfrm_flags;
                int rename_used;
                unsigned dirty_submodule :2;
                struct userdiff_driver *driver;
                int is_binary;
	};

and let most users use "struct filespec".

If anything else, we should be using the type in _more_ codepaths that are
not diff related but want to represent a path with its contents in the git
namespace (be it from working tree, index or a tree), not less, and in the
longer term weaken functions like fill_textconv() that take diff_filespec
to take filespec so that they can be made more reusable.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24 15:36                 ` Junio C Hamano
@ 2011-05-24 16:38                   ` Michael J Gruber
  2011-05-24 16:43                     ` Junio C Hamano
  2011-05-24 16:52                     ` Jay Soffian
  0 siblings, 2 replies; 37+ messages in thread
From: Michael J Gruber @ 2011-05-24 16:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Jay Soffian, git

Junio C Hamano venit, vidit, dixit 24.05.2011 17:36:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> Jeff, thanks a bunch for taking this up again! That's a great
>> improvement. (I'm not sure I can devote enough time to reviewing, but
>> I'll see.)
> 
> Thanks. I've given a cursory look at the rest of the series (including the
> addendum to be squashed in) and they seemed Ok. Will give it another round
> of eyeballing before merging to "next".
> 
>>> I however highly doubt that such an interface would make sense. For
>>> example, what would be the desirable format to compare three versions of
>>> "What's cooking" postings, and how would the updated compare-cooking.perl
>>> script would look like?
>>
>> Yeah, currently --cc with external makes no sense, but there are several
>> external tools which could present a 3-way diff in a useful way (or even
>> n-way with n>3), e.g. vimdiff, kdiff3, meld.
> 
> With an external command that can make 3-way diffs of arbitrary three
> directories, you can trivially do something like:
> 
> 	#!/bin/sh
> 	mkdir tmp-head tmp-index &&
>         ( cd tmp-head &&
>           GIT_DIR=../.git GIT_INDEX_FILE=$GIT_DIR/.index-head &&
>           git checkout -f HEAD
> 	) &&
>         ( cd tmp-index &&
>           GIT_DIR=../.git &&
>           git checkout -f .
> 	) &&
>         $ext_diff3_cmd tmp-head/ tmp-index/ . &
>         wait
>         rm -fr tmp-index tmp-head
> 

and the fancy version of that is "difftool" in a sense, yes.

> But that seems totally offtopic and has nothing to do with the "combined
> diff" discussion, no?

Well, it seemed that Jay wanted an external tool for merge diffs. "-m"
does a series of simple 2way diffs, but "--cc" is special in that it
looks at all diffs at once, so a corresponding external tool would have
to do the same.

> If you want to plug in an external command that can make n-way diff of n
> files when some paths are still shown using the usual --cc codepath, then
> you would need an interface totally different from the diff.<driver>.cmd
> interface for two-way diff to the external diff.  I pointed at where to
> plug such a thing, but I do not think it would be of much use unless you
> are handing the whole n-trees to the external command (which essentially
> is what the above outline does). How would the user read the output that
> comes out mixed from different codepaths, some from our own --cc while
> others come from the external command, possibly opening separate windows
> and even worse grabbing control and getting the caller stuck until the
> user closes that window?

I don't quite follow, I think I/we/? am/are mixing external diff drivers
and difftools. I agree that is a side track/off-topic.

> 
> 	Side note: about getting stuck, will we see an update to the
> 	diffstat count series by the end of this cycle? I do not mind
> 	carrying it over to the next cycle at all, but I'd rather see
> 	something already started gets finished.

Yes, on my list. End of month you said, right?

>> When the --cc/textconv issue came up I looked into this, and maybe
>> difftool is a place where one could plug this in first in the sense of
>> refactoring that even more and providing a diff3tool or such to view a
>> merge commit (or compare any 3 versions), or/and provide "git diff3 A B
>> C" which creates a fake merge (A+B -> C).
> 
> You do not need "git diff3 A B C" for a fake merge.
> 
> 	$ git diff 61d7503d 2d22086 5bf29b9
> 
> already is a way to show you how the commit 61d7503d was created by
> merging the other two (the merge result comes first and then its parents).

Yes. (I came across this when I investigated a bit back then but failed
to mention it.)

> You could put the index into the mix by doing something like:
> 
> 	$ git diff next master $(git write-tree)
> 
> Trying to show combined diff to merge the index and the working tree into
> the current HEAD (which may be an example that does not make much sense)
> would look like this:
> 
> 	$ git diff HEAD $(git write-tree) $(
> 		git read-tree --index-output=.tmp-index HEAD &&
> 		GIT_INDEX_FILE=.tmp-index git add -A :/ &&
>                 GIT_INDEX_FILE=.tmp-index git write-tree
> 	)
> 
> But for the "working tree" set, which paths should be included? The same
> set as what is in the index? Or would we use the set that is the union of
> other tree-like things that are being compared, including the ones that
> are not in the index? Or everything in the working tree, as we are
> interested in what the user _could_ add?  That is one of the reasons why I
> do not think it makes much sense trying to throw the working tree into the
> picture, as it would have to open a large can of worms.

I've simply been drooling too much over vim-fugitive and was wondering
which aspects would fit into our ui. I was thinking of a 3-way-diff
version of git status, so to say. Next would be a 3-way-merge interface
which does all your add/reset/checkout -p from 3 vim bufs in diff mode,
which even vim-fugitive doesn't do. We can all dream, can't we?

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24 16:38                   ` Michael J Gruber
@ 2011-05-24 16:43                     ` Junio C Hamano
  2011-05-24 16:52                     ` Jay Soffian
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2011-05-24 16:43 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Jeff King, Jay Soffian, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

>> 	Side note: about getting stuck, will we see an update to the
>> 	diffstat count series by the end of this cycle? I do not mind
>> 	carrying it over to the next cycle at all, but I'd rather see
>> 	something already started gets finished.
>
> Yes, on my list. End of month you said, right?

Yes, but if you post anything on May 31st, it will never have enough time
to land in 'master' before the feature freeze ;-).

> I've simply been drooling too much over vim-fugitive and was wondering
> which aspects would fit into our ui.

Heh.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24 16:38                   ` Michael J Gruber
  2011-05-24 16:43                     ` Junio C Hamano
@ 2011-05-24 16:52                     ` Jay Soffian
  1 sibling, 0 replies; 37+ messages in thread
From: Jay Soffian @ 2011-05-24 16:52 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Junio C Hamano, Jeff King, git

On Tue, May 24, 2011 at 12:38 PM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> Well, it seemed that Jay wanted an external tool for merge diffs.

Only because I wasn't using textconv, but I should have been.

j.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/5] combine-diff: respect textconv attributes
  2011-05-24 16:20           ` Junio C Hamano
@ 2011-05-24 18:52             ` Jeff King
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-24 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jay Soffian, Michael J Gruber, git

On Tue, May 24, 2011 at 09:20:53AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This patch converts file contents according to textconv attributes. The
> > implementation is slightly ugly; because the textconv code is tightly
> > linked with the diff_filespec code, we temporarily create a
> > diff_filespec during conversion.
> 
> After reading this patch again, I think this aversion to diff_filespec is
> probably unjustified.

It's not an aversion to diff_filespec; it's an aversion to only half
buying-into diff_filespec. I think the best thing would be rewriting all
of combine-diff in terms of diff_filespec; I just started to do it and
it looked big and ugly for not much gain.

It's a little ugly to have to manually do the conversion into another
data structure just to call a function like fill_textconv, but I can
live with it. What I worry about more is that the ad-hoc use of a
structure like diff_filespec means we are violating some assumption that
other code has about the data structure (e.g., the bug Jay already
uncovered that we must have a valid name in the "path" field).

I think what is there now is correct; it's just that as a general rule,
switching data formats or abstractions in the middle of code makes me
feel I'm doing something wrong (or that the code interfaces need to be
refactored).

> If anything else, we should be using the type in _more_ codepaths that are
> not diff related but want to represent a path with its contents in the git
> namespace (be it from working tree, index or a tree), not less, and in the
> longer term weaken functions like fill_textconv() that take diff_filespec
> to take filespec so that they can be made more reusable.

Yeah, I would agree with that.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24  7:19               ` Michael J Gruber
  2011-05-24 15:36                 ` Junio C Hamano
@ 2011-05-24 19:13                 ` Jeff King
  2011-05-25  7:38                   ` Michael J Gruber
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2011-05-24 19:13 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Junio C Hamano, Jay Soffian, git

On Tue, May 24, 2011 at 09:19:43AM +0200, Michael J Gruber wrote:

> > It is conceivable that we _could_ newly define a "combined external diff
> > driver" that would take 3 or more files, and compute and show the combined
> > result by itself, but that will certainly not go through the codepath you
> > touched with the textconv patch. Calling out to such a new type of
> > external diff driver would have to happen at the level where we have 1+N
> > blob object names for a N-parent commit, namely, at the beginning of
> > show_patch_diff(), bypassing the entire contents of that function and
> > instead letting the new n-way external diff driver do everything.
> > 
> > I however highly doubt that such an interface would make sense. For
> > example, what would be the desirable format to compare three versions of
> > "What's cooking" postings, and how would the updated compare-cooking.perl
> > script would look like?
> 
> Yeah, currently --cc with external makes no sense, but there are several
> external tools which could present a 3-way diff in a useful way (or even
> n-way with n>3), e.g. vimdiff, kdiff3, meld.

I agree with Junio that we would need a new config option and external
interface for "n-way combined diff". However, isn't what things like
vimdiff and meld do the reverse of our combined diff? That is, don't
they assume the 3 trees are: ours, theirs, and ancestor (i.e., merge
base)? Whereas in a combined diff, it is actually: merge parent 1
(ours), merge parent 2 (theirs), and merge _result_.

Also, do those tools generally handle n-way comparisons as opposed to
3-way?

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-24 19:13                 ` Jeff King
@ 2011-05-25  7:38                   ` Michael J Gruber
  2011-05-25 15:29                     ` Jeff King
  0 siblings, 1 reply; 37+ messages in thread
From: Michael J Gruber @ 2011-05-25  7:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jay Soffian, git

Jeff King venit, vidit, dixit 24.05.2011 21:13:
> On Tue, May 24, 2011 at 09:19:43AM +0200, Michael J Gruber wrote:
> 
>>> It is conceivable that we _could_ newly define a "combined external diff
>>> driver" that would take 3 or more files, and compute and show the combined
>>> result by itself, but that will certainly not go through the codepath you
>>> touched with the textconv patch. Calling out to such a new type of
>>> external diff driver would have to happen at the level where we have 1+N
>>> blob object names for a N-parent commit, namely, at the beginning of
>>> show_patch_diff(), bypassing the entire contents of that function and
>>> instead letting the new n-way external diff driver do everything.
>>>
>>> I however highly doubt that such an interface would make sense. For
>>> example, what would be the desirable format to compare three versions of
>>> "What's cooking" postings, and how would the updated compare-cooking.perl
>>> script would look like?
>>
>> Yeah, currently --cc with external makes no sense, but there are several
>> external tools which could present a 3-way diff in a useful way (or even
>> n-way with n>3), e.g. vimdiff, kdiff3, meld.
> 
> I agree with Junio that we would need a new config option and external
> interface for "n-way combined diff". However, isn't what things like
> vimdiff and meld do the reverse of our combined diff? That is, don't
> they assume the 3 trees are: ours, theirs, and ancestor (i.e., merge
> base)? Whereas in a combined diff, it is actually: merge parent 1
> (ours), merge parent 2 (theirs), and merge _result_.
> 
> Also, do those tools generally handle n-way comparisons as opposed to
> 3-way?

"Those" tools do "that" which I mean :)

It depends on the tool, of course. vimdiff does not have any assumptions
that I know of, it marks those hunks which are not common to all
buffers, and marks them differently depending on what subset of buffers
shares them. That aspect is the same for kdiff3 and meld (n<=3 for meld)
in diff mode.

There are more differences in merge mode:

In vimdiff, you can "put" a hunk (i.e. apply the change) to another
buffer (or "obtain" one), i.e. you can easily move hunks between the
buffers (to do a merge). In kdiff3's auto merge mode there is the
assumption you mention (because it actively does some merging).

So, in vimdiff, you could in principle change and write any buffer but
our tools don't support it so far because it is not needed for a merge
which has one "target" only. In the ui of my dreams, I would have 3
buffers HEAD INDEX WORKTREE (H I W) and moving hunks between them would
do all of add -p, reset -p and checkout -p (the HEAD buf would be
read-only).

With "differently abled" tools it could still be useful to have, say a
tool invoked for the merge HEAD+WORKTREE -> INDEX (with the current
state of the INDEX as the automatic resolution to start off from) so
that you can do add+reset -p with your mergetool, and maybe similarly
for HEAD+INDEX -> WORKTREE. I.e. addtool and checkouttool in addition to
the difftool and mergetool which we have. Just not for the current
release cycle any more.

Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: combined diff does not detect binary files and ignores -diff attribute
  2011-05-25  7:38                   ` Michael J Gruber
@ 2011-05-25 15:29                     ` Jeff King
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2011-05-25 15:29 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Junio C Hamano, Jay Soffian, git

On Wed, May 25, 2011 at 09:38:43AM +0200, Michael J Gruber wrote:

> > I agree with Junio that we would need a new config option and external
> > interface for "n-way combined diff". However, isn't what things like
> > vimdiff and meld do the reverse of our combined diff? That is, don't
> > they assume the 3 trees are: ours, theirs, and ancestor (i.e., merge
> > base)? Whereas in a combined diff, it is actually: merge parent 1
> > (ours), merge parent 2 (theirs), and merge _result_.
> > 
> > Also, do those tools generally handle n-way comparisons as opposed to
> > 3-way?
> 
> "Those" tools do "that" which I mean :)

OK, they are more capable than I realized, then. :)

> So, in vimdiff, you could in principle change and write any buffer but
> our tools don't support it so far because it is not needed for a merge
> which has one "target" only. In the ui of my dreams, I would have 3
> buffers HEAD INDEX WORKTREE (H I W) and moving hunks between them would
> do all of add -p, reset -p and checkout -p (the HEAD buf would be
> read-only).

This should be relatively easy to implement on the git side, as you are
doing the hard parts of the "-p" operation in vim already. You just need
to write vim's buffer into the worktree or into the index via
hash-object / update-index.

Also, wouldn't you potentially want more than 3 buffers, if you were
cherry-picking changes from another commit (as in "git checkout -p
$commit" or even putting and taking things from a stash? I think that
would be a simple generalization of what you are proposing though.

> With "differently abled" tools it could still be useful to have, say a
> tool invoked for the merge HEAD+WORKTREE -> INDEX (with the current
> state of the INDEX as the automatic resolution to start off from) so
> that you can do add+reset -p with your mergetool, and maybe similarly
> for HEAD+INDEX -> WORKTREE. I.e. addtool and checkouttool in addition to
> the difftool and mergetool which we have. Just not for the current
> release cycle any more.

Yeah, getting back to the original discussion. How badly do people
actually want an external diff driver that can do fancy things with
multiple parents? It seems that for non-text diff viewers, the preferred
solution is to use difftool these days (but that is just my impression;
I don't use either difftool or external diff drivers).

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-23 20:27         ` [PATCH 3/5] combine-diff: handle binary files as binary Jeff King
  2011-05-23 23:02           ` Junio C Hamano
@ 2011-05-30  6:33           ` Junio C Hamano
  2011-05-30 14:36             ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-05-30  6:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

> @@ -852,6 +862,29 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>  		}
>  	}
>  
> +	if (userdiff->binary != -1)
> +		is_binary = userdiff->binary;
> +	else {
> +		is_binary = buffer_is_binary(result, result_size);
> +		for (i = 0; !is_binary && i < num_parent; i++) {
> +			char *buf;
> +			unsigned long size;
> +			buf = grab_blob(elem->parent[i].sha1,
> +					elem->parent[i].mode,
> +					&size);
> +			if (buffer_is_binary(buf, size))
> +				is_binary = 1;
> +			free(buf);
> +		}
> +	}

Two comments.

 - This loop will grab the blob from all parents just to peek and discard
   for most of commits. It feels somewhat wasteful, especially because
   "binary" that is not marked as binary is a rare minor case (most of the
   paths are text). Stopping immediately at the first binary blob does not
   optimize the loop even though it is better than not having one.

 - It may make sense to compare [i].sha1 with earlier [j].sha1 (j < i) and
   avoid grab_blob() altogether?  Cf. 3c39e9b (combine-diff: reuse diff
   from the same blob., 2006-02-01).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-30  6:33           ` Junio C Hamano
@ 2011-05-30 14:36             ` Jeff King
  2011-05-30 16:19               ` Jeff King
  2011-05-31 22:42               ` Junio C Hamano
  0 siblings, 2 replies; 37+ messages in thread
From: Jeff King @ 2011-05-30 14:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jay Soffian, Michael J Gruber, git

On Sun, May 29, 2011 at 11:33:38PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > @@ -852,6 +862,29 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
> >  		}
> >  	}
> >  
> > +	if (userdiff->binary != -1)
> > +		is_binary = userdiff->binary;
> > +	else {
> > +		is_binary = buffer_is_binary(result, result_size);
> > +		for (i = 0; !is_binary && i < num_parent; i++) {
> > +			char *buf;
> > +			unsigned long size;
> > +			buf = grab_blob(elem->parent[i].sha1,
> > +					elem->parent[i].mode,
> > +					&size);
> > +			if (buffer_is_binary(buf, size))
> > +				is_binary = 1;
> > +			free(buf);
> > +		}
> > +	}
> 
> Two comments.
> 
>  - This loop will grab the blob from all parents just to peek and discard
>    for most of commits. It feels somewhat wasteful, especially because
>    "binary" that is not marked as binary is a rare minor case (most of the
>    paths are text). Stopping immediately at the first binary blob does not
>    optimize the loop even though it is better than not having one.

Yeah, I am not too happy with that bit. Our choices are:

  1. Grab each blob, check binary-ness, and free. This double-loads in
     the common, non-binary case.

  2. Grab each blob, check binary-ness, and keep it in memory. This
     means using N times as much memory, where N is the number of
     parents. In practice, N generally equals 2, and the file sizes
     aren't gigantic, so it's not that bad. But there are some corner
     cases.

  3. Choose (1) or (2) based on file size, or based on the number of
     parents. The problem cases for (2) are going to be big files
     (bigger than bigFileThreshold?), and gigantic numbers of parents.

I'd also eventually like to do a "peek" binary-ness check on top of your
streaming work, as I showed for the non-combined diff case (speaking of
which, I need to polish that now that the streaming series seems to have
settled). And we would probably want to peek only in the big file case.

I'll try to take a look at it this week and get some measurements on (1)
versus (2) for both speed and peak memory usage. And then see if I can
do better with (3), and implement the "peek" solution both here and in
regular diff.

>  - It may make sense to compare [i].sha1 with earlier [j].sha1 (j < i) and
>    avoid grab_blob() altogether?  Cf. 3c39e9b (combine-diff: reuse diff
>    from the same blob., 2006-02-01).

Yeah, that is probably worth doing, as it collapses the "N" above into
"how many parents all modified the same file". I'll add it to my list to
measure.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-30 14:36             ` Jeff King
@ 2011-05-30 16:19               ` Jeff King
  2011-05-30 19:32                 ` Junio C Hamano
  2011-05-31 22:42               ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2011-05-30 16:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jay Soffian, Michael J Gruber, git

On Mon, May 30, 2011 at 10:36:27AM -0400, Jeff King wrote:

>   1. Grab each blob, check binary-ness, and free. This double-loads in
>      the common, non-binary case.
> [...]
>
> I'll try to take a look at it this week and get some measurements on (1)
> versus (2) for both speed and peak memory usage. And then see if I can
> do better with (3), and implement the "peek" solution both here and in
> regular diff.

I was curious about this, so I stole a few minutes to do some
preliminary benchmarks this morning.

The first thing to look at is the performance of the original code, that
does not check binary-ness at all. It's going to represent the best we
can do with any strategy. So I tried:

  git log -p --cc --merges origin/master

on git.git using both v1.7.5.3 and the jk/combine-diff-binary-etc
branch. And it turns out that the extra loads really don't make a
difference in practice. My best-of-5 for the two cases were:

  $ time git.v1.7.5.3 log -p --cc --merges origin/master >/dev/null
  real    0m59.518s
  user    0m58.672s
  sys     0m0.688s

  $ time git.jk.binary-combined-diff log -p --cc \
      --merges origin/master >/dev/null
  real    0m58.949s
  user    0m58.220s
  sys     0m0.572s

The new code actually came out slightly faster.  One reason may be that
there are 3 combined diffs of git-gui/lib/git-gui.ico that we avoid
doing (and just say "Binary files differ"). That's not a lot, but it
gives us a very tiny edge (though that edge is very close to the amount
of noise between runs). Still, I think it implies that the extra loads
in the common non-binary case are not actually measurable.

The peak memory use between the two should be the same (since we free
each blob immediately), but I didn't measure it.

So I think in practice it's not a big deal. I'll still take a look at
the "peek" optimization later this week, since that can make a
difference in some corner cases. And as part of that, it will probably
make sense to keep the buffers around for small-ish files, so we'll get
the optimization I mentioned more or less for free. I'll also do the
check for duplicated sha1s that you mentioned.

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-30 16:19               ` Jeff King
@ 2011-05-30 19:32                 ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2011-05-30 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

>   git log -p --cc --merges origin/master
>
> on git.git using both v1.7.5.3 and the jk/combine-diff-binary-etc
> branch. And it turns out that the extra loads really don't make a
> difference in practice. My best-of-5 for the two cases were:

I am not surprised. git.git is small enough and developer machines these
days are beefy enough so that most of the data from the disk are sitting
in the buffer cache, and if you are well packed and Linus's law "a file
always grows" holds true, it is likely that you are also benefiting from
the delta-base cache, as the history traversal of the command goes newer
to older.

>   $ time git.v1.7.5.3 log -p --cc --merges origin/master >/dev/null
>   real    0m59.518s
>   user    0m58.672s
>   sys     0m0.688s

Comparing that with the run time of the same log without patch generation
would roughly give the cost of "all text combined" diff. I suspect more
than 90% of the time goes to running the textual diff algorithm.

I guess you are hinting that we should optimize that part before worrying
about the wastage of reading these blobs always twice.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/5] combine-diff: handle binary files as binary
  2011-05-30 14:36             ` Jeff King
  2011-05-30 16:19               ` Jeff King
@ 2011-05-31 22:42               ` Junio C Hamano
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2011-05-31 22:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Jay Soffian, Michael J Gruber, git

Jeff King <peff@peff.net> writes:

> ... speaking of
> which, I need to polish that now that the streaming series seems to have
> settled...

I merely put it on the back burner, not necessarily declaring it as
"settled".

I just created a pair of test repository with a single 800M binary junk
file. One repository has it as a single loose object file, and the other
repository has it in a packfile.  The path is not explicitly marked with
any attributes, but I have no CRLF funniness configured, so streaming
would be used but there won't be any filtering involved.

Removing the file from the working tree and then checking it out of the
index gave me a pleasant surprise. I originally did this only to help
smaller machines that cannot comfortably fit inflated blob data in core,
but it turns out that streaming write seems to perform better.

These are the typical /usr/bin/time output:

  $ /usr/bin/time git checkout a ;# w/o streaming
  1.39user 2.23system 0:03.62elapsed 99%CPU (0avgtext+0avgdata 6297056maxresident)k
  0inputs+1572872outputs (0major+399828minor)pagefaults 0swaps

  $ /usr/bin/time git checkout a ;# w/ streaming
  1.35user 1.22system 0:02.52elapsed 101%CPU (0avgtext+0avgdata 3151536maxresident)k
  0inputs+1572872outputs (0major+203226minor)pagefaults 0swaps

So now I can say I am reasonably happy with the series, but I haven't
really exercised the code heavily, so there may still be bugs lurking ;-).

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2011-05-31 22:43 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-22 20:12 combined diff does not detect binary files and ignores -diff attribute Jay Soffian
2011-05-23 13:30 ` Michael J Gruber
2011-05-23 15:17   ` Jay Soffian
2011-05-23 17:07     ` Junio C Hamano
2011-05-23 18:11     ` Jeff King
2011-05-23 20:15       ` Jeff King
2011-05-23 20:16         ` [PATCH 1/5] combine-diff: split header printing into its own function Jeff King
2011-05-23 20:16         ` [PATCH 2/5] combine-diff: calculate mode_differs earlier Jeff King
2011-05-23 20:27         ` [PATCH 3/5] combine-diff: handle binary files as binary Jeff King
2011-05-23 23:02           ` Junio C Hamano
2011-05-23 23:50             ` Jeff King
2011-05-30  6:33           ` Junio C Hamano
2011-05-30 14:36             ` Jeff King
2011-05-30 16:19               ` Jeff King
2011-05-30 19:32                 ` Junio C Hamano
2011-05-31 22:42               ` Junio C Hamano
2011-05-23 20:30         ` [PATCH 4/5] refactor get_textconv to not require diff_filespec Jeff King
2011-05-23 20:31         ` [PATCH 5/5] combine-diff: respect textconv attributes Jeff King
2011-05-23 22:47           ` Junio C Hamano
2011-05-23 23:39             ` Jeff King
2011-05-24 16:20           ` Junio C Hamano
2011-05-24 18:52             ` Jeff King
2011-05-23 22:55         ` combined diff does not detect binary files and ignores -diff attribute Jay Soffian
2011-05-23 23:31           ` Jay Soffian
2011-05-23 23:49             ` Jeff King
2011-05-24  0:59               ` Jay Soffian
2011-05-23 23:41           ` Jeff King
2011-05-24  4:46             ` Junio C Hamano
2011-05-24  7:19               ` Michael J Gruber
2011-05-24 15:36                 ` Junio C Hamano
2011-05-24 16:38                   ` Michael J Gruber
2011-05-24 16:43                     ` Junio C Hamano
2011-05-24 16:52                     ` Jay Soffian
2011-05-24 19:13                 ` Jeff King
2011-05-25  7:38                   ` Michael J Gruber
2011-05-25 15:29                     ` Jeff King
2011-05-24 14:40               ` Jay Soffian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.