All of lore.kernel.org
 help / color / mirror / Atom feed
* textconv not invoked when viewing merge commit
@ 2011-04-11 17:12 Peter Oberndorfer
  2011-04-12  9:04 ` Michael J Gruber
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Oberndorfer @ 2011-04-11 17:12 UTC (permalink / raw)
  To: Git List

Hi,

i currently use a textconv filter to show contents of a zip like archive in a user readable format(as a list of contained files).

This works fine, except for merge commits.
For merge commits i see the diff of the binary contents of the file.

Is this intentional?
git help gitattributes mentions no such limitation.
Anywhere else(gitk(on non merge commit), git gui blame) i see the the filtered textual representation of the file.

I tried 1.7.4 msysgit and current master

Greetings Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-11 17:12 textconv not invoked when viewing merge commit Peter Oberndorfer
@ 2011-04-12  9:04 ` Michael J Gruber
  2011-04-14 19:09   ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Michael J Gruber @ 2011-04-12  9:04 UTC (permalink / raw)
  To: Peter Oberndorfer; +Cc: Git List

Peter Oberndorfer venit, vidit, dixit 11.04.2011 19:12:
> Hi,
> 
> i currently use a textconv filter to show contents of a zip like archive in a user readable format(as a list of contained files).
> 
> This works fine, except for merge commits.
> For merge commits i see the diff of the binary contents of the file.
> 
> Is this intentional?
> git help gitattributes mentions no such limitation.
> Anywhere else(gitk(on non merge commit), git gui blame) i see the the filtered textual representation of the file.
> 
> I tried 1.7.4 msysgit and current master
> 
> Greetings Peter

textconv is applied for "diff -m" but not for combined diffs (-c, --cc)
at the moment. They go through a completely different codepath, so it is
expected code-wise (not a bug per se) but not ui-wise.

Looking at the code and trying to dig something up atm...

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-12  9:04 ` Michael J Gruber
@ 2011-04-14 19:09   ` Jeff King
  2011-04-14 19:15     ` Jeff King
  2011-04-14 19:26     ` Junio C Hamano
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2011-04-14 19:09 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Peter Oberndorfer, Git List

On Tue, Apr 12, 2011 at 11:04:43AM +0200, Michael J Gruber wrote:

> > This works fine, except for merge commits.
> > For merge commits i see the diff of the binary contents of the file.
> > 
> > Is this intentional?
> > git help gitattributes mentions no such limitation.
> > Anywhere else(gitk(on non merge commit), git gui blame) i see the the filtered textual representation of the file.
> > 
> > I tried 1.7.4 msysgit and current master
> > 
> > Greetings Peter
> 
> textconv is applied for "diff -m" but not for combined diffs (-c, --cc)
> at the moment. They go through a completely different codepath, so it is
> expected code-wise (not a bug per se) but not ui-wise.
> 
> Looking at the code and trying to dig something up atm...

Ick. I started with this test:

diff --git a/t/t4046-diff-textconv-merge.sh b/t/t4046-diff-textconv-merge.sh
new file mode 100755
index 0000000..8643330
--- /dev/null
+++ b/t/t4046-diff-textconv-merge.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='combined diff uses textconv'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one file &&
+	test_commit two file &&
+	git checkout -b other HEAD^ &&
+	test_commit three file &&
+	test_must_fail git merge master &&
+	echo resolved >file &&
+	git commit -a &&
+	echo "file diff=upcase" >.gitattributes &&
+	git config diff.upcase.textconv "tr a-z A-Z <"
+'
+
+cat >expect <<'EOF'
+Merge branch 'master' into other
+
+diff --combined file
+index 2bdf67a,f719efd..2ab19ae
+--- a/file
++++ b/file
+@@@ -1,1 -1,1 +1,1 @@@
+- THREE
+ -TWO
+++RESOLVED
+EOF
+test_expect_success 'diff -c uses textconv' '
+	git show --format=%s -c >actual &&
+	test_cmp expect actual
+'
+
+test_done

but after looking at the codepath, it is must worse than just textconv.
Try this:

  git init repo &&
  cd repo &&
  openssl rand 64 >file.bin &&
  git add file.bin &&
  git commit -m one &&
  openssl rand 64 >file.bin &&
  git commit -a -m two &&
  git checkout -b other HEAD^
  openssl rand 64 >file.bin &&
  git commit -a -m three &&
  (git merge master || true) &&
  openssl rand 64 >file.bin &&
  git commit -a -m resolved &&
  git show

We just dump the binary goo all over the terminal. So I think the whole
combined-diff code path needs to learn how to handle binaries properly.

Unfortunately, it seems to be totally distinct from the regular diff
code path. It doesn't even use diff_filespecs, so our usual is_binary
and textconv code won't work. So the best way forward may involve
significant refactoring.

And of course we have to figure out sane semantics. The textconv case is
easy; just use the textconv blobs instead of the regular ones. But what
should the true binary case (as in the rand example above) show?

-Peff

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:09   ` Jeff King
@ 2011-04-14 19:15     ` Jeff King
  2011-04-14 19:26     ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Jeff King @ 2011-04-14 19:15 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Peter Oberndorfer, Git List

On Thu, Apr 14, 2011 at 03:09:01PM -0400, Jeff King wrote:

> but after looking at the codepath, it is must worse than just textconv.
> Try this:

Ugh, that should be "much worse" of course.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:09   ` Jeff King
  2011-04-14 19:15     ` Jeff King
@ 2011-04-14 19:26     ` Junio C Hamano
  2011-04-14 19:28       ` Jeff King
  2011-04-14 19:43       ` Junio C Hamano
  1 sibling, 2 replies; 25+ messages in thread
From: Junio C Hamano @ 2011-04-14 19:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

Jeff King <peff@peff.net> writes:

> We just dump the binary goo all over the terminal. So I think the whole
> combined-diff code path needs to learn how to handle binaries properly.

How would you show multi-way diffs for binary files?

It would probably be sufficient to say "binary files differ" at the
beginning of the patch-combining codepath of the combined diff, which
would at least keep the --raw -c/--cc output working.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:26     ` Junio C Hamano
@ 2011-04-14 19:28       ` Jeff King
  2011-04-14 19:35         ` Michael J Gruber
  2011-04-14 19:43       ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff King @ 2011-04-14 19:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

On Thu, Apr 14, 2011 at 12:26:18PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We just dump the binary goo all over the terminal. So I think the whole
> > combined-diff code path needs to learn how to handle binaries properly.
> 
> How would you show multi-way diffs for binary files?

No clue. But anything would be better than pretending it's line oriented
and dumping binary goo to the terminal.

> It would probably be sufficient to say "binary files differ" at the
> beginning of the patch-combining codepath of the combined diff, which
> would at least keep the --raw -c/--cc output working.

Yeah, something like "binary files differ" would probably be OK for
"-c". I think for "--cc", that is probably the best we can do, too. It
is about condensing uninteresting hunks, but we don't even have the
concept of hunks.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:28       ` Jeff King
@ 2011-04-14 19:35         ` Michael J Gruber
  0 siblings, 0 replies; 25+ messages in thread
From: Michael J Gruber @ 2011-04-14 19:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Peter Oberndorfer, Git List

Jeff King venit, vidit, dixit 14.04.2011 21:28:
> On Thu, Apr 14, 2011 at 12:26:18PM -0700, Junio C Hamano wrote:
> 
>> Jeff King <peff@peff.net> writes:
>>
>>> We just dump the binary goo all over the terminal. So I think the whole
>>> combined-diff code path needs to learn how to handle binaries properly.
>>
>> How would you show multi-way diffs for binary files?
> 
> No clue. But anything would be better than pretending it's line oriented
> and dumping binary goo to the terminal.
> 
>> It would probably be sufficient to say "binary files differ" at the
>> beginning of the patch-combining codepath of the combined diff, which
>> would at least keep the --raw -c/--cc output working.
> 
> Yeah, something like "binary files differ" would probably be OK for
> "-c". I think for "--cc", that is probably the best we can do, too. It
> is about condensing uninteresting hunks, but we don't even have the
> concept of hunks.
> 
> -Peff

I have "one half" of a partial fix cooking which applies textconv. Good
to see that I'm not the only who is surprised by the disjointness of
codepaths. That should give me some freedom to go wild in
combine-diff.c... But it's rc-time, we're devoting all git time to
regression fixes and cleanups, right?

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:26     ` Junio C Hamano
  2011-04-14 19:28       ` Jeff King
@ 2011-04-14 19:43       ` Junio C Hamano
  2011-04-14 20:06         ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-14 19:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> We just dump the binary goo all over the terminal. So I think the whole
>> combined-diff code path needs to learn how to handle binaries properly.
>
> How would you show multi-way diffs for binary files?
>
> It would probably be sufficient to say "binary files differ" at the
> beginning of the patch-combining codepath of the combined diff, which
> would at least keep the --raw -c/--cc output working.

In other words, I suspect that the only places you need to touch in the
existing codepath would be these places.

diff --git a/combine-diff.c b/combine-diff.c
index 655fa89..9c96f1f 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -201,7 +201,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 	}
 }
 
-static void combine_diff(const unsigned char *parent, unsigned int mode,
+static int combine_diff(const unsigned char *parent, unsigned int mode,
 			 mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
 			 int num_parent, int result_deleted)
@@ -215,10 +215,17 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	unsigned long sz;
 
 	if (result_deleted)
-		return; /* result deleted */
+		return 0; /* result deleted */
 
 	parent_file.ptr = grab_blob(parent, mode, &sz);
 	parent_file.size = sz;
+
+	if (path has textconv) {
+		parent_file.{ptr,size} = textconv of parent_file;
+	} else if (path is binary) {
+		return -1;
+	}
+
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
@@ -255,6 +262,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 			p_lno++; /* no '+' means parent had it */
 	}
 	sline[lno].p_lno[n] = p_lno; /* trailer */
+	return 0;
 }
 
 static unsigned long context = 3;
@@ -777,6 +785,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			close(fd);
 	}
 
+	if (path has textconv) {
+		result, result_size = textconv of result;
+	} else if (path is binary) {
+		goto exit_binary;
+	}
+
 	for (cnt = 0, cp = result; cp < result + result_size; cp++) {
 		if (*cp == '\n')
 			cnt++;
@@ -820,11 +834,13 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 				break;
 			}
 		}
-		if (i <= j)
-			combine_diff(elem->parent[i].sha1,
-				     elem->parent[i].mode,
-				     &result_file, sline,
-				     cnt, i, num_parent, result_deleted);
+		if (i <= j) {
+			if (combine_diff(elem->parent[i].sha1,
+					 elem->parent[i].mode,
+					 &result_file, sline,
+					 cnt, i, num_parent, result_deleted))
+				goto exit_binary;
+		}
 		if (elem->parent[i].mode != elem->mode)
 			mode_differs = 1;
 	}
@@ -892,6 +908,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		dump_sline(sline, cnt, num_parent,
 			   DIFF_OPT_TST(opt, COLOR_DIFF), result_deleted);
 	}
+
+free_exit:
 	free(result);
 
 	for (lno = 0; lno < cnt; lno++) {
@@ -906,6 +924,11 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	}
 	free(sline[0].p_lno);
 	free(sline);
+	return;
+
+exit_binary:
+	printf("path '%s' is binary\n", elem->path);
+	goto free_exit;
 }
 
 #define COLONS "::::::::::::::::::::::::::::::::"

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 19:43       ` Junio C Hamano
@ 2011-04-14 20:06         ` Junio C Hamano
  2011-04-14 20:23           ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-14 20:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Jeff King <peff@peff.net> writes:
>>
>>> We just dump the binary goo all over the terminal. So I think the whole
>>> combined-diff code path needs to learn how to handle binaries properly.
>>
>> How would you show multi-way diffs for binary files?
>>
>> It would probably be sufficient to say "binary files differ" at the
>> beginning of the patch-combining codepath of the combined diff, which
>> would at least keep the --raw -c/--cc output working.
>
> In other words, I suspect that the only places you need to touch in the
> existing codepath would be these places.


In the "here are the places" patch, I changed combine_diff() to return "is
this binary?" and made show_patch_diff() to give just a single "path is
binary", but I suspect that it would be simpler not try to be too nice
about binary like that.

Instead, I think we should just use "Binary blob $SHA-1\n" as if that is
the textconv of a binary file without textconv filter.  That would
certainly make the code much simpler, and more importantly, the output
would become more pleasant. We would show something like:

    - Binary blob bc3c57058faba66f6a7a947e1e9642f47053b5bb
     -Binary blob 536e55524db72bd2acf175208aef4f3dfc148d42
    ++Binary blob 67cfeb2016b24df1cb406c18145efd399f6a1792

if we did so.

When showing the working-tree version, we obviously do not have the blob
object name yet, so in such a case, we can say "Binary blob", or just
"Binary".

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 20:06         ` Junio C Hamano
@ 2011-04-14 20:23           ` Jeff King
  2011-04-14 21:05             ` Junio C Hamano
  2011-04-15  6:54             ` textconv not invoked when viewing merge commit Matthieu Moy
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2011-04-14 20:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

On Thu, Apr 14, 2011 at 01:06:19PM -0700, Junio C Hamano wrote:

> Instead, I think we should just use "Binary blob $SHA-1\n" as if that is
> the textconv of a binary file without textconv filter.  That would
> certainly make the code much simpler, and more importantly, the output
> would become more pleasant. We would show something like:
> 
>     - Binary blob bc3c57058faba66f6a7a947e1e9642f47053b5bb
>      -Binary blob 536e55524db72bd2acf175208aef4f3dfc148d42
>     ++Binary blob 67cfeb2016b24df1cb406c18145efd399f6a1792
> 
> if we did so.

Yeah, I think that is pretty readable. But it gives me a funny feeling
to encode magic strings inside actual diff output. That is, the output
is indistinguishable from a file which contained the "Binary blob..."
strings.

I can't think of a case where it matters, though, so maybe it is just
paranoia.

We do something similar for textconv, of course, but we always knew that
was a human-only thing, and it isn't enabled for plumbing commands. This
would be.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 20:23           ` Jeff King
@ 2011-04-14 21:05             ` Junio C Hamano
  2011-04-14 21:30               ` Jeff King
  2011-04-15  6:54             ` textconv not invoked when viewing merge commit Matthieu Moy
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-14 21:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Michael J Gruber, Peter Oberndorfer, Git List

Jeff King <peff@peff.net> writes:

> On Thu, Apr 14, 2011 at 01:06:19PM -0700, Junio C Hamano wrote:
>
>> Instead, I think we should just use "Binary blob $SHA-1\n" as if that is
>> the textconv of a binary file without textconv filter.  That would
>> certainly make the code much simpler, and more importantly, the output
>> would become more pleasant. We would show something like:
>> 
>>     - Binary blob bc3c57058faba66f6a7a947e1e9642f47053b5bb
>>      -Binary blob 536e55524db72bd2acf175208aef4f3dfc148d42
>>     ++Binary blob 67cfeb2016b24df1cb406c18145efd399f6a1792
>> 
>> if we did so.
>
> Yeah, I think that is pretty readable. But it gives me a funny feeling
> to encode magic strings inside actual diff output. That is, the output
> is indistinguishable from a file which contained the "Binary blob..."
> strings.
>
> I can't think of a case where it matters, though, so maybe it is just
> paranoia.
>
> We do something similar for textconv, of course, but we always knew that
> was a human-only thing, and it isn't enabled for plumbing commands. This
> would be.

Yeah, that may be a sensible concern.

If we really cared, I would say that plumbing should keep the current
behaviour (line-by-line even for binaries, and not using textconv unless
it is asked).  If the command line asked for --textconv, we can use that
"Binary blob $SHA-1" string as a fallback textconv result for binary blobs
that do not have any textconv filter configured.  So the additional logic
to convert the final image and parent images (two places to patch) would
become more like:

	if (if we are a Porcelain or --textconv option given) {
		if (path has textconv)
                	use textconv;
		else if (path is binary)
                	use "Binary blob $SHA-1";
	}

Having said all that, I don't think we made -c/--cc available to plumbing
on purpose; rather they happen to be available because we thought people
with common sense wouldn't run things like "diff-tree --c" that are meant
for human consumption and expect the result to be parsable by their
scripts. In other words, making the parser barf only for plumbing was not
worth doing.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 21:05             ` Junio C Hamano
@ 2011-04-14 21:30               ` Jeff King
  2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2011-04-14 21:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael J Gruber, Peter Oberndorfer, Git List

On Thu, Apr 14, 2011 at 02:05:07PM -0700, Junio C Hamano wrote:

> > Yeah, I think that is pretty readable. But it gives me a funny feeling
> > to encode magic strings inside actual diff output. That is, the output
> > is indistinguishable from a file which contained the "Binary blob..."
> > strings.
>[...]
> 
> Yeah, that may be a sensible concern.
> 
> If we really cared, I would say that plumbing should keep the current
> behaviour (line-by-line even for binaries, and not using textconv unless
> it is asked).

I disagree. Spewing binary contents in the middle of patch output is
wrong and a bug, and we should fix it. Not to mention that the results
are simply incomprehensible in many cases. Binary data isn't
line-oriented, and treating it that way is just going to produce
confusing and useless results. Not to mention that I wouldn't be
surprised if embedded NULs in the data are not being handled properly by
the diff code.

I would much rather have it say "Binary files differ". It's not that
informative, but at least you don't waste a lot of time trying to figure
out what in the world it means.

> Having said all that, I don't think we made -c/--cc available to plumbing
> on purpose; rather they happen to be available because we thought people
> with common sense wouldn't run things like "diff-tree --c" that are meant
> for human consumption and expect the result to be parsable by their
> scripts. In other words, making the parser barf only for plumbing was not
> worth doing.

Weren't they needed originally for "git rev-list | git diff-tree"? Maybe
they post-date the invention of actual C "git log"; I didn't look. At
any rate, they've been around for a while, and it is not unreasonable
for somebody to want to script around the generation of human-readable
output, so I think they are a good addition.

I think the real argument to be made is that "--cc" was never parseable,
because it can't be applied, and users of the format should know that. I
sort of buy that. Though you could also potentially do other kinds of
analysis on --cc output (e.g., something blame-ish but totally external
to git). And for that you wouldn't want to pretend content was there
that isn't. It's an edge case, certainly, but I don't see any reason not
to be conservative in what we generate. The "Binary files differ" type
of output is not that much harder to generate.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-14 20:23           ` Jeff King
  2011-04-14 21:05             ` Junio C Hamano
@ 2011-04-15  6:54             ` Matthieu Moy
  2011-04-15 20:45               ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Matthieu Moy @ 2011-04-15  6:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Michael J Gruber, Peter Oberndorfer, Git List

Jeff King <peff@peff.net> writes:

> On Thu, Apr 14, 2011 at 01:06:19PM -0700, Junio C Hamano wrote:
>
>> Instead, I think we should just use "Binary blob $SHA-1\n" as if that is
>> the textconv of a binary file without textconv filter.  That would
>> certainly make the code much simpler, and more importantly, the output
>> would become more pleasant. We would show something like:
>> 
>>     - Binary blob bc3c57058faba66f6a7a947e1e9642f47053b5bb
>>      -Binary blob 536e55524db72bd2acf175208aef4f3dfc148d42
>>     ++Binary blob 67cfeb2016b24df1cb406c18145efd399f6a1792
>> 
>> if we did so.
>
> Yeah, I think that is pretty readable. But it gives me a funny feeling
> to encode magic strings inside actual diff output. That is, the output
> is indistinguishable from a file which contained the "Binary blob..."
> strings.
>
> I can't think of a case where it matters, though, so maybe it is just
> paranoia.

A line-counting, statistics tool would think that 1 line has been
removed from both branches, and one new added by the merge.

Well, I know no tool parsing combined diff actually, so it's indeed a
hypothetical case.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] combine-diff: use textconv for combined diff format
  2011-04-14 21:30               ` Jeff King
@ 2011-04-15 15:29                 ` Michael J Gruber
  2011-04-15 18:34                   ` Junio C Hamano
                                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Michael J Gruber @ 2011-04-15 15:29 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano

Currently, we ignore textconv and binary status for the combined diff
formats (-c, -cc) which was never intended.

Change this so that combined diff uses the same helpers.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
---
So, just so that I don't get the vapor patch award, here's a WIP passing
Jeff's test.

Before looking at free()ing what I've introduced and the binary issue I'll
check whether the whole blob/file read hunk in show_patch_diff() can't be
simply subsumed in the fill_textconv() call. It is almost a copy of
diff_populate_filespec() but not quite.

Also, the situation with worktree is even worse than I thought:

git diff -m produces a combined diff!

Also, my patch does not cure "diff -c" against worktree so far, I'm not
textconv'ing the worktree file yet. But then again, "diff -m" sucks here also.

I'll probably pick this up later today.
---
 combine-diff.c                 |   30 +++++++++---
 diff.h                         |    2 +
 t/t4046-diff-textconv-merge.sh |   97 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 8 deletions(-)
 create mode 100755 t/t4046-diff-textconv-merge.sh

diff --git a/combine-diff.c b/combine-diff.c
index 655fa89..8056fc3 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -8,7 +8,7 @@
 #include "log-tree.h"
 #include "refs.h"
 
-static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent)
+static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
 	struct combine_diff_path *p;
@@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 
 			hashcpy(p->sha1, q->queue[i]->two->sha1);
 			p->mode = q->queue[i]->two->mode;
+			if (textconv)
+				p->textconv = get_textconv(q->queue[i]->two);
 			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
 			p->parent[n].mode = q->queue[i]->one->mode;
 			p->parent[n].status = q->queue[i]->status;
+			if (textconv)
+				p->parent[n].textconv = get_textconv(q->queue[i]->one);
 			*tail = p;
 			tail = &p->next;
 		}
@@ -60,6 +64,8 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 				hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
 				p->parent[n].mode = q->queue[i]->one->mode;
 				p->parent[n].status = q->queue[i]->status;
+				if (textconv)
+					p->parent[n].textconv = get_textconv(q->queue[i]->one);
 				break;
 			}
 		}
@@ -201,8 +207,8 @@ static void consume_line(void *state_, char *line, unsigned long len)
 	}
 }
 
-static void combine_diff(const unsigned char *parent, unsigned int mode,
-			 mmfile_t *result_file,
+static void combine_diff(const char *path, const unsigned char *parent, unsigned int mode,
+			 struct userdiff_driver *textconv, mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
 			 int num_parent, int result_deleted)
 {
@@ -212,13 +218,13 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	xdemitconf_t xecfg;
 	mmfile_t parent_file;
 	struct combine_diff_state state;
-	unsigned long sz;
+	struct diff_filespec *df = alloc_filespec(path);
 
 	if (result_deleted)
 		return; /* result deleted */
 
-	parent_file.ptr = grab_blob(parent, mode, &sz);
-	parent_file.size = sz;
+	fill_filespec(df, parent, mode);
+	parent_file.size = fill_textconv(textconv, df, &parent_file.ptr);
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
@@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			close(fd);
 	}
 
+	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
+		struct diff_filespec *df = alloc_filespec(elem->path);
+		fill_filespec(df, elem->sha1, elem->mode);
+		result_size = fill_textconv(elem->textconv, df, &result);
+	}
+
 	for (cnt = 0, cp = result; cp < result + result_size; cp++) {
 		if (*cp == '\n')
 			cnt++;
@@ -821,8 +833,10 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			}
 		}
 		if (i <= j)
-			combine_diff(elem->parent[i].sha1,
+			combine_diff(elem->path,
+				     elem->parent[i].sha1,
 				     elem->parent[i].mode,
+				     elem->parent[i].textconv,
 				     &result_file, sline,
 				     cnt, i, num_parent, result_deleted);
 		if (elem->parent[i].mode != elem->mode)
@@ -1001,7 +1015,7 @@ void diff_tree_combined(const unsigned char *sha1,
 			diffopts.output_format = DIFF_FORMAT_NO_OUTPUT;
 		diff_tree_sha1(parent[i], sha1, "", &diffopts);
 		diffcore_std(&diffopts);
-		paths = intersect_paths(paths, i, num_parent);
+		paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));
 
 		if (show_log_first && i == 0) {
 			show_log(rev);
diff --git a/diff.h b/diff.h
index 007a055..4ca6b84 100644
--- a/diff.h
+++ b/diff.h
@@ -176,10 +176,12 @@ struct combine_diff_path {
 	char *path;
 	unsigned int mode;
 	unsigned char sha1[20];
+	struct userdiff_driver *textconv;
 	struct combine_diff_parent {
 		char status;
 		unsigned int mode;
 		unsigned char sha1[20];
+		struct userdiff_driver *textconv;
 	} parent[FLEX_ARRAY];
 };
 #define combine_diff_path_size(n, l) \
diff --git a/t/t4046-diff-textconv-merge.sh b/t/t4046-diff-textconv-merge.sh
new file mode 100755
index 0000000..8420bb6
--- /dev/null
+++ b/t/t4046-diff-textconv-merge.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+
+test_description='combined and merge diff uses textconv'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one file &&
+	test_commit two file &&
+	git checkout -b other HEAD^ &&
+	test_commit three file &&
+	test_must_fail git merge master &&
+	echo resolved >file &&
+	echo "file diff=upcase" >.gitattributes &&
+	git config diff.upcase.textconv "tr a-z A-Z <"
+'
+
+cat >expect <<'EOF'
+diff --combined file
+index 2bdf67a,f719efd..0000000
+--- a/file
++++ b/file
+@@@ -1,1 -1,1 +1,1 @@@
+- THREE
+ -TWO
+++RESOLVED
+EOF
+test_expect_success 'diff -c uses textconv' '
+	git diff -c >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+diff --git a/file b/file
+index 2bdf67a..0000000 100644
+--- a/file
++++ b/file
+@@ -1 +1 @@
+-THREE
++RESOLVED
+
+diff --git a/file b/file
+index f719efd..0000000 100644
+--- a/file
++++ b/file
+@@ -1 +1 @@
+-TWO
++RESOLVED
+EOF
+test_expect_success 'diff -m uses textconv' '
+	git diff -m >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+Merge branch 'master' into other
+
+diff --combined file
+index 2bdf67a,f719efd..2ab19ae
+--- a/file
++++ b/file
+@@@ -1,1 -1,1 +1,1 @@@
+- THREE
+ -TWO
+++RESOLVED
+EOF
+test_expect_success 'show -c uses textconv' '
+	git commit -a &&
+	git show --format=%s -c >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+Merge branch 'master' into other
+
+diff --git a/file b/file
+index 2bdf67a..2ab19ae 100644
+--- a/file
++++ b/file
+@@ -1 +1 @@
+-THREE
++RESOLVED
+Merge branch 'master' into other
+
+diff --git a/file b/file
+index f719efd..2ab19ae 100644
+--- a/file
++++ b/file
+@@ -1 +1 @@
+-TWO
++RESOLVED
+EOF
+test_expect_success 'show -m uses textconv' '
+	git show --format=%s -m >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
1.7.5.rc1.312.g1936c

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
@ 2011-04-15 18:34                   ` Junio C Hamano
  2011-04-16 10:24                     ` Michael J Gruber
  2011-04-15 23:56                   ` Jeff King
  2011-04-21 16:08                   ` Peter Oberndorfer
  2 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-15 18:34 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Jeff King

Michael J Gruber <git@drmicha.warpmail.net> writes:

> git diff -m produces a combined diff!

Hmm, what is the rest of your command line?  I thought -m was a way to ask
pairwise diff with each parent.

> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>  {
>  	struct diff_queue_struct *q = &diff_queued_diff;
>  	struct combine_diff_path *p;
> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>  
>  			hashcpy(p->sha1, q->queue[i]->two->sha1);
>  			p->mode = q->queue[i]->two->mode;
> +			if (textconv)
> +				p->textconv = get_textconv(q->queue[i]->two);
>  			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>  			p->parent[n].mode = q->queue[i]->one->mode;
>  			p->parent[n].status = q->queue[i]->status;
> +			if (textconv)
> +				p->parent[n].textconv = get_textconv(q->queue[i]->one);

This code attempts to handle different textconv set for each different
parents.  But I have to wonder if that is really worth it.

The attribute to decide the content type of the blob is read from the same
set of .gitattributes files, regardless of which parent you are looking at
(and this is not likely to change---the exact procedure that is applied
comes from .git/config that is not even versioned, so there is not much
point in reading from the .gitattributes from the parent tree, trying to
be "precise").

If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
would be the same because one and two came from the same path.  If it is a
rename, they by definition consist of similar contents, and the user would
want the same textconv conversion applied to them to make them comparable.
Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
the right thing to do in theory, doing so wouldn't make a difference in
practice.  More importantly, even if the two textconvs specify different
conversions, it is likely that it is an user error (e.g. the preimage had
"img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
attributes mechanism does not say ".jqg" is a JPEG that wants to get
"exif" run to be texualized for the purpose of diffing, or something).

Besides, if you really want to support "left hand side and right hand
side, depending on which parent we are talking about, may use different
textconv", you would need to defeat the optimization in show_patch_diff()
that calls reuse_combine_diff() when sha1 are the same from other parent
we have already compared---the parent we are looking at may be using a
different textconv procedure.  Even worse, if parent and child have the
same sha1, the result of running parent textconv on the parent blob may be
different from that of the child, which you would never even see in this
codepath.

So I suspect that using only one textconv per "struct combine_diff_path"
would make both the code simpler, and more importantly would make the
result more correct from the end user's point of view.

> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>  			close(fd);
>  	}
>  
> +	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
> +		struct diff_filespec *df = alloc_filespec(elem->path);
> +		fill_filespec(df, elem->sha1, elem->mode);
> +		result_size = fill_textconv(elem->textconv, df, &result);
> +	}

I suspect that these three lines have to become a small helper function to
be used to convert the final blob (done here), and parent blob (done in
combine_diff).  With the "binary" support, it would eventually need to be
enhanced to something like:

	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
        	if (textconv) {
                	do these three lines;
		} else if (is binary) {
                	"Binary blob $SHA-1";
		}
	}

and having a small helper function early in the series would help that
process.

> +		paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));

As an internal API within this file, I would rather see "opt" as a whole
passed to intersect_paths().  We may probably want to determine if the
blob is binary in that function depending on other "opt" fields.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-15  6:54             ` textconv not invoked when viewing merge commit Matthieu Moy
@ 2011-04-15 20:45               ` Junio C Hamano
  2011-04-16  1:47                 ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-15 20:45 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Jeff King, Michael J Gruber, Peter Oberndorfer, Git List

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

>> I can't think of a case where it matters, though, so maybe it is just
>> paranoia.
>
> A line-counting, statistics tool would think that 1 line has been
> removed from both branches, and one new added by the merge.
>
> Well, I know no tool parsing combined diff actually, so it's indeed a
> hypothetical case.

And the ones that have been parsing cdiff wouldn't have done anything good
before this change on such a binary blob anyway, no?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
  2011-04-15 18:34                   ` Junio C Hamano
@ 2011-04-15 23:56                   ` Jeff King
  2011-04-21 16:08                   ` Peter Oberndorfer
  2 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2011-04-15 23:56 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Junio C Hamano

On Fri, Apr 15, 2011 at 05:29:05PM +0200, Michael J Gruber wrote:

> Currently, we ignore textconv and binary status for the combined diff
> formats (-c, -cc) which was never intended.

Thanks for working on this.

I think it would be simpler to work on the binary half first. Then it
would be clear where the binary codepath diverges, and sticking the
textconv helpers in there would be easier (the helpers were, after all,
written because it was retrofitting existing diff code that already
handled binaries differently).

The whole grab_blob() thing seems like an unnecessary duplication of the
diff_filespec code. I think if we can switch to a more uniform use of
diff_filespec code, the memory management might end up simpler.

> +	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
> +		struct diff_filespec *df = alloc_filespec(elem->path);
> +		fill_filespec(df, elem->sha1, elem->mode);
> +		result_size = fill_textconv(elem->textconv, df, &result);
> +	}

The memory management with fill_textconv is kind of ugly. Sometimes it
returns memory which must be freed, and sometimes not. Looking at the
diff.c code, I think in this case it will always need freed (because
elem->textconv is non-NULL). Sorry, that was a mess I created a long
time ago that you now get to deal with. :)

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-15 20:45               ` Junio C Hamano
@ 2011-04-16  1:47                 ` Jeff King
  2011-04-16  6:10                   ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2011-04-16  1:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Michael J Gruber, Peter Oberndorfer, Git List

On Fri, Apr 15, 2011 at 01:45:35PM -0700, Junio C Hamano wrote:

> Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:
> 
> >> I can't think of a case where it matters, though, so maybe it is just
> >> paranoia.
> >
> > A line-counting, statistics tool would think that 1 line has been
> > removed from both branches, and one new added by the merge.
> >
> > Well, I know no tool parsing combined diff actually, so it's indeed a
> > hypothetical case.
> 
> And the ones that have been parsing cdiff wouldn't have done anything good
> before this change on such a binary blob anyway, no?

No, but we can view the proposed change as fixing a bug for such a tool
Whereas turning it into:

  --Binary blob XXX
  + Binary blob YYY
   +Binary blob ZZZ

is codifying ambiguous output, and making the tool forever broken.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-16  1:47                 ` Jeff King
@ 2011-04-16  6:10                   ` Junio C Hamano
  2011-04-16  6:33                     ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-16  6:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthieu Moy, Michael J Gruber, Peter Oberndorfer, Git List

Jeff King <peff@peff.net> writes:

>> > Well, I know no tool parsing combined diff actually, so it's indeed a
>> > hypothetical case.
>> 
>> And the ones that have been parsing cdiff wouldn't have done anything good
>> before this change on such a binary blob anyway, no?
>
> No, but we can view the proposed change as fixing a bug for such a tool
> Whereas turning it into:
>
>   --Binary blob XXX
>   + Binary blob YYY
>    +Binary blob ZZZ
>
> is codifying ambiguous output, and making the tool forever broken.

Of course, if we did this for a plumbing command and when the user did not
ask for --textconv, I would agree with your argument.  Such an output
makes it impossible to tell between the text files that had these lines
and binary files.

What I am suggesting is to make any binary file use a fallback textconv
"Binary blob $SHA-1", when the --textconv option is given from the command
line and no textconv filter is configured for the path, in any textconv
aware commands consistently, not limited to -c/--cc under discussion.

With the current codebase, such a change *would* break a bog-standard,
two-way "git diff" for a binary file; we do want to see the traditional
"Binary files differ" by not using the fallback textconv, but we cannot
tell if the --textconv option was explicitly given from the command line
with the test used in Michael's patch (i.e. ALLOW_TEXTCONV), because we
set the bit by default for Porcelain commands.  And showing "-Binary X"
followed by "-Binary Y" is simply wrong and ambiguous, of course, in such
a case.  We need to be able to tell if an explicit --textconv was given or
we have ALLOW_TEXTCONV merely because we are running a Porcelain.

But I suspect that isn't something we cannot fix---we can just use another
bit to record that in the command line parser.

Once that is fixed, I don't think giving "Binary files differ" when the
line-counter script reads from a plumbing command that was invoked
explicitly with the --textconv command is any better than giving the above
three lines.  For a two-way merge, it does not matter much, but when
viewing a merge with three or more parents, -c/--cc output that shows
which sets of parents had the same blobs would be useful for humans (and
tools) than a single "Binary files differ" output that does not tell any
details.  The line-counter script would be counting "forever broken" data
when you feed your JPEG collection with exif extracting textconv filter
anyway, and I do not necessarily think it would make things worse to give
a fallback textconv filter to binary files that do not have one defined.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-16  6:10                   ` Junio C Hamano
@ 2011-04-16  6:33                     ` Jeff King
  2011-04-16 16:23                       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2011-04-16  6:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Michael J Gruber, Peter Oberndorfer, Git List

On Fri, Apr 15, 2011 at 11:10:44PM -0700, Junio C Hamano wrote:

> >> And the ones that have been parsing cdiff wouldn't have done anything good
> >> before this change on such a binary blob anyway, no?
> >
> > No, but we can view the proposed change as fixing a bug for such a tool
> > Whereas turning it into:
> >
> >   --Binary blob XXX
> >   + Binary blob YYY
> >    +Binary blob ZZZ
> >
> > is codifying ambiguous output, and making the tool forever broken.
> 
> Of course, if we did this for a plumbing command and when the user did not
> ask for --textconv, I would agree with your argument.  Such an output
> makes it impossible to tell between the text files that had these lines
> and binary files.

OK, but what do you intend to do for a plumbing command _without_
--textconv? I think what it is doing now (pretending that lines in the
binary file are relevant, and either truncating output on NUL or spewing
NULs to the output stream) is just wrong.

The only reasonable thing I see there is inventing some combined-diff
form of the "Binary files differ" message.

> What I am suggesting is to make any binary file use a fallback textconv
> "Binary blob $SHA-1", when the --textconv option is given from the command
> line and no textconv filter is configured for the path, in any textconv
> aware commands consistently, not limited to -c/--cc under discussion.

Ick, why? That pseudo-diff contains no additional interesting
information that is not already there (since the "index" line already
contains the blob sha1s). I suppose one could argue that it's more
readable, but I don't find it so; I actually think it is less readable,
because it makes you (even as a human, not a parsing script) think you
are looking at a meaningful text diff.

And then on top of that is the fact that what we do now is consistent
with other diff implementations, so people expect it.

> With the current codebase, such a change *would* break a bog-standard,
> two-way "git diff" for a binary file; we do want to see the traditional
> "Binary files differ" by not using the fallback textconv, but we cannot
> tell if the --textconv option was explicitly given from the command line
> with the test used in Michael's patch (i.e. ALLOW_TEXTCONV), because we
> set the bit by default for Porcelain commands.  And showing "-Binary X"
> followed by "-Binary Y" is simply wrong and ambiguous, of course, in such
> a case.  We need to be able to tell if an explicit --textconv was given or
> we have ALLOW_TEXTCONV merely because we are running a Porcelain.
> 
> But I suspect that isn't something we cannot fix---we can just use another
> bit to record that in the command line parser.

Sure, it would take some code tweaking, but it wouldn't be hard to get
the behavior you are mentioning.

> Once that is fixed, I don't think giving "Binary files differ" when the
> line-counter script reads from a plumbing command that was invoked
> explicitly with the --textconv command is any better than giving the above
> three lines.  For a two-way merge, it does not matter much, but when
> viewing a merge with three or more parents, -c/--cc output that shows
> which sets of parents had the same blobs would be useful for humans (and
> tools) than a single "Binary files differ" output that does not tell any
> details.

Oh, sure. I am not proposing that "-c" should just say exactly "Binary
files X and Y differ", only that we need a message _like_ that.  I think
it would be fine to represent which parents had which sha1, either in
some structured format or even as text. I just think that making it look
exactly like a text diff (even though, yes, that is a convenient
structured format that we already have) is unnecessarily confusing to
both humans and scripts.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-15 18:34                   ` Junio C Hamano
@ 2011-04-16 10:24                     ` Michael J Gruber
  2011-04-16 17:19                       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Michael J Gruber @ 2011-04-16 10:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Junio C Hamano venit, vidit, dixit 15.04.2011 20:34:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> git diff -m produces a combined diff!
> 
> Hmm, what is the rest of your command line?  I thought -m was a way to ask
> pairwise diff with each parent.

Sure, but it does not always work like that. Just look at the test from
my patch, or do any "git merge --no-commit" and then "git diff -m". I
would expect that to compare the worktree to each parent, but in fact it
runs "diff --cc".

At least I thought that's the only way how combine-diff would ever have
to deal with a merge result in the worktree as opposed to a blob. And it
seems that "diff -m" does not handle this but relays to "diff --cc" for
current git. I have not checked the "-m" codepath.

>> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>>  {
>>  	struct diff_queue_struct *q = &diff_queued_diff;
>>  	struct combine_diff_path *p;
>> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>>  
>>  			hashcpy(p->sha1, q->queue[i]->two->sha1);
>>  			p->mode = q->queue[i]->two->mode;
>> +			if (textconv)
>> +				p->textconv = get_textconv(q->queue[i]->two);
>>  			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>>  			p->parent[n].mode = q->queue[i]->one->mode;
>>  			p->parent[n].status = q->queue[i]->status;
>> +			if (textconv)
>> +				p->parent[n].textconv = get_textconv(q->queue[i]->one);
> 
> This code attempts to handle different textconv set for each different
> parents.  But I have to wonder if that is really worth it.
> 
> The attribute to decide the content type of the blob is read from the same
> set of .gitattributes files, regardless of which parent you are looking at
> (and this is not likely to change---the exact procedure that is applied
> comes from .git/config that is not even versioned, so there is not much
> point in reading from the .gitattributes from the parent tree, trying to
> be "precise").
> 
> If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
> would be the same because one and two came from the same path.  If it is a
> rename, they by definition consist of similar contents, and the user would
> want the same textconv conversion applied to them to make them comparable.
> Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
> blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
> the right thing to do in theory, doing so wouldn't make a difference in
> practice.  More importantly, even if the two textconvs specify different
> conversions, it is likely that it is an user error (e.g. the preimage had
> "img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
> attributes mechanism does not say ".jqg" is a JPEG that wants to get
> "exif" run to be texualized for the purpose of diffing, or something).
> 
> Besides, if you really want to support "left hand side and right hand
> side, depending on which parent we are talking about, may use different
> textconv", you would need to defeat the optimization in show_patch_diff()
> that calls reuse_combine_diff() when sha1 are the same from other parent
> we have already compared---the parent we are looking at may be using a
> different textconv procedure.  Even worse, if parent and child have the
> same sha1, the result of running parent textconv on the parent blob may be
> different from that of the child, which you would never even see in this
> codepath.
> 
> So I suspect that using only one textconv per "struct combine_diff_path"
> would make both the code simpler, and more importantly would make the
> result more correct from the end user's point of view.

I'd be happy to take the simpler approach. While I still think the other
one is "more correct" (modulo the reuse issue) it should not matter in
most cases.

> 
>> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>>  			close(fd);
>>  	}
>>  
>> +	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
>> +		struct diff_filespec *df = alloc_filespec(elem->path);
>> +		fill_filespec(df, elem->sha1, elem->mode);
>> +		result_size = fill_textconv(elem->textconv, df, &result);
>> +	}
> 
> I suspect that these three lines have to become a small helper function to
> be used to convert the final blob (done here), and parent blob (done in
> combine_diff).  With the "binary" support, it would eventually need to be
> enhanced to something like:
> 
> 	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
>         	if (textconv) {
>                 	do these three lines;
> 		} else if (is binary) {
>                 	"Binary blob $SHA-1";
> 		}
> 	}
> 

"diff -m --oneline" says something like

aa01ae1 (from 64c0923) Merge branch 'master' into somebranch
diff --git a/a b/a
index 72594ed..d8323da 100644
Binary files a/a and b/a differ
aa01ae1 (from e85049e) Merge branch 'master' into somebranch
diff --git a/a b/a
index 86e041d..d8323da 100644
Binary files a/a and b/a differ


so I'm wondering whether we shouldn't stay closer to that with "--cc
also", e.g.:

aa01ae1 Merge branch 'master' into somebranch
diff --cc a
index 72594ed,86e041d..d8323da
Binary files a/a and b/a differ

BTW: Currently, "--cc --oneline" produces an extra newline before the
diff line, and also note how the diff lines differ ("a/a b/a" vs. "a").
But those are different issues.

> and having a small helper function early in the series would help that
> process.
> 
>> +		paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));
> 
> As an internal API within this file, I would rather see "opt" as a whole
> passed to intersect_paths().  We may probably want to determine if the
> blob is binary in that function depending on other "opt" fields.

Yep.
Michael

[Resent today, sorry. Couldn't get myself to reboot that box yesterday
after a disk gave up.]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: textconv not invoked when viewing merge commit
  2011-04-16  6:33                     ` Jeff King
@ 2011-04-16 16:23                       ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2011-04-16 16:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthieu Moy, Michael J Gruber, Peter Oberndorfer, Git List

Jeff King <peff@peff.net> writes:

> OK, but what do you intend to do for a plumbing command _without_
> --textconv? I think what it is doing now (pretending that lines in the
> binary file are relevant, and either truncating output on NUL or spewing
> NULs to the output stream) is just wrong.

Oh, no question about it.  "Binary files differ" codepath needs to be
added, and independent of if we want to add a fallback textconv.

> Ick, why? That pseudo-diff contains no additional interesting
> information that is not already there (since the "index" line already
> contains the blob sha1s).

True enough.

The only case that might make a difference would be if one side was binary
and the other side and the result was text, in which case the user can not
just see but read the result, but I don't think it is worth caring about.

Also unlike my weatherbaloon patch, Michael's approach (if it is updated
to pass the whole diff options structure instead of just one "do we care
about the textconv" bit to intersect_paths() function) will let us
determine if the combined path should say "Binary files differ" a lot
early, so there is no need to worry about what to do on binary files in
the places we would be adding textconv anymore.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-16 10:24                     ` Michael J Gruber
@ 2011-04-16 17:19                       ` Junio C Hamano
  2011-04-16 21:37                         ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2011-04-16 17:19 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Jeff King

Michael J Gruber <git@drmicha.warpmail.net> writes:

> Junio C Hamano venit, vidit, dixit 15.04.2011 20:34:
>> Michael J Gruber <git@drmicha.warpmail.net> writes:
>> 
>>> git diff -m produces a combined diff!
>> 
>> Hmm, what is the rest of your command line?  I thought -m was a way to ask
>> pairwise diff with each parent.
>
> Sure, but it does not always work like that. Just look at the test from
> my patch, or do any "git merge --no-commit" and then "git diff -m". I
> would expect that to compare the worktree to each parent, but in fact it
> runs "diff --cc".

Thanks; it wasn't clear you are comparing stages with the working tree.
Asking for the rest of the command line paid off ;-)

And yes, comparing multiple entries with the worktree files is done in
run_diff_files() defined in diff-lib.c; it is unaware of the -m option
that was originally defined for diff-tree to show pairwise diff.  That
codepath never cared about -m before nor after -c/--cc was invented for
diff-tree, and it only learned about -c/--cc when it was introduced.  I
think diff-index is unaware of the -m option from the same historical
background (read: not "for the same reason or justification").

We may want to change that, but I am personally not very interested.  We
can ask to diff against a specific stage, I know that is what I do, and I
think that is what most people do [*1*].

> "diff -m --oneline" says something like
>
> aa01ae1 (from 64c0923) Merge branch 'master' into somebranch
> diff --git a/a b/a
> index 72594ed..d8323da 100644
> Binary files a/a and b/a differ
> aa01ae1 (from e85049e) Merge branch 'master' into somebranch
> diff --git a/a b/a
> index 86e041d..d8323da 100644
> Binary files a/a and b/a differ

Yes this is the case for diff-tree running pair-wise comparison.

> so I'm wondering whether we shouldn't stay closer to that with "--cc
> also", e.g.:
>
> aa01ae1 Merge branch 'master' into somebranch
> diff --cc a
> index 72594ed,86e041d..d8323da
> Binary files a/a and b/a differ

The -c/--cc options are about presenting the pairwise -m output in a
different way by combining and condensing.  So in that sense, if we really
want to combine and condense information, one possibility is to do:

 Binary files a/a and c/a differ, b/a and c/a differ.

naming each parent as 'a', 'b', ... and giving the highest letter (in the
two-parent merge case, 'c') to the final result.  By doing so, you can
express where in its tree each parent had the content when you are viewing
a renaming merge.

Bu that opens an old can of worms we should have opened and closed four
years ago.

The header shows "diff --cc a" followed by "--- a/a" followed by "+++ b/a"
before the hunk for a two-way merge.  But if we are to "combine and
condense", another possibility is to show:

    diff --cc a/a b/a c/a
    index bf7c788,fa9d23a,5d24d9f..cc69134
    --- a/a
    --- b/a
    +++ c/a
    @@@@ -74,26 -74,6 -74,29 +74,50 @@@@
    ...

to keep the paths information.  I do not think anybody cared so far, and
perhaps we should have done it when we introduced -c/--cc, but it is not
at all worth changing now.

That means that we are not all that worried about losing the rename
information when showing such a diff in --cc/-c form.  After all, the
"diff --cc a" header is not "diff --cc a/a b/a c/a" that mentions all
paths, and "--- a/a" lines are not repeated for each parent.  So while
showing the names like you suggested may be a possibility, I think an
approach that is more in line with the current output would be:

 "Binary files a in different versions differ"

or something without naming them with a/, b/, ...

In short, it all depends on how much we condense when running -c/--cc.  We
are inconsistent by showing both "--- a/a" and "+++ b/a" lines, but modulo
that we condense away the renamed path information in our current output.
And the final alternative in the previous paragraph would be more in line
with that design.


[Footnote]

*1* I also often use 

 diff HEAD...MERGE_HEAD $path
 diff HEAD $path
 diff MERGE_HEAD $path

during a conflicted merge when it is hard to read in the --cc form.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-16 17:19                       ` Junio C Hamano
@ 2011-04-16 21:37                         ` Jakub Narebski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2011-04-16 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael J Gruber, git, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> But that opens an old can of worms we should have opened and closed four
> years ago.
> 
> The header shows "diff --cc a" followed by "--- a/a" followed by "+++ b/a"
> before the hunk for a two-way merge.  But if we are to "combine and
> condense", another possibility is to show:
> 
>     diff --cc a/a b/a c/a
>     index bf7c788,fa9d23a,5d24d9f..cc69134
>     --- a/a
>     --- b/a
>     +++ c/a
>     @@@@ -74,26 -74,6 -74,29 +74,50 @@@@
>     ...
> 
> to keep the paths information.  I do not think anybody cared so far, and
> perhaps we should have done it when we introduced -c/--cc, but it is not
> at all worth changing now.

Such feature would greatly simplify gitweb code for dealing with
combined diff (for a merge commit).  It wouldn't have to jump through
hoops[1] to get pre-image names to have correct link to pre-image...

This affects gitweb performance... in those rare case where we have
rename in merge commit (gitweb is smart enough to do this dance only
if there is rename in a merge).

Note that tree-diff doesn't help either - we have only post-image
name.

[1]: fill_from_file_info subroutine, which in turn uses
     git_get_path_by_hash once per parent, which uses git-ls-tree
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] combine-diff: use textconv for combined diff format
  2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
  2011-04-15 18:34                   ` Junio C Hamano
  2011-04-15 23:56                   ` Jeff King
@ 2011-04-21 16:08                   ` Peter Oberndorfer
  2 siblings, 0 replies; 25+ messages in thread
From: Peter Oberndorfer @ 2011-04-21 16:08 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Jeff King, Junio C Hamano

On Freitag, 15. April 2011, Michael J Gruber wrote:
> Currently, we ignore textconv and binary status for the combined diff
> formats (-c, -cc) which was never intended.
> 
> Change this so that combined diff uses the same helpers.
> 
> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
> ---
> So, just so that I don't get the vapor patch award, here's a WIP passing
> Jeff's test.
> 
> Before looking at free()ing what I've introduced and the binary issue I'll
> check whether the whole blob/file read hunk in show_patch_diff() can't be
> simply subsumed in the fill_textconv() call. It is almost a copy of
> diff_populate_filespec() but not quite.
> 
> Also, the situation with worktree is even worse than I thought:
> 
> git diff -m produces a combined diff!
> 
> Also, my patch does not cure "diff -c" against worktree so far, I'm not
> textconv'ing the worktree file yet. But then again, "diff -m" sucks here also.
> 
> I'll probably pick this up later today.

Hi,

thanks for working on this.
I tried this patch on my msysgit system and now gitk shows a nice diff
for my merged archives. :-)
For merges of other binary files without textconf filter (jar)
i still get binary output.
(but i expected this from the notes above/discussion)

thanks,
Greetings Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-04-21 17:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-11 17:12 textconv not invoked when viewing merge commit Peter Oberndorfer
2011-04-12  9:04 ` Michael J Gruber
2011-04-14 19:09   ` Jeff King
2011-04-14 19:15     ` Jeff King
2011-04-14 19:26     ` Junio C Hamano
2011-04-14 19:28       ` Jeff King
2011-04-14 19:35         ` Michael J Gruber
2011-04-14 19:43       ` Junio C Hamano
2011-04-14 20:06         ` Junio C Hamano
2011-04-14 20:23           ` Jeff King
2011-04-14 21:05             ` Junio C Hamano
2011-04-14 21:30               ` Jeff King
2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
2011-04-15 18:34                   ` Junio C Hamano
2011-04-16 10:24                     ` Michael J Gruber
2011-04-16 17:19                       ` Junio C Hamano
2011-04-16 21:37                         ` Jakub Narebski
2011-04-15 23:56                   ` Jeff King
2011-04-21 16:08                   ` Peter Oberndorfer
2011-04-15  6:54             ` textconv not invoked when viewing merge commit Matthieu Moy
2011-04-15 20:45               ` Junio C Hamano
2011-04-16  1:47                 ` Jeff King
2011-04-16  6:10                   ` Junio C Hamano
2011-04-16  6:33                     ` Jeff King
2011-04-16 16:23                       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.