All of lore.kernel.org
 help / color / mirror / Atom feed
* git diff, external diff tool, and submodules
@ 2014-02-15 12:19 Grégory Pakosz
  2014-02-16 16:52 ` [PATCH] diff: do not reuse_worktree_file for submodules Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Grégory Pakosz @ 2014-02-15 12:19 UTC (permalink / raw)
  To: git

Hello,

When a submodule has new commits, I noticed the following:

$ git diff
diff --git a/submodule b/submodule
index 4c75be6..b272d40 160000
--- a/submodule
+++ b/submodule
@@ -1 +1 @@
-Subproject commit 4c75be6435cd515887d35c300ed8b487f8143d8e
+Subproject commit b272d4077fda29028c0bd02efba2837e12a8319c

As you can see, the diff shows the submodule has new commits.

However when using an external diff tool, it seems to me that git diff
fails to handle the submodule case:

$ GIT_EXTERNAL_DIFF=echo git diff
submodule /var/folders/1h/q9nt7m6d2fs61_v177kq1_h00000gn/T//cz1ati_submodule
4c75be6435cd515887d35c300ed8b487f8143d8e 160000 submodule
0000000000000000000000000000000000000000 160000

$LOCAL is set to a temp file that contains:
Subproject commit 4c75be6435cd515887d35c300ed8b487f8143d8e

And I expected $REMOTE to be set to another temp file that contains:
Subproject commit b272d4077fda29028c0bd02efba2837e12a8319c

Instead, $REMOTE is set to the actual submodule path and then visual
diff tools rightfully complains $REMOTE doesn't point to a valid file.

I think git diff should handle the submodule case and create 2
temporary files containing "Subproject commit sha1" for external diff
tools to compare.

What do you think?
Gregory

PS: git difftool has a special "directory mode" triggered with "-d"
that does what git diff does when GIT_EXTERNAL_DIFF is not set. It
creates temp files with "Subproject commit sha1" lines for diff tools
to compare.

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] diff: do not reuse_worktree_file for submodules
  2014-02-15 12:19 git diff, external diff tool, and submodules Grégory Pakosz
@ 2014-02-16 16:52 ` Thomas Rast
  2014-02-18 21:01   ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Rast @ 2014-02-16 16:52 UTC (permalink / raw)
  To: git; +Cc: Grégory Pakosz, Junio C Hamano

The GIT_EXTERNAL_DIFF calling code attempts to reuse existing worktree
files for the worktree side of diffs, for performance reasons.
However, that code also tries to do the same with submodules.  This
results in calls to $GIT_EXTERNAL_DIFF where the old-file is a file of
the form "Submodule commit $sha1", but the new-file is a directory in
the worktree.

Fix it by never reusing a worktree "file" in the submodule case.

Reported-by: Grégory Pakosz <gregory.pakosz@gmail.com>
Signed-off-by: Thomas Rast <tr@thomasrast.ch>
---
 diff.c                   |  5 +++--
 t/t4020-diff-external.sh | 30 +++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 7c59bfe..e9a8874 100644
--- a/diff.c
+++ b/diff.c
@@ -2845,8 +2845,9 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
 		remove_tempfile_installed = 1;
 	}
 
-	if (!one->sha1_valid ||
-	    reuse_worktree_file(name, one->sha1, 1)) {
+	if (!S_ISGITLINK(one->mode) &&
+	    (!one->sha1_valid ||
+	     reuse_worktree_file(name, one->sha1, 1))) {
 		struct stat st;
 		if (lstat(name, &st) < 0) {
 			if (errno == ENOENT)
diff --git a/t/t4020-diff-external.sh b/t/t4020-diff-external.sh
index bcae35a..0446201 100755
--- a/t/t4020-diff-external.sh
+++ b/t/t4020-diff-external.sh
@@ -226,12 +226,13 @@ keep_only_cr () {
 }
 
 test_expect_success 'external diff with autocrlf = true' '
-	git config core.autocrlf true &&
+	test_config core.autocrlf true &&
 	GIT_EXTERNAL_DIFF=./fake-diff.sh git diff &&
 	test $(wc -l < crlfed.txt) = $(cat crlfed.txt | keep_only_cr | wc -c)
 '
 
 test_expect_success 'diff --cached' '
+	test_config core.autocrlf true &&
 	git add file &&
 	git update-index --assume-unchanged file &&
 	echo second >file &&
@@ -239,4 +240,31 @@ test_expect_success 'diff --cached' '
 	test_cmp "$TEST_DIRECTORY"/t4020/diff.NUL actual
 '
 
+test_expect_success 'clean up crlf leftovers' '
+	git update-index --no-assume-unchanged file &&
+	rm -f file* &&
+	git reset --hard
+'
+
+test_expect_success 'submodule diff' '
+	git init sub &&
+	( cd sub && test_commit sub1 ) &&
+	git add sub &&
+	test_tick &&
+	git commit -m "add submodule" &&
+	( cd sub && test_commit sub2 ) &&
+	write_script gather_pre_post.sh <<-\EOF &&
+	echo "$1 $4" # path, mode
+	cat "$2" # old file
+	cat "$5" # new file
+	EOF
+	GIT_EXTERNAL_DIFF=./gather_pre_post.sh git diff >actual &&
+	cat >expected <<-EOF &&
+	sub 160000
+	Subproject commit $(git rev-parse HEAD:sub)
+	Subproject commit $(cd sub && git rev-parse HEAD)
+	EOF
+	test_cmp expected actual
+'
+
 test_done
-- 
1.9.0.313.g3d0a325

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: do not reuse_worktree_file for submodules
  2014-02-16 16:52 ` [PATCH] diff: do not reuse_worktree_file for submodules Thomas Rast
@ 2014-02-18 21:01   ` Junio C Hamano
  2014-02-22 11:27     ` Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2014-02-18 21:01 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Grégory Pakosz

Thomas Rast <tr@thomasrast.ch> writes:

> The GIT_EXTERNAL_DIFF calling code attempts to reuse existing worktree
> files for the worktree side of diffs, for performance reasons.
> However, that code also tries to do the same with submodules.  This
> results in calls to $GIT_EXTERNAL_DIFF where the old-file is a file of
> the form "Submodule commit $sha1", but the new-file is a directory in
> the worktree.
>
> Fix it by never reusing a worktree "file" in the submodule case.
>
> Reported-by: Grégory Pakosz <gregory.pakosz@gmail.com>
> Signed-off-by: Thomas Rast <tr@thomasrast.ch>
> ---
>  diff.c                   |  5 +++--
>  t/t4020-diff-external.sh | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 7c59bfe..e9a8874 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2845,8 +2845,9 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
>  		remove_tempfile_installed = 1;
>  	}
>  
> -	if (!one->sha1_valid ||
> -	    reuse_worktree_file(name, one->sha1, 1)) {
> +	if (!S_ISGITLINK(one->mode) &&
> +	    (!one->sha1_valid ||
> +	     reuse_worktree_file(name, one->sha1, 1))) {

I agree with the goal/end result, but I have to wonder if the
reuse_worktree_file() be the helper function that ought to
encapsulate such a logic?

Instead of feeding it an object name and a path, if we passed a
diff_filespec to the helper, it would have access to the mode as
well.  It would result in a more intrusive change, so I'd prefer to
see your patch applied first and then build such a refactor on top,
perhaps like the attached.

 diff.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index a96992a..74eec80 100644
--- a/diff.c
+++ b/diff.c
@@ -2582,11 +2582,13 @@ void fill_filespec(struct diff_filespec *spec, const unsigned char *sha1,
  * the work tree has that object contents, return true, so that
  * prepare_temp_file() does not have to inflate and extract.
  */
-static int reuse_worktree_file(const char *name, const unsigned char *sha1, int want_file)
+static int reuse_worktree_file(const struct diff_filespec *spec, int want_file)
 {
 	const struct cache_entry *ce;
 	struct stat st;
 	int pos, len;
+	const char *name = spec->path;
+	const unsigned char *sha1 = spec->sha1;
 
 	/*
 	 * We do not read the cache ourselves here, because the
@@ -2698,7 +2700,7 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 		return diff_populate_gitlink(s, size_only);
 
 	if (!s->sha1_valid ||
-	    reuse_worktree_file(s->path, s->sha1, 0)) {
+	    reuse_worktree_file(s, 0)) {
 		struct strbuf buf = STRBUF_INIT;
 		struct stat st;
 		int fd;
@@ -2844,17 +2846,17 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
 
 	if (!S_ISGITLINK(one->mode) &&
 	    (!one->sha1_valid ||
-	     reuse_worktree_file(name, one->sha1, 1))) {
+	     reuse_worktree_file(one, 1))) {
 		struct stat st;
-		if (lstat(name, &st) < 0) {
+		if (lstat(one->path, &st) < 0) {
 			if (errno == ENOENT)
 				goto not_a_valid_file;
-			die_errno("stat(%s)", name);
+			die_errno("stat(%s)", one->path);
 		}
 		if (S_ISLNK(st.st_mode)) {
 			struct strbuf sb = STRBUF_INIT;
-			if (strbuf_readlink(&sb, name, st.st_size) < 0)
-				die_errno("readlink(%s)", name);
+			if (strbuf_readlink(&sb, one->path, st.st_size) < 0)
+				die_errno("readlink(%s)", one->path);
 			prep_temp_blob(name, temp, sb.buf, sb.len,
 				       (one->sha1_valid ?
 					one->sha1 : null_sha1),
@@ -2864,7 +2866,7 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
 		}
 		else {
 			/* we can borrow from the file in the work tree */
-			temp->name = name;
+			temp->name = one->path;
 			if (!one->sha1_valid)
 				strcpy(temp->hex, sha1_to_hex(null_sha1));
 			else

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: do not reuse_worktree_file for submodules
  2014-02-18 21:01   ` Junio C Hamano
@ 2014-02-22 11:27     ` Thomas Rast
  2014-02-23 12:46       ` Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Rast @ 2014-02-22 11:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Grégory Pakosz

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <tr@thomasrast.ch> writes:
>
>> @@ -2845,8 +2845,9 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
>>  		remove_tempfile_installed = 1;
>>  	}
>>  
>> -	if (!one->sha1_valid ||
>> -	    reuse_worktree_file(name, one->sha1, 1)) {
>> +	if (!S_ISGITLINK(one->mode) &&
>> +	    (!one->sha1_valid ||
>> +	     reuse_worktree_file(name, one->sha1, 1))) {
>
> I agree with the goal/end result, but I have to wonder if the
> reuse_worktree_file() be the helper function that ought to
> encapsulate such a logic?
>
> Instead of feeding it an object name and a path, if we passed a
> diff_filespec to the helper, it would have access to the mode as
> well.  It would result in a more intrusive change, so I'd prefer to
> see your patch applied first and then build such a refactor on top,
> perhaps like the attached.

I see that you already queued 721e727, which has the change you
described plus moving the S_ISGITLINK test into reuse_worktree_file.
The change looks good to me.  However, two nits about the comments:
diff.c now says

  /*
   * Given a name and sha1 pair, if the index tells us the file in
   * the work tree has that object contents, return true, so that
   * prepare_temp_file() does not have to inflate and extract.
   */
  static int reuse_worktree_file(const struct diff_filespec *spec, int want_file)
  {
          const struct cache_entry *ce;
          struct stat st;
          int pos, len;
          const char *name = spec->path;
          const unsigned char *sha1 = spec->sha1;

          /* reading the directory will not give us "Submodule commit XYZ" */
          if (S_ISGITLINK(spec->mode))
                  return 0;

But the function comment is no longer accurate, and the comment about
the S_ISGITLINK exit is rather obscure if one doesn't know what the
callers want.  So how about this on top?

 diff.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git i/diff.c w/diff.c
index a342ea6..dabf913 100644
--- i/diff.c
+++ w/diff.c
@@ -2578,9 +2578,14 @@ void fill_filespec(struct diff_filespec *spec, const unsigned char *sha1,
 }
 
 /*
- * Given a name and sha1 pair, if the index tells us the file in
- * the work tree has that object contents, return true, so that
- * prepare_temp_file() does not have to inflate and extract.
+ * Given a diff_filespec, determine if the corresponding worktree file
+ * can be used for diffing instead of reading the object from the
+ * repository.
+ *
+ * We normally try packfiles, worktree, loose objects in this order.
+ *
+ * If want_file=1 or git was compiled with NO_FAST_WORKING_DIRECTORY,
+ * the order is: worktree, packfiles, loose objects.
  */
 static int reuse_worktree_file(const struct diff_filespec *spec, int want_file)
 {
@@ -2590,7 +2595,11 @@ static int reuse_worktree_file(const struct diff_filespec *spec, int want_file)
 	const char *name = spec->path;
 	const unsigned char *sha1 = spec->sha1;
 
-	/* reading the directory will not give us "Submodule commit XYZ" */
+	/*
+	 * The diff representation of a submodule is "Submodule commit
+	 * XYZ", but in the worktree we have a directory.  So they
+	 * never match.
+	 */
 	if (S_ISGITLINK(spec->mode))
 		return 0;
 


-- 
Thomas Rast
tr@thomasrast.ch

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: do not reuse_worktree_file for submodules
  2014-02-22 11:27     ` Thomas Rast
@ 2014-02-23 12:46       ` Thomas Rast
  2014-02-24 17:39         ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Rast @ 2014-02-23 12:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Grégory Pakosz

Thomas Rast <tr@thomasrast.ch> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Thomas Rast <tr@thomasrast.ch> writes:
>>
>>> @@ -2845,8 +2845,9 @@ static struct diff_tempfile *prepare_temp_file(const char *name,
>>>  		remove_tempfile_installed = 1;
>>>  	}
>>>  
>>> -	if (!one->sha1_valid ||
>>> -	    reuse_worktree_file(name, one->sha1, 1)) {
>>> +	if (!S_ISGITLINK(one->mode) &&
>>> +	    (!one->sha1_valid ||
>>> +	     reuse_worktree_file(name, one->sha1, 1))) {
>>
>> I agree with the goal/end result, but I have to wonder if the
>> reuse_worktree_file() be the helper function that ought to
>> encapsulate such a logic?
>>
>> Instead of feeding it an object name and a path, if we passed a
>> diff_filespec to the helper, it would have access to the mode as
>> well.  It would result in a more intrusive change, so I'd prefer to
>> see your patch applied first and then build such a refactor on top,
>> perhaps like the attached.
>
> I see that you already queued 721e727, which has the change you
> described plus moving the S_ISGITLINK test into reuse_worktree_file.
> The change looks good to me.

I spoke too soon; it breaks the test I wrote to cover this case, for a
reason that gives me a headache.

When we hit the conditional

>>> -	if (!one->sha1_valid ||
>>> -	    reuse_worktree_file(name, one->sha1, 1)) {
>>> +	if (!S_ISGITLINK(one->mode) &&
>>> +	    (!one->sha1_valid ||
>>> +	     reuse_worktree_file(name, one->sha1, 1))) {

sha1_valid=0 for the submodule on the worktree side of the diff.  The
reason is that we start out with sha1_valid=0 and sha1=000..000 for the
worktree side of all dirty entries, which makes sense at that point.  We
later set the sha1 by looking inside the submodule in
diff_fill_sha1_info(), but we never set sha1_valid.  So the above
conditional will now trigger on the !one->sha1_valid arm, completely
defeating the change to reuse_worktree_file().

We can fix it like below, but it feels a bit wrong to me.  Are
submodules the only case where it makes sense to set sha1_valid when we
fill the sha1?


 diff.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git i/diff.c w/diff.c
index dabf913..cf7281d 100644
--- i/diff.c
+++ w/diff.c
@@ -3081,6 +3082,8 @@ static void diff_fill_sha1_info(struct diff_filespec *one)
 				die_errno("stat '%s'", one->path);
 			if (index_path(one->sha1, one->path, &st, 0))
 				die("cannot hash %s", one->path);
+			if (S_ISGITLINK(one->mode))
+				one->sha1_valid = 1;
 		}
 	}
 	else


-- 
Thomas Rast
tr@thomasrast.ch

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: do not reuse_worktree_file for submodules
  2014-02-23 12:46       ` Thomas Rast
@ 2014-02-24 17:39         ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2014-02-24 17:39 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Grégory Pakosz

Thomas Rast <tr@thomasrast.ch> writes:

> I spoke too soon; it breaks the test I wrote to cover this case, for a
> reason that gives me a headache.
>
> When we hit the conditional
>
>>>> -	if (!one->sha1_valid ||
>>>> -	    reuse_worktree_file(name, one->sha1, 1)) {
>>>> +	if (!S_ISGITLINK(one->mode) &&
>>>> +	    (!one->sha1_valid ||
>>>> +	     reuse_worktree_file(name, one->sha1, 1))) {
>
> sha1_valid=0 for the submodule on the worktree side of the diff.  The
> reason is that we start out with sha1_valid=0 and sha1=000..000 for the
> worktree side of all dirty entries, which makes sense at that point.  We
> later set the sha1 by looking inside the submodule in
> diff_fill_sha1_info(), but we never set sha1_valid.  So the above
> conditional will now trigger on the !one->sha1_valid arm, completely
> defeating the change to reuse_worktree_file().
>
> We can fix it like below, but it feels a bit wrong to me.  Are
> submodules the only case where it makes sense to set sha1_valid when we
> fill the sha1?

The meaning of filespec->sha1_valid is "Is it known that the
filespec->sha1 and filespec->mode field should be used?"; I agree
that this feels wrong.

Which means that the previous one was wrong, and your original was
the right approach.  I'll drop the update.

Thanks.

>
>  diff.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git i/diff.c w/diff.c
> index dabf913..cf7281d 100644
> --- i/diff.c
> +++ w/diff.c
> @@ -3081,6 +3082,8 @@ static void diff_fill_sha1_info(struct diff_filespec *one)
>  				die_errno("stat '%s'", one->path);
>  			if (index_path(one->sha1, one->path, &st, 0))
>  				die("cannot hash %s", one->path);
> +			if (S_ISGITLINK(one->mode))
> +				one->sha1_valid = 1;
>  		}
>  	}
>  	else

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-02-24 17:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-15 12:19 git diff, external diff tool, and submodules Grégory Pakosz
2014-02-16 16:52 ` [PATCH] diff: do not reuse_worktree_file for submodules Thomas Rast
2014-02-18 21:01   ` Junio C Hamano
2014-02-22 11:27     ` Thomas Rast
2014-02-23 12:46       ` Thomas Rast
2014-02-24 17:39         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.