All of lore.kernel.org
 help / color / mirror / Atom feed
* fatal error when diffing changed symlinks
@ 2017-02-24 11:47 Christophe Macabiau
  2017-02-24 17:53 ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Macabiau @ 2017-02-24 11:47 UTC (permalink / raw)
  To: git

Hi,

with the commands below, you will get :

> fatal: bad object 0000000000000000000000000000000000000000
> show 0000000000000000000000000000000000000000: command returned error: 128
>

I am using version 2.5.5 fedora 23

cd /tmp
mkdir a
cd a
git init
touch b
ln -s b c
git add .
git commit -m 'first'
touch d
rm c
ln -s d c
git difftool --dir-diff

Thanks for your feedback,

Christophe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-02-24 11:47 fatal error when diffing changed symlinks Christophe Macabiau
@ 2017-02-24 17:53 ` Junio C Hamano
  2017-02-24 19:51   ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-02-24 17:53 UTC (permalink / raw)
  To: Christophe Macabiau; +Cc: git, Johannes Schindelin

Christophe Macabiau <christophemacabiau@gmail.com> writes:

> with the commands below, you will get :
>
>> fatal: bad object 0000000000000000000000000000000000000000
>> show 0000000000000000000000000000000000000000: command returned error: 128
>>
>
> I am using version 2.5.5 fedora 23
>
> cd /tmp
> mkdir a
> cd a
> git init
> touch b
> ln -s b c
> git add .
> git commit -m 'first'
> touch d
> rm c
> ln -s d c
> git difftool --dir-diff

A slightly worse is that the upcoming Git will ship with a rewritten
"difftool" that makes the above sequence segfault.  As either way is
bad, I do not particularly think the rewritten one needs to be
reverted (it would give "fatal: bad object" then, and would not fix
your problem), but it needs to be looked at.

Thanks for a report.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-02-24 17:53 ` Junio C Hamano
@ 2017-02-24 19:51   ` Junio C Hamano
  2017-02-24 20:35     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-02-24 19:51 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Christophe Macabiau

Junio C Hamano <gitster@pobox.com> writes:

>> cd /tmp
>> mkdir a
>> cd a
>> git init
>> touch b
>> ln -s b c
>> git add .
>> git commit -m 'first'
>> touch d
>> rm c
>> ln -s d c
>> git difftool --dir-diff
>
> A slightly worse is that the upcoming Git will ship with a rewritten
> "difftool" that makes the above sequence segfault.

The culprit seems to be these lines in run_dir_diff():

		if (S_ISLNK(lmode)) {
			char *content = read_sha1_file(loid.hash, &type, &size);
			add_left_or_right(&symlinks2, src_path, content, 0);
			free(content);
		}

		if (S_ISLNK(rmode)) {
			char *content = read_sha1_file(roid.hash, &type, &size);
			add_left_or_right(&symlinks2, dst_path, content, 1);
			free(content);
		}

When viewing a working tree file, oid.hash could be 0{40} and
read_sha1_file() is not the right function to use to obtain the
contents.

Both of these two need to pay attention to 0{40}, I think, as the
user may be running "difftool -R --dir-diff" in which case the
working tree would appear in the left hand side instead.

I didn't follow the codepath for regular files closely, but the code
that follows the above excerpt does quite different things to lstate
and rstate, which makes me suspect that the code is not prepared to
see "-R"(everse) diff (and I further suspect that these issues were
inherited from the original scripted Porcelain).


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-02-24 19:51   ` Junio C Hamano
@ 2017-02-24 20:35     ` Jeff King
  2017-02-25 12:36       ` Johannes Schindelin
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2017-02-24 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Christophe Macabiau

On Fri, Feb 24, 2017 at 11:51:22AM -0800, Junio C Hamano wrote:

> > A slightly worse is that the upcoming Git will ship with a rewritten
> > "difftool" that makes the above sequence segfault.
> 
> The culprit seems to be these lines in run_dir_diff():
> 
> 		if (S_ISLNK(lmode)) {
> 			char *content = read_sha1_file(loid.hash, &type, &size);
> 			add_left_or_right(&symlinks2, src_path, content, 0);
> 			free(content);
> 		}
> 
> 		if (S_ISLNK(rmode)) {
> 			char *content = read_sha1_file(roid.hash, &type, &size);
> 			add_left_or_right(&symlinks2, dst_path, content, 1);
> 			free(content);
> 		}
> 
> When viewing a working tree file, oid.hash could be 0{40} and
> read_sha1_file() is not the right function to use to obtain the
> contents.
> 
> Both of these two need to pay attention to 0{40}, I think, as the
> user may be running "difftool -R --dir-diff" in which case the
> working tree would appear in the left hand side instead.

As a side note, I think even outside of 0{40}, this should be checking
the return value of read_sha1_file(). A corrupted repo should die(), not
segfault.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-02-24 20:35     ` Jeff King
@ 2017-02-25 12:36       ` Johannes Schindelin
  2017-03-07 18:16         ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2017-02-25 12:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Christophe Macabiau

Hi Peff & Junio,

On Fri, 24 Feb 2017, Jeff King wrote:

> On Fri, Feb 24, 2017 at 11:51:22AM -0800, Junio C Hamano wrote:
> 
> > > A slightly worse is that the upcoming Git will ship with a rewritten
> > > "difftool" that makes the above sequence segfault.
> > 
> > The culprit seems to be these lines in run_dir_diff():
> > 
> > 		if (S_ISLNK(lmode)) {
> > 			char *content = read_sha1_file(loid.hash, &type, &size);
> > 			add_left_or_right(&symlinks2, src_path, content, 0);
> > 			free(content);
> > 		}
> > 
> > 		if (S_ISLNK(rmode)) {
> > 			char *content = read_sha1_file(roid.hash, &type, &size);
> > 			add_left_or_right(&symlinks2, dst_path, content, 1);
> > 			free(content);
> > 		}
> > 
> > When viewing a working tree file, oid.hash could be 0{40} and
> > read_sha1_file() is not the right function to use to obtain the
> > contents.
> > 
> > Both of these two need to pay attention to 0{40}, I think, as the
> > user may be running "difftool -R --dir-diff" in which case the
> > working tree would appear in the left hand side instead.
> 
> As a side note, I think even outside of 0{40}, this should be checking
> the return value of read_sha1_file(). A corrupted repo should die(), not
> segfault.

I agree. I am on it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-02-25 12:36       ` Johannes Schindelin
@ 2017-03-07 18:16         ` Junio C Hamano
  2017-03-07 22:34           ` Johannes Schindelin
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-03-07 18:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, Christophe Macabiau

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > When viewing a working tree file, oid.hash could be 0{40} and
>> > read_sha1_file() is not the right function to use to obtain the
>> > contents.
>> > 
>> > Both of these two need to pay attention to 0{40}, I think, as the
>> > user may be running "difftool -R --dir-diff" in which case the
>> > working tree would appear in the left hand side instead.
>> 
>> As a side note, I think even outside of 0{40}, this should be checking
>> the return value of read_sha1_file(). A corrupted repo should die(), not
>> segfault.
>
> I agree. I am on it.

Friendly ping, if only to make sure that we can keep a piece of this
thread in the more "recent" pile.

If you have other topics you need to perfect, I think it is OK to
postpone the fix on this topic a bit longer, but I'd hate to ship
two releases with a known breakage without an attempt to fix it, so
if you are otherwise occupied, I may encourage others (including me)
to take a look at this.  The new "difftool" also has a reported
regression somebody else expressed willingness to work on, which is
sort of blocked by everybody else not knowing the timeline on this
one.  cf. <20170303212836.GB13790@arch-attack.localdomain>

A patch series would be very welcome, but "Please go ahead if
somebody else has time, and I'll help reviewing" would be also
good.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fatal error when diffing changed symlinks
  2017-03-07 18:16         ` Junio C Hamano
@ 2017-03-07 22:34           ` Johannes Schindelin
  2017-03-13 17:56             ` [PATCH] difftool: handle changing symlinks in dir-diff mode David Aguilar
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2017-03-07 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Christophe Macabiau

Hi Junio,

On Tue, 7 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> > When viewing a working tree file, oid.hash could be 0{40} and
> >> > read_sha1_file() is not the right function to use to obtain the
> >> > contents.
> >> > 
> >> > Both of these two need to pay attention to 0{40}, I think, as the
> >> > user may be running "difftool -R --dir-diff" in which case the
> >> > working tree would appear in the left hand side instead.
> >> 
> >> As a side note, I think even outside of 0{40}, this should be checking
> >> the return value of read_sha1_file(). A corrupted repo should die(), not
> >> segfault.
> >
> > I agree. I am on it.
> 
> Friendly ping, if only to make sure that we can keep a piece of this
> thread in the more "recent" pile.
> 
> If you have other topics you need to perfect, I think it is OK to
> postpone the fix on this topic a bit longer, but I'd hate to ship
> two releases with a known breakage without an attempt to fix it, so
> if you are otherwise occupied, I may encourage others (including me)
> to take a look at this.  The new "difftool" also has a reported
> regression somebody else expressed willingness to work on, which is
> sort of blocked by everybody else not knowing the timeline on this
> one.  cf. <20170303212836.GB13790@arch-attack.localdomain>
> 
> A patch series would be very welcome, but "Please go ahead if
> somebody else has time, and I'll help reviewing" would be also
> good.

Unfortunately, the obvious fix was not enough. My current work in progress
is in the `difftool-null-sha1` branch on https://gihub.com/dscho/git. I
still have a few other things to tend to, but hope to come back to the
difftool before the end of the week.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-07 22:34           ` Johannes Schindelin
@ 2017-03-13 17:56             ` David Aguilar
  2017-03-13 18:32               ` Junio C Hamano
                                 ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David Aguilar @ 2017-03-13 17:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Christophe Macabiau, Git ML

Detect the null object ID for symlinks in dir-diff so that difftool can
prepare temporary files that matches how git handles symlinks.

Previously, a null object ID would crash difftool.  We now detect null
object IDs and write the symlink's content into the temporary symlink
stand-in file.

Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: David Aguilar <davvid@gmail.com>
---
 builtin/difftool.c  | 36 +++++++++++++++++++++++++++++++++---
 t/t7800-difftool.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index d13350ce83..6c20e20b45 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -254,6 +254,31 @@ static int ensure_leading_directories(char *path)
 	}
 }
 
+static int create_symlink_file(struct cache_entry* ce, struct checkout* state)
+{
+	/*
+	 * Dereference a worktree symlink and writes its contents
+	 * into the checkout state's path.
+	 */
+	struct strbuf path = STRBUF_INIT;
+	struct strbuf link = STRBUF_INIT;
+
+	int ok = 0;
+
+	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
+		strbuf_add(&path, state->base_dir, state->base_dir_len);
+		strbuf_add(&path, ce->name, ce_namelen(ce));
+
+		write_file_buf(path.buf, link.buf, link.len);
+		ok = 1;
+	}
+
+	strbuf_release(&path);
+	strbuf_release(&link);
+
+	return ok;
+}
+
 static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 			int argc, const char **argv)
 {
@@ -376,13 +401,13 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 			continue;
 		}
 
-		if (S_ISLNK(lmode)) {
+		if (S_ISLNK(lmode) && !is_null_oid(&loid)) {
 			char *content = read_sha1_file(loid.hash, &type, &size);
 			add_left_or_right(&symlinks2, src_path, content, 0);
 			free(content);
 		}
 
-		if (S_ISLNK(rmode)) {
+		if (S_ISLNK(rmode) && !is_null_oid(&roid)) {
 			char *content = read_sha1_file(roid.hash, &type, &size);
 			add_left_or_right(&symlinks2, dst_path, content, 1);
 			free(content);
@@ -414,7 +439,12 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 				oidcpy(&ce->oid, &roid);
 				strcpy(ce->name, dst_path);
 				ce->ce_namelen = dst_path_len;
-				if (checkout_entry(ce, &rstate, NULL))
+
+				if (S_ISLNK(rmode) && is_null_oid(&roid)) {
+					if (!create_symlink_file(ce, &rstate))
+						return error("unable to create symlink file %s",
+							     dst_path);
+				} else if (checkout_entry(ce, &rstate, NULL))
 					return error("could not write '%s'",
 						     dst_path);
 			} else if (!is_null_oid(&roid)) {
diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index e1ec292718..64f8e451b5 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -623,4 +623,44 @@ test_expect_success SYMLINKS 'difftool --dir-diff symlinked directories' '
 	)
 '
 
+test_expect_success SYMLINKS 'difftool --dir-diff' '
+	touch b &&
+	ln -s b c &&
+	git add . &&
+	test_tick &&
+	git commit -m initial &&
+	touch d &&
+	rm c &&
+	ln -s d c &&
+
+	git difftool --dir-diff --extcmd ls >output &&
+	grep -v ^/ output >actual &&
+	cat >expect <<-EOF &&
+		b
+		c
+		dirlinks
+		output
+		submod
+
+		c
+		dirlinks
+		output
+		submod
+	EOF
+	test_cmp expect actual &&
+
+	# The left side contains symlink "c" that points to "b"
+	test_config difftool.cat.cmd "cat \$LOCAL/c" &&
+	git difftool --dir-diff --tool cat >actual &&
+	echo b >expect &&
+	test_cmp expect actual &&
+
+	# The right side contains symlink "c" that points to "d",
+	# which mimics the state of the worktree.
+	test_config difftool.cat.cmd "cat \$REMOTE/c" &&
+	git difftool --dir-diff --tool cat >actual &&
+	echo -n d >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.266.g44c9eec009


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 17:56             ` [PATCH] difftool: handle changing symlinks in dir-diff mode David Aguilar
@ 2017-03-13 18:32               ` Junio C Hamano
  2017-03-13 21:04                 ` Johannes Schindelin
                                   ` (2 more replies)
  2017-03-13 19:02               ` Junio C Hamano
  2017-03-13 21:36               ` Johannes Schindelin
  2 siblings, 3 replies; 17+ messages in thread
From: Junio C Hamano @ 2017-03-13 18:32 UTC (permalink / raw)
  To: David Aguilar; +Cc: Johannes Schindelin, Christophe Macabiau, Git ML

David Aguilar <davvid@gmail.com> writes:

> Detect the null object ID for symlinks in dir-diff so that difftool can
> prepare temporary files that matches how git handles symlinks.
>
> Previously, a null object ID would crash difftool.  We now detect null
> object IDs and write the symlink's content into the temporary symlink
> stand-in file.
>
> Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: David Aguilar <davvid@gmail.com>
> ---

I would have appreciated (and I suspect other reviewers would, too)
a bit of back-story wrt how "Original-patch-by" resulted in this
patch after the three-dashes line.  It is perfectly fine if you two
coordinated privately; I mostly wanted to hear something like "Dscho
has been working on this and I asked him if it is OK to take over
his WIP to produce a quick-fix we can ship on the maint branch, here
is the result of that collaboration."  IOW, the person who is named
as the original author is fine to be named like so (I care only
because I do not think we saw the "original" here on the list).

> +static int create_symlink_file(struct cache_entry* ce, struct checkout* state)

Asterisk sticks to variable, not type.

> +{
> +	/*
> +	 * Dereference a worktree symlink and writes its contents

s/writes/write/

> +	 * into the checkout state's path.
> +	 */
> +	struct strbuf path = STRBUF_INIT;
> +	struct strbuf link = STRBUF_INIT;
> +
> +	int ok = 0;
> +
> +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
> +		strbuf_add(&path, state->base_dir, state->base_dir_len);
> +		strbuf_add(&path, ce->name, ce_namelen(ce));
> +
> +		write_file_buf(path.buf, link.buf, link.len);

This does "write content into symlink stand-in file", but why?  A
symbolic link that is not changed on the right-hand side of the
comparison (i.e. S_ISLNK(rmode) && !is_null_oid(&roid)) would go to
checkout_entry() which

 - creates a symbolic link, on a filesystem that supports symlink; or
 - writes the stand-in file on a filesystem that does not.

which is the right thing.  It seems that the other side of "if (!use_wt_file())"
if/elseif/... cascade also does the right thing manually.

Shouldn't this helper also do the same (i.e. create symlink and fall
back to stand-in)?

Also, I am not sure if strbuf_readlink() can unconditionally used
here.  On a filesystem without symbolic link, the working tree
entity that corresponds to the ce that represents a symlink is a
stand-in regular file, so shouldn't we be opening it as a regular
file and reading its contents in that case?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 17:56             ` [PATCH] difftool: handle changing symlinks in dir-diff mode David Aguilar
  2017-03-13 18:32               ` Junio C Hamano
@ 2017-03-13 19:02               ` Junio C Hamano
  2017-03-13 21:36               ` Johannes Schindelin
  2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2017-03-13 19:02 UTC (permalink / raw)
  To: David Aguilar; +Cc: Johannes Schindelin, Christophe Macabiau, Git ML

David Aguilar <davvid@gmail.com> writes:

> -		if (S_ISLNK(lmode)) {
> +		if (S_ISLNK(lmode) && !is_null_oid(&loid)) {
>  			char *content = read_sha1_file(loid.hash, &type, &size);
>  			add_left_or_right(&symlinks2, src_path, content, 0);
>  			free(content);
>  		}
>  
> -		if (S_ISLNK(rmode)) {
> +		if (S_ISLNK(rmode) && !is_null_oid(&roid)) {
>  			char *content = read_sha1_file(roid.hash, &type, &size);
>  			add_left_or_right(&symlinks2, dst_path, content, 1);
>  			free(content);

On this part I didn't comment in my previous message, but what is
the implication of omitting add-left-or-right and not registering
this symbolic link modified in the working tree to the symlinks2
table?

I am wondering if these should be more like

	if (S_ISLNK(lmode) {
		char *content = get_symlink(src_path, &loid);
		add_left_or_right(&symlinks2, src_path, content, 0);
                free(content);
	}                

with get_symlink() helper that does

 - if the object name is not 0{40}, read from the object store

 - if the object name is 0{40}, that means we need to read the real
   contents from the working tree file, so do the "readlink(2) if
   symbolic link is supported, otherwise open/read/close the stub
   file sitting there" thing.

Similary to the right hand side tree.

Discarding "content" after reading feels wasteful, as that is the
information we would be using when populating the rstate and lstaten
working trees later in the loop, but that would probably need a
larger surgery to the code to optimize, I would imagine.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 18:32               ` Junio C Hamano
@ 2017-03-13 21:04                 ` Johannes Schindelin
  2017-03-13 21:27                 ` Johannes Schindelin
  2017-03-14  4:13                 ` Junio C Hamano
  2 siblings, 0 replies; 17+ messages in thread
From: Johannes Schindelin @ 2017-03-13 21:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Aguilar, Christophe Macabiau, Git ML

Hi Junio,

On Mon, 13 Mar 2017, Junio C Hamano wrote:

> David Aguilar <davvid@gmail.com> writes:
> 
> > Detect the null object ID for symlinks in dir-diff so that difftool
> > can prepare temporary files that matches how git handles symlinks.
> >
> > Previously, a null object ID would crash difftool.  We now detect null
> > object IDs and write the symlink's content into the temporary symlink
> > stand-in file.
> >
> > Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > Signed-off-by: David Aguilar <davvid@gmail.com>
> > ---
> 
> I would have appreciated (and I suspect other reviewers would, too)
> a bit of back-story wrt how "Original-patch-by" resulted in this
> patch after the three-dashes line.  It is perfectly fine if you two
> coordinated privately; I mostly wanted to hear something like "Dscho
> has been working on this and I asked him if it is OK to take over
> his WIP to produce a quick-fix we can ship on the maint branch, here
> is the result of that collaboration."  IOW, the person who is named
> as the original author is fine to be named like so (I care only
> because I do not think we saw the "original" here on the list).

The story is more like: Johannes started working on this but got pulled
away by other tasks and sent out a link to the initial patch, along with a
note that he hopes to be able to get back to working on that patch before
long (but of course he did not get the chance):

http://public-inbox.org/git/alpine.DEB.2.20.1703072332370.3767@virtualbox/

There was no private exchange. I am happy that David picked up the
project.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 18:32               ` Junio C Hamano
  2017-03-13 21:04                 ` Johannes Schindelin
@ 2017-03-13 21:27                 ` Johannes Schindelin
  2017-03-13 21:33                   ` Junio C Hamano
  2017-03-14  4:13                 ` Junio C Hamano
  2 siblings, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2017-03-13 21:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Aguilar, Christophe Macabiau, Git ML

Hi Junio,

On Mon, 13 Mar 2017, Junio C Hamano wrote:

> David Aguilar <davvid@gmail.com> writes:
> 
> > +static int create_symlink_file(struct cache_entry* ce, struct checkout* state)
> 
> Asterisk sticks to variable, not type.

If only we had tools to format the code so that authors as well as
reviewers could concentrate on essential parts of the patches :-)

> > +	 * into the checkout state's path.
> > +	 */
> > +	struct strbuf path = STRBUF_INIT;
> > +	struct strbuf link = STRBUF_INIT;
> > +
> > +	int ok = 0;
> > +
> > +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
> > +		strbuf_add(&path, state->base_dir, state->base_dir_len);
> > +		strbuf_add(&path, ce->name, ce_namelen(ce));
> > +
> > +		write_file_buf(path.buf, link.buf, link.len);
> 
> This does "write content into symlink stand-in file", but why?

From the commit message:

	> Detect the null object ID for symlinks in dir-diff so that
	> difftool can prepare temporary files that matches how git
	> handles symlinks.

The obvious connection: when core.symlinks = false, Git already falls back
to writing plain files with the link target as contents. This function
does the same, for the same motivation: it is the best we can do in this
case.

> Also, I am not sure if strbuf_readlink() can unconditionally used
> here.  On a filesystem without symbolic link, the working tree
> entity that corresponds to the ce that represents a symlink is a
> stand-in regular file, so shouldn't we be opening it as a regular
> file and reading its contents in that case?

I think you are right, we cannot simply call strbuf_readlink(), we would
have to check the core_symlinks variable to maybe read the file contents
instead.

But then, it may not be appropriate to read the worktree to begin with...
see my reply to the patch that I will send out in a couple of minutes.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 21:27                 ` Johannes Schindelin
@ 2017-03-13 21:33                   ` Junio C Hamano
  2017-03-14  2:20                     ` David Aguilar
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-03-13 21:33 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Aguilar, Christophe Macabiau, Git ML

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
>> > +		strbuf_add(&path, state->base_dir, state->base_dir_len);
>> > +		strbuf_add(&path, ce->name, ce_namelen(ce));
>> > +
>> > +		write_file_buf(path.buf, link.buf, link.len);
>> 
>> This does "write content into symlink stand-in file", but why?
>
> From the commit message:
>
> 	> Detect the null object ID for symlinks in dir-diff so that
> 	> difftool can prepare temporary files that matches how git
> 	> handles symlinks.

Yes I read what the proposed log message said, and noticed that
symbolic link is _always_ written as a regular file.

I was questioning why.  I know Git falls back to such behaviour when
the filesystem does not support symbolic links.  "Why do so
unconditionally, even when the filesystem does?" was the question.

> The obvious connection: when core.symlinks = false, Git already falls back
> to writing plain files with the link target as contents. This function
> does the same, for the same motivation: it is the best we can do in this
> case.

And that "obvious connection" does not answer the question.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 17:56             ` [PATCH] difftool: handle changing symlinks in dir-diff mode David Aguilar
  2017-03-13 18:32               ` Junio C Hamano
  2017-03-13 19:02               ` Junio C Hamano
@ 2017-03-13 21:36               ` Johannes Schindelin
  2 siblings, 0 replies; 17+ messages in thread
From: Johannes Schindelin @ 2017-03-13 21:36 UTC (permalink / raw)
  To: David Aguilar; +Cc: Junio C Hamano, Christophe Macabiau, Git ML

Hi David,

Thank you very much for picking this up!

On Mon, 13 Mar 2017, David Aguilar wrote:

> Detect the null object ID for symlinks in dir-diff so that difftool can
> prepare temporary files that matches how git handles symlinks.

Maybe a description is needed how the OID can be null in that case. I have
to admit that I failed to wrap my head around this so far.

> Previously, a null object ID would crash difftool.  We now detect null
> object IDs and write the symlink's content into the temporary symlink
> stand-in file.
> 
> Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: David Aguilar <davvid@gmail.com>
> ---
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index d13350ce83..6c20e20b45 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -254,6 +254,31 @@ static int ensure_leading_directories(char *path)
>  	}
>  }
>  
> +static int create_symlink_file(struct cache_entry* ce, struct checkout* state)
> +{
> +	/*
> +	 * Dereference a worktree symlink and writes its contents
> +	 * into the checkout state's path.
> +	 */
> +	struct strbuf path = STRBUF_INIT;
> +	struct strbuf link = STRBUF_INIT;
> +
> +	int ok = 0;

It would appear that the convention in Git's C code is for functions to
return 0 upon success and -1 upon error, and to use `int ret` for that
purpose.

> +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {

Looking at the calling site, I would have expected the code to read the
contents as per ce->hash... After all, we are calling this in the
!use_wt_file() case.

But that is exactly that null hash, isn't it?

> @@ -414,7 +439,12 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,

The lines before this hunk read:

>                       if (!use_wt_file(workdir, dst_path, &roid)) {
>                               ce->ce_mode = rmode;

... and then follow these lines:

>  				oidcpy(&ce->oid, &roid);
>  				strcpy(ce->name, dst_path);
>  				ce->ce_namelen = dst_path_len;
> -				if (checkout_entry(ce, &rstate, NULL))
> +
> +				if (S_ISLNK(rmode) && is_null_oid(&roid)) {
> +					if (!create_symlink_file(ce, &rstate))
> +						return error("unable to create symlink file %s",
> +							     dst_path);
> +				} else if (checkout_entry(ce, &rstate, NULL))
>  					return error("could not write '%s'",
>  						     dst_path);
>  			} else if (!is_null_oid(&roid)) {

Given that we explicitly ask use_wt_file() whether we can use the
worktree's file, and we get the answer "no" before we enter the modified
code block, I would really expect us *not* to want to read the link from
disk at all.

Further, reading the code of use_wt_file(), there seems to be another case
where roid is left alone: when the file could not be lstat()ed. So I
wonder whether the create_symlink_file() is the correct solution, or
whether we could somehow fill roid correctly instead, and keep the
checkout_entry() call?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 21:33                   ` Junio C Hamano
@ 2017-03-14  2:20                     ` David Aguilar
  2017-03-14  5:52                       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: David Aguilar @ 2017-03-14  2:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Christophe Macabiau, Git ML

On Mon, Mar 13, 2017 at 02:33:09PM -0700, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> > +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
> >> > +		strbuf_add(&path, state->base_dir, state->base_dir_len);
> >> > +		strbuf_add(&path, ce->name, ce_namelen(ce));
> >> > +
> >> > +		write_file_buf(path.buf, link.buf, link.len);
> >> 
> >> This does "write content into symlink stand-in file", but why?
> >
> > From the commit message:
> >
> > 	> Detect the null object ID for symlinks in dir-diff so that
> > 	> difftool can prepare temporary files that matches how git
> > 	> handles symlinks.
> 
> Yes I read what the proposed log message said, and noticed that
> symbolic link is _always_ written as a regular file.
> 
> I was questioning why.  I know Git falls back to such behaviour when
> the filesystem does not support symbolic links.  "Why do so
> unconditionally, even when the filesystem does?" was the question.
> 
> > The obvious connection: when core.symlinks = false, Git already falls back
> > to writing plain files with the link target as contents. This function
> > does the same, for the same motivation: it is the best we can do in this
> > case.
> 
> And that "obvious connection" does not answer the question.

Thanks for the thorough review.

I'll try to answer questions from the various emails in just this
one spot in case it helps.

Dscho wrote:
> Given that we explicitly ask use_wt_file() whether we can use the
> worktree's file, and we get the answer "no" before we enter the modified
> code block, I would really expect us *not* to want to read the link from
> disk at all.

That probably deserves a comment on its own.

The use_wt_file() function really answers the question --
"does the worktree contain content that does is unknown to
 Git, and thus we should symlink to the worktree?"

Since these are symlinks, and symlinks are already used to
point back to the worktree (so that difftools will write
back to the worktree in case the user edited in-tool)
then the meaning of use_wt_file() in this context
can be misleading.

What we're trying to do is handle the case where Git knows it's dealing
with an entry that it wants to checkout into a temporary area
but it has no way to do so.  These entries always have the
0{40} null SHA1 because that is what git diff-files reports
for content that exists in the worktree only.

Dscho wrote:
> I think you are right, we cannot simply call strbuf_readlink(), we would
> have to check the core_symlinks variable to maybe read the file contents
> instead.
>
> But then, it may not be appropriate to read the worktree to begin
> with...
> see my reply to the patch that I will send out in a couple of minutes.

In this case, where the null SHA1 indicates that it is unknown content,
then I believe we must read from the worktree to simulate what Git
would have checked out.  Checking for core.symlinks should probably be
done before calling strbuf_readlink, though.

Junio wrote:
> > > Detect the null object ID for symlinks in dir-diff so that
> > > difftool can prepare temporary files that matches how git
> > > handles symlinks.
>
> Yes I read what the proposed log message said, and noticed that
> symbolic link is _always_ written as a regular file.
>
> I was questioning why.  I know Git falls back to such behaviour when
> the filesystem does not support symbolic links.  "Why do so
> unconditionally, even when the filesystem does?" was the question.

I have to re-read the code to see where this is special-cased, but
it seems that symlinks are always written as raw files by the
dir-diff code for the purposes of full-tree diffing.

I think the "why" is tied up in the implementation details of
the symlink-back-to-the-worktree-to-allow-editing feature.
By special-casing in-tree symlinks and writing them as raw
files we can hijack on-disk symlinks.  We use on-disk symlinks to
link back to the worktree so that external diff tools can write
to the worktree through the symlink.

Junio wrote:
> On this part I didn't comment in my previous message, but what is
> the implication of omitting add-left-or-right and not registering
> this symbolic link modified in the working tree to the symlinks2
> table?
>
> I am wondering if these should be more like
>
>         if (S_ISLNK(lmode) {
>                 char *content = get_symlink(src_path, &loid);
>                 add_left_or_right(&symlinks2, src_path, content, 0);
>                 free(content);
>         }
>
> with get_symlink() helper that does
>
>  - if the object name is not 0{40}, read from the object store
>
>  - if the object name is 0{40}, that means we need to read the real
>    contents from the working tree file, so do the "readlink(2) if
>    symbolic link is supported, otherwise open/read/close the stub
>    file sitting there" thing.
>
> Similary to the right hand side tree.

I'll take a look at trying this out.

Reading the code again, the point of add_left_or_right()
is to populate the worktree (done later in the loop) with
the stuff we read from Git.  Thus, if we changed just this
section to call get_symlink() then we should not even try
to checkout any symlink entries in !use_wt_file(...)
block where checkout_entry() / create_symlink_file() are called.

Since the symlinks2 hashmap already populates the worktree
then that code should instead simply skip symlinks.


Later, once we get to the part where we copy stuff back
into the worktree, it should be noted that we always
skip over symlinks.  We simply do not handle them,
never have, and I don't think we really should.

The use case that we're trying to handle is a common
use case where the user is using dir-diff and uses the
difftool to edit a file with content that exists in the
worktree only.  For that use case, a symlink to the worktree
is created in the temp area, and Git does not need to do
anything special.

I do not think the use case of a user editing a symlink
stand-in file, and then having Git update the worktree
with the updated content, is common or something we
want to support.  I'd prefer to keep the use case simple
since the code is already complicated enough.

I'll take a stab at adding a get_symlink() helper, adjust
the code so that add_left_or_right() is populated, and
special-case the checkout_entry() code path to simply skip
over null SHA1s.

I'll also address the review notes and try to add more
comments to describe what exactly this code does and why
it does it.

Do the tests make sense?

One minor thing I noticed is that I had to use "echo -n"
for the stuff coming out of strbuf_readlink(), and
plain "echo" for entries that come in via read_sha1_file()
content passed to add_left_or_right().

That suggests that maybe I should append a newline to the
output from strbuf_readlink() so that it matches Git.
Does that sound right?  Does Git store symlink entries
with a terminating newline?

Please let me know if I'm missing something.

cheers,
-- 
David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-13 18:32               ` Junio C Hamano
  2017-03-13 21:04                 ` Johannes Schindelin
  2017-03-13 21:27                 ` Johannes Schindelin
@ 2017-03-14  4:13                 ` Junio C Hamano
  2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2017-03-14  4:13 UTC (permalink / raw)
  To: David Aguilar; +Cc: Johannes Schindelin, Christophe Macabiau, Git ML

Junio C Hamano <gitster@pobox.com> writes:

>> +	struct strbuf path = STRBUF_INIT;
>> +	struct strbuf link = STRBUF_INIT;
>> +
>> +	int ok = 0;
>> +
>> +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
>> +		strbuf_add(&path, state->base_dir, state->base_dir_len);
>> +		strbuf_add(&path, ce->name, ce_namelen(ce));
>> +
>> +		write_file_buf(path.buf, link.buf, link.len);
>
> This does "write content into symlink stand-in file", but why?  A
> symbolic link that is not changed on the right-hand side of the
> comparison (i.e. S_ISLNK(rmode) && !is_null_oid(&roid)) would go to
> checkout_entry() which
>
>  - creates a symbolic link, on a filesystem that supports symlink; or
>  - writes the stand-in file on a filesystem that does not.
>
> which is the right thing.  It seems that the other side of "if (!use_wt_file())"
> if/elseif/... cascade also does the right thing manually.
>
> Shouldn't this helper also do the same (i.e. create symlink and fall
> back to stand-in)?
>
> Also, I am not sure if strbuf_readlink() can unconditionally used
> here.  On a filesystem without symbolic link, the working tree
> entity that corresponds to the ce that represents a symlink is a
> stand-in regular file, so shouldn't we be opening it as a regular
> file and reading its contents in that case?

I _think_ I was mistaken (please correct me again if my reasoning
below is wrong) and unconditional writing of a plain regular file is
what this codepath wants to do, because we are preparing two temporary
directories, to be fed to a Git-unaware tool that knows how to show
a diff of two directories (e.g. "diff -r A B").  Because the tool is
Git-unaware, if a symbolic link appears in either of these temporary
directories, it will try to dereference and show the difference of
the target of the symbolic link, which is not what we want, as the
goal of the dir-diff mode is to produce an output that is logically
equivalent to what "git diff" produces---most importantly, we want
to get textual comparison of the result of the readlink(2).  And
write_file_buf() used here is exactly that---we write the contents
of symlink to a regular file to force the external tool to compare
the readlink(2) result as if it is a text.  Even on a filesystem
that is capable of doing a symbolic link.

The strbuf_readlink() that read the contents of symlink, however,
is wrong on a filesystem that is not capable of a symbolic link; if
core.symlinks is false, we do need to read them as a regular file.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] difftool: handle changing symlinks in dir-diff mode
  2017-03-14  2:20                     ` David Aguilar
@ 2017-03-14  5:52                       ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2017-03-14  5:52 UTC (permalink / raw)
  To: David Aguilar; +Cc: Johannes Schindelin, Christophe Macabiau, Git ML

David Aguilar <davvid@gmail.com> writes:

> Reading the code again, the point of add_left_or_right()
> is to populate the worktree (done later in the loop) with
> the stuff we read from Git.  Thus, if we changed just this
> section to call get_symlink() then we should not even try
> to checkout any symlink entries in !use_wt_file(...)
> block where checkout_entry() / create_symlink_file() are called.
> 
> Since the symlinks2 hashmap already populates the worktree
> then that code should instead simply skip symlinks.

OK, that would simplify things, certainly.

> I'll take a stab at adding a get_symlink() helper, adjust
> the code so that add_left_or_right() is populated, and
> special-case the checkout_entry() code path to simply skip
> over null SHA1s.

Did you mean s/null SHA1s/S_ISLNK()s/?

> One minor thing I noticed is that I had to use "echo -n"
> for the stuff coming out of strbuf_readlink(), and
> plain "echo" for entries that come in via read_sha1_file()
> content passed to add_left_or_right().
>
> That suggests that maybe I should append a newline to the
> output from strbuf_readlink() so that it matches Git.
> Does that sound right?  Does Git store symlink entries
> with a terminating newline?

Do not append a newline.  Unless the pathname of the target you are
symlinking to ends with LF, readlink() won't end with LF, and the
stand-in file shouldn't, either.

By the way, avoid "echo -n"; use "printf '%s'" instead.

Thanks.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-03-14  5:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24 11:47 fatal error when diffing changed symlinks Christophe Macabiau
2017-02-24 17:53 ` Junio C Hamano
2017-02-24 19:51   ` Junio C Hamano
2017-02-24 20:35     ` Jeff King
2017-02-25 12:36       ` Johannes Schindelin
2017-03-07 18:16         ` Junio C Hamano
2017-03-07 22:34           ` Johannes Schindelin
2017-03-13 17:56             ` [PATCH] difftool: handle changing symlinks in dir-diff mode David Aguilar
2017-03-13 18:32               ` Junio C Hamano
2017-03-13 21:04                 ` Johannes Schindelin
2017-03-13 21:27                 ` Johannes Schindelin
2017-03-13 21:33                   ` Junio C Hamano
2017-03-14  2:20                     ` David Aguilar
2017-03-14  5:52                       ` Junio C Hamano
2017-03-14  4:13                 ` Junio C Hamano
2017-03-13 19:02               ` Junio C Hamano
2017-03-13 21:36               ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.