All of lore.kernel.org
 help / color / mirror / Atom feed
* Behavior of git rm
@ 2013-04-03 14:50 jpinheiro
  2013-04-03 15:58 ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: jpinheiro @ 2013-04-03 14:50 UTC (permalink / raw)
  To: git

Hi all,

We are students from Universidade do Minho in Portugal, and we are using git
in project as a case study.
While experimenting with git we found an unexpected behavior with git rm.
Here is a trace of the unexpected behavior:

$ git init
$ mkdir D
$ echo "Hi" > D/F
$ git add D/F
$ rm -r D
$ echo "Hey" > D
$ git rm D/F
warning: 'D/F': Not a directory
rm 'D/F'
fatal: git rm: 'D/F': Not a directory


If the file D created with last echo did not exist or was named differently
then no error would occur as expected. For example:

$ git init
$ mkdir D
$ echo "Hi" > D/F
$ git add D/F
$ rm -r D
$ echo "Hey" > F
$ git rm D/F

This works as expected, and the only difference is the name of the file of
the last echo.
Is this the expected behavior of git rm?




--
View this message in context: http://git.661346.n2.nabble.com/Behavior-of-git-rm-tp7581485.html
Sent from the git mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Behavior of git rm
  2013-04-03 14:50 Behavior of git rm jpinheiro
@ 2013-04-03 15:58 ` Jeff King
  2013-04-03 17:35   ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-03 15:58 UTC (permalink / raw)
  To: jpinheiro; +Cc: git

On Wed, Apr 03, 2013 at 07:50:24AM -0700, jpinheiro wrote:

> While experimenting with git we found an unexpected behavior with git rm.
> Here is a trace of the unexpected behavior:
> 
> $ git init
> $ mkdir D
> $ echo "Hi" > D/F
> $ git add D/F
> $ rm -r D
> $ echo "Hey" > D
> $ git rm D/F
> warning: 'D/F': Not a directory
> rm 'D/F'
> fatal: git rm: 'D/F': Not a directory

We drop the D/F entry from the index, but then fail to actually remove
it from the filesystem, because it has already been replaced. It is
impossible to tell from this toy example what the true intent was, but
in such a situation, there is a reasonable chance that the user should
have invoked "rm --cached" in the first place.

That being said, we do try to handle files which have already gone
missing; when unlink() fails, we do not consider it an error if we got
ENOENT. We could perhaps add ENOTDIR to that list, as it also indicates
that the file is gone (it just happens that one of its prefix
directories was replaced with something else).

The opposite case is also interesting:

  $ git init
  $ echo 1 >D
  $ git add D
  $ rm D
  $ mkdir D
  $ echo 2 >D/F
  $ git rm D
  rm 'D'
  fatal: git rm: 'D': Is a directory

We expect to see 'D' as a file, but it is now a directory. We _could_
recursively remove the directory, but that has the potential to delete
files that the user does not expect.

So in both cases, "git rm" could certainly detect the situation and
proceed with the destructive operation. But when there is such a
conflict between what's in the working tree and what's in the index, I
think we may be better off erring on the conservative side and bailing,
and letting the user reconcile the differences themselves (using either
"git add" or "git rm --cached" to update the index, or deciding how to
handle the working tree contents themselves with regular "rm").

Of the two situations, I think the first one is less likely to be
destructive (noticing that a file is already gone via ENOTDIR), as we
are only proceeding with the index deletion, and we end up not touching
the filesystem at all. That patch would look something like:

diff --git a/builtin/rm.c b/builtin/rm.c
index dabfcf6..7b91d52 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
 		ce = active_cache[pos];
 
 		if (lstat(ce->name, &st) < 0) {
-			if (errno != ENOENT)
+			if (errno != ENOENT && errno != ENOTDIR)
 				warning("'%s': %s", ce->name, strerror(errno));
 			/* It already vanished from the working tree */
 			continue;
diff --git a/dir.c b/dir.c
index 57394e4..f9e7355 100644
--- a/dir.c
+++ b/dir.c
@@ -1603,7 +1603,7 @@ int remove_path(const char *name)
 {
 	char *slash;
 
-	if (unlink(name) && errno != ENOENT)
+	if (unlink(name) && errno != ENOENT && errno != ENOTDIR)
 		return -1;
 
 	slash = strrchr(name, '/');

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Behavior of git rm
  2013-04-03 15:58 ` Jeff King
@ 2013-04-03 17:35   ` Junio C Hamano
  2013-04-03 20:36     ` Jeff King
  2013-04-04 19:02     ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2013-04-03 17:35 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

> Of the two situations, I think the first one is less likely to be
> destructive (noticing that a file is already gone via ENOTDIR), as we
> are only proceeding with the index deletion, and we end up not touching
> the filesystem at all.

Nice to see sound reasoning.

>
> diff --git a/builtin/rm.c b/builtin/rm.c
> index dabfcf6..7b91d52 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
>  		ce = active_cache[pos];
>  
>  		if (lstat(ce->name, &st) < 0) {
> -			if (errno != ENOENT)
> +			if (errno != ENOENT && errno != ENOTDIR)

OK.  We may be running lstat() on D/F but there may be D that is not
a directory.  If it is a file, we get ENOTDIR.

By the way, if D is a dangling symlink, we get ENOENT; in such a
case, we report "rm 'D/F'" on the output and remove the index entry.

	$ rm -f .git/index && rm -fr D E
	$ mkdir D && >D/F && git add D && rm -fr D
        $ ln -s erewhon D && git rm D/F && git ls-files
        rm 'D/F'

Also if D is a symlink that point at a directory E, "git rm" does
something interesting.

(1) Perhaps we want a complaint in this case.

	$ rm -f .git/index && rm -fr D E
	$ mkdir D && >D/F && git add D && rm -fr D
	$ mkdir E && ln -s E D && git rm D/F

(2) Perhaps we want to make sure D/F is not beyond a symlink in this
    case.

	$ rm -f .git/index && rm -fr D E
	$ mkdir D && >D/F && git add D && rm -fr D
	$ mkdir E && ln -s E D && date >E/F && git rm D/F


	$ git rm -f D/F

> diff --git a/dir.c b/dir.c
> index 57394e4..f9e7355 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1603,7 +1603,7 @@ int remove_path(const char *name)
>  {
>  	char *slash;
>  
> -	if (unlink(name) && errno != ENOENT)
> +	if (unlink(name) && errno != ENOENT && errno != ENOTDIR)
>  		return -1;

Ditto.

>  
>  	slash = strrchr(name, '/');

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Behavior of git rm
  2013-04-03 17:35   ` Junio C Hamano
@ 2013-04-03 20:36     ` Jeff King
  2013-04-04 19:02     ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2013-04-03 20:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote:

> > diff --git a/builtin/rm.c b/builtin/rm.c
> > index dabfcf6..7b91d52 100644
> > --- a/builtin/rm.c
> > +++ b/builtin/rm.c
> > @@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
> >  		ce = active_cache[pos];
> >  
> >  		if (lstat(ce->name, &st) < 0) {
> > -			if (errno != ENOENT)
> > +			if (errno != ENOENT && errno != ENOTDIR)
> 
> OK.  We may be running lstat() on D/F but there may be D that is not
> a directory.  If it is a file, we get ENOTDIR.
> 
> By the way, if D is a dangling symlink, we get ENOENT; in such a
> case, we report "rm 'D/F'" on the output and remove the index entry.
>
> 	$ rm -f .git/index && rm -fr D E
> 	$ mkdir D && >D/F && git add D && rm -fr D
>         $ ln -s erewhon D && git rm D/F && git ls-files
>         rm 'D/F'

That seems sane to me, and makes me feel like handling ENOTDIR here is
the right direction.  What that conditional is trying to say is "if it
is because the file is not there...", and so far we know of three
conditions where it is not there:

  1. There is no entry at that path.

  2. There is a non-directory in the prefix of that path.

  3. There is a dangling symlink in the prefix of that path.

(1) and (3) we already handle via ENOENT. I think it is sane to handle
(2) the same as (3), but we do not do so currently.

> Also if D is a symlink that point at a directory E, "git rm" does
> something interesting.
> 
> (1) Perhaps we want a complaint in this case.
> 
> 	$ rm -f .git/index && rm -fr D E
> 	$ mkdir D && >D/F && git add D && rm -fr D
> 	$ mkdir E && ln -s E D && git rm D/F

I think that is OK without complaint; the user asked to get rid of D/F,
and it is indeed gone (as well as its index entry) after the call
finishes. And we did not even need to delete anything, so we cannot be
losing data. I am much more concerned about this case:

> (2) Perhaps we want to make sure D/F is not beyond a symlink in this
>     case.
> 
> 	$ rm -f .git/index && rm -fr D E
> 	$ mkdir D && >D/F && git add D && rm -fr D
> 	$ mkdir E && ln -s E D && date >E/F && git rm D/F

where the user is deleting something that may or may not be related to
the original D/F. On the other hand, I don't have that much sympathy;
"rm" would make the same deletion. But hmm...shouldn't we be doing an
up-to-date check? Indeed:

  $ git rm D/F
  error: 'D/F' has staged content different from both the file and the HEAD
  (use -f to force removal)
  $ git commit -m foo && git rm D/F
  $ git rm D/F
  error: 'D/F' has local modifications
  (use --cached to keep the file, or -f to force removal)

So I do not think we need any extra safety; the content-level checks
should be enough to make sure we are not losing anything.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Behavior of git rm
  2013-04-03 17:35   ` Junio C Hamano
  2013-04-03 20:36     ` Jeff King
@ 2013-04-04 19:02     ` Jeff King
  2013-04-04 19:03       ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King
                         ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Jeff King @ 2013-04-04 19:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Wed, Apr 03, 2013 at 10:35:52AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Of the two situations, I think the first one is less likely to be
> > destructive (noticing that a file is already gone via ENOTDIR), as we
> > are only proceeding with the index deletion, and we end up not touching
> > the filesystem at all.
> 
> Nice to see sound reasoning.

Here's a patch series which I think covers what we've discussed.

  [1/3]: rm: do not complain about d/f conflicts during deletion
  [2/3]: t3600: test behavior of reverse-d/f conflict
  [3/3]: t3600: test rm of path with changed leading symlinks

The first one is the code change, and the rest just documents the cases
we discussed.

The third one is a little subtle. For the most part is it just testing
the normal "changed content requires --force" behavior of rm. But I
think it is worth having because it also makes sure that after deleting
"d/f" when "d" is a symlink to "e", that we do not remove the new
directory "e" nor the symlink "d". I do not think this case was
explicitly planned for, but it does do the right thing now, and given
the subtlety, I'd rather somebody who changes it notice the breakage in
the test suite.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] rm: do not complain about d/f conflicts during deletion
  2013-04-04 19:02     ` Jeff King
@ 2013-04-04 19:03       ` Jeff King
  2013-04-04 19:03       ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King
  2013-04-04 19:06       ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King
  2 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2013-04-04 19:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

If we used to have an index entry "d/f", but "d" has been
replaced by a non-directory entry, the user may still want
to run "git rm" to delete the stale index entry. They could
use "git rm --cached" to just touch the index, but "git rm"
should also work: we explicitly try to handle the case that
the file has already been removed from the working tree.

However, because unlinking "d/f" in this case will not yield
ENOENT, but rather ENOTDIR, we do not notice that the file
is already gone. Instead, we report it as an error.

The simple solution is to treat ENOTDIR in this case exactly
like ENOENT; all we want to know is whether the file is
already gone, and if a leading path is no longer a
directory, then by definition the sub-path is gone.

Reported-by: jpinheiro <7jpinheiro@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rm.c  |  2 +-
 dir.c         |  2 +-
 t/t3600-rm.sh | 25 +++++++++++++++++++++++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index dabfcf6..7b91d52 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -110,7 +110,7 @@ static int check_local_mod(unsigned char *head, int index_only)
 		ce = active_cache[pos];
 
 		if (lstat(ce->name, &st) < 0) {
-			if (errno != ENOENT)
+			if (errno != ENOENT && errno != ENOTDIR)
 				warning("'%s': %s", ce->name, strerror(errno));
 			/* It already vanished from the working tree */
 			continue;
diff --git a/dir.c b/dir.c
index 57394e4..f9e7355 100644
--- a/dir.c
+++ b/dir.c
@@ -1603,7 +1603,7 @@ int remove_path(const char *name)
 {
 	char *slash;
 
-	if (unlink(name) && errno != ENOENT)
+	if (unlink(name) && errno != ENOENT && errno != ENOTDIR)
 		return -1;
 
 	slash = strrchr(name, '/');
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 37bf5f1..73772b2 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -622,4 +622,29 @@ test_expect_success 'rm of a populated nested submodule with a nested .git direc
 	rm -rf submod
 '
 
+test_expect_success 'rm of d/f when d has become a non-directory' '
+	rm -rf d &&
+	mkdir d &&
+	>d/f &&
+	git add d &&
+	rm -rf d &&
+	>d &&
+	git rm d/f &&
+	test_must_fail git rev-parse --verify :d/f &&
+	test_path_is_file d
+'
+
+test_expect_success SYMLINKS 'rm of d/f when d has become a dangling symlink' '
+	rm -rf d &&
+	mkdir d &&
+	>d/f &&
+	git add d &&
+	rm -rf d &&
+	ln -s nonexistent d &&
+	git rm d/f &&
+	test_must_fail git rev-parse --verify :d/f &&
+	test -h d &&
+	test_path_is_missing d
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] t3600: test behavior of reverse-d/f conflict
  2013-04-04 19:02     ` Jeff King
  2013-04-04 19:03       ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King
@ 2013-04-04 19:03       ` Jeff King
  2013-04-04 19:06       ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King
  2 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2013-04-04 19:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

The previous commit taught "rm" that it is safe to consider
"d/f" removed when "d" has become a non-directory. This
patch adds a test for the opposite: a file "d" that becomes
a directory.

In this case, "git rm" does need to complain, because we
should not be removing arbitrary content under "d". Git
already behaves correctly, but let's make sure that remains
the case by protecting the behavior with a test.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t3600-rm.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 73772b2..a2e1a03 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -647,4 +647,16 @@ test_expect_success SYMLINKS 'rm of d/f when d has become a dangling symlink' '
 	test_path_is_missing d
 '
 
+test_expect_success 'rm of file when it has become a directory' '
+	rm -rf d &&
+	>d &&
+	git add d &&
+	rm -f d &&
+	mkdir d &&
+	>d/f &&
+	test_must_fail git rm d &&
+	git rev-parse --verify :d &&
+	test_path_is_file d/f
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 19:02     ` Jeff King
  2013-04-04 19:03       ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King
  2013-04-04 19:03       ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King
@ 2013-04-04 19:06       ` Jeff King
  2013-04-04 19:42         ` Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-04 19:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

If we have a path "d/f" but replace "d" with a symlink to a
new directory "e", how should we handle "git rm d/f"?

It may seem at first like we need new protections to make
sure that we do not delete random content from "e/f".
However, we are already covered by git-rm's existing
protections: it is happy if the working tree file is either
already deleted, or if its content matches that of the index
and HEAD (and otherwise requires "-f").

Let's add some tests to make sure that these protections
remain in place when used across symlinks. We also want to
make sure that neither the symlink nor the pointed-to
directory is accidentally removed in an attempt to clean up
empty elements of the leading path.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t3600-rm.sh | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index a2e1a03..9eaec08 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -659,4 +659,47 @@ test_expect_success 'rm of file when it has become a directory' '
 	test_path_is_file d/f
 '
 
+test_expect_success 'set up commit with d/f' '
+	rm -rf d e &&
+	mkdir d &&
+	echo content >d/f &&
+	git add d &&
+	git commit -m d
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (file missing)' '
+	git reset --hard &&
+	rm -rf d e &&
+	mkdir e &&
+	ln -s e d &&
+	git rm d/f &&
+	test_must_fail git rev-parse --verify :d/f &&
+	test -h d &&
+	test_path_is_dir e
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' '
+	git reset --hard &&
+	rm -rf d e &&
+	mkdir e &&
+	echo content >e/f &&
+	ln -s e d &&
+	git rm d/f &&
+	test_must_fail git rev-parse --verify :d/f &&
+	test -h d &&
+	test_path_is_dir e
+'
+
+test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' '
+	git reset --hard &&
+	rm -rf d e &&
+	mkdir e &&
+	echo changed >e/f &&
+	ln -s e d &&
+	test_must_fail git rm d/f &&
+	git rev-parse --verify :d/f &&
+	test -h d &&
+	test_path_is_file e/f
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 19:06       ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King
@ 2013-04-04 19:42         ` Junio C Hamano
  2013-04-04 19:55           ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2013-04-04 19:42 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

> +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' '
> +	git reset --hard &&
> +	rm -rf d e &&
> +	mkdir e &&
> +	echo content >e/f &&
> +	ln -s e d &&
> +	git rm d/f &&
> +	test_must_fail git rev-parse --verify :d/f &&
> +	test -h d &&
> +	test_path_is_dir e
> +'

This does not check if e/f still exists in the working tree, and I
suspect "git rm d/f" removes it.

If you do this:

	rm -fr d e
        mkdir e
        >e/f
        ln -s e d
        git add d/f

we do complain that d/f is beyond a symlink (meaning that all you
can add is the symlink d that may happen to point at something).

Silent removal of e/f that is unrelated to the current project's
tracked contents feels very wrong, and at the same time it looks to
me that it is inconsistent with what we do when adding.

I need a bit more persuading to understand why it is not a bug, I
think.

> +test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' '
> +	git reset --hard &&
> +	rm -rf d e &&
> +	mkdir e &&
> +	echo changed >e/f &&
> +	ln -s e d &&
> +	test_must_fail git rm d/f &&
> +	git rev-parse --verify :d/f &&
> +	test -h d &&
> +	test_path_is_file e/f
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 19:42         ` Junio C Hamano
@ 2013-04-04 19:55           ` Jeff King
  2013-04-04 20:31             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-04 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Thu, Apr 04, 2013 at 12:42:54PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' '
> > +	git reset --hard &&
> > +	rm -rf d e &&
> > +	mkdir e &&
> > +	echo content >e/f &&
> > +	ln -s e d &&
> > +	git rm d/f &&
> > +	test_must_fail git rev-parse --verify :d/f &&
> > +	test -h d &&
> > +	test_path_is_dir e
> > +'
> 
> This does not check if e/f still exists in the working tree, and I
> suspect "git rm d/f" removes it.

I guess I should have been more clear in my test; I think it _should_ be
removed (and it is). You do not necessarily care that "d" is now the
symlink and not the actual path; it is safe to remove d/f even though it
is behind a symlink now, because it has the exact same content that it
had before (it is of course important that we still remove the actual
d/f index entry, but as far as the working tree goes, we only care that
it is safe to remove, and that we remove it).

IOW, I should have been more explicit like this:

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 9eaec08..3b51a63 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -687,7 +687,8 @@ test_expect_success SYMLINKS 'replace dir with symlink to dir (same content)' '
 	git rm d/f &&
 	test_must_fail git rev-parse --verify :d/f &&
 	test -h d &&
-	test_path_is_dir e
+	test_path_is_dir e &&
+	test_path_is_missing e/f
 '
 
 test_expect_success SYMLINKS 'replace dir with symlink to dir (new content)' '

> If you do this:
> 
> 	rm -fr d e
>         mkdir e
>         >e/f
>         ln -s e d
>         git add d/f
> 
> we do complain that d/f is beyond a symlink (meaning that all you
> can add is the symlink d that may happen to point at something).

Right, but that is because you are adding a bogus entry to the index; we
cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
But in the removal case, the index manipulation is perfectly reasonable.
You are deleting the existing "d/f" entry. The only confusion comes from
the fact that the working tree does not match that anymore.

> Silent removal of e/f that is unrelated to the current project's
> tracked contents feels very wrong, and at the same time it looks to
> me that it is inconsistent with what we do when adding.
> 
> I need a bit more persuading to understand why it is not a bug, I
> think.

But that's the point of the two content tests. It _isn't_ unrelated to
the current project's tracked contents; it's the exact same content at
the same path (albeit accessed via symlinks now). The likely case for
this is something like:

  mv dir somewhere/else
  ln -s somewhere/else/dir dir

I do not mind if you want to insert extra protection to not cross
symlink boundaries (which would obviously invalidate my test).  But I
don't think it is necessary because of the existing content-level
protections.  Adding extra protections would disallow "git rm dir/file" in
the above case, but I don't think it's that inconvenient; the user just
has to make the index aware of the typechange first via "git add".

-Peff

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 19:55           ` Jeff King
@ 2013-04-04 20:31             ` Junio C Hamano
  2013-04-04 21:03               ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2013-04-04 20:31 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

>> If you do this:
>> 
>> 	rm -fr d e
>>         mkdir e
>>         >e/f
>>         ln -s e d
>>         git add d/f
>> 
>> we do complain that d/f is beyond a symlink (meaning that all you
>> can add is the symlink d that may happen to point at something).
>
> Right, but that is because you are adding a bogus entry to the index; we
> cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
> But in the removal case, the index manipulation is perfectly reasonable.

I think you misread me.  I am not adding 'd' as a symlink at all.
IIRC, ancient versions of Git got this case wrong and added d/f to
the index, which we later fixed.

I have been hinting that we should do the same safety not to touch
(even compare the contents of) e/f, because the only reason we even
look at it is because it appears beyond a symbolic link 'd'.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 20:31             ` Junio C Hamano
@ 2013-04-04 21:03               ` Jeff King
  2013-04-04 23:12                 ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-04 21:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Thu, Apr 04, 2013 at 01:31:43PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> If you do this:
> >> 
> >> 	rm -fr d e
> >>         mkdir e
> >>         >e/f
> >>         ln -s e d
> >>         git add d/f
> >> 
> >> we do complain that d/f is beyond a symlink (meaning that all you
> >> can add is the symlink d that may happen to point at something).
> >
> > Right, but that is because you are adding a bogus entry to the index; we
> > cannot have both 'd' as a symlink and 'd/f' as a path in our git tree.
> > But in the removal case, the index manipulation is perfectly reasonable.
> 
> I think you misread me.  I am not adding 'd' as a symlink at all.
> IIRC, ancient versions of Git got this case wrong and added d/f to
> the index, which we later fixed.

I think I just spoke sloppily. What is bogus about "d/f" is not that "d"
is necessarily in the index right now, but that adding "d/f" implies
that "d" is a tree, which it clearly is not. Git maps filesystem
symlinks into the index and into its trees without dereferencing them.
So whether we have "d" in the index right now or not, "d/f" is wrong
conceptually.

But I do not think the "we are mis-representing the filesystem" problem
applies to this "rm" case. We are not adding anything bogus into the
index; on the contrary, we are deleting something that no longer matches
the filesystem representation (and is actually the same bogosity that we
avoid adding under the rule above).

I do agree that it violates git's general behavior with symlinks (i.e.,
that they are not dereferenced).

> I have been hinting that we should do the same safety not to touch
> (even compare the contents of) e/f, because the only reason we even
> look at it is because it appears beyond a symbolic link 'd'.

I can certainly see the safety argument that crossing a symlink at "d"
means the resulting "d/f" is not necessarily related to the original
"d/f" that is in the index. As I said, I do not mind having the extra
protection; my argument was only that the content-check already protects
us, so the extra protection is not necessary. And the implication is
that I do not feel like working on it. :) I do not mind at all if you
drop my third patch (and that is part of the reason I split it out from
patch 2, which I do think is a no-brainer), and I am happy to review
patches to do the symlink check if you want to work on it.

Having made the argument that the content-check is enough, though, I
think there is an interesting corner case where it might not be. I don't
mind "git rm d/f" deleting "e/f" inside the repository when "d" is a
symlink to "e". But what would happen with:

  rm -rf d
  ln -s /path/outside/repo d
  git rm d/f

Deleting across symlinks inside the repo can be brushed aside with "eh,
well, it is just another way to mention the same name in the
filesystem". But deleting anything outside of the repo seems actively
wrong.

And more on that "brushed aside". I think it is easy in the cases we
have been talking about, namely where "d/f" still exists in the index,
to think that "git rm d/f" is useful and the question is one of safety:
should we delete e/f if it is pointed to? But let us imagine that d/f is
_not_ in the index, but "d" is a symlink pointing to some/long/path".
The user wants to be lazy and say "git rm d/f", because typing
"some/long/path" is too much work. But what happens to the index? We
should probably not be removing "some/long/path".

Hmm. I think you have convinced me (or perhaps I have convinced myself)
that we should generally not be crossing symlink boundaries in
translating names between the filesystem and index. I still don't want
to work on it, though. :)

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 21:03               ` Jeff King
@ 2013-04-04 23:12                 ` Junio C Hamano
  2013-04-04 23:29                   ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2013-04-04 23:12 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

> Deleting across symlinks inside the repo can be brushed aside with "eh,
> well, it is just another way to mention the same name in the
> filesystem". But deleting anything outside of the repo seems actively
> wrong.

Yup, you finally caught up ;-) IIRC, such an outside repository
target was the case people realized that "git add" shouldn't see
across symlinks.

> Hmm. I think you have convinced me (or perhaps I have convinced myself)
> that we should generally not be crossing symlink boundaries in
> translating names between the filesystem and index. I still don't want
> to work on it, though. :)

That is OK.  Just let's not etch a wrong behaviour in stone with
that test.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 23:12                 ` Junio C Hamano
@ 2013-04-04 23:29                   ` Jeff King
  2013-04-04 23:33                     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-04 23:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Thu, Apr 04, 2013 at 04:12:01PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Deleting across symlinks inside the repo can be brushed aside with "eh,
> > well, it is just another way to mention the same name in the
> > filesystem". But deleting anything outside of the repo seems actively
> > wrong.
> 
> Yup, you finally caught up ;-) IIRC, such an outside repository
> target was the case people realized that "git add" shouldn't see
> across symlinks.

It would help if you spelled it out rather than making me come to it
while arguing against you. ;P

> > Hmm. I think you have convinced me (or perhaps I have convinced myself)
> > that we should generally not be crossing symlink boundaries in
> > translating names between the filesystem and index. I still don't want
> > to work on it, though. :)
> 
> That is OK.  Just let's not etch a wrong behaviour in stone with
> that test.

So let's drop patch 3. Do we want instead to have an expect_failure
documenting the correct behavior?

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 23:29                   ` Jeff King
@ 2013-04-04 23:33                     ` Junio C Hamano
  2013-04-05  0:00                       ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2013-04-04 23:33 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

> So let's drop patch 3. Do we want instead to have an expect_failure
> documenting the correct behavior?

I think that is very much preferred.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-04 23:33                     ` Junio C Hamano
@ 2013-04-05  0:00                       ` Jeff King
  2013-04-05  4:59                         ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2013-04-05  0:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Thu, Apr 04, 2013 at 04:33:39PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So let's drop patch 3. Do we want instead to have an expect_failure
> > documenting the correct behavior?
> 
> I think that is very much preferred.

Here's a replacement for patch 3, then. I wasn't sure if the
editorializing in the last 2 paragraphs should go in the commit message
or the cover letter; feel free to tweak as you see fit.

-- >8 --
Subject: [PATCH] t3600: document failure of rm across symbolic links

If we have a symlink "d" that points to a directory, we
should not be able to remove "d/f". In the normal case,
where "d/f" does not exist in the index, we already disallow
this, as we only remove things that git knows about in the
index. So for something like:

  ln -s /outside/repo foo
  git add foo
  git rm foo/bar

we will properly produce an error (as there is no index
entry for foo/bar). However, if there is an index entry for
the path (e.g., because the movement is due to working tree
changes that have not yet been reflected in the index), we
will happily delete it, even though the path we delete from the
filesystem is not the same as the path in the index.

This patch documents that failure with a test.

While this is a bug, it should not be possible to cause
serious data loss with it. For any path that does not have
an index entry, we will complain and bail. For a path which
does have an index entry, we will do the usual up-to-date
content check. So even if the deleted path in the filesystem
is not the same as the one we are removing from the index,
we do know that they at least have the same content, and
that the content is included in HEAD.

That means the worst case is not the accidental loss of
content, but rather confusion by the user when a copy of a
file another part of the tree is removed. Which makes this
bug a minor and hard-to-trigger annoyance rather than a
data-loss bug (and hence the fix can be saved for a rainy
day when somebody feels like working on it).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t3600-rm.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index a2e1a03..0c44e9f 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -659,4 +659,32 @@ test_expect_success 'rm of file when it has become a directory' '
 	test_path_is_file d/f
 '
 
+test_expect_success SYMLINKS 'rm across a symlinked leading path (no index)' '
+	rm -rf d e &&
+	mkdir e &&
+	echo content >e/f &&
+	ln -s e d &&
+	git add -A e d &&
+	git commit -m "symlink d to e, e/f exists" &&
+	test_must_fail git rm d/f &&
+	git rev-parse --verify :d &&
+	git rev-parse --verify :e/f &&
+	test -h d &&
+	test_path_is_file e/f
+'
+
+test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' '
+	rm -rf d e &&
+	mkdir d &&
+	echo content >d/f &&
+	git add -A e d &&
+	git commit -m "d/f exists" &&
+	mv d e &&
+	ln -s e d &&
+	test_must_fail git rm d/f &&
+	git rev-parse --verify :d/f &&
+	test -h d &&
+	test_path_is_file e/f
+'
+
 test_done
-- 
1.8.2.rc0.33.gd915649

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-05  0:00                       ` Jeff King
@ 2013-04-05  4:59                         ` Junio C Hamano
  2013-04-05  5:04                           ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2013-04-05  4:59 UTC (permalink / raw)
  To: Jeff King; +Cc: jpinheiro, git

Jeff King <peff@peff.net> writes:

> Here's a replacement for patch 3, then. I wasn't sure if the
> editorializing in the last 2 paragraphs should go in the commit message
> or the cover letter; feel free to tweak as you see fit.

They look fine as they are.

> That means the worst case is not the accidental loss of content,
> but rather confusion by the user when a copy of a file another
> part of the tree is removed.

A copy of a file that is on the filesystem that may not be related
to the project at all may be lost, and the user may not notice the
lossage for quite a while.  A symlink that points at /etc/passwd may
cause the file to be removed and the user will hopefully notice, but
if the pointed-at file is something in $HOME/tmp/ that you occasionally
use, you may not notice the lossage immediately, and when you notice
the loss, the only assurance you have is that there is a blob that
records what was lost _somewhere_ in _some_ of your project that had
a symlink that points at $HOME/tmp/ at some point in the past.

"Exists somewhere, not lost" is not a very useful assurance, if you
ask me ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] t3600: test rm of path with changed leading symlinks
  2013-04-05  4:59                         ` Junio C Hamano
@ 2013-04-05  5:04                           ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2013-04-05  5:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jpinheiro, git

On Thu, Apr 04, 2013 at 09:59:49PM -0700, Junio C Hamano wrote:

> > That means the worst case is not the accidental loss of content,
> > but rather confusion by the user when a copy of a file another
> > part of the tree is removed.
> 
> A copy of a file that is on the filesystem that may not be related
> to the project at all may be lost, and the user may not notice the
> lossage for quite a while.  A symlink that points at /etc/passwd may
> cause the file to be removed and the user will hopefully notice, but
> if the pointed-at file is something in $HOME/tmp/ that you occasionally
> use, you may not notice the lossage immediately, and when you notice
> the loss, the only assurance you have is that there is a blob that
> records what was lost _somewhere_ in _some_ of your project that had
> a symlink that points at $HOME/tmp/ at some point in the past.

It's actually quite hard to lose those files. We will only remove the
file if it has a matching index entry. So you cannot do:

  ln -s /etc foo
  git rm foo/passwd

because there is no index entry for foo/passwd. You would have to do:

  mkdir foo
  echo content >foo/passwd
  git add foo/passwd
  rm -rf foo
  ln -s /etc foo
  git rm foo/passwd

and then you only lose it if it matches exactly "content". And
recovering it, you know that the original path that held the content was
called "passwd". So yes, technically you could lose a file outside of
the repo and have trouble finding which path it came from later. But in
practice, not really.

Anyway, it is academic at this point. :)

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-04-05  5:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-03 14:50 Behavior of git rm jpinheiro
2013-04-03 15:58 ` Jeff King
2013-04-03 17:35   ` Junio C Hamano
2013-04-03 20:36     ` Jeff King
2013-04-04 19:02     ` Jeff King
2013-04-04 19:03       ` [PATCH 1/3] rm: do not complain about d/f conflicts during deletion Jeff King
2013-04-04 19:03       ` [PATCH 2/3] t3600: test behavior of reverse-d/f conflict Jeff King
2013-04-04 19:06       ` [PATCH 3/3] t3600: test rm of path with changed leading symlinks Jeff King
2013-04-04 19:42         ` Junio C Hamano
2013-04-04 19:55           ` Jeff King
2013-04-04 20:31             ` Junio C Hamano
2013-04-04 21:03               ` Jeff King
2013-04-04 23:12                 ` Junio C Hamano
2013-04-04 23:29                   ` Jeff King
2013-04-04 23:33                     ` Junio C Hamano
2013-04-05  0:00                       ` Jeff King
2013-04-05  4:59                         ` Junio C Hamano
2013-04-05  5:04                           ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.