All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content
@ 2013-01-19 11:24 Nguyễn Thái Ngọc Duy
  2013-01-20 18:35 ` Junio C Hamano
  2013-01-21 11:40 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 9+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-01-19 11:24 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 The "git reset" as illustrated in the test case is one case people
 may be hit by this. We can also do the same for "uptodate" check, but
 I'm not sure how that could happen.

 t/t1011-read-tree-sparse-checkout.sh |  3 ++-
 t/t2021-checkout-overwrite.sh        |  8 ++++++++
 unpack-trees.c                       | 27 +++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 5c0053a..38f9899 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -238,7 +238,8 @@ test_expect_success 'print errors when failed to update worktree' '
 	echo sub >.git/info/sparse-checkout &&
 	git checkout -f init &&
 	mkdir sub &&
-	touch sub/added sub/addedtoo &&
+	echo modified >sub/added &&
+	echo modified >sub/addedtoo &&
 	test_must_fail git checkout top 2>actual &&
 	cat >expected <<\EOF &&
 error: The following untracked working tree files would be overwritten by checkout:
diff --git a/t/t2021-checkout-overwrite.sh b/t/t2021-checkout-overwrite.sh
index 5da63e9..4163449 100755
--- a/t/t2021-checkout-overwrite.sh
+++ b/t/t2021-checkout-overwrite.sh
@@ -47,4 +47,12 @@ test_expect_success SYMLINKS 'checkout commit with dir must not remove untracked
 	test -h a/b
 '
 
+test_expect_success 'do not abort on overwriting an existing file with the same content' '
+	echo abc >bar &&
+	git add bar &&
+	git commit -m "new file" &&
+	git reset HEAD^ &&
+	git checkout HEAD@{1}
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 0e1a196..16adc03 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1363,6 +1363,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 			      struct unpack_trees_options *o)
 {
 	struct cache_entry *result;
+	unsigned long ce_size;
 
 	/*
 	 * It may be that the 'lstat()' succeeded even though
@@ -1405,6 +1406,32 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 			return 0;
 	}
 
+	/*
+	 * If it has the same content that we are going to write down,
+	 * there's no point in complaining. We still overwrite it in the
+	 * end though. Permission is not checked so it may be lost.
+	 */
+	if (ce &&
+	    S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
+	    st->st_size < 1024 * 1024 && /* should be configurable */
+	    sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
+	    ce_size == st->st_size) {
+		void *buffer = NULL;
+		unsigned long size;
+		enum object_type type;
+		struct strbuf sb = STRBUF_INIT;
+		int matched =
+			strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
+			(buffer = read_sha1_file(ce->sha1, &type, &size)) != NULL &&
+			type == OBJ_BLOB &&
+			size == ce_size &&
+			!memcmp(buffer, sb.buf, size);
+		free(buffer);
+		strbuf_release(&sb);
+		if (matched)
+			return 0;
+	}
+
 	return o->gently ? -1 :
 		add_rejected_path(o, error_type, name);
 }
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-19 11:24 [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content Nguyễn Thái Ngọc Duy
@ 2013-01-20 18:35 ` Junio C Hamano
  2013-01-21  1:35   ` Duy Nguyen
  2013-01-21 11:40 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-01-20 18:35 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> +	/*
> +	 * If it has the same content that we are going to write down,

write down???

> +	 * there's no point in complaining. We still overwrite it in the
> +	 * end though. Permission is not checked so it may be lost.
> +	 */

That is a regression, isn't it?  Is it too cumbersome to avoid
losing the permission bits by stopping in that case?

> +	if (ce &&
> +	    S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
> +	    st->st_size < 1024 * 1024 && /* should be configurable */
> +	    sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
> +	    ce_size == st->st_size) {
> +		void *buffer = NULL;
> +		unsigned long size;
> +		enum object_type type;
> +		struct strbuf sb = STRBUF_INIT;
> +		int matched =
> +			strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
> +			(buffer = read_sha1_file(ce->sha1, &type, &size)) != NULL &&
> +			type == OBJ_BLOB &&
> +			size == ce_size &&
> +			!memcmp(buffer, sb.buf, size);
> +		free(buffer);
> +		strbuf_release(&sb);
> +		if (matched)
> +			return 0;
> +	}
> +
>  	return o->gently ? -1 :
>  		add_rejected_path(o, error_type, name);
>  }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-20 18:35 ` Junio C Hamano
@ 2013-01-21  1:35   ` Duy Nguyen
  0 siblings, 0 replies; 9+ messages in thread
From: Duy Nguyen @ 2013-01-21  1:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Jan 21, 2013 at 1:35 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> +     /*
>> +      * If it has the same content that we are going to write down,
>
> write down???

hmm.. "overwrite" then.

>> +      * there's no point in complaining. We still overwrite it in the
>> +      * end though. Permission is not checked so it may be lost.
>> +      */
>
> That is a regression, isn't it?  Is it too cumbersome to avoid
> losing the permission bits by stopping in that case?

I'm not sure how to deal with this case. (Lack of) Executable bit can
be easily restored (unlike file content) if we give the user the list
of changed files. On the other hand, not everybody runs git with a
huge scrollback buffer and warnings can be lost. I guess abort is a
safe choice.
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-19 11:24 [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content Nguyễn Thái Ngọc Duy
  2013-01-20 18:35 ` Junio C Hamano
@ 2013-01-21 11:40 ` Nguyễn Thái Ngọc Duy
  2013-01-21 23:15   ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-01-21 11:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Changes: do not lose worktree's executable permission.

 t/t1011-read-tree-sparse-checkout.sh |  3 ++-
 t/t2021-checkout-overwrite.sh        | 18 ++++++++++++++++++
 unpack-trees.c                       | 31 +++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 5c0053a..38f9899 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -238,7 +238,8 @@ test_expect_success 'print errors when failed to update worktree' '
 	echo sub >.git/info/sparse-checkout &&
 	git checkout -f init &&
 	mkdir sub &&
-	touch sub/added sub/addedtoo &&
+	echo modified >sub/added &&
+	echo modified >sub/addedtoo &&
 	test_must_fail git checkout top 2>actual &&
 	cat >expected <<\EOF &&
 error: The following untracked working tree files would be overwritten by checkout:
diff --git a/t/t2021-checkout-overwrite.sh b/t/t2021-checkout-overwrite.sh
index 5da63e9..bb1696d 100755
--- a/t/t2021-checkout-overwrite.sh
+++ b/t/t2021-checkout-overwrite.sh
@@ -47,4 +47,22 @@ test_expect_success SYMLINKS 'checkout commit with dir must not remove untracked
 	test -h a/b
 '
 
+test_expect_success 'do not abort on overwriting an existing file with the same content' '
+	echo abc >bar &&
+	git add bar &&
+	git commit -m "new file" &&
+	git reset HEAD^ &&
+	git checkout HEAD@{1}
+'
+
+test_expect_success POSIXPERM 'do abort on an existing file, same content but different permission' '
+	git checkout -f HEAD^ &&
+	echo abc >bar &&
+	git add bar &&
+	git commit -m "new file" &&
+	git reset HEAD^ &&
+	chmod a+x bar &&
+	test_must_fail git checkout HEAD@{1}
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 0e1a196..ea204ae 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -9,6 +9,8 @@
 #include "refs.h"
 #include "attr.h"
 
+#define SAME_CONTENT_SIZE_LIMIT (1024 * 1024)
+
 /*
  * Error messages expected by scripts out of plumbing commands such as
  * read-tree.  Non-scripted Porcelain is not required to use these messages
@@ -1363,6 +1365,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 			      struct unpack_trees_options *o)
 {
 	struct cache_entry *result;
+	unsigned long ce_size;
 
 	/*
 	 * It may be that the 'lstat()' succeeded even though
@@ -1405,6 +1408,34 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
 			return 0;
 	}
 
+	/*
+	 * If it has the same content that we are going to overwrite,
+	 * there's no point in complaining. We still overwrite it in
+	 * the end though.
+	 */
+	if (ce &&
+	    S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
+	    (!trust_executable_bit ||
+	     (0100 & (ce->ce_mode ^ st->st_mode)) == 0) &&
+	    st->st_size < SAME_CONTENT_SIZE_LIMIT &&
+	    sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
+	    ce_size == st->st_size) {
+		void *buffer = NULL;
+		unsigned long size;
+		enum object_type type;
+		struct strbuf sb = STRBUF_INIT;
+		int matched =
+			strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
+			(buffer = read_sha1_file(ce->sha1, &type, &size)) != NULL &&
+			type == OBJ_BLOB &&
+			size == ce_size &&
+			!memcmp(buffer, sb.buf, size);
+		free(buffer);
+		strbuf_release(&sb);
+		if (matched)
+			return 0;
+	}
+
 	return o->gently ? -1 :
 		add_rejected_path(o, error_type, name);
 }
-- 
1.8.0.rc2.23.g1fb49df

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-21 11:40 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
@ 2013-01-21 23:15   ` Jeff King
  2013-01-22  0:59     ` Duy Nguyen
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2013-01-21 23:15 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

On Mon, Jan 21, 2013 at 06:40:33PM +0700, Nguyen Thai Ngoc Duy wrote:

> +	/*
> +	 * If it has the same content that we are going to overwrite,
> +	 * there's no point in complaining. We still overwrite it in
> +	 * the end though.
> +	 */
> +	if (ce &&
> +	    S_ISREG(st->st_mode) && S_ISREG(ce->ce_mode) &&
> +	    (!trust_executable_bit ||
> +	     (0100 & (ce->ce_mode ^ st->st_mode)) == 0) &&
> +	    st->st_size < SAME_CONTENT_SIZE_LIMIT &&
> +	    sha1_object_info(ce->sha1, &ce_size) == OBJ_BLOB &&
> +	    ce_size == st->st_size) {
> +		void *buffer = NULL;
> +		unsigned long size;
> +		enum object_type type;
> +		struct strbuf sb = STRBUF_INIT;
> +		int matched =
> +			strbuf_read_file(&sb, ce->name, ce_size) == ce_size &&
> +			(buffer = read_sha1_file(ce->sha1, &type, &size)) != NULL &&
> +			type == OBJ_BLOB &&
> +			size == ce_size &&
> +			!memcmp(buffer, sb.buf, size);
> +		free(buffer);
> +		strbuf_release(&sb);
> +		if (matched)
> +			return 0;
> +	}

Can you elaborate on when this code is triggered?

In the general case, shouldn't we already know the sha1 of what's on
disk in the index, and be able to just compare the hashes? And if we
don't, because the entry is start-dirty, should we be updating it
(possibly earlier, so we do not even get into the "need to write" code
path) instead of doing this ad-hoc byte comparison?

Confused...

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-21 23:15   ` Jeff King
@ 2013-01-22  0:59     ` Duy Nguyen
  2013-01-22  1:45       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2013-01-22  0:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Tue, Jan 22, 2013 at 6:15 AM, Jeff King <peff@peff.net> wrote:
> Can you elaborate on when this code is triggered?
>
> In the general case, shouldn't we already know the sha1 of what's on
> disk in the index, and be able to just compare the hashes? And if we
> don't, because the entry is start-dirty, should we be updating it
> (possibly earlier, so we do not even get into the "need to write" code
> path) instead of doing this ad-hoc byte comparison?
>
> Confused...

git reset HEAD~10
# blah that was a mistake, undo it
git checkout HEAD@{1}

I hit it a few times, probably not with the exact recipe above though.
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-22  0:59     ` Duy Nguyen
@ 2013-01-22  1:45       ` Junio C Hamano
  2013-01-22  1:55         ` Duy Nguyen
  2013-01-22 20:19         ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2013-01-22  1:45 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, git

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Jan 22, 2013 at 6:15 AM, Jeff King <peff@peff.net> wrote:
>> Can you elaborate on when this code is triggered?
>>
>> In the general case, shouldn't we already know the sha1 of what's on
>> disk in the index, and be able to just compare the hashes? And if we
>> don't, because the entry is start-dirty, should we be updating it
>> (possibly earlier, so we do not even get into the "need to write" code
>> path) instead of doing this ad-hoc byte comparison?

If the index knows what is in the working tree, I think I agree.

>> Confused...
>
> git reset HEAD~10
> # blah that was a mistake, undo it
> git checkout HEAD@{1}
>
> I hit it a few times, probably not with the exact recipe above though.

I've seen myself doing "git reset HEAD~10" by mistake (I wanted "--hard"),
but the recovery is to do "git reset --hard @{1}" and then come back
with "git reset --hard HEAD~10", so it hasn't been a real problem
for me.

A case similar to this is already covered by a special case of
two-tree read-tree where the index already matches the tree we are
checking out even though it is different from HEAD; in other words,
if you did this:

        git init
        date >file
        git add file; git commit -m 'has a file'
        git checkout -b side
        git rm file; git commit -m 'does not have the file'
        git checkout master file
	: now index has the file from 'master' and worktree is clean
        git checkout master

you shouldn't get any complaint, I think.

If you did "git rm --cached file" to lose it from the index,
immediately after "git checkout master file", then we wouldn't know
what we may be losing.  If the file in the working tree happens to
match the index and the HEAD after checking out the other branch
('master' in this case), it is _not_ losing information, so we might
be able to treat it as an extension of the existing special case.

I haven't thought things through to see if in a more general case
that this codepath triggers we might be losing a local changes that
we should be carrying forward while checking out a new branch,
though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-22  1:45       ` Junio C Hamano
@ 2013-01-22  1:55         ` Duy Nguyen
  2013-01-22 20:19         ` Jeff King
  1 sibling, 0 replies; 9+ messages in thread
From: Duy Nguyen @ 2013-01-22  1:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Tue, Jan 22, 2013 at 8:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Tue, Jan 22, 2013 at 6:15 AM, Jeff King <peff@peff.net> wrote:
>>> Can you elaborate on when this code is triggered?
>>>
>>> In the general case, shouldn't we already know the sha1 of what's on
>>> disk in the index, and be able to just compare the hashes? And if we
>>> don't, because the entry is start-dirty, should we be updating it
>>> (possibly earlier, so we do not even get into the "need to write" code
>>> path) instead of doing this ad-hoc byte comparison?
>
> If the index knows what is in the working tree, I think I agree.
>
>>> Confused...
>>
>> git reset HEAD~10
>> # blah that was a mistake, undo it
>> git checkout HEAD@{1}
>>
>> I hit it a few times, probably not with the exact recipe above though.
>
> I've seen myself doing "git reset HEAD~10" by mistake (I wanted "--hard"),
> but the recovery is to do "git reset --hard @{1}" and then come back
> with "git reset --hard HEAD~10", so it hasn't been a real problem
> for me.

Except when you already make some changes elsewhere and you don't want --hard.
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] unpack-trees: do not abort when overwriting an existing file with the same content
  2013-01-22  1:45       ` Junio C Hamano
  2013-01-22  1:55         ` Duy Nguyen
@ 2013-01-22 20:19         ` Jeff King
  1 sibling, 0 replies; 9+ messages in thread
From: Jeff King @ 2013-01-22 20:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, git

On Mon, Jan 21, 2013 at 05:45:05PM -0800, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > On Tue, Jan 22, 2013 at 6:15 AM, Jeff King <peff@peff.net> wrote:
> >> Can you elaborate on when this code is triggered?
> >>
> >> In the general case, shouldn't we already know the sha1 of what's on
> >> disk in the index, and be able to just compare the hashes? And if we
> >> don't, because the entry is start-dirty, should we be updating it
> >> (possibly earlier, so we do not even get into the "need to write" code
> >> path) instead of doing this ad-hoc byte comparison?
> 
> If the index knows what is in the working tree, I think I agree.

Right. I was wondering why it didn't (and if it doesn't, why we are not
saving the information there).

But I think I was letting my inaccurate mental model of the index get in
the way. I tend to think of the stat information as "if the file matches
this stat info, then it has sha1 X". But that is not true. The sha1 we
store is the actual index entry, and if it does not match what is in the
working tree, we do not know or store the sha1 of what is in the working
tree. We cannot just "refresh" that value and compare it, which is what
I was implying.

So I think I was just confused. That is what I get for not actually
doing low-level index stuff enough.

> > git reset HEAD~10
> > # blah that was a mistake, undo it
> > git checkout HEAD@{1}

It seems like

  git reset HEAD@{1}

would be the correct undo, as the original never touched the working
tree.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-01-22 20:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-19 11:24 [PATCH] unpack-trees: do not abort when overwriting an existing file with the same content Nguyễn Thái Ngọc Duy
2013-01-20 18:35 ` Junio C Hamano
2013-01-21  1:35   ` Duy Nguyen
2013-01-21 11:40 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-01-21 23:15   ` Jeff King
2013-01-22  0:59     ` Duy Nguyen
2013-01-22  1:45       ` Junio C Hamano
2013-01-22  1:55         ` Duy Nguyen
2013-01-22 20:19         ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.