All of lore.kernel.org
 help / color / mirror / Atom feed
* two-way merge corner case bug
@ 2013-02-26 20:06 Junio C Hamano
  2013-02-26 20:18 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-02-26 20:06 UTC (permalink / raw)
  To: git; +Cc: peff

It seems that we have a corner case bug in two-way merge to reset
away the conflicts with "read-tree -u --reset HEAD ORIG_HEAD".

In a freshly created repository:

    rm -f A Z
    for i in $(seq 100)
    do
            echo hello
    done >Z
    git add Z
    git commit -m initial
    git branch side
    echo goodbye >>Z
    git commit -a -m changed
    git checkout side
    git mv Z A
    echo kitty >>A
    git commit -a -m side

if you do one of these:

	git checkout master && git merge side
	git checkout side && git merge master

you will get three stages for path A, and nothing for path Z.

With the former, you are starting from a tree with Z (and not A),
and "resetting" should give you A that is the same as 'master' in
the working tree.  With the latter, you are starting from a tree
with A, and "resetting" should give you Z and remove conflicted A
from the working tree.

	git read-tree -u --reset ORIG_HEAD

does the right thing in either case, but

	git read-tree -u --reset HEAD ORIG_HEAD

dies with "cache entry has null sha1: A" for both cases.

The direct symptom comes from 4337b5856f88 (do not write null sha1s
to on-disk index, 2012-07-28), but I think the root cause is
somewhere else.  When we start from 'side', we should notice that
our HEAD has Z, and conflicted A we see in the index came from a
conflicted merge that we are resetting away and should disappear
from the working tree.  When we start from 'master', we should
notice that we do not want to keep the cache entry that is
"modified" in the index, and write the one taken from the "going to"
tree (ORIG_HEAD).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-02-26 20:06 two-way merge corner case bug Junio C Hamano
@ 2013-02-26 20:18 ` Jeff King
  2013-02-26 21:41   ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-02-26 20:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Feb 26, 2013 at 12:06:42PM -0800, Junio C Hamano wrote:

> It seems that we have a corner case bug in two-way merge to reset
> away the conflicts with "read-tree -u --reset HEAD ORIG_HEAD".

Isn't this a repeat of:

  http://thread.gmane.org/gmane.comp.version-control.git/202440/focus=212316

whose fix never got finalized? It's on my todo list, but it seems that
items keep sinking further down on that list, rather than getting
completed. :-/

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-02-26 20:18 ` Jeff King
@ 2013-02-26 21:41   ` Junio C Hamano
  2013-03-01  9:22     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-02-26 21:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Tue, Feb 26, 2013 at 12:06:42PM -0800, Junio C Hamano wrote:
>
>> It seems that we have a corner case bug in two-way merge to reset
>> away the conflicts with "read-tree -u --reset HEAD ORIG_HEAD".
>
> Isn't this a repeat of:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/202440/focus=212316
>
> whose fix never got finalized? It's on my todo list, but it seems that
> items keep sinking further down on that list, rather than getting
> completed. :-/

Yeah, I think the patch in your message is a good starting point to
solve a half of the issue (i.e. unmerged current one should be
replaced with the version from the "going to" tree).  Attached is
with a slight update.

The other half that is not solved by this patch is that we will lose
Z because merge-recursive removed it when it renamed it and left
three stages for A in the index, and to "read-tree --reset -u HEAD
ORIG_HEAD", it looks as if you removed Z yourself while on HEAD and
want to carry your local change forward to ORIG_HEAD.

I think this too needs to be fixed.  We are saying "-u --reset" to
mean "I want to go there no matter what", not "-u -m" that means "I
am carrying my changes forward while checking out another branch".

 unpack-trees.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 09e53df..3d17108 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1748,14 +1748,23 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		newtree = NULL;
 
 	if (current) {
-		if ((!oldtree && !newtree) || /* 4 and 5 */
-		    (!oldtree && newtree &&
-		     same(current, newtree)) || /* 6 and 7 */
-		    (oldtree && newtree &&
-		     same(oldtree, newtree)) || /* 14 and 15 */
-		    (oldtree && newtree &&
-		     !same(oldtree, newtree) && /* 18 and 19 */
-		     same(current, newtree))) {
+		if (current->ce_flags & CE_CONFLICTED) {
+			if (same(oldtree, newtree) || o->reset) {
+				if (!newtree)
+					return deleted_entry(current, current, o);
+				else
+					return merged_entry(newtree, current, o);
+			}
+			return o->gently ? -1 : reject_merge(current, o);
+		}
+		else if ((!oldtree && !newtree) || /* 4 and 5 */
+			 (!oldtree && newtree &&
+			  same(current, newtree)) || /* 6 and 7 */
+			 (oldtree && newtree &&
+			  same(oldtree, newtree)) || /* 14 and 15 */
+			 (oldtree && newtree &&
+			  !same(oldtree, newtree) && /* 18 and 19 */
+			  same(current, newtree))) {
 			return keep_entry(current, o);
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-02-26 21:41   ` Junio C Hamano
@ 2013-03-01  9:22     ` Jeff King
  2013-03-01 16:57       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-03-01  9:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Feb 26, 2013 at 01:41:54PM -0800, Junio C Hamano wrote:

> > Isn't this a repeat of:
> >
> >   http://thread.gmane.org/gmane.comp.version-control.git/202440/focus=212316
> [...]
> 
> Yeah, I think the patch in your message is a good starting point to
> solve a half of the issue (i.e. unmerged current one should be
> replaced with the version from the "going to" tree).  Attached is
> with a slight update.

Thanks for picking this up. I'm not sure, though, about the trigger
here:

> +		if (current->ce_flags & CE_CONFLICTED) {
> +			if (same(oldtree, newtree) || o->reset) {
> +				if (!newtree)
> +					return deleted_entry(current, current, o);
> +				else
> +					return merged_entry(newtree, current, o);
> +			}
> +			return o->gently ? -1 : reject_merge(current, o);

We have a conflicted state in the index for a path. If o->reset is true,
then we want to throw it out in favor of where we are moving to. So that
part makes sense.

But if we are not in "--reset", and we see that oldtree and newtree are
the same, do we still want to move to newtree? It seems like we are
throwing away the conflicted state that the user has. Usually this isn't
a big deal, but with "-u" it could mean losing any in-progress
resolution in the working tree.

I think this generally shouldn't happen, as "-m" does not operate when
the index contains unmerged entries. So it would really just be a safety
valve for somebody calling into unpack_trees without having done the
index check. IOW, should the same(newtree, oldtree) call be cut there,
and we trigger this just on reset? Any other case would be rejected
(which _shouldn't_ happen, but this keeps us conservative, and if we
find a case where it does happen, we can figure out the semantics then).

> The other half that is not solved by this patch is that we will lose
> Z because merge-recursive removed it when it renamed it and left
> three stages for A in the index, and to "read-tree --reset -u HEAD
> ORIG_HEAD", it looks as if you removed Z yourself while on HEAD and
> want to carry your local change forward to ORIG_HEAD.
>
> I think this too needs to be fixed.  We are saying "-u --reset" to
> mean "I want to go there no matter what", not "-u -m" that means "I
> am carrying my changes forward while checking out another branch".

Yeah, hmm. Since we cannot distinguish between such a staged removal and
what merge-recursive deposits in the index, we have to err on on side
(either losing a staged removal, or failing to move to the new state).
What we really want is to detect that Z's removal is related to the
conflicted path A, but we can't do so; that information was never put
into the index in the first place.

We could do this:

diff --git a/unpack-trees.c b/unpack-trees.c
index 3d17108..1e84131 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1788,7 +1788,7 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		}
 	}
 	else if (newtree) {
-		if (oldtree && !o->initial_checkout) {
+		if (oldtree && !o->initial_checkout && !o->reset) {
 			/*
 			 * deletion of the path was staged;
 			 */

which solves the case you presented. My worry would be that somebody is
using "--reset" but expecting the removal to be carried through
(technically, "--reset" is documented as "-m but discard unmerged
entries", but we are not really treating it that way here). The test
suite passes, but we may not be covering some oddball case.

-Peff

PS I wonder if the "initial_checkout" hack can just go away under this
   new rule. Although we don't seem to use "o->reset" for the initial
   checkout. Which seems kind of weird. I would think the initial
   checkout would actually just be a oneway merge. Is the point to try
   to leave any existing working tree contents untouched or something?

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01  9:22     ` Jeff King
@ 2013-03-01 16:57       ` Junio C Hamano
  2013-03-01 22:36         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-01 16:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> PS I wonder if the "initial_checkout" hack can just go away under this
>    new rule. Although we don't seem to use "o->reset" for the initial
>    checkout. Which seems kind of weird. I would think the initial
>    checkout would actually just be a oneway merge. Is the point to try
>    to leave any existing working tree contents untouched or something?

An initial checkout is *supposed* to happen in an empty working
tree, so if we code it not to overwrite an existing path in the
working tree, the user cannot lose possibly precious contents with
an mistaken initial checkout (they will instead appear as modified
relative to the index), while in the normal case we will write out
the contents from the HEAD through the index.  We could attempt "we
do not have to if the user behaves, but with this we could help
misbehaving users" if we used twoway merge for an initial checkout.

Having said that, I notice that in the normal codepath (e.g. "git
clone" without the "--no-checkout" option) we no longer use twoway
merge for the initial checkout.  Back when "git clone" was a
scripted Porcelain, I think we used to do a twoway read-tree.  It
may be that we broke it when "clone" was rewritten in C, but the
breakage is to the "we do not have to..." thing, so it may not be a
big deal.

The only case that matters in today's code is "git checkout"
(without any option or argument) immediately after "git clone -n", I
think.  The special casing for this initial checkout in twoway merge
is needed because we go from HEAD to HEAD in that case, and we do
not want to keep the artificial local removals from the index; we
start from not even having the $GIT_INDEX_FILE, so without the
special case all paths appear to have been "rm --cached", which is
usually not what the user would want to see ;-)

> ... My worry would be that somebody is
> using "--reset" but expecting the removal to be carried through
> (technically, "--reset" is documented as "-m but discard unmerged
> entries", but we are not really treating it that way here).

I've checked all in-tree uses of "read-tree --reset -u".

Nobody seems to use that combination, either from scripts or from C
(i.e. when opts.update==1 and opts.merge==1, opts.reset is not set)
with a twoway merge, other than "git am --abort/--skip".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01 16:57       ` Junio C Hamano
@ 2013-03-01 22:36         ` Jeff King
  2013-03-01 23:06           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-03-01 22:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Mar 01, 2013 at 08:57:03AM -0800, Junio C Hamano wrote:

> An initial checkout is *supposed* to happen in an empty working
> tree, so if we code it not to overwrite an existing path in the
> working tree, the user cannot lose possibly precious contents with
> an mistaken initial checkout (they will instead appear as modified
> relative to the index), while in the normal case we will write out
> the contents from the HEAD through the index.  We could attempt "we
> do not have to if the user behaves, but with this we could help
> misbehaving users" if we used twoway merge for an initial checkout.

That matches my thinking. It is probably not worth touching, though,
since it is not causing any problems. I just found it curious that the
exact same (and only, as far as I can see) exception we make for
initial_checkout is the same thing we have to tweak here.

> Having said that, I notice that in the normal codepath (e.g. "git
> clone" without the "--no-checkout" option) we no longer use twoway
> merge for the initial checkout.  Back when "git clone" was a
> scripted Porcelain, I think we used to do a twoway read-tree.  It
> may be that we broke it when "clone" was rewritten in C, but the
> breakage is to the "we do not have to..." thing, so it may not be a
> big deal.

The one-way merge that we use now in clone makes a lot of sense to me.
We do not have a "previous state" we were based on.

> The only case that matters in today's code is "git checkout"
> (without any option or argument) immediately after "git clone -n", I
> think.  The special casing for this initial checkout in twoway merge
> is needed because we go from HEAD to HEAD in that case, and we do
> not want to keep the artificial local removals from the index; we
> start from not even having the $GIT_INDEX_FILE, so without the
> special case all paths appear to have been "rm --cached", which is
> usually not what the user would want to see ;-)

Right. I just wondered if such a checkout should instead be a "reset",
in which case it would fall under the proposed patch. But "git checkout"
never does a twoway_merge with o->reset; instead, it uses a one-way
merge.

Anyway, that is all tangential to the bug at hand.

> > ... My worry would be that somebody is
> > using "--reset" but expecting the removal to be carried through
> > (technically, "--reset" is documented as "-m but discard unmerged
> > entries", but we are not really treating it that way here).
> 
> I've checked all in-tree uses of "read-tree --reset -u".
> 
> Nobody seems to use that combination, either from scripts or from C
> (i.e. when opts.update==1 and opts.merge==1, opts.reset is not set)
> with a twoway merge, other than "git am --abort/--skip".

I can believe it. So do we want to do that fix, then? Did you want to
roll up the two halves of it with a test and write a commit message? I
feel like you could write a much more coherent one than I could on this
subject.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01 22:36         ` Jeff King
@ 2013-03-01 23:06           ` Junio C Hamano
  2013-03-01 23:08             ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-01 23:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Fri, Mar 01, 2013 at 08:57:03AM -0800, Junio C Hamano wrote:
>> 
>> Nobody seems to use that combination, either from scripts or from C
>> (i.e. when opts.update==1 and opts.merge==1, opts.reset is not set)
>> with a twoway merge, other than "git am --abort/--skip".
>
> I can believe it. So do we want to do that fix, then? Did you want to
> roll up the two halves of it with a test and write a commit message? I
> feel like you could write a much more coherent one than I could on this
> subject.

I actually was wondering if we can remove that sole uses of two-way
merge with --reset -u from "git am" and replace them with something
else.  But we do want to keep local change that existed before "am"
started, so we cannot avoid use of two-way merge, I guess...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01 23:06           ` Junio C Hamano
@ 2013-03-01 23:08             ` Jeff King
  2013-03-01 23:49               ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-03-01 23:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Mar 01, 2013 at 03:06:57PM -0800, Junio C Hamano wrote:

> > I can believe it. So do we want to do that fix, then? Did you want to
> > roll up the two halves of it with a test and write a commit message? I
> > feel like you could write a much more coherent one than I could on this
> > subject.
> 
> I actually was wondering if we can remove that sole uses of two-way
> merge with --reset -u from "git am" and replace them with something
> else.  But we do want to keep local change that existed before "am"
> started, so we cannot avoid use of two-way merge, I guess...

Yeah, I think that is a case we definitely want to keep, as it means any
intermediate work done by the user in applying the patch is not lost.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01 23:08             ` Jeff King
@ 2013-03-01 23:49               ` Junio C Hamano
  2013-03-02  0:54                 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-01 23:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> I actually was wondering if we can remove that sole uses of two-way
>> merge with --reset -u from "git am" and replace them with something
>> else.  But we do want to keep local change that existed before "am"
>> started, so we cannot avoid use of two-way merge, I guess...
>
> Yeah, I think that is a case we definitely want to keep, as it means any
> intermediate work done by the user in applying the patch is not lost.

I am not sure what you mean by "intermediate" here.  When the user
attempts to resolve conflict in a path manually and gives up, we do
want to revert changes to such a path to that of HEAD.

Clarifying the semantics of "--reset" needs to be done carefully.

I think any difference "git diff --cached" shows are fair game to
revert to HEAD.  In the earlier example, path "Z" that was created
by recursive merge in an attempt to rename "A" should be removed,
and path "A" should be recreated to that of HEAD, as we know at the
point of "am --skip/--abort" that these two paths were involved in
the recursive merge invoked by the patch application process (that
is the only possible reason why these are different in the index
from HEAD).  Also any conflicting entries can only come from
three-way merge and they should be reverted to that of HEAD.

On the other hand, the paths that appear in "git diff" (except for
those that appear in "git diff --cached", which we will revert to
HEAD following the logic of the previous paragraph) must be kept.
They are changes that were already present in the working tree
before the user decided to accept a trivial patch via "am" that does
not overlap with what the user was doing.  We allow a dirty working
tree but disallow a dirty index when "git am" starts, exactly
because we want to ensure this property.

By doing both of the above, we should be able to satisfy the user
who uses "am --abort/--skip", in order to restore paths that were
involved in the failed attempt to three-way merge to revert back to
that of HEAD, while keeping unrelated changes that were present
before "am" started.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: two-way merge corner case bug
  2013-03-01 23:49               ` Junio C Hamano
@ 2013-03-02  0:54                 ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2013-03-02  0:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Mar 01, 2013 at 03:49:23PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> I actually was wondering if we can remove that sole uses of two-way
> >> merge with --reset -u from "git am" and replace them with something
> >> else.  But we do want to keep local change that existed before "am"
> >> started, so we cannot avoid use of two-way merge, I guess...
> >
> > Yeah, I think that is a case we definitely want to keep, as it means any
> > intermediate work done by the user in applying the patch is not lost.
> 
> I am not sure what you mean by "intermediate" here.  When the user
> attempts to resolve conflict in a path manually and gives up, we do
> want to revert changes to such a path to that of HEAD.

I thought we left it unchanged intentionally, but I cannot seem to find
the conversation I am thinking of. I did find this:

  http://thread.gmane.org/gmane.comp.version-control.git/164002

But that is not about "save the intermediate work done on applying the
series", but rather to help with this case:

  1. You "git am" a series. It doesn't apply.

  2. You get distracted and work on something else, forgetting that the
     dotest directory is sitting there.

  3. Time passes.

  4. You try to "git am" a new series. An "am" is already in progress,
     so you are advised to "git am --abort".

At that point you do not want to reset the commit to the original head
(which was fixed by 7b3b7e3), but you do not necessarily want to reset
the working tree contents either. We leave them in the name of safety
and letting the user deal with it.

> Clarifying the semantics of "--reset" needs to be done carefully.
> 
> I think any difference "git diff --cached" shows are fair game to
> revert to HEAD.  In the earlier example, path "Z" that was created
> by recursive merge in an attempt to rename "A" should be removed,
> and path "A" should be recreated to that of HEAD, as we know at the
> point of "am --skip/--abort" that these two paths were involved in
> the recursive merge invoked by the patch application process (that
> is the only possible reason why these are different in the index
> from HEAD).  Also any conflicting entries can only come from
> three-way merge and they should be reverted to that of HEAD.
> 
> On the other hand, the paths that appear in "git diff" (except for
> those that appear in "git diff --cached", which we will revert to
> HEAD following the logic of the previous paragraph) must be kept.
> They are changes that were already present in the working tree
> before the user decided to accept a trivial patch via "am" that does
> not overlap with what the user was doing.  We allow a dirty working
> tree but disallow a dirty index when "git am" starts, exactly
> because we want to ensure this property.
> 
> By doing both of the above, we should be able to satisfy the user
> who uses "am --abort/--skip", in order to restore paths that were
> involved in the failed attempt to three-way merge to revert back to
> that of HEAD, while keeping unrelated changes that were present
> before "am" started.

I think that makes sense, but I don't think we have a read-tree mode to
do that in one pass, do we? Not that we can't add one if we need.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-03-02  0:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-26 20:06 two-way merge corner case bug Junio C Hamano
2013-02-26 20:18 ` Jeff King
2013-02-26 21:41   ` Junio C Hamano
2013-03-01  9:22     ` Jeff King
2013-03-01 16:57       ` Junio C Hamano
2013-03-01 22:36         ` Jeff King
2013-03-01 23:06           ` Junio C Hamano
2013-03-01 23:08             ` Jeff King
2013-03-01 23:49               ` Junio C Hamano
2013-03-02  0:54                 ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.