All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: git subtree split gets confused on removed and readded directory
@ 2016-01-15 16:23 Marcus Brinkmann
  2016-01-15 23:44 ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Marcus Brinkmann @ 2016-01-15 16:23 UTC (permalink / raw)
  To: git

Hi,

I made a simple test repository showing the problem here:
https://github.com/lambdafu/git-subtree-split-test

After creating the master branch, I created the split/bar branch like this:

$ git subtree split -P bar -b split/bar

The resulting history is confused by the directory "bar" which was
added, removed and then re-added again.  The recent history up to adding
the directory the second time is fine.  But then it seems to loose track
and add the parent of that commit up to the initial commit in the history.

I'd expect that the parent of the readding commit is an empty tree
commit (which removed the last files in the directory), and that before
that are commits that reflect the initial creation of that directory
with its files, but rewritten as a subtree, of course.

Thanks!
Marcus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG: git subtree split gets confused on removed and readded directory
  2016-01-15 16:23 BUG: git subtree split gets confused on removed and readded directory Marcus Brinkmann
@ 2016-01-15 23:44 ` Junio C Hamano
  2016-01-17 19:34   ` David Ware
  2016-01-17 23:23   ` David A. Greene
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2016-01-15 23:44 UTC (permalink / raw)
  To: Dave Ware, David A. Greene; +Cc: git, Marcus Brinkmann

Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:

> I made a simple test repository showing the problem here:
> https://github.com/lambdafu/git-subtree-split-test
>
> After creating the master branch, I created the split/bar branch like this:
>
> $ git subtree split -P bar -b split/bar
>
> The resulting history is confused by the directory "bar" which was
> added, removed and then re-added again.  The recent history up to adding
> the directory the second time is fine.  But then it seems to loose track
> and add the parent of that commit up to the initial commit in the history.
>
> I'd expect that the parent of the readding commit is an empty tree
> commit (which removed the last files in the directory), and that before
> that are commits that reflect the initial creation of that directory
> with its files, but rewritten as a subtree, of course.

Thanks for a report.

David, does this ring a bell?

Dave, does your fix "subtree split" we saw recently on the list

    http://article.gmane.org/gmane.comp.version-control.git/284125

help this?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG: git subtree split gets confused on removed and readded directory
  2016-01-15 23:44 ` Junio C Hamano
@ 2016-01-17 19:34   ` David Ware
  2016-01-17 23:23   ` David A. Greene
  1 sibling, 0 replies; 12+ messages in thread
From: David Ware @ 2016-01-17 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David A. Greene, git, Marcus Brinkmann

No my patch doesn't seem to fix this.

Cheers,
Dave Ware

(sorry if you're receiving this for the second time, I'm resending
since the mailing list blocked my earlier reply for html content)

On Sat, Jan 16, 2016 at 12:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:
>
>> I made a simple test repository showing the problem here:
>> https://github.com/lambdafu/git-subtree-split-test
>>
>> After creating the master branch, I created the split/bar branch like this:
>>
>> $ git subtree split -P bar -b split/bar
>>
>> The resulting history is confused by the directory "bar" which was
>> added, removed and then re-added again.  The recent history up to adding
>> the directory the second time is fine.  But then it seems to loose track
>> and add the parent of that commit up to the initial commit in the history.
>>
>> I'd expect that the parent of the readding commit is an empty tree
>> commit (which removed the last files in the directory), and that before
>> that are commits that reflect the initial creation of that directory
>> with its files, but rewritten as a subtree, of course.
>
> Thanks for a report.
>
> David, does this ring a bell?
>
> Dave, does your fix "subtree split" we saw recently on the list
>
>     http://article.gmane.org/gmane.comp.version-control.git/284125
>
> help this?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG: git subtree split gets confused on removed and readded directory
  2016-01-15 23:44 ` Junio C Hamano
  2016-01-17 19:34   ` David Ware
@ 2016-01-17 23:23   ` David A. Greene
  2016-01-20  1:17     ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann
  1 sibling, 1 reply; 12+ messages in thread
From: David A. Greene @ 2016-01-17 23:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dave Ware, git, Marcus Brinkmann

Junio C Hamano <gitster@pobox.com> writes:

> Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:
>
>> I made a simple test repository showing the problem here:
>> https://github.com/lambdafu/git-subtree-split-test>
>> After creating the master branch, I created the split/bar branch like this:
>>
>> $ git subtree split -P bar -b split/bar
>>
>> The resulting history is confused by the directory "bar" which was
>> added, removed and then re-added again.  The recent history up to adding
>> the directory the second time is fine.  But then it seems to loose track
>> and add the parent of that commit up to the initial commit in the history.
>>
>> I'd expect that the parent of the readding commit is an empty tree
>> commit (which removed the last files in the directory), and that before
>> that are commits that reflect the initial creation of that directory
>> with its files, but rewritten as a subtree, of course.
>
> Thanks for a report.

Yes, thank you!

> David, does this ring a bell?

No, I have not run into this before.  I'm actually going to be working
in the split code starting sometime this month (work allowing, of
course).  So it's great to get a report like this.

One of the things I want to do is eventually move over subtree split to
using a proper filter-branch instead of the entirely custom code that's
currently there.  This does, however, appear to cause a semntic
difference in preliminary testing which I am still tracking down.  The
filter-based split is *incredibly* faster than the current code.  The
current code can take hours on moderately-sized histories.

This should shake out a lot of these kinds of problems since the
filter-branch code is heavily used and tested while the subtree split
code is not.

Assuming this goes ahead, I plan to introduce a new switch to control
filter-branch vs. original code and migrate the default to filter-branch
if all goes well.

I'll write up a failing test for this so that I remember to address it
when I get to the code.

Thanks again, Marcus!

                         -David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory)
  2016-01-17 23:23   ` David A. Greene
@ 2016-01-20  1:17     ` Marcus Brinkmann
  2016-01-20  4:05       ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene
  0 siblings, 1 reply; 12+ messages in thread
From: Marcus Brinkmann @ 2016-01-20  1:17 UTC (permalink / raw)
  To: David A. Greene, Junio C Hamano; +Cc: Dave Ware, git

'git subtree split' will fail if the history of the subtree has empty
tree commits (or trees that are considered empty, such as submodules).
This fix keeps track of this condition and correctly follows the history
over such commits.

Signed-off-by: Marcus Brinkmann <m.brinkmann@semantics.de>
---
 contrib/subtree/git-subtree.sh | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index edf36f8..b68828b 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -36,6 +36,8 @@ PATH=$PATH:$(git --exec-path)

 require_work_tree

+EMPTY_TREE=`git hash-object -t tree /dev/null`
+
 quiet=
 branch=
 debug=
@@ -449,7 +451,8 @@ copy_or_skip()
 	rev="$1"
 	tree="$2"
 	newparents="$3"
-	assert [ -n "$tree" ]
+
+	[ -z "$tree" ] && tree=$EMPTY_TREE

 	identical=
 	nonidentical=
@@ -603,6 +606,7 @@ cmd_split()
 	revmax=$(eval "$grl" | wc -l)
 	revcount=0
 	createcount=0
+	found_first_commit=
 	eval "$grl" |
 	while read rev parents; do
 		revcount=$(($revcount + 1))
@@ -625,12 +629,16 @@ cmd_split()
 		
 		# ugly.  is there no better way to tell if this is a subtree
 		# vs. a mainline commit?  Does it matter?
-		if [ -z $tree ]; then
-			set_notree $rev
-			if [ -n "$newparents" ]; then
-				cache_set $rev $rev
+		if [ -z $found_first_commit ]; then
+			if [ -z $tree ]; then
+				set_notree $rev
+				if [ -n "$newparents" ]; then
+					cache_set $rev $rev
+				fi
+				continue
+			else
+				found_first_commit=yes
 			fi
-			continue
 		fi

 		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-20  1:17     ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann
@ 2016-01-20  4:05       ` David A. Greene
  2016-01-20 11:22         ` Marcus Brinkmann
  2016-01-24 13:07         ` Marcus Brinkmann
  0 siblings, 2 replies; 12+ messages in thread
From: David A. Greene @ 2016-01-20  4:05 UTC (permalink / raw)
  To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git

Marcus Brinkmann <m.brinkmann@semantics.de> writes:

> 'git subtree split' will fail if the history of the subtree has empty
> tree commits (or trees that are considered empty, such as submodules).
> This fix keeps track of this condition and correctly follows the history
> over such commits.

Thanks for working on this!  Please add a test to t7900-subtree.sh.

> @@ -625,12 +629,16 @@ cmd_split()
>  		
>  		# ugly.  is there no better way to tell if this is a subtree
>  		# vs. a mainline commit?  Does it matter?
> -		if [ -z $tree ]; then
> -			set_notree $rev
> -			if [ -n "$newparents" ]; then
> -				cache_set $rev $rev
> +		if [ -z $found_first_commit ]; then
> +			if [ -z $tree ]; then
> +				set_notree $rev
> +				if [ -n "$newparents" ]; then
> +					cache_set $rev $rev
> +				fi
> +				continue
> +			else
> +				found_first_commit=yes
>  			fi
> -			continue
>  		fi
>
>  		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?

Can you explain the logic here?  The old code appears to be using the
lack of a tree to filter out "mainline" commits from the subtree history
when splitting.  If that test is only done before seeing a proper
subtree commit and never after, then any commit mainline commit
following the first subtree commit in the rev list will miss being
marked with set_notree and the cache will not have the identity entry
added.

Test #36 in t7900-subtree.sh has a mainline commit listed after the
first subtree commit in the rev list, I believe.

I'm not positive your change is wrong, I'd just like to understand it
better.  I'd also like a comment explaining why it works so future
developers don't get confused.  Overall, I am trying to better comment
the code as I make my own changes.

                           -David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-20  4:05       ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene
@ 2016-01-20 11:22         ` Marcus Brinkmann
  2016-01-28  2:55           ` David A. Greene
  2016-01-24 13:07         ` Marcus Brinkmann
  1 sibling, 1 reply; 12+ messages in thread
From: Marcus Brinkmann @ 2016-01-20 11:22 UTC (permalink / raw)
  To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git

On 01/20/2016 05:05 AM, David A. Greene wrote:
> Marcus Brinkmann <m.brinkmann@semantics.de> writes:
> 
>> 'git subtree split' will fail if the history of the subtree has empty
>> tree commits (or trees that are considered empty, such as submodules).
>> This fix keeps track of this condition and correctly follows the history
>> over such commits.
> 
> Thanks for working on this!  Please add a test to t7900-subtree.sh.

I couldn't get the tests to run and I couldn't find documentation on how
to run it.  If you enlighten me I can add a test :)

>> @@ -625,12 +629,16 @@ cmd_split()
>>  		
>>  		# ugly.  is there no better way to tell if this is a subtree
>>  		# vs. a mainline commit?  Does it matter?
>> -		if [ -z $tree ]; then
>> -			set_notree $rev
>> -			if [ -n "$newparents" ]; then
>> -				cache_set $rev $rev
>> +		if [ -z $found_first_commit ]; then
>> +			if [ -z $tree ]; then
>> +				set_notree $rev
>> +				if [ -n "$newparents" ]; then
>> +					cache_set $rev $rev
>> +				fi
>> +				continue
>> +			else
>> +				found_first_commit=yes
>>  			fi
>> -			continue
>>  		fi
>>
>>  		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
> 
> Can you explain the logic here?  The old code appears to be using the
> lack of a tree to filter out "mainline" commits from the subtree history
> when splitting.  If that test is only done before seeing a proper
> subtree commit and never after, then any commit mainline commit
> following the first subtree commit in the rev list will miss being
> marked with set_notree and the cache will not have the identity entry
> added.
> 
> Test #36 in t7900-subtree.sh has a mainline commit listed after the
> first subtree commit in the rev list, I believe.
> 
> I'm not positive your change is wrong, I'd just like to understand it
> better.  I'd also like a comment explaining why it works so future
> developers don't get confused.  Overall, I am trying to better comment
> the code as I make my own changes.

It's possible the patch does not work for some cases.  For example, I
don't know how the rejoin variant of splits work.

Some observations:

1) The notree list is never actually used except to identify which
commits have been visited in check_parents.

2) I have no idea what use case is covered by the "if [ -n "$newparents"
]; then cache_set $rev $rev; fi".  I left it in purely for traditional
reasons.  So, clarifying that would go a long way in understanding the
code, and if there is a test for that, I will figure it out.

3a) The bug happens because on the first commit that deletes the subdir,
newparents will not be empty, and the "cache_set $rev $rev" will kick in
and subsequently (when the subdir is added again) the history will
divert into the $rev commit which is not rewritten, but part of the
unsplit tree.  This seems very wrong to me!  See 2).

3b) To be very clear: It seems logically inconsistent to me to ever call
set_notree and cache_set on the same rev.  It also seems logically
inconsistent to me to call cache_set rev1 rev2 where rev2 is not
rewritten.  Both seem to be invariant errors that could be caught by
assertions.  They probably should.  In fact, I think my patch makes the
questionable if-case to be dead code, because newparents is never
non-empty before found_first_commit is true.  As such, I think it could
be eliminated.  But I am not 100% sure, as I don't know the intention of
the original code.

4) My patch only preserves the special handling of empty trees up to the
first commit that introduces subdir, because we don't want an empty
commit at the beginning.  After that, empty subdirs are not special at
all - the empty tree is replaced by EMPTY_TREE and handled as if it's a
normal subdir commit.  copy_and_skip will do the right thing.

5) I didn't test any case with multiple parents (merge commits).  There
are several of those cases (merge commits into empty subdirs, branches
with different non-empty subdirs from empty ones), and they don't apply
to my use case (git-svn conversion).  I read the copy_and_skip code and
see that it optimizes some of those cases, and although I didn't see an
obvious problem, I didn't think too deeply about it.

Thanks,
Marcus



-- 
s<e>mantics GmbH
Viktoriaallee 45
52066 Aachen
Web: www.semantics.de
Registergericht  : Amtsgericht Aachen, HRB 8189
Geschäftsführer  : Kay Heiligenhaus M.A.
                   Dipl. Ing. José de la Rosa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-20  4:05       ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene
  2016-01-20 11:22         ` Marcus Brinkmann
@ 2016-01-24 13:07         ` Marcus Brinkmann
  2016-01-28  2:56           ` David A. Greene
  1 sibling, 1 reply; 12+ messages in thread
From: Marcus Brinkmann @ 2016-01-24 13:07 UTC (permalink / raw)
  To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git

With my patch, "git subtree split -P" produces the same result (for my
data set) as "git filter-branch --subdirectory-filter", which is much
faster, because it selects the revisions to rewrite before rewriting.
As I am not using any of the advanced features of "git subtree", I will
just use "git filter-branch" instead.

Thanks!
Marcus

On 01/20/2016 05:05 AM, David A. Greene wrote:
> Marcus Brinkmann <m.brinkmann@semantics.de> writes:
> 
>> 'git subtree split' will fail if the history of the subtree has empty
>> tree commits (or trees that are considered empty, such as submodules).
>> This fix keeps track of this condition and correctly follows the history
>> over such commits.
> 
> Thanks for working on this!  Please add a test to t7900-subtree.sh.
> 
>> @@ -625,12 +629,16 @@ cmd_split()
>>  		
>>  		# ugly.  is there no better way to tell if this is a subtree
>>  		# vs. a mainline commit?  Does it matter?
>> -		if [ -z $tree ]; then
>> -			set_notree $rev
>> -			if [ -n "$newparents" ]; then
>> -				cache_set $rev $rev
>> +		if [ -z $found_first_commit ]; then
>> +			if [ -z $tree ]; then
>> +				set_notree $rev
>> +				if [ -n "$newparents" ]; then
>> +					cache_set $rev $rev
>> +				fi
>> +				continue
>> +			else
>> +				found_first_commit=yes
>>  			fi
>> -			continue
>>  		fi
>>
>>  		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
> 
> Can you explain the logic here?  The old code appears to be using the
> lack of a tree to filter out "mainline" commits from the subtree history
> when splitting.  If that test is only done before seeing a proper
> subtree commit and never after, then any commit mainline commit
> following the first subtree commit in the rev list will miss being
> marked with set_notree and the cache will not have the identity entry
> added.
> 
> Test #36 in t7900-subtree.sh has a mainline commit listed after the
> first subtree commit in the rev list, I believe.
> 
> I'm not positive your change is wrong, I'd just like to understand it
> better.  I'd also like a comment explaining why it works so future
> developers don't get confused.  Overall, I am trying to better comment
> the code as I make my own changes.
> 
>                            -David
> 


-- 
s<e>mantics GmbH
Viktoriaallee 45
52066 Aachen
Web: www.semantics.de
Registergericht  : Amtsgericht Aachen, HRB 8189
Geschäftsführer  : Kay Heiligenhaus M.A.
                   Dipl. Ing. José de la Rosa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-20 11:22         ` Marcus Brinkmann
@ 2016-01-28  2:55           ` David A. Greene
  0 siblings, 0 replies; 12+ messages in thread
From: David A. Greene @ 2016-01-28  2:55 UTC (permalink / raw)
  To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git

[ Sorry it took a few days to reply.  I am absolutely slammed at work
  and will be for the next few weeks at least.  The good news is that
  it's resulting in some nice work on git-subtree!  :) ]

Marcus Brinkmann <m.brinkmann@semantics.de> writes:

> On 01/20/2016 05:05 AM, David A. Greene wrote:
>> Marcus Brinkmann <m.brinkmann@semantics.de> writes:
>> 
>>> 'git subtree split' will fail if the history of the subtree has empty
>>> tree commits (or trees that are considered empty, such as submodules).
>>> This fix keeps track of this condition and correctly follows the history
>>> over such commits.
>> 
>> Thanks for working on this!  Please add a test to t7900-subtree.sh.
>
> I couldn't get the tests to run and I couldn't find documentation on how
> to run it.  If you enlighten me I can add a test :)

Just run "make test" in contrib/subtree.  You have to build git first.

>> Can you explain the logic here?  The old code appears to be using the
>> lack of a tree to filter out "mainline" commits from the subtree history
>> when splitting.  If that test is only done before seeing a proper
>> subtree commit and never after, then any commit mainline commit
>> following the first subtree commit in the rev list will miss being
>> marked with set_notree and the cache will not have the identity entry
>> added.
>> 
>> Test #36 in t7900-subtree.sh has a mainline commit listed after the
>> first subtree commit in the rev list, I believe.
>> 
>> I'm not positive your change is wrong, I'd just like to understand it
>> better.  I'd also like a comment explaining why it works so future
>> developers don't get confused.  Overall, I am trying to better comment
>> the code as I make my own changes.
>
> It's possible the patch does not work for some cases.  For example, I
> don't know how the rejoin variant of splits work.

I'm not so much worried about catching all cases of the bug you
identified, though it would be good if the patch did.  I'm much more
concerned about not causing a regression in existing functionality.

> Some observations:
>
> 1) The notree list is never actually used except to identify which
> commits have been visited in check_parents.

It's really verifying that we visited parents before children in the
split code, I think.  That seems like a good check to keep.

Let me make sure I understand your fix too.  Are you essentially
skipping empty commits when splitting?  You original patch said that
split failed but didn't say how.  Did git-subtree spit out an error
message, or did the failure manifest in some other way?  If I knew the
failure mode it might help me understand your changes better.

I see you explain the proble below (thanks!) but I'd still like to know
how you discovered it.  It will help in constructing a test.

> 2) I have no idea what use case is covered by the "if [ -n "$newparents"
> ]; then cache_set $rev $rev; fi".  I left it in purely for traditional
> reasons.  So, clarifying that would go a long way in understanding the
> code, and if there is a test for that, I will figure it out.

As far as I understand things, $newparents being non-empty means that
the commits parents were split out and $newparents contains the hashes
of the split commits, so that when this commit is split it can set up
the proper parent links.

If the commit doesn't have a tree in the subdirectory, then I *think*
the split codesimply sets the identity entry in the cache so that any
future commit that has this (empty) one as a parent will see it in the
$newparents list.  Since copy_or_skip checks for parents with empty
trees and does not link split commits to them, I don't understand the
purpose of including these commits in $newparents.

So you may very well be right that it just doesn't matter if we skip
these altogether.

> 3a) The bug happens because on the first commit that deletes the subdir,
> newparents will not be empty, and the "cache_set $rev $rev" will kick in
> and subsequently (when the subdir is added again) the history will
> divert into the $rev commit which is not rewritten, but part of the
> unsplit tree.  This seems very wrong to me!  See 2).

Ah, so the problem isn't empty commits per se, it's the fact that a
subdirectory was deleted and re-added.  That makes sense.  I didn't
understand that from your original commit message though I now remember
you discussed it in the first e-mail you sent.  It would be good to
clarify this in the final commit.

I agree that the behavior you describe is wrong.  So the fix basically
relies on the fact that there is some tree in the subdirectory, which
later gets deleted, but since "found_first_commit" triggered, we'll just
skip those empty commits and never see them in the cache so that when a
tree appears again we won't link to mainline commits.  Have I got it
right?

Currently the split code is all-or-nothing.  You split the whole history
or none of it.  Eventually I want to make it flexible enough to allow
splitting ranges of commits or individual commits.  I'm wondering how
your code will work if the split range starts or ends within the set of
empty subdirectory commits.  It's not something you really have to worry
about since I'd deal with it when I write the code, but it's something
that popped into my head.

> 3b) To be very clear: It seems logically inconsistent to me to ever call
> set_notree and cache_set on the same rev.  It also seems logically
> inconsistent to me to call cache_set rev1 rev2 where rev2 is not
> rewritten.  Both seem to be invariant errors that could be caught by
> assertions.  They probably should.  In fact, I think my patch makes the
> questionable if-case to be dead code, because newparents is never
> non-empty before found_first_commit is true.  As such, I think it could
> be eliminated.  But I am not 100% sure, as I don't know the intention of
> the original code.

After your walk-through and some exploring of the code, I think you are
right.  Like you I am not 100% sure but it would definitely be nice to
clean this bit of code up.

> 4) My patch only preserves the special handling of empty trees up to the
> first commit that introduces subdir, because we don't want an empty
> commit at the beginning.  After that, empty subdirs are not special at
> all - the empty tree is replaced by EMPTY_TREE and handled as if it's a
> normal subdir commit.  copy_and_skip will do the right thing.

I agree.

> 5) I didn't test any case with multiple parents (merge commits).  There
> are several of those cases (merge commits into empty subdirs, branches
> with different non-empty subdirs from empty ones), and they don't apply
> to my use case (git-svn conversion).  I read the copy_and_skip code and
> see that it optimizes some of those cases, and although I didn't see an
> obvious problem, I didn't think too deeply about it.

Yeah, that's definitely a concern.  I've run some tests on the current
git-subtree with merge commits and IIRC the results made sense.  I
should add those tests to the testbase.

I'm definitely inclined to accept your patch after this discussion but I
do want to see a few things:

1. A testcase

2. A more expanded commit message basically describing what you said in
   3-4 above, plus a bit more from your original e-mail describing the
   subdirectory delete and re-creation.  It will help people who come
   along later understand the history of the code.

3. Remove the questionable cache_set.  I agree that it seems wrong and
   copy_and_skip ignores it anyway.  Let's get rid of this cruft.
   Include something in the commit message about why this was done.

Thanks for the thorough explanation and for your work on this.  If you
can re-roll with the above and the existing tests pass, I think we can
pass this on to Junio.

Looking forward to the re-roll!

                           -David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-24 13:07         ` Marcus Brinkmann
@ 2016-01-28  2:56           ` David A. Greene
  2016-01-28  4:06             ` Marcus Brinkmann
  0 siblings, 1 reply; 12+ messages in thread
From: David A. Greene @ 2016-01-28  2:56 UTC (permalink / raw)
  To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git

Marcus Brinkmann <m.brinkmann@semantics.de> writes:

> With my patch, "git subtree split -P" produces the same result (for my
> data set) as "git filter-branch --subdirectory-filter", which is much
> faster, because it selects the revisions to rewrite before rewriting.
> As I am not using any of the advanced features of "git subtree", I will
> just use "git filter-branch" instead.

Heh.  :)

I hope to replace all that ugly split code with filter-branch as you
describe but there are some cases where it differs.  It may be that your
changes fix some of that.

Are you still able to do a re-roll on this?

                      -David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-28  2:56           ` David A. Greene
@ 2016-01-28  4:06             ` Marcus Brinkmann
  2016-02-03  2:34               ` David A. Greene
  0 siblings, 1 reply; 12+ messages in thread
From: Marcus Brinkmann @ 2016-01-28  4:06 UTC (permalink / raw)
  To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git

On 01/28/2016 03:56 AM, David A. Greene wrote:
> Marcus Brinkmann <m.brinkmann@semantics.de> writes:
>
>> With my patch, "git subtree split -P" produces the same result (for my
>> data set) as "git filter-branch --subdirectory-filter", which is much
>> faster, because it selects the revisions to rewrite before rewriting.
>> As I am not using any of the advanced features of "git subtree", I will
>> just use "git filter-branch" instead.
>
> Heh.  :)
>
> I hope to replace all that ugly split code with filter-branch as you
> describe but there are some cases where it differs.  It may be that your
> changes fix some of that.
>
> Are you still able to do a re-roll on this?

I have to admit that my interest has declined steeply since discovering 
that subtree-split and filter-branch --subtree-filter give different 
results from "git svn" on the subdirectory.  The reason is that git-svn 
includes all commits for revisions that regular "svn log" gives on that 
directory, which includes commits that serve as branch points only or 
that are empty except for unhandled properties.

While empty commits for unhandled properties wouldn't be fatal, missing 
branch points make "git svn" really unhappy when asked to rebuild .git/svn.

As migration from SVN is my main motivation at this point to use a 
subtree filter at this point (git-svn is just very slow - about one week 
on our repository), I am somewhat stuck and back to using git-svn. 
Although hacking up something with filter-branch seems like a remote 
option, it's probably nothing that generalizes.

It didn't help that "make test" in contrib/subtree gives me 27 out of 29 
failed tests (with no indication how to figure out what exactly failed).

Oh well :)

Marcus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] contrib/subtree: Split history with empty trees correctly
  2016-01-28  4:06             ` Marcus Brinkmann
@ 2016-02-03  2:34               ` David A. Greene
  0 siblings, 0 replies; 12+ messages in thread
From: David A. Greene @ 2016-02-03  2:34 UTC (permalink / raw)
  To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git

Marcus Brinkmann <m.brinkmann@semantics.de> writes:

>> Are you still able to do a re-roll on this?
>
> I have to admit that my interest has declined steeply since
> discovering that subtree-split and filter-branch --subtree-filter give
> different results from "git svn" on the subdirectory.  The reason is
> that git-svn includes all commits for revisions that regular "svn log"
> gives on that directory, which includes commits that serve as branch
> points only or that are empty except for unhandled properties.

What do you mean by "branch points only?"

It's ok if you can't do a reroll.  I can't work on it right now but
perhaps when I get back to cleaning up the split code I can take what
you have and incoporate it.  I do very much appreciate your work on
this!

> While empty commits for unhandled properties wouldn't be fatal,
> missing branch points make "git svn" really unhappy when asked to
> rebuild .git/svn.

[ I may have misunderstood your intent, see below. ]

I just want to make sure I understand your situation.  You used git-svn
to mirror a project to git and then used git-subtree to incorporate that
mirror into a larger project?

Why is the split being done?  If there's an active Subversion repository
being mirrors it's much better to commit changes back to the Subversion
repository than to the git mirror.

> As migration from SVN is my main motivation at this point to use a
> subtree filter at this point (git-svn is just very slow - about one
> week on our repository), I am somewhat stuck and back to using
> git-svn. Although hacking up something with filter-branch seems like a
> remote option, it's probably nothing that generalizes.

Ok, maybe I misunderstood your situation.  Are you converting one big
repository via git-svn and then trying to break out individual
directories into smaller projects?

git-svn + git-subtree/git-filter-branch is not the best way to do that.
svn-all-fast-export is far superior for a one-off conversion and makes
splitting repositories a breeze.  It happens during conversion rather
than as a post-processing step.

https://techbase.kde.org/Projects/MoveToGit/UsingSvn2Git

> It didn't help that "make test" in contrib/subtree gives me 27 out of
> 29 failed tests (with no indication how to figure out what exactly
> failed).

Huh.  I don't know why that would happen.  Did you build the git tools
first?  A testing run using --debug and --verbose (see the Makefile in
contrib/subtree/t) would be informative.  I understand if you don't have
time to do that.  I haven't seen such failures before so I'm curious as
to what happened.

                      -David

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-02-03  2:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-15 16:23 BUG: git subtree split gets confused on removed and readded directory Marcus Brinkmann
2016-01-15 23:44 ` Junio C Hamano
2016-01-17 19:34   ` David Ware
2016-01-17 23:23   ` David A. Greene
2016-01-20  1:17     ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann
2016-01-20  4:05       ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene
2016-01-20 11:22         ` Marcus Brinkmann
2016-01-28  2:55           ` David A. Greene
2016-01-24 13:07         ` Marcus Brinkmann
2016-01-28  2:56           ` David A. Greene
2016-01-28  4:06             ` Marcus Brinkmann
2016-02-03  2:34               ` David A. Greene

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.