git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] subtree: fix split processing with multiple subtrees present
@ 2023-09-18 20:05 Zach FettersMoore via GitGitGadget
  2023-09-18 23:31 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-18 20:05 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

In the diagram below, 'M' represents the mainline repo branch, 'A'
represents one subtree, and 'B' represents another. M3 and B1 represent
a split commit for subtree B that was created from commit M4. M2 and A1
represent a split commit made from subtree A that was also created
based on changes back to and including M4. M1 represents new changes to
the repo, in this scenario if you try to run a 'git subtree split
--rejoin' for subtree B, commits M1, M2, and A1, will be included in
the processing of changes for the new split commit since the last
split/rejoin for subtree B was at M3. The issue is that by having A1
included in this processing the command ends up needing to processing
every commit down tree A even though none of that is needed or relevant
to the current command and result.

M1
 |	  \	  \
M2	   |	   |
 |     	  A1	   |
M3	   |	   |
 |	   |	  B1
M4	   |	   |

So this commit makes a change to the processing of commits for the split
command in order to ignore non-mainline commits from other subtrees such
as A1 in the diagram by adding a new function
'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to still
function as expected but removes all of the unnecessary processing that
takes place currently which greatly inflates the processing time.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
    subtree: fix split processing with multiple subtrees present
    
    When there are multiple subtrees in a repo and git subtree split
    --rejoin is being used for the subtrees, the processing of commits for a
    new split can take a significant (and constantly growing) amount of time
    because the split commits from other subtrees cause the processing to
    have to scan the entire history of the other subtree(s). This patch
    filters out the other subtree split commits that are unnecessary for the
    split commit processing.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

 contrib/subtree/git-subtree.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..e9250dfb019 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,12 +778,29 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be ignored from processing for splits
+should_ignore_subtree_commit () {
+  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
+  then
+    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
+    then
+      return 0
+    fi
+  fi
+  return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
 	local rev="$1"
 	local parents="$2"
 
+    if should_ignore_subtree_commit $rev
+    then
+	    return
+    fi
+
 	if test $indent -eq 0
 	then
 		revcount=$(($revcount + 1))

base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] subtree: fix split processing with multiple subtrees present
  2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
@ 2023-09-18 23:31 ` Junio C Hamano
  2023-09-19  1:04 ` Junio C Hamano
  2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
  2 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2023-09-18 23:31 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore

"Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>  contrib/subtree/git-subtree.sh | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e0c5d3b0de6..e9250dfb019 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -778,12 +778,29 @@ ensure_valid_ref_format () {
>  		die "fatal: '$1' does not look like a ref"
>  }
>  
> +# Usage: check if a commit from another subtree should be ignored from processing for splits
> +should_ignore_subtree_commit () {
> +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
> +  then
> +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
> +    then
> +      return 0
> +    fi
> +  fi
> +  return 1
> +}
>
>  # Usage: process_split_commit REV PARENTS
>  process_split_commit () {
>  	assert test $# = 2
>  	local rev="$1"
>  	local parents="$2"
>  
> +    if should_ignore_subtree_commit $rev
> +    then
> +	    return
> +    fi
> +

Please do not violate Documentation/CodingGuidelines for our shell
scripted Porcelain, even if it is a script in contrib/ and also
please avoid bash-isms.

Also doesn't "subtree" have its own test?  If this change is a fix
for some problem(s), can we have a test or two that demonstrate how
the current code without the patch is broken?

Thanks.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] subtree: fix split processing with multiple subtrees present
  2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2023-09-18 23:31 ` Junio C Hamano
@ 2023-09-19  1:04 ` Junio C Hamano
  2023-10-26 19:59   ` Zach FettersMoore
  2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2023-09-19  1:04 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore

"Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> In the diagram below, 'M' represents the mainline repo branch, 'A'
> represents one subtree, and 'B' represents another. M3 and B1 represent
> a split commit for subtree B that was created from commit M4. M2 and A1
> represent a split commit made from subtree A that was also created
> based on changes back to and including M4. M1 represents new changes to
> the repo, in this scenario if you try to run a 'git subtree split
> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> the processing of changes for the new split commit since the last
> split/rejoin for subtree B was at M3. The issue is that by having A1
> included in this processing the command ends up needing to processing
> every commit down tree A even though none of that is needed or relevant
> to the current command and result.
>
> M1
>  |      \       \
> M2       |       |
>  |      A1       |
> M3       |       |
>  |       |      B1
> M4       |       |

The above paragraph explains which different things you drew in the
diagram are representing, but it is not clear how they relate to
each other.  Do they for example depict parent-child commit
relationship?  What are the wide gaps between these three tracks and
what are the short angled lines leaning to the left near the tip?
Is the time/topology flowing from bottom to top?

> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e0c5d3b0de6..e9250dfb019 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -778,12 +778,29 @@ ensure_valid_ref_format () {
>  		die "fatal: '$1' does not look like a ref"
>  }
>  
> +# Usage: check if a commit from another subtree should be ignored from processing for splits

Way overlong line.  Please split them accordingly.  I won't comment
on what CodingGuidelines tells us already, in this review, but have
a few comments here:

> +should_ignore_subtree_commit () {
> +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
> +  then
> +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]

Here $dir is a free variable that comes from outside.  The caller
does not supply it as a parameter to this function (and the caller
does not receive it as its parameter from its caller).  Yet the file
as a whole seems to liberally make assignments to it ("git grep dir="
on the file counts 7 assignments).  Are we sure we are looking for
the right $dir in this particular grep?

	Side note: I am not familiar with this part of the code at
	all, so do not take it as "here is a bug", but more as "this
	smells error prone."

Also can $dir have regular expressions special characters?  "The
existing code and new code alike, git-subtree is not prepared to 
handle directory names with RE special characters well at all, so
do not use them if you do not want your history broken" is an
acceptable answer.

The caller of this function process_split_commit is cmd_split and
process_split_commit (hence this function) is called repeatedly
inside a loop.  This function makes a traversal over the entire
history for each and every iteration in "good" cases where there is
no 'mainline' or 'subtree-dir' commits for the given $dir.

I wonder if it is more efficient to enumerate all commits that hits
these grep criteria in the cmd_split before it starts to call
process_split_commit repeatedly.  If it knows which commit can be
ignored beforehand, it can skip and not call process_split_commit,
no?

> +    then
> +      return 0
> +    fi
> +  fi
> +  return 1
> +}
> +
>  # Usage: process_split_commit REV PARENTS
>  process_split_commit () {
>  	assert test $# = 2
>  	local rev="$1"
>  	local parents="$2"

These seem to assume that $1 and $2 can have $IFS in them, so
shouldn't ...

> +    if should_ignore_subtree_commit $rev

... this call too enclose $rev inside a pair of double-quotes for
consistency?  We know the loop in the cmd_split that calls this
function is reading from "rev-list --parents" and $rev is a 40-hex
commit object name (and $parents can have more than one 40-hex
commit object names separated with SP), so it is safe to leave $rev
unquoted, but it pays to be consistent to help make the code more
readable.

> +    then
> +	    return
> +    fi
> +
>  	if test $indent -eq 0
>  	then
>  		revcount=$(($revcount + 1))
>
> base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/2] subtree: fix split processing with multiple subtrees present
  2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2023-09-18 23:31 ` Junio C Hamano
  2023-09-19  1:04 ` Junio C Hamano
@ 2023-09-22 16:25 ` Zach FettersMoore via GitGitGadget
  2023-09-22 16:25   ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore

When there are multiple subtrees in a repo and git subtree split --rejoin is
being used for the subtrees, the processing of commits for a new split can
take a significant (and constantly growing) amount of time because the split
commits from other subtrees cause the processing to have to scan the entire
history of the other subtree(s). This patch filters out the other subtree
split commits that are unnecessary for the split commit processing.

Zach FettersMoore (2):
  subtree: fix split processing with multiple subtrees present
  subtree: changing location of commit ignore processing

 contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)


base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v1:

 1:  43175154a82 = 1:  43175154a82 subtree: fix split processing with multiple subtrees present
 -:  ----------- > 2:  d6811daf7cf subtree: changing location of commit ignore processing

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/2] subtree: fix split processing with multiple subtrees present
  2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
@ 2023-09-22 16:25   ` Zach FettersMoore via GitGitGadget
  2023-09-22 16:25   ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

In the diagram below, 'M' represents the mainline repo branch, 'A'
represents one subtree, and 'B' represents another. M3 and B1 represent
a split commit for subtree B that was created from commit M4. M2 and A1
represent a split commit made from subtree A that was also created
based on changes back to and including M4. M1 represents new changes to
the repo, in this scenario if you try to run a 'git subtree split
--rejoin' for subtree B, commits M1, M2, and A1, will be included in
the processing of changes for the new split commit since the last
split/rejoin for subtree B was at M3. The issue is that by having A1
included in this processing the command ends up needing to processing
every commit down tree A even though none of that is needed or relevant
to the current command and result.

M1
 |	  \	  \
M2	   |	   |
 |     	  A1	   |
M3	   |	   |
 |	   |	  B1
M4	   |	   |

So this commit makes a change to the processing of commits for the split
command in order to ignore non-mainline commits from other subtrees such
as A1 in the diagram by adding a new function
'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to still
function as expected but removes all of the unnecessary processing that
takes place currently which greatly inflates the processing time.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
 contrib/subtree/git-subtree.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..e9250dfb019 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,12 +778,29 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be ignored from processing for splits
+should_ignore_subtree_commit () {
+  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
+  then
+    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
+    then
+      return 0
+    fi
+  fi
+  return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
 	local rev="$1"
 	local parents="$2"
 
+    if should_ignore_subtree_commit $rev
+    then
+	    return
+    fi
+
 	if test $indent -eq 0
 	then
 		revcount=$(($revcount + 1))
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/2] subtree: changing location of commit ignore processing
  2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
  2023-09-22 16:25   ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget
@ 2023-09-22 16:25   ` Zach FettersMoore via GitGitGadget
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

Based on feedback from original commit:

-Updated the location of check whether a commit should
be ignored during split processing

-Updated code to better fit coding guidelines

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
 contrib/subtree/git-subtree.sh | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e9250dfb019..e69991a9d80 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,11 +778,13 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
-# Usage: check if a commit from another subtree should be ignored from processing for splits
-should_ignore_subtree_commit () {
-  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
   then
-    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
+    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
     then
       return 0
     fi
@@ -796,11 +798,6 @@ process_split_commit () {
 	local rev="$1"
 	local parents="$2"
 
-    if should_ignore_subtree_commit $rev
-    then
-	    return
-    fi
-
 	if test $indent -eq 0
 	then
 		revcount=$(($revcount + 1))
@@ -980,7 +977,20 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedParents=''
+		for parent in $parents
+		do
+			should_ignore_subtree_split_commit "$parent"
+			if test $? -eq 1
+			then
+				parsedParents+="$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedParents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present
  2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
  2023-09-22 16:25   ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget
  2023-09-22 16:25   ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
@ 2023-09-29 20:32   ` Zach FettersMoore via GitGitGadget
  2023-09-29 20:32     ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget
                       ` (3 more replies)
  2 siblings, 4 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore

When there are multiple subtrees in a repo and git subtree split --rejoin is
being used for the subtrees, the processing of commits for a new split can
take a significant (and constantly growing) amount of time because the split
commits from other subtrees cause the processing to have to scan the entire
history of the other subtree(s). This patch filters out the other subtree
split commits that are unnecessary for the split commit processing.

Zach FettersMoore (3):
  subtree: fix split processing with multiple subtrees present
  subtree: changing location of commit ignore processing
  subtree: adding test to validate fix

 contrib/subtree/git-subtree.sh     | 29 ++++++++++++++++++++-
 contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)


base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v2:

 1:  43175154a82 = 1:  43175154a82 subtree: fix split processing with multiple subtrees present
 2:  d6811daf7cf = 2:  d6811daf7cf subtree: changing location of commit ignore processing
 -:  ----------- > 3:  eff8bfcc042 subtree: adding test to validate fix

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 1/3] subtree: fix split processing with multiple subtrees present
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
@ 2023-09-29 20:32     ` Zach FettersMoore via GitGitGadget
  2023-09-29 20:32     ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

In the diagram below, 'M' represents the mainline repo branch, 'A'
represents one subtree, and 'B' represents another. M3 and B1 represent
a split commit for subtree B that was created from commit M4. M2 and A1
represent a split commit made from subtree A that was also created
based on changes back to and including M4. M1 represents new changes to
the repo, in this scenario if you try to run a 'git subtree split
--rejoin' for subtree B, commits M1, M2, and A1, will be included in
the processing of changes for the new split commit since the last
split/rejoin for subtree B was at M3. The issue is that by having A1
included in this processing the command ends up needing to processing
every commit down tree A even though none of that is needed or relevant
to the current command and result.

M1
 |	  \	  \
M2	   |	   |
 |     	  A1	   |
M3	   |	   |
 |	   |	  B1
M4	   |	   |

So this commit makes a change to the processing of commits for the split
command in order to ignore non-mainline commits from other subtrees such
as A1 in the diagram by adding a new function
'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to still
function as expected but removes all of the unnecessary processing that
takes place currently which greatly inflates the processing time.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
 contrib/subtree/git-subtree.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..e9250dfb019 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,12 +778,29 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be ignored from processing for splits
+should_ignore_subtree_commit () {
+  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
+  then
+    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
+    then
+      return 0
+    fi
+  fi
+  return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
 	local rev="$1"
 	local parents="$2"
 
+    if should_ignore_subtree_commit $rev
+    then
+	    return
+    fi
+
 	if test $indent -eq 0
 	then
 		revcount=$(($revcount + 1))
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 2/3] subtree: changing location of commit ignore processing
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2023-09-29 20:32     ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget
@ 2023-09-29 20:32     ` Zach FettersMoore via GitGitGadget
  2023-09-29 20:33     ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget
  2023-10-26 19:17     ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  3 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

Based on feedback from original commit:

-Updated the location of check whether a commit should
be ignored during split processing

-Updated code to better fit coding guidelines

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
 contrib/subtree/git-subtree.sh | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e9250dfb019..e69991a9d80 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,11 +778,13 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
-# Usage: check if a commit from another subtree should be ignored from processing for splits
-should_ignore_subtree_commit () {
-  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
   then
-    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
+    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
     then
       return 0
     fi
@@ -796,11 +798,6 @@ process_split_commit () {
 	local rev="$1"
 	local parents="$2"
 
-    if should_ignore_subtree_commit $rev
-    then
-	    return
-    fi
-
 	if test $indent -eq 0
 	then
 		revcount=$(($revcount + 1))
@@ -980,7 +977,20 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedParents=''
+		for parent in $parents
+		do
+			should_ignore_subtree_split_commit "$parent"
+			if test $? -eq 1
+			then
+				parsedParents+="$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedParents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 3/3] subtree: adding test to validate fix
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2023-09-29 20:32     ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget
  2023-09-29 20:32     ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
@ 2023-09-29 20:33     ` Zach FettersMoore via GitGitGadget
  2023-10-17 20:02       ` Zach FettersMoore
  2023-10-26 19:17     ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  3 siblings, 1 reply; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:33 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

Adding a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed does not increment.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrated that when using multiple subtrees,
A and B, when doing a split on subtree B, the
processing does not traverse the entire history
of subtree A which is unnecessary and would cause
the 'extracount' of processed commits to climb
based on the number of commits in the history of
subtree A.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
 contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 49a21dd7c9c..57c12e9f924 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -385,6 +385,47 @@ test_expect_success 'split sub dir/ with --rejoin' '
 	)
 '
 
+test_expect_success 'split with multiple subtrees' '
+	subtree_test_create_repo "$test_count" &&
+	subtree_test_create_repo "$test_count/subA" &&
+	subtree_test_create_repo "$test_count/subB" &&
+	test_create_commit "$test_count" main1 &&
+	test_create_commit "$test_count/subA" subA1 &&
+	test_create_commit "$test_count/subA" subA2 &&
+	test_create_commit "$test_count/subA" subA3 &&
+	test_create_commit "$test_count/subB" subB1 &&
+	(
+		cd "$test_count" &&
+		git fetch ./subA HEAD &&
+		git subtree add --prefix=subADir FETCH_HEAD
+	) &&
+	(
+		cd "$test_count" &&
+		git fetch ./subB HEAD &&
+		git subtree add --prefix=subBDir FETCH_HEAD
+	) &&
+	test_create_commit "$test_count" subADir/main-subA1 &&
+	test_create_commit "$test_count" subBDir/main-subB1 &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
+	) &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
+	) &&
+	test_create_commit "$test_count" subADir/main-subA2 &&
+	test_create_commit "$test_count" subBDir/main-subB2 &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
+	) &&
+	(
+		cd "$test_count" &&
+		test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
+	)
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 3/3] subtree: adding test to validate fix
  2023-09-29 20:33     ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget
@ 2023-10-17 20:02       ` Zach FettersMoore
  0 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore @ 2023-10-17 20:02 UTC (permalink / raw)
  To: gitgitgadget, git; +Cc: zach.fetters

From: "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>

> From: Zach FettersMoore <zach.fetters@apollographql.com>
> 
> Adding a test to validate that the proposed fix
> solves the issue.
> 
> The test accomplishes this by checking the output
> of the split command to ensure the output from
> the progress of 'process_split_commit' function
> that represents the 'extracount' of commits
> processed does not increment.
> 
> This was tested against the original functionality
> to show the test failed, and then with this fix
> to show the test passes.
> 
> This illustrated that when using multiple subtrees,
> A and B, when doing a split on subtree B, the
> processing does not traverse the entire history
> of subtree A which is unnecessary and would cause
> the 'extracount' of processed commits to climb
> based on the number of commits in the history of
> subtree A.
> 
> Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
> ---
>  contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index 49a21dd7c9c..57c12e9f924 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -385,6 +385,47 @@ test_expect_success 'split sub dir/ with --rejoin' '
>  	)
>  '
>  
> +test_expect_success 'split with multiple subtrees' '
> +	subtree_test_create_repo "$test_count" &&
> +	subtree_test_create_repo "$test_count/subA" &&
> +	subtree_test_create_repo "$test_count/subB" &&
> +	test_create_commit "$test_count" main1 &&
> +	test_create_commit "$test_count/subA" subA1 &&
> +	test_create_commit "$test_count/subA" subA2 &&
> +	test_create_commit "$test_count/subA" subA3 &&
> +	test_create_commit "$test_count/subB" subB1 &&
> +	(
> +		cd "$test_count" &&
> +		git fetch ./subA HEAD &&
> +		git subtree add --prefix=subADir FETCH_HEAD
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		git fetch ./subB HEAD &&
> +		git subtree add --prefix=subBDir FETCH_HEAD
> +	) &&
> +	test_create_commit "$test_count" subADir/main-subA1 &&
> +	test_create_commit "$test_count" subBDir/main-subB1 &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
> +	) &&
> +	test_create_commit "$test_count" subADir/main-subA2 &&
> +	test_create_commit "$test_count" subBDir/main-subB2 &&
> +	(
> +		cd "$test_count" &&
> +		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
> +	) &&
> +	(
> +		cd "$test_count" &&
> +		test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
> +	)
> +'
> +
>  test_expect_success 'split sub dir/ with --rejoin from scratch' '
>  	subtree_test_create_repo "$test_count" &&
>  	test_create_commit "$test_count" main1 &&
> --

Checking to see if there is something else I need to do with this
to get it across the finish line, wasn't sure if something else
was needed since it's been dormant since my last push a few 
weeks ago.

Thanks

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4] subtree: fix split processing with multiple subtrees present
  2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
                       ` (2 preceding siblings ...)
  2023-09-29 20:33     ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget
@ 2023-10-26 19:17     ` Zach FettersMoore via GitGitGadget
  2023-11-18 11:28       ` Christian Couder
  2023-11-28 21:17       ` [PATCH v5] " Zach FettersMoore via GitGitGadget
  3 siblings, 2 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-10-26 19:17 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

In the diagram below, 'M' represents the mainline repo branch, 'A'
represents one subtree, and 'B' represents another. M3 and B1 represent
a split commit for subtree B that was created from commit M4. M2 and A1
represent a split commit made from subtree A that was also created
based on changes back to and including M4. M1 represents new changes to
the repo, in this scenario if you try to run a 'git subtree split
--rejoin' for subtree B, commits M1, M2, and A1, will be included in
the processing of changes for the new split commit since the last
split/rejoin for subtree B was at M3. The issue is that by having A1
included in this processing the command ends up needing to processing
every commit down tree A even though none of that is needed or relevant
to the current command and result.

M1
 |	  \	  \
M2	   |	   |
 |     	  A1	   |
M3	   |	   |
 |	   |	  B1
M4	   |	   |

So this commit makes a change to the processing of commits for the split
command in order to ignore non-mainline commits from other subtrees such
as A1 in the diagram by adding a new function
'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to still
function as expected but removes all of the unnecessary processing that
takes place currently which greatly inflates the processing time.

Added a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed does not increment.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrated that when using multiple subtrees,
A and B, when doing a split on subtree B, the
processing does not traverse the entire history
of subtree A which is unnecessary and would cause
the 'extracount' of processed commits to climb
based on the number of commits in the history of
subtree A.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
    subtree: fix split processing with multiple subtrees present
    
    When there are multiple subtrees in a repo and git subtree split
    --rejoin is being used for the subtrees, the processing of commits for a
    new split can take a significant (and constantly growing) amount of time
    because the split commits from other subtrees cause the processing to
    have to scan the entire history of the other subtree(s). This patch
    filters out the other subtree split commits that are unnecessary for the
    split commit processing.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v3:

 1:  43175154a82 < -:  ----------- subtree: fix split processing with multiple subtrees present
 2:  d6811daf7cf < -:  ----------- subtree: changing location of commit ignore processing
 3:  eff8bfcc042 ! 1:  353152910eb subtree: adding test to validate fix
     @@ Metadata
      Author: Zach FettersMoore <zach.fetters@apollographql.com>
      
       ## Commit message ##
     -    subtree: adding test to validate fix
     +    subtree: fix split processing with multiple subtrees present
      
     -    Adding a test to validate that the proposed fix
     +    When there are multiple subtrees present in a repository and they are
     +    all using 'git subtree split', the 'split' command can take a
     +    significant (and constantly growing) amount of time to run even when
     +    using the '--rejoin' flag. This is due to the fact that when processing
     +    commits to determine the last known split to start from when looking
     +    for changes, if there has been a split/merge done from another subtree
     +    there will be 2 split commits, one mainline and one subtree, for the
     +    second subtree that are part of the processing. The non-mainline
     +    subtree split commit will cause the processing to always need to search
     +    the entire history of the given subtree as part of its processing even
     +    though those commits are totally irrelevant to the current subtree
     +    split being run.
     +
     +    In the diagram below, 'M' represents the mainline repo branch, 'A'
     +    represents one subtree, and 'B' represents another. M3 and B1 represent
     +    a split commit for subtree B that was created from commit M4. M2 and A1
     +    represent a split commit made from subtree A that was also created
     +    based on changes back to and including M4. M1 represents new changes to
     +    the repo, in this scenario if you try to run a 'git subtree split
     +    --rejoin' for subtree B, commits M1, M2, and A1, will be included in
     +    the processing of changes for the new split commit since the last
     +    split/rejoin for subtree B was at M3. The issue is that by having A1
     +    included in this processing the command ends up needing to processing
     +    every commit down tree A even though none of that is needed or relevant
     +    to the current command and result.
     +
     +    M1
     +     |        \       \
     +    M2         |       |
     +     |        A1       |
     +    M3         |       |
     +     |         |      B1
     +    M4         |       |
     +
     +    So this commit makes a change to the processing of commits for the split
     +    command in order to ignore non-mainline commits from other subtrees such
     +    as A1 in the diagram by adding a new function
     +    'should_ignore_subtree_commit' which is called during
     +    'process_split_commit'. This allows the split/rejoin processing to still
     +    function as expected but removes all of the unnecessary processing that
     +    takes place currently which greatly inflates the processing time.
     +
     +    Added a test to validate that the proposed fix
          solves the issue.
      
          The test accomplishes this by checking the output
     @@ Commit message
      
          Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
      
     + ## contrib/subtree/git-subtree.sh ##
     +@@ contrib/subtree/git-subtree.sh: ensure_valid_ref_format () {
     + 		die "fatal: '$1' does not look like a ref"
     + }
     + 
     ++# Usage: check if a commit from another subtree should be
     ++# ignored from processing for splits
     ++should_ignore_subtree_split_commit () {
     ++  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
     ++  then
     ++    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
     ++			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
     ++    then
     ++      return 0
     ++    fi
     ++  fi
     ++  return 1
     ++}
     ++
     + # Usage: process_split_commit REV PARENTS
     + process_split_commit () {
     + 	assert test $# = 2
     +@@ contrib/subtree/git-subtree.sh: cmd_split () {
     + 	eval "$grl" |
     + 	while read rev parents
     + 	do
     +-		process_split_commit "$rev" "$parents"
     ++		if should_ignore_subtree_split_commit "$rev"
     ++		then
     ++			continue
     ++		fi
     ++		parsedParents=''
     ++		for parent in $parents
     ++		do
     ++			should_ignore_subtree_split_commit "$parent"
     ++			if test $? -eq 1
     ++			then
     ++				parsedParents+="$parent "
     ++			fi
     ++		done
     ++		process_split_commit "$rev" "$parsedParents"
     + 	done || exit $?
     + 
     + 	latest_new=$(cache_get latest_new) || exit $?
     +
       ## contrib/subtree/t/t7900-subtree.sh ##
      @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --rejoin' '
       	)
     @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r
      +	) &&
      +	(
      +		cd "$test_count" &&
     -+		test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
     ++		test "$(git subtree split --prefix=subBDir --squash --rejoin \
     ++		 -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
      +	)
      +'
      +


 contrib/subtree/git-subtree.sh     | 29 ++++++++++++++++++++-
 contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..e69991a9d80 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,6 +778,20 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
+  then
+    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
+    then
+      return 0
+    fi
+  fi
+  return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
@@ -963,7 +977,20 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedParents=''
+		for parent in $parents
+		do
+			should_ignore_subtree_split_commit "$parent"
+			if test $? -eq 1
+			then
+				parsedParents+="$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedParents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 49a21dd7c9c..87d59afd761 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' '
 	)
 '
 
+test_expect_success 'split with multiple subtrees' '
+	subtree_test_create_repo "$test_count" &&
+	subtree_test_create_repo "$test_count/subA" &&
+	subtree_test_create_repo "$test_count/subB" &&
+	test_create_commit "$test_count" main1 &&
+	test_create_commit "$test_count/subA" subA1 &&
+	test_create_commit "$test_count/subA" subA2 &&
+	test_create_commit "$test_count/subA" subA3 &&
+	test_create_commit "$test_count/subB" subB1 &&
+	(
+		cd "$test_count" &&
+		git fetch ./subA HEAD &&
+		git subtree add --prefix=subADir FETCH_HEAD
+	) &&
+	(
+		cd "$test_count" &&
+		git fetch ./subB HEAD &&
+		git subtree add --prefix=subBDir FETCH_HEAD
+	) &&
+	test_create_commit "$test_count" subADir/main-subA1 &&
+	test_create_commit "$test_count" subBDir/main-subB1 &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
+	) &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
+	) &&
+	test_create_commit "$test_count" subADir/main-subA2 &&
+	test_create_commit "$test_count" subBDir/main-subB2 &&
+	(
+		cd "$test_count" &&
+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
+	) &&
+	(
+		cd "$test_count" &&
+		test "$(git subtree split --prefix=subBDir --squash --rejoin \
+		 -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
+	)
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] subtree: fix split processing with multiple subtrees present
  2023-09-19  1:04 ` Junio C Hamano
@ 2023-10-26 19:59   ` Zach FettersMoore
  0 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore @ 2023-10-26 19:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Zach FettersMoore via GitGitGadget, git

> Please do not violate Documentation/CodingGuidelines for our shell
> scripted Porcelain, even if it is a script in contrib/ and also
>please avoid bash-isms.

I believe I have resolved the CodingGuidelines issues.

> Also doesn't "subtree" have its own test?  If this change is a fix
> for some problem(s), can we have a test or two that demonstrate how
> the current code without the patch is broken?

I was able to add a test that validates against some of the metrics that
are tracked when running a split  for processing commits. Validated that
before my fix the test fails, and after my fix the test passes.

>> In the diagram below, 'M' represents the mainline repo branch, 'A'
>> represents one subtree, and 'B' represents another. M3 and B1 represent
>> a split commit for subtree B that was created from commit M4. M2 and A1
>> represent a split commit made from subtree A that was also created
>> based on changes back to and including M4. M1 represents new changes to
>> the repo, in this scenario if you try to run a 'git subtree split
>> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
>> the processing of changes for the new split commit since the last
>> split/rejoin for subtree B was at M3. The issue is that by having A1
>> included in this processing the command ends up needing to processing
>> every commit down tree A even though none of that is needed or relevant
>> to the current command and result.
>>
>> M1
>>  |      \       \
>> M2       |       |
>>  |      A1       |
>> M3       |       |
>>  |       |      B1
>> M4       |       |

> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other.  Do they for example depict parent-child commit
> relationship?  What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?

I am realizing I made a few mistakes with trying to illustrate the diagram
which I will attempt to make more clear below. As for the 3 columns in the
diagram, 'M' represents the mainline branch of the repo being developed in,
while column 'A' represents the history of a subtree 'A' included in the
repo, and column 'B' also represents the history of a subtree 'B' in the
repo. The diagram attempts to illustrate when a 'git subtree split --rejoin'
is used, that there is a commit made in the subtrees history, and that is
then merged into the mainline repo branch.

M1
 |
 |
M2 --- |
 |     A1
 |     |
M3 ---------- |
 |     |      B1
M4     |      |

Hopefully that helps better illustrate the state of the repo before the new
'git subtree split --rejoin' attempt and why it results in the described issue.

>> +should_ignore_subtree_commit () {
>> +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
>> +  then
>> +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside.  The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller).  Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments).  Are we sure we are looking for
> the right $dir in this particular grep?
>
>  Side note: I am not familiar with this part of the code at
>  all, so do not take it as "here is a bug", but more as "this
>  smells error prone."

From my testing and what I see for '$dir' usage in the 'cmd_split'
function which leads to this code it is the correct '$dir', although
I see your point about it being reassigned in different places which
makes it error prone. I switched this to use the command
line argument '$arg_prefix' since the subtree prefix passed into
the command is what we actually want in this case so we can filter
out commits from other subtrees.

> Also can $dir have regular expressions special characters?  "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.

As far as I can tell from looking at the code (which I only recently
started using) the '$dir' which is based on the subtree prefix is
not setup to handle this.

> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop.  This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly.  If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?

Moved this functionality into the 'cmd_split' function as suggested.

>> +    then
>> +      return 0
>> +    fi
>> +  fi
>> +  return 1
>> +}
>> +
>>  # Usage: process_split_commit REV PARENTS
>>  process_split_commit () {
>>   assert test $# = 2
>>   local rev="$1"
>>   local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
>> +    if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency?  We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.

Updated this for consistency


On Mon, Sep 18, 2023 at 9:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > In the diagram below, 'M' represents the mainline repo branch, 'A'
> > represents one subtree, and 'B' represents another. M3 and B1 represent
> > a split commit for subtree B that was created from commit M4. M2 and A1
> > represent a split commit made from subtree A that was also created
> > based on changes back to and including M4. M1 represents new changes to
> > the repo, in this scenario if you try to run a 'git subtree split
> > --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> > the processing of changes for the new split commit since the last
> > split/rejoin for subtree B was at M3. The issue is that by having A1
> > included in this processing the command ends up needing to processing
> > every commit down tree A even though none of that is needed or relevant
> > to the current command and result.
> >
> > M1
> >  |      \       \
> > M2       |       |
> >  |      A1       |
> > M3       |       |
> >  |       |      B1
> > M4       |       |
>
> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other.  Do they for example depict parent-child commit
> relationship?  What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?
>
> > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> > index e0c5d3b0de6..e9250dfb019 100755
> > --- a/contrib/subtree/git-subtree.sh
> > +++ b/contrib/subtree/git-subtree.sh
> > @@ -778,12 +778,29 @@ ensure_valid_ref_format () {
> >               die "fatal: '$1' does not look like a ref"
> >  }
> >
> > +# Usage: check if a commit from another subtree should be ignored from processing for splits
>
> Way overlong line.  Please split them accordingly.  I won't comment
> on what CodingGuidelines tells us already, in this review, but have
> a few comments here:
>
> > +should_ignore_subtree_commit () {
> > +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
> > +  then
> > +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside.  The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller).  Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments).  Are we sure we are looking for
> the right $dir in this particular grep?
>
>         Side note: I am not familiar with this part of the code at
>         all, so do not take it as "here is a bug", but more as "this
>         smells error prone."
>
> Also can $dir have regular expressions special characters?  "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.
>
> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop.  This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly.  If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?
>
> > +    then
> > +      return 0
> > +    fi
> > +  fi
> > +  return 1
> > +}
> > +
> >  # Usage: process_split_commit REV PARENTS
> >  process_split_commit () {
> >       assert test $# = 2
> >       local rev="$1"
> >       local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
> > +    if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency?  We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.
>
> > +    then
> > +         return
> > +    fi
> > +
> >       if test $indent -eq 0
> >       then
> >               revcount=$(($revcount + 1))
> >
> > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] subtree: fix split processing with multiple subtrees present
  2023-10-26 19:17     ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
@ 2023-11-18 11:28       ` Christian Couder
  2023-11-28 21:04         ` Zach FettersMoore
  2023-11-28 21:17       ` [PATCH v5] " Zach FettersMoore via GitGitGadget
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2023-11-18 11:28 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore

On Thu, Oct 26, 2023 at 9:18 PM Zach FettersMoore via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Zach FettersMoore <zach.fetters@apollographql.com>
>
> When there are multiple subtrees present in a repository and they are
> all using 'git subtree split', the 'split' command can take a
> significant (and constantly growing) amount of time to run even when
> using the '--rejoin' flag. This is due to the fact that when processing
> commits to determine the last known split to start from when looking
> for changes, if there has been a split/merge done from another subtree
> there will be 2 split commits, one mainline and one subtree, for the
> second subtree that are part of the processing. The non-mainline
> subtree split commit will cause the processing to always need to search
> the entire history of the given subtree as part of its processing even
> though those commits are totally irrelevant to the current subtree
> split being run.

Thanks for your continued work on this!

I am not familiar with git subtree so I might miss obvious things. On
the other hand, my comments might help increase a bit the number of
people who could review this patch.

> In the diagram below, 'M' represents the mainline repo branch, 'A'
> represents one subtree, and 'B' represents another. M3 and B1 represent
> a split commit for subtree B that was created from commit M4. M2 and A1
> represent a split commit made from subtree A that was also created
> based on changes back to and including M4. M1 represents new changes to
> the repo, in this scenario if you try to run a 'git subtree split
> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> the processing of changes for the new split commit since the last
> split/rejoin for subtree B was at M3. The issue is that by having A1
> included in this processing the command ends up needing to processing
> every commit down tree A even though none of that is needed or relevant
> to the current command and result.
>
> M1
>  |        \       \
> M2         |       |
>  |        A1       |
> M3         |       |
>  |         |      B1
> M4         |       |

About the above, Junio already commented the following:

-> The above paragraph explains which different things you drew in the
-> diagram are representing, but it is not clear how they relate to
-> each other.  Do they for example depict parent-child commit
-> relationship?  What are the wide gaps between these three tracks and
-> what are the short angled lines leaning to the left near the tip?
-> Is the time/topology flowing from bottom to top?

and it doesn't look like you have addressed that comment.

When you say "M3 and B1 represent a split commit for subtree B that
was created from commit M4." I am not sure what it means exactly.
Could you give example commands that could have created the M3 and B1
commits?

> So this commit makes a change to the processing of commits for the split
> command in order to ignore non-mainline commits from other subtrees such
> as A1 in the diagram by adding a new function
> 'should_ignore_subtree_commit' which is called during
> 'process_split_commit'. This allows the split/rejoin processing to still
> function as expected but removes all of the unnecessary processing that
> takes place currently which greatly inflates the processing time.

Could you tell a bit more what kind of processing time reduction is or
would be possible on what kind of repo? Have you benchmark-ed or just
timed this somehow on one of your repos or better on an open source
repo (so that we could reproduce if we wanted)?

> Added a test to validate that the proposed fix
> solves the issue.
>
> The test accomplishes this by checking the output
> of the split command to ensure the output from
> the progress of 'process_split_commit' function
> that represents the 'extracount' of commits
> processed does not increment.

Does not increment compared to what?

> This was tested against the original functionality
> to show the test failed, and then with this fix
> to show the test passes.
>
> This illustrated that when using multiple subtrees,
> A and B, when doing a split on subtree B, the
> processing does not traverse the entire history
> of subtree A which is unnecessary and would cause
> the 'extracount' of processed commits to climb
> based on the number of commits in the history of
> subtree A.

Does this mean that the test checks that the extracount is the same
when subtree A exists as when it doesn't exist?

[...]

>  contrib/subtree/git-subtree.sh     | 29 ++++++++++++++++++++-
>  contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e0c5d3b0de6..e69991a9d80 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -778,6 +778,20 @@ ensure_valid_ref_format () {
>                 die "fatal: '$1' does not look like a ref"
>  }
>
> +# Usage: check if a commit from another subtree should be
> +# ignored from processing for splits
> +should_ignore_subtree_split_commit () {

Maybe adding:

    assert test $# = 1
    local rev="$1"

here, and using $rev instead of $1 in this function could make things
a bit clearer and similar to what is done in other functions.

> +  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
> +  then
> +    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
> +                       test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
> +    then
> +      return 0
> +    fi
> +  fi
> +  return 1
> +}

The above doesn't seem to be properly indented. We use tabs not spaces.

>  # Usage: process_split_commit REV PARENTS
>  process_split_commit () {
>         assert test $# = 2
> @@ -963,7 +977,20 @@ cmd_split () {
>         eval "$grl" |
>         while read rev parents
>         do
> -               process_split_commit "$rev" "$parents"
> +               if should_ignore_subtree_split_commit "$rev"
> +               then
> +                       continue
> +               fi
> +               parsedParents=''

It seems to me that we name variables "parsed_parents" (or sometimes
"parsedparents") rather than "parsedParents".

> +               for parent in $parents
> +               do
> +                       should_ignore_subtree_split_commit "$parent"
> +                       if test $? -eq 1

I think the 2 lines above could be replaced by:

+                       if ! should_ignore_subtree_split_commit "$parent"

> +                       then
> +                               parsedParents+="$parent "

It doesn't seem to me that we use "+=" much in our shell scripts.
https://www.shellcheck.net/ emits the following:

(warning): In POSIX sh, += is undefined.

so I guess we don't use it because it's not available in some usual shells.

(I haven't checked the script with https://www.shellcheck.net/ before
and after your patch, but it might help avoid bash-isms and such
issues.)

> +                       fi
> +               done
> +               process_split_commit "$rev" "$parsedParents"
>         done || exit $?

It looks like we use "exit $?" a lot in git-subtree.sh while we use
just "exit" most often elsewhere. Not sure why.

>         latest_new=$(cache_get latest_new) || exit $?
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index 49a21dd7c9c..87d59afd761 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' '
>         )
>  '
>
> +test_expect_success 'split with multiple subtrees' '
> +       subtree_test_create_repo "$test_count" &&
> +       subtree_test_create_repo "$test_count/subA" &&
> +       subtree_test_create_repo "$test_count/subB" &&
> +       test_create_commit "$test_count" main1 &&
> +       test_create_commit "$test_count/subA" subA1 &&
> +       test_create_commit "$test_count/subA" subA2 &&
> +       test_create_commit "$test_count/subA" subA3 &&
> +       test_create_commit "$test_count/subB" subB1 &&
> +       (
> +               cd "$test_count" &&
> +               git fetch ./subA HEAD &&
> +               git subtree add --prefix=subADir FETCH_HEAD
> +       ) &&
> +       (
> +               cd "$test_count" &&
> +               git fetch ./subB HEAD &&
> +               git subtree add --prefix=subBDir FETCH_HEAD
> +       ) &&
> +       test_create_commit "$test_count" subADir/main-subA1 &&
> +       test_create_commit "$test_count" subBDir/main-subB1 &&
> +       (
> +               cd "$test_count" &&
> +               git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
> +       ) &&

Not sure why there are so many sub-shells used, and why the -C option
is not used instead to tell Git to work in a subdirectory. I guess you
copied what most existing (old) tests in this test script do.

For example perhaps the 4 line above could be replaced by just:

+               git -C "$test_count" subtree split --prefix=subADir
--squash --rejoin -m "Sub A Split 1" &&

> +       (
> +               cd "$test_count" &&
> +               git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
> +       ) &&
> +       test_create_commit "$test_count" subADir/main-subA2 &&
> +       test_create_commit "$test_count" subBDir/main-subB2 &&
> +       (
> +               cd "$test_count" &&
> +               git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
> +       ) &&
> +       (
> +               cd "$test_count" &&
> +               test "$(git subtree split --prefix=subBDir --squash --rejoin \
> +                -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
> +       )
> +'

It's not clear to me what the test is doing. Maybe you could split it
into 2 tests. Perhaps one setting up a repo with multiple subtrees and
one checking that a new split ignores other subtree split commits.
Perhaps adding a few comments would help too.

Best,
Christian.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4] subtree: fix split processing with multiple subtrees present
  2023-11-18 11:28       ` Christian Couder
@ 2023-11-28 21:04         ` Zach FettersMoore
  0 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore @ 2023-11-28 21:04 UTC (permalink / raw)
  To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git

>> In the diagram below, 'M' represents the mainline repo branch, 'A'
>> represents one subtree, and 'B' represents another. M3 and B1 represent
>> a split commit for subtree B that was created from commit M4. M2 and A1
>> represent a split commit made from subtree A that was also created
>> based on changes back to and including M4. M1 represents new changes to
>> the repo, in this scenario if you try to run a 'git subtree split
>> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
>> the processing of changes for the new split commit since the last
>> split/rejoin for subtree B was at M3. The issue is that by having A1
>> included in this processing the command ends up needing to processing
>> every commit down tree A even though none of that is needed or relevant
>> to the current command and result.
>>
>> M1
>>  |        \       \
>> M2         |       |
>>  |        A1       |
>> M3         |       |
>>  |         |      B1
>> M4         |       |

> About the above, Junio already commented the following:
>
> -> The above paragraph explains which different things you drew in the
> -> diagram are representing, but it is not clear how they relate to
> -> each other.  Do they for example depict parent-child commit
> -> relationship?  What are the wide gaps between these three tracks and
> -> what are the short angled lines leaning to the left near the tip?
> -> Is the time/topology flowing from bottom to top?
>
> and it doesn't look like you have addressed that comment.
>
> When you say "M3 and B1 represent a split commit for subtree B that
> was created from commit M4." I am not sure what it means exactly.
> Could you give example commands that could have created the M3 and B1
> commits?

I removed the diagram from the commit message since it seems a little
unclear, and in its place I added an example of an open source repo
(which I am currently using the fix in) and the commands to replicate
the issue. Hopefully that better illustrates how I came across the issue
and what it is.

>> So this commit makes a change to the processing of commits for the split
>> command in order to ignore non-mainline commits from other subtrees such
>> as A1 in the diagram by adding a new function
>> 'should_ignore_subtree_commit' which is called during
>> 'process_split_commit'. This allows the split/rejoin processing to still
>> function as expected but removes all of the unnecessary processing that
>> takes place currently which greatly inflates the processing time.

> Could you tell a bit more what kind of processing time reduction is or
> would be possible on what kind of repo? Have you benchmark-ed or just
> timed this somehow on one of your repos or better on an open source
> repo (so that we could reproduce if we wanted)?

I added some extra info for this to the commit message as well, but to
answer your question yes I discovered and benchmarked this issue in a
repo I help maintain. I was seeing splits take upwards of 12 minutes
before the fix, and after they were taking only seconds. Also provided
infor on the repo and how to reproduce in the updated commit message.

>> Added a test to validate that the proposed fix
>> solves the issue.
>>
>> The test accomplishes this by checking the output
>> of the split command to ensure the output from
>> the progress of 'process_split_commit' function
>> that represents the 'extracount' of commits
>> processed does not increment.

> Does not increment compared to what?

I reworded this to say the 'extracount' remains at 0 since
there should be no extra processed commits from the second subtree
in the test.

>> This was tested against the original functionality
>> to show the test failed, and then with this fix
>> to show the test passes.
>>
>> This illustrated that when using multiple subtrees,
>> A and B, when doing a split on subtree B, the
>> processing does not traverse the entire history
>> of subtree A which is unnecessary and would cause
>> the 'extracount' of processed commits to climb
>> based on the number of commits in the history of
>> subtree A.

> Does this mean that the test checks that the extracount is the same
> when subtree A exists as when it doesn't exist?

This means the test is checking that the 'extracount' remains at
0, because anything above 0 would mean commits from subtree A were
being processed, which is where the issue stems from.

>>  contrib/subtree/git-subtree.sh     | 29 ++++++++++++++++++++-
>>  contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++
>>  2 files changed, 70 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree=
>.sh
>> index e0c5d3b0de6..e69991a9d80 100755
>> --- a/contrib/subtree/git-subtree.sh
>> +++ b/contrib/subtree/git-subtree.sh
>> @@ -778,6 +778,20 @@ ensure_valid_ref_format () {
>>                 die "fatal: '$1' does not look like a ref"
>>  }
>>
>> +# Usage: check if a commit from another subtree should be
>> +# ignored from processing for splits
>> +should_ignore_subtree_split_commit () {

> Maybe adding:
>
>     assert test $# =3D 1
>     local rev=3D"$1"
>
> here, and using $rev instead of $1 in this function could make things
> a bit clearer and similar to what is done in other functions.

Updated.

>> +  if test -n "$(git log -1 --grep=3D"git-subtree-dir:" $1)"
>> +  then
>> +    if test -z "$(git log -1 --grep=3D"git-subtree-mainline:" $1)" &&
>> +                       test -z "$(git log -1 --grep=3D"git-subtree-dir: =
>>  $arg_prefix$" $1)"
>> +    then
>> +      return 0
>> +    fi
>> +  fi
>> +  return 1
>> +}

> The above doesn't seem to be properly indented. We use tabs not spaces.

Fixed.

>>  # Usage: process_split_commit REV PARENTS
>>  process_split_commit () {
>>         assert test $# =3D 2
>> @@ -963,7 +977,20 @@ cmd_split () {
>>         eval "$grl" |
>>         while read rev parents
>>         do
>> -               process_split_commit "$rev" "$parents"
>> +               if should_ignore_subtree_split_commit "$rev"
>> +               then
>> +                       continue
>> +               fi
>> +               parsedParents=3D''

> It seems to me that we name variables "parsed_parents" (or sometimes
> "parsedparents") rather than "parsedParents".

Fixed.

>> +               for parent in $parents
>> +               do
>> +                       should_ignore_subtree_split_commit "$parent"
>> +                       if test $? -eq 1

> I think the 2 lines above could be replaced by:
>
> +                       if ! should_ignore_subtree_split_commit "$parent"

Updated.

>> +                       then
>> +                               parsedParents+=3D"$parent "

> It doesn't seem to me that we use "+=3D" much in our shell scripts.
> https://www.shellcheck.net/ emits the following:
>
> (warning): In POSIX sh, +=3D is undefined.
>
> so I guess we don't use it because it's not available in some usual shells.
>
> (I haven't checked the script with https://www.shellcheck.net/ before
> and after your patch, but it might help avoid bash-isms and such
> issues.)

Updated this to remove the '+=' usage.

>> +                       fi
>> +               done
>> +               process_split_commit "$rev" "$parsedParents"
>>         done || exit $?

> It looks like we use "exit $?" a lot in git-subtree.sh while we use
> just "exit" most often elsewhere. Not sure why.

Yea I am unsure of the reasoning of that, I was just trying to follow the
what the existing script was already doing.

>>         latest_new=3D$(cache_get latest_new) || exit $?
>> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900=
>> -subtree.sh
>> index 49a21dd7c9c..87d59afd761 100755
>> --- a/contrib/subtree/t/t7900-subtree.sh
>> +++ b/contrib/subtree/t/t7900-subtree.sh
>> @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' '
>>         )
>>  '
>>
>> +test_expect_success 'split with multiple subtrees' '
>> +       subtree_test_create_repo "$test_count" &&
>> +       subtree_test_create_repo "$test_count/subA" &&
>> +       subtree_test_create_repo "$test_count/subB" &&
>> +       test_create_commit "$test_count" main1 &&
>> +       test_create_commit "$test_count/subA" subA1 &&
>> +       test_create_commit "$test_count/subA" subA2 &&
>> +       test_create_commit "$test_count/subA" subA3 &&
>> +       test_create_commit "$test_count/subB" subB1 &&
>> +       (
>> +               cd "$test_count" &&
>> +               git fetch ./subA HEAD &&
>> +               git subtree add --prefix=3DsubADir FETCH_HEAD
>> +       ) &&
>> +       (
>> +               cd "$test_count" &&
>> +               git fetch ./subB HEAD &&
>> +               git subtree add --prefix=3DsubBDir FETCH_HEAD
>> +       ) &&
>> +       test_create_commit "$test_count" subADir/main-subA1 &&
>> +       test_create_commit "$test_count" subBDir/main-subB1 &&
>> +       (
>> +               cd "$test_count" &&
>> +               git subtree split --prefix=3DsubADir --squash --rejoin -m=
>> "Sub A Split 1"
>> +       ) &&

> Not sure why there are so many sub-shells used, and why the -C option
> is not used instead to tell Git to work in a subdirectory. I guess you
> copied what most existing (old) tests in this test script do.
>
> For example perhaps the 4 line above could be replaced by just:
>
> +               git -C "$test_count" subtree split --prefix=3DsubADir
> --squash --rejoin -m "Sub A Split 1" &&

Yea I was following what was being done in other existing tests, although
this seems like a better way to do this so I updated the test to remove
the extra sub-shells.

>> +       (
>> +               cd "$test_count" &&
>> +               git subtree split --prefix=3DsubBDir --squash --rejoin -m=
>> "Sub B Split 1"
>> +       ) &&
>> +       test_create_commit "$test_count" subADir/main-subA2 &&
>> +       test_create_commit "$test_count" subBDir/main-subB2 &&
>> +       (
>> +               cd "$test_count" &&
>> +               git subtree split --prefix=3DsubADir --squash --rejoin -m=
>> "Sub A Split 2"
>> +       ) &&
>> +       (
>> +               cd "$test_count" &&
>> +               test "$(git subtree split --prefix=3DsubBDir --squash --r=
>> ejoin \
>> +                -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" =3D ""
>> +       )
>> +'

> It's not clear to me what the test is doing. Maybe you could split it
> into 2 tests. Perhaps one setting up a repo with multiple subtrees and
> one checking that a new split ignores other subtree split commits.
> Perhaps adding a few comments would help too.

Added some comments before the test to describe the steps the test is taking in
order to verify the desired behavior.


On Sat, Nov 18, 2023 at 6:28 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Thu, Oct 26, 2023 at 9:18 PM Zach FettersMoore via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Zach FettersMoore <zach.fetters@apollographql.com>
> >
> > When there are multiple subtrees present in a repository and they are
> > all using 'git subtree split', the 'split' command can take a
> > significant (and constantly growing) amount of time to run even when
> > using the '--rejoin' flag. This is due to the fact that when processing
> > commits to determine the last known split to start from when looking
> > for changes, if there has been a split/merge done from another subtree
> > there will be 2 split commits, one mainline and one subtree, for the
> > second subtree that are part of the processing. The non-mainline
> > subtree split commit will cause the processing to always need to search
> > the entire history of the given subtree as part of its processing even
> > though those commits are totally irrelevant to the current subtree
> > split being run.
>
> Thanks for your continued work on this!
>
> I am not familiar with git subtree so I might miss obvious things. On
> the other hand, my comments might help increase a bit the number of
> people who could review this patch.
>
> > In the diagram below, 'M' represents the mainline repo branch, 'A'
> > represents one subtree, and 'B' represents another. M3 and B1 represent
> > a split commit for subtree B that was created from commit M4. M2 and A1
> > represent a split commit made from subtree A that was also created
> > based on changes back to and including M4. M1 represents new changes to
> > the repo, in this scenario if you try to run a 'git subtree split
> > --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> > the processing of changes for the new split commit since the last
> > split/rejoin for subtree B was at M3. The issue is that by having A1
> > included in this processing the command ends up needing to processing
> > every commit down tree A even though none of that is needed or relevant
> > to the current command and result.
> >
> > M1
> >  |        \       \
> > M2         |       |
> >  |        A1       |
> > M3         |       |
> >  |         |      B1
> > M4         |       |
>
> About the above, Junio already commented the following:
>
> -> The above paragraph explains which different things you drew in the
> -> diagram are representing, but it is not clear how they relate to
> -> each other.  Do they for example depict parent-child commit
> -> relationship?  What are the wide gaps between these three tracks and
> -> what are the short angled lines leaning to the left near the tip?
> -> Is the time/topology flowing from bottom to top?
>
> and it doesn't look like you have addressed that comment.
>
> When you say "M3 and B1 represent a split commit for subtree B that
> was created from commit M4." I am not sure what it means exactly.
> Could you give example commands that could have created the M3 and B1
> commits?
>
> > So this commit makes a change to the processing of commits for the split
> > command in order to ignore non-mainline commits from other subtrees such
> > as A1 in the diagram by adding a new function
> > 'should_ignore_subtree_commit' which is called during
> > 'process_split_commit'. This allows the split/rejoin processing to still
> > function as expected but removes all of the unnecessary processing that
> > takes place currently which greatly inflates the processing time.
>
> Could you tell a bit more what kind of processing time reduction is or
> would be possible on what kind of repo? Have you benchmark-ed or just
> timed this somehow on one of your repos or better on an open source
> repo (so that we could reproduce if we wanted)?
>
> > Added a test to validate that the proposed fix
> > solves the issue.
> >
> > The test accomplishes this by checking the output
> > of the split command to ensure the output from
> > the progress of 'process_split_commit' function
> > that represents the 'extracount' of commits
> > processed does not increment.
>
> Does not increment compared to what?
>
> > This was tested against the original functionality
> > to show the test failed, and then with this fix
> > to show the test passes.
> >
> > This illustrated that when using multiple subtrees,
> > A and B, when doing a split on subtree B, the
> > processing does not traverse the entire history
> > of subtree A which is unnecessary and would cause
> > the 'extracount' of processed commits to climb
> > based on the number of commits in the history of
> > subtree A.
>
> Does this mean that the test checks that the extracount is the same
> when subtree A exists as when it doesn't exist?
>
> [...]
>
> >  contrib/subtree/git-subtree.sh     | 29 ++++++++++++++++++++-
> >  contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++
> >  2 files changed, 70 insertions(+), 1 deletion(-)
> >
> > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> > index e0c5d3b0de6..e69991a9d80 100755
> > --- a/contrib/subtree/git-subtree.sh
> > +++ b/contrib/subtree/git-subtree.sh
> > @@ -778,6 +778,20 @@ ensure_valid_ref_format () {
> >                 die "fatal: '$1' does not look like a ref"
> >  }
> >
> > +# Usage: check if a commit from another subtree should be
> > +# ignored from processing for splits
> > +should_ignore_subtree_split_commit () {
>
> Maybe adding:
>
>     assert test $# = 1
>     local rev="$1"
>
> here, and using $rev instead of $1 in this function could make things
> a bit clearer and similar to what is done in other functions.
>
> > +  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
> > +  then
> > +    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
> > +                       test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
> > +    then
> > +      return 0
> > +    fi
> > +  fi
> > +  return 1
> > +}
>
> The above doesn't seem to be properly indented. We use tabs not spaces.
>
> >  # Usage: process_split_commit REV PARENTS
> >  process_split_commit () {
> >         assert test $# = 2
> > @@ -963,7 +977,20 @@ cmd_split () {
> >         eval "$grl" |
> >         while read rev parents
> >         do
> > -               process_split_commit "$rev" "$parents"
> > +               if should_ignore_subtree_split_commit "$rev"
> > +               then
> > +                       continue
> > +               fi
> > +               parsedParents=''
>
> It seems to me that we name variables "parsed_parents" (or sometimes
> "parsedparents") rather than "parsedParents".
>
> > +               for parent in $parents
> > +               do
> > +                       should_ignore_subtree_split_commit "$parent"
> > +                       if test $? -eq 1
>
> I think the 2 lines above could be replaced by:
>
> +                       if ! should_ignore_subtree_split_commit "$parent"
>
> > +                       then
> > +                               parsedParents+="$parent "
>
> It doesn't seem to me that we use "+=" much in our shell scripts.
> https://www.shellcheck.net/ emits the following:
>
> (warning): In POSIX sh, += is undefined.
>
> so I guess we don't use it because it's not available in some usual shells.
>
> (I haven't checked the script with https://www.shellcheck.net/ before
> and after your patch, but it might help avoid bash-isms and such
> issues.)
>
> > +                       fi
> > +               done
> > +               process_split_commit "$rev" "$parsedParents"
> >         done || exit $?
>
> It looks like we use "exit $?" a lot in git-subtree.sh while we use
> just "exit" most often elsewhere. Not sure why.
>
> >         latest_new=$(cache_get latest_new) || exit $?
> > diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> > index 49a21dd7c9c..87d59afd761 100755
> > --- a/contrib/subtree/t/t7900-subtree.sh
> > +++ b/contrib/subtree/t/t7900-subtree.sh
> > @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' '
> >         )
> >  '
> >
> > +test_expect_success 'split with multiple subtrees' '
> > +       subtree_test_create_repo "$test_count" &&
> > +       subtree_test_create_repo "$test_count/subA" &&
> > +       subtree_test_create_repo "$test_count/subB" &&
> > +       test_create_commit "$test_count" main1 &&
> > +       test_create_commit "$test_count/subA" subA1 &&
> > +       test_create_commit "$test_count/subA" subA2 &&
> > +       test_create_commit "$test_count/subA" subA3 &&
> > +       test_create_commit "$test_count/subB" subB1 &&
> > +       (
> > +               cd "$test_count" &&
> > +               git fetch ./subA HEAD &&
> > +               git subtree add --prefix=subADir FETCH_HEAD
> > +       ) &&
> > +       (
> > +               cd "$test_count" &&
> > +               git fetch ./subB HEAD &&
> > +               git subtree add --prefix=subBDir FETCH_HEAD
> > +       ) &&
> > +       test_create_commit "$test_count" subADir/main-subA1 &&
> > +       test_create_commit "$test_count" subBDir/main-subB1 &&
> > +       (
> > +               cd "$test_count" &&
> > +               git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
> > +       ) &&
>
> Not sure why there are so many sub-shells used, and why the -C option
> is not used instead to tell Git to work in a subdirectory. I guess you
> copied what most existing (old) tests in this test script do.
>
> For example perhaps the 4 line above could be replaced by just:
>
> +               git -C "$test_count" subtree split --prefix=subADir
> --squash --rejoin -m "Sub A Split 1" &&
>
> > +       (
> > +               cd "$test_count" &&
> > +               git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
> > +       ) &&
> > +       test_create_commit "$test_count" subADir/main-subA2 &&
> > +       test_create_commit "$test_count" subBDir/main-subB2 &&
> > +       (
> > +               cd "$test_count" &&
> > +               git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
> > +       ) &&
> > +       (
> > +               cd "$test_count" &&
> > +               test "$(git subtree split --prefix=subBDir --squash --rejoin \
> > +                -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
> > +       )
> > +'
>
> It's not clear to me what the test is doing. Maybe you could split it
> into 2 tests. Perhaps one setting up a repo with multiple subtrees and
> one checking that a new split ignores other subtree split commits.
> Perhaps adding a few comments would help too.
>
> Best,
> Christian.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5] subtree: fix split processing with multiple subtrees present
  2023-10-26 19:17     ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
  2023-11-18 11:28       ` Christian Couder
@ 2023-11-28 21:17       ` Zach FettersMoore via GitGitGadget
  2023-11-30 20:33         ` Christian Couder
  2023-12-01 14:54         ` [PATCH v6] " Zach FettersMoore via GitGitGadget
  1 sibling, 2 replies; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-11-28 21:17 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

To see this in practice you can use the open source GitHub repo
'apollo-ios-dev' and do the following in order:

-Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'
 directories
-Create a commit containing these changes
-Do a split on apollo-ios-codegen
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
-Do a split on apollo-ios
   - git subtree split --prefix=apollo-ios --squash --rejoin
-Make changes to a file in apollo-ios-codegen
-Create a commit containing the change(s)
-Do a split on apollo-ios-codegen
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

You will see that the final split is looking for the last split
on apollo-ios-codegen to use as it's starting point to process
commits. Since there is a split commit from apollo-ios in between the
2 splits run on apollo-ios-codegen, the processing ends up traversing
the entire history of apollo-ios which increases the time it takes to
do a split based on how long of a history apollo-ios has, while none
of these commits are relevant to the split being done on
apollo-ios-codegen.

So this commit makes a change to the processing of commits for the
split command in order to ignore non-mainline commits from other
subtrees such as apollo-ios in the above breakdown by adding a new
function 'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to
still function as expected but removes all of the unnecessary
processing that takes place currently which greatly inflates the
processing time. In the above example, previously the final split
would take ~10-12 minutes, while after this fix it takes seconds.

Added a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed remains at 0, meaning none of the commits
from the second subtree were processed.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrated that when using multiple subtrees,
A and B, when doing a split on subtree B, the
processing does not traverse the entire history
of subtree A which is unnecessary and would cause
the 'extracount' of processed commits to climb
based on the number of commits in the history of
subtree A.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
    subtree: fix split processing with multiple subtrees present
    
    When there are multiple subtrees in a repo and git subtree split
    --rejoin is being used for the subtrees, the processing of commits for a
    new split can take a significant (and constantly growing) amount of time
    because the split commits from other subtrees cause the processing to
    have to scan the entire history of the other subtree(s). This patch
    filters out the other subtree split commits that are unnecessary for the
    split commit processing.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v4:

 1:  353152910eb ! 1:  e7445a95f30 subtree: fix split processing with multiple subtrees present
     @@ Commit message
          though those commits are totally irrelevant to the current subtree
          split being run.
      
     -    In the diagram below, 'M' represents the mainline repo branch, 'A'
     -    represents one subtree, and 'B' represents another. M3 and B1 represent
     -    a split commit for subtree B that was created from commit M4. M2 and A1
     -    represent a split commit made from subtree A that was also created
     -    based on changes back to and including M4. M1 represents new changes to
     -    the repo, in this scenario if you try to run a 'git subtree split
     -    --rejoin' for subtree B, commits M1, M2, and A1, will be included in
     -    the processing of changes for the new split commit since the last
     -    split/rejoin for subtree B was at M3. The issue is that by having A1
     -    included in this processing the command ends up needing to processing
     -    every commit down tree A even though none of that is needed or relevant
     -    to the current command and result.
     +    To see this in practice you can use the open source GitHub repo
     +    'apollo-ios-dev' and do the following in order:
      
     -    M1
     -     |        \       \
     -    M2         |       |
     -     |        A1       |
     -    M3         |       |
     -     |         |      B1
     -    M4         |       |
     +    -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'
     +     directories
     +    -Create a commit containing these changes
     +    -Do a split on apollo-ios-codegen
     +       - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
     +    -Do a split on apollo-ios
     +       - git subtree split --prefix=apollo-ios --squash --rejoin
     +    -Make changes to a file in apollo-ios-codegen
     +    -Create a commit containing the change(s)
     +    -Do a split on apollo-ios-codegen
     +       - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
      
     -    So this commit makes a change to the processing of commits for the split
     -    command in order to ignore non-mainline commits from other subtrees such
     -    as A1 in the diagram by adding a new function
     -    'should_ignore_subtree_commit' which is called during
     -    'process_split_commit'. This allows the split/rejoin processing to still
     -    function as expected but removes all of the unnecessary processing that
     -    takes place currently which greatly inflates the processing time.
     +    You will see that the final split is looking for the last split
     +    on apollo-ios-codegen to use as it's starting point to process
     +    commits. Since there is a split commit from apollo-ios in between the
     +    2 splits run on apollo-ios-codegen, the processing ends up traversing
     +    the entire history of apollo-ios which increases the time it takes to
     +    do a split based on how long of a history apollo-ios has, while none
     +    of these commits are relevant to the split being done on
     +    apollo-ios-codegen.
     +
     +    So this commit makes a change to the processing of commits for the
     +    split command in order to ignore non-mainline commits from other
     +    subtrees such as apollo-ios in the above breakdown by adding a new
     +    function 'should_ignore_subtree_commit' which is called during
     +    'process_split_commit'. This allows the split/rejoin processing to
     +    still function as expected but removes all of the unnecessary
     +    processing that takes place currently which greatly inflates the
     +    processing time. In the above example, previously the final split
     +    would take ~10-12 minutes, while after this fix it takes seconds.
      
          Added a test to validate that the proposed fix
          solves the issue.
     @@ Commit message
          of the split command to ensure the output from
          the progress of 'process_split_commit' function
          that represents the 'extracount' of commits
     -    processed does not increment.
     +    processed remains at 0, meaning none of the commits
     +    from the second subtree were processed.
      
          This was tested against the original functionality
          to show the test failed, and then with this fix
     @@ contrib/subtree/git-subtree.sh: ensure_valid_ref_format () {
      +# Usage: check if a commit from another subtree should be
      +# ignored from processing for splits
      +should_ignore_subtree_split_commit () {
     -+  if test -n "$(git log -1 --grep="git-subtree-dir:" $1)"
     -+  then
     -+    if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" &&
     -+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)"
     -+    then
     -+      return 0
     -+    fi
     -+  fi
     -+  return 1
     ++	assert test $# = 1
     ++	local rev="$1"
     ++	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
     ++	then
     ++		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
     ++			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
     ++		then
     ++			return 0
     ++		fi
     ++	fi
     ++	return 1
      +}
      +
       # Usage: process_split_commit REV PARENTS
     @@ contrib/subtree/git-subtree.sh: cmd_split () {
      +		then
      +			continue
      +		fi
     -+		parsedParents=''
     ++		parsedparents=''
      +		for parent in $parents
      +		do
     -+			should_ignore_subtree_split_commit "$parent"
     -+			if test $? -eq 1
     ++			if ! should_ignore_subtree_split_commit "$parent"
      +			then
     -+				parsedParents+="$parent "
     ++				parsedparents="$parsedparents$parent "
      +			fi
      +		done
     -+		process_split_commit "$rev" "$parsedParents"
     ++		process_split_commit "$rev" "$parsedparents"
       	done || exit $?
       
       	latest_new=$(cache_get latest_new) || exit $?
     @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r
       	)
       '
       
     ++# Tests that commits from other subtrees are not processed as
     ++# part of a split.
     ++#
     ++# This test performs the following:
     ++# - Creates Repo with subtrees 'subA' and 'subB'
     ++# - Creates commits in the repo including changes to subtrees
     ++# - Runs the following 'split' and commit' commands in order:
     ++# 	- Perform 'split' on subtree A
     ++# 	- Perform 'split' on subtree B
     ++# 	- Create new commits with changes to subtree A and B
     ++# 	- Perform split on subtree A
     ++# 	- Check that the commits in subtree B are not processed
     ++#			as part of the subtree A split
      +test_expect_success 'split with multiple subtrees' '
      +	subtree_test_create_repo "$test_count" &&
      +	subtree_test_create_repo "$test_count/subA" &&
     @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r
      +	test_create_commit "$test_count/subA" subA2 &&
      +	test_create_commit "$test_count/subA" subA3 &&
      +	test_create_commit "$test_count/subB" subB1 &&
     -+	(
     -+		cd "$test_count" &&
     -+		git fetch ./subA HEAD &&
     -+		git subtree add --prefix=subADir FETCH_HEAD
     -+	) &&
     -+	(
     -+		cd "$test_count" &&
     -+		git fetch ./subB HEAD &&
     -+		git subtree add --prefix=subBDir FETCH_HEAD
     -+	) &&
     ++	git -C "$test_count" fetch ./subA HEAD &&
     ++	git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD &&
     ++	git -C "$test_count" fetch ./subB HEAD &&
     ++	git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD &&
      +	test_create_commit "$test_count" subADir/main-subA1 &&
      +	test_create_commit "$test_count" subBDir/main-subB1 &&
     -+	(
     -+		cd "$test_count" &&
     -+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1"
     -+	) &&
     -+	(
     -+		cd "$test_count" &&
     -+		git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1"
     -+	) &&
     ++	git -C "$test_count" subtree split --prefix=subADir \
     ++		--squash --rejoin -m "Sub A Split 1" &&
     ++	git -C "$test_count" subtree split --prefix=subBDir \
     ++		--squash --rejoin -m "Sub B Split 1" &&
      +	test_create_commit "$test_count" subADir/main-subA2 &&
      +	test_create_commit "$test_count" subBDir/main-subB2 &&
     -+	(
     -+		cd "$test_count" &&
     -+		git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2"
     -+	) &&
     -+	(
     -+		cd "$test_count" &&
     -+		test "$(git subtree split --prefix=subBDir --squash --rejoin \
     -+		 -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
     -+	)
     ++	git -C "$test_count" subtree split --prefix=subADir \
     ++		--squash --rejoin -m "Sub A Split 2" &&
     ++	test "$(git -C "$test_count" subtree split --prefix=subBDir \
     ++		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
      +'
      +
       test_expect_success 'split sub dir/ with --rejoin from scratch' '


 contrib/subtree/git-subtree.sh     | 30 +++++++++++++++++++++-
 contrib/subtree/t/t7900-subtree.sh | 40 ++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..a0bf958ea66 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,6 +778,22 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+	assert test $# = 1
+	local rev="$1"
+	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	then
+		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
+		then
+			return 0
+		fi
+	fi
+	return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
@@ -963,7 +979,19 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedparents=''
+		for parent in $parents
+		do
+			if ! should_ignore_subtree_split_commit "$parent"
+			then
+				parsedparents="$parsedparents$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedparents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 49a21dd7c9c..ca4df5be832 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -385,6 +385,46 @@ test_expect_success 'split sub dir/ with --rejoin' '
 	)
 '
 
+# Tests that commits from other subtrees are not processed as
+# part of a split.
+#
+# This test performs the following:
+# - Creates Repo with subtrees 'subA' and 'subB'
+# - Creates commits in the repo including changes to subtrees
+# - Runs the following 'split' and commit' commands in order:
+# 	- Perform 'split' on subtree A
+# 	- Perform 'split' on subtree B
+# 	- Create new commits with changes to subtree A and B
+# 	- Perform split on subtree A
+# 	- Check that the commits in subtree B are not processed
+#			as part of the subtree A split
+test_expect_success 'split with multiple subtrees' '
+	subtree_test_create_repo "$test_count" &&
+	subtree_test_create_repo "$test_count/subA" &&
+	subtree_test_create_repo "$test_count/subB" &&
+	test_create_commit "$test_count" main1 &&
+	test_create_commit "$test_count/subA" subA1 &&
+	test_create_commit "$test_count/subA" subA2 &&
+	test_create_commit "$test_count/subA" subA3 &&
+	test_create_commit "$test_count/subB" subB1 &&
+	git -C "$test_count" fetch ./subA HEAD &&
+	git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD &&
+	git -C "$test_count" fetch ./subB HEAD &&
+	git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD &&
+	test_create_commit "$test_count" subADir/main-subA1 &&
+	test_create_commit "$test_count" subBDir/main-subB1 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 1" &&
+	git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -m "Sub B Split 1" &&
+	test_create_commit "$test_count" subADir/main-subA2 &&
+	test_create_commit "$test_count" subBDir/main-subB2 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 2" &&
+	test "$(git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
  2023-11-28 21:17       ` [PATCH v5] " Zach FettersMoore via GitGitGadget
@ 2023-11-30 20:33         ` Christian Couder
  2023-11-30 21:01           ` Zach FettersMoore
  2023-12-01 14:54         ` [PATCH v6] " Zach FettersMoore via GitGitGadget
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2023-11-30 20:33 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore

On Tue, Nov 28, 2023 at 10:17 PM Zach FettersMoore via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> To see this in practice you can use the open source GitHub repo
> 'apollo-ios-dev' and do the following in order:
>
> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

It looks like there is a spurious A after 'apollo-ios' in the line above.

>  directories
> -Create a commit containing these changes
> -Do a split on apollo-ios-codegen
>    - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

I might be doing something stupid or wrong, but I get the following:

$ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
fatal: could not rev-parse split hash
cc70a7d49e84696f0df210710445784c504ed748 from commit
360f068ea0d57f250621ab7dbe205313f52a0e98
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

> -Do a split on apollo-ios
>    - git subtree split --prefix=apollo-ios --squash --rejoin

Same issue:

$ git subtree split --prefix=apollo-ios --squash --rejoin
fatal: could not rev-parse split hash
b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
hint: hash might be a tag, try fetching it from the subtree repository:
hint:    git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111

I didn't try to get farther than this, as it seems that some
instructions might be missing.

[...]

> So this commit makes a change to the processing of commits for the
> split command in order to ignore non-mainline commits from other
> subtrees such as apollo-ios in the above breakdown by adding a new
> function 'should_ignore_subtree_commit' which is called during
> 'process_split_commit'. This allows the split/rejoin processing to
> still function as expected but removes all of the unnecessary
> processing that takes place currently which greatly inflates the
> processing time. In the above example, previously the final split
> would take ~10-12 minutes, while after this fix it takes seconds.

Nice!

Except for the above issues in the commit message, the rest of the
patch looks good to me, thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present
  2023-11-30 20:33         ` Christian Couder
@ 2023-11-30 21:01           ` Zach FettersMoore
  0 siblings, 0 replies; 30+ messages in thread
From: Zach FettersMoore @ 2023-11-30 21:01 UTC (permalink / raw)
  To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git

>> To see this in practice you can use the open source GitHub repo
>> 'apollo-ios-dev' and do the following in order:
>>
>> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'

> It looks like there is a spurious A after 'apollo-ios' in the line above.

Thanks for catching that, definitely a typo on my part.

>> directories
>> -Create a commit containing these changes
>> -Do a split on apollo-ios-codegen
>> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

> I might be doing something stupid or wrong, but I get the following:
>
> $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> fatal: could not rev-parse split hash
> cc70a7d49e84696f0df210710445784c504ed748 from commit
> 360f068ea0d57f250621ab7dbe205313f52a0e98
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748

Updated this to include doing a fetch to ensure all remote repo
info is available locally.

>> -Do a split on apollo-ios
>> - git subtree split --prefix=apollo-ios --squash --rejoin

> Same issue:
>
> $ git subtree split --prefix=apollo-ios --squash --rejoin
> fatal: could not rev-parse split hash
> b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit
> b48030c3eb6e2faf4bff981c5c63ca72aceecdfa
> hint: hash might be a tag, try fetching it from the subtree repository:
> hint: git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111
>
> I didn't try to get farther than this, as it seems that some
> instructions might be missing.

Same as above, added extra instruction to do a fetch first.

Also added a little extra info that the issue may present after the
first split in the instructions depending on the current state of the
repo being used. Also added a way to do the same steps with the changes
applied to see that it resolves the issue.

>> So this commit makes a change to the processing of commits for the
>> split command in order to ignore non-mainline commits from other
>> subtrees such as apollo-ios in the above breakdown by adding a new
>> function 'should_ignore_subtree_commit' which is called during
>> 'process_split_commit'. This allows the split/rejoin processing to
>> still function as expected but removes all of the unnecessary
>> processing that takes place currently which greatly inflates the
>> processing time. In the above example, previously the final split
>> would take ~10-12 minutes, while after this fix it takes seconds.

> Nice!
>
> Except for the above issues in the commit message, the rest of the
> patch looks good to me, thanks!

Great! Thanks for the review and guidance!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-11-28 21:17       ` [PATCH v5] " Zach FettersMoore via GitGitGadget
  2023-11-30 20:33         ` Christian Couder
@ 2023-12-01 14:54         ` Zach FettersMoore via GitGitGadget
  2023-12-04 11:08           ` Christian Couder
  1 sibling, 1 reply; 30+ messages in thread
From: Zach FettersMoore via GitGitGadget @ 2023-12-01 14:54 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Zach FettersMoore, Zach FettersMoore

From: Zach FettersMoore <zach.fetters@apollographql.com>

When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

To see this in practice you can use the open source GitHub repo
'apollo-ios-dev' and do the following in order:

-Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
 directories
-Create a commit containing these changes
-Do a split on apollo-ios-codegen
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios-codegen.git
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
   - Depending on the current state of the 'apollo-ios-dev' repo
     you may see the issue at this point if the last split was on
     apollo-ios
-Do a split on apollo-ios
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios.git
   - git subtree split --prefix=apollo-ios --squash --rejoin
-Make changes to a file in apollo-ios-codegen
-Create a commit containing the change(s)
-Do a split on apollo-ios-codegen
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
-To see that the patch fixes the issue you can use the custom subtree
 script in the repo so following the same steps as above, except
 instead of using 'git subtree ...' for the commands use
 'git-subtree.sh ...' for the commands

You will see that the final split is looking for the last split
on apollo-ios-codegen to use as it's starting point to process
commits. Since there is a split commit from apollo-ios in between the
2 splits run on apollo-ios-codegen, the processing ends up traversing
the entire history of apollo-ios which increases the time it takes to
do a split based on how long of a history apollo-ios has, while none
of these commits are relevant to the split being done on
apollo-ios-codegen.

So this commit makes a change to the processing of commits for the
split command in order to ignore non-mainline commits from other
subtrees such as apollo-ios in the above breakdown by adding a new
function 'should_ignore_subtree_commit' which is called during
'process_split_commit'. This allows the split/rejoin processing to
still function as expected but removes all of the unnecessary
processing that takes place currently which greatly inflates the
processing time. In the above example, previously the final split
would take ~10-12 minutes, while after this fix it takes seconds.

Added a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed remains at 0, meaning none of the commits
from the second subtree were processed.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrated that when using multiple subtrees,
A and B, when doing a split on subtree B, the
processing does not traverse the entire history
of subtree A which is unnecessary and would cause
the 'extracount' of processed commits to climb
based on the number of commits in the history of
subtree A.

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
---
    subtree: fix split processing with multiple subtrees present
    
    When there are multiple subtrees in a repo and git subtree split
    --rejoin is being used for the subtrees, the processing of commits for a
    new split can take a significant (and constantly growing) amount of time
    because the split commits from other subtrees cause the processing to
    have to scan the entire history of the other subtree(s). This patch
    filters out the other subtree split commits that are unnecessary for the
    split commit processing.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1587

Range-diff vs v5:

 1:  e7445a95f30 ! 1:  2a65ec0e4df subtree: fix split processing with multiple subtrees present
     @@ Commit message
          To see this in practice you can use the open source GitHub repo
          'apollo-ios-dev' and do the following in order:
      
     -    -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen'
     +    -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
           directories
          -Create a commit containing these changes
          -Do a split on apollo-ios-codegen
     +       - Do a fetch on the subtree repo
     +          - git fetch git@github.com:apollographql/apollo-ios-codegen.git
             - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
     +       - Depending on the current state of the 'apollo-ios-dev' repo
     +         you may see the issue at this point if the last split was on
     +         apollo-ios
          -Do a split on apollo-ios
     +       - Do a fetch on the subtree repo
     +          - git fetch git@github.com:apollographql/apollo-ios.git
             - git subtree split --prefix=apollo-ios --squash --rejoin
          -Make changes to a file in apollo-ios-codegen
          -Create a commit containing the change(s)
          -Do a split on apollo-ios-codegen
             - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
     +    -To see that the patch fixes the issue you can use the custom subtree
     +     script in the repo so following the same steps as above, except
     +     instead of using 'git subtree ...' for the commands use
     +     'git-subtree.sh ...' for the commands
      
          You will see that the final split is looking for the last split
          on apollo-ios-codegen to use as it's starting point to process


 contrib/subtree/git-subtree.sh     | 30 +++++++++++++++++++++-
 contrib/subtree/t/t7900-subtree.sh | 40 ++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index e0c5d3b0de6..a0bf958ea66 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -778,6 +778,22 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
+# Usage: check if a commit from another subtree should be
+# ignored from processing for splits
+should_ignore_subtree_split_commit () {
+	assert test $# = 1
+	local rev="$1"
+	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	then
+		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
+		then
+			return 0
+		fi
+	fi
+	return 1
+}
+
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
@@ -963,7 +979,19 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		if should_ignore_subtree_split_commit "$rev"
+		then
+			continue
+		fi
+		parsedparents=''
+		for parent in $parents
+		do
+			if ! should_ignore_subtree_split_commit "$parent"
+			then
+				parsedparents="$parsedparents$parent "
+			fi
+		done
+		process_split_commit "$rev" "$parsedparents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 49a21dd7c9c..ca4df5be832 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -385,6 +385,46 @@ test_expect_success 'split sub dir/ with --rejoin' '
 	)
 '
 
+# Tests that commits from other subtrees are not processed as
+# part of a split.
+#
+# This test performs the following:
+# - Creates Repo with subtrees 'subA' and 'subB'
+# - Creates commits in the repo including changes to subtrees
+# - Runs the following 'split' and commit' commands in order:
+# 	- Perform 'split' on subtree A
+# 	- Perform 'split' on subtree B
+# 	- Create new commits with changes to subtree A and B
+# 	- Perform split on subtree A
+# 	- Check that the commits in subtree B are not processed
+#			as part of the subtree A split
+test_expect_success 'split with multiple subtrees' '
+	subtree_test_create_repo "$test_count" &&
+	subtree_test_create_repo "$test_count/subA" &&
+	subtree_test_create_repo "$test_count/subB" &&
+	test_create_commit "$test_count" main1 &&
+	test_create_commit "$test_count/subA" subA1 &&
+	test_create_commit "$test_count/subA" subA2 &&
+	test_create_commit "$test_count/subA" subA3 &&
+	test_create_commit "$test_count/subB" subB1 &&
+	git -C "$test_count" fetch ./subA HEAD &&
+	git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD &&
+	git -C "$test_count" fetch ./subB HEAD &&
+	git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD &&
+	test_create_commit "$test_count" subADir/main-subA1 &&
+	test_create_commit "$test_count" subBDir/main-subB1 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 1" &&
+	git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -m "Sub B Split 1" &&
+	test_create_commit "$test_count" subADir/main-subA2 &&
+	test_create_commit "$test_count" subBDir/main-subB2 &&
+	git -C "$test_count" subtree split --prefix=subADir \
+		--squash --rejoin -m "Sub A Split 2" &&
+	test "$(git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-01 14:54         ` [PATCH v6] " Zach FettersMoore via GitGitGadget
@ 2023-12-04 11:08           ` Christian Couder
  2023-12-11 15:39             ` Zach FettersMoore
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Couder @ 2023-12-04 11:08 UTC (permalink / raw)
  To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore

On Fri, Dec 1, 2023 at 3:54 PM Zach FettersMoore via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Zach FettersMoore <zach.fetters@apollographql.com>
>
> When there are multiple subtrees present in a repository and they are
> all using 'git subtree split', the 'split' command can take a
> significant (and constantly growing) amount of time to run even when
> using the '--rejoin' flag. This is due to the fact that when processing
> commits to determine the last known split to start from when looking
> for changes, if there has been a split/merge done from another subtree
> there will be 2 split commits, one mainline and one subtree, for the
> second subtree that are part of the processing. The non-mainline
> subtree split commit will cause the processing to always need to search
> the entire history of the given subtree as part of its processing even
> though those commits are totally irrelevant to the current subtree
> split being run.
>
> To see this in practice you can use the open source GitHub repo
> 'apollo-ios-dev' and do the following in order:
>
> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
>  directories
> -Create a commit containing these changes
> -Do a split on apollo-ios-codegen
>    - Do a fetch on the subtree repo
>       - git fetch git@github.com:apollographql/apollo-ios-codegen.git
>    - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

Now I get the following without your patch at this step:

$ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
[...]/libexec/git-core/git-subtree: 318: Maximum function recursion
depth (1000) reached

Line 318 in git-subtree.sh contains the following:

missed=$(cache_miss "$@") || exit $?

With your patch it seems to work:

$ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
Merge made by the 'ort' strategy.
e274aed3ba6d0659fb4cc014587cf31c1e8df7f4

>    - Depending on the current state of the 'apollo-ios-dev' repo
>      you may see the issue at this point if the last split was on
>      apollo-ios

I guess I see it, but it seems a bit different for me than what you describe.

Otherwise your patch looks good to me now.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-04 11:08           ` Christian Couder
@ 2023-12-11 15:39             ` Zach FettersMoore
  2023-12-12 16:06               ` Christian Couder
  0 siblings, 1 reply; 30+ messages in thread
From: Zach FettersMoore @ 2023-12-11 15:39 UTC (permalink / raw)
  To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git

>>
>> From: Zach FettersMoore <zach.fetters@apollographql.com>
>>
>> When there are multiple subtrees present in a repository and they are
>> all using 'git subtree split', the 'split' command can take a
>> significant (and constantly growing) amount of time to run even when
>> using the '--rejoin' flag. This is due to the fact that when processing
>> commits to determine the last known split to start from when looking
>> for changes, if there has been a split/merge done from another subtree
>> there will be 2 split commits, one mainline and one subtree, for the
>> second subtree that are part of the processing. The non-mainline
>> subtree split commit will cause the processing to always need to search
>> the entire history of the given subtree as part of its processing even
>> though those commits are totally irrelevant to the current subtree
>> split being run.
>>
>> To see this in practice you can use the open source GitHub repo
>> 'apollo-ios-dev' and do the following in order:
>>
>> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
>> directories
>> -Create a commit containing these changes
>> -Do a split on apollo-ios-codegen
>> - Do a fetch on the subtree repo
>> - git fetch git@github.com:apollographql/apollo-ios-codegen.git
>> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin

> Now I get the following without your patch at this step:
>
> $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> [...]/libexec/git-core/git-subtree: 318: Maximum function recursion
> depth (1000) reached
>
> Line 318 in git-subtree.sh contains the following:
>
> missed=$(cache_miss "$@") || exit $?
>
> With your patch it seems to work:
>
> $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> Merge made by the 'ort' strategy.
> e274aed3ba6d0659fb4cc014587cf31c1e8df7f4

Looking into this some it looks like it could be a bash config
difference? My machine always runs it all the way through vs
failing for recursion depth. Although that would also be an issue
which is solved by this fix.

>> - Depending on the current state of the 'apollo-ios-dev' repo
>> you may see the issue at this point if the last split was on
>> apollo-ios

> I guess I see it, but it seems a bit different for me than what you describe.
>
> Otherwise your patch looks good to me now.

Yea I hadn't accounted for/realized that some folks may see a recursion
depth error vs it just taking a long time like it does for me. Also what
I was saying with the apollo-ios-dev repo is you may not need all the steps
to see the issue, because its possible the state of the repo is already
in a position to display the issue just by doing a split on
apollo-ios-codegen.

Great! Thanks again for all the feedback and guidance! Is there anything
else I need to do to get this across the finish line and merged in?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-11 15:39             ` Zach FettersMoore
@ 2023-12-12 16:06               ` Christian Couder
  2023-12-12 22:28                 ` Junio C Hamano
  2023-12-20 15:25                 ` Christian Couder
  0 siblings, 2 replies; 30+ messages in thread
From: Christian Couder @ 2023-12-12 16:06 UTC (permalink / raw)
  To: Zach FettersMoore; +Cc: Zach FettersMoore via GitGitGadget, git

On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore
<zach.fetters@apollographql.com> wrote:
>
> >>
> >> From: Zach FettersMoore <zach.fetters@apollographql.com>

> >> To see this in practice you can use the open source GitHub repo
> >> 'apollo-ios-dev' and do the following in order:
> >>
> >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
> >> directories
> >> -Create a commit containing these changes
> >> -Do a split on apollo-ios-codegen
> >> - Do a fetch on the subtree repo
> >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git
> >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>
> > Now I get the following without your patch at this step:
> >
> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion
> > depth (1000) reached
> >
> > Line 318 in git-subtree.sh contains the following:
> >
> > missed=$(cache_miss "$@") || exit $?
> >
> > With your patch it seems to work:
> >
> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > Merge made by the 'ort' strategy.
> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>
> Looking into this some it looks like it could be a bash config
> difference? My machine always runs it all the way through vs
> failing for recursion depth. Although that would also be an issue
> which is solved by this fix.

I use Ubuntu where /bin/sh is dash so my current guess is that dash
might have a smaller recursion limit than bash.

I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth
which seems to agree.

I will try to test using bash soon.

> >> - Depending on the current state of the 'apollo-ios-dev' repo
> >> you may see the issue at this point if the last split was on
> >> apollo-ios
>
> > I guess I see it, but it seems a bit different for me than what you describe.
> >
> > Otherwise your patch looks good to me now.
>
> Yea I hadn't accounted for/realized that some folks may see a recursion
> depth error vs it just taking a long time like it does for me. Also what
> I was saying with the apollo-ios-dev repo is you may not need all the steps
> to see the issue, because its possible the state of the repo is already
> in a position to display the issue just by doing a split on
> apollo-ios-codegen.
>
> Great! Thanks again for all the feedback and guidance! Is there anything
> else I need to do to get this across the finish line and merged in?

Hopefully I will be able to confirm I see the same error as you with
bash soon, and it will be enough to get it merged.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-12 16:06               ` Christian Couder
@ 2023-12-12 22:28                 ` Junio C Hamano
  2023-12-13 15:20                   ` Zach FettersMoore
  2023-12-20 15:25                 ` Christian Couder
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2023-12-12 22:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git

Christian Couder <christian.couder@gmail.com> writes:

>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>> > Merge made by the 'ort' strategy.
>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>>
>> Looking into this some it looks like it could be a bash config
>> difference? My machine always runs it all the way through vs
>> failing for recursion depth. Although that would also be an issue
>> which is solved by this fix.
>
> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> might have a smaller recursion limit than bash.

That sounds quite bad.  Does it have to be recursive (iow, if we can
rewrite the logic to be iterative instead, that would be a much better
way to fix the issue)?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-12 22:28                 ` Junio C Hamano
@ 2023-12-13 15:20                   ` Zach FettersMoore
  2024-01-03 16:33                     ` Christian Couder
  0 siblings, 1 reply; 30+ messages in thread
From: Zach FettersMoore @ 2023-12-13 15:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, Zach FettersMoore via GitGitGadget, git

Christian Couder <christian.couder@gmail.com> writes:

>>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>>> > Merge made by the 'ort' strategy.
>>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>>>
>>> Looking into this some it looks like it could be a bash config
>>> difference? My machine always runs it all the way through vs
>>> failing for recursion depth. Although that would also be an issue
>>> which is solved by this fix.
>>
>> I use Ubuntu where /bin/sh is dash so my current guess is that dash
>> might have a smaller recursion limit than bash.
>
> That sounds quite bad. Does it have to be recursive (iow, if we can
> rewrite the logic to be iterative instead, that would be a much better
> way to fix the issue)?

I don't think an iterative vs recursive approach fixes this
particular issue, the root of the issue this patch is fixing
is that lots of commits from the history of subtrees not
being acted upon are being processed when they don't need to
be. So the iterative approach would likely resolve the
recursion limit issue for some shells, but in my instance
I don't see a recursion limit error, it just takes an
extraordinary amount of time to run the split command
because of all the unnecessary processing which needs to be
avoided which this patch fixes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-12 16:06               ` Christian Couder
  2023-12-12 22:28                 ` Junio C Hamano
@ 2023-12-20 15:25                 ` Christian Couder
  2024-01-25 10:09                   ` Christian Couder
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2023-12-20 15:25 UTC (permalink / raw)
  To: Zach FettersMoore; +Cc: Zach FettersMoore via GitGitGadget, git

On Tue, Dec 12, 2023 at 5:06 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore
> <zach.fetters@apollographql.com> wrote:
> >
> > >>
> > >> From: Zach FettersMoore <zach.fetters@apollographql.com>
>
> > >> To see this in practice you can use the open source GitHub repo
> > >> 'apollo-ios-dev' and do the following in order:
> > >>
> > >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
> > >> directories
> > >> -Create a commit containing these changes
> > >> -Do a split on apollo-ios-codegen
> > >> - Do a fetch on the subtree repo
> > >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git
> > >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> >
> > > Now I get the following without your patch at this step:
> > >
> > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion
> > > depth (1000) reached
> > >
> > > Line 318 in git-subtree.sh contains the following:
> > >
> > > missed=$(cache_miss "$@") || exit $?
> > >
> > > With your patch it seems to work:
> > >
> > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > > Merge made by the 'ort' strategy.
> > > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
> >
> > Looking into this some it looks like it could be a bash config
> > difference? My machine always runs it all the way through vs
> > failing for recursion depth. Although that would also be an issue
> > which is solved by this fix.
>
> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> might have a smaller recursion limit than bash.
>
> I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth
> which seems to agree.
>
> I will try to test using bash soon.

Sorry, to not have tried earlier before with bash.

Now I have tried it and yeah it works fine with you patch, while
without it the last step of the reproduction recipe takes a lot of
time and results in a core dump:

/home/christian/libexec/git-core/git-subtree: line 924: 857920 Done
                eval "$grl"
    857921 Segmentation fault      (core dumped) | while read rev parents; do
   process_split_commit "$rev" "$parents";
done

So overall I think your patch is great! Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-13 15:20                   ` Zach FettersMoore
@ 2024-01-03 16:33                     ` Christian Couder
  0 siblings, 0 replies; 30+ messages in thread
From: Christian Couder @ 2024-01-03 16:33 UTC (permalink / raw)
  To: Zach FettersMoore; +Cc: Junio C Hamano, Zach FettersMoore via GitGitGadget, git

(Sorry for replying only to Zach instead of everyone previously.)

On Wed, Dec 13, 2023 at 4:20 PM Zach FettersMoore
<zach.fetters@apollographql.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> >>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> >>> > Merge made by the 'ort' strategy.
> >>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
> >>>
> >>> Looking into this some it looks like it could be a bash config
> >>> difference? My machine always runs it all the way through vs
> >>> failing for recursion depth. Although that would also be an issue
> >>> which is solved by this fix.
> >>
> >> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> >> might have a smaller recursion limit than bash.
> >
> > That sounds quite bad. Does it have to be recursive (iow, if we can
> > rewrite the logic to be iterative instead, that would be a much better
> > way to fix the issue)?
>
> I don't think an iterative vs recursive approach fixes this
> particular issue, the root of the issue this patch is fixing
> is that lots of commits from the history of subtrees not
> being acted upon are being processed when they don't need to
> be. So the iterative approach would likely resolve the
> recursion limit issue for some shells, but in my instance
> I don't see a recursion limit error, it just takes an
> extraordinary amount of time to run the split command
> because of all the unnecessary processing which needs to be
> avoided which this patch fixes.

Fixing possible recursion might be an improvement on top of your
patch. But without your patch the test case it describes would anyway
take a lot more time than seems necessary. So I agree that your patch
should definitely be merged anyway.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2023-12-20 15:25                 ` Christian Couder
@ 2024-01-25 10:09                   ` Christian Couder
  2024-01-25 16:38                     ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Couder @ 2024-01-25 10:09 UTC (permalink / raw)
  To: Zach FettersMoore, Junio C Hamano; +Cc: Zach FettersMoore via GitGitGadget, git

It seems that this topic has fallen into the cracks or something,
while the associated pch looks good to me.

On Wed, Dec 20, 2023 at 4:25 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Tue, Dec 12, 2023 at 5:06 PM Christian Couder
> <christian.couder@gmail.com> wrote:
> >
> > On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore
> > <zach.fetters@apollographql.com> wrote:
> > >
> > > >>
> > > >> From: Zach FettersMoore <zach.fetters@apollographql.com>
> >
> > > >> To see this in practice you can use the open source GitHub repo
> > > >> 'apollo-ios-dev' and do the following in order:
> > > >>
> > > >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen'
> > > >> directories
> > > >> -Create a commit containing these changes
> > > >> -Do a split on apollo-ios-codegen
> > > >> - Do a fetch on the subtree repo
> > > >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git
> > > >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > >
> > > > Now I get the following without your patch at this step:
> > > >
> > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > > > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion
> > > > depth (1000) reached
> > > >
> > > > Line 318 in git-subtree.sh contains the following:
> > > >
> > > > missed=$(cache_miss "$@") || exit $?
> > > >
> > > > With your patch it seems to work:
> > > >
> > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
> > > > Merge made by the 'ort' strategy.
> > > > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
> > >
> > > Looking into this some it looks like it could be a bash config
> > > difference? My machine always runs it all the way through vs
> > > failing for recursion depth. Although that would also be an issue
> > > which is solved by this fix.
> >
> > I use Ubuntu where /bin/sh is dash so my current guess is that dash
> > might have a smaller recursion limit than bash.
> >
> > I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth
> > which seems to agree.
> >
> > I will try to test using bash soon.
>
> Sorry, to not have tried earlier before with bash.
>
> Now I have tried it and yeah it works fine with you patch, while
> without it the last step of the reproduction recipe takes a lot of
> time and results in a core dump:
>
> /home/christian/libexec/git-core/git-subtree: line 924: 857920 Done
>                 eval "$grl"
>     857921 Segmentation fault      (core dumped) | while read rev parents; do
>    process_split_commit "$rev" "$parents";
> done
>
> So overall I think your patch is great! Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2024-01-25 10:09                   ` Christian Couder
@ 2024-01-25 16:38                     ` Junio C Hamano
  2024-01-25 18:52                       ` Christian Couder
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2024-01-25 16:38 UTC (permalink / raw)
  To: Christian Couder
  Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git

Christian Couder <christian.couder@gmail.com> writes:

> It seems that this topic has fallen into the cracks or something,
> while the associated pch looks good to me.

Yeah, it wasn't clear to me that your message you are responding to
was your Reviewed-by:.  If I recall my impression correctly from the
time I skimmed its proposed log message the last time, it focused on
describing a single failure case the author encountered in the real
world and said that the patch changed the behaviour to correct that
single case, and was not very clear if it was meant as a general
fix.  Is the patch text, including its proposed patch description,
satisfactory to you?  In other words, is the above your Reviewed-by:?

Thanks for pinging the thread.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2024-01-25 16:38                     ` Junio C Hamano
@ 2024-01-25 18:52                       ` Christian Couder
  2024-01-25 18:56                         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Couder @ 2024-01-25 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git

On Thu, Jan 25, 2024 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > It seems that this topic has fallen into the cracks or something,
> > while the associated pch looks good to me.
>
> Yeah, it wasn't clear to me that your message you are responding to
> was your Reviewed-by:.  If I recall my impression correctly from the
> time I skimmed its proposed log message the last time, it focused on
> describing a single failure case the author encountered in the real
> world and said that the patch changed the behaviour to correct that
> single case, and was not very clear if it was meant as a general
> fix.  Is the patch text, including its proposed patch description,
> satisfactory to you?  In other words, is the above your Reviewed-by:?

Yes, it's satisfactory for me, and I am Ok to give my Reviewed-by:, thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
  2024-01-25 18:52                       ` Christian Couder
@ 2024-01-25 18:56                         ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2024-01-25 18:56 UTC (permalink / raw)
  To: Christian Couder
  Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git

Christian Couder <christian.couder@gmail.com> writes:

> On Thu, Jan 25, 2024 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Christian Couder <christian.couder@gmail.com> writes:
>>
>> > It seems that this topic has fallen into the cracks or something,
>> > while the associated pch looks good to me.
>>
>> Yeah, it wasn't clear to me that your message you are responding to
>> was your Reviewed-by:.  If I recall my impression correctly from the
>> time I skimmed its proposed log message the last time, it focused on
>> describing a single failure case the author encountered in the real
>> world and said that the patch changed the behaviour to correct that
>> single case, and was not very clear if it was meant as a general
>> fix.  Is the patch text, including its proposed patch description,
>> satisfactory to you?  In other words, is the above your Reviewed-by:?
>
> Yes, it's satisfactory for me, and I am Ok to give my Reviewed-by:, thanks!

Thanks.  Will queue.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-01-25 18:56 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-09-18 23:31 ` Junio C Hamano
2023-09-19  1:04 ` Junio C Hamano
2023-10-26 19:59   ` Zach FettersMoore
2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget
2023-09-22 16:25   ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget
2023-09-22 16:25   ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
2023-09-29 20:32   ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-09-29 20:32     ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget
2023-09-29 20:32     ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget
2023-09-29 20:33     ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget
2023-10-17 20:02       ` Zach FettersMoore
2023-10-26 19:17     ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget
2023-11-18 11:28       ` Christian Couder
2023-11-28 21:04         ` Zach FettersMoore
2023-11-28 21:17       ` [PATCH v5] " Zach FettersMoore via GitGitGadget
2023-11-30 20:33         ` Christian Couder
2023-11-30 21:01           ` Zach FettersMoore
2023-12-01 14:54         ` [PATCH v6] " Zach FettersMoore via GitGitGadget
2023-12-04 11:08           ` Christian Couder
2023-12-11 15:39             ` Zach FettersMoore
2023-12-12 16:06               ` Christian Couder
2023-12-12 22:28                 ` Junio C Hamano
2023-12-13 15:20                   ` Zach FettersMoore
2024-01-03 16:33                     ` Christian Couder
2023-12-20 15:25                 ` Christian Couder
2024-01-25 10:09                   ` Christian Couder
2024-01-25 16:38                     ` Junio C Hamano
2024-01-25 18:52                       ` Christian Couder
2024-01-25 18:56                         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).