From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Tom Clarkson via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Avery Pennarun <apenwarr@gmail.com>,
	Ed Maste <emaste@freebsd.org>, Tom Clarkson <tom@tqclarkson.com>,
	Tom Clarkson <tom@tqclarkson.com>
Subject: Re: [PATCH v2 6/7] subtree: more robustly distinguish subtree and mainline commits
Date: Wed, 7 Oct 2020 21:42:44 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.2010072128130.50@tvgsbejvaqbjf.bet>
In-Reply-To: <a7aaedfed3785c6ca693f60f05e76156f68a5d39.1602021913.git.gitgitgadget@gmail.com>

Hi,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <tom@tqclarkson.com>
>
> Prevent a mainline commit without $dir being treated as a subtree
> commit and pulling in the entire mainline history. Any valid subtree
> commit will have only valid subtree commits as parents, which will be
> unchanged by check_parents.

I feel like this is only half the picture: I have a hard time seeing how
these two sentences connect.
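
For what it's worth, the rule that the last hunk applies to a commit whose
tree lacks `$dir` can be condensed into a small helper. This is a
hypothetical sketch (`classify_treeless` is not part of git-subtree); it
assumes, as the hunk's comment suggests, that `$newparents` equals
`$parents` exactly when every parent is itself a subtree commit.

```shell
#!/bin/sh
# Hypothetical condensation of the final hunk: print what a commit whose
# tree lacks $dir should be mapped to, given its parents and the parents'
# cached mappings ($newparents), both as space-separated lists.
classify_treeless () {
	rev=$1
	parents=$2
	newparents=$3
	if test -z "$newparents"
	then
		# No parent has a subtree mapping: a commit from before the
		# subtree was added, so it maps to nothing.
		echo ""
	elif test "$newparents" = "$parents"
	then
		# Every parent is itself a subtree commit, so this one can be too.
		echo "$rev"
	else
		# A mainline commit without $dir: treated like an initial
		# commit, i.e. also mapped to nothing.
		echo ""
	fi
}
```

So `classify_treeless r1 "p1 p2" "p1 p2"` prints `r1`, while a mismatch
or an empty `$newparents` prints an empty line.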

After studying the code and your patch a bit, I believe that
`process_split_commit()` calls `check_parents()` first, which in turn
calls `process_split_commit()` for all as-yet-unmapped parents. So
basically, it recurses until it finds a commit all of whose parents are
already mapped, then propagates that information all the way back.

Doesn't this cause serious issues, such as stack overflows, with long
commit histories?
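
If the recursion depth does turn out to be a problem, one way out is an
explicit worklist. Below is a minimal sketch of that idea, not
git-subtree's code: `parents_of`, `is_mapped`, `map_commit` and
`process_iteratively` are hypothetical stand-ins, and the history is a toy
linear chain c1 <- c2 <- ... <- cN.

```shell
#!/bin/sh
# Hypothetical sketch: handle every ancestor before its descendant using
# an explicit stack instead of recursion, so no function calls itself
# however long the history is.

cachedir=$(mktemp -d) || exit 1
trap 'rm -rf "$cachedir"' EXIT
mkdir "$cachedir/map" || exit 1

# Stand-in for the real parent lookup; toy linear history c1 <- ... <- cN.
parents_of () {
	n=${1#c}
	if test "$n" -gt 1
	then
		echo "c$((n - 1))"
	fi
}

is_mapped () {
	test -e "$cachedir/map/$1"
}

# Stand-in for the per-commit work; also records the processing order.
map_commit () {
	: >"$cachedir/map/$1"
	echo "$1" >>"$cachedir/order"
}

process_iteratively () {
	stack=$1
	while test -n "$stack"
	do
		top=${stack%% *}
		pushed=
		# Push any parent that still needs processing.
		for p in $(parents_of "$top")
		do
			if ! is_mapped "$p"
			then
				stack="$p $stack"
				pushed=1
			fi
		done
		if test -z "$pushed"
		then
			# All parents are mapped, so this commit can be handled.
			is_mapped "$top" || map_commit "$top"
			# Pop the top entry.
			case $stack in
			*" "*) stack=${stack#* } ;;
			*) stack= ;;
			esac
		fi
	done
}
```

Calling `process_iteratively c300` records c1 through c300 in
`$cachedir/order`, oldest first. Keeping the stack in a shell string is
fine for a sketch, but every prepend copies the whole string, so a real
implementation would want a cheaper representation.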

> Signed-off-by: Tom Clarkson <tom@tqclarkson.com>
> ---
>  contrib/subtree/git-subtree.sh | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index e56621a986..fa6293b372 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -224,8 +224,6 @@ cache_setup () {
>  	fi
>  	mkdir -p "$cachedir" ||
>  		die "Can't create new cachedir: $cachedir"
> -	mkdir -p "$cachedir/notree" ||
> -		die "Can't create new cachedir: $cachedir/notree"

It might make sense to talk about this a bit in the commit message.
Essentially, you are replacing the `notree/<rev>` files by mapping `<rev>`
to the empty string.

This makes me wonder, again, whether the file system layout of the cache
can cope with large histories. If a main project were to merge a subtree
with, say, 10 million commits, wouldn't `git subtree` now fill a single
directory with 10 million files? I still cannot imagine that this
performs well.

>  	debug "Using cachedir: $cachedir" >&2
>  }
>
> @@ -255,18 +253,11 @@ check_parents () {
>  	local indent=$(($2 + 1))
>  	for miss in $missed
>  	do
> -		if ! test -r "$cachedir/notree/$miss"
> -		then
> -			debug "  unprocessed parent commit: $miss ($indent)"
> -			process_split_commit "$miss" "" "$indent"
> -		fi
> +		debug "  unprocessed parent commit: $miss ($indent)"
> +		process_split_commit "$miss" "" "$indent"

That makes sense to me: the `missed` variable contains only as-yet-unmapped
commits, so we do not need an equivalent `test -r` check.
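
To illustrate why the extra check is redundant: once "not in the subtree"
is stored as an entry with an empty value, a single existence test both
finds unprocessed commits and skips the ones known to lack a tree. A
minimal sketch with hypothetical `cache_set` and `list_unmapped` helpers
(the latter only in the spirit of git-subtree's `cache_miss`):

```shell
#!/bin/sh
# Hypothetical sketch: print only the revisions without a cache entry.
# A revision mapped to the empty string still has an entry, so it is
# filtered out here, and callers need no `test -r` of their own.

cachedir=$(mktemp -d) || exit 1
trap 'rm -rf "$cachedir"' EXIT

cache_set () {
	printf '%s\n' "$2" >"$cachedir/$1"
}

list_unmapped () {
	for oldrev in "$@"
	do
		if ! test -r "$cachedir/$oldrev"
		then
			echo "$oldrev"
		fi
	done
}
```

After `cache_set aaa bbb` and `cache_set ccc ""`, running
`list_unmapped aaa ccc ddd` prints only `ddd`.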

Ciao,
Dscho

>  	done
>  }
>
> -set_notree () {
> -	echo "1" > "$cachedir/notree/$1"
> -}
> -
>  cache_set () {
>  	oldrev="$1"
>  	newrev="$2"
> @@ -719,11 +710,18 @@ process_split_commit () {
>  	# vs. a mainline commit?  Does it matter?
>  	if test -z "$tree"
>  	then
> -		set_notree "$rev"
>  		if test -n "$newparents"
>  		then
> -			cache_set "$rev" "$rev"
> +			if test "$newparents" = "$parents"
> +			then
> +				# if all parents were subtrees, this can be a subtree commit
> +				cache_set "$rev" "$rev"
> +			else
> +				# a mainline commit with tree missing is equivalent to the initial commit
> +				cache_set "$rev" ""
> +			fi
>  		else
> +			# no parents with valid subtree mappings means a commit prior to subtree add
>  			cache_set "$rev" ""
>  		fi
>  		return
> --
> gitgitgadget
>
>


Thread overview: 28+ messages
2020-05-11  5:49 [PATCH 0/7] subtree: Fix handling of complex history Tom Clarkson via GitGitGadget
2020-05-11  5:49 ` [PATCH 1/7] subtree: handle multiple parents passed to cache_miss Tom Clarkson via GitGitGadget
2020-05-11  5:49 ` [PATCH 2/7] subtree: exclude commits predating add from recursive processing Tom Clarkson via GitGitGadget
2020-05-11  5:49 ` [PATCH 3/7] subtree: persist cache between split runs Tom Clarkson via GitGitGadget
2020-05-11  5:49 ` [PATCH 4/7] subtree: add git subtree map command Tom Clarkson via GitGitGadget
2020-05-11  5:49 ` [PATCH 5/7] subtree: add git subtree use and ignore commands Tom Clarkson via GitGitGadget
2020-05-11  5:50 ` [PATCH 6/7] subtree: more robustly distinguish subtree and mainline commits Tom Clarkson via GitGitGadget
2020-05-11  5:50 ` [PATCH 7/7] subtree: document new subtree commands Tom Clarkson via GitGitGadget
2020-10-04 17:52 ` [PATCH 0/7] subtree: Fix handling of complex history Ed Maste
2020-10-04 19:27   ` Johannes Schindelin
2020-10-05 16:47     ` Junio C Hamano
2020-10-05 21:37     ` Ed Maste
2020-10-07 16:31       ` Johannes Schindelin
2020-10-06 22:05 ` [PATCH v2 " Tom Clarkson via GitGitGadget
2020-10-06 22:05   ` [PATCH v2 1/7] subtree: handle multiple parents passed to cache_miss Tom Clarkson via GitGitGadget
2020-10-07 13:12     ` Ed Maste
2020-10-06 22:05   ` [PATCH v2 2/7] subtree: exclude commits predating add from recursive processing Tom Clarkson via GitGitGadget
2020-10-07 15:36     ` Johannes Schindelin
2020-10-06 22:05   ` [PATCH v2 3/7] subtree: persist cache between split runs Tom Clarkson via GitGitGadget
2020-10-07 16:06     ` Johannes Schindelin
2020-10-06 22:05   ` [PATCH v2 4/7] subtree: add git subtree map command Tom Clarkson via GitGitGadget
2020-10-06 22:05   ` [PATCH v2 5/7] subtree: add git subtree use and ignore commands Tom Clarkson via GitGitGadget
2020-10-07 16:29     ` Johannes Schindelin
2020-10-06 22:05   ` [PATCH v2 6/7] subtree: more robustly distinguish subtree and mainline commits Tom Clarkson via GitGitGadget
2020-10-07 19:42     ` Johannes Schindelin [this message]
2020-10-06 22:05   ` [PATCH v2 7/7] subtree: document new subtree commands Tom Clarkson via GitGitGadget
2020-10-07 19:43     ` Johannes Schindelin
2020-10-07 19:46   ` [PATCH v2 0/7] subtree: Fix handling of complex history Johannes Schindelin
