git.vger.kernel.org archive mirror
From: Tom Clarkson <tqclarkson@icloud.com>
To: Ed Maste <emaste@freebsd.org>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: git-subtree split misbehaviour with a commit having empty ls-tree for the specified subdir
Date: Mon, 23 Dec 2019 01:01:47 +1100
Message-ID: <905A443A-7E2B-45C2-985F-46C3E295670A@icloud.com>
In-Reply-To: <CAPyFy2Ar+OncJtgZZyAzxs0PkXy5rSU6ALS+MimK8x5TzWjLug@mail.gmail.com>


> On 21 Dec 2019, at 2:56 am, Ed Maste <emaste@freebsd.org> wrote:
> 
> On Wed, 18 Dec 2019 at 19:57, Tom Clarkson <tqclarkson@icloud.com> wrote:
>> 
>>> Overall I think your proposed algorithm is reasonable (even though I
>>> think it won't address some of the cases in our repo). Will your
>>> algorithm allow us to pass $dir to git rev-list, for the initial
>>> split?
>> 
>> Is this just for performance reasons? As I understand it, that was left out because it would exclude relevant commits on an existing subtree, but it could make sense as an optimization for the first split of a large repo.
> 
> Yes, it's for performance reasons on a first split that I'd like to
> see it. On the FreeBSD repo the difference is some 40 minutes vs. a
> few seconds.

Once I had the full rev-list version producing reasonable results, I tried out the dir filter. It is a lot faster, but unfortunately it doesn't produce the same output. If you use the actual parent commits, you still have to process the 50k irrelevant ones to keep a valid path. If you let rev-list find what it thinks is the most recent relevant change, the subtree merges resolve to nothing.
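A toy model may help show why the dir filter is fast but not a drop-in replacement for the full walk. The history below is hypothetical and linear, and `path_filtered` only approximates `git rev-list <head> -- <path>`: real history simplification is more involved, especially around merges, which this sketch does not attempt to model. It shows the first problem only: the real parents of the commits that survive the filter are often commits the filter skipped.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Commit(NamedTuple):
    parents: Tuple[str, ...]
    touches: Set[str]  # top-level directories this commit modifies


# Hypothetical linear history: only A and D touch "dir".
HISTORY: Dict[str, Commit] = {
    "A": Commit((), {"dir"}),
    "B": Commit(("A",), {"other"}),
    "C": Commit(("B",), {"other"}),
    "D": Commit(("C",), {"dir"}),
}


def full_walk(head: str) -> List[str]:
    """Every ancestor of head, roughly what an unfiltered rev-list visits."""
    order: List[str] = []
    stack, seen = [head], set()
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        order.append(sha)
        stack.extend(HISTORY[sha].parents)
    return order


def path_filtered(head: str, path: str) -> List[str]:
    """Toy stand-in for a path-limited walk: keep only commits touching path."""
    return [sha for sha in full_walk(head) if path in HISTORY[sha].touches]


def dangling_parents(head: str, path: str) -> Set[str]:
    """Real parents of kept commits that the filtered walk never visited."""
    kept = set(path_filtered(head, path))
    return {p for sha in kept for p in HISTORY[sha].parents if p not in kept}


print(full_walk("D"))                # ['D', 'C', 'B', 'A']
print(path_filtered("D", "dir"))     # ['D', 'A']
print(dangling_parents("D", "dir"))  # {'C'}
```

Keeping D's real parent C means walking back through the irrelevant commits anyway; substituting A as D's parent is what makes the filtered walk fast, but it changes the shape of the rewritten history.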

>> So the process becomes something like
>> 
>> # clear the cache - shouldn't usually be necessary, but it's a universal debugging step.
>> git subtree clear-cache --prefix=dir
>> 
>> # ref and all its parents are before subtree add. Treat any children as initial commits.
>> git subtree ignore --prefix=dir ref
>> 
>> # ref and all its parents are known subtree commits to be included without transformation.
>> git subtree existing --prefix=dir ref
>> 
>> # Override an arbitrary mapping, either for performance or because that commit is problematic
>> git subtree map --prefix=dir mainline-ref subtree-ref
>> 
>> # Run the existing algorithm, but skipping anything defined manually
>> git subtree split --prefix=dir
> 
> This sounds about perfect.
> 
>>> For a concrete example (from the repo at
>>> https://github.com/freebsd/freebsd), 7f3a50b3b9f8 is a mainline commit
>>> that added a new subtree, from 9ee787636908. I think that if I could
>>> inform subtree split that 9ee787636908 is the root it would work for
>>> me.
>> 
>> Aside from the metadata, that one is a bit different from a standard subtree add in that it copies three folders from the subtree repo rather than the root - so the contents of contrib/elftoolchain will never exactly match the actual elftoolchain repo, and 9ee787636908 is neither mainline nor subtree as subtree split understands it.
> 
> Fair enough, and we have lots of examples of slightly strange history
> in svn that svn2git represents in interesting ways.
> 
>> If you ignore 9ee787636908, the resulting subtree will be fairly clean, but won’t have much of a relationship to the external repo.
>> 
>> If you treat 9ee787636908 as an existing subtree, the second commit on your subtree will be based on 7f3a50b3b9f8, which deletes most of the contents of the subtree. You should still be able to merge in updates from the external repo, but if you try to push changes upstream the deletion will break things.
> 
> I think this is fine - our main goal here is to be able to update
> contrib/ code within FreeBSD as we do today with svn, and we may well
> always have some changes that are never intended to be pushed
> upstream.
> 
> Continuing the example from our repo, there is more history in the
> "subtree" already, with 061ef1f9424f as the head. ca8624403626 is the
> merge to mainline.


If you want to try out my update, it's at https://github.com/gitgitgadget/git/pull/493. The commands I ended up with were:

git subtree ignore --clear-cache --prefix=contrib/elftoolchain 4d43158
git subtree use --prefix=contrib/elftoolchain 9ee7876
git subtree split --prefix=contrib/elftoolchain 53f2672ff78be42389cf41a8258f6e9ce36808fb
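The way `ignore`, `use` (the "existing" step in the earlier outline) and a manual map could feed into `split` can be sketched as a toy model. Everything here is an assumption for illustration: the commit names are hypothetical, the precedence (manual map, then existing, then ignored) is a guess, and real `git subtree split` also extracts the subdirectory tree and drops commits that don't change it, which this sketch leaves out. It only shows how the overrides stop pre-add mainline history from leaking into the split.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical history: "old1"/"old2" predate the subtree add, "up1" is an
# upstream subtree commit, "add" is the subtree-add merge, "m1" comes later.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "old1": (),
    "old2": ("old1",),
    "up1": (),
    "add": ("old2", "up1"),
    "m1": ("add",),
}


def ancestors(ref: str) -> Set[str]:
    """ref and all of its parents, transitively."""
    out: Set[str] = set()
    stack = [ref]
    while stack:
        sha = stack.pop()
        if sha not in out:
            out.add(sha)
            stack.extend(PARENTS[sha])
    return out


def split(head: str, ignore_ref: Optional[str] = None,
          existing_ref: Optional[str] = None,
          manual: Optional[Dict[str, str]] = None):
    """Map mainline commits to (toy) subtree commits, honoring overrides."""
    ignored = ancestors(ignore_ref) if ignore_ref else set()
    existing = ancestors(existing_ref) if existing_ref else set()
    manual = manual or {}
    mapping: Dict[str, Optional[str]] = {}
    new_parents: Dict[str, Tuple[str, ...]] = {}

    def visit(sha: str) -> Optional[str]:
        if sha in mapping:
            return mapping[sha]
        if sha in manual:        # explicit override wins (assumed precedence)
            result: Optional[str] = manual[sha]
        elif sha in existing:    # already a subtree commit: keep as-is
            result = sha
        elif sha in ignored:     # before the subtree add: drop it
            result = None
        else:                    # rewrite, keeping only surviving parents
            mapped = []
            for parent in PARENTS[sha]:
                m = visit(parent)
                if m is not None and m not in mapped:
                    mapped.append(m)
            result = f"split({sha})"
            new_parents[result] = tuple(mapped)  # no parents => initial commit
        mapping[sha] = result
        return result

    visit(head)
    return mapping, new_parents


mapping, parents = split("m1", ignore_ref="old2", existing_ref="up1")
print(parents)  # {'split(add)': ('up1',), 'split(m1)': ('split(add)',)}
```

Run without overrides (`split("m1")`), the same model gives `split(add)` the parents `split(old2)` and `split(up1)`, so the pre-add mainline commits end up in the split history.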

On my machine, ignore takes about 2 minutes to flag 200k commits as irrelevant. The split takes around 15 minutes to go through the remaining 50k.
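As a rough back-of-envelope reading of those timings (approximate figures from one machine, so only the order of magnitude means much), the per-commit cost of split is about 30 times that of ignore:

```python
# Approximate figures reported above.
IGNORE_COMMITS, IGNORE_SECONDS = 200_000, 2 * 60
SPLIT_COMMITS, SPLIT_SECONDS = 50_000, 15 * 60

ignore_rate = IGNORE_COMMITS / IGNORE_SECONDS  # commits per second
split_rate = SPLIT_COMMITS / SPLIT_SECONDS     # commits per second

print(f"ignore: {ignore_rate:.0f} commits/s")                 # ignore: 1667 commits/s
print(f"split:  {split_rate:.1f} commits/s")                  # split:  55.6 commits/s
print(f"ratio:  {ignore_rate / split_rate:.0f}x per commit")  # ratio:  30x per commit
```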


Thread overview: 10+ messages
2019-11-22 16:55 git-subtree split misbehaviour with a commit having empty ls-tree for the specified subdir Ed Maste
2019-12-18  0:17 ` Tom Clarkson
2019-12-18 10:23   ` Ed Maste
2019-12-19  0:57     ` Tom Clarkson
2019-12-20 15:56       ` Ed Maste
2019-12-22 14:01         ` Tom Clarkson [this message]
2020-01-21 22:36           ` Ed Maste
     [not found]         ` <DB65AE2F-12DE-43B7-8B20-4E173794CAF2@icloud.com>
2020-04-28 18:08           ` Ed Maste
2020-06-17 14:46         ` Ed Maste
2020-06-18  1:13           ` Tom Clarkson
