All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: Git Mailing List <git@vger.kernel.org>, entwicklung@pengutronix.de
Subject: Re: time needed to rebase shortend by using --onto?
Date: Thu, 27 May 2021 16:08:32 -0700	[thread overview]
Message-ID: <CABPp-BEVME5Gx=F4HWHBb_0wn6XJF==DzVLo2i1xj63BB+_jtw@mail.gmail.com> (raw)
In-Reply-To: <20210527215947.g2mnds6zj5uv5mjq@pengutronix.de>

On Thu, May 27, 2021 at 2:59 PM Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>
> Hello,
>
> On Wed, May 26, 2021 at 07:38:08AM -0700, Elijah Newren wrote:
> > On Wed, May 26, 2021 at 3:13 AM Uwe Kleine-König
> > <u.kleine-koenig@pengutronix.de> wrote:
> > > I have a kernel topic branch containing 4 patches on top of Linux v5.4.
> > > (I didn't speak to the affected customer, so I cannot easily share the
> > > patch stack. If need be I can probably anonymize it or ask if I can
> > > publish the patches.)
> > >
> > > It rebases clean on v5.10:
> > >
> > >         $ time git rebase v5.10
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Successfully rebased and updated detached HEAD.
> > >
> > >         real    3m47.841s
> > >         user    1m25.706s
> > >         sys     0m11.181s
> > >
> > > If I start with the same rev checked out and explicitly specify the
> > > merge base, the rebase process is considerably faster:
> > >
> > >         $ time git rebase --onto v5.10 v5.4
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > >         Successfully rebased and updated detached HEAD.
> > >
> > >         real    1m20.588s
> > >         user    1m12.645s
> > >         sys     0m6.733s

Note: In your original report you had rename detection and it clearly
took a significant amount of time...

> > >
> > > Is there some relevant complexity in the first invocation I'm not seeing
> > > that explains it takes more than the double time? I would have expected
> > > that
> > >
> > >         git rebase v5.10
> > >
> > > does the same as:
> > >
> > >         git rebase --onto v5.10 $(git merge-base HEAD v5.10)
> > >
> > > . (FTR:
> > >
> > >         $ time git merge-base HEAD v5.10
> > >         219d54332a09e8d8741c1e1982f5eae56099de85
> > >
> > >         real    0m0.158s
> > >         user    0m0.105s
> > >         sys     0m0.052s
> > >
> > > , 219d5433 is v5.4 as expected.
> >
> > That does seem surprising, though if an automatic gc completed between
> > the two commands that could certainly explain it.  If that theory is
> > correct, it would suggest that it'd be difficult for you to reproduce;
>
> This reproduces just fine. The repository is quite big and it is slow at
> times. With the same tree on a different machine, the rebase is quicker,
> but the factor 2 between the two different commands is visible there,
> too:
>
> uwe@taurus:~/gsrc/linux$ git checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> HEAD is now at bc2e99c9c9e0 [...]
>
> uwe@taurus:~/gsrc/linux$ time git rebase v5.10
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> Successfully rebased and updated detached HEAD.
>
> real    0m20.737s
> user    0m14.188s
> sys     0m3.767s
>
> uwe@taurus:~/gsrc/linux$ git checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> HEAD is now at bc2e99c9c9e0 [...]
>
> uwe@taurus:~/gsrc/linux$ time git rebase --onto v5.10 v5.4
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> Successfully rebased and updated detached HEAD.
>
> real    0m12.129s
> user    0m7.196s
> sys     0m3.141s
>
> (This is with a slightly newer git: 2.30.2-1 from Debian)

And here, there was no rename detection so this isn't the same thing
anymore.  You could try setting merge.renameLimit higher.

However, the 7-8 second difference (and the likely large differences
between 5.4 and 5.10) do suggest that Junio's hunch that fork-point
behavior being at play could be an issue in these two commands.

> Then I repeated the test with git 2.32.0-rc1 (wgit is just calling
> bin-wrappers/git in my git working copy):
>
> uwe@taurus:~/gsrc/linux$ wgit version
> git version 2.32.0.rc1
>
> uwe@taurus:~/gsrc/linux$ wgit checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> HEAD is now at bc2e99c9c9e0 [...]
>
> uwe@taurus:~/gsrc/linux$ time wgit rebase v5.10
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> Successfully rebased and updated detached HEAD.
>
> real    0m19.438s
> user    0m13.629s
> sys     0m3.299s
>
> uwe@taurus:~/gsrc/linux$ wgit checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> HEAD is now at bc2e99c9c9e0 [...]
>
> uwe@taurus:~/gsrc/linux$ time wgit rebase --onto v5.10 v5.4
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> warning: inexact rename detection was skipped due to too many files.
> warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> Successfully rebased and updated detached HEAD.
>
> real    0m13.848s
> user    0m8.315s
> sys     0m3.182s
>
> So the surprise persists.

Yeah, with no rename detection, the newer git version isn't going to
make a bit of difference.

> > running again with either command would give you something closer to
> > the lower time both times.  Is that the case?  (Also, what's the
> > output of "git count-objects -v"?)
>
> After the above commands I have:
>
>         count: 3203
>         size: 17664
>         in-pack: 4763753
>         packs: 11
>         size-pack: 1273957
>         prune-packable: 19
>         garbage: 0
>         size-garbage: 0

So, not freshly packed, but not in need of an automatic gc either.

>         alternate: /home/uwe/var/gitstore/linux.git/objects

You've got an alternate?  How well packed is it?  (What does "git
count-objects -v" in that other repo show?)

>
> (On the repository I did this initially I have:
>
>         warning: garbage found: .git/objects/pack/pack-864148a84c0524073ed8c8aa1a76155d5c677879.pack.temp
>         warning: garbage found: /ptx/src/git/linux.git/objects/pack/tmp_pack_X9gHnq
>         count: 2652
>         size: 14640
>         in-pack: 2117015
>         packs: 8
>         size-pack: 574167
>         prune-packable: 856
>         garbage: 2
>         size-garbage: 1114236
>         alternate: /ptx/src/git/linux.git/objects
>
> (Is the garbage a reason this is so slow? Can I just remove the two
> files pointed out?)

If there isn't some still-running git operation that is fetching and
writing to these files, then yes they can be cleaned out.  I doubt
they'd make too much of a difference, though.  I was more curious if
you went from say 10000 loose objects to ~0, or from 50+ packs down to
1 between operations due to an automatic gc completing.

> > I'd love to try this with git-2.32.0-rc1 (or even my not-yet-upstream
> > patches that optimize even further) with adding "--strategy=ort" to
> > your rebase command to see how much of a timing difference it makes.
> > Any chance the patches could either be published, or you could retry
> > with git-2.32.0-rc1 and add the --strategy=ort command line option to
> > your rebase command(s)?
>
> With --strategy=ort added I have:
>
> uwe@taurus:~/gsrc/linux$ time wgit rebase --strategy=ort v5.10
> Successfully rebased and updated detached HEAD.
>
> real    0m19.202s
> user    0m12.724s
> sys     0m2.961s
>
> [...]
>
> uwe@taurus:~/gsrc/linux$ time wgit rebase --strategy=ort --onto v5.10 v5.4
> Successfully rebased and updated detached HEAD.
>
> real    0m12.395s
> user    0m6.638s
> sys     0m3.284s
>
> So the warnings about inexact rename detection don't appear and it's a
> bit faster, but I still see the timing difference between these two
> commands.

Right, this says that --strategy=ort WITH rename detection is as fast
as the default --strategy=recursive WITHOUT rename detection.

It's not a fair comparison (you'd need to set merge.renameLimit higher
and re-run the cases where you had warnings), but is interesting
nonetheless.  It basically suggests that rename detection comes for
free with the ort strategy.

> I assume you are still interested in seeing this branch? I think
> anonymising it shouldn't be so hard, the patches are not so big. I'll
> modify the branch to make it shareable and assuming the problem still
> reproduces with it will share it with you.

Thanks.

  parent reply	other threads:[~2021-05-27 23:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 10:09 time needed to rebase shortend by using --onto? Uwe Kleine-König
2021-05-26 11:04 ` Bagas Sanjaya
2021-05-26 14:38 ` Elijah Newren
2021-05-27 21:59   ` Uwe Kleine-König
2021-05-27 22:15     ` Uwe Kleine-König
2021-05-28  5:38       ` Elijah Newren
2021-05-27 23:08     ` Elijah Newren [this message]
2021-05-28 21:40       ` Uwe Kleine-König
2021-05-28 22:26         ` Elijah Newren
2021-05-29 16:59         ` Felipe Contreras
2021-05-26 22:18 ` Junio C Hamano
2021-05-27 22:16   ` Uwe Kleine-König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BEVME5Gx=F4HWHBb_0wn6XJF==DzVLo2i1xj63BB+_jtw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=entwicklung@pengutronix.de \
    --cc=git@vger.kernel.org \
    --cc=u.kleine-koenig@pengutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.