git.vger.kernel.org archive mirror
From: Elijah Newren <newren@gmail.com>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: Git Mailing List <git@vger.kernel.org>, entwicklung@pengutronix.de
Subject: Re: time needed to rebase shortend by using --onto?
Date: Fri, 28 May 2021 15:26:04 -0700
Message-ID: <CABPp-BG=nro4ydA9hAdq0A+AGX27_qCsy0sgrqfBLcGfFaQo8A@mail.gmail.com>
In-Reply-To: <20210528214024.vw4huojcklrm6d27@pengutronix.de>

Hi Uwe,

On Fri, May 28, 2021 at 2:40 PM Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>
> Hello Elijah,
>
> On Thu, May 27, 2021 at 04:08:32PM -0700, Elijah Newren wrote:
> > On Thu, May 27, 2021 at 2:59 PM Uwe Kleine-König
> > <u.kleine-koenig@pengutronix.de> wrote:
> > > On Wed, May 26, 2021 at 07:38:08AM -0700, Elijah Newren wrote:
> > > > On Wed, May 26, 2021 at 3:13 AM Uwe Kleine-König
> > > > <u.kleine-koenig@pengutronix.de> wrote:
...
> > Note: In your original report you had rename detection and it clearly
> > took a significant amount of time...
>
> FTR: My impression is that the repo I used for the first report is slow
> in general. Also git log sometimes takes a considerable time to start
> emitting output.
>
...
>
> I learned a few things since my last mail, here comes an updated test
> again on the machine and repo used for the initial report:
>
>         ukl@dude.ptx:~/gsrc/linux$ wgit version
>         git version 2.32.0.rc1
>
>         ukl@dude.ptx:~/gsrc/linux$ cat rebasecheck
>         #!/bin/bash
>
>         set -e
>
>         # do it once to heat the caches and ensure all objects are available already to have the next cycles identical.
>         wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
>         wgit rebase v5.10
>
>         wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
>         echo "rebase v5.10"
>         time wgit rebase v5.10
>
>         wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
>         echo "rebase --onto v5.10 v5.4"
>         time wgit rebase --onto v5.10 v5.4
>
> I now do the rebase once before the timing for the reasons described in
> the comment. The second identical command is quite a bit quicker. Also,
> now that the commands are scripted, they are done in a smaller time frame
> (which matters as the machine is used heavily by my colleagues and
> me). I ran the script a few times in a row, after all colleagues had
> left for the weekend:
>
>         ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
>         ...
>         rebase v5.10
>         ...
>         real    1m13.579s
>         user    1m2.919s
>         sys     0m6.220s
>         ...
>         rebase --onto v5.10 v5.4
>         ...
>         real    1m2.852s
>         user    0m53.780s
>         sys     0m6.225s
>
>         ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
>         ...
>         rebase v5.10
>         ...
>         real    1m10.816s
>         user    1m3.344s
>         sys     0m6.991s
>         ...
>         rebase --onto v5.10 v5.4
>         ...
>         real    0m59.695s
>         user    0m53.510s
>         sys     0m5.579s
>
>         ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
>         ...
>         rebase v5.10
>         ...
>         real    1m9.688s
>         user    1m3.346s
>         sys     0m6.105s
>         ...
>         rebase --onto v5.10 v5.4
>         ...
>         real    0m59.981s
>         user    0m52.931s
>         sys     0m6.282s
>
> So it's not a factor of 2 any more, but still reproducibly quicker when
> --onto is used.

Yep, so that looks like the results I was getting.  Adding
--reapply-cherry-picks should remove most of that time difference as I
stated in my previous email.

> > However, the 7-8 second difference (and the likely large differences
> > between 5.4 and 5.10) do suggest that Junio's hunch, that fork-point
> > behavior is at play, could be right for these two commands.

I don't think --no-fork-point will matter here since you are detaching
HEAD before running rebase.  fork-point is all about looking up the
reflog of the current branch to find better matches.
--reapply-cherry-picks should help you out and erase most of this 7-8
second difference.

> > > > running again with either command would give you something closer to
> > > > the lower time both times.  Is that the case?  (Also, what's the
> > > > output of "git count-objects -v"?)
> > >
> > > After the above commands I have:
> > >
> > >         count: 3203
> > >         size: 17664
> > >         in-pack: 4763753
> > >         packs: 11
> > >         size-pack: 1273957
> > >         prune-packable: 19
> > >         garbage: 0
> > >         size-garbage: 0
> >
> > So, not freshly packed, but not in need of an automatic gc either.
> >
> > >         alternate: /home/uwe/var/gitstore/linux.git/objects
> >
> > You've got an alternate?  How well packed is it?  (What does "git
> > count-objects -v" in that other repo show?)
> >
...
>
> In the alternate I have:
>
>         ukl@dude.ptx:/ptx/src/git/linux.git/objects$ wgit count-objects -v
>         warning: garbage found: /ptx/work/user/git/linux.git/objects/pack/tmp_pack_X9gHnq
>         count: 5035

This is really close to the threshold of needing repacking, but still okay.

>         size: 40720
>         in-pack: 87083076
>         packs: 1108

1108 packs!?!?  This will make all kinds of operations slow.  This
explains your comment about operations with your original repo being
slow in general, and why you feel you need to do a warmup run first to
get a reasonable timing.  50 is the limit where repacking is deemed
necessary; you're 2116% beyond that point.  I've only seen repos with
pack counts near this level a couple times and they are excruciatingly
painful to deal with.
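
As a rough sketch of that arithmetic: the helper below reads the
"packs:" line of "git count-objects -v" and reports how far it is past
a limit.  The name pack_report is made up for illustration, and the
default of 50 is assumed to match git's default gc.autoPackLimit.

```shell
#!/bin/sh
# pack_report is a hypothetical helper, not part of git.
# It reads `git count-objects -v` output on stdin and reports how far the
# pack count is past a limit (default 50, assumed gc.autoPackLimit default).
pack_report() {
    limit="${1:-50}"
    awk -v limit="$limit" '
        $1 == "packs:" {
            over = ($2 - limit) * 100 / limit
            printf "packs=%d limit=%d over_by=%d%%\n", $2, limit, over
        }'
}

# Demo with the pack count quoted above; in a real repository you would run:
#   git count-objects -v | pack_report
printf 'packs: 1108\n' | pack_report 50
```

With the count from the alternate this prints
"packs=1108 limit=50 over_by=2116%".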

However, be careful not to use "git gc" or "git prune" in this repo,
since it's used as an alternate (doing so could corrupt the repos that
depend on this one).  Just use "git repack" with the appropriate flags
instead.
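
The flags are not named here, so the following is only one plausible
reading, not a definitive recipe: "git repack -a -d -k" collapses the
packs into one while keeping objects that this repo's own refs do not
reach, which the borrowing repos may still need.  The wrapper name and
the path are placeholders.

```shell
#!/bin/sh
# repack_alternate is a hypothetical wrapper; its path argument is a placeholder.
# Assumption: -a -d -k is a suitable flag set for a repo used as an alternate.
#   -a  pack everything referenced into a single new pack
#   -d  delete the packs made redundant by the new one
#   -k  (--keep-unreachable) append unreachable objects to the new pack
#       instead of dropping them, since borrowing repos may need them
repack_alternate() {
    git -C "$1" repack -a -d -k
}

# Usage (not run here):
#   repack_alternate /path/to/alternate.git
```

Running "git count-objects -v" again afterwards should show whether the
pack count actually dropped.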

>         size-pack: 51109693

51G.  Wow.  A fresh clone of linux is waaay smaller than that.  3 G, I
think?  I would have thought lots of your packs were small, but this
suggests you probably have lots of duplicate objects in these packs.

>         prune-packable: 3050
>         garbage: 1
>         size-garbage: 1112612

And 1 G of garbage that could just be deleted.

> I reran the script with -sort added:
>
>         ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
>         ...
>         rebase v5.10
>         ...
>         real    0m25.047s
>         user    0m17.652s
>         sys     0m5.802s
>         ...
>         rebase --onto v5.10 v5.4
>         ...
>         real    0m12.471s
>         user    0m7.854s
>         sys     0m4.413s
>
>         ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
>         ...
>         rebase v5.10
>         ...
>         real    0m22.180s
>         user    0m17.219s
>         sys     0m4.701s
>         ...
>         rebase --onto v5.10 v5.4
>         ...
>         real    0m12.341s
>         user    0m7.308s
>         sys     0m4.632s
>
> So -sort is quite a bit quicker, but the ~10s overhead when not using
> --onto is visible there, too.

Yeah, try adding --reapply-cherry-picks; I think that flag should
shrink most of the difference.

> When looking at the timing of the output, the 10s time difference occurs
> before "Rebasing (1/4)" is emitted.
>
>         wgit rebase -sort --onto v5.10 v5.10
>
> behaves like
>
>         wgit rebase -sort v5.10
>
> and if I only rebase the first two patches (instead of four) it still
> takes nearly the same time. Another test I did was:
>
>         time wgit rebase -sort --onto v5.10 v5.7
>
>         real    0m17.712s
>         user    0m11.570s
>         sys     0m5.396s
>
> So there seems to be something before the actual rebase is done that
> takes longer when HEAD..$base contains more objects.
> Given that
>
>         ukl@dude.ptx:~/gsrc/linux$ time wgit log --oneline --cherry v5.10...0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
>         + 0091ecb84cfd (ptx/ukl/rebase-timing) nvmem: core: skip child nodes not matching binding
>         + 38af1d38c542 spidev: add "hxxxxxxx,xxxxxx" compatible
>         + a7edcfb6a968 regmap: fix memory leak in regmap_debugfs_init()
>         + b1d90bc89408 pci: add quirk for txxxxx FPGA watchdog
>
>         real    0m10.783s
>         user    0m10.346s
>         sys     0m0.436s
>
> I guess this range is searched for commits that have the same patch id
> as the patches to rebase?

Yep, and --reapply-cherry-picks removes this cherry-searching.  Try it
and see how it affects your results.  I don't think it'll entirely
eliminate the differences for you (it didn't for me), because there
appears to be some other weird overhead -- part of it from
can_fast_forward() and more that I didn't track down further.  I do
think that the --reapply-cherry-picks will remove most of the
differences for you, though.
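
To try this, the two timed runs from the script can be repeated with
--reapply-cherry-picks added.  A minimal sketch, with assumptions:
time_rebase is a made-up helper, plain "git" stands in for the wgit
wrapper, and the timing uses whole seconds from date(1) rather than the
shell's "time".

```shell
#!/bin/sh
# time_rebase is a hypothetical helper: detach HEAD at a start commit,
# run `git rebase` with the given arguments, and print elapsed seconds.
time_rebase() {
    start="$1"
    shift
    git checkout --quiet --detach "$start" || return 1
    echo "rebase $*"
    t0=$(date +%s)
    git rebase "$@"
    status=$?
    t1=$(date +%s)
    echo "elapsed: $((t1 - t0))s"
    return "$status"
}

# Usage with the commit and tags from the script above (not run here):
#   time_rebase 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5 --reapply-cherry-picks v5.10
#   time_rebase 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5 --reapply-cherry-picks --onto v5.10 v5.4
```

If the cherry search is the main cost, the two elapsed times should end
up much closer together than in the runs above.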

> FTR: In the above repo I have:
>
>         ukl@dude.ptx:~/gsrc/linux$ wgit config merge.renameLimit
>         10000

Yep, so my choice of 9999 to try to reproduce your behavior was a
pretty good pick, eh?  :-)


Hope that helps,
Elijah
