git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>, entwicklung@pengutronix.de
Subject: Re: time needed to rebase shortend by using --onto?
Date: Fri, 28 May 2021 23:40:24 +0200	[thread overview]
Message-ID: <20210528214024.vw4huojcklrm6d27@pengutronix.de> (raw)
In-Reply-To: <CABPp-BEVME5Gx=F4HWHBb_0wn6XJF==DzVLo2i1xj63BB+_jtw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 15873 bytes --]

Hello Elijah,

On Thu, May 27, 2021 at 04:08:32PM -0700, Elijah Newren wrote:
> On Thu, May 27, 2021 at 2:59 PM Uwe Kleine-König
> <u.kleine-koenig@pengutronix.de> wrote:
> > On Wed, May 26, 2021 at 07:38:08AM -0700, Elijah Newren wrote:
> > > On Wed, May 26, 2021 at 3:13 AM Uwe Kleine-König
> > > <u.kleine-koenig@pengutronix.de> wrote:
> > > > I have a kernel topic branch containing 4 patches on top of Linux v5.4.
> > > > (I didn't speak to the affected customer, so I cannot easily share the
> > > > patch stack. If need be I can probably anonymize it or ask if I can
> > > > publish the patches.)
> > > >
> > > > It rebases clean on v5.10:
> > > >
> > > >         $ time git rebase v5.10
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Successfully rebased and updated detached HEAD.
> > > >
> > > >         real    3m47.841s
> > > >         user    1m25.706s
> > > >         sys     0m11.181s
> > > >
> > > > If I start with the same rev checked out and explicitly specify the
> > > > merge base, the rebase process is considerably faster:
> > > >
> > > >         $ time git rebase --onto v5.10 v5.4
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Performing inexact rename detection: 100% (36806539/36806539), done.
> > > >         Successfully rebased and updated detached HEAD.
> > > >
> > > >         real    1m20.588s
> > > >         user    1m12.645s
> > > >         sys     0m6.733s
> 
> Note: In your original report you had rename detection and it clearly
> took a significant amount of time...

FTR: My impression is that the repo I used for the first report is slow
in general. Also git log sometimes takes a considerable time to start
emitting output.

> > > > Is there some relevant complexity in the first invocation I'm not seeing
> > > > that explains it takes more than the double time? I would have expected
> > > > that
> > > >
> > > >         git rebase v5.10
> > > >
> > > > does the same as:
> > > >
> > > >         git rebase --onto v5.10 $(git merge-base HEAD v5.10)
> > > >
> > > > . (FTR:
> > > >
> > > >         $ time git merge-base HEAD v5.10
> > > >         219d54332a09e8d8741c1e1982f5eae56099de85
> > > >
> > > >         real    0m0.158s
> > > >         user    0m0.105s
> > > >         sys     0m0.052s
> > > >
> > > > , 219d5433 is v5.4 as expected.
> > >
> > > That does seem surprising, though if an automatic gc completed between
> > > the two commands that could certainly explain it.  If that theory is
> > > correct, it would suggest that it'd be difficult for you to reproduce;
> >
> > This reproduces just fine. The repository is quite big and it is slow at
> > times. With the same tree on a different machine, the rebase is quicker,
> > but the factor 2 between the two different commands is visible there,
> > too:
> >
> > uwe@taurus:~/gsrc/linux$ git checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> > HEAD is now at bc2e99c9c9e0 [...]
> >
> > uwe@taurus:~/gsrc/linux$ time git rebase v5.10
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m20.737s
> > user    0m14.188s
> > sys     0m3.767s
> >
> > uwe@taurus:~/gsrc/linux$ git checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> > HEAD is now at bc2e99c9c9e0 [...]
> >
> > uwe@taurus:~/gsrc/linux$ time git rebase --onto v5.10 v5.4
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8604 and retry the command.
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m12.129s
> > user    0m7.196s
> > sys     0m3.141s
> >
> > (This is with a slightly newer git: 2.30.2-1 from Debian)
> 
> And here, there was no rename detection so this isn't the same thing
> anymore.  You could try setting merge.renameLimit higher.

I learned a few things since my last mail, here comes an updated test
again on the machine and repo used for the initial report:

	ukl@dude.ptx:~/gsrc/linux$ wgit version
	git version 2.32.0.rc1

	ukl@dude.ptx:~/gsrc/linux$ cat rebasecheck 
	#!/bin/bash

	set -e 

	# do it once to heat the caches and ensure all objects are available already to have the next cycles identical.
	wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
	wgit rebase v5.10

	wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
	echo "rebase v5.10"
	time wgit rebase v5.10

	wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
	echo "rebase --onto v5.10 v5.4"
	time wgit rebase --onto v5.10 v5.4

I do the rebase now once before the timing for the reasons described in
the comment. The second identical command is quite a bit quicker. Also
now that the commands are scripted they are done in a smaller time frame
(which matters as the machine is used heavily among my colleagues and
me). I run the script a few times in a row, after all colleagues are in
their week-end:

	ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck 
	...
	rebase v5.10
	...
	real	1m13.579s
	user	1m2.919s
	sys	0m6.220s
	...
	rebase --onto v5.10 v5.4
	...
	real	1m2.852s
	user	0m53.780s
	sys	0m6.225s

	ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck 
	...
	rebase v5.10
	...
	real	1m10.816s
	user	1m3.344s
	sys	0m6.991s
	...
	rebase --onto v5.10 v5.4
	...
	real	0m59.695s
	user	0m53.510s
	sys	0m5.579s

	ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck 
	...
	rebase v5.10
	...
	real	1m9.688s
	user	1m3.346s
	sys	0m6.105s
	...
	rebase --onto v5.10 v5.4
	...
	real	0m59.981s
	user	0m52.931s
	sys	0m6.282s

So it's not a factor 2 any more, but still reproducibly quicker when
--onto is used.

> However, the 7-8 second difference (and the likely large differences
> between 5.4 and 5.10) do suggest that Junio's hunch that fork-point
> behavior being at play could be an issue in these two commands.
> 
> > Then I repeated the test with git 2.32.0-rc1 (wgit is just calling
> > bin-wrappers/git in my git working copy):
> >
> > uwe@taurus:~/gsrc/linux$ wgit version
> > git version 2.32.0.rc1
> >
> > uwe@taurus:~/gsrc/linux$ wgit checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> > HEAD is now at bc2e99c9c9e0 [...]
> >
> > uwe@taurus:~/gsrc/linux$ time wgit rebase v5.10
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m19.438s
> > user    0m13.629s
> > sys     0m3.299s
> >
> > uwe@taurus:~/gsrc/linux$ wgit checkout bc2e99c9c9e0d29494b1739624554e4f5f979d32
> > HEAD is now at bc2e99c9c9e0 [...]
> >
> > uwe@taurus:~/gsrc/linux$ time wgit rebase --onto v5.10 v5.4
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > warning: inexact rename detection was skipped due to too many files.
> > warning: you may want to set your merge.renamelimit variable to at least 8024 and retry the command.
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m13.848s
> > user    0m8.315s
> > sys     0m3.182s
> >
> > So the surprise persists.
> 
> Yeah, with no rename detection, the newer git version isn't going to
> make a bit of difference.
> 
> > > running again with either command would give you something closer to
> > > the lower time both times.  Is that the case?  (Also, what's the
> > > output of "git count-objects -v"?)
> >
> > After the above commands I have:
> >
> >         count: 3203
> >         size: 17664
> >         in-pack: 4763753
> >         packs: 11
> >         size-pack: 1273957
> >         prune-packable: 19
> >         garbage: 0
> >         size-garbage: 0
> 
> So, not freshly packed, but not in need of an automatic gc either.
> 
> >         alternate: /home/uwe/var/gitstore/linux.git/objects
> 
> You've got an alternate?  How well packed is it?  (What does "git
> count-objects -v" in that other repo show?)
> 
> >
> > (On the repository I did this initially I have:
> >
> >         warning: garbage found: .git/objects/pack/pack-864148a84c0524073ed8c8aa1a76155d5c677879.pack.temp
> >         warning: garbage found: /ptx/src/git/linux.git/objects/pack/tmp_pack_X9gHnq
> >         count: 2652
> >         size: 14640
> >         in-pack: 2117015
> >         packs: 8
> >         size-pack: 574167
> >         prune-packable: 856
> >         garbage: 2
> >         size-garbage: 1114236
> >         alternate: /ptx/src/git/linux.git/objects

In the alternate I have:

	ukl@dude.ptx:/ptx/src/git/linux.git/objects$ wgit count-objects -v
	warning: garbage found: /ptx/work/user/git/linux.git/objects/pack/tmp_pack_X9gHnq
	count: 5035
	size: 40720
	in-pack: 87083076
	packs: 1108
	size-pack: 51109693
	prune-packable: 3050
	garbage: 1
	size-garbage: 1112612

The alternate tracks

	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
	git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git
	git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

(only the tags for the two latter).

> > (Is the garbage a reason this is so slow? Can I just remove the two
> > files pointed out?)
> 
> If there isn't some still-running git operation that is fetching and
> writing to these files, then yes they can be cleaned out.  I doubt
> they'd make too much of a difference, though.  I was more curious if
> you went from say 10000 loose objects to ~0, or from 50+ packs down to
> 1 between operations due to an automatic gc completing.
> 
> > > I'd love to try this with git-2.32.0-rc1 (or even my not-yet-upstream
> > > patches that optimize even further) with adding "--strategy=ort" to
> > > your rebase command to see how much of a timing difference it makes.
> > > Any chance the patches could either be published, or you could retry
> > > with git-2.32.0-rc1 and add the --strategy=ort command line option to
> > > your rebase command(s)?
> >
> > With --strategy=ort added I have:
> >
> > uwe@taurus:~/gsrc/linux$ time wgit rebase --strategy=ort v5.10
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m19.202s
> > user    0m12.724s
> > sys     0m2.961s
> >
> > [...]
> >
> > uwe@taurus:~/gsrc/linux$ time wgit rebase --strategy=ort --onto v5.10 v5.4
> > Successfully rebased and updated detached HEAD.
> >
> > real    0m12.395s
> > user    0m6.638s
> > sys     0m3.284s
> >
> > So the warnings about inexact rename detection don't appear and it's a
> > bit faster, but I still see the timing difference between these two
> > commands.
> 
> Right, this says that --strategy=ort WITH rename detection is as fast
> as the default --strategy=recursive WITHOUT rename detection.

I rerun the script with -sort added:

	ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
	...
	rebase v5.10
	...
	real	0m25.047s
	user	0m17.652s
	sys	0m5.802s
	...
	rebase --onto v5.10 v5.4
	...
	real	0m12.471s
	user	0m7.854s
	sys	0m4.413s

	ukl@dude.ptx:~/gsrc/linux$ bash rebasecheck
	...
	rebase v5.10
	...
	real	0m22.180s
	user	0m17.219s
	sys	0m4.701s
	...
	rebase --onto v5.10 v5.4
	...
	real	0m12.341s
	user	0m7.308s
	sys	0m4.632s

So -sort is quite a bit quicker, but the ~10s overhead when not using
--onto is visible there, too.
  
When looking at the timing of the output, the 10s time difference occur
before "Rebasing (1/4)" is emitted.

	wgit rebase -sort --onto v5.10 v5.10

behaves like

	wgit rebase -sort v5.10

and if I only rebase the first two patches (instead of four) it still
takes nearly the same time. Another test I did was:

	time wgit rebase -sort --onto v5.10 v5.7

	real	0m17.712s
	user	0m11.570s
	sys	0m5.396s

So there seems to be something before the actual rebase is done that
takes longer when HEAD..$base contains more objects.
Given that

	ukl@dude.ptx:~/gsrc/linux$ time wgit log --oneline --cherry v5.10...0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
	+ 0091ecb84cfd (ptx/ukl/rebase-timing) nvmem: core: skip child nodes not matching binding
	+ 38af1d38c542 spidev: add "hxxxxxxx,xxxxxx" compatible
	+ a7edcfb6a968 regmap: fix memory leak in regmap_debugfs_init()
	+ b1d90bc89408 pci: add quirk for txxxxx FPGA watchdog

	real	0m10.783s
	user	0m10.346s
	sys	0m0.436s

I guess this range is searched for commits that have the same patch id
as the patches to rebase?

> It's not a fair comparison (you'd need to set merge.renameLimit higher
> and re-run the cases where you had warnings), but is interesting
> nonetheless.  It basically suggests that rename detection comes for
> free with the ort strategy.

FTR: In the above repo I have:

	ukl@dude.ptx:~/gsrc/linux$ wgit config merge.renameLimit
	10000

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2021-05-28 21:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 10:09 time needed to rebase shortend by using --onto? Uwe Kleine-König
2021-05-26 11:04 ` Bagas Sanjaya
2021-05-26 14:38 ` Elijah Newren
2021-05-27 21:59   ` Uwe Kleine-König
2021-05-27 22:15     ` Uwe Kleine-König
2021-05-28  5:38       ` Elijah Newren
2021-05-27 23:08     ` Elijah Newren
2021-05-28 21:40       ` Uwe Kleine-König [this message]
2021-05-28 22:26         ` Elijah Newren
2021-05-29 16:59         ` Felipe Contreras
2021-05-26 22:18 ` Junio C Hamano
2021-05-27 22:16   ` Uwe Kleine-König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210528214024.vw4huojcklrm6d27@pengutronix.de \
    --to=u.kleine-koenig@pengutronix.de \
    --cc=entwicklung@pengutronix.de \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).