* VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
@ 2010-07-13  6:56 Marat Radchenko
  2010-07-13  8:12 ` Michael J Gruber
  2010-10-13  7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
  0 siblings, 2 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-07-13  6:56 UTC (permalink / raw)
  To: git

Hi.

My setup:
0. Quad-core machine with 8GB of RAM, 10K RPM HDD.
1. SVN repo that I periodically fetch into the origin/trunk branch. It
gets ~200 commits/day.
2. My local branch with 1-5 commits, which I often rebase against trunk.
3. I haven't rebased for 2 days, so I'm rebasing 3 (three) commits in my
branch over 453 commits in trunk using "git rebase trunk".
4. trunk contains files that are "bad" from a diff point of view (big
and binary).
5. Sadly, the data in the repo is confidential.

Expected: rebase takes some reasonable amount of time (< 1 min?).

Actual: rebase takes 20 mins.

Almost all of that time was spent running
`git format-patch -k --stdout --full-index --ignore-if-in-upstream
80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52`
(that's the three commits from my branch) at 100% of one CPU core.

Additional info:

Another similar rebase, but over 4.5k commits, took 2 hours.

Running without --ignore-if-in-upstream:
$ time git format-patch -k --stdout --full-index \
    80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf5 | wc -l
25823

real	0m0.163s
user	0m0.140s
sys	0m0.020s

Proof there are only three commits:

$ git rev-list \
    80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52
d3fde4ae7497981a6fe61b0366b105477896cf52
e18069258806bda6a6165822003f5e9fd958f906
c8c2f2e157e615b73d0baab1d793a22991c9ba71

Questions:
1. Is this expected behavior (the branch you rebase onto has binary
files -> no performance for you)?
2. If the answer to [1] is yes, is it possible to prevent rebase from
passing --ignore-if-in-upstream?
3. If the answer to [1] is no, should I run some kind of profiler (how?)
to determine what exactly causes such a performance drop?


* Re: VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
  2010-07-13  6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
@ 2010-07-13  8:12 ` Michael J Gruber
  2010-07-13  8:13   ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
  2010-10-13  7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
  1 sibling, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13  8:12 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

Marat Radchenko venit, vidit, dixit 13.07.2010 08:56:
> Questions:
> 1. Is it expected behavior (branch you rebase onto has binary files -> no 
> performance for you)?

Well, with "ignore-if-in-upstream" git has to compute a patch-id for
every upstream patch (merge-base..upstream) and compare it to the ids
of the commits in merge-base..HEAD.
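
The comparison described above can be approximated by hand with
git patch-id. This is only a sketch under stated assumptions: it is not
what format-patch runs internally, and the helper names (patch_ids,
not_in_upstream, compare_ranges) are made up for illustration.

```shell
#!/bin/sh
# Sketch: emulate the "ignore if in upstream" filter with git patch-id.
# All function names here are hypothetical, not part of git.

# Print "<patch-id> <commit>" for every non-merge commit in a range.
patch_ids () {
	git log -p --no-merges "$1" | git patch-id
}

# $1 and $2 are files of "<patch-id> <commit>" lines. Print the commits
# from $2 whose patch-id does not occur in $1.
not_in_upstream () {
	awk -v up="$1" '
		BEGIN {
			# Load every upstream patch-id into a lookup table.
			while ((getline line < up) > 0) {
				split(line, f, " ")
				seen[f[1]] = 1
			}
		}
		!($1 in seen) { print $2 }
	' "$2"
}

# $1 = upstream ref, $2 = local ref (run inside a repository).
compare_ranges () {
	base=$(git merge-base "$1" "$2") || return 1
	up=$(mktemp) && mine=$(mktemp) || return 1
	patch_ids "$base..$1" > "$up"
	patch_ids "$base..$2" > "$mine"
	not_in_upstream "$up" "$mine"
	rm -f "$up" "$mine"
}
```

Running something like "compare_ranges trunk HEAD" lists the local
commits that have no equivalent upstream. The expensive part is the
first patch_ids call, which has to diff every upstream commit in the
range, however few local commits there are.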

> 2. If [1] is yes, is it possible to prevent rebase from running
> --ignore-if-in-upstream?

Not currently, but with my upcoming patch ;)

This has the (side-) effect of not ignoring patches which have been
applied (with different sha1) upstream, of course.

> 3. If [1] is no, should i run some kind of profiler (how?) to determine what 
> exactly causes such performance drop?

It is the calculation of the patch-ids. Git first creates a "binary
diff" and then computes the patch-id (sha1) of that diff. I am sure we
could optimize the calculation of patch-ids for binary diffs, which may
be useful in addition to shutting off "cherry" with rebase.

Michael


* [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13  8:12 ` Michael J Gruber
@ 2010-07-13  8:13   ` Michael J Gruber
  2010-07-13 19:33     ` Erik Faye-Lund
  0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13  8:13 UTC (permalink / raw)
  To: git; +Cc: Marat Radchenko

git-rebase uses "format-patch --ignore-if-in-upstream" to determine
which commits to apply. This may or may not be desired: a user may want
to transplant all commits, or may opt to avoid the possibly
time-consuming calculation of patch-ids.

Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
--no-cherry options (to override the config), where --cherry means the
current behavior and --no-cherry avoids "--ignore-if-in-upstream".

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
---
RFC because documentation and tests are still missing.

 git-rebase.sh |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index ab4afa7..1eb6ad1 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -53,6 +53,7 @@ git_am_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
+cherry=$(git config --bool rebase.cherry)
 
 continue_merge () {
 	test -n "$prev_head" || die "prev_head must be defined"
@@ -307,6 +308,12 @@ do
 		esac
 		do_merge=t
 		;;
+	--cherry)
+		cherry=true
+		;;
+	--no-cherry)
+		cherry=false
+		;;
 	-n|--no-stat)
 		diffstat=
 		;;
@@ -540,9 +547,16 @@ else
 	revisions="$upstream..$orig_head"
 fi
 
+if test "x$cherry" = "xfalse"
+then
+	cherry_opt=""
+else
+	cherry_opt="--ignore-if-in-upstream"
+fi
+
 if test -z "$do_merge"
 then
-	git format-patch -k --stdout --full-index --ignore-if-in-upstream \
+	git format-patch -k --stdout --full-index $cherry_opt \
 		$root_flag "$revisions" |
 	git am $git_am_opt --rebasing --resolvemsg="$RESOLVEMSG" &&
 	move_to_original_branch
-- 
1.7.2.rc1.212.g850a
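
The precedence implemented by the patch above (config value first, a
command-line option overriding it, anything other than "false" keeping
the current behavior) can be tried in isolation. The sketch below is an
illustration only: choose_cherry_opt is a hypothetical helper, not part
of git-rebase.sh, and rebase.cherry, --cherry and --no-cherry exist
only with this RFC patch applied.

```shell
#!/bin/sh
# Sketch of the option precedence from the RFC patch.
# $1 is the value "git config --bool rebase.cherry" would print (it may
# be empty); the remaining arguments stand in for rebase's command line.
choose_cherry_opt () {
	cherry=$1
	shift
	for arg
	do
		case "$arg" in
		--cherry) cherry=true ;;
		--no-cherry) cherry=false ;;
		esac
	done
	# Only an explicit "false" drops the flag; unset means the default.
	if test "x$cherry" = "xfalse"
	then
		echo ""
	else
		echo "--ignore-if-in-upstream"
	fi
}

choose_cherry_opt ""                # config unset: flag is kept
choose_cherry_opt false             # config off: flag is dropped
choose_cherry_opt false --cherry    # command line overrides config
choose_cherry_opt true --no-cherry  # command line overrides config
```

Treating every value except "false" as enabled means that an unset
rebase.cherry leaves existing behavior unchanged.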


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13  8:13   ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
@ 2010-07-13 19:33     ` Erik Faye-Lund
  2010-09-04 15:03       ` Michael J Gruber
  0 siblings, 1 reply; 7+ messages in thread
From: Erik Faye-Lund @ 2010-07-13 19:33 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Marat Radchenko

s/of/off/ in the subject ;)

On Tue, Jul 13, 2010 at 10:13 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> git-rebase uses "format-patch --ignore-if-in-upstream" do determine
> which commits to apply. This may or may not be desired: a user may want
> to transplant all commits, or may opt to avoid the possibly time
> consuming calculation of patch-ids.
>
> Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
> --no-cherry options (to override the config), where --cherry means the
> current behavior and --no-cherry avoids "--ignore-if-in-upstream".
>
> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
> ---
> RFC for obvious reasons (doc, tests).

-- 
Erik "kusma" Faye-Lund


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13 19:33     ` Erik Faye-Lund
@ 2010-09-04 15:03       ` Michael J Gruber
  2010-09-09  8:05         ` Marat Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-09-04 15:03 UTC (permalink / raw)
  To: kusmabite; +Cc: Erik Faye-Lund, git, Marat Radchenko, Junio C Hamano

Erik Faye-Lund venit, vidit, dixit 13.07.2010 21:33:
> s/of/off/ in the subject ;)
> 
> On Tue, Jul 13, 2010 at 10:13 AM, Michael J Gruber
> <git@drmicha.warpmail.net> wrote:
>> git-rebase uses "format-patch --ignore-if-in-upstream" do determine
>> which commits to apply. This may or may not be desired: a user may want
>> to transplant all commits, or may opt to avoid the possibly time
>> consuming calculation of patch-ids.
>>
>> Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
>> --no-cherry options (to override the config), where --cherry means the
>> current behavior and --no-cherry avoids "--ignore-if-in-upstream".
>>
>> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
>> ---
>> RFC for obvious reasons (doc, tests).
> 

Pinging this one. Is there any interest? Erik is right, off course ;)

Michael


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-09-04 15:03       ` Michael J Gruber
@ 2010-09-09  8:05         ` Marat Radchenko
  0 siblings, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-09-09  8:05 UTC (permalink / raw)
  To: Michael J Gruber, kusmabite; +Cc: Erik Faye-Lund, git, Junio C Hamano

> Pinging this one. Is there any interest? Erik is right, off course ;)

There definitely is. Since [1], rebasing has become much faster (minutes
instead of tens of minutes), though it still takes longer than I'd like.

[1]: http://repo.or.cz/w/git.git/commit/34597c1f5a77c710dae33092cb8a7cb01c6b21c1


* [FEATURE REQUEST] allow enabling patience diff algorithm by default
  2010-07-13  6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
  2010-07-13  8:12 ` Michael J Gruber
@ 2010-10-13  7:56 ` Marat Radchenko
  1 sibling, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-10-13  7:56 UTC (permalink / raw)
  To: git


I observe the patience algorithm being several times faster than the
standard diff on some big (1MB < size < 10MB) text files (and it
actually produces smaller diffs). So using patience diff is likely to
improve git-rev-list performance.

Suggested way: add an option to ~/.gitconfig to enable patience diff by
default. Additionally, something like --no-patience may be added to
commands that accept --patience now, so that the setting can be
overridden if needed.
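
A comparison like the one described above can be reproduced with a
small helper. This is a sketch, assuming a Git release that supports
the --diff-algorithm option (later than the versions discussed in this
thread); compare_diff_algorithms and its revision arguments are
placeholders, not existing Git commands.

```shell
#!/bin/sh
# Sketch: compare the size of the diff between two revisions under the
# default (myers) algorithm and under patience diff.
# Assumes it runs inside a Git repository.
compare_diff_algorithms () {
	old=$1
	new=$2
	for algo in myers patience
	do
		# $(( ... )) strips the padding some wc implementations add.
		lines=$(( $(git diff --diff-algorithm="$algo" "$old" "$new" | wc -l) ))
		echo "$algo: $lines lines"
	done
}
```

Calling "compare_diff_algorithms HEAD~1 HEAD" prints one line per
algorithm. To compare speed as well, the underlying git diff commands
can be prefixed with "time" as in the measurements earlier in this
thread. Later Git releases also accept a diff.algorithm setting (for
example "git config --global diff.algorithm patience"), which covers
the ~/.gitconfig part of this request.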
