* [RFC] Speeding up a null fetch
@ 2007-02-11 23:32 Julian Phillips
2007-02-11 23:49 ` Johannes Schindelin
2007-02-11 23:52 ` Shawn O. Pearce
0 siblings, 2 replies; 5+ messages in thread
From: Julian Phillips @ 2007-02-11 23:32 UTC
To: git
I was investigating replacing an existing subversion setup with git, and
was mostly pleased with the results - until it came to trying to update a
clone ... which took very much longer than the original clone.
An artificial test repository that has similar features (~25000 commits,
~8000 tags, ~900 branches and a 2.5Gb packfile) when running locally
takes ~20m to clone and ~48m to fetch (with no new commits in the
original repository - i.e. the fetch does not update anything) with a
current code base (i.e. newer than 1.5.0-rc4). As a side note,
performance was actually better with an older version - packed refs
makes things quite a bit worse (clone was only ~30m with 1.4 IIRC).
Investigation showed that the main culprit seemed to be show-ref
having to build a sorted list of all refs for every ref that was being
checked. So I used the patch below to reduce this to a single call to
show-ref (unless the ref had been updated). With this patch the fetch
time dropped to just under 1m - obviously quite a lot faster (better
than I expected in fact).
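A minimal sketch of the batching idea described above, written in Python 3 (the patch itself uses Python 2) and working on plain text so it can be run without a repository. The function names `parse_show_ref`, `annotate_fetched` and `unchanged` are made up for this illustration; the only assumption about git is that `git show-ref` prints one "<sha> <ref>" pair per line.

```python
def parse_show_ref(output):
    """Build a {ref name: sha} dict from show-ref style "<sha> <ref>" lines."""
    refs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, ref = line.split(" ", 1)
        refs[ref] = sha
    return refs


def annotate_fetched(fetched, ref_map, local_refs):
    """Yield (sha, remote_ref, local_sha) for each fetched (sha, remote_ref).

    ref_map maps remote ref names to local ref names. local_sha is "-" when
    the remote ref has no mapping or the local ref does not exist yet.
    """
    for sha, remote_ref in fetched:
        local_ref = ref_map.get(remote_ref)
        yield sha, remote_ref, local_refs.get(local_ref, "-")


def unchanged(rows):
    """Return the remote refs whose fetched sha equals the local sha."""
    return [ref for sha, ref, local_sha in rows if sha == local_sha]
```

The point of the design is that the ref list is read once into a dict, so each of the several thousand comparisons becomes a dictionary lookup rather than a fresh listing of every ref.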
However, this seems more band-aid than fix, and I wondered if someone
more familiar with the git internals could point me in the right
direction for a better fix, e.g. should I look at rewriting fetch in C?
diff --git a/Makefile b/Makefile
index 5d31e6d..6baf043 100644
--- a/Makefile
+++ b/Makefile
@@ -120,7 +120,7 @@ ALL_CFLAGS = $(CFLAGS)
ALL_LDFLAGS = $(LDFLAGS)
STRIP ?= strip
-prefix = $(HOME)
+prefix = $(HOME)/git
bindir = $(prefix)/bin
gitexecdir = $(bindir)
template_dir = $(prefix)/share/git-core/templates/
@@ -188,7 +188,7 @@ SCRIPT_PERL = \
SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
$(patsubst %.perl,%,$(SCRIPT_PERL)) \
- git-cherry-pick git-status git-instaweb
+ git-cherry-pick git-status git-instaweb git-ref-diff.py
# ... and all the rest that could be moved out of bindir to gitexecdir
PROGRAMS = \
diff --git a/git-fetch.sh b/git-fetch.sh
index 357cac2..ce135a5 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -108,11 +108,12 @@ ls_remote_result=$(git ls-remote $exec "$remote") ||
append_fetch_head () {
head_="$1"
- remote_="$2"
- remote_name_="$3"
- remote_nick_="$4"
- local_name_="$5"
- case "$6" in
+ local_head_="$2"
+ remote_="$3"
+ remote_name_="$4"
+ remote_nick_="$5"
+ local_name_="$6"
+ case "$7" in
t) not_for_merge_='not-for-merge' ;;
'') not_for_merge_= ;;
esac
@@ -151,10 +152,15 @@ append_fetch_head () {
echo "$head_ not-for-merge $note_" >>"$GIT_DIR/FETCH_HEAD"
fi
- update_local_ref "$local_name_" "$head_" "$note_"
+ update_local_ref "$local_name_" "$head_" "$note_" "$local_head_"
}
update_local_ref () {
+ if [ "$2" = "$4" ]; then
+ [ "$verbose" ] && echo >&2 "* $1: same as $3"
+ return 0
+ fi
+
# If we are storing the head locally make sure that it is
# a fast forward (aka "reverse push").
@@ -392,7 +398,7 @@ fetch_main () {
(
git-fetch-pack --thin $exec $keep $shallow_depth "$remote" $rref ||
echo failed "$remote"
- ) |
+ ) | git-ref-diff.py "$reflist" |
(
trap '
if test -n "$keepfile" && test -f "$keepfile"
@@ -402,7 +408,7 @@ fetch_main () {
' 0
keepfile=
- while read sha1 remote_name
+ while read sha1 remote_name local_sha1
do
case "$sha1" in
failed)
@@ -441,7 +447,7 @@ fetch_main () {
esac
done
local_name=$(expr "z$found" : 'z[^:]*:\(.*\)')
- append_fetch_head "$sha1" "$remote" \
+ append_fetch_head "$sha1" "$local_sha1" "$remote" \
"$remote_name" "$remote_nick" "$local_name" \
"$not_for_merge" || exit
done
diff --git a/git-ref-diff.py b/git-ref-diff.py
new file mode 100755
index 0000000..2b30e4c
--- /dev/null
+++ b/git-ref-diff.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python
+
+import os
+import re
+import sys
+
+ref_map_re = re.compile("^\.?\+?(?P<remote>.*?):(?P<local>.*)$")
+
+refs = {}
+refsp = os.popen("git-show-ref")
+for ref in refsp.readlines():
+ (sha, ref) = ref.strip().split(' ')
+ refs[ref] = sha
+refsp.close()
+
+ref_map = {}
+for line in sys.argv[1].split('\n'):
+ ref_map_m = ref_map_re.search(line)
+ if ref_map_m:
+ remote = ref_map_m.group('remote')
+ local = ref_map_m.group('local')
+ ref_map[remote] = local
+
+while True:
+ try:
+ (sha, ref) = raw_input().split(' ')
+ except EOFError:
+ sys.exit(0)
+ lref = ref_map.get(ref, None)
+ if refs.has_key(lref):
+ print "%s %s %s" % (sha, ref, refs[lref])
+ else:
+ print "%s %s -" % (sha, ref)
--
Julian
---
Why bother building any more nuclear warheads until we use the ones we have?
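For readers following the patch, here is a small Python 3 sketch of what the `ref_map_re` pattern in git-ref-diff.py does with the `$reflist` argument. The pattern is copied from the patch; the wrapper function `parse_ref_map` and the sample refspecs are assumptions for illustration. The optional leading `.` and `+` appear to be the not-for-merge and forced-update markers used by git-fetch.sh at the time, but treat that reading as an assumption too.

```python
import re

# Same pattern as in the patch, written as a raw string.
REF_MAP_RE = re.compile(r"^\.?\+?(?P<remote>.*?):(?P<local>.*)$")


def parse_ref_map(reflist):
    """Map remote ref names to local ref names from newline-separated refspecs.

    Lines without a colon do not match the pattern and are skipped.
    """
    mapping = {}
    for line in reflist.split("\n"):
        m = REF_MAP_RE.search(line)
        if m:
            mapping[m.group("remote")] = m.group("local")
    return mapping
```

Because the remote part is matched lazily, the split happens at the first colon on each line, and any leading marker characters are dropped from the remote name.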
* Re: [RFC] Speeding up a null fetch
2007-02-11 23:32 [RFC] Speeding up a null fetch Julian Phillips
@ 2007-02-11 23:49 ` Johannes Schindelin
2007-02-12 0:14 ` Julian Phillips
2007-02-11 23:52 ` Shawn O. Pearce
1 sibling, 1 reply; 5+ messages in thread
From: Johannes Schindelin @ 2007-02-11 23:49 UTC
To: Julian Phillips; +Cc: git
Hi,
On Sun, 11 Feb 2007, Julian Phillips wrote:
> An artificial test repository that has similar features (~25000 commits,
> ~8000 tags, ~900 branches and a 2.5Gb packfile) when running locally
> takes ~20m to clone and ~48m to fetch (with no new commits in the
> original repository - i.e. the fetch does not update anything) with a
> current code base (i.e. newer than 1.5.0-rc4).
Ouch.
I hope you packed the refs?
BTW your patch
- was not minimal (and therefore it takes longer than necessary to find
what you actually fixed),
- it does not show where and how the call to show-ref is avoided (I
eventually understand that you avoid calling update_local_ref early, but
you sure could have made that easier), and
- it uses Pythong.
Also, it touches a quite core part of git, which will hopefully be
replaced by a builtin _after_ 1.5.0.
> However, this seems more band-aid than fix, and I wondered if someone
> more familiar with the git internals could point me in the right
> direction for a better fix, e.g. should I look at rewriting fetch in C?
Look into the "pu" branch of git. There are the beginnings of a builtin
(written in C) fetch.
But this _will_ have to wait until after 1.5.0.
Ciao,
Dscho
* Re: [RFC] Speeding up a null fetch
2007-02-11 23:32 [RFC] Speeding up a null fetch Julian Phillips
2007-02-11 23:49 ` Johannes Schindelin
@ 2007-02-11 23:52 ` Shawn O. Pearce
2007-02-12 0:18 ` Julian Phillips
1 sibling, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2007-02-11 23:52 UTC
To: Julian Phillips; +Cc: git
Julian Phillips <julian@quantumfyre.co.uk> wrote:
> Investigation showed that the main culprit seemed to be show-ref
> having to build a sorted list of all refs for every ref that was being
> checked. So I used the patch below to reduce this to a single call to
> show-ref (unless the ref had been updated). With this patch the fetch
> time dropped to just under 1m - obviously quite a lot faster (better
> than I expected in fact).
Have a look at the `pu` branch in git.git. Junio has done some
work in this area to handle 1000 refs better:
...
commit 58fef67cb067b6dee8f94b7b0e0c1a2d324e3505
Author: Junio C Hamano <junkio@cox.net>
Date: Tue Jan 16 02:31:36 2007 -0800
git-fetch: rewrite another shell loop in C
Move another shell loop that canonicalizes the list of refs for
underlying git-fetch-pack and fetch-native-store into C.
This seems to shave the runtime for the same 1000 branch
repository from 30 seconds down to 15 seconds (it used to be 2
and half minutes with the original version).
Signed-off-by: Junio C Hamano <junkio@cox.net>
commit 3fc3729cd08e9d40dad54ccdd4db53900eca197b
Author: Junio C Hamano <junkio@cox.net>
Date: Tue Jan 16 01:53:29 2007 -0800
git-fetch: move more code into C.
This adds "native-store" subcommand to git-fetch--tool to
move a huge loop implemented in shell into C. This shaves about
70% of the runtime to fetch and update 1000 tracking branches
with a single fetch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
...
> However, this seems more band-aid than fix, and I wondered if someone
> more familiar with the git internals could point me in the right
> direction for a better fix, e.g. should I look at rewriting fetch in C?
Rewriting fetch in C is a lot of work, not just in developing it,
but in testing that all existing functionality is preserved and no
new bugs are introduced. Rewriting some of the performance critical
parts perhaps makes sense. Rewriting them in Python doesn't, as
we no longer have any Python dependency, and would like to keep it
that way (actually, some folks are also trying to remove the Perl
dependency from some of our critical tools).
--
Shawn.
* Re: [RFC] Speeding up a null fetch
2007-02-11 23:49 ` Johannes Schindelin
@ 2007-02-12 0:14 ` Julian Phillips
0 siblings, 0 replies; 5+ messages in thread
From: Julian Phillips @ 2007-02-12 0:14 UTC
To: Johannes Schindelin; +Cc: git
On Mon, 12 Feb 2007, Johannes Schindelin wrote:
> Hi,
>
> On Sun, 11 Feb 2007, Julian Phillips wrote:
>
>> An artificial test repository that has similar features (~25000 commits,
>> ~8000 tags, ~900 branches and a 2.5Gb packfile) when running locally
>> takes ~20m to clone and ~48m to fetch (with no new commits in the
>> original repository - i.e. the fetch does not update anything) with a
>> current code base (i.e. newer than 1.5.0-rc4).
>
> Ouch.
>
> I hope you packed the refs?
Unfortunately packing only makes things slower ... as it then becomes
impossible to access a particular ref directly, which some of the
calls to show-ref do.
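To illustrate the difference being described: a loose ref is one small file under the repository directory, so a single ref can be read with a single open(), whereas a ref that exists only in packed-refs has to be found by reading through that file. The sketch below is a rough Python 3 illustration, not git's actual lookup code. The function names are invented for the example, `git_dir` is assumed to be the path of a `.git` directory, and symbolic refs are ignored.

```python
import os


def read_loose_ref(git_dir, refname):
    """Return the sha stored in the loose ref file, or None if there is none."""
    path = os.path.join(git_dir, *refname.split("/"))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_packed_ref(git_dir, refname):
    """Scan packed-refs line by line for refname; return its sha or None."""
    try:
        with open(os.path.join(git_dir, "packed-refs")) as f:
            for line in f:
                # Skip the header comment and "^<sha>" peeled-tag lines.
                if line.startswith("#") or line.startswith("^"):
                    continue
                sha, _, name = line.rstrip("\n").partition(" ")
                if name == refname:
                    return sha
    except FileNotFoundError:
        pass
    return None


def resolve_ref(git_dir, refname):
    """Prefer the loose ref; fall back to scanning packed-refs."""
    sha = read_loose_ref(git_dir, refname)
    return sha if sha is not None else read_packed_ref(git_dir, refname)
```

With thousands of refs, repeating the packed-refs scan once per ref is where the cost adds up, which fits the slowdown reported above.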
>
> BTW your patch
> - was not minimal (and therefore it takes longer than necessary to find
> what you actually fixed),
> - it does not show where and how the call to show-ref is avoided (I
> eventually understand that you avoid calling update_local_ref early, but
> you sure could have made that easier), and
Ah yes, sorry. I seem to have managed to forget to include the paragraph
explaining what I had done ... :$
(That'll teach me to try doing too many things at once.)
> - it uses Pythong.
>
> Also, it touches a quite core part of git, which will hopefully be
> replaced by a builtin _after_ 1.5.0.
Indeed, I would never propose what I have done so far as a fix. I am
definitely still in the investigation phase.
>
>> However, this seems more band-aid than fix, and I wondered if someone
>> more familiar with the git internals could point me in the right
>> direction for a better fix, e.g. should I look at rewriting fetch in C?
>
> Look into the "pu" branch of git. There are the beginnings of a builtin
> (written in C) fetch.
Ah - this I didn't know. I shall have to have a play with that, I did
notice that there is internal caching of the ref list that might magically
solve the problem if fetch was a builtin (but I have a feeling that it
won't be that simple).
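As a rough picture of why an in-process cache could help, here is a Python 3 sketch of loading the ref list once and answering every later lookup from memory. This is only an illustration of the general idea, not git's internal C code; the class name `CachedRefs` and the `loader` callable are hypothetical.

```python
class CachedRefs:
    """Load the full ref list at most once, then answer lookups from memory."""

    def __init__(self, loader):
        # loader: hypothetical callable returning an iterable of (ref, sha) pairs
        self._loader = loader
        self._refs = None
        self.loads = 0  # how many times the expensive listing actually ran

    def get(self, refname):
        """Return the sha for refname, or None if the ref is unknown."""
        if self._refs is None:
            self._refs = dict(self._loader())
            self.loads += 1
        return self._refs.get(refname)
```

A shell script that spawns a new process per ref cannot share such a cache between calls, which is one reason a builtin fetch might behave differently.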
>
> But this _will_ have to wait until after 1.5.0.
I hope so. 1.5 is looking very nice, and I really don't think that many
people have such a stupidly large repository ...
>
> Ciao,
> Dscho
>
--
Julian
---
You are in a maze of little twisting passages, all alike.
* Re: [RFC] Speeding up a null fetch
2007-02-11 23:52 ` Shawn O. Pearce
@ 2007-02-12 0:18 ` Julian Phillips
0 siblings, 0 replies; 5+ messages in thread
From: Julian Phillips @ 2007-02-12 0:18 UTC
To: Shawn O. Pearce; +Cc: git
On Sun, 11 Feb 2007, Shawn O. Pearce wrote:
> Julian Phillips <julian@quantumfyre.co.uk> wrote:
>> Investigation showed that the main culprit seemed to be show-ref
>> having to build a sorted list of all refs for every ref that was being
>> checked. So I used the patch below to reduce this to a single call to
>> show-ref (unless the ref had been updated). With this patch the fetch
>> time dropped to just under 1m - obviously quite a lot faster (better
>> than I expected in fact).
>
> Have a look at the `pu` branch in git.git. Junio has done some
> work in this area to handle 1000 refs better:
>
> ...
> commit 58fef67cb067b6dee8f94b7b0e0c1a2d324e3505
> Author: Junio C Hamano <junkio@cox.net>
> Date: Tue Jan 16 02:31:36 2007 -0800
>
> git-fetch: rewrite another shell loop in C
>
> Move another shell loop that canonicalizes the list of refs for
> underlying git-fetch-pack and fetch-native-store into C.
>
> This seems to shave the runtime for the same 1000 branch
> repository from 30 seconds down to 15 seconds (it used to be 2
> and half minutes with the original version).
>
> Signed-off-by: Junio C Hamano <junkio@cox.net>
>
> commit 3fc3729cd08e9d40dad54ccdd4db53900eca197b
> Author: Junio C Hamano <junkio@cox.net>
> Date: Tue Jan 16 01:53:29 2007 -0800
>
> git-fetch: move more code into C.
>
> This adds "native-store" subcommand to git-fetch--tool to
> move a huge loop implemented in shell into C. This shaves about
> 70% of the runtime to fetch and update 1000 tracking branches
> with a single fetch.
>
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> ...
>
I shall have to see how this work fares with ~9000 refs ... but it
certainly sounds good.
>> However, this seems more band-aid than fix, and I wondered if someone
>> more familiar with the git internals could point me in the right
>> direction for a better fix, e.g. should I look at rewriting fetch in C?
>
> Rewriting fetch in C is a lot of work, not just in developing it,
> but in testing that all existing functionality is preserved and no
> new bugs are introduced. Rewriting some of the performance critical
> parts perhaps makes sense.
Indeed - this is why I asked rather than just diving in.
> Rewriting them in Python doesn't, as
> we no longer have any Python dependency, and would like to keep it
> that way (actually, some folks are also trying to remove the Perl
> dependency from some of our critical tools).
I only used python for speed of development, I was simply trying to verify
my suspicions. I certainly wouldn't expect a python script to get added
(having seen all the python scripts get replaced).
--
Julian
---
There are no accidents whatsoever in the universe.
-- Baba Ram Dass