From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Cc: "Christian Couder" <chriscool@tuxfamily.org>,
"Thomas Rast" <trast@student.ethz.ch>,
"René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Julian Phillips" <julian@quantumfyre.co.uk>,
"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: Re: Git is not scalable with too many refs/*
Date: Sat, 8 Oct 2011 14:59:51 -0600
Message-ID: <201110081459.52174.mfick@codeaurora.org>
In-Reply-To: <201109301606.31748.mfick@codeaurora.org>
On Friday, September 30, 2011 04:06:31 pm Martin Fick wrote:
> On Friday, September 30, 2011 03:02:30 pm Martin Fick wrote:
> > On Friday, September 30, 2011 10:41:13 am Martin Fick wrote:
> > Since a full sync is now down to about 5mins, I broke
> > down the output a bit. It appears that the longest
> > part (2:45m) is now the time spent scrolling through
> > each change still. Each one of these takes about 2ms:
> >
> > * [new branch] refs/changes/99/71199/1 -> refs/changes/99/71199/1
> >
> > Seems fast, but at about 80K... So, are there any
> > obvious N loops over the refs happening inside each
> > of the [new branch] iterations?
>
> OK, I narrowed it down I believe. If I comment out the
> invalidate_cached_refs() line in write_ref_sha1(), it
> speeds through this section.
>
> I guess this makes sense, we invalidate the cache and
> have to rebuild it after every new ref is added?
> Perhaps a simple fix would be to move the invalidation
> right after all the refs are updated? Maybe
> write_ref_sha1 could take in a flag to tell it to not
> invalidate the cache so that during iterative updates it
> could be disabled and then run manually after the
> update?
OK, this thing has been bugging me...
I found some more surprising results. I hope you can follow,
because there are corner cases here which have surprising
impacts.
** Important fact:
** ---------------
** When I clone my repo, it has about 4K tags which
** come in packed to the clone.
**
This fact has a heavy impact on how I test things. If I
choose to delete these packed-refs from the cloned repo and
then do a fetch of the changes, all of the tags are also
fetched along with these changes. This means that if I want
to test the impact of having packed-refs vs no packed refs,
on my change fetches, I need to first delete the packed-refs
file, and second fetch all the tags again, so that when I
fetch the changes, the repo only actually fetches changes,
not all the tags!
So, with this in mind, I have discovered that the fetch
performance degradation caused by invalidating the caches in
write_ref_sha1() is actually due to the packed-refs being
reloaded and re-sorted on each ref insertion (not the
loose refs)!!!
Remember the important fact above? Yeah, those silly 4K
refs (not a huge number, not 61K!) take a while to reread
from the file and sort. When this is done for 61K changes,
it adds a lot of time to a fetch. The sad part is that, of
course, the packed-refs don't really need to be invalidated
since we never add new refs as packed refs during a fetch
(but apparently we do during a clone)! Also noteworthy is
that invalidating the loose refs does not cause a big
delay.
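
To make the cost pattern above concrete, here is a toy model in Python. It is
not git's code: the class PackedRefsCache and the functions contains() and
simulate_fetch() are hypothetical names invented for this sketch, and the
model assumes that each ref write does one lookup that falls through to the
packed-refs cache. It only counts how many packed-refs entries get re-parsed
when the cache is invalidated after every write versus left alone.

```python
import bisect


class PackedRefsCache:
    """Toy stand-in for a lazily loaded packed-refs cache (not git's code)."""

    def __init__(self, packed_file_lines):
        self.file_lines = list(packed_file_lines)  # stands in for the packed-refs file
        self.cache = None                          # parsed and sorted entries, or None
        self.entries_parsed = 0                    # entries re-read across all loads

    def get(self):
        # Reload (read, parse, sort) only when the cache has been invalidated.
        if self.cache is None:
            self.entries_parsed += len(self.file_lines)
            self.cache = sorted(self.file_lines)
        return self.cache

    def invalidate(self):
        self.cache = None


def contains(sorted_names, name):
    """Binary search for name in a sorted list."""
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name


def simulate_fetch(num_packed, num_new_refs, invalidate_packed):
    """Return how many packed entries were parsed while writing new loose refs."""
    packed = PackedRefsCache(f"refs/tags/v{i}" for i in range(num_packed))
    loose = set()
    for i in range(num_new_refs):
        name = f"refs/changes/{i}"
        # Assumed lookup order: loose refs first, then the packed cache.
        if name not in loose:
            contains(packed.get(), name)
        loose.add(name)  # the new ref is always written as a loose ref
        if invalidate_packed:
            packed.invalidate()
    return packed.entries_parsed


if __name__ == "__main__":
    # Scaled-down numbers (the email talks about ~4K packed tags and ~61K changes).
    print("invalidate packed on every write:", simulate_fetch(400, 6100, True))
    print("never invalidate packed:         ", simulate_fetch(400, 6100, False))
```

In this model the first policy re-parses num_packed * num_new_refs entries in
total, while the second parses num_packed entries once. That is the same shape
as the slowdown described above, though the real constants depend on git's
actual read, parse and sort code.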
Some data:
1) A fetch of the changes in my series with all good
external patches applied takes about 7:30min.
2) A fetch of the changes with #1 and invalidate_cached_refs()
commented out in write_ref_sha1() takes about 1:50min.
3) A fetch of the changes with #1 and with
invalidate_cached_refs() in write_ref_sha1() replaced with a
call to my custom invalidate_loose_cache_refs() takes about
1:50min.
4) A fetch with #1 on a repo with packed-refs deleted after
the clone takes about 5min.
** This is a strange regression which threw me off. In this
case, all the tags are refetched in addition to the changes,
which seems to cause some weird interaction that makes things
take longer than they should (#5 + #6 = 2:10m << #4 5min).
5) A fetch with #1 on a repo with packed-refs deleted after
the clone, and then a fetch done to get all the tags (see
#6), takes only 1:30m!!!!
6) A fetch to get all the **TAGS** with packed-refs deleted
after the clone, takes about 40s.
---Additional side data/tests:
7) A fetch of the changes with #1 and a special flag causing
the packed-refs to be read from the file, but not parsed or
sorted, takes 2:34min. So just the repeated reads add at
least 40s.
8) A fetch of the changes with #1 and a special flag causing
the packed-refs to be read from the file, parsed, but NOT
sorted, takes 3:40min. So the parsing appears to take an
additional minute at least.
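
The timings above can be broken down with a little arithmetic. The sketch
below is in Python and uses only the approximate figures quoted in tests 1, 2,
7 and 8, plus the ~61K change count mentioned earlier. Attributing the
remaining time to sorting is an inference from those numbers, not something
that was measured directly, and the helper seconds() is a name made up for
this example.

```python
def seconds(mmss):
    """Convert an 'M:SS' string into a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


baseline = seconds("7:30")       # test 1: cache invalidated on every write
no_invalidate = seconds("1:50")  # tests 2 and 3: packed-refs not invalidated
read_only = seconds("2:34")      # test 7: packed-refs re-read, not parsed or sorted
read_parse = seconds("3:40")     # test 8: re-read and parsed, not sorted

read_cost = read_only - no_invalidate   # time added by the repeated reads
parse_cost = read_parse - read_only     # time added by the repeated parsing
sort_cost = baseline - read_parse       # remainder, assumed to be the sorting
total_overhead = baseline - no_invalidate

# Average overhead per fetched ref, assuming about 61K refs as stated above.
per_ref_ms = 1000 * total_overhead / 61000

# Cross-check of the note under test 4: test 5 plus test 6, in seconds.
tags_then_changes = seconds("1:30") + seconds("0:40")

if __name__ == "__main__":
    print("read:", read_cost, "s; parse:", parse_cost, "s; sort:", sort_cost, "s")
    print("total overhead:", total_overhead, "s")
    print("per ref: about", round(per_ref_ms, 1), "ms")
    print("tests 5 + 6:", tags_then_changes, "s")
```

Under these assumptions, reading accounts for 44s, parsing for 66s and the
remainder for 230s of the 340s overhead, or roughly 5.6ms per fetched ref.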
I think that all of this might explain why no matter how
good Michael's intentions are with his patch series, his
series isn't likely to fix this problem unless he does not
invalidate the packed-refs after each insertion. I tried
preventing this invalidation in his series to prove this,
but unfortunately, it appears that in his series it is no
longer possible to invalidate only the packed-refs? :(
Michael, I hope I am completely wrong about that...
Are there any good consistency reasons to invalidate the
packed refs in write_ref_sha1()? If not, would you accept a
patch to simply skip this invalidation (to only invalidate
the loose refs)?
Thanks,
-Martin
--
Employee of Qualcomm Innovation Center, Inc. which is a
member of Code Aurora Forum