From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Cc: "Christian Couder" <chriscool@tuxfamily.org>,
	"Thomas Rast" <trast@student.ethz.ch>,
	"René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
	"Julian Phillips" <julian@quantumfyre.co.uk>,
	"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: Re: Git is not scalable with too many refs/*
Date: Sat, 8 Oct 2011 14:59:51 -0600
Message-ID: <201110081459.52174.mfick@codeaurora.org>
In-Reply-To: <201109301606.31748.mfick@codeaurora.org>

On Friday, September 30, 2011 04:06:31 pm Martin Fick wrote:
> On Friday, September 30, 2011 03:02:30 pm Martin Fick wrote:
> > On Friday, September 30, 2011 10:41:13 am Martin Fick wrote:
> > Since a full sync is now down to about 5mins, I broke
> > down the output a bit.  It appears that the longest
> > part (2:45m) is now the time spent scrolling through
> > each change still. Each one of these takes about 2ms:
> >
> >  * [new branch]      refs/changes/99/71199/1 -> refs/changes/99/71199/1
> >
> > Seems fast, but at about 80K... So, are there any
> > obvious N loops over the refs happening inside each
> > of the [new branch] iterations?
> 
> OK, I narrowed it down I believe.  If I comment out the
> invalidate_cached_refs() line in write_ref_sha1(), it
> speeds through this section.
> 
> I guess this makes sense, we invalidate the cache and
> have to rebuild it after every new ref is added? 
> Perhaps a simple fix would be to move the invalidation
> right after all the refs are updated?  Maybe
> write_ref_sha1 could take in a flag to tell it to not
> invalidate the cache so that during iterative updates it
> could be disabled and then run manually after the
> update?

OK, this thing has been bugging me...

I found some more surprising results.  I hope you can follow, 
because there are corner cases here which have a large 
impact.


** Important fact: 
** ---------------
** When I clone my repo, it has about 4K tags which
** come in packed to the clone.
**

This fact has a heavy impact on how I test things. If I 
choose to delete these packed-refs from the cloned repo and 
then do a fetch of the changes, all of the tags are also 
fetched along with these changes.  This means that if I want 
to test the impact of having packed-refs vs no packed refs, 
on my change fetches, I need to first delete the packed-refs 
file, and second fetch all the tags again, so that when I 
fetch the changes, the repo only actually fetches changes, 
not all the tags!

So, with this in mind, I have discovered that the fetch 
performance degradation caused by invalidating the caches in 
write_ref_sha1() is actually due to the packed-refs being 
reloaded and resorted again on each ref insertion (not the 
loose refs)!!!

Remember the important fact above?  Yeah, those silly 4K 
refs (not a huge number, not 61K!) take a while to reread 
from the file and sort.  When this is done for 61K changes, 
it adds a lot of time to a fetch.  The sad part is that, of 
course, the packed-refs don't really need to be invalidated 
since we never add new refs as packed refs during a fetch 
(but apparently we do during a clone)!  Also noteworthy is 
that invalidating the loose refs does not cause a big 
delay.
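
To make the cost above concrete, here is a minimal toy model in C. It 
is not git's refs.c: the struct, the function names, and the 
assumption that every ref write triggers exactly one packed-refs 
lookup are all hypothetical, chosen only to illustrate why 
invalidating the packed half of the cache on each insertion multiplies 
the work. The 4000 and 61000 figures used in the note below are the 
approximate counts mentioned in this message.

```c
#include <assert.h>

/* Toy model only; names and structure are hypothetical, not git's. */
enum { PACKED_REFS = 4000 };          /* ~4K packed tags from the clone */

struct ref_cache {
	int packed_valid;             /* is the packed-refs cache loaded? */
	int loose_valid;              /* is the loose-refs cache loaded? */
	long packed_entries_parsed;   /* stand-in for read + parse + sort work */
};

/* Reload the packed refs only if the cached copy was invalidated. */
static void load_packed(struct ref_cache *c)
{
	if (c->packed_valid)
		return;
	c->packed_entries_parsed += PACKED_REFS;
	c->packed_valid = 1;
}

/* Models the current behaviour: drop both halves of the cache. */
static void invalidate_all(struct ref_cache *c)
{
	c->packed_valid = 0;
	c->loose_valid = 0;
}

/* Models the proposed behaviour: drop only the loose half. */
static void invalidate_loose_only(struct ref_cache *c)
{
	c->loose_valid = 0;
}

/*
 * Write n_new_refs refs one at a time.  Each write is assumed to need
 * one packed-refs lookup and is followed by a cache invalidation.
 * Returns the total number of packed entries that had to be re-parsed.
 */
static long simulate(int n_new_refs, int loose_only)
{
	struct ref_cache c = { 0, 0, 0 };
	int i;

	assert(n_new_refs >= 0);
	for (i = 0; i < n_new_refs; i++) {
		load_packed(&c);
		if (loose_only)
			invalidate_loose_only(&c);
		else
			invalidate_all(&c);
	}
	return c.packed_entries_parsed;
}
```

In this model simulate(61000, 0) returns 244000000 parsed entries, 
while simulate(61000, 1) returns 4000, because the packed refs are 
loaded once and never dropped.  The real code paths are more involved, 
so treat this as an illustration of the scaling, not a measurement.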


Some data:

1) A fetch of the changes in my series with all good 
external patches applied takes about 7:30min.


2) A fetch of the changes with #1 and invalidate_cached_refs() 
commented out in write_ref_sha1() takes about 1:50min.


3) A fetch of the changes with #1 and 
invalidate_cached_refs() in write_ref_sha1() replaced with a 
call to my custom invalidate_loose_cache_refs() takes about 
1:50min.


4) A fetch with #1 on a repo with packed-refs deleted after 
the clone takes about 5min.

** This is a strange regression which threw me off.  In this 
case, all the tags are refetched in addition to the changes.  
This seems to cause some weird interaction that makes things 
take longer than they should (#5 + #6 = 2:10m  <<  #4 5min).


5) A fetch with #1 on a repo with packed-refs deleted after 
the clone, and then a fetch done to get all the tags (see 
#6), takes only 1:30m!!!!


6) A fetch to get all the **TAGS** with packed-refs deleted 
after the clone, takes about 40s.



---Additional side data/tests:

7) A fetch of the changes with #1 and a special flag causing 
the packed-refs to be read from the file, but not parsed or 
sorted, takes 2:34min.  So just the repeated reads add at 
least 40s.


8) A fetch of the changes with #1 and a special flag causing 
the packed-refs to be read from the file, parsed, but NOT 
sorted, takes 3:40min.  So the parsing appears to take an 
additional minute at least.
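
The arithmetic behind tests #1, #2, #7 and #8 can be worked through 
with a small C sketch.  The timings are the ones reported above; the 
helper names are hypothetical, and attributing the remainder (#1 minus 
#8) to sorting is an inference from the test descriptions, not 
something that was measured directly.  The 61000 ref count is the 
approximate number of changes mentioned earlier.

```c
#include <assert.h>

/* Convert a minutes:seconds timing into seconds. */
static int secs(int minutes, int seconds)
{
	return minutes * 60 + seconds;
}

struct breakdown {
	int total_overhead;  /* #1 - #2: cost of invalidating on every write */
	int read;            /* #7 - #2: re-reading the packed-refs file */
	int parse;           /* #8 - #7: parsing what was read */
	int remainder;       /* #1 - #8: presumably the sort (an inference) */
};

/* Split the invalidation overhead using the timings reported above. */
static struct breakdown split_overhead(void)
{
	int with_invalidate = secs(7, 30);  /* test #1 */
	int no_invalidate   = secs(1, 50);  /* test #2 */
	int read_only       = secs(2, 34);  /* test #7 */
	int read_and_parse  = secs(3, 40);  /* test #8 */
	struct breakdown b;

	b.total_overhead = with_invalidate - no_invalidate;
	b.read           = read_only - no_invalidate;
	b.parse          = read_and_parse - read_only;
	b.remainder      = with_invalidate - read_and_parse;
	return b;
}

/* Average overhead per fetched ref, in milliseconds. */
static double per_ref_ms(int overhead_secs, int n_refs)
{
	assert(n_refs > 0);
	return overhead_secs * 1000.0 / n_refs;
}
```

With these inputs the overhead is 340s in total: 44s of reading, 66s 
of parsing, and 230s left over.  Spread across roughly 61000 refs, 
per_ref_ms(340, 61000) comes to about 5.6ms per ref.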




I think that all of this might explain why, no matter how 
good Michael's intentions are with his patch series, his 
series isn't likely to fix this problem unless it stops 
invalidating the packed-refs after each insertion.  I tried 
preventing this invalidation in his series to prove this, 
but unfortunately, it appears that in his series it is no 
longer possible to invalidate just the packed-refs? :(
Michael, I hope I am completely wrong about that...


Are there any good consistency reasons to invalidate the 
packed refs in write_ref_sha1()?  If not, would you accept a 
patch to simply skip this invalidation (to only invalidate 
the loose refs)?

Thanks,
 
-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

