From: Julian Phillips <julian@quantumfyre.co.uk>
To: Martin Fick <mfick@codeaurora.org>
Cc: <git@vger.kernel.org>
Subject: Re: Git is not scalable with too many refs/*
Date: Mon, 26 Sep 2011 23:30:46 +0100
Message-ID: <981f42c890e69e8c4ac1958df52b2214@quantumfyre.co.uk>
In-Reply-To: <201109261539.33437.mfick@codeaurora.org>

On Mon, 26 Sep 2011 15:39:33 -0600, Martin Fick wrote:
> On Monday, September 26, 2011 02:28:53 pm Julian Phillips
> wrote:
>> On Mon, 26 Sep 2011 14:01:38 -0600, Martin Fick wrote:
>> -- snip --
>>
>> > So, maybe you are correct, maybe my repo is the corner
>> > case? Is a repo which needs to be gced considered a
>> > corner case? Should git be able to detect that the
>> > repo is so in desperate need of gcing?  Is it normal
>> > for git to need to gc right after a clone and then
>> > fetching ~100K refs?
>>
>> Were your 100k refs packed before the gc?  If not, perhaps
>> your refs are causing a lot of trouble for the merge
>> sort?  They will be written out sorted to the
>> packed-refs file, so the merge sort won't have to do any
>> real work when loading them after that...
>
> I am not sure how to determine that (?), but I think they
> were packed.  Under .git/objects/pack there were 2 large
> files, both close to 500MB.  Those 2 files constituted most
> of the space in the repo (I was wrong about the repo sizes,
> that included the working dir, so think about half the
> quoted sizes for all of .git).  So does that mean it is
> mostly packed?  Aside from the pack and idx files, there was
> nothing else under the objects dir.  After gcing, it is down
> to just one ~500MB pack file.

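The pack files under .git/objects/pack hold objects, not refs, so they 
don't tell you either way.  Refs listed under .git/refs/... are unpacked 
(loose); refs listed in .git/packed-refs are packed.  A ref can appear in 
both places if it has been updated since the last pack, in which case the 
loose copy is the current one.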
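
As a rough sketch of how to check (not something git itself ships): count 
the files under .git/refs and the entries in .git/packed-refs.  The 
function name count_refs is made up for illustration, and the sketch 
assumes the usual on-disk layout described above.

```python
import os

def count_refs(git_dir):
    """Return (loose, packed) ref counts for the given .git directory."""
    loose = 0
    refs_root = os.path.join(git_dir, "refs")
    for _dirpath, _dirnames, filenames in os.walk(refs_root):
        loose += len(filenames)  # each file under refs/ is one loose ref

    packed = 0
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed_path):
        with open(packed_path) as f:
            for line in f:
                # Skip blank lines, the "# pack-refs ..." header and the
                # "^<sha1>" lines that record peeled tags.
                if line.strip() and not line.startswith(("#", "^")):
                    packed += 1
    return loose, packed
```

Calling count_refs(".git") from the top of a working tree gives the two 
counts.  A ref that is both loose and packed is counted once in each, so 
the two numbers can overlap.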

>> > I am not sure what is right here, if this patch makes a
>> > repo which needs gcing degrade 5 to 10 times worse
>> > than the benefit of this patch, it still seems
>> > questionable to me.
>>
>> Well - it does this _for your repo_, that doesn't
>> automatically mean that it does generally, or
>> frequently.
>
> Oh, def agreed! I just didn't want to discount it so quickly
> as being a corner case.
>
>
>> For instance, none of my normal repos that
>> have a lot of refs are Gerrit ones, and I wouldn't be
>> surprised if they benefitted from the merge sort
>> (assuming that I am right that the merge sort is taking
>> a long time on your gerrit refs).
>>
>> Besides, you would be better off running gc, and thus
>> getting the benefit too.
>
> Agreed, which is why I was asking if git should have noticed
> my "degenerate" case and auto gced?  But hopefully, there is
> an actual bug here somewhere and we both will get to eat our
> cake. :)

I think automatic gc is currently only triggered by unpacked objects, 
not unpacked refs ... perhaps the auto-gc should cover refs too?

>> >> Random thought.  What happens to the with compression
>> >> case if you leave the commit in, but add a sleep(15)
>> >> to the end of sort_refs_list?
>> >
>> > Why, what are you thinking?  Hmm, I am trying this on
>> > the non gced repo and it doesn't seem to be completing
>> > (no cpu usage)!  It appears that perhaps it is being
>> > called many times (the sleeping would explain no cpu
>> > usage)?!?  This could be a real problem, this should
>> > only get called once right?
>>
>> I was just wondering if the time taken to get the refs
>> was changing the interaction with something else.  Not
>> very likely, but ...
>>
>> I added a print statement, and it was called four times
>> when I had unpacked refs, and once with packed.  So,
>> maybe you are hitting some nasty case with unpacked
>> refs.  If you use a print statement instead of a sleep,
>> how many times does sort_refs_lists get called in your
>> unpacked case?  It may well also be worth calculating
>> the time taken to do the sort.
>
> In my case it was called 18785 times!  Any other tests I
> should run?

That's a lot of sorts.  I really can't see why there would need to be 
more than one ...
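
To illustrate why the number of sorts matters, here is a toy sketch, 
assuming nothing about how refs.c is actually structured: it builds a 
sorted list of names either by sorting once at the end or by re-sorting 
after every addition, and counts the comparisons.  The name load_refs and 
its parameters are hypothetical.

```python
import functools

def load_refs(names, sort_on_every_add):
    """Build a sorted list of ref names; return (refs, comparisons)."""
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        return (a > b) - (a < b)

    key = functools.cmp_to_key(compare)
    refs = []
    for name in names:
        refs.append(name)
        if sort_on_every_add:
            refs.sort(key=key)  # one full sort per ref added
    if not sort_on_every_add:
        refs.sort(key=key)      # a single sort once everything is loaded
    return refs, comparisons
```

Both modes return the same list, but re-sorting after every addition has 
to look at the whole list each time, so the total work grows roughly with 
the square of the number of refs instead of staying close to a single sort.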

I've created a new test repo, using a more complicated method to 
construct the 100k refs, and it took ~40m to run "git branch" instead of 
the 1.2s for the previous repo.  So, I think the ref naming pattern used 
by Gerrit is definitely triggering something odd.  However, progress is 
a bit slow - now that it takes over 1/2 an hour to try things out ...

-- 
Julian
