All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Geert Jansen <gerardu@amazon.com>,
	Junio C Hamano <gitster@pobox.com>,
	"git\@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [RFC PATCH] index-pack: improve performance on NFS
Date: Fri, 09 Nov 2018 11:11:55 +0100	[thread overview]
Message-ID: <87h8gq5zmc.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <87lg635hz4.fsf@evledraar.gmail.com>


On Thu, Nov 08 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Nov 08 2018, Jeff King wrote:
>
>> On Wed, Nov 07, 2018 at 10:55:24PM +0000, Geert Jansen wrote:
>>
>>> On Mon, Oct 29, 2018 at 07:27:39PM -0400, Jeff King wrote:
>>>
>>> > On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>> > >  * Re-roll my 4 patch series to include the patch you have in
>>> > >    <20181027093300.GA23974@sigill.intra.peff.net>
>>> >
>>> > I don't think it's quite ready for inclusion as-is. I hope to brush it
>>> > up a bit, but I have quite a backlog of stuff to review, as well.
>>>
>>> We're still quite keen to get this patch included. Is there anything I can do
>>> to help?
>>
>> Yes, testing and review. :)
>>
>> I won't send the series out just yet, as I suspect it could use another
>> read-through on my part. But if you want to peek at it or try some
>> timings, it's available at:
>>
>>   https://github.com/peff/git jk/loose-cache
>
> Just a comment on this from the series:
>
>     Note that it is possible for this to actually be _slower_. We'll do a
>     full readdir() to fill the cache, so if you have a very large number of
>     loose objects and a very small number of lookups, that readdir() may end
>     up more expensive.
>
>     In practice, though, having a large number of loose objects is already a
>     performance problem, which should be fixed by repacking or pruning via
>     git-gc. So on balance, this should be a good tradeoff.
>
> Our biggest repo has a very large number of loose objects at any given
> time, but the vast majority of these are because gc *is* happening very
> frequently and the default expiry policy of 2wks is in effect.
>
> Having a large number of loose objects is not per-se a performance
> problem.
>
> It's a problem if you end up "faulting" to from packs to the loose
> object directory a lot because those objects are still reachable, but if
> they're not reachable that number can grow very large if your ref churn
> is large (so lots of expired loose object production).
>
> Anyway, the series per-se looks good to me. It's particularly nice to
> have some of the ODB cleanup + cleanup in fetch-pack.c
>
> Just wanted to note that in our default (reasonable) config we do
> produce scenarios where this change can still be somewhat pathological,
> so I'm still interested in disabling it entirely given the
> implausibility of what it's guarding against.

Some actual numbers for this for a fairly small repo on NFS, "cold"
cache (hadn't run it in a bit):

    $ time (find objects/?? -type f|wc -l)
    862
    real    0m1.927s

Warm cache:

    $ time (find objects/?? -type f|wc -l)
    872
    real    0m0.151s

Cold cache on a bigger monorepo:

    $ time (find objects/?? -type f|wc -l)
    real    0m4.336s

Warm cache on a bigger monorepo (more ref churn):

    $ time (find objects/?? -type f|wc -l)
    49746
    real    0m1.082s

This on a server where bulk sustained writes of large files are really
fast (up to 1GB/s). It's just these metadata ops that are slow.

I also get cold cache times of up to 6 seconds on:

    time (find $(ls -d objects/??|sort -R) -type f | wc -l)

As opposed max of ~4s without -R, so I suspect there may be some
client/server optimization where things are iterated over in recursive
glob order (pre-fetched?), whereas the cache will try to fill buckets is
it encounters loose objects, so iterate over objects/{00..ff} randomly.

I'm not really leading up to any point here I haven't made already. I
was just curious to try to find some upper bound of overhead if say a
pack with 512 objects is pushed. In that case it's very likely that we
need to fill at least 200/256 buckets.

  reply	other threads:[~2018-11-09 10:12 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-25 18:38 [RFC PATCH] index-pack: improve performance on NFS Jansen, Geert
2018-10-26  0:21 ` Junio C Hamano
2018-10-26 20:38   ` Ævar Arnfjörð Bjarmason
2018-10-27  7:26     ` Junio C Hamano
2018-10-27  9:33       ` Jeff King
2018-10-27 11:22         ` Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking Ævar Arnfjörð Bjarmason
2018-10-30  2:49             ` Geert Jansen
2018-10-30  9:04               ` Junio C Hamano
2018-10-30 18:43             ` [PATCH v2 0/3] index-pack: test updates Ævar Arnfjörð Bjarmason
2018-11-13 20:19               ` [PATCH v3] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-11-14  7:09                 ` Junio C Hamano
2018-11-14 12:40                   ` Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 1/3] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 3/3] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 1/4] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 3/4] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 4/4] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-10-29 15:04           ` [RFC PATCH] index-pack: improve performance on NFS Jeff King
2018-10-29 15:09             ` Jeff King
2018-10-29 19:36             ` Ævar Arnfjörð Bjarmason
2018-10-29 23:27               ` Jeff King
2018-11-07 22:55                 ` Geert Jansen
2018-11-08 12:02                   ` Jeff King
2018-11-08 20:58                     ` Geert Jansen
2018-11-08 21:18                       ` Jeff King
2018-11-08 21:55                         ` Geert Jansen
2018-11-08 22:20                     ` Ævar Arnfjörð Bjarmason
2018-11-09 10:11                       ` Ævar Arnfjörð Bjarmason [this message]
2018-11-12 14:31                       ` Jeff King
2018-11-12 14:46                     ` [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:46                       ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
2018-11-12 15:26                         ` Derrick Stolee
2018-11-12 14:47                       ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
2018-11-12 18:23                         ` Stefan Beller
2018-11-12 14:48                       ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
2018-11-12 15:30                         ` Derrick Stolee
2018-11-12 15:36                           ` Jeff King
2018-11-12 19:41                             ` Ramsay Jones
2018-11-12 14:48                       ` [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending Jeff King
2018-11-12 15:32                         ` Derrick Stolee
2018-11-12 14:49                       ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
2018-11-12 15:38                         ` Derrick Stolee
2018-11-12 15:46                           ` Jeff King
2018-11-12 15:50                             ` Derrick Stolee
2018-11-12 14:50                       ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
2018-11-12 15:48                         ` Derrick Stolee
2018-11-12 16:09                           ` Jeff King
2018-11-12 19:04                             ` Stefan Beller
2018-11-22 17:42                               ` Jeff King
2018-11-12 18:48                           ` Stefan Beller
2018-11-12 14:50                       ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
2018-11-12 19:24                         ` René Scharfe
2018-11-12 20:16                           ` Jeff King
2018-11-12 14:54                       ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
2018-11-12 16:00                         ` Derrick Stolee
2018-11-12 16:01                         ` Ævar Arnfjörð Bjarmason
2018-11-12 16:21                           ` Jeff King
2018-11-12 22:18                             ` Ævar Arnfjörð Bjarmason
2018-11-12 22:30                               ` Ævar Arnfjörð Bjarmason
2018-11-13 10:02                                 ` Ævar Arnfjörð Bjarmason
2018-11-14 18:21                                   ` René Scharfe
2018-12-02 10:52                                   ` René Scharfe
2018-12-03 22:04                                     ` Jeff King
2018-12-04 21:45                                       ` René Scharfe
2018-12-05  4:46                                         ` Jeff King
2018-12-05  6:02                                           ` René Scharfe
2018-12-05  6:51                                             ` Jeff King
2018-12-05  8:15                                               ` Jeff King
2018-12-05 18:41                                                 ` René Scharfe
2018-12-05 20:17                                                   ` Jeff King
2018-11-12 22:44                             ` Geert Jansen
2018-11-27 20:48                         ` René Scharfe
2018-12-01 19:49                           ` Jeff King
2018-11-12 14:55                       ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 19:25                         ` René Scharfe
2018-11-12 19:32                           ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07                             ` Jeff King
2018-11-12 20:13                             ` René Scharfe
2018-11-12 16:02                       ` [PATCH 0/9] caching loose objects Derrick Stolee
2018-11-12 19:10                         ` Stefan Beller
2018-11-09 13:43                   ` [RFC PATCH] index-pack: improve performance on NFS Ævar Arnfjörð Bjarmason
2018-11-09 16:08                     ` Duy Nguyen
2018-11-10 14:04                       ` Ævar Arnfjörð Bjarmason
2018-11-12 14:34                         ` Jeff King
2018-11-12 22:58                     ` Geert Jansen
2018-10-27 14:04         ` Duy Nguyen
2018-10-29 15:18           ` Jeff King
2018-10-29  0:48         ` Junio C Hamano
2018-10-29 15:20           ` Jeff King
2018-10-29 18:43             ` Ævar Arnfjörð Bjarmason
2018-10-29 21:34           ` Geert Jansen
2018-10-29 21:50             ` Jeff King
2018-10-29 22:21               ` Geert Jansen
2018-10-29 22:27             ` Jeff King
2018-10-29 22:35               ` Stefan Beller
2018-10-29 23:29                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h8gq5zmc.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=gerardu@amazon.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.