Git Mailing List Archive on lore.kernel.org
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Colin Stolley" <cstolley@runbox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] packfile.c: speed up loading lots of packfiles.
Date: Tue, 3 Dec 2019 17:17:30 -0500
Message-ID: <20191203221730.GA28419@sigill.intra.peff.net>
In-Reply-To: <xmqq7e3d75vk.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 03, 2019 at 08:04:15AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Good catch. The issue is that we only add entries to the hashmap in
> > prepare_packed_git(), but they may be added to the pack list by other
> > callers of install_packed_git(). It probably makes sense to just push
> > the hashmap maintenance down into that function, like below.
> 
> Makes sense to me.
> 
> Let me locally squash your fix in and credit you with helped-by
> footer in the amended log message.  Strictly speaking, this may
> invalidate the perf numbers, but I do not think the scenario p5303
> sets up alone is all that interesting anyway---if you have 10,000
> packs, not just registering them (which is improved with this patch)
> but using objects from them would be slower than necessary X-<.

Thanks, that sounds good.

I actually re-checked the perf numbers (mostly to make sure I didn't
screw anything up) and got similar results.

I agree that 10,000 packs is ridiculous, but we do see it (and worse)
occasionally from people pushing in a loop before our scheduled
maintenance kicks in.  And it's quadratic, so if you hit 30k packs, it's
another factor of 9 worse. It makes even diagnosing the repository
pretty painful. :)
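
To make the quadratic growth concrete, here is a minimal sketch (not the
actual packfile.c code; install_linear, install_hashed and pack_names
are names made up for illustration). It assumes each newly registered
pack is compared against every pack already in the list, and contrasts
that with a hash lookup:

```python
def install_linear(names):
    """Register packs, scanning the whole list for a duplicate each time."""
    packs, comparisons = [], 0
    for name in names:
        found = False
        for existing in packs:
            comparisons += 1
            if existing == name:
                found = True
                break
        if not found:
            packs.append(name)
    return packs, comparisons


def install_hashed(names):
    """Register packs, using a dict to detect duplicates in one lookup."""
    packs, index, lookups = [], {}, 0
    for name in names:
        lookups += 1
        if name not in index:
            index[name] = len(packs)
            packs.append(name)
    return packs, lookups


def pack_names(n):
    """Generate n distinct, hypothetical pack file names."""
    return [f"pack-{i:06d}.pack" for i in range(n)]


if __name__ == "__main__":
    # Scaled down from 10k/30k so the linear variant finishes quickly.
    _, small = install_linear(pack_names(1000))
    _, large = install_linear(pack_names(3000))
    print("linear comparisons:", small, large, "ratio:", large / small)
    _, hashed = install_hashed(pack_names(3000))
    print("hashed lookups:", hashed)
```

With n distinct packs the linear variant does n*(n-1)/2 comparisons, so
tripling the pack count multiplies the work by roughly 9, while the
hashed variant grows only linearly.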

Also a fun fact: Linux actually has a limit on the number of
simultaneous mmaps that a process can have, which defaults to ~64k. But
if you have 32k packs, then we'll map both the packs and the idx files,
which is already ~64k maps. Plus whatever you need for mapping the
binary, libraries, etc, plus any maps opened by malloc() for large
requests.
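
The arithmetic can be sketched as follows. This is a rough estimate,
not how Git accounts for its maps: it assumes exactly one map per .pack
and one per .idx, and the helper names are hypothetical. The limit is
read from /proc/sys/vm/max_map_count when available, falling back to
the Linux default of 65530:

```python
DEFAULT_MAX_MAP_COUNT = 65530  # Linux default for vm.max_map_count


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the current limit, or the Linux default if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_MAX_MAP_COUNT


def estimated_maps(num_packs, other_maps=0):
    """Assumption: one map per .pack plus one per .idx, plus everything else."""
    return 2 * num_packs + other_maps


def fits(num_packs, limit, other_maps=0):
    """True if the estimated number of maps stays within the limit."""
    return estimated_maps(num_packs, other_maps) <= limit


if __name__ == "__main__":
    limit = read_max_map_count()
    for n in (10_000, 32_768, 65_530):
        print(n, "packs ->", estimated_maps(n), "maps; fits:", fits(n, limit))
```

The other_maps parameter stands in for the binary, shared libraries and
large malloc() requests, which eat into the same budget.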

I have occasionally run into this trying to repack some very
out-of-control cases (the magic fix is to double the limit with `sysctl
-w vm.max_map_count=131060`, if you are curious). I also wondered if
this might be made worse by the recent change to drop
release_pack_memory(). But I ran into it even before that change,
because zlib calls malloc() directly. We're also pretty aggressive about
dying when mmap() returns an error (rather than closing packs and trying
again).

I think Git _could_ handle this more gracefully by just trying to
keep fewer packs open at one time (the way we similarly try not to use
up all of the file descriptors). But I have a hard time caring too much,
since it's such a ridiculous situation in the first place. Bumping the
limits is an easy operational fix.
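
For illustration only, the "keep fewer packs open" idea could look
something like the sketch below. It is not Git's code and not a
proposed patch; MapCache and make_fake_packs are hypothetical names,
and the least-recently-used eviction policy is an assumption:

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MapCache:
    """Keep at most `limit` files mapped, evicting the least recently used."""

    def __init__(self, limit):
        self.limit = limit
        self.maps = OrderedDict()  # path -> mmap object, oldest first

    def get(self, path):
        if path in self.maps:
            self.maps.move_to_end(path)  # mark as most recently used
            return self.maps[path]
        while len(self.maps) >= self.limit:
            _, oldest = self.maps.popitem(last=False)
            oldest.close()  # unmap to stay under the budget
        with open(path, "rb") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self.maps[path] = m
        return m

    def close(self):
        for m in self.maps.values():
            m.close()
        self.maps.clear()


def make_fake_packs(directory, count):
    """Create small placeholder files standing in for packs."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"pack-{i}.pack")
        with open(path, "wb") as f:
            f.write(b"PACK" + bytes([i]))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = MapCache(limit=2)
        for path in make_fake_packs(tmp, 5):
            assert cache.get(path)[:4] == b"PACK"
        print("maps still open:", len(cache.maps))
        cache.close()
```

The trade-off is that an evicted pack has to be mapped again the next
time one of its objects is needed.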

-Peff
