From: Nicolas Pitre <nico@cam.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Performance issue: initial git clone causes massive repack
Date: Mon, 06 Apr 2009 00:06:00 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0904052336260.6741@xanadu.home>
In-Reply-To: <20090405T230552Z@curie.orbis-terrarum.net>

On Sun, 5 Apr 2009, Robin H. Johnson wrote:

> On Sun, Apr 05, 2009 at 03:57:14PM -0400, Jeff King wrote:
> > > During an initial clone, I see that git-upload-pack invokes
> > > pack-objects, despite the ENTIRE repository already being packed - no
> > > loose objects whatsoever. git-upload-pack then seems to buffer in
> > > memory.
> > We need to run pack-objects even if the repo is fully packed because we
> > don't know what's _in_ the existing pack (or packs). In particular we
> > want to:
> >   - combine multiple packs into a single pack; this is more efficient on
> >     the network, because you can find more deltas, and I believe is
> >     required because the protocol sends only a single pack.
> > 
> >   - cull any objects which are not actually part of the reachability
> >     chain from the refs we are sending
> > 
> > If no work needs to be done for either case, then pack-objects should
> > basically just figure that out and then send the existing pack (the
> > expensive bit is doing deltas, and we don't consider objects in the same
> > pack for deltas, as we know we have already considered that during the
> > last repack). It does mmap the whole pack, so you will see your virtual
> > memory jump, but nothing should require the whole pack being in memory
> > at once.

Actually the pack is mapped with a (configurable) window.  See the
core.packedGitWindowSize and core.packedGitLimit config options for 
details.
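
To illustrate what "mapped with a window" means, here is a minimal Python
sketch, not git's actual C implementation. It reads a file through
fixed-size mmap windows, so only one window is mapped at any time. The
function name read_through_windows and its window_size parameter are
hypothetical; window_size plays a role loosely comparable to
core.packedGitWindowSize. The sketch does not model core.packedGitLimit,
which caps the total number of bytes mapped at once.

```python
import mmap
import os
import tempfile


def read_through_windows(path, window_size):
    """Yield (offset, data) pairs, mapping only one window at a time.

    `window_size` is a hypothetical parameter for this sketch.
    """
    # mmap offsets must be a multiple of the allocation granularity,
    # so round the window down to that granularity (at least one unit).
    gran = mmap.ALLOCATIONGRANULARITY
    window_size = max(gran, (window_size // gran) * gran)

    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            # The last window may be shorter than window_size.
            length = min(window_size, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ,
                           offset=offset) as window:
                # Copy the bytes out; the mapping is released on exit.
                yield offset, window[:]
            offset += length
```

With a small window, the virtual memory used for the mapping stays bounded
by the window size rather than by the size of the file.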

> While my current pack setup has multiple packs of not more than 100MiB
> each, that was simply for ease of resume with rsync+http tests. Even
> when I already had a single pack, with every object reachable,
> pack-objects was redoing the packing.

In that case it shouldn't have.

> > pack-objects streams the output to upload-pack, which should only ever
> > have an 8K buffer of it in memory at any given time.
> > 
> > At least that is how it is all supposed to work, according to my
> > understanding. So if you are seeing very high memory usage, I wonder if
> > there is a bug in pack-objects or upload-pack that can be fixed.
> > 
> > Maybe somebody more knowledgeable than me about packing can comment.
> Looking at the source, I agree that it should be buffering, however top and ps
> seem to disagree. 3GiB VSZ and 2.5GiB RSS here now.
> 
> %CPU %MEM     VSZ     RSS STAT START   TIME COMMAND
>  0.0  0.0  140932    1040 Ss   16:09   0:00 \_ git-upload-pack /code/gentoo/gentoo-git/gentoo-x86.git 
> 32.2  0.0       0       0 Z    16:09   1:50     \_ [git-upload-pack] <defunct>
> 80.8 44.2 3018484 2545700 Sl   16:09   4:36     \_ git pack-objects --stdout --progress --delta-base-offset 
> 
> Also, I did another trace, using some other hardware, in a LAN setting, and
> noticed that git-upload-pack/pack-objects only seems to start output to the
> network after it reaches 100% in 'remote: Compressing objects:'.

That's to be expected.  Delta compression matches objects which are not 
in the stream order at all.  Therefore it is not possible to start 
outputting pack data until this pass is done.  Still, this pass should 
not be invoked if your repository is already fully packed into one pack.  
Can you confirm this is actually the case?
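
A toy Python sketch of why the delta pass must finish before any output
can start. This is an assumption-laden illustration, not git's real
heuristic: the Obj type, the choose_delta_bases function, and the
sort key (path, then larger size first) are all hypothetical. The point
is only that the order in which candidates are compared is unrelated to
the order in which objects are written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str  # hypothetical object id
    path: str  # file path the blob came from
    size: int  # uncompressed size in bytes


def choose_delta_bases(objects, window=10):
    """Toy delta search: returns {object name: chosen base name}."""
    # The search order groups similar objects together (same path,
    # largest first). It is unrelated to the stream order.
    search_order = sorted(objects, key=lambda o: (o.path, -o.size))
    bases = {}
    for i, obj in enumerate(search_order):
        # Look back at most `window` entries for a same-path candidate.
        for candidate in reversed(search_order[max(0, i - window):i]):
            if candidate.path == obj.path:
                bases[obj.name] = candidate.name
                break
    return bases


# Stream order: the order in which the objects would be written out.
stream = [
    Obj("a1", "Makefile", 100),
    Obj("b1", "README", 50),
    Obj("a2", "Makefile", 120),
]
bases = choose_delta_bases(stream)
```

Here the first object in stream order ("a1") ends up stored as a delta
against the last one ("a2"), so how to write even the first entry is not
known until the whole search has completed.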

> Relatedly, throwing more RAM (6GiB total, vs. the previous 2GiB) at 
> the server in this case cut the 200 wallclock minutes before any 
> sending took place down to 5 minutes.

Well... here's a wild guess.  In the source repository serving clone 
requests, please do:

	git config pack.deltaCacheSize 1
	git config pack.deltaCacheLimit 0

and try cloning again with a fully packed repository.

> > > For the initial clone, can the git-upload-pack algorithm please send
> > > existing packs, and only generate a pack containing the non-packed
> > > items?
> > 
> > I believe that would require a change to the protocol to allow multiple
> > packs. However, it may be possible to munge the pack header in such a
> > way that you basically concatenate multiple packs. You would still want
> > to peek in the big pack to try deltas from the non-packed items, though.

As explained already, even if the protocol requires a single pack to be 
created, it is still made up of unmodified data segments from existing 
packs as much as possible.  So you should see it more or less as the 
concatenation of those packs already, plus some munging over the edges.
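
As a rough illustration of "unmodified data segments plus some munging
over the edges", here is a toy Python sketch. It is not git's code and
does not produce a pack git would accept: the stored entries are opaque
placeholder bytes, and the assemble_pack function and the shape of its
index argument are hypothetical. The only parts taken from the real pack
format are the 12-byte header ("PACK", version, object count) and the
trailing SHA-1 of everything before it. Adjusting delta base offsets,
which is part of the real munging, is left out.

```python
import hashlib
import struct


def assemble_pack(wanted, index, packs):
    """Build one output stream from entries stored in existing packs.

    `index` maps an object name to (pack_id, offset, length), the
    location of its already-compressed entry (hypothetical layout).
    """
    body = bytearray()
    for name in wanted:
        pack_id, offset, length = index[name]
        # Copy the stored bytes verbatim: no inflate, no re-deltification.
        body += packs[pack_id][offset:offset + length]

    # The "edges": a fresh header carrying the new object count,
    # and a fresh checksum trailer over header plus body.
    header = b"PACK" + struct.pack(">II", 2, len(wanted))
    data = header + bytes(body)
    return data + hashlib.sha1(data).digest()
```

The expensive work (compression and delta search) is skipped for every
entry that can be copied this way; only the header and trailer are new.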

> > I think all of this falls into the realm of the GSOC pack caching project.
> > There have been other discussions on the list, so you might want to look
> > through those for something useful.
> Yes, both changing the protocol, and recognizing that existing packs may be
> suitable to send could be considered as part of the caching project, as they
> fall under the aegis of making good use of what's stored in the cache already
> to send.

The pack caching project addresses a different issue: mainly bypassing 
the object enumeration cost.  In other words, it could allow for 
skipping the "Counting objects" pass, and a tiny bit more.  In theory, 
at least, that is the main difference.  It has many drawbacks of its 
own, though.


Nicolas
