All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: "Robin H. Johnson" <robbat2@gentoo.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Performance issue: initial git clone causes massive repack
Date: Tue, 14 Apr 2009 16:17:55 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.2.00.0904141542161.6741@xanadu.home> (raw)
In-Reply-To: <alpine.DEB.1.00.0904141749330.10279@pacific.mpi-cbg.de>

On Tue, 14 Apr 2009, Johannes Schindelin wrote:

> Hi,
> 
> On Fri, 10 Apr 2009, Robin H. Johnson wrote:
> 
> > On Wed, Apr 08, 2009 at 12:52:54AM -0400, Nicolas Pitre wrote:
> > > > http://git.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary
> > > > At least that's what I cloned ;-) I hope it's the right one, but it fits
> > > > the description...
> > > OK.  FWIW, I repacked it with --window=250 --depth=250 and obtained a 
> > > 725MB pack file.  So that's about half the originally reported size.
> > The one problem with having the single large packfile is that Git
> > doesn't have a trivial way to resume downloading it when the git://
> > protocol is used.
> > 
> > For our developers cursed with bad internet connections (a fair number
> > of firewalls that don't seem to respect keepalive properly), I suppose
> > I can probably just maintain a separate repo for their initial clones,
> > which leaves a large overall download, but more chances to resume.
> 
> IMO the best we could do under these circumstances is to use fsck 
> --lost-found to find those commits which have a complete history (i.e. no 
> "broken links") -- this probably needs to be implemented as a special mode 
> of --lost-found -- and store them in a temporary to-be-removed 
> namespace, say refs/heads/incomplete-refs/$number, which will be sent to 
> the server when fetching the next time.  (Might need some iterations to 
> get everything, though.)

Well, although this might seem a good idea, this would help only in 
those cases where there is at least one complete revision available, 
i.e. no delta needed. This is usually true for the top commit after a 
repack which objects are all stored at the front of the pack and serve 
as base objects for deltas from subsequent (older) commits.  Thing is, 
that first revision is likely to occupy a significant portion of the 
whole pack, like no less than the size of the equivalent .tar.gz for the 
content of that commit.  To see what this represents, just try a shallow 
clone with depth=1.  For the Linux kernel, this is more than 80MB while 
the whole repo is in the 200MB range.  So if your connection isn't 
reliable enough to transfer at least that amount then you're screwed 
anyway.

Independently from this, I think there is quite a lot of confusion here.  
According to Robin, the reason for splitting the large Gentoo repo into 
multiple packs is apparently to help with the resuming of a clone.  We 
know that the git:// protocol is currently not resumable, and having 
multiple packs on the remote server won't change the outcome in any way 
as the client still receives a single big pack anyway.

WRT the HTTP protocol, I was questioning git's ability to resume the 
transfer of a pack in the middle if such transfer is interrupted without 
redownloading it all. And Mike Hommey says this is actually the case.

Meaning there is simply no reason to split a big pack into multiple 
ones.  If anything, it'll only make a clone over the native git protocol 
more costly for the server which has to pack everything back together.


Nicolas

  reply	other threads:[~2009-04-14 20:19 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-04 22:07 Performance issue: initial git clone causes massive repack Robin H. Johnson
2009-04-05  0:05 ` Nicolas Sebrecht
2009-04-05  0:37   ` Robin H. Johnson
2009-04-05  3:54     ` Nicolas Sebrecht
2009-04-05  4:08       ` Nicolas Sebrecht
2009-04-05  7:04       ` Robin H. Johnson
2009-04-05 19:02         ` Nicolas Sebrecht
2009-04-05 19:17           ` Shawn O. Pearce
2009-04-05 23:02             ` Robin H. Johnson
2009-04-05 20:43           ` Robin H. Johnson
2009-04-05 21:08             ` Shawn O. Pearce
2009-04-05 21:28           ` david
2009-04-05 21:36             ` Sverre Rabbelier
2009-04-06  3:24               ` Nicolas Pitre
2009-04-07  8:10                 ` Björn Steinbrink
2009-04-07  9:45                   ` Jakub Narebski
2009-04-07 13:13                     ` Nicolas Pitre
2009-04-07 13:37                       ` Jakub Narebski
2009-04-07 14:03                         ` Jon Smirl
2009-04-07 17:59                         ` Nicolas Pitre
2009-04-07 14:21                       ` Björn Steinbrink
2009-04-07 17:48                         ` Nicolas Pitre
2009-04-07 18:12                           ` Björn Steinbrink
2009-04-07 18:56                             ` Nicolas Pitre
2009-04-07 20:27                               ` Björn Steinbrink
2009-04-08  4:52                                 ` Nicolas Pitre
2009-04-10 20:38                                   ` Robin H. Johnson
2009-04-11  1:58                                     ` Nicolas Pitre
2009-04-11  7:06                                       ` Mike Hommey
2009-04-14 15:52                                     ` Johannes Schindelin
2009-04-14 20:17                                       ` Nicolas Pitre [this message]
2009-04-14 20:27                                         ` Robin H. Johnson
2009-04-14 21:02                                           ` Nicolas Pitre
2009-04-15  3:09                                           ` Nguyen Thai Ngoc Duy
2009-04-15  5:53                                             ` Robin H. Johnson
2009-04-15  5:54                                             ` Junio C Hamano
2009-04-15 11:51                                               ` Nicolas Pitre
2009-04-22  1:15                                           ` Sam Vilain
2009-04-22  9:55                                             ` Mike Ralphson
2009-04-22 11:24                                               ` Pieter de Bie
2009-04-22 13:19                                               ` Johannes Schindelin
2009-04-22 14:35                                                 ` Shawn O. Pearce
2009-04-22 16:40                                                   ` Andreas Ericsson
2009-04-22 17:06                                                     ` Johannes Schindelin
2009-04-23 19:30                                               ` Christian Couder
2009-04-22 14:14                                             ` Nicolas Pitre
2009-04-22 22:01                                               ` Sam Vilain
2009-04-22 22:50                                                 ` Björn Steinbrink
2009-04-22 23:07                                                 ` Nicolas Pitre
2009-04-22 23:30                                                   ` Johannes Schindelin
2009-04-23  3:16                                                     ` Nicolas Pitre
2009-04-14 20:30                                         ` Johannes Schindelin
2009-04-07 20:29                             ` Jeff King
2009-04-07 20:35                               ` Björn Steinbrink
2009-04-08 11:28                       ` [PATCH] process_{tree,blob}: Remove useless xstrdup calls Björn Steinbrink
2009-04-10 22:20                         ` Linus Torvalds
2009-04-11  0:27                           ` Linus Torvalds
2009-04-11  1:15                             ` Linus Torvalds
2009-04-11  1:34                               ` Nicolas Pitre
2009-04-11 13:41                               ` Björn Steinbrink
2009-04-11 14:07                                 ` Björn Steinbrink
2009-04-11 18:06                                   ` Linus Torvalds
2009-04-11 18:22                                     ` Linus Torvalds
2009-04-11 19:22                                       ` Björn Steinbrink
2009-04-11 20:50                                     ` Björn Steinbrink
2009-04-11 21:43                                       ` Linus Torvalds
2009-04-11 23:24                                         ` Björn Steinbrink
2009-04-11 18:19                                   ` Linus Torvalds
2009-04-11 19:40                                     ` Björn Steinbrink
2009-04-11 19:58                                       ` Linus Torvalds
2009-04-05 22:59             ` Performance issue: initial git clone causes massive repack Nicolas Sebrecht
2009-04-05 23:20               ` david
2009-04-05 23:28                 ` Robin Rosenberg
2009-04-06  3:34                 ` Nicolas Pitre
2009-04-06  5:15                   ` Junio C Hamano
2009-04-06 13:12                     ` Nicolas Pitre
2009-04-06 13:52                     ` Jon Smirl
2009-04-06 14:19                       ` Nicolas Pitre
2009-04-06 14:37                         ` Jon Smirl
2009-04-06 14:48                           ` Shawn O. Pearce
2009-04-06 15:14                           ` Nicolas Pitre
2009-04-06 15:28                             ` Jon Smirl
2009-04-06 16:14                               ` Nicolas Pitre
2009-04-06 11:22                   ` Matthieu Moy
2009-04-06 13:29                     ` Nicolas Pitre
2009-04-06 14:03                       ` Robin H. Johnson
2009-04-06 14:14                         ` Nicolas Pitre
2009-04-07 10:11               ` Martin Langhoff
2009-04-05 19:57 ` Jeff King
2009-04-05 23:38   ` Robin H. Johnson
2009-04-05 23:42     ` Robin H. Johnson
     [not found]     ` <0015174c150e49b5740466d7d2c2@google.com>
2009-04-06  0:29       ` Robin H. Johnson
2009-04-06  3:10     ` Nguyen Thai Ngoc Duy
2009-04-06  4:09       ` Nicolas Pitre
2009-04-06  4:06     ` Nicolas Pitre
2009-04-06 14:20       ` Robin H. Johnson
2009-04-11 17:24 ` Mark Levedahl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.0904141542161.6741@xanadu.home \
    --to=nico@cam.org \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=robbat2@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.