From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Git Mailing List <git@vger.kernel.org>
Subject: Re: Performance issue: initial git clone causes massive repack
Date: Sun, 5 Apr 2009 00:04:12 -0700
Message-ID: <20090405070412.GB869@curie-int>
In-Reply-To: <20090405035453.GB12927@vidovic>

Before I answer the rest of your post, I'd like to note that the matter
of which choice between single-repo, repo-per-package, repo-per-category
has been flogged to death within Gentoo.

I did not come to the Git mailing list to rehash those choices. I came
here to find a solution to the performance problem. While it shows up
with our repo, I'm certain that we're not the only people with the
problem. The GSoC 2009 ideas contain a potential project for caching the
generated packs, which, while having value in itself, could be partially
avoided by sending suitable pre-built packs (if they exist) without any
repacking.

On Sun, Apr 05, 2009 at 05:54:53AM +0200, Nicolas Sebrecht wrote:
> > That causes incredible bloat, unfortunately.
> > 
> > I'll summarize why here for the git mailing list. Most of our developers
> > have the entire tree checked out, and in informal surveys, would like to
> > continue to do so. There are ~13500 packages right now 
> Each developer doesn't work on so many packages, right? From my point
> of view, checking out the entire tree is the wrong way to do
> things.
Also, I should note that working on the tree isn't the only reason to
have the tree checked out. While the great majority of Gentoo users have
their trees purely from rsync, there is nothing stopping you from using
a tree from CVS (anonCVS for the users, master CVS server for the
developers).

A quick stats run shows that while some developers only touch a
few packages, there are at least 200 developers who have made a major
change to 100 or more packages.

> > Without tail packing, the Gentoo tree is presently around 520MiB (you
> > can fit it into ~190MiB with tail packing). This means that
> > repo-per-package would have an overhead in the range of 400%.
> Don't know about the business for Gentoo, but HDD is cheap.
There's no reason to accept bloat just to change the layout.

> Also, I'd like to know how much space you will gain with the CVS to Git
> migration.  How much bigger is a CVS repo compared to a Git one?
For the CVS checkouts right now: 
- ~410MiB of content (w/ 4kb inodes)
- ~240MiB of CVS overhead (w/ 4kb inodes)
(sorry about the earlier 520MiB number, I forgot to exclude a local dir
of stats data on my box when I ran du quickly).
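
For readers wondering why the "4kb" qualifier matters: without tail
packing, every file is rounded up to a whole number of allocation units,
so a tree of small files occupies more disk than its content. A minimal
sketch of that rounding follows. The block size is taken from the figure
above; the file sizes are hypothetical and chosen only for illustration,
they are not measurements from the Gentoo tree.

```python
BLOCK = 4096  # assumed allocation unit, matching the "4kb" figure above


def allocated(size: int, block: int = BLOCK) -> int:
    """Bytes consumed on disk by a file of `size` bytes without tail packing."""
    return -(-size // block) * block  # ceiling division, then back to bytes


# Hypothetical file sizes in bytes, for illustration only.
sizes = [300, 1200, 2500, 5000]

content = sum(sizes)                        # what tail packing approaches
on_disk = sum(allocated(s) for s in sizes)  # what du reports otherwise

print(f"content: {content} bytes, on disk: {on_disk} bytes")
```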

Our experimental Git, with only a single repo for gentoo-x86:
- ~410MiB of content (w/ 4kb inodes)
- 80MiB - 1.6GiB of Git total overhead.

80MiB of overhead is the total overhead with a shallow clone at depth 1.
1.6GiB is with the full history.

And here are the per-package numbers, because we DID do an experimental
conversion last year, although the packs might not have been optimal:
- ~410MiB of content (w/ 4kb inodes)
- 4.7GiB of Git total overhead, with a breakdown:
  - 1.9GiB in inode waste
  - 2.8GiB in packs
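
To put the two layouts side by side, the figures above can be compared
with a few lines of arithmetic. This is only a sketch using the rounded
numbers quoted in this mail; the variable names are illustrative.

```python
# Rounded figures from this mail, in GiB (variable names are illustrative).
single_repo_full = 1.6      # single repo, full history
per_pkg_inode_waste = 1.9   # repo-per-package: inode waste
per_pkg_packs = 2.8         # repo-per-package: packs

per_pkg_total = per_pkg_inode_waste + per_pkg_packs
ratio = per_pkg_total / single_repo_full

print(f"repo-per-package overhead: {per_pkg_total:.1f} GiB")
print(f"relative to single repo with full history: {ratio:.1f}x")
```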

> One repo per category could be a good compromise assuming one separate
> branch per package, then.
Other downsides to repo-per-category and repo-per-package:
- Raises difficulty in adding a new package/category. 
  You cannot just do 'mkdir && vi ... && git add && git commit' anymore.
- The directory names for both the category AND the package are not
  specified in the ebuild; as such, unless they are checked out to the right
  location, you will get breakage (definitely in the package name, and
  about 10% of the time with categories).
- You cannot use git-cvsserver with them cleanly and have the correct
  behavior (we DO have developers that want to use the CVS emulation
  layer) - adding a category or a package would NOT trigger the
  addition of a new repo on the server when needed.
- Does NOT present a good base for anybody wanting to branch the entire
  tree themselves.
  

> > Additionally, there's a lot of commonality between ebuilds and packages,
> > and having repo-per-package means that the compression algorithms can't
> > make use of it - dictionary algorithms are effective at compression for
> > a reason.
> Please, no. We are in the long term issues. Compression will be
> efficient. It's all about the content of the files and dictionary
> algorithms certainly will do a good job over the ebuilds revisions.
We're already on track to drop the CVS $Header$, and after that some of
the ebuilds are on track to get smaller. Here's our prototype
dev-perl/Sub-Name-0.04:
====
# Copyright 1999-2009 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
MODULE_AUTHOR=XMATH
inherit perl-module
DESCRIPTION="(re)name a sub"
LICENSE="|| ( Artistic GPL-2 )"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE=""
SRC_TEST=do
====

We can have all the CPAN packages from CPAN author XMATH by changing
only the DESCRIPTION string. KEYWORDS then just changes over the package's
lifespan.
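
A rough way to see why that commonality matters for compression: the
sketch below compresses a few near-identical ebuilds separately and then
as one stream. It uses plain zlib on concatenated text, which is NOT how
Git builds packs (Git deltas objects against each other before
deflating); it only illustrates the shared-content argument. The first
DESCRIPTION is from the prototype above, the other two are made up.

```python
import zlib

# Template based on the prototype ebuild above; only DESCRIPTION varies.
TEMPLATE = """# Copyright 1999-2009 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
MODULE_AUTHOR=XMATH
inherit perl-module
DESCRIPTION="{description}"
LICENSE="|| ( Artistic GPL-2 )"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE=""
SRC_TEST=do
"""

# First description is from the prototype; the others are hypothetical.
descriptions = [
    "(re)name a sub",
    "hypothetical second module",
    "hypothetical third module",
]
ebuilds = [TEMPLATE.format(description=d).encode() for d in descriptions]

# One compressed stream per ebuild: nothing can be shared between them.
separate = sum(len(zlib.compress(e)) for e in ebuilds)

# One stream for all ebuilds: later copies can reference earlier ones.
together = len(zlib.compress(b"".join(ebuilds)))

print(f"compressed separately: {separate} bytes")
print(f"compressed together:   {together} bytes")
```

The separate case loosely corresponds to repo-per-package, where nothing
in one repository can be expressed in terms of content held in another.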

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
