From: Carl Baldwin <cnb@fc.hp.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: auto-packing on kernel.org? please?
Date: Mon, 21 Nov 2005 12:01:51 -0700
Message-ID: <20051121190151.GA2568@hpsvcnb.fc.hp.com>
In-Reply-To: <Pine.LNX.4.64.0510131113490.15297@g5.osdl.org>

I have a question about automatic repacking.

I am thinking of turning something like Linus' repacking heuristic loose
on my repositories.  I just want to make sure it is as safe as possible.

At the core of the incremental and full repack strategies are these
statements.

Incremental...
> 		git repack &&
> 			git prune-packed

Full...
> 		git repack -a -d &&
> 			git prune-packed

Are there some built in safety checks in 'git repack' and/or 'git
prune-packed' to guard against corruption?  In the long run, I would
feel more comfortable with something like this:

git repack
git verify-pack <new pack>
git prune-packed
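
To make that concrete, here is a rough sketch of what I have in mind. It
assumes the new pack can be found by comparing the list of '*.idx' files
before and after the repack. The function names (list_idx, new_idx,
safe_incremental_repack) are made up for illustration; only 'git repack',
'git verify-pack' and 'git prune-packed' are real commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git.

# List the pack index files under git dir $1, one per line, sorted.
list_idx() {
	find "$1/objects/pack" -name '*.idx' 2>/dev/null | sort
}

# Print lines present in sorted file $2 but absent from sorted file $1.
new_idx() {
	comm -13 "$1" "$2"
}

# Repack, verify every pack that appeared, and only then prune loose objects.
# Runs in a subshell so the exported GIT_DIR does not leak to the caller.
safe_incremental_repack() (
	GIT_DIR=${1:-.}
	export GIT_DIR
	before=$(mktemp) && after=$(mktemp) || exit 1
	trap 'rm -f "$before" "$after"' EXIT

	list_idx "$GIT_DIR" >"$before"
	git repack || exit 1
	list_idx "$GIT_DIR" >"$after"

	# Stop at the first pack that fails verification.
	new_idx "$before" "$after" | while IFS= read -r idx; do
		git verify-pack "$idx" || exit 1
	done || exit 1

	git prune-packed
)
```

Calling 'safe_incremental_repack /path/to/repo.git' would leave the loose
objects in place if any new pack failed to verify.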

Would something like this even work with 'git repack -a -d'?  Is there a
way to do something like the following for a full repack to achieve the
ultimate in paranoia?

git repack -a
git verify-pack <new pack file>
git trash-redundant-packs <new pack file>
git prune-packed
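
As far as I know 'git trash-redundant-packs' does not exist; it stands
for whatever step would retire the old packs. A rough sketch of one way
it might look, assuming the old packs are moved aside into a trash
directory rather than deleted, and that packs marked with a '.keep' file
are left alone. The names trash_old_packs and paranoid_full_repack, and
the default 'trashed-packs' directory, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git.

# Move every pack listed in file $1 (one .idx path per line) into
# directory $2, skipping packs that have a .keep file next to them.
trash_old_packs() {
	mkdir -p "$2" || return 1
	while IFS= read -r idx; do
		base=${idx%.idx}
		[ -e "$base.keep" ] && continue
		mv "$base".* "$2"/ || return 1
	done <"$1"
}

# Full repack without -d, verify the result, then retire the old packs.
# Runs in a subshell so the exported GIT_DIR does not leak to the caller.
paranoid_full_repack() (
	GIT_DIR=${1:-.}
	export GIT_DIR
	trash=${2:-$GIT_DIR/trashed-packs}
	before=$(mktemp) && after=$(mktemp) && fresh=$(mktemp) || exit 1
	trap 'rm -f "$before" "$after" "$fresh"' EXIT

	find "$GIT_DIR/objects/pack" -name '*.idx' 2>/dev/null | sort >"$before"
	git repack -a || exit 1
	find "$GIT_DIR/objects/pack" -name '*.idx' 2>/dev/null | sort >"$after"
	comm -13 "$before" "$after" >"$fresh"

	# If no new pack appeared, there is nothing to verify or retire.
	[ -s "$fresh" ] || exit 0

	while IFS= read -r idx; do
		git verify-pack "$idx" || exit 1
	done <"$fresh"

	trash_old_packs "$before" "$trash" && git prune-packed
)
```

Because the old packs are only moved, they could be put back by hand if
the new pack later turned out to be bad.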

Carl

On Thu, Oct 13, 2005 at 11:44:30AM -0700, Linus Torvalds wrote:
> 
> I know we tried this once earlier, and it caused problems, but that was 
> when pack-files were new, and not everybody could handle them. These days, 
> if you can't handle pack-files, kernel.org is already pretty useless, 
> because all the major packages use them anyway, because people have 
> packed their repositories by hand.
> 
> So I'm suggesting we try to do an automatic repack every once in a while. 
> 
> In my suggestion, there would be two levels of repacking: "incremental" 
> and "full", and both of them would count the number of files before they 
> run, so that you'd only do it when it seems worthwhile.
> 
> This is a _really_ simple heuristic:
> 
>  - incremental repacking run every day:
> 
> 	#
> 	# Check if we have more than a couple of hundred
> 	# unpacked objects - approximated by whether we
> 	# have any "00" directory with more than one 
> 	#
> 	# This means that we don't repack projects
> 	# that don't have a lot of work going on.
> 	#
> 	# Note: with really new versions of git, the "00"
> 	# directory may not exist if it has been pruned
> 	# away, so handle that gracefully.
> 	#
> 	export GIT_DIR=${1:-.}
> 	objs=$(find "$GIT_DIR/objects/00" -type f 2> /dev/null | wc -l)
> 	if [ "$objs" -gt 0 ]; then
> 		git repack &&
> 			git prune-packed
> 	fi
> 
>  - "full repack" every week if the number of packs has grown to be bigger 
>    than say 10 (ie even a very active project will never have a full 
>    repack more than every other week)
> 
> 	#
> 	# Check if we have lots of packs, where "lots" is defined as 10.
> 	#
> 	# Note: with something that was generated with an old version
> 	# of git, the "pack" directory may not exist, so handle that
> 	# gracefully.
> 	#
> 	export GIT_DIR=${1:-.}
> 	packs=$(find "$GIT_DIR/objects/pack" -name '*.idx' 2> /dev/null | wc -l)
> 	if [ "$packs" -gt 10 ]; then
> 		git repack -a -d &&
> 			git prune-packed
> 	fi
> 
>  - do a full repack of everything once to start with.
> 
> 	export GIT_DIR=${1:-.}
> 	git repack -a -d &&
> 		git prune-packed
> 
> the above three trivial scripts just take a single argument, which becomes 
> the GIT_DIR (and if no argument exists, it would default to ".")
> 
> Is there any reason not to do this? Right now mirroring is slow, and 
> webgit is also getting to be very slow sometimes. I bet we'd be _much_ 
> better off with this kind of setup.
> 
> NOTE! The above is the "stupid" approach, which totally ignores alternate 
> directories, and isn't able to take advantage of the fact that many 
> projects could share objects. But it's simple, and it's efficient (eg it 
> won't spend time on things like the large historic archives which don't 
> change, but that would be expensive to repack if you didn't check for the 
> need).
> 
> So we could try to come up with a better approach eventually, which would 
> automatically notice alternate directories and not repack stuff that 
> exists there, but I'm pretty sure that the above would already help a 
> _lot_, and while pack-files have been around forever, the 
> "alternates" support is still pretty new, so the above is also the "safer" 
> thing to do.
> 
> We'd only do the automatic thing on stuff under /pub/scm, of course: not 
> stuff in people's home directories etc.
> 
> Peter?
> 
> 			Linus

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        Systems VLSI Laboratory
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
