From: Carl Baldwin <cnb@fc.hp.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: auto-packing on kernel.org? please?
Date: Mon, 21 Nov 2005 12:01:51 -0700
Message-ID: <20051121190151.GA2568@hpsvcnb.fc.hp.com>
In-Reply-To: <Pine.LNX.4.64.0510131113490.15297@g5.osdl.org>
I have a question about automatic repacking.
I am thinking of turning something like Linus' repacking heuristic loose
on my repositories. I just want to make sure it is as safe as possible.
At the core of the incremental and full repack strategies are these
statements.
Incremental...
> git repack &&
> git prune-packed
Full...
> git repack -a -d &&
> git prune-packed
Are there any built-in safety checks in 'git repack' and/or 'git
prune-packed' to guard against corruption? In the long run, I would
feel more comfortable with something like this:
git repack
git verify-pack <new pack>
git prune-packed
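
The sequence above could be sketched roughly as follows. This is only a sketch under stated assumptions: the function name verified_incremental_repack is hypothetical, it takes the GIT_DIR as its single argument like the quoted scripts do, and it verifies every pack it finds rather than trying to identify only the new one.

```shell
#!/bin/sh
# Sketch only: incremental repack that verifies packs before pruning.
# The function name is hypothetical; the git commands are standard ones.
verified_incremental_repack() (
    # Same convention as the quoted scripts: first argument is the GIT_DIR.
    export GIT_DIR="${1:-.}"
    pack_dir="$GIT_DIR/objects/pack"

    # Pack loose objects into a new pack; the loose copies are left alone.
    git repack -q || exit 1

    # Verify every pack index present, not only the newest one.
    for idx in "$pack_dir"/pack-*.idx; do
        [ -f "$idx" ] || continue
        if ! git verify-pack "$idx"; then
            echo "verify-pack failed for $idx; not pruning" >&2
            exit 1
        fi
    done

    # Only now remove loose objects that also exist in a pack.
    git prune-packed
)
```

Because the body runs in a subshell, the exported GIT_DIR does not leak into the caller, and a failed verification returns non-zero before anything is pruned.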
Would something like this even work with 'git repack -a -d'? Is there a
way to do something like the following for a full repack to achieve the
ultimate in paranoia?
git repack -a
git verify-pack <new pack file>
git trash-redundant-packs <new pack file>
git prune-packed
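
There is no 'git trash-redundant-packs'; that name is a placeholder. One way to approximate the idea, as a rough sketch only, is to remember which packs existed before 'git repack -a', verify everything afterwards, and then delete just the packs from the remembered list. The function name verified_full_repack is hypothetical, and the sketch assumes nothing else writes to the repository while it runs. Like 'git repack -a -d', it drops unreachable objects that lived only in the old packs.

```shell
#!/bin/sh
# Sketch only: full repack that deletes old packs after verification.
# The function name is hypothetical. Assumes no concurrent writers.
verified_full_repack() (
    export GIT_DIR="${1:-.}"
    pack_dir="$GIT_DIR/objects/pack"

    # Remember which packs existed before the repack.
    old_packs=$(find "$pack_dir" -type f -name 'pack-*.idx' 2>/dev/null)

    # Build one pack holding everything reachable; without -d nothing is deleted.
    git repack -a -q || exit 1

    # Verify every pack, and note whether a pack appeared that was not there before.
    new_found=0
    for idx in "$pack_dir"/pack-*.idx; do
        [ -f "$idx" ] || continue
        git verify-pack "$idx" || { echo "verify-pack failed for $idx" >&2; exit 1; }
        printf '%s\n' "$old_packs" | grep -qxF -e "$idx" || new_found=1
    done

    # Delete old packs only if a new, verified pack exists to replace them.
    if [ "$new_found" -eq 1 ]; then
        printf '%s\n' "$old_packs" | while IFS= read -r idx; do
            [ -n "$idx" ] || continue
            base=${idx%.idx}
            # Packs marked with a .keep file are not repacked, so leave them.
            [ -e "$base.keep" ] && continue
            # Remove the index first so the pack stops being used, then the rest.
            rm -f "$idx" && rm -f "$base".*
        done
    fi

    git prune-packed
)
```

If the repack produces no new pack name, for example because everything was already in a single pack, the sketch deletes nothing and only prunes loose objects.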
Carl
On Thu, Oct 13, 2005 at 11:44:30AM -0700, Linus Torvalds wrote:
>
> I know we tried this once earlier, and it caused problems, but that was
> when pack-files were new, and not everybody could handle them. These days,
> if you can't handle pack-files, kernel.org is already pretty useless,
> because all the major packages use them anyway, because people have
> packed their repositories by hand.
>
> So I'm suggesting we try to do an automatic repack every once in a while.
>
> In my suggestion, there would be two levels of repacking: "incremental"
> and "full", and both of them would count the number of files before they
> run, so that you'd only do it when it seems worthwhile.
>
> This is a _really_ simple heuristic:
>
> - incremental repacking run every day:
>
> #
> # Check if we have more than a couple of hundred
> # unpacked objects - approximated by whether we
> # have any "00" directory with more than one
> #
> # This means that we don't repack projects that
> # don't have a lot of work going on.
> #
> # Note: with really new versions of git, the "00"
> # directory may not exist if it has been pruned
> # away, so handle that gracefully.
> #
> export GIT_DIR=${1:-.}
> objs=$(find "$GIT_DIR/objects/00" -type f 2> /dev/null | wc -l)
> if [ "$objs" -gt 0 ]; then
> git repack &&
> git prune-packed
> fi
>
> - "full repack" every week if the number of packs has grown to be bigger
> than say 10 (ie even a very active project will never have a full
> repack more than every other week)
>
> #
> # Check if we have lots of packs, where "lots" is defined as 10.
> #
> # Note: with something that was generated with an old version
> # of git, the "pack" directory may not exist, so handle that
> # gracefully.
> #
> export GIT_DIR=${1:-.}
> packs=$(find "$GIT_DIR/objects/pack" -name '*.idx' 2> /dev/null | wc -l)
> if [ "$packs" -gt 10 ]; then
> git repack -a -d &&
> git prune-packed
> fi
>
> - do a full repack of everything once to start with.
>
> export GIT_DIR=${1:-.}
> git repack -a -d &&
> git prune-packed
>
> the above three trivial scripts just take a single argument, which becomes
> the GIT_DIR (and if no argument exists, it would default to ".")
>
> Is there any reason not to do this? Right now mirroring is slow, and
> webgit is also getting to be very slow sometimes. I bet we'd be _much_
> better off with this kind of setup.
>
> NOTE! The above is the "stupid" approach, which totally ignores alternate
> directories, and isn't able to take advantage of the fact that many
> projects could share objects. But it's simple, and it's efficient (eg it
> won't spend time on things like the large historic archives which don't
> change, but that would be expensive to repack if you didn't check for the
> need).
>
> So we could try to come up with a better approach eventually, which would
> automatically notice alternate directories and not repack stuff that
> exists there, but I'm pretty sure that the above would already help a
> _lot_, and while pack-files have been around forever, the
> "alternates" support is still pretty new, so the above is also the "safer"
> thing to do.
>
> We'd only do the automatic thing on stuff under /pub/scm, of course: not
> stuff in people's home directories etc.
>
> Peter?
>
> Linus
--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Carl Baldwin Systems VLSI Laboratory
Hewlett Packard Company
MS 88 work: 970 898-1523
3404 E. Harmony Rd. work: Carl.N.Baldwin@hp.com
Fort Collins, CO 80525 home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -