git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: auto-packing on kernel.org? please?
Date: Thu, 13 Oct 2005 11:44:30 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0510131113490.15297@g5.osdl.org>


I know we tried this once earlier, and it caused problems, but that was 
when pack-files were new, and not everybody could handle them. These days, 
if you can't handle pack-files, kernel.org is already pretty useless, 
because all the major packages use them anyway, because people have 
packed their repositories by hand.

So I'm suggesting we try to do an automatic repack every once in a while. 

In my suggestion, there would be two levels of repacking: "incremental" 
and "full", and both of them would count the number of files before they 
run, so that you'd only do it when it seems worthwhile.

This is a _really_ simple heuristic:

 - incremental repacking run every day:

	#
	# Check if we have more than a couple of hundred
	# unpacked objects - approximated by whether the
	# "00" directory contains any objects at all.
	#
	# This means that we don't repack projects that
	# don't have a lot of work going on.
	#
	# Note: with really new versions of git, the "00"
	# directory may not exist if it has been pruned
	# away, so handle that gracefully.
	#
	export GIT_DIR=${1:-.}
	objs=$(find "$GIT_DIR/objects/00" -type f 2> /dev/null | wc -l)
	if [ "$objs" -gt 0 ]; then
		git repack &&
			git prune-packed
	fi

 - "full repack" every week if the number of packs has grown to be bigger 
   than say 10 (ie even a very active project will never have a full 
   repack more often than every other week)

	#
	# Check if we have lots of packs, where "lots" is defined as 10.
	#
	# Note: with something that was generated with an old version
	# of git, the "pack" directory may not exist, so handle that
	# gracefully.
	#
	export GIT_DIR=${1:-.}
	packs=$(find "$GIT_DIR/objects/pack" -name '*.idx' 2> /dev/null | wc -l)
	if [ "$packs" -gt 10 ]; then
		git repack -a -d &&
			git prune-packed
	fi

 - do a full repack of everything once to start with.

	export GIT_DIR=${1:-.}
	git repack -a -d &&
		git prune-packed

The above three trivial scripts just take a single argument, which becomes 
the GIT_DIR (and if no argument is given, it defaults to ".").
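
A minimal sketch of why looking only at "00" works as an approximation: 
loose objects are spread over 256 fan-out directories named after the 
first byte of the object name, so the file count in one directory times 
256 gives a rough total. The helper name estimate_loose is made up for 
illustration and is not part of git.

```shell
#!/bin/sh
# Estimate the total number of loose objects in a repository by
# counting the files in the "00" fan-out directory and scaling by
# the 256 possible fan-out directories.
#
# estimate_loose is a hypothetical helper, not a git command.
estimate_loose() {
	git_dir=${1:-.}
	# A missing "00" directory makes find fail quietly, so the
	# sample count is simply 0.
	sample=$(find "$git_dir/objects/00" -type f 2> /dev/null | wc -l)
	echo $(($sample * 256))
}
```

With this, a single file in "00" stands for roughly 256 loose objects, 
which is where the "couple of hundred" in the script comment comes from.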

Is there any reason not to do this? Right now mirroring is slow, and 
gitweb is also getting to be very slow sometimes. I bet we'd be _much_ 
better off with this kind of setup.

NOTE! The above is the "stupid" approach, which totally ignores alternate 
directories, and isn't able to take advantage of the fact that many 
projects could share objects. But it's simple, and it's efficient (eg it 
won't spend time on things like the large historic archives which don't 
change, but that would be expensive to repack if you didn't check for the 
need).

So we could try to come up with a better approach eventually, which would 
automatically notice alternate directories and not repack stuff that 
exists there, but I'm pretty sure that the above would already help a 
_lot_, and while pack-files have been around forever, the 
"alternates" support is still pretty new, so the above is also the "safer" 
thing to do.
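
As a rough sketch of the first step such a better approach would need, 
a script can at least notice that a repository borrows objects: the list 
of alternate object directories lives in objects/info/alternates. The 
helper name has_alternates is invented for this example.

```shell
#!/bin/sh
# Succeed if the repository lists at least one alternate object
# directory, fail otherwise.
#
# has_alternates is a hypothetical helper, not a git command.
has_alternates() {
	git_dir=${1:-.}
	# -s is true only when the file exists and is not empty.
	[ -s "$git_dir/objects/info/alternates" ]
}
```

A caller could skip such repositories entirely, or, with a current git, 
pass -l to git repack, which is documented as leaving out objects that 
are borrowed from an alternate object store.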

We'd only do the automatic thing on stuff under /pub/scm, of course: not 
stuff in people's home directories etc.
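
One possible way to restrict the run to a single tree is sketched here, 
assuming a git directory can be recognised by an "objects" directory 
sitting next to a "refs" directory. The helper name list_git_dirs and 
the script name in the final comment are placeholders, not existing 
tools.

```shell
#!/bin/sh
# Print every git directory found below a root (default /pub/scm).
# A directory counts as a git directory here if it contains both
# "objects" and "refs"; that test is an assumption of this sketch.
#
# list_git_dirs is a hypothetical helper, not a git command.
list_git_dirs() {
	root=${1:-/pub/scm}
	# -prune keeps find from walking into the object store itself.
	find "$root" -type d -name objects -prune -print 2> /dev/null |
	while IFS= read -r objdir; do
		dir=${objdir%/objects}
		if [ -d "$dir/refs" ]; then
			echo "$dir"
		fi
	done
}

# Possible use, with a placeholder script name:
#   list_git_dirs /pub/scm | while IFS= read -r dir; do
#           ./incremental-repack.sh "$dir"
#   done
```

Path names containing newlines are not handled.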

Peter?

			Linus
