From: Linus Torvalds <torvalds@osdl.org>
To: Carl Baldwin <cnb@fc.hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: auto-packing on kernel.org? please?
Date: Tue, 22 Nov 2005 09:58:45 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0511220939540.13959@g5.osdl.org>
In-Reply-To: <20051122172558.GA1935@hpsvcnb.fc.hp.com>



On Tue, 22 Nov 2005, Carl Baldwin wrote:

> On Mon, Nov 21, 2005 at 11:24:11AM -0800, Linus Torvalds wrote:
> > NOTE! Since that email, "git repack" has gotten a "local" option (-l), 
> > which is very useful if the repositories have pointers to alternates.
> > 
> > So do
> > 
> > 	git repack -l
> > 
> > instead, to get much better packs (and "-a -d" for the full case, of 
> > course).
> 
> I'm assuming that this option will have no effect on a repository with
> no alternates file.

Correct.

The only thing it does is that when it looks up an object, if it's not in 
our _own_ ".git/objects/" dir, it won't pack it.

Actually, that's not entirely true. It isn't smart enough to know where 
every object exists, so it only knows about remote _packs_. So what 
happens is that if you do

	git repack -l -a -d

it will create a pack-file that contains _all_ unpacked objects (whether 
local or not) and all objects that are in local packs (because of the 
"-a"), but not any objects that are in "alternate packs".

Which is actually exactly what you want, if you are in the situation that 
kernel.org is, and you have people who point their alternates to mine: 
when I repack my objects, they'll use my packs, but other than that, 
they'll prefer to use their own packs over any unpacked objects.
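
As a rough sketch of that setup (not taken from git itself): the helper
below picks between the two repack invocations depending on whether the
repository has an alternates file. The function name repack_local is
made up for illustration, and it assumes a git recent enough to support
"git rev-parse --git-path".

```shell
#!/bin/sh
# Hypothetical helper (not part of git): repack a repository, leaving
# alone anything that already lives in an alternate's packs.
# Usage: repack_local /path/to/repo
repack_local() (
    cd "$1" || exit 1
    # Resolve the alternates file for both bare and non-bare layouts.
    alternates=$(git rev-parse --git-path objects/info/alternates) || exit 1
    if [ -s "$alternates" ]; then
        # We borrow objects: -l keeps the alternate's packed objects out
        # of our new pack, so we keep using the upstream packs for those.
        git repack -l -a -d -q
    else
        # No alternates file: -l would change nothing, so leave it out.
        git repack -a -d -q
    fi
)
```

A repository created with "git clone --shared" is one way to end up
with such an alternates file.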

> > So right now, repacking a broken archive can actually break it even more.
> 
> Interesting.

Well, with the latest git repack script, that should no longer be true.

> > NOTE! Your "git verify-pack" wouldn't even catch this: the _pack_ is fine, 
> > it's just incomplete.
> 
> In my opinion, git repack did the right thing in creating the pack even
> if it is more broken.  Starting with a broken repository was the real
> problem.  git repack shouldn't need to worry too much about it.

Well, "git repack" did the wrong thing in that it never _noticed_, and it 
then removed all old packs - even though those old packs contained objects 
that we hadn't repacked because of the broken repository.

Of course, _usually_ a broken repository is just that - broken. The way 
you fix a broken repo is to find a non-broken one, and clone that. 
However, sometimes what you can do (if you literally just lost a few 
objects) is to find a non-broken repo, and make that the _alternates_, in 
which case you may be able to save any work you had in the broken one 
(assuming you only lost objects that were available somewhere else).

> Looking at it from the nervous repository admin's point of view I think
> he would want to make sure that the repository is good to begin with.

Doing an fsck is certainly always a good idea. I do a "shallow" fsck 
usually several times a day ("shallow" means that it doesn't fsck packs, 
only new objects that I have acquired since the last repacking), and I do a 
full fsck a couple of times a week.

I don't actually know why I do that, though. I don't think I've really 
_ever_ had a broken repo since some very early days, except for the cases 
where I break things on purpose (like remove an object to check whether 
"git repack" does the right thing or not). I'm just used to it, and the 
shallow fsck takes a fraction of a second, so I tend to do it after each 
pull.

So I really think that an admin has to be more than "nervous" to worry 
about it. He has to be really anal.

(Now, doing a repack and a fsck every week or so might be good, and 
automatic daily shallow fscks are probably a great idea too. After all, it 
_is_ checking checksums, so if you worry about security and want to make 
sure that nobody is trying to break in and do bad things to your repo, a 
regular fsck is a good thing even if you're not otherwise worried about 
corruption).
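
For illustration only, the two kinds of check could be wrapped like
this. The function names are made up, and the sketch assumes a current
git, where the command is spelled "git fsck", the full check is the
default, and "--no-full" asks for the shallow one.

```shell
#!/bin/sh
# Hypothetical wrappers (the names are ours, not git's).

# Shallow check: skip objects that live in packs or in alternates and
# look only at loose objects, i.e. whatever arrived since the last repack.
fsck_shallow() {
    git -C "$1" fsck --no-full
}

# Full check: also verify every pack and every alternate object store.
fsck_full() {
    git -C "$1" fsck --full
}
```

Both return non-zero when fsck finds a problem, so they can be called
from a daily and a weekly cron job respectively.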

> *NOTE*  There is one question that I feel remains unanswered.  Is it
> possible to split up the repack -a and repack -d so that the nervous
> repository owner can insert a git verify-pack in the middle.

They are already split up inside "git-repack", so we could add a hook 
there, I guess. See the git-repack.sh file, and notice how it does the 
"remove_redundant" part only after it has created the new pack-file and 
done a "sync".
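
Until such a hook exists, the split can be approximated from the
outside. The following is only a sketch under assumptions: the function
name repack_verified is invented, it needs a git with "git rev-parse
--git-path", and it removes old packs by hand rather than through
git-repack's own remove_redundant step.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): do the "-a" step, verify the
# result, and only then do by hand what "-d" would have done.
# Usage: repack_verified /path/to/repo
repack_verified() (
    cd "$1" || exit 1
    packdir=$(git rev-parse --git-path objects/pack) || exit 1
    old_list=$(mktemp) || exit 1
    trap 'rm -f "$old_list"' EXIT

    # Remember which packs exist before we start.
    ls "$packdir" 2>/dev/null | grep '\.idx$' > "$old_list"

    # Step 1: write the new all-in-one pack, but delete nothing.
    git repack -a -q || exit 1

    # Step 2: verify every pack that was not there before.
    new_found=no
    for idx in "$packdir"/*.idx; do
        [ -e "$idx" ] || continue
        grep -qxF "${idx##*/}" "$old_list" && continue
        git verify-pack "$idx" || exit 1
        new_found=yes
    done

    # Nothing new was written (e.g. already fully packed): keep it all.
    [ "$new_found" = yes ] || exit 0

    # Step 3: drop the old packs (except ones marked .keep, which
    # "git repack" does not fold into the new pack) and loose duplicates.
    while IFS= read -r name; do
        base=$packdir/${name%.idx}
        [ -e "$base.keep" ] && continue
        rm -f "$base.pack" "$base.idx" "$base.rev" "$base.bitmap"
    done < "$old_list"
    git prune-packed -q
)
```

Note that "git verify-pack" only proves the new pack is internally
consistent. It says nothing about whether the pack is complete.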

> I don't mean to say that I don't trust git repack to do the right thing.
> Fundamentally, I just think that I shouldn't depend on it to do the
> right thing in order to avoid corruption in my repository.

That's good. However, as the previous failure of git repack showed, to 
some degree the more likely failure mode is actually that the pack 
generated by "git repack" is perfectly fine, but it's not _complete_. Say 
we have a bug in git repack, for example.

Another case where it's not complete is when you have deleted a branch. 
"git repack -a -d" will effectively do a "git prune" wrt objects that are 
no longer reachable, and that were in the old packs.

So I'd actually suggest a slightly different approach. Whenever you 
remove old objects (whether it's "git prune" or "git prune-packed" or "git 
repack -a -d"), you might want to have an option that doesn't actually 
_remove_ them, but just moves them into ".git/attic" or something like 
that.

Then you can clean up the attic after doing your weekly full fsck or 
something. And it has the advantage that if somebody has deleted a branch, 
and notices later that maybe he wanted that branch back, you can "unprune" 
all the objects, run "git-fsck-objects --full" to find any dangling 
commits, and you'll have all your branches back.
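
No such option exists in the tools discussed here, so the following is
purely a sketch of the idea for the "git prune" case (loose objects
only). The function names and the use of "git prune -n" to list the
candidates are choices made for this illustration, with ".git/attic" as
the directory suggested above. It does not cover objects dropped from
packs by "git repack -a -d".

```shell
#!/bin/sh
# Hypothetical helpers (not part of git).

# Move every loose object that "git prune" would delete into the attic.
# Usage: prune_to_attic /path/to/repo
prune_to_attic() (
    cd "$1" || exit 1
    objdir=$(git rev-parse --git-path objects) || exit 1
    attic=$(git rev-parse --git-path attic) || exit 1
    # "git prune -n" only reports "<object-id> <type>" lines.
    git prune -n | while read -r oid type; do
        dir=$(printf '%s' "$oid" | cut -c1-2)
        file=$(printf '%s' "$oid" | cut -c3-)
        [ -f "$objdir/$dir/$file" ] || continue
        mkdir -p "$attic/$dir" && mv "$objdir/$dir/$file" "$attic/$dir/$file"
    done
)

# Move everything in the attic back into the object store.
# Usage: unprune /path/to/repo
unprune() (
    cd "$1" || exit 1
    objdir=$(git rev-parse --git-path objects) || exit 1
    attic=$(git rev-parse --git-path attic) || exit 1
    [ -d "$attic" ] || exit 0
    for src in "$attic"/??/*; do
        [ -f "$src" ] || continue
        rel=${src#"$attic"/}
        mkdir -p "$objdir/${rel%/*}" && mv "$src" "$objdir/$rel"
    done
)
```

After "unprune", a full fsck ("git fsck --full" in current git) lists
the dangling commits, which can then be given branch names again.
Emptying the attic after the weekly full fsck is a plain "rm -rf" of
that directory.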

So in many ways it would perhaps be nicer to have that kind of "safe 
remove" option to the pruning commands?

			Linus
