From: Linus Torvalds <torvalds@osdl.org>
To: Carl Baldwin <cnb@fc.hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: auto-packing on kernel.org? please?
Date: Tue, 22 Nov 2005 09:58:45 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0511220939540.13959@g5.osdl.org>
In-Reply-To: <20051122172558.GA1935@hpsvcnb.fc.hp.com>
On Tue, 22 Nov 2005, Carl Baldwin wrote:
> On Mon, Nov 21, 2005 at 11:24:11AM -0800, Linus Torvalds wrote:
> > NOTE! Since that email, "git repack" has gotten a "local" option (-l),
> > which is very useful if the repositories have pointers to alternates.
> >
> > So do
> >
> > git repack -l
> >
> > instead, to get much better packs (and "-a -d" for the full case, of
> > course).
>
> I'm assuming that this option will have no effect on a repository with
> no alternates file.
Correct.
The only thing it does is that when it looks up an object, if it's not in
our _own_ ".git/objects/" dir, it won't pack it.
Actually, that's not entirely true. It isn't smart enough to know where
every object exists, so it only knows about remote _packs_. So what
happens is that if you do

	git repack -l -a -d

it will create a pack-file that contains _all_ unpacked objects (whether
local or not) and all objects that are in local packs (because of the
"-a"), but not any objects that are in "alternate packs".
Which is actually exactly what you want, if you are in the situation that
kernel.org is, and you have people who point their alternates to mine:
when I repack my objects, they'll use my packs, but other than that,
they'll prefer to use their own packs over any unpacked objects.
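As a rough sketch of the above (not part of git itself): the helper names
has_alternates and repack_local are made up for illustration, and the only
things assumed about git are the objects/info/alternates file and the
"-a -d -l" flags of "git repack".

```shell
#!/bin/sh
# Hypothetical helpers, named here only for illustration.

# has_alternates GIT_DIR
# Succeeds if the repository lists at least one alternate object store.
has_alternates() {
    [ -s "$1/objects/info/alternates" ]
}

# repack_local GIT_DIR
# Full repack that skips objects living in alternate packs. Per the
# discussion above, "-l" is harmless when there is no alternates file,
# so it is passed unconditionally.
repack_local() {
    if has_alternates "$1"; then
        echo "borrowing objects from:"
        cat "$1/objects/info/alternates"
    fi
    git --git-dir="$1" repack -a -d -l
}
```

Something like "repack_local /path/to/repo/.git" would then be the full
local repack for a repository that points its alternates elsewhere.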
> > So right now, repacking a broken archive can actually break it even more.
>
> Interesting.
Well, with the latest git repack script, that should no longer be true.
> > NOTE! Your "git verify-pack" wouldn't even catch this: the _pack_ is fine,
> > it's just incomplete.
>
> In my opinion, git repack did the right thing in creating the pack even
> if it is more broken. Starting with a broken repository was the real
> problem. git repack shouldn't need to worry too much about it.
Well, "git repack" did the wrong thing in that it never _noticed_, and it
then removed all old packs - even though those old packs contained objects
that we hadn't repacked because of the broken repository.
Of course, _usually_ a broken repository is just that - broken. The way
you fix a broken repo is to find a non-broken one, and clone that.
However, sometimes what you can do (if you literally just lost a few
objects) is to find a non-broken repo, and make that the _alternates_, in
which case you may be able to save any work you had in the broken one
(assuming you only lost objects that were available somewhere else).
> Looking at it from the nervous repository admin's point of view I think
> he would want to make sure that the repository is good to begin with.
Doing an fsck is certainly always a good idea. I do a "shallow" fsck
usually several times a day ("shallow" means that it doesn't fsck packs,
only new objects that I have acquired since the last repacking), and I do a
full fsck a couple of times a week.
I don't actually know why I do that, though. I don't think I've really
_ever_ had a broken repo since some very early days, except for the cases
where I break things on purpose (like remove an object to check whether
"git repack" does the right thing or not). I'm just used to it, and the
shallow fsck takes a fraction of a second, so I tend to do it after each
pull.
So I really think that an admin has to be more than "nervous" to worry
about it. He has to be really anal.
(Now, doing a repack and a fsck every week or so might be good, and
automatic shallow fsck's daily is probably a great idea too. After all, it
_is_ checking checksums, so if you worry about security and want to make
sure that nobody is trying to break in and do bad things to your repo, a
regular fsck is a good thing even if you're not otherwise worried about
corruption).
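A minimal sketch of the two kinds of check, assuming a current git where
the command is spelled "git fsck" and "--no-full" restricts it to loose
objects in the repository's own object directory (skipping packs and
alternates). The wrapper name fsck_repo is hypothetical.

```shell
#!/bin/sh
# fsck_repo MODE GIT_DIR   (hypothetical wrapper, for illustration only)
#   shallow: check only loose objects, not packs or alternates
#   full:    check everything, including packs
fsck_repo() {
    case "$1" in
        shallow) git --git-dir="$2" fsck --no-full ;;
        full)    git --git-dir="$2" fsck --full ;;
        *)
            echo "usage: fsck_repo shallow|full GIT_DIR" >&2
            return 2
            ;;
    esac
}
```

A daily cron job could call the "shallow" mode and a weekly one the
"full" mode; the schedule itself is just the one suggested above.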
> *NOTE* There is one question that I feel remains unanswered. Is it
> possible to split up the repack -a and repack -d so that the nervous
> repository owner can insert a git verify-pack in the middle.
They are already split up inside "git-repack", so we could add a hook
there, I guess. See the git-repack.sh file, and notice how it does the
"remove_redundant" part only after it has created the new pack-file and
done a "sync".
> I don't mean to say that I don't trust git repack to do the right thing.
> Fundamentally, I just think that I shouldn't depend on it to do the
> right thing in order to avoid corruption in my repository.
That's good. However, as the previous failure of git repack showed, to
some degree the more likely failure mode is actually that the pack
generated by "git repack" is perfectly fine, but it's not _complete_. Say
we have a bug in git repack, for example.
Another case where it's not complete is when you have deleted a branch.
"git repack -a -d" will effectively do a "git prune" wrt objects that are
no longer reachable, and that were in the old packs.
So I'd actually suggest a slightly different approach. Whenever you
remove old objects (whether it's "git prune" or "git prune-packed" or "git
repack -a -d"), you might want to have an option that doesn't actually
_remove_ them, but just moves them into ".git/attic" or something like
that.
Then you can clean up the attic after doing your weekly full fsck or
something. And it has the advantage that if somebody has deleted a branch,
and notices later that maybe he wanted that branch back, you can "unprune"
all the objects, run "git-fsck-objects --full" to find any dangling
commits, and you'll have all your branches back.
So in many ways it would perhaps be nicer to have that kind of "safe
remove" option to the pruning commands?
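To make the idea concrete, here is a rough sketch of what such a "safe
remove" could look like. Nothing like this exists in git: the function
names safe_remove and unprune and the ".git/attic" layout are assumptions
for illustration, and the sketch assumes every file passed in lives under
the given git directory.

```shell
#!/bin/sh
# safe_remove GIT_DIR FILE...
# Instead of deleting FILEs, move them into GIT_DIR/attic, keeping each
# file's path relative to GIT_DIR so it can be put back later.
safe_remove() {
    git_dir=$1
    shift
    for f in "$@"; do
        rel=${f#"$git_dir"/}
        mkdir -p "$git_dir/attic/$(dirname "$rel")" || return 1
        mv "$f" "$git_dir/attic/$rel" || return 1
    done
}

# unprune GIT_DIR
# Move everything in the attic back to where it came from.
unprune() {
    git_dir=$1
    [ -d "$git_dir/attic" ] || return 0
    (cd "$git_dir/attic" && find . -type f) |
    while IFS= read -r rel; do
        rel=${rel#./}
        mkdir -p "$git_dir/$(dirname "$rel")" &&
        mv "$git_dir/attic/$rel" "$git_dir/$rel"
    done
}
```

After an "unprune", a full fsck would report the dangling commits, and
the attic could simply be deleted once a weekly full fsck comes back
clean.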
Linus