git.vger.kernel.org archive mirror
From: Carl Baldwin <cnb@fc.hp.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: auto-packing on kernel.org? please?
Date: Tue, 22 Nov 2005 10:25:58 -0700
Message-ID: <20051122172558.GA1935@hpsvcnb.fc.hp.com>
In-Reply-To: <Pine.LNX.4.64.0511211110480.13959@g5.osdl.org>

On Mon, Nov 21, 2005 at 11:24:11AM -0800, Linus Torvalds wrote:
> NOTE! Since that email, "git repack" has gotten a "local" option (-l), 
> which is very useful if the repositories have pointers to alternates.
> 
> So do
> 
> 	git repack -l
> 
> instead, to get much better packs (and "-a -d" for the full case, of 
> course).

I'm assuming that this option will have no effect on a repository with
no alternates file.
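Whether that assumption even comes into play for a given repository can be checked by looking for an alternates file. The sketch below assumes the usual location, "objects/info/alternates" under the git directory. "has_alternates" is a hypothetical helper, not a git command. Since "-l" only tells pack-objects to skip objects borrowed from alternate stores, a repository without that file should get the same pack either way.

```shell
#!/bin/sh
# has_alternates (hypothetical helper): succeed if the repository in the
# current directory lists alternate object stores, fail otherwise.
# "git repack -l" only skips objects borrowed from such stores, so without
# this file it should pack the same objects as a plain "git repack".
has_alternates() {
    git_dir=$(git rev-parse --git-dir) || return 2
    # -s: the file exists and is non-empty
    [ -s "$git_dir/objects/info/alternates" ]
}
```

For example, "has_alternates && echo borrowing" prints only for repositories that really share objects with another store.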

> Other than that, the old email suggestion should still be fine.

[snip]

> You can certainly do that if you are nervous. It might even be a good 
> idea: just for fun, I just did
> 
> 	git clone -l git git-clone
> 	cd git-clone
> 
> 	# pick an object at random
> 	rm .git/objects/f7/c3d39fe3db6da3a307da385a7a1cb563ed15f7
> 
> 	git repack -a -d
> 
> and it said:
> 
> 	error: Could not read f7c3d39fe3db6da3a307da385a7a1cb563ed15f7
> 	fatal: bad tree object f7c3d39fe3db6da3a307da385a7a1cb563ed15f7
> 
> but then it created the pack _anyway_, and said:
> 
> 	Packing 27 objects
> 	Pack pack-13bfca704078175c1c1c59964553b14f7b952651 created.
> 
> and happily removed all the old ones.
> 
> So right now, repacking a broken archive can actually break it even more.

Interesting.

> NOTE! Your "git verify-pack" wouldn't even catch this: the _pack_ is fine, 
> it's just incomplete.

In my opinion, git repack did the right thing in creating the pack, even
though the result is more broken.  Starting with a broken repository was
the real problem, and git repack shouldn't need to worry too much about it.

Looking at it from the nervous repository admin's point of view, I think
he would want to make sure that the repository is good to begin with.  I
think that check should be left up to the repository owner rather than
to git repack.  Still, the extra error checking you propose below is
probably a good idea.

> Of course, this only happens if the repository was broken to begin with, 
> so arguably it's not that bad. But it does show that git-repack should be 
> more careful and return an error more aggressively.
> 
> Can anybody tell me how to do that sanely? Right now we do
> 
> 	..
> 	name=$(git-rev-list --objects $rev_list $(git-rev-parse $rev_parse) |
> 	        git-pack-objects --non-empty $pack_objects .tmp-pack) ||
> 	        exit 1
> 	..
> 
> and the thing is, the "git-pack-objects" thing is happy, it's the 
> "git-rev-list" that fails. So because the last command in the pipeline 
> returns ok, we think it all is ok..
> 
> (This is one of the reasons I much prefer working in C over working in 
> shell: it may be twenty times more lines, but when you have a problem, the 
> fix is always obvious..)
> 
> Anyway, with that fixed, a "git repack" in many ways would be a mini-fsck, 
> so it should be very safe in general. Modulo any other bugs like the 
> above.
> 
> 		Linus
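One portable way to keep a failing producer from hiding behind a successful last command is to not use a pipe for that step: stage the producer's output in a temporary file and check each exit status on its own. This is only a sketch, not code taken from git: "capture_or_fail" is a hypothetical helper, and it assumes a "mktemp" utility (common, but not guaranteed by older POSIX). Shells such as bash also offer "set -o pipefail", but a portable #!/bin/sh script could not rely on that.

```shell
#!/bin/sh
# capture_or_fail CMD [ARGS...] (hypothetical helper)
# Runs the command with its stdout redirected into a fresh temporary file.
# On success it prints the file's path; on failure it deletes the file and
# returns the command's own exit status, so the caller can bail out before
# the output is ever fed to the next stage.
capture_or_fail() {
    tmp=$(mktemp) || return 1
    "$@" >"$tmp"
    status=$?
    if [ "$status" -ne 0 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    printf '%s\n' "$tmp"
}
```

Applied to the two stages quoted above, the repack step might then look something like this (the variables are the ones already used in git-repack):

	objs=$(capture_or_fail git-rev-list --objects $rev_list $(git-rev-parse $rev_parse)) || exit 1
	name=$(git-pack-objects --non-empty $pack_objects .tmp-pack <"$objs")
	status=$?
	rm -f "$objs"
	[ "$status" -eq 0 ] || exit 1

The cost is writing the object list to disk once; in exchange a failure in git-rev-list stops the repack instead of producing a silently incomplete pack.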

*NOTE*  There is one question that I feel remains unanswered: is it
possible to split up the repack -a and the repack -d so that the nervous
repository owner can insert a git verify-pack in the middle?

I'm not nearly this nervous about repositories that I keep for myself,
but I own some repositories that many people may depend on.  I will feel
better if I can verify the pack separately from git-repack before I do
the (potentially destructive) -d to remove old packs.

I don't mean to say that I don't trust git repack to do the right thing.
Fundamentally, I just think that I shouldn't depend on it to do the
right thing in order to avoid corruption in my repository.
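Without a built-in split, something close to it can be approximated with existing commands. The sketch below is one possible procedure, not git's own: "nervous_repack" is a hypothetical name, and it assumes a current git where "git fsck", "git verify-pack" and "git prune-packed" exist (older releases spelled the first one "git-fsck-objects"). It runs fsck first because, as Linus's experiment shows, verify-pack cannot detect a pack that is merely incomplete. It also never deletes old packs; those are left for the owner to remove by hand once satisfied.

```shell
#!/bin/sh
# nervous_repack (hypothetical): a cautious stand-in for "git repack -a -d".
# Each step must succeed before the next runs, and no pack is ever deleted.
nervous_repack() {
    git_dir=$(git rev-parse --git-dir) || return 1

    # 1. Refuse to start from a broken repository: repack can write an
    #    incomplete pack, and verify-pack alone would not notice.
    git fsck --full || { echo "fsck failed; not repacking" >&2; return 1; }

    # 2. Write a new pack of everything, but WITHOUT -d, so every existing
    #    pack and loose object is left where it is.
    git repack -a || return 1

    # 3. Check the internal consistency of every pack now present.
    for idx in "$git_dir"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue   # the glob matched nothing
        git verify-pack "$idx" || { echo "bad pack: $idx" >&2; return 1; }
    done

    # 4. Only now delete loose objects that are also stored in a pack.
    #    Old packs stay until the owner removes them by hand.
    git prune-packed
}
```

The design choice here is to make the only destructive step (prune-packed) remove objects that are provably duplicated in a pack that has just been verified, and to keep pack removal a separate, manual decision.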

Carl

PS  I love that the git object store is designed so that object files
never *need* to be removed, renamed, modified or otherwise touched in
any way after being written to disk.  I think this makes git inherently
extremely safe from corruption, unlike many older repository designs.
The only thing that breaks this inherent safety is the desire to pack
repositories to avoid bloat.
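Object files never need to change because each one is named after its own content. As an illustration (a sketch assuming git's default SHA-1 object format and a "sha1sum" utility such as the one in GNU coreutils), a blob's name is the SHA-1 of a short header, "blob <size>" plus a NUL byte, followed by the file's bytes. The loose object is then stored, zlib-compressed, at objects/<first two hex digits>/<remaining 38>, the same shape as the path Linus removed in his experiment. "blob_name" is a hypothetical helper; "git hash-object <file>" is the real command that computes this value.

```shell
#!/bin/sh
# blob_name FILE (hypothetical helper): print the object name git would give
# FILE's contents as a blob, i.e. SHA-1 over "blob <size>\0" + contents.
# Should match "git hash-object FILE" in a SHA-1 repository.
blob_name() {
    [ -r "$1" ] || return 1
    # BSD wc pads the count with spaces, so strip them.
    size=$(wc -c <"$1" | tr -d ' ')
    # printf's "\0" emits the NUL byte that ends git's object header.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```

Changing even one byte of the file changes the name, so new content always lands in a new object file and existing ones are never rewritten.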

That is why I want to be a little paranoid when I do the repacking.  I
want to keep some of that inherent safety in the process I use to pack
them.  This kind of inherent safety is much more valuable than even the
highest quality code written to actually do the packing.

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        Systems VLSI Laboratory
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
