git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 1/3] improve reliability of fixup_pack_header_footer()
Date: Fri, 29 Aug 2008 09:08:48 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.1.10.0808290844200.1624@xanadu.home> (raw)
In-Reply-To: <20080829044459.GA28492@spearce.org>

On Thu, 28 Aug 2008, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@cam.org> wrote:
> > Currently, this function has the potential to read corrupted pack data
> > from disk and give it a valid SHA1 checksum.  Let's add the ability to
> > validate SHA1 checksum of existing data along the way, including before
> > and after any arbitrary point in the pack.
> 
> I found your implementation in fixup_pack_header to be very
> contorted.  Did you read the JGit patch I posted?  I think its
> implementation is more elegant and easier to follow.  Oh, and its
> BSD licensed... so easy for a GPLv2 project to borrow.  ;-)

Well, actually, I don't find the JGit implementation easier to follow at 
all.  Of course I wrote the C version while you wrote the Java version.  
Maybe without our respective bias then things are just fine in both 
cases.

> >  void fixup_pack_header_footer(int pack_fd,
> ...
> > +	if (partial_pack_sha1 && !partial_pack_offset) {
> > +		partial_pack_offset = lseek(pack_fd, 0, SEEK_CUR);
> > +		if (partial_pack_offset == (off_t)-1)
> 
> Eh?
> 
> I find this to be nonsense.  If the caller gave us a SHA-1 but
> wants us to do fixup then they have increased the size of the pack.
> Which means they must pass us the original length.  Computing it
> here is relying on the caller leaving the file pointer positioned
> at the old end.  Who the heck does that after fixing a thin pack?

This is not for thin packs but for split packs in repack-objects. And 
yeah, the caller has the offset information already in that case too, so 
this could be removed.  It just felt more generic that way originally.

> >  	buf = xmalloc(buf_sz);
> >  	for (;;) {
> > -		ssize_t n = xread(pack_fd, buf, buf_sz);
> > +		ssize_t m, n;
> > +		m = (partial_pack_sha1 && partial_pack_offset < buf_sz) ?
> > +			partial_pack_offset : buf_sz;
> > +		n = xread(pack_fd, buf, m);
> 
> Why not read the full buf_sz per round and then use only part of
> the buffer in SHA1_Update(&old_sha1_ctx)?  It took me a while to
> figure out what the heck you were trying to measure here.  It also
> would be a few less CPU instructions if you always just read buf_sz
> and leave the min test to after the "if (!partial_pack_sha1)" below.

I doubt it.  That would only move the offset test from before the read 
to after it, and actually complicate things for restarting the 
checksumming after the special offset was crossed.  And we're talking 
about a comparison with a conditional move instructions only.

And another thing I had in store (but for which you _again_ beat me to :-) )
is to realign data reads onto filesystem blocks.

> > +		if (!partial_pack_sha1)
> > +			continue;
> > +
> > +		SHA1_Update(&old_sha1_ctx, buf, n);
> > +		partial_pack_offset -= n;
> > +		if (partial_pack_offset == 0) {
> > +			unsigned char sha1[20];
> > +			SHA1_Final(sha1, &old_sha1_ctx);
> > +			if (hashcmp(sha1, partial_pack_sha1) != 0)
> > +				die("Unexpected checksum for %s "
> > +				    "(disk corruption?)", pack_name);
> > +			/*
> > +			 * Now let's compute the SHA1 of the remainder of the
> > +			 * pack, which also means making partial_pack_offset
> > +			 * big enough not to matter anymore.
> > +			 */
> > +			SHA1_Init(&old_sha1_ctx);
> > +			partial_pack_offset = ~partial_pack_offset;
> > +			partial_pack_offset -= MSB(partial_pack_offset, 1);
> 
> That's freaking brilliant.  And something I missed in JGit.
> 
> The way its implemented is downright difficult to follow though.
> I'll admit it took me a good 10 minutes to understand what you were
> doing here, and why.

Again maybe that's just because you're just too biased by your own 
implementation.  I don't consider this code particularly obscur.


Nicolas

  reply	other threads:[~2008-08-29 13:10 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-25 16:46 "failed to read delta base object at..." J. Bruce Fields
2008-08-25 18:58 ` Nicolas Pitre
2008-08-25 21:18   ` J. Bruce Fields
2008-08-25 19:01 ` Linus Torvalds
2008-08-25 21:31   ` J. Bruce Fields
2008-08-25 21:37     ` Linus Torvalds
2008-08-25 22:13       ` J. Bruce Fields
2008-08-25 23:59         ` Linus Torvalds
2008-08-26 20:43           ` Jason McMullan
2008-08-26 21:01             ` Jason McMullan
2008-08-27 17:05               ` Linus Torvalds
2008-08-27 19:17                 ` Nicolas Pitre
2008-08-27 19:48                   ` Linus Torvalds
2008-08-27 20:46                     ` Nicolas Pitre
2008-08-29  2:05                       ` [PATCH 0/3] don't let disk corruptions escape pack SHA1 checksum Nicolas Pitre
2008-08-29  2:07                         ` [PATCH 1/3] improve reliability of fixup_pack_header_footer() Nicolas Pitre
2008-08-29  2:07                           ` [PATCH 2/3] pack-objects: use fixup_pack_header_footer()'s validation mode Nicolas Pitre
2008-08-29  2:07                             ` [PATCH 3/3] index-pack: " Nicolas Pitre
2008-08-29  4:44                           ` [PATCH 1/3] improve reliability of fixup_pack_header_footer() Shawn O. Pearce
2008-08-29 13:08                             ` Nicolas Pitre [this message]
2008-08-29 14:30                               ` Shawn O. Pearce
2008-08-29 20:07                                 ` [PATCH 0/5] pack header rewriting improvements Nicolas Pitre
2008-08-29 20:07                                   ` [PATCH 1/5] pack-objects: improve returned information from write_one() Nicolas Pitre
2008-08-29 20:07                                     ` [PATCH 2/5] improve reliability of fixup_pack_header_footer() Nicolas Pitre
2008-08-29 20:08                                       ` [PATCH 3/5] pack-objects: use fixup_pack_header_footer()'s validation mode Nicolas Pitre
2008-08-29 20:08                                         ` [PATCH 4/5] index-pack: " Nicolas Pitre
2008-08-29 20:08                                           ` [PATCH 5/5] fixup_pack_header_footer(): use nicely aligned buffer sizes Nicolas Pitre
2008-08-31  7:10                                             ` Junio C Hamano
2008-08-29 20:14                                 ` [PATCH 1/3] improve reliability of fixup_pack_header_footer() Nicolas Pitre
2008-08-29  4:55                         ` [PATCH 0/3] don't let disk corruptions escape pack SHA1 checksum Shawn O. Pearce
2008-08-26 20:55           ` "failed to read delta base object at..." J. Bruce Fields
2008-08-27 20:14           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.10.0808290844200.1624@xanadu.home \
    --to=nico@cam.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).