git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: thomas <trast@student.ethz.ch>
Cc: "Stefan Zager" <szager@google.com>,
	git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Nicolas Pitre" <nico@fluxnic.net>
Subject: Re: [PATCH] sha1_file: remove recursion in packed_object_info
Date: Mon, 25 Mar 2013 11:17:38 -0700	[thread overview]
Message-ID: <7vtxnzxs1p.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <87620faky3.fsf@linux-k42r.v.cablecom.net> (thomas's message of "Mon, 25 Mar 2013 10:27:16 +0100")

thomas <trast@student.ethz.ch> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> The following comment is also lost but...
>>
>>> -	/* We choose to only get the type of the base object and
>>> -	 * ignore potentially corrupt pack file that expects the delta
>>> -	 * based on a base with a wrong size.  This saves tons of
>>> -	 * inflate() calls.
>>> -	 */
>>> -	if (sizep) {
>>> -		*sizep = get_size_from_delta(p, w_curs, curpos);
>>> -		if (*sizep == 0)
>>> -			type = OBJ_BAD;
>>
>> ... is this check correct?  There is an equivalent check at the
>> beginning of the new packed_object_info() to error out a deltified
>> result.  Why is an object whose size is 0 bad?
>
> Cc'ing Nicolas, but I think there are several reasons:
>
> If it's a delta, then according to docs[1] it starts with the SHA1 of
> the base object, plus the deflated data.  So it is at least 20 bytes.

get_size_from_delta() grabs the size, the number you would get in
the third parameter of read_sha1_file(), of the result of applying
the delta we are looking at.  The part that stores this information
is called the "compressed delta data" in the document you are
looking at.

The function you want to look at is patch_delta(), where it grabs
two such sizes from the delta stream with get_delta_hdr_size().

A delta stream begins with:

    * preimage length, expressed as a 7-bit-per-byte varint;
    * postimage length, expressed as a 7-bit-per-byte varint;

followed by number of records, each prefixed by a command byte.

    * Command byte with its 8th bit set records source offset and
      size (max 32 and 24 bits, respectively---other 7 bits in the
      command byte tells us how large the offset and size are) and
      tells us to insert a copy of that region at the current point.

    * Command byte between 1-127 (inclusive) tells us to add that
      many bytes that follow the command byte from the delta stream
      at the current point.

    * Command byte 0 is an error.

And get_size_from_delta() skips the preimage length, grabs postimage
length and returns the latter.  It is how we decide how many bytes
we need to allocate to hold the result of applying the delta.

> If it's not a delta, then it must start with '<type> <size>\0', which
> even after compression cannot possibly be 0 bytes.
>
> Either way, get_size_from_delta() also uses 0 as the error return.

Yes, that is why I said "is this check correct?".  As I already
said, I think the only two things that protects us from creating a
delta whose postimage size is 0 are the fact that we do not even
attempt to deltify anything smaller than 50 bytes in pack-objects,
and create_delta() refuses to create a delta to produce an empty
postimage.

  parent reply	other threads:[~2013-03-25 18:18 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-15 22:42 regression in multi-threaded git-pack-index Stefan Zager
2013-03-16 11:41 ` Jeff King
2013-03-16 12:38   ` Duy Nguyen
2013-03-19  8:17   ` Thomas Rast
2013-03-19  9:30     ` Jeff King
2013-03-19  9:59       ` Jeff King
2013-03-19 10:08         ` Jeff King
2013-03-19 10:24           ` Jeff King
2013-03-19 10:29             ` Thomas Rast
2013-03-19 10:33               ` Jeff King
2013-03-19 10:45                 ` Thomas Rast
2013-03-19 10:47                   ` Jeff King
2013-03-19 10:58             ` [PATCH] index-pack: always zero-initialize object_entry list Jeff King
2013-03-19 15:33               ` Thomas Rast
2013-03-19 15:43                 ` Jeff King
2013-03-19 15:52                   ` Jeff King
2013-03-19 16:17                     ` [PATCH v2] " Jeff King
2013-03-19 16:27                       ` Thomas Rast
2013-03-19 17:13                         ` Junio C Hamano
2013-03-20 19:12                       ` Eric Sunshine
2013-03-20 19:13                         ` Jeff King
2013-03-20 19:14                           ` Eric Sunshine
2013-03-19 12:35     ` regression in multi-threaded git-pack-index Duy Nguyen
2013-03-19 13:01       ` [PATCH] index-pack: protect deepest_delta in multithread code Nguyễn Thái Ngọc Duy
2013-03-19 13:25         ` Jeff King
2013-03-19 13:50         ` Thomas Rast
2013-03-19 14:07           ` Duy Nguyen
2013-03-19 14:16             ` [PATCH v2] index-pack: guard nr_resolved_deltas reads by lock Thomas Rast
2013-03-19 15:53               ` Junio C Hamano
2013-03-19 15:41 ` regression in multi-threaded git-pack-index Thomas Rast
2013-03-19 15:45   ` Thomas Rast
2013-03-19 16:11     ` Thomas Rast
2013-03-19 17:58       ` Junio C Hamano
2013-03-19 22:08         ` [PATCH] sha1_file: remove recursion in packed_object_info Thomas Rast
2013-03-20 16:47           ` Junio C Hamano
2013-03-25  9:27             ` thomas
2013-03-25 18:07               ` [PATCH v2 0/3] Recursion-free unpack_entry and packed_object_info Thomas Rast
2013-03-25 18:07                 ` [PATCH v2 1/3] sha1_file: remove recursion in packed_object_info Thomas Rast
2013-03-25 18:07                 ` [PATCH v2 2/3] Refactor parts of in_delta_base_cache/cache_or_unpack_entry Thomas Rast
2013-03-25 23:15                   ` Junio C Hamano
2013-03-26 11:09                     ` thomas
2013-03-25 18:07                 ` [PATCH v2 3/3] sha1_file: remove recursion in unpack_entry Thomas Rast
2013-03-25 23:19                   ` Junio C Hamano
2013-03-26  3:37                 ` [PATCH v2 0/3] Recursion-free unpack_entry and packed_object_info Nicolas Pitre
2013-03-25 18:17               ` Junio C Hamano [this message]
2013-03-27 20:03               ` [PATCH v3 " Thomas Rast
2013-03-27 20:03                 ` [PATCH v3 1/3] sha1_file: remove recursion in packed_object_info Thomas Rast
2013-03-27 20:03                 ` [PATCH v3 2/3] Refactor parts of in_delta_base_cache/cache_or_unpack_entry Thomas Rast
2013-03-27 20:03                 ` [PATCH v3 3/3] sha1_file: remove recursion in unpack_entry Thomas Rast
2013-03-27 20:29                   ` Junio C Hamano
2013-03-20  1:17       ` regression in multi-threaded git-pack-index Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vtxnzxs1p.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=nico@fluxnic.net \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=szager@google.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).