git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Bouke Versteegh <info@boukeversteegh.nl>
Cc: git@vger.kernel.org
Subject: Re: Docs: possible incorrect diagram of Delta Copy instruction
Date: Fri, 11 Sep 2020 10:33:21 -0400	[thread overview]
Message-ID: <20200911143321.GA2374950@coredump.intra.peff.net> (raw)
In-Reply-To: <CAFag0172JvkTgzXs2ieq2kSfnNRy_8n5YsP87uQ-Pyf5vB+yfw@mail.gmail.com>

On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote:

> While working on an implementation of git in dart, I've noticed a
> possible error in the documentation. I hope I'm using the correct
> channel to report this issue.

This is definitely the right place.

> On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
> a diagram is shown, that explains the format of a Copy instruction
> inside a Deltified pack entry:
> 
> +----------+---------+---------+---------+---------+-------+-------+-------+
> | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
> +----------+---------+---------+---------+---------+-------+-------+-------+
> 
> The documentation specifies that diagrams follow the RFC950
> (https://www.ietf.org/rfc/rfc1950.txt) format.

I think you mean rfc1951 here, but yeah, it is clear in that document
that bytes are in MSB order. And also that each of those boxes is a
single byte.

> That means that the left bit is MSB, and the right bit is LSB, so the
> OpCode is MSB (1xxxxxxx), which is correct and matches other sources.

Right, agreed.

> It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
> 4 (i.e. 0x40, 0x20, 0x10, 0x08)

I don't think this follows, though. "offset1" is a separate byte. We
have a sequence of bytes here, some of which may be present. Their
presence is determined by the bits "xxxxxxx", but the diagram does not
say anything about the order.

The paragraphs below do say:

  This is the instruction format to copy a byte range from the source
  object. It encodes the offset to copy from and the number of bytes to
  copy. Offset and size are in little-endian order.

So "offset1" is the lowest byte of the offset value. We still don't know
which bit in the instruction byte tells us whether it's there yet,
though. The next paragraph says:

  All offset and size bytes are optional. This is to reduce the
  instruction size when encoding small offsets or sizes. The first seven
  bits in the first octet determines which of the next seven octets is
  present. If bit zero is set, offset1 is present. If bit one is set
  offset2 is present and so on.

So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check
that first. Which I think is what happens in:

>   PARSE_CP_PARAM(0x01, cp_off, 0);

That's reading the low bit of the command byte, and then reading the
_next_ byte from data, and shifting it not at all (because it's the low
byte of the offset). That macro looks like this:

  #define PARSE_CP_PARAM(bit, var, shift) do { \
                          if (cmd & (bit)) { \
                                  if (data >= top) \
                                          goto bad_length; \
                                  var |= ((unsigned) *data++ << (shift)); \
                          } } while (0)

> However, looking at the git source-code and other documentation (see
> [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
> the SIZE  bits are stored right after MSB (opcode).

I think both the code and diagram are right. It's just that the order of
the follow-on bytes is not part of that diagram, but rather the
explanation below it.

> From this source code, I would conclude that the diagram should look
> like this instead:
> 
> +----------+---------+---------+---------+---------+-------+-------+-------+
> | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
> +----------+---------+---------+---------+---------+-------+-------+-------+

That would definitely be wrong. If there's an offset1 byte, it
immediately follows the command byte.

-Peff

  reply	other threads:[~2020-09-11 15:03 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-11 12:30 Docs: possible incorrect diagram of Delta Copy instruction Bouke Versteegh
2020-09-11 14:33 ` Jeff King [this message]
2020-09-12 11:21   ` Bouke Versteegh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200911143321.GA2374950@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=info@boukeversteegh.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).