All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Catalin Marinas <catalin.marinas@gmail.com>
Cc: "Karl Hasselström" <kha@treskal.com>,
	"Andy Green (林安廸)" <andy@warmcat.com>,
	git@vger.kernel.org
Subject: Re: [StGit PATCH] Parse commit object header correctly
Date: Thu, 09 Feb 2012 11:04:02 -0800	[thread overview]
Message-ID: <7vk43vx1zx.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <CAHkRjk6dr=5wxm+iSC2_CSB-q3k2WG_Um+X7dwsy-H8tL508EA@mail.gmail.com> (Catalin Marinas's message of "Thu, 9 Feb 2012 09:38:00 +0000")

Catalin Marinas <catalin.marinas@gmail.com> writes:

> On 8 February 2012 07:33, Junio C Hamano <gitster@pobox.com> wrote:
>> To allow parsing the header produced by versions of Git newer than the
>> code written to parse it, all commit parsers are expected to skip unknown
>> header lines, so that newer types of header lines can be added safely.
>> The only three things that are promised are:
>>
>>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>>  (2) unknown lines can be skipped, and
>>  (3) a header "field" begins with the field name, followed by a single SP
>>     followed by the value.
>
> Thanks for looking into this. Is this the same as an email header? If
> yes, we could just use the python's email.Header.decode_header()
> function (I haven't tried yet).

If you are thinking about feeding everything up to the first empty line to
whatever is designed to parse email header, please don't.  I do not think
they obey "skip unknown lines without barfing" rule [*1*], so we would be
back to square one if you did so.

The fix posted in this thread is necessary because the change to StGit
between v0.15 and v0.16 made to ignore lines starting with "encoding " was
a wrong way to work around the broken parser in v0.15 in the first place.
The parser assumed that (1) all whitespaces around the header lines can be
stripped, (2) the result after such stripping will always have at least
one whitespace so that line.split(None, 1) will never barf, and (3)
between the field name and its value there may be arbitrary number of
whitespace characters that can be ignored so that line.split(None, 1) is a
safe way to split it into a (key,value) pair.  None of which is a safe
thing to assume.  The rule for safe parsing is to ignore all lines it does
not understand without assuming anything, and I wrote the patch in this
thread to make sure it makes no such unwarranted assumption.

> BTW, does Git allow custom headers to be inserted by tools like StGit?

The header format is designed in such a way that it is safe for a parser
to silently ignore unknown cruft, but that also means tools that work on
an existing commit and produce a similar one, like "commit --amend", are
free to either ignore and drop them when creating a new commit out of the
original one, or replay it verbatim without adjusting them to the new
context they appear in.  In that sense, they are technically "allowed",
but depending on the nature of the information you are putting there, it
semantically may or may not produce the desired result [*2*].  I would say
it is strongly discouraged to invent new types of header lines without
first consulting the people who maintain tools you must interoperate with,
so that they will also be aware of them, and hopefully their tools can be
adjusted to help you use them.

Sorry for the breakage and making you to deal with this post release. We
observed that recent StGit did not barf after we added "encoding " field
to Git, and assumed that StGit correctly ignored lines that it did not
understand, without inspecting its code.

At least we should have Cc'ed you guys directly when the change was being
discussed on the list.


[Footnote]

*1* I also suspect that it will handle a line that begins with a single SP
differently if you use email parsing rules. In a commit object header, the
content of such a line is appended to the value of the previous field
after turning that leading single SP into a LF, and the resulting value
will be a string that consists of multiple lines. The header folding rule
used for e-mail in RFC2822 is a way to represent a (logically) single line
as physically multiple lines, so the result of unfolding will become a
single line.  This difference may not matter for the purpose of the
current StGit that understands nothing but tree/parent/author/committer,
but because we are discussing a forward-looking fix for its parser, I
wouldn't recommend "it does not matter because we do not currently care"
approach.

*2* For example, a line that begins with "gpgsig " is a field that records
the GPG signature of a commit itself (using "git commit -S"), and it
should not survive across "commit --amend".  A line that begins with
"mergetag " is a field that records the tag information that was merged
from a side branch, and amending such a merge does not change what was
merged, so it should survive across "commit --amend".

      parent reply	other threads:[~2012-02-09 19:04 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-07 13:02 STGIT: Deathpatch in linus tree "Andy Green (林安廸)"
2012-02-07 17:37 ` Junio C Hamano
2012-02-08  7:33   ` [StGit PATCH] Parse commit object header correctly Junio C Hamano
2012-02-08 10:00     ` Michael Haggerty
2012-02-08 10:43       ` Frans Klaver
2012-02-08 16:17         ` Michael Haggerty
2012-02-08 20:04           ` Frans Klaver
2012-02-09  3:58             ` Junio C Hamano
2012-02-15 12:24       ` Catalin Marinas
2012-02-15 18:13         ` Junio C Hamano
2012-02-15 18:40         ` "Andy Green (林安廸)"
2012-02-09  9:38     ` Catalin Marinas
2012-02-09 17:51       ` Jonathan Nieder
2012-02-10  4:27         ` Nguyen Thai Ngoc Duy
2012-02-09 19:04       ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vk43vx1zx.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=andy@warmcat.com \
    --cc=catalin.marinas@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=kha@treskal.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.