* git-am includes escape characters from 'From' field @ 2016-09-12 20:10 Swift Geek 2016-09-13 15:26 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Swift Geek @ 2016-09-12 20:10 UTC (permalink / raw) To: git git-am seems to add backslash that escapes double quote character, example git format-patch From 63da989a5295214f9bd06cd7b409a86a65241eea Mon Sep 17 00:00:00 2001 From: "Sebastian \"Swift Geek\" Grzywna" <swiftgeek@gmail.com> Date: Mon, 12 Sep 2016 21:27:32 +0200 Subject: [PATCH] Example showing git-am bug that includes escape characters --- testfile | 1 + 1 file changed, 1 insertion(+) create mode 100644 testfile diff --git a/testfile b/testfile new file mode 100644 index 0000000..b48e119 --- /dev/null +++ b/testfile @@ -0,0 +1 @@ +test file contents -- 2.9.3 In .gitconfig I have name = Sebastian \"Swift Geek\" Grzywna which appears to work fine with regular git commit - backslashes are not included in 'From' field. Regards, Swift Geek ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: git-am includes escape characters from 'From' field 2016-09-12 20:10 git-am includes escape characters from 'From' field Swift Geek @ 2016-09-13 15:26 ` Jeff King 2016-09-13 23:46 ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2016-09-13 15:26 UTC (permalink / raw) To: Swift Geek; +Cc: git On Mon, Sep 12, 2016 at 10:10:23PM +0200, Swift Geek wrote: > git-am seems to add backslash that escapes double quote character, example > git format-patch > > From 63da989a5295214f9bd06cd7b409a86a65241eea Mon Sep 17 00:00:00 2001 > From: "Sebastian \"Swift Geek\" Grzywna" <swiftgeek@gmail.com> This looks correct; the output of format-patch is an rfc2822 message, and it requires this quoting. The part you don't show, and that I think is wrong, is that if you then "git am" this patch, it pulls the backslashes into the commit object. The culprit looks like "parse_mail()" in builtin/am.c (or possibly mailinfo() that it calls), which blindly picks up the name portion without doing any rfc2822 de-quoting. I don't think we have any existing de-quoting routines to plug in, so fixing it would probably start with writing one. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 0/1] de-quote quoted-strings in mailinfo 2016-09-13 15:26 ` Jeff King @ 2016-09-13 23:46 ` Kevin Daudt 2016-09-13 23:46 ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt 0 siblings, 1 reply; 16+ messages in thread From: Kevin Daudt @ 2016-09-13 23:46 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Kevin Daudt This is my first 'big' C patch, so first an RFC. This patch implements RFC2822 dequoting of quoted-pairs in quoted strings, which was not done yet. This means removing the "\" as escape character from header fields, but only quoted strings, and comments (text between braces). According to the RFC, comments can also appear in square brackets in the e-mail domain, but that has not been implemented. In fact, just like other functions, it just looks at the whole header line. Please let me know what you think. Kevin Daudt (1): mailinfo: de-quote quoted-pair in header fields mailinfo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ t/t5100-mailinfo.sh | 5 +++++ t/t5100/quoted-pair.expect | 5 +++++ t/t5100/quoted-pair.in | 9 +++++++++ t/t5100/quoted-pair.info | 5 +++++ 5 files changed, 70 insertions(+) create mode 100644 t/t5100/quoted-pair.expect create mode 100644 t/t5100/quoted-pair.in create mode 100644 t/t5100/quoted-pair.info -- 2.10.0.rc2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-13 23:46 ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt @ 2016-09-13 23:46 ` Kevin Daudt 2016-09-14 0:04 ` Junio C Hamano 2016-09-14 5:13 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: Kevin Daudt @ 2016-09-13 23:46 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Kevin Daudt rfc2822 has provisions for quoted strings in structured header fields, but also allows for escaping these with so-called quoted-pairs. git currently does not do anything with this at all, and verbatim takes over the field body. Make sure to properly dequote these quoted-strings and comments. Signed-off-by: Kevin Daudt <me@ikke.info> --- mailinfo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ t/t5100-mailinfo.sh | 5 +++++ t/t5100/quoted-pair.expect | 5 +++++ t/t5100/quoted-pair.in | 9 +++++++++ t/t5100/quoted-pair.info | 5 +++++ 5 files changed, 70 insertions(+) create mode 100644 t/t5100/quoted-pair.expect create mode 100644 t/t5100/quoted-pair.in create mode 100644 t/t5100/quoted-pair.info diff --git a/mailinfo.c b/mailinfo.c index e19abe3..3b7ae8a 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -445,6 +445,51 @@ static void decode_header(struct mailinfo *mi, struct strbuf *it) mi->input_error = -1; } +static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line) +{ + struct strbuf outbuf = STRBUF_INIT; + const char *in = line->buf; + int c, skip=0; + char escape_context=0; + + while ((c = *in++) != 0) { + if (!skip) { + switch (c) { + case '"': + if (!escape_context) + escape_context = '"'; + else if (escape_context == '"') + escape_context = 0; + break; + case '\\': + if (escape_context) { + skip = 1; + continue; + } + break; + case '(': + if (!escape_context) + escape_context = '('; + break; + case ')': + if (escape_context == '(') + escape_context = 0; + break; + } + } else { + skip = 0; + } + + strbuf_addch(&outbuf, c); + } + + strbuf_reset(line); + strbuf_addbuf(line, &outbuf); + + return 0; + +} + static int check_header(struct mailinfo *mi, const struct strbuf *line, struct strbuf *hdr_data[], int overwrite) @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi, */ strbuf_add(&sb, line->buf + len + 2, line->len - len - 2); decode_header(mi, &sb); + unescape_quoted_pair(mi, &sb); handle_header(&hdr_data[i], &sb); ret = 1; goto check_header_out; diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh index 1a5a546..2be61bf 100755 --- a/t/t5100-mailinfo.sh +++ b/t/t5100-mailinfo.sh @@ -142,4 +142,9 @@ test_expect_success 'mailinfo unescapes with --mboxrd' ' test_cmp expect mboxrd/msg ' +test_expect_success 'mailinfo unescapes rfc2822 quoted-pair' ' + git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >"$TEST_DIRECTORY"/t5100/quoted-pair.info && + test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect "$TEST_DIRECTORY"/t5100/quoted-pair.info +' + test_done diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect new file mode 100644 index 0000000..9fe72e9 --- /dev/null +++ b/t/t5100/quoted-pair.expect @@ -0,0 +1,5 @@ +Author: "Author "The Author" Name" +Email: somebody@example.com +Subject: testing quoted-pair +Date: Sun, 25 May 2008 00:38:18 -0700 + diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in new file mode 100644 index 0000000..e2e627a --- /dev/null +++ b/t/t5100/quoted-pair.in @@ -0,0 +1,9 @@ +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 +From: "Author \"The Author\" Name" <somebody@example.com> +Date: Sun, 25 May 2008 00:38:18 -0700 +Subject: [PATCH] testing quoted-pair + + + +--- +patch diff --git a/t/t5100/quoted-pair.info b/t/t5100/quoted-pair.info new file mode 100644 index 0000000..9fe72e9 --- /dev/null +++ b/t/t5100/quoted-pair.info @@ -0,0 +1,5 @@ +Author: "Author "The Author" Name" +Email: somebody@example.com +Subject: testing quoted-pair +Date: Sun, 25 May 2008 00:38:18 -0700 + -- 2.10.0.rc2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-13 23:46 ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt @ 2016-09-14 0:04 ` Junio C Hamano 2016-09-14 4:58 ` Kevin Daudt 2016-09-14 5:09 ` Jeff King 2016-09-14 5:13 ` Jeff King 1 sibling, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2016-09-14 0:04 UTC (permalink / raw) To: Kevin Daudt; +Cc: git Kevin Daudt <me@ikke.info> writes: > +static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line) > +{ > + struct strbuf outbuf = STRBUF_INIT; > + const char *in = line->buf; > + int c, skip=0; > + char escape_context=0; Have SP around '=', i.e. int c, skip = 0; char escape_context = 0; > + while ((c = *in++) != 0) { > + if (!skip) { > + switch (c) { > + case '"': > +... > + break; > + } > + } else { > + skip = 0; > + } > + > + strbuf_addch(&outbuf, c); > + } It often is easier to read if smaller of the two are in the if part and the larger in else part. Also your switch/case is indented one level too deep. I.e. while (...) { if (skip) { skip = 0; } else { switch (c) { case '"': do this; ... } } strbuf_addch(...); } I found the variable name "skip" a bit hard to reason about. What it does is to signal the next round of the processing that we have seen a single-byte quote and it should keep the byte it will get, no matter what its value is. It is "skipping" the conditional processing, but I'd imagine most people would consider it is "keeping the byte". > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi, > */ > strbuf_add(&sb, line->buf + len + 2, line->len - len - 2); > decode_header(mi, &sb); > + unescape_quoted_pair(mi, &sb); > handle_header(&hdr_data[i], &sb); > ret = 1; > goto check_header_out; I wonder why this call is only in here, not on other headers that all call decode_header(). For that matter, I wonder if the call (or the logic of the helper function itself) should go at the end of decode_header(). After all, this is different kind of decoding; the current one knows how to do b/q encoding but forgot about the more traditional quoting done with backslash, and you are teaching the code that the current decoding it does is insufficient and how to handle the one that the original implementors forgot about. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 0:04 ` Junio C Hamano @ 2016-09-14 4:58 ` Kevin Daudt 2016-09-14 5:09 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Kevin Daudt @ 2016-09-14 4:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote: > Kevin Daudt <me@ikke.info> writes: > > > It often is easier to read if smaller of the two are in the if part > and the larger in else part. Also your switch/case is indented one > level too deep. I.e. > Thanks, I've switched the order and fixed indentation. > > I found the variable name "skip" a bit hard to reason about. What > it does is to signal the next round of the processing that we have > seen a single-byte quote and it should keep the byte it will get, no > matter what its value is. It is "skipping" the conditional > processing, but I'd imagine most people would consider it is > "keeping the byte". Yes, I agree and was trying to find a better name. I have renamed it to "take_next_literally", which indicates better what it means. > > > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi, > > */ > > strbuf_add(&sb, line->buf + len + 2, line->len - len - 2); > > decode_header(mi, &sb); > > + unescape_quoted_pair(mi, &sb); > > handle_header(&hdr_data[i], &sb); > > ret = 1; > > goto check_header_out; > > I wonder why this call is only in here, not on other headers that > all call decode_header(). For that matter, I wonder if the call (or > the logic of the helper function itself) should go at the end of > decode_header(). After all, this is different kind of decoding; the > current one knows how to do b/q encoding but forgot about the more > traditional quoting done with backslash, and you are teaching the > code that the current decoding it does is insufficient and how to > handle the one that the original implementors forgot about. Makes sense, it should be applied to all headers (I missed the other decode_header calls). I will send a new version later. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 0:04 ` Junio C Hamano 2016-09-14 4:58 ` Kevin Daudt @ 2016-09-14 5:09 ` Jeff King 2016-09-14 5:54 ` Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2016-09-14 5:09 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kevin Daudt, git On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote: > > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi, > > */ > > strbuf_add(&sb, line->buf + len + 2, line->len - len - 2); > > decode_header(mi, &sb); > > + unescape_quoted_pair(mi, &sb); > > handle_header(&hdr_data[i], &sb); > > ret = 1; > > goto check_header_out; > > I wonder why this call is only in here, not on other headers that > all call decode_header(). For that matter, I wonder if the call (or > the logic of the helper function itself) should go at the end of > decode_header(). After all, this is different kind of decoding; the > current one knows how to do b/q encoding but forgot about the more > traditional quoting done with backslash, and you are teaching the > code that the current decoding it does is insufficient and how to > handle the one that the original implementors forgot about. It has been a while since I looked at rfc2822, but aren't the quoting and syntax rules different for addresses versus other headers? We would not want to dequote a Subject header, I think. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 5:09 ` Jeff King @ 2016-09-14 5:54 ` Junio C Hamano 2016-09-14 16:03 ` Kevin Daudt 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2016-09-14 5:54 UTC (permalink / raw) To: Jeff King; +Cc: Kevin Daudt, git Jeff King <peff@peff.net> writes: > It has been a while since I looked at rfc2822, but aren't the quoting > and syntax rules different for addresses versus other headers? We would > not want to dequote a Subject header, I think. You're absolutely right. RFC2822 does not quite _want_ to dequote anything. As you pointed out in a separate message, we are the one who want to strip out "" quoting when mailinfo says Author: "Jeff King" to its standard output (aka "info"), and turn it into GIT_AUTHOR_NAME='Jeff King' and do so ONLY for the author name. So I would think it is the responsibility of the one that reads the "info" file that is produced by mailinfo to dequote the backslash thing if the mailinfo gave us Author: "Jeff \"Peff\" King" ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 5:54 ` Junio C Hamano @ 2016-09-14 16:03 ` Kevin Daudt 2016-09-14 17:43 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Kevin Daudt @ 2016-09-14 16:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git On Tue, Sep 13, 2016 at 10:54:47PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > It has been a while since I looked at rfc2822, but aren't the quoting > > and syntax rules different for addresses versus other headers? We would > > not want to dequote a Subject header, I think. > > You're absolutely right. RFC2822 does not quite _want_ to dequote > anything. As you pointed out in a separate message, we are the one > who want to strip out "" quoting when mailinfo says > > Author: "Jeff King" > > to its standard output (aka "info"), and turn it into > > GIT_AUTHOR_NAME='Jeff King' > > and do so ONLY for the author name. > > So I would think it is the responsibility of the one that reads the > "info" file that is produced by mailinfo to dequote the backslash > thing if the mailinfo gave us > > Author: "Jeff \"Peff\" King" > The RFC makes a distinction between structured fields and unstructured fields. Quoting would not even be necessary for unstructured fields (like Subject), so yes, that those fields should be left alone. Unstructures fields are subject, comments, keywords and optional fields, the rest is considered structured. Because the only field where this is a problem is the From field, I think it would be safe to limit the unquoting just to that field. My reasoning to do the unquoting here is because it's the RFC requires the quoting in the first place. I already noticed a bug in the current unquoting of the author when adding a comment to the From: field. From: "A U Thor" <au@thor.com> (test) When applied the the author of this patch shows up as: Author: A U Thor" (test) <au@thor.com> So I agree with Jeff[1] where he states that the surrounding quotes should be removed, if that's not a problem for git. [1]:https://public-inbox.org/git/20160914051305.vphknpsikyxi3hg3@sigill.intra.peff.net/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 16:03 ` Kevin Daudt @ 2016-09-14 17:43 ` Junio C Hamano 2016-09-14 19:17 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2016-09-14 17:43 UTC (permalink / raw) To: Kevin Daudt; +Cc: Jeff King, git Kevin Daudt <me@ikke.info> writes: > When applied the the author of this patch shows up as: > > Author: A U Thor" (test) <au@thor.com> > > So I agree with Jeff[1] where he states that the surrounding quotes > should be removed, if that's not a problem for git. > > [1]:https://public-inbox.org/git/20160914051305.vphknpsikyxi3hg3@sigill.intra.peff.net/ I think we can go either way and it does not matter all that much if "mailinfo" changes its output or the reader of "mailinfo" output changes its input--we will either be munging data read from "From:" when producing the "Author:" line, or taking the "Author:" output by mailinfo and removing the quotes. As an output from mailinfo that looks like this: Author: "A U Thor" Email: au@thor.com is made into a commit object that has this: author A U Thor <au@thor.com> we know that the reader of mailinfo output _already_ has some logic to strip the surrounding double quotes. That is the only reason why I think it is a better approach to not dequote in the "mailinfo" but in the reader to turn Author: "A \"U\" Thor" Email: au@thor.com into a commit object that has this: author A "U" Thor <au@thor.com> than updating mailinfo to produce Author: A "U" Thor Email: au@thor.com and then create the same result. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 17:43 ` Junio C Hamano @ 2016-09-14 19:17 ` Jeff King 2016-09-14 19:30 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2016-09-14 19:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kevin Daudt, git On Wed, Sep 14, 2016 at 10:43:18AM -0700, Junio C Hamano wrote: > I think we can go either way and it does not matter all that much if > "mailinfo" changes its output or the reader of "mailinfo" output > changes its input--we will either be munging data read from "From:" > when producing the "Author:" line, or taking the "Author:" output by > mailinfo and removing the quotes. Yeah, that was the part I was wondering about in my original response. What is the output of mailinfo _supposed_ to be, and do we consider that at all public (i.e., are there are other tools besides "git am" that build on mailinfo)? At least "am" already does some quote-stripping, so any de-quoting added in mailinfo is potentially a regression (if we indeed care about keeping the output stable). But if we are OK with that, it seems to me that mailinfo is the best place to do the de-quoting, because then its output is well-defined: everything after "Author:" up to the newline is the name. Whereas if the cleanup of the value is split across mailinfo and its reader, then it is hard to know which side is responsible for which part. mailinfo handles whitespace unfolding, certainly. What other rfc2822 things does it handle? What are the rules for dequoting its output? I'll admit I don't care _too_ much. This is a remote corner of the code that I hope never to have to look at. I'm mostly just describing how the problem space makes sense to _me_, and how I would write it if starting from scratch. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 19:17 ` Jeff King @ 2016-09-14 19:30 ` Junio C Hamano 2016-09-14 19:38 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2016-09-14 19:30 UTC (permalink / raw) To: Jeff King; +Cc: Kevin Daudt, git Jeff King <peff@peff.net> writes: > On Wed, Sep 14, 2016 at 10:43:18AM -0700, Junio C Hamano wrote: > >> I think we can go either way and it does not matter all that much if >> "mailinfo" changes its output or the reader of "mailinfo" output >> changes its input--we will either be munging data read from "From:" >> when producing the "Author:" line, or taking the "Author:" output by >> mailinfo and removing the quotes. > > Yeah, that was the part I was wondering about in my original response. > What is the output of mailinfo _supposed_ to be, and do we consider that > at all public (i.e., are there are other tools besides "git am" that > build on mailinfo)? > > At least "am" already does some quote-stripping, so any de-quoting added > in mailinfo is potentially a regression (if we indeed care about keeping > the output stable). Another small thing I am not sure about is if the \ quoting can hide an embedded newline in the author name. Would we end up turning From: "Jeff \ King" <peff@peff.net> or somesuch into Author: Jeff King Email: peff@peff.net ;-) > But if we are OK with that, it seems to me that mailinfo is the best > place to do the de-quoting, because then its output is well-defined: > everything after "Author:" up to the newline is the name. There are other things mailinfo does, like turning this From: peff@peff.net (Jeff King) into Author: Jeff King Email: peff@peff.net and From: Uh "foo" Bar peff@peff.net (Jeff King) into Author: Uh "foo" Bar (Jeff King) Email: peff@peff.net So let's roll the \" -> " into mailinfo. I am not sure if we also should remove the surrounding "", i.e. we currently do not turn this From: "Jeff King" <peff@peff.net> into this: Author: Jeff King Email: peff@peff.net I think we probably should, and remove the one that does so from the reader. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 19:30 ` Junio C Hamano @ 2016-09-14 19:38 ` Jeff King 2016-09-15 5:15 ` Kevin Daudt 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2016-09-14 19:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kevin Daudt, git On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote: > Another small thing I am not sure about is if the \ quoting can hide > an embedded newline in the author name. Would we end up turning > > From: "Jeff \ > King" <peff@peff.net> > > or somesuch into > > Author: Jeff > King > Email: peff@peff.net > > ;-) Heh, yeah. That is another reason to clean up and sanitize as much as possible before stuffing it into another text format that will be parsed. > So let's roll the \" -> " into mailinfo. > > I am not sure if we also should remove the surrounding "", i.e. we > currently do not turn this > > From: "Jeff King" <peff@peff.net> > > into this: > > Author: Jeff King > Email: peff@peff.net > > I think we probably should, and remove the one that does so from the > reader. I think you have to, or else you cannot tell the difference between surrounding quotes that need to be stripped, and ones that were backslash-escaped. Like: From: "Jeff King" <peff@peff.net> From: \"Jeff King\" <peff@peff.net> which would both become: Author: "Jeff King" Email: peff@peff.net though I am not sure the latter one is actually valid; you might need to be inside syntactic quotes in order to include backslashed quotes. I haven't read rfc2822 carefully recently enough to know. Anyway, I think that: From: One "Two \"Three\" Four" Five may also be valid. So the quote-stripping in the reader is not just "at the outside", but may need to handle interior syntactic quotes, too. So it really makes sense for me to clean and sanitize as much as possible in one step, and then make the parser of mailinfo as dumb as possible. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-14 19:38 ` Jeff King @ 2016-09-15 5:15 ` Kevin Daudt 2016-09-15 7:18 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Kevin Daudt @ 2016-09-15 5:15 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Wed, Sep 14, 2016 at 12:38:20PM -0700, Jeff King wrote: > On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote: > > > Another small thing I am not sure about is if the \ quoting can hide > > an embedded newline in the author name. Would we end up turning > > > > From: "Jeff \ > > King" <peff@peff.net> > > > > or somesuch into > > > > Author: Jeff > > King > > Email: peff@peff.net > > > > ;-) > > Heh, yeah. That is another reason to clean up and sanitize as much as > possible before stuffing it into another text format that will be > parsed. A quoted string cannot contain newlines according to the RFC, so I think we don't need to care about that. > > > So let's roll the \" -> " into mailinfo. > > > > I am not sure if we also should remove the surrounding "", i.e. we > > currently do not turn this > > > > From: "Jeff King" <peff@peff.net> > > > > into this: > > > > Author: Jeff King > > Email: peff@peff.net > > > > I think we probably should, and remove the one that does so from the > > reader. > > I think you have to, or else you cannot tell the difference between > surrounding quotes that need to be stripped, and ones that were > backslash-escaped. Like: > > From: "Jeff King" <peff@peff.net> > From: \"Jeff King\" <peff@peff.net> > > which would both become: > > Author: "Jeff King" > Email: peff@peff.net > > though I am not sure the latter one is actually valid; you might need to > be inside syntactic quotes in order to include backslashed quotes. I > haven't read rfc2822 carefully recently enough to know. > > Anyway, I think that: > > From: One "Two \"Three\" Four" Five > > may also be valid. So the quote-stripping in the reader is not just "at > the outside", but may need to handle interior syntactic quotes, too. So > it really makes sense for me to clean and sanitize as much as possible > in one step, and then make the parser of mailinfo as dumb as possible. > Makes sense, the current itteration of my patch already strips exterior quotes, no matter where they happen. I will send a patch soon. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-15 5:15 ` Kevin Daudt @ 2016-09-15 7:18 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2016-09-15 7:18 UTC (permalink / raw) To: Kevin Daudt; +Cc: Junio C Hamano, git On Thu, Sep 15, 2016 at 07:15:33AM +0200, Kevin Daudt wrote: > > > Another small thing I am not sure about is if the \ quoting can hide > > > an embedded newline in the author name. Would we end up turning > > > > > > From: "Jeff \ > > > King" <peff@peff.net> > > > > > > or somesuch into > > > > > > Author: Jeff > > > King > > > Email: peff@peff.net > > > > > > ;-) > > > > Heh, yeah. That is another reason to clean up and sanitize as much as > > possible before stuffing it into another text format that will be > > parsed. > > A quoted string cannot contain newlines according to the RFC, so I think > we don't need to care about that. I wondered how we handled something like: From: =?UTF-8?q?J=0Aff=20King?= <peff@peff.net> which sticks a newline into the middle of the buffer. We do decode it that way, but eventually call cleanup_space() which converts a run of 1 or more isspace() characters into a single space (0x20). So you end up with: Author: J ff King which is probably reasonable. > Makes sense, the current itteration of my patch already strips exterior > quotes, no matter where they happen. > > I will send a patch soon. Great. Thanks for working on this. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields 2016-09-13 23:46 ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt 2016-09-14 0:04 ` Junio C Hamano @ 2016-09-14 5:13 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Jeff King @ 2016-09-14 5:13 UTC (permalink / raw) To: kevin; +Cc: git, Junio C Hamano, Kevin Daudt On Wed, Sep 14, 2016 at 01:46:12AM +0200, Kevin Daudt wrote: > diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect > new file mode 100644 > index 0000000..9fe72e9 > --- /dev/null > +++ b/t/t5100/quoted-pair.expect > @@ -0,0 +1,5 @@ > +Author: "Author "The Author" Name" > +Email: somebody@example.com > +Subject: testing quoted-pair > +Date: Sun, 25 May 2008 00:38:18 -0700 So obviously this is much better than including the backslashed quotes. But I have to wonder why the first line is not: Author: Author "The Author" Name Who is responsible for stripping out the other quotes? I know that they _do_ get stripped out even in the current code, but it is not clear to me if that is intentional or an accident. In Git's world-view (e.g., in commit headers), an ident name continues until we get to the "<" of the email (or a "\n" terminates the header line completely). So if mailinfo is converting rfc2822 headers into Git ident, I'd expect it to fully remove any quotes that are not intended to be in the name, and everything after "Author: " up to the newline would become the name. It's entirely possible I'm missing something subtle about the design of mailinfo, though. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-09-15 7:18 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-09-12 20:10 git-am includes escape characters from 'From' field Swift Geek 2016-09-13 15:26 ` Jeff King 2016-09-13 23:46 ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt 2016-09-13 23:46 ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt 2016-09-14 0:04 ` Junio C Hamano 2016-09-14 4:58 ` Kevin Daudt 2016-09-14 5:09 ` Jeff King 2016-09-14 5:54 ` Junio C Hamano 2016-09-14 16:03 ` Kevin Daudt 2016-09-14 17:43 ` Junio C Hamano 2016-09-14 19:17 ` Jeff King 2016-09-14 19:30 ` Junio C Hamano 2016-09-14 19:38 ` Jeff King 2016-09-15 5:15 ` Kevin Daudt 2016-09-15 7:18 ` Jeff King 2016-09-14 5:13 ` Jeff King
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.