All of lore.kernel.org
 help / color / mirror / Atom feed
* git-am includes escape characters from 'From' field
@ 2016-09-12 20:10 Swift Geek
  2016-09-13 15:26 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Swift Geek @ 2016-09-12 20:10 UTC (permalink / raw)
  To: git

git-am seems to add backslash that escapes double quote character, example  
git format-patch

 From 63da989a5295214f9bd06cd7b409a86a65241eea Mon Sep 17 00:00:00 2001
From: "Sebastian \"Swift Geek\" Grzywna" <swiftgeek@gmail.com>
Date: Mon, 12 Sep 2016 21:27:32 +0200
Subject: [PATCH] Example showing git-am bug that includes escape characters

---
  testfile | 1 +
  1 file changed, 1 insertion(+)
  create mode 100644 testfile

diff --git a/testfile b/testfile
new file mode 100644
index 0000000..b48e119
--- /dev/null
+++ b/testfile
@@ -0,0 +1 @@
+test file contents
-- 
2.9.3

In .gitconfig I have
name = Sebastian \"Swift Geek\" Grzywna
which appears to work fine with regular git commit - backslashes are not  
included in 'From' field.

Regards,
Swift Geek

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: git-am includes escape characters from 'From' field
  2016-09-12 20:10 git-am includes escape characters from 'From' field Swift Geek
@ 2016-09-13 15:26 ` Jeff King
  2016-09-13 23:46   ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2016-09-13 15:26 UTC (permalink / raw)
  To: Swift Geek; +Cc: git

On Mon, Sep 12, 2016 at 10:10:23PM +0200, Swift Geek wrote:

> git-am seems to add backslash that escapes double quote character, example
> git format-patch
> 
> From 63da989a5295214f9bd06cd7b409a86a65241eea Mon Sep 17 00:00:00 2001
> From: "Sebastian \"Swift Geek\" Grzywna" <swiftgeek@gmail.com>

This looks correct; the output of format-patch is an rfc2822 message,
and it requires this quoting.

The part you don't show, and that I think is wrong, is that if you then
"git am" this patch, it pulls the backslashes into the commit object.
The culprit looks like "parse_mail()" in builtin/am.c (or possibly
mailinfo() that it calls), which blindly picks up the name portion
without doing any rfc2822 de-quoting.

I don't think we have any existing de-quoting routines to plug in, so
fixing it would probably start with writing one.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC 0/1] de-quote quoted-strings in mailinfo
  2016-09-13 15:26 ` Jeff King
@ 2016-09-13 23:46   ` Kevin Daudt
  2016-09-13 23:46     ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Daudt @ 2016-09-13 23:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kevin Daudt

This is my first 'big' C patch, so first an RFC. 

This patch implements RFC2822 dequoting of quoted-pairs in quoted
strings, which was not done yet. This means removing the "\" as escape
character from header fields, but only quoted strings, and comments
(text between braces).

According to the RFC, comments can also appear in square brackets in the
e-mail domain, but that has not been implemented. In fact, just like
other functions, it just looks at the whole header line.

Please let me know what you think.

Kevin Daudt (1):
  mailinfo: de-quote quoted-pair in header fields

 mailinfo.c                 | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh        |  5 +++++
 t/t5100/quoted-pair.expect |  5 +++++
 t/t5100/quoted-pair.in     |  9 +++++++++
 t/t5100/quoted-pair.info   |  5 +++++
 5 files changed, 70 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in
 create mode 100644 t/t5100/quoted-pair.info

-- 
2.10.0.rc2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-13 23:46   ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt
@ 2016-09-13 23:46     ` Kevin Daudt
  2016-09-14  0:04       ` Junio C Hamano
  2016-09-14  5:13       ` Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: Kevin Daudt @ 2016-09-13 23:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kevin Daudt

rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

git currently does not do anything with this at all, and verbatim takes
over the field body.

Make sure to properly dequote these quoted-strings and comments.

Signed-off-by: Kevin Daudt <me@ikke.info>
---
 mailinfo.c                 | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh        |  5 +++++
 t/t5100/quoted-pair.expect |  5 +++++
 t/t5100/quoted-pair.in     |  9 +++++++++
 t/t5100/quoted-pair.info   |  5 +++++
 5 files changed, 70 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in
 create mode 100644 t/t5100/quoted-pair.info

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..3b7ae8a 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -445,6 +445,51 @@ static void decode_header(struct mailinfo *mi, struct strbuf *it)
 		mi->input_error = -1;
 }
 
+static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line)
+{
+	struct strbuf outbuf = STRBUF_INIT;
+	const char *in = line->buf;
+	int c, skip=0;
+	char escape_context=0;
+
+	while ((c = *in++) != 0) {
+		if (!skip) {
+			switch (c) {
+				case '"':
+					if (!escape_context)
+						escape_context = '"';
+					else if (escape_context == '"')
+						escape_context = 0;
+					break;
+				case '\\':
+					if (escape_context) {
+						skip = 1;
+						continue;
+					}
+					break;
+				case '(':
+					if (!escape_context)
+						escape_context = '(';
+					break;
+				case ')':
+					if (escape_context == '(')
+						escape_context = 0;
+					break;
+			}
+		} else {
+			skip = 0;
+		}
+
+		strbuf_addch(&outbuf, c);
+	}
+
+	strbuf_reset(line);
+	strbuf_addbuf(line, &outbuf);
+
+	return 0;
+
+}
+
 static int check_header(struct mailinfo *mi,
 			const struct strbuf *line,
 			struct strbuf *hdr_data[], int overwrite)
@@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
 			 */
 			strbuf_add(&sb, line->buf + len + 2, line->len - len - 2);
 			decode_header(mi, &sb);
+			unescape_quoted_pair(mi, &sb);
 			handle_header(&hdr_data[i], &sb);
 			ret = 1;
 			goto check_header_out;
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..2be61bf 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -142,4 +142,9 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
 	test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo unescapes rfc2822 quoted-pair' '
+    git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >"$TEST_DIRECTORY"/t5100/quoted-pair.info &&
+    test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect "$TEST_DIRECTORY"/t5100/quoted-pair.info
+'
+
 test_done
diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
new file mode 100644
index 0000000..9fe72e9
--- /dev/null
+++ b/t/t5100/quoted-pair.expect
@@ -0,0 +1,5 @@
+Author: "Author "The Author" Name"
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in
new file mode 100644
index 0000000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-pair.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
diff --git a/t/t5100/quoted-pair.info b/t/t5100/quoted-pair.info
new file mode 100644
index 0000000..9fe72e9
--- /dev/null
+++ b/t/t5100/quoted-pair.info
@@ -0,0 +1,5 @@
+Author: "Author "The Author" Name"
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
-- 
2.10.0.rc2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-13 23:46     ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt
@ 2016-09-14  0:04       ` Junio C Hamano
  2016-09-14  4:58         ` Kevin Daudt
  2016-09-14  5:09         ` Jeff King
  2016-09-14  5:13       ` Jeff King
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2016-09-14  0:04 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git

Kevin Daudt <me@ikke.info> writes:

> +static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line)
> +{
> +	struct strbuf outbuf = STRBUF_INIT;
> +	const char *in = line->buf;
> +	int c, skip=0;
> +	char escape_context=0;

Have SP around '=', i.e.

	int c, skip = 0;
	char escape_context = 0;

> +	while ((c = *in++) != 0) {
> +		if (!skip) {
> +			switch (c) {
> +				case '"':
> +...
> +					break;
> +			}
> +		} else {
> +			skip = 0;
> +		}
> +
> +		strbuf_addch(&outbuf, c);
> +	}

It often is easier to read if smaller of the two are in the if part
and the larger in else part.  Also your switch/case is indented one
level too deep.  I.e.

        while (...) {
                if (skip) {
                        skip = 0;
                } else {
                        switch (c) {
                        case '"':
                                do this;
                                ...
                        }
                }
                strbuf_addch(...);
        }

I found the variable name "skip" a bit hard to reason about.  What
it does is to signal the next round of the processing that we have
seen a single-byte quote and it should keep the byte it will get, no
matter what its value is.  It is "skipping" the conditional
processing, but I'd imagine most people would consider it is
"keeping the byte".

> @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
>  			 */
>  			strbuf_add(&sb, line->buf + len + 2, line->len - len - 2);
>  			decode_header(mi, &sb);
> +			unescape_quoted_pair(mi, &sb);
>  			handle_header(&hdr_data[i], &sb);
>  			ret = 1;
>  			goto check_header_out;

I wonder why this call is only in here, not on other headers that
all call decode_header().  For that matter, I wonder if the call (or
the logic of the helper function itself) should go at the end of
decode_header().  After all, this is different kind of decoding; the
current one knows how to do b/q encoding but forgot about the more
traditional quoting done with backslash, and you are teaching the
code that the current decoding it does is insufficient and how to
handle the one that the original implementors forgot about.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14  0:04       ` Junio C Hamano
@ 2016-09-14  4:58         ` Kevin Daudt
  2016-09-14  5:09         ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Kevin Daudt @ 2016-09-14  4:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote:
> Kevin Daudt <me@ikke.info> writes:
> 
> 
> It often is easier to read if smaller of the two are in the if part
> and the larger in else part.  Also your switch/case is indented one
> level too deep.  I.e.
> 

Thanks, I've switched the order and fixed indentation.

> 
> I found the variable name "skip" a bit hard to reason about.  What
> it does is to signal the next round of the processing that we have
> seen a single-byte quote and it should keep the byte it will get, no
> matter what its value is.  It is "skipping" the conditional
> processing, but I'd imagine most people would consider it is
> "keeping the byte".

Yes, I agree and was trying to find a better name. I have renamed it to
"take_next_literally", which indicates better what it means.

> 
> > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
> >  			 */
> >  			strbuf_add(&sb, line->buf + len + 2, line->len - len - 2);
> >  			decode_header(mi, &sb);
> > +			unescape_quoted_pair(mi, &sb);
> >  			handle_header(&hdr_data[i], &sb);
> >  			ret = 1;
> >  			goto check_header_out;
> 
> I wonder why this call is only in here, not on other headers that
> all call decode_header().  For that matter, I wonder if the call (or
> the logic of the helper function itself) should go at the end of
> decode_header().  After all, this is different kind of decoding; the
> current one knows how to do b/q encoding but forgot about the more
> traditional quoting done with backslash, and you are teaching the
> code that the current decoding it does is insufficient and how to
> handle the one that the original implementors forgot about.

Makes sense, it should be applied to all headers (I missed the other
decode_header calls).

I will send a new version later.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14  0:04       ` Junio C Hamano
  2016-09-14  4:58         ` Kevin Daudt
@ 2016-09-14  5:09         ` Jeff King
  2016-09-14  5:54           ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2016-09-14  5:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, git

On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote:

> > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
> >  			 */
> >  			strbuf_add(&sb, line->buf + len + 2, line->len - len - 2);
> >  			decode_header(mi, &sb);
> > +			unescape_quoted_pair(mi, &sb);
> >  			handle_header(&hdr_data[i], &sb);
> >  			ret = 1;
> >  			goto check_header_out;
> 
> I wonder why this call is only in here, not on other headers that
> all call decode_header().  For that matter, I wonder if the call (or
> the logic of the helper function itself) should go at the end of
> decode_header().  After all, this is different kind of decoding; the
> current one knows how to do b/q encoding but forgot about the more
> traditional quoting done with backslash, and you are teaching the
> code that the current decoding it does is insufficient and how to
> handle the one that the original implementors forgot about.

It has been a while since I looked at rfc2822, but aren't the quoting
and syntax rules different for addresses versus other headers? We would
not want to dequote a Subject header, I think.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-13 23:46     ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt
  2016-09-14  0:04       ` Junio C Hamano
@ 2016-09-14  5:13       ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2016-09-14  5:13 UTC (permalink / raw)
  To: kevin; +Cc: git, Junio C Hamano, Kevin Daudt

On Wed, Sep 14, 2016 at 01:46:12AM +0200, Kevin Daudt wrote:

> diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
> new file mode 100644
> index 0000000..9fe72e9
> --- /dev/null
> +++ b/t/t5100/quoted-pair.expect
> @@ -0,0 +1,5 @@
> +Author: "Author "The Author" Name"
> +Email: somebody@example.com
> +Subject: testing quoted-pair
> +Date: Sun, 25 May 2008 00:38:18 -0700

So obviously this is much better than including the backslashed quotes.
But I have to wonder why the first line is not:

  Author: Author "The Author" Name

Who is responsible for stripping out the other quotes? I know that they
_do_ get stripped out even in the current code, but it is not clear to
me if that is intentional or an accident.

In Git's world-view (e.g., in commit headers), an ident name continues
until we get to the "<" of the email (or a "\n" terminates the header
line completely). So if mailinfo is converting rfc2822 headers into Git
ident, I'd expect it to fully remove any quotes that are not intended to
be in the name, and everything after "Author: " up to the newline would
become the name.

It's entirely possible I'm missing something subtle about the design of
mailinfo, though.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14  5:09         ` Jeff King
@ 2016-09-14  5:54           ` Junio C Hamano
  2016-09-14 16:03             ` Kevin Daudt
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-09-14  5:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Kevin Daudt, git

Jeff King <peff@peff.net> writes:

> It has been a while since I looked at rfc2822, but aren't the quoting
> and syntax rules different for addresses versus other headers? We would
> not want to dequote a Subject header, I think.

You're absolutely right.  RFC2822 does not quite _want_ to dequote
anything.  As you pointed out in a separate message, we are the one
who want to strip out "" quoting when mailinfo says

	Author: "Jeff King"

to its standard output (aka "info"), and turn it into

	GIT_AUTHOR_NAME='Jeff King'

and do so ONLY for the author name.

So I would think it is the responsibility of the one that reads the
"info" file that is produced by mailinfo to dequote the backslash
thing if the mailinfo gave us

	Author: "Jeff \"Peff\" King"


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14  5:54           ` Junio C Hamano
@ 2016-09-14 16:03             ` Kevin Daudt
  2016-09-14 17:43               ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Daudt @ 2016-09-14 16:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Tue, Sep 13, 2016 at 10:54:47PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > It has been a while since I looked at rfc2822, but aren't the quoting
> > and syntax rules different for addresses versus other headers? We would
> > not want to dequote a Subject header, I think.
> 
> You're absolutely right.  RFC2822 does not quite _want_ to dequote
> anything.  As you pointed out in a separate message, we are the one
> who want to strip out "" quoting when mailinfo says
> 
> 	Author: "Jeff King"
> 
> to its standard output (aka "info"), and turn it into
> 
> 	GIT_AUTHOR_NAME='Jeff King'
> 
> and do so ONLY for the author name.
> 
> So I would think it is the responsibility of the one that reads the
> "info" file that is produced by mailinfo to dequote the backslash
> thing if the mailinfo gave us
> 
> 	Author: "Jeff \"Peff\" King"
> 

The RFC makes a distinction between structured fields and unstructured
fields. Quoting would not even be necessary for unstructured fields
(like Subject), so yes, that those fields should be left alone.

Unstructures fields are subject, comments, keywords and optional fields,
the rest is considered structured.

Because the only field where this is a problem is the From field, I
think it would be safe to limit the unquoting just to that field.

My reasoning to do the unquoting here is because it's the RFC requires
the quoting in the first place.

I already noticed a bug in the current unquoting of the author when
adding a comment to the From: field.

    From: "A U Thor" <au@thor.com> (test)

When applied the the author of this patch shows up as:

    Author: A U Thor" (test) <au@thor.com>

So I agree with Jeff[1] where he states that the surrounding quotes
should be removed, if that's not a problem for git.

[1]:https://public-inbox.org/git/20160914051305.vphknpsikyxi3hg3@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14 16:03             ` Kevin Daudt
@ 2016-09-14 17:43               ` Junio C Hamano
  2016-09-14 19:17                 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-09-14 17:43 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: Jeff King, git

Kevin Daudt <me@ikke.info> writes:

> When applied the the author of this patch shows up as:
>
>     Author: A U Thor" (test) <au@thor.com>
>
> So I agree with Jeff[1] where he states that the surrounding quotes
> should be removed, if that's not a problem for git.
>
> [1]:https://public-inbox.org/git/20160914051305.vphknpsikyxi3hg3@sigill.intra.peff.net/

I think we can go either way and it does not matter all that much if
"mailinfo" changes its output or the reader of "mailinfo" output
changes its input--we will either be munging data read from "From:"
when producing the "Author:" line, or taking the "Author:" output by
mailinfo and removing the quotes.

As an output from mailinfo that looks like this:

	Author: "A U Thor"
        Email: au@thor.com

is made into a commit object that has this:

	author A U Thor <au@thor.com>

we know that the reader of mailinfo output _already_ has some logic
to strip the surrounding double quotes.  That is the only reason why
I think it is a better approach to not dequote in the "mailinfo" but
in the reader to turn

	Author: "A \"U\" Thor"
        Email: au@thor.com

into a commit object that has this:

	author A "U" Thor <au@thor.com>

than updating mailinfo to produce

	Author: A "U" Thor
        Email: au@thor.com

and then create the same result.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14 17:43               ` Junio C Hamano
@ 2016-09-14 19:17                 ` Jeff King
  2016-09-14 19:30                   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2016-09-14 19:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, git

On Wed, Sep 14, 2016 at 10:43:18AM -0700, Junio C Hamano wrote:

> I think we can go either way and it does not matter all that much if
> "mailinfo" changes its output or the reader of "mailinfo" output
> changes its input--we will either be munging data read from "From:"
> when producing the "Author:" line, or taking the "Author:" output by
> mailinfo and removing the quotes.

Yeah, that was the part I was wondering about in my original response.
What is the output of mailinfo _supposed_ to be, and do we consider that
at all public (i.e., are there are other tools besides "git am" that
build on mailinfo)?

At least "am" already does some quote-stripping, so any de-quoting added
in mailinfo is potentially a regression (if we indeed care about keeping
the output stable).

But if we are OK with that, it seems to me that mailinfo is the best
place to do the de-quoting, because then its output is well-defined:
everything after "Author:" up to the newline is the name. Whereas if the
cleanup of the value is split across mailinfo and its reader, then it is
hard to know which side is responsible for which part. mailinfo handles
whitespace unfolding, certainly. What other rfc2822 things does it
handle? What are the rules for dequoting its output?

I'll admit I don't care _too_ much. This is a remote corner of the code
that I hope never to have to look at. I'm mostly just describing how the
problem space makes sense to _me_, and how I would write it if starting
from scratch.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14 19:17                 ` Jeff King
@ 2016-09-14 19:30                   ` Junio C Hamano
  2016-09-14 19:38                     ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-09-14 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Kevin Daudt, git

Jeff King <peff@peff.net> writes:

> On Wed, Sep 14, 2016 at 10:43:18AM -0700, Junio C Hamano wrote:
>
>> I think we can go either way and it does not matter all that much if
>> "mailinfo" changes its output or the reader of "mailinfo" output
>> changes its input--we will either be munging data read from "From:"
>> when producing the "Author:" line, or taking the "Author:" output by
>> mailinfo and removing the quotes.
>
> Yeah, that was the part I was wondering about in my original response.
> What is the output of mailinfo _supposed_ to be, and do we consider that
> at all public (i.e., are there are other tools besides "git am" that
> build on mailinfo)?
>
> At least "am" already does some quote-stripping, so any de-quoting added
> in mailinfo is potentially a regression (if we indeed care about keeping
> the output stable).

Another small thing I am not sure about is if the \ quoting can hide
an embedded newline in the author name.  Would we end up turning

	From: "Jeff \
            King" <peff@peff.net>

or somesuch into

	Author: Jeff
        King
        Email: peff@peff.net

;-)

> But if we are OK with that, it seems to me that mailinfo is the best
> place to do the de-quoting, because then its output is well-defined:
> everything after "Author:" up to the newline is the name.

There are other things mailinfo does, like turning this

	From: peff@peff.net (Jeff King)

into

	Author: Jeff King
        Email: peff@peff.net

and

	From: Uh "foo" Bar peff@peff.net (Jeff King)

into

	Author: Uh "foo" Bar (Jeff King)
        Email: peff@peff.net

So let's roll the \" -> " into mailinfo.

I am not sure if we also should remove the surrounding "", i.e. we
currently do not turn this

	From: "Jeff King" <peff@peff.net>

into this:

	Author: Jeff King
        Email: peff@peff.net

I think we probably should, and remove the one that does so from the
reader.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14 19:30                   ` Junio C Hamano
@ 2016-09-14 19:38                     ` Jeff King
  2016-09-15  5:15                       ` Kevin Daudt
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2016-09-14 19:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, git

On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote:

> Another small thing I am not sure about is if the \ quoting can hide
> an embedded newline in the author name.  Would we end up turning
> 
> 	From: "Jeff \
>             King" <peff@peff.net>
> 
> or somesuch into
> 
> 	Author: Jeff
>         King
>         Email: peff@peff.net
> 
> ;-)

Heh, yeah. That is another reason to clean up and sanitize as much as
possible before stuffing it into another text format that will be
parsed.

> So let's roll the \" -> " into mailinfo.
> 
> I am not sure if we also should remove the surrounding "", i.e. we
> currently do not turn this
> 
> 	From: "Jeff King" <peff@peff.net>
> 
> into this:
> 
> 	Author: Jeff King
>         Email: peff@peff.net
> 
> I think we probably should, and remove the one that does so from the
> reader.

I think you have to, or else you cannot tell the difference between
surrounding quotes that need to be stripped, and ones that were
backslash-escaped. Like:

  From: "Jeff King" <peff@peff.net>
  From: \"Jeff King\" <peff@peff.net>

which would both become:

  Author: "Jeff King"
  Email: peff@peff.net

though I am not sure the latter one is actually valid; you might need to
be inside syntactic quotes in order to include backslashed quotes. I
haven't read rfc2822 carefully recently enough to know.

Anyway, I think that:

  From: One "Two \"Three\" Four" Five

may also be valid. So the quote-stripping in the reader is not just "at
the outside", but may need to handle interior syntactic quotes, too. So
it really makes sense for me to clean and sanitize as much as possible
in one step, and then make the parser of mailinfo as dumb as possible.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-14 19:38                     ` Jeff King
@ 2016-09-15  5:15                       ` Kevin Daudt
  2016-09-15  7:18                         ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Daudt @ 2016-09-15  5:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Wed, Sep 14, 2016 at 12:38:20PM -0700, Jeff King wrote:
> On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote:
> 
> > Another small thing I am not sure about is if the \ quoting can hide
> > an embedded newline in the author name.  Would we end up turning
> > 
> > 	From: "Jeff \
> >             King" <peff@peff.net>
> > 
> > or somesuch into
> > 
> > 	Author: Jeff
> >         King
> >         Email: peff@peff.net
> > 
> > ;-)
> 
> Heh, yeah. That is another reason to clean up and sanitize as much as
> possible before stuffing it into another text format that will be
> parsed.

A quoted string cannot contain newlines according to the RFC, so I think
we don't need to care about that.

> 
> > So let's roll the \" -> " into mailinfo.
> > 
> > I am not sure if we also should remove the surrounding "", i.e. we
> > currently do not turn this
> > 
> > 	From: "Jeff King" <peff@peff.net>
> > 
> > into this:
> > 
> > 	Author: Jeff King
> >         Email: peff@peff.net
> > 
> > I think we probably should, and remove the one that does so from the
> > reader.
> 
> I think you have to, or else you cannot tell the difference between
> surrounding quotes that need to be stripped, and ones that were
> backslash-escaped. Like:
> 
>   From: "Jeff King" <peff@peff.net>
>   From: \"Jeff King\" <peff@peff.net>
> 
> which would both become:
> 
>   Author: "Jeff King"
>   Email: peff@peff.net
> 
> though I am not sure the latter one is actually valid; you might need to
> be inside syntactic quotes in order to include backslashed quotes. I
> haven't read rfc2822 carefully recently enough to know.
> 
> Anyway, I think that:
> 
>   From: One "Two \"Three\" Four" Five
> 
> may also be valid. So the quote-stripping in the reader is not just "at
> the outside", but may need to handle interior syntactic quotes, too. So
> it really makes sense for me to clean and sanitize as much as possible
> in one step, and then make the parser of mailinfo as dumb as possible.
> 

Makes sense, the current itteration of my patch already strips exterior
quotes, no matter where they happen.

I will send a patch soon.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields
  2016-09-15  5:15                       ` Kevin Daudt
@ 2016-09-15  7:18                         ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2016-09-15  7:18 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: Junio C Hamano, git

On Thu, Sep 15, 2016 at 07:15:33AM +0200, Kevin Daudt wrote:

> > > Another small thing I am not sure about is if the \ quoting can hide
> > > an embedded newline in the author name.  Would we end up turning
> > > 
> > > 	From: "Jeff \
> > >             King" <peff@peff.net>
> > > 
> > > or somesuch into
> > > 
> > > 	Author: Jeff
> > >         King
> > >         Email: peff@peff.net
> > > 
> > > ;-)
> > 
> > Heh, yeah. That is another reason to clean up and sanitize as much as
> > possible before stuffing it into another text format that will be
> > parsed.
> 
> A quoted string cannot contain newlines according to the RFC, so I think
> we don't need to care about that.

I wondered how we handled something like:

  From: =?UTF-8?q?J=0Aff=20King?= <peff@peff.net>

which sticks a newline into the middle of the buffer. We do decode it
that way, but eventually call cleanup_space() which converts a run of 1
or more isspace() characters into a single space (0x20). So you end up
with:

  Author: J ff King

which is probably reasonable.

> Makes sense, the current itteration of my patch already strips exterior
> quotes, no matter where they happen.
> 
> I will send a patch soon.

Great. Thanks for working on this.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-09-15  7:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-12 20:10 git-am includes escape characters from 'From' field Swift Geek
2016-09-13 15:26 ` Jeff King
2016-09-13 23:46   ` [RFC 0/1] de-quote quoted-strings in mailinfo Kevin Daudt
2016-09-13 23:46     ` [RFC 0/1] mailinfo: de-quote quoted-pair in header fields Kevin Daudt
2016-09-14  0:04       ` Junio C Hamano
2016-09-14  4:58         ` Kevin Daudt
2016-09-14  5:09         ` Jeff King
2016-09-14  5:54           ` Junio C Hamano
2016-09-14 16:03             ` Kevin Daudt
2016-09-14 17:43               ` Junio C Hamano
2016-09-14 19:17                 ` Jeff King
2016-09-14 19:30                   ` Junio C Hamano
2016-09-14 19:38                     ` Jeff King
2016-09-15  5:15                       ` Kevin Daudt
2016-09-15  7:18                         ` Jeff King
2016-09-14  5:13       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.