git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] generate a valid rfc2047 mail header for multi-line subject.
@ 2011-02-14  8:09 xzer
  2011-02-22 20:43 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: xzer @ 2011-02-14  8:09 UTC (permalink / raw)
  To: git; +Cc: xzer

There is still a problem that git-am will lost the line break.
It's not easy to retain it, but as the first step, we can generate
a valid rfc2047 header now.
---
 pretty.c |   29 ++++++++++++++++++++++++++++-
 1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/pretty.c b/pretty.c
index 8549934..f18a38d 100644
--- a/pretty.c
+++ b/pretty.c
@@ -249,6 +249,33 @@ needquote:
 	strbuf_addstr(sb, "?=");
 }
 
+static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len,
+                       const char *encoding)
+{
+	int first = 1;
+	char *mline = xmemdupz(line, len);
+	const char *cline = mline;
+	int offset = 0, linelen = 0;
+        for (;;) {
+                linelen = get_one_line(cline);
+
+                cline += linelen;
+
+                if (!linelen)
+                        break;
+		
+		if (!first)
+			strbuf_addf(sb, "\n ");
+
+		offset = *(cline -1) == '\n'; 
+
+		add_rfc2047(sb, cline-linelen, linelen-offset, encoding);
+		first = 0;
+
+        }
+	free(mline);
+}
+
 void pp_user_info(const char *what, enum cmit_fmt fmt, struct strbuf *sb,
 		  const char *line, enum date_mode dmode,
 		  const char *encoding)
@@ -1115,7 +1142,7 @@ void pp_title_line(enum cmit_fmt fmt,
 	strbuf_grow(sb, title.len + 1024);
 	if (subject) {
 		strbuf_addstr(sb, subject);
-		add_rfc2047(sb, title.buf, title.len, encoding);
+		add_rfc2047_multiline(sb, title.buf, title.len, encoding);
 	} else {
 		strbuf_addbuf(sb, &title);
 	}
-- 
1.7.4.52.g00e6e.dirty

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-14  8:09 [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
@ 2011-02-22 20:43 ` Junio C Hamano
  2011-02-23  8:08   ` Jeff King
  2011-02-23 15:34   ` xzer
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2011-02-22 20:43 UTC (permalink / raw)
  To: xzer; +Cc: git

xzer <xiaozhu@gmail.com> writes:

> Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.

We prefer to have "[PATCH] subsystem: description without final full-stop" here.

> There is still a problem that git-am will lost the line break.

What does "still" refer to?  It is unclear under what condition the
command lose "the line break" (nor which line break you are refering to; I
am guessing that you have a commit that begins with a multi-line paragraph
and you are talking about line breaks between the lines in the first
paragraph).

> It's not easy to retain it, but as the first step, we can generate
> a valid rfc2047 header now.

Please describe what is broken (iow, "Given this sample input, we
currently generate this output, which is not a valid rfc2047") and what
the new output looks like ("Update pp_title_line() to generate this output
instead.")

> ---

Missing sign-off with a real name.

> diff --git a/pretty.c b/pretty.c
> index 8549934..f18a38d 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -249,6 +249,33 @@ needquote:
>  	strbuf_addstr(sb, "?=");
>  }
>  
> +static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len,
> +                       const char *encoding)
> +{
> +	int first = 1;
> +	char *mline = xmemdupz(line, len);
> +	const char *cline = mline;
> +	int offset = 0, linelen = 0;
> +        for (;;) {

You seem to have indent that uses SPs instead of HT around here...

> +                linelen = get_one_line(cline);

I can see you are trying to be careful not to let get_one_line() overstep
past "len" the caller gave you by making a copy first, but is this
overhead really necessary?  After all we know in this static function that
the caller is feeding the contents from a strbuf, which always have a
terminating NUL (and that is why it is Ok that get_one_line() is not a
counted string interface).

> +
> +                cline += linelen;
> +
> +                if (!linelen)
> +                        break;
> +		
> +		if (!first)
> +			strbuf_addf(sb, "\n ");
> +
> +		offset = *(cline -1) == '\n'; 
> +
> +		add_rfc2047(sb, cline-linelen, linelen-offset, encoding);
> +		first = 0;
> +
> +        }
> +	free(mline);
> +}

So the general idea of this change (I am thinking aloud what should be in
the updated commit log message as the problem description) is that:

 - We currently give an entire multi-line paragraph string to the
   add_rfc2047() function to be formatted as the title of the commit;

 - The add_rfc2047() functionjust passes "\n" through, without making it a
   folding whitespace followed by a newline, to help callers that want to
   use this function to produce a header line that is rfc 2822 conformant;

 - The patch introduces a new function add_rfc2047_multiline() that splits
   its input and performs line folding for such a caller (namely, the
   pp_title_line() function);

 - Another caller of add_rfc2047(), pp_user_info, is not changed, and it
   won't fold the name of the user that appear on the From: line.

It is unclear if the last point is really the right thing to do, though.
It is not a new problem that an author name that has a "\n" in it would
break the output, but we probably would want to fix that case too here?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-22 20:43 ` Junio C Hamano
@ 2011-02-23  8:08   ` Jeff King
  2011-02-23  9:48     ` Jeff King
  2011-02-23 17:34     ` Junio C Hamano
  2011-02-23 15:34   ` xzer
  1 sibling, 2 replies; 15+ messages in thread
From: Jeff King @ 2011-02-23  8:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

On Tue, Feb 22, 2011 at 12:43:40PM -0800, Junio C Hamano wrote:

> So the general idea of this change (I am thinking aloud what should be in
> the updated commit log message as the problem description) is that:
> 
>  - We currently give an entire multi-line paragraph string to the
>    add_rfc2047() function to be formatted as the title of the commit;
> 
>  - The add_rfc2047() functionjust passes "\n" through, without making it a
>    folding whitespace followed by a newline, to help callers that want to
>    use this function to produce a header line that is rfc 2822 conformant;
> 
>  - The patch introduces a new function add_rfc2047_multiline() that splits
>    its input and performs line folding for such a caller (namely, the
>    pp_title_line() function);
> 
>  - Another caller of add_rfc2047(), pp_user_info, is not changed, and it
>    won't fold the name of the user that appear on the From: line.
> 
> It is unclear if the last point is really the right thing to do, though.
> It is not a new problem that an author name that has a "\n" in it would
> break the output, but we probably would want to fix that case too here?

Yeah, I think the best path forward is:

  1. Stop feeding "pre-folded" subject lines to the email formatter.
     Give it the regular subject line with no newlines.

  2. rfc2047 encoding should encode a literal newline. Which should
     generally never happen, but is probably the most sane thing to do
     if it does.

  3. rfc2047 should fold all lines at some sane length. As it is now, we
     may sometimes generate long lines in headers (though in practice, I
     doubt this is much of a problem).

I started to work on this, but got stuck on (3). Our existing wrap
functions want NUL-terminated strings, and we are operating on a
substring. I tried converting the wrap functions to handle lengths, but
it got way uglier than I had hoped. I think just strdup'ing the subject
temporarily is probably fine, though. Let me see what I can come up
with.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23  8:08   ` Jeff King
@ 2011-02-23  9:48     ` Jeff King
  2011-02-23  9:50       ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King
                         ` (3 more replies)
  2011-02-23 17:34     ` Junio C Hamano
  1 sibling, 4 replies; 15+ messages in thread
From: Jeff King @ 2011-02-23  9:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

On Wed, Feb 23, 2011 at 03:08:54AM -0500, Jeff King wrote:

> Yeah, I think the best path forward is:
> 
>   1. Stop feeding "pre-folded" subject lines to the email formatter.
>      Give it the regular subject line with no newlines.
> 
>   2. rfc2047 encoding should encode a literal newline. Which should
>      generally never happen, but is probably the most sane thing to do
>      if it does.
> 
>   3. rfc2047 should fold all lines at some sane length. As it is now, we
>      may sometimes generate long lines in headers (though in practice, I
>      doubt this is much of a problem).

So here is a series that does this. It still doesn't preserve subject
newlines in "format-patch | am", but I don't think that was ever a goal
of the code. If we want to add it as an optional feature on top (maybe
as part of "-k"?), it should be easy to do (since the rfc2047 encoding
will now preserve embedded newlines).

  [1/3]: strbuf: add fixed-length version of add_wrapped_text
  [2/3]: format-patch: wrap long header lines
  [3/3]: format-patch: rfc2047-encode newlines in headers

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text
  2011-02-23  9:48     ` Jeff King
@ 2011-02-23  9:50       ` Jeff King
  2011-02-23  9:58       ` [PATCH 2/3] format-patch: wrap long header lines Jeff King
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-02-23  9:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

The function strbuf_add_wrapped_text takes a NUL-terminated
string. This makes it annoying to wrap strings we have as a
pointer and a length.

Refactoring strbuf_add_wrapped_text and all of its
sub-functions to handle fixed-length strings turned out to
be really ugly. So this implementation is lame; it just
strdups the text and operates on the NUL-terminated version.
This should be fine as the strings we are wrapping are
generally pretty short.  If it becomes a problem, we can
optimize later.

Signed-off-by: Jeff King <peff@peff.net>
---
 utf8.c |    9 +++++++++
 utf8.h |    2 ++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/utf8.c b/utf8.c
index 84cfc72..8acbc66 100644
--- a/utf8.c
+++ b/utf8.c
@@ -405,6 +405,15 @@ new_line:
 	}
 }
 
+int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+			     int indent, int indent2, int width)
+{
+	char *tmp = xstrndup(data, len);
+	int r = strbuf_add_wrapped_text(buf, tmp, indent, indent2, width);
+	free(tmp);
+	return r;
+}
+
 int is_encoding_utf8(const char *name)
 {
 	if (!name)
diff --git a/utf8.h b/utf8.h
index ebc4d2f..81f2c82 100644
--- a/utf8.h
+++ b/utf8.h
@@ -10,6 +10,8 @@ int is_encoding_utf8(const char *name);
 
 int strbuf_add_wrapped_text(struct strbuf *buf,
 		const char *text, int indent, int indent2, int width);
+int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+			     int indent, int indent2, int width);
 
 #ifndef NO_ICONV
 char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
-- 
1.7.2.5.15.gfdd1c

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] format-patch: wrap long header lines
  2011-02-23  9:48     ` Jeff King
  2011-02-23  9:50       ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King
@ 2011-02-23  9:58       ` Jeff King
  2011-02-23  9:59       ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King
  2011-02-23 15:16       ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
  3 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-02-23  9:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

Subject and identity headers may be arbitrarily long. In the
past, we just assumed that single-line headers would be
reasonably short. For multi-line subjects that we squish
into a single line, we just "pre-folded" the data in
pp_title_line by adding a newline and indentation.

There were two problems. One is that, although rare,
single-line messages can actually be longer than the
recommended line-length limits. The second is that the
pre-folding interacted badly with rfc2047 encoding, leading
to malformed headers.

Instead, let's stop pre-folding the subject lines, and just
fold everything based on length in add_rfc2047, whether
it is encoded or not.

Signed-off-by: Jeff King <peff@peff.net>
---
Three things to note:

  1. We call strbuf_add_wrapped_bytes for the non-encoded case. This
     nicely wraps on word and multi-character boundaries. But it will
     never do a "hard" wrap if there are no word boundaries, and it
     probably should at the 998-character mark which rfc2822 specifies
     as a hard limit.

     I don't know how much we care. For something like that you'd have
     to be maliciously trying to create a bogus patch. If you're mailing
     it, you could just create the bogus mail by hand. If you're trying
     to buffer overflow somebody's "format-patch | am" script, it won't
     do anything, as mailinfo does not have a limit on line length.

  2. For the non-quoted case, technically we want to indent less on the
     first line than we do on subsequent lines (to account for "Subject:
     [PATCH]"). strbuf_add_wrapped_bytes doesn't support that notion. We
     could add it, but it probably doesn't matter. We just end up
     wrapping the subsequent lines a little tighter than we need to.

  3. I used RFC2822's SHOULD value of 78 characters as a line length.
     That's probably unnecessarily conservative. In theory wrapping
     shouldn't make a difference to the data, but maybe people who
     hand-edit the result would prefer not to see wrapping? I dunno. In
     that case, we could set it to something higher like 120, which
     would still wrap the really ridiculous cases.

 pretty.c                |   32 +++++++++++++----
 t/t4014-format-patch.sh |   84 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/pretty.c b/pretty.c
index 8549934..0e167f4 100644
--- a/pretty.c
+++ b/pretty.c
@@ -216,7 +216,15 @@ static int is_rfc2047_special(char ch)
 static void add_rfc2047(struct strbuf *sb, const char *line, int len,
 		       const char *encoding)
 {
-	int i, last;
+	static const int max_length = 78; /* per rfc2822 */
+	int i;
+	int line_len;
+
+	/* How many bytes are already used on the current line? */
+	for (i = sb->len - 1; i >= 0; i--)
+		if (sb->buf[i] == '\n')
+			break;
+	line_len = sb->len - (i+1);
 
 	for (i = 0; i < len; i++) {
 		int ch = line[i];
@@ -225,14 +233,21 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len,
 		if ((i + 1 < len) && (ch == '=' && line[i+1] == '?'))
 			goto needquote;
 	}
-	strbuf_add(sb, line, len);
+	strbuf_add_wrapped_bytes(sb, line, len, 0, 1, max_length - line_len);
 	return;
 
 needquote:
 	strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
 	strbuf_addf(sb, "=?%s?q?", encoding);
-	for (i = last = 0; i < len; i++) {
+	line_len += strlen(encoding) + 5; /* 5 for =??q? */
+	for (i = 0; i < len; i++) {
 		unsigned ch = line[i] & 0xFF;
+
+		if (line_len >= max_length - 2) {
+			strbuf_addf(sb, "?=\n =?%s?q?", encoding);
+			line_len = strlen(encoding) + 5 + 1; /* =??q? plus SP */
+		}
+
 		/*
 		 * We encode ' ' using '=20' even though rfc2047
 		 * allows using '_' for readability.  Unfortunately,
@@ -240,12 +255,14 @@ needquote:
 		 * leave the underscore in place.
 		 */
 		if (is_rfc2047_special(ch) || ch == ' ') {
-			strbuf_add(sb, line + last, i - last);
 			strbuf_addf(sb, "=%02X", ch);
-			last = i + 1;
+			line_len += 3;
+		}
+		else {
+			strbuf_addch(sb, ch);
+			line_len++;
 		}
 	}
-	strbuf_add(sb, line + last, len - last);
 	strbuf_addstr(sb, "?=");
 }
 
@@ -1106,11 +1123,10 @@ void pp_title_line(enum cmit_fmt fmt,
 		   const char *encoding,
 		   int need_8bit_cte)
 {
-	const char *line_separator = (fmt == CMIT_FMT_EMAIL) ? "\n " : " ";
 	struct strbuf title;
 
 	strbuf_init(&title, 80);
-	*msg_p = format_subject(&title, *msg_p, line_separator);
+	*msg_p = format_subject(&title, *msg_p, " ");
 
 	strbuf_grow(sb, title.len + 1024);
 	if (subject) {
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 027c13d..9c66367 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -709,4 +709,88 @@ test_expect_success TTY 'format-patch --stdout paginates' '
 	test_path_is_missing .git/pager_used
 '
 
+test_expect_success 'format-patch handles multi-line subjects' '
+	rm -rf patches/ &&
+	echo content >>file &&
+	for i in one two three; do echo $i; done >msg &&
+	git add file &&
+	git commit -F msg &&
+	git format-patch -o patches -1 &&
+	grep ^Subject: patches/0001-one.patch >actual &&
+	echo "Subject: [PATCH] one two three" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'format-patch handles multi-line encoded subjects' '
+	rm -rf patches/ &&
+	echo content >>file &&
+	for i in en två tre; do echo $i; done >msg &&
+	git add file &&
+	git commit -F msg &&
+	git format-patch -o patches -1 &&
+	grep ^Subject: patches/0001-en.patch >actual &&
+	echo "Subject: [PATCH] =?UTF-8?q?en=20tv=C3=A5=20tre?=" >expect &&
+	test_cmp expect actual
+'
+
+M8="foo bar "
+M64=$M8$M8$M8$M8$M8$M8$M8$M8
+M512=$M64$M64$M64$M64$M64$M64$M64$M64
+cat >expect <<'EOF'
+Subject: [PATCH] foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo
+ bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar
+ foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo
+ bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar
+ foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo
+ bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar
+ foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo
+ bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar
+ foo bar foo bar foo bar foo bar
+EOF
+test_expect_success 'format-patch wraps extremely long headers (ascii)' '
+	echo content >>file &&
+	git add file &&
+	git commit -m "$M512" &&
+	git format-patch --stdout -1 >patch &&
+	sed -n "/^Subject/p; /^ /p; /^$/q" <patch >subject &&
+	test_cmp expect subject
+'
+
+M8="föö bar "
+M64=$M8$M8$M8$M8$M8$M8$M8$M8
+M512=$M64$M64$M64$M64$M64$M64$M64$M64
+cat >expect <<'EOF'
+Subject: [PATCH] =?UTF-8?q?f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?=
+ =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar?=
+EOF
+test_expect_success 'format-patch wraps extremely long headers (rfc2047)' '
+	rm -rf patches/ &&
+	echo content >>file &&
+	git add file &&
+	git commit -m "$M512" &&
+	git format-patch --stdout -1 >patch &&
+	sed -n "/^Subject/p; /^ /p; /^$/q" <patch >subject &&
+	test_cmp expect subject
+'
+
 test_done
-- 
1.7.2.5.15.gfdd1c

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] format-patch: rfc2047-encode newlines in headers
  2011-02-23  9:48     ` Jeff King
  2011-02-23  9:50       ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King
  2011-02-23  9:58       ` [PATCH 2/3] format-patch: wrap long header lines Jeff King
@ 2011-02-23  9:59       ` Jeff King
  2011-02-23 21:47         ` Junio C Hamano
  2011-02-23 15:16       ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
  3 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2011-02-23  9:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

These should generally never happen, as we already
concatenate multiples in subjects into a single line. But
let's be defensive, since not encoding them means we will
output malformed headers.

Signed-off-by: Jeff King <peff@peff.net>
---
 pretty.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pretty.c b/pretty.c
index 0e167f4..65d20a7 100644
--- a/pretty.c
+++ b/pretty.c
@@ -228,7 +228,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len,
 
 	for (i = 0; i < len; i++) {
 		int ch = line[i];
-		if (non_ascii(ch))
+		if (non_ascii(ch) || ch == '\n')
 			goto needquote;
 		if ((i + 1 < len) && (ch == '=' && line[i+1] == '?'))
 			goto needquote;
@@ -254,7 +254,7 @@ needquote:
 		 * many programs do not understand this and just
 		 * leave the underscore in place.
 		 */
-		if (is_rfc2047_special(ch) || ch == ' ') {
+		if (is_rfc2047_special(ch) || ch == ' ' || ch == '\n') {
 			strbuf_addf(sb, "=%02X", ch);
 			line_len += 3;
 		}
-- 
1.7.2.5.15.gfdd1c

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23  9:48     ` Jeff King
                         ` (2 preceding siblings ...)
  2011-02-23  9:59       ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King
@ 2011-02-23 15:16       ` xzer
  2011-02-23 16:35         ` Jeff King
  3 siblings, 1 reply; 15+ messages in thread
From: xzer @ 2011-02-23 15:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

2011/2/23 Jeff King <peff@peff.net>:
> On Wed, Feb 23, 2011 at 03:08:54AM -0500, Jeff King wrote:
>
>> Yeah, I think the best path forward is:
>>
>>   1. Stop feeding "pre-folded" subject lines to the email formatter.
>>      Give it the regular subject line with no newlines.
>>
>>   2. rfc2047 encoding should encode a literal newline. Which should
>>      generally never happen, but is probably the most sane thing to do
>>      if it does.
>>
>>   3. rfc2047 should fold all lines at some sane length. As it is now, we
>>      may sometimes generate long lines in headers (though in practice, I
>>      doubt this is much of a problem).
>
> So here is a series that does this. It still doesn't preserve subject
> newlines in "format-patch | am", but I don't think that was ever a goal
> of the code. If we want to add it as an optional feature on top (maybe
> as part of "-k"?), it should be easy to do (since the rfc2047 encoding
> will now preserve embedded newlines).
>
>  [1/3]: strbuf: add fixed-length version of add_wrapped_text
>  [2/3]: format-patch: wrap long header lines
>  [3/3]: format-patch: rfc2047-encode newlines in headers
>
> -Peff
>

To the first point, I really want to find a way that we can remain the
line breaker
after import a formatted patch. That's why I add a new function to product multi
line header, I want to do something which is special to subject. In my usage,
I told my men every day that don't write too long in the first
paragraph, but there
are always somebody who forgets it, then I will get a patch with a
very long subject
just like a nightmare(yes, I gave them my temporary fix which I submitted here,
so they can write as long as they want).

So I want to know whether we can generate a 2047 compatible header so
that mailer
can catch it correctly and the git-am can import it with line breaker
correctly too.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-22 20:43 ` Junio C Hamano
  2011-02-23  8:08   ` Jeff King
@ 2011-02-23 15:34   ` xzer
  2011-02-23 17:45     ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: xzer @ 2011-02-23 15:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2011/2/23 Junio C Hamano <gitster@pobox.com>:
> xzer <xiaozhu@gmail.com> writes:
>
>> Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
>
> We prefer to have "[PATCH] subsystem: description without final full-stop" here.
>
>> There is still a problem that git-am will lost the line break.
>
> What does "still" refer to?  It is unclear under what condition the
> command lose "the line break" (nor which line break you are refering to; I
> am guessing that you have a commit that begins with a multi-line paragraph
> and you are talking about line breaks between the lines in the first
> paragraph).
>

Yes, that is what I am refering, the line breaks in the first paragraph.

>> It's not easy to retain it, but as the first step, we can generate
>> a valid rfc2047 header now.
>
> Please describe what is broken (iow, "Given this sample input, we
> currently generate this output, which is not a valid rfc2047") and what
> the new output looks like ("Update pp_title_line() to generate this output
> instead.")
>

At present we can only concatenate the lines in the first paragraph so
that we can generate a valid rfc2047 mail, but we will lost the line breaks
after import the patch by git-am.

>> ---
>
> Missing sign-off with a real name.
>

I am sorry that I didn't find the document of submitting a patch until
yesterday, Thanks for your comment.

>> diff --git a/pretty.c b/pretty.c
>> index 8549934..f18a38d 100644
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -249,6 +249,33 @@ needquote:
>>       strbuf_addstr(sb, "?=");
>>  }
>>
>> +static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len,
>> +                       const char *encoding)
>> +{
>> +     int first = 1;
>> +     char *mline = xmemdupz(line, len);
>> +     const char *cline = mline;
>> +     int offset = 0, linelen = 0;
>> +        for (;;) {
>
> You seem to have indent that uses SPs instead of HT around here...
>
>> +                linelen = get_one_line(cline);
>
> I can see you are trying to be careful not to let get_one_line() overstep
> past "len" the caller gave you by making a copy first, but is this
> overhead really necessary?  After all we know in this static function that
> the caller is feeding the contents from a strbuf, which always have a
> terminating NUL (and that is why it is Ok that get_one_line() is not a
> counted string interface).
>

I am not sure that who will call this function in future, I think since there is
a argument as len, so I'd better to obey the function declare.

>> +
>> +                cline += linelen;
>> +
>> +                if (!linelen)
>> +                        break;
>> +
>> +             if (!first)
>> +                     strbuf_addf(sb, "\n ");
>> +
>> +             offset = *(cline -1) == '\n';
>> +
>> +             add_rfc2047(sb, cline-linelen, linelen-offset, encoding);
>> +             first = 0;
>> +
>> +        }
>> +     free(mline);
>> +}
>
> So the general idea of this change (I am thinking aloud what should be in
> the updated commit log message as the problem description) is that:
>
>  - We currently give an entire multi-line paragraph string to the
>   add_rfc2047() function to be formatted as the title of the commit;
>
>  - The add_rfc2047() functionjust passes "\n" through, without making it a
>   folding whitespace followed by a newline, to help callers that want to
>   use this function to produce a header line that is rfc 2822 conformant;
>
>  - The patch introduces a new function add_rfc2047_multiline() that splits
>   its input and performs line folding for such a caller (namely, the
>   pp_title_line() function);
>
>  - Another caller of add_rfc2047(), pp_user_info, is not changed, and it
>   won't fold the name of the user that appear on the From: line.
>
> It is unclear if the last point is really the right thing to do, though.
> It is not a new problem that an author name that has a "\n" in it would
> break the output, but we probably would want to fix that case too here?
>

Your comment is just right for what I tried to do, I explained why I add a new
function for subject specially in the mail which replied to Jeff, I
want to remain
the line breaks after import the patch, so I think I need do something here
in future, it will be compatible with rfc2047 and also can be imported with
line breaks correctly. I don't know how yet, so I just want to left a
possibility.
So I introduce a new function for subject only.

xzer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23 15:16       ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
@ 2011-02-23 16:35         ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-02-23 16:35 UTC (permalink / raw)
  To: xzer; +Cc: Junio C Hamano, git

On Thu, Feb 24, 2011 at 12:16:04AM +0900, xzer wrote:

> To the first point, I really want to find a way that we can remain the
> line breaker
> after import a formatted patch. That's why I add a new function to product multi
> line header, I want to do something which is special to subject. In my usage,
> I told my men every day that don't write too long in the first
> paragraph, but there
> are always somebody who forgets it, then I will get a patch with a
> very long subject
> just like a nightmare(yes, I gave them my temporary fix which I submitted here,
> so they can write as long as they want).
> 
> So I want to know whether we can generate a 2047 compatible header so
> that mailer
> can catch it correctly and the git-am can import it with line breaker
> correctly too.

Yes. With my patches, if you feed a subject with linebreaks to
add_rfc2047, they will be encoded. So you just need an extra patch on
top of mine that will use straight linebreaks (_not_ linebreaks with an
extra space) in pp_title_line.  Below is a quick and dirty patch to do
that when "-k" is specified. You will also need to specify "-k" with
applying it with "git am", but other than that it seems to work.

However, I'm still not sure it's a good idea. Other parts of git will
try to treat your paragraph as a single line (e.g., git log --oneline).
Plus this patch is ugly because of the number of layers of abstraction
we have to pass the keep-subject through. I'm not sure there's a good
way around that.

---
diff --git a/builtin/log.c b/builtin/log.c
index d8c6c28..3fdf488 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -768,7 +768,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	pp_user_info(NULL, CMIT_FMT_EMAIL, &sb, committer, DATE_RFC2822,
 		     encoding);
 	pp_title_line(CMIT_FMT_EMAIL, &msg, &sb, subject_start, extra_headers,
-		      encoding, need_8bit_cte);
+		      encoding, need_8bit_cte, 0);
 	pp_remainder(CMIT_FMT_EMAIL, &msg, &sb, 0);
 	printf("%s\n", sb.buf);
 
@@ -1130,6 +1130,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		die ("-n and -k are mutually exclusive.");
 	if (keep_subject && subject_prefix)
 		die ("--subject-prefix and -k are mutually exclusive.");
+	rev.preserve_subject = keep_subject;
 
 	argc = setup_revisions(argc, argv, &rev, &s_r_opt);
 	if (argc > 1)
diff --git a/commit.h b/commit.h
index 659c87c..6eace1c 100644
--- a/commit.h
+++ b/commit.h
@@ -73,6 +73,7 @@ struct pretty_print_context
 	int abbrev;
 	const char *subject;
 	const char *after_subject;
+	int preserve_subject;
 	enum date_mode date_mode;
 	int need_8bit_cte;
 	int show_notes;
@@ -107,7 +108,8 @@ void pp_title_line(enum cmit_fmt fmt,
 		   const char *subject,
 		   const char *after_subject,
 		   const char *encoding,
-		   int need_8bit_cte);
+		   int need_8bit_cte,
+		   int preserve_lines);
 void pp_remainder(enum cmit_fmt fmt,
 		  const char **msg_p,
 		  struct strbuf *sb,
diff --git a/log-tree.c b/log-tree.c
index b46ed3b..9b9aaf2 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -504,6 +504,7 @@ void show_log(struct rev_info *opt)
 	ctx.date_mode = opt->date_mode;
 	ctx.abbrev = opt->diffopt.abbrev;
 	ctx.after_subject = extra_headers;
+	ctx.preserve_subject = opt->preserve_subject;
 	ctx.reflog_info = opt->reflog_info;
 	pretty_print_commit(opt->commit_format, commit, &msgbuf, &ctx);
 
diff --git a/pretty.c b/pretty.c
index 65d20a7..315f1d2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1121,12 +1121,13 @@ void pp_title_line(enum cmit_fmt fmt,
 		   const char *subject,
 		   const char *after_subject,
 		   const char *encoding,
-		   int need_8bit_cte)
+		   int need_8bit_cte,
+		   int preserve_lines)
 {
 	struct strbuf title;
 
 	strbuf_init(&title, 80);
-	*msg_p = format_subject(&title, *msg_p, " ");
+	*msg_p = format_subject(&title, *msg_p, preserve_lines ? "\n" : " ");
 
 	strbuf_grow(sb, title.len + 1024);
 	if (subject) {
@@ -1254,7 +1255,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 	/* These formats treat the title line specially. */
 	if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL)
 		pp_title_line(fmt, &msg, sb, context->subject,
-			      context->after_subject, encoding, need_8bit_cte);
+			      context->after_subject, encoding, need_8bit_cte,
+			      context->preserve_subject);
 
 	beginning_of_body = sb->len;
 	if (fmt != CMIT_FMT_ONELINE)
diff --git a/revision.h b/revision.h
index 05659c6..f8ddd83 100644
--- a/revision.h
+++ b/revision.h
@@ -90,7 +90,8 @@ struct rev_info {
 			abbrev_commit:1,
 			use_terminator:1,
 			missing_newline:1,
-			date_mode_explicit:1;
+			date_mode_explicit:1,
+			preserve_subject:1;
 	unsigned int	disable_stdin:1;
 
 	enum date_mode date_mode;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23  8:08   ` Jeff King
  2011-02-23  9:48     ` Jeff King
@ 2011-02-23 17:34     ` Junio C Hamano
  2011-02-24  7:34       ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2011-02-23 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: xzer, git

Jeff King <peff@peff.net> writes:

> Yeah, I think the best path forward is:
>
>   1. Stop feeding "pre-folded" subject lines to the email formatter.
>      Give it the regular subject line with no newlines.

A bit of history.  The original design of the pp_title_line() function
since 4234a76 (Extend --pretty=oneline to cover the first paragraph,
2007-06-11) was to notice a multi-line paragraph and turn embedded
newlines into line folds (this seems to be a breakage specific to
non-ASCII titles).

As RFC 5322 (or 822/2822 for that matter) does not allow newlines in field
bodies (2.2: A field body MUST NOT include CR and LF except when used in
"folding" and "unfolding"...), it was the only way to allow the recipient
to tell where the original line breaks were to fold at the line breaks in
the original commit message.  Then the recipient _can_ be git aware and
turn the folding CRLF-SP into a LF, not just a SP, relying on the hope
that the transport between the sender and the recipient would not clobber
line folding, to recover the original.

The rebase pipeline (i.e. "format-patch | am") would have satisfied such a
flaky assumption and that was the only reason I wrote the line folding on
the output side that way.  These days, however, "am" invoked in the rebase
pipeline knows to slurp the message not from the patch text but from the
original message, so we can safely depart form the original design rationale.

>   2. rfc2047 encoding should encode a literal newline. Which should
>      generally never happen, but is probably the most sane thing to do
>      if it does.

I was re-reading RFC 2047 and its 5. (3) [Page 8] seems to imply that this
might be allowed: "Only printable and white space character data should be
encoded using this scheme."; I think LF is counted as a white space
character in this context, but it is a bit unclear.

If this "encode newline via 2047" _were_ allowed, I would say that my
preference is not to go with your 1. above.  Instead I would prefer to see
us feed the entire first paragraph, whether it is a single-liner or
multi-line paragraph, to the step 2 ...

>   3. rfc2047 should fold all lines at some sane length...

... and the have step3 fold its result to limit the physical length of the
output line(s).  Note that a multi-line first paragraph always will be
encoded using 2047 because we cannot have a newline in the field body per
RFC5322.  But going the above route would allow us to recover the original
first paragraph intact.

We might need to tweak the receiving end a bit, though.  IIRC, mailinfo
output assumed we will always be dealing with a single-liner subject.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23 15:34   ` xzer
@ 2011-02-23 17:45     ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2011-02-23 17:45 UTC (permalink / raw)
  To: xzer; +Cc: Junio C Hamano, git

xzer <xiaozhu@gmail.com> writes:

> 2011/2/23 Junio C Hamano <gitster@pobox.com>:
>> xzer <xiaozhu@gmail.com> writes:
>>
>>> Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
>>
>> We prefer to have "[PATCH] subsystem: description without final full-stop" here.
>>
>>> There is still a problem that git-am will lost the line break.
>>
>> What does "still" refer to?  It is unclear under what condition the
>> command lose "the line break" (nor which line break you are refering to; I
>> am guessing that you have a commit that begins with a multi-line paragraph
>> and you are talking about line breaks between the lines in the first
>> paragraph).
>
> Yes, that is what I am refering, the line breaks in the first paragraph.

Hmm, I gave a suggestion and asked three questions, and only get one
answer back?

>> ... After all we know in this static function that
>> the caller is feeding the contents from a strbuf, which always have a
>> terminating NUL (and that is why it is Ok that get_one_line() is not a
>> counted string interface).
>
> I am not sure that who will call this function in future, I think since there is
> a argument as len, so I'd better to obey the function declare.

If that is the case I would have preferred to see you give get_one_line()
that is a function static to this file an ability to read from a counted
string, instead of making an extra allcation.  But I think you will notice
that all the callchain that pass a pointer into the message around knows
and relies on the fact that the buffer is NUL terminated if you look
around in the file, and that was why I made that suggestion.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] format-patch: rfc2047-encode newlines in headers
  2011-02-23  9:59       ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King
@ 2011-02-23 21:47         ` Junio C Hamano
  2011-02-24  7:15           ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2011-02-23 21:47 UTC (permalink / raw)
  To: Jeff King; +Cc: xzer, git

Jeff King <peff@peff.net> writes:

> These should generally never happen, as we already
> concatenate multiples in subjects into a single line. But
> let's be defensive, since not encoding them means we will
> output malformed headers.

In this particular case, wouldn't it be more conservative and defensive to
produce malformed headers so that the patch won't leave the originator?  I
have a suspicion that mailinfo would choke on the output of this one, even
though I didn't try.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] format-patch: rfc2047-encode newlines in headers
  2011-02-23 21:47         ` Junio C Hamano
@ 2011-02-24  7:15           ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-02-24  7:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

On Wed, Feb 23, 2011 at 01:47:53PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > These should generally never happen, as we already
> > concatenate multiples in subjects into a single line. But
> > let's be defensive, since not encoding them means we will
> > output malformed headers.
> 
> In this particular case, wouldn't it be more conservative and defensive to
> produce malformed headers so that the patch won't leave the
> originator?

No. If you go back to xzer's original mail, the malformed headers didn't
cause messages not to be sent. They just resulted in corrupted and
missing data on the receiver side. I don't think we can rely on any MUA
or MTA having a particular behavior for malformed mail. Some of them may
complain, but many won't.

> I have a suspicion that mailinfo would choke on the output of this
> one, even though I didn't try.

Actually, it does quite well. Without "-k", mailinfo turns it into a
single line, which is what I would expect. With "-k", the info file
contains:

  Author: Jeff King
  Email: peff@peff.net
  Subject: this is a long
  Subject: subject line with
  Subject: many lines in it
  Date: Wed, 23 Feb 2011 11:30:43 -0500

which "git am" turns back into the original multi-line subject.

So I think it's definitely the right thing to do. Not only does it avoid
us generating malformed mail, but because existing mailinfo handles it
sanely, it makes it easy to do a "preserve-newlines" patch on top (which
I'm still not sure is a great idea, but I can see the use in certain
circumstances).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject.
  2011-02-23 17:34     ` Junio C Hamano
@ 2011-02-24  7:34       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-02-24  7:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: xzer, git

On Wed, Feb 23, 2011 at 09:34:32AM -0800, Junio C Hamano wrote:

> A bit of history.  The original design of the pp_title_line() function
> since 4234a76 (Extend --pretty=oneline to cover the first paragraph,
> 2007-06-11) was to notice a multi-line paragraph and turn embedded
> newlines into line folds (this seems to be a breakage specific to
> non-ASCII titles).

Yes, the only actual breakage is the interaction of the folding with
non-ascii titles. The "fold everything" is a new feature. And one that
is probably not all that necessary in the real world, as rfc2822 allows
up to 998 characters. So it is more about complying with the SHOULD
there than the MUST.

> As RFC 5322 (or 822/2822 for that matter) does not allow newlines in field

Wow, I'm out of date. I had no idea 2822 had been superseded. ;)

> bodies (2.2: A field body MUST NOT include CR and LF except when used in
> "folding" and "unfolding"...), it was the only way to allow the recipient
> to tell where the original line breaks were to fold at the line breaks in
> the original commit message.  Then the recipient _can_ be git aware and
> turn the folding CRLF-SP into a LF, not just a SP, relying on the hope
> that the transport between the sender and the recipient would not clobber
> line folding, to recover the original.

Ah, that makes the current code make a lot more sense. Thanks for the
history.

> The rebase pipeline (i.e. "format-patch | am") would have satisfied such a
> flaky assumption and that was the only reason I wrote the line folding on
> the output side that way.  These days, however, "am" invoked in the rebase
> pipeline knows to slurp the message not from the patch text but from the
> original message, so we can safely depart form the original design rationale.

Agreed.

> I was re-reading RFC 2047 and its 5. (3) [Page 8] seems to imply that this
> might be allowed: "Only printable and white space character data should be
> encoded using this scheme."; I think LF is counted as a white space
> character in this context, but it is a bit unclear.

Yeah, the section is a little vague. I think the intent is that senders
should encode reasonable text things, not multimedia garbage, and that
receivers should be wary of getting arbitrary crap. So I think we are
following the spirit of the section in any case.

Reading over it, though, I do notice that we are specifically forbidden
to break multi-byte characters between encoded-words. And my
implementation doesn't take care about that. I think in practice, any
reasonable implementation would just concatenate the results and be
happy, but you never know.

> If this "encode newline via 2047" _were_ allowed, I would say that my
> preference is not to go with your 1. above.  Instead I would prefer to see
> us feed the entire first paragraph, whether it is a single-liner or
> multi-line paragraph, to the step 2 ...

Hmm. I disagree. I thought the decision was made long ago to convert
such multi-paragraph subjects into a single line in most cases.  Because
supporting such subjects at all was never about encouraging people to
flaunt the subject-line convention, but about letting the tools have a
reasonable default behavior for commits imported from systems that did
not follow that convention.

On top of that, I think there is some question of how encoded newlines
in a subject line will be handled by MUAs in the wild (given the
ambiguity of rfc2047 mentioned above). So perhaps it is better to be
conservative and not generate them by default.

And on top of that, it is not just a "how should it be if starting from
scratch" decision. We have been flattening multi-line ascii subjects for
years, so this would be a behavior change.

So I would think this should be triggered by an option ("-k" makes the
most sense to me) if anything. I am somewhat lukewarm on even that; as
you mentioned above, the rebase pipeline has a better preservation
mechanism these days, so it is really about people who want to email
patches to each other while disregarding the subject-line convention.

> >   3. rfc2047 should fold all lines at some sane length...
> 
> ... and the have step3 fold its result to limit the physical length of the
> output line(s).  Note that a multi-line first paragraph always will be
> encoded using 2047 because we cannot have a newline in the field body per
> RFC5322.  But going the above route would allow us to recover the original
> first paragraph intact.

Yes, exactly.

> We might need to tweak the receiving end a bit, though.  IIRC, mailinfo
> output assumed we will always be dealing with a single-liner subject.

I don't know if it's intentional or accidental, but mailinfo handles it
just as I would expect. See my other message.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-02-24  7:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-14  8:09 [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
2011-02-22 20:43 ` Junio C Hamano
2011-02-23  8:08   ` Jeff King
2011-02-23  9:48     ` Jeff King
2011-02-23  9:50       ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King
2011-02-23  9:58       ` [PATCH 2/3] format-patch: wrap long header lines Jeff King
2011-02-23  9:59       ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King
2011-02-23 21:47         ` Junio C Hamano
2011-02-24  7:15           ` Jeff King
2011-02-23 15:16       ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer
2011-02-23 16:35         ` Jeff King
2011-02-23 17:34     ` Junio C Hamano
2011-02-24  7:34       ` Jeff King
2011-02-23 15:34   ` xzer
2011-02-23 17:45     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).