git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Content-Type handling
@ 2012-09-17 16:34 Tomas Cohen Arazi
  2012-09-17 17:18 ` Re*: " Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Tomas Cohen Arazi @ 2012-09-17 16:34 UTC (permalink / raw)
  To: git

Hi, I'm not sure it is a bug, but we used:

git config --global format.headers "Content-Type: text/plain; charset=\"utf-8\""

and recently (perhaps an Ubuntu default setup issue) the content-type
is being automatically set, the result is that patches contain this:

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

This might not be the problem, but when I apply the patch I get this:

fatal: cannot convert from UTF-8utf-8 to UTF-8

which looks like a bug. Not sure it hasn't been reported before, but I
think it should take one of the content-type specifications and not
append both. I couldn't find a place to looks for previous bug reports
to check, so forgive me if this is not the right place to report it.

Regards
To+

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re*: Content-Type handling
  2012-09-17 16:34 Content-Type handling Tomas Cohen Arazi
@ 2012-09-17 17:18 ` Junio C Hamano
  2012-09-17 22:28   ` [PATCH] mailinfo: do not concatenate charset= attribute values from mime headers Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2012-09-17 17:18 UTC (permalink / raw)
  To: Tomas Cohen Arazi; +Cc: git

Tomas Cohen Arazi <tomascohen@gmail.com> writes:

> Hi, I'm not sure it is a bug, but we used:
>
> git config --global format.headers "Content-Type: text/plain; charset=\"utf-8\""
>
> and recently (perhaps an Ubuntu default setup issue) the content-type
> is being automatically set, the result is that patches contain this:
>
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> Content-Type: text/plain; charset="utf-8"
>
> This might not be the problem, but when I apply the patch I get this:
>
> fatal: cannot convert from UTF-8utf-8 to UTF-8
>
> which looks like a bug. Not sure it hasn't been reported before, but I
> think it should take one of the content-type specifications and not
> append both. I couldn't find a place to looks for previous bug reports
> to check, so forgive me if this is not the right place to report it.

What I think is happening is:

 * ancient "git format-patch" did not add appropriate content-type
   headers, and that bug was fixed in more recent versions.

 * while waiting for the bug to get fixed, the user worked around by
   having her own format.headers to work it around.

 * the user upgraded to a newer version of git without the bug.

 * Git trusts that the user knows better than git itself, and it
   adds format.headers to whatever it needs to have in the message.
   As a message with more than one content-type: is invalid, the
   handcrafted format.headers used to be used as a workaround should
   be removed, but it hasn't been.

 * Then the receiving end (mailinfo) gets confused by the invalid
   input.


At least, a patch line this may alleviate the symptom.  It seems
that we assumed we won't see multiple charset="..." so the parser,
starting from an empty charset, simply appended whatever it found
in the message, instead of overwriting the old value it had.  The
other user of the function this patch touches always makes sure the
"attr" strbuf is empty before calling the function, and wouldn't
have been confused by multiple occurrences of the same header, and
this patch should not affect it, either.

 builtin/mailinfo.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git i/builtin/mailinfo.c w/builtin/mailinfo.c
index b691b77..2b3f4d9 100644
--- i/builtin/mailinfo.c
+++ w/builtin/mailinfo.c
@@ -160,10 +160,9 @@ static int slurp_attr(const char *line, const char *name, struct strbuf *attr)
 	const char *ends, *ap = strcasestr(line, name);
 	size_t sz;
 
-	if (!ap) {
-		strbuf_setlen(attr, 0);
+	strbuf_setlen(attr, 0);
+	if (!ap)
 		return 0;
-	}
 	ap += strlen(name);
 	if (*ap == '"') {
 		ap++;

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH] mailinfo: do not concatenate charset= attribute values from mime headers
  2012-09-17 17:18 ` Re*: " Junio C Hamano
@ 2012-09-17 22:28   ` Junio C Hamano
  0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2012-09-17 22:28 UTC (permalink / raw)
  To: git; +Cc: Tomas Cohen Arazi

"Content-type: text/plain; charset=UTF-8" header should not appear
twice in the input, but it is always better to gracefully deal with
such a case.  The current code concatenates the value to the values
we have seen previously, producing nonsense such as "utf8UTF-8".

Instead of concatenating, forget the previous value and use the last
value we see.

Reported-by: Tomas Cohen Arazi
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I think this is the right thing to do after all.  Thanks for
   bringing the issue up, Tomas.

 builtin/mailinfo.c  |  5 ++---
 t/t5100-mailinfo.sh |  2 +-
 t/t5100/info0017    |  5 +++++
 t/t5100/msg0017     |  2 ++
 t/t5100/patch0017   |  6 ++++++
 t/t5100/sample.mbox | 16 ++++++++++++++++
 6 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 t/t5100/info0017
 create mode 100644 t/t5100/msg0017
 create mode 100644 t/t5100/patch0017

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index eaf9e15..dd8b67c 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -160,10 +160,9 @@ static int slurp_attr(const char *line, const char *name, struct strbuf *attr)
 	const char *ends, *ap = strcasestr(line, name);
 	size_t sz;
 
-	if (!ap) {
-		strbuf_setlen(attr, 0);
+	strbuf_setlen(attr, 0);
+	if (!ap)
 		return 0;
-	}
 	ap += strlen(name);
 	if (*ap == '"') {
 		ap++;
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 81904d9..3e64a7a 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -11,7 +11,7 @@ test_expect_success 'split sample box' \
 	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
 	last=`cat last` &&
 	echo total is $last &&
-	test `cat last` = 16'
+	test `cat last` = 17'
 
 check_mailinfo () {
 	mail=$1 opt=$2
diff --git a/t/t5100/info0017 b/t/t5100/info0017
new file mode 100644
index 0000000..d2bc89f
--- /dev/null
+++ b/t/t5100/info0017
@@ -0,0 +1,5 @@
+Author: A U Thor
+Email: a.u.thor@example.com
+Subject: A E I O U
+Date: Mon, 17 Sep 2012 14:23:44 -0700
+
diff --git a/t/t5100/msg0017 b/t/t5100/msg0017
new file mode 100644
index 0000000..2ee0900
--- /dev/null
+++ b/t/t5100/msg0017
@@ -0,0 +1,2 @@
+New content here
+
diff --git a/t/t5100/patch0017 b/t/t5100/patch0017
new file mode 100644
index 0000000..35cf84c
--- /dev/null
+++ b/t/t5100/patch0017
@@ -0,0 +1,6 @@
+diff --git a/foo b/foo
+index e69de29..d95f3ad 100644
+--- a/foo
++++ b/foo
+@@ -0,0 +1 @@
++New content
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index de10312..fcc0e0d 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -683,3 +683,19 @@ index e69de29..d95f3ad 100644
 @@ -0,0 +1 @@
 +content
 
+From nobody Mon Sep 17 00:00:00 2001
+From: A U Thor <a.u.thor@example.com>
+Subject: A E I O U
+Date: Mon, 17 Sep 2012 14:23:44 -0700
+MIME-Version: 1.0
+Content-Type: text/plain; charset="iso-2022-jp"
+Content-type: text/plain; charset="UTF-8"
+
+New content here
+
+diff --git a/foo b/foo
+index e69de29..d95f3ad 100644
+--- a/foo
++++ b/foo
+@@ -0,0 +1 @@
++New content
-- 
1.7.12.556.gb90bb19

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-09-17 22:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-17 16:34 Content-Type handling Tomas Cohen Arazi
2012-09-17 17:18 ` Re*: " Junio C Hamano
2012-09-17 22:28   ` [PATCH] mailinfo: do not concatenate charset= attribute values from mime headers Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).