All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
       [not found] ` <1410472786-14552-5-git-send-email-mark.einon@gmail.com>
@ 2014-09-13  9:37   ` Dan Carpenter
  2014-09-13 15:45     ` Greg KH
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Carpenter @ 2014-09-13  9:37 UTC (permalink / raw)
  To: Mark Einon; +Cc: devel, gregkh, git

On Thu, Sep 11, 2014 at 10:59:42PM +0100, Mark Einon wrote:
> >From struct ce_stats; unicast_pkts_rcvd, unicast_pkts_xmtd,
> multicast_pkts_xmtd, broadcast_pkts_rcvd and broadcast_pkts_xmtd

For some reason something adds a '>' to the start of lines which start
with 'From'.  I don't know what it is...

When I apply this patch with 'git am' then it just removes the From
line.

I have seen these '>From' lines before but I haven't seen anyone discuss
this problem.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13  9:37   ` [PATCH 4/8] staging: et131x: Remove ununsed statistics Dan Carpenter
@ 2014-09-13 15:45     ` Greg KH
  2014-09-13 19:41       ` Dan Carpenter
  2014-09-13 20:36       ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: Greg KH @ 2014-09-13 15:45 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: devel, Mark Einon, git

On Sat, Sep 13, 2014 at 12:37:46PM +0300, Dan Carpenter wrote:
> On Thu, Sep 11, 2014 at 10:59:42PM +0100, Mark Einon wrote:
> > >From struct ce_stats; unicast_pkts_rcvd, unicast_pkts_xmtd,
> > multicast_pkts_xmtd, broadcast_pkts_rcvd and broadcast_pkts_xmtd
> 
> For some reason something adds a '>' to the start of lines which start
> with 'From'.  I don't know what it is...

It's an email protocol requirement, some RFC dictates it as "From" at
the start of the line is an email "start" flag.

> When I apply this patch with 'git am' then it just removes the From
> line.

As it should :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 15:45     ` Greg KH
@ 2014-09-13 19:41       ` Dan Carpenter
  2014-09-13 20:36       ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Carpenter @ 2014-09-13 19:41 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, Mark Einon, git

On Sat, Sep 13, 2014 at 08:45:56AM -0700, Greg KH wrote:
> On Sat, Sep 13, 2014 at 12:37:46PM +0300, Dan Carpenter wrote:
> > On Thu, Sep 11, 2014 at 10:59:42PM +0100, Mark Einon wrote:
> > > >From struct ce_stats; unicast_pkts_rcvd, unicast_pkts_xmtd,
> > > multicast_pkts_xmtd, broadcast_pkts_rcvd and broadcast_pkts_xmtd
> > 
> > For some reason something adds a '>' to the start of lines which start
> > with 'From'.  I don't know what it is...
> 
> It's an email protocol requirement, some RFC dictates it as "From" at
> the start of the line is an email "start" flag.
> 
> > When I apply this patch with 'git am' then it just removes the From
> > line.
> 
> As it should :)
> 

But now the changelog is corrupt.  I have tested with the git version
2.1.0.238.gce1d3a9.  The first line of the changelog gets chopped off
because of the ">From".

It's a little annoying.  Do we just tell Mark to resend with a different
changelog or is there a way to fix the tools?

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 15:45     ` Greg KH
  2014-09-13 19:41       ` Dan Carpenter
@ 2014-09-13 20:36       ` Jeff King
  2014-09-13 20:47         ` Mark Einon
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2014-09-13 20:36 UTC (permalink / raw)
  To: Greg KH; +Cc: Dan Carpenter, Mark Einon, devel, git

On Sat, Sep 13, 2014 at 08:45:56AM -0700, Greg KH wrote:

> On Sat, Sep 13, 2014 at 12:37:46PM +0300, Dan Carpenter wrote:
> > On Thu, Sep 11, 2014 at 10:59:42PM +0100, Mark Einon wrote:
> > > >From struct ce_stats; unicast_pkts_rcvd, unicast_pkts_xmtd,
> > > multicast_pkts_xmtd, broadcast_pkts_rcvd and broadcast_pkts_xmtd
> > 
> > For some reason something adds a '>' to the start of lines which start
> > with 'From'.  I don't know what it is...
> 
> It's an email protocol requirement, some RFC dictates it as "From" at
> the start of the line is an email "start" flag.

It's not an RFC thing. It's a side effect of the mbox format, which
squashes together multiple messages with "From " lines to mark their
starts. So many mbox implementations will quote them as ">From" (others
introduce a Content-Length header, or are simply more careful about
making sure that the line looks like a real "From " line, which should
contain a date).

If somebody's MUA is actually transmitting emails with the quoting,
that's wrong. It is a local storage problem, and they should not be
spreading the quoting disease to other systems.

> > When I apply this patch with 'git am' then it just removes the From
> > line.
> 
> As it should :)

That seems wrong. We should either leave it as-is (i.e., assume the
writer used no quoting and really did mean ">From") or strip the ">" to
turn it into "From" (i.e., assume the writer did use quoting). In some
implementations, a literal ">From" gets quoted to ">>From" and so on. So
we could even strip one level of quoting from such things (if we assume
the writer was such an implementation).

I don't think we can make this 100% foolproof without knowing which mbox
variant the writer used. But dropping the line is probably the worst
possible thing, as it does not match _any_ variants. :)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 20:36       ` Jeff King
@ 2014-09-13 20:47         ` Mark Einon
  2014-09-13 20:57           ` Dan Carpenter
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Einon @ 2014-09-13 20:47 UTC (permalink / raw)
  To: Jeff King; +Cc: devel, Greg KH, git, Dan Carpenter

On Sat, Sep 13, 2014 at 04:36:45PM -0400, Jeff King wrote:
> I don't think we can make this 100% foolproof without knowing which mbox
> variant the writer used. But dropping the line is probably the worst
> possible thing, as it does not match _any_ variants. :)

Hi,

FYI it was 'git send-email' v2.1.0 that sent the mail, and I don't have
the offending character in any versions of the mail I can see.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 20:47         ` Mark Einon
@ 2014-09-13 20:57           ` Dan Carpenter
  2014-09-13 21:06             ` Mark Einon
  2014-09-13 21:09             ` Dan Carpenter
  0 siblings, 2 replies; 22+ messages in thread
From: Dan Carpenter @ 2014-09-13 20:57 UTC (permalink / raw)
  To: Mark Einon; +Cc: Jeff King, devel, git, Greg KH

On Sat, Sep 13, 2014 at 09:47:45PM +0100, Mark Einon wrote:
> On Sat, Sep 13, 2014 at 04:36:45PM -0400, Jeff King wrote:
> > I don't think we can make this 100% foolproof without knowing which mbox
> > variant the writer used. But dropping the line is probably the worst
> > possible thing, as it does not match _any_ variants. :)
> 
> Hi,
> 
> FYI it was 'git send-email' v2.1.0 that sent the mail, and I don't have
> the offending character in any versions of the mail I can see.
> 

The mailing list version has it.

http://www.spinics.net/lists/linux-driver-devel/msg54372.html

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 20:57           ` Dan Carpenter
@ 2014-09-13 21:06             ` Mark Einon
  2014-09-13 21:09             ` Dan Carpenter
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Einon @ 2014-09-13 21:06 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Jeff King, devel, git, Greg KH

On Sat, Sep 13, 2014 at 11:57:51PM +0300, Dan Carpenter wrote:
> On Sat, Sep 13, 2014 at 09:47:45PM +0100, Mark Einon wrote:
> > On Sat, Sep 13, 2014 at 04:36:45PM -0400, Jeff King wrote:
> > > I don't think we can make this 100% foolproof without knowing which mbox
> > > variant the writer used. But dropping the line is probably the worst
> > > possible thing, as it does not match _any_ variants. :)
> > 
> > Hi,
> > 
> > FYI it was 'git send-email' v2.1.0 that sent the mail, and I don't have
> > the offending character in any versions of the mail I can see.
> > 
> 
> The mailing list version has it.
> 
> http://www.spinics.net/lists/linux-driver-devel/msg54372.html

Fair enough. The marc.info version doesn't though, so it's proably not my MUA:

http://marc.info/?l=linux-driver-devel&m=141047281318963&w=2

Cheers,

Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] staging: et131x: Remove ununsed statistics
  2014-09-13 20:57           ` Dan Carpenter
  2014-09-13 21:06             ` Mark Einon
@ 2014-09-13 21:09             ` Dan Carpenter
  2014-09-13 21:25               ` [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Dan Carpenter @ 2014-09-13 21:09 UTC (permalink / raw)
  To: Mark Einon; +Cc: Jeff King, devel, git, Greg KH

On Sat, Sep 13, 2014 at 11:57:51PM +0300, Dan Carpenter wrote:
> On Sat, Sep 13, 2014 at 09:47:45PM +0100, Mark Einon wrote:
> > On Sat, Sep 13, 2014 at 04:36:45PM -0400, Jeff King wrote:
> > > I don't think we can make this 100% foolproof without knowing which mbox
> > > variant the writer used. But dropping the line is probably the worst
> > > possible thing, as it does not match _any_ variants. :)
> > 
> > Hi,
> > 
> > FYI it was 'git send-email' v2.1.0 that sent the mail, and I don't have
> > the offending character in any versions of the mail I can see.
> > 
> 
> The mailing list version has it.
> 
> http://www.spinics.net/lists/linux-driver-devel/msg54372.html

Or based on Peff's email it might be a bug in the spinics list software.
Here are some other examples:

Piper mail has '>From'.
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2014-September/058299.html

but gmane gets it right.
http://comments.gmane.org/gmane.linux.drivers.driver-project.devel/57684

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-13 21:09             ` Dan Carpenter
@ 2014-09-13 21:25               ` Jeff King
  2014-09-13 22:57                 ` brian m. carlson
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2014-09-13 21:25 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Mark Einon, Greg KH, git, Junio C Hamano

[-cc driver-devel list, as this is getting into git patches]

On Sun, Sep 14, 2014 at 12:09:08AM +0300, Dan Carpenter wrote:

> > > FYI it was 'git send-email' v2.1.0 that sent the mail, and I don't have
> > > the offending character in any versions of the mail I can see.
> [...]
> Piper mail has '>From'.
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2014-September/058299.html
> 
> but gmane gets it right.
> http://comments.gmane.org/gmane.linux.drivers.driver-project.devel/57684

Thanks both of you for following up. I did confirm that git-send-email
does not add such quoting. From your findings above, I'd agree that it's
the list-archive software munging it, and they are buggy IMHO (they
should de-quote on display).

Here's an RFC patch to help the "git am" side handle this case better.

-- >8 --
Subject: mailinfo: do not treat ">From" lines as in-body headers

Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
inside body, 2006-05-21), we have treated lines like ">From"
in the body as headers. This makes "git am" work for people
who erroneously paste the whole mbox entry:

  From 12345abcd...
  From: them
  Subject: [PATCH] whatever

into their email body. However, it also causes false
positives for people who really do have ">From" in the first
paragraph of their commit messages. In this case, we'll drop
the line completely, breaking the commit message.

We could try to make our checking more robust by doing one
or both of:

  - making sure the line looks like a git "From " line
    (40-hex sha1, date, etc).

  - seeing if the following lines are actually rfc2822
    headers

However, it's probably not worth the complexity. There are a
few reasons to think that this code is not actually
triggered very often. One, the patch was written in 2006,
when git was still relatively new, and people frequently
made mistakes in sending patches; these days we not see this
error much. And two, we check only the quoted ">From" form,
and not the regular "From". So whether it kicks in at all
depends entirely on how the mbox is saved by the user's MUA.
And in the intervening 8 years, nobody has complained about
the "From" case.

With this patch, we will simply treat such ">From" lines as
normal body lines (and stop in-body header parsing). Note
that we do _not_ unquote them into "From". Whether
from-quoting is in effect depends on the exact mbox format
being used, which depends on the MUA writing the file. We
cannot know for sure whether to unquote or not, so we leave
the line alone.

Signed-off-by: Jeff King <peff@peff.net>
---
I admit my arguments that it is not in use are a little flaky, and this
may just be me being lazy. Trying to match arbitrary "From" lines is
very hard and heuristic-filled, but if we are only trying to match
git-generated mbox lines, that's much easier. It would not hurt too much
to go that route.

I also tend to think we should unquote ">From" into "From". As discussed
above, we do know whether the author meant the literal former, or meant
the latter and it quoted into the former. But I'd guess that a literal
">From" is quite rare, so we'd probably serve more people by de-quoting.
That is really a separate issue for another patch, though.

 builtin/mailinfo.c         |  4 ----
 t/t5100-mailinfo.sh        |  9 +++++++++
 t/t5100/quoted-from.expect |  3 +++
 t/t5100/quoted-from.in     | 10 ++++++++++
 4 files changed, 22 insertions(+), 4 deletions(-)
 create mode 100644 t/t5100/quoted-from.expect
 create mode 100644 t/t5100/quoted-from.in

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index cf11c8d..0a08e44 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -328,10 +328,6 @@ static int check_header(const struct strbuf *line,
 	}
 
 	/* for inbody stuff */
-	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
-		ret = 1; /* Should this return 0? */
-		goto check_header_out;
-	}
 	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
 		for (i = 0; header[i]; i++) {
 			if (!strcmp("Subject", header[i])) {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 3e64a7a..578ff16 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -89,4 +89,13 @@ test_expect_success 'mailinfo on from header without name works' '
 
 '
 
+test_expect_success 'mailinfo on message with quoted >From' '
+	mkdir quoted-from &&
+	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailinfo quoted-from/msg quoted-from/patch \
+	  <quoted-from/0001 >quoted-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+'
+
 test_done
diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
new file mode 100644
index 0000000..8c9d48c
--- /dev/null
+++ b/t/t5100/quoted-from.expect
@@ -0,0 +1,3 @@
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
new file mode 100644
index 0000000..847e1c4
--- /dev/null
+++ b/t/t5100/quoted-from.in
@@ -0,0 +1,10 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Author Name <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted >From
+
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
+---
+patch
-- 
2.1.0.373.g91ca799

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-13 21:25               ` [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers Jeff King
@ 2014-09-13 22:57                 ` brian m. carlson
  2014-09-14  0:47                   ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: brian m. carlson @ 2014-09-13 22:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Dan Carpenter, Mark Einon, Greg KH, git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

On Sat, Sep 13, 2014 at 05:25:05PM -0400, Jeff King wrote:
> Thanks both of you for following up. I did confirm that git-send-email
> does not add such quoting. From your findings above, I'd agree that it's
> the list-archive software munging it, and they are buggy IMHO (they
> should de-quote on display).

I wonder if git send-email should do what mutt does in this case, which
is use quoted-printable encoding and encode the first F as =46 (as well
as any equals signs as =3D).  It looks like mailinfo.c already is
capable of handling that, and that would avoid the entire issue.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-13 22:57                 ` brian m. carlson
@ 2014-09-14  0:47                   ` Jeff King
  2014-09-14  0:55                     ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2014-09-14  0:47 UTC (permalink / raw)
  To: git; +Cc: Dan Carpenter, Mark Einon, Greg KH, git, Junio C Hamano

On Sat, Sep 13, 2014 at 10:57:14PM +0000, brian m. carlson wrote:

> On Sat, Sep 13, 2014 at 05:25:05PM -0400, Jeff King wrote:
> > Thanks both of you for following up. I did confirm that git-send-email
> > does not add such quoting. From your findings above, I'd agree that it's
> > the list-archive software munging it, and they are buggy IMHO (they
> > should de-quote on display).
> 
> I wonder if git send-email should do what mutt does in this case, which
> is use quoted-printable encoding and encode the first F as =46 (as well
> as any equals signs as =3D).  It looks like mailinfo.c already is
> capable of handling that, and that would avoid the entire issue.

That's not an unreasonable tactic. However, I think we'd still want to
do something with mailinfo on the receiving end, similar to the patch I
sent. We don't know that the sending side is necessarily send-email.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-14  0:47                   ` Jeff King
@ 2014-09-14  0:55                     ` Junio C Hamano
  2014-09-14  1:01                       ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2014-09-14  0:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Sat, Sep 13, 2014 at 5:47 PM, Jeff King <peff@peff.net> wrote:
>
> On Sat, Sep 13, 2014 at 10:57:14PM +0000, brian m. carlson wrote:
>
> > I wonder if git send-email should do what mutt does in this case, which
> > is use quoted-printable encoding and encode the first F as =46 (as well
> > as any equals signs as =3D).  It looks like mailinfo.c already is
> > capable of handling that, and that would avoid the entire issue.
>
> That's not an unreasonable tactic. However, I think we'd still want to
> do something with mailinfo on the receiving end, similar to the patch I
> sent. We don't know that the sending side is necessarily send-email.

Hmm, isn't the ">" stuffing in front of a beginning-of-line "From " purely
a local matter of MUA that stores messages in (old-style) mbox format
where a line that begins with "From " is what defines the end of the
previous message? Why should send-email do anything when it sends
individual messages separately out?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-14  0:55                     ` Junio C Hamano
@ 2014-09-14  1:01                       ` Jeff King
  2014-09-14  1:30                         ` Jeff King
  2014-09-15 17:55                         ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Jeff King @ 2014-09-14  1:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Sat, Sep 13, 2014 at 05:55:49PM -0700, Junio C Hamano wrote:

> On Sat, Sep 13, 2014 at 5:47 PM, Jeff King <peff@peff.net> wrote:
> >
> > On Sat, Sep 13, 2014 at 10:57:14PM +0000, brian m. carlson wrote:
> >
> > > I wonder if git send-email should do what mutt does in this case, which
> > > is use quoted-printable encoding and encode the first F as =46 (as well
> > > as any equals signs as =3D).  It looks like mailinfo.c already is
> > > capable of handling that, and that would avoid the entire issue.
> >
> > That's not an unreasonable tactic. However, I think we'd still want to
> > do something with mailinfo on the receiving end, similar to the patch I
> > sent. We don't know that the sending side is necessarily send-email.
> 
> Hmm, isn't the ">" stuffing in front of a beginning-of-line "From " purely
> a local matter of MUA that stores messages in (old-style) mbox format
> where a line that begins with "From " is what defines the end of the
> previous message?

Yes, it is[1].

> Why should send-email do anything when it sends individual messages
> separately out?

It does not need to, but the QP-transformation helps protect against
other, stupider software downstream.  And unlike From-quoting it is
actually well-specified and reversible.

-Peff

[1] We do use the mbox format in git, and AFAIK do not do any
    From-quoting of this nature.  I haven't tested, but I suspect that
    certain format-patch output would be corrupted when reading back via
    "git am", let alone other random mbox readers.  If we wanted to do
    the QP magic brian suggests, it would probably make sense to do it
    as part of format-patch.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-14  1:01                       ` Jeff King
@ 2014-09-14  1:30                         ` Jeff King
  2014-09-15 18:56                           ` Junio C Hamano
  2014-09-15 17:55                         ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2014-09-14  1:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Sat, Sep 13, 2014 at 09:01:20PM -0400, Jeff King wrote:

> [1] We do use the mbox format in git, and AFAIK do not do any
>     From-quoting of this nature.  I haven't tested, but I suspect that
>     certain format-patch output would be corrupted when reading back via
>     "git am", let alone other random mbox readers.  If we wanted to do
>     the QP magic brian suggests, it would probably make sense to do it
>     as part of format-patch.

It looks like we have a reasonably sane is_from_line() function. So at
least _we_ will not generally break on reading our own output, except in
some extreme circumstances (you'd have to come up with something
contrived like "From me, at 10:30 30 minutes before 11!").

We can pretty easily reuse this to make the from-line check in mailinfo
more robust. Here's a replacement for the patch I sent earlier that
keeps the "magically ignore extra >From headers" behavior but fixes the
case that started this discussion.

We could still do the QP thing to protect against other downstream tools
(or to cover the contrived cases as above), but I think with this patch
we at least cover the sane cases.

-- >8 --
Subject: mailinfo: make ">From" in-body header check more robust

Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
inside body, 2006-05-21), we have treated lines like ">From"
in the body as headers. This makes "git am" work for people
who erroneously paste the whole mbox entry:

  From 12345abcd...
  From: them
  Subject: [PATCH] whatever

into their email body (assuming that an mbox writer then
quotes "From" as ">From", as otherwise we would actually
mailsplit on the in-body line).

However, this has false positives if somebody actually has a
commit body that starts with "From "; in this case we
erroneously remove the line entirely from the commit
message. We can make this check more robust by making sure
the line actually looks like a real mbox "From" line.

To do this we pull the "is_from_line" definition out of
mailsplit, and make it available for general use.

Signed-off-by: Jeff King <peff@peff.net>
---
 Makefile                   |  1 +
 builtin/mailinfo.c         |  2 +-
 builtin/mailsplit.c        | 31 -------------------------------
 cache.h                    |  6 ++++++
 mbox.c                     | 32 ++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh        | 18 ++++++++++++++++++
 t/t5100/embed-from.expect  |  5 +++++
 t/t5100/embed-from.in      | 13 +++++++++++++
 t/t5100/quoted-from.expect |  3 +++
 t/t5100/quoted-from.in     | 10 ++++++++++
 10 files changed, 89 insertions(+), 32 deletions(-)
 create mode 100644 mbox.c
 create mode 100644 t/t5100/embed-from.expect
 create mode 100644 t/t5100/embed-from.in
 create mode 100644 t/t5100/quoted-from.expect
 create mode 100644 t/t5100/quoted-from.in

diff --git a/Makefile b/Makefile
index e0f15a3..e018450 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
+LIB_OBJS += mbox.o
 LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index cf11c8d..a434d39 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line,
 
 	/* for inbody stuff */
 	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
-		ret = 1; /* Should this return 0? */
+		ret = is_from_line(line->buf + 1, line->len - 1);
 		goto check_header_out;
 	}
 	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 763cda0..11ba281 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -12,37 +12,6 @@
 static const char git_mailsplit_usage[] =
 "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
 
-static int is_from_line(const char *line, int len)
-{
-	const char *colon;
-
-	if (len < 20 || memcmp("From ", line, 5))
-		return 0;
-
-	colon = line + len - 2;
-	line += 5;
-	for (;;) {
-		if (colon < line)
-			return 0;
-		if (*--colon == ':')
-			break;
-	}
-
-	if (!isdigit(colon[-4]) ||
-	    !isdigit(colon[-2]) ||
-	    !isdigit(colon[-1]) ||
-	    !isdigit(colon[ 1]) ||
-	    !isdigit(colon[ 2]))
-		return 0;
-
-	/* year */
-	if (strtol(colon+3, NULL, 10) <= 90)
-		return 0;
-
-	/* Ok, close enough */
-	return 1;
-}
-
 static struct strbuf buf = STRBUF_INIT;
 static int keep_cr;
 
diff --git a/cache.h b/cache.h
index dfa1a56..9e71ca5 100644
--- a/cache.h
+++ b/cache.h
@@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv, int fd);
 
 int versioncmp(const char *s1, const char *s2);
 
+/*
+ * Returns true if the line appears to be an mbox "From" line starting a new
+ * message.
+ */
+int is_from_line(const char *line, int len);
+
 #endif /* CACHE_H */
diff --git a/mbox.c b/mbox.c
new file mode 100644
index 0000000..75f3150
--- /dev/null
+++ b/mbox.c
@@ -0,0 +1,32 @@
+#include "cache.h"
+
+int is_from_line(const char *line, int len)
+{
+	const char *colon;
+
+	if (len < 20 || memcmp("From ", line, 5))
+		return 0;
+
+	colon = line + len - 2;
+	line += 5;
+	for (;;) {
+		if (colon < line)
+			return 0;
+		if (*--colon == ':')
+			break;
+	}
+
+	if (!isdigit(colon[-4]) ||
+	    !isdigit(colon[-2]) ||
+	    !isdigit(colon[-1]) ||
+	    !isdigit(colon[ 1]) ||
+	    !isdigit(colon[ 2]))
+		return 0;
+
+	/* year */
+	if (strtol(colon+3, NULL, 10) <= 90)
+		return 0;
+
+	/* Ok, close enough */
+	return 1;
+}
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 3e64a7a..9e1ad1c 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' '
 
 '
 
+test_expect_success 'mailinfo finds headers after embedded From line' '
+	mkdir embed-from &&
+	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailinfo embed-from/msg embed-from/patch \
+	  <embed-from/0001 >embed-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+'
+
+test_expect_success 'mailinfo on message with quoted >From' '
+	mkdir quoted-from &&
+	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailinfo quoted-from/msg quoted-from/patch \
+	  <quoted-from/0001 >quoted-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+'
+
 test_done
diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
new file mode 100644
index 0000000..06a3a38
--- /dev/null
+++ b/t/t5100/embed-from.expect
@@ -0,0 +1,5 @@
+Author: Commit Author
+Email: commit@example.com
+Subject: patch subject
+Date: Sat, 13 Sep 2014 21:13:23 -0400 
+
diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
new file mode 100644
index 0000000..5f3f84e
--- /dev/null
+++ b/t/t5100/embed-from.in
@@ -0,0 +1,13 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Email Author <email@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] email subject
+
+>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Commit Author <commit@example.com>
+Date: Sat, 13 Sep 2014 21:13:23 -0400
+Subject: patch subject
+
+patch body
+---
+patch
diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
new file mode 100644
index 0000000..8c9d48c
--- /dev/null
+++ b/t/t5100/quoted-from.expect
@@ -0,0 +1,3 @@
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
new file mode 100644
index 0000000..847e1c4
--- /dev/null
+++ b/t/t5100/quoted-from.in
@@ -0,0 +1,10 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Author Name <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted >From
+
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
+---
+patch
-- 
2.1.0.373.g91ca799

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-14  1:01                       ` Jeff King
  2014-09-14  1:30                         ` Jeff King
@ 2014-09-15 17:55                         ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2014-09-15 17:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

Jeff King <peff@peff.net> writes:

> On Sat, Sep 13, 2014 at 05:55:49PM -0700, Junio C Hamano wrote:
>
>> On Sat, Sep 13, 2014 at 5:47 PM, Jeff King <peff@peff.net> wrote:
>> >
>> > On Sat, Sep 13, 2014 at 10:57:14PM +0000, brian m. carlson wrote:
>> >
>> > > I wonder if git send-email should do what mutt does in this case, which
>> > > is use quoted-printable encoding and encode the first F as =46 (as well
>> > > as any equals signs as =3D).  It looks like mailinfo.c already is
>> > > capable of handling that, and that would avoid the entire issue.
>> >
>> > That's not an unreasonable tactic. However, I think we'd still want to
>> > do something with mailinfo on the receiving end, similar to the patch I
>> > sent. We don't know that the sending side is necessarily send-email.
>> 
>> Hmm, isn't the ">" stuffing in front of a beginning-of-line "From " purely
>> a local matter of MUA that stores messages in (old-style) mbox format
>> where a line that begins with "From " is what defines the end of the
>> previous message?
>
> Yes, it is[1].
>
>> Why should send-email do anything when it sends individual messages
>> separately out?
>
> It does not need to, but the QP-transformation helps protect against
> other, stupider software downstream.  And unlike From-quoting it is
> actually well-specified and reversible.

Oh, I was only reacting to a phantom suggestion nobody made to add
">" on the sending side (which would not help anybody), but now I
re-read the thread with a larger screen I realize nobody made such a
suggestion.  Sorry for a noise.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-14  1:30                         ` Jeff King
@ 2014-09-15 18:56                           ` Junio C Hamano
  2014-09-15 20:15                             ` Junio C Hamano
  2014-09-16  0:12                             ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2014-09-15 18:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

Jeff King <peff@peff.net> writes:

> It looks like we have a reasonably sane is_from_line() function. So at
> least _we_ will not generally break on reading our own output, except in
> some extreme circumstances (you'd have to come up with something
> contrived like "From me, at 10:30 30 minutes before 11!").
>
> We can pretty easily reuse this to make the from-line check in mailinfo
> more robust. Here's a replacement for the patch I sent earlier that
> keeps the "magically ignore extra >From headers" behavior but fixes the
> case that started this discussion.

Why cache.h when this is still only between mail{info,split}.c both
of which do not really deal with any "Git" data?

For mailsplit, we are trying to detect mbox boundary various MUAs
would use in their output, and is_from_line() may be appropriate,
but I am not sure if the same logic is appropriate for mailinfo.
What are we trying to protect us against?  Those who paste a single
e-mail output from format-patch in full?  Do people paste a single
e-mail received to their mailbox from somebody else and do we need
to protect against that, skipping the ">From " thing, while risking
to skip "From me at 10:30 30 minutes..."?

If we only want to skip ">?From" in pasted format-patch output, we
would want a rule in mailinfo that is tighter than is_from_line() in
mailsplit.

By the way, I see ">From " in is_rfc2822_header(), too.  Do we have
to worry about this comparison as well?

> -- >8 --
> Subject: mailinfo: make ">From" in-body header check more robust
>
> Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
> inside body, 2006-05-21), we have treated lines like ">From"
> in the body as headers. This makes "git am" work for people
> who erroneously paste the whole mbox entry:
>
>   From 12345abcd...
>   From: them
>   Subject: [PATCH] whatever
>
> into their email body (assuming that an mbox writer then
> quotes "From" as ">From", as otherwise we would actually
> mailsplit on the in-body line).
>
> However, this has false positives if somebody actually has a
> commit body that starts with "From "; in this case we
> erroneously remove the line entirely from the commit
> message. We can make this check more robust by making sure
> the line actually looks like a real mbox "From" line.
>
> To do this we pull the "is_from_line" definition out of
> mailsplit, and make it available for general use.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Makefile                   |  1 +
>  builtin/mailinfo.c         |  2 +-
>  builtin/mailsplit.c        | 31 -------------------------------
>  cache.h                    |  6 ++++++
>  mbox.c                     | 32 ++++++++++++++++++++++++++++++++
>  t/t5100-mailinfo.sh        | 18 ++++++++++++++++++
>  t/t5100/embed-from.expect  |  5 +++++
>  t/t5100/embed-from.in      | 13 +++++++++++++
>  t/t5100/quoted-from.expect |  3 +++
>  t/t5100/quoted-from.in     | 10 ++++++++++
>  10 files changed, 89 insertions(+), 32 deletions(-)
>  create mode 100644 mbox.c
>  create mode 100644 t/t5100/embed-from.expect
>  create mode 100644 t/t5100/embed-from.in
>  create mode 100644 t/t5100/quoted-from.expect
>  create mode 100644 t/t5100/quoted-from.in
>
> diff --git a/Makefile b/Makefile
> index e0f15a3..e018450 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
>  LIB_OBJS += mailmap.o
>  LIB_OBJS += match-trees.o
> +LIB_OBJS += mbox.o
>  LIB_OBJS += merge.o
>  LIB_OBJS += merge-blobs.o
>  LIB_OBJS += merge-recursive.o
> diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
> index cf11c8d..a434d39 100644
> --- a/builtin/mailinfo.c
> +++ b/builtin/mailinfo.c
> @@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line,
>  
>  	/* for inbody stuff */
>  	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
> -		ret = 1; /* Should this return 0? */
> +		ret = is_from_line(line->buf + 1, line->len - 1);
>  		goto check_header_out;
>  	}
>  	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 763cda0..11ba281 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -12,37 +12,6 @@
>  static const char git_mailsplit_usage[] =
>  "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
>  
> -static int is_from_line(const char *line, int len)
> -{
> -	const char *colon;
> -
> -	if (len < 20 || memcmp("From ", line, 5))
> -		return 0;
> -
> -	colon = line + len - 2;
> -	line += 5;
> -	for (;;) {
> -		if (colon < line)
> -			return 0;
> -		if (*--colon == ':')
> -			break;
> -	}
> -
> -	if (!isdigit(colon[-4]) ||
> -	    !isdigit(colon[-2]) ||
> -	    !isdigit(colon[-1]) ||
> -	    !isdigit(colon[ 1]) ||
> -	    !isdigit(colon[ 2]))
> -		return 0;
> -
> -	/* year */
> -	if (strtol(colon+3, NULL, 10) <= 90)
> -		return 0;
> -
> -	/* Ok, close enough */
> -	return 1;
> -}
> -
>  static struct strbuf buf = STRBUF_INIT;
>  static int keep_cr;
>  
> diff --git a/cache.h b/cache.h
> index dfa1a56..9e71ca5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv, int fd);
>  
>  int versioncmp(const char *s1, const char *s2);
>  
> +/*
> + * Returns true if the line appears to be an mbox "From" line starting a new
> + * message.
> + */
> +int is_from_line(const char *line, int len);
> +
>  #endif /* CACHE_H */
> diff --git a/mbox.c b/mbox.c
> new file mode 100644
> index 0000000..75f3150
> --- /dev/null
> +++ b/mbox.c
> @@ -0,0 +1,32 @@
> +#include "cache.h"
> +
> +int is_from_line(const char *line, int len)
> +{
> +	const char *colon;
> +
> +	if (len < 20 || memcmp("From ", line, 5))
> +		return 0;
> +
> +	colon = line + len - 2;
> +	line += 5;
> +	for (;;) {
> +		if (colon < line)
> +			return 0;
> +		if (*--colon == ':')
> +			break;
> +	}
> +
> +	if (!isdigit(colon[-4]) ||
> +	    !isdigit(colon[-2]) ||
> +	    !isdigit(colon[-1]) ||
> +	    !isdigit(colon[ 1]) ||
> +	    !isdigit(colon[ 2]))
> +		return 0;
> +
> +	/* year */
> +	if (strtol(colon+3, NULL, 10) <= 90)
> +		return 0;
> +
> +	/* Ok, close enough */
> +	return 1;
> +}
> diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
> index 3e64a7a..9e1ad1c 100755
> --- a/t/t5100-mailinfo.sh
> +++ b/t/t5100-mailinfo.sh
> @@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' '
>  
>  '
>  
> +test_expect_success 'mailinfo finds headers after embedded From line' '
> +	mkdir embed-from &&
> +	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
> +	git mailinfo embed-from/msg embed-from/patch \
> +	  <embed-from/0001 >embed-from/out &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
> +'
> +
> +test_expect_success 'mailinfo on message with quoted >From' '
> +	mkdir quoted-from &&
> +	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
> +	git mailinfo quoted-from/msg quoted-from/patch \
> +	  <quoted-from/0001 >quoted-from/out &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
> +'
> +
>  test_done
> diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
> new file mode 100644
> index 0000000..06a3a38
> --- /dev/null
> +++ b/t/t5100/embed-from.expect
> @@ -0,0 +1,5 @@
> +Author: Commit Author
> +Email: commit@example.com
> +Subject: patch subject
> +Date: Sat, 13 Sep 2014 21:13:23 -0400 
> +
> diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
> new file mode 100644
> index 0000000..5f3f84e
> --- /dev/null
> +++ b/t/t5100/embed-from.in
> @@ -0,0 +1,13 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Email Author <email@example.com>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] email subject
> +
> +>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Commit Author <commit@example.com>
> +Date: Sat, 13 Sep 2014 21:13:23 -0400
> +Subject: patch subject
> +
> +patch body
> +---
> +patch
> diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
> new file mode 100644
> index 0000000..8c9d48c
> --- /dev/null
> +++ b/t/t5100/quoted-from.expect
> @@ -0,0 +1,3 @@
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
> new file mode 100644
> index 0000000..847e1c4
> --- /dev/null
> +++ b/t/t5100/quoted-from.in
> @@ -0,0 +1,10 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Author Name <somebody@example.com>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] testing quoted >From
> +
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> +---
> +patch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-15 18:56                           ` Junio C Hamano
@ 2014-09-15 20:15                             ` Junio C Hamano
  2014-09-16  0:19                               ` Jeff King
  2014-09-16  0:12                             ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2014-09-15 20:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

Junio C Hamano <gitster@pobox.com> writes:

> Why cache.h when this is still only between mail{info,split}.c both
> of which do not really deal with any "Git" data?
>
> For mailsplit, we are trying to detect mbox boundary various MUAs
> would use in their output, and is_from_line() may be appropriate,
> but I am not sure if the same logic is appropriate for mailinfo.
> What are we trying to protect us against?  Those who paste a single
> e-mail output from format-patch in full?  Do people paste a single
> e-mail received to their mailbox from somebody else and do we need
> to protect against that, skipping the ">From " thing, while risking
> to skip "From me at 10:30 30 minutes..."?
>
> If we only want to skip ">?From" in pasted format-patch output, we
> would want a rule in mailinfo that is tighter than is_from_line() in
> mailsplit.

That is, something like this on top of your patch.  Or is this a bit
too strict?

 Makefile            |  1 +
 builtin/mailinfo.c  |  3 ++-
 builtin/mailsplit.c |  1 +
 cache.h             |  6 ------
 mbox.c              | 15 +++++++++++++++
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index dc5d2af..c0491a3 100644
--- a/Makefile
+++ b/Makefile
@@ -686,6 +686,7 @@ LIB_H += list-objects.h
 LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
+LIB_H += mbox.h
 LIB_H += merge-blobs.h
 LIB_H += merge-recursive.h
 LIB_H += mergesort.h
diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index a434d39..ccccd69 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -6,6 +6,7 @@
 #include "builtin.h"
 #include "utf8.h"
 #include "strbuf.h"
+#include "mbox.h"
 
 static FILE *cmitmsg, *patchfile, *fin, *fout;
 
@@ -329,7 +330,7 @@ static int check_header(const struct strbuf *line,
 
 	/* for inbody stuff */
 	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
-		ret = is_from_line(line->buf + 1, line->len - 1);
+		ret = is_format_patch_separator(line->buf + 1, line->len - 1);
 		goto check_header_out;
 	}
 	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 775588e..d8da1e4 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -8,6 +8,7 @@
 #include "builtin.h"
 #include "string-list.h"
 #include "strbuf.h"
+#include "mbox.h"
 
 static const char git_mailsplit_usage[] =
 "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
diff --git a/cache.h b/cache.h
index eae58e7..fcb511d 100644
--- a/cache.h
+++ b/cache.h
@@ -1502,10 +1502,4 @@ void stat_validity_update(struct stat_validity *sv, int fd);
 
 int versioncmp(const char *s1, const char *s2);
 
-/*
- * Returns true if the line appears to be an mbox "From" line starting a new
- * message.
- */
-int is_from_line(const char *line, int len);
-
 #endif /* CACHE_H */
diff --git a/mbox.c b/mbox.c
index 75f3150..2ab2f85 100644
--- a/mbox.c
+++ b/mbox.c
@@ -30,3 +30,18 @@ int is_from_line(const char *line, int len)
 	/* Ok, close enough */
 	return 1;
 }
+
+#define SAMPLE "From e6807f3efca28b30decfecb1732a56c7db1137ee Mon Sep 17 00:00:00 2001\n"
+int is_format_patch_separator(const char *line, int len)
+{
+	const char *cp;
+
+	if (len != strlen(SAMPLE))
+		return 0;
+	if (!skip_prefix(line, "From ", &cp))
+		return 0;
+	if (strspn(cp, "0123456789abcdef") != 40)
+		return 0;
+	cp += 40;
+	return !memcmp(SAMPLE + (cp - line), cp, strlen(SAMPLE) - (cp - line));
+}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-15 18:56                           ` Junio C Hamano
  2014-09-15 20:15                             ` Junio C Hamano
@ 2014-09-16  0:12                             ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff King @ 2014-09-16  0:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Mon, Sep 15, 2014 at 11:56:16AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It looks like we have a reasonably sane is_from_line() function. So at
> > least _we_ will not generally break on reading our own output, except in
> > some extreme circumstances (you'd have to come up with something
> > contrived like "From me, at 10:30 30 minutes before 11!").
> >
> > We can pretty easily reuse this to make the from-line check in mailinfo
> > more robust. Here's a replacement for the patch I sent earlier that
> > keeps the "magically ignore extra >From headers" behavior but fixes the
> > case that started this discussion.
> 
> Why cache.h when this is still only between mail{info,split}.c both
> of which do not really deal with any "Git" data?

My thought process was basically:

  1. It should go into libgit.a, so we don't end up with an accidental
     libgit->builtin dependency.

  2. It is only one function, and we seem to stuff everything into
     cache.h, so it is probably not a big deal to do it there.

I'd be happy with mbox.h, too.

> For mailsplit, we are trying to detect mbox boundary various MUAs
> would use in their output, and is_from_line() may be appropriate,
> but I am not sure if the same logic is appropriate for mailinfo.
> What are we trying to protect us against?  Those who paste a single
> e-mail output from format-patch in full?  Do people paste a single
> e-mail received to their mailbox from somebody else and do we need
> to protect against that, skipping the ">From " thing, while risking
> to skip "From me at 10:30 30 minutes..."?
> 
> If we only want to skip ">?From" in pasted format-patch output, we
> would want a rule in mailinfo that is tighter than is_from_line() in
> mailsplit.

Yes, the current is_from_line is much looser than we need here. It must
parse lines from an arbitrary writer, and you are right that here we are
specifically interested in git header-lines (at least that is the
rationale given by your 81c5cf7, and what I was trying to preserve).

I mostly just didn't think it mattered much. Despite coming up with a
silly false positive, I doubt it would happen in practice (remember that
in addition to matching, it must _also_ be in an in-body header block at
the start of the body).

> By the way, I see ">From " in is_rfc2822_header(), too.  Do we have
> to worry about this comparison as well?

I don't think so. We call that from read_one_header_line, which is used
when running through the _real_ rfc2822 header. Such a ">From" line
there would not make any sense (we skip "From" at arbitrary points in
the header, too, which is kind of weird; our real goal is to skip the
initial one, which we could do separately). But it does not hurt to be
overly aggressive in eating them there, as they cannot possibly be body
lines.

We do the same thing after a multipart boundary, so if you had:

  <regular rfc2822 email headers>
  Content-Type: multipart/mixed; boundary=foo

  --foo
  Content-Type: text/plain

  your message

  --foo
  Content-Type: message/rfc822

  From blah blah blah
  <commit's rfc2822 email headers>

  --foo--

then I guess it would help. I am not sure that doing so is actually
valid (you do not need the mbox splitter inside the multipart, and it is
actively wrong and a potential danger to mbox readers to put it there).
But if you did put it there, _and_ an mbox writer quoted that line to
">From", I guess we would want to ignore it.

So from my reading (and I am not 100% sure that I am not missing
something), I think we could be much stricter in our parsing there. But
given that it is not causing any problems, and has been that way for
quite a while, I am quite hesitant to change it.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-15 20:15                             ` Junio C Hamano
@ 2014-09-16  0:19                               ` Jeff King
  2014-09-16 18:01                                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2014-09-16  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Mon, Sep 15, 2014 at 01:15:35PM -0700, Junio C Hamano wrote:

> > If we only want to skip ">?From" in pasted format-patch output, we
> > would want a rule in mailinfo that is tighter than is_from_line() in
> > mailsplit.
> 
> That is, something like this on top of your patch.  Or is this a bit
> too strict?

The only cases that I can think of that would be a problem with this
strictness are:

  1. Somebody writes format-patch output to a file, reads in the mbox
     using another program, and then writes out the result (munging the
     mbox From line). And then pastes the whole thing into their email
     body.

     I can see the first part happening. But given that it is totally
     irrelevant _unless_ they then screw up and paste the From line in
     the body (which is already a corner case), it probably doesn't
     matter.

  2. We change the static From lines that git generates. We can always
     update the parser, of course, but it may be running a different
     version of git than the sender.  People with an old git running
     "git am" would stop skipping past "From" lines in messages from
     people on newer gits.

Again, this eating of the in-body "From" line is already a corner case,
so it's not the end of the world if it breaks in a few cases. But I'd
also be fine with just leaving it looser.

>  Makefile            |  1 +
>  builtin/mailinfo.c  |  3 ++-
>  builtin/mailsplit.c |  1 +
>  cache.h             |  6 ------
>  mbox.c              | 15 +++++++++++++++
>  5 files changed, 19 insertions(+), 7 deletions(-)

I think you forgot to "git add" mbox.h. That being said, if we did go
this route, I do not see any reason to share the code at all. This can
be purely a mailinfo.c thing.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-16  0:19                               ` Jeff King
@ 2014-09-16 18:01                                 ` Junio C Hamano
  2014-09-16 18:41                                   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2014-09-16 18:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

Jeff King <peff@peff.net> writes:

> The only cases that I can think of that would be a problem with this
> strictness are:
>
>   1. Somebody writes format-patch output to a file, reads in the mbox
>      using another program, and then writes out the result (munging the
>      mbox From line). And then pastes the whole thing into their email
>      body.
>
>      I can see the first part happening. But given that it is totally
>      irrelevant _unless_ they then screw up and paste the From line in
>      the body (which is already a corner case), it probably doesn't
>      matter.

Yeah, I tend to agree.

>   2. We change the static From lines that git generates. We can always
>      update the parser, of course, but it may be running a different
>      version of git than the sender.  People with an old git running
>      "git am" would stop skipping past "From" lines in messages from
>      people on newer gits.

I hope that it is not going to happen; the reason we refrain from
ever changing the datestamp has been to keep it constant to help
those who write magic(5), and I do not think we have a reason to
defeat that.

> I think you forgot to "git add" mbox.h. That being said, if we did go
> this route, I do not see any reason to share the code at all. This can
> be purely a mailinfo.c thing.

OK.  A reroll coming today when I find time.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-16 18:01                                 ` Junio C Hamano
@ 2014-09-16 18:41                                   ` Junio C Hamano
  2014-09-16 20:29                                     ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2014-09-16 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

Junio C Hamano <gitster@pobox.com> writes:

>> I think you forgot to "git add" mbox.h. That being said, if we did go
>> this route, I do not see any reason to share the code at all. This can
>> be purely a mailinfo.c thing.
>
> OK.  A reroll coming today when I find time.

-- >8 --
From: Jeff King <peff@peff.net>
Date: Sat, 13 Sep 2014 21:30:38 -0400
Subject: [PATCH] mailinfo: make ">From" in-body header check more robust

Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line inside
body, 2006-05-21), we have treated lines like ">From" in the body as
headers. This makes "git am" work for people who erroneously paste
the whole output from format-patch:

  From 12345abcd...fedcba543210 Mon Sep 17 00:00:00 2001
  From: them
  Subject: [PATCH] whatever

into their email body (assuming that an mbox writer then quotes
"From" as ">From", as otherwise we would actually mailsplit on the
in-body line).

However, this has false positives if somebody actually has a commit
body that starts with "From "; in this case we erroneously remove
the line entirely from the commit message. We can make this check
more robust by making sure the line actually looks like a real mbox
"From" line.

Inspect the line that begins with ">From " a more carefully to only
skip lines that match the expected pattern (note that the datestamp
part of the format-patch output is designed to be kept constant to
help those who write magic(5) entries).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/mailinfo.c         | 17 ++++++++++++++++-
 t/t5100-mailinfo.sh        | 18 ++++++++++++++++++
 t/t5100/embed-from.expect  |  5 +++++
 t/t5100/embed-from.in      | 13 +++++++++++++
 t/t5100/quoted-from.expect |  3 +++
 t/t5100/quoted-from.in     | 10 ++++++++++
 6 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 t/t5100/embed-from.expect
 create mode 100644 t/t5100/embed-from.in
 create mode 100644 t/t5100/quoted-from.expect
 create mode 100644 t/t5100/quoted-from.in

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index cf11c8d..2632fb0 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -288,6 +288,21 @@ static inline int cmp_header(const struct strbuf *line, const char *hdr)
 			line->buf[len] == ':' && isspace(line->buf[len + 1]);
 }
 
+#define SAMPLE "From e6807f3efca28b30decfecb1732a56c7db1137ee Mon Sep 17 00:00:00 2001\n"
+static int is_format_patch_separator(const char *line, int len)
+{
+	const char *cp;
+
+	if (len != strlen(SAMPLE))
+		return 0;
+	if (!skip_prefix(line, "From ", &cp))
+		return 0;
+	if (strspn(cp, "0123456789abcdef") != 40)
+		return 0;
+	cp += 40;
+	return !memcmp(SAMPLE + (cp - line), cp, strlen(SAMPLE) - (cp - line));
+}
+
 static int check_header(const struct strbuf *line,
 				struct strbuf *hdr_data[], int overwrite)
 {
@@ -329,7 +344,7 @@ static int check_header(const struct strbuf *line,
 
 	/* for inbody stuff */
 	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
-		ret = 1; /* Should this return 0? */
+		ret = is_format_patch_separator(line->buf + 1, line->len - 1);
 		goto check_header_out;
 	}
 	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 3e64a7a..9e1ad1c 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' '
 
 '
 
+test_expect_success 'mailinfo finds headers after embedded From line' '
+	mkdir embed-from &&
+	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailinfo embed-from/msg embed-from/patch \
+	  <embed-from/0001 >embed-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+'
+
+test_expect_success 'mailinfo on message with quoted >From' '
+	mkdir quoted-from &&
+	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailinfo quoted-from/msg quoted-from/patch \
+	  <quoted-from/0001 >quoted-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+'
+
 test_done
diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
new file mode 100644
index 0000000..06a3a38
--- /dev/null
+++ b/t/t5100/embed-from.expect
@@ -0,0 +1,5 @@
+Author: Commit Author
+Email: commit@example.com
+Subject: patch subject
+Date: Sat, 13 Sep 2014 21:13:23 -0400 
+
diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
new file mode 100644
index 0000000..5f3f84e
--- /dev/null
+++ b/t/t5100/embed-from.in
@@ -0,0 +1,13 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Email Author <email@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] email subject
+
+>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Commit Author <commit@example.com>
+Date: Sat, 13 Sep 2014 21:13:23 -0400
+Subject: patch subject
+
+patch body
+---
+patch
diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
new file mode 100644
index 0000000..8c9d48c
--- /dev/null
+++ b/t/t5100/quoted-from.expect
@@ -0,0 +1,3 @@
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
new file mode 100644
index 0000000..847e1c4
--- /dev/null
+++ b/t/t5100/quoted-from.in
@@ -0,0 +1,10 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Author Name <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted >From
+
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
+---
+patch
-- 
2.1.0-420-g23b5121

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers
  2014-09-16 18:41                                   ` Junio C Hamano
@ 2014-09-16 20:29                                     ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2014-09-16 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dan Carpenter, Mark Einon, Greg KH

On Tue, Sep 16, 2014 at 11:41:08AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> I think you forgot to "git add" mbox.h. That being said, if we did go
> >> this route, I do not see any reason to share the code at all. This can
> >> be purely a mailinfo.c thing.
> >
> > OK.  A reroll coming today when I find time.
> 
> -- >8 --
> From: Jeff King <peff@peff.net>
> Date: Sat, 13 Sep 2014 21:30:38 -0400
> Subject: [PATCH] mailinfo: make ">From" in-body header check more robust
>[...]

This looks good to me. Thanks.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-09-16 20:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1410472786-14552-1-git-send-email-mark.einon@gmail.com>
     [not found] ` <1410472786-14552-5-git-send-email-mark.einon@gmail.com>
2014-09-13  9:37   ` [PATCH 4/8] staging: et131x: Remove ununsed statistics Dan Carpenter
2014-09-13 15:45     ` Greg KH
2014-09-13 19:41       ` Dan Carpenter
2014-09-13 20:36       ` Jeff King
2014-09-13 20:47         ` Mark Einon
2014-09-13 20:57           ` Dan Carpenter
2014-09-13 21:06             ` Mark Einon
2014-09-13 21:09             ` Dan Carpenter
2014-09-13 21:25               ` [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers Jeff King
2014-09-13 22:57                 ` brian m. carlson
2014-09-14  0:47                   ` Jeff King
2014-09-14  0:55                     ` Junio C Hamano
2014-09-14  1:01                       ` Jeff King
2014-09-14  1:30                         ` Jeff King
2014-09-15 18:56                           ` Junio C Hamano
2014-09-15 20:15                             ` Junio C Hamano
2014-09-16  0:19                               ` Jeff King
2014-09-16 18:01                                 ` Junio C Hamano
2014-09-16 18:41                                   ` Junio C Hamano
2014-09-16 20:29                                     ` Jeff King
2014-09-16  0:12                             ` Jeff King
2014-09-15 17:55                         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.