All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH 0/3] support mboxrd format
@ 2016-05-30 23:21 Eric Wong
  2016-05-30 23:21 ` [PATCH 1/3] pretty: support "mboxrd" output format Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Eric Wong @ 2016-05-30 23:21 UTC (permalink / raw)
  To: git

Sometimes users will copy+paste an entire mbox into a
commit message, leading to bad splits when a patch is
output as an email.

Unlike other mbox-family formats, mboxrd allows reversible
round-tripping while avoiding bad splits for old "mboxo"
readers.

I'm also considering altering the current
"From ${COMMIT} Mon Sep 17 00:00:00 2001" line to something
else so mailsplit (or "am") can autodetect.

Maybe:
	From ${COMMIT}@mboxrd Mon Sep 17 00:00:00 2001
?


We may also want to default to single escaping "From " in
commit messages for --pretty=email to avoid corruption when
somebody copy+pastes an mbox into the commit message.
This is a technically incompatible change, but I think it's
preferable to breaking splitting complete.

In other words, --pretty=email changes to output "mboxo"
for now.

Long term (possibly git 3.0?), maybe mboxrd can become the
default mail format.  IMHO, it should've been since 2005.

ref: http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html

Eric Wong (3):
      pretty: support "mboxrd" output format
      mailsplit: support unescaping mboxrd messages
      am: support --patch-format=mboxrd

 Documentation/git-am.txt        |  3 ++-
 Documentation/git-mailsplit.txt |  7 ++++++-
 builtin/am.c                    | 14 ++++++++++---
 builtin/log.c                   |  2 +-
 builtin/mailsplit.c             | 23 +++++++++++++++++++++
 commit.h                        |  6 ++++++
 log-tree.c                      |  4 ++--
 pretty.c                        | 45 +++++++++++++++++++++++++++++++++--------
 t/t4014-format-patch.sh         | 27 +++++++++++++++++++++++++
 t/t4150-am.sh                   | 20 ++++++++++++++++++
 t/t5100-mailinfo.sh             | 13 ++++++++++++
 t/t5100/0001mboxrd              |  4 ++++
 t/t5100/0002mboxrd              |  3 +++
 t/t5100/sample.mboxrd           | 17 ++++++++++++++++
 14 files changed, 172 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-30 23:21 [RFC/PATCH 0/3] support mboxrd format Eric Wong
@ 2016-05-30 23:21 ` Eric Wong
  2016-05-31  3:40   ` Eric Sunshine
  2016-05-30 23:21 ` [PATCH 2/3] mailsplit: support unescaping mboxrd messages Eric Wong
  2016-05-30 23:21 ` [PATCH 3/3] am: support --patch-format=mboxrd Eric Wong
  2 siblings, 1 reply; 18+ messages in thread
From: Eric Wong @ 2016-05-30 23:21 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

This output format prevents format-patch output from breaking
readers if somebody copy+pasted an mbox into a commit message.

Unlike the traditional "mboxo" format, "mboxrd" is designed to
be fully-reversible.  "mboxrd" also gracefully degrades to
showing extra ">" in existing "mboxo" readers.

This degradation is preferable to breaking message splitting
completely, a problem I've seen in "mboxcl" due to having
multiple, non-existent, or inaccurate Content-Length headers.

"mboxcl2" is a non-starter since it's inherits the problems
of "mboxcl" while being completely incompatible with existing
tooling based around mailsplit.

ref: http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/log.c           |  2 +-
 commit.h                |  6 ++++++
 log-tree.c              |  4 ++--
 pretty.c                | 45 +++++++++++++++++++++++++++++++++++++--------
 t/t4014-format-patch.sh | 27 +++++++++++++++++++++++++++
 5 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 099f4f7..6d6f368 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -953,7 +953,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	struct pretty_print_context pp = {0};
 	struct commit *head = list[0];
 
-	if (rev->commit_format != CMIT_FMT_EMAIL)
+	if (!cmit_fmt_is_mail(rev->commit_format))
 		die(_("Cover letter needs email format"));
 
 	committer = git_committer_info(0);
diff --git a/commit.h b/commit.h
index b06db4d..1e04d3a 100644
--- a/commit.h
+++ b/commit.h
@@ -131,11 +131,17 @@ enum cmit_fmt {
 	CMIT_FMT_FULLER,
 	CMIT_FMT_ONELINE,
 	CMIT_FMT_EMAIL,
+	CMIT_FMT_MBOXRD,
 	CMIT_FMT_USERFORMAT,
 
 	CMIT_FMT_UNSPECIFIED
 };
 
+static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
+{
+	return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
+}
+
 struct pretty_print_context {
 	/*
 	 * Callers should tweak these to change the behavior of pp_* functions.
diff --git a/log-tree.c b/log-tree.c
index 78a5381..48daf84 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -603,7 +603,7 @@ void show_log(struct rev_info *opt)
 	 * Print header line of header..
 	 */
 
-	if (opt->commit_format == CMIT_FMT_EMAIL) {
+	if (cmit_fmt_is_mail(opt->commit_format)) {
 		log_write_email_headers(opt, commit, &ctx.subject, &extra_headers,
 					&ctx.need_8bit_cte);
 	} else if (opt->commit_format != CMIT_FMT_USERFORMAT) {
@@ -694,7 +694,7 @@ void show_log(struct rev_info *opt)
 
 	if ((ctx.fmt != CMIT_FMT_USERFORMAT) &&
 	    ctx.notes_message && *ctx.notes_message) {
-		if (ctx.fmt == CMIT_FMT_EMAIL) {
+		if (cmit_fmt_is_mail(ctx.fmt)) {
 			strbuf_addstr(&msgbuf, "---\n");
 			opt->shown_dashes = 1;
 		}
diff --git a/pretty.c b/pretty.c
index 87c4497..a33e604 100644
--- a/pretty.c
+++ b/pretty.c
@@ -92,6 +92,7 @@ static void setup_commit_formats(void)
 		{ "medium",	CMIT_FMT_MEDIUM,	0,	8 },
 		{ "short",	CMIT_FMT_SHORT,		0,	0 },
 		{ "email",	CMIT_FMT_EMAIL,		0,	0 },
+		{ "mboxrd",	CMIT_FMT_MBOXRD,	0,	0 },
 		{ "fuller",	CMIT_FMT_FULLER,	0,	8 },
 		{ "full",	CMIT_FMT_FULL,		0,	8 },
 		{ "oneline",	CMIT_FMT_ONELINE,	1,	0 }
@@ -444,7 +445,7 @@ void pp_user_info(struct pretty_print_context *pp,
 	if (pp->mailmap)
 		map_user(pp->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
 
-	if (pp->fmt == CMIT_FMT_EMAIL) {
+	if (cmit_fmt_is_mail(pp->fmt)) {
 		if (pp->from_ident && ident_cmp(pp->from_ident, &ident)) {
 			struct strbuf buf = STRBUF_INIT;
 
@@ -494,6 +495,7 @@ void pp_user_info(struct pretty_print_context *pp,
 			    show_ident_date(&ident, &pp->date_mode));
 		break;
 	case CMIT_FMT_EMAIL:
+	case CMIT_FMT_MBOXRD:
 		strbuf_addf(sb, "Date: %s\n",
 			    show_ident_date(&ident, DATE_MODE(RFC2822)));
 		break;
@@ -535,7 +537,7 @@ static void add_merge_info(const struct pretty_print_context *pp,
 {
 	struct commit_list *parent = commit->parents;
 
-	if ((pp->fmt == CMIT_FMT_ONELINE) || (pp->fmt == CMIT_FMT_EMAIL) ||
+	if ((pp->fmt == CMIT_FMT_ONELINE) || (cmit_fmt_is_mail(pp->fmt)) ||
 	    !parent || !parent->next)
 		return;
 
@@ -1614,7 +1616,7 @@ void pp_title_line(struct pretty_print_context *pp,
 	if (pp->after_subject) {
 		strbuf_addstr(sb, pp->after_subject);
 	}
-	if (pp->fmt == CMIT_FMT_EMAIL) {
+	if (cmit_fmt_is_mail(pp->fmt)) {
 		strbuf_addch(sb, '\n');
 	}
 
@@ -1697,12 +1699,34 @@ static void pp_handle_indent(struct pretty_print_context *pp,
 		strbuf_add(sb, line, linelen);
 }
 
+static regex_t *mboxrd_prepare(void)
+{
+	static regex_t preg;
+	const char re[] = "^>*From ";
+	int err = regcomp(&preg, re, REG_NOSUB | REG_EXTENDED);
+
+	if (err) {
+		char errbuf[1024];
+
+		regerror(err, &preg, errbuf, sizeof(errbuf));
+		regfree(&preg);
+		die("Cannot prepare regexp `%s': %s", re, errbuf);
+	}
+
+	return &preg;
+}
+
 void pp_remainder(struct pretty_print_context *pp,
 		  const char **msg_p,
 		  struct strbuf *sb,
 		  int indent)
 {
 	int first = 1;
+	static regex_t *mboxrd_from;
+
+	if (pp->fmt == CMIT_FMT_MBOXRD && !mboxrd_from)
+		mboxrd_from = mboxrd_prepare();
+
 	for (;;) {
 		const char *line = *msg_p;
 		int linelen = get_one_line(line);
@@ -1725,8 +1749,13 @@ void pp_remainder(struct pretty_print_context *pp,
 		else if (pp->expand_tabs_in_log)
 			strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
 					     line, linelen);
-		else
+		else {
+			if (pp->fmt == CMIT_FMT_MBOXRD &&
+					!regexec(mboxrd_from, line, 0, 0, 0))
+				strbuf_addch(sb, '>');
+
 			strbuf_add(sb, line, linelen);
+		}
 		strbuf_addch(sb, '\n');
 	}
 }
@@ -1750,14 +1779,14 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	encoding = get_log_output_encoding();
 	msg = reencoded = logmsg_reencode(commit, NULL, encoding);
 
-	if (pp->fmt == CMIT_FMT_ONELINE || pp->fmt == CMIT_FMT_EMAIL)
+	if (pp->fmt == CMIT_FMT_ONELINE || cmit_fmt_is_mail(pp->fmt))
 		indent = 0;
 
 	/*
 	 * We need to check and emit Content-type: to mark it
 	 * as 8-bit if we haven't done so.
 	 */
-	if (pp->fmt == CMIT_FMT_EMAIL && need_8bit_cte == 0) {
+	if (cmit_fmt_is_mail(pp->fmt) && need_8bit_cte == 0) {
 		int i, ch, in_body;
 
 		for (in_body = i = 0; (ch = msg[i]); i++) {
@@ -1785,7 +1814,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	msg = skip_empty_lines(msg);
 
 	/* These formats treat the title line specially. */
-	if (pp->fmt == CMIT_FMT_ONELINE || pp->fmt == CMIT_FMT_EMAIL)
+	if (pp->fmt == CMIT_FMT_ONELINE || cmit_fmt_is_mail(pp->fmt))
 		pp_title_line(pp, &msg, sb, encoding, need_8bit_cte);
 
 	beginning_of_body = sb->len;
@@ -1802,7 +1831,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	 * format.  Make sure we did not strip the blank line
 	 * between the header and the body.
 	 */
-	if (pp->fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
+	if (cmit_fmt_is_mail(pp->fmt) && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
 
 	unuse_commit_buffer(commit, reencoded);
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 8049cad..0a2070e 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1565,4 +1565,31 @@ test_expect_success 'format-patch --base overrides format.useAutoBase' '
 	test_cmp expected actual
 '
 
+test_expect_success 'format-patch --pretty=mboxrd' '
+	cat >msg <<-INPUT_END &&
+	mboxrd should escape the body
+
+	From could trip up a loose mbox parser
+	>From extra escape for reversibility
+	>>From extra escape for reversibility 2
+	from lower case not escaped
+	Fromm bad speling not escaped
+	 From with leading space not escaped
+	INPUT_END
+
+	cat >expect <<-INPUT_END &&
+	>From could trip up a loose mbox parser
+	>>From extra escape for reversibility
+	>>>From extra escape for reversibility 2
+	from lower case not escaped
+	Fromm bad speling not escaped
+	 From with leading space not escaped
+	INPUT_END
+
+	C=$(git commit-tree HEAD^^{tree} -p HEAD <msg) &&
+	git format-patch --pretty=mboxrd --stdout -1 $C~1..$C >patch &&
+	grep -A5 "^>From could trip up a loose mbox parser" patch >actual &&
+	test_cmp expect actual
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] mailsplit: support unescaping mboxrd messages
  2016-05-30 23:21 [RFC/PATCH 0/3] support mboxrd format Eric Wong
  2016-05-30 23:21 ` [PATCH 1/3] pretty: support "mboxrd" output format Eric Wong
@ 2016-05-30 23:21 ` Eric Wong
  2016-05-30 23:21 ` [PATCH 3/3] am: support --patch-format=mboxrd Eric Wong
  2 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2016-05-30 23:21 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

This will allow us to parse the output of --pretty=mboxrd
and the output of other mboxrd generators.

Signed-off-by: Eric Wong <e@80x24.org>
---
 Documentation/git-mailsplit.txt |  7 ++++++-
 builtin/mailsplit.c             | 23 +++++++++++++++++++++++
 t/t5100-mailinfo.sh             | 13 +++++++++++++
 t/t5100/0001mboxrd              |  4 ++++
 t/t5100/0002mboxrd              |  3 +++
 t/t5100/sample.mboxrd           | 17 +++++++++++++++++
 6 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 t/t5100/0001mboxrd
 create mode 100644 t/t5100/0002mboxrd
 create mode 100644 t/t5100/sample.mboxrd

diff --git a/Documentation/git-mailsplit.txt b/Documentation/git-mailsplit.txt
index 4d1b871..e3b2a88 100644
--- a/Documentation/git-mailsplit.txt
+++ b/Documentation/git-mailsplit.txt
@@ -8,7 +8,8 @@ git-mailsplit - Simple UNIX mbox splitter program
 SYNOPSIS
 --------
 [verse]
-'git mailsplit' [-b] [-f<nn>] [-d<prec>] [--keep-cr] -o<directory> [--] [(<mbox>|<Maildir>)...]
+'git mailsplit' [-b] [-f<nn>] [-d<prec>] [--keep-cr] [--mboxrd]
+		-o<directory> [--] [(<mbox>|<Maildir>)...]
 
 DESCRIPTION
 -----------
@@ -47,6 +48,10 @@ OPTIONS
 --keep-cr::
 	Do not remove `\r` from lines ending with `\r\n`.
 
+--mboxrd::
+	Input is of the "mboxrd" format and "^>+From " line escaping is
+	reversed.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 4859ede..fad871d 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -45,6 +45,7 @@ static int is_from_line(const char *line, int len)
 
 static struct strbuf buf = STRBUF_INIT;
 static int keep_cr;
+static regex_t *gtfrom;
 
 /* Called with the first line (potentially partial)
  * already in buf[] -- normally that should begin with
@@ -77,6 +78,10 @@ static int split_one(FILE *mbox, const char *name, int allow_bare)
 			strbuf_addch(&buf, '\n');
 		}
 
+		if (gtfrom && buf.len > (sizeof(">From ") - 1) &&
+				!regexec(gtfrom, buf.buf, 0, 0, 0))
+			strbuf_remove(&buf, 0, 1);
+
 		if (fwrite(buf.buf, 1, buf.len, output) != buf.len)
 			die_errno("cannot write output");
 
@@ -242,6 +247,22 @@ out:
 	return ret;
 }
 
+static regex_t *gtfrom_prepare(void)
+{
+	static regex_t preg;
+	const char re[] = "^>+From ";
+	int err = regcomp(&preg, re, REG_NOSUB | REG_EXTENDED);
+
+	if (err) {
+		char errbuf[1024];
+		regerror(err, &preg, errbuf, sizeof(errbuf));
+		regfree(&preg);
+		die("Cannot prepare regexp `%s': %s", re, errbuf);
+	}
+
+	return &preg;
+}
+
 int cmd_mailsplit(int argc, const char **argv, const char *prefix)
 {
 	int nr = 0, nr_prec = 4, num = 0;
@@ -271,6 +292,8 @@ int cmd_mailsplit(int argc, const char **argv, const char *prefix)
 			keep_cr = 1;
 		} else if ( arg[1] == 'o' && arg[2] ) {
 			dir = arg+2;
+		} else if (!strcmp(arg, "--mboxrd")) {
+			gtfrom = gtfrom_prepare();
 		} else if ( arg[1] == '-' && !arg[2] ) {
 			argp++;	/* -- marks end of options */
 			break;
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 85b3df5..62b442c 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -111,4 +111,17 @@ test_expect_success 'mailinfo on message with quoted >From' '
 	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
 '
 
+test_expect_success 'mailinfo unescapes with --mboxrd' '
+	mkdir mboxrd &&
+	git mailsplit -omboxrd --mboxrd \
+		"$TEST_DIRECTORY"/t5100/sample.mboxrd >last &&
+	test x"$(cat last)" = x2 &&
+	for i in 0001 0002
+	do
+		git mailinfo mboxrd/msg mboxrd/patch \
+		  <mboxrd/$i >mboxrd/out &&
+		test_cmp "$TEST_DIRECTORY"/t5100/${i}mboxrd mboxrd/msg
+	done
+'
+
 test_done
diff --git a/t/t5100/0001mboxrd b/t/t5100/0001mboxrd
new file mode 100644
index 0000000..494ec55
--- /dev/null
+++ b/t/t5100/0001mboxrd
@@ -0,0 +1,4 @@
+From the beginning, mbox should have been mboxrd
+>From escaped
+From not mangled but this line should have been escaped
+
diff --git a/t/t5100/0002mboxrd b/t/t5100/0002mboxrd
new file mode 100644
index 0000000..3c30b0b
--- /dev/null
+++ b/t/t5100/0002mboxrd
@@ -0,0 +1,3 @@
+ >From unchanged
+ From also unchanged
+
diff --git a/t/t5100/sample.mboxrd b/t/t5100/sample.mboxrd
new file mode 100644
index 0000000..75de6dd
--- /dev/null
+++ b/t/t5100/sample.mboxrd
@@ -0,0 +1,17 @@
+From mboxrd Mon Sep 17 00:00:00 2001
+From: mboxrd writer <mboxrd@example.com>
+Date: Fri, 9 Jun 2006 00:44:16 -0700
+Subject: [PATCH] a commit with escaped From lines
+
+>From the beginning, mbox should have been mboxrd
+>>From escaped
+From not mangled but this line should have been escaped
+
+From mboxrd Mon Sep 17 00:00:00 2001
+From: mboxrd writer <mboxrd@example.com>
+Date: Fri, 9 Jun 2006 00:44:16 -0700
+Subject: [PATCH 2/2] another with fake From lines
+
+ >From unchanged
+ From also unchanged
+

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] am: support --patch-format=mboxrd
  2016-05-30 23:21 [RFC/PATCH 0/3] support mboxrd format Eric Wong
  2016-05-30 23:21 ` [PATCH 1/3] pretty: support "mboxrd" output format Eric Wong
  2016-05-30 23:21 ` [PATCH 2/3] mailsplit: support unescaping mboxrd messages Eric Wong
@ 2016-05-30 23:21 ` Eric Wong
  2 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2016-05-30 23:21 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

Combined with "git format-patch --pretty=mboxrd", this should
allow us to round-trip commit messages with embedded mbox
"From " lines without corruption.

Signed-off-by: Eric Wong <e@80x24.org>
---
 Documentation/git-am.txt |  3 ++-
 builtin/am.c             | 14 +++++++++++---
 t/t4150-am.sh            | 20 ++++++++++++++++++++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-am.txt b/Documentation/git-am.txt
index 13cdd7f..6348c29 100644
--- a/Documentation/git-am.txt
+++ b/Documentation/git-am.txt
@@ -116,7 +116,8 @@ default.   You can use `--no-utf8` to override this.
 	By default the command will try to detect the patch format
 	automatically. This option allows the user to bypass the automatic
 	detection and specify the patch format that the patch(es) should be
-	interpreted as. Valid formats are mbox, stgit, stgit-series and hg.
+	interpreted as. Valid formats are mbox, mboxrd,
+	stgit, stgit-series and hg.
 
 -i::
 --interactive::
diff --git a/builtin/am.c b/builtin/am.c
index 3dfe70b..d5da5fe 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -70,7 +70,8 @@ enum patch_format {
 	PATCH_FORMAT_MBOX,
 	PATCH_FORMAT_STGIT,
 	PATCH_FORMAT_STGIT_SERIES,
-	PATCH_FORMAT_HG
+	PATCH_FORMAT_HG,
+	PATCH_FORMAT_MBOXRD
 };
 
 enum keep_type {
@@ -712,7 +713,8 @@ done:
  * Splits out individual email patches from `paths`, where each path is either
  * a mbox file or a Maildir. Returns 0 on success, -1 on failure.
  */
-static int split_mail_mbox(struct am_state *state, const char **paths, int keep_cr)
+static int split_mail_mbox(struct am_state *state, const char **paths,
+				int keep_cr, int mboxrd)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	struct strbuf last = STRBUF_INIT;
@@ -724,6 +726,8 @@ static int split_mail_mbox(struct am_state *state, const char **paths, int keep_
 	argv_array_push(&cp.args, "-b");
 	if (keep_cr)
 		argv_array_push(&cp.args, "--keep-cr");
+	if (mboxrd)
+		argv_array_push(&cp.args, "--mboxrd");
 	argv_array_push(&cp.args, "--");
 	argv_array_pushv(&cp.args, paths);
 
@@ -965,13 +969,15 @@ static int split_mail(struct am_state *state, enum patch_format patch_format,
 
 	switch (patch_format) {
 	case PATCH_FORMAT_MBOX:
-		return split_mail_mbox(state, paths, keep_cr);
+		return split_mail_mbox(state, paths, keep_cr, 0);
 	case PATCH_FORMAT_STGIT:
 		return split_mail_conv(stgit_patch_to_mail, state, paths, keep_cr);
 	case PATCH_FORMAT_STGIT_SERIES:
 		return split_mail_stgit_series(state, paths, keep_cr);
 	case PATCH_FORMAT_HG:
 		return split_mail_conv(hg_patch_to_mail, state, paths, keep_cr);
+	case PATCH_FORMAT_MBOXRD:
+		return split_mail_mbox(state, paths, keep_cr, 1);
 	default:
 		die("BUG: invalid patch_format");
 	}
@@ -2201,6 +2207,8 @@ static int parse_opt_patchformat(const struct option *opt, const char *arg, int
 		*opt_value = PATCH_FORMAT_STGIT_SERIES;
 	else if (!strcmp(arg, "hg"))
 		*opt_value = PATCH_FORMAT_HG;
+	else if (!strcmp(arg, "mboxrd"))
+		*opt_value = PATCH_FORMAT_MBOXRD;
 	else
 		return error(_("Invalid value for --patch-format: %s"), arg);
 	return 0;
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index b41bd17..74e093d 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -957,4 +957,24 @@ test_expect_success 'am -s unexpected trailer block' '
 	test_cmp expect actual
 '
 
+test_expect_success 'am --patch-format=mboxrd handles mboxrd' '
+	rm -fr .git/rebase-apply &&
+	git checkout -f first &&
+	echo mboxrd >>file &&
+	git add file &&
+	cat >msg <<-INPUT_END &&
+	mboxrd should escape the body
+
+	From could trip up a loose mbox parser
+	>From extra escape for reversibility
+	INPUT_END
+	git commit -F msg &&
+	git format-patch --pretty=mboxrd --stdout -1 >mboxrd1 &&
+	grep "^>From could trip up a loose mbox parser" mboxrd1 &&
+	git checkout -f first &&
+	git am --patch-format=mboxrd mboxrd1 &&
+	git cat-file commit HEAD | tail -n4 >out &&
+	test_cmp msg out
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-30 23:21 ` [PATCH 1/3] pretty: support "mboxrd" output format Eric Wong
@ 2016-05-31  3:40   ` Eric Sunshine
  2016-05-31  7:45     ` Eric Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-05-31  3:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List

On Mon, May 30, 2016 at 7:21 PM, Eric Wong <e@80x24.org> wrote:
> This output format prevents format-patch output from breaking
> readers if somebody copy+pasted an mbox into a commit message.
>
> Unlike the traditional "mboxo" format, "mboxrd" is designed to
> be fully-reversible.  "mboxrd" also gracefully degrades to
> showing extra ">" in existing "mboxo" readers.
> [...]
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
> diff --git a/pretty.c b/pretty.c
> @@ -1697,12 +1699,34 @@ static void pp_handle_indent(struct pretty_print_context *pp,
> +static regex_t *mboxrd_prepare(void)
> +{
> +       static regex_t preg;
> +       const char re[] = "^>*From ";
> +       int err = regcomp(&preg, re, REG_NOSUB | REG_EXTENDED);
> +[...]
> +       return &preg;
> +}
> +
>  void pp_remainder(struct pretty_print_context *pp,
>                   const char **msg_p,
>                   struct strbuf *sb,
>                   int indent)
>  {
> +       static regex_t *mboxrd_from;
> +
> +       if (pp->fmt == CMIT_FMT_MBOXRD && !mboxrd_from)
> +               mboxrd_from = mboxrd_prepare();
> +
> @@ -1725,8 +1749,13 @@ void pp_remainder(struct pretty_print_context *pp,
>                 else if (pp->expand_tabs_in_log)
>                         strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
>                                              line, linelen);
> -               else
> +               else {
> +                       if (pp->fmt == CMIT_FMT_MBOXRD &&
> +                                       !regexec(mboxrd_from, line, 0, 0, 0))
> +                               strbuf_addch(sb, '>');

At first glance, this seems dangerous since it's handing 'line' to
regexec() without paying attention to 'linelen'. For an arbitrary
regex, this could result in erroneous matches on subsequent "lines",
however, since this expression is anchored with '^', it's not a
problem. But, it is a bit subtle.

I wonder if hand-coding, rather than using a regex, could be an improvement:

    static int is_mboxrd_from(const char *s, size_t n)
    {
        size_t f = strlen("From ");
        const char *t = s + n;

        while (s < t && *s == '>')
            s++;
        return t - s >= f && !memcmp(s, "From ", f);
    }

    ...
    if (is_mboxrd_from(line, linelen)
        strbuf_addch(sb, '>');

or something.

> +
>                         strbuf_add(sb, line, linelen);
> +               }
>                 strbuf_addch(sb, '\n');
>         }
>  }
> diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> @@ -1565,4 +1565,31 @@ test_expect_success 'format-patch --base overrides format.useAutoBase' '
> +test_expect_success 'format-patch --pretty=mboxrd' '
> +       cat >msg <<-INPUT_END &&

Maybe use <<-\INPUT_END to indicate that no variable interpolation is
expected. Ditto below.

> +       mboxrd should escape the body
> +
> +       From could trip up a loose mbox parser
> +       >From extra escape for reversibility
> +       >>From extra escape for reversibility 2
> +       from lower case not escaped
> +       Fromm bad speling not escaped
> +        From with leading space not escaped
> +       INPUT_END
> +
> +       cat >expect <<-INPUT_END &&
> +       >From could trip up a loose mbox parser
> +       >>From extra escape for reversibility
> +       >>>From extra escape for reversibility 2
> +       from lower case not escaped
> +       Fromm bad speling not escaped
> +        From with leading space not escaped
> +       INPUT_END
> +
> +       C=$(git commit-tree HEAD^^{tree} -p HEAD <msg) &&
> +       git format-patch --pretty=mboxrd --stdout -1 $C~1..$C >patch &&
> +       grep -A5 "^>From could trip up a loose mbox parser" patch >actual &&

Hmm, -A is not POSIX and is otherwise not used in Git tests. Perhaps
you could use 'git grep --no-index -A' instead or something?

> +       test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31  3:40   ` Eric Sunshine
@ 2016-05-31  7:45     ` Eric Wong
  2016-05-31 18:10       ` Eric Sunshine
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Wong @ 2016-05-31  7:45 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, May 30, 2016 at 7:21 PM, Eric Wong <e@80x24.org> wrote:
> > diff --git a/pretty.c b/pretty.c
> > @@ -1697,12 +1699,34 @@ static void pp_handle_indent(struct pretty_print_context *pp,
> > +static regex_t *mboxrd_prepare(void)
> > +{
> > +       static regex_t preg;
> > +       const char re[] = "^>*From ";
> > +       int err = regcomp(&preg, re, REG_NOSUB | REG_EXTENDED);
> > +[...]
> > +       return &preg;
> > +}
> > +
> >  void pp_remainder(struct pretty_print_context *pp,
> >                   const char **msg_p,
> >                   struct strbuf *sb,
> >                   int indent)
> >  {
> > +       static regex_t *mboxrd_from;
> > +
> > +       if (pp->fmt == CMIT_FMT_MBOXRD && !mboxrd_from)
> > +               mboxrd_from = mboxrd_prepare();
> > +
> > @@ -1725,8 +1749,13 @@ void pp_remainder(struct pretty_print_context *pp,
> >                 else if (pp->expand_tabs_in_log)
> >                         strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
> >                                              line, linelen);
> > -               else
> > +               else {
> > +                       if (pp->fmt == CMIT_FMT_MBOXRD &&
> > +                                       !regexec(mboxrd_from, line, 0, 0, 0))
> > +                               strbuf_addch(sb, '>');
> 
> At first glance, this seems dangerous since it's handing 'line' to
> regexec() without paying attention to 'linelen'. For an arbitrary
> regex, this could result in erroneous matches on subsequent "lines",
> however, since this expression is anchored with '^', it's not a
> problem. But, it is a bit subtle.

Maybe having more context of the pp_remainder function and
seeing the get_one_line call would've helped in the diff;
I didn't think of this issue once I figured out where to
place the change.

On the other hand, not being too familiar with git C APIs, I was
more worried strbuf was not NUL-terminated for regexec, but it
seems to be.

> I wonder if hand-coding, rather than using a regex, could be an improvement:
> 
>     static int is_mboxrd_from(const char *s, size_t n)
>     {
>         size_t f = strlen("From ");
>         const char *t = s + n;
> 
>         while (s < t && *s == '>')
>             s++;
>         return t - s >= f && !memcmp(s, "From ", f);
>     }
> 
> or something.

Yikes.  I mostly work in high-level languages and do my best to
avoid string parsing in C; so that scares me.  A lot.

I admit a regex isn't necessary, but I'm wondering if the above
could be made less frightening to someone like me.

Or maybe I'm just easily frightened :x

Maybe extra test cases + valgrind can quell my fears :)

> > diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> > @@ -1565,4 +1565,31 @@ test_expect_success 'format-patch --base overrides format.useAutoBase' '
> > +test_expect_success 'format-patch --pretty=mboxrd' '
> > +       cat >msg <<-INPUT_END &&
> 
> Maybe use <<-\INPUT_END to indicate that no variable interpolation is
> expected. Ditto below.

Noted, though I may add variable interpolation to test a line
with trailing whitespace (just "From ") to ensure we don't
overrun a buffer.

> Hmm, -A is not POSIX and is otherwise not used in Git tests. Perhaps
> you could use 'git grep --no-index -A' instead or something?

Noted and will fix for v2.  Will wait a day or two for
further comments on this series.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31  7:45     ` Eric Wong
@ 2016-05-31 18:10       ` Eric Sunshine
  2016-05-31 18:29         ` Eric Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-05-31 18:10 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List

On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
> Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Mon, May 30, 2016 at 7:21 PM, Eric Wong <e@80x24.org> wrote:
>> > +                       if (pp->fmt == CMIT_FMT_MBOXRD &&
>> > +                                       !regexec(mboxrd_from, line, 0, 0, 0))
>> > +                               strbuf_addch(sb, '>');
>>
>> At first glance, this seems dangerous since it's handing 'line' to
>> regexec() without paying attention to 'linelen'. For an arbitrary
>> regex, this could result in erroneous matches on subsequent "lines",
>> however, since this expression is anchored with '^', it's not a
>> problem. But, it is a bit subtle.
>
> Maybe having more context of the pp_remainder function and
> seeing the get_one_line call would've helped in the diff;
> I didn't think of this issue once I figured out where to
> place the change.

No, extra context wouldn't have helped. The problem is that
get_one_line() merely returns the length of the line, which might be
where the NUL-terminator is or it might be where the next newline is.
Therefore, you can't rely upon the "current line" being
NUL-terminated. So, in general, it's not "safe" to pass it to a
function expecting the "line" to end at a NUL-terminator.

> On the other hand, not being too familiar with git C APIs, I was
> more worried strbuf was not NUL-terminated for regexec, but it
> seems to be.

Yes, that's a guarantee, but it doesn't help in this case. Given
line="foo\nbar", get_one_line(line) will return 4, the length of
"foo\n", but regexec() won't know to stop looking until it hits the
NUL after the 'r'. An arbitrary regex, such as /bar/ will match beyond
what get_one_line() considers the end-of-line, which is why this code
looks scary (wrong) at first glance.

>> I wonder if hand-coding, rather than using a regex, could be an improvement:
>>
>>     static int is_mboxrd_from(const char *s, size_t n)
>>     {
>>         size_t f = strlen("From ");
>>         const char *t = s + n;
>>
>>         while (s < t && *s == '>')
>>             s++;
>>         return t - s >= f && !memcmp(s, "From ", f);
>>     }
>>
>> or something.
>
> Yikes.  I mostly work in high-level languages and do my best to
> avoid string parsing in C; so that scares me.  A lot.

The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
(I think) typical of how such a function would be coded in Git itself,
so it looks normal and easy to grok to me (but, of course, I'm
probably biased since I wrote it).

> I admit a regex isn't necessary, but I'm wondering if the above
> could be made less frightening to someone like me.

Perhaps, but it's difficult to say without knowing how it frightens you.

> Maybe extra test cases + valgrind can quell my fears :)

I can envision tests such as:

    ""
    "F"
    "From"
    "From "
    "From     "
    "From foobar"

and so on, if that's what you mean.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31 18:10       ` Eric Sunshine
@ 2016-05-31 18:29         ` Eric Wong
  2016-05-31 20:12           ` Eric Sunshine
  2016-06-02  7:51           ` Eric Wong
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Wong @ 2016-05-31 18:29 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
> > Eric Sunshine <sunshine@sunshineco.com> wrote:

<snip>  Ah thanks, I can see your point-of-view, now.
I always had the '^' in my mind since I've written the same
thing in Perl and Ruby.

> >> I wonder if hand-coding, rather than using a regex, could be an improvement:
> >>
> >>     static int is_mboxrd_from(const char *s, size_t n)
> >>     {
> >>         size_t f = strlen("From ");
> >>         const char *t = s + n;
> >>
> >>         while (s < t && *s == '>')
> >>             s++;
> >>         return t - s >= f && !memcmp(s, "From ", f);
> >>     }
> >>
> >> or something.
> >
> > Yikes.  I mostly work in high-level languages and do my best to
> > avoid string parsing in C; so that scares me.  A lot.
> 
> The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
> (I think) typical of how such a function would be coded in Git itself,
> so it looks normal and easy to grok to me (but, of course, I'm
> probably biased since I wrote it).
> 
> > I admit a regex isn't necessary, but I'm wondering if the above
> > could be made less frightening to someone like me.
> 
> Perhaps, but it's difficult to say without knowing how it frightens you.

Pointer arithmetic leading to buffer overruns;
but yeah, I don't do string parsing in C often, if ever.

> > Maybe extra test cases + valgrind can quell my fears :)
> 
> I can envision tests such as:
> 
>     ""
>     "F"
>     "From"
>     "From "
>     "From     "
>     "From foobar"
> 
> and so on, if that's what you mean.

Yes, I also noticed trailing spaces are dropped anyways, so
there's no perfect round-tripping going on.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31 18:29         ` Eric Wong
@ 2016-05-31 20:12           ` Eric Sunshine
  2016-05-31 20:19             ` Eric Sunshine
  2016-06-02  7:51           ` Eric Wong
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-05-31 20:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List

On Tue, May 31, 2016 at 2:29 PM, Eric Wong <e@80x24.org> wrote:
> Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
>> > Eric Sunshine <sunshine@sunshineco.com> wrote:
> <snip>  Ah thanks, I can see your point-of-view, now.
> I always had the '^' in my mind since I've written the same
> thing in Perl and Ruby.

On reflection, even with the '^' anchor, it isn't safe the way it's
coded since '^' will match following a newline, won't it? Therefore,
because 'line' isn't necessarily NUL-terminated, the pattern could
match on some line beyond what get_one_line() considers the "current
line".

This makes the hand-coded is_mboxrd_from() even more attractive; plus
you can re-use it in patch 2.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31 20:12           ` Eric Sunshine
@ 2016-05-31 20:19             ` Eric Sunshine
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Sunshine @ 2016-05-31 20:19 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List

On Tue, May 31, 2016 at 4:12 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On reflection, even with the '^' anchor, it isn't safe the way it's
> coded since '^' will match following a newline, won't it? Therefore,
> because 'line' isn't necessarily NUL-terminated, the pattern could
> match on some line beyond what get_one_line() considers the "current
> line".

I guess that behavior only kicks in if you specify REG_NEWLINE, which
you didn't.

> This makes the hand-coded is_mboxrd_from() even more attractive; plus
> you can re-use it in patch 2.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-05-31 18:29         ` Eric Wong
  2016-05-31 20:12           ` Eric Sunshine
@ 2016-06-02  7:51           ` Eric Wong
  2016-06-03 22:22             ` Eric Sunshine
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Wong @ 2016-06-02  7:51 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

Eric Wong <e@80x24.org> wrote:
> Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
> > > Eric Sunshine <sunshine@sunshineco.com> wrote:
> > >> I wonder if hand-coding, rather than using a regex, could be an improvement:
> > >>
> > >>     static int is_mboxrd_from(const char *s, size_t n)
> > >>     {
> > >>         size_t f = strlen("From ");
> > >>         const char *t = s + n;
> > >>
> > >>         while (s < t && *s == '>')
> > >>             s++;
> > >>         return t - s >= f && !memcmp(s, "From ", f);
> > >>     }
> > >>
> > >> or something.
> > >
> > > Yikes.  I mostly work in high-level languages and do my best to
> > > avoid string parsing in C; so that scares me.  A lot.
> > 
> > The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
> > (I think) typical of how such a function would be coded in Git itself,
> > so it looks normal and easy to grok to me (but, of course, I'm
> > probably biased since I wrote it).

For reference, here is the gfrom function from qmail (gfrom.c,
source package netqmail-1.06 in Debian, reformatted git style)

int gfrom(char *s, int len)
{
	while ((len > 0) && (*s == '>')) {
		++s;
		--len;
	}

	return (len >= 5) && !str_diffn(s, "From ", 5);
}

Similar to yours, but a several small things improves
readability for me:

* the avoidance of subtraction from the "return" conditional
* s/n/len/ variable name
* extra parentheses
* removal of "t" variable (t for "terminal/termination"?)

str_diffn is memcmp-like, I assume.  My eyes glazed over
when I saw that function implemented in str_diffn.c, too.

Just thinking out loud, with sufficient tests I could go with
either.  Will reroll when/if I get the chance tomorrow.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-02  7:51           ` Eric Wong
@ 2016-06-03 22:22             ` Eric Sunshine
  2016-06-03 22:36               ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-06-03 22:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List, Junio C Hamano

[cc:+junio]

On Thu, Jun 2, 2016 at 3:51 AM, Eric Wong <e@80x24.org> wrote:
> Eric Wong <e@80x24.org> wrote:
>> Eric Sunshine <sunshine@sunshineco.com> wrote:
>> > On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
>> > > Eric Sunshine <sunshine@sunshineco.com> wrote:
>> > >> I wonder if hand-coding, rather than using a regex, could be an improvement:
>> > >>
>> > >>     static int is_mboxrd_from(const char *s, size_t n)
>> > >>     {
>> > >>         size_t f = strlen("From ");
>> > >>         const char *t = s + n;
>> > >>
>> > >>         while (s < t && *s == '>')
>> > >>             s++;
>> > >>         return t - s >= f && !memcmp(s, "From ", f);
>> > >>     }
>> > >>
>> > >> or something.
>> > >
>> > > Yikes.  I mostly work in high-level languages and do my best to
>> > > avoid string parsing in C; so that scares me.  A lot.
>> >
>> > The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
>> > (I think) typical of how such a function would be coded in Git itself,
>> > so it looks normal and easy to grok to me (but, of course, I'm
>> > probably biased since I wrote it).
>
> For reference, here is the gfrom function from qmail (gfrom.c,
> source package netqmail-1.06 in Debian, reformatted git style)
>
> int gfrom(char *s, int len)
> {
>         while ((len > 0) && (*s == '>')) {
>                 ++s;
>                 --len;
>         }
>
>         return (len >= 5) && !str_diffn(s, "From ", 5);
> }

Seems less idiomatic and less like what we might see elsewhere in the
Git codebase, but that's subjective. Functionally, it appears correct.

> Similar to yours, but a several small things improves
> readability for me:
>
> * the avoidance of subtraction from the "return" conditional
> * s/n/len/ variable name

Idiomatic C code favors concise names such as 'i', 'j', or 'n', for
instance, but I don't care strongly.

> * extra parentheses

Unnecessary syntactic noise (consuming reviewer brain cycles).

> * removal of "t" variable (t for "terminal/termination"?)

Heh, no, just the next letter after 's'. Again, just an idiom, as 'i',
'j', 'k' are often used for integers, 's' and 't' are common for
strings.

> str_diffn is memcmp-like, I assume.  My eyes glazed over
> when I saw that function implemented in str_diffn.c, too.
>
> Just thinking out loud, with sufficient tests I could go with
> either.  Will reroll when/if I get the chance tomorrow.

As mentioned above, it's all subjective and, of course, I have a bias
toward the example I provided, but don't otherwise feel strongly about
it. I do, however, like the idea of using a simple hand-coded matching
function over the regex (but no so much that I would complain about
it). Use whatever you and Junio feel is appropriate.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-03 22:22             ` Eric Sunshine
@ 2016-06-03 22:36               ` Junio C Hamano
  2016-06-03 22:59                 ` Eric Sunshine
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2016-06-03 22:36 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Eric Wong, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Thu, Jun 2, 2016 at 3:51 AM, Eric Wong <e@80x24.org> wrote:
>> Eric Wong <e@80x24.org> wrote:
>>> Eric Sunshine <sunshine@sunshineco.com> wrote:
>>> > On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
>>> > > Eric Sunshine <sunshine@sunshineco.com> wrote:
>>> > >> I wonder if hand-coding, rather than using a regex, could be an improvement:
>>> > >>
>>> > >>     static int is_mboxrd_from(const char *s, size_t n)
>>> > >>     {
>>> > >>         size_t f = strlen("From ");
>>> > >>         const char *t = s + n;
>>> > >>
>>> > >>         while (s < t && *s == '>')
>>> > >>             s++;
>>> > >>         return t - s >= f && !memcmp(s, "From ", f);
>>> > >>     }
>>> > >>
>>> > >> or something.
>>> > >
>>> > > Yikes.  I mostly work in high-level languages and do my best to
>>> > > avoid string parsing in C; so that scares me.  A lot.
>
> As mentioned above, it's all subjective and, of course, I have a bias
> toward the example I provided, but don't otherwise feel strongly about
> it. I do, however, like the idea of using a simple hand-coded matching
> function over the regex (but no so much that I would complain about
> it). Use whatever you and Junio feel is appropriate.

This is meant to be a replacement for the original that uses
regexec(), which in turn means the string we are checking is
guaranteed to be NUL terminated, right?

	static int is_mboxrd_from(const char *line) {
	        return starts_with(line + strspn(line, ">"), "From ");
	}

is sufficiently high-level that no longer is scary, hopefully?

I agree with you that regexec() is way overkill for something small
and simple like this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-03 22:36               ` Junio C Hamano
@ 2016-06-03 22:59                 ` Eric Sunshine
  2016-06-03 23:42                   ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-06-03 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, Git List

On Fri, Jun 3, 2016 at 6:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>> On Thu, Jun 2, 2016 at 3:51 AM, Eric Wong <e@80x24.org> wrote:
>>> Eric Wong <e@80x24.org> wrote:
>>>> Eric Sunshine <sunshine@sunshineco.com> wrote:
>>>> > On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@80x24.org> wrote:
>>>> > > Eric Sunshine <sunshine@sunshineco.com> wrote:
>>>> > >> I wonder if hand-coding, rather than using a regex, could be an improvement:
>>>> > >>
>>>> > >>     static int is_mboxrd_from(const char *s, size_t n)
>>>> > >>     {
>>>> > >>         size_t f = strlen("From ");
>>>> > >>         const char *t = s + n;
>>>> > >>
>>>> > >>         while (s < t && *s == '>')
>>>> > >>             s++;
>>>> > >>         return t - s >= f && !memcmp(s, "From ", f);
>>>> > >>     }
>>>> > >>
>>>> > >> or something.
>>>> > >
>>>> > > Yikes.  I mostly work in high-level languages and do my best to
>>>> > > avoid string parsing in C; so that scares me.  A lot.
>>
>> As mentioned above, it's all subjective and, of course, I have a bias
>> toward the example I provided, but don't otherwise feel strongly about
>> it. I do, however, like the idea of using a simple hand-coded matching
>> function over the regex (but no so much that I would complain about
>> it). Use whatever you and Junio feel is appropriate.
>
> This is meant to be a replacement for the original that uses
> regexec(), which in turn means the string we are checking is
> guaranteed to be NUL terminated, right?

Yes, this is meant as a replacement for the regexec(), but no, there
is no guarantee that the "line" is NUL-terminated. In this case, the
'line' in pretty.c:pp_remainder() might end at a NUL or it might just
end at the next newline in the larger buffer.

The bit about sending the "line" to regexec() without taking its
logical end-of-line into account was what caught my attention in the
first place.

>         static int is_mboxrd_from(const char *line) {
>                 return starts_with(line + strspn(line, ">"), "From ");
>         }
>
> is sufficiently high-level that no longer is scary, hopefully?

That's nice and concise but unfortunately not useful for this case
where we must respect the logical end-of-line (within the larger
buffer), represented by line+linelen.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-03 22:59                 ` Eric Sunshine
@ 2016-06-03 23:42                   ` Junio C Hamano
  2016-06-04  0:02                     ` Eric Sunshine
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2016-06-03 23:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Eric Wong, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

>>         static int is_mboxrd_from(const char *line) {
>>                 return starts_with(line + strspn(line, ">"), "From ");
>>         }
>>
>> is sufficiently high-level that no longer is scary, hopefully?
>
> That's nice and concise but unfortunately not useful for this case
> where we must respect the logical end-of-line (within the larger
> buffer), represented by line+linelen.

Hmph, none of ">", "F", "r", "o", "m", or " " is "\n"; would eol
still be an issue?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-03 23:42                   ` Junio C Hamano
@ 2016-06-04  0:02                     ` Eric Sunshine
  2016-06-04  0:19                       ` Eric Sunshine
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-06-04  0:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, Git List

On Fri, Jun 3, 2016 at 7:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>>>         static int is_mboxrd_from(const char *line) {
>>>                 return starts_with(line + strspn(line, ">"), "From ");
>>>         }
>>>
>>> is sufficiently high-level that no longer is scary, hopefully?
>>
>> That's nice and concise but unfortunately not useful for this case
>> where we must respect the logical end-of-line (within the larger
>> buffer), represented by line+linelen.
>
> Hmph, none of ">", "F", "r", "o", "m", or " " is "\n"; would eol
> still be an issue?

Ah right. Sorry for being slow.

Your concise version of is_mboxrd_from() is a nice alternative, as well.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-04  0:02                     ` Eric Sunshine
@ 2016-06-04  0:19                       ` Eric Sunshine
  2016-06-04  2:03                         ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2016-06-04  0:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, Git List

On Fri, Jun 3, 2016 at 8:02 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Fri, Jun 3, 2016 at 7:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Sunshine <sunshine@sunshineco.com> writes:
>>
>>>>         static int is_mboxrd_from(const char *line) {
>>>>                 return starts_with(line + strspn(line, ">"), "From ");
>>>>         }
>>>>
>>>> is sufficiently high-level that no longer is scary, hopefully?
>>>
>>> That's nice and concise but unfortunately not useful for this case
>>> where we must respect the logical end-of-line (within the larger
>>> buffer), represented by line+linelen.
>>
>> Hmph, none of ">", "F", "r", "o", "m", or " " is "\n"; would eol
>> still be an issue?
>
> Ah right. Sorry for being slow.
>
> Your concise version of is_mboxrd_from() is a nice alternative, as well.

My only minor reservation is that it your concise version is still
subtle with regard to not taking 'linelen' into account. At first
glance, it looks like a bug that it doesn't consider the logical
end-of-line, and someone reading the code has to put in extra effort
to convince himself that the code works as intended.

For that reason, I have a bit of a preference for a version of
is_mboxrd_from() which does take 'linelen' into account explicitly.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] pretty: support "mboxrd" output format
  2016-06-04  0:19                       ` Eric Sunshine
@ 2016-06-04  2:03                         ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2016-06-04  2:03 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Eric Wong, Git List

On Fri, Jun 3, 2016 at 5:19 PM, Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> My only minor reservation is that it your concise version is still
> subtle with regard to not taking 'linelen' into account. At first
> glance, it looks like a bug that it doesn't consider the logical
> end-of-line, and someone reading the code has to put in extra effort
> to convince himself that the code works as intended.
>
> For that reason, I have a bit of a preference for a version of
> is_mboxrd_from() which does take 'linelen' into account explicitly.

That certainly is true. You could check that the end result did not consume
more than linelen after the single-liner computes its result, but I am not sure
under what condition that check would trigger. Even though we may hold the
entire e-mail file without NUL-terminating each line, we have a NUL-termination
at the end of the file, no?

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-06-04  2:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-30 23:21 [RFC/PATCH 0/3] support mboxrd format Eric Wong
2016-05-30 23:21 ` [PATCH 1/3] pretty: support "mboxrd" output format Eric Wong
2016-05-31  3:40   ` Eric Sunshine
2016-05-31  7:45     ` Eric Wong
2016-05-31 18:10       ` Eric Sunshine
2016-05-31 18:29         ` Eric Wong
2016-05-31 20:12           ` Eric Sunshine
2016-05-31 20:19             ` Eric Sunshine
2016-06-02  7:51           ` Eric Wong
2016-06-03 22:22             ` Eric Sunshine
2016-06-03 22:36               ` Junio C Hamano
2016-06-03 22:59                 ` Eric Sunshine
2016-06-03 23:42                   ` Junio C Hamano
2016-06-04  0:02                     ` Eric Sunshine
2016-06-04  0:19                       ` Eric Sunshine
2016-06-04  2:03                         ` Junio C Hamano
2016-05-30 23:21 ` [PATCH 2/3] mailsplit: support unescaping mboxrd messages Eric Wong
2016-05-30 23:21 ` [PATCH 3/3] am: support --patch-format=mboxrd Eric Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.