All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH] Add interpret-trailers builtin
@ 2013-11-03 21:17 Christian Couder
  2013-11-04  1:01 ` Johan Herland
  2013-11-04 19:02 ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Christian Couder @ 2013-11-03 21:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johan Herland, Josh Triplett, Thomas Rast, Michael Haggerty,
	Dan Carpenter, Greg Kroah-Hartman

This RFC patch shows the work in progress to implement a new
command:

	git interpret-trailers

1) Rational:

This command should help with RFC 822 style headers, called
"trailers", that are found at the end of commit messages.

For a long time, these trailers have become a de facto standard
way to add helpful information into commit messages.

Until now git commit has only supported the well known
"Signed-off-by: " trailer, that is used by many projects like
the Linux kernel and Git.

It is better to implement features for these trailers in a new
command rather than in builtin/commit.c, because this way the
prepare-commit-msg and commit-msg hooks can reuse this command.

2) Current state:

Currently the usage string of this command is:

git interpret-trailers [--trim-empty] [--infile=file] [<token[=value]>...]

The following features are implemented:
	- the result is printed on stdout
	- the [<token[=value]>...] arguments are interpreted
	- a commit message passed using the "--infile=file" option is interpreted
	- the "trailer.<token>.value" options in the config are interpreted

The following features are planned but not yet implemented:
	- some documentation
	- more tests
	- the "trailer.<token>.if_exist" config option
	- the "trailer.<token>.if_missing" config option
	- the "trailer.<token>.command" config option

3) Notes:

* "trailer" seems better than "commitTrailer" as the config key because
it looks like all the config keys are lower case and "committrailer" is not
very readable.

* "trailer.<token>.value" looks better than "trailer.<token>.trailer", so
I chose the former.

* Rather than only one "trailer.<token>.style" config option, it seems
better to me to have both "trailer.<token>.if_exist" and
"trailer.<token>.if_missing".

* I might send a patch series instead of just one big patch when there will
be fewer big changes in the code.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 .gitignore                    |   1 +
 Makefile                      |   1 +
 builtin.h                     |   1 +
 builtin/interpret-trailers.c  | 284 ++++++++++++++++++++++++++++++++++++++++++
 git.c                         |   1 +
 strbuf.c                      |   7 ++
 strbuf.h                      |   1 +
 t/t7513-interpret-trailers.sh | 101 +++++++++++++++
 8 files changed, 397 insertions(+)
 create mode 100644 builtin/interpret-trailers.c
 create mode 100755 t/t7513-interpret-trailers.sh

diff --git a/.gitignore b/.gitignore
index 66199ed..e6cf15b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -73,6 +73,7 @@
 /git-index-pack
 /git-init
 /git-init-db
+/git-interpret-trailers
 /git-instaweb
 /git-log
 /git-lost-found
diff --git a/Makefile b/Makefile
index af847f8..96441f1 100644
--- a/Makefile
+++ b/Makefile
@@ -937,6 +937,7 @@ BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
 BUILTIN_OBJS += builtin/index-pack.o
 BUILTIN_OBJS += builtin/init-db.o
+BUILTIN_OBJS += builtin/interpret-trailers.o
 BUILTIN_OBJS += builtin/log.o
 BUILTIN_OBJS += builtin/ls-files.o
 BUILTIN_OBJS += builtin/ls-remote.o
diff --git a/builtin.h b/builtin.h
index b56cf07..88c2999 100644
--- a/builtin.h
+++ b/builtin.h
@@ -71,6 +71,7 @@ extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
 extern int cmd_index_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_init_db(int argc, const char **argv, const char *prefix);
+extern int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
 extern int cmd_log(int argc, const char **argv, const char *prefix);
 extern int cmd_log_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_files(int argc, const char **argv, const char *prefix);
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
new file mode 100644
index 0000000..2bcd480
--- /dev/null
+++ b/builtin/interpret-trailers.c
@@ -0,0 +1,284 @@
+/*
+ * Builtin "git interpret-trailers"
+ *
+ * Copyright (c) 2013 Christian Couder <chriscool@tuxfamily.org>
+ *
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "strbuf.h"
+
+static const char * const git_interpret_trailers_usage[] = {
+	N_("git interpret-trailers [--trim-empty] [--infile=file] [<token[=value]>...]"),
+	NULL
+};
+
+static void parse_arg(struct strbuf *tok, struct strbuf *val, const char *arg)
+{
+	char *end = strchr(arg, '=');
+	if (!end)
+		end = strchr(arg, ':');
+	if (end) {
+		strbuf_add(tok, arg, end - arg);
+		strbuf_trim(tok);
+		strbuf_addstr(val, end + 1);
+		strbuf_trim(val);
+	} else {
+		strbuf_addstr(tok, arg);
+		strbuf_trim(tok);
+	}
+}
+
+static struct string_list trailer_list;
+
+enum style_if_exist { DONT_REPEAT, OVERWRITE, REPEAT };
+enum style_if_missing { DONT_APPEND, APPEND };
+
+struct trailer_info {
+	char *value;
+	char *command;
+	enum style_if_exist style_exist;
+	enum style_if_missing style_missing;
+};
+
+static int git_trailer_config(const char *key, const char *value, void *cb)
+{
+	if (!prefixcmp(key, "trailer.")) {
+		const char *orig_key = key;
+		char *name;
+		struct string_list_item *item;
+		struct trailer_info *info;
+		enum { VALUE, COMMAND, IF_EXIST, IF_MISSING } type;
+
+		key += 8;
+		if (!suffixcmp(key, ".value")) {
+			name = xstrndup(key, strlen(key) - 6);
+			type = VALUE;
+		} else if (!suffixcmp(key, ".command")) {
+			name = xstrndup(key, strlen(key) - 8);
+			type = COMMAND;
+		} else if (!suffixcmp(key, ".if_exist")) {
+			name = xstrndup(key, strlen(key) - 9);
+			type = IF_EXIST;
+		} else if (!suffixcmp(key, ".if_missing")) {
+			name = xstrndup(key, strlen(key) - 11);
+			type = IF_MISSING;
+		} else
+			return 0;
+
+		item = string_list_insert(&trailer_list, name);
+
+		if (!item->util)
+			item->util = xcalloc(sizeof(struct trailer_info), 1);
+		info = item->util;
+		if (type == VALUE) {
+			if (info->value)
+				warning(_("more than one %s"), orig_key);
+			info->value = xstrdup(value);
+		} else if (type == IF_EXIST) {
+			if (!strcasecmp("dont_repeat", value)) {
+				info->style_exist = DONT_REPEAT;
+			} else if (!strcasecmp("overwrite", value)) {
+				info->style_exist = OVERWRITE;
+			} else if (!strcasecmp("repeat", value)) {
+				info->style_exist = REPEAT;
+			} else
+				warning(_("unknow value '%s' for key '%s'"), value, orig_key);
+		} else if (type == IF_MISSING) {
+			if (!strcasecmp("dont_append", value)) {
+				info->style_missing = DONT_APPEND;
+			} else if (!strcasecmp("append", value)) {
+				info->style_missing = APPEND;
+			} else
+				warning(_("unknow value '%s' for key '%s'"), value, orig_key);
+		} else {
+			if (info->command)
+				warning(_("more than one %s"), orig_key);
+			info->command = xstrdup(value);
+		}
+	}
+	return 0;
+}
+
+static size_t alnum_len(const char *buf, size_t len) {
+	while (--len >= 0 && !isalnum(buf[len]));
+	return len + 1;
+}
+
+static void apply_config(struct strbuf *tok, struct strbuf *val, struct trailer_info *info)
+{
+	if (info->value) {
+		strbuf_reset(tok);
+		strbuf_addstr(tok, info->value);
+	}
+	if (info->command) {
+	}
+}
+
+static void apply_config_list(struct strbuf *tok, struct strbuf *val)
+{
+	int j, tok_alnum_len = alnum_len(tok->buf, tok->len);
+
+	for (j = 0; j < trailer_list.nr; j++) {
+		struct string_list_item *item = trailer_list.items + j;
+		struct trailer_info *info = item->util;
+		if (!strncasecmp(tok->buf, item->string, tok_alnum_len) ||
+		    !strncasecmp(tok->buf, info->value, tok_alnum_len)) {
+			apply_config(tok, val, info);
+			break;
+		}
+	}
+}
+
+static struct strbuf **read_input_file(const char *infile)
+{
+	struct strbuf sb = STRBUF_INIT;
+
+	if (strbuf_read_file(&sb, infile, 0) < 0)
+		die_errno(_("could not read input file '%s'"), infile);
+
+	return strbuf_split(&sb, '\n');
+}
+
+/*
+ * Return the the (0 based) index of the first trailer line
+ * or the line count if there are no trailers.
+ */
+static int find_trailer_start(struct strbuf **lines)
+{
+	int count, start, empty = 1;
+
+	/* Get the line count */
+	for (count = 0; lines[count]; count++);
+
+	/*
+	 * Get the start of the trailers by looking starting from the end
+	 * for a line with only spaces before lines with one ':'.
+	 */
+	for (start = count - 1; start >= 0; start--) {
+		if (strbuf_isspace(lines[start])) {
+			if (empty)
+				continue;
+			return start + 1;
+		}
+		if (strchr(lines[start]->buf, ':')) {
+			if (empty)
+				empty = 0;
+			continue;
+		}
+		return count;
+	}
+
+	return empty ? count : start + 1;
+}
+
+static void print_tok_val(const char *tok_buf, size_t tok_len,
+			  const char *val_buf, size_t val_len)
+{
+	char c = tok_buf[tok_len - 1];
+	if (isalnum(c))
+		printf("%s: %s\n", tok_buf, val_buf);
+	else if (isspace(c) || c == '#')
+		printf("%s%s\n", tok_buf, val_buf);
+	else
+		printf("%s %s\n", tok_buf, val_buf);
+}
+
+static void process_input_file(const char *infile,
+			       struct string_list *tok_list,
+			       struct string_list *val_list)
+{
+	struct strbuf **lines = read_input_file(infile);
+	int start = find_trailer_start(lines);
+	int i;
+
+	/* Output non trailer lines as is */
+	for (i = 0; lines[i] && i < start; i++) {
+		printf("%s", lines[i]->buf);
+	}
+
+	/* Process trailer lines */
+	for (i = start; lines[i]; i++) {
+		struct strbuf tok = STRBUF_INIT;
+		struct strbuf val = STRBUF_INIT;
+		parse_arg(&tok, &val, lines[i]->buf);
+		string_list_append(tok_list, strbuf_detach(&tok, NULL));
+		string_list_append(val_list, strbuf_detach(&val, NULL));
+	}
+}
+
+int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
+{
+	const char *infile = NULL;
+	int trim_empty = 0;
+	int i;
+	struct string_list tok_list = STRING_LIST_INIT_NODUP;
+	struct string_list val_list = STRING_LIST_INIT_NODUP;
+	char *applied_arg;
+
+	struct option options[] = {
+		OPT_BOOL(0, "trim-empty", &trim_empty, N_("trim empty trailers")),
+		OPT_FILENAME(0, "infile", &infile, N_("use message from file")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options,
+			     git_interpret_trailers_usage, 0);
+
+	git_config(git_trailer_config, NULL);
+
+	/* Print the non trailer part of infile */
+	if (infile) {
+		process_input_file(infile, &tok_list, &val_list);
+		applied_arg = xcalloc(tok_list.nr, 1);
+	}
+
+	for (i = 0; i < argc; i++) {
+		struct strbuf tok = STRBUF_INIT;
+		struct strbuf val = STRBUF_INIT;
+		int j, seen = 0;
+
+		parse_arg(&tok, &val, argv[i]);
+
+		apply_config_list(&tok, &val);
+
+		/* Apply the trailer arguments to the trailers in infile */
+		for (j = 0; j < tok_list.nr; j++) {
+			struct string_list_item *tok_item = tok_list.items + j;
+			struct string_list_item *val_item = val_list.items + j;
+			int tok_alnum_len = alnum_len(tok.buf, tok.len);
+			if (!strncasecmp(tok.buf, tok_item->string, tok_alnum_len)) {
+				tok_item->string = xstrdup(tok.buf);
+				val_item->string = xstrdup(val.buf);
+				seen = 1;
+				applied_arg[j] = 1;
+				break;
+			}
+		}
+
+		/* Print the trailer arguments that are not in infile */
+		if (!seen && (!trim_empty || val.len > 0))
+			print_tok_val(tok.buf, tok.len, val.buf, val.len);
+
+		strbuf_release(&tok);
+		strbuf_release(&val);
+	}
+
+	/* Print the trailer part of infile */
+	for (i = 0; i < tok_list.nr; i++) {
+		struct strbuf tok = STRBUF_INIT;
+		struct strbuf val = STRBUF_INIT;
+		strbuf_addstr(&tok, tok_list.items[i].string);
+		strbuf_addstr(&val, val_list.items[i].string);
+
+		if (!applied_arg[i])
+			apply_config_list(&tok, &val);
+
+		if (!trim_empty || val.len > 0)
+			print_tok_val(tok.buf, tok.len, val.buf, val.len);
+	}
+
+	return 0;
+}
diff --git a/git.c b/git.c
index cb5208d..9f2eb77 100644
--- a/git.c
+++ b/git.c
@@ -383,6 +383,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
 		{ "init", cmd_init_db },
 		{ "init-db", cmd_init_db },
+		{ "interpret-trailers", cmd_interpret_trailers, RUN_SETUP },
 		{ "log", cmd_log, RUN_SETUP },
 		{ "ls-files", cmd_ls_files, RUN_SETUP },
 		{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
diff --git a/strbuf.c b/strbuf.c
index 1170d01..f9080fa 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -106,6 +106,13 @@ void strbuf_ltrim(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+int strbuf_isspace(struct strbuf *sb)
+{
+	char *b;
+	for (b = sb->buf; *b && isspace(*b); b++);
+	return !*b;
+}
+
 struct strbuf **strbuf_split_buf(const char *str, size_t slen,
 				 int terminator, int max)
 {
diff --git a/strbuf.h b/strbuf.h
index 73e80ce..02bff3a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -42,6 +42,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
 extern void strbuf_trim(struct strbuf *);
 extern void strbuf_rtrim(struct strbuf *);
 extern void strbuf_ltrim(struct strbuf *);
+extern int strbuf_isspace(struct strbuf *);
 extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
 
 /*
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
new file mode 100755
index 0000000..5d2f967
--- /dev/null
+++ b/t/t7513-interpret-trailers.sh
@@ -0,0 +1,101 @@
+#!/bin/sh
+#
+# Copyright (c) 2013 Christian Couder
+#
+
+test_description='git interpret-trailers'
+
+. ./test-lib.sh
+
+cat >basic_message <<'EOF'
+subject
+
+body
+EOF
+
+cat >complex_message_body <<'EOF'
+my subject
+
+my body which is long
+and contains some special
+chars like : = ? !
+
+EOF
+
+# Do not remove trailing spaces below!
+cat >complex_message_trailers <<'EOF'
+Fixes: 
+Acked-by: 
+Reviewed-by: 
+Signed-off-by: 
+EOF
+
+test_expect_success 'without config' '
+	printf "ack: Peff\nReviewed-by: \nAcked-by: Johan\n" >expected &&
+	git interpret-trailers "ack = Peff" "Reviewed-by" "Acked-by: Johan" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '--trim-empty without config' '
+	printf "ack: Peff\nAcked-by: Johan\n" >expected &&
+	git interpret-trailers --trim-empty "ack = Peff" "Reviewed-by" "Acked-by: Johan" "sob:" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with config setup' '
+	git config trailer.ack.value "Acked-by: " &&
+	printf "Acked-by: Peff\n" >expected &&
+	git interpret-trailers --trim-empty "ack = Peff" >actual &&
+	test_cmp expected actual &&
+	git interpret-trailers --trim-empty "Acked-by = Peff" >actual &&
+	test_cmp expected actual &&
+	git interpret-trailers --trim-empty "Acked-by :Peff" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with config setup and = sign' '
+	git config trailer.ack.value "Acked-by= " &&
+	printf "Acked-by= Peff\n" >expected &&
+	git interpret-trailers --trim-empty "ack = Peff" >actual &&
+	test_cmp expected actual &&
+	git interpret-trailers --trim-empty "Acked-by= Peff" >actual &&
+	test_cmp expected actual &&
+	git interpret-trailers --trim-empty "Acked-by : Peff" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with config setup and # sign' '
+	git config trailer.bug.value "Bug #" &&
+	printf "Bug #42\n" >expected &&
+	git interpret-trailers --trim-empty "bug = 42" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with commit basic message' '
+	git interpret-trailers --infile basic_message >actual &&
+	test_cmp basic_message actual
+'
+
+test_expect_success 'with commit complex message' '
+	cat complex_message_body complex_message_trailers >complex_message &&
+	cat complex_message_body >expected &&
+	printf "Fixes: \nAcked-by= \nReviewed-by: \nSigned-off-by: \n" >>expected &&
+	git interpret-trailers --infile complex_message >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with commit complex message and args' '
+	cat complex_message_body >expected &&
+	printf "Bug #42\nFixes: \nAcked-by= Peff\nReviewed-by: \nSigned-off-by: \n" >>expected &&
+	git interpret-trailers --infile complex_message "ack: Peff" "bug: 42" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'with commit complex message, args and --trim-empty' '
+	cat complex_message_body >expected &&
+	printf "Bug #42\nAcked-by= Peff\n" >>expected &&
+	git interpret-trailers --trim-empty --infile complex_message "ack: Peff" "bug: 42" >actual &&
+	test_cmp expected actual
+'
+
+test_done
-- 
1.8.4.1.576.g36ba827.dirty

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-03 21:17 [RFC/PATCH] Add interpret-trailers builtin Christian Couder
@ 2013-11-04  1:01 ` Johan Herland
  2013-11-04 19:12   ` Junio C Hamano
  2013-11-04 19:02 ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Johan Herland @ 2013-11-04  1:01 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Git mailing list, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

On Sun, Nov 3, 2013 at 10:17 PM, Christian Couder
<chriscool@tuxfamily.org> wrote:
> This RFC patch shows the work in progress to implement a new
> command:

First of all: Thanks for working on this! This looks like a really
good start. Plenty of comments below (mostly either to learn myself,
or to check what alternatives you've considered), but overall, I'm
happy about where this is going.

>         git interpret-trailers

Pesonally, I'd rather name it "git process-footers", since I think
"process" better captures the two-way functionality of this program
(both parsing/interpreting _and_ generating/writiing), and I believe
I'm not alone in preferring "footer" over "trailer". That said, this
is certainly bikeshedding territory, and I am not the one writing the
code, so feel free to ignore.

> 1) Rational:

s/Rational/Rationale/

>
> This command should help with RFC 822 style headers, called
> "trailers", that are found at the end of commit messages.

As has been asked earlier in this discussion, we should probably
specify explicitly what we _mean_ with "RFC822-style"
headers/footers/trailers, and exactly how closely we follow the actual
RFC... E.g. do we make use of the linebreaking rules? encoding
handling? etc... We may want to take a more relaxed approach (after
all, we're not including a complete RFC822/RFC2822 implementation),
but we should at least state so, and possibly how/why we do so.

> For a long time, these trailers have become a de facto standard
> way to add helpful information into commit messages.
>
> Until now git commit has only supported the well known
> "Signed-off-by: " trailer, that is used by many projects like
> the Linux kernel and Git.
>
> It is better to implement features for these trailers in a new
> command rather than in builtin/commit.c, because this way the
> prepare-commit-msg and commit-msg hooks can reuse this command.
>
> 2) Current state:
>
> Currently the usage string of this command is:
>
> git interpret-trailers [--trim-empty] [--infile=file] [<token[=value]>...]

--trim-empty will remove empty footers given on the command-line? Or
in the infile? or both? I think I'm fine with both, but I wonder if
there is a use case for treating the two types of empty footers
separately...

> The following features are implemented:
>         - the result is printed on stdout
>         - the [<token[=value]>...] arguments are interpreted
>         - a commit message passed using the "--infile=file" option is interpreted

If the output is written to stdout, then why is not the input taken
from stdin? Or vice versa: why not --outfile?

>         - the "trailer.<token>.value" options in the config are interpreted

This is the default value of the footer if not further specified on
the command-line?

> The following features are planned but not yet implemented:
>         - some documentation
>         - more tests
>         - the "trailer.<token>.if_exist" config option
>         - the "trailer.<token>.if_missing" config option

Not sure what these two options will do, or what kind of values they take...

>         - the "trailer.<token>.command" config option

I guess the value of .command is executed to generate the default
content of the footer?

> 3) Notes:
>
> * "trailer" seems better than "commitTrailer" as the config key because
> it looks like all the config keys are lower case and "committrailer" is not
> very readable.

As stated above, I prefer "footer" over "trailer"...

> * "trailer.<token>.value" looks better than "trailer.<token>.trailer", so
> I chose the former.
>
> * Rather than only one "trailer.<token>.style" config option, it seems
> better to me to have both "trailer.<token>.if_exist" and
> "trailer.<token>.if_missing".
>
> * I might send a patch series instead of just one big patch when there will
> be fewer big changes in the code.

Maybe at least split in two:
 - One patch to deal with the administrivia of adding a new command.
 - One (or more) patch(es) to introduce the logic/functionality of the
new command.

> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  .gitignore                    |   1 +
>  Makefile                      |   1 +
>  builtin.h                     |   1 +
>  builtin/interpret-trailers.c  | 284 ++++++++++++++++++++++++++++++++++++++++++
>  git.c                         |   1 +
>  strbuf.c                      |   7 ++
>  strbuf.h                      |   1 +
>  t/t7513-interpret-trailers.sh | 101 +++++++++++++++
>  8 files changed, 397 insertions(+)
>  create mode 100644 builtin/interpret-trailers.c
>  create mode 100755 t/t7513-interpret-trailers.sh
>
> diff --git a/.gitignore b/.gitignore
> index 66199ed..e6cf15b 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -73,6 +73,7 @@
>  /git-index-pack
>  /git-init
>  /git-init-db
> +/git-interpret-trailers
>  /git-instaweb
>  /git-log
>  /git-lost-found
> diff --git a/Makefile b/Makefile
> index af847f8..96441f1 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -937,6 +937,7 @@ BUILTIN_OBJS += builtin/hash-object.o
>  BUILTIN_OBJS += builtin/help.o
>  BUILTIN_OBJS += builtin/index-pack.o
>  BUILTIN_OBJS += builtin/init-db.o
> +BUILTIN_OBJS += builtin/interpret-trailers.o
>  BUILTIN_OBJS += builtin/log.o
>  BUILTIN_OBJS += builtin/ls-files.o
>  BUILTIN_OBJS += builtin/ls-remote.o
> diff --git a/builtin.h b/builtin.h
> index b56cf07..88c2999 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -71,6 +71,7 @@ extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
>  extern int cmd_help(int argc, const char **argv, const char *prefix);
>  extern int cmd_index_pack(int argc, const char **argv, const char *prefix);
>  extern int cmd_init_db(int argc, const char **argv, const char *prefix);
> +extern int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
>  extern int cmd_log(int argc, const char **argv, const char *prefix);
>  extern int cmd_log_reflog(int argc, const char **argv, const char *prefix);
>  extern int cmd_ls_files(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
> new file mode 100644
> index 0000000..2bcd480
> --- /dev/null
> +++ b/builtin/interpret-trailers.c
> @@ -0,0 +1,284 @@
> +/*
> + * Builtin "git interpret-trailers"
> + *
> + * Copyright (c) 2013 Christian Couder <chriscool@tuxfamily.org>
> + *
> + */
> +
> +#include "cache.h"
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "strbuf.h"
> +
> +static const char * const git_interpret_trailers_usage[] = {
> +       N_("git interpret-trailers [--trim-empty] [--infile=file] [<token[=value]>...]"),
> +       NULL
> +};
> +
> +static void parse_arg(struct strbuf *tok, struct strbuf *val, const char *arg)

When discussing RFC822-style headers (or in our case,
footer/trailers), one often talks about a "field" which consists of a
"field name" and a "field value" (or "field body"). I propose that we
use the same naming convention in our code.

Here, I assume 'tok' is the field name, and 'val' is the field value.
Maybe rename to "name" and "value", respectively?

> +{
> +       char *end = strchr(arg, '=');
> +       if (!end)
> +               end = strchr(arg, ':');

So both '=' (preferred) and ':' are accepted as field/value
separators. That's ok for the command-line, I believe.

> +       if (end) {
> +               strbuf_add(tok, arg, end - arg);
> +               strbuf_trim(tok);
> +               strbuf_addstr(val, end + 1);
> +               strbuf_trim(val);
> +       } else {
> +               strbuf_addstr(tok, arg);
> +               strbuf_trim(tok);
> +       }
> +}
> +
> +static struct string_list trailer_list;
> +
> +enum style_if_exist { DONT_REPEAT, OVERWRITE, REPEAT };

>From the enum values, I assume that these declare the desired behavior
when there are two (or more?) or the same footer (i.e. same "field
name"). However, it's not (yet) obvious to me in which order they are
processed. There are several opportunities for multiple instances of
the same "field name":

 - Two (or more) occurences in the infile
 - Two (or more) occurences on the command line
 - One occurence in the infile, and one of the command line

It is not obvious to me in which order these are processed, or which
of those cases constitute a duplicate/repetition, hence it's not clear
to me what the result will be when e.g. using OVERWRITE. Also not sure
what DONT_REPEAT means, and if/how it's different from OVERWRITE.
Instead, I suggest we should specify clearly:

 1. In which order are the footers processed
 2. What constitutes a repetition/duplicate (IMHO all of the above three cases)
 3. Better enum names that are easily understood given the rules in #1
and #2, I suggest:

enum handle_duplicates { KEEP_FIRST, KEEP_LAST, KEEP_ALL };

> +enum style_if_missing { DONT_APPEND, APPEND };

enum handle_missing { SKIP, INCLUDE }; ?

> +struct trailer_info {
> +       char *value;

Is this the field _name_ or field _value_?

> +       char *command;

I assume this is only used when trailer.<token>.command is used?

> +       enum style_if_exist style_exist;
> +       enum style_if_missing style_missing;
> +};
> +
> +static int git_trailer_config(const char *key, const char *value, void *cb)
> +{
> +       if (!prefixcmp(key, "trailer.")) {
> +               const char *orig_key = key;
> +               char *name;
> +               struct string_list_item *item;
> +               struct trailer_info *info;
> +               enum { VALUE, COMMAND, IF_EXIST, IF_MISSING } type;
> +
> +               key += 8;
> +               if (!suffixcmp(key, ".value")) {
> +                       name = xstrndup(key, strlen(key) - 6);
> +                       type = VALUE;
> +               } else if (!suffixcmp(key, ".command")) {
> +                       name = xstrndup(key, strlen(key) - 8);
> +                       type = COMMAND;
> +               } else if (!suffixcmp(key, ".if_exist")) {
> +                       name = xstrndup(key, strlen(key) - 9);
> +                       type = IF_EXIST;
> +               } else if (!suffixcmp(key, ".if_missing")) {
> +                       name = xstrndup(key, strlen(key) - 11);
> +                       type = IF_MISSING;
> +               } else
> +                       return 0;
> +
> +               item = string_list_insert(&trailer_list, name);
> +
> +               if (!item->util)
> +                       item->util = xcalloc(sizeof(struct trailer_info), 1);
> +               info = item->util;
> +               if (type == VALUE) {
> +                       if (info->value)
> +                               warning(_("more than one %s"), orig_key);
> +                       info->value = xstrdup(value);
> +               } else if (type == IF_EXIST) {

Consider splitting the parsing of values -> enums into helper
functions, to make this function shorter and its logic more easily
followed.

> +                       if (!strcasecmp("dont_repeat", value)) {
> +                               info->style_exist = DONT_REPEAT;
> +                       } else if (!strcasecmp("overwrite", value)) {
> +                               info->style_exist = OVERWRITE;
> +                       } else if (!strcasecmp("repeat", value)) {
> +                               info->style_exist = REPEAT;
> +                       } else
> +                               warning(_("unknow value '%s' for key '%s'"), value, orig_key);

s/unknow/unknown/

> +               } else if (type == IF_MISSING) {
> +                       if (!strcasecmp("dont_append", value)) {
> +                               info->style_missing = DONT_APPEND;
> +                       } else if (!strcasecmp("append", value)) {
> +                               info->style_missing = APPEND;
> +                       } else
> +                               warning(_("unknow value '%s' for key '%s'"), value, orig_key);
> +               } else {
> +                       if (info->command)
> +                               warning(_("more than one %s"), orig_key);
> +                       info->command = xstrdup(value);
> +               }
> +       }
> +       return 0;
> +}
> +
> +static size_t alnum_len(const char *buf, size_t len) {
> +       while (--len >= 0 && !isalnum(buf[len]));
> +       return len + 1;
> +}
> +
> +static void apply_config(struct strbuf *tok, struct strbuf *val, struct trailer_info *info)
> +{
> +       if (info->value) {
> +               strbuf_reset(tok);
> +               strbuf_addstr(tok, info->value);
> +       }
> +       if (info->command) {
> +       }

I guess this if-block might be filled by a later commit, but is there
any reason to have the empty block in this commit?

> +}
> +
> +static void apply_config_list(struct strbuf *tok, struct strbuf *val)
> +{
> +       int j, tok_alnum_len = alnum_len(tok->buf, tok->len);

So tok_alnum_len is the length of the initial run of alphanumeric
characters in tok (i.e. does not include '-'?)

> +
> +       for (j = 0; j < trailer_list.nr; j++) {

We're iterating through the footers we found in infile?

> +               struct string_list_item *item = trailer_list.items + j;
> +               struct trailer_info *info = item->util;
> +               if (!strncasecmp(tok->buf, item->string, tok_alnum_len) ||
> +                   !strncasecmp(tok->buf, info->value, tok_alnum_len)) {

What is held in item->string? and what is held in info->value, and why
are _both_ being compared to (the initial run of alphanumerics in)
tok?

> +                       apply_config(tok, val, info);
> +                       break;
> +               }
> +       }
> +}
> +
> +static struct strbuf **read_input_file(const char *infile)
> +{
> +       struct strbuf sb = STRBUF_INIT;
> +
> +       if (strbuf_read_file(&sb, infile, 0) < 0)
> +               die_errno(_("could not read input file '%s'"), infile);
> +
> +       return strbuf_split(&sb, '\n');
> +}
> +
> +/*
> + * Return the the (0 based) index of the first trailer line
> + * or the line count if there are no trailers.
> + */
> +static int find_trailer_start(struct strbuf **lines)
> +{
> +       int count, start, empty = 1;
> +
> +       /* Get the line count */
> +       for (count = 0; lines[count]; count++);
> +
> +       /*
> +        * Get the start of the trailers by looking starting from the end
> +        * for a line with only spaces before lines with one ':'.
> +        */

This means that if the footers end up in multiple paragraphs, like this:

    Tested-by: foo
    Reviewed-by: bar

    Signed-off-by: xyzzy
    Signed-off-by: zyxxy

then only the latter of these two sections will be considered? What
about instead looking for the last line (i.e. first line from bottom)
that neither (a) contains ':' or (b) consists solely of zero or more
whitespace (blank lines)? Maybe also want to tighten rule (a) to only
match lines that consist of a single word (i.e. no inter-word spaces)
before the ':' (i.e. something like "^\s*[a-zA-Z0-9-]+\s*:" in
regexp-speak).

That's all I had time for now. Hopefully, I'll have some more time to
continue the review later.


Hope this helps,

...Johan

> +       for (start = count - 1; start >= 0; start--) {
> +               if (strbuf_isspace(lines[start])) {
> +                       if (empty)
> +                               continue;
> +                       return start + 1;
> +               }
> +               if (strchr(lines[start]->buf, ':')) {
> +                       if (empty)
> +                               empty = 0;
> +                       continue;
> +               }
> +               return count;
> +       }
> +
> +       return empty ? count : start + 1;
> +}
> +
> +static void print_tok_val(const char *tok_buf, size_t tok_len,
> +                         const char *val_buf, size_t val_len)
> +{
> +       char c = tok_buf[tok_len - 1];
> +       if (isalnum(c))
> +               printf("%s: %s\n", tok_buf, val_buf);
> +       else if (isspace(c) || c == '#')
> +               printf("%s%s\n", tok_buf, val_buf);
> +       else
> +               printf("%s %s\n", tok_buf, val_buf);
> +}
> +
> +static void process_input_file(const char *infile,
> +                              struct string_list *tok_list,
> +                              struct string_list *val_list)
> +{
> +       struct strbuf **lines = read_input_file(infile);
> +       int start = find_trailer_start(lines);
> +       int i;
> +
> +       /* Output non trailer lines as is */
> +       for (i = 0; lines[i] && i < start; i++) {
> +               printf("%s", lines[i]->buf);
> +       }
> +
> +       /* Process trailer lines */
> +       for (i = start; lines[i]; i++) {
> +               struct strbuf tok = STRBUF_INIT;
> +               struct strbuf val = STRBUF_INIT;
> +               parse_arg(&tok, &val, lines[i]->buf);
> +               string_list_append(tok_list, strbuf_detach(&tok, NULL));
> +               string_list_append(val_list, strbuf_detach(&val, NULL));
> +       }
> +}
> +
> +int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
> +{
> +       const char *infile = NULL;
> +       int trim_empty = 0;
> +       int i;
> +       struct string_list tok_list = STRING_LIST_INIT_NODUP;
> +       struct string_list val_list = STRING_LIST_INIT_NODUP;
> +       char *applied_arg;
> +
> +       struct option options[] = {
> +               OPT_BOOL(0, "trim-empty", &trim_empty, N_("trim empty trailers")),
> +               OPT_FILENAME(0, "infile", &infile, N_("use message from file")),
> +               OPT_END()
> +       };
> +
> +       argc = parse_options(argc, argv, prefix, options,
> +                            git_interpret_trailers_usage, 0);
> +
> +       git_config(git_trailer_config, NULL);
> +
> +       /* Print the non trailer part of infile */
> +       if (infile) {
> +               process_input_file(infile, &tok_list, &val_list);
> +               applied_arg = xcalloc(tok_list.nr, 1);
> +       }
> +
> +       for (i = 0; i < argc; i++) {
> +               struct strbuf tok = STRBUF_INIT;
> +               struct strbuf val = STRBUF_INIT;
> +               int j, seen = 0;
> +
> +               parse_arg(&tok, &val, argv[i]);
> +
> +               apply_config_list(&tok, &val);
> +
> +               /* Apply the trailer arguments to the trailers in infile */
> +               for (j = 0; j < tok_list.nr; j++) {
> +                       struct string_list_item *tok_item = tok_list.items + j;
> +                       struct string_list_item *val_item = val_list.items + j;
> +                       int tok_alnum_len = alnum_len(tok.buf, tok.len);
> +                       if (!strncasecmp(tok.buf, tok_item->string, tok_alnum_len)) {
> +                               tok_item->string = xstrdup(tok.buf);
> +                               val_item->string = xstrdup(val.buf);
> +                               seen = 1;
> +                               applied_arg[j] = 1;
> +                               break;
> +                       }
> +               }
> +
> +               /* Print the trailer arguments that are not in infile */
> +               if (!seen && (!trim_empty || val.len > 0))
> +                       print_tok_val(tok.buf, tok.len, val.buf, val.len);
> +
> +               strbuf_release(&tok);
> +               strbuf_release(&val);
> +       }
> +
> +       /* Print the trailer part of infile */
> +       for (i = 0; i < tok_list.nr; i++) {
> +               struct strbuf tok = STRBUF_INIT;
> +               struct strbuf val = STRBUF_INIT;
> +               strbuf_addstr(&tok, tok_list.items[i].string);
> +               strbuf_addstr(&val, val_list.items[i].string);
> +
> +               if (!applied_arg[i])
> +                       apply_config_list(&tok, &val);
> +
> +               if (!trim_empty || val.len > 0)
> +                       print_tok_val(tok.buf, tok.len, val.buf, val.len);
> +       }
> +
> +       return 0;
> +}
> diff --git a/git.c b/git.c
> index cb5208d..9f2eb77 100644
> --- a/git.c
> +++ b/git.c
> @@ -383,6 +383,7 @@ static void handle_internal_command(int argc, const char **argv)
>                 { "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
>                 { "init", cmd_init_db },
>                 { "init-db", cmd_init_db },
> +               { "interpret-trailers", cmd_interpret_trailers, RUN_SETUP },
>                 { "log", cmd_log, RUN_SETUP },
>                 { "ls-files", cmd_ls_files, RUN_SETUP },
>                 { "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
> diff --git a/strbuf.c b/strbuf.c
> index 1170d01..f9080fa 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -106,6 +106,13 @@ void strbuf_ltrim(struct strbuf *sb)
>         sb->buf[sb->len] = '\0';
>  }
>
> +int strbuf_isspace(struct strbuf *sb)
> +{
> +       char *b;
> +       for (b = sb->buf; *b && isspace(*b); b++);
> +       return !*b;
> +}
> +
>  struct strbuf **strbuf_split_buf(const char *str, size_t slen,
>                                  int terminator, int max)
>  {
> diff --git a/strbuf.h b/strbuf.h
> index 73e80ce..02bff3a 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -42,6 +42,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
>  extern void strbuf_trim(struct strbuf *);
>  extern void strbuf_rtrim(struct strbuf *);
>  extern void strbuf_ltrim(struct strbuf *);
> +extern int strbuf_isspace(struct strbuf *);
>  extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
>
>  /*
> diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
> new file mode 100755
> index 0000000..5d2f967
> --- /dev/null
> +++ b/t/t7513-interpret-trailers.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2013 Christian Couder
> +#
> +
> +test_description='git interpret-trailers'
> +
> +. ./test-lib.sh
> +
> +cat >basic_message <<'EOF'
> +subject
> +
> +body
> +EOF
> +
> +cat >complex_message_body <<'EOF'
> +my subject
> +
> +my body which is long
> +and contains some special
> +chars like : = ? !
> +
> +EOF
> +
> +# Do not remove trailing spaces below!
> +cat >complex_message_trailers <<'EOF'
> +Fixes:
> +Acked-by:
> +Reviewed-by:
> +Signed-off-by:
> +EOF
> +
> +test_expect_success 'without config' '
> +       printf "ack: Peff\nReviewed-by: \nAcked-by: Johan\n" >expected &&
> +       git interpret-trailers "ack = Peff" "Reviewed-by" "Acked-by: Johan" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success '--trim-empty without config' '
> +       printf "ack: Peff\nAcked-by: Johan\n" >expected &&
> +       git interpret-trailers --trim-empty "ack = Peff" "Reviewed-by" "Acked-by: Johan" "sob:" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with config setup' '
> +       git config trailer.ack.value "Acked-by: " &&
> +       printf "Acked-by: Peff\n" >expected &&
> +       git interpret-trailers --trim-empty "ack = Peff" >actual &&
> +       test_cmp expected actual &&
> +       git interpret-trailers --trim-empty "Acked-by = Peff" >actual &&
> +       test_cmp expected actual &&
> +       git interpret-trailers --trim-empty "Acked-by :Peff" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with config setup and = sign' '
> +       git config trailer.ack.value "Acked-by= " &&
> +       printf "Acked-by= Peff\n" >expected &&
> +       git interpret-trailers --trim-empty "ack = Peff" >actual &&
> +       test_cmp expected actual &&
> +       git interpret-trailers --trim-empty "Acked-by= Peff" >actual &&
> +       test_cmp expected actual &&
> +       git interpret-trailers --trim-empty "Acked-by : Peff" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with config setup and # sign' '
> +       git config trailer.bug.value "Bug #" &&
> +       printf "Bug #42\n" >expected &&
> +       git interpret-trailers --trim-empty "bug = 42" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with commit basic message' '
> +       git interpret-trailers --infile basic_message >actual &&
> +       test_cmp basic_message actual
> +'
> +
> +test_expect_success 'with commit complex message' '
> +       cat complex_message_body complex_message_trailers >complex_message &&
> +       cat complex_message_body >expected &&
> +       printf "Fixes: \nAcked-by= \nReviewed-by: \nSigned-off-by: \n" >>expected &&
> +       git interpret-trailers --infile complex_message >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with commit complex message and args' '
> +       cat complex_message_body >expected &&
> +       printf "Bug #42\nFixes: \nAcked-by= Peff\nReviewed-by: \nSigned-off-by: \n" >>expected &&
> +       git interpret-trailers --infile complex_message "ack: Peff" "bug: 42" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'with commit complex message, args and --trim-empty' '
> +       cat complex_message_body >expected &&
> +       printf "Bug #42\nAcked-by= Peff\n" >>expected &&
> +       git interpret-trailers --trim-empty --infile complex_message "ack: Peff" "bug: 42" >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_done
> --
> 1.8.4.1.576.g36ba827.dirty
>



-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-03 21:17 [RFC/PATCH] Add interpret-trailers builtin Christian Couder
  2013-11-04  1:01 ` Johan Herland
@ 2013-11-04 19:02 ` Junio C Hamano
  2013-11-06  6:43   ` Christian Couder
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2013-11-04 19:02 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Johan Herland, Josh Triplett, Thomas Rast, Michael Haggerty,
	Dan Carpenter, Greg Kroah-Hartman

Christian Couder <chriscool@tuxfamily.org> writes:

> This RFC patch shows the work in progress to implement a new
> command:
>
> 	git interpret-trailers
>
> 1) Rational:
>
> This command should help with RFC 822 style headers, called
> "trailers", that are found at the end of commit messages.
>
> For a long time, these trailers have become a de facto standard
> way to add helpful information into commit messages.
>
> Until now git commit has only supported the well known
> "Signed-off-by: " trailer, that is used by many projects like
> the Linux kernel and Git.
>
> It is better to implement features for these trailers in a new
> command rather than in builtin/commit.c, because this way the
> prepare-commit-msg and commit-msg hooks can reuse this command.
>
> 2) Current state:
>
> Currently the usage string of this command is:
>
> git interpret-trailers [--trim-empty] [--infile=file] [<token[=value]>...]
>
> The following features are implemented:
> 	- the result is printed on stdout
> 	- the [<token[=value]>...] arguments are interpreted
> 	- a commit message passed using the "--infile=file" option is interpreted
> 	- the "trailer.<token>.value" options in the config are interpreted
>
> The following features are planned but not yet implemented:
> 	- some documentation
> 	- more tests
> 	- the "trailer.<token>.if_exist" config option
> 	- the "trailer.<token>.if_missing" config option
> 	- the "trailer.<token>.command" config option
>
> 3) Notes:
>
> * "trailer" seems better than "commitTrailer" as the config key because
> it looks like all the config keys are lower case and "committrailer" is not
> very readable.

And closes the door for other things from later acquiring trailers?

> * "trailer.<token>.value" looks better than "trailer.<token>.trailer", so
> I chose the former.

If that is a literal value, then I think ".value" is indeed good.

> * Rather than only one "trailer.<token>.style" config option, it seems
> better to me to have both "trailer.<token>.if_exist" and
> "trailer.<token>.if_missing".

As there is no ".if_exist", ".if_missing", etc. here, I cannot guess
what these "seemingly independent and orthogonal, but some
combinations are impossible" configuration variables are meant to be
used, let alone agreeing to the above claim that they are better
than a single ".style".  I think you took the ".style" from my
thinking-aloud message the other day, but that aloud-thinking lead
to ".style" by taking various use cases people wanted to have
footers into account, including:

 - just appending, allowing duplication of the field name (e.g. more
   than one "cc:" can name different recipients);

 - appending, allowing duplication of the field <name,val> pair
   (e.g. a patch that flowed from person A to B and possibly
   somewhere else then finally back to A may have "Signed-off-by:
   A", chain of other's Sob, and another "Signed-off-by: A");

 - appending, but without consecutive repeats (e.g. the same
   "Signed-off-by:" example; person A does not pass a patch to
   himself, adding two same <name,val> pair next to each other);

 - adding only if there is no same <name> (e.g. "Change-Id:");

 - adding only if there is no <name,val> pair (e.g. "Closes: #bugId");

 - adding one if there is none, replacing one if there already is.

I do not think it is easier for users to express these (and other)
semantics as combinations of "seemingly independent and orthogonal,
but some combinations are impossible" configuration variables.

> * I might send a patch series instead of just one big patch when there will
> be fewer big changes in the code.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  .gitignore                    |   1 +
>  Makefile                      |   1 +
>  builtin.h                     |   1 +
>  builtin/interpret-trailers.c  | 284 ++++++++++++++++++++++++++++++++++++++++++
>  git.c                         |   1 +
>  strbuf.c                      |   7 ++
>  strbuf.h                      |   1 +

I think you're better off having trailers.c at the top level that is
called from builtin/interpret-trailers.c (aside from names), so that
we can later hook the former up with builtin/commit.c codepath.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-04  1:01 ` Johan Herland
@ 2013-11-04 19:12   ` Junio C Hamano
  2013-11-05  2:45     ` Johan Herland
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2013-11-04 19:12 UTC (permalink / raw)
  To: Johan Herland
  Cc: Christian Couder, Git mailing list, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

Johan Herland <johan@herland.net> writes:

Thanks for looking this over.  I am mostly in agreement and will
elide the parts I do not have much to add.

>> This command should help with RFC 822 style headers, called
>> "trailers", that are found at the end of commit messages.
>
> As has been asked earlier in this discussion, we should probably
> specify explicitly what we _mean_ with "RFC822-style"
> headers/footers/trailers, and exactly how closely we follow the actual
> RFC... E.g. do we make use of the linebreaking rules? encoding
> handling? etc... We may want to take a more relaxed approach (after
> all, we're not including a complete RFC822/RFC2822 implementation),
> but we should at least state so, and possibly how/why we do so.

Indeed; especially I do not think we would want to do the 2822
contination lines, or 2047 MIME quoting, ever.

>> For a long time, these trailers have become a de facto standard
>> way to add helpful information into commit messages.

"helpful" is not the key aspect of these footers. They are added at
the end to introduce some structure into a free-form part of the
commit objects.

>> The following features are implemented:
>>         - the result is printed on stdout
>>         - the [<token[=value]>...] arguments are interpreted
>>         - a commit message passed using the "--infile=file" option is interpreted
>
> If the output is written to stdout, then why is not the input taken
> from stdin? Or vice versa: why not --outfile?

I'd say just make it a filter without --in/outfile ;-)

>> +{
>> +       char *end = strchr(arg, '=');
>> +       if (!end)
>> +               end = strchr(arg, ':');
>
> So both '=' (preferred) and ':' are accepted as field/value
> separators. That's ok for the command-line, I believe.

Why?

Sometimes you have to be loose from the beginning _if_ some existing
uses and established conventions make it easier for the users, but
if you do not have to start from being loose, it is almost always a
mistake to do so.  The above code just closed the door to use ":"
for some other useful purposes we may later discover, and will make
us regret for doing so.

> From the enum values, I assume that these declare the desired behavior
> when there are two (or more?) or the same footer (i.e. same "field
> name"). However, it's not (yet) obvious to me in which order they are
> processed. There are several opportunities for multiple instances of
> the same "field name":
>
>  - Two (or more) occurences in the infile
>  - Two (or more) occurences on the command line
>  - One occurence in the infile, and one of the command line

Also, there is a distinction between fields with the same field name
appearing twice and fields with the same field name and the same
field value appearing twice. Some headers do want to have them, some
do not. 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-04 19:12   ` Junio C Hamano
@ 2013-11-05  2:45     ` Johan Herland
  2013-11-05  5:42       ` Christian Couder
  2013-11-05 19:09       ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Johan Herland @ 2013-11-05  2:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Git mailing list, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

On Mon, Nov 4, 2013 at 8:12 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Johan Herland <johan@herland.net> writes:
>>> +{
>>> +       char *end = strchr(arg, '=');
>>> +       if (!end)
>>> +               end = strchr(arg, ':');
>>
>> So both '=' (preferred) and ':' are accepted as field/value
>> separators. That's ok for the command-line, I believe.
>
> Why?
>
> Sometimes you have to be loose from the beginning _if_ some existing
> uses and established conventions make it easier for the users, but
> if you do not have to start from being loose, it is almost always a
> mistake to do so.  The above code just closed the door to use ":"
> for some other useful purposes we may later discover, and will make
> us regret for doing so.

Although I agree with the principle, I think there are (at least) two
established conventions that will be commonly used from the start, and
that we should support:

 - Using short forms with '=', e.g. "ack=Peff". There is already a
convention on how we specify <name> + <value> pairs on the command
line, e.g. "git -c foo=bar ..."

 - Copy-pasting footers from existing commit messages. These will have
the same format as the expected output of this command, and not
accepting the same format in its input seems silly, IMHO.

That said, I think this applies only to the formatting on the _command
line_. When it comes to how the resulting footers are formatted in the
commit message, I would argue it only makes sense to use ':', and I
think the testcase named 'with config setup and = sign' in the above
patch is ugly and unnecessary.

>> From the enum values, I assume that these declare the desired behavior
>> when there are two (or more?) or the same footer (i.e. same "field
>> name"). However, it's not (yet) obvious to me in which order they are
>> processed. There are several opportunities for multiple instances of
>> the same "field name":
>>
>>  - Two (or more) occurences in the infile
>>  - Two (or more) occurences on the command line
>>  - One occurence in the infile, and one of the command line
>
> Also, there is a distinction between fields with the same field name
> appearing twice and fields with the same field name and the same
> field value appearing twice. Some headers do want to have them, some
> do not.

True. This complexity is partly why I initially wanted to leave this
whole thing up to hooks, but I guess now we have to deal with it...
That said, I believe we don't need to cater to every possibility under
the sun, just the most common ones. If a minority of users are not
satisfied, they can simply configure this to leave all duplicates in
place, and then write their own commit-msg hook to do whatever weird
consolidation/cleanup they prefer.

...Johan

PS: Since this program will be run by "git commit", and might also be
invoked by hooks, we need to keep the following in mind:

 - Design for idempotence. A given set of headers might be filtered
through this program multiple times, and we should make sure that
multiple runs over the same data does not itself cause problems.

 - Optional/flexible configuration. When a hook runs this program, it
may want to impose its own set of rules that does not entirely (or at
all) coincide with what is set in the config. We should therefore
consider providing a way for the caller to specify a separate source
of trailer/footer config to apply on a given run.


-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-05  2:45     ` Johan Herland
@ 2013-11-05  5:42       ` Christian Couder
  2013-11-05 19:09       ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Christian Couder @ 2013-11-05  5:42 UTC (permalink / raw)
  To: johan; +Cc: gitster, git, josh, tr, mhagger, dan.carpenter, greg

From: Johan Herland <johan@herland.net>
> On Mon, Nov 4, 2013 at 8:12 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Johan Herland <johan@herland.net> writes:
>>>> +{
>>>> +       char *end = strchr(arg, '=');
>>>> +       if (!end)
>>>> +               end = strchr(arg, ':');
>>>
>>> So both '=' (preferred) and ':' are accepted as field/value
>>> separators. That's ok for the command-line, I believe.
>>
>> Why?
>>
>> Sometimes you have to be loose from the beginning _if_ some existing
>> uses and established conventions make it easier for the users,

The users are already used to appending "Acked-by: Joe <joe@example.com>".
So I think it makes it easier for the user to accept what they are already
used to provide.

>> but
>> if you do not have to start from being loose, it is almost always a
>> mistake to do so.  The above code just closed the door to use ":"
>> for some other useful purposes we may later discover, and will make
>> us regret for doing so.
> 
> Although I agree with the principle, I think there are (at least) two
> established conventions that will be commonly used from the start, and
> that we should support:
> 
>  - Using short forms with '=', e.g. "ack=Peff". There is already a
> convention on how we specify <name> + <value> pairs on the command
> line, e.g. "git -c foo=bar ..."
> 
>  - Copy-pasting footers from existing commit messages. These will have
> the same format as the expected output of this command, and not
> accepting the same format in its input seems silly, IMHO.

I agree. Also I think it will avoid some mistakes by the users.
Because it would require an effort for them to remember that if they
want to see "Acked-by: Joe <joe@example.com>" they have to put
"Acked-by= Joe <joe@example.com>" on the command line.

> That said, I think this applies only to the formatting on the _command
> line_.

But wouldn't it be nice if the same parsing function could be used for
both the command line and the commit message template?

> When it comes to how the resulting footers are formatted in the
> commit message, I would argue it only makes sense to use ':', and I
> think the testcase named 'with config setup and = sign' in the above
> patch is ugly and unnecessary.

I wanted to support configurations like this:

[trailer "ack"]
        value = "Acked-by= "
[trailer "bug"]
        value = "Bug #" 

because Peff said that GitHub uses '#' and while at it I suppose some
people might prefer '=' over ':'.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-05  2:45     ` Johan Herland
  2013-11-05  5:42       ` Christian Couder
@ 2013-11-05 19:09       ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2013-11-05 19:09 UTC (permalink / raw)
  To: Johan Herland
  Cc: Christian Couder, Git mailing list, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

Johan Herland <johan@herland.net> writes:

> On Mon, Nov 4, 2013 at 8:12 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Johan Herland <johan@herland.net> writes:
>>>> +{
>>>> +       char *end = strchr(arg, '=');
>>>> +       if (!end)
>>>> +               end = strchr(arg, ':');
>>>
>>> So both '=' (preferred) and ':' are accepted as field/value
>>> separators. That's ok for the command-line, I believe.
>>
>> Why?
>>
>> Sometimes you have to be loose from the beginning _if_ some existing
>> uses and established conventions make it easier for the users, but
>> if you do not have to start from being loose, it is almost always a
>> mistake to do so.  The above code just closed the door to use ":"
>> for some other useful purposes we may later discover, and will make
>> us regret for doing so.
>
> Although I agree with the principle, I think there are (at least) two
> established conventions that will be commonly used from the start, and
> that we should support:
>
>  - Using short forms with '=', e.g. "ack=Peff". There is already a
> convention on how we specify <name> + <value> pairs on the command
> line, e.g. "git -c foo=bar ..."

I do not have much problem with this.

>  - Copy-pasting footers from existing commit messages. These will have
> the same format as the expected output of this command, and not
> accepting the same format in its input seems silly, IMHO.

I am not sure about this, but syntactically, it is very similar to

    --message "CC: Johan Herland <j@h.net>"

so probably it is OK, but then we do not even have to limit it to
colon, no?  E.g. appending an arbitrary footer, with its literal
value, may be done with something like:

    --footer "CC: Johan Herland <j@h.net>"
    --footer "Closes #12345"

>> Also, there is a distinction between fields with the same field name
>> appearing twice and fields with the same field name and the same
>> field value appearing twice. Some headers do want to have them, some
>> do not.
>
> True. This complexity is partly why I initially wanted to leave this
> whole thing up to hooks, but I guess now we have to deal with it...

If we are adding anything, it has to be able to express what we
already do for Signed-off-by: line, so we cannot start from
somewhere any simpler than this, I am afraid.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-04 19:02 ` Junio C Hamano
@ 2013-11-06  6:43   ` Christian Couder
  2013-11-06  9:23     ` Christian Couder
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Couder @ 2013-11-06  6:43 UTC (permalink / raw)
  To: gitster; +Cc: git, johan, josh, tr, mhagger, dan.carpenter, greg

From: Junio C Hamano <gitster@pobox.com>
> Christian Couder <chriscool@tuxfamily.org> writes:
>>
>> * "trailer" seems better than "commitTrailer" as the config key because
>> it looks like all the config keys are lower case and "committrailer" is not
>> very readable.
> 
> And closes the door for other things from later acquiring trailers?

I don't think it really closes the door, as they can have a name like
"stufftrailer" then. Or better they could call it something else than
"trailer" everywhere from the beginning to avoid mistakes in the first
place.

Or if they are really trailers, for example maybe tag trailers, then
in many cases they might want to share the same configuration as
commit trailers. In this case, it would be a mistake to use
"commitTrailer" when most people would like them to also apply to
tags.

If/when tag trailers appear, then we can decide that "trailer"  is
common for all trailers, "commitTrailer" is for commits only and
"tagTrailer" for tags only.

And anyway commit trailers have existed since the beginning or nearly
the beginning of Git, and we haven't seen yet any other kind of
trailer, so it's reasonnable to think that we can safely take the name
and not worry, be happy.

>> * "trailer.<token>.value" looks better than "trailer.<token>.trailer", so
>> I chose the former.
> 
> If that is a literal value, then I think ".value" is indeed good.

That was what I thought first too.

But, after thinking about what Johan said, I think that it might be
confusing for some people, so now I wonder if I should call it "key".

>> * Rather than only one "trailer.<token>.style" config option, it seems
>> better to me to have both "trailer.<token>.if_exist" and
>> "trailer.<token>.if_missing".
> 
> As there is no ".if_exist", ".if_missing", etc. here, I cannot guess
> what these "seemingly independent and orthogonal, but some
> combinations are impossible" configuration variables are meant to be
> used, let alone agreeing to the above claim that they are better
> than a single ".style".

Yeah, I should have explained more. So I will do it now.

In the code I used the following enums:

enum style_if_exist { DONT_REPEAT, OVERWRITE, REPEAT };
enum style_if_missing { DONT_APPEND, APPEND };

The style_if_exist enum is meant to decide what is done when we have
to deal with 2 or more trailers, from the infile or the command line,
that have the same key but different not empty values.

For example, me might have the 3 following trailers:

"Acked-by: joe <joe@example>"
"Acked-by: bob <bob@example>"
"Acked-by: joe <joe@example>"

- The DONT_REPEAT style, which should be the default in my opinion,
would mean that we don't repeat the same values. So we would get:

"Acked-by: joe <joe@example>"
"Acked-by: bob <bob@example>"

- The OVERWRITE style would mean that we keep only one, (for example
the last one). So we would get:

"Acked-by: joe <joe@example>"

- The REPEAT style would mean that we keep everything. So we would
get:

"Acked-by: joe <joe@example>"
"Acked-by: bob <bob@example>"
"Acked-by: joe <joe@example>"

The style_if_missing enum is meant to decide what is done when we have
no trailer with the specified key. Then DONT_APPEND means that we do
nothing, which should be the default, and APPEND means that we append
a trailer with the specified key.

Of course in the latter case, a command should probably be specified
to tell which value should be used with the key.

For example:

[trailer "signoff"]
	 key = "Signed-off-by:"
	 if_missing = append
	 command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

would append a s-o-b line only if there is no s-o-b already. 

Note that to always append a specific trailer one could use:

[trailer "signoff"]
	 key = "Signed-off-by:"
	 if_missing = append
	 if_exist = repeat
	 command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

> I think you took the ".style" from my
> thinking-aloud message the other day, but that aloud-thinking lead
> to ".style" by taking various use cases people wanted to have
> footers into account, including:

Ok, I will try to guess below, how the use cases could be configured.

>  - just appending, allowing duplication of the field name (e.g. more
>    than one "cc:" can name different recipients);

That would be the default (if_exist = dont_repeat, if_missing = dont_append).

>  - appending, allowing duplication of the field <name,val> pair
>    (e.g. a patch that flowed from person A to B and possibly
>    somewhere else then finally back to A may have "Signed-off-by:
>    A", chain of other's Sob, and another "Signed-off-by: A");

That would be: if_exist = repeat, if_missing = dont_append

>  - appending, but without consecutive repeats (e.g. the same
>    "Signed-off-by:" example; person A does not pass a patch to
>    himself, adding two same <name,val> pair next to each other);

I could add a DONT_REPEAT_PREVIOUS into the style_if_exist enum, for
this case.

Then that would be: if_exist = dont_repeat_previous, if_missing = dont_append

>  - adding only if there is no same <name> (e.g. "Change-Id:");

I could add a DONT_APPEND into the style_if_exist enum, for this case.

Then that would be: if_exist = dont_append, if_missing = append

>  - adding only if there is no <name,val> pair (e.g. "Closes: #bugId");

That would be: if_exist = dont_repeat, if_missing = append

>  - adding one if there is none, replacing one if there already is.

That would be: if_exist = overwrite, if_missing = append

> I do not think it is easier for users to express these (and other)
> semantics as combinations of "seemingly independent and orthogonal,
> but some combinations are impossible" configuration variables.

With what I suggest, I don't think there would be some impossible
combinations.

>> * I might send a patch series instead of just one big patch when there will
>> be fewer big changes in the code.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>>  .gitignore                    |   1 +
>>  Makefile                      |   1 +
>>  builtin.h                     |   1 +
>>  builtin/interpret-trailers.c  | 284 ++++++++++++++++++++++++++++++++++++++++++
>>  git.c                         |   1 +
>>  strbuf.c                      |   7 ++
>>  strbuf.h                      |   1 +
> 
> I think you're better off having trailers.c at the top level that is
> called from builtin/interpret-trailers.c (aside from names), so that
> we can later hook the former up with builtin/commit.c codepath.

Ok, I will do that.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-06  6:43   ` Christian Couder
@ 2013-11-06  9:23     ` Christian Couder
  2013-11-06 17:18       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Couder @ 2013-11-06  9:23 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, git, Johan Herland, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

On Wed, Nov 6, 2013 at 7:43 AM, Christian Couder
<chriscool@tuxfamily.org> wrote:
> Of course in the latter case, a command should probably be specified
> to tell which value should be used with the key.
>
> For example:
>
> [trailer "signoff"]
>          key = "Signed-off-by:"
>          if_missing = append
>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'
>
> would append a s-o-b line only if there is no s-o-b already.

Sorry, I realize that I was wrong above.
As the default for "if_exist" is "dont_repeat", the above would append
a s-o-b if there is no s-o-b or if there is one but with a different
value.

To append a s-o-b only if there is no s-o-b already, one would need to use:

[trailer "signoff"]
         key = "Signed-off-by:"
         if_exist = dont_append
         if_missing = append
         command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-06  9:23     ` Christian Couder
@ 2013-11-06 17:18       ` Junio C Hamano
  2013-11-06 20:16         ` Christian Couder
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2013-11-06 17:18 UTC (permalink / raw)
  To: Christian Couder
  Cc: Christian Couder, git, Johan Herland, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

Christian Couder <christian.couder@gmail.com> writes:

> To append a s-o-b only if there is no s-o-b already, one would need to use:
>
> [trailer "signoff"]
>          key = "Signed-off-by:"
>          if_exist = dont_append
>          if_missing = append
>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

But that is insufficient to emulate what we do, no?  I.e. append
unless the last one is from the same person we are about to add.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-06 17:18       ` Junio C Hamano
@ 2013-11-06 20:16         ` Christian Couder
  2013-11-06 20:42           ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Couder @ 2013-11-06 20:16 UTC (permalink / raw)
  To: gitster
  Cc: christian.couder, git, johan, josh, tr, mhagger, dan.carpenter, greg

From: Junio C Hamano <gitster@pobox.com>

> Christian Couder <christian.couder@gmail.com> writes:
> 
>> To append a s-o-b only if there is no s-o-b already, one would need to use:
>>
>> [trailer "signoff"]
>>          key = "Signed-off-by:"
>>          if_exist = dont_append
>>          if_missing = append
>>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'
> 
> But that is insufficient to emulate what we do, no?  I.e. append
> unless the last one is from the same person we are about to add.

Yeah, but, with DONT_REPEAT_PREVIOUS, it would be possible using:

[trailer "signoff"]
         key = "Signed-off-by:"
         if_exist = dont_repeat_previous
         if_missing = append
         command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

Cheers,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-06 20:16         ` Christian Couder
@ 2013-11-06 20:42           ` Junio C Hamano
  2013-11-07 14:57             ` Christian Couder
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2013-11-06 20:42 UTC (permalink / raw)
  To: Christian Couder
  Cc: christian.couder, git, johan, josh, tr, mhagger, dan.carpenter, greg

Christian Couder <chriscool@tuxfamily.org> writes:

> From: Junio C Hamano <gitster@pobox.com>
>
>> Christian Couder <christian.couder@gmail.com> writes:
>> 
>>> To append a s-o-b only if there is no s-o-b already, one would need to use:
>>>
>>> [trailer "signoff"]
>>>          key = "Signed-off-by:"
>>>          if_exist = dont_append
>>>          if_missing = append
>>>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'
>> 
>> But that is insufficient to emulate what we do, no?  I.e. append
>> unless the last one is from the same person we are about to add.
>
> Yeah, but, with DONT_REPEAT_PREVIOUS, it would be possible using:
>
> [trailer "signoff"]
>          key = "Signed-off-by:"
>          if_exist = dont_repeat_previous
>          if_missing = append
>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'

Anything is possible, but "possible" does not justify "it is better
way than other possible ways".

What are the plausible values for if_missing?  If if_missing needs
"prepend", for example, in addition to "append", does it mean
if_exist also needs corresponding "prepend" variant for the value
"dont_repeat_previous" you would give to if_exist?

Having two that are seemingly independent configuration does not
seem to be helping in reducing complexity (by keeping settings that
can be independently set orthogonal, by saying "if the other rule
decides to add, do we append, prepend, insert at the middle?", for
example).

How would one differentiate between "there is a field with that key"
and "there is a field with that <key, value> combination" with a
single if_exist?  Add another variable if_exist_exact?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH] Add interpret-trailers builtin
  2013-11-06 20:42           ` Junio C Hamano
@ 2013-11-07 14:57             ` Christian Couder
  0 siblings, 0 replies; 13+ messages in thread
From: Christian Couder @ 2013-11-07 14:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, git, Johan Herland, Josh Triplett, Thomas Rast,
	Michael Haggerty, Dan Carpenter, Greg Kroah-Hartman

On Wed, Nov 6, 2013 at 9:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <chriscool@tuxfamily.org> writes:
>> From: Junio C Hamano <gitster@pobox.com>
>>
>>> But that is insufficient to emulate what we do, no?  I.e. append
>>> unless the last one is from the same person we are about to add.
>>
>> Yeah, but, with DONT_REPEAT_PREVIOUS, it would be possible using:
>>
>> [trailer "signoff"]
>>          key = "Signed-off-by:"
>>          if_exist = dont_repeat_previous
>>          if_missing = append
>>          command = echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"'
>
> Anything is possible, but "possible" does not justify "it is better
> way than other possible ways".
>
> What are the plausible values for if_missing?  If if_missing needs
> "prepend", for example, in addition to "append", does it mean
> if_exist also needs corresponding "prepend" variant for the value
> "dont_repeat_previous" you would give to if_exist?

Yeah, we could add "prepend" to the possible values for if_missing and
also other values that prepend instead of append to if_exist.
But I am not sure they would be as useful.

If we think they are useful, then maybe we could instead add another
configuration to say if we want to append or prepend or put in the
middle, and change the others to make them clearer.
For example we could have:

where = (after | middle | before)
if_exist = (add_if_different | add_if_neighbor_different | add |
overwrite | do_nothing)
if_missing = (do_nothing | add)

(The default being the first choice.)

> Having two that are seemingly independent configuration does not
> seem to be helping in reducing complexity (by keeping settings that
> can be independently set orthogonal, by saying "if the other rule
> decides to add, do we append, prepend, insert at the middle?", for
> example).

I am not sure I understand what you mean.
In my opinion, if we want to use just one configuration, instead of 2
or 3, it will not reduce complexity.

Maybe we could parse something like:

style = if_exist:add_if_different:after, if_missing:add:before

or like:

style = (append | prepend | insert | append_unless <regexp> |
prepend_unless <regexp> | insert_unless <regexp>)

with <regexp> being a regexp where $KEY is the key and $VALUE is the
value, so you can say: append_unless ^$KEY[:=]$VALUE.*

or like:

style = (append_if <cmd> | prepend_if <cmd> | insert_if <cmd>)

with <cmd> being a shell command that should exit with code 0.

(The command would be passed the existing trailers in stdin.
So you could say things like: "append_if true" or "append_if tail -1 |
grep -v '$KEY: $VALUE'")

But if we want to keep things simple, I think what I suggest first
above is probably best.

And by the way, later we might add "add_if_match <regexp>" or
"add_if_true <cmd>" to "if_exist" and "if_missing", so we could still
be as powerful as the other styles.

> How would one differentiate between "there is a field with that key"
> and "there is a field with that <key, value> combination" with a
> single if_exist?  Add another variable if_exist_exact?

"if_exist = do_nothing" would mean: "do nothing if there is a field
with that key"
"if_exist = overwrite" would mean: "overwrite the existing value of a
field with that key"
"if_exist = add" would mean: "add if there is one or more fields with
that key (whatever the value(s), so the value(s) could be the same)"
"if_exist = add_if_different" would mean: "add only if there is no
field with the same <key, value> pair"
"if_exist = add_if_different_neighbor" would mean: "add only if there
is no field with the same <key, value> pair where we are going to add"

"if_missing = do_nothing" would mean: "do nothing if there is no field
with that key"
"if_missing= add" would mean: "add if there is no field with that key"

"where = after" would mean: "if we decide to add, we will put the
trailer after the other trailers"
"where = middle" would mean: "if we decide to add, we will put the
trailer in the middle of the other trailers"
"where = before" would mean: "if we decide to add, we will put the
trailer before the other trailers"

In my opinion, that should be enough for most use cases.

It is true that some people might want something more complex (like
adding only if there is no value for the same key that match a given
regexp) or something stranger like adding only if there is already a
field with the same <key, value> pair.

But we can take care of these special cases later if they actually happen.
Then we will hopefully have more experience.
And we could add things like "add_if_no_value_match <regexp>" or
"add_if <cmd>" to if_exist and if_missing .

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-11-07 14:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-03 21:17 [RFC/PATCH] Add interpret-trailers builtin Christian Couder
2013-11-04  1:01 ` Johan Herland
2013-11-04 19:12   ` Junio C Hamano
2013-11-05  2:45     ` Johan Herland
2013-11-05  5:42       ` Christian Couder
2013-11-05 19:09       ` Junio C Hamano
2013-11-04 19:02 ` Junio C Hamano
2013-11-06  6:43   ` Christian Couder
2013-11-06  9:23     ` Christian Couder
2013-11-06 17:18       ` Junio C Hamano
2013-11-06 20:16         ` Christian Couder
2013-11-06 20:42           ` Junio C Hamano
2013-11-07 14:57             ` Christian Couder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.