All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/14] counting trailers with shortlogs
@ 2015-12-29  7:18 Jeff King
  2015-12-29  7:19 ` [PATCH 01/14] move string functions out of git-compat-util Jeff King
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:18 UTC (permalink / raw)
  To: git

I happened to be reading an old discussion on trailers a few weeks ago,
and the idea was mentioned of providing access to commit trailers in a
more structured way from traversal commands. E.g., we could presumably
have a log pretty-format for showing "Signed-off-by", like
"%(trailer:signed-off-by)" or something. But that opens up a lot of
questions, like the exact formatting, whether we would allow wildcards
("*-by:"), and if so how that affects formatting.

So I took the cowards way out and implemented trailer support for
git-shortlog, which seemed much simpler. I'll admit I haven't used it
for anything so far beyond a few "fun" queries, but it does provide some
infrastructure we could use later for git-log.

And things being what they are, I ended up with a few cleanups and
optimizations along the way. After this series, a regular "shortlog -ns"
is about 10% faster.

  [01/14]: move string functions out of git-compat-util
  [02/14]: log: refactor add_header to drop some magic numbers
  [03/14]: strutil: add skip_prefix_case
  [04/14]: shortlog: use skip_prefix_icase to parse "Author" lines
  [05/14]: shortlog: use strbufs to read from stdin
  [06/14]: shortlog: replace hand-parsing of author with pretty-printer
  [07/14]: shortlog: optimize "--summary" mode
  [08/14]: shortlog: optimize out useless "<none>" normalization
  [09/14]: shortlog: optimize out useless string list
  [10/14]: shortlog: change "author" variables to "ident"
  [11/14]: shortlog: allow grouping by committer ident
  [12/14]: trailer: factor out config reading
  [13/14]: trailer: add interface for parsing commit trailers
  [14/14]: shortlog: match commit trailers with --ident

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/14] move string functions out of git-compat-util
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
@ 2015-12-29  7:19 ` Jeff King
  2015-12-29  7:20 ` [PATCH 02/14] log: refactor add_header to drop some magic numbers Jeff King
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:19 UTC (permalink / raw)
  To: git

We have several inline string helpers (skip_prefix, etc)
defined in git-compat-util.h. This is convenient, as it
makes them available everywhere, but this is probably not
the best place for them. Besides cluttering a file which is
otherwise mostly about compatibility issues, defining inline
functions in git-compat-util.h restricts what they can do.

For instance, if we wanted to use tolower() in one of the
functions, we run into an ordering problem with our
redefinitions of tolower(), which are done later in
git-compat-util.h.

This patch moves them to their own header file, strutil.h.
However, it also includes them in strbuf.h (which needs them
for its inline strbuf_strip_suffix). This also puts them
indirectly into cache.h, which means they are effectively
available everywhere (just as they were before).

Signed-off-by: Jeff King <peff@peff.net>
---
 git-compat-util.h | 63 ---------------------------------------------------
 strbuf.h          |  2 ++
 strutil.h         | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 63 deletions(-)
 create mode 100644 strutil.h

diff --git a/git-compat-util.h b/git-compat-util.h
index 2da0a75..da67dc6 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -430,69 +430,6 @@ extern void set_error_routine(void (*routine)(const char *err, va_list params));
 extern void set_die_is_recursing_routine(int (*routine)(void));
 extern void set_error_handle(FILE *);
 
-extern int starts_with(const char *str, const char *prefix);
-
-/*
- * If the string "str" begins with the string found in "prefix", return 1.
- * The "out" parameter is set to "str + strlen(prefix)" (i.e., to the point in
- * the string right after the prefix).
- *
- * Otherwise, return 0 and leave "out" untouched.
- *
- * Examples:
- *
- *   [extract branch name, fail if not a branch]
- *   if (!skip_prefix(ref, "refs/heads/", &branch)
- *	return -1;
- *
- *   [skip prefix if present, otherwise use whole string]
- *   skip_prefix(name, "refs/heads/", &name);
- */
-static inline int skip_prefix(const char *str, const char *prefix,
-			      const char **out)
-{
-	do {
-		if (!*prefix) {
-			*out = str;
-			return 1;
-		}
-	} while (*str++ == *prefix++);
-	return 0;
-}
-
-/*
- * If buf ends with suffix, return 1 and subtract the length of the suffix
- * from *len. Otherwise, return 0 and leave *len untouched.
- */
-static inline int strip_suffix_mem(const char *buf, size_t *len,
-				   const char *suffix)
-{
-	size_t suflen = strlen(suffix);
-	if (*len < suflen || memcmp(buf + (*len - suflen), suffix, suflen))
-		return 0;
-	*len -= suflen;
-	return 1;
-}
-
-/*
- * If str ends with suffix, return 1 and set *len to the size of the string
- * without the suffix. Otherwise, return 0 and set *len to the size of the
- * string.
- *
- * Note that we do _not_ NUL-terminate str to the new length.
- */
-static inline int strip_suffix(const char *str, const char *suffix, size_t *len)
-{
-	*len = strlen(str);
-	return strip_suffix_mem(str, len, suffix);
-}
-
-static inline int ends_with(const char *str, const char *suffix)
-{
-	size_t len;
-	return strip_suffix(str, suffix, &len);
-}
-
 #if defined(NO_MMAP) || defined(USE_WIN32_MMAP)
 
 #ifndef PROT_READ
diff --git a/strbuf.h b/strbuf.h
index 7123fca..a1a2df9 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -1,6 +1,8 @@
 #ifndef STRBUF_H
 #define STRBUF_H
 
+#include "strutil.h"
+
 /**
  * strbuf's are meant to be used with all the usual C string and memory
  * APIs. Given that the length of the buffer is known, it's often better to
diff --git a/strutil.h b/strutil.h
new file mode 100644
index 0000000..4fa2071
--- /dev/null
+++ b/strutil.h
@@ -0,0 +1,67 @@
+#ifndef STRUTIL_H
+#define STRUTIL_H
+
+extern int starts_with(const char *str, const char *prefix);
+
+/*
+ * If the string "str" begins with the string found in "prefix", return 1.
+ * The "out" parameter is set to "str + strlen(prefix)" (i.e., to the point in
+ * the string right after the prefix).
+ *
+ * Otherwise, return 0 and leave "out" untouched.
+ *
+ * Examples:
+ *
+ *   [extract branch name, fail if not a branch]
+ *   if (!skip_prefix(ref, "refs/heads/", &branch)
+ *	return -1;
+ *
+ *   [skip prefix if present, otherwise use whole string]
+ *   skip_prefix(name, "refs/heads/", &name);
+ */
+static inline int skip_prefix(const char *str, const char *prefix,
+			      const char **out)
+{
+	do {
+		if (!*prefix) {
+			*out = str;
+			return 1;
+		}
+	} while (*str++ == *prefix++);
+	return 0;
+}
+
+/*
+ * If buf ends with suffix, return 1 and subtract the length of the suffix
+ * from *len. Otherwise, return 0 and leave *len untouched.
+ */
+static inline int strip_suffix_mem(const char *buf, size_t *len,
+				   const char *suffix)
+{
+	size_t suflen = strlen(suffix);
+	if (*len < suflen || memcmp(buf + (*len - suflen), suffix, suflen))
+		return 0;
+	*len -= suflen;
+	return 1;
+}
+
+/*
+ * If str ends with suffix, return 1 and set *len to the size of the string
+ * without the suffix. Otherwise, return 0 and set *len to the size of the
+ * string.
+ *
+ * Note that we do _not_ NUL-terminate str to the new length.
+ */
+static inline int strip_suffix(const char *str, const char *suffix, size_t *len)
+{
+	*len = strlen(str);
+	return strip_suffix_mem(str, len, suffix);
+}
+
+static inline int ends_with(const char *str, const char *suffix)
+{
+	size_t len;
+	return strip_suffix(str, suffix, &len);
+}
+
+#endif /* STRUTIL_H */
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/14] log: refactor add_header to drop some magic numbers
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
  2015-12-29  7:19 ` [PATCH 01/14] move string functions out of git-compat-util Jeff King
@ 2015-12-29  7:20 ` Jeff King
  2015-12-31  6:21   ` Eric Sunshine
  2015-12-29  7:22 ` [PATCH 03/14] strutil: add skip_prefix_icase Jeff King
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:20 UTC (permalink / raw)
  To: git

We want to chomp newlines off the end of the "value" string.
But because it's const, we must track its length rather than
writing a NUL. This leads to us having to tweak that length
later, to account for moving the pointer forward.

Since we are about to create a copy of it anyway, let's just
wait and chomp at the end.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/log.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index e00cea7..76823e6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -675,21 +675,18 @@ static struct string_list extra_cc;
 static void add_header(const char *value)
 {
 	struct string_list_item *item;
-	int len = strlen(value);
-	while (len && value[len - 1] == '\n')
-		len--;
+	size_t len;
 
-	if (!strncasecmp(value, "to: ", 4)) {
+	if (!strncasecmp(value, "to: ", 4))
 		item = string_list_append(&extra_to, value + 4);
-		len -= 4;
-	} else if (!strncasecmp(value, "cc: ", 4)) {
+	else if (!strncasecmp(value, "cc: ", 4))
 		item = string_list_append(&extra_cc, value + 4);
-		len -= 4;
-	} else {
+	else
 		item = string_list_append(&extra_hdr, value);
-	}
 
-	item->string[len] = '\0';
+	len = strlen(item->string);
+	while (len && item->string[len - 1] == '\n')
+		item->string[--len] = '\0';
 }
 
 #define THREAD_SHALLOW 1
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/14] strutil: add skip_prefix_icase
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
  2015-12-29  7:19 ` [PATCH 01/14] move string functions out of git-compat-util Jeff King
  2015-12-29  7:20 ` [PATCH 02/14] log: refactor add_header to drop some magic numbers Jeff King
@ 2015-12-29  7:22 ` Jeff King
  2015-12-31  6:40   ` Eric Sunshine
  2015-12-29  7:27 ` [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines Jeff King
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:22 UTC (permalink / raw)
  To: git

Some sites that otherwise would use skip_prefix cannot do
so, because it has no way to do case-insensitive
comparisons. Such sites usually get around this by using
strncasecmp, at the cost of having to use magic numbers.
We can help them by providing a case-insensitive version of
skip_prefix.

Unfortunately, we don't share any code with the original
skip_prefix. Since this is performance-sensitive code, we
would not want to introduce an extra "do we are about case?"
conditional into the middle of the loop. We could instead
use macros or another technique to generate the
almost-identical implementations, but the function simply
isn't long enough to merit that confusing boilerplate.

To show off the new function, we convert a simple case in
log's add_header function.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/log.c |  8 ++++----
 strutil.h     | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 76823e6..44c6b26 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -677,10 +677,10 @@ static void add_header(const char *value)
 	struct string_list_item *item;
 	size_t len;
 
-	if (!strncasecmp(value, "to: ", 4))
-		item = string_list_append(&extra_to, value + 4);
-	else if (!strncasecmp(value, "cc: ", 4))
-		item = string_list_append(&extra_cc, value + 4);
+	if (skip_prefix_icase(value, "to: ", &value))
+		item = string_list_append(&extra_to, value);
+	else if (skip_prefix_icase(value, "cc: ", &value))
+		item = string_list_append(&extra_cc, value);
 	else
 		item = string_list_append(&extra_hdr, value);
 
diff --git a/strutil.h b/strutil.h
index 4fa2071..bb4c02d 100644
--- a/strutil.h
+++ b/strutil.h
@@ -32,6 +32,21 @@ static inline int skip_prefix(const char *str, const char *prefix,
 }
 
 /*
+ * Identical to skip_prefix, but compare characters case-insensitively.
+ */
+static inline int skip_prefix_icase(const char *str, const char *prefix,
+				    const char **out)
+{
+	do {
+		if (!*prefix) {
+			*out = str;
+			return 1;
+		}
+	} while (tolower(*str++) == tolower(*prefix++));
+	return 0;
+}
+
+/*
  * If buf ends with suffix, return 1 and subtract the length of the suffix
  * from *len. Otherwise, return 0 and leave *len untouched.
  */
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (2 preceding siblings ...)
  2015-12-29  7:22 ` [PATCH 03/14] strutil: add skip_prefix_icase Jeff King
@ 2015-12-29  7:27 ` Jeff King
  2015-12-31  6:47   ` Eric Sunshine
  2015-12-29  7:28 ` [PATCH 05/14] shortlog: use strbufs to read from stdin Jeff King
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:27 UTC (permalink / raw)
  To: git

Because we must match both "Author" and "author" here, we
could not use skip_prefix, and had to hand-code a partial
case-insensitive match. Now that we have skip_prefix_case,
we can use it. This is technically more liberal in what it
matches (e.g., it will match AUTHOR), but in this particular
case that that's OK (we are matching git-log output, so we
expect arbitrary data like commit headers to be indented).

In addition to being easier to read, this will make the code
easier to adapt for matching other lines.

Signed-off-by: Jeff King <peff@peff.net>
---
To be honest, I'm not sure what the original was trying for. I assumed
it was to match "log --raw" output, but because we always expect the
colon, it doesn't. I think this may actually have broken in b8ec592
(Build in shortlog, 2006-10-22); the original perl script looked for:

  /^[Aa]uthor:?\s*/

We could re-fix that, I guess, but it seems clear that nobody actually
cares (and anybody sane uses the builtin traversal these days anyway).

We could similarly drop the case-insensitivity, as I don't think it is
helping anyone in practice (though that is less easy to know).

 builtin/shortlog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 35ebd17..86e277a 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -94,8 +94,8 @@ static void read_from_stdin(struct shortlog *log)
 	char author[1024], oneline[1024];
 
 	while (fgets(author, sizeof(author), stdin) != NULL) {
-		if (!(author[0] == 'A' || author[0] == 'a') ||
-		    !starts_with(author + 1, "uthor: "))
+		const char *v;
+		if (!skip_prefix_icase(author, "Author: ", &v))
 			continue;
 		while (fgets(oneline, sizeof(oneline), stdin) &&
 		       oneline[0] != '\n')
@@ -103,7 +103,7 @@ static void read_from_stdin(struct shortlog *log)
 		while (fgets(oneline, sizeof(oneline), stdin) &&
 		       oneline[0] == '\n')
 			; /* discard blanks */
-		insert_one_record(log, author + 8, oneline);
+		insert_one_record(log, v, oneline);
 	}
 }
 
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/14] shortlog: use strbufs to read from stdin
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (3 preceding siblings ...)
  2015-12-29  7:27 ` [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines Jeff King
@ 2015-12-29  7:28 ` Jeff King
  2015-12-29  7:29 ` [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer Jeff King
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:28 UTC (permalink / raw)
  To: git

We currently use fixed-size buffers with fgets(), which
could lead to incorrect results in the unlikely event that a
line had something like "Author:" at exactly its 1024th
character.

But it's easy to convert this to a strbuf, and because we
can reuse the same buffer through the loop, we don't even
pay the extra allocation cost.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 86e277a..668cdb4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -91,20 +91,23 @@ static void insert_one_record(struct shortlog *log,
 
 static void read_from_stdin(struct shortlog *log)
 {
-	char author[1024], oneline[1024];
+	struct strbuf author = STRBUF_INIT;
+	struct strbuf oneline = STRBUF_INIT;
 
-	while (fgets(author, sizeof(author), stdin) != NULL) {
+	while (strbuf_getline(&author, stdin, '\n') != EOF) {
 		const char *v;
-		if (!skip_prefix_icase(author, "Author: ", &v))
+		if (!skip_prefix_icase(author.buf, "Author: ", &v))
 			continue;
-		while (fgets(oneline, sizeof(oneline), stdin) &&
-		       oneline[0] != '\n')
+		while (strbuf_getline(&oneline, stdin, '\n') != EOF &&
+		       oneline.len)
 			; /* discard headers */
-		while (fgets(oneline, sizeof(oneline), stdin) &&
-		       oneline[0] == '\n')
+		while (strbuf_getline(&oneline, stdin, '\n') != EOF &&
+		       !oneline.len)
 			; /* discard blanks */
-		insert_one_record(log, v, oneline);
+		insert_one_record(log, v, oneline.buf);
 	}
+	strbuf_release(&author);
+	strbuf_release(&oneline);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (4 preceding siblings ...)
  2015-12-29  7:28 ` [PATCH 05/14] shortlog: use strbufs to read from stdin Jeff King
@ 2015-12-29  7:29 ` Jeff King
  2016-01-04  9:43   ` Eric Sunshine
  2015-12-29  7:30 ` [PATCH 07/14] shortlog: optimize "--summary" mode Jeff King
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:29 UTC (permalink / raw)
  To: git

When gathering the author and oneline subject for each
commit, we hand-parse the commit headers to find the
"author" line, and then continue past to the blank line at
the end of the header.

We can replace this tricky hand-parsing by simply asking the
pretty-printer for the relevant items. This also decouples
the author and oneline parsing, opening up some new
optimizations in further commits.

One reason to avoid the pretty-printer is that it might be
less efficient than hand-parsing. However, I measured no
slowdown at all running "git shortlog -ns HEAD" on
linux.git.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 57 +++++++++++++++++++++---------------------------------
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 668cdb4..10c92fa 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -112,45 +112,32 @@ static void read_from_stdin(struct shortlog *log)
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	const char *author = NULL, *buffer;
-	struct strbuf buf = STRBUF_INIT;
-	struct strbuf ufbuf = STRBUF_INIT;
-
-	pp_commit_easy(CMIT_FMT_RAW, commit, &buf);
-	buffer = buf.buf;
-	while (*buffer && *buffer != '\n') {
-		const char *eol = strchr(buffer, '\n');
-
-		if (eol == NULL)
-			eol = buffer + strlen(buffer);
-		else
-			eol++;
-
-		if (starts_with(buffer, "author "))
-			author = buffer + 7;
-		buffer = eol;
-	}
-	if (!author) {
+	struct strbuf author = STRBUF_INIT;
+	struct strbuf oneline = STRBUF_INIT;
+	struct pretty_print_context ctx = {0};
+
+	ctx.fmt = CMIT_FMT_USERFORMAT;
+	ctx.abbrev = log->abbrev;
+	ctx.subject = "";
+	ctx.after_subject = "";
+	ctx.date_mode.type = DATE_NORMAL;
+	ctx.output_encoding = get_log_output_encoding();
+
+	format_commit_message(commit, "%an <%ae>", &author, &ctx);
+	if (author.len <= 3) {
 		warning(_("Missing author: %s"),
 		    oid_to_hex(&commit->object.oid));
 		return;
 	}
-	if (log->user_format) {
-		struct pretty_print_context ctx = {0};
-		ctx.fmt = CMIT_FMT_USERFORMAT;
-		ctx.abbrev = log->abbrev;
-		ctx.subject = "";
-		ctx.after_subject = "";
-		ctx.date_mode.type = DATE_NORMAL;
-		ctx.output_encoding = get_log_output_encoding();
-		pretty_print_commit(&ctx, commit, &ufbuf);
-		buffer = ufbuf.buf;
-	} else if (*buffer) {
-		buffer++;
-	}
-	insert_one_record(log, author, !*buffer ? "<none>" : buffer);
-	strbuf_release(&ufbuf);
-	strbuf_release(&buf);
+
+	if (log->user_format)
+		pretty_print_commit(&ctx, commit, &oneline);
+	else
+		format_commit_message(commit, "%s", &oneline, &ctx);
+
+	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
+	strbuf_release(&author);
+	strbuf_release(&oneline);
 }
 
 static void get_from_rev(struct rev_info *rev, struct shortlog *log)
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/14] shortlog: optimize "--summary" mode
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (5 preceding siblings ...)
  2015-12-29  7:29 ` [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer Jeff King
@ 2015-12-29  7:30 ` Jeff King
  2015-12-29  7:31 ` [PATCH 08/14] shortlog: optimize out useless "<none>" normalization Jeff King
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:30 UTC (permalink / raw)
  To: git

If the user asked us only to show counts for each author,
rather than the individual summary lines, then there is no
point in us generating the summaries only to throw them
away. With this patch, I measured the following speedup for
"git shortlog -ns HEAD" on linux.git (best-of-five):

  [before]
  real    0m5.644s
  user    0m5.472s
  sys     0m0.176s

  [after]
  real    0m5.257s
  user    0m5.104s
  sys     0m0.156s

That's only ~7%, but it's so easy to do, there's no good
reason not to. We don't have to touch any downstream code,
since we already fill in the magic string "<none>" to handle
commits without a message.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 10c92fa..af24d84 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -130,10 +130,12 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		return;
 	}
 
-	if (log->user_format)
-		pretty_print_commit(&ctx, commit, &oneline);
-	else
-		format_commit_message(commit, "%s", &oneline, &ctx);
+	if (!log->summary) {
+		if (log->user_format)
+			pretty_print_commit(&ctx, commit, &oneline);
+		else
+			format_commit_message(commit, "%s", &oneline, &ctx);
+	}
 
 	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
 	strbuf_release(&author);
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/14] shortlog: optimize out useless "<none>" normalization
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (6 preceding siblings ...)
  2015-12-29  7:30 ` [PATCH 07/14] shortlog: optimize "--summary" mode Jeff King
@ 2015-12-29  7:31 ` Jeff King
  2015-12-29  7:32 ` [PATCH 09/14] shortlog: optimize out useless string list Jeff King
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:31 UTC (permalink / raw)
  To: git

If we are in --summary mode, we will always pass <none> to
insert_one_record, which will then do some normalization
(e.g., cutting out "[PATCH]"). There's no point in doing so
if we aren't going to use the result anyway.

This drops my best-of-five for "git shortlog -ns HEAD" on
linux.git from:

  real    0m5.257s
  user    0m5.104s
  sys     0m0.156s

to:

  real    0m5.194s
  user    0m5.028s
  sys     0m0.168s

That's only 1%, but arguably the result is clearer to read,
as we're able to group our variable declarations inside the
conditional block. It also opens up further optimization
possibilities for future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 63 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index af24d84..400e03b 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -31,13 +31,9 @@ static void insert_one_record(struct shortlog *log,
 			      const char *author,
 			      const char *oneline)
 {
-	const char *dot3 = log->common_repo_prefix;
-	char *buffer, *p;
 	struct string_list_item *item;
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
-	const char *eol;
-	struct strbuf subject = STRBUF_INIT;
 	struct strbuf namemailbuf = STRBUF_INIT;
 	struct ident_split ident;
 
@@ -59,34 +55,43 @@ static void insert_one_record(struct shortlog *log,
 	if (item->util == NULL)
 		item->util = xcalloc(1, sizeof(struct string_list));
 
-	/* Skip any leading whitespace, including any blank lines. */
-	while (*oneline && isspace(*oneline))
-		oneline++;
-	eol = strchr(oneline, '\n');
-	if (!eol)
-		eol = oneline + strlen(oneline);
-	if (starts_with(oneline, "[PATCH")) {
-		char *eob = strchr(oneline, ']');
-		if (eob && (!eol || eob < eol))
-			oneline = eob + 1;
-	}
-	while (*oneline && isspace(*oneline) && *oneline != '\n')
-		oneline++;
-	format_subject(&subject, oneline, " ");
-	buffer = strbuf_detach(&subject, NULL);
-
-	if (dot3) {
-		int dot3len = strlen(dot3);
-		if (dot3len > 5) {
-			while ((p = strstr(buffer, dot3)) != NULL) {
-				int taillen = strlen(p) - dot3len;
-				memcpy(p, "/.../", 5);
-				memmove(p + 5, p + dot3len, taillen + 1);
+	if (log->summary)
+		string_list_append(item->util, xstrdup(""));
+	else {
+		const char *dot3 = log->common_repo_prefix;
+		char *buffer, *p;
+		struct strbuf subject = STRBUF_INIT;
+		const char *eol;
+
+		/* Skip any leading whitespace, including any blank lines. */
+		while (*oneline && isspace(*oneline))
+			oneline++;
+		eol = strchr(oneline, '\n');
+		if (!eol)
+			eol = oneline + strlen(oneline);
+		if (starts_with(oneline, "[PATCH")) {
+			char *eob = strchr(oneline, ']');
+			if (eob && (!eol || eob < eol))
+				oneline = eob + 1;
+		}
+		while (*oneline && isspace(*oneline) && *oneline != '\n')
+			oneline++;
+		format_subject(&subject, oneline, " ");
+		buffer = strbuf_detach(&subject, NULL);
+
+		if (dot3) {
+			int dot3len = strlen(dot3);
+			if (dot3len > 5) {
+				while ((p = strstr(buffer, dot3)) != NULL) {
+					int taillen = strlen(p) - dot3len;
+					memcpy(p, "/.../", 5);
+					memmove(p + 5, p + dot3len, taillen + 1);
+				}
 			}
 		}
-	}
 
-	string_list_append(item->util, buffer);
+		string_list_append(item->util, buffer);
+	}
 }
 
 static void read_from_stdin(struct shortlog *log)
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/14] shortlog: optimize out useless string list
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (7 preceding siblings ...)
  2015-12-29  7:31 ` [PATCH 08/14] shortlog: optimize out useless "<none>" normalization Jeff King
@ 2015-12-29  7:32 ` Jeff King
  2015-12-29  7:32 ` [PATCH 10/14] shortlog: change "author" variables to "ident" Jeff King
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:32 UTC (permalink / raw)
  To: git

If we are in "--summary" mode, then we do not care about the
actual list of subject onelines associated with each author.
We care only about the number. So rather than store a
string-list for each author full of "<none>", let's just
keep a count.

This drops my best-of-five for "git shortlog -ns HEAD" on
linux.git from:

  real    0m5.194s
  user    0m5.028s
  sys     0m0.168s

to:

  real    0m5.057s
  user    0m4.916s
  sys     0m0.144s

That's about 2.5%.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 400e03b..7ec6b76 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -14,7 +14,26 @@ static char const * const shortlog_usage[] = {
 	NULL
 };
 
-static int compare_by_number(const void *a1, const void *a2)
+/*
+ * The util field of our string_list_items will contain one of two things:
+ *
+ *   - if --summary is not in use, it will point to a string list of the
+ *     oneline subjects assigned to this author
+ *
+ *   - if --summary is in use, we don't need that list; we only need to know
+ *     its size. So we abuse the pointer slot to store our integer counter.
+ *
+ *  This macro accesses the latter.
+ */
+#define UTIL_TO_INT(x) ((intptr_t)(x)->util)
+
+static int compare_by_counter(const void *a1, const void *a2)
+{
+	const struct string_list_item *i1 = a1, *i2 = a2;
+	return UTIL_TO_INT(i2) - UTIL_TO_INT(i1);
+}
+
+static int compare_by_list(const void *a1, const void *a2)
 {
 	const struct string_list_item *i1 = a1, *i2 = a2;
 	const struct string_list *l1 = i1->util, *l2 = i2->util;
@@ -52,11 +71,9 @@ static void insert_one_record(struct shortlog *log,
 		strbuf_addf(&namemailbuf, " <%.*s>", (int)maillen, mailbuf);
 
 	item = string_list_insert(&log->list, namemailbuf.buf);
-	if (item->util == NULL)
-		item->util = xcalloc(1, sizeof(struct string_list));
 
 	if (log->summary)
-		string_list_append(item->util, xstrdup(""));
+		item->util = (void *)(UTIL_TO_INT(item) + 1);
 	else {
 		const char *dot3 = log->common_repo_prefix;
 		char *buffer, *p;
@@ -90,6 +107,8 @@ static void insert_one_record(struct shortlog *log,
 			}
 		}
 
+		if (item->util == NULL)
+			item->util = xcalloc(1, sizeof(struct string_list));
 		string_list_append(item->util, buffer);
 	}
 }
@@ -291,14 +310,14 @@ void shortlog_output(struct shortlog *log)
 
 	if (log->sort_by_number)
 		qsort(log->list.items, log->list.nr, sizeof(struct string_list_item),
-			compare_by_number);
+		      log->summary ? compare_by_counter : compare_by_list);
 	for (i = 0; i < log->list.nr; i++) {
-		struct string_list *onelines = log->list.items[i].util;
-
+		const struct string_list_item *item = &log->list.items[i];
 		if (log->summary) {
-			printf("%6d\t%s\n", onelines->nr, log->list.items[i].string);
+			printf("%6d\t%s\n", (int)UTIL_TO_INT(item), item->string);
 		} else {
-			printf("%s (%d):\n", log->list.items[i].string, onelines->nr);
+			struct string_list *onelines = item->util;
+			printf("%s (%d):\n", item->string, onelines->nr);
 			for (j = onelines->nr - 1; j >= 0; j--) {
 				const char *msg = onelines->items[j].string;
 
@@ -311,11 +330,11 @@ void shortlog_output(struct shortlog *log)
 					printf("      %s\n", msg);
 			}
 			putchar('\n');
+			onelines->strdup_strings = 1;
+			string_list_clear(onelines, 0);
+			free(onelines);
 		}
 
-		onelines->strdup_strings = 1;
-		string_list_clear(onelines, 0);
-		free(onelines);
 		log->list.items[i].util = NULL;
 	}
 
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/14] shortlog: change "author" variables to "ident"
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (8 preceding siblings ...)
  2015-12-29  7:32 ` [PATCH 09/14] shortlog: optimize out useless string list Jeff King
@ 2015-12-29  7:32 ` Jeff King
  2015-12-29  7:35 ` [PATCH 11/14] shortlog: allow grouping by committer ident Jeff King
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:32 UTC (permalink / raw)
  To: git

This is in preparation for shortlog counting more things
than just authors. Breaking it out into a separate patch
keeps the noise down when the real changes come.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 7ec6b76..d7eb0cb 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -47,7 +47,7 @@ static int compare_by_list(const void *a1, const void *a2)
 }
 
 static void insert_one_record(struct shortlog *log,
-			      const char *author,
+			      const char *ident_string,
 			      const char *oneline)
 {
 	struct string_list_item *item;
@@ -56,7 +56,7 @@ static void insert_one_record(struct shortlog *log,
 	struct strbuf namemailbuf = STRBUF_INIT;
 	struct ident_split ident;
 
-	if (split_ident_line(&ident, author, strlen(author)))
+	if (split_ident_line(&ident, ident_string, strlen(ident_string)))
 		return;
 
 	namebuf = ident.name_begin;
@@ -115,12 +115,12 @@ static void insert_one_record(struct shortlog *log,
 
 static void read_from_stdin(struct shortlog *log)
 {
-	struct strbuf author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 
-	while (strbuf_getline(&author, stdin, '\n') != EOF) {
+	while (strbuf_getline(&ident, stdin, '\n') != EOF) {
 		const char *v;
-		if (!skip_prefix_icase(author.buf, "Author: ", &v))
+		if (!skip_prefix_icase(ident.buf, "Author: ", &v))
 			continue;
 		while (strbuf_getline(&oneline, stdin, '\n') != EOF &&
 		       oneline.len)
@@ -130,13 +130,13 @@ static void read_from_stdin(struct shortlog *log)
 			; /* discard blanks */
 		insert_one_record(log, v, oneline.buf);
 	}
-	strbuf_release(&author);
+	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
 
@@ -147,8 +147,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.output_encoding = get_log_output_encoding();
 
-	format_commit_message(commit, "%an <%ae>", &author, &ctx);
-	if (author.len <= 3) {
+	format_commit_message(commit, "%an <%ae>", &ident, &ctx);
+	if (ident.len <= 3) {
 		warning(_("Missing author: %s"),
 		    oid_to_hex(&commit->object.oid));
 		return;
@@ -161,8 +161,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
 
-	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
-	strbuf_release(&author);
+	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
+	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/14] shortlog: allow grouping by committer ident
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (9 preceding siblings ...)
  2015-12-29  7:32 ` [PATCH 10/14] shortlog: change "author" variables to "ident" Jeff King
@ 2015-12-29  7:35 ` Jeff King
  2016-01-04  9:44   ` Eric Sunshine
  2015-12-29  7:35 ` [PATCH 12/14] trailer: factor out config reading Jeff King
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:35 UTC (permalink / raw)
  To: git

Shortlog always groups commits by the author field. It can
be interesting to group by other fields, as well. The
obvious other identity in each commit is the committer
field. This might be interesting if your project follows a
workflow where committers and authors are not the same and
where there is not simply one maintainer picking up all of
the patches.

For instance, you can use this in git.git to see interim and
subsystem maintainers. Likewise, you can see in linux.git
the patches picked up by lieutenants and merged into Linus's
tree.

This patch also provides some of the necessary
infrastructure for adding more ident types (e.g., from
trailers) in the future.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  8 +++++++
 builtin/shortlog.c             | 51 +++++++++++++++++++++++++++++++++++++-----
 shortlog.h                     |  5 +++++
 t/t4201-shortlog.sh            | 13 +++++++++++
 4 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 31af7f2..a89a01e 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,6 +47,14 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--ident=<type>::
+	By default, `shortlog` collects and collates author identities;
+	using `--ident` will collect other types of identity. If
+	`<type>` is:
++
+ - `author`, commits are grouped by author (this is the default)
+ - `committer`, commits are grouped by committer
+
 -w[<width>[,<indent1>[,<indent2>]]]::
 	Linewrap the output by wrapping each line at `width`.  The first
 	line of each entry is indented by `indent1` spaces, and the second
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d7eb0cb..39da2d4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -115,12 +115,22 @@ static void insert_one_record(struct shortlog *log,
 
 static void read_from_stdin(struct shortlog *log)
 {
+	const char *ident_header = NULL;
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 
+	switch (log->ident_type) {
+	case SHORTLOG_IDENT_AUTHOR:
+		ident_header = "Author: ";
+		break;
+	case SHORTLOG_IDENT_COMMITTER:
+		ident_header = "Commit: ";
+		break;
+	}
+
 	while (strbuf_getline(&ident, stdin, '\n') != EOF) {
 		const char *v;
-		if (!skip_prefix_icase(ident.buf, "Author: ", &v))
+		if (!skip_prefix_icase(ident.buf, ident_header, &v))
 			continue;
 		while (strbuf_getline(&oneline, stdin, '\n') != EOF &&
 		       oneline.len)
@@ -147,11 +157,23 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.output_encoding = get_log_output_encoding();
 
-	format_commit_message(commit, "%an <%ae>", &ident, &ctx);
-	if (ident.len <= 3) {
-		warning(_("Missing author: %s"),
-		    oid_to_hex(&commit->object.oid));
-		return;
+	switch (log->ident_type) {
+	case SHORTLOG_IDENT_AUTHOR:
+		format_commit_message(commit, "%an <%ae>", &ident, &ctx);
+		if (ident.len <= 3) {
+			warning(_("Missing author: %s"),
+				oid_to_hex(&commit->object.oid));
+			return;
+		}
+		break;
+	case SHORTLOG_IDENT_COMMITTER:
+		format_commit_message(commit, "%cn <%ce>", &ident, &ctx);
+		if (ident.len <= 3) {
+			warning(_("Missing committer: %s"),
+				oid_to_hex(&commit->object.oid));
+			return;
+		}
+		break;
 	}
 
 	if (!log->summary) {
@@ -223,6 +245,21 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int parse_ident_option(const struct option *opt, const char *arg, int unset)
+{
+	struct shortlog *log = opt->value;
+
+	if (unset || !strcasecmp(arg, "author"))
+		log->ident_type = SHORTLOG_IDENT_AUTHOR;
+	else if (!strcasecmp(arg, "committer"))
+		log->ident_type = SHORTLOG_IDENT_COMMITTER;
+	else
+		die("unknown ident type: %s", arg);
+
+	return 0;
+}
+
+
 void shortlog_init(struct shortlog *log)
 {
 	memset(log, 0, sizeof(*log));
@@ -250,6 +287,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 			 N_("Show the email address of each author")),
 		{ OPTION_CALLBACK, 'w', NULL, &log, N_("w[,i1[,i2]]"),
 			N_("Linewrap output"), PARSE_OPT_OPTARG, &parse_wrap_args },
+		{ OPTION_CALLBACK, 0, "ident", &log, N_("field"),
+			N_("Use ident from field"), 0, parse_ident_option },
 		OPT_END(),
 	};
 
diff --git a/shortlog.h b/shortlog.h
index de4f86f..a365620 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -14,6 +14,11 @@ struct shortlog {
 	int user_format;
 	int abbrev;
 
+	enum {
+		SHORTLOG_IDENT_AUTHOR = 0,
+		SHORTLOG_IDENT_COMMITTER
+	} ident_type;
+
 	char *common_repo_prefix;
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 7600a3e..867a7ae 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -194,4 +194,17 @@ test_expect_success 'shortlog with revision pseudo options' '
 	git shortlog --exclude=refs/heads/m* --all
 '
 
+test_expect_success 'shortlog --ident=committer (internal)' '
+	cat >expect <<-\EOF &&
+	     5	C O Mitter
+	EOF
+	git shortlog -ns --ident=committer HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog --ident=committer (external)' '
+	git log --format=full | git shortlog -ns --ident=committer >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/14] trailer: factor out config reading
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (10 preceding siblings ...)
  2015-12-29  7:35 ` [PATCH 11/14] shortlog: allow grouping by committer ident Jeff King
@ 2015-12-29  7:35 ` Jeff King
  2015-12-29  7:36 ` [PATCH 13/14] trailer: add interface for parsing commit trailers Jeff King
  2015-12-29  7:38 ` [PATCH 14/14] shortlog: match commit trailers with --ident Jeff King
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:35 UTC (permalink / raw)
  To: git

The only entry point into the trailer.c code is
process_trailers, which initializes itself by reading the
trailer config. Let's pull that out into a separate function
and protect it with a lazy-initialization flag. That will
let us easily add new entry points in future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/trailer.c b/trailer.c
index 6f3416f..18bf209 100644
--- a/trailer.c
+++ b/trailer.c
@@ -842,6 +842,19 @@ static void free_all(struct trailer_item **first)
 	}
 }
 
+static void init_trailer_config(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return;
+
+	git_config(git_trailer_default_config, NULL);
+	git_config(git_trailer_config, NULL);
+
+	initialized = 1;
+}
+
 void process_trailers(const char *file, int trim_empty, struct string_list *trailers)
 {
 	struct trailer_item *in_tok_first = NULL;
@@ -850,10 +863,7 @@ void process_trailers(const char *file, int trim_empty, struct string_list *trai
 	struct strbuf **lines;
 	int trailer_end;
 
-	/* Default config must be setup first */
-	git_config(git_trailer_default_config, NULL);
-	git_config(git_trailer_config, NULL);
-
+	init_trailer_config();
 	lines = read_input_file(file);
 
 	/* Print the lines before the trailers */
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/14] trailer: add interface for parsing commit trailers
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (11 preceding siblings ...)
  2015-12-29  7:35 ` [PATCH 12/14] trailer: factor out config reading Jeff King
@ 2015-12-29  7:36 ` Jeff King
  2015-12-29  7:38 ` [PATCH 14/14] shortlog: match commit trailers with --ident Jeff King
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:36 UTC (permalink / raw)
  To: git

The git-trailer command and its subfunctions work in only
one way: they take a set of trailer operations, and pass
through a file while performing those operations on it.

However, other parts of the system may want to simply parse
trailers, and we should be able to reuse the code here. This
patch provides a simple interface for parsing a commit
message with trailers, iterating over them, and retrieving
individual keys.

The trailer code is a little heavy on the use of strbufs, so
this is perhaps a bit slower than it would be if we were
able to parse the message in place (and this speed matters
when you are iterating over every commit in the repository).
However, there's nothing in this interface that paints us
too far into a corner; we can always optimize the internals
later.

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 trailer.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/trailer.c b/trailer.c
index 18bf209..36ba476 100644
--- a/trailer.c
+++ b/trailer.c
@@ -882,3 +882,46 @@ void process_trailers(const char *file, int trim_empty, struct string_list *trai
 
 	strbuf_list_free(lines);
 }
+
+void trailer_parse_init(struct trailer_parse_context *ctx, const struct strbuf *buf)
+{
+	int nr_lines = 0;
+
+	init_trailer_config();
+
+	ctx->lines = strbuf_split(buf, '\n');
+	while (ctx->lines[nr_lines])
+		nr_lines++;
+
+	ctx->end = find_trailer_end(ctx->lines, nr_lines);
+	ctx->start = find_trailer_start(ctx->lines, ctx->end);
+
+	strbuf_init(&ctx->token, 0);
+	strbuf_init(&ctx->value, 0);
+}
+
+void trailer_parse_clear(struct trailer_parse_context *ctx)
+{
+	strbuf_list_free(ctx->lines);
+	strbuf_release(&ctx->token);
+	strbuf_release(&ctx->value);
+}
+
+const char *trailer_parse_match(struct trailer_parse_context *ctx, int line, const char *match)
+{
+	size_t len;
+
+	if (ctx->lines[line]->buf[0] == comment_line_char)
+		return NULL;
+
+	strbuf_reset(&ctx->token);
+	strbuf_reset(&ctx->value);
+	if (parse_trailer(&ctx->token, &ctx->value, ctx->lines[line]->buf))
+		return NULL;
+
+	len = token_len_without_separator(ctx->token.buf, ctx->token.len);
+	if (strncasecmp(ctx->token.buf, match, len) || match[len])
+		return NULL;
+
+	return ctx->value.buf;
+}
diff --git a/trailer.h b/trailer.h
index 8eb25d5..1f985f6 100644
--- a/trailer.h
+++ b/trailer.h
@@ -3,4 +3,35 @@
 
 void process_trailers(const char *file, int trim_empty, struct string_list *trailers);
 
+struct trailer_parse_context {
+	struct strbuf **lines;
+	int start;
+	int end;
+
+	/* These fields are private to the parser. */
+	struct strbuf token;
+	struct strbuf value;
+};
+
+/*
+ * Parse the commit message found in "buf", looking for trailers. Any data in
+ * ctx is overwritten, and should later be freed with trailer_parse_clear().
+ *
+ * The caller can iterate over all trailers using the "start" and "end" indices
+ * into "lines".
+ */
+void trailer_parse_init(struct trailer_parse_context *ctx, const struct strbuf *buf);
+
+/*
+ * If the line contains a trailer with key "trailer", returns a pointer into
+ * "line" for the value. Otherwise, returns NULL.
+ */
+const char *trailer_parse_match(struct trailer_parse_context *ctx, int line,
+				const char *trailer);
+
+/*
+ * Free resources allocated by trailer_parse_init().
+ */
+void trailer_parse_clear(struct trailer_parse_context *ctx);
+
 #endif /* TRAILER_H */
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/14] shortlog: match commit trailers with --ident
  2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
                   ` (12 preceding siblings ...)
  2015-12-29  7:36 ` [PATCH 13/14] trailer: add interface for parsing commit trailers Jeff King
@ 2015-12-29  7:38 ` Jeff King
  2015-12-29  7:50   ` Jeff King
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:38 UTC (permalink / raw)
  To: git

If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:

  git shortlog -ns --ident=reviewed-by

in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:

  git shortlog --format="...helped %an (%ad)" --ident=helped-by

This does run a bit more slowly than a normal
author-grouping (about 33% slower in my tests). Some of that
is natural, because we have to spend time parsing the
trailers. But I suspect a fair bit of that could be cut off
by optimizing the trailer code, which involves quite a few
more copies of the data than we actually need.

Still, it is certainly fast enough to be usable, and
optimization can come later.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  9 +++++++++
 builtin/shortlog.c             | 38 +++++++++++++++++++++++++++++++++++---
 shortlog.h                     |  4 +++-
 t/t4201-shortlog.sh            |  9 +++++++++
 4 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index a89a01e..16080c4 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -54,6 +54,15 @@ OPTIONS
 +
  - `author`, commits are grouped by author (this is the default)
  - `committer`, commits are grouped by committer
+ - any other value, the `<type>` is interpreted as a case-insensitive
+   commit message trailer (see linkgit:git-interpret-trailers[1]). For
+   example, if your project uses `Reviewed-by` trailers, you might want
+   to see who has been reviewing with
+   `git shortlog -ns --ident=reviewed-by`.
++
+Note that commits that do not include the trailer will not be counted.
+Likewise, commits with multiple trailers (e.g., multiple signoffs) may
+count more than once.
 
 -w[<width>[,<indent1>[,<indent2>]]]::
 	Linewrap the output by wrapping each line at `width`.  The first
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 39da2d4..f774c84 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -8,6 +8,7 @@
 #include "mailmap.h"
 #include "shortlog.h"
 #include "parse-options.h"
+#include "trailer.h"
 
 static char const * const shortlog_usage[] = {
 	N_("git shortlog [<options>] [<revision-range>] [[--] [<path>...]]"),
@@ -126,6 +127,8 @@ static void read_from_stdin(struct shortlog *log)
 	case SHORTLOG_IDENT_COMMITTER:
 		ident_header = "Commit: ";
 		break;
+	case SHORTLOG_IDENT_TRAILER:
+		die(_("sorry, using a trailer --ident with stdin is not supported"));
 	}
 
 	while (strbuf_getline(&ident, stdin, '\n') != EOF) {
@@ -149,6 +152,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
+	char *oneline_str;
 
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
@@ -174,6 +178,12 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			return;
 		}
 		break;
+	case SHORTLOG_IDENT_TRAILER:
+		/*
+		 * We might have multiple matches, so deal with it in the loop
+		 * below.
+		 */
+		break;
 	}
 
 	if (!log->summary) {
@@ -183,7 +193,26 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
 
-	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
+	oneline_str = oneline.len ? oneline.buf : "<none>";
+	if (log->ident_type != SHORTLOG_IDENT_TRAILER)
+		insert_one_record(log, ident.buf, oneline_str);
+	else {
+		struct strbuf msg = STRBUF_INIT;
+		struct trailer_parse_context tp;
+		int i;
+
+		format_commit_message(commit, "%B", &msg, &ctx);
+		trailer_parse_init(&tp, &msg);
+		for (i = tp.start; i < tp.end; i++) {
+			const char *v = trailer_parse_match(&tp, i, log->trailer);
+			if (!v)
+				continue;
+			insert_one_record(log, v, oneline_str);
+		}
+		trailer_parse_clear(&tp);
+		strbuf_release(&msg);
+	}
+
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
@@ -253,8 +282,11 @@ static int parse_ident_option(const struct option *opt, const char *arg, int uns
 		log->ident_type = SHORTLOG_IDENT_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
 		log->ident_type = SHORTLOG_IDENT_COMMITTER;
-	else
-		die("unknown ident type: %s", arg);
+	else {
+		log->ident_type = SHORTLOG_IDENT_TRAILER;
+		free(log->trailer);
+		log->trailer = xstrdup(arg);
+	}
 
 	return 0;
 }
diff --git a/shortlog.h b/shortlog.h
index a365620..718b49d 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -16,8 +16,10 @@ struct shortlog {
 
 	enum {
 		SHORTLOG_IDENT_AUTHOR = 0,
-		SHORTLOG_IDENT_COMMITTER
+		SHORTLOG_IDENT_COMMITTER,
+		SHORTLOG_IDENT_TRAILER,
 	} ident_type;
+	char *trailer;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 867a7ae..9c00ccb 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -207,4 +207,13 @@ test_expect_success 'shortlog --ident=committer (external)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --ident=signed-off-by' '
+	git commit --allow-empty -m foo -s &&
+	cat >expect <<-\EOF &&
+	     1	C O Mitter
+	EOF
+	git shortlog -ns --ident=signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.7.0.rc3.367.g09631da

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 14/14] shortlog: match commit trailers with --ident
  2015-12-29  7:38 ` [PATCH 14/14] shortlog: match commit trailers with --ident Jeff King
@ 2015-12-29  7:50   ` Jeff King
  2016-01-04  9:44     ` Eric Sunshine
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2015-12-29  7:50 UTC (permalink / raw)
  To: git

A few comments on my own patch...

On Tue, Dec 29, 2015 at 02:38:32AM -0500, Jeff King wrote:

> If a project uses commit trailers, this patch lets you use
> shortlog to see who is performing each action. For example,
> running:
> 
>   git shortlog -ns --ident=reviewed-by

So obviously you can do something similar by piping git-log into grep.
What I think this buys us is the ability to use the trailer-parsing code
rather than hacking together a regex. And it lets you do stuff like
this:

> in git.git shows who has reviewed. You can even use a custom
> format to see things like who has helped whom:
> 
>   git shortlog --format="...helped %an (%ad)" --ident=helped-by

Which relies on correlating trailers and their matching commits.

Since there may be multiple entries per commit, this always ends up as:

  Helper Ident (1):
          ...helped Author Ident
	  ...helped Other Author
	  etc

It would be neat to be able to format it so that each line was a full
record (which would make further stats easier). We'd need a new
formatting option for that.

> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> index a89a01e..16080c4 100644
> --- a/Documentation/git-shortlog.txt
> +++ b/Documentation/git-shortlog.txt
> @@ -54,6 +54,15 @@ OPTIONS
>  +
>   - `author`, commits are grouped by author (this is the default)
>   - `committer`, commits are grouped by committer
> + - any other value, the `<type>` is interpreted as a case-insensitive
> +   commit message trailer (see linkgit:git-interpret-trailers[1]). For
> +   example, if your project uses `Reviewed-by` trailers, you might want
> +   to see who has been reviewing with
> +   `git shortlog -ns --ident=reviewed-by`.

This doesn't leave us many syntactic "outs" for adding new ident types
in the future (nor can you match a trailer called "Author"). I guess we
could call this "--ident=trailer:reviewed-by" to be more precise, but
it's more annoying to type. Perhaps we could consider that the
"official" syntax, and DWIM "--ident=foo" to "--ident=trailer:foo", with
the caveat that it may change in the future. I dunno.

> ++
> +Note that commits that do not include the trailer will not be counted.
> +Likewise, commits with multiple trailers (e.g., multiple signoffs) may
> +count more than once.

I think counting multiples is the only sensible thing here. If you have
two helped-by fields, for example it would make sense to credit each
helper. For S-o-b, where the same person may sign-off twice, it does
mean you get credited twice for a single commit. I'm not sure if that's
a feature or not. :)

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/14] log: refactor add_header to drop some magic numbers
  2015-12-29  7:20 ` [PATCH 02/14] log: refactor add_header to drop some magic numbers Jeff King
@ 2015-12-31  6:21   ` Eric Sunshine
  2016-01-01  8:42     ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2015-12-31  6:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:20 AM, Jeff King <peff@peff.net> wrote:
> We want to chomp newlines off the end of the "value" string.
> But because it's const, we must track its length rather than
> writing a NUL. This leads to us having to tweak that length
> later, to account for moving the pointer forward.
>
> Since we are about to create a copy of it anyway, let's just
> wait and chomp at the end.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/log.c b/builtin/log.c
> @@ -675,21 +675,18 @@ static struct string_list extra_cc;
>  static void add_header(const char *value)
>  {
>         struct string_list_item *item;
> -       int len = strlen(value);
> -       while (len && value[len - 1] == '\n')
> -               len--;
> +       size_t len;
>
> -       if (!strncasecmp(value, "to: ", 4)) {
> +       if (!strncasecmp(value, "to: ", 4))
>                 item = string_list_append(&extra_to, value + 4);
> -               len -= 4;
> -       } else if (!strncasecmp(value, "cc: ", 4)) {
> +       else if (!strncasecmp(value, "cc: ", 4))
>                 item = string_list_append(&extra_cc, value + 4);
> -               len -= 4;
> -       } else {
> +       else
>                 item = string_list_append(&extra_hdr, value);
> -       }
>
> -       item->string[len] = '\0';
> +       len = strlen(item->string);
> +       while (len && item->string[len - 1] == '\n')
> +               item->string[--len] = '\0';

Not a strong objection, but this implementation may make the reader
wonder why NUL needs to be assigned to all "stripped" characters. I'd
have expected to see:

    len = strlen(item->string);
    while (len && item->string[len - 1] == '\n')
        len--;
    item->string[len] = '\0';

which indicates clearly that this is a simple truncation rather than
some odd NUL-fill operation, and is slightly more easy to reason about
since it doesn't involve a pre-decrement as an array subscript.

>  }
>
>  #define THREAD_SHALLOW 1
> --
> 2.7.0.rc3.367.g09631da

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/14] strutil: add skip_prefix_icase
  2015-12-29  7:22 ` [PATCH 03/14] strutil: add skip_prefix_icase Jeff King
@ 2015-12-31  6:40   ` Eric Sunshine
  2016-01-01  8:50     ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2015-12-31  6:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:22 AM, Jeff King <peff@peff.net> wrote:
> Some sites that otherwise would use skip_prefix cannot do
> so, because it has no way to do case-insensitive
> comparisons. Such sites usually get around this by using
> strncasecmp, at the cost of having to use magic numbers.
> We can help them by providing a case-insensitive version of
> skip_prefix.
>
> Unfortunately, we don't share any code with the original
> skip_prefix. Since this is performance-sensitive code, we
> would not want to introduce an extra "do we are about case?"
> conditional into the middle of the loop. We could instead
> use macros or another technique to generate the
> almost-identical implementations, but the function simply
> isn't long enough to merit that confusing boilerplate.
>
> To show off the new function, we convert a simple case in
> log's add_header function.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/log.c b/builtin/log.c
> @@ -677,10 +677,10 @@ static void add_header(const char *value)
> -       if (!strncasecmp(value, "to: ", 4))
> -               item = string_list_append(&extra_to, value + 4);
> -       else if (!strncasecmp(value, "cc: ", 4))
> -               item = string_list_append(&extra_cc, value + 4);
> +       if (skip_prefix_icase(value, "to: ", &value))
> +               item = string_list_append(&extra_to, value);
> +       else if (skip_prefix_icase(value, "cc: ", &value))
> +               item = string_list_append(&extra_cc, value);

Is it worth holding this patch, with its introduction of
skip_prefix_icase(), hostage to the unrelated change in the previous
patch (2/14) which touches the same bit of code? Would it make sense
to split this change out?

>         else
>                 item = string_list_append(&extra_hdr, value);
>
> diff --git a/strutil.h b/strutil.h
> @@ -32,6 +32,21 @@ static inline int skip_prefix(const char *str, const char *prefix,
>  /*
> + * Identical to skip_prefix, but compare characters case-insensitively.
> + */
> +static inline int skip_prefix_icase(const char *str, const char *prefix,
> +                                   const char **out)
> +{
> +       do {
> +               if (!*prefix) {
> +                       *out = str;
> +                       return 1;
> +               }
> +       } while (tolower(*str++) == tolower(*prefix++));

I wondered initially if we should be concerned about invoking
tolower() with an expression with side-effects since some older and/or
non-compliant libraries implement it as a macro which doesn't ensure
that the argument is evaluated only once. However, it seems that Git
already uses tolower(*p++) in a few other places, so I guess we're not
worrying about those broken implementations.

> +       return 0;
> +}
> +
> +/*

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines
  2015-12-29  7:27 ` [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines Jeff King
@ 2015-12-31  6:47   ` Eric Sunshine
  2016-01-01  8:53     ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2015-12-31  6:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:27 AM, Jeff King <peff@peff.net> wrote:
> Because we must match both "Author" and "author" here, we
> could not use skip_prefix, and had to hand-code a partial
> case-insensitive match. Now that we have skip_prefix_case,

s/skip_prefix_case/skip_prefix_icase/

> we can use it. This is technically more liberal in what it
> matches (e.g., it will match AUTHOR), but in this particular
> case that that's OK (we are matching git-log output, so we

s/that that's/that's/

> expect arbitrary data like commit headers to be indented).
>
> In addition to being easier to read, this will make the code
> easier to adapt for matching other lines.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> @@ -94,8 +94,8 @@ static void read_from_stdin(struct shortlog *log)
>         char author[1024], oneline[1024];
>
>         while (fgets(author, sizeof(author), stdin) != NULL) {
> -               if (!(author[0] == 'A' || author[0] == 'a') ||
> -                   !starts_with(author + 1, "uthor: "))
> +               const char *v;
> +               if (!skip_prefix_icase(author, "Author: ", &v))
>                         continue;
>                 while (fgets(oneline, sizeof(oneline), stdin) &&
>                        oneline[0] != '\n')
> @@ -103,7 +103,7 @@ static void read_from_stdin(struct shortlog *log)
>                 while (fgets(oneline, sizeof(oneline), stdin) &&
>                        oneline[0] == '\n')
>                         ; /* discard blanks */
> -               insert_one_record(log, author + 8, oneline);
> +               insert_one_record(log, v, oneline);
>         }
>  }
>
> --
> 2.7.0.rc3.367.g09631da

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/14] log: refactor add_header to drop some magic numbers
  2015-12-31  6:21   ` Eric Sunshine
@ 2016-01-01  8:42     ` Jeff King
  2016-01-01  8:46       ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2016-01-01  8:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Dec 31, 2015 at 01:21:42AM -0500, Eric Sunshine wrote:

> > -       item->string[len] = '\0';
> > +       len = strlen(item->string);
> > +       while (len && item->string[len - 1] == '\n')
> > +               item->string[--len] = '\0';
> 
> Not a strong objection, but this implementation may make the reader
> wonder why NUL needs to be assigned to all "stripped" characters. I'd
> have expected to see:
> 
>     len = strlen(item->string);
>     while (len && item->string[len - 1] == '\n')
>         len--;
>     item->string[len] = '\0';
> 
> which indicates clearly that this is a simple truncation rather than
> some odd NUL-fill operation, and is slightly more easy to reason about
> since it doesn't involve a pre-decrement as an array subscript.

Hmm. I consider the "write NULs backward" strategy to be pretty
idiomatic (you can find several similar ones grepping for `\[--` in the
git codebase). But that may just be me (I didn't look, but it's possible
I wrote the other ones, too :) ).

I don't have a strong preference, though. What you've written is quite
readable.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/14] log: refactor add_header to drop some magic numbers
  2016-01-01  8:42     ` Jeff King
@ 2016-01-01  8:46       ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-01  8:46 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Fri, Jan 01, 2016 at 03:42:06AM -0500, Jeff King wrote:

> On Thu, Dec 31, 2015 at 01:21:42AM -0500, Eric Sunshine wrote:
> 
> > > -       item->string[len] = '\0';
> > > +       len = strlen(item->string);
> > > +       while (len && item->string[len - 1] == '\n')
> > > +               item->string[--len] = '\0';
> > 
> > Not a strong objection, but this implementation may make the reader
> > wonder why NUL needs to be assigned to all "stripped" characters. I'd
> > have expected to see:
> > 
> >     len = strlen(item->string);
> >     while (len && item->string[len - 1] == '\n')
> >         len--;
> >     item->string[len] = '\0';
> > 
> > which indicates clearly that this is a simple truncation rather than
> > some odd NUL-fill operation, and is slightly more easy to reason about
> > since it doesn't involve a pre-decrement as an array subscript.
> 
> Hmm. I consider the "write NULs backward" strategy to be pretty
> idiomatic (you can find several similar ones grepping for `\[--` in the
> git codebase). But that may just be me (I didn't look, but it's possible
> I wrote the other ones, too :) ).
> 
> I don't have a strong preference, though. What you've written is quite
> readable.

Actually, looking at the diff, the original already has your final line
(it _has_ to do the search and NUL-write separately because it does the
former before allocating the copy). So between the two options, yours
makes the diff more obvious, too, which is a good thing. I'll take your
suggestion.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/14] strutil: add skip_prefix_icase
  2015-12-31  6:40   ` Eric Sunshine
@ 2016-01-01  8:50     ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-01  8:50 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Dec 31, 2015 at 01:40:46AM -0500, Eric Sunshine wrote:

> > diff --git a/builtin/log.c b/builtin/log.c
> > @@ -677,10 +677,10 @@ static void add_header(const char *value)
> > -       if (!strncasecmp(value, "to: ", 4))
> > -               item = string_list_append(&extra_to, value + 4);
> > -       else if (!strncasecmp(value, "cc: ", 4))
> > -               item = string_list_append(&extra_cc, value + 4);
> > +       if (skip_prefix_icase(value, "to: ", &value))
> > +               item = string_list_append(&extra_to, value);
> > +       else if (skip_prefix_icase(value, "cc: ", &value))
> > +               item = string_list_append(&extra_cc, value);
> 
> Is it worth holding this patch, with its introduction of
> skip_prefix_icase(), hostage to the unrelated change in the previous
> patch (2/14) which touches the same bit of code? Would it make sense
> to split this change out?

I assumed that this was at least as contentious as 2/14, so it wouldn't
be a problem. I'm inclined to leave it, unless it looks like we'll scrap
2/14.

> > +static inline int skip_prefix_icase(const char *str, const char *prefix,
> > +                                   const char **out)
> > +{
> > +       do {
> > +               if (!*prefix) {
> > +                       *out = str;
> > +                       return 1;
> > +               }
> > +       } while (tolower(*str++) == tolower(*prefix++));
> 
> I wondered initially if we should be concerned about invoking
> tolower() with an expression with side-effects since some older and/or
> non-compliant libraries implement it as a macro which doesn't ensure
> that the argument is evaluated only once. However, it seems that Git
> already uses tolower(*p++) in a few other places, so I guess we're not
> worrying about those broken implementations.

Right. There are a few other gotchas with tolower(), and we work around
it by implementing the ctype functions ourselves (even on non-broken
platforms). It's one of the oldest bits of git-compat-util.h.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines
  2015-12-31  6:47   ` Eric Sunshine
@ 2016-01-01  8:53     ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-01  8:53 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Dec 31, 2015 at 01:47:21AM -0500, Eric Sunshine wrote:

> On Tue, Dec 29, 2015 at 2:27 AM, Jeff King <peff@peff.net> wrote:
> > Because we must match both "Author" and "author" here, we
> > could not use skip_prefix, and had to hand-code a partial
> > case-insensitive match. Now that we have skip_prefix_case,
> 
> s/skip_prefix_case/skip_prefix_icase/

Thanks. I originally called it the former, and thought I had managed to
fix up all of the references, but clearly didn't run "git log --grep".

> > we can use it. This is technically more liberal in what it
> > matches (e.g., it will match AUTHOR), but in this particular
> > case that that's OK (we are matching git-log output, so we
> 
> s/that that's/that's/

Fixed, thanks.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer
  2015-12-29  7:29 ` [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer Jeff King
@ 2016-01-04  9:43   ` Eric Sunshine
  2016-01-04 10:17     ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2016-01-04  9:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:29 AM, Jeff King <peff@peff.net> wrote:
> When gathering the author and oneline subject for each
> commit, we hand-parse the commit headers to find the
> "author" line, and then continue past to the blank line at
> the end of the header.
>
> We can replace this tricky hand-parsing by simply asking the
> pretty-printer for the relevant items. This also decouples
> the author and oneline parsing, opening up some new
> optimizations in further commits.
>
> One reason to avoid the pretty-printer is that it might be
> less efficient than hand-parsing. However, I measured no
> slowdown at all running "git shortlog -ns HEAD" on
> linux.git.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> @@ -112,45 +112,32 @@ static void read_from_stdin(struct shortlog *log)
>  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  {
> -       const char *author = NULL, *buffer;
> -       struct strbuf buf = STRBUF_INIT;
> -       struct strbuf ufbuf = STRBUF_INIT;
> -
> -       pp_commit_easy(CMIT_FMT_RAW, commit, &buf);
> -       buffer = buf.buf;
> -       while (*buffer && *buffer != '\n') {
> -               const char *eol = strchr(buffer, '\n');
> -
> -               if (eol == NULL)
> -                       eol = buffer + strlen(buffer);
> -               else
> -                       eol++;
> -
> -               if (starts_with(buffer, "author "))
> -                       author = buffer + 7;
> -               buffer = eol;
> -       }
> -       if (!author) {
> +       struct strbuf author = STRBUF_INIT;
> +       struct strbuf oneline = STRBUF_INIT;
> +       struct pretty_print_context ctx = {0};
> +
> +       ctx.fmt = CMIT_FMT_USERFORMAT;
> +       ctx.abbrev = log->abbrev;
> +       ctx.subject = "";
> +       ctx.after_subject = "";
> +       ctx.date_mode.type = DATE_NORMAL;
> +       ctx.output_encoding = get_log_output_encoding();
> +
> +       format_commit_message(commit, "%an <%ae>", &author, &ctx);
> +       if (author.len <= 3) {

I suppose magic number 3 is the space, '<', and '>'...

>                 warning(_("Missing author: %s"),
>                     oid_to_hex(&commit->object.oid));
>                 return;

Is this leaking strbuf 'author'?

>         }
> -       if (log->user_format) {
> -               struct pretty_print_context ctx = {0};
> -               ctx.fmt = CMIT_FMT_USERFORMAT;
> -               ctx.abbrev = log->abbrev;
> -               ctx.subject = "";
> -               ctx.after_subject = "";
> -               ctx.date_mode.type = DATE_NORMAL;
> -               ctx.output_encoding = get_log_output_encoding();
> -               pretty_print_commit(&ctx, commit, &ufbuf);
> -               buffer = ufbuf.buf;
> -       } else if (*buffer) {
> -               buffer++;
> -       }
> -       insert_one_record(log, author, !*buffer ? "<none>" : buffer);
> -       strbuf_release(&ufbuf);
> -       strbuf_release(&buf);
> +
> +       if (log->user_format)
> +               pretty_print_commit(&ctx, commit, &oneline);
> +       else
> +               format_commit_message(commit, "%s", &oneline, &ctx);
> +
> +       insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
> +       strbuf_release(&author);
> +       strbuf_release(&oneline);
>  }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/14] shortlog: allow grouping by committer ident
  2015-12-29  7:35 ` [PATCH 11/14] shortlog: allow grouping by committer ident Jeff King
@ 2016-01-04  9:44   ` Eric Sunshine
  2016-01-04 10:23     ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2016-01-04  9:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:35 AM, Jeff King <peff@peff.net> wrote:
> Shortlog always groups commits by the author field. It can
> be interesting to group by other fields, as well. The
> obvious other identity in each commit is the committer
> field. This might be interesting if your project follows a
> workflow where committers and authors are not the same and
> where there is not simply one maintainer picking up all of
> the patches.
>
> For instance, you can use this in git.git to see interim and
> subsystem maintainers. Likewise, you can see in linux.git
> the patches picked up by lieutenants and merged into Linus's
> tree.
>
> This patch also provides some of the necessary
> infrastructure for adding more ident types (e.g., from
> trailers) in the future.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> @@ -47,6 +47,14 @@ OPTIONS
> +--ident=<type>::

Should this be called --group-by?

> +       By default, `shortlog` collects and collates author identities;
> +       using `--ident` will collect other types of identity. If
> +       `<type>` is:
> ++
> + - `author`, commits are grouped by author (this is the default)
> + - `committer`, commits are grouped by committer

There is a bit of redundancy here. I wonder if it could be described
more succinctly. For instance, instead of all the above, perhaps:

    Group commits by `<type>`, which can be one of:

    - `author` (default)
    - `committer`

> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index d7eb0cb..39da2d4 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -147,11 +157,23 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> -       format_commit_message(commit, "%an <%ae>", &ident, &ctx);
> -       if (ident.len <= 3) {
> -               warning(_("Missing author: %s"),
> -                   oid_to_hex(&commit->object.oid));
> -               return;
> +       switch (log->ident_type) {
> +       case SHORTLOG_IDENT_AUTHOR:
> +               format_commit_message(commit, "%an <%ae>", &ident, &ctx);
> +               if (ident.len <= 3) {
> +                       warning(_("Missing author: %s"),
> +                               oid_to_hex(&commit->object.oid));
> +                       return;
> +               }
> +               break;
> +       case SHORTLOG_IDENT_COMMITTER:
> +               format_commit_message(commit, "%cn <%ce>", &ident, &ctx);
> +               if (ident.len <= 3) {
> +                       warning(_("Missing committer: %s"),
> +                               oid_to_hex(&commit->object.oid));
> +                       return;

Is this leaking strbuf 'ident'?

(Ditto for the "author" case as mentioned already in the patch 6/14 review.)

> +               }
> +               break;
>         }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 14/14] shortlog: match commit trailers with --ident
  2015-12-29  7:50   ` Jeff King
@ 2016-01-04  9:44     ` Eric Sunshine
  2016-01-04 10:31       ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sunshine @ 2016-01-04  9:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Tue, Dec 29, 2015 at 2:50 AM, Jeff King <peff@peff.net> wrote:
> This doesn't leave us many syntactic "outs" for adding new ident types
> in the future (nor can you match a trailer called "Author"). I guess we
> could call this "--ident=trailer:reviewed-by" to be more precise, but
> it's more annoying to type. Perhaps we could consider that the
> "official" syntax, and DWIM "--ident=foo" to "--ident=trailer:foo", with
> the caveat that it may change in the future. I dunno.

Would anyone ever want to be able to specify multiple trailers for
counting? That is:

    --ident=trailer:helped-by,reviewed-by

or

    --ident=trailer:helped-by --ident=trailer:reviewed-by

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer
  2016-01-04  9:43   ` Eric Sunshine
@ 2016-01-04 10:17     ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-04 10:17 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Mon, Jan 04, 2016 at 04:43:23AM -0500, Eric Sunshine wrote:

> > +       format_commit_message(commit, "%an <%ae>", &author, &ctx);
> > +       if (author.len <= 3) {
> 
> I suppose magic number 3 is the space, '<', and '>'...

Yeah. That's worthy of a comment, I think (see below).

I don't think we will detect a case with just a missing name _or_ email,
but neither did the original.

> >                 warning(_("Missing author: %s"),
> >                     oid_to_hex(&commit->object.oid));
> >                 return;
> 
> Is this leaking strbuf 'author'?

Yes. Looks like the original leaked "buf" in the same way.

Rather than repeat the strbuf_release, I think we can introduce a goto,
like:

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 10c92fa..4180a2d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -124,10 +124,11 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.output_encoding = get_log_output_encoding();
 
 	format_commit_message(commit, "%an <%ae>", &author, &ctx);
+	/* we can detect a total failure only by seeing " <>" in the output */
 	if (author.len <= 3) {
 		warning(_("Missing author: %s"),
 		    oid_to_hex(&commit->object.oid));
-		return;
+		goto out;
 	}
 
 	if (log->user_format)
@@ -136,6 +137,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		format_commit_message(commit, "%s", &oneline, &ctx);
 
 	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
+
+out:
 	strbuf_release(&author);
 	strbuf_release(&oneline);
 }

That should make it easier to handle the cases added later in the
series, too.

-Peff

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/14] shortlog: allow grouping by committer ident
  2016-01-04  9:44   ` Eric Sunshine
@ 2016-01-04 10:23     ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-04 10:23 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Mon, Jan 04, 2016 at 04:44:00AM -0500, Eric Sunshine wrote:

> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > @@ -47,6 +47,14 @@ OPTIONS
> > +--ident=<type>::
> 
> Should this be called --group-by?

Yeah, that may be a more sensible name. For committer/author, I think
--ident is accurate. But for a trailer, we technically do not know what
is in it. I expect people would use this primarily with ident-based
trailers (like "reviewed-by" or whatever), but there's no reason you
could not collate references to a bug id, or something.

> > +       By default, `shortlog` collects and collates author identities;
> > +       using `--ident` will collect other types of identity. If
> > +       `<type>` is:
> > ++
> > + - `author`, commits are grouped by author (this is the default)
> > + - `committer`, commits are grouped by committer
> 
> There is a bit of redundancy here. I wonder if it could be described
> more succinctly. For instance, instead of all the above, perhaps:
> 
>     Group commits by `<type>`, which can be one of:
> 
>     - `author` (default)
>     - `committer`

I avoided that kind of parallel structure because I knew that later in
the series I would be adding a new bullet that would need more
explanation. I'm still pondering `trailer:<key>` as the syntax, which
would involve reworking this text. I'll keep it in mind when I do.

> > +       case SHORTLOG_IDENT_COMMITTER:
> > +               format_commit_message(commit, "%cn <%ce>", &ident, &ctx);
> > +               if (ident.len <= 3) {
> > +                       warning(_("Missing committer: %s"),
> > +                               oid_to_hex(&commit->object.oid));
> > +                       return;
> 
> Is this leaking strbuf 'ident'?
> 
> (Ditto for the "author" case as mentioned already in the patch 6/14 review.)

Yep. :) I think these can become "goto out" per my previous fixup.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 14/14] shortlog: match commit trailers with --ident
  2016-01-04  9:44     ` Eric Sunshine
@ 2016-01-04 10:31       ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2016-01-04 10:31 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Mon, Jan 04, 2016 at 04:44:15AM -0500, Eric Sunshine wrote:

> On Tue, Dec 29, 2015 at 2:50 AM, Jeff King <peff@peff.net> wrote:
> > This doesn't leave us many syntactic "outs" for adding new ident types
> > in the future (nor can you match a trailer called "Author"). I guess we
> > could call this "--ident=trailer:reviewed-by" to be more precise, but
> > it's more annoying to type. Perhaps we could consider that the
> > "official" syntax, and DWIM "--ident=foo" to "--ident=trailer:foo", with
> > the caveat that it may change in the future. I dunno.
> 
> Would anyone ever want to be able to specify multiple trailers for
> counting? That is:
> 
>     --ident=trailer:helped-by,reviewed-by
> 
> or
> 
>     --ident=trailer:helped-by --ident=trailer:reviewed-by

Generally I'd prefer the latter, as it builds on the existing option
parsing rather than inventing a new syntax. But it would also allow
mixing trailer and non-trailer items, like:

  git shortlog --ident=trailer:helped-by --ident=author

I guess that is well-formed, though I would expect it to be rare. I
dunno.

I'd rather punt on handling multiple types in this initial attempt, and
let somebody implement that later if they want. I suppose if we did a
release with one and then the other, it would technically change the
meaning of:

  git shortlog --ident=foo --ident=bar

from "just bar, overriding foo" to "both foo and bar". I think I could
live with that, though.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-04 10:31 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-29  7:18 [PATCH 0/14] counting trailers with shortlogs Jeff King
2015-12-29  7:19 ` [PATCH 01/14] move string functions out of git-compat-util Jeff King
2015-12-29  7:20 ` [PATCH 02/14] log: refactor add_header to drop some magic numbers Jeff King
2015-12-31  6:21   ` Eric Sunshine
2016-01-01  8:42     ` Jeff King
2016-01-01  8:46       ` Jeff King
2015-12-29  7:22 ` [PATCH 03/14] strutil: add skip_prefix_icase Jeff King
2015-12-31  6:40   ` Eric Sunshine
2016-01-01  8:50     ` Jeff King
2015-12-29  7:27 ` [PATCH 04/14] shortlog: use skip_prefix_icase to parse "Author" lines Jeff King
2015-12-31  6:47   ` Eric Sunshine
2016-01-01  8:53     ` Jeff King
2015-12-29  7:28 ` [PATCH 05/14] shortlog: use strbufs to read from stdin Jeff King
2015-12-29  7:29 ` [PATCH 06/14] shortlog: replace hand-parsing of author with pretty-printer Jeff King
2016-01-04  9:43   ` Eric Sunshine
2016-01-04 10:17     ` Jeff King
2015-12-29  7:30 ` [PATCH 07/14] shortlog: optimize "--summary" mode Jeff King
2015-12-29  7:31 ` [PATCH 08/14] shortlog: optimize out useless "<none>" normalization Jeff King
2015-12-29  7:32 ` [PATCH 09/14] shortlog: optimize out useless string list Jeff King
2015-12-29  7:32 ` [PATCH 10/14] shortlog: change "author" variables to "ident" Jeff King
2015-12-29  7:35 ` [PATCH 11/14] shortlog: allow grouping by committer ident Jeff King
2016-01-04  9:44   ` Eric Sunshine
2016-01-04 10:23     ` Jeff King
2015-12-29  7:35 ` [PATCH 12/14] trailer: factor out config reading Jeff King
2015-12-29  7:36 ` [PATCH 13/14] trailer: add interface for parsing commit trailers Jeff King
2015-12-29  7:38 ` [PATCH 14/14] shortlog: match commit trailers with --ident Jeff King
2015-12-29  7:50   ` Jeff King
2016-01-04  9:44     ` Eric Sunshine
2016-01-04 10:31       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.