git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 8/8] shortlog: allow multiple groups to be specified
Date: Fri, 25 Sep 2020 03:05:50 -0400
Message-ID: <20200925070550.GH62741@coredump.intra.peff.net>
In-Reply-To: <20200925070120.GA3669667@coredump.intra.peff.net>

Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.
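
As a rough illustration of the enum-to-bitfield change, here is a
minimal standalone sketch. The GROUP_* names and add_group() are
hypothetical stand-ins, not the identifiers in shortlog.h; only
has_multi_bits() mirrors the helper added by this patch.

```c
/* Hypothetical stand-ins for the group flags: one bit per group type. */
enum group_flag {
	GROUP_AUTHOR    = 1 << 0,
	GROUP_COMMITTER = 1 << 1,
	GROUP_TRAILER   = 1 << 2
};

/*
 * Nonzero when more than one bit is set: n & (n - 1) clears the lowest
 * set bit, so anything left over means at least two bits were set.
 */
int has_multi_bits(unsigned n)
{
	return (n & (n - 1)) != 0;
}

/* Each --group option ORs its bit in instead of overwriting the old value. */
unsigned add_group(unsigned groups, enum group_flag flag)
{
	return groups | flag;
}
```

Starting from zero and adding GROUP_AUTHOR leaves has_multi_bits()
false; adding GROUP_TRAILER on top of that makes it true, which is the
condition the stdin reader uses to bail out.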

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).
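
The per-commit duplicate suppression can be sketched roughly as
follows. This is only an illustration: "struct seen" is a hypothetical
fixed-size linear set standing in for the strset used in
builtin/shortlog.c, and records_for_commit() assumes grouping by both
author and committer.

```c
#include <string.h>

#define MAX_SEEN 16

/* Hypothetical fixed-size set of names already credited for one commit. */
struct seen {
	const char *names[MAX_SEEN];
	int nr;
};

/*
 * Returns 1 if value was already present. Otherwise records it (as long
 * as there is room) and returns 0.
 */
int seen_check_and_add(struct seen *s, const char *value)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], value))
			return 1;
	if (s->nr < MAX_SEEN)
		s->names[s->nr++] = value;
	return 0;
}

/*
 * Number of records one commit contributes when grouping by author and
 * committer together: an identical name is credited only once.
 */
int records_for_commit(const char *author, const char *committer)
{
	struct seen dups = { {0}, 0 };
	int records = 0;

	if (!seen_check_and_add(&dups, author))
		records++;
	if (!seen_check_and_add(&dups, committer))
		records++;
	return records;
}
```

The set is created fresh for each commit, so the same person can still
be counted once for every commit they appear in.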

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.
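
To make the subtlety concrete, here is a small sketch of the "zero
means nothing was requested" convention. The BY_* names and
resolve_groups() are hypothetical; in the patch itself the fallback
lives in cmd_shortlog(), which is why a library-style caller has to set
the author bit on its own.

```c
/* Hypothetical flag values; zero means "no group was requested". */
enum {
	BY_AUTHOR    = 1 << 0,
	BY_COMMITTER = 1 << 1
};

/*
 * What the command-line front end does once option parsing is done:
 * fall back to authors only when no group bit has been set.
 */
unsigned resolve_groups(unsigned requested)
{
	return requested ? requested : BY_AUTHOR;
}
```

A caller that skips this step and leaves the field at zero would match
none of the group checks and so would insert no records at all.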

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  5 +++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 69 ++++++++++++++++++++-----------
 shortlog.h                     | 10 ++---
 t/t4201-shortlog.sh            | 74 ++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+), 29 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 5f8f918cad..f31770dab5 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
 identity. If successful, the mailmap is applied and the email is omitted
 unless the `--email` option is specified. If the value cannot be parsed
 as an identity, it will be taken literally and completely.
++
+If `--group` is specified multiple times, commits are counted under each
+value (but again, only once per unique value in that commit). For
+example, `git shortlog --group=author --group=trailer:co-authored-by`
+counts both authors and co-authors.
 
 -c::
 --committer::
diff --git a/builtin/log.c b/builtin/log.c
index b8824d898f..7f27e9eca1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1195,6 +1195,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log.in1 = 2;
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
+	log.groups = SHORTLOG_GROUP_AUTHOR;
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 28133aec68..a36a8717b7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -121,6 +121,11 @@ static int parse_ident(struct shortlog *log,
 	return 0;
 }
 
+static inline int has_multi_bits(unsigned n)
+{
+	return (n & (n - 1)) != 0;
+}
+
 static void read_from_stdin(struct shortlog *log)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -130,7 +135,10 @@ static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	switch (log->group) {
+	if (has_multi_bits(log->groups))
+		die(_("using multiple --group options with stdin is not supported"));
+
+	switch (log->groups) {
 	case SHORTLOG_GROUP_AUTHOR:
 		match = author_match;
 		break;
@@ -221,13 +229,13 @@ static void strset_clear(struct strset *ss)
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
+					 struct strset *dups,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
-	struct strset dups = STRSET_INIT;
 	struct strbuf ident = STRBUF_INIT;
 
 	/*
@@ -243,28 +251,28 @@ static void insert_records_from_trailers(struct shortlog *log,
 	while (trailer_iterator_advance(&iter)) {
 		const char *value = iter.val.buf;
 
-		if (strcasecmp(iter.key.buf, log->trailer))
+		if (!string_list_has_string(&log->trailers, iter.key.buf))
 			continue;
 
 		strbuf_reset(&ident);
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(&dups, value))
+		if (strset_check_and_add(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
 	strbuf_release(&ident);
-	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
+	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *oneline_str;
 
@@ -282,24 +290,29 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	switch (log->group) {
-	case SHORTLOG_GROUP_AUTHOR:
+	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_COMMITTER:
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_TRAILER:
-		insert_records_from_trailers(log, commit, &ctx, oneline_str);
-		break;
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_TRAILER) {
+		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
+	strset_clear(&dups);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
@@ -366,14 +379,16 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	struct shortlog *log = opt->value;
 	const char *field;
 
-	if (unset || !strcasecmp(arg, "author"))
-		log->group = SHORTLOG_GROUP_AUTHOR;
+	if (unset) {
+		log->groups = 0;
+		string_list_clear(&log->trailers, 0);
+	} else if (!strcasecmp(arg, "author"))
+		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
-		log->group = SHORTLOG_GROUP_COMMITTER;
+		log->groups |= SHORTLOG_GROUP_COMMITTER;
 	else if (skip_prefix(arg, "trailer:", &field)) {
-		log->group = SHORTLOG_GROUP_TRAILER;
-		free(log->trailer);
-		log->trailer = xstrdup(field);
+		log->groups |= SHORTLOG_GROUP_TRAILER;
+		string_list_append(&log->trailers, field);
 	} else
 		return error(_("unknown group type: %s"), arg);
 
@@ -391,6 +406,8 @@ void shortlog_init(struct shortlog *log)
 	log->wrap = DEFAULT_WRAPLEN;
 	log->in1 = DEFAULT_INDENT1;
 	log->in2 = DEFAULT_INDENT2;
+	log->trailers.strdup_strings = 1;
+	log->trailers.cmp = strcasecmp;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -400,9 +417,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_SET_INT('c', "committer", &log.group,
-			    N_("Group by committer rather than author"),
-			    SHORTLOG_GROUP_COMMITTER),
+		OPT_BIT('c', "committer", &log.groups,
+			N_("Group by committer rather than author"),
+			SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -454,6 +471,10 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
 
+	if (!log.groups)
+		log.groups = SHORTLOG_GROUP_AUTHOR;
+	string_list_sort(&log.trailers);
+
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
 		add_head_to_pending(&rev);
diff --git a/shortlog.h b/shortlog.h
index 54ce55e9e9..030a0b8490 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -17,11 +17,11 @@ struct shortlog {
 	int abbrev;
 
 	enum {
-		SHORTLOG_GROUP_AUTHOR = 0,
-		SHORTLOG_GROUP_COMMITTER,
-		SHORTLOG_GROUP_TRAILER
-	} group;
-	char *trailer;
+		SHORTLOG_GROUP_AUTHOR = (1 << 0),
+		SHORTLOG_GROUP_COMMITTER = (1 << 1),
+		SHORTLOG_GROUP_TRAILER = (1 << 2)
+	} groups;
+	struct string_list trailers;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index a62ee9ed55..3d5c4a2086 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -282,4 +282,78 @@ test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple groups' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this has two trailers that are distinct from the author; it will count
+	3 times in the output
+
+	Some-trailer: User A <a@example.com>
+	Another-trailer: User B <b@example.com>
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	this one has two trailers, one of which is a duplicate with the author;
+	it will only be counted once for them
+
+	Another-trailer: A U Thor <author@example.com>
+	Some-trailer: User B <b@example.com>
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     2	User B
+	     1	User A
+	EOF
+	git shortlog -ns \
+		--group=author \
+		--group=trailer:some-trailer \
+		--group=trailer:another-trailer \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up option selection tests' '
+	git commit --allow-empty -F - <<-\EOF
+	subject
+
+	body
+
+	Trailer-one: value-one
+	Trailer-two: value-two
+	EOF
+'
+
+test_expect_success '--no-group resets group list to author' '
+	cat >expect <<-\EOF &&
+	     1	A U Thor
+	EOF
+	git shortlog -ns \
+		--group=committer \
+		--group=trailer:trailer-one \
+		--no-group \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-group resets trailer list' '
+	cat >expect <<-\EOF &&
+	     1	value-two
+	EOF
+	git shortlog -ns \
+		--group=trailer:trailer-one \
+		--no-group \
+		--group=trailer:trailer-two \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin with multiple groups reports error' '
+	git log >log &&
+	test_must_fail git shortlog --group=author --group=committer <log
+'
+
 test_done
-- 
2.28.0.1085.g44a0350633

