git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Subject: [PATCH 1/1] mailmap: support hashed entries in mailmaps
Date: Sun, 13 Dec 2020 01:05:39 +0000	[thread overview]
Message-ID: <20201213010539.544101-2-sandals@crustytoothpaste.net> (raw)
In-Reply-To: <20201213010539.544101-1-sandals@crustytoothpaste.net>

Many people, through the course of their lives, will change either a
name or an email address.  For this reason, we have the mailmap, to map
from a user's former name or email address to their current, canonical
forms.  Normally, this works well as it is.

However, sometimes people change a name or an email address and wish to
wholly disassociate themselves from that former name or email address.
For example, a person may have left a company which engaged in a deeply
unethical act with which the person does not want to be associated, or
they may have changed their name to disassociate themselves from an
abusive family or partner.  In such a case, using the former name or
address in any way may be undesirable and the person may wish to replace
it as completely as possible.

For projects which wish to support this, introduce hashed forms into the
mailmap.  These forms, which start with "@sha256:" followed by a SHA-256
hash of the entry, can be used in place of the form used in the commit
field.  This form is intentionally designed to be unlikely to conflict
with legitimate use cases.  For example, this is not a valid email
address according to RFC 5322.  In the unlikely event that a user has
put such a form into the actual commit as their name, we will accept it.

While the form of the data is designed to accept multiple hash
algorithms, we intentionally do not support SHA-1.  There is little
reason to support such a weak algorithm in new use cases and no
backwards compatibility to consider.  Moreover, SHA-256 is faster than
the SHA1DC implementation we use, so this not only improves performance,
but simplifies the current implementation somewhat as well.

Note that it is, of course, possible to perform a lookup on all commit
objects to determine the actual entry which matches the hashed form of
the data.  However, a project for which this feature is valuable may
simply insert entries for many contributors in order to make discovery
of "interesting" entries significantly less convenient.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 mailmap.c          | 39 +++++++++++++++++++++++++++++++++++++--
 t/t4203-mailmap.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index 962fd86d6d..09d0ad7ca4 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -313,6 +313,41 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
 	return NULL;
 }
 
+/*
+ * Convert an email or name into a hashed form for comparison.  The hashed form
+ * will be created in the form
+ * @sha256:c68b7a430ac8dee9676ec77a387194e23f234d024e03d844050cf6c01775c8f6,
+ * which would be the hashed form for "doe@example.com".
+ */
+static char *hashed_form(struct strbuf *buf, const struct git_hash_algo *algop, const char *key, size_t keylen)
+{
+	git_hash_ctx ctx;
+	unsigned char hashbuf[GIT_MAX_RAWSZ];
+	char hexbuf[GIT_MAX_HEXSZ + 1];
+
+	algop->init_fn(&ctx);
+	algop->update_fn(&ctx, key, keylen);
+	algop->final_fn(hashbuf, &ctx);
+	hash_to_hex_algop_r(hexbuf, hashbuf, algop);
+
+	strbuf_addf(buf, "@%s:%s", algop->name, hexbuf);
+	return buf->buf;
+}
+
+static struct string_list_item *lookup_one(struct string_list *map,
+					   const char *string, size_t len)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item = lookup_prefix(map, string, len);
+	if (item)
+		return item;
+
+	hashed_form(&buf, &hash_algos[GIT_HASH_SHA256], string, len);
+	item = lookup_prefix(map, buf.buf, buf.len);
+	strbuf_release(&buf);
+	return item;
+}
+
 int map_user(struct string_list *map,
 	     const char **email, size_t *emaillen,
 	     const char **name, size_t *namelen)
@@ -324,7 +359,7 @@ int map_user(struct string_list *map,
 		 (int)*namelen, debug_str(*name),
 		 (int)*emaillen, debug_str(*email));
 
-	item = lookup_prefix(map, *email, *emaillen);
+	item = lookup_one(map, *email, *emaillen);
 	if (item != NULL) {
 		me = (struct mailmap_entry *)item->util;
 		if (me->namemap.nr) {
@@ -334,7 +369,7 @@ int map_user(struct string_list *map,
 			 * simple entry.
 			 */
 			struct string_list_item *subitem;
-			subitem = lookup_prefix(&me->namemap, *name, *namelen);
+			subitem = lookup_one(&me->namemap, *name, *namelen);
 			if (subitem)
 				item = subitem;
 		}
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 586c3a86b1..794133ba5d 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -62,6 +62,41 @@ test_expect_success 'check-mailmap --stdin arguments' '
 	test_cmp expect actual
 '
 
+test_expect_success 'hashed mailmap' '
+	test_config mailmap.file ./hashed &&
+	hashed_author_name="@sha256:$(printf "$GIT_AUTHOR_NAME" | test-tool sha256)" &&
+	hashed_author_email="@sha256:$(printf "$GIT_AUTHOR_EMAIL" | test-tool sha256)" &&
+	cat >expect <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>
+	EOF
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	git check-mailmap  "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	Wrong <wrong@example.org> $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	# Check that we prefer literal matches over hashed names.
+	git check-mailmap  "$hashed_author_name <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$hashed_author_email>
+	EOF
+	git check-mailmap  "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> <$hashed_author_email>
+	EOF
+	git check-mailmap  "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'check-mailmap bogus contact' '
 	test_must_fail git check-mailmap bogus
 '

  reply	other threads:[~2020-12-13  1:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-13  1:05 [PATCH 0/1] Hashed mailmap support brian m. carlson
2020-12-13  1:05 ` brian m. carlson [this message]
2020-12-13  9:34   ` [PATCH 1/1] mailmap: support hashed entries in mailmaps Johannes Sixt
2020-12-13  9:45     ` Johannes Sixt
2020-12-13 20:38       ` brian m. carlson
2020-12-14  0:09   ` Junio C Hamano
2020-12-16  0:50     ` brian m. carlson
2020-12-14 11:54   ` Ævar Arnfjörð Bjarmason
2020-12-15 11:13   ` Phillip Wood
2020-12-15  1:48 ` [PATCH 0/1] Hashed mailmap support Jeff King
2020-12-15  2:40   ` Jeff King
2020-12-15 11:15   ` Phillip Wood
2020-12-18  2:29   ` brian m. carlson
2020-12-18  5:56     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201213010539.544101-2-sandals@crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).