linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries
@ 2020-09-23 10:02 Dwaipayan Ray
  2020-09-23 10:17 ` Joe Perches
  0 siblings, 1 reply; 4+ messages in thread
From: Dwaipayan Ray @ 2020-09-23 10:02 UTC (permalink / raw)
  To: joe; +Cc: dwaipayanray1, linux-kernel-mentees

Checkpatch throws a NO_AUTHOR_SIGN_OFF warning whenever either the
name or email address of author doesn't match any signed-off-by
entry. But it has no mechanism to check whether different email
addresses belong to the same author.

As a result, there have been multiple commits where the author has
signed off with a different email address which caused checkpatch
to product that warning.

An example is commit dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore"),
where the author uses two different mail addresses, and checkpatch
generates a NO_AUTHOR_SIGN_OFF warning.

This was fixed by adding support for loading .mailmap entries.
If a mismatch in name or email address then occurs, the
entries in mailmap can be looked up to deterime the actual author
name and email address.

Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 113 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9e65d21456f1..bfa06c9b9625 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1128,6 +1128,78 @@ sub top_of_kernel_tree {
 	return 1;
 }
 
+my $mailmap;
+
+sub read_mailmap {
+    $mailmap = {
+	names => {},
+	addresses => {}
+    };
+
+    return if (!defined($root) || !(-f "${root}/.mailmap"));
+
+    open(my $mailmap_file, '<', "${root}/.mailmap")
+	or warn "$P: Can't open .mailmap: $!\n";
+
+    while (<$mailmap_file>) {
+	s/#.*$//; #strip comments
+	s/^\s+|\s+$//g; #trim
+
+	next if (/^\s*$/); #skip empty lines
+	#entries have one of the following formats:
+	# name1 <mail1>
+	# <mail1> <mail2>
+	# name1 <mail1> <mail2>
+	# name1 <mail1> name2 <mail2>
+	# (see man git-shortlog)
+
+	if (/^([^<]+)<([^>]+)>$/) {
+	    my $real_name = $1;
+	    my $address = $2;
+
+	    $real_name =~ s/\s+$//;
+	    ($real_name, $address) = parse_email("$real_name <$address>");
+	    $mailmap->{names}->{$address} = $real_name;
+
+	} elsif (/^<([^>]+)>\s*<([^>]+)>$/) {
+	    my $real_address = $1;
+	    my $wrong_address = $2;
+
+	    $mailmap->{addresses}->{$wrong_address} = $real_address;
+
+	} elsif (/^(.+)<([^>]+)>\s*<([^>]+)>$/) {
+	    my $real_name = $1;
+	    my $real_address = $2;
+	    my $wrong_address = $3;
+
+	    $real_name =~ s/\s+$//;
+	    ($real_name, $real_address) =
+		parse_email("$real_name <$real_address>");
+	    $mailmap->{names}->{$wrong_address} = $real_name;
+	    $mailmap->{addresses}->{$wrong_address} = $real_address;
+
+	} elsif (/^(.+)<([^>]+)>\s*(.+)\s*<([^>]+)>$/) {
+	    my $real_name = $1;
+	    my $real_address = $2;
+	    my $wrong_name = $3;
+	    my $wrong_address = $4;
+
+	    $real_name =~ s/\s+$//;
+	    ($real_name, $real_address) =
+		parse_email("$real_name <$real_address>");
+
+	    $wrong_name =~ s/\s+$//;
+	    ($wrong_name, $wrong_address) =
+		parse_email("$wrong_name <$wrong_address>");
+
+	    my $wrong_email = format_email($wrong_name, $wrong_address);
+	    $mailmap->{names}->{$wrong_email} = $real_name;
+	    $mailmap->{addresses}->{$wrong_email} = $real_address;
+	}
+    }
+    close($mailmap_file);
+}
+
 sub parse_email {
 	my ($formatted_email) = @_;
 
@@ -1210,14 +1282,50 @@ sub reformat_email {
 	return format_email($email_name, $email_address);
 }
 
+sub mailmap_email {
+    my ($name, $address) = @_;
+
+    my $email = format_email($name, $address);
+    my $real_name = $name;
+    my $real_address = $address;
+
+    if (exists $mailmap->{names}->{$email} ||
+	exists $mailmap->{addresses}->{$email}) {
+	if (exists $mailmap->{names}->{$email}) {
+	    $real_name = $mailmap->{names}->{$email};
+	}
+	if (exists $mailmap->{addresses}->{$email}) {
+	    $real_address = $mailmap->{addresses}->{$email};
+	}
+    } else {
+	if (exists $mailmap->{names}->{$address}) {
+	    $real_name = $mailmap->{names}->{$address};
+	}
+	if (exists $mailmap->{addresses}->{$address}) {
+	    $real_address = $mailmap->{addresses}->{$address};
+	}
+    }
+    return ($real_name, $real_address);
+}
+
 sub same_email_addresses {
 	my ($email1, $email2) = @_;
 
 	my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
 	my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
 
-	return $email1_name eq $email2_name &&
-	       $email1_address eq $email2_address;
+	my $name_match = $email1_name eq $email2_name;
+	my $address_match = $email1_address eq $email2_address;
+
+	if(!$name_match || !$address_match) {
+		my ($real_name1, $real_address1) = mailmap_email($email1_name, $email1_address);
+		my ($real_name2, $real_address2) = mailmap_email($email2_name, $email2_address);
+
+		$name_match |= ($real_name1 eq $real_name2);
+		$address_match |= ($real_address1 eq $real_address2);
+	}
+
+	return $name_match && $address_match;
 }
 
 sub which {
@@ -2400,6 +2508,7 @@ sub process {
 
 	my $checklicenseline = 1;
 
+	read_mailmap();
 	sanitise_line_reset();
 	my $line;
 	foreach my $rawline (@rawlines) {
-- 
2.27.0

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries
  2020-09-23 10:02 [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries Dwaipayan Ray
@ 2020-09-23 10:17 ` Joe Perches
  2020-09-23 11:11   ` Dwaipayan Ray
  0 siblings, 1 reply; 4+ messages in thread
From: Joe Perches @ 2020-09-23 10:17 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

On Wed, 2020-09-23 at 15:32 +0530, Dwaipayan Ray wrote:
> Checkpatch throws a NO_AUTHOR_SIGN_OFF warning whenever either the
> name or email address of author doesn't match any signed-off-by
> entry. But it has no mechanism to check whether different email
> addresses belong to the same author.
> 
> As a result, there have been multiple commits where the author has
> signed off with a different email address which caused checkpatch
> to product that warning.
> 
> An example is commit dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore"),
> where the author uses two different mail addresses, and checkpatch
> generates a NO_AUTHOR_SIGN_OFF warning.

.mailmap entries are used when a person no longer has
access to a particular email and a mapping is created to
allow these old/invalid email addresses to be converted
to a current/valid email address.

The idea here is that a person may have a .gitconfig or
equivalent that still uses an old/invalid email address
but uses a Signed-off-by: for a current/valid one, so
a warning _should_ be emitted for this case.

At least for checkpatch, mismatches in email addresses
should not be papered over by assuming equivalence between
multiple email addresses.

So this should be a separate test and not used in a function
named
"same_email_address".  These are specifically _not_
the same email
addresses.


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries
  2020-09-23 10:17 ` Joe Perches
@ 2020-09-23 11:11   ` Dwaipayan Ray
  2020-09-23 15:08     ` Joe Perches
  0 siblings, 1 reply; 4+ messages in thread
From: Dwaipayan Ray @ 2020-09-23 11:11 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees

> .mailmap entries are used when a person no longer has
> access to a particular email and a mapping is created to
> allow these old/invalid email addresses to be converted
> to a current/valid email address.
>
> The idea here is that a person may have a .gitconfig or
> equivalent that still uses an old/invalid email address
> but uses a Signed-off-by: for a current/valid one, so
> a warning _should_ be emitted for this case.
>
> At least for checkpatch, mismatches in email addresses
> should not be papered over by assuming equivalence between
> multiple email addresses.
>
> So this should be a separate test and not used in a function
> named
> "same_email_address".  These are specifically _not_
> the same email
> addresses.
>
>

Hi,
Thanks for the clarification.

So instead of eliminating the warning completely will it be a
better alternative to display a more descriptive warning
on the lines of:

  "WARNING: NO_AUTHOR_SIGN_OFF:
   invalid/obsolete email address used in signoff by
   author $author"

Or is it better to ignore the mailmap extension in checkpatch
and just display warnings for either mismatch in name/
mismatch in email.

Like, you had mentioned one in:
https://lore.kernel.org/linux-kernel-mentees/
7958ded756c895ca614ba900aae7b830a992475e.camel@perches.com/

In that case, same name, but different mail address was
reported as a different warning.

So, I believe two cases can be handled better:
1) Same name, different email
2) Same email, different name

For case 2, it might also be possible to ignore the warning
completely.

What do you think is better?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries
  2020-09-23 11:11   ` Dwaipayan Ray
@ 2020-09-23 15:08     ` Joe Perches
  0 siblings, 0 replies; 4+ messages in thread
From: Joe Perches @ 2020-09-23 15:08 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

On Wed, 2020-09-23 at 16:41 +0530, Dwaipayan Ray wrote:
> > .mailmap entries are used when a person no longer has
> > access to a particular email and a mapping is created to
> > allow these old/invalid email addresses to be converted
> > to a current/valid email address.
> > 
> > The idea here is that a person may have a .gitconfig or
> > equivalent that still uses an old/invalid email address
> > but uses a Signed-off-by: for a current/valid one, so
> > a warning _should_ be emitted for this case.
> > 
> > At least for checkpatch, mismatches in email addresses
> > should not be papered over by assuming equivalence between
> > multiple email addresses.
> > 
> > So this should be a separate test and not used in a function
> > named
> > "same_email_address".  These are specifically _not_
> > the same email
> > addresses.
> > 
> > 
> 
> Hi,
> Thanks for the clarification.
> 
> So instead of eliminating the warning completely will it be a
> better alternative to display a more descriptive warning
> on the lines of:
> 
>   "WARNING: NO_AUTHOR_SIGN_OFF:
>    invalid/obsolete email address used in signoff by
>    author $author"
> 
> Or is it better to ignore the mailmap extension in checkpatch
> and just display warnings for either mismatch in name/
> mismatch in email.
> 
> Like, you had mentioned one in:
> https://lore.kernel.org/linux-kernel-mentees/
> 7958ded756c895ca614ba900aae7b830a992475e.camel@perches.com/
> 
> In that case, same name, but different mail address was
> reported as a different warning.
> 
> So, I believe two cases can be handled better:
> 1) Same name, different email
> 2) Same email, different name
> 
> For case 2, it might also be possible to ignore the warning
> completely.

Maybe both.

I suggest seeing if extensions to email addresses like:

	<local_part+extension@domain.tld>
could be compared to:
	<local_part@domain.tld>

and comment blocks like:

	"First Last" (via mailing list) <local_part@domain.tld>
could be compared to:
	"First Last" <local_part@domain.tlc> (MAINTAINER)

Still, I think all of these should emit warnings or
--strict checks messages.

The idea is to notify that some oddity has occurred
rather than silently accept it.


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-09-23 19:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 10:02 [Linux-kernel-mentees] [PATCH RFC] checkpatch: extend same_email_address check to load mailmap entries Dwaipayan Ray
2020-09-23 10:17 ` Joe Perches
2020-09-23 11:11   ` Dwaipayan Ray
2020-09-23 15:08     ` Joe Perches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).