linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
@ 2020-09-18 10:06 Lukas Bulwahn
  2020-09-18 10:29 ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-18 10:06 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

Hi Dwaipayan, hi others,

I had a quick look on the NO AUTHOR SIGN OFF issues reported by 
checkpatch.pl on v5.4..v5.8 on my own small script.

After collecting all the data in a tsv (no details on that tsv here):

$ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l
1064

$ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv  | cut -f 7 | sort  | uniq -c | 
sort -nr | head -n 8
    175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'
    116 Missing Signed-off-by: line by nominal patch author ''
     68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>'
     43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>'
     40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>'
     36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>'
     31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>'
     24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>'


Here a quick look at the first two:

    175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter 
<daniel.vetter@ffwll.ch>'

$ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \
  | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter
$ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \
  xargs git show  --format="%b" -s | \
  grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \
  wc -l

So all 175 commits are of the type:

Author: Daniel Vetter <daniel.vetter@ffwll.ch>
...
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


and the second:

    116 Missing Signed-off-by: line by nominal patch author ''

That is probably due to not parsing the patch author with a line break.


So, if we can find a solution for Daniel Vetter (of course, not 
hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him 
and making use of that in checkpatch.pl,

... and for the encoding problem, then we got around 27% of the 
NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the 
next few remaining ones. The longer tail of warnings are clearly warnings 
that deserve to be pointed out to newbies with a broken setup.

I hope you can continue to work on a solution for this class.


Thanks for your initial investigation,

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-18 10:06 [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues Lukas Bulwahn
@ 2020-09-18 10:29 ` Dwaipayan Ray
  2020-09-18 10:44   ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-18 10:29 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

On Fri, Sep 18, 2020 at 3:36 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> Hi Dwaipayan, hi others,
>
> I had a quick look on the NO AUTHOR SIGN OFF issues reported by
> checkpatch.pl on v5.4..v5.8 on my own small script.
>
> After collecting all the data in a tsv (no details on that tsv here):
>
> $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l
> 1064
>
> $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv  | cut -f 7 | sort  | uniq -c |
> sort -nr | head -n 8
>     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'
>     116 Missing Signed-off-by: line by nominal patch author ''
>      68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>'
>      43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>'
>      40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>'
>      36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>'
>      31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>'
>      24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>'
>
>
> Here a quick look at the first two:
>
>     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter
> <daniel.vetter@ffwll.ch>'
>
> $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \
>   | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter
> $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \
>   xargs git show  --format="%b" -s | \
>   grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \
>   wc -l
>
> So all 175 commits are of the type:
>
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> ...
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>
>
> and the second:
>
>     116 Missing Signed-off-by: line by nominal patch author ''
>
> That is probably due to not parsing the patch author with a line break.
>
>
> So, if we can find a solution for Daniel Vetter (of course, not
> hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him
> and making use of that in checkpatch.pl,
>
> ... and for the encoding problem, then we got around 27% of the
> NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the
> next few remaining ones. The longer tail of warnings are clearly warnings
> that deserve to be pointed out to newbies with a broken setup.
>
> I hope you can continue to work on a solution for this class.
>
>
> Thanks for your initial investigation,
>
> Lukas

Hi,
I think I have figured out how to fix the authors with a line break.
The checkpatch script keeps a track of the previous line in a variable
$prevline. I can use it to fix the author's initial signature. I will
send you a patch on this when it's done.

And for Daniel Vetter's issue, it will be a bit more complex i think
as checkpatch
seems to check both author's name and author's email by parsing two
strings.

At revision 10b82d5176488acee2820e5a2cf0f2ec5c3488b6,
scripts/checkpatch.pl,


line 2673:
if ($author ne '') {
if (same_email_addresses($1, $author)) {
$authorsignoff = 1;
}
}

line 1213:
sub same_email_addresses {
my ($email1, $email2) = @_;

my ($email1_name, $name1_comment, $email1_address, $comment1)
 = parse_email($email1);
my ($email2_name, $name2_comment, $email2_address, $comment2)
 = parse_email($email2);

return $email1_name eq $email2_name &&
$email1_address eq $email2_address;
}

The easiest way at this point will be to modify same_email_address
subroutine and add checks for other email addresses belonging to
the same author by loading .mailmap.
But will this make the subroutine heavy?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-18 10:29 ` Dwaipayan Ray
@ 2020-09-18 10:44   ` Lukas Bulwahn
  2020-09-21  9:07     ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-18 10:44 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Fri, 18 Sep 2020, Dwaipayan Ray wrote:

> On Fri, Sep 18, 2020 at 3:36 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> >
> > Hi Dwaipayan, hi others,
> >
> > I had a quick look on the NO AUTHOR SIGN OFF issues reported by
> > checkpatch.pl on v5.4..v5.8 on my own small script.
> >
> > After collecting all the data in a tsv (no details on that tsv here):
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l
> > 1064
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv  | cut -f 7 | sort  | uniq -c |
> > sort -nr | head -n 8
> >     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>'
> >     116 Missing Signed-off-by: line by nominal patch author ''
> >      68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>'
> >      43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>'
> >      40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>'
> >      36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>'
> >      31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>'
> >      24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>'
> >
> >
> > Here a quick look at the first two:
> >
> >     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter
> > <daniel.vetter@ffwll.ch>'
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \
> >   | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter
> > $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \
> >   xargs git show  --format="%b" -s | \
> >   grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \
> >   wc -l
> >
> > So all 175 commits are of the type:
> >
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ...
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >
> >
> > and the second:
> >
> >     116 Missing Signed-off-by: line by nominal patch author ''
> >
> > That is probably due to not parsing the patch author with a line break.
> >
> >
> > So, if we can find a solution for Daniel Vetter (of course, not
> > hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him
> > and making use of that in checkpatch.pl,
> >
> > ... and for the encoding problem, then we got around 27% of the
> > NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the
> > next few remaining ones. The longer tail of warnings are clearly warnings
> > that deserve to be pointed out to newbies with a broken setup.
> >
> > I hope you can continue to work on a solution for this class.
> >
> >
> > Thanks for your initial investigation,
> >
> > Lukas
> 
> Hi,
> I think I have figured out how to fix the authors with a line break.
> The checkpatch script keeps a track of the previous line in a variable
> $prevline. I can use it to fix the author's initial signature. I will
> send you a patch on this when it's done.
>

Okay, let us start to have this first patch correct and accepted by Joe 
Perches.
 
> And for Daniel Vetter's issue, it will be a bit more complex i think
> as checkpatch
> seems to check both author's name and author's email by parsing two
> strings.
>

When the first patch is accepted, let us discuss with Joe the best 
solution for this case.

> At revision 10b82d5176488acee2820e5a2cf0f2ec5c3488b6,
> scripts/checkpatch.pl,
> 
> 
> line 2673:
> if ($author ne '') {
> if (same_email_addresses($1, $author)) {
> $authorsignoff = 1;
> }
> }
> 
> line 1213:
> sub same_email_addresses {
> my ($email1, $email2) = @_;
> 
> my ($email1_name, $name1_comment, $email1_address, $comment1)
>  = parse_email($email1);
> my ($email2_name, $name2_comment, $email2_address, $comment2)
>  = parse_email($email2);
> 
> return $email1_name eq $email2_name &&
> $email1_address eq $email2_address;
> }
> 
> The easiest way at this point will be to modify same_email_address
> subroutine and add checks for other email addresses belonging to
> the same author by loading .mailmap.
> But will this make the subroutine heavy?
>

Yes, it is becoming heavier... but maybe still as performant as before if 
done right.

You could always run a quick fast path check and only if you cannot find 
the sign-off, you do a slow path, loading .mailmap and checking.

By the way, there is already the function read_mailmap in 
./scripts/get_maintainers.pl, so we would not need to implement that, just 
find a good way for both scripts to share this implementation. 

But let us do the first patch and then discuss with Joe.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-18 10:44   ` Lukas Bulwahn
@ 2020-09-21  9:07     ` Dwaipayan Ray
  2020-09-21  9:12       ` Lukas Bulwahn
  2020-09-21  9:15       ` Lukas Bulwahn
  0 siblings, 2 replies; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-21  9:07 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

Hi Lukas,
I did send out the patch.
Weirdly enough, my gmail inbox decided to combine both
threads with subject [PATCH v2] and [PATCH v3]. Did you
get it separate or is it the same case? In other mail boxes
it seems to be okay.

Apart from that for fixing the different email addresses, I
am thinking to try copying the read_mailmap subroutine
from scripts/get_maintainer.pl, see how it works and
try to write a solution based on that. I will let you know
once I achieve something.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-21  9:07     ` Dwaipayan Ray
@ 2020-09-21  9:12       ` Lukas Bulwahn
  2020-09-21  9:15       ` Lukas Bulwahn
  1 sibling, 0 replies; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-21  9:12 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Mon, 21 Sep 2020, Dwaipayan Ray wrote:

> Hi Lukas,
> I did send out the patch.
> Weirdly enough, my gmail inbox decided to combine both
> threads with subject [PATCH v2] and [PATCH v3]. Did you
> get it separate or is it the same case? In other mail boxes
> it seems to be okay.
>

That is a bug in the gmail client. It looks good in my proper email 
client.
 
> Apart from that for fixing the different email addresses, I
> am thinking to try copying the read_mailmap subroutine
> from scripts/get_maintainer.pl, see how it works and
> try to write a solution based on that. I will let you know
> once I achieve something.
>

Sure, I am still wondering if it better to just use the .mailmap
file or if we should introduce a separate file with the same format.

But let us start implementing and see.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-21  9:07     ` Dwaipayan Ray
  2020-09-21  9:12       ` Lukas Bulwahn
@ 2020-09-21  9:15       ` Lukas Bulwahn
  2020-09-22 13:21         ` Dwaipayan Ray
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-21  9:15 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Mon, 21 Sep 2020, Dwaipayan Ray wrote:

> Hi Lukas,
> I did send out the patch.
> Weirdly enough, my gmail inbox decided to combine both
> threads with subject [PATCH v2] and [PATCH v3]. Did you
> get it separate or is it the same case? In other mail boxes
> it seems to be okay.
> 
> Apart from that for fixing the different email addresses, I
> am thinking to try copying the read_mailmap subroutine
> from scripts/get_maintainer.pl, see how it works and
> try to write a solution based on that. I will let you know
> once I achieve something.
>

Can you describe in a few sentence what the problem is, and which solution 
you are exploring to Joe, linux-kernel and the mentees list.

I would consider that your project proposal for the first mentee 
milestone. With that, we can progress you to become a proper mentee in the 
community bridge platform.

Lukas
 
> Thanks,
> Dwaipayan.
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-21  9:15       ` Lukas Bulwahn
@ 2020-09-22 13:21         ` Dwaipayan Ray
  2020-09-22 18:38           ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-22 13:21 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

Hi Lukas,

I looked up the mailmap implementations in get_maintainers
and I was able to draw up something. After that I updated
.mailmap with Daniel's mail addresses and ran the
checkpatch script on some of Daniel Vetter's commits and
the author sign off warning no longer appears :).

Let me know what you think of this.
I am sending you the diff for now.

Thanks,
Dwaipayan.

--------------------------------------------------------
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9e65d21456f1..e8c12a5a7e7b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -51,6 +51,7 @@ my %ignore_type = ();
 my @ignore = ();
 my $help = 0;
 my $configuration_file = ".checkpatch.conf";
+my $lk_path = "./";
 my $max_line_length = 100;
 my $ignore_perl_version = 0;
 my $minimum_perl_version = 5.10.0;
@@ -1128,6 +1129,78 @@ sub top_of_kernel_tree {
  return 1;
 }

+my $mailmap;
+
+sub read_mailmap {
+    $mailmap = {
+ names => {},
+ addresses => {}
+    };
+
+    return if (!defined($lk_path) || !(-f "${lk_path}.mailmap"));
+
+    open(my $mailmap_file, '<', "${lk_path}.mailmap")
+ or warn "$P: Can't open .mailmap: $!\n";
+
+    while (<$mailmap_file>) {
+ s/#.*$//; #strip comments
+ s/^\s+|\s+$//g; #trim
+
+ next if (/^\s*$/); #skip empty lines
+ #entries have one of the following formats:
+ # name1 <mail1>
+ # <mail1> <mail2>
+ # name1 <mail1> <mail2>
+ # name1 <mail1> name2 <mail2>
+ # (see man git-shortlog)
+
+ if (/^([^<]+)<([^>]+)>$/) {
+     my $real_name = $1;
+     my $address = $2;
+
+     $real_name =~ s/\s+$//;
+     ($real_name, $address) = parse_email("$real_name <$address>");
+     $mailmap->{names}->{$address} = $real_name;
+
+ } elsif (/^<([^>]+)>\s*<([^>]+)>$/) {
+     my $real_address = $1;
+     my $wrong_address = $2;
+
+     $mailmap->{addresses}->{$wrong_address} = $real_address;
+
+ } elsif (/^(.+)<([^>]+)>\s*<([^>]+)>$/) {
+     my $real_name = $1;
+     my $real_address = $2;
+     my $wrong_address = $3;
+
+     $real_name =~ s/\s+$//;
+     ($real_name, $real_address) =
+   parse_email("$real_name <$real_address>");
+     $mailmap->{names}->{$wrong_address} = $real_name;
+     $mailmap->{addresses}->{$wrong_address} = $real_address;
+
+ } elsif (/^(.+)<([^>]+)>\s*(.+)\s*<([^>]+)>$/) {
+     my $real_name = $1;
+     my $real_address = $2;
+     my $wrong_name = $3;
+     my $wrong_address = $4;
+
+     $real_name =~ s/\s+$//;
+     ($real_name, $real_address) =
+   parse_email("$real_name <$real_address>");
+
+     $wrong_name =~ s/\s+$//;
+     ($wrong_name, $wrong_address) =
+   parse_email("$wrong_name <$wrong_address>");
+
+     my $wrong_email = format_email($wrong_name, $wrong_address);
+     $mailmap->{names}->{$wrong_email} = $real_name;
+     $mailmap->{addresses}->{$wrong_email} = $real_address;
+ }
+    }
+    close($mailmap_file);
+}
+
 sub parse_email {
  my ($formatted_email) = @_;

@@ -1210,14 +1283,50 @@ sub reformat_email {
  return format_email($email_name, $email_address);
 }

+sub mailmap_email {
+    my ($name, $address) = @_;
+
+    my $email = format_email($name, $address);
+    my $real_name = $name;
+    my $real_address = $address;
+
+    if (exists $mailmap->{names}->{$email} ||
+ exists $mailmap->{addresses}->{$email}) {
+ if (exists $mailmap->{names}->{$email}) {
+     $real_name = $mailmap->{names}->{$email};
+ }
+ if (exists $mailmap->{addresses}->{$email}) {
+     $real_address = $mailmap->{addresses}->{$email};
+ }
+    } else {
+ if (exists $mailmap->{names}->{$address}) {
+     $real_name = $mailmap->{names}->{$address};
+ }
+ if (exists $mailmap->{addresses}->{$address}) {
+     $real_address = $mailmap->{addresses}->{$address};
+ }
+    }
+    return ($real_name, $real_address);
+}
+
 sub same_email_addresses {
  my ($email1, $email2) = @_;

  my ($email1_name, $name1_comment, $email1_address, $comment1) =
parse_email($email1);
  my ($email2_name, $name2_comment, $email2_address, $comment2) =
parse_email($email2);

- return $email1_name eq $email2_name &&
-        $email1_address eq $email2_address;
+ my $name_match = $email1_name eq $email2_name;
+ my $address_match = $email1_address eq $email2_address;
+
+ if(!$name_match || !$address_match) {
+   my ($real_name1, $real_address1) = mailmap_email($email1_name,
$email1_address);
+   my ($real_name2, $real_address2) = mailmap_email($email2_name,
$email2_address);
+
+   $name_match |= ($real_name1 eq $real_name2);
+   $address_match |= ($real_address1 eq $real_address2);
+ }
+
+ return $name_match && $address_match;
 }

 sub which {
@@ -2400,6 +2509,7 @@ sub process {

  my $checklicenseline = 1;

+ read_mailmap();
  sanitise_line_reset();
  my $line;
  foreach my $rawline (@rawlines) {
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-22 13:21         ` Dwaipayan Ray
@ 2020-09-22 18:38           ` Lukas Bulwahn
  2020-09-22 19:08             ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-22 18:38 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Tue, 22 Sep 2020, Dwaipayan Ray wrote:

> Hi Lukas,
> 
> I looked up the mailmap implementations in get_maintainers
> and I was able to draw up something. After that I updated
> .mailmap with Daniel's mail addresses and ran the
> checkpatch script on some of Daniel Vetter's commits and
> the author sign off warning no longer appears :).
> 
> Let me know what you think of this.
> I am sending you the diff for now.
>

Generally, I think it is a good first proof of concept.
I believe you that functionality 'basically' works; again, we might 
already want to run a full-scale evaluation on that. Just to see
if there are some impacts we might not be aware of yet.

As you already wrote, we, Joe, you and me, need to figure out
now all the further details:

- how can we avoid the duplicate code in checkpatch.pl and 
get_maintainers.pl?
 
- what is performance impact, especially as AUTHOR_SIGN_OFF check is not 
triggered often, and there are many other rules in checkpatch.pl?

- further details, such as why do we need the lk_path is the first place? 
and many more questions of that kind.

Feel free to sketch a first commit message and create a PATCH RFC for the 
discussion with Joe.


Lukas

> Thanks,
> Dwaipayan.
> 
> --------------------------------------------------------
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9e65d21456f1..e8c12a5a7e7b 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -51,6 +51,7 @@ my %ignore_type = ();
>  my @ignore = ();
>  my $help = 0;
>  my $configuration_file = ".checkpatch.conf";
> +my $lk_path = "./";
>  my $max_line_length = 100;
>  my $ignore_perl_version = 0;
>  my $minimum_perl_version = 5.10.0;
> @@ -1128,6 +1129,78 @@ sub top_of_kernel_tree {
>   return 1;
>  }
> 
> +my $mailmap;
> +
> +sub read_mailmap {
> +    $mailmap = {
> + names => {},
> + addresses => {}
> +    };
> +
> +    return if (!defined($lk_path) || !(-f "${lk_path}.mailmap"));
> +
> +    open(my $mailmap_file, '<', "${lk_path}.mailmap")
> + or warn "$P: Can't open .mailmap: $!\n";
> +
> +    while (<$mailmap_file>) {
> + s/#.*$//; #strip comments
> + s/^\s+|\s+$//g; #trim
> +
> + next if (/^\s*$/); #skip empty lines
> + #entries have one of the following formats:
> + # name1 <mail1>
> + # <mail1> <mail2>
> + # name1 <mail1> <mail2>
> + # name1 <mail1> name2 <mail2>
> + # (see man git-shortlog)
> +
> + if (/^([^<]+)<([^>]+)>$/) {
> +     my $real_name = $1;
> +     my $address = $2;
> +
> +     $real_name =~ s/\s+$//;
> +     ($real_name, $address) = parse_email("$real_name <$address>");
> +     $mailmap->{names}->{$address} = $real_name;
> +
> + } elsif (/^<([^>]+)>\s*<([^>]+)>$/) {
> +     my $real_address = $1;
> +     my $wrong_address = $2;
> +
> +     $mailmap->{addresses}->{$wrong_address} = $real_address;
> +
> + } elsif (/^(.+)<([^>]+)>\s*<([^>]+)>$/) {
> +     my $real_name = $1;
> +     my $real_address = $2;
> +     my $wrong_address = $3;
> +
> +     $real_name =~ s/\s+$//;
> +     ($real_name, $real_address) =
> +   parse_email("$real_name <$real_address>");
> +     $mailmap->{names}->{$wrong_address} = $real_name;
> +     $mailmap->{addresses}->{$wrong_address} = $real_address;
> +
> + } elsif (/^(.+)<([^>]+)>\s*(.+)\s*<([^>]+)>$/) {
> +     my $real_name = $1;
> +     my $real_address = $2;
> +     my $wrong_name = $3;
> +     my $wrong_address = $4;
> +
> +     $real_name =~ s/\s+$//;
> +     ($real_name, $real_address) =
> +   parse_email("$real_name <$real_address>");
> +
> +     $wrong_name =~ s/\s+$//;
> +     ($wrong_name, $wrong_address) =
> +   parse_email("$wrong_name <$wrong_address>");
> +
> +     my $wrong_email = format_email($wrong_name, $wrong_address);
> +     $mailmap->{names}->{$wrong_email} = $real_name;
> +     $mailmap->{addresses}->{$wrong_email} = $real_address;
> + }
> +    }
> +    close($mailmap_file);
> +}
> +
>  sub parse_email {
>   my ($formatted_email) = @_;
> 
> @@ -1210,14 +1283,50 @@ sub reformat_email {
>   return format_email($email_name, $email_address);
>  }
> 
> +sub mailmap_email {
> +    my ($name, $address) = @_;
> +
> +    my $email = format_email($name, $address);
> +    my $real_name = $name;
> +    my $real_address = $address;
> +
> +    if (exists $mailmap->{names}->{$email} ||
> + exists $mailmap->{addresses}->{$email}) {
> + if (exists $mailmap->{names}->{$email}) {
> +     $real_name = $mailmap->{names}->{$email};
> + }
> + if (exists $mailmap->{addresses}->{$email}) {
> +     $real_address = $mailmap->{addresses}->{$email};
> + }
> +    } else {
> + if (exists $mailmap->{names}->{$address}) {
> +     $real_name = $mailmap->{names}->{$address};
> + }
> + if (exists $mailmap->{addresses}->{$address}) {
> +     $real_address = $mailmap->{addresses}->{$address};
> + }
> +    }
> +    return ($real_name, $real_address);
> +}
> +
>  sub same_email_addresses {
>   my ($email1, $email2) = @_;
> 
>   my ($email1_name, $name1_comment, $email1_address, $comment1) =
> parse_email($email1);
>   my ($email2_name, $name2_comment, $email2_address, $comment2) =
> parse_email($email2);
> 
> - return $email1_name eq $email2_name &&
> -        $email1_address eq $email2_address;
> + my $name_match = $email1_name eq $email2_name;
> + my $address_match = $email1_address eq $email2_address;
> +
> + if(!$name_match || !$address_match) {
> +   my ($real_name1, $real_address1) = mailmap_email($email1_name,
> $email1_address);
> +   my ($real_name2, $real_address2) = mailmap_email($email2_name,
> $email2_address);
> +
> +   $name_match |= ($real_name1 eq $real_name2);
> +   $address_match |= ($real_address1 eq $real_address2);
> + }
> +
> + return $name_match && $address_match;
>  }
> 
>  sub which {
> @@ -2400,6 +2509,7 @@ sub process {
> 
>   my $checklicenseline = 1;
> 
> + read_mailmap();
>   sanitise_line_reset();
>   my $line;
>   foreach my $rawline (@rawlines) {
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-22 18:38           ` Lukas Bulwahn
@ 2020-09-22 19:08             ` Dwaipayan Ray
  2020-09-23  7:32               ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-22 19:08 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

>
> Generally, I think it is a good first proof of concept.
> I believe you that functionality 'basically' works; again, we might
> already want to run a full-scale evaluation on that. Just to see
> if there are some impacts we might not be aware of yet.
>
> As you already wrote, we, Joe, you and me, need to figure out
> now all the further details:
>
> - how can we avoid the duplicate code in checkpatch.pl and
> get_maintainers.pl?
>
> - what is performance impact, especially as AUTHOR_SIGN_OFF check is not
> triggered often, and there are many other rules in checkpatch.pl?
>
> - further details, such as why do we need the lk_path is the first place?
> and many more questions of that kind.
>
> Feel free to sketch a first commit message and create a PATCH RFC for the
> discussion with Joe.
>
>
> Lukas
>

Hi,
As for the lk_path, it can be removed easily. To me too, it didn't make much
sense since it was just a duplicate, as $root should contain the same.

But again due to some reason,
$root in checkpatch had the value ".",
while $lk_path in get_maintainer had the value "./"

I have no idea yet if this was a design decision or just different handling.

So, I can change the part where I referenced the mailmap file by adding
a trailing / with $root rather than $lk_path. That should do it.

And for the duplicate code part, Joe did mention that either I could copy
or place the read_mailmap code in a separate file and reference both
checkpatch and get_maintainers from there.

To me, copying seems much feasible because the referenced part of the
mailmap handling code here is very small as there are some minor
changes.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-22 19:08             ` Dwaipayan Ray
@ 2020-09-23  7:32               ` Lukas Bulwahn
  2020-09-23  7:38                 ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-23  7:32 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Wed, 23 Sep 2020, Dwaipayan Ray wrote:

> >
> > Generally, I think it is a good first proof of concept.
> > I believe you that functionality 'basically' works; again, we might
> > already want to run a full-scale evaluation on that. Just to see
> > if there are some impacts we might not be aware of yet.
> >
> > As you already wrote, we, Joe, you and me, need to figure out
> > now all the further details:
> >
> > - how can we avoid the duplicate code in checkpatch.pl and
> > get_maintainers.pl?
> >
> > - what is performance impact, especially as AUTHOR_SIGN_OFF check is not
> > triggered often, and there are many other rules in checkpatch.pl?
> >
> > - further details, such as why do we need the lk_path is the first place?
> > and many more questions of that kind.
> >
> > Feel free to sketch a first commit message and create a PATCH RFC for the
> > discussion with Joe.
> >
> >
> > Lukas
> >
> 
> Hi,
> As for the lk_path, it can be removed easily. To me too, it didn't make much
> sense since it was just a duplicate, as $root should contain the same.
> 
> But again due to some reason,
> $root in checkpatch had the value ".",
> while $lk_path in get_maintainer had the value "./"
> 
> I have no idea yet if this was a design decision or just different handling.
> 
> So, I can change the part where I referenced the mailmap file by adding
> a trailing / with $root rather than $lk_path. That should do it.
> 
> And for the duplicate code part, Joe did mention that either I could copy
> or place the read_mailmap code in a separate file and reference both
> checkpatch and get_maintainers from there.
> 
> To me, copying seems much feasible because the referenced part of the
> mailmap handling code here is very small as there are some minor
> changes.
>

Well, if you can argue copying code, it is fine. Go ahead and send out a 
RFC patch.

Regarding the mentorship, would you have time for some virtual 
face-to-face meeting to discuss the Eligibility Criteria and Mentorship 
Models?


Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-23  7:32               ` Lukas Bulwahn
@ 2020-09-23  7:38                 ` Dwaipayan Ray
  2020-09-23  7:42                   ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-23  7:38 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

> > Hi,
> > As for the lk_path, it can be removed easily. To me too, it didn't make much
> > sense since it was just a duplicate, as $root should contain the same.
> >
> > But again due to some reason,
> > $root in checkpatch had the value ".",
> > while $lk_path in get_maintainer had the value "./"
> >
> > I have no idea yet if this was a design decision or just different handling.
> >
> > So, I can change the part where I referenced the mailmap file by adding
> > a trailing / with $root rather than $lk_path. That should do it.
> >
> > And for the duplicate code part, Joe did mention that either I could copy
> > or place the read_mailmap code in a separate file and reference both
> > checkpatch and get_maintainers from there.
> >
> > To me, copying seems much feasible because the referenced part of the
> > mailmap handling code here is very small as there are some minor
> > changes.
> >
>
> Well, if you can argue copying code, it is fine. Go ahead and send out a
> RFC patch.
>

I definitely know that redundant code is a bad practice. But yes, I would
send a patch in with a proper commit message to get yours and Joe's
comments.

> Regarding the mentorship, would you have time for some virtual
> face-to-face meeting to discuss the Eligibility Criteria and Mentorship
> Models?
>
>
> Lukas

Yes sure, what time are you available? I don't have any more classes
today so I am free.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-23  7:38                 ` Dwaipayan Ray
@ 2020-09-23  7:42                   ` Lukas Bulwahn
  2020-09-25  4:18                     ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-23  7:42 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees


> 
> > Regarding the mentorship, would you have time for some virtual
> > face-to-face meeting to discuss the Eligibility Criteria and Mentorship
> > Models?
> >
> >
> > Lukas
> 
> Yes sure, what time are you available? I don't have any more classes
> today so I am free.
>

How about now? Let us quickly meet here:

https://meet.jit.si/Checkpatch.pl-Mentorship

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-23  7:42                   ` Lukas Bulwahn
@ 2020-09-25  4:18                     ` Dwaipayan Ray
  2020-09-25  7:20                       ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-25  4:18 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

Hi,
As Joe mentioned earlier, there might be four new
warnings to handle better:

1) Same name, different address
2) Same address, different name
3) email extensions present in header but
    not in signed off by
4) comment blocks after name

I am thinking to solve the first three in a single patch.

As for the email extensions part, should it be handled
differently? There will be redundant calls to that
though because it's not an error seen quite often.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-25  4:18                     ` Dwaipayan Ray
@ 2020-09-25  7:20                       ` Lukas Bulwahn
  2020-09-25  7:29                         ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-25  7:20 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 2883 bytes --]



On Fri, 25 Sep 2020, Dwaipayan Ray wrote:

> Hi,
> As Joe mentioned earlier, there might be four new
> warnings to handle better:
> 
> 1) Same name, different address
> 2) Same address, different name
> 3) email extensions present in header but
>     not in signed off by
> 4) comment blocks after name
> 
> I am thinking to solve the first three in a single patch.
> 
> As for the email extensions part, should it be handled
> differently? There will be redundant calls to that
> though because it's not an error seen quite often.
>

Dwaipayan, thanks for this initial structuring.

Let us try to put some systematic structure into this handling of the 
NO_AUTHOR_SIGN_OFF.

There are basically two aspects to consider:

- which classes are potential mismatches can we potentially observe?

- which level of severity do we assign to each class of mismatch?

To the first point, you started with some initial classes above.

  0) same name, same address (no mismatch)
  1) same name, different address
     subclasses: (how 'different' address?)
     - email address differ by valid difference, e.g., extensions that 
       will go to the same inbox, e.g.,  the xyz+fds extension in mail 
       addresses. (that is your class 3, right?)
     - two email addresses that are known to belong to the same individual
       - known because of .mailmap
       - known otherwise?
  2) different name, same address
     subclasses: (how 'different' name?)
     - examples:
       - Firstname, Lastname (but middle-name initials differ)
       - special "regional" characters, like ü vs. ue, etc.
       - firstname and lastname are reverted etc.
  3) different name, different address (but still we believe it "matches")
     combinations of subclasses of 1) and 2), which we want to consider.

Then, to the second question:

So, these different classes we can think of and "assign" different 
severities that checkpatch.pl can use.

Checkpatch.pl already has four levels:

three severity reporting levels: ERROR, WARN, CHECK, and
the level _do not report_ (which is implemented by just ignoring some kind 
of difference).

I think a lot of discussion will be around what severity to assign to 
which class above, and in some way, long-term maintainers have a larger 
say than we do here.

So, let us first modify checkpatch.pl to identify the various classes 
above, maybe just starting very basic, with different name, same address
and same name, different address and run it over the commit range and see 
which examples show up and how often.

For now, we first just use checkpatch.pl to identify the different types 
and refine them into subclasses; then, we can discuss with severity should 
be assigned to which type of mismatch.

The second question should not invalidate our data collection and 
identification of subclasses, though.

Does that help? What do you think?


Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-25  7:20                       ` Lukas Bulwahn
@ 2020-09-25  7:29                         ` Dwaipayan Ray
  2020-09-25  7:35                           ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-25  7:29 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

> Let us try to put some systematic structure into this handling of the
> NO_AUTHOR_SIGN_OFF.
>
> There are basically two aspects to consider:
>
> - which classes are potential mismatches can we potentially observe?
>
> - which level of severity do we assign to each class of mismatch?
>
> To the first point, you started with some initial classes above.
>
>   0) same name, same address (no mismatch)
>   1) same name, different address
>      subclasses: (how 'different' address?)
>      - email address differ by valid difference, e.g., extensions that
>        will go to the same inbox, e.g.,  the xyz+fds extension in mail
>        addresses. (that is your class 3, right?)
>      - two email addresses that are known to belong to the same individual
>        - known because of .mailmap
>        - known otherwise?
>   2) different name, same address
>      subclasses: (how 'different' name?)
>      - examples:
>        - Firstname, Lastname (but middle-name initials differ)
>        - special "regional" characters, like ü vs. ue, etc.
>        - firstname and lastname are reverted etc.
>   3) different name, different address (but still we believe it "matches")
>      combinations of subclasses of 1) and 2), which we want to consider.
>
> Then, to the second question:
>
> So, these different classes we can think of and "assign" different
> severities that checkpatch.pl can use.
>
> Checkpatch.pl already has four levels:
>
> three severity reporting levels: ERROR, WARN, CHECK, and
> the level _do not report_ (which is implemented by just ignoring some kind
> of difference).
>
> I think a lot of discussion will be around what severity to assign to
> which class above, and in some way, long-term maintainers have a larger
> say than we do here.
>
> So, let us first modify checkpatch.pl to identify the various classes
> above, maybe just starting very basic, with different name, same address
> and same name, different address and run it over the commit range and see
> which examples show up and how often.
>
> For now, we first just use checkpatch.pl to identify the different types
> and refine them into subclasses; then, we can discuss with severity should
> be assigned to which type of mismatch.
>
> The second question should not invalidate our data collection and
> identification of subclasses, though.
>
> Does that help? What do you think?
>
>
> Lukas

Yes, that should help I think. Currently checkpatch is very vague
about Author sign offs, so subclassing it will really be helpful.
But at the same point I don't think it should become
too complex either.

I have already prepared a patch for the basic two classes:
1) same name, different address
2) same address, different name

It would be great to have your feedback on this.

---
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9e65d21456f1..e23e753fcaf0 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1210,14 +1210,22 @@ sub reformat_email {
  return format_email($email_name, $email_address);
 }

-sub same_email_addresses {
+sub same_mail_check {
  my ($email1, $email2) = @_;

  my ($email1_name, $name1_comment, $email1_address, $comment1) =
parse_email($email1);
  my ($email2_name, $name2_comment, $email2_address, $comment2) =
parse_email($email2);

- return $email1_name eq $email2_name &&
-        $email1_address eq $email2_address;
+ my $same_name = $email1_name eq $email2_name;
+ my $same_address= $email1_address eq $email2_address;
+
+ return ($same_name, $same_address);
+}
+
+sub same_email_addresses {
+ my ($same_name, $same_address) = same_mail_check(@_);
+
+ return $same_name && $same_address;
 }

 sub which {
@@ -2347,6 +2355,7 @@ sub process {
  my $signoff = 0;
  my $author = '';
  my $authorsignoff = 0;
+ my $authorsignerr = '';
  my $is_patch = 0;
  my $is_binding_patch = -1;
  my $in_header_lines = $file ? 0 : 1;
@@ -2677,6 +2686,13 @@ sub process {
  if ($author ne '') {
  if (same_email_addresses($1, $author)) {
  $authorsignoff = 1;
+ } else {
+ my ($same_name, $same_address) = same_mail_check($1, $author);
+ if($same_name) {
+ $authorsignerr = "ADDRESS_MISMATCH:${1}";
+ } elsif ($same_address) {
+ $authorsignerr = "NAME_MISMATCH:${1}";
+ }
  }
  }
  }
@@ -6891,8 +6907,16 @@ sub process {
  ERROR("MISSING_SIGN_OFF",
        "Missing Signed-off-by: line(s)\n");
  } elsif (!$authorsignoff) {
- WARN("NO_AUTHOR_SIGN_OFF",
-      "Missing Signed-off-by: line by nominal patch author '$author'\n");
+ if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) {
+ WARN("NO_AUTHOR_SIGN_OFF",
+      "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n");
+ } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) {
+ WARN("NO_AUTHOR_SIGN_OFF",
+      "Author email address mismatch: 'From: $author' !=
'Signed-off-by: $1'\n");
+ } else {
+ WARN("NO_AUTHOR_SIGN_OFF",
+ "Missing Signed-off-by: line by nominal patch author '$author'\n");
+ }
  }
  }
---

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-25  7:29                         ` Dwaipayan Ray
@ 2020-09-25  7:35                           ` Lukas Bulwahn
  2020-09-26 11:31                             ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-25  7:35 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 5755 bytes --]



On Fri, 25 Sep 2020, Dwaipayan Ray wrote:

> > Let us try to put some systematic structure into this handling of the
> > NO_AUTHOR_SIGN_OFF.
> >
> > There are basically two aspects to consider:
> >
> > - which classes are potential mismatches can we potentially observe?
> >
> > - which level of severity do we assign to each class of mismatch?
> >
> > To the first point, you started with some initial classes above.
> >
> >   0) same name, same address (no mismatch)
> >   1) same name, different address
> >      subclasses: (how 'different' address?)
> >      - email address differ by valid difference, e.g., extensions that
> >        will go to the same inbox, e.g.,  the xyz+fds extension in mail
> >        addresses. (that is your class 3, right?)
> >      - two email addresses that are known to belong to the same individual
> >        - known because of .mailmap
> >        - known otherwise?
> >   2) different name, same address
> >      subclasses: (how 'different' name?)
> >      - examples:
> >        - Firstname, Lastname (but middle-name initials differ)
> >        - special "regional" characters, like ü vs. ue, etc.
> >        - firstname and lastname are reverted etc.
> >   3) different name, different address (but still we believe it "matches")
> >      combinations of subclasses of 1) and 2), which we want to consider.
> >
> > Then, to the second question:
> >
> > So, these different classes we can think of and "assign" different
> > severities that checkpatch.pl can use.
> >
> > Checkpatch.pl already has four levels:
> >
> > three severity reporting levels: ERROR, WARN, CHECK, and
> > the level _do not report_ (which is implemented by just ignoring some kind
> > of difference).
> >
> > I think a lot of discussion will be around what severity to assign to
> > which class above, and in some way, long-term maintainers have a larger
> > say than we do here.
> >
> > So, let us first modify checkpatch.pl to identify the various classes
> > above, maybe just starting very basic, with different name, same address
> > and same name, different address and run it over the commit range and see
> > which examples show up and how often.
> >
> > For now, we first just use checkpatch.pl to identify the different types
> > and refine them into subclasses; then, we can discuss with severity should
> > be assigned to which type of mismatch.
> >
> > The second question should not invalidate our data collection and
> > identification of subclasses, though.
> >
> > Does that help? What do you think?
> >
> >
> > Lukas
> 
> Yes, that should help I think. Currently checkpatch is very vague
> about Author sign offs, so subclassing it will really be helpful.
> But at the same point I don't think it should become
> too complex either.

Agree, that is where the evaluation will help (what is really happening, 
what do we need to consider) and the severity comes into play (if we 
consider all subclasses of same severity, we do not need to differentiate 
between them.)

> 
> I have already prepared a patch for the basic two classes:
> 1) same name, different address
> 2) same address, different name
> 
> It would be great to have your feedback on this.

Okay, I suggest to run this script on a larger set of commits and see.

Probably, you can speed it your evaluation with checkpatch.pl by 
restricting it to the AUTHOR_SIGN_OFF type. I will try that over the 
weekend as well.

Lukas

> 
> ---
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9e65d21456f1..e23e753fcaf0 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1210,14 +1210,22 @@ sub reformat_email {
>   return format_email($email_name, $email_address);
>  }
> 
> -sub same_email_addresses {
> +sub same_mail_check {
>   my ($email1, $email2) = @_;
> 
>   my ($email1_name, $name1_comment, $email1_address, $comment1) =
> parse_email($email1);
>   my ($email2_name, $name2_comment, $email2_address, $comment2) =
> parse_email($email2);
> 
> - return $email1_name eq $email2_name &&
> -        $email1_address eq $email2_address;
> + my $same_name = $email1_name eq $email2_name;
> + my $same_address= $email1_address eq $email2_address;
> +
> + return ($same_name, $same_address);
> +}
> +
> +sub same_email_addresses {
> + my ($same_name, $same_address) = same_mail_check(@_);
> +
> + return $same_name && $same_address;
>  }
> 
>  sub which {
> @@ -2347,6 +2355,7 @@ sub process {
>   my $signoff = 0;
>   my $author = '';
>   my $authorsignoff = 0;
> + my $authorsignerr = '';
>   my $is_patch = 0;
>   my $is_binding_patch = -1;
>   my $in_header_lines = $file ? 0 : 1;
> @@ -2677,6 +2686,13 @@ sub process {
>   if ($author ne '') {
>   if (same_email_addresses($1, $author)) {
>   $authorsignoff = 1;
> + } else {
> + my ($same_name, $same_address) = same_mail_check($1, $author);
> + if($same_name) {
> + $authorsignerr = "ADDRESS_MISMATCH:${1}";
> + } elsif ($same_address) {
> + $authorsignerr = "NAME_MISMATCH:${1}";
> + }
>   }
>   }
>   }
> @@ -6891,8 +6907,16 @@ sub process {
>   ERROR("MISSING_SIGN_OFF",
>         "Missing Signed-off-by: line(s)\n");
>   } elsif (!$authorsignoff) {
> - WARN("NO_AUTHOR_SIGN_OFF",
> -      "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n");
> + } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author email address mismatch: 'From: $author' !=
> 'Signed-off-by: $1'\n");
> + } else {
> + WARN("NO_AUTHOR_SIGN_OFF",
> + "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + }
>   }
>   }
> ---
> 
> Thanks,
> Dwaipayan.
> 

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-25  7:35                           ` Lukas Bulwahn
@ 2020-09-26 11:31                             ` Dwaipayan Ray
  2020-09-28 13:30                               ` Dwaipayan Ray
  2020-09-28 15:06                               ` Lukas Bulwahn
  0 siblings, 2 replies; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-26 11:31 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

> > Yes, that should help I think. Currently checkpatch is very vague
> > about Author sign offs, so subclassing it will really be helpful.
> > But at the same point I don't think it should become
> > too complex either.
>
> Agree, that is where the evaluation will help (what is really happening,
> what do we need to consider) and the severity comes into play (if we
> consider all subclasses of same severity, we do not need to differentiate
> between them.)
>
> >
> > I have already prepared a patch for the basic two classes:
> > 1) same name, different address
> > 2) same address, different name
> >
> > It would be great to have your feedback on this.
>
> Okay, I suggest to run this script on a larger set of commits and see.
>
> Probably, you can speed it your evaluation with checkpatch.pl by
> restricting it to the AUTHOR_SIGN_OFF type. I will try that over the
> weekend as well.
>
> Lukas

Hi,
So I did run my script to evaluate the type of errors that were generated.
My results are from commits between v5.7 and v5.8

1) Same address, different name

There were 32 such instances. In general, there was no regularity in the
specific error.

There were some cases with letter case mismatch. Like there was one
committer who used the names "Arend Van Spriel", and "Arend van Spriel".
I believe this error should not be generated and matching be made case
neutral.

Apart from this, the rest of the cases in this category were some with missing
first or last names, using names in other languages or having commas in
one name, using initials etc. These should be warned about.

2) Same name, different address

There were 204 such instances. Most of them were of the form in which they
used a completely different email address.

Only one case was found with an email extension.
<hpeter@gmail.com>
<hpeter+linux_kernel@gmail.com>
in commit 423d9118c624

So in general I think most email mismatches should be specified by a WARN
and for email extension there should be a --strict CHECK.

What do you think of this?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-26 11:31                             ` Dwaipayan Ray
@ 2020-09-28 13:30                               ` Dwaipayan Ray
  2020-09-28 14:09                                 ` Lukas Bulwahn
  2020-09-28 15:06                               ` Lukas Bulwahn
  1 sibling, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-28 13:30 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

Hi,
I am continuing this thread, but am writing about another issue
separate from author sign off. While checking checkpatch
output, I was checking the commits with the warnings:

WARNING:UNNECESSARY_ELSE: else is not generally
 useful after a break or return

Looking into the referenced section, I found some
sections with a redundant else.

For example: (revision 196273fffc1c),
arch/powerpc/kernel/security.c , line 360:

static int ssb_prctl_get(struct task_struct *task)
{
if (stf_enabled_flush_types == STF_BARRIER_NONE)
/*
* We don't have an explicit signal from firmware that we're
* vulnerable or not, we only have certain CPU revisions that
* are known to be vulnerable.
*
* We assume that if we're on another CPU, where the barrier is
* NONE, then we are not vulnerable.
*/
return PR_SPEC_NOT_AFFECTED;
else
/*
* If we do have a barrier type then we are vulnerable. The
* barrier is not a global or per-process mitigation, so the
* only value we can report here is PR_SPEC_ENABLE, which
* appears as "vulnerable" in /proc.
*/
return PR_SPEC_ENABLE;

return -EINVAL;
}

The else is pretty much redundant and the control flow
never reaches to return -EINVAL.

Is it possible to clean up all these redundant code?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-28 13:30                               ` Dwaipayan Ray
@ 2020-09-28 14:09                                 ` Lukas Bulwahn
  2020-09-28 14:20                                   ` Dwaipayan Ray
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-28 14:09 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Mon, 28 Sep 2020, Dwaipayan Ray wrote:

> Hi,
> I am continuing this thread, but am writing about another issue
> separate from author sign off. While checking checkpatch
> output, I was checking the commits with the warnings:
>

How about just starting a new thread instead?

I will answer on the new thread.

Lukas
 
> WARNING:UNNECESSARY_ELSE: else is not generally
>  useful after a break or return
> 
> Looking into the referenced section, I found some
> sections with a redundant else.
> 
> For example: (revision 196273fffc1c),
> arch/powerpc/kernel/security.c , line 360:
> 
> static int ssb_prctl_get(struct task_struct *task)
> {
> if (stf_enabled_flush_types == STF_BARRIER_NONE)
> /*
> * We don't have an explicit signal from firmware that we're
> * vulnerable or not, we only have certain CPU revisions that
> * are known to be vulnerable.
> *
> * We assume that if we're on another CPU, where the barrier is
> * NONE, then we are not vulnerable.
> */
> return PR_SPEC_NOT_AFFECTED;
> else
> /*
> * If we do have a barrier type then we are vulnerable. The
> * barrier is not a global or per-process mitigation, so the
> * only value we can report here is PR_SPEC_ENABLE, which
> * appears as "vulnerable" in /proc.
> */
> return PR_SPEC_ENABLE;
> 
> return -EINVAL;
> }
> 
> The else is pretty much redundant and the control flow
> never reaches to return -EINVAL.
> 
> Is it possible to clean up all these redundant code?
> 
> Thanks,
> Dwaipayan.
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-28 14:09                                 ` Lukas Bulwahn
@ 2020-09-28 14:20                                   ` Dwaipayan Ray
  2020-09-28 15:09                                     ` Lukas Bulwahn
  0 siblings, 1 reply; 22+ messages in thread
From: Dwaipayan Ray @ 2020-09-28 14:20 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

>
> How about just starting a new thread instead?
>
> I will answer on the new thread.
>
> Lukas
>

Sure, I will.

Also did my last mail go through to you?

Quoted:
> So I did run my script to evaluate the type of errors that were generated.
> My results are from commits between v5.7 and v5.8
>
> 1) Same address, different name
> There were 32 such instances. In general, there was no regularity in the
> specific error.
>
> There were some cases with letter case mismatch. Like there was one
> committer who used the names "Arend Van Spriel", and "Arend van Spriel".
> I believe this error should not be generated and matching be made case
> neutral.
>
> Apart from this, the rest of the cases in this category were some with missing
> first or last names, using names in other languages or having commas in
> one name, using initials etc. These should be warned about.
>
> 2) Same name, different address
> There were 204 such instances. Most of them were of the form in which they
> used a completely different email address.
>
> Only one case was found with an email extension.
> <hpeter@gmail.com>
> <hpeter+linux_kernel@gmail.com>
> in commit 423d9118c624
>
> So in general I think most email mismatches should be specified by a WARN
> and for email extension there should be a --strict CHECK.
> What do you think of this?
>
> Thanks,
> Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-26 11:31                             ` Dwaipayan Ray
  2020-09-28 13:30                               ` Dwaipayan Ray
@ 2020-09-28 15:06                               ` Lukas Bulwahn
  1 sibling, 0 replies; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-28 15:06 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Sat, 26 Sep 2020, Dwaipayan Ray wrote:

> > > Yes, that should help I think. Currently checkpatch is very vague
> > > about Author sign offs, so subclassing it will really be helpful.
> > > But at the same point I don't think it should become
> > > too complex either.
> >
> > Agree, that is where the evaluation will help (what is really happening,
> > what do we need to consider) and the severity comes into play (if we
> > consider all subclasses of same severity, we do not need to differentiate
> > between them.)
> >
> > >
> > > I have already prepared a patch for the basic two classes:
> > > 1) same name, different address
> > > 2) same address, different name
> > >
> > > It would be great to have your feedback on this.
> >
> > Okay, I suggest to run this script on a larger set of commits and see.
> >
> > Probably, you can speed it your evaluation with checkpatch.pl by
> > restricting it to the AUTHOR_SIGN_OFF type. I will try that over the
> > weekend as well.
> >
> > Lukas
> 
> Hi,
> So I did run my script to evaluate the type of errors that were generated.
> My results are from commits between v5.7 and v5.8
>

Okay, good so far. How about running this on a larger set of commits?
 
> 1) Same address, different name
> 
> There were 32 such instances. In general, there was no regularity in the
> specific error.
> 
> There were some cases with letter case mismatch. Like there was one
> committer who used the names "Arend Van Spriel", and "Arend van Spriel".
> I believe this error should not be generated and matching be made case
> neutral.
> 
> Apart from this, the rest of the cases in this category were some with missing
> first or last names, using names in other languages or having commas in
> one name, using initials etc. These should be warned about.
>

Agree. Let us making this WARNING, but not ERROR.

> 2) Same name, different address
> 
> There were 204 such instances. Most of them were of the form in which they
> used a completely different email address.
>

I guess we would need to understand in more detail why developers do that 
and if .mailmap helps here.
  
> Only one case was found with an email extension.
> <hpeter@gmail.com>
> <hpeter+linux_kernel@gmail.com>
> in commit 423d9118c624
> 
> So in general I think most email mismatches should be specified by a WARN
> and for email extension there should be a --strict CHECK.
> 
> What do you think of this?
>

Basically, I agree. How about proposing a patch to Joe and lkml and then 
we see...

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
  2020-09-28 14:20                                   ` Dwaipayan Ray
@ 2020-09-28 15:09                                     ` Lukas Bulwahn
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Bulwahn @ 2020-09-28 15:09 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Mon, 28 Sep 2020, Dwaipayan Ray wrote:

> >
> > How about just starting a new thread instead?
> >
> > I will answer on the new thread.
> >
> > Lukas
> >
> 
> Sure, I will.
> 
> Also did my last mail go through to you?
>

Yes, that went through. I just did not have some immediate thought on any 
further response. I think it is a good idea but I had no strong feeling on 
louding rejecting or loudly raising acknowledgement to the idea.

Just go for it.

Lukas
 
> Quoted:
> > So I did run my script to evaluate the type of errors that were generated.
> > My results are from commits between v5.7 and v5.8
> >
> > 1) Same address, different name
> > There were 32 such instances. In general, there was no regularity in the
> > specific error.
> >
> > There were some cases with letter case mismatch. Like there was one
> > committer who used the names "Arend Van Spriel", and "Arend van Spriel".
> > I believe this error should not be generated and matching be made case
> > neutral.
> >
> > Apart from this, the rest of the cases in this category were some with missing
> > first or last names, using names in other languages or having commas in
> > one name, using initials etc. These should be warned about.
> >
> > 2) Same name, different address
> > There were 204 such instances. Most of them were of the form in which they
> > used a completely different email address.
> >
> > Only one case was found with an email extension.
> > <hpeter@gmail.com>
> > <hpeter+linux_kernel@gmail.com>
> > in commit 423d9118c624
> >
> > So in general I think most email mismatches should be specified by a WARN
> > and for email extension there should be a --strict CHECK.
> > What do you think of this?
> >
> > Thanks,
> > Dwaipayan.
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-09-28 15:09 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18 10:06 [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues Lukas Bulwahn
2020-09-18 10:29 ` Dwaipayan Ray
2020-09-18 10:44   ` Lukas Bulwahn
2020-09-21  9:07     ` Dwaipayan Ray
2020-09-21  9:12       ` Lukas Bulwahn
2020-09-21  9:15       ` Lukas Bulwahn
2020-09-22 13:21         ` Dwaipayan Ray
2020-09-22 18:38           ` Lukas Bulwahn
2020-09-22 19:08             ` Dwaipayan Ray
2020-09-23  7:32               ` Lukas Bulwahn
2020-09-23  7:38                 ` Dwaipayan Ray
2020-09-23  7:42                   ` Lukas Bulwahn
2020-09-25  4:18                     ` Dwaipayan Ray
2020-09-25  7:20                       ` Lukas Bulwahn
2020-09-25  7:29                         ` Dwaipayan Ray
2020-09-25  7:35                           ` Lukas Bulwahn
2020-09-26 11:31                             ` Dwaipayan Ray
2020-09-28 13:30                               ` Dwaipayan Ray
2020-09-28 14:09                                 ` Lukas Bulwahn
2020-09-28 14:20                                   ` Dwaipayan Ray
2020-09-28 15:09                                     ` Lukas Bulwahn
2020-09-28 15:06                               ` Lukas Bulwahn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).