* [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues @ 2020-09-18 10:06 Lukas Bulwahn 2020-09-18 10:29 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-18 10:06 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees Hi Dwaipayan, hi others, I had a quick look on the NO AUTHOR SIGN OFF issues reported by checkpatch.pl on v5.4..v5.8 on my own small script. After collecting all the data in a tsv (no details on that tsv here): $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l 1064 $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | cut -f 7 | sort | uniq -c | sort -nr | head -n 8 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>' 116 Missing Signed-off-by: line by nominal patch author '' 68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>' 43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>' 40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>' 36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>' 31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>' 24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>' Here a quick look at the first two: 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>' $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \ | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \ xargs git show --format="%b" -s | \ grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \ wc -l So all 175 commits are of the type: Author: Daniel Vetter <daniel.vetter@ffwll.ch> ... Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> and the second: 116 Missing Signed-off-by: line by nominal patch author '' That is probably due to not parsing the patch author with a line break. So, if we can find a solution for Daniel Vetter (of course, not hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him and making use of that in checkpatch.pl, ... and for the encoding problem, then we got around 27% of the NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the next few remaining ones. The longer tail of warnings are clearly warnings that deserve to be pointed out to newbies with a broken setup. I hope you can continue to work on a solution for this class. Thanks for your initial investigation, Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-18 10:06 [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues Lukas Bulwahn @ 2020-09-18 10:29 ` Dwaipayan Ray 2020-09-18 10:44 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-18 10:29 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees On Fri, Sep 18, 2020 at 3:36 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote: > > Hi Dwaipayan, hi others, > > I had a quick look on the NO AUTHOR SIGN OFF issues reported by > checkpatch.pl on v5.4..v5.8 on my own small script. > > After collecting all the data in a tsv (no details on that tsv here): > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l > 1064 > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | cut -f 7 | sort | uniq -c | > sort -nr | head -n 8 > 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>' > 116 Missing Signed-off-by: line by nominal patch author '' > 68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>' > 43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>' > 40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>' > 36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>' > 31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>' > 24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>' > > > Here a quick look at the first two: > > 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter > <daniel.vetter@ffwll.ch>' > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \ > | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter > $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \ > xargs git show --format="%b" -s | \ > grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \ > wc -l > > So all 175 commits are of the type: > > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > ... > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > and the second: > > 116 Missing Signed-off-by: line by nominal patch author '' > > That is probably due to not parsing the patch author with a line break. > > > So, if we can find a solution for Daniel Vetter (of course, not > hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him > and making use of that in checkpatch.pl, > > ... and for the encoding problem, then we got around 27% of the > NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the > next few remaining ones. The longer tail of warnings are clearly warnings > that deserve to be pointed out to newbies with a broken setup. > > I hope you can continue to work on a solution for this class. > > > Thanks for your initial investigation, > > Lukas Hi, I think I have figured out how to fix the authors with a line break. The checkpatch script keeps a track of the previous line in a variable $prevline. I can use it to fix the author's initial signature. I will send you a patch on this when it's done. And for Daniel Vetter's issue, it will be a bit more complex i think as checkpatch seems to check both author's name and author's email by parsing two strings. At revision 10b82d5176488acee2820e5a2cf0f2ec5c3488b6, scripts/checkpatch.pl, line 2673: if ($author ne '') { if (same_email_addresses($1, $author)) { $authorsignoff = 1; } } line 1213: sub same_email_addresses { my ($email1, $email2) = @_; my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1); my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2); return $email1_name eq $email2_name && $email1_address eq $email2_address; } The easiest way at this point will be to modify same_email_address subroutine and add checks for other email addresses belonging to the same author by loading .mailmap. But will this make the subroutine heavy? Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-18 10:29 ` Dwaipayan Ray @ 2020-09-18 10:44 ` Lukas Bulwahn 2020-09-21 9:07 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-18 10:44 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Fri, 18 Sep 2020, Dwaipayan Ray wrote: > On Fri, Sep 18, 2020 at 3:36 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote: > > > > Hi Dwaipayan, hi others, > > > > I had a quick look on the NO AUTHOR SIGN OFF issues reported by > > checkpatch.pl on v5.4..v5.8 on my own small script. > > > > After collecting all the data in a tsv (no details on that tsv here): > > > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l > > 1064 > > > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | cut -f 7 | sort | uniq -c | > > sort -nr | head -n 8 > > 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter@ffwll.ch>' > > 116 Missing Signed-off-by: line by nominal patch author '' > > 68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy@gmail.com>' > > 43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen@synopsys.com>' > > 40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl@gmail.com>' > > 36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei@solarflare.com>' > > 31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks@vt.edu>' > > 24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels@cs.washington.edu>' > > > > > > Here a quick look at the first two: > > > > 175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter > > <daniel.vetter@ffwll.ch>' > > > > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \ > > | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter > > $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \ > > xargs git show --format="%b" -s | \ > > grep "Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>" | \ > > wc -l > > > > So all 175 commits are of the type: > > > > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > > ... > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > > > > > and the second: > > > > 116 Missing Signed-off-by: line by nominal patch author '' > > > > That is probably due to not parsing the patch author with a line break. > > > > > > So, if we can find a solution for Daniel Vetter (of course, not > > hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him > > and making use of that in checkpatch.pl, > > > > ... and for the encoding problem, then we got around 27% of the > > NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the > > next few remaining ones. The longer tail of warnings are clearly warnings > > that deserve to be pointed out to newbies with a broken setup. > > > > I hope you can continue to work on a solution for this class. > > > > > > Thanks for your initial investigation, > > > > Lukas > > Hi, > I think I have figured out how to fix the authors with a line break. > The checkpatch script keeps a track of the previous line in a variable > $prevline. I can use it to fix the author's initial signature. I will > send you a patch on this when it's done. > Okay, let us start to have this first patch correct and accepted by Joe Perches. > And for Daniel Vetter's issue, it will be a bit more complex i think > as checkpatch > seems to check both author's name and author's email by parsing two > strings. > When the first patch is accepted, let us discuss with Joe the best solution for this case. > At revision 10b82d5176488acee2820e5a2cf0f2ec5c3488b6, > scripts/checkpatch.pl, > > > line 2673: > if ($author ne '') { > if (same_email_addresses($1, $author)) { > $authorsignoff = 1; > } > } > > line 1213: > sub same_email_addresses { > my ($email1, $email2) = @_; > > my ($email1_name, $name1_comment, $email1_address, $comment1) > = parse_email($email1); > my ($email2_name, $name2_comment, $email2_address, $comment2) > = parse_email($email2); > > return $email1_name eq $email2_name && > $email1_address eq $email2_address; > } > > The easiest way at this point will be to modify same_email_address > subroutine and add checks for other email addresses belonging to > the same author by loading .mailmap. > But will this make the subroutine heavy? > Yes, it is becoming heavier... but maybe still as performant as before if done right. You could always run a quick fast path check and only if you cannot find the sign-off, you do a slow path, loading .mailmap and checking. By the way, there is already the function read_mailmap in ./scripts/get_maintainers.pl, so we would not need to implement that, just find a good way for both scripts to share this implementation. But let us do the first patch and then discuss with Joe. Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-18 10:44 ` Lukas Bulwahn @ 2020-09-21 9:07 ` Dwaipayan Ray 2020-09-21 9:12 ` Lukas Bulwahn 2020-09-21 9:15 ` Lukas Bulwahn 0 siblings, 2 replies; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-21 9:07 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees Hi Lukas, I did send out the patch. Weirdly enough, my gmail inbox decided to combine both threads with subject [PATCH v2] and [PATCH v3]. Did you get it separate or is it the same case? In other mail boxes it seems to be okay. Apart from that for fixing the different email addresses, I am thinking to try copying the read_mailmap subroutine from scripts/get_maintainer.pl, see how it works and try to write a solution based on that. I will let you know once I achieve something. Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-21 9:07 ` Dwaipayan Ray @ 2020-09-21 9:12 ` Lukas Bulwahn 2020-09-21 9:15 ` Lukas Bulwahn 1 sibling, 0 replies; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-21 9:12 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Mon, 21 Sep 2020, Dwaipayan Ray wrote: > Hi Lukas, > I did send out the patch. > Weirdly enough, my gmail inbox decided to combine both > threads with subject [PATCH v2] and [PATCH v3]. Did you > get it separate or is it the same case? In other mail boxes > it seems to be okay. > That is a bug in the gmail client. It looks good in my proper email client. > Apart from that for fixing the different email addresses, I > am thinking to try copying the read_mailmap subroutine > from scripts/get_maintainer.pl, see how it works and > try to write a solution based on that. I will let you know > once I achieve something. > Sure, I am still wondering if it better to just use the .mailmap file or if we should introduce a separate file with the same format. But let us start implementing and see. Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-21 9:07 ` Dwaipayan Ray 2020-09-21 9:12 ` Lukas Bulwahn @ 2020-09-21 9:15 ` Lukas Bulwahn 2020-09-22 13:21 ` Dwaipayan Ray 1 sibling, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-21 9:15 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Mon, 21 Sep 2020, Dwaipayan Ray wrote: > Hi Lukas, > I did send out the patch. > Weirdly enough, my gmail inbox decided to combine both > threads with subject [PATCH v2] and [PATCH v3]. Did you > get it separate or is it the same case? In other mail boxes > it seems to be okay. > > Apart from that for fixing the different email addresses, I > am thinking to try copying the read_mailmap subroutine > from scripts/get_maintainer.pl, see how it works and > try to write a solution based on that. I will let you know > once I achieve something. > Can you describe in a few sentence what the problem is, and which solution you are exploring to Joe, linux-kernel and the mentees list. I would consider that your project proposal for the first mentee milestone. With that, we can progress you to become a proper mentee in the community bridge platform. Lukas > Thanks, > Dwaipayan. > _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-21 9:15 ` Lukas Bulwahn @ 2020-09-22 13:21 ` Dwaipayan Ray 2020-09-22 18:38 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-22 13:21 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees Hi Lukas, I looked up the mailmap implementations in get_maintainers and I was able to draw up something. After that I updated .mailmap with Daniel's mail addresses and ran the checkpatch script on some of Daniel Vetter's commits and the author sign off warning no longer appears :). Let me know what you think of this. I am sending you the diff for now. Thanks, Dwaipayan. -------------------------------------------------------- diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 9e65d21456f1..e8c12a5a7e7b 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -51,6 +51,7 @@ my %ignore_type = (); my @ignore = (); my $help = 0; my $configuration_file = ".checkpatch.conf"; +my $lk_path = "./"; my $max_line_length = 100; my $ignore_perl_version = 0; my $minimum_perl_version = 5.10.0; @@ -1128,6 +1129,78 @@ sub top_of_kernel_tree { return 1; } +my $mailmap; + +sub read_mailmap { + $mailmap = { + names => {}, + addresses => {} + }; + + return if (!defined($lk_path) || !(-f "${lk_path}.mailmap")); + + open(my $mailmap_file, '<', "${lk_path}.mailmap") + or warn "$P: Can't open .mailmap: $!\n"; + + while (<$mailmap_file>) { + s/#.*$//; #strip comments + s/^\s+|\s+$//g; #trim + + next if (/^\s*$/); #skip empty lines + #entries have one of the following formats: + # name1 <mail1> + # <mail1> <mail2> + # name1 <mail1> <mail2> + # name1 <mail1> name2 <mail2> + # (see man git-shortlog) + + if (/^([^<]+)<([^>]+)>$/) { + my $real_name = $1; + my $address = $2; + + $real_name =~ s/\s+$//; + ($real_name, $address) = parse_email("$real_name <$address>"); + $mailmap->{names}->{$address} = $real_name; + + } elsif (/^<([^>]+)>\s*<([^>]+)>$/) { + my $real_address = $1; + my $wrong_address = $2; + + $mailmap->{addresses}->{$wrong_address} = $real_address; + + } elsif (/^(.+)<([^>]+)>\s*<([^>]+)>$/) { + my $real_name = $1; + my $real_address = $2; + my $wrong_address = $3; + + $real_name =~ s/\s+$//; + ($real_name, $real_address) = + parse_email("$real_name <$real_address>"); + $mailmap->{names}->{$wrong_address} = $real_name; + $mailmap->{addresses}->{$wrong_address} = $real_address; + + } elsif (/^(.+)<([^>]+)>\s*(.+)\s*<([^>]+)>$/) { + my $real_name = $1; + my $real_address = $2; + my $wrong_name = $3; + my $wrong_address = $4; + + $real_name =~ s/\s+$//; + ($real_name, $real_address) = + parse_email("$real_name <$real_address>"); + + $wrong_name =~ s/\s+$//; + ($wrong_name, $wrong_address) = + parse_email("$wrong_name <$wrong_address>"); + + my $wrong_email = format_email($wrong_name, $wrong_address); + $mailmap->{names}->{$wrong_email} = $real_name; + $mailmap->{addresses}->{$wrong_email} = $real_address; + } + } + close($mailmap_file); +} + sub parse_email { my ($formatted_email) = @_; @@ -1210,14 +1283,50 @@ sub reformat_email { return format_email($email_name, $email_address); } +sub mailmap_email { + my ($name, $address) = @_; + + my $email = format_email($name, $address); + my $real_name = $name; + my $real_address = $address; + + if (exists $mailmap->{names}->{$email} || + exists $mailmap->{addresses}->{$email}) { + if (exists $mailmap->{names}->{$email}) { + $real_name = $mailmap->{names}->{$email}; + } + if (exists $mailmap->{addresses}->{$email}) { + $real_address = $mailmap->{addresses}->{$email}; + } + } else { + if (exists $mailmap->{names}->{$address}) { + $real_name = $mailmap->{names}->{$address}; + } + if (exists $mailmap->{addresses}->{$address}) { + $real_address = $mailmap->{addresses}->{$address}; + } + } + return ($real_name, $real_address); +} + sub same_email_addresses { my ($email1, $email2) = @_; my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1); my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2); - return $email1_name eq $email2_name && - $email1_address eq $email2_address; + my $name_match = $email1_name eq $email2_name; + my $address_match = $email1_address eq $email2_address; + + if(!$name_match || !$address_match) { + my ($real_name1, $real_address1) = mailmap_email($email1_name, $email1_address); + my ($real_name2, $real_address2) = mailmap_email($email2_name, $email2_address); + + $name_match |= ($real_name1 eq $real_name2); + $address_match |= ($real_address1 eq $real_address2); + } + + return $name_match && $address_match; } sub which { @@ -2400,6 +2509,7 @@ sub process { my $checklicenseline = 1; + read_mailmap(); sanitise_line_reset(); my $line; foreach my $rawline (@rawlines) { _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-22 13:21 ` Dwaipayan Ray @ 2020-09-22 18:38 ` Lukas Bulwahn 2020-09-22 19:08 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-22 18:38 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Tue, 22 Sep 2020, Dwaipayan Ray wrote: > Hi Lukas, > > I looked up the mailmap implementations in get_maintainers > and I was able to draw up something. After that I updated > .mailmap with Daniel's mail addresses and ran the > checkpatch script on some of Daniel Vetter's commits and > the author sign off warning no longer appears :). > > Let me know what you think of this. > I am sending you the diff for now. > Generally, I think it is a good first proof of concept. I believe you that functionality 'basically' works; again, we might already want to run a full-scale evaluation on that. Just to see if there are some impacts we might not be aware of yet. As you already wrote, we, Joe, you and me, need to figure out now all the further details: - how can we avoid the duplicate code in checkpatch.pl and get_maintainers.pl? - what is performance impact, especially as AUTHOR_SIGN_OFF check is not triggered often, and there are many other rules in checkpatch.pl? - further details, such as why do we need the lk_path is the first place? and many more questions of that kind. Feel free to sketch a first commit message and create a PATCH RFC for the discussion with Joe. Lukas > Thanks, > Dwaipayan. > > -------------------------------------------------------- > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index 9e65d21456f1..e8c12a5a7e7b 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -51,6 +51,7 @@ my %ignore_type = (); > my @ignore = (); > my $help = 0; > my $configuration_file = ".checkpatch.conf"; > +my $lk_path = "./"; > my $max_line_length = 100; > my $ignore_perl_version = 0; > my $minimum_perl_version = 5.10.0; > @@ -1128,6 +1129,78 @@ sub top_of_kernel_tree { > return 1; > } > > +my $mailmap; > + > +sub read_mailmap { > + $mailmap = { > + names => {}, > + addresses => {} > + }; > + > + return if (!defined($lk_path) || !(-f "${lk_path}.mailmap")); > + > + open(my $mailmap_file, '<', "${lk_path}.mailmap") > + or warn "$P: Can't open .mailmap: $!\n"; > + > + while (<$mailmap_file>) { > + s/#.*$//; #strip comments > + s/^\s+|\s+$//g; #trim > + > + next if (/^\s*$/); #skip empty lines > + #entries have one of the following formats: > + # name1 <mail1> > + # <mail1> <mail2> > + # name1 <mail1> <mail2> > + # name1 <mail1> name2 <mail2> > + # (see man git-shortlog) > + > + if (/^([^<]+)<([^>]+)>$/) { > + my $real_name = $1; > + my $address = $2; > + > + $real_name =~ s/\s+$//; > + ($real_name, $address) = parse_email("$real_name <$address>"); > + $mailmap->{names}->{$address} = $real_name; > + > + } elsif (/^<([^>]+)>\s*<([^>]+)>$/) { > + my $real_address = $1; > + my $wrong_address = $2; > + > + $mailmap->{addresses}->{$wrong_address} = $real_address; > + > + } elsif (/^(.+)<([^>]+)>\s*<([^>]+)>$/) { > + my $real_name = $1; > + my $real_address = $2; > + my $wrong_address = $3; > + > + $real_name =~ s/\s+$//; > + ($real_name, $real_address) = > + parse_email("$real_name <$real_address>"); > + $mailmap->{names}->{$wrong_address} = $real_name; > + $mailmap->{addresses}->{$wrong_address} = $real_address; > + > + } elsif (/^(.+)<([^>]+)>\s*(.+)\s*<([^>]+)>$/) { > + my $real_name = $1; > + my $real_address = $2; > + my $wrong_name = $3; > + my $wrong_address = $4; > + > + $real_name =~ s/\s+$//; > + ($real_name, $real_address) = > + parse_email("$real_name <$real_address>"); > + > + $wrong_name =~ s/\s+$//; > + ($wrong_name, $wrong_address) = > + parse_email("$wrong_name <$wrong_address>"); > + > + my $wrong_email = format_email($wrong_name, $wrong_address); > + $mailmap->{names}->{$wrong_email} = $real_name; > + $mailmap->{addresses}->{$wrong_email} = $real_address; > + } > + } > + close($mailmap_file); > +} > + > sub parse_email { > my ($formatted_email) = @_; > > @@ -1210,14 +1283,50 @@ sub reformat_email { > return format_email($email_name, $email_address); > } > > +sub mailmap_email { > + my ($name, $address) = @_; > + > + my $email = format_email($name, $address); > + my $real_name = $name; > + my $real_address = $address; > + > + if (exists $mailmap->{names}->{$email} || > + exists $mailmap->{addresses}->{$email}) { > + if (exists $mailmap->{names}->{$email}) { > + $real_name = $mailmap->{names}->{$email}; > + } > + if (exists $mailmap->{addresses}->{$email}) { > + $real_address = $mailmap->{addresses}->{$email}; > + } > + } else { > + if (exists $mailmap->{names}->{$address}) { > + $real_name = $mailmap->{names}->{$address}; > + } > + if (exists $mailmap->{addresses}->{$address}) { > + $real_address = $mailmap->{addresses}->{$address}; > + } > + } > + return ($real_name, $real_address); > +} > + > sub same_email_addresses { > my ($email1, $email2) = @_; > > my ($email1_name, $name1_comment, $email1_address, $comment1) = > parse_email($email1); > my ($email2_name, $name2_comment, $email2_address, $comment2) = > parse_email($email2); > > - return $email1_name eq $email2_name && > - $email1_address eq $email2_address; > + my $name_match = $email1_name eq $email2_name; > + my $address_match = $email1_address eq $email2_address; > + > + if(!$name_match || !$address_match) { > + my ($real_name1, $real_address1) = mailmap_email($email1_name, > $email1_address); > + my ($real_name2, $real_address2) = mailmap_email($email2_name, > $email2_address); > + > + $name_match |= ($real_name1 eq $real_name2); > + $address_match |= ($real_address1 eq $real_address2); > + } > + > + return $name_match && $address_match; > } > > sub which { > @@ -2400,6 +2509,7 @@ sub process { > > my $checklicenseline = 1; > > + read_mailmap(); > sanitise_line_reset(); > my $line; > foreach my $rawline (@rawlines) { > _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-22 18:38 ` Lukas Bulwahn @ 2020-09-22 19:08 ` Dwaipayan Ray 2020-09-23 7:32 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-22 19:08 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees > > Generally, I think it is a good first proof of concept. > I believe you that functionality 'basically' works; again, we might > already want to run a full-scale evaluation on that. Just to see > if there are some impacts we might not be aware of yet. > > As you already wrote, we, Joe, you and me, need to figure out > now all the further details: > > - how can we avoid the duplicate code in checkpatch.pl and > get_maintainers.pl? > > - what is performance impact, especially as AUTHOR_SIGN_OFF check is not > triggered often, and there are many other rules in checkpatch.pl? > > - further details, such as why do we need the lk_path is the first place? > and many more questions of that kind. > > Feel free to sketch a first commit message and create a PATCH RFC for the > discussion with Joe. > > > Lukas > Hi, As for the lk_path, it can be removed easily. To me too, it didn't make much sense since it was just a duplicate, as $root should contain the same. But again due to some reason, $root in checkpatch had the value ".", while $lk_path in get_maintainer had the value "./" I have no idea yet if this was a design decision or just different handling. So, I can change the part where I referenced the mailmap file by adding a trailing / with $root rather than $lk_path. That should do it. And for the duplicate code part, Joe did mention that either I could copy or place the read_mailmap code in a separate file and reference both checkpatch and get_maintainers from there. To me, copying seems much feasible because the referenced part of the mailmap handling code here is very small as there are some minor changes. Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-22 19:08 ` Dwaipayan Ray @ 2020-09-23 7:32 ` Lukas Bulwahn 2020-09-23 7:38 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-23 7:32 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Wed, 23 Sep 2020, Dwaipayan Ray wrote: > > > > Generally, I think it is a good first proof of concept. > > I believe you that functionality 'basically' works; again, we might > > already want to run a full-scale evaluation on that. Just to see > > if there are some impacts we might not be aware of yet. > > > > As you already wrote, we, Joe, you and me, need to figure out > > now all the further details: > > > > - how can we avoid the duplicate code in checkpatch.pl and > > get_maintainers.pl? > > > > - what is performance impact, especially as AUTHOR_SIGN_OFF check is not > > triggered often, and there are many other rules in checkpatch.pl? > > > > - further details, such as why do we need the lk_path is the first place? > > and many more questions of that kind. > > > > Feel free to sketch a first commit message and create a PATCH RFC for the > > discussion with Joe. > > > > > > Lukas > > > > Hi, > As for the lk_path, it can be removed easily. To me too, it didn't make much > sense since it was just a duplicate, as $root should contain the same. > > But again due to some reason, > $root in checkpatch had the value ".", > while $lk_path in get_maintainer had the value "./" > > I have no idea yet if this was a design decision or just different handling. > > So, I can change the part where I referenced the mailmap file by adding > a trailing / with $root rather than $lk_path. That should do it. > > And for the duplicate code part, Joe did mention that either I could copy > or place the read_mailmap code in a separate file and reference both > checkpatch and get_maintainers from there. > > To me, copying seems much feasible because the referenced part of the > mailmap handling code here is very small as there are some minor > changes. > Well, if you can argue copying code, it is fine. Go ahead and send out a RFC patch. Regarding the mentorship, would you have time for some virtual face-to-face meeting to discuss the Eligibility Criteria and Mentorship Models? Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-23 7:32 ` Lukas Bulwahn @ 2020-09-23 7:38 ` Dwaipayan Ray 2020-09-23 7:42 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-23 7:38 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees > > Hi, > > As for the lk_path, it can be removed easily. To me too, it didn't make much > > sense since it was just a duplicate, as $root should contain the same. > > > > But again due to some reason, > > $root in checkpatch had the value ".", > > while $lk_path in get_maintainer had the value "./" > > > > I have no idea yet if this was a design decision or just different handling. > > > > So, I can change the part where I referenced the mailmap file by adding > > a trailing / with $root rather than $lk_path. That should do it. > > > > And for the duplicate code part, Joe did mention that either I could copy > > or place the read_mailmap code in a separate file and reference both > > checkpatch and get_maintainers from there. > > > > To me, copying seems much feasible because the referenced part of the > > mailmap handling code here is very small as there are some minor > > changes. > > > > Well, if you can argue copying code, it is fine. Go ahead and send out a > RFC patch. > I definitely know that redundant code is a bad practice. But yes, I would send a patch in with a proper commit message to get yours and Joe's comments. > Regarding the mentorship, would you have time for some virtual > face-to-face meeting to discuss the Eligibility Criteria and Mentorship > Models? > > > Lukas Yes sure, what time are you available? I don't have any more classes today so I am free. Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-23 7:38 ` Dwaipayan Ray @ 2020-09-23 7:42 ` Lukas Bulwahn 2020-09-25 4:18 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-23 7:42 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees > > > Regarding the mentorship, would you have time for some virtual > > face-to-face meeting to discuss the Eligibility Criteria and Mentorship > > Models? > > > > > > Lukas > > Yes sure, what time are you available? I don't have any more classes > today so I am free. > How about now? Let us quickly meet here: https://meet.jit.si/Checkpatch.pl-Mentorship Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-23 7:42 ` Lukas Bulwahn @ 2020-09-25 4:18 ` Dwaipayan Ray 2020-09-25 7:20 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-25 4:18 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees Hi, As Joe mentioned earlier, there might be four new warnings to handle better: 1) Same name, different address 2) Same address, different name 3) email extensions present in header but not in signed off by 4) comment blocks after name I am thinking to solve the first three in a single patch. As for the email extensions part, should it be handled differently? There will be redundant calls to that though because it's not an error seen quite often. Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-25 4:18 ` Dwaipayan Ray @ 2020-09-25 7:20 ` Lukas Bulwahn 2020-09-25 7:29 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-25 7:20 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees [-- Attachment #1: Type: text/plain, Size: 2883 bytes --] On Fri, 25 Sep 2020, Dwaipayan Ray wrote: > Hi, > As Joe mentioned earlier, there might be four new > warnings to handle better: > > 1) Same name, different address > 2) Same address, different name > 3) email extensions present in header but > not in signed off by > 4) comment blocks after name > > I am thinking to solve the first three in a single patch. > > As for the email extensions part, should it be handled > differently? There will be redundant calls to that > though because it's not an error seen quite often. > Dwaipayan, thanks for this initial structuring. Let us try to put some systematic structure into this handling of the NO_AUTHOR_SIGN_OFF. There are basically two aspects to consider: - which classes are potential mismatches can we potentially observe? - which level of severity do we assign to each class of mismatch? To the first point, you started with some initial classes above. 0) same name, same address (no mismatch) 1) same name, different address subclasses: (how 'different' address?) - email address differ by valid difference, e.g., extensions that will go to the same inbox, e.g., the xyz+fds extension in mail addresses. (that is your class 3, right?) - two email addresses that are known to belong to the same individual - known because of .mailmap - known otherwise? 2) different name, same address subclasses: (how 'different' name?) - examples: - Firstname, Lastname (but middle-name initials differ) - special "regional" characters, like ü vs. ue, etc. - firstname and lastname are reverted etc. 3) different name, different address (but still we believe it "matches") combinations of subclasses of 1) and 2), which we want to consider. Then, to the second question: So, these different classes we can think of and "assign" different severities that checkpatch.pl can use. Checkpatch.pl already has four levels: three severity reporting levels: ERROR, WARN, CHECK, and the level _do not report_ (which is implemented by just ignoring some kind of difference). I think a lot of discussion will be around what severity to assign to which class above, and in some way, long-term maintainers have a larger say than we do here. So, let us first modify checkpatch.pl to identify the various classes above, maybe just starting very basic, with different name, same address and same name, different address and run it over the commit range and see which examples show up and how often. For now, we first just use checkpatch.pl to identify the different types and refine them into subclasses; then, we can discuss with severity should be assigned to which type of mismatch. The second question should not invalidate our data collection and identification of subclasses, though. Does that help? What do you think? Lukas [-- Attachment #2: Type: text/plain, Size: 201 bytes --] _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-25 7:20 ` Lukas Bulwahn @ 2020-09-25 7:29 ` Dwaipayan Ray 2020-09-25 7:35 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-25 7:29 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees > Let us try to put some systematic structure into this handling of the > NO_AUTHOR_SIGN_OFF. > > There are basically two aspects to consider: > > - which classes are potential mismatches can we potentially observe? > > - which level of severity do we assign to each class of mismatch? > > To the first point, you started with some initial classes above. > > 0) same name, same address (no mismatch) > 1) same name, different address > subclasses: (how 'different' address?) > - email address differ by valid difference, e.g., extensions that > will go to the same inbox, e.g., the xyz+fds extension in mail > addresses. (that is your class 3, right?) > - two email addresses that are known to belong to the same individual > - known because of .mailmap > - known otherwise? > 2) different name, same address > subclasses: (how 'different' name?) > - examples: > - Firstname, Lastname (but middle-name initials differ) > - special "regional" characters, like ü vs. ue, etc. > - firstname and lastname are reverted etc. > 3) different name, different address (but still we believe it "matches") > combinations of subclasses of 1) and 2), which we want to consider. > > Then, to the second question: > > So, these different classes we can think of and "assign" different > severities that checkpatch.pl can use. > > Checkpatch.pl already has four levels: > > three severity reporting levels: ERROR, WARN, CHECK, and > the level _do not report_ (which is implemented by just ignoring some kind > of difference). > > I think a lot of discussion will be around what severity to assign to > which class above, and in some way, long-term maintainers have a larger > say than we do here. > > So, let us first modify checkpatch.pl to identify the various classes > above, maybe just starting very basic, with different name, same address > and same name, different address and run it over the commit range and see > which examples show up and how often. > > For now, we first just use checkpatch.pl to identify the different types > and refine them into subclasses; then, we can discuss with severity should > be assigned to which type of mismatch. > > The second question should not invalidate our data collection and > identification of subclasses, though. > > Does that help? What do you think? > > > Lukas Yes, that should help I think. Currently checkpatch is very vague about Author sign offs, so subclassing it will really be helpful. But at the same point I don't think it should become too complex either. I have already prepared a patch for the basic two classes: 1) same name, different address 2) same address, different name It would be great to have your feedback on this. --- diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 9e65d21456f1..e23e753fcaf0 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -1210,14 +1210,22 @@ sub reformat_email { return format_email($email_name, $email_address); } -sub same_email_addresses { +sub same_mail_check { my ($email1, $email2) = @_; my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1); my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2); - return $email1_name eq $email2_name && - $email1_address eq $email2_address; + my $same_name = $email1_name eq $email2_name; + my $same_address= $email1_address eq $email2_address; + + return ($same_name, $same_address); +} + +sub same_email_addresses { + my ($same_name, $same_address) = same_mail_check(@_); + + return $same_name && $same_address; } sub which { @@ -2347,6 +2355,7 @@ sub process { my $signoff = 0; my $author = ''; my $authorsignoff = 0; + my $authorsignerr = ''; my $is_patch = 0; my $is_binding_patch = -1; my $in_header_lines = $file ? 0 : 1; @@ -2677,6 +2686,13 @@ sub process { if ($author ne '') { if (same_email_addresses($1, $author)) { $authorsignoff = 1; + } else { + my ($same_name, $same_address) = same_mail_check($1, $author); + if($same_name) { + $authorsignerr = "ADDRESS_MISMATCH:${1}"; + } elsif ($same_address) { + $authorsignerr = "NAME_MISMATCH:${1}"; + } } } } @@ -6891,8 +6907,16 @@ sub process { ERROR("MISSING_SIGN_OFF", "Missing Signed-off-by: line(s)\n"); } elsif (!$authorsignoff) { - WARN("NO_AUTHOR_SIGN_OFF", - "Missing Signed-off-by: line by nominal patch author '$author'\n"); + if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) { + WARN("NO_AUTHOR_SIGN_OFF", + "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n"); + } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) { + WARN("NO_AUTHOR_SIGN_OFF", + "Author email address mismatch: 'From: $author' != 'Signed-off-by: $1'\n"); + } else { + WARN("NO_AUTHOR_SIGN_OFF", + "Missing Signed-off-by: line by nominal patch author '$author'\n"); + } } } --- Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-25 7:29 ` Dwaipayan Ray @ 2020-09-25 7:35 ` Lukas Bulwahn 2020-09-26 11:31 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-25 7:35 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees [-- Attachment #1: Type: text/plain, Size: 5755 bytes --] On Fri, 25 Sep 2020, Dwaipayan Ray wrote: > > Let us try to put some systematic structure into this handling of the > > NO_AUTHOR_SIGN_OFF. > > > > There are basically two aspects to consider: > > > > - which classes are potential mismatches can we potentially observe? > > > > - which level of severity do we assign to each class of mismatch? > > > > To the first point, you started with some initial classes above. > > > > 0) same name, same address (no mismatch) > > 1) same name, different address > > subclasses: (how 'different' address?) > > - email address differ by valid difference, e.g., extensions that > > will go to the same inbox, e.g., the xyz+fds extension in mail > > addresses. (that is your class 3, right?) > > - two email addresses that are known to belong to the same individual > > - known because of .mailmap > > - known otherwise? > > 2) different name, same address > > subclasses: (how 'different' name?) > > - examples: > > - Firstname, Lastname (but middle-name initials differ) > > - special "regional" characters, like ü vs. ue, etc. > > - firstname and lastname are reverted etc. > > 3) different name, different address (but still we believe it "matches") > > combinations of subclasses of 1) and 2), which we want to consider. > > > > Then, to the second question: > > > > So, these different classes we can think of and "assign" different > > severities that checkpatch.pl can use. > > > > Checkpatch.pl already has four levels: > > > > three severity reporting levels: ERROR, WARN, CHECK, and > > the level _do not report_ (which is implemented by just ignoring some kind > > of difference). > > > > I think a lot of discussion will be around what severity to assign to > > which class above, and in some way, long-term maintainers have a larger > > say than we do here. > > > > So, let us first modify checkpatch.pl to identify the various classes > > above, maybe just starting very basic, with different name, same address > > and same name, different address and run it over the commit range and see > > which examples show up and how often. > > > > For now, we first just use checkpatch.pl to identify the different types > > and refine them into subclasses; then, we can discuss with severity should > > be assigned to which type of mismatch. > > > > The second question should not invalidate our data collection and > > identification of subclasses, though. > > > > Does that help? What do you think? > > > > > > Lukas > > Yes, that should help I think. Currently checkpatch is very vague > about Author sign offs, so subclassing it will really be helpful. > But at the same point I don't think it should become > too complex either. Agree, that is where the evaluation will help (what is really happening, what do we need to consider) and the severity comes into play (if we consider all subclasses of same severity, we do not need to differentiate between them.) > > I have already prepared a patch for the basic two classes: > 1) same name, different address > 2) same address, different name > > It would be great to have your feedback on this. Okay, I suggest to run this script on a larger set of commits and see. Probably, you can speed it your evaluation with checkpatch.pl by restricting it to the AUTHOR_SIGN_OFF type. I will try that over the weekend as well. Lukas > > --- > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index 9e65d21456f1..e23e753fcaf0 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -1210,14 +1210,22 @@ sub reformat_email { > return format_email($email_name, $email_address); > } > > -sub same_email_addresses { > +sub same_mail_check { > my ($email1, $email2) = @_; > > my ($email1_name, $name1_comment, $email1_address, $comment1) = > parse_email($email1); > my ($email2_name, $name2_comment, $email2_address, $comment2) = > parse_email($email2); > > - return $email1_name eq $email2_name && > - $email1_address eq $email2_address; > + my $same_name = $email1_name eq $email2_name; > + my $same_address= $email1_address eq $email2_address; > + > + return ($same_name, $same_address); > +} > + > +sub same_email_addresses { > + my ($same_name, $same_address) = same_mail_check(@_); > + > + return $same_name && $same_address; > } > > sub which { > @@ -2347,6 +2355,7 @@ sub process { > my $signoff = 0; > my $author = ''; > my $authorsignoff = 0; > + my $authorsignerr = ''; > my $is_patch = 0; > my $is_binding_patch = -1; > my $in_header_lines = $file ? 0 : 1; > @@ -2677,6 +2686,13 @@ sub process { > if ($author ne '') { > if (same_email_addresses($1, $author)) { > $authorsignoff = 1; > + } else { > + my ($same_name, $same_address) = same_mail_check($1, $author); > + if($same_name) { > + $authorsignerr = "ADDRESS_MISMATCH:${1}"; > + } elsif ($same_address) { > + $authorsignerr = "NAME_MISMATCH:${1}"; > + } > } > } > } > @@ -6891,8 +6907,16 @@ sub process { > ERROR("MISSING_SIGN_OFF", > "Missing Signed-off-by: line(s)\n"); > } elsif (!$authorsignoff) { > - WARN("NO_AUTHOR_SIGN_OFF", > - "Missing Signed-off-by: line by nominal patch author '$author'\n"); > + if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) { > + WARN("NO_AUTHOR_SIGN_OFF", > + "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n"); > + } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) { > + WARN("NO_AUTHOR_SIGN_OFF", > + "Author email address mismatch: 'From: $author' != > 'Signed-off-by: $1'\n"); > + } else { > + WARN("NO_AUTHOR_SIGN_OFF", > + "Missing Signed-off-by: line by nominal patch author '$author'\n"); > + } > } > } > --- > > Thanks, > Dwaipayan. > [-- Attachment #2: Type: text/plain, Size: 201 bytes --] _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-25 7:35 ` Lukas Bulwahn @ 2020-09-26 11:31 ` Dwaipayan Ray 2020-09-28 13:30 ` Dwaipayan Ray 2020-09-28 15:06 ` Lukas Bulwahn 0 siblings, 2 replies; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-26 11:31 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees > > Yes, that should help I think. Currently checkpatch is very vague > > about Author sign offs, so subclassing it will really be helpful. > > But at the same point I don't think it should become > > too complex either. > > Agree, that is where the evaluation will help (what is really happening, > what do we need to consider) and the severity comes into play (if we > consider all subclasses of same severity, we do not need to differentiate > between them.) > > > > > I have already prepared a patch for the basic two classes: > > 1) same name, different address > > 2) same address, different name > > > > It would be great to have your feedback on this. > > Okay, I suggest to run this script on a larger set of commits and see. > > Probably, you can speed it your evaluation with checkpatch.pl by > restricting it to the AUTHOR_SIGN_OFF type. I will try that over the > weekend as well. > > Lukas Hi, So I did run my script to evaluate the type of errors that were generated. My results are from commits between v5.7 and v5.8 1) Same address, different name There were 32 such instances. In general, there was no regularity in the specific error. There were some cases with letter case mismatch. Like there was one committer who used the names "Arend Van Spriel", and "Arend van Spriel". I believe this error should not be generated and matching be made case neutral. Apart from this, the rest of the cases in this category were some with missing first or last names, using names in other languages or having commas in one name, using initials etc. These should be warned about. 2) Same name, different address There were 204 such instances. Most of them were of the form in which they used a completely different email address. Only one case was found with an email extension. <hpeter@gmail.com> <hpeter+linux_kernel@gmail.com> in commit 423d9118c624 So in general I think most email mismatches should be specified by a WARN and for email extension there should be a --strict CHECK. What do you think of this? Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-26 11:31 ` Dwaipayan Ray @ 2020-09-28 13:30 ` Dwaipayan Ray 2020-09-28 14:09 ` Lukas Bulwahn 2020-09-28 15:06 ` Lukas Bulwahn 1 sibling, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-28 13:30 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees Hi, I am continuing this thread, but am writing about another issue separate from author sign off. While checking checkpatch output, I was checking the commits with the warnings: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return Looking into the referenced section, I found some sections with a redundant else. For example: (revision 196273fffc1c), arch/powerpc/kernel/security.c , line 360: static int ssb_prctl_get(struct task_struct *task) { if (stf_enabled_flush_types == STF_BARRIER_NONE) /* * We don't have an explicit signal from firmware that we're * vulnerable or not, we only have certain CPU revisions that * are known to be vulnerable. * * We assume that if we're on another CPU, where the barrier is * NONE, then we are not vulnerable. */ return PR_SPEC_NOT_AFFECTED; else /* * If we do have a barrier type then we are vulnerable. The * barrier is not a global or per-process mitigation, so the * only value we can report here is PR_SPEC_ENABLE, which * appears as "vulnerable" in /proc. */ return PR_SPEC_ENABLE; return -EINVAL; } The else is pretty much redundant and the control flow never reaches to return -EINVAL. Is it possible to clean up all these redundant code? Thanks, Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-28 13:30 ` Dwaipayan Ray @ 2020-09-28 14:09 ` Lukas Bulwahn 2020-09-28 14:20 ` Dwaipayan Ray 0 siblings, 1 reply; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-28 14:09 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Mon, 28 Sep 2020, Dwaipayan Ray wrote: > Hi, > I am continuing this thread, but am writing about another issue > separate from author sign off. While checking checkpatch > output, I was checking the commits with the warnings: > How about just starting a new thread instead? I will answer on the new thread. Lukas > WARNING:UNNECESSARY_ELSE: else is not generally > useful after a break or return > > Looking into the referenced section, I found some > sections with a redundant else. > > For example: (revision 196273fffc1c), > arch/powerpc/kernel/security.c , line 360: > > static int ssb_prctl_get(struct task_struct *task) > { > if (stf_enabled_flush_types == STF_BARRIER_NONE) > /* > * We don't have an explicit signal from firmware that we're > * vulnerable or not, we only have certain CPU revisions that > * are known to be vulnerable. > * > * We assume that if we're on another CPU, where the barrier is > * NONE, then we are not vulnerable. > */ > return PR_SPEC_NOT_AFFECTED; > else > /* > * If we do have a barrier type then we are vulnerable. The > * barrier is not a global or per-process mitigation, so the > * only value we can report here is PR_SPEC_ENABLE, which > * appears as "vulnerable" in /proc. > */ > return PR_SPEC_ENABLE; > > return -EINVAL; > } > > The else is pretty much redundant and the control flow > never reaches to return -EINVAL. > > Is it possible to clean up all these redundant code? > > Thanks, > Dwaipayan. > _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-28 14:09 ` Lukas Bulwahn @ 2020-09-28 14:20 ` Dwaipayan Ray 2020-09-28 15:09 ` Lukas Bulwahn 0 siblings, 1 reply; 22+ messages in thread From: Dwaipayan Ray @ 2020-09-28 14:20 UTC (permalink / raw) To: Lukas Bulwahn; +Cc: linux-kernel-mentees > > How about just starting a new thread instead? > > I will answer on the new thread. > > Lukas > Sure, I will. Also did my last mail go through to you? Quoted: > So I did run my script to evaluate the type of errors that were generated. > My results are from commits between v5.7 and v5.8 > > 1) Same address, different name > There were 32 such instances. In general, there was no regularity in the > specific error. > > There were some cases with letter case mismatch. Like there was one > committer who used the names "Arend Van Spriel", and "Arend van Spriel". > I believe this error should not be generated and matching be made case > neutral. > > Apart from this, the rest of the cases in this category were some with missing > first or last names, using names in other languages or having commas in > one name, using initials etc. These should be warned about. > > 2) Same name, different address > There were 204 such instances. Most of them were of the form in which they > used a completely different email address. > > Only one case was found with an email extension. > <hpeter@gmail.com> > <hpeter+linux_kernel@gmail.com> > in commit 423d9118c624 > > So in general I think most email mismatches should be specified by a WARN > and for email extension there should be a --strict CHECK. > What do you think of this? > > Thanks, > Dwaipayan. _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-28 14:20 ` Dwaipayan Ray @ 2020-09-28 15:09 ` Lukas Bulwahn 0 siblings, 0 replies; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-28 15:09 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Mon, 28 Sep 2020, Dwaipayan Ray wrote: > > > > How about just starting a new thread instead? > > > > I will answer on the new thread. > > > > Lukas > > > > Sure, I will. > > Also did my last mail go through to you? > Yes, that went through. I just did not have some immediate thought on any further response. I think it is a good idea but I had no strong feeling on louding rejecting or loudly raising acknowledgement to the idea. Just go for it. Lukas > Quoted: > > So I did run my script to evaluate the type of errors that were generated. > > My results are from commits between v5.7 and v5.8 > > > > 1) Same address, different name > > There were 32 such instances. In general, there was no regularity in the > > specific error. > > > > There were some cases with letter case mismatch. Like there was one > > committer who used the names "Arend Van Spriel", and "Arend van Spriel". > > I believe this error should not be generated and matching be made case > > neutral. > > > > Apart from this, the rest of the cases in this category were some with missing > > first or last names, using names in other languages or having commas in > > one name, using initials etc. These should be warned about. > > > > 2) Same name, different address > > There were 204 such instances. Most of them were of the form in which they > > used a completely different email address. > > > > Only one case was found with an email extension. > > <hpeter@gmail.com> > > <hpeter+linux_kernel@gmail.com> > > in commit 423d9118c624 > > > > So in general I think most email mismatches should be specified by a WARN > > and for email extension there should be a --strict CHECK. > > What do you think of this? > > > > Thanks, > > Dwaipayan. > _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues 2020-09-26 11:31 ` Dwaipayan Ray 2020-09-28 13:30 ` Dwaipayan Ray @ 2020-09-28 15:06 ` Lukas Bulwahn 1 sibling, 0 replies; 22+ messages in thread From: Lukas Bulwahn @ 2020-09-28 15:06 UTC (permalink / raw) To: Dwaipayan Ray; +Cc: linux-kernel-mentees On Sat, 26 Sep 2020, Dwaipayan Ray wrote: > > > Yes, that should help I think. Currently checkpatch is very vague > > > about Author sign offs, so subclassing it will really be helpful. > > > But at the same point I don't think it should become > > > too complex either. > > > > Agree, that is where the evaluation will help (what is really happening, > > what do we need to consider) and the severity comes into play (if we > > consider all subclasses of same severity, we do not need to differentiate > > between them.) > > > > > > > > I have already prepared a patch for the basic two classes: > > > 1) same name, different address > > > 2) same address, different name > > > > > > It would be great to have your feedback on this. > > > > Okay, I suggest to run this script on a larger set of commits and see. > > > > Probably, you can speed it your evaluation with checkpatch.pl by > > restricting it to the AUTHOR_SIGN_OFF type. I will try that over the > > weekend as well. > > > > Lukas > > Hi, > So I did run my script to evaluate the type of errors that were generated. > My results are from commits between v5.7 and v5.8 > Okay, good so far. How about running this on a larger set of commits? > 1) Same address, different name > > There were 32 such instances. In general, there was no regularity in the > specific error. > > There were some cases with letter case mismatch. Like there was one > committer who used the names "Arend Van Spriel", and "Arend van Spriel". > I believe this error should not be generated and matching be made case > neutral. > > Apart from this, the rest of the cases in this category were some with missing > first or last names, using names in other languages or having commas in > one name, using initials etc. These should be warned about. > Agree. Let us making this WARNING, but not ERROR. > 2) Same name, different address > > There were 204 such instances. Most of them were of the form in which they > used a completely different email address. > I guess we would need to understand in more detail why developers do that and if .mailmap helps here. > Only one case was found with an email extension. > <hpeter@gmail.com> > <hpeter+linux_kernel@gmail.com> > in commit 423d9118c624 > > So in general I think most email mismatches should be specified by a WARN > and for email extension there should be a --strict CHECK. > > What do you think of this? > Basically, I agree. How about proposing a patch to Joe and lkml and then we see... Lukas _______________________________________________ Linux-kernel-mentees mailing list Linux-kernel-mentees@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2020-09-28 15:09 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-18 10:06 [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues Lukas Bulwahn 2020-09-18 10:29 ` Dwaipayan Ray 2020-09-18 10:44 ` Lukas Bulwahn 2020-09-21 9:07 ` Dwaipayan Ray 2020-09-21 9:12 ` Lukas Bulwahn 2020-09-21 9:15 ` Lukas Bulwahn 2020-09-22 13:21 ` Dwaipayan Ray 2020-09-22 18:38 ` Lukas Bulwahn 2020-09-22 19:08 ` Dwaipayan Ray 2020-09-23 7:32 ` Lukas Bulwahn 2020-09-23 7:38 ` Dwaipayan Ray 2020-09-23 7:42 ` Lukas Bulwahn 2020-09-25 4:18 ` Dwaipayan Ray 2020-09-25 7:20 ` Lukas Bulwahn 2020-09-25 7:29 ` Dwaipayan Ray 2020-09-25 7:35 ` Lukas Bulwahn 2020-09-26 11:31 ` Dwaipayan Ray 2020-09-28 13:30 ` Dwaipayan Ray 2020-09-28 14:09 ` Lukas Bulwahn 2020-09-28 14:20 ` Dwaipayan Ray 2020-09-28 15:09 ` Lukas Bulwahn 2020-09-28 15:06 ` Lukas Bulwahn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).