linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
To: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues
Date: Fri, 25 Sep 2020 09:35:37 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.2009250930570.5992@felia> (raw)
In-Reply-To: <CABJPP5Chc5+tNda3af88=pUC2XN8bEga4FiP=+KBwyx0UWOmyA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5755 bytes --]



On Fri, 25 Sep 2020, Dwaipayan Ray wrote:

> > Let us try to put some systematic structure into this handling of the
> > NO_AUTHOR_SIGN_OFF.
> >
> > There are basically two aspects to consider:
> >
> > - which classes are potential mismatches can we potentially observe?
> >
> > - which level of severity do we assign to each class of mismatch?
> >
> > To the first point, you started with some initial classes above.
> >
> >   0) same name, same address (no mismatch)
> >   1) same name, different address
> >      subclasses: (how 'different' address?)
> >      - email address differ by valid difference, e.g., extensions that
> >        will go to the same inbox, e.g.,  the xyz+fds extension in mail
> >        addresses. (that is your class 3, right?)
> >      - two email addresses that are known to belong to the same individual
> >        - known because of .mailmap
> >        - known otherwise?
> >   2) different name, same address
> >      subclasses: (how 'different' name?)
> >      - examples:
> >        - Firstname, Lastname (but middle-name initials differ)
> >        - special "regional" characters, like ü vs. ue, etc.
> >        - firstname and lastname are reverted etc.
> >   3) different name, different address (but still we believe it "matches")
> >      combinations of subclasses of 1) and 2), which we want to consider.
> >
> > Then, to the second question:
> >
> > So, these different classes we can think of and "assign" different
> > severities that checkpatch.pl can use.
> >
> > Checkpatch.pl already has four levels:
> >
> > three severity reporting levels: ERROR, WARN, CHECK, and
> > the level _do not report_ (which is implemented by just ignoring some kind
> > of difference).
> >
> > I think a lot of discussion will be around what severity to assign to
> > which class above, and in some way, long-term maintainers have a larger
> > say than we do here.
> >
> > So, let us first modify checkpatch.pl to identify the various classes
> > above, maybe just starting very basic, with different name, same address
> > and same name, different address and run it over the commit range and see
> > which examples show up and how often.
> >
> > For now, we first just use checkpatch.pl to identify the different types
> > and refine them into subclasses; then, we can discuss with severity should
> > be assigned to which type of mismatch.
> >
> > The second question should not invalidate our data collection and
> > identification of subclasses, though.
> >
> > Does that help? What do you think?
> >
> >
> > Lukas
> 
> Yes, that should help I think. Currently checkpatch is very vague
> about Author sign offs, so subclassing it will really be helpful.
> But at the same point I don't think it should become
> too complex either.

Agree, that is where the evaluation will help (what is really happening, 
what do we need to consider) and the severity comes into play (if we 
consider all subclasses of same severity, we do not need to differentiate 
between them.)

> 
> I have already prepared a patch for the basic two classes:
> 1) same name, different address
> 2) same address, different name
> 
> It would be great to have your feedback on this.

Okay, I suggest to run this script on a larger set of commits and see.

Probably, you can speed it your evaluation with checkpatch.pl by 
restricting it to the AUTHOR_SIGN_OFF type. I will try that over the 
weekend as well.

Lukas

> 
> ---
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9e65d21456f1..e23e753fcaf0 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1210,14 +1210,22 @@ sub reformat_email {
>   return format_email($email_name, $email_address);
>  }
> 
> -sub same_email_addresses {
> +sub same_mail_check {
>   my ($email1, $email2) = @_;
> 
>   my ($email1_name, $name1_comment, $email1_address, $comment1) =
> parse_email($email1);
>   my ($email2_name, $name2_comment, $email2_address, $comment2) =
> parse_email($email2);
> 
> - return $email1_name eq $email2_name &&
> -        $email1_address eq $email2_address;
> + my $same_name = $email1_name eq $email2_name;
> + my $same_address= $email1_address eq $email2_address;
> +
> + return ($same_name, $same_address);
> +}
> +
> +sub same_email_addresses {
> + my ($same_name, $same_address) = same_mail_check(@_);
> +
> + return $same_name && $same_address;
>  }
> 
>  sub which {
> @@ -2347,6 +2355,7 @@ sub process {
>   my $signoff = 0;
>   my $author = '';
>   my $authorsignoff = 0;
> + my $authorsignerr = '';
>   my $is_patch = 0;
>   my $is_binding_patch = -1;
>   my $in_header_lines = $file ? 0 : 1;
> @@ -2677,6 +2686,13 @@ sub process {
>   if ($author ne '') {
>   if (same_email_addresses($1, $author)) {
>   $authorsignoff = 1;
> + } else {
> + my ($same_name, $same_address) = same_mail_check($1, $author);
> + if($same_name) {
> + $authorsignerr = "ADDRESS_MISMATCH:${1}";
> + } elsif ($same_address) {
> + $authorsignerr = "NAME_MISMATCH:${1}";
> + }
>   }
>   }
>   }
> @@ -6891,8 +6907,16 @@ sub process {
>   ERROR("MISSING_SIGN_OFF",
>         "Missing Signed-off-by: line(s)\n");
>   } elsif (!$authorsignoff) {
> - WARN("NO_AUTHOR_SIGN_OFF",
> -      "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n");
> + } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author email address mismatch: 'From: $author' !=
> 'Signed-off-by: $1'\n");
> + } else {
> + WARN("NO_AUTHOR_SIGN_OFF",
> + "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + }
>   }
>   }
> ---
> 
> Thanks,
> Dwaipayan.
> 

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

  reply	other threads:[~2020-09-25  7:35 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-18 10:06 [Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues Lukas Bulwahn
2020-09-18 10:29 ` Dwaipayan Ray
2020-09-18 10:44   ` Lukas Bulwahn
2020-09-21  9:07     ` Dwaipayan Ray
2020-09-21  9:12       ` Lukas Bulwahn
2020-09-21  9:15       ` Lukas Bulwahn
2020-09-22 13:21         ` Dwaipayan Ray
2020-09-22 18:38           ` Lukas Bulwahn
2020-09-22 19:08             ` Dwaipayan Ray
2020-09-23  7:32               ` Lukas Bulwahn
2020-09-23  7:38                 ` Dwaipayan Ray
2020-09-23  7:42                   ` Lukas Bulwahn
2020-09-25  4:18                     ` Dwaipayan Ray
2020-09-25  7:20                       ` Lukas Bulwahn
2020-09-25  7:29                         ` Dwaipayan Ray
2020-09-25  7:35                           ` Lukas Bulwahn [this message]
2020-09-26 11:31                             ` Dwaipayan Ray
2020-09-28 13:30                               ` Dwaipayan Ray
2020-09-28 14:09                                 ` Lukas Bulwahn
2020-09-28 14:20                                   ` Dwaipayan Ray
2020-09-28 15:09                                     ` Lukas Bulwahn
2020-09-28 15:06                               ` Lukas Bulwahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.2009250930570.5992@felia \
    --to=lukas.bulwahn@gmail.com \
    --cc=dwaipayanray1@gmail.com \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).