From: Dwaipayan Ray <dwaipayanray1@gmail.com>
To: Joe Perches <joe@perches.com>
Cc: apw@canonical.com,
	linux-kernel-mentees@lists.linuxfoundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
Date: Sun, 20 Sep 2020 21:52:34 +0530
Message-ID: <CABJPP5Chm2xd2PW77=Ru9t4C6Yvq3SyEmr1gKsaQGyF5AxRVfA@mail.gmail.com>
In-Reply-To: <7958ded756c895ca614ba900aae7b830a992475e.camel@perches.com>

On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@perches.com> wrote:
>
> On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > Checkpatch did not handle cases where the author From: header
> > was split into multiple lines. The author identity could not
> > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > warning.
>
> Hi Dwaipayan.
>
> > A typical example is Commit e33bcbab16d1 ("tee: add support for
> > session's client UUID generation"). When checkpatch was run on
> > this commit, it displayed:
> >
> > "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> > patch author ''"
> >
> > This was due to split header lines not being handled properly and
> > the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> > if missing author Signed-off-by"):
> >
> > "Split From: headers are not fully handled: only the first part
> > is compared."
> >
> > Support split From: headers by correctly parsing the header
> > extension lines. RFC 2822, Section-2.2.3 stated that each extended
> > line must start with a WSP character (a space or htab). The solution
> > was therefore to concatenate the lines which start with a WSP to
> > get the correct long header.
>
> This is a good commit message, though I believe the
> latest rfc is 5322.  I'm not sure there is any real
> difference in the referenced section though.
>
> While your patch seems to work for git format-email,
> other emailers seem to set headers that have multiple
> whitespace chars that should be collapsed into a
> single space.
>
> I think you'll find that eliding all whitespace
> after header folding causes mismatches for emails.
>
> For instance:
>
> From:   "=?UTF-8?q?Christian=20K=C3=B6nig?="
>         <ckoenig.leichtzumerken@gmail.com>
>
> Always inserting a single space if there is any
> whitespace after the folding WSP might be better
> otherwise this is decoded as
>
> From: "Christian König"<ckoenig.leichtzumerken@gmail.com>
>

Hi,
I think eliding all whitespace shouldn't cause an issue
because, at the end of the From: header parser block,
there is a call to reformat_email($author):

   $author =~ s/"//g;
   $author = reformat_email($author);

The subroutine reformat_email reparses the author string so
that the correct "name <address>" format is maintained.

In revision b3b33d3c43bb,
line 1206:
sub reformat_email {
    my ($email) = @_;
    my ($email_name, $name_comment, $email_address, $comment) = parse_email($email);
    return format_email($email_name, $email_address);
}

And I also checked the format_email subroutine:
line 1997:
if ("$name" eq "") {
    $formatted_email = "$address";
} else {
    $formatted_email = "$name <$address>";
}
return $formatted_email;

So I think the author string is basically reconstructed to
maintain the correct format.

As you pointed out, at first the author string might be:
  "Christian König"<ckoenig.leichtzumerken@gmail.com>

But after reformat_email is called, $author should be:
  Christian König <ckoenig.leichtzumerken@gmail.com>

So, I think there won't be any problem. Is my
observation correct?
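
To illustrate the reasoning above, here is a minimal, self-contained
sketch. The *_sketch subroutines are hypothetical, simplified stand-ins
that I wrote for this mail; they are not the real parse_email() and
format_email() from checkpatch.pl, which also handle comments and
quoting. The name is spelled in ASCII to keep the example free of
encoding concerns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for parse_email(): split "name <address>" into
# its two parts, tolerating a missing space before the '<'.
sub parse_email_sketch {
    my ($formatted) = @_;
    my ($name, $address) = ('', '');
    if ($formatted =~ /^(.*?)\s*<(\S+\@\S+)>/) {
        ($name, $address) = ($1, $2);
    } elsif ($formatted =~ /(\S+\@\S+)/) {
        $address = $1;
    }
    $name =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return ($name, $address);
}

# Hypothetical stand-in for format_email(): rebuild the canonical form.
sub format_email_sketch {
    my ($name, $address) = @_;
    return $name eq '' ? $address : "$name <$address>";
}

# Mirrors the shape of reformat_email(): parse, then format again.
sub reformat_email_sketch {
    my ($email) = @_;
    return format_email_sketch(parse_email_sketch($email));
}

# A folded From: value joined without a separating space.
my $author = '"Christian Koenig"<ckoenig.leichtzumerken@gmail.com>';
$author =~ s/"//g;
$author = reformat_email_sketch($author);
print "$author\n";
```

In this sketch the joined string and the space-separated string both
come out in the same "name <address>" form, which is the behaviour I
expect from the real subroutines as well.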


> What I have does a bit more by saving any post-folding
>
> "From: <name and email address>"
>
> and comparing that to any "name and perhaps different
> email address" in a Signed-off-by: line.
>
> A new message is emitted if the name matches but the
> email address is different.
>
> Perhaps it's reasonable to apply your patch and then
> update it with something like the below:
> ---
>  scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 3e474072aa90..1ecc179e938d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1240,6 +1240,15 @@ sub same_email_addresses {
>                $email1_address eq $email2_address;
>  }
>
> +sub same_email_names {
> +       my ($email1, $email2) = @_;
> +
> +       my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> +       my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> +
> +       return $email1_name eq $email2_name;
> +}
> +
>  sub which {
>         my ($bin) = @_;
>
> @@ -2679,20 +2688,32 @@ sub process {
>                 }
>
>  # Check the patch for a From:
> -               if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> +               if ($line =~ /^From:\s*(.*)/i) {
>                         $author = $1;
> -                       $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> +                       my $curline = $linenr;
> +                       while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> +                               $author .= ' ' if (defined($1));
> +                               $author .= "$2";
> +                       }
> +                       if ($author =~ /=\?utf-8\?/i) {
> +                               $author = decode("MIME-Header", $author);
> +                               $author = encode("utf8", $author);
> +                       }
> +
>                         $author =~ s/"//g;
>                         $author = reformat_email($author);
>                 }
>
>  # Check the patch for a signoff:
>                 if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> +                       my $sig = $1;
>                         $signoff++;
>                         $in_commit_log = 0;
>                         if ($author ne '') {
> -                               if (same_email_addresses($1, $author)) {
> -                                       $authorsignoff = 1;
> +                               if (same_email_addresses($sig, $author)) {
> +                                       $authorsignoff = "1";
> +                               } elsif (same_email_names($sig, $author)) {
> +                                       $authorsignoff = $sig;
>                                 }
>                         }
>                 }
> @@ -6937,6 +6958,9 @@ sub process {
>                 } elsif (!$authorsignoff) {
>                         WARN("NO_AUTHOR_SIGN_OFF",
>                              "Missing Signed-off-by: line by nominal patch author '$author'\n");
> +               } elsif ($authorsignoff ne "1") {
> +                       WARN("NO_AUTHOR_SIGN_OFF",
> +                            "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
>                 }
>         }
>
>

Yes, this is definitely more logical!
I was actually hoping to talk with you about this.

The code you sent better handles name mismatches when the
email addresses are the same. But I have also found several
commits in which the author has signed off using
a different email address than the one he/she used
to send the patch.

For example, Lukas checked commits between v5.4 and
v5.8 and he found:
    175 Missing Signed-off-by: line by nominal patch author
    'Daniel Vetter <daniel.vetter@ffwll.ch>'

In fact, in all of those commits he signed off using a different
email address, Daniel Vetter <daniel.vetter@intel.com>.

So is it possible to resolve these using perhaps .mailmap
entries? Or should only the name mismatch part be better
handled? Or perhaps both?
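
As a rough sketch of what I mean by the .mailmap idea: the subroutine
names below are hypothetical and nothing like this exists in
checkpatch.pl today. It assumes mailmap lines of the form
"Proper Name <canonical address> <alias address>", and the entry passed
in at the bottom is only an example built from the two addresses
above, not a quote from the kernel's .mailmap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an alias -> canonical address table from mailmap-style lines.
# Only entries carrying two <address> fields are considered here.
sub load_mailmap_sketch {
    my (@lines) = @_;
    my %canonical;
    foreach my $entry (@lines) {
        next if $entry =~ /^\s*(#|$)/;    # skip comments and blank lines
        if ($entry =~ /<([^>]+)>[^<]*<([^>]+)>/) {
            $canonical{lc($2)} = lc($1);
        }
    }
    return \%canonical;
}

# Map an address to its canonical form, or return it unchanged
# (lowercased) when the table has no entry for it.
sub canonical_address_sketch {
    my ($map, $address) = @_;
    my $key = lc($address);
    return exists $map->{$key} ? $map->{$key} : $key;
}

# Compare two addresses after mapping both through the table.
sub same_mapped_addresses_sketch {
    my ($map, $addr1, $addr2) = @_;
    return canonical_address_sketch($map, $addr1) eq
           canonical_address_sketch($map, $addr2);
}

my $map = load_mailmap_sketch(
    'Daniel Vetter <daniel.vetter@ffwll.ch> <daniel.vetter@intel.com>');

print same_mapped_addresses_sketch($map,
    'daniel.vetter@ffwll.ch', 'daniel.vetter@intel.com')
    ? "same author\n" : "different author\n";
```

With a lookup like this, the From: address and the Signed-off-by:
address would be compared only after both are mapped, so known aliases
of one author would no longer trigger the warning.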

Also, I would like to know if there are any more changes
required for the current patch or if it is good to go?

Thanks,
Dwaipayan.