linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
@ 2020-09-19 20:47 Dwaipayan Ray
  2020-09-19 21:15 ` Dwaipayan Ray
  2020-09-20  8:01 ` Lukas Bulwahn
  0 siblings, 2 replies; 13+ messages in thread
From: Dwaipayan Ray @ 2020-09-19 20:47 UTC (permalink / raw)
  To: lukas.bulwahn; +Cc: dwaipayanray1, linux-kernel-mentees

Checkpatch did not handle cases where the author From: header
was split into multiple lines. The author identity could not
be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
warning.

Support split From: headers in AUTHOR_SIGN_OFF check by correctly
parsing the entire From: header in such cases.

Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 504d2e431c60..8ee61ec346b3 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2661,6 +2661,10 @@ sub process {
 # Check the patch for a From:
 		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
 			$author = $1;
+			my $curline = $linenr-1;
+			while(defined($rawlines[++$curline]) && ($rawlines[$curline] =~ /^\s+(.*)/)) {
+				$author.= $1;
+			}
 			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
 			$author =~ s/"//g;
 			$author = reformat_email($author);
-- 
2.27.0

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 20:47 [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header Dwaipayan Ray
@ 2020-09-19 21:15 ` Dwaipayan Ray
  2020-09-20  8:11   ` Lukas Bulwahn
  2020-09-20  8:01 ` Lukas Bulwahn
  1 sibling, 1 reply; 13+ messages in thread
From: Dwaipayan Ray @ 2020-09-19 21:15 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

On Sun, Sep 20, 2020 at 2:18 AM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
>
> Checkpatch did not handle cases where the author From: header
> was split into multiple lines. The author identity could not
> be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> warning.
>
> Support split From: headers in AUTHOR_SIGN_OFF check by correctly
> parsing the entire From: header in such cases.
>
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> ---
>  scripts/checkpatch.pl | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 504d2e431c60..8ee61ec346b3 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2661,6 +2661,10 @@ sub process {
>  # Check the patch for a From:
>                 if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>                         $author = $1;
> +                       my $curline = $linenr-1;
> +                       while(defined($rawlines[++$curline]) && ($rawlines[$curline] =~ /^\s+(.*)/)) {
> +                               $author.= $1;
> +                       }
>                         $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>                         $author =~ s/"//g;
>                         $author = reformat_email($author);
> --
> 2.27.0
>

Hi,
Based on Joe's suggestions I updated the patch. It's totally similar
except for the fact that Joe missed a closing parenthesis and
he removed the decode("MIME-Header") part which caused parsing
of utf-8 to fail.

example: (commit e33bcbab16d1)
WARNING: Missing Signed-off-by: line by nominal patch author
'"=?UTF-8?q?Vesa=20J=C3=A4=C3=A4skel=C3=A4inen?="
<vesa.jaaskelainen@vaisala.com>

So I fixed these.

Also, Joe stated that all split lines must start with a space. But I
looked it up and it states that it must be a WSP, that is either a
single space or a horizontal tab.
References:
WSP: (https://www.rfc-editor.org/std/std68.txt)
Long headers: https://tools.ietf.org/html/rfc2822#section-2.2.3

So, i went with the ^\s+ expression instead of ^ s*. Is it alright
or should be changed?

I do have a final question though, since he had already solved it,
is it okay to send this patch in again?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 20:47 [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header Dwaipayan Ray
  2020-09-19 21:15 ` Dwaipayan Ray
@ 2020-09-20  8:01 ` Lukas Bulwahn
  1 sibling, 0 replies; 13+ messages in thread
From: Lukas Bulwahn @ 2020-09-20  8:01 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Sun, 20 Sep 2020, Dwaipayan Ray wrote:

> Checkpatch did not handle cases where the author From: header
> was split into multiple lines. The author identity could not
> be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> warning.
> 
> Support split From: headers in AUTHOR_SIGN_OFF check by correctly
> parsing the entire From: header in such cases.
>

Looks good to me.

Maybe you can some examples that are now handled better, typical cases of 
the different classes and a summary on your evaluation in the commit 
message.

Do that and send out the PATCH v2 to the linux-kernel mailing list with 
the typical recipients.

Lukas
 
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> ---
>  scripts/checkpatch.pl | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 504d2e431c60..8ee61ec346b3 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2661,6 +2661,10 @@ sub process {
>  # Check the patch for a From:
>  		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>  			$author = $1;
> +			my $curline = $linenr-1;
> +			while(defined($rawlines[++$curline]) && ($rawlines[$curline] =~ /^\s+(.*)/)) {
> +				$author.= $1;
> +			}
>  			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>  			$author =~ s/"//g;
>  			$author = reformat_email($author);
> -- 
> 2.27.0
> 
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 21:15 ` Dwaipayan Ray
@ 2020-09-20  8:11   ` Lukas Bulwahn
  0 siblings, 0 replies; 13+ messages in thread
From: Lukas Bulwahn @ 2020-09-20  8:11 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Sun, 20 Sep 2020, Dwaipayan Ray wrote:

> On Sun, Sep 20, 2020 at 2:18 AM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> >
> > Checkpatch did not handle cases where the author From: header
> > was split into multiple lines. The author identity could not
> > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > warning.
> >
> > Support split From: headers in AUTHOR_SIGN_OFF check by correctly
> > parsing the entire From: header in such cases.
> >
> > Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> > ---
> >  scripts/checkpatch.pl | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 504d2e431c60..8ee61ec346b3 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -2661,6 +2661,10 @@ sub process {
> >  # Check the patch for a From:
> >                 if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> >                         $author = $1;
> > +                       my $curline = $linenr-1;
> > +                       while(defined($rawlines[++$curline]) && ($rawlines[$curline] =~ /^\s+(.*)/)) {
> > +                               $author.= $1;
> > +                       }
> >                         $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> >                         $author =~ s/"//g;
> >                         $author = reformat_email($author);
> > --
> > 2.27.0
> >
> 
> Hi,
> Based on Joe's suggestions I updated the patch. It's totally similar
> except for the fact that Joe missed a closing parenthesis and
> he removed the decode("MIME-Header") part which caused parsing
> of utf-8 to fail.
> 
> example: (commit e33bcbab16d1)
> WARNING: Missing Signed-off-by: line by nominal patch author
> '"=?UTF-8?q?Vesa=20J=C3=A4=C3=A4skel=C3=A4inen?="
> <vesa.jaaskelainen@vaisala.com>
> 
> So I fixed these.
> 
> Also, Joe stated that all split lines must start with a space. But I
> looked it up and it states that it must be a WSP, that is either a
> single space or a horizontal tab.
> References:
> WSP: (https://www.rfc-editor.org/std/std68.txt)
> Long headers: https://tools.ietf.org/html/rfc2822#section-2.2.3
> 
> So, i went with the ^\s+ expression instead of ^ s*. Is it alright
> or should be changed?
> 
> I do have a final question though, since he had already solved it,
> is it okay to send this patch in again?
>

Yes, please provide your design decisions in the commit message.

You can add a 'Suggested-by:' tag to name that Joe pointed out the idea.

Also, you can add a 'Link:' to the previous email from Joe with a proper 
lore.kernel.org hyperlink.

Just send out your PATCH v2, then we can all review and test it.

When we have this patch accepted, we can discuss with Joe some way to 
handle the further cases of false warnings in AUTHOR_SIGN_OFF.


Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-21  7:39       ` Lukas Bulwahn
@ 2020-09-21  9:47         ` Joe Perches
  0 siblings, 0 replies; 13+ messages in thread
From: Joe Perches @ 2020-09-21  9:47 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: apw, Dwaipayan Ray, linux-kernel-mentees, linux-kernel

On Mon, 2020-09-21 at 09:39 +0200, Lukas Bulwahn wrote:
> On Sun, 20 Sep 2020, Joe Perches wrote:
> > On Sun, 2020-09-20 at 21:52 +0530, Dwaipayan Ray wrote:
[]
> > > For example, Lukas checked commits between v5.4 and
> > > v5.8 and he found:
> > >     175 Missing Signed-off-by: line by nominal patch authorrep
> > >     'Daniel Vetter <daniel.vetter@ffwll.ch>'
> > > 
> > > Infact in all of those commits he signed off using a different
> > > mail, Daniel Vetter <daniel.vetter@intel.com>.
> > > 
> > > So is it possible to resolve these using perhaps .mailmap
> > > entries? Or should only the name mismatch part be better
> > > handled? Or perhaps both?
> > 
> > Dunno.  It certainly can be improved...
> > Try adding some more logic and see what you come up with.
> > 
> > btw:
> > 
> > The most frequent NO_AUTHOR_SIGN_OFF messages for v5.7..v5.8 are
> > 
> >      98 WARNING: From:/SoB: email address mismatch: 'From: Daniel Vetter <daniel.vetter@ffwll.ch>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>'
> >      24 WARNING: From:/SoB: email address mismatch: 'From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>' != 'Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>'
> >      19 WARNING: From:/SoB: email address mismatch: 'From: Wolfram Sang <wsa+renesas@sang-engineering.com>' != 'Signed-off-by: Wolfram Sang <wsa@kernel.org>'
> >      11 WARNING: From:/SoB: email address mismatch: 'From: Luke Nelson <lukenels@cs.washington.edu>' != 'Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>'
> >       8 WARNING: From:/SoB: email address mismatch: 'From: Christophe Leroy <christophe.leroy@c-s.fr>' != 'Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>'
> >       6 WARNING: From:/SoB: email address mismatch: 'From: Davidlohr Bueso <dave@stgolabs.net>' != 'Signed-off-by: Davidlohr Bueso <dbueso@suse.de>'
> >       5 WARNING: Missing Signed-off-by: line by nominal patch author '"Paul A. Clarke" <pc@us.ibm.com>'
> >       4 WARNING: Missing Signed-off-by: line by nominal patch author 'Huang Ying <ying.huang@intel.com>'
> >       3 WARNING: Missing Signed-off-by: line by nominal patch author '"Stephan Müller" <smueller@chronox.de>'
[]
> I hope Dwaipayan can come up with a '.AUTHOR_SIGN_OFF.mailmap' file that 
> we can use to distinguish the known developers that knowingly and 
> intentionally use different identities vs. the 'newbies' that should 
> validly be warned.

I hope not.  One .mailmap should be enough.


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-21  7:49 ` Lukas Bulwahn
@ 2020-09-21  8:31   ` Dwaipayan Ray
  0 siblings, 0 replies; 13+ messages in thread
From: Dwaipayan Ray @ 2020-09-21  8:31 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: Joe Perches, apw, linux-kernel-mentees, linux-kernel

On Mon, Sep 21, 2020 at 1:19 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
>
> Dwaipayan, just a quick nitpick:
>
> Your subject line has two spaces in front of 'From:'
>
> On Sun, 20 Sep 2020, Dwaipayan Ray wrote:
>
> > Checkpatch did not handle cases where the author From: header
> > was split into multiple lines. The author identity could not
> > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > warning.
> >
> > A typical example is Commit e33bcbab16d1 ("tee: add support for
>
> You can write 'Commit' lowercase as 'commit'.
>
> > session's client UUID generation"). When checkpatch was run on
> > this commit, it displayed:
> >
> > "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> > patch author ''"
> >
> > This was due to split header lines not being handled properly and
> > the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> > if missing author Signed-off-by"):
> >
>
> Same here.
>
> > "Split From: headers are not fully handled: only the first part
> > is compared."
> >
> > Support split From: headers by correctly parsing the header
> > extension lines. RFC 2822, Section-2.2.3 stated that each extended
> > line must start with a WSP character (a space or htab). The solution
> > was therefore to concatenate the lines which start with a WSP to
> > get the correct long header.
> >
> > Suggested-by: Joe Perches <joe@perches.com>
> > Link: https://lore.kernel.org/linux-kernel-mentees/f5d8124e54a50480b0a9fa638787bc29b6e09854.camel@perches.com/
> > Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
>
>
>
> On v5.4..v5.7 using data from a previous evaluation, I found 116 commits
> with
> the error message "Missing Signed-off-by: line by nominal patch author
> ''",
> when running ./scripts/checkpatch.pl on v5.9-rc6.
>
>
> After this patch application, all 116 warnings disappeared with "Missing
> Signed-off-by: line by nominal patch author ''"
> and these three new warnings appeared:
>
> 322f6a3182d42df18059a89c53b09d33919f755e:35: WARNING: Missing Signed-off-by: line by nominal patch author 'Johnson CH Chen <JohnsonCH.Chen@moxa.com>'
> 18977008f44c66bdd6d20bb432adf71372589303:37: WARNING: Missing Signed-off-by: line by nominal patch author '"Marek Szyprowski via Linux.Kernel.Org" <m.szyprowski=samsung.com@linux.kernel.org>'
> ed3520427f57327f581de0cc28c1c30df08f0103:51: WARNING: Missing Signed-off-by: line by nominal patch author 'Chengguang Xu via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>'
>
> With that said, I think am happy to add my tags:
>
> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
>
> Dwaipayan, you can fix the minor commit message issues above, add the tags
> from Joe and me to the end of your commit message and send the patch as v3
> out to Andrew Morton with everyone sofar as CC. Andrew Morton will pick up
> the patch and make it travel to Linus Torvalds.
>
> Good job so far!
>
> After you did that, let us develop and discuss a plan to refine the
> 'trickier' part of false AUTHOR_SIGN_OFF warnings for developer and
> maintainers with some known special author and sign-off procedures.
>
> Lukas
>
> > ---
> >  scripts/checkpatch.pl | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 504d2e431c60..9e65d21456f1 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -2661,6 +2661,10 @@ sub process {
> >  # Check the patch for a From:
> >               if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> >                       $author = $1;
> > +                     my $curline = $linenr;
> > +                     while(defined($rawlines[$curline]) && ($rawlines[$curline++] =~ /^[ \t]\s*(.*)/)) {
> > +                             $author .= $1;
> > +                     }
> >                       $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> >                       $author =~ s/"//g;
> >                       $author = reformat_email($author);
> > --
> > 2.27.0
> >
> >

Hi,
Sure! I will do that and get back to you.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20  9:17 Dwaipayan Ray
  2020-09-20 15:09 ` Joe Perches
  2020-09-20 17:39 ` Joe Perches
@ 2020-09-21  7:49 ` Lukas Bulwahn
  2020-09-21  8:31   ` Dwaipayan Ray
  2 siblings, 1 reply; 13+ messages in thread
From: Lukas Bulwahn @ 2020-09-21  7:49 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: joe, linux-kernel-mentees, apw, linux-kernel


Dwaipayan, just a quick nitpick:

Your subject line has two spaces in front of 'From:'

On Sun, 20 Sep 2020, Dwaipayan Ray wrote:

> Checkpatch did not handle cases where the author From: header
> was split into multiple lines. The author identity could not
> be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> warning.
> 
> A typical example is Commit e33bcbab16d1 ("tee: add support for

You can write 'Commit' lowercase as 'commit'.

> session's client UUID generation"). When checkpatch was run on
> this commit, it displayed:
> 
> "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> patch author ''"
> 
> This was due to split header lines not being handled properly and
> the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> if missing author Signed-off-by"):
>

Same here.
 
> "Split From: headers are not fully handled: only the first part
> is compared."
> 
> Support split From: headers by correctly parsing the header
> extension lines. RFC 2822, Section-2.2.3 stated that each extended
> line must start with a WSP character (a space or htab). The solution
> was therefore to concatenate the lines which start with a WSP to
> get the correct long header.
> 
> Suggested-by: Joe Perches <joe@perches.com>
> Link: https://lore.kernel.org/linux-kernel-mentees/f5d8124e54a50480b0a9fa638787bc29b6e09854.camel@perches.com/
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>



On v5.4..v5.7 using data from a previous evaluation, I found 116 commits 
with
the error message "Missing Signed-off-by: line by nominal patch author 
''",
when running ./scripts/checkpatch.pl on v5.9-rc6.


After this patch application, all 116 warnings disappeared with "Missing 
Signed-off-by: line by nominal patch author ''"
and these three new warnings appeared:

322f6a3182d42df18059a89c53b09d33919f755e:35: WARNING: Missing Signed-off-by: line by nominal patch author 'Johnson CH Chen <JohnsonCH.Chen@moxa.com>'
18977008f44c66bdd6d20bb432adf71372589303:37: WARNING: Missing Signed-off-by: line by nominal patch author '"Marek Szyprowski via Linux.Kernel.Org" <m.szyprowski=samsung.com@linux.kernel.org>'
ed3520427f57327f581de0cc28c1c30df08f0103:51: WARNING: Missing Signed-off-by: line by nominal patch author 'Chengguang Xu via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>'

With that said, I think am happy to add my tags:

Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>

Dwaipayan, you can fix the minor commit message issues above, add the tags 
from Joe and me to the end of your commit message and send the patch as v3 
out to Andrew Morton with everyone sofar as CC. Andrew Morton will pick up 
the patch and make it travel to Linus Torvalds.

Good job so far!

After you did that, let us develop and discuss a plan to refine the 
'trickier' part of false AUTHOR_SIGN_OFF warnings for developer and 
maintainers with some known special author and sign-off procedures.

Lukas

> ---
>  scripts/checkpatch.pl | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 504d2e431c60..9e65d21456f1 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2661,6 +2661,10 @@ sub process {
>  # Check the patch for a From:
>  		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>  			$author = $1;
> +			my $curline = $linenr;
> +			while(defined($rawlines[$curline]) && ($rawlines[$curline++] =~ /^[ \t]\s*(.*)/)) {
> +				$author .= $1;
> +			}
>  			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>  			$author =~ s/"//g;
>  			$author = reformat_email($author);
> -- 
> 2.27.0
> 
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20 16:54     ` Joe Perches
@ 2020-09-21  7:39       ` Lukas Bulwahn
  2020-09-21  9:47         ` Joe Perches
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Bulwahn @ 2020-09-21  7:39 UTC (permalink / raw)
  To: Joe Perches; +Cc: apw, Dwaipayan Ray, linux-kernel-mentees, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7958 bytes --]



On Sun, 20 Sep 2020, Joe Perches wrote:

> On Sun, 2020-09-20 at 21:52 +0530, Dwaipayan Ray wrote:
> > On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@perches.com> wrote:
> > > On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > > > Checkpatch did not handle cases where the author From: header
> > > > was split into multiple lines. The author identity could not
> > > > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > > > warning.
> > > 
> > I think there won't be any problem. Is my
> > observation correct?
> 
> Likely true, it probably doesn't matter.
> Still, I'd imagine it doesn't hurt either.
> 
> > > What I have does a bit more by saving any post-folding
> > > 
> > > "From: <name and email address>"
> > > 
> > > and comparing that to any "name and perhaps different
> > > email address" in a Signed-off-by: line.
> > > 
> > > A new message is emitted if the name matches but the
> > > email address is different.
> > > 
> > > Perhaps it's reasonable to apply your patch and then
> > > update it with something like the below:
> > > ---
> > >  scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
> > >  1 file changed, 28 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 3e474072aa90..1ecc179e938d 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -1240,6 +1240,15 @@ sub same_email_addresses {
> > >                $email1_address eq $email2_address;
> > >  }
> > > 
> > > +sub same_email_names {
> > > +       my ($email1, $email2) = @_;
> > > +
> > > +       my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> > > +       my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> > > +
> > > +       return $email1_name eq $email2_name;
> > > +}
> > > +
> > >  sub which {
> > >         my ($bin) = @_;
> > > 
> > > @@ -2679,20 +2688,32 @@ sub process {
> > >                 }
> > > 
> > >  # Check the patch for a From:
> > > -               if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > > +               if ($line =~ /^From:\s*(.*)/i) {
> > >                         $author = $1;
> > > -                       $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > > +                       my $curline = $linenr;
> > > +                       while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> > > +                               $author .= ' ' if (defined($1));
> > > +                               $author .= "$2";
> > > +                       }
> > > +                       if ($author =~ /=\?utf-8\?/i) {
> > > +                               $author = decode("MIME-Header", $author);
> > > +                               $author = encode("utf8", $author);
> > > +                       }
> > > +
> > >                         $author =~ s/"//g;
> > >                         $author = reformat_email($author);
> > >                 }
> > > 
> > >  # Check the patch for a signoff:
> > >                 if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> > > +                       my $sig = $1;
> > >                         $signoff++;
> > >                         $in_commit_log = 0;
> > >                         if ($author ne '') {
> > > -                               if (same_email_addresses($1, $author)) {
> > > -                                       $authorsignoff = 1;
> > > +                               if (same_email_addresses($sig, $author)) {
> > > +                                       $authorsignoff = "1";
> > > +                               } elsif (same_email_names($sig, $author)) {
> > > +                                       $authorsignoff = $sig;
> > >                                 }
> > >                         }
> > >                 }
> > > @@ -6937,6 +6958,9 @@ sub process {
> > >                 } elsif (!$authorsignoff) {
> > >                         WARN("NO_AUTHOR_SIGN_OFF",
> > >                              "Missing Signed-off-by: line by nominal patch author '$author'\n");
> > > +               } elsif ($authorsignoff ne "1") {
> > > +                       WARN("NO_AUTHOR_SIGN_OFF",
> > > +                            "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
> > >                 }
> > >         }
> > > 
> > > 
> > 
> > Yes, this is definitely more logical !
> > I was actually hoping to talk with you on this.
> 
> Hope granted, but only via email... (smile)
> 
> > The code you sent better handles name mismatches when
> > email addresses are same. But I also have found several
> > such commits in which the author have signed off using
> > a different email address than the one which he/she used
> > to send the patch.
> > 
> > For example, Lukas checked commits between v5.4 and
> > v5.8 and he found:
> >     175 Missing Signed-off-by: line by nominal patch authorrep
> >     'Daniel Vetter <daniel.vetter@ffwll.ch>'
> > 
> > Infact in all of those commits he signed off using a different
> > mail, Daniel Vetter <daniel.vetter@intel.com>.
> > 
> > So is it possible to resolve these using perhaps .mailmap
> > entries? Or should only the name mismatch part be better
> > handled? Or perhaps both?
> 
> Dunno.  It certainly can be improved...
> Try adding some more logic and see what you come up with.
> 
> btw:
> 
> The most frequent NO_AUTHOR_SIGN_OFF messages for v5.7..v5.8 are
> 
>      98 WARNING: From:/SoB: email address mismatch: 'From: Daniel Vetter <daniel.vetter@ffwll.ch>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>'
>      24 WARNING: From:/SoB: email address mismatch: 'From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>' != 'Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>'
>      19 WARNING: From:/SoB: email address mismatch: 'From: Wolfram Sang <wsa+renesas@sang-engineering.com>' != 'Signed-off-by: Wolfram Sang <wsa@kernel.org>'
>      11 WARNING: From:/SoB: email address mismatch: 'From: Luke Nelson <lukenels@cs.washington.edu>' != 'Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>'
>       8 WARNING: From:/SoB: email address mismatch: 'From: Christophe Leroy <christophe.leroy@c-s.fr>' != 'Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>'
>       6 WARNING: From:/SoB: email address mismatch: 'From: Davidlohr Bueso <dave@stgolabs.net>' != 'Signed-off-by: Davidlohr Bueso <dbueso@suse.de>'
>       5 WARNING: Missing Signed-off-by: line by nominal patch author '"Paul A. Clarke" <pc@us.ibm.com>'
>       4 WARNING: Missing Signed-off-by: line by nominal patch author 'Huang Ying <ying.huang@intel.com>'
>       3 WARNING: Missing Signed-off-by: line by nominal patch author '"Stephan Müller" <smueller@chronox.de>'
>

Great minds think alike.

I did a similar investigation on Friday after the first discussion with 
Dwaipayan:

https://lore.kernel.org/linux-kernel-mentees/alpine.DEB.2.21.2009181238230.14717@felia/T/#m1bf5f7ca876d33d4d53e492b5d8a6232437c921f

I hope Dwaipayan can come up with a '.AUTHOR_SIGN_OFF.mailmap' file that 
we can use to distinguish the known developers that knowingly and 
intentionally use different identities vs. the 'newbies' that should 
validly be warned.

We will see if Dwaipayan can come up with a good idea how to handle that.

Lukas

> For the Missing Signed-off-by: lines above,
> even after decoding, the email matches but
> the names do not.
> 
> From: "Paul A. Clarke" <pc@us.ibm.com>
> [...]
> Signed-off-by: Paul Clarke <pc@us.ibm.com>
> 
> From: Huang Ying <ying.huang@intel.com>
> [...]
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> 
> From: =?UTF-8?q?Stephan=20M=C3=BCller?= <smueller@chronox.de>
> [...]
> Signed-off-by: Stephan Mueller <smueller@chronox.de>
> 
> > Also, I would like to know if there are any more changes
> > required for the current patch or if it is good to go?
> 
> I think it's fine.
> 
> cheers, Joe
> 
> 

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20  9:17 Dwaipayan Ray
  2020-09-20 15:09 ` Joe Perches
@ 2020-09-20 17:39 ` Joe Perches
  2020-09-21  7:49 ` Lukas Bulwahn
  2 siblings, 0 replies; 13+ messages in thread
From: Joe Perches @ 2020-09-20 17:39 UTC (permalink / raw)
  To: Dwaipayan Ray, Andrew Morton; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> Checkpatch did not handle cases where the author From: header
> was split into multiple lines. The author identity could not
> be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> warning.
> 
> A typical example is Commit e33bcbab16d1 ("tee: add support for
> session's client UUID generation"). When checkpatch was run on
> this commit, it displayed:
> 
> "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> patch author ''"
> 
> This was due to split header lines not being handled properly and
> the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> if missing author Signed-off-by"):
> 
> "Split From: headers are not fully handled: only the first part
> is compared."
> 
> Support split From: headers by correctly parsing the header
> extension lines. RFC 2822, Section-2.2.3 stated that each extended
> line must start with a WSP character (a space or htab). The solution
> was therefore to concatenate the lines which start with a WSP to
> get the correct long header.
> 
> Suggested-by: Joe Perches <joe@perches.com>
> Link: https://lore.kernel.org/linux-kernel-mentees/f5d8124e54a50480b0a9fa638787bc29b6e09854.camel@perches.com/
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>

Acked-by: Joe Perches <joe@perches.com>

> ---
>  scripts/checkpatch.pl | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 504d2e431c60..9e65d21456f1 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2661,6 +2661,10 @@ sub process {
>  # Check the patch for a From:
>  		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>  			$author = $1;
> +			my $curline = $linenr;
> +			while(defined($rawlines[$curline]) && ($rawlines[$curline++] =~ /^[ \t]\s*(.*)/)) {
> +				$author .= $1;
> +			}
>  			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>  			$author =~ s/"//g;
>  			$author = reformat_email($author);

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20 16:22   ` Dwaipayan Ray
@ 2020-09-20 16:54     ` Joe Perches
  2020-09-21  7:39       ` Lukas Bulwahn
  0 siblings, 1 reply; 13+ messages in thread
From: Joe Perches @ 2020-09-20 16:54 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sun, 2020-09-20 at 21:52 +0530, Dwaipayan Ray wrote:
> On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@perches.com> wrote:
> > On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > > Checkpatch did not handle cases where the author From: header
> > > was split into multiple lines. The author identity could not
> > > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > > warning.
> > 
> I think there won't be any problem. Is my
> observation correct?

Likely true, it probably doesn't matter.
Still, I'd imagine it doesn't hurt either.

> > What I have does a bit more by saving any post-folding
> > 
> > "From: <name and email address>"
> > 
> > and comparing that to any "name and perhaps different
> > email address" in a Signed-off-by: line.
> > 
> > A new message is emitted if the name matches but the
> > email address is different.
> > 
> > Perhaps it's reasonable to apply your patch and then
> > update it with something like the below:
> > ---
> >  scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
> >  1 file changed, 28 insertions(+), 4 deletions(-)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 3e474072aa90..1ecc179e938d 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1240,6 +1240,15 @@ sub same_email_addresses {
> >                $email1_address eq $email2_address;
> >  }
> > 
> > +sub same_email_names {
> > +       my ($email1, $email2) = @_;
> > +
> > +       my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> > +       my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> > +
> > +       return $email1_name eq $email2_name;
> > +}
> > +
> >  sub which {
> >         my ($bin) = @_;
> > 
> > @@ -2679,20 +2688,32 @@ sub process {
> >                 }
> > 
> >  # Check the patch for a From:
> > -               if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > +               if ($line =~ /^From:\s*(.*)/i) {
> >                         $author = $1;
> > -                       $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > +                       my $curline = $linenr;
> > +                       while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> > +                               $author .= ' ' if (defined($1));
> > +                               $author .= "$2";
> > +                       }
> > +                       if ($author =~ /=\?utf-8\?/i) {
> > +                               $author = decode("MIME-Header", $author);
> > +                               $author = encode("utf8", $author);
> > +                       }
> > +
> >                         $author =~ s/"//g;
> >                         $author = reformat_email($author);
> >                 }
> > 
> >  # Check the patch for a signoff:
> >                 if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> > +                       my $sig = $1;
> >                         $signoff++;
> >                         $in_commit_log = 0;
> >                         if ($author ne '') {
> > -                               if (same_email_addresses($1, $author)) {
> > -                                       $authorsignoff = 1;
> > +                               if (same_email_addresses($sig, $author)) {
> > +                                       $authorsignoff = "1";
> > +                               } elsif (same_email_names($sig, $author)) {
> > +                                       $authorsignoff = $sig;
> >                                 }
> >                         }
> >                 }
> > @@ -6937,6 +6958,9 @@ sub process {
> >                 } elsif (!$authorsignoff) {
> >                         WARN("NO_AUTHOR_SIGN_OFF",
> >                              "Missing Signed-off-by: line by nominal patch author '$author'\n");
> > +               } elsif ($authorsignoff ne "1") {
> > +                       WARN("NO_AUTHOR_SIGN_OFF",
> > +                            "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
> >                 }
> >         }
> > 
> > 
> 
> Yes, this is definitely more logical !
> I was actually hoping to talk with you on this.

Hope granted, but only via email... (smile)

> The code you sent better handles name mismatches when
> email addresses are same. But I also have found several
> such commits in which the author have signed off using
> a different email address than the one which he/she used
> to send the patch.
> 
> For example, Lukas checked commits between v5.4 and
> v5.8 and he found:
>     175 Missing Signed-off-by: line by nominal patch authorrep
>     'Daniel Vetter <daniel.vetter@ffwll.ch>'
> 
> Infact in all of those commits he signed off using a different
> mail, Daniel Vetter <daniel.vetter@intel.com>.
> 
> So is it possible to resolve these using perhaps .mailmap
> entries? Or should only the name mismatch part be better
> handled? Or perhaps both?

Dunno.  It certainly can be improved...
Try adding some more logic and see what you come up with.

btw:

The most frequent NO_AUTHOR_SIGN_OFF messages for v5.7..v5.8 are

     98 WARNING: From:/SoB: email address mismatch: 'From: Daniel Vetter <daniel.vetter@ffwll.ch>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>'
     24 WARNING: From:/SoB: email address mismatch: 'From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>' != 'Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>'
     19 WARNING: From:/SoB: email address mismatch: 'From: Wolfram Sang <wsa+renesas@sang-engineering.com>' != 'Signed-off-by: Wolfram Sang <wsa@kernel.org>'
     11 WARNING: From:/SoB: email address mismatch: 'From: Luke Nelson <lukenels@cs.washington.edu>' != 'Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>'
      8 WARNING: From:/SoB: email address mismatch: 'From: Christophe Leroy <christophe.leroy@c-s.fr>' != 'Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>'
      6 WARNING: From:/SoB: email address mismatch: 'From: Davidlohr Bueso <dave@stgolabs.net>' != 'Signed-off-by: Davidlohr Bueso <dbueso@suse.de>'
      5 WARNING: Missing Signed-off-by: line by nominal patch author '"Paul A. Clarke" <pc@us.ibm.com>'
      4 WARNING: Missing Signed-off-by: line by nominal patch author 'Huang Ying <ying.huang@intel.com>'
      3 WARNING: Missing Signed-off-by: line by nominal patch author '"Stephan Müller" <smueller@chronox.de>'

For the Missing Signed-off-by: lines above,
even after decoding, the email matches but
the names do not.

From: "Paul A. Clarke" <pc@us.ibm.com>
[...]
Signed-off-by: Paul Clarke <pc@us.ibm.com>

From: Huang Ying <ying.huang@intel.com>
[...]
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>

From: =?UTF-8?q?Stephan=20M=C3=BCller?= <smueller@chronox.de>
[...]
Signed-off-by: Stephan Mueller <smueller@chronox.de>

> Also, I would like to know if there are any more changes
> required for the current patch or if it is good to go?

I think it's fine.

cheers, Joe

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20 15:09 ` Joe Perches
@ 2020-09-20 16:22   ` Dwaipayan Ray
  2020-09-20 16:54     ` Joe Perches
  0 siblings, 1 reply; 13+ messages in thread
From: Dwaipayan Ray @ 2020-09-20 16:22 UTC (permalink / raw)
  To: Joe Perches; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@perches.com> wrote:
>
> On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > Checkpatch did not handle cases where the author From: header
> > was split into multiple lines. The author identity could not
> > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > warning.
>
> Hi Dwaipayan.
>
> > A typical example is Commit e33bcbab16d1 ("tee: add support for
> > session's client UUID generation"). When checkpatch was run on
> > this commit, it displayed:
> >
> > "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> > patch author ''"
> >
> > This was due to split header lines not being handled properly and
> > the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> > if missing author Signed-off-by"):
> >
> > "Split From: headers are not fully handled: only the first part
> > is compared."
> >
> > Support split From: headers by correctly parsing the header
> > extension lines. RFC 2822, Section-2.2.3 stated that each extended
> > line must start with a WSP character (a space or htab). The solution
> > was therefore to concatenate the lines which start with a WSP to
> > get the correct long header.
>
> This is a good commit message, though I believe the
> latest rfc is 5322.  I'm not sure there is any real
> difference in the referenced section though.
>
> While your patch seems to work for git format-email,
> other emailers seem to set headers that have multiple
> whitespace chars that should be collapsed into a
> single space.
>
> I think you'll find that the eliding all whitespace
> after header folding causes mismatches for emails.
>
> For instance:
>
> From:   "=?UTF-8?q?Christian=20K=C3=B6nig?="
>         <ckoenig.leichtzumerken@gmail.com>
>
> Always inserting a single space if there is any
> whitespace after the folding WSP might be better
> otherwise this is decoded as
>
> From: "Christian König"<ckoenig.leichtzumerken@gmail.com>
>

Hi,
I think eliding all whitespaces shouldn't cause an issue
because at the end of the From: header parser block,
there is a call to reformat_email($author).

   $author =~ s/"//g;
   $author = reformat_email($author);

The subroutine reformat_email reparses the author string such
that the correct name <address> format is maintainined.

In revision b3b33d3c43bb,
line 1206:
sub reformat_email {
    my ($email) = @_;
    my ($email_name, $name_comment, $email_address,
$comment) = parse_email($email);
    return format_email($email_name, $email_address);
}

And I also checked the format_email subroutine:
line 1997:
if ("$name" eq "") {
    $formatted_email = "$address";
} else {
    $formatted_email = "$name <$address>";
}
return $formatted_email;

So I think the author string is basically reconstructed to
maintain the correct format.

As you pointed out, at first the author string might be:
  "Christian König"<ckoenig.leichtzumerken@gmail.com>

But after reformat_email is called, $author should be:
  Christian König <ckoenig.leichtzumerken@gmail.com>

So, I think there won't be any problem. Is my
observation correct?


> What I have does a bit more by saving any post-folding
>
> "From: <name and email address>"
>
> and comparing that to any "name and perhaps different
> email address" in a Signed-off-by: line.
>
> A new message is emitted if the name matches but the
> email address is different.
>
> Perhaps it's reasonable to apply your patch and then
> update it with something like the below:
> ---
>  scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 3e474072aa90..1ecc179e938d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1240,6 +1240,15 @@ sub same_email_addresses {
>                $email1_address eq $email2_address;
>  }
>
> +sub same_email_names {
> +       my ($email1, $email2) = @_;
> +
> +       my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> +       my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> +
> +       return $email1_name eq $email2_name;
> +}
> +
>  sub which {
>         my ($bin) = @_;
>
> @@ -2679,20 +2688,32 @@ sub process {
>                 }
>
>  # Check the patch for a From:
> -               if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> +               if ($line =~ /^From:\s*(.*)/i) {
>                         $author = $1;
> -                       $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> +                       my $curline = $linenr;
> +                       while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> +                               $author .= ' ' if (defined($1));
> +                               $author .= "$2";
> +                       }
> +                       if ($author =~ /=\?utf-8\?/i) {
> +                               $author = decode("MIME-Header", $author);
> +                               $author = encode("utf8", $author);
> +                       }
> +
>                         $author =~ s/"//g;
>                         $author = reformat_email($author);
>                 }
>
>  # Check the patch for a signoff:
>                 if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> +                       my $sig = $1;
>                         $signoff++;
>                         $in_commit_log = 0;
>                         if ($author ne '') {
> -                               if (same_email_addresses($1, $author)) {
> -                                       $authorsignoff = 1;
> +                               if (same_email_addresses($sig, $author)) {
> +                                       $authorsignoff = "1";
> +                               } elsif (same_email_names($sig, $author)) {
> +                                       $authorsignoff = $sig;
>                                 }
>                         }
>                 }
> @@ -6937,6 +6958,9 @@ sub process {
>                 } elsif (!$authorsignoff) {
>                         WARN("NO_AUTHOR_SIGN_OFF",
>                              "Missing Signed-off-by: line by nominal patch author '$author'\n");
> +               } elsif ($authorsignoff ne "1") {
> +                       WARN("NO_AUTHOR_SIGN_OFF",
> +                            "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
>                 }
>         }
>
>

Yes, this is definitely more logical !
I was actually hoping to talk with you on this.

The code you sent better handles name mismatches when
email addresses are same. But I also have found several
such commits in which the author have signed off using
a different email address than the one which he/she used
to send the patch.

For example, Lukas checked commits between v5.4 and
v5.8 and he found:
    175 Missing Signed-off-by: line by nominal patch author
    'Daniel Vetter <daniel.vetter@ffwll.ch>'

Infact in all of those commits he signed off using a different
mail, Daniel Vetter <daniel.vetter@intel.com>.

So is it possible to resolve these using perhaps .mailmap
entries? Or should only the name mismatch part be better
handled? Or perhaps both?

Also, I would like to know if there are any more changes
required for the current patch or if it is good to go?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-20  9:17 Dwaipayan Ray
@ 2020-09-20 15:09 ` Joe Perches
  2020-09-20 16:22   ` Dwaipayan Ray
  2020-09-20 17:39 ` Joe Perches
  2020-09-21  7:49 ` Lukas Bulwahn
  2 siblings, 1 reply; 13+ messages in thread
From: Joe Perches @ 2020-09-20 15:09 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> Checkpatch did not handle cases where the author From: header
> was split into multiple lines. The author identity could not
> be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> warning.

Hi Dwaipayan.

> A typical example is Commit e33bcbab16d1 ("tee: add support for
> session's client UUID generation"). When checkpatch was run on
> this commit, it displayed:
> 
> "WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
> patch author ''"
> 
> This was due to split header lines not being handled properly and
> the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
> if missing author Signed-off-by"):
> 
> "Split From: headers are not fully handled: only the first part
> is compared."
> 
> Support split From: headers by correctly parsing the header
> extension lines. RFC 2822, Section-2.2.3 stated that each extended
> line must start with a WSP character (a space or htab). The solution
> was therefore to concatenate the lines which start with a WSP to
> get the correct long header.

This is a good commit message, though I believe the
latest rfc is 5322.  I'm not sure there is any real
difference in the referenced section though.

While your patch seems to work for git format-email,
other emailers seem to set headers that have multiple
whitespace chars that should be collapsed into a
single space.

I think you'll find that the eliding all whitespace
after header folding causes mismatches for emails.

For instance:

From:   "=?UTF-8?q?Christian=20K=C3=B6nig?=" 
        <ckoenig.leichtzumerken@gmail.com>

Always inserting a single space if there is any
whitespace after the folding WSP might be better
otherwise this is decoded as

From: "Christian König"<ckoenig.leichtzumerken@gmail.com>

What I have does a bit more by saving any post-folding

"From: <name and email address>"

and comparing that to any "name and perhaps different
email address" in a Signed-off-by: line.

A new message is emitted if the name matches but the
email address is different.

Perhaps it's reasonable to apply your patch and then
update it with something like the below:
---
 scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3e474072aa90..1ecc179e938d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1240,6 +1240,15 @@ sub same_email_addresses {
 	       $email1_address eq $email2_address;
 }
 
+sub same_email_names {
+	my ($email1, $email2) = @_;
+
+	my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
+	my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
+
+	return $email1_name eq $email2_name;
+}
+
 sub which {
 	my ($bin) = @_;
 
@@ -2679,20 +2688,32 @@ sub process {
 		}
 
 # Check the patch for a From:
-		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
+		if ($line =~ /^From:\s*(.*)/i) {
 			$author = $1;
-			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
+			my $curline = $linenr;
+			while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
+				$author .= ' ' if (defined($1));
+				$author .= "$2";
+			}
+			if ($author =~ /=\?utf-8\?/i) {
+				$author = decode("MIME-Header", $author);
+				$author = encode("utf8", $author);
+			}
+
 			$author =~ s/"//g;
 			$author = reformat_email($author);
 		}
 
 # Check the patch for a signoff:
 		if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
+			my $sig = $1;
 			$signoff++;
 			$in_commit_log = 0;
 			if ($author ne '') {
-				if (same_email_addresses($1, $author)) {
-					$authorsignoff = 1;
+				if (same_email_addresses($sig, $author)) {
+					$authorsignoff = "1";
+				} elsif (same_email_names($sig, $author)) {
+					$authorsignoff = $sig;
 				}
 			}
 		}
@@ -6937,6 +6958,9 @@ sub process {
 		} elsif (!$authorsignoff) {
 			WARN("NO_AUTHOR_SIGN_OFF",
 			     "Missing Signed-off-by: line by nominal patch author '$author'\n");
+		} elsif ($authorsignoff ne "1") {
+			WARN("NO_AUTHOR_SIGN_OFF",
+			     "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
 		}
 	}
 

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header
@ 2020-09-20  9:17 Dwaipayan Ray
  2020-09-20 15:09 ` Joe Perches
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Dwaipayan Ray @ 2020-09-20  9:17 UTC (permalink / raw)
  To: joe; +Cc: apw, linux-kernel-mentees, linux-kernel, dwaipayanray1

Checkpatch did not handle cases where the author From: header
was split into multiple lines. The author identity could not
be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
warning.

A typical example is Commit e33bcbab16d1 ("tee: add support for
session's client UUID generation"). When checkpatch was run on
this commit, it displayed:

"WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
patch author ''"

This was due to split header lines not being handled properly and
the author himself wrote in Commit cd2614967d8b ("checkpatch: warn
if missing author Signed-off-by"):

"Split From: headers are not fully handled: only the first part
is compared."

Support split From: headers by correctly parsing the header
extension lines. RFC 2822, Section-2.2.3 stated that each extended
line must start with a WSP character (a space or htab). The solution
was therefore to concatenate the lines which start with a WSP to
get the correct long header.

Suggested-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/linux-kernel-mentees/f5d8124e54a50480b0a9fa638787bc29b6e09854.camel@perches.com/
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 504d2e431c60..9e65d21456f1 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2661,6 +2661,10 @@ sub process {
 # Check the patch for a From:
 		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
 			$author = $1;
+			my $curline = $linenr;
+			while(defined($rawlines[$curline]) && ($rawlines[$curline++] =~ /^[ \t]\s*(.*)/)) {
+				$author .= $1;
+			}
 			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
 			$author =~ s/"//g;
 			$author = reformat_email($author);
-- 
2.27.0

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-21  9:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-19 20:47 [Linux-kernel-mentees] [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header Dwaipayan Ray
2020-09-19 21:15 ` Dwaipayan Ray
2020-09-20  8:11   ` Lukas Bulwahn
2020-09-20  8:01 ` Lukas Bulwahn
2020-09-20  9:17 Dwaipayan Ray
2020-09-20 15:09 ` Joe Perches
2020-09-20 16:22   ` Dwaipayan Ray
2020-09-20 16:54     ` Joe Perches
2020-09-21  7:39       ` Lukas Bulwahn
2020-09-21  9:47         ` Joe Perches
2020-09-20 17:39 ` Joe Perches
2020-09-21  7:49 ` Lukas Bulwahn
2020-09-21  8:31   ` Dwaipayan Ray

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).