linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
@ 2020-09-19  8:12 Dwaipayan Ray
  2020-09-19 17:26 ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Dwaipayan Ray @ 2020-09-19  8:12 UTC (permalink / raw)
  To: joe; +Cc: apw, linux-kernel-mentees, linux-kernel, dwaipayanray1

Checkpatch did not handle cases where the author From: header
was split into two lines. The author string went empty and
checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.

Support split From: headers in AUTHOR_SIGN_OFF check by adding
an additional clause to resolve author identity in such cases.

Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 504d2e431c60..86975baead22 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1210,6 +1210,16 @@ sub reformat_email {
 	return format_email($email_name, $email_address);
 }
 
+sub format_author_email {
+	my ($email, $from) = @_;
+
+	$email = encode("utf8", $email) if ($from =~ /=\?utf-8\?/i);
+	$email =~ s/"//g;
+	$email = reformat_email($email);
+
+	return $email;
+}
+
 sub same_email_addresses {
 	my ($email1, $email2) = @_;
 
@@ -2347,6 +2357,7 @@ sub process {
 	my $signoff = 0;
 	my $author = '';
 	my $authorsignoff = 0;
+	my $prevheader = '';
 	my $is_patch = 0;
 	my $is_binding_patch = -1;
 	my $in_header_lines = $file ? 0 : 1;
@@ -2658,12 +2669,21 @@ sub process {
 			}
 		}
 
+# Check the patch for a split From:
+		if($prevheader ne '') {
+			if ($author eq '' && decode("MIME-Header", $prevheader) =~ /^From:\s*(.*)/) {
+				my $email = $1.$line;
+				$author = format_author_email($email, $prevheader);
+			}
+			$prevheader = '';
+		}
+
 # Check the patch for a From:
 		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
-			$author = $1;
-			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
-			$author =~ s/"//g;
-			$author = reformat_email($author);
+			$author = format_author_email($1, $line);
+			if($author eq '') {
+				$prevheader = $line;
+			}
 		}
 
 # Check the patch for a signoff:
-- 
2.27.0

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19  8:12 [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header Dwaipayan Ray
@ 2020-09-19 17:26 ` Joe Perches
  2020-09-19 18:12   ` Lukas Bulwahn
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2020-09-19 17:26 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> Checkpatch did not handle cases where the author From: header
> was split into two lines. The author string went empty and
> checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.

It's good to provide an example where the current code
doesn't work.

It likely would be better to do this by searching forward for
any extension lines after a "^From:' rather than searching
backwards as there can be any number of extension lines.

> Support split From: headers in AUTHOR_SIGN_OFF check by adding
> an additional clause to resolve author identity in such cases.
> 
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> ---
>  scripts/checkpatch.pl | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 504d2e431c60..86975baead22 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1210,6 +1210,16 @@ sub reformat_email {
>  	return format_email($email_name, $email_address);
>  }
>  
> +sub format_author_email {
> +	my ($email, $from) = @_;
> +
> +	$email = encode("utf8", $email) if ($from =~ /=\?utf-8\?/i);
> +	$email =~ s/"//g;
> +	$email = reformat_email($email);
> +
> +	return $email;
> +}
> +
>  sub same_email_addresses {
>  	my ($email1, $email2) = @_;
>  
> @@ -2347,6 +2357,7 @@ sub process {
>  	my $signoff = 0;
>  	my $author = '';
>  	my $authorsignoff = 0;
> +	my $prevheader = '';
>  	my $is_patch = 0;
>  	my $is_binding_patch = -1;
>  	my $in_header_lines = $file ? 0 : 1;
> @@ -2658,12 +2669,21 @@ sub process {
>  			}
>  		}
>  
> +# Check the patch for a split From:
> +		if($prevheader ne '') {
> +			if ($author eq '' && decode("MIME-Header", $prevheader) =~ /^From:\s*(.*)/) {
> +				my $email = $1.$line;
> +				$author = format_author_email($email, $prevheader);
> +			}
> +			$prevheader = '';
> +		}
> +
>  # Check the patch for a From:
>  		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> -			$author = $1;
> -			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> -			$author =~ s/"//g;
> -			$author = reformat_email($author);
> +			$author = format_author_email($1, $line);
> +			if($author eq '') {
> +				$prevheader = $line;
> +			}
>  		}
>  
>  # Check the patch for a signoff:

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 17:26 ` Joe Perches
@ 2020-09-19 18:12   ` Lukas Bulwahn
  2020-09-19 18:19     ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Lukas Bulwahn @ 2020-09-19 18:12 UTC (permalink / raw)
  To: Joe Perches; +Cc: apw, Dwaipayan Ray, linux-kernel-mentees, linux-kernel



On Sat, 19 Sep 2020, Joe Perches wrote:

> On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > Checkpatch did not handle cases where the author From: header
> > was split into two lines. The author string went empty and
> > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> 
> It's good to provide an example where the current code
> doesn't work.
>

Joe, as this is a linux-kernel-mentees patch, we discussed that before 
reaching out to you; you can find Dwaipayan's own evaluation here:

https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/

Dwaipayan, Joe's comment is still valid; it would be good to describe
the reasons why patches might have split lines (as far as see, long 
encodings for non-ascii names).

I will run my own evaluation of checkpatch.pl before and after patch 
application on Monday and then check if I can confirm Dwaipayan's results.

> It likely would be better to do this by searching forward for
> any extension lines after a "^From:' rather than searching
> backwards as there can be any number of extension lines.
>

Just to sure what you are talking about...

You mean just to access the next line through the lines array, rather 
than using prevheader and trying to decode that one line twice.

I agree the logic is a bit redundant and complicated at the moment.

Once prevheader is non-empty, it already clear that author is '' and 
prevheader decodes with that match, because that is the only way to
make prevheader non-empty in the first place; at least as far I see it 
right now.


Lukas
 
> > Support split From: headers in AUTHOR_SIGN_OFF check by adding
> > an additional clause to resolve author identity in such cases.
> > 
> > Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> > ---
> >  scripts/checkpatch.pl | 28 ++++++++++++++++++++++++----
> >  1 file changed, 24 insertions(+), 4 deletions(-)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 504d2e431c60..86975baead22 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1210,6 +1210,16 @@ sub reformat_email {
> >  	return format_email($email_name, $email_address);
> >  }
> >  
> > +sub format_author_email {
> > +	my ($email, $from) = @_;
> > +
> > +	$email = encode("utf8", $email) if ($from =~ /=\?utf-8\?/i);
> > +	$email =~ s/"//g;
> > +	$email = reformat_email($email);
> > +
> > +	return $email;
> > +}
> > +
> >  sub same_email_addresses {
> >  	my ($email1, $email2) = @_;
> >  
> > @@ -2347,6 +2357,7 @@ sub process {
> >  	my $signoff = 0;
> >  	my $author = '';
> >  	my $authorsignoff = 0;
> > +	my $prevheader = '';
> >  	my $is_patch = 0;
> >  	my $is_binding_patch = -1;
> >  	my $in_header_lines = $file ? 0 : 1;
> > @@ -2658,12 +2669,21 @@ sub process {
> >  			}
> >  		}
> >  
> > +# Check the patch for a split From:
> > +		if($prevheader ne '') {
> > +			if ($author eq '' && decode("MIME-Header", $prevheader) =~ /^From:\s*(.*)/) {
> > +				my $email = $1.$line;
> > +				$author = format_author_email($email, $prevheader);
> > +			}
> > +			$prevheader = '';
> > +		}
> > +
> >  # Check the patch for a From:
> >  		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > -			$author = $1;
> > -			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > -			$author =~ s/"//g;
> > -			$author = reformat_email($author);
> > +			$author = format_author_email($1, $line);
> > +			if($author eq '') {
> > +				$prevheader = $line;
> > +			}
> >  		}
> >  
> >  # Check the patch for a signoff:
> 
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 18:12   ` Lukas Bulwahn
@ 2020-09-19 18:19     ` Joe Perches
  2020-09-19 18:36       ` Lukas Bulwahn
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2020-09-19 18:19 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: apw, Dwaipayan Ray, linux-kernel-mentees, linux-kernel

On Sat, 2020-09-19 at 20:12 +0200, Lukas Bulwahn wrote:
> 
> On Sat, 19 Sep 2020, Joe Perches wrote:
> 
> > On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > > Checkpatch did not handle cases where the author From: header
> > > was split into two lines. The author string went empty and
> > > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> > 
> > It's good to provide an example where the current code
> > doesn't work.
> > 
> 
> Joe, as this is a linux-kernel-mentees patch, we discussed that before 
> reaching out to you; you can find Dwaipayan's own evaluation here:
> 
> https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/
> 
> Dwaipayan, Joe's comment is still valid; it would be good to describe
> the reasons why patches might have split lines (as far as see, long 
> encodings for non-ascii names).
> 
> I will run my own evaluation of checkpatch.pl before and after patch 
> application on Monday and then check if I can confirm Dwaipayan's results.
> 
> > It likely would be better to do this by searching forward for
> > any extension lines after a "^From:' rather than searching
> > backwards as there can be any number of extension lines.
> > 
> 
> Just to sure what you are talking about...
> 
> You mean just to access the next line through the lines array, rather 
> than using prevheader and trying to decode that one line twice.
> 
> I agree the logic is a bit redundant and complicated at the moment.
> 
> Once prevheader is non-empty, it already clear that author is '' and 
> prevheader decodes with that match, because that is the only way to
> make prevheader non-empty in the first place; at least as far I see it 
> right now.

Yeah, something like this (completely untested):
---
 scripts/checkpatch.pl | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3e474072aa90..2c710d05b184 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2679,9 +2679,13 @@ sub process {
 		}
 
 # Check the patch for a From:
-		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
+		if ($line =~ /^From:\s*(.*)/i) {
 			$author = $1;
-			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
+			my $curline = $linenr;
+			while (defined($rawlines[$curline] && $rawlines[$curline++] =~ /^ \s*(.*)/) {
+				$author .= $1;
+			}
+			$author = encode("utf8", $author) if ($author =~ /=\?utf-8\?/i);
 			$author =~ s/"//g;
 			$author = reformat_email($author);
 		}


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 18:19     ` Joe Perches
@ 2020-09-19 18:36       ` Lukas Bulwahn
  2020-09-19 19:38         ` Dwaipayan Ray
  0 siblings, 1 reply; 7+ messages in thread
From: Lukas Bulwahn @ 2020-09-19 18:36 UTC (permalink / raw)
  To: Joe Perches; +Cc: apw, linux-kernel-mentees, linux-kernel, Dwaipayan Ray



On Sat, 19 Sep 2020, Joe Perches wrote:

> On Sat, 2020-09-19 at 20:12 +0200, Lukas Bulwahn wrote:
> > 
> > On Sat, 19 Sep 2020, Joe Perches wrote:
> > 
> > > On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > > > Checkpatch did not handle cases where the author From: header
> > > > was split into two lines. The author string went empty and
> > > > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> > > 
> > > It's good to provide an example where the current code
> > > doesn't work.
> > > 
> > 
> > Joe, as this is a linux-kernel-mentees patch, we discussed that before 
> > reaching out to you; you can find Dwaipayan's own evaluation here:
> > 
> > https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/
> > 
> > Dwaipayan, Joe's comment is still valid; it would be good to describe
> > the reasons why patches might have split lines (as far as see, long 
> > encodings for non-ascii names).
> > 
> > I will run my own evaluation of checkpatch.pl before and after patch 
> > application on Monday and then check if I can confirm Dwaipayan's results.
> > 
> > > It likely would be better to do this by searching forward for
> > > any extension lines after a "^From:' rather than searching
> > > backwards as there can be any number of extension lines.
> > > 
> > 
> > Just to sure what you are talking about...
> > 
> > You mean just to access the next line through the lines array, rather 
> > than using prevheader and trying to decode that one line twice.
> > 
> > I agree the logic is a bit redundant and complicated at the moment.
> > 
> > Once prevheader is non-empty, it already clear that author is '' and 
> > prevheader decodes with that match, because that is the only way to
> > make prevheader non-empty in the first place; at least as far I see it 
> > right now.
> 
> Yeah, something like this (completely untested):
> ---
>  scripts/checkpatch.pl | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 3e474072aa90..2c710d05b184 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2679,9 +2679,13 @@ sub process {
>  		}
>  
>  # Check the patch for a From:
> -		if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> +		if ($line =~ /^From:\s*(.*)/i) {
>  			$author = $1;
> -			$author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> +			my $curline = $linenr;
> +			while (defined($rawlines[$curline] && $rawlines[$curline++] =~ /^ \s*(.*)/) {
> +				$author .= $1;
> +			}
> +			$author = encode("utf8", $author) if ($author =~ /=\?utf-8\?/i);
>  			$author =~ s/"//g;
>  			$author = reformat_email($author);
>  		}
>

Yeah, I get how you would like to see that being implemented. I will work 
with Dwaipayan to get that properly implemented, properly described and 
tested.

But let us keep the fun of that task to Dwaipayan... that is what a 
mentorship is all about :)

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 18:36       ` Lukas Bulwahn
@ 2020-09-19 19:38         ` Dwaipayan Ray
  2020-09-19 19:47           ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Dwaipayan Ray @ 2020-09-19 19:38 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: Joe Perches, apw, linux-kernel-mentees, linux-kernel

On Sun, Sep 20, 2020 at 12:06 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
>
>
> On Sat, 19 Sep 2020, Joe Perches wrote:
>
> > On Sat, 2020-09-19 at 20:12 +0200, Lukas Bulwahn wrote:
> > >
> > > On Sat, 19 Sep 2020, Joe Perches wrote:
> > >
> > > > On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > > > > Checkpatch did not handle cases where the author From: header
> > > > > was split into two lines. The author string went empty and
> > > > > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> > > >
> > > > It's good to provide an example where the current code
> > > > doesn't work.
> > > >
> > >
> > > Joe, as this is a linux-kernel-mentees patch, we discussed that before
> > > reaching out to you; you can find Dwaipayan's own evaluation here:
> > >
> > > https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/
> > >
> > > Dwaipayan, Joe's comment is still valid; it would be good to describe
> > > the reasons why patches might have split lines (as far as see, long
> > > encodings for non-ascii names).
> > >
> > > I will run my own evaluation of checkpatch.pl before and after patch
> > > application on Monday and then check if I can confirm Dwaipayan's results.
> > >
> > > > It likely would be better to do this by searching forward for
> > > > any extension lines after a "^From:' rather than searching
> > > > backwards as there can be any number of extension lines.
> > > >
> > >
> > > Just to sure what you are talking about...
> > >
> > > You mean just to access the next line through the lines array, rather
> > > than using prevheader and trying to decode that one line twice.
> > >
> > > I agree the logic is a bit redundant and complicated at the moment.
> > >
> > > Once prevheader is non-empty, it already clear that author is '' and
> > > prevheader decodes with that match, because that is the only way to
> > > make prevheader non-empty in the first place; at least as far I see it
> > > right now.
> >
> > Yeah, something like this (completely untested):
> > ---
> >  scripts/checkpatch.pl | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 3e474072aa90..2c710d05b184 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -2679,9 +2679,13 @@ sub process {
> >               }
> >
> >  # Check the patch for a From:
> > -             if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > +             if ($line =~ /^From:\s*(.*)/i) {
> >                       $author = $1;
> > -                     $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > +                     my $curline = $linenr;
> > +                     while (defined($rawlines[$curline] && $rawlines[$curline++] =~ /^ \s*(.*)/) {
> > +                             $author .= $1;
> > +                     }
> > +                     $author = encode("utf8", $author) if ($author =~ /=\?utf-8\?/i);
> >                       $author =~ s/"//g;
> >                       $author = reformat_email($author);
> >               }
> >
>

Hi,

Yeah I think the backwards checking was pretty redundant after all. If the
extended encoding went too long, the From: header would be split into
more than two lines and my proposed solution would fail.

Thanks for the heads up, Joe!

> Yeah, I get how you would like to see that being implemented. I will work
> with Dwaipayan to get that properly implemented, properly described and
> tested.
>
> But let us keep the fun of that task to Dwaipayan... that is what a
> mentorship is all about :)
>
> Lukas

Yes definitely, the task is interesting for me, and I would like to solve
it in a proper way.

As for the fix, shouldn't we stop the author string concatenation once
an email address is found? something like:

  last if  $rawlines[$curline] = ~/^\s*(\S+\@\S+)\s*/

I will update the patch and sync up with Lukas on this.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
  2020-09-19 19:38         ` Dwaipayan Ray
@ 2020-09-19 19:47           ` Joe Perches
  0 siblings, 0 replies; 7+ messages in thread
From: Joe Perches @ 2020-09-19 19:47 UTC (permalink / raw)
  To: Dwaipayan Ray, Lukas Bulwahn; +Cc: apw, linux-kernel-mentees, linux-kernel

On Sun, 2020-09-20 at 01:08 +0530, Dwaipayan Ray wrote:
> On Sun, Sep 20, 2020 at 12:06 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> > On Sat, 19 Sep 2020, Joe Perches wrote:
> > > On Sat, 2020-09-19 at 20:12 +0200, Lukas Bulwahn wrote:
> > > > On Sat, 19 Sep 2020, Joe Perches wrote:
> > > > > On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > > > > > Checkpatch did not handle cases where the author From: header
> > > > > > was split into two lines. The author string went empty and
> > > > > > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> > > > > 
> > > > > It's good to provide an example where the current code
> > > > > doesn't work.
> > > > > 
> > > > Joe, as this is a linux-kernel-mentees patch, we discussed that before
> > > > reaching out to you; you can find Dwaipayan's own evaluation here:
> > > > 
> > > > https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/
> > > > 
> > > > Dwaipayan, Joe's comment is still valid; it would be good to describe
> > > > the reasons why patches might have split lines (as far as see, long
> > > > encodings for non-ascii names).
> > > > 
> > > > I will run my own evaluation of checkpatch.pl before and after patch
> > > > application on Monday and then check if I can confirm Dwaipayan's results.
> > > > 
> > > > > It likely would be better to do this by searching forward for
> > > > > any extension lines after a "^From:' rather than searching
> > > > > backwards as there can be any number of extension lines.
> > > > > 
> > > > Just to sure what you are talking about...
> > > > 
> > > > You mean just to access the next line through the lines array, rather
> > > > than using prevheader and trying to decode that one line twice.
> > > > 
> > > > I agree the logic is a bit redundant and complicated at the moment.
> > > > 
> > > > Once prevheader is non-empty, it already clear that author is '' and
> > > > prevheader decodes with that match, because that is the only way to
> > > > make prevheader non-empty in the first place; at least as far I see it
> > > > right now.
> > > 
> > > Yeah, something like this (completely untested):
> > > ---
> > >  scripts/checkpatch.pl | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 3e474072aa90..2c710d05b184 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -2679,9 +2679,13 @@ sub process {
> > >               }
> > > 
> > >  # Check the patch for a From:
> > > -             if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > > +             if ($line =~ /^From:\s*(.*)/i) {
> > >                       $author = $1;
> > > -                     $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > > +                     my $curline = $linenr;
> > > +                     while (defined($rawlines[$curline] && $rawlines[$curline++] =~ /^ \s*(.*)/) {
> > > +                             $author .= $1;
> > > +                     }
> > > +                     $author = encode("utf8", $author) if ($author =~ /=\?utf-8\?/i);
> > >                       $author =~ s/"//g;
> > >                       $author = reformat_email($author);
> > >               }
> > > 
> 
> Hi,
> 
> Yeah I think the backwards checking was pretty redundant after all. If the
> extended encoding went too long, the From: header would be split into
> more than two lines and my proposed solution would fail.
> 
> Thanks for the heads up, Joe!
> 
> > Yeah, I get how you would like to see that being implemented. I will work
> > with Dwaipayan to get that properly implemented, properly described and
> > tested.
> > 
> > But let us keep the fun of that task to Dwaipayan... that is what a
> > mentorship is all about :)
> > 
> > Lukas
> 
> Yes definitely, the task is interesting for me, and I would like to solve
> it in a proper way.
> 
> As for the fix, shouldn't we stop the author string concatenation once
> an email address is found? something like:
> 
>   last if  $rawlines[$curline] = ~/^\s*(\S+\@\S+)\s*/

Probably not.

I think it should follow the rfc standard with extension
lines starting with a space.

See rfc 5322, 2.2.3 Long Header Fields

> I will update the patch and sync up with Lukas on this.

Enjoy.

I believe I now have a working version, we can compare later.

cheers, Joe

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-19 19:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-19  8:12 [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header Dwaipayan Ray
2020-09-19 17:26 ` Joe Perches
2020-09-19 18:12   ` Lukas Bulwahn
2020-09-19 18:19     ` Joe Perches
2020-09-19 18:36       ` Lukas Bulwahn
2020-09-19 19:38         ` Dwaipayan Ray
2020-09-19 19:47           ` Joe Perches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).