All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
@ 2017-03-21 16:30 John 'Warthog9' Hawley (VMware)
  2017-03-21 18:31 ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: John 'Warthog9' Hawley (VMware) @ 2017-03-21 16:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joe Perches, Andy Whitcroft, Darren Hart (VMware),
	John 'Warthog9' Hawley (VMware)

Spamassassin sticks a long (~79 character) long string after a
line that has a single space in it. The line with space causes
checkpatch to erroniously think that it's in the content body, as
opposed to headers and thus flag a mail header as an unwrapped long
comment line.

This flags when X-Spam-Report is found, till an e-mail
body indicator (blank line, /^[/n/r]*/) is found, blocking setting
$in_commit_log.  When found, unsets itself allowing the rest of
the detection to function normally.

Reported-by: Darren Hart (VMware) <dvhart@infradead.org>
Signed-off-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
---
 scripts/checkpatch.pl | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index baa3c7b..d2ce89a 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2134,6 +2134,7 @@ sub process {
 	my $signoff = 0;
 	my $is_patch = 0;
 	my $in_header_lines = $file ? 0 : 1;
+	my $found_spam_header = 0;
 	my $in_commit_log = 0;		#Scanning lines before patch
 	my $has_commit_log = 0;		#Encountered lines before patch
 	my $commit_log_possible_stack_dump = 0;
@@ -2279,6 +2280,12 @@ sub process {
 
 		my $rawline = $rawlines[$linenr - 1];
 
+#if we encounter a spamassassin mail header, mark it
+		if ($in_header_lines == 1 && $line =~ /^X-Spam-Report:/) {
+			#mail header found, this needs to be flagged
+			$found_spam_header = 1;
+		}
+
 #extract the line range in the file after the patch is applied
 		if (!$in_commit_log &&
 		    $line =~ /^\@\@ -\d+(?:,\d+)? \+(\d+)(,(\d+))? \@\@/) {
@@ -2627,9 +2634,15 @@ sub process {
 
 # Check if it's the start of a commit log
 # (not a header line and we haven't seen the patch filename)
+		if ($in_header_lines && $found_spam_header && $line =~ /^[\n\r]*$/) {
+			# we are now past the header info that could be confusing
+			$found_spam_header = 0;
+		}
+
 		if ($in_header_lines && $realfile =~ /^$/ &&
 		    !($rawline =~ /^\s+\S/ ||
-		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
+		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i) &&
+		    !$found_spam_header) {
 			$in_header_lines = 0;
 			$in_commit_log = 1;
 			$has_commit_log = 1;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-21 16:30 [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings John 'Warthog9' Hawley (VMware)
@ 2017-03-21 18:31 ` Joe Perches
  2017-03-22 15:25   ` Darren Hart
  2017-03-24 20:14   ` John 'Warthog9' Hawley
  0 siblings, 2 replies; 8+ messages in thread
From: Joe Perches @ 2017-03-21 18:31 UTC (permalink / raw)
  To: John 'Warthog9' Hawley (VMware), linux-kernel
  Cc: Andy Whitcroft, Darren Hart (VMware)

On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
> Spamassassin sticks a long (~79 character) long string after a
> line that has a single space in it. The line with space causes
> checkpatch to erroniously think that it's in the content body, as
> opposed to headers and thus flag a mail header as an unwrapped long
> comment line.

If the spammassassin header is like

email-header-n: foo
email-header-m: bar
 
X-Spam-Report: bar

Does that form follow rfc 5322?

If it does then any email header could have that
form and the header wrapping test should be
updated from

		if ($in_header_lines && $realfile =~ /^$/ &&
		    !($rawline =~ /^\s+\S/ ||
		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
			$in_header_lines = 0;
			$in_commit_log = 1;
			$has_commit_log = 1;
		}

to something like

		if ($in_header_lines && $realfile =~ /^$/ &&
		    !($rawline =~ /^ (?:\s*\S|$)/ ||
		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-21 18:31 ` Joe Perches
@ 2017-03-22 15:25   ` Darren Hart
  2017-03-22 18:17     ` Joe Perches
  2017-03-24 20:14   ` John 'Warthog9' Hawley
  1 sibling, 1 reply; 8+ messages in thread
From: Darren Hart @ 2017-03-22 15:25 UTC (permalink / raw)
  To: Joe Perches
  Cc: John 'Warthog9' Hawley (VMware), linux-kernel, Andy Whitcroft

On Tue, Mar 21, 2017 at 11:31:08AM -0700, Joe Perches wrote:
> On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
> > Spamassassin sticks a long (~79 character) long string after a
> > line that has a single space in it. The line with space causes
> > checkpatch to erroniously think that it's in the content body, as
> > opposed to headers and thus flag a mail header as an unwrapped long
> > comment line.
> 
> If the spammassassin header is like
> 
> email-header-n: foo
> email-header-m: bar
>  
> X-Spam-Report: bar

The specific content of the X-Spam-Report that triggers this for me,
from this patch for example, is:

=== 8< ===
X-Spam-Report: SpamAssassin version 3.4.1 on casper.infradead.org summary:
 Content analysis details:   (-1.9 points, 5.0 required)
 
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-TUID: alGBIuPZmqOj

=== >8 ===

The long ---- ----... line is over 75 characters and triggers the test
for long commit_log lines.

> 
> Does that form follow rfc 5322?

By my reading, this is governed by the long header fields defined by
2.2.3, with whitespace folding defined as "a CRLF may be inserted before
any WSP."

> 
> If it does then any email header could have that
> form and the header wrapping test should be

Yes, agreed.

So the logic we want is:

If we are in headers and we detect a CRLF and the next line starts with a WSP,
then we are still in headers (and therefor not in the commit log). The CRLF
information does not appear to be available as it is replaced with just \n.

> updated from
> 
> 		if ($in_header_lines && $realfile =~ /^$/ &&
> 		    !($rawline =~ /^\s+\S/ ||
> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> 			$in_header_lines = 0;
> 			$in_commit_log = 1;
> 			$has_commit_log = 1;
> 		}
> 
> to something like
> 
> 		if ($in_header_lines && $realfile =~ /^$/ &&
> 		    !($rawline =~ /^ (?:\s*\S|$)/ ||

Hrm... lines that start with maybe a space followed by a : ... Why did you
introduce that part of the check?

Looking at this more closely, I was also not clear why the original test looked
for several spaces followed by non-space. What case is this for?

> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> 
> 

Thanks,

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-22 15:25   ` Darren Hart
@ 2017-03-22 18:17     ` Joe Perches
  2017-03-23  6:01       ` Darren Hart
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2017-03-22 18:17 UTC (permalink / raw)
  To: Darren Hart
  Cc: John 'Warthog9' Hawley (VMware), linux-kernel, Andy Whitcroft

On Wed, 2017-03-22 at 08:25 -0700, Darren Hart wrote:
> On Tue, Mar 21, 2017 at 11:31:08AM -0700, Joe Perches wrote:
> > On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
> > > Spamassassin sticks a long (~79 character) long string after a
> > > line that has a single space in it. The line with space causes
> > > checkpatch to erroniously think that it's in the content body, as
> > > opposed to headers and thus flag a mail header as an unwrapped long
> > > comment line.
> > 
> > If the spammassassin header is like
> > 
> > email-header-n: foo
> > email-header-m: bar
> >  
> > X-Spam-Report: bar
> 
> The specific content of the X-Spam-Report that triggers this for me,
> from this patch for example, is:
> 
> === 8< ===
> X-Spam-Report: SpamAssassin version 3.4.1 on casper.infradead.org summary:
>  Content analysis details:   (-1.9 points, 5.0 required)
>  
>   pts rule name              description
>  ---- ---------------------- --------------------------------------------------
>  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
>                              [score: 0.0000]
> X-TUID: alGBIuPZmqOj
> 
> === >8 ===
> 
> The long ---- ----... line is over 75 characters and triggers the test
> for long commit_log lines.
> 
> > 
> > Does that form follow rfc 5322?
> 
> By my reading, this is governed by the long header fields defined by
> 2.2.3, with whitespace folding defined as "a CRLF may be inserted before
> any WSP."
> 
> > 
> > If it does then any email header could have that
> > form and the header wrapping test should be
> 
> Yes, agreed.
> 
> So the logic we want is:
> 
> If we are in headers and we detect a CRLF and the next line starts with a WSP,
> then we are still in headers (and therefor not in the commit log). The CRLF
> information does not appear to be available as it is replaced with just \n.
> 
> > updated from
> > 
> > 		if ($in_header_lines && $realfile =~ /^$/ &&
> > 		    !($rawline =~ /^\s+\S/ ||
> > 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> > 			$in_header_lines = 0;
> > 			$in_commit_log = 1;
> > 			$has_commit_log = 1;
> > 		}
> > 
> > to something like
> > 
> > 		if ($in_header_lines && $realfile =~ /^$/ &&
> > 		    !($rawline =~ /^ (?:\s*\S|$)/ ||
> 
> Hrm... lines that start with maybe a space followed by a : ... Why did you
> introduce that part of the check?

The regex doesn't care about colons.
It's a perl non-capturing group.
https://perldoc.perl.org/perlretut.html#Non-capturing-groupings

> Looking at this more closely, I was also not clear why the original test looked
> for several spaces followed by non-space. What case is this for?

Not several spaces, one or more spaces then a non-space.
The only change here is allowing an initial space followed
by either:

	1: optional spaces, then non-space.
	2: EOL

I supposed you could argue that case 2 should
also allow optional spaces before EOL and the
test should be

		if ($in_header_lines && $realfile =~ /^$/ &&
		    !($rawline =~ /^\s+(?:\S|$)/ ||
...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-22 18:17     ` Joe Perches
@ 2017-03-23  6:01       ` Darren Hart
  2017-03-23  6:07         ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Darren Hart @ 2017-03-23  6:01 UTC (permalink / raw)
  To: Joe Perches
  Cc: John 'Warthog9' Hawley (VMware), linux-kernel, Andy Whitcroft

On Wed, Mar 22, 2017 at 11:17:33AM -0700, Joe Perches wrote:
> On Wed, 2017-03-22 at 08:25 -0700, Darren Hart wrote:
> > On Tue, Mar 21, 2017 at 11:31:08AM -0700, Joe Perches wrote:
> > > On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
> > > > Spamassassin sticks a long (~79 character) long string after a
> > > > line that has a single space in it. The line with space causes
> > > > checkpatch to erroniously think that it's in the content body, as
> > > > opposed to headers and thus flag a mail header as an unwrapped long
> > > > comment line.
> > > 
> > > If the spammassassin header is like
> > > 
> > > email-header-n: foo
> > > email-header-m: bar
> > >  
> > > X-Spam-Report: bar
> > 
> > The specific content of the X-Spam-Report that triggers this for me,
> > from this patch for example, is:
> > 
> > === 8< ===
> > X-Spam-Report: SpamAssassin version 3.4.1 on casper.infradead.org summary:
> >  Content analysis details:   (-1.9 points, 5.0 required)
> >  
> >   pts rule name              description
> >  ---- ---------------------- --------------------------------------------------
> >  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
> >                              [score: 0.0000]
> > X-TUID: alGBIuPZmqOj
> > 
> > === >8 ===
> > 
> > The long ---- ----... line is over 75 characters and triggers the test
> > for long commit_log lines.
> > 
> > > 
> > > Does that form follow rfc 5322?
> > 
> > By my reading, this is governed by the long header fields defined by
> > 2.2.3, with whitespace folding defined as "a CRLF may be inserted before
> > any WSP."
> > 
> > > 
> > > If it does then any email header could have that
> > > form and the header wrapping test should be
> > 
> > Yes, agreed.
> > 
> > So the logic we want is:
> > 
> > If we are in headers and we detect a CRLF and the next line starts with a WSP,
> > then we are still in headers (and therefor not in the commit log). The CRLF
> > information does not appear to be available as it is replaced with just \n.
> > 
> > > updated from
> > > 
> > > 		if ($in_header_lines && $realfile =~ /^$/ &&
> > > 		    !($rawline =~ /^\s+\S/ ||
> > > 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> > > 			$in_header_lines = 0;
> > > 			$in_commit_log = 1;
> > > 			$has_commit_log = 1;
> > > 		}
> > > 
> > > to something like
> > > 
> > > 		if ($in_header_lines && $realfile =~ /^$/ &&
> > > 		    !($rawline =~ /^ (?:\s*\S|$)/ ||
> > 
> > Hrm... lines that start with maybe a space followed by a : ... Why did you
> > introduce that part of the check?
> 
> The regex doesn't care about colons.
> It's a perl non-capturing group.
> https://perldoc.perl.org/perlretut.html#Non-capturing-groupings

Aha, thank you for the pointer.

> 
> > Looking at this more closely, I was also not clear why the original test looked
> > for several spaces followed by non-space. What case is this for?
> 
> Not several spaces, one or more spaces then a non-space.
> The only change here is allowing an initial space followed
> by either:
> 
> 	1: optional spaces, then non-space.
> 	2: EOL
> 
> I supposed you could argue that case 2 should
> also allow optional spaces before EOL and the
> test should be
> 
> 		if ($in_header_lines && $realfile =~ /^$/ &&
> 		    !($rawline =~ /^\s+(?:\S|$)/ ||

I still haven't figured out why we test for this specific set of patterns. Why
is a line that starts with a space and ends with a newline considered still
in_header_lines. Or more specifically, why aren't we just testing for an empty
line (RFC 5322 Section 2.1, defining the separation of headers and the body).

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-23  6:01       ` Darren Hart
@ 2017-03-23  6:07         ` Joe Perches
  0 siblings, 0 replies; 8+ messages in thread
From: Joe Perches @ 2017-03-23  6:07 UTC (permalink / raw)
  To: Darren Hart
  Cc: John 'Warthog9' Hawley (VMware), linux-kernel, Andy Whitcroft

On Wed, 2017-03-22 at 23:01 -0700, Darren Hart wrote:
> I still haven't figured out why we test for this specific set of patterns. Why
> is a line that starts with a space and ends with a newline considered still
> in_header_lines. Or more specifically, why aren't we just testing for an empty
> line (RFC 5322 Section 2.1, defining the separation of headers and the body).

Because of the From: and commit: lines sometimes used to describe
patches sent on behalf of others.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-21 18:31 ` Joe Perches
  2017-03-22 15:25   ` Darren Hart
@ 2017-03-24 20:14   ` John 'Warthog9' Hawley
  2017-03-24 20:19     ` John 'Warthog9' Hawley
  1 sibling, 1 reply; 8+ messages in thread
From: John 'Warthog9' Hawley @ 2017-03-24 20:14 UTC (permalink / raw)
  To: Joe Perches, linux-kernel; +Cc: Andy Whitcroft, Darren Hart (VMware)

On 03/21/2017 11:31 AM, Joe Perches wrote:
> On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
>> Spamassassin sticks a long (~79 character) long string after a
>> line that has a single space in it. The line with space causes
>> checkpatch to erroniously think that it's in the content body, as
>> opposed to headers and thus flag a mail header as an unwrapped long
>> comment line.
> 
> If the spammassassin header is like
> 
> email-header-n: foo
> email-header-m: bar
>  
> X-Spam-Report: bar
> 
> Does that form follow rfc 5322?

It does look that way

> If it does then any email header could have that
> form and the header wrapping test should be
> updated from
> 
> 		if ($in_header_lines && $realfile =~ /^$/ &&
> 		    !($rawline =~ /^\s+\S/ ||
> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> 			$in_header_lines = 0;
> 			$in_commit_log = 1;
> 			$has_commit_log = 1;
> 		}
> 
> to something like
> 
> 		if ($in_header_lines && $realfile =~ /^$/ &&
> 		    !($rawline =~ /^ (?:\s*\S|$)/ ||
> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
> 

So this seems to fix the specific issue we were tripping over, but in
doing so causes some other problems on some of the other headers in the
message we are using for testing (3 new warnings, and an error),
specifically flagging a:
	- Warning on 'Received:' header
	- Warning on 'DKIM-Signature:' header (twice, for both leading and
trailing white space on To: and From:)
	- Erroring on the DKIM-Signature as well

Noting the DKIM-Signature, in this case, is also a multi-line header message

This resolves the issue we were seeing, and doesn't (at least in my test
cases), cause any new errors:

	if ($in_header_lines && $realfile =~ /^$/ &&
		!(
			(
				$rawline =~ /^\s+\S*/
				&&
				$rawline !~ /^[\r\n]+$/
			)
			||
			$rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)
	) {

as the line that's causing issues, /^ [\r\n]+$/, wouldn't get
incorrectly caught as the end of the headers.

- John 'Warthog9' Hawley

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings
  2017-03-24 20:14   ` John 'Warthog9' Hawley
@ 2017-03-24 20:19     ` John 'Warthog9' Hawley
  0 siblings, 0 replies; 8+ messages in thread
From: John 'Warthog9' Hawley @ 2017-03-24 20:19 UTC (permalink / raw)
  To: Joe Perches, linux-kernel; +Cc: Andy Whitcroft, Darren Hart (VMware)

You can disregard this, I missed the other thread and that looks fine.

On 03/24/2017 01:14 PM, John 'Warthog9' Hawley wrote:
> On 03/21/2017 11:31 AM, Joe Perches wrote:
>> On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote:
>>> Spamassassin sticks a long (~79 character) long string after a
>>> line that has a single space in it. The line with space causes
>>> checkpatch to erroniously think that it's in the content body, as
>>> opposed to headers and thus flag a mail header as an unwrapped long
>>> comment line.
>>
>> If the spammassassin header is like
>>
>> email-header-n: foo
>> email-header-m: bar
>>  
>> X-Spam-Report: bar
>>
>> Does that form follow rfc 5322?
> 
> It does look that way
> 
>> If it does then any email header could have that
>> form and the header wrapping test should be
>> updated from
>>
>> 		if ($in_header_lines && $realfile =~ /^$/ &&
>> 		    !($rawline =~ /^\s+\S/ ||
>> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
>> 			$in_header_lines = 0;
>> 			$in_commit_log = 1;
>> 			$has_commit_log = 1;
>> 		}
>>
>> to something like
>>
>> 		if ($in_header_lines && $realfile =~ /^$/ &&
>> 		    !($rawline =~ /^ (?:\s*\S|$)/ ||
>> 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
>>
> 
> So this seems to fix the specific issue we were tripping over, but in
> doing so causes some other problems on some of the other headers in the
> message we are using for testing (3 new warnings, and an error),
> specifically flagging a:
> 	- Warning on 'Received:' header
> 	- Warning on 'DKIM-Signature:' header (twice, for both leading and
> trailing white space on To: and From:)
> 	- Erroring on the DKIM-Signature as well
> 
> Noting the DKIM-Signature, in this case, is also a multi-line header message
> 
> This resolves the issue we were seeing, and doesn't (at least in my test
> cases), cause any new errors:
> 
> 	if ($in_header_lines && $realfile =~ /^$/ &&
> 		!(
> 			(
> 				$rawline =~ /^\s+\S*/
> 				&&
> 				$rawline !~ /^[\r\n]+$/
> 			)
> 			||
> 			$rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)
> 	) {
> 
> as the line that's causing issues, /^ [\r\n]+$/, wouldn't get
> incorrectly caught as the end of the headers.
> 
> - John 'Warthog9' Hawley
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-03-24 20:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-21 16:30 [PATCH] checkpatch: Flag spam header (X-Spam-Report) to prevent spurious warnings John 'Warthog9' Hawley (VMware)
2017-03-21 18:31 ` Joe Perches
2017-03-22 15:25   ` Darren Hart
2017-03-22 18:17     ` Joe Perches
2017-03-23  6:01       ` Darren Hart
2017-03-23  6:07         ` Joe Perches
2017-03-24 20:14   ` John 'Warthog9' Hawley
2017-03-24 20:19     ` John 'Warthog9' Hawley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.