linux-kernel-mentees.lists.linuxfoundation.org archive mirror
* [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
@ 2020-10-21 14:44 Aditya Srivastava
  2020-10-21 14:50 ` Lukas Bulwahn
  0 siblings, 1 reply; 28+ messages in thread
From: Aditya Srivastava @ 2020-10-21 14:44 UTC (permalink / raw)
  To: lukas.bulwahn; +Cc: dwaipayanray1, linux-kernel-mentees, Aditya Srivastava

The presence of a hexadecimal address or symbol results in a false
warning message from checkpatch.pl.

For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
memory leak in mptcp_subflow_create_socket()") results in warning:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....

Here, it reports 'ff' to be repeated, but it is in fact part of some
address or code, where it has to be repeated.
The intent of the warning, to find stylistic issues in commit
messages, is not met, and the warning is simply wrong in this case.

To avoid such reports, add a regex check that skips lines containing a
run of 4 or more two-character hex words separated by spaces.

A quick evaluation on v5.6..v5.8 showed that this fix reduces
REPEATED_WORD warnings from 2797 to 1043.

A quick manual check found all cases are related to hex output in
commit messages.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
---
 scripts/checkpatch.pl | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9b9ffd876e8a..78aeb7a3ca3d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3050,8 +3050,10 @@ sub process {
 			}
 		}
 
-# check for repeated words separated by a single space
-		if ($rawline =~ /^\+/ || $in_commit_log) {
+# check for repeated words separated by a single space and
+# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+		if (($rawline =~ /^\+/ || $in_commit_log) &&
+		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
 			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
 
 				my $first = $1;
-- 
2.17.1
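
The line filter added by this patch can be tried outside checkpatch. What
follows is a minimal sketch, not checkpatch code: the helper name
looks_like_hex_dump is hypothetical, and only the regex is taken from the
patch. The two sample lines are quoted from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of checkpatch.pl): returns 1 when a line
# contains a run of 4 or more two-character lowercase hex words, each
# followed by one or more spaces. The regex is the one from the patch.
sub looks_like_hex_dump {
	my ($line) = @_;
	return $line =~ /(\b[0-9a-f]{2}( )+){4,}/ ? 1 : 0;
}

my @samples = (
	"    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....",
	"+ * \@seg: index of packet segment whose raw fields are to be be extracted",
);

# "skip" means the REPEATED_WORD check would not run on that line.
foreach my $line (@samples) {
	printf("%-4s %s\n", looks_like_hex_dump($line) ? "skip" : "scan", $line);
}
```

Because the filter works on whole lines, a genuine repeated word on a line
that also contains a hex dump would no longer be reported.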

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 14:44 [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning Aditya Srivastava
@ 2020-10-21 14:50 ` Lukas Bulwahn
  0 siblings, 0 replies; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21 14:50 UTC (permalink / raw)
  To: Aditya Srivastava; +Cc: linux-kernel-mentees, dwaipayanray1



On Wed, 21 Oct 2020, Aditya Srivastava wrote:

> Presence of hexadecimal address or symbol results in false warning
> message by checkpatch.pl.
> 
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
>     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> 
> Here, it reports 'ff' to be repeated, but it is in fact part of some
> address or code, where it has to be repeated.
> In this case, the intent of the warning to find stylistic issues in
> commit messages is not met and the warning is just completely wrong in
> this case.
> 
> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more 2-lettered words separated by space in a line.
> 
> A quick evaluation on v5.6..v5.8 showed that this fix reduces
> REPEATED_WORD warnings from 2797 to 1043.
> 
> A quick manual check found all cases are related to hex output in
> commit messages.
>

Looks good to me. Send it to the general mailing list and Joe Perches to 
get more feedback and finally an Acked-by.

The merge window is currently open, so Joe probably has time, but Andrew
Morton, who picks up the patch, will probably need a week or two.

Please also CC this mailing list, Dwaipayan and me.

Lukas

> Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
> ---
>  scripts/checkpatch.pl | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9b9ffd876e8a..78aeb7a3ca3d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3050,8 +3050,10 @@ sub process {
>  			}
>  		}
>  
> -# check for repeated words separated by a single space
> -		if ($rawline =~ /^\+/ || $in_commit_log) {
> +# check for repeated words separated by a single space and
> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +		if (($rawline =~ /^\+/ || $in_commit_log) &&
> +		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
>  			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>  
>  				my $first = $1;
> -- 
> 2.17.1
> 
> 

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-22 14:21               ` Aditya
@ 2020-10-22 14:35                 ` Joe Perches
  0 siblings, 0 replies; 28+ messages in thread
From: Joe Perches @ 2020-10-22 14:35 UTC (permalink / raw)
  To: Aditya, Lukas Bulwahn
  Cc: Dwaipayan Ray, linux-kernel-mentees, Linux Kernel Mailing List

On Thu, 2020-10-22 at 19:51 +0530, Aditya wrote:
> > > Alright Sir.

Joe is fine, sir isn't necessary.
> Hi Sir
> I have implemented my solution. Should I send the patch in reply to
> this mail or as a different mail? Also should I label it as v2? I have
> also addressed the warnings out of list command output in it. for eg.

Either way works.



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 19:12             ` Lukas Bulwahn
@ 2020-10-22 14:21               ` Aditya
  2020-10-22 14:35                 ` Joe Perches
  0 siblings, 1 reply; 28+ messages in thread
From: Aditya @ 2020-10-22 14:21 UTC (permalink / raw)
  To: Lukas Bulwahn
  Cc: Joe Perches, Dwaipayan Ray, linux-kernel-mentees,
	Linux Kernel Mailing List

On 22/10/20 12:42 am, Lukas Bulwahn wrote:
> On Wed, Oct 21, 2020 at 8:25 PM Aditya <yashsri421@gmail.com> wrote:
>>
>> On 21/10/20 11:35 pm, Joe Perches wrote:
>>> On Wed, 2020-10-21 at 23:25 +0530, Aditya wrote:
>>>> Thanks for your feedback. I ran a manual check using this approach
>>>> over v5.6..v5.8.
>>>> The negatives occurring with this approach are for the word 'be'
>>>> (Frequency 5) and 'add'(Frequency 1). For eg.
>>>>
>>>> WARNING:REPEATED_WORD: Possible repeated word: 'be'
>>>> #278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
>>>> + * @seg: index of packet segment whose raw fields are to be be extracted
>>>>
>>>> WARNING:REPEATED_WORD: Possible repeated word: 'add'
>>>> #21:
>>>> Let's also add add a note about using only the l3 access without l4
>>>>
>>>> Apart from these, it works as expected. It also takes into account the
>>>> cases for multiple occurrences of hex, as you mentioned. For eg.
>>>>
>>>> WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
>>>> #15:
>>> []
>>>> I'll try to combine both methods and come up with a better approach.
>>>
>>> Enjoy, but please consider:
>>>
>>> If for over 30K patches, there are just a few false positives and
>>> a few false negatives, it likely doesn't need much improvement...
>>>
>>> checkpatch works on patch contexts.
>>>
>>> It's not intended to be perfect.
>>>
>>> It's just a little tool that can help avoid some common defects.
>>>
>>>
>>
>> Alright Sir. Then, we can proceed with the method you suggested, as it
>> is more or less perfect.
>> I'll re-send the patch with modified reduced warning figure.
>>
> 
> Aditya, you can also choose to implement your solution;
> yes, it is more work for you but it also seems to function better in
> the long run.
> 
> Clearly, Joe would settle for a simpler solution, but his TODO list of
> topics to engage in and work on is also much longer...
> 
> Lukas
> 

Hi Sir,
I have implemented my solution. Should I send the patch in reply to
this mail or as a separate mail? Also, should I label it as v2? It
also addresses the warnings triggered by 'ls -l' output, for example:

WARNING:REPEATED_WORD: Possible repeated word: 'root'
#18:
  drwxr-xr-x. 2 root root    0 Apr 17 10:53 .

WARNING:REPEATED_WORD: Possible repeated word: 'nobody'
#28:
drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .

Sincerely
Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 19:26     ` Joe Perches
@ 2020-10-21 20:36       ` Joe Perches
  0 siblings, 0 replies; 28+ messages in thread
From: Joe Perches @ 2020-10-21 20:36 UTC (permalink / raw)
  To: Aditya; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Wed, 2020-10-21 at 12:26 -0700, Joe Perches wrote:

> Perhaps a regex for permissions is good enough
> 	$line !~ /\b[cbdl-][rwxs-]{9,9}\b/

Maybe not completely correct...

From info ls:

    The file type is one of the following characters:

     ‘-’
          regular file
     ‘b’
          block special file
     ‘c’
          character special file
     ‘C’
          high performance (“contiguous data”) file
     ‘d’
          directory
     ‘D’
          door (Solaris 2.5 and up)
     ‘l’
          symbolic link
     ‘M’
          off-line (“migrated”) file (Cray DMF)
     ‘n’
          network special file (HP-UX)
     ‘p’
          FIFO (named pipe)
     ‘P’
          port (Solaris 10 and up)
     ‘s’
          socket
     ‘?’
          some other file type

     The file mode bits listed are similar to symbolic mode
     specifications (*note Symbolic Modes::).  But ‘ls’ combines
     multiple bits into the third character of each set of permissions
     as follows:

     ‘s’
          If the set-user-ID or set-group-ID bit and the corresponding
          executable bit are both set.

     ‘S’
          If the set-user-ID or set-group-ID bit is set but the
          corresponding executable bit is not set.

     ‘t’
          If the restricted deletion flag or sticky bit, and the
          other-executable bit, are both set.  The restricted deletion
          flag is another name for the sticky bit.  *Note Mode
          Structure::.

     ‘T’
          If the restricted deletion flag or sticky bit is set but the
          other-executable bit is not set.

     ‘x’
          If the executable bit is set and none of the above apply.

     ‘-’
          Otherwise.

So apparently to be correct this should be:

	$line !~ /\b[bcCdDlMnpPs\?-][rwxsStT-]{9,9}\b/
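
The pattern above can be checked against the 'ls -l' samples quoted
elsewhere in this thread. This is only a sketch under stated assumptions:
the helper name has_ls_mode_string is hypothetical, and the regex is copied
unchanged from the line above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of checkpatch.pl): returns 1 when the line
# contains something shaped like an 'ls -l' mode string, according to the
# regex proposed in this mail.
sub has_ls_mode_string {
	my ($line) = @_;
	return $line =~ /\b[bcCdDlMnpPs\?-][rwxsStT-]{9,9}\b/ ? 1 : 0;
}

my @samples = (
	"  drwxr-xr-x. 2 root root    0 Apr 17 10:53 .",
	"drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .",
	"  -rw-r----- 1 irogers irogers 553 Apr 17 14:31 ../../../util/unwind-libdw.h",
);

foreach my $line (@samples) {
	printf("%-8s %s\n", has_ls_mode_string($line) ? "match" : "no match", $line);
}
```

One thing worth keeping in mind when trying this: \b only matches between
a word character and a non-word character, so a mode string that starts
and ends with '-', as in the third sample, is not matched by the pattern
as written.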



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 19:10   ` Aditya
@ 2020-10-21 19:26     ` Joe Perches
  2020-10-21 20:36       ` Joe Perches
  0 siblings, 1 reply; 28+ messages in thread
From: Joe Perches @ 2020-10-21 19:26 UTC (permalink / raw)
  To: Aditya; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Thu, 2020-10-22 at 00:40 +0530, Aditya wrote:
> On 21/10/20 8:48 pm, Joe Perches wrote:
> > On Wed, 2020-10-21 at 20:31 +0530, Aditya Srivastava wrote:
> > > Presence of hexadecimal address or symbol results in false warning
> > > message by checkpatch.pl.
> > > 
> > > For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> > > memory leak in mptcp_subflow_create_socket()") results in warning:
> > > 
> > > WARNING:REPEATED_WORD: Possible repeated word: 'ff'
> > >     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> > 
> > Right.
> > 
> > > To avoid all such reports, add an additional regex check for a repeating
> > > pattern of 4 or more 2-lettered words separated by space in a line.
> > > A quick evaluation on v5.6..v5.8 showed that this fix reduces
> > > REPEATED_WORD warnings from 2797 to 1043.
> > 
> > Are many of the other 1043 false positives?
> > Any pattern to them?
> > 
> Apart from the changes suggested by Dwaipayan in
> https://lore.kernel.org/linux-kernel-mentees/20201017162732.152351-1-dwaipayanray1@gmail.com/
> 
> The 'ls -l' output seems to be another common false positive for
> REPEATED_WORD (Frequency 106 over v5.6..v5.8). For eg.
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'root'
> #18:
>   drwxr-xr-x. 2 root root    0 Apr 17 10:53 .
[]
> @@ -3050,8 +3050,10 @@ sub process {
>  			}
>  		}
> 
> -		if ($rawline =~ /^\+/ || $in_commit_log) {
> +		if (($rawline =~ /^\+/ || $in_commit_log) &&
> +		$rawline !~ /\b[a-z-]+.* \d{1,3} [a-zA-Z]+ \w+ +\d+ \w{3} \d{1,2} \d{1,2}:\d{1,2}/) {

Perhaps a regex for permissions is good enough

	$line !~ /\b[cbdl-][rwxs-]{9,9}\b/



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 18:25           ` Aditya
@ 2020-10-21 19:12             ` Lukas Bulwahn
  2020-10-22 14:21               ` Aditya
  0 siblings, 1 reply; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21 19:12 UTC (permalink / raw)
  To: Aditya
  Cc: Joe Perches, Dwaipayan Ray, linux-kernel-mentees,
	Linux Kernel Mailing List

On Wed, Oct 21, 2020 at 8:25 PM Aditya <yashsri421@gmail.com> wrote:
>
> On 21/10/20 11:35 pm, Joe Perches wrote:
> > On Wed, 2020-10-21 at 23:25 +0530, Aditya wrote:
> >> Thanks for your feedback. I ran a manual check using this approach
> >> over v5.6..v5.8.
> >> The negatives occurring with this approach are for the word 'be'
> >> (Frequency 5) and 'add'(Frequency 1). For eg.
> >>
> >> WARNING:REPEATED_WORD: Possible repeated word: 'be'
> >> #278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
> >> + * @seg: index of packet segment whose raw fields are to be be extracted
> >>
> >> WARNING:REPEATED_WORD: Possible repeated word: 'add'
> >> #21:
> >> Let's also add add a note about using only the l3 access without l4
> >>
> >> Apart from these, it works as expected. It also takes into account the
> >> cases for multiple occurrences of hex, as you mentioned. For eg.
> >>
> >> WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
> >> #15:
> > []
> >> I'll try to combine both methods and come up with a better approach.
> >
> > Enjoy, but please consider:
> >
> > If for over 30K patches, there are just a few false positives and
> > a few false negatives, it likely doesn't need much improvement...
> >
> > checkpatch works on patch contexts.
> >
> > It's not intended to be perfect.
> >
> > It's just a little tool that can help avoid some common defects.
> >
> >
>
> Alright Sir. Then, we can proceed with the method you suggested, as it
> is more or less perfect.
> I'll re-send the patch with modified reduced warning figure.
>

Aditya, you can also choose to implement your solution;
yes, it is more work for you but it also seems to function better in
the long run.

Clearly, Joe would settle for a simpler solution, but his TODO list of
topics to engage in and work on is also much longer...

Lukas

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 15:18 ` Joe Perches
  2020-10-21 15:28   ` Joe Perches
@ 2020-10-21 19:10   ` Aditya
  2020-10-21 19:26     ` Joe Perches
  1 sibling, 1 reply; 28+ messages in thread
From: Aditya @ 2020-10-21 19:10 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On 21/10/20 8:48 pm, Joe Perches wrote:
> On Wed, 2020-10-21 at 20:31 +0530, Aditya Srivastava wrote:
>> Presence of hexadecimal address or symbol results in false warning
>> message by checkpatch.pl.
>>
>> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
>> memory leak in mptcp_subflow_create_socket()") results in warning:
>>
>> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
>>     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> 
> Right.
> 
>> To avoid all such reports, add an additional regex check for a repeating
>> pattern of 4 or more 2-lettered words separated by space in a line.
> 
>> A quick evaluation on v5.6..v5.8 showed that this fix reduces
>> REPEATED_WORD warnings from 2797 to 1043.
> 
> Are many of the other 1043 false positives?
> Any pattern to them?
> 

Apart from the changes suggested by Dwaipayan in
https://lore.kernel.org/linux-kernel-mentees/20201017162732.152351-1-dwaipayanray1@gmail.com/

The 'ls -l' output seems to be another common false positive for
REPEATED_WORD (106 occurrences over v5.6..v5.8). For example:

WARNING:REPEATED_WORD: Possible repeated word: 'root'
#18:
  drwxr-xr-x. 2 root root    0 Apr 17 10:53 .

WARNING:REPEATED_WORD: Possible repeated word: 'nobody'
#28:
drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .

WARNING:REPEATED_WORD: Possible repeated word: 'irogers'
#17:
  -rw-r----- 1 irogers irogers 553 Apr 17 14:31 ../../../util/unwind-libdw.h

These can be avoided by using:
@@ -3050,8 +3050,10 @@ sub process {
 			}
 		}

-		if ($rawline =~ /^\+/ || $in_commit_log) {
+		if (($rawline =~ /^\+/ || $in_commit_log) &&
+		$rawline !~ /\b[a-z-]+.* \d{1,3} [a-zA-Z]+ \w+ +\d+ \w{3} \d{1,2} \d{1,2}:\d{1,2}/) {
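
The long-listing regex can be tried on the three samples above. This is a
rough sketch, not checkpatch code: the helper name looks_like_ls_long_line
is hypothetical, and the regex assumes that the two wrapped lines of the
pattern are joined with a single space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of checkpatch.pl): returns 1 when the line
# looks like 'ls -l' output: mode string, link count, owner, group, size,
# month, day and hh:mm time.
sub looks_like_ls_long_line {
	my ($line) = @_;
	return $line =~ /\b[a-z-]+.* \d{1,3} [a-zA-Z]+ \w+ +\d+ \w{3} \d{1,2} \d{1,2}:\d{1,2}/ ? 1 : 0;
}

my @samples = (
	"  drwxr-xr-x. 2 root root    0 Apr 17 10:53 .",
	"drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .",
	"  -rw-r----- 1 irogers irogers 553 Apr 17 14:31 ../../../util/unwind-libdw.h",
	"Let's also add add a note about using only the l3 access without l4",
);

# "skip" means the REPEATED_WORD check would not run on that line.
foreach my $line (@samples) {
	printf("%-4s %s\n", looks_like_ls_long_line($line) ? "skip" : "scan", $line);
}
```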

Sincerely
Aditya

>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
>> @@ -3050,8 +3050,10 @@ sub process {
>>  			}
>>  		}
>>  
>> -# check for repeated words separated by a single space
>> -		if ($rawline =~ /^\+/ || $in_commit_log) {
>> +# check for repeated words separated by a single space and
>> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
>> +		if (($rawline =~ /^\+/ || $in_commit_log) &&
>> +		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
> 
> This might be better as \b$Hex to avoid FF FF
> and FFFFFFFF FFFFFFFF
> 
> I might add that check to the line below where
> the repeated words are checked against long
> ---
>  scripts/checkpatch.pl | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..929866999f81 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3062,6 +3062,7 @@ sub process {
>  
>  				next if ($first ne $second);
>  				next if ($first eq 'long');
> +				next if ($first =~ /^$Hex$/;
>  
>  				if (WARN("REPEATED_WORD",
>  					 "Possible repeated word: '$first'\n" . $herecurr) &&
> 
> 


* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 18:05         ` Joe Perches
@ 2020-10-21 18:25           ` Aditya
  2020-10-21 19:12             ` Lukas Bulwahn
  0 siblings, 1 reply; 28+ messages in thread
From: Aditya @ 2020-10-21 18:25 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On 21/10/20 11:35 pm, Joe Perches wrote:
> On Wed, 2020-10-21 at 23:25 +0530, Aditya wrote:
>> Thanks for your feedback. I ran a manual check using this approach
>> over v5.6..v5.8.
>> The negatives occurring with this approach are for the word 'be'
>> (Frequency 5) and 'add'(Frequency 1). For eg.
>>
>> WARNING:REPEATED_WORD: Possible repeated word: 'be'
>> #278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
>> + * @seg: index of packet segment whose raw fields are to be be extracted
>>
>> WARNING:REPEATED_WORD: Possible repeated word: 'add'
>> #21:
>> Let's also add add a note about using only the l3 access without l4
>>
>> Apart from these, it works as expected. It also takes into account the
>> cases for multiple occurrences of hex, as you mentioned. For eg.
>>
>> WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
>> #15:
> []
>> I'll try to combine both methods and come up with a better approach.
> 
> Enjoy, but please consider:
> 
> If for over 30K patches, there are just a few false positives and
> a few false negatives, it likely doesn't need much improvement...
> 
> checkpatch works on patch contexts.
> 
> It's not intended to be perfect.
> 
> It's just a little tool that can help avoid some common defects.
> 
> 

Alright Sir. Then, we can proceed with the method you suggested, as it
is more or less perfect.
I'll re-send the patch with the updated figure for the reduced warnings.

Thanks
Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 17:55       ` Aditya
@ 2020-10-21 18:05         ` Joe Perches
  2020-10-21 18:25           ` Aditya
  0 siblings, 1 reply; 28+ messages in thread
From: Joe Perches @ 2020-10-21 18:05 UTC (permalink / raw)
  To: Aditya; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Wed, 2020-10-21 at 23:25 +0530, Aditya wrote:
> Thanks for your feedback. I ran a manual check using this approach
> over v5.6..v5.8.
> The negatives occurring with this approach are for the word 'be'
> (Frequency 5) and 'add'(Frequency 1). For eg.
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'be'
> #278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
> + * @seg: index of packet segment whose raw fields are to be be extracted
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'add'
> #21:
> Let's also add add a note about using only the l3 access without l4
> 
> Apart from these, it works as expected. It also takes into account the
> cases for multiple occurrences of hex, as you mentioned. For eg.
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
> #15:
[]
> I'll try to combine both methods and come up with a better approach.

Enjoy, but please consider:

If for over 30K patches, there are just a few false positives and
a few false negatives, it likely doesn't need much improvement...

checkpatch works on patch contexts.

It's not intended to be perfect.

It's just a little tool that can help avoid some common defects.



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 16:50     ` Joe Perches
  2020-10-21 16:59       ` Dwaipayan Ray
@ 2020-10-21 17:55       ` Aditya
  2020-10-21 18:05         ` Joe Perches
  1 sibling, 1 reply; 28+ messages in thread
From: Aditya @ 2020-10-21 17:55 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On 21/10/20 10:20 pm, Joe Perches wrote:
> On Wed, 2020-10-21 at 08:28 -0700, Joe Perches wrote:
>> On Wed, 2020-10-21 at 08:18 -0700, Joe Perches wrote:
>>> I might add that check to the line below where
>>> the repeated words are checked against long
>> []
>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> []
>>> @@ -3062,6 +3062,7 @@ sub process {
>>>  
>>>  				next if ($first ne $second);
>>>  				next if ($first eq 'long');
>>> +				next if ($first =~ /^$Hex$/;
>>
>> oops.  with a close parenthesis added of course...
> 
> That doesn't work as $Hex expects a leading 0x.
> 
> But this does...
> 
> The negative of this approach is it would also not emit
> a warning on these repeated words: (doesn't seem too bad)
> 
> $ grep -P '^[0-9a-f]{2,}$' /usr/share/dict/words
> abed
> accede
> acceded
> ace
> aced
> ad
> add
> added
> baa
> baaed
> babe
> bad
> bade
> be
> bead
> beaded
> bed
> bedded
> bee
> beef
> beefed
> cab
> cabbed
> cad
> cede
> ceded
> dab
> dabbed
> dad
> dead
> deaf
> deb
> decade
> decaf
> deed
> deeded
> deface
> defaced
> ebb
> ebbed
> efface
> effaced
> fa
> facade
> face
> faced
> fad
> fade
> faded
> fed
> fee
> feed
> ---
>  scripts/checkpatch.pl | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..79d7a4cba19e 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3062,6 +3062,7 @@ sub process {
>  
>  				next if ($first ne $second);
>  				next if ($first eq 'long');
> +				next if ($first =~ /^[0-9a-f]+$/i);
>  
>  				if (WARN("REPEATED_WORD",
>  					 "Possible repeated word: '$first'\n" . $herecurr) &&
> 
> 
> 

Hi Sir,
Thanks for your feedback. I ran a manual check using this approach
over v5.6..v5.8.
The false negatives with this approach are for the words 'be'
(5 occurrences) and 'add' (1 occurrence). For example:

WARNING:REPEATED_WORD: Possible repeated word: 'be'
#278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
+ * @seg: index of packet segment whose raw fields are to be be extracted

WARNING:REPEATED_WORD: Possible repeated word: 'add'
#21:
Let's also add add a note about using only the l3 access without l4

Apart from these, it works as expected. It also takes into account the
cases of longer hex words, as you mentioned. For example:

WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
#15:
	0x0040:  ffff ffff ffff ffff ffff ffff ffff ffff

These cases were getting missed with my approach.

Also, it handles hex sequences that occur fewer than 4 times in a
line (2 occurrences), for example:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
#38:
 Code: ff ff 48 (...)

I'll try to combine both methods and come up with a better approach.

Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 16:59       ` Dwaipayan Ray
@ 2020-10-21 17:17         ` Joe Perches
  0 siblings, 0 replies; 28+ messages in thread
From: Joe Perches @ 2020-10-21 17:17 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel, Aditya Srivastava

On Wed, 2020-10-21 at 22:29 +0530, Dwaipayan Ray wrote:
> Can it be considered that the Hex numbers occur
> mostly in pairs or groups of 8, like "FF" or "FFFFFFFF"?
> 
> I think it might reduce the negative side further.

Maybe.  This already looks for pairs.

Combined with your previous patch,
https://lore.kernel.org/linux-kernel-mentees/20201017162732.152351-1-dwaipayanray1@gmail.com/
it seems OK to me.

Try something out and see if it makes a difference.




* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 16:50     ` Joe Perches
@ 2020-10-21 16:59       ` Dwaipayan Ray
  2020-10-21 17:17         ` Joe Perches
  2020-10-21 17:55       ` Aditya
  1 sibling, 1 reply; 28+ messages in thread
From: Dwaipayan Ray @ 2020-10-21 16:59 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, Aditya Srivastava

On Wed, Oct 21, 2020 at 10:21 PM Joe Perches <joe@perches.com> wrote:
>
> On Wed, 2020-10-21 at 08:28 -0700, Joe Perches wrote:
> > On Wed, 2020-10-21 at 08:18 -0700, Joe Perches wrote:
> > > I might add that check to the line below where
> > > the repeated words are checked against long
> > []
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > []
> > > @@ -3062,6 +3062,7 @@ sub process {
> > >
> > >                             next if ($first ne $second);
> > >                             next if ($first eq 'long');
> > > +                           next if ($first =~ /^$Hex$/;
> >
> > oops.  with a close parenthesis added of course...
>
> That doesn't work as $Hex expects a leading 0x.
>
> But this does...
>
> The negative of this approach is it would also not emit
> a warning on these repeated words: (doesn't seem too bad)
>
> $ grep -P '^[0-9a-f]{2,}$' /usr/share/dict/words
> abed
> accede
> acceded
> ace
> aced
> ad
> add
> added
> baa
> baaed
> babe
> bad
> bade
> be
> bead
> beaded
> bed
> bedded
> bee
> beef
> beefed
> cab
> cabbed
> cad
> cede
> ceded
> dab
> dabbed
> dad
> dead
> deaf
> deb
> decade
> decaf
> deed
> deeded
> deface
> defaced
> ebb
> ebbed
> efface
> effaced
> fa
> facade
> face
> faced
> fad
> fade
> faded
> fed
> fee
> feed
> ---
>  scripts/checkpatch.pl | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..79d7a4cba19e 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3062,6 +3062,7 @@ sub process {
>
>                                 next if ($first ne $second);
>                                 next if ($first eq 'long');
> +                               next if ($first =~ /^[0-9a-f]+$/i);
>
>                                 if (WARN("REPEATED_WORD",
>                                          "Possible repeated word: '$first'\n" . $herecurr) &&
>
>

Hi,
Can it be considered that the Hex numbers occur
mostly in pairs or groups of 8, like "FF" or "FFFFFFFF"?

I think it might reduce the negative side further.

Thanks,
Dwaipayan.

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 15:28   ` Joe Perches
@ 2020-10-21 16:50     ` Joe Perches
  2020-10-21 16:59       ` Dwaipayan Ray
  2020-10-21 17:55       ` Aditya
  0 siblings, 2 replies; 28+ messages in thread
From: Joe Perches @ 2020-10-21 16:50 UTC (permalink / raw)
  To: Aditya Srivastava; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Wed, 2020-10-21 at 08:28 -0700, Joe Perches wrote:
> On Wed, 2020-10-21 at 08:18 -0700, Joe Perches wrote:
> > I might add that check to the line below where
> > the repeated words are checked against long
> []
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3062,6 +3062,7 @@ sub process {
> >  
> >  				next if ($first ne $second);
> >  				next if ($first eq 'long');
> > +				next if ($first =~ /^$Hex$/;
> 
> oops.  with a close parenthesis added of course...

That doesn't work as $Hex expects a leading 0x.

But this does...

The downside of this approach is that it would also not emit
a warning on these repeated words (doesn't seem too bad):

$ grep -P '^[0-9a-f]{2,}$' /usr/share/dict/words
abed
accede
acceded
ace
aced
ad
add
added
baa
baaed
babe
bad
bade
be
bead
beaded
bed
bedded
bee
beef
beefed
cab
cabbed
cad
cede
ceded
dab
dabbed
dad
dead
deaf
deb
decade
decaf
deed
deeded
deface
defaced
ebb
ebbed
efface
effaced
fa
facade
face
faced
fad
fade
faded
fed
fee
feed
---
 scripts/checkpatch.pl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..79d7a4cba19e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3062,6 +3062,7 @@ sub process {
 
 				next if ($first ne $second);
 				next if ($first eq 'long');
+				next if ($first =~ /^[0-9a-f]+$/i);
 
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&

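A minimal sketch of the difference between the two checks, for
illustration only. $hex_with_prefix is a hypothetical stand-in that
merely mirrors the "expects a leading 0x" behaviour described above; it
is not copied from checkpatch.pl. $bare_hex is the pattern from the
patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hex pattern that requires a leading 0x.
my $hex_with_prefix = qr/(?i)0x[0-9a-f]+/;
# Pattern from the patch above: hex digits only, case-insensitive.
my $bare_hex = qr/^[0-9a-f]+$/i;

my (%prefixed, %bare);
for my $first (qw(ff FFFFFFFF 0xff dead added the)) {
    # "skip" means the repeated word would not be reported.
    $prefixed{$first} = ($first =~ /^$hex_with_prefix$/) ? "skip" : "warn";
    $bare{$first}     = ($first =~ $bare_hex)            ? "skip" : "warn";
    printf "%-10s prefix-form=%s bare-form=%s\n",
        $first, $prefixed{$first}, $bare{$first};
}
```

The bare form skips "ff" and "FFFFFFFF", which the prefixed form would
still report, at the cost of also skipping words like "dead" and
"added".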



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 15:18 ` Joe Perches
@ 2020-10-21 15:28   ` Joe Perches
  2020-10-21 16:50     ` Joe Perches
  2020-10-21 19:10   ` Aditya
  1 sibling, 1 reply; 28+ messages in thread
From: Joe Perches @ 2020-10-21 15:28 UTC (permalink / raw)
  To: Aditya Srivastava; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Wed, 2020-10-21 at 08:18 -0700, Joe Perches wrote:
> I might add that check to the line below where
> the repeated words are checked against long
[]
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3062,6 +3062,7 @@ sub process {
>  
>  				next if ($first ne $second);
>  				next if ($first eq 'long');
> +				next if ($first =~ /^$Hex$/;

oops.  with a close parenthesis added of course...



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 15:01 Aditya Srivastava
  2020-10-21 15:08 ` Lukas Bulwahn
@ 2020-10-21 15:18 ` Joe Perches
  2020-10-21 15:28   ` Joe Perches
  2020-10-21 19:10   ` Aditya
  1 sibling, 2 replies; 28+ messages in thread
From: Joe Perches @ 2020-10-21 15:18 UTC (permalink / raw)
  To: Aditya Srivastava; +Cc: linux-kernel-mentees, linux-kernel, dwaipayanray1

On Wed, 2020-10-21 at 20:31 +0530, Aditya Srivastava wrote:
> Presence of hexadecimal address or symbol results in false warning
> message by checkpatch.pl.
> 
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
>     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....

Right.

> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more 2-lettered words separated by space in a line.

> A quick evaluation on v5.6..v5.8 showed that this fix reduces
> REPEATED_WORD warnings from 2797 to 1043.

Are many of the other 1043 false positives?
Any pattern to them?

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3050,8 +3050,10 @@ sub process {
>  			}
>  		}
>  
> -# check for repeated words separated by a single space
> -		if ($rawline =~ /^\+/ || $in_commit_log) {
> +# check for repeated words separated by a single space and
> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +		if (($rawline =~ /^\+/ || $in_commit_log) &&
> +		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {

This might be better as \b$Hex to avoid FF FF
and FFFFFFFF FFFFFFFF

I might add that check to the line below where
the repeated words are checked against long
---
 scripts/checkpatch.pl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..929866999f81 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3062,6 +3062,7 @@ sub process {
 
 				next if ($first ne $second);
 				next if ($first eq 'long');
+				next if ($first =~ /^$Hex$/;
 
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&



* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 15:01 Aditya Srivastava
@ 2020-10-21 15:08 ` Lukas Bulwahn
  2020-10-21 15:18 ` Joe Perches
  1 sibling, 0 replies; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21 15:08 UTC (permalink / raw)
  To: Aditya Srivastava; +Cc: joe, linux-kernel-mentees, linux-kernel, dwaipayanray1



On Wed, 21 Oct 2020, Aditya Srivastava wrote:

> Presence of hexadecimal address or symbol results in false warning
> message by checkpatch.pl.
> 
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
>     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> 
> Here, it reports 'ff' to be repeated, but it is in fact part of some
> address or code, where it has to be repeated.
> In this case, the intent of the warning to find stylistic issues in
> commit messages is not met and the warning is just completely wrong in
> this case.
> 
> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more 2-lettered words separated by space in a line.
> 
> A quick evaluation on v5.6..v5.8 showed that this fix reduces
> REPEATED_WORD warnings from 2797 to 1043.
> 
> A quick manual check found all cases are related to hex output in
> commit messages.
>

Aditya, one thing I just noticed: the commit message header is a bit
uninformative.

How about something like:

identify typical hex output for a better REPEATED_WORD check

Other than that, it looks good. You might want to share the link to the 
complete report of differences before and after this patch for Joe to 
check as well.

Lukas

> Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
> ---
>  scripts/checkpatch.pl | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9b9ffd876e8a..78aeb7a3ca3d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3050,8 +3050,10 @@ sub process {
>  			}
>  		}
>  
> -# check for repeated words separated by a single space
> -		if ($rawline =~ /^\+/ || $in_commit_log) {
> +# check for repeated words separated by a single space and
> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +		if (($rawline =~ /^\+/ || $in_commit_log) &&
> +		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
>  			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>  
>  				my $first = $1;
> -- 
> 2.17.1
> 
> 

* [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
@ 2020-10-21 15:01 Aditya Srivastava
  2020-10-21 15:08 ` Lukas Bulwahn
  2020-10-21 15:18 ` Joe Perches
  0 siblings, 2 replies; 28+ messages in thread
From: Aditya Srivastava @ 2020-10-21 15:01 UTC (permalink / raw)
  To: joe; +Cc: Aditya Srivastava, linux-kernel-mentees, linux-kernel, dwaipayanray1

Presence of hexadecimal address or symbol results in false warning
message by checkpatch.pl.

For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
memory leak in mptcp_subflow_create_socket()") results in warning:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....

Here, it reports 'ff' to be repeated, but it is in fact part of some
address or code, where it has to be repeated.
In this case, the intent of the warning to find stylistic issues in
commit messages is not met and the warning is just completely wrong in
this case.

To avoid all such reports, add an additional regex check for a repeating
pattern of 4 or more 2-lettered words separated by space in a line.

A quick evaluation on v5.6..v5.8 showed that this fix reduces
REPEATED_WORD warnings from 2797 to 1043.

A quick manual check found all cases are related to hex output in
commit messages.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
---
 scripts/checkpatch.pl | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9b9ffd876e8a..78aeb7a3ca3d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3050,8 +3050,10 @@ sub process {
 			}
 		}
 
-# check for repeated words separated by a single space
-		if ($rawline =~ /^\+/ || $in_commit_log) {
+# check for repeated words separated by a single space and
+# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+		if (($rawline =~ /^\+/ || $in_commit_log) &&
+		$rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
 			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
 
 				my $first = $1;
-- 
2.17.1


* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 12:09         ` Aditya
  2020-10-21 12:53           ` Aditya
  2020-10-21 12:58           ` Lukas Bulwahn
@ 2020-10-21 12:59           ` Dwaipayan Ray
  2 siblings, 0 replies; 28+ messages in thread
From: Dwaipayan Ray @ 2020-10-21 12:59 UTC (permalink / raw)
  To: Aditya; +Cc: linux-kernel-mentees

>  # check for repeated words separated by a single space
>                 if ($rawline =~ /^\+/ || $in_commit_log) {
> -                       while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> +                       # avoid repeating hex occurrences like 'ff ff fe 09 ...'

Hey,
Probably one more change you could do here:

> +                       while ($rawline !~ /(\b[0-9a-f]{2}( )+){4,}/ &&
> +                               $rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {

The hex check is performed every time a duplicate word is found
in the line. A line with multiple duplicate words will lead to
unnecessary re-runs of the hex check.

Example:

"This is is the the repeated word"
Two repeated words: 'is' and 'the', and two runs
of the hex check on the same line.


Probably move it here?
 +               if (($rawline =~ /^\+/ || $in_commit_log) &&
 +                    $rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
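
A minimal sketch of the two placements, for illustration only. $word is
a hypothetical stand-in for $word_pattern, and $guard_runs is a counter
added purely to show how often the hex check is evaluated when it sits
in the loop condition.

```perl
use strict;
use warnings;

my $hex_dump = qr/(\b[0-9a-f]{2}( )+){4,}/;   # hex check from the thread
my $word     = qr/\b[A-Za-z]{2,}\b/;          # hypothetical stand-in for $word_pattern

my $line = "This is is the the repeated word";

# Variant A: hex check inside the loop condition, so it is
# re-evaluated on every pass through the condition.
my $guard_runs = 0;
my @dups_a;
while (do { $guard_runs++; $line !~ $hex_dump } &&
       $line =~ /\b($word) (?=($word))/g) {
    push @dups_a, $1 if $1 eq $2;
}

# Variant B: hex check hoisted into the enclosing if, evaluated once.
my @dups_b;
if ($line !~ $hex_dump) {
    while ($line =~ /\b($word) (?=($word))/g) {
        push @dups_b, $1 if $1 eq $2;
    }
}

print "guard evaluations in variant A: $guard_runs\n";
print "variant A found: @dups_a\n";
print "variant B found: @dups_b\n";
```

Both variants find the same repeated words; only the number of hex
check evaluations differs.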

Thanks,
Dwaipayan.

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 12:09         ` Aditya
  2020-10-21 12:53           ` Aditya
@ 2020-10-21 12:58           ` Lukas Bulwahn
  2020-10-21 12:59           ` Dwaipayan Ray
  2 siblings, 0 replies; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21 12:58 UTC (permalink / raw)
  To: Aditya; +Cc: linux-kernel-mentees, Dwaipayan Ray



On Wed, 21 Oct 2020, Aditya wrote:

> On 21/10/20 2:22 pm, Lukas Bulwahn wrote:
> > 
> > 
> > On Wed, 21 Oct 2020, Dwaipayan Ray wrote:
> > 
> >> Hey Aditya and Lukas,
> >>
> >>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> >>>>> index 9b9ffd876e8a..181c95691715 100755
> >>>>> --- a/scripts/checkpatch.pl
> >>>>> +++ b/scripts/checkpatch.pl
> >>>>> @@ -3052,7 +3052,9 @@ sub process {
> >>>>>
> >>>>>  # check for repeated words separated by a single space
> >>>>>             if ($rawline =~ /^\+/ || $in_commit_log) {
> >>>>> -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> >>>>> +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
> >>>>> +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> >>
> >> Pattern is probably wrong. It doesn't recognize word boundaries or
> >> tabs between words. Example of the first type:
> >>
> >> 000 00 ff ff ...
> >>
> > 
> > I am wondering if this pattern really appears.
> > 
> > Hex stuff is usually written two-letter and spaces.
> > 
> > Maybe it is best to limit it to 0-9a-f, though. I think there should not 
> > be matches with other letters than that.
> > 
> > Aditya, evaluations on those alternatives would help to make decisions.
> > 
> >> The regex matches "00 00 ff ff" ignoring the first 0.
> >>
> >> I think it could be perhaps better with something like:
> >>
> >>  # check for repeated words separated by a single space
> >> -               if ($rawline =~ /^\+/ || $in_commit_log) {
> >> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
> >> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
> >>                         pos($rawline) = 1 if (!$in_commit_log);
> >>                         while ($rawline =~ /\b($word_pattern)
> >> (?=($word_pattern))/g) {
> >>
> >> Please test it though. I only ran it on a few patterns.
> >>
> >> Apart from it, this does fix the problem. But I am quite sceptical about
> >> matching 4 or more 2 lettered words in a row. There could be counter
> >> examples but I guess that is very rare. It's not very general, but for
> >> the moment it does the job.
> >>
> >> So I think it's probably good with some changes. Not sure what Joe
> >> would have in mind though.
> >>
> >> Lukas, I think with the changes in place, it is ready to go for discussion.
> >>
> > 
> > Dwaipayan, thanks for your review.
> > 
> > Lukas
> > 
> 
> Hi Sir
> I made these changes:
>  # check for repeated words separated by a single space
>  		if ($rawline =~ /^\+/ || $in_commit_log) {
> -			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> +			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +			while ($rawline !~ /(\b[0-9a-f]{2}( )+){4,}/ &&
> +				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> 
>  				my $first = $1;
>  				my $second = $2;
> 
> 
> 
> Reports:
> List of errors and warnings after applying the patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/summary.txt
> 
> Change in errors and warnings compared to previous patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt
> 
> Dropped warnings compared to previous patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/dropped_warnings/summary.txt
>

Looks good.

I suggest you quickly scan through the dropped warnings and confirm, so 
that you can add something like this to your commit message:

---
A quick evaluation on <git commit range, fill in here> showed that
this change reduces REPEATED_WORD warnings from xxx to yyy.

A quick manual check found all cases are related to hex output in commit 
messages.

---

Then send out the patch again here quickly and, if we do not see a big 
mistake, send it out to lkml and Joe Perches.

If you need any help, just let us know.

Lukas

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21 12:09         ` Aditya
@ 2020-10-21 12:53           ` Aditya
  2020-10-21 12:58           ` Lukas Bulwahn
  2020-10-21 12:59           ` Dwaipayan Ray
  2 siblings, 0 replies; 28+ messages in thread
From: Aditya @ 2020-10-21 12:53 UTC (permalink / raw)
  To: Lukas Bulwahn, Dwaipayan Ray; +Cc: linux-kernel-mentees

On 21/10/20 5:39 pm, Aditya wrote:
> On 21/10/20 2:22 pm, Lukas Bulwahn wrote:
>>
>>
>> On Wed, 21 Oct 2020, Dwaipayan Ray wrote:
>>
>>> Hey Aditya and Lukas,
>>>
>>>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>>>> index 9b9ffd876e8a..181c95691715 100755
>>>>>> --- a/scripts/checkpatch.pl
>>>>>> +++ b/scripts/checkpatch.pl
>>>>>> @@ -3052,7 +3052,9 @@ sub process {
>>>>>>
>>>>>>  # check for repeated words separated by a single space
>>>>>>             if ($rawline =~ /^\+/ || $in_commit_log) {
>>>>>> -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>>>>>> +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
>>>>>> +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
>>>
>>> Pattern is probably wrong. It doesn't recognize word boundaries or
>>> tabs between words. Example of the first type:
>>>
>>> 000 00 ff ff ...
>>>
>>
>> I am wondering if this pattern really appears.
>>
>> Hex stuff is usually written two-letter and spaces.
>>
>> Maybe it is best to limit it to 0-9a-f, though. I think there should not 
>> be matches with other letters than that.
>>
>> Aditya, evaluations on those alternatives would help to make decisions.
>>
>>> The regex matches "00 00 ff ff" ignoring the first 0.
>>>
>>> I think it could be perhaps better with something like:
>>>
>>>  # check for repeated words separated by a single space
>>> -               if ($rawline =~ /^\+/ || $in_commit_log) {
>>> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
>>> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
>>>                         pos($rawline) = 1 if (!$in_commit_log);
>>>                         while ($rawline =~ /\b($word_pattern)
>>> (?=($word_pattern))/g) {
>>>
>>> Please test it though. I only ran it on a few patterns.
>>>
>>> Apart from it, this does fix the problem. But I am quite sceptical about
>>> matching 4 or more 2 lettered words in a row. There could be counter
>>> examples but I guess that is very rare. It's not very general, but for
>>> the moment it does the job.
>>>
>>> So I think it's probably good with some changes. Not sure what Joe
>>> would have in mind though.
>>>
>>> Lukas, I think with the changes in place, it is ready to go for discussion.
>>>
>>
>> Dwaipayan, thanks for your review.
>>
>> Lukas
>>
> 
> Hi Sir
> I made these changes:
>  # check for repeated words separated by a single space
>  		if ($rawline =~ /^\+/ || $in_commit_log) {
> -			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> +			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +			while ($rawline !~ /(\b[0-9a-f]{2}( )+){4,}/ &&
> +				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> 
>  				my $first = $1;
>  				my $second = $2;
> 
> 
> 
> Reports:
> List of errors and warnings after applying the patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/summary.txt
> 
> Change in errors and warnings compared to previous patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt
> 
> Dropped warnings compared to previous patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/dropped_warnings/summary.txt
> 
> Thanks
> Aditya
> 

I have also generated a report to find the missing warnings from both
versions of my patch. Here is the report:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/dropped_warnings/patch_revision_summary.txt

Thank You
Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  8:52       ` Lukas Bulwahn
@ 2020-10-21 12:09         ` Aditya
  2020-10-21 12:53           ` Aditya
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Aditya @ 2020-10-21 12:09 UTC (permalink / raw)
  To: Lukas Bulwahn, Dwaipayan Ray; +Cc: linux-kernel-mentees

On 21/10/20 2:22 pm, Lukas Bulwahn wrote:
> 
> 
> On Wed, 21 Oct 2020, Dwaipayan Ray wrote:
> 
>> Hey Aditya and Lukas,
>>
>>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>>> index 9b9ffd876e8a..181c95691715 100755
>>>>> --- a/scripts/checkpatch.pl
>>>>> +++ b/scripts/checkpatch.pl
>>>>> @@ -3052,7 +3052,9 @@ sub process {
>>>>>
>>>>>  # check for repeated words separated by a single space
>>>>>             if ($rawline =~ /^\+/ || $in_commit_log) {
>>>>> -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>>>>> +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
>>>>> +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
>>
>> Pattern is probably wrong. It doesn't recognize word boundaries or
>> tabs between words. Example of the first type:
>>
>> 000 00 ff ff ...
>>
> 
> I am wondering if this pattern really appears.
> 
> Hex stuff is usually written two-letter and spaces.
> 
> Maybe it is best to limit it to 0-9a-f, though. I think there should not 
> be matches with other letters than that.
> 
> Aditya, evaluations on those alternatives would help to make decisions.
> 
>> The regex matches "00 00 ff ff" ignoring the first 0.
>>
>> I think it could be perhaps better with something like:
>>
>>  # check for repeated words separated by a single space
>> -               if ($rawline =~ /^\+/ || $in_commit_log) {
>> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
>> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
>>                         pos($rawline) = 1 if (!$in_commit_log);
>>                         while ($rawline =~ /\b($word_pattern)
>> (?=($word_pattern))/g) {
>>
>> Please test it though. I only ran it on a few patterns.
>>
>> Apart from it, this does fix the problem. But I am quite sceptical about
>> matching 4 or more 2 lettered words in a row. There could be counter
>> examples but I guess that is very rare. It's not very general, but for
>> the moment it does the job.
>>
>> So I think it's probably good with some changes. Not sure what Joe
>> would have in mind though.
>>
>> Lukas, I think with the changes in place, it is ready to go for discussion.
>>
> 
> Dwaipayan, thanks for your review.
> 
> Lukas
> 

Hi Sir
I made these changes:
 # check for repeated words separated by a single space
 		if ($rawline =~ /^\+/ || $in_commit_log) {
-			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
+			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+			while ($rawline !~ /(\b[0-9a-f]{2}( )+){4,}/ &&
+				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {

 				my $first = $1;
 				my $second = $2;



Reports:
List of errors and warnings after applying the patch:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/summary.txt

Change in errors and warnings compared to previous patch:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt

Dropped warnings compared to previous patch:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/dropped_warnings/summary.txt

Thanks
Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  8:20     ` Dwaipayan Ray
  2020-10-21  8:35       ` Aditya
@ 2020-10-21  8:52       ` Lukas Bulwahn
  2020-10-21 12:09         ` Aditya
  1 sibling, 1 reply; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21  8:52 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, Aditya



On Wed, 21 Oct 2020, Dwaipayan Ray wrote:

> Hey Aditya and Lukas,
> 
> > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > index 9b9ffd876e8a..181c95691715 100755
> > > > --- a/scripts/checkpatch.pl
> > > > +++ b/scripts/checkpatch.pl
> > > > @@ -3052,7 +3052,9 @@ sub process {
> > > >
> > > >  # check for repeated words separated by a single space
> > > >             if ($rawline =~ /^\+/ || $in_commit_log) {
> > > > -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > > +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
> > > > +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> 
> Pattern is probably wrong. It doesn't recognize word boundaries or
> tabs between words. Example of the first type:
> 
> 000 00 ff ff ...
>

I am wondering if this pattern really appears.

Hex output is usually written as two-character groups separated by spaces.

Maybe it is best to limit it to 0-9a-f, though. I think there should not 
be matches with other letters than that.

Aditya, evaluations on those alternatives would help to make decisions.

> The regex matches "00 00 ff ff" ignoring the first 0.
>
> I think it could be perhaps better with something like:
> 
>  # check for repeated words separated by a single space
> -               if ($rawline =~ /^\+/ || $in_commit_log) {
> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
>                         pos($rawline) = 1 if (!$in_commit_log);
>                         while ($rawline =~ /\b($word_pattern)
> (?=($word_pattern))/g) {
> 
> Please test it though. I only ran it on a few patterns.
> 
> Apart from it, this does fix the problem. But I am quite sceptical about
> matching 4 or more 2 lettered words in a row. There could be counter
> examples but I guess that is very rare. It's not very general, but for
> the moment it does the job.
> 
> So I think it's probably good with some changes. Not sure what Joe
> would have in mind though.
> 
> Lukas, I think with the changes in place, it is ready to go for discussion.
>

Dwaipayan, thanks for your review.

Lukas

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  8:20     ` Dwaipayan Ray
@ 2020-10-21  8:35       ` Aditya
  2020-10-21  8:52       ` Lukas Bulwahn
  1 sibling, 0 replies; 28+ messages in thread
From: Aditya @ 2020-10-21  8:35 UTC (permalink / raw)
  To: Dwaipayan Ray, Lukas Bulwahn; +Cc: linux-kernel-mentees

On 21/10/20 1:50 pm, Dwaipayan Ray wrote:
> Hey Aditya and Lukas,
> 
>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>> index 9b9ffd876e8a..181c95691715 100755
>>>> --- a/scripts/checkpatch.pl
>>>> +++ b/scripts/checkpatch.pl
>>>> @@ -3052,7 +3052,9 @@ sub process {
>>>>
>>>>  # check for repeated words separated by a single space
>>>>             if ($rawline =~ /^\+/ || $in_commit_log) {
>>>> -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>>>> +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
>>>> +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> 
> Pattern is probably wrong. It doesn't recognize word boundaries or
> tabs between words. Example of the first type:
> 
> 000 00 ff ff ...
> 
> The regex matches "00 00 ff ff" ignoring the first 0.
> 
> I think it could be perhaps better with something like:
> 
>  # check for repeated words separated by a single space
> -               if ($rawline =~ /^\+/ || $in_commit_log) {
> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
>                         pos($rawline) = 1 if (!$in_commit_log);
>                         while ($rawline =~ /\b($word_pattern)
> (?=($word_pattern))/g) {
> 
> Please test it though. I only ran it on a few patterns.
> 
> Apart from it, this does fix the problem. But I am quite sceptical about
> matching 4 or more 2 lettered words in a row. There could be counter
> examples but I guess that is very rare. It's not very general, but for
> the moment it does the job.
> 
> So I think it's probably good with some changes. Not sure what Joe
> would have in mind though.
> 
> Lukas, I think with the changes in place, it is ready to go for discussion.
> 
> Thanks,
> Dwaipayan.
> 

Thanks Dwaipayan. You're correct.
I'll use \b for checking the word boundaries and regenerate the
reports. I used 4 as the minimum because there were some occurrences
with only 4 hex words, for example:
WARNING:REPEATED_WORD: Possible repeated word: 'ff'
#15:
 d68:	61 29 ff ff 	ori     r9,r9,65535

for the commit 332ce969b763 ("powerpc/8xx: Reduce time spent in
allow_user_access() and friends")

In addition to your changes, I also plan to change the regex to use
[0-9a-f] (instead of a-z).
I'll apply all the changes and send the report again, along with the
removed warnings.

Thanks
Aditya

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  6:12   ` Lukas Bulwahn
@ 2020-10-21  8:20     ` Dwaipayan Ray
  2020-10-21  8:35       ` Aditya
  2020-10-21  8:52       ` Lukas Bulwahn
  0 siblings, 2 replies; 28+ messages in thread
From: Dwaipayan Ray @ 2020-10-21  8:20 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees, Aditya

Hey Aditya and Lukas,

> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 9b9ffd876e8a..181c95691715 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3052,7 +3052,9 @@ sub process {
> > >
> > >  # check for repeated words separated by a single space
> > >             if ($rawline =~ /^\+/ || $in_commit_log) {
> > > -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
> > > +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&

Pattern is probably wrong. It doesn't recognize word boundaries or
tabs between words. Example of the first type:

000 00 ff ff ...

The regex matches "00 00 ff ff" ignoring the first 0.

I think it could be perhaps better with something like:

 # check for repeated words separated by a single space
-               if ($rawline =~ /^\+/ || $in_commit_log) {
+               if (($rawline =~ /^\+/ || $in_commit_log) &&
+                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
                        pos($rawline) = 1 if (!$in_commit_log);
                        while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {

Please test it though. I only ran it on a few patterns.
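
For reference, the difference between the two guards can be checked
outside checkpatch. The following is only a sketch, written in Python
rather than Perl; it assumes Python's re module treats \b, \s and the
{4,} quantifier the same way Perl does for these ASCII inputs. The names
OLD_GUARD, NEW_GUARD and is_hex_like are made up for illustration.

```python
import re

# Guard from the original patch: any two characters from [0-9a-z],
# separated by spaces only, with no word boundary in front.
OLD_GUARD = re.compile(r"((\s)*[0-9a-z]{2}( )+){4,}")

# Guard suggested in this reply: word boundary first, hex digits only,
# and any whitespace (including tabs) between the tokens.
NEW_GUARD = re.compile(r"(?:\b(?:[0-9a-f]{2}\s+){4,})")


def is_hex_like(guard, line):
    """Return True if the guard would treat the line as hex output."""
    return guard.search(line) is not None


samples = [
    "    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....",
    "000 00 ff ff ...",
    "ff\tff\tfe\t09\tab",
    "it is of no use to me",
]

for line in samples:
    print(repr(line), is_hex_like(OLD_GUARD, line), is_hex_like(NEW_GUARD, line))
```

With these inputs, the old guard also fires on "000 00 ff ff ..." and on
an ordinary sentence made of two-letter words, and it misses the
tab-separated bytes. The new guard fires only on the hex dump and the
tab-separated bytes.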

Apart from that, this does fix the problem. But I am quite sceptical about
matching 4 or more 2-lettered words in a row. There could be
counterexamples, but I guess those are very rare. It's not very general,
but for the moment it does the job.

So I think it's probably good with some changes. Not sure what Joe
would have in mind though.

Lukas, I think with the changes in place, it is ready to go for discussion.

Thanks,
Dwaipayan.

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  5:15 ` Aditya
@ 2020-10-21  6:12   ` Lukas Bulwahn
  2020-10-21  8:20     ` Dwaipayan Ray
  0 siblings, 1 reply; 28+ messages in thread
From: Lukas Bulwahn @ 2020-10-21  6:12 UTC (permalink / raw)
  To: Aditya, Dwaipayan Ray; +Cc: linux-kernel-mentees



On Wed, 21 Oct 2020, Aditya wrote:

> On 21/10/20 10:30 am, Aditya Srivastava wrote:
> > Presence of hexadecimal address or symbol results in false warning
> > message by checkpatch.pl.
> > 
> > For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> > memory leak in mptcp_subflow_create_socket()") results in warning:
> > 
> > WARNING:REPEATED_WORD: Possible repeated word: 'ff'
> >     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> > 
> > Here, it reports 'ff' to be repeated, but it is infact part of some

Aditya:

s/infact/in fact/

> > address or code, where it has to be repeated. Thus the warning seems
> > unnecessary in this case.
> >

"unnecessary in this case" is a bit weak.
You can state it more strongly:

In this case, the intent of the warning to find stylistic issues in commit 
messages is not met and the warning is just completely wrong in this case.

> > To avoid all such reports, add an additional regex check for a repeating
> > pattern of 4 or more 2-lettered words separated by space in a line.
> > 
> > Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
> > ---
> >  scripts/checkpatch.pl | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 9b9ffd876e8a..181c95691715 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3052,7 +3052,9 @@ sub process {
> >  
> >  # check for repeated words separated by a single space
> >  		if ($rawline =~ /^\+/ || $in_commit_log) {
> > -			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > +			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> > +			while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> > +				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> >  
> >  				my $first = $1;
> >  				my $second = $2;
> > 
> 
> 
> Report of the impact of patch/changes(taken over v5.6..v5.8):
> 
> List of errors and warnings after applying the patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/summary.txt
> 
> Change in errors and warnings compared to previous patch:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt
> 
> Impact/Conclusion:
> It can be seen that a large amount of warning messages under
> REPEATED_WORD were because of such hex occurrences.
> The changes made reduces the warning count of REPEATED_WORD by more
> than 60%, ie. from 2797 to 1015 (over v5.6..v5.8)
>

Aditya, can you share a quick export of all the warnings that were
dropped? Just to check that all of them are hex output and that we did not
silence another class of valid REPEATED_WORD warnings.

Dwaipayan, can you review this patch and comment on whether you would have
implemented it the same way, or whether you see other options that you
considered during your review?

Aditya, once you have addressed my stylistic comments on the commit message
and have feedback from Dwaipayan, you can send the patch to the checkpatch
maintainers and lkml; please CC me, Dwaipayan and the
linux-kernel-mentees list.

Lukas

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
  2020-10-21  5:00 Aditya Srivastava
@ 2020-10-21  5:15 ` Aditya
  2020-10-21  6:12   ` Lukas Bulwahn
  0 siblings, 1 reply; 28+ messages in thread
From: Aditya @ 2020-10-21  5:15 UTC (permalink / raw)
  To: lukas.bulwahn; +Cc: linux-kernel-mentees, yashsri421

On 21/10/20 10:30 am, Aditya Srivastava wrote:
> Presence of hexadecimal address or symbol results in false warning
> message by checkpatch.pl.
> 
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
> 
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
>     00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
> 
> Here, it reports 'ff' to be repeated, but it is infact part of some
> address or code, where it has to be repeated. Thus the warning seems
> unnecessary in this case.
> 
> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more 2-lettered words separated by space in a line.
> 
> Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
> ---
>  scripts/checkpatch.pl | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9b9ffd876e8a..181c95691715 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3052,7 +3052,9 @@ sub process {
>  
>  # check for repeated words separated by a single space
>  		if ($rawline =~ /^\+/ || $in_commit_log) {
> -			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> +			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> +			while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> +				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>  
>  				my $first = $1;
>  				my $second = $2;
> 


Report of the impact of the patch (taken over v5.6..v5.8):

List of errors and warnings after applying the patch:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/summary.txt

Change in errors and warnings compared to previous patch:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/Task3/relative_summary/summary_relative.txt

Impact/Conclusion:
It can be seen that a large number of REPEATED_WORD warnings were caused
by such hex occurrences.
The changes reduce the REPEATED_WORD warning count by more than 60%,
i.e. from 2797 to 1015 (over v5.6..v5.8).
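
To illustrate where these warnings come from, here is a rough Python
sketch of the repeated-word check with and without the hex guard. It is
not the checkpatch code: WORD is an assumed stand-in for $word_pattern
(its definition is not shown in this thread), the guard is the one from
the patch above, and repeated_words is a hypothetical helper.

```python
import re

# Assumed stand-in for checkpatch's $word_pattern (not taken from this thread).
WORD = r"\b[A-Z]?[a-z]{2,}\b"

# Same shape as the check in the patch: a word, one space, then a
# lookahead for the following word.
PAIR = re.compile(r"\b(" + WORD + r") (?=(" + WORD + r"))")

# Guard from the patch: 4 or more 2-character tokens separated by spaces.
HEX_GUARD = re.compile(r"((\s)*[0-9a-z]{2}( )+){4,}")


def repeated_words(line, use_guard):
    """Return words that directly repeat, optionally skipping hex-like lines."""
    if use_guard and HEX_GUARD.search(line):
        return []
    return [m.group(1) for m in PAIR.finditer(line)
            if m.group(1).lower() == m.group(2).lower()]


dump = "    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0....."
print(repeated_words(dump, use_guard=False))   # the false positive on 'ff'
print(repeated_words(dump, use_guard=True))    # suppressed by the guard
print(repeated_words("this is is a test", use_guard=True))
```

In this sketch the hex dump line reports 'ff' only when the guard is off,
while a genuine repetition such as "is is" is still reported with the
guard on.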

Aditya

* [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
@ 2020-10-21  5:00 Aditya Srivastava
  2020-10-21  5:15 ` Aditya
  0 siblings, 1 reply; 28+ messages in thread
From: Aditya Srivastava @ 2020-10-21  5:00 UTC (permalink / raw)
  To: lukas.bulwahn; +Cc: linux-kernel-mentees, Aditya Srivastava

Presence of hexadecimal address or symbol results in false warning
message by checkpatch.pl.

For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
memory leak in mptcp_subflow_create_socket()") results in warning:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
    00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....

Here, it reports 'ff' to be repeated, but it is infact part of some
address or code, where it has to be repeated. Thus the warning seems
unnecessary in this case.

To avoid all such reports, add an additional regex check for a repeating
pattern of 4 or more 2-lettered words separated by space in a line.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
---
 scripts/checkpatch.pl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9b9ffd876e8a..181c95691715 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3052,7 +3052,9 @@ sub process {
 
 # check for repeated words separated by a single space
 		if ($rawline =~ /^\+/ || $in_commit_log) {
-			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
+			# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+			while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
+				$rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
 
 				my $first = $1;
 				my $second = $2;
-- 
2.17.1


Thread overview: 28+ messages
2020-10-21 14:44 [Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning Aditya Srivastava
2020-10-21 14:50 ` Lukas Bulwahn
  -- strict thread matches above, loose matches on Subject: below --
2020-10-21 15:01 Aditya Srivastava
2020-10-21 15:08 ` Lukas Bulwahn
2020-10-21 15:18 ` Joe Perches
2020-10-21 15:28   ` Joe Perches
2020-10-21 16:50     ` Joe Perches
2020-10-21 16:59       ` Dwaipayan Ray
2020-10-21 17:17         ` Joe Perches
2020-10-21 17:55       ` Aditya
2020-10-21 18:05         ` Joe Perches
2020-10-21 18:25           ` Aditya
2020-10-21 19:12             ` Lukas Bulwahn
2020-10-22 14:21               ` Aditya
2020-10-22 14:35                 ` Joe Perches
2020-10-21 19:10   ` Aditya
2020-10-21 19:26     ` Joe Perches
2020-10-21 20:36       ` Joe Perches
2020-10-21  5:00 Aditya Srivastava
2020-10-21  5:15 ` Aditya
2020-10-21  6:12   ` Lukas Bulwahn
2020-10-21  8:20     ` Dwaipayan Ray
2020-10-21  8:35       ` Aditya
2020-10-21  8:52       ` Lukas Bulwahn
2020-10-21 12:09         ` Aditya
2020-10-21 12:53           ` Aditya
2020-10-21 12:58           ` Lukas Bulwahn
2020-10-21 12:59           ` Dwaipayan Ray
