Linux Kernel Mentees Archive on lore.kernel.org
* [Linux-kernel-mentees] [PATCH v3] checkpatch: add new exception to repeated word check
From: Dwaipayan Ray @ 2020-10-17  5:22 UTC
  To: joe; +Cc: dwaipayanray1, linux-kernel-mentees, linux-kernel

Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:

WARNING: Possible repeated word: 'git'

For example:
WARNING: Possible repeated word: 'git'
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git

So, the pattern "git git://..." is a false positive in this case.

There are several other combinations which may produce a wrong
warning message, such as "@size size", ":Begin begin", etc.

Extend repeated word check to compare the characters before and
after the word matches. If the preceding or succeeding character
belongs to the exception list, the warning is avoided.

Link: https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.camel@perches.com/
Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..89430dfd6652 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -595,6 +595,7 @@ our @mode_permission_funcs = (
 );
 
 my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
+my $exclude_chars = '[^\.\,\+\s]';
 
 #Create a search pattern for all these functions to speed up a loop below
 our $mode_perms_search = "";
@@ -3056,15 +3057,27 @@ sub process {
 
 				my $first = $1;
 				my $second = $2;
-
+				my $start_pos = $-[1];
+				my $end_pos = $+[2];
 				if ($first =~ /(?:struct|union|enum)/) {
 					pos($rawline) += length($first) + length($second) + 1;
 					next;
 				}
 
-				next if ($first ne $second);
+				next if (lc($first) ne lc($second));
 				next if ($first eq 'long');
 
+				# check for character before and after the word matches
+				my $start_char = '';
+				my $end_char = '';
+				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
+				$end_char = substr($rawline, $end_pos, 1) if ($end_pos <= length($rawline));
+
+				if ($start_char =~ /^$exclude_chars$/ ||
+				    $end_char =~ /^$exclude_chars$/) {
+					next;
+				}
+
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
 				    $fix) {
-- 
2.27.0
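
The check in this patch can be tried outside checkpatch.pl. What follows is a minimal standalone sketch, not the patch itself: repeated_words() is a hypothetical helper, the line is passed in directly instead of coming from checkpatch's $rawline, and the struct/union/enum and 'long' exceptions are left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word_pattern  = '\b[A-Z]?[a-z]{2,}\b';
# Negated class: matches any single character that is NOT '.', ',', '+'
# or whitespace. A neighbour matching this class suppresses the warning.
my $exclude_chars = '[^\.\,\+\s]';

# Hypothetical helper: returns the words that would still be reported
# as repeated for one line.
sub repeated_words {
	my ($line) = @_;
	my @found;

	while ($line =~ /\b($word_pattern) (?=($word_pattern))/g) {
		my ($first, $second) = ($1, $2);
		my $start_pos = $-[1];	# offset where the first word starts
		my $end_pos = $+[2];	# offset just past the second word

		next if (lc($first) ne lc($second));

		my $start_char = '';
		my $end_char = '';
		$start_char = substr($line, $start_pos - 1, 1) if ($start_pos > 0);
		# "<=" is kept here exactly as posted in the patch
		$end_char = substr($line, $end_pos, 1) if ($end_pos <= length($line));

		next if ($start_char =~ /^$exclude_chars$/ ||
			 $end_char =~ /^$exclude_chars$/);

		push(@found, $first);
	}
	return @found;
}

foreach my $line ("+T:\tgit git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git",
		  '+The the repeated word.') {
	my @words = repeated_words($line);
	print "$line\n  reported: ", (@words ? join(', ', @words) : '(none)'), "\n";
}
```

With this sketch the MAINTAINERS-style line reports nothing, because the second 'git' is followed by ':', while the plain "The the" line is still reported.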

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees


* Re: [Linux-kernel-mentees] [PATCH v3] checkpatch: add new exception to repeated word check
From: Joe Perches @ 2020-10-17  5:41 UTC
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Sat, 2020-10-17 at 10:52 +0530, Dwaipayan Ray wrote:
> Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> moved the repeated word test to check for more file types. But after
> this, if checkpatch.pl is run on MAINTAINERS, it generates several
> new warnings of the type:
> 
> WARNING: Possible repeated word: 'git'
> 
> For example:
> WARNING: Possible repeated word: 'git'
> +T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
> 
> So, the pattern "git git://..." is a false positive in this case.
> 
> There are several other combinations which may produce a wrong
> warning message, such as "@size size", ":Begin begin", etc.
> 
> Extend repeated word check to compare the characters before and
> after the word matches. If the preceding or succeeding character
> belongs to the exception list, the warning is avoided.
> 
> Link: https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.camel@perches.com/
> Suggested-by: Joe Perches <joe@perches.com>
> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> ---
>  scripts/checkpatch.pl | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index f1a4e61917eb..89430dfd6652 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -595,6 +595,7 @@ our @mode_permission_funcs = (
>  );
>  
>  my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> +my $exclude_chars = '[^\.\,\+\s]';

Why include a + character here?

>  #Create a search pattern for all these functions to speed up a loop below
>  our $mode_perms_search = "";
> @@ -3056,15 +3057,27 @@ sub process {
>  
>  				my $first = $1;
>  				my $second = $2;
> -
> +				my $start_pos = $-[1];
> +				my $end_pos = $+[2];
>  				if ($first =~ /(?:struct|union|enum)/) {
>  					pos($rawline) += length($first) + length($second) + 1;
>  					next;
>  				}
>  
> -				next if ($first ne $second);
> +				next if (lc($first) ne lc($second));
>  				next if ($first eq 'long');
>  
> +				# check for character before and after the word matches
> +				my $start_char = '';
> +				my $end_char = '';
> +				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
> +				$end_char = substr($rawline, $end_pos, 1) if ($end_pos <= length($rawline));


substr offsets are zero-based, so I believe the test should be < rather than <=
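
To illustrate the boundary: with zero-based offsets the last character of a line sits at length - 1, so an $end_pos equal to the length means the match ran to the end of the line and there is no following character. A minimal sketch of the stricter guard, where char_after() is a hypothetical helper and not part of checkpatch.pl:

```perl
use strict;
use warnings;

# Hypothetical helper: the character at zero-based offset $end_pos,
# or '' when $end_pos is at or past the end of the line.
sub char_after {
	my ($line, $end_pos) = @_;
	return '' if ($end_pos >= length($line));
	return substr($line, $end_pos, 1);
}

print "after 'is is' in 'is is,': '", char_after('is is,', 5), "'\n";
print "after 'word word' at end of line: '", char_after('word word', 9), "'\n";
```

As far as the perlfunc entry for substr describes it, an offset exactly at the end of the string yields an empty string without a warning, so the <= form probably does not misbehave; it just never filters anything, and < states the intent.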

> +
> +				if ($start_char =~ /^$exclude_chars$/ ||
> +				    $end_char =~ /^$exclude_chars$/) {
> +					next;
> +				}
 
Please use "next if (test);" to be similar to the other uses above.

Also, this doesn't work at the end of a phrase or sentence,

e.g.: "my sentence is is, a duplicate word word."

Here $end_char could be a comma or a period,

so the $end_char test should likely be !~

What is the reason to add and use $exclude_chars?



* Re: [Linux-kernel-mentees] [PATCH v3] checkpatch: add new exception to repeated word check
From: Dwaipayan Ray @ 2020-10-17  6:02 UTC
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel

> Why include a + character here?
>
Hi,
I tried it without the + first, but then lines like
"The the repeated word."
did not trigger a warning.

I think checkpatch prepends a + to each line when it is run on
files. I am not sure, but my $rawline was:
+The the repeated word.

> Please use "next if (test);" to be similar to the other uses above.
>
> Also, this doesn't work at the end of a phrase or sentence,
>
> e.g.: "my sentence is is, a duplicate word word."
>
> Here $end_char could be a comma or a period,
>
> so the $end_char test should likely be !~
>

I tried on "my sentence is is, a duplicate word word.",
and got the following:

WARNING: Possible repeated word: 'is'
#8: FILE: MAINTAINERS:8:
+my sentence is is, a duplicate word word.

WARNING: Possible repeated word: 'word'
#8: FILE: MAINTAINERS:8:
+my sentence is is, a duplicate word word.

Am I doing something wrong?
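
The two warnings above are consistent with how $exclude_chars is written in the patch: it is a negated character class, so a comma, a period, a '+' or whitespace next to the word pair does not match it and the warning is kept. A small sketch that lists what the v3 test does for a few neighbouring characters; suppresses_warning() is a hypothetical name used only for this illustration:

```perl
use strict;
use warnings;

my $exclude_chars = '[^\.\,\+\s]';

# True when a neighbouring character would suppress the warning under
# the v3 test. The empty string stands for "no neighbour" (line edge).
sub suppresses_warning {
	my ($char) = @_;
	return ($char =~ /^$exclude_chars$/) ? 1 : 0;
}

foreach my $char ('', ' ', '.', ',', '+', ':', '@') {
	printf("'%s'\t%s\n", $char, suppresses_warning($char) ? 'skip' : 'warn');
}
```

Only ':' and '@' from this list lead to a skip, which is why "git git://" and "@size size" are no longer reported while "is is," and "word word." still are.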

> What is the reason to add and use $exclude_chars?
>
I compare both $start_char and $end_char against the same set of
characters to decide whether a match should be excluded from the
repeated word check, so I keep that set in one common variable.
I thought this would make it easy to add more exceptions later.

I might be wrong in doing that. What do you think?

Thanks,
Dwaipayan.

* Re: [Linux-kernel-mentees] [PATCH v3] checkpatch: add new exception to repeated word check
From: Joe Perches @ 2020-10-17  7:28 UTC
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Sat, 2020-10-17 at 11:32 +0530, Dwaipayan Ray wrote:
> > Why include a + character here?
> > 
> Hi,
> I tried it without the + first, but then lines like
> "The the repeated word."
> did not trigger a warning.
> 
> I think checkpatch prepends a + to each line when it is run on
> files. I am not sure, but my $rawline was:
> +The the repeated word.

The + is the first character of an added line in a
patch.

That's different from lines in a commit message, so
there needs to be an additional mechanism to skip
the leading + when !$in_commit_log.

Add:
	pos($rawline) = 1 if (!$in_commit_log);

and test the start position too

---
 scripts/checkpatch.pl | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..99563b3d5a3e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3050,19 +3050,28 @@ sub process {
 
 # check for repeated words separated by a single space
 		if ($rawline =~ /^\+/ || $in_commit_log) {
+			pos($rawline) = 1 if (!$in_commit_log);
 			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
-
 				my $first = $1;
 				my $second = $2;
+				my $start_pos = $-[1];
+				my $end_pos = $+[2];
 
 				if ($first =~ /(?:struct|union|enum)/) {
 					pos($rawline) += length($first) + length($second) + 1;
 					next;
 				}
 
-				next if ($first ne $second);
+				next if (lc($first) ne lc($second));
 				next if ($first eq 'long');
 
+				my $start_char = "";
+				my $end_char = "";
+				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log ? 0 : 1));
+				$end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
+				next if ($start_char =~ /^\S$/);
+				next if ($end_char !~ /^[\.\,\s]?$/);
+
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
 				    $fix) {
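
The logic of the diff above can be exercised on its own. This is a sketch under stated assumptions rather than checkpatch.pl code: find_repeated_words() is a hypothetical wrapper, $in_commit_log is passed as a plain argument, the struct/union/enum skip is omitted, and the result is returned as a list instead of going through WARN().

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';

# Hypothetical wrapper: $in_commit_log is true for commit-message lines
# and false for patch lines, which are only checked when they start
# with '+'.
sub find_repeated_words {
	my ($rawline, $in_commit_log) = @_;
	my @found;

	return @found if (!$in_commit_log && $rawline !~ /^\+/);

	# Start scanning after the diff marker on added lines.
	pos($rawline) = 1 if (!$in_commit_log);
	while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
		my $first = $1;
		my $second = $2;
		my $start_pos = $-[1];
		my $end_pos = $+[2];

		next if (lc($first) ne lc($second));
		next if ($first eq 'long');

		my $start_char = "";
		my $end_char = "";
		# The diff marker at offset 0 never counts as a neighbour.
		$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log ? 0 : 1));
		$end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);

		# Skip when glued to a preceding non-space character, or when
		# followed by anything other than '.', ',', whitespace or nothing.
		next if ($start_char =~ /^\S$/);
		next if ($end_char !~ /^[\.\,\s]?$/);

		push(@found, $first);
	}
	return @found;
}

print join(', ', find_repeated_words('+The the repeated word.', 0)), "\n";
print join(', ', find_repeated_words('+my sentence is is, a duplicate word word.', 0)), "\n";
```

In this form the '+' no longer needs to appear in any character list, since the start-position test keeps the diff marker out of $start_char for patch lines.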


