linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] checkpatch: add new exception to repeated word check
@ 2020-10-17  7:51 Dwaipayan Ray
  2020-10-17 15:54 ` Joe Perches
  0 siblings, 1 reply; 3+ messages in thread
From: Dwaipayan Ray @ 2020-10-17  7:51 UTC (permalink / raw)
  To: joe; +Cc: linux-kernel-mentees, dwaipayanray1, linux-kernel, lukas.bulwahn

Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:

WARNING: Possible repeated word: 'git'

For example:
WARNING: Possible repeated word: 'git'
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git

So, the pattern "git git://..." is a false positive in this case.

There are several other combinations which may produce a wrong
warning message, such as "@size size", ":Begin begin", etc.

Extend repeated word check to compare the characters before and
after the word matches. If the preceding or succeeding character
belongs to the exception list, the warning is avoided.

Link: https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.camel@perches.com/
Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..5dc5bf75c6e7 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3052,19 +3052,30 @@ sub process {
 
 # check for repeated words separated by a single space
 		if ($rawline =~ /^\+/ || $in_commit_log) {
+			pos($rawline) = 1 if (!$in_commit_log);
 			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
 
 				my $first = $1;
 				my $second = $2;
-
+				my $start_pos = $-[1];
+				my $end_pos = $+[2];
 				if ($first =~ /(?:struct|union|enum)/) {
 					pos($rawline) += length($first) + length($second) + 1;
 					next;
 				}
 
-				next if ($first ne $second);
+				next if (lc($first) ne lc($second));
 				next if ($first eq 'long');
 
+				# check for character before and after the word matches
+				my $start_char = '';
+				my $end_char = '';
+				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log? 0 : 1));
+				$end_char = substr($rawline, $end_pos, 1) if ($end_pos < length($rawline));
+
+				next if ($start_char =~ /^\S$/);
+				next if ($end_char !~ /^[\.\,\;\?\!\s]?$/);
+
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
 				    $fix) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] checkpatch: add new exception to repeated word check
  2020-10-17  7:51 [PATCH v4] checkpatch: add new exception to repeated word check Dwaipayan Ray
@ 2020-10-17 15:54 ` Joe Perches
  2020-10-17 16:01   ` Dwaipayan Ray
  0 siblings, 1 reply; 3+ messages in thread
From: Joe Perches @ 2020-10-17 15:54 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel, lukas.bulwahn

On Sat, 2020-10-17 at 13:21 +0530, Dwaipayan Ray wrote:
> Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> moved the repeated word test to check for more file types. But after
> this, if checkpatch.pl is run on MAINTAINERS, it generates several
> new warnings of the type:
> 
> WARNING: Possible repeated word: 'git'
> 
> For example:
> WARNING: Possible repeated word: 'git'
> +T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
> 
> So, the pattern "git git://..." is a false positive in this case.
> 
> There are several other combinations which may produce a wrong
> warning message, such as "@size size", ":Begin begin", etc.
> 
> Extend repeated word check to compare the characters before and
> after the word matches. If the preceding or succeeding character
> belongs to the exception list, the warning is avoided.

Not true.

This excludes any non-space character before the first word
and excludes space or punctuation after the second word.

This also adds case insensitive word matching.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3052,19 +3052,30 @@ sub process {
>  
>  # check for repeated words separated by a single space
>  		if ($rawline =~ /^\+/ || $in_commit_log) {
> +			pos($rawline) = 1 if (!$in_commit_log);
>  			while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>  
>  				my $first = $1;
>  				my $second = $2;
> -
> +				my $start_pos = $-[1];
> +				my $end_pos = $+[2];
>  				if ($first =~ /(?:struct|union|enum)/) {
>  					pos($rawline) += length($first) + length($second) + 1;
>  					next;
>  				}
>  
> -				next if ($first ne $second);
> +				next if (lc($first) ne lc($second));

case-insensitive matching

>  				next if ($first eq 'long');
>  
> +				# check for character before and after the word matches
> +				my $start_char = '';
> +				my $end_char = '';
> +				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log? 0 : 1));
> +				$end_char = substr($rawline, $end_pos, 1) if ($end_pos < length($rawline));
> +
> +				next if ($start_char =~ /^\S$/);

non-space

> +				next if ($end_char !~ /^[\.\,\;\?\!\s]?$/);

space or punctuation.

trivia:

I believe using index would be ~50% faster than !~ here
Perhaps more readable too.

				next if (index(" \t.,;?!", $end_char) >= 0);

> +
>  				if (WARN("REPEATED_WORD",
>  					 "Possible repeated word: '$first'\n" . $herecurr) &&
>  				    $fix) {


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] checkpatch: add new exception to repeated word check
  2020-10-17 15:54 ` Joe Perches
@ 2020-10-17 16:01   ` Dwaipayan Ray
  0 siblings, 0 replies; 3+ messages in thread
From: Dwaipayan Ray @ 2020-10-17 16:01 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn

On Sat, Oct 17, 2020 at 9:24 PM Joe Perches <joe@perches.com> wrote:
>
> On Sat, 2020-10-17 at 13:21 +0530, Dwaipayan Ray wrote:
> > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > moved the repeated word test to check for more file types. But after
> > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > new warnings of the type:
> >
> > WARNING: Possible repeated word: 'git'
> >
> > For example:
> > WARNING: Possible repeated word: 'git'
> > +T:   git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
> >
> > So, the pattern "git git://..." is a false positive in this case.
> >
> > There are several other combinations which may produce a wrong
> > warning message, such as "@size size", ":Begin begin", etc.
> >
> > Extend repeated word check to compare the characters before and
> > after the word matches. If the preceding or succeeding character
> > belongs to the exception list, the warning is avoided.
>
> Not true.
>
> This excludes any non-space character before the first word
> and excludes space or punctuation after the second word.
>
> This also adds case insensitive word matching.
>

Right, I didn't change the commit message. I will update it
properly.

> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3052,19 +3052,30 @@ sub process {
> >
> >  # check for repeated words separated by a single space
> >               if ($rawline =~ /^\+/ || $in_commit_log) {
> > +                     pos($rawline) = 1 if (!$in_commit_log);
> >                       while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> >
> >                               my $first = $1;
> >                               my $second = $2;
> > -
> > +                             my $start_pos = $-[1];
> > +                             my $end_pos = $+[2];
> >                               if ($first =~ /(?:struct|union|enum)/) {
> >                                       pos($rawline) += length($first) + length($second) + 1;
> >                                       next;
> >                               }
> >
> > -                             next if ($first ne $second);
> > +                             next if (lc($first) ne lc($second));
>
> case-insensitive matching
>
> >                               next if ($first eq 'long');
> >
> > +                             # check for character before and after the word matches
> > +                             my $start_char = '';
> > +                             my $end_char = '';
> > +                             $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log? 0 : 1));
> > +                             $end_char = substr($rawline, $end_pos, 1) if ($end_pos < length($rawline));
> > +
> > +                             next if ($start_char =~ /^\S$/);
>
> non-space
>
> > +                             next if ($end_char !~ /^[\.\,\;\?\!\s]?$/);
>
> space or punctuation.
>
> trivia:
>
> I believe using index would be ~50% faster than !~ here
> Perhaps more readable too.
>
>                                 next if (index(" \t.,;?!", $end_char) >= 0);
>
Yes, it looks better. Thanks for the suggestion.

I will send in a new iteration after all of this is incorporated.

Thank you,
Dwaipayan.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-10-17 16:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-17  7:51 [PATCH v4] checkpatch: add new exception to repeated word check Dwaipayan Ray
2020-10-17 15:54 ` Joe Perches
2020-10-17 16:01   ` Dwaipayan Ray

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).