Linux Kernel Mentees Archive on lore.kernel.org
 help / color / Atom feed
* [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
@ 2020-10-14 16:37 Dwaipayan Ray
  2020-10-14 18:03 ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Dwaipayan Ray @ 2020-10-14 16:37 UTC (permalink / raw)
  To: joe; +Cc: dwaipayanray1, linux-kernel-mentees, linux-kernel

Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:

WARNING: Possible repeated word: 'git'

For example:
WARNING: Possible repeated word: 'git'
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git

So, the pattern "git git://..." is a false positive in this case.

Add 'git' to the exception list for repeated word check. This effectively
fixes all the newly generated false positives.

Fixes: 4f6ad8aa1eac ("checkpatch: move repeated word test")
Link: https://lore.kernel.org/linux-kernel-mentees/b6cd81b936671a8868fe98536d7c80771bdfd61c.camel@perches.com/

Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..b55d83360366 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3063,7 +3063,7 @@ sub process {
 				}
 
 				next if ($first ne $second);
-				next if ($first eq 'long');
+				next if ($first =~ /^(?:long|git)$/);
 
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
-- 
2.27.0

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-14 16:37 [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check Dwaipayan Ray
@ 2020-10-14 18:03 ` Joe Perches
  2020-10-14 18:12   ` Dwaipayan Ray
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2020-10-14 18:03 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> moved the repeated word test to check for more file types. But after
> this, if checkpatch.pl is run on MAINTAINERS, it generates several
> new warnings of the type:

Perhaps instead of adding more content checks so that
word boundaries are not something like \S but also
not punctuation so that content like

	git git://
	@size size

does not match?


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-14 18:03 ` Joe Perches
@ 2020-10-14 18:12   ` Dwaipayan Ray
  2020-10-14 18:35     ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Dwaipayan Ray @ 2020-10-14 18:12 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel

On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
>
> On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > moved the repeated word test to check for more file types. But after
> > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > new warnings of the type:
>
> Perhaps instead of adding more content checks so that
> word boundaries are not something like \S but also
> not punctuation so that content like
>
>         git git://
>         @size size
>
> does not match?
>
>
Hi,
So currently the words are trimmed of non alphabets before the check:

while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
my $first = $1;
my $second = $2;

where, the word_pattern is:
my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';

So do you perhaps recommend modifying this word pattern to
include the punctuation as well rather than trimming them off?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-14 18:12   ` Dwaipayan Ray
@ 2020-10-14 18:35     ` Joe Perches
  2020-10-17  2:56       ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2020-10-14 18:35 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
> > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > moved the repeated word test to check for more file types. But after
> > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > new warnings of the type:
> > 
> > Perhaps instead of adding more content checks so that
> > word boundaries are not something like \S but also
> > not punctuation so that content like
> > 
> >         git git://
> >         @size size
> > 
> > does not match?
> > 
> > 
> Hi,
> So currently the words are trimmed of non alphabets before the check:
> 
> while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> my $first = $1;
> my $second = $2;
> 
> where, the word_pattern is:
> my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';

I'm familiar.

> So do you perhaps recommend modifying this word pattern to
> include the punctuation as well rather than trimming them off?

Not really, perhaps use the capture group position
markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
substr could be used to see what characters are
before and after the word matches.
> 
> Thanks,
> Dwaipayan.

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-14 18:35     ` Joe Perches
@ 2020-10-17  2:56       ` Joe Perches
  2020-10-17  4:32         ` Dwaipayan Ray
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2020-10-17  2:56 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
> > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > moved the repeated word test to check for more file types. But after
> > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > new warnings of the type:
> > > 
> > > Perhaps instead of adding more content checks so that
> > > word boundaries are not something like \S but also
> > > not punctuation so that content like
> > > 
> > >         git git://
> > >         @size size
> > > 
> > > does not match?
> > > 
> > > 
> > Hi,
> > So currently the words are trimmed of non alphabets before the check:
> > 
> > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > my $first = $1;
> > my $second = $2;
> > 
> > where, the word_pattern is:
> > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> 
> I'm familiar.
> 
> > So do you perhaps recommend modifying this word pattern to
> > include the punctuation as well rather than trimming them off?
> 
> Not really, perhaps use the capture group position
> markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> substr could be used to see what characters are
> before and after the word matches.

Perhaps something like:
---
 scripts/checkpatch.pl | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..a65eb40a5539 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3054,15 +3054,25 @@ sub process {
 
 				my $first = $1;
 				my $second = $2;
+				my $start_pos = $-[1];
+				my $end_pos = $+[2];
 
 				if ($first =~ /(?:struct|union|enum)/) {
 					pos($rawline) += length($first) + length($second) + 1;
 					next;
 				}
 
-				next if ($first ne $second);
+				next if (lc($first) ne lc($second));
 				next if ($first eq 'long');
 
+				my $start_char = "";
+				my $end_char = "";
+				$start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
+				$end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
+
+				next if ($start_char =~ /^\S$/);
+				next if ($end_char !~ /^[\.\,\s]?$/);
+
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
 				    $fix) {


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-17  2:56       ` Joe Perches
@ 2020-10-17  4:32         ` Dwaipayan Ray
  2020-10-17  4:42           ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Dwaipayan Ray @ 2020-10-17  4:32 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel

On Sat, Oct 17, 2020 at 8:26 AM Joe Perches <joe@perches.com> wrote:
>
> On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> > On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
> > > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > > moved the repeated word test to check for more file types. But after
> > > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > > new warnings of the type:
> > > >
> > > > Perhaps instead of adding more content checks so that
> > > > word boundaries are not something like \S but also
> > > > not punctuation so that content like
> > > >
> > > >         git git://
> > > >         @size size
> > > >
> > > > does not match?
> > > >
> > > >
> > > Hi,
> > > So currently the words are trimmed of non alphabets before the check:
> > >
> > > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > my $first = $1;
> > > my $second = $2;
> > >
> > > where, the word_pattern is:
> > > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> >
> > I'm familiar.
> >
> > > So do you perhaps recommend modifying this word pattern to
> > > include the punctuation as well rather than trimming them off?
> >
> > Not really, perhaps use the capture group position
> > markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> > substr could be used to see what characters are
> > before and after the word matches.
>
> Perhaps something like:
> ---
>  scripts/checkpatch.pl | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..a65eb40a5539 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3054,15 +3054,25 @@ sub process {
>
>                                 my $first = $1;
>                                 my $second = $2;
> +                               my $start_pos = $-[1];
> +                               my $end_pos = $+[2];
>
>                                 if ($first =~ /(?:struct|union|enum)/) {
>                                         pos($rawline) += length($first) + length($second) + 1;
>                                         next;
>                                 }
>
> -                               next if ($first ne $second);
> +                               next if (lc($first) ne lc($second));
>                                 next if ($first eq 'long');
>
> +                               my $start_char = "";
> +                               my $end_char = "";
> +                               $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
> +                               $end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
> +
> +                               next if ($start_char =~ /^\S$/);
> +                               next if ($end_char !~ /^[\.\,\s]?$/);
> +
>                                 if (WARN("REPEATED_WORD",
>                                          "Possible repeated word: '$first'\n" . $herecurr) &&
>                                     $fix) {
>
>

Hi Joe,
Thank you for the insight. I was also doing something similar:

---
 scripts/checkpatch.pl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..82497a71ac96 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -595,6 +595,7 @@ our @mode_permission_funcs = (
 );

 my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
+my $punctuation_chars = '[,:;@\.\-]';

 #Create a search pattern for all these functions to speed up a loop below
 our $mode_perms_search = "";
@@ -3065,6 +3066,21 @@ sub process {
  next if ($first ne $second);
  next if ($first eq 'long');

+ # check for character before and after the word matches
+ my $ca_first = substr($rawline, $-[1]-1, 1);
+ my $cb_first = substr($rawline, $+[1], 1);
+ my $ca_second = substr($rawline, $-[2]-1, 1);
+ my $cb_second = substr($rawline, $+[2], 1);
+
+ if ($ca_first ne $ca_second || $cb_first ne $cb_second) {
+ if ($ca_first =~ /$punctuation_chars/ ||
+     $ca_second =~ /$punctuation_chars/ ||
+     $cb_first =~ /$punctuation_chars/ ||
+     $cb_second =~ /$punctuation_chars/) {
+ next;
+ }
+ }
+
  if (WARN("REPEATED_WORD",
  "Possible repeated word: '$first'\n" . $herecurr) &&
      $fix) {

Does it look okay to you??

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-17  4:32         ` Dwaipayan Ray
@ 2020-10-17  4:42           ` Joe Perches
  2020-10-17  4:49             ` Dwaipayan Ray
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2020-10-17  4:42 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees, linux-kernel

On Sat, 2020-10-17 at 10:02 +0530, Dwaipayan Ray wrote:
> On Sat, Oct 17, 2020 at 8:26 AM Joe Perches <joe@perches.com> wrote:
> > On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> > > On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > > > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
> > > > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > > > moved the repeated word test to check for more file types. But after
> > > > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > > > new warnings of the type:
> > > > > 
> > > > > Perhaps instead of adding more content checks so that
> > > > > word boundaries are not something like \S but also
> > > > > not punctuation so that content like
> > > > > 
> > > > >         git git://
> > > > >         @size size
> > > > > 
> > > > > does not match?
> > > > > 
> > > > > 
> > > > Hi,
> > > > So currently the words are trimmed of non alphabets before the check:
> > > > 
> > > > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > > my $first = $1;
> > > > my $second = $2;
> > > > 
> > > > where, the word_pattern is:
> > > > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> > > 
> > > I'm familiar.
> > > 
> > > > So do you perhaps recommend modifying this word pattern to
> > > > include the punctuation as well rather than trimming them off?
> > > 
> > > Not really, perhaps use the capture group position
> > > markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> > > substr could be used to see what characters are
> > > before and after the word matches.
> > 
> > Perhaps something like:
> > ---
> >  scripts/checkpatch.pl | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index fab38b493cef..a65eb40a5539 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3054,15 +3054,25 @@ sub process {
> > 
> >                                 my $first = $1;
> >                                 my $second = $2;
> > +                               my $start_pos = $-[1];
> > +                               my $end_pos = $+[2];
> > 
> >                                 if ($first =~ /(?:struct|union|enum)/) {
> >                                         pos($rawline) += length($first) + length($second) + 1;
> >                                         next;
> >                                 }
> > 
> > -                               next if ($first ne $second);
> > +                               next if (lc($first) ne lc($second));
> >                                 next if ($first eq 'long');
> > 
> > +                               my $start_char = "";
> > +                               my $end_char = "";
> > +                               $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
> > +                               $end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
> > +
> > +                               next if ($start_char =~ /^\S$/);
> > +                               next if ($end_char !~ /^[\.\,\s]?$/);
> > +
> >                                 if (WARN("REPEATED_WORD",
> >                                          "Possible repeated word: '$first'\n" . $herecurr) &&
> >                                     $fix) {
> > 
> > 
> 
> Hi Joe,
> Thank you for the insight. I was also doing something similar:
> 
> ---
>  scripts/checkpatch.pl | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index f1a4e61917eb..82497a71ac96 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -595,6 +595,7 @@ our @mode_permission_funcs = (
>  );
> 
>  my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> +my $punctuation_chars = '[,:;@\.\-]';
> 
>  #Create a search pattern for all these functions to speed up a loop below
>  our $mode_perms_search = "";
> @@ -3065,6 +3066,21 @@ sub process {
>   next if ($first ne $second);
>   next if ($first eq 'long');
> 
> + # check for character before and after the word matches
> + my $ca_first = substr($rawline, $-[1]-1, 1);
> + my $cb_first = substr($rawline, $+[1], 1);
> + my $ca_second = substr($rawline, $-[2]-1, 1);
> + my $cb_second = substr($rawline, $+[2], 1);
> +
> + if ($ca_first ne $ca_second || $cb_first ne $cb_second) {
> + if ($ca_first =~ /$punctuation_chars/ ||
> +     $ca_second =~ /$punctuation_chars/ ||
> +     $cb_first =~ /$punctuation_chars/ ||
> +     $cb_second =~ /$punctuation_chars/) {
> + next;
> + }
> + }
> +
>   if (WARN("REPEATED_WORD",
>   "Possible repeated word: '$first'\n" . $herecurr) &&
>       $fix) {
> 
> Does it look okay to you??

Not really, as ca_second and cb_first are both known
to be the same position and known to be a single space.


_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check
  2020-10-17  4:42           ` Joe Perches
@ 2020-10-17  4:49             ` Dwaipayan Ray
  0 siblings, 0 replies; 8+ messages in thread
From: Dwaipayan Ray @ 2020-10-17  4:49 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel

On Sat, Oct 17, 2020 at 10:12 AM Joe Perches <joe@perches.com> wrote:
>
> On Sat, 2020-10-17 at 10:02 +0530, Dwaipayan Ray wrote:
> > On Sat, Oct 17, 2020 at 8:26 AM Joe Perches <joe@perches.com> wrote:
> > > On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> > > > On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > > > > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches <joe@perches.com> wrote:
> > > > > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > > > > moved the repeated word test to check for more file types. But after
> > > > > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > > > > new warnings of the type:
> > > > > >
> > > > > > Perhaps instead of adding more content checks so that
> > > > > > word boundaries are not something like \S but also
> > > > > > not punctuation so that content like
> > > > > >
> > > > > >         git git://
> > > > > >         @size size
> > > > > >
> > > > > > does not match?
> > > > > >
> > > > > >
> > > > > Hi,
> > > > > So currently the words are trimmed of non alphabets before the check:
> > > > >
> > > > > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > > > my $first = $1;
> > > > > my $second = $2;
> > > > >
> > > > > where, the word_pattern is:
> > > > > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> > > >
> > > > I'm familiar.
> > > >
> > > > > So do you perhaps recommend modifying this word pattern to
> > > > > include the punctuation as well rather than trimming them off?
> > > >
> > > > Not really, perhaps use the capture group position
> > > > markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> > > > substr could be used to see what characters are
> > > > before and after the word matches.
> > >
> > > Perhaps something like:
> > > ---
> > >  scripts/checkpatch.pl | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index fab38b493cef..a65eb40a5539 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3054,15 +3054,25 @@ sub process {
> > >
> > >                                 my $first = $1;
> > >                                 my $second = $2;
> > > +                               my $start_pos = $-[1];
> > > +                               my $end_pos = $+[2];
> > >
> > >                                 if ($first =~ /(?:struct|union|enum)/) {
> > >                                         pos($rawline) += length($first) + length($second) + 1;
> > >                                         next;
> > >                                 }
> > >
> > > -                               next if ($first ne $second);
> > > +                               next if (lc($first) ne lc($second));
> > >                                 next if ($first eq 'long');
> > >
> > > +                               my $start_char = "";
> > > +                               my $end_char = "";
> > > +                               $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
> > > +                               $end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
> > > +
> > > +                               next if ($start_char =~ /^\S$/);
> > > +                               next if ($end_char !~ /^[\.\,\s]?$/);
> > > +
> > >                                 if (WARN("REPEATED_WORD",
> > >                                          "Possible repeated word: '$first'\n" . $herecurr) &&
> > >                                     $fix) {
> > >
> > >
> >
> > Hi Joe,
> > Thank you for the insight. I was also doing something similar:
> >
> > ---
> >  scripts/checkpatch.pl | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index f1a4e61917eb..82497a71ac96 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -595,6 +595,7 @@ our @mode_permission_funcs = (
> >  );
> >
> >  my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> > +my $punctuation_chars = '[,:;@\.\-]';
> >
> >  #Create a search pattern for all these functions to speed up a loop below
> >  our $mode_perms_search = "";
> > @@ -3065,6 +3066,21 @@ sub process {
> >   next if ($first ne $second);
> >   next if ($first eq 'long');
> >
> > + # check for character before and after the word matches
> > + my $ca_first = substr($rawline, $-[1]-1, 1);
> > + my $cb_first = substr($rawline, $+[1], 1);
> > + my $ca_second = substr($rawline, $-[2]-1, 1);
> > + my $cb_second = substr($rawline, $+[2], 1);
> > +
> > + if ($ca_first ne $ca_second || $cb_first ne $cb_second) {
> > + if ($ca_first =~ /$punctuation_chars/ ||
> > +     $ca_second =~ /$punctuation_chars/ ||
> > +     $cb_first =~ /$punctuation_chars/ ||
> > +     $cb_second =~ /$punctuation_chars/) {
> > + next;
> > + }
> > + }
> > +
> >   if (WARN("REPEATED_WORD",
> >   "Possible repeated word: '$first'\n" . $herecurr) &&
> >       $fix) {
> >
> > Does it look okay to you??
>
> Not really, as ca_second and cb_first are both known
> to be the same position and known to be a single space.
>
>
Oh right, my bad. I will adjust it with your suggestions and
send in a V2 then.

Thanks for your time.

Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 16:37 [Linux-kernel-mentees] [PATCH v2] checkpatch: add new exception to repeated word check Dwaipayan Ray
2020-10-14 18:03 ` Joe Perches
2020-10-14 18:12   ` Dwaipayan Ray
2020-10-14 18:35     ` Joe Perches
2020-10-17  2:56       ` Joe Perches
2020-10-17  4:32         ` Dwaipayan Ray
2020-10-17  4:42           ` Joe Perches
2020-10-17  4:49             ` Dwaipayan Ray

Linux Kernel Mentees Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-kernel-mentees/0 linux-kernel-mentees/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-kernel-mentees linux-kernel-mentees/ https://lore.kernel.org/linux-kernel-mentees \
		linux-kernel-mentees@lists.linuxfoundation.org linux-kernel-mentees@lists.linux-foundation.org
	public-inbox-index linux-kernel-mentees

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.linuxfoundation.lists.linux-kernel-mentees


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git