linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC] checkpatch: add new cases to commit handling
@ 2020-11-13 12:31 Dwaipayan Ray
  2020-11-13 13:37 ` Lukas Bulwahn
  0 siblings, 1 reply; 4+ messages in thread
From: Dwaipayan Ray @ 2020-11-13 12:31 UTC
  To: joe; +Cc: linux-kernel-mentees, dwaipayanray1, linux-kernel, lukas.bulwahn

Commit extraction in checkpatch fails in some cases.
One of the most common false positives is a split line
between "commit" and the git SHA of the commit.

Improve commit handling to reduce false positives.

Improvements:
- handle split line between commit and git SHA of commit.
- fix handling of split commit description.

A quick evaluation of 50k commits from v5.4 showed that
the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
of these were split lines between commit and its hash.

Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 024514946bed..f5ba2beac008 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2990,6 +2990,16 @@ sub process {
 			if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)/i) {
 				$orig_desc = $1;
 				$hasparens = 1;
+			} elsif ($line =~ /^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)/i &&
+				 defined $rawlines[$linenr-2] &&
+				 $rawlines[$linenr-2] =~ /\bcommit\s*$/i) {
+				$line =~ /^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)/i;
+				$orig_desc = $1;
+				$hasparens = 1;
+				$space = 0;
+				$short = 0 if ($line =~ /\b[0-9a-f]{12,40}/i);
+				$long = 1 if ($line =~ /\b[0-9a-f]{41,}/i);
+				$case = 0 if ($line =~ /\b[0-9a-f]{5,40}[^A-F]/ && $rawlines[$linenr-2] =~ /\b[Cc]ommit\s*$/);
 			} elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s*$/i &&
 				 defined $rawlines[$linenr] &&
 				 $rawlines[$linenr] =~ /^\s*\("([^"]+)"\)/) {
@@ -3001,7 +3011,9 @@ sub process {
 				$line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$/i;
 				$orig_desc = $1;
 				$rawlines[$linenr] =~ /^\s*([^"]+)"\)/;
-				$orig_desc .= " " . $1;
+				my $split_desc = $1;
+				$split_desc = " $split_desc" if ($line =~ /[\w\,\.]$/);
+				$orig_desc .= $split_desc;
 				$hasparens = 1;
 			}
 
-- 
2.27.0
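
For readers following the diff, the two changes can be sketched outside
checkpatch. The Python below is not checkpatch code: it re-expresses the
regular expressions from the patch as two hypothetical helpers,
title_after_split_commit and join_split_title, and covers only the title
extraction (the $space, $short, $long and $case resets are left out). The
hash 0123456789ab and the titles in the usage note are made up for
illustration, and Python's \w is assumed to be close enough to Perl's here.

```python
import re

# Pattern from the new elsif branch: optional indent, a hex id of five or
# more characters, whitespace, then ("title") with no inner double quotes.
SHA_AND_TITLE = re.compile(r'^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)', re.IGNORECASE)
# The previous line has to end with the word "commit".
TRAILING_COMMIT = re.compile(r'\bcommit\s*$', re.IGNORECASE)

# Patterns from the split-description branch touched by the second hunk.
OPEN_TITLE = re.compile(r'\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$', re.IGNORECASE)
CLOSE_TITLE = re.compile(r'^\s*([^"]+)"\)')
ENDS_WORDLIKE = re.compile(r'[\w,.]$')


def title_after_split_commit(prev_line, line):
    """Return the quoted title when `line` starts with a hash and
    `prev_line` ends in "commit"; otherwise return None."""
    match = SHA_AND_TITLE.search(line)
    if match and TRAILING_COMMIT.search(prev_line):
        return match.group(1)
    return None


def join_split_title(line, next_line):
    """Join a title that wraps onto the next line. A space is inserted only
    when the first line ends in a word character, a comma or a period."""
    head = OPEN_TITLE.search(line)
    tail = CLOSE_TITLE.search(next_line)
    if not (head and tail):
        return None
    separator = " " if ENDS_WORDLIKE.search(line) else ""
    return head.group(1) + separator + tail.group(1)
```

With these helpers, title_after_split_commit("introduced by commit",
'0123456789ab ("foo: fix bar")') yields the title, while the same second
line after a line that does not end in "commit" yields None. Note that
join_split_title adds no space when the first line ends in a character such
as "/" or ":", which mirrors the /[\w\,\.]$/ test in the patch.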



* Re: [PATCH RFC] checkpatch: add new cases to commit handling
  2020-11-13 12:31 [PATCH RFC] checkpatch: add new cases to commit handling Dwaipayan Ray
@ 2020-11-13 13:37 ` Lukas Bulwahn
  2020-11-13 14:01   ` Lukas Bulwahn
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Bulwahn @ 2020-11-13 13:37 UTC
  To: Dwaipayan Ray
  Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List

On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
>
> Commit extraction in checkpatch fails in some cases.
> One of the most common false positives is a split line
> between "commit" and the git SHA of the commit.
>
> Improve commit handling to reduce false positives.
>
> Improvements:
> - handle split line between commit and git SHA of commit.
> - fix handling of split commit description.
>
> A quick evaluation of 50k commits from v5.4 showed that
> the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> of these were split lines between commit and its hash.
>

Can you send me the file of the evaluation, e.g., all contexts (two
lines above and two lines below) around the warned line in the commits
where the GIT_COMMIT_ID dropped?

Then, I can do a quick sanity check as well.

Lukas



* Re: [PATCH RFC] checkpatch: add new cases to commit handling
  2020-11-13 13:37 ` Lukas Bulwahn
@ 2020-11-13 14:01   ` Lukas Bulwahn
  2020-11-13 14:17     ` Dwaipayan Ray
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Bulwahn @ 2020-11-13 14:01 UTC
  To: Dwaipayan Ray
  Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List

On Fri, Nov 13, 2020 at 2:37 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> >
> > Commit extraction in checkpatch fails in some cases.
> > One of the most common false positives is a split line
> > between "commit" and the git SHA of the commit.
> >
> > Improve commit handling to reduce false positives.
> >
> > Improvements:
> > - handle split line between commit and git SHA of commit.
> > - fix handling of split commit description.
> >
> > A quick evaluation of 50k commits from v5.4 showed that
> > the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> > of these were split lines between commit and its hash.
> >
>
> Can you send me the file of the evaluation, e.g., all contexts (two
> lines above and two lines below) around the warned line in the commits
> where the GIT_COMMIT_ID dropped?
>
> Then, I can do a quick sanity check as well.
>

Thanks, Dwaipayan; I checked the file you sent off-list, and it looks
right not to report those cases.

Maybe we can now check the remaining 900 cases once again; are they
all true positives or is there still a big false positive class?

Lukas


* Re: [PATCH RFC] checkpatch: add new cases to commit handling
  2020-11-13 14:01   ` Lukas Bulwahn
@ 2020-11-13 14:17     ` Dwaipayan Ray
  0 siblings, 0 replies; 4+ messages in thread
From: Dwaipayan Ray @ 2020-11-13 14:17 UTC
  To: Lukas Bulwahn
  Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List

On Fri, Nov 13, 2020 at 7:31 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 2:37 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> >
> > On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> > >
> > > Commit extraction in checkpatch fails in some cases.
> > > One of the most common false positives is a split line
> > > between "commit" and the git SHA of the commit.
> > >
> > > Improve commit handling to reduce false positives.
> > >
> > > Improvements:
> > > - handle split line between commit and git SHA of commit.
> > > - fix handling of split commit description.
> > >
> > > A quick evaluation of 50k commits from v5.4 showed that
> > > the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> > > of these were split lines between commit and its hash.
> > >
> >
> > Can you send me the file of the evaluation, e.g., all contexts (two
> > lines above and two lines below) around the warned line in the commits
> > where the GIT_COMMIT_ID dropped?
> >
> > Then, I can do a quick sanity check as well.
> >
>
> Thanks, Dwaipayan; I checked the file you sent off-list, and it looks
> right not to report those cases.
>
> Maybe we can now check the remaining 900 cases once again; are they
> all true positives or is there still a big false positive class?
>
> Lukas

Hi,
I went roughly through the list, and most of them are true positives.
But two particular cases may be false positives:

1) The References: tag (I don't know whether it is a proper convention).
There were about 50 of these, for example:

References: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")

But usage is not uniform: some commits also use this tag to refer to links.

2) Quotes inside the commit title (apart from the main enclosing quotes).
I think checkpatch, by design, doesn't handle this case.

Thanks,
Dwaipayan.
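
A minimal sketch of why the two cases above are not picked up, assuming
only the first pattern visible in the patched block. This is standalone
Python, not checkpatch code; extract_title is a hypothetical helper, and the
hash 0123456789ab with its title containing inner quotes is invented for
illustration.

```python
import re

# First pattern from the patched block: the word "commit", a hex id of five
# or more characters, then ("title") where the title has no double quotes.
COMMIT_REF = re.compile(r'\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)', re.IGNORECASE)


def extract_title(line):
    """Return the quoted commit title, or None when the line does not match."""
    match = COMMIT_REF.search(line)
    return match.group(1) if match else None


# Canonical form: matches.
plain = 'commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")'
# Case 1: a References: tag has no "commit" keyword before the hash.
references_line = 'References: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")'
# Case 2: inner quotes end the [^"]+ run before the closing ").
inner_quotes = 'commit 0123456789ab ("foo: rename "bar" to "baz"")'

for sample in (plain, references_line, inner_quotes):
    print(repr(extract_title(sample)))
```

Only the first sample yields a title; the other two return None, so any
handling of them would need either a tag-specific rule or a pattern that
tolerates nested quotes.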

