* [PATCH RFC] checkpatch: add new cases to commit handling
From: Dwaipayan Ray @ 2020-11-13 12:31 UTC
To: joe; +Cc: linux-kernel-mentees, dwaipayanray1, linux-kernel, lukas.bulwahn
Commit extraction in checkpatch fails in some cases.
One of the most common false positives is a split line
between "commit" and the git SHA of the commit.
Improve commit handling to reduce false positives.
Improvements:
- handle split line between commit and git SHA of commit.
- fix handling of split commit description.
A quick evaluation of 50k commits from v5.4 showed that
the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
of these were split lines between commit and its hash.
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
scripts/checkpatch.pl | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 024514946bed..f5ba2beac008 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2990,6 +2990,16 @@ sub process {
if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)/i) {
$orig_desc = $1;
$hasparens = 1;
+ } elsif ($line =~ /^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)/i &&
+ defined $rawlines[$linenr-2] &&
+ $rawlines[$linenr-2] =~ /\bcommit\s*$/i) {
+ $line =~ /^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)/i;
+ $orig_desc = $1;
+ $hasparens = 1;
+ $space = 0;
+ $short = 0 if ($line =~ /\b[0-9a-f]{12,40}/i);
+ $long = 1 if ($line =~ /\b[0-9a-f]{41,}/i);
+ $case = 0 if ($line =~ /\b[0-9a-f]{5,40}[^A-F]/ && $rawlines[$linenr-2] =~ /\b[Cc]ommit\s*$/);
} elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s*$/i &&
defined $rawlines[$linenr] &&
$rawlines[$linenr] =~ /^\s*\("([^"]+)"\)/) {
@@ -3001,7 +3011,9 @@ sub process {
$line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$/i;
$orig_desc = $1;
$rawlines[$linenr] =~ /^\s*([^"]+)"\)/;
- $orig_desc .= " " . $1;
+ my $split_desc = $1;
+ $split_desc = " $split_desc" if ($line =~ /[\w\,\.]$/);
+ $orig_desc .= $split_desc;
$hasparens = 1;
}
--
2.27.0
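The two changes in the patch above can be tried outside checkpatch with a minimal sketch. It is a Python port of the patch's regexes, not checkpatch code: the helper names `split_commit_title` and `join_split_title` are hypothetical, and `lines[idx - 1]` stands in for the previous raw line (`$rawlines[$linenr-2]` in the patch).

```python
import re

# Patterns copied from the patch and ported to Python's re module.
SHA_AND_TITLE = re.compile(r'^\s*[0-9a-f]{5,}\s+\("([^"]+)"\)', re.IGNORECASE)
COMMIT_AT_EOL = re.compile(r'\bcommit\s*$', re.IGNORECASE)


def split_commit_title(lines, idx):
    """Return the quoted title when lines[idx] starts with a SHA and the
    previous line ends with the word "commit"; otherwise return None."""
    if idx == 0:
        return None
    match = SHA_AND_TITLE.search(lines[idx])
    if match and COMMIT_AT_EOL.search(lines[idx - 1]):
        return match.group(1)
    return None


def join_split_title(first_part, second_part):
    """Join a title that was wrapped across two lines. As in the second hunk,
    a space is inserted only when the first part ends in a word character,
    a comma, or a period."""
    if re.search(r'[\w,.]$', first_part):
        return first_part + " " + second_part
    return first_part + second_part


message = [
    "This fixes a regression introduced by commit",
    '22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") in the scheduler.',
]
print(split_commit_title(message, 1))
print(join_split_title("net: fix a", "race"))
print(join_split_title("drivers/gpu/", "drm: foo"))
```

The sketch leaves out the `$short`, `$long`, `$case` and `$space` bookkeeping that the patch also sets in the new branch.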
* Re: [PATCH RFC] checkpatch: add new cases to commit handling
From: Lukas Bulwahn @ 2020-11-13 13:37 UTC
To: Dwaipayan Ray
Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List
On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
>
> Commit extraction in checkpatch fails in some cases.
> One of the most common false positives is a split line
> between "commit" and the git SHA of the commit.
>
> Improve commit handling to reduce false positives.
>
> Improvements:
> - handle split line between commit and git SHA of commit.
> - fix handling of split commit description.
>
> A quick evaluation of 50k commits from v5.4 showed that
> the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> of these were split lines between commit and its hash.
>
Can you send me the file of the evaluation, e.g., all contexts (two
lines above and two lines below) around the warned line in the commits
where the GIT_COMMIT_ID dropped?
Then, I can do a quick sanity check as well.
Lukas
* Re: [PATCH RFC] checkpatch: add new cases to commit handling
From: Lukas Bulwahn @ 2020-11-13 14:01 UTC
To: Dwaipayan Ray
Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List
On Fri, Nov 13, 2020 at 2:37 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> >
> > Commit extraction in checkpatch fails in some cases.
> > One of the most common false positives is a split line
> > between "commit" and the git SHA of the commit.
> >
> > Improve commit handling to reduce false positives.
> >
> > Improvements:
> > - handle split line between commit and git SHA of commit.
> > - fix handling of split commit description.
> >
> > A quick evaluation of 50k commits from v5.4 showed that
> > the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> > of these were split lines between commit and its hash.
> >
>
> Can you send me the file of the evaluation, e.g., all contexts (two
> lines above and two lines below) around the warned line in the commits
> where the GIT_COMMIT_ID dropped?
>
> Then, I can do a quick sanity check as well.
>
Thanks, Dwaipayan; I checked your file sent off-list and it looks good
to not report on those cases.
Maybe we can now check the remaining 900 cases once again; are they
all true positives or is there still a big false positive class?
Lukas
* Re: [PATCH RFC] checkpatch: add new cases to commit handling
From: Dwaipayan Ray @ 2020-11-13 14:17 UTC
To: Lukas Bulwahn
Cc: Joe Perches, linux-kernel-mentees, Linux Kernel Mailing List
On Fri, Nov 13, 2020 at 7:31 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 2:37 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> >
> > On Fri, Nov 13, 2020 at 1:31 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> > >
> > > Commit extraction in checkpatch fails in some cases.
> > > One of the most common false positives is a split line
> > > between "commit" and the git SHA of the commit.
> > >
> > > Improve commit handling to reduce false positives.
> > >
> > > Improvements:
> > > - handle split line between commit and git SHA of commit.
> > > - fix handling of split commit description.
> > >
> > > A quick evaluation of 50k commits from v5.4 showed that
> > > the GIT_COMMIT_ID errors dropped from 1032 to 897. Most
> > > of these were split lines between commit and its hash.
> > >
> >
> > Can you send me the file of the evaluation, e.g., all contexts (two
> > lines above and two lines below) around the warned line in the commits
> > where the GIT_COMMIT_ID dropped?
> >
> > Then, I can do a quick sanity check as well.
> >
>
> Thanks, Dwaipayan; I checked your file sent off-list and it looks good
> to not report on those cases.
>
> Maybe we can now check the remaining 900 cases once again; are they
> all true positives or is there still a big false positive class?
>
> Lukas
Hi,
I went roughly through the list, and most of them are true positives.
But there are two particular cases which may be false positives:
1) References: tag. (I don't know if it is a proper convention.)
There were about 50 of these, for example:
References: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
But usage is not uniform; some commits also use this tag to refer to links.
2) Quotes inside the commit title (apart from the main enclosing quotes).
I think by design checkpatch doesn't handle this case.
Thanks,
Dwaipayan.
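A rough sketch of why these two cases fall outside the title extraction, assuming only the one-line pattern from the patch context (ported to Python's `re`). The `nested_quotes` sample, including its SHA and title, is made up for illustration, and the rest of checkpatch's GIT_COMMIT_ID logic is not modelled here.

```python
import re

# One-line pattern from the patch context, ported to Python's re module.
ONE_LINE = re.compile(r'\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)', re.IGNORECASE)

samples = {
    # Conventional reference: matched.
    "plain": 'Fixes commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")',
    # References: tag from the thread: no "commit" keyword before the SHA.
    "references_tag": 'References: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")',
    # Hypothetical title with inner quotes: [^"]+ stops at the first inner quote.
    "nested_quotes": 'commit 0123456789ab ("foo: handle "bar" correctly")',
}

results = {name: bool(ONE_LINE.search(text)) for name, text in samples.items()}
for name, matched in results.items():
    print(name, matched)
```

Only the `plain` sample matches. The other two show the pattern failing to extract a title, which is consistent with the two classes described above.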