* [PATCH] scripts: Report 'suspicious' comments
@ 2020-08-27 15:13 Mohammed Billoo
2020-09-09 14:05 ` [linux-safety] " Lukas Bulwahn
0 siblings, 1 reply; 5+ messages in thread
From: Mohammed Billoo @ 2020-08-27 15:13 UTC (permalink / raw)
To: linux-safety; +Cc: skhan, Mohammed Billoo
This perl script attempts to mitigate CWE-546
(https://cwe.mitre.org/data/definitions/546.html), which identifies code
with comments that suggest that code is incomplete. This script was
tested against the kernel, and the following is a snippet of the
output that was generated. The output was verified by confirming that
the specified file does indeed have that string at the specified line.
./arch/arm/include/asm/pgtable.h contains FIXME on line 316
./arch/arm/include/debug/imx.S contains FIXME on line 14
./arch/arm/kernel/entry-header.S contains BUG on line 71
./arch/arm/kernel/fiq.c contains FIXME on line 72
Signed-off-by: Mohammed Billoo <mab@mab-labs.com>
---
Makefile | 8 +++++++-
| 35 +++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+), 1 deletion(-)
create mode 100644 scripts/checkcomment.pl
diff --git a/Makefile b/Makefile
index f21168154160..c84b8bc5c18e 100644
--- a/Makefile
+++ b/Makefile
@@ -264,7 +264,7 @@ no-dot-config-targets := $(clean-targets) \
cscope gtags TAGS tags help% %docs check% coccicheck \
$(version_h) headers headers_% archheaders archscripts \
%asm-generic kernelversion %src-pkg dt_binding_check \
- outputmakefile
+ outputmakefile commentcheck
no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease
single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/
@@ -1575,6 +1575,7 @@ help:
@echo ' export_report - List the usages of all exported symbols'
@echo ' headerdep - Detect inclusion cycles in headers'
@echo ' coccicheck - Check with Coccinelle'
+ @echo ' commentcheck - Check and report suspicious comments'
@echo ''
@echo 'Tools:'
@echo ' nsdeps - Generate missing symbol namespace dependencies'
@@ -1842,6 +1843,11 @@ versioncheck:
-name '*.[hcS]' -type f -print | sort \
| xargs $(PERL) -w $(srctree)/scripts/checkversion.pl
+commentcheck:
+ find $(srctree)/* $(RCS_FIND_IGNORE) \
+ -name '*.[hcS]' -type f -print | sort \
+ | xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl
+
coccicheck:
$(Q)$(BASH) $(srctree)/scripts/$@
--git a/scripts/checkcomment.pl b/scripts/checkcomment.pl
new file mode 100644
index 000000000000..22fd77bc75d1
--- /dev/null
+++ b/scripts/checkcomment.pl
@@ -0,0 +1,35 @@
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
+# (c) 2020, Mohammed Billoo (mab@mab-labs.com)
+#
+# This script checks for any keywords outlined in CWE-546
+# (https://cwe.mitre.org/data/definitions/546.html)
+# and simply reports them to the user. It's up to the user
+# to take any further actions.
+
+use strict;
+
+my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK');
+my @mismatch_keywords = ('BUG\(\)');
+
+foreach my $file (@ARGV) {
+ my $i = 1;
+ open(my $f, '<', $file)
+ or die "Cannot open $file: $!\n";
+
+ while (my $line = <$f>) {
+ foreach my $keyword (@keywords) {
+ if ($line =~ /\b$keyword\b/) {
+ foreach my $mismatch_keyword (@mismatch_keywords) {
+ if ($line =~ /$mismatch_keyword/) {}
+ else {
+ print "$file contains $keyword on line $i\n";
+ }
+ }
+ }
+ }
+
+ $i++;
+ }
+}
--
2.17.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [linux-safety] [PATCH] scripts: Report 'suspicious' comments
2020-08-27 15:13 [PATCH] scripts: Report 'suspicious' comments Mohammed Billoo
@ 2020-09-09 14:05 ` Lukas Bulwahn
2020-09-09 23:45 ` Mohammed Billoo
0 siblings, 1 reply; 5+ messages in thread
From: Lukas Bulwahn @ 2020-09-09 14:05 UTC (permalink / raw)
To: Mohammed Billoo; +Cc: linux-safety, skhan
On Thu, 27 Aug 2020, Mohammed Billoo wrote:
> This perl script attempts to mitigate CWE-546
> (https://cwe.mitre.org/data/definitions/546.html), which identifies code
> with comments that suggest that code is incomplete. This script was
> tested against the kernel, and the following is a snippet of the
> output that was generated. The output was verified by confirming that
> the specified file does indeed have that string at the specified line.
>
> ./arch/arm/include/asm/pgtable.h contains FIXME on line 316
> ./arch/arm/include/debug/imx.S contains FIXME on line 14
> ./arch/arm/kernel/entry-header.S contains BUG on line 71
> ./arch/arm/kernel/fiq.c contains FIXME on line 72
>
Okay, easy things first:
The patch applies on v5.9-rc4 and the script runs. :)
Now, the challenging parts:
- I think your output should follow in its format the format of other
"standard" tools that other tools use, e.g., look how gcc and clang
compiler emit warnings, or how checkpatch.pl emits warnings.
We should try to imitate their output format for this tool.
Also, I would like to see the line which was warned about.
- Second, we then need to look into true and false positives here.
BUG appears in intended error messages, in DEBUG, in functions, such as
BUG, BUG_ON. So we need to collect the context around BUG and see to
improve the pattern to at least reduce the false positives.
- Third, we then need have suitable ways to aggregate the findings to know
which places are known to have many such findings and hence deserve some
kind of special care.
And then, we need a lot of help to make proper evaluations what to do.
Lukas
> Signed-off-by: Mohammed Billoo <mab@mab-labs.com>
> ---
> Makefile | 8 +++++++-
> scripts/checkcomment.pl | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 42 insertions(+), 1 deletion(-)
> create mode 100644 scripts/checkcomment.pl
>
> diff --git a/Makefile b/Makefile
> index f21168154160..c84b8bc5c18e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -264,7 +264,7 @@ no-dot-config-targets := $(clean-targets) \
> cscope gtags TAGS tags help% %docs check% coccicheck \
> $(version_h) headers headers_% archheaders archscripts \
> %asm-generic kernelversion %src-pkg dt_binding_check \
> - outputmakefile
> + outputmakefile commentcheck
> no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease
> single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/
>
> @@ -1575,6 +1575,7 @@ help:
> @echo ' export_report - List the usages of all exported symbols'
> @echo ' headerdep - Detect inclusion cycles in headers'
> @echo ' coccicheck - Check with Coccinelle'
> + @echo ' commentcheck - Check and report suspicious comments'
> @echo ''
> @echo 'Tools:'
> @echo ' nsdeps - Generate missing symbol namespace dependencies'
> @@ -1842,6 +1843,11 @@ versioncheck:
> -name '*.[hcS]' -type f -print | sort \
> | xargs $(PERL) -w $(srctree)/scripts/checkversion.pl
>
> +commentcheck:
> + find $(srctree)/* $(RCS_FIND_IGNORE) \
> + -name '*.[hcS]' -type f -print | sort \
> + | xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl
> +
> coccicheck:
> $(Q)$(BASH) $(srctree)/scripts/$@
>
> diff --git a/scripts/checkcomment.pl b/scripts/checkcomment.pl
> new file mode 100644
> index 000000000000..22fd77bc75d1
> --- /dev/null
> +++ b/scripts/checkcomment.pl
> @@ -0,0 +1,35 @@
> +#!/usr/bin/env perl
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# (c) 2020, Mohammed Billoo (mab@mab-labs.com)
> +#
> +# This script checks for any keywords outlined in CWE-546
> +# (https://cwe.mitre.org/data/definitions/546.html)
> +# and simply reports them to the user. It's up to the user
> +# to take any further actions.
> +
> +use strict;
> +
> +my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK');
> +my @mismatch_keywords = ('BUG\(\)');
> +
> +foreach my $file (@ARGV) {
> + my $i = 1;
> + open(my $f, '<', $file)
> + or die "Cannot open $file: $!\n";
> +
> + while (my $line = <$f>) {
> + foreach my $keyword (@keywords) {
> + if ($line =~ /\b$keyword\b/) {
> + foreach my $mismatch_keyword (@mismatch_keywords) {
> + if ($line =~ /$mismatch_keyword/) {}
> + else {
> + print "$file contains $keyword on line $i\n";
> + }
> + }
> + }
> + }
> +
> + $i++;
> + }
> +}
> --
> 2.17.1
>
>
>
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [linux-safety] [PATCH] scripts: Report 'suspicious' comments
2020-09-09 14:05 ` [linux-safety] " Lukas Bulwahn
@ 2020-09-09 23:45 ` Mohammed Billoo
2020-09-10 6:42 ` Lukas Bulwahn
0 siblings, 1 reply; 5+ messages in thread
From: Mohammed Billoo @ 2020-09-09 23:45 UTC (permalink / raw)
To: Lukas Bulwahn; +Cc: linux-safety, Shuah Khan
> - Second, we then need to look into true and false positives here.
> BUG appears in intended error messages, in DEBUG, in functions, such as
> BUG, BUG_ON. So we need to collect the context around BUG and see to
> improve the pattern to at least reduce the false positives.
>
I thought this was addressed in the Perl script. Did you find
instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
anything else that is outside of a comment were ignored)?
> - Third, we then need have suitable ways to aggregate the findings to know
> which places are known to have many such findings and hence deserve some
> kind of special care.
>
Do you think CodeChecker would be suitable for this?
> And then, we need a lot of help to make proper evaluations what to do.
Can you elaborate on this? I don't understand what you mean here.
--
Mohammed A Billoo
Founder
MAB Labs, LLC
www.mab-labs.com
201-338-2022
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [linux-safety] [PATCH] scripts: Report 'suspicious' comments
2020-09-09 23:45 ` Mohammed Billoo
@ 2020-09-10 6:42 ` Lukas Bulwahn
2020-09-10 11:55 ` Mohammed Billoo
0 siblings, 1 reply; 5+ messages in thread
From: Lukas Bulwahn @ 2020-09-10 6:42 UTC (permalink / raw)
To: Mohammed Billoo; +Cc: Lukas Bulwahn, linux-safety, Shuah Khan
On Wed, 9 Sep 2020, Mohammed Billoo wrote:
> > - Second, we then need to look into true and false positives here.
> > BUG appears in intended error messages, in DEBUG, in functions, such as
> > BUG, BUG_ON. So we need to collect the context around BUG and see to
> > improve the pattern to at least reduce the false positives.
> >
> I thought this was addressed in the Perl script. Did you find
> instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
> anything else that is outside of a comment were ignored)?
>
Yes, I did when scanning through the instances.
Let me consider these 10 examples here:
grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10
./arch/x86/entry/entry_32.S contains BUG on line 155
./arch/x86/include/asm/cpufeatures.h contains BUG on line 381
./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90
./arch/x86/kernel/kprobes/core.c contains BUG on line 655
./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
./arch/x86/kernel/paravirt.c contains BUG on line 128
./arch/x86/kernel/traps.c contains BUG on line 225
./arch/x86/kernel/traps.c contains BUG on line 314
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
Case:
./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n",
unexpected_testcase_failures, testcase_total);
Here BUG is within an intended printk message.
Case:
./arch/x86/kernel/traps.c contains BUG on line 314
printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
(void *)fault_address, current->stack,
(char *)current->stack + THREAD_SIZE - 1);
Here BUG is within an intended printk message.
Case:
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
pr_err("%s: %p 0->BUG\n", __func__, spte);
Here BUG is within an intended pr_err message.
Case:
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
pr_err("%s: %p 1->BUG\n", __func__, spte);
Here BUG is within an intended pr_err message.
Summary:
10 examples looked at, 4 cases where it is part of an error message.
At least in six cases, it needs more thought why it might not be
relevant :)
> > - Third, we then need have suitable ways to aggregate the findings to know
> > which places are known to have many such findings and hence deserve some
> > kind of special care.
> >
> Do you think CodeChecker would be suitable for this?
>
Well, I think we can record and track findings with CodeChecker if the
output from the tool has a suitable format to parse back.
The aggregations I have in mind are a bit orthogonal, though.
For example:
How "bad" is my architecture?
grep "\./arch" commentcheck_on_v5.9-rc4 | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr
159 powerpc
113 x86
109 arm
48 parisc
42 mips
28 alpha
24 microblaze
20 sh
18 ia64
16 m68k
15 sparc
15 openrisc
13 arc
10 xtensa
10 um
10 arm64
7 s390
7 riscv
5 nds32
3 hexagon
3 h8300
1 nios2
1 csky
1 c6x
We want to provide a tool that can guide a user through this data your
script collects. Some way to show this data in a reasonable aggregation
can help (not that I am suggesting that this aggregation above is
reasonable, but it is a first example I could quickly hack together, but
maybe it is even reasonable?).
> > And then, we need a lot of help to make proper evaluations what to do.
> Can you elaborate on this? I don't understand what you mean here.
>
The statistics above shows there are 113 issues in x86; now, if you would
ever want to use that architecture for anything that matters to you, I
guess you need to understand all those 113 issues. That needs a lot of
help from others that understand the code and the issue, right?
Collecting data is a good first step, but there is still a long way ahead.
Lukas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [linux-safety] [PATCH] scripts: Report 'suspicious' comments
2020-09-10 6:42 ` Lukas Bulwahn
@ 2020-09-10 11:55 ` Mohammed Billoo
0 siblings, 0 replies; 5+ messages in thread
From: Mohammed Billoo @ 2020-09-10 11:55 UTC (permalink / raw)
To: Lukas Bulwahn; +Cc: linux-safety, Shuah Khan
Thanks for the details. I will update the script and submit a revised
version of the patch to address your comments.
Thanks
On Thu, Sep 10, 2020 at 2:42 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
>
>
> On Wed, 9 Sep 2020, Mohammed Billoo wrote:
>
> > > - Second, we then need to look into true and false positives here.
> > > BUG appears in intended error messages, in DEBUG, in functions, such as
> > > BUG, BUG_ON. So we need to collect the context around BUG and see to
> > > improve the pattern to at least reduce the false positives.
> > >
> > I thought this was addressed in the Perl script. Did you find
> > instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
> > anything else that is outside of a comment were ignored)?
> >
>
> Yes, I did when scanning through the instances.
>
> Let me consider these 10 examples here:
>
> grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10
>
> ./arch/x86/entry/entry_32.S contains BUG on line 155
> ./arch/x86/include/asm/cpufeatures.h contains BUG on line 381
> ./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90
> ./arch/x86/kernel/kprobes/core.c contains BUG on line 655
> ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
> ./arch/x86/kernel/paravirt.c contains BUG on line 128
> ./arch/x86/kernel/traps.c contains BUG on line 225
> ./arch/x86/kernel/traps.c contains BUG on line 314
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
>
>
> Case:
>
> ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
>
> printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n",
> unexpected_testcase_failures, testcase_total);
>
>
> Here BUG is within an intended printk message.
>
>
> Case:
>
> ./arch/x86/kernel/traps.c contains BUG on line 314
>
>
> printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
> (void *)fault_address, current->stack,
> (char *)current->stack + THREAD_SIZE - 1);
>
> Here BUG is within an intended printk message.
>
>
> Case:
>
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
>
> pr_err("%s: %p 0->BUG\n", __func__, spte);
>
> Here BUG is within an intended pr_err message.
>
>
> Case:
>
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
>
> pr_err("%s: %p 1->BUG\n", __func__, spte);
>
> Here BUG is within an intended pr_err message.
>
>
> Summary:
>
> 10 examples looked at, 4 cases where it is part of an error message.
>
> At least in six cases, it needs more thought why it might not be
> relevant :)
>
> > > - Third, we then need have suitable ways to aggregate the findings to know
> > > which places are known to have many such findings and hence deserve some
> > > kind of special care.
> > >
> > Do you think CodeChecker would be suitable for this?
> >
>
> Well, I think we can record and track findings with CodeChecker if the
> output from the tool has a suitable format to parse back.
>
> The aggregations I have in mind are a bit orthogonal, though.
>
> For example:
>
> How "bad" is my architecture?
>
> grep "\./arch" commentcheck_on_v5.9-rc4 | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr
>
> 159 powerpc
> 113 x86
> 109 arm
> 48 parisc
> 42 mips
> 28 alpha
> 24 microblaze
> 20 sh
> 18 ia64
> 16 m68k
> 15 sparc
> 15 openrisc
> 13 arc
> 10 xtensa
> 10 um
> 10 arm64
> 7 s390
> 7 riscv
> 5 nds32
> 3 hexagon
> 3 h8300
> 1 nios2
> 1 csky
> 1 c6x
>
> We want to provide a tool that can guide a user through this data your
> script collects. Some way to show this data in a reasonable aggregation
> can help (not that I am suggesting that this aggregation above is
> reasonable, but it is a first example I could quickly hack together, but
> maybe it is even reasonable?).
>
> > > And then, we need a lot of help to make proper evaluations what to do.
> > Can you elaborate on this? I don't understand what you mean here.
> >
>
> The statistics above shows there are 113 issues in x86; now, if you would
> ever want to use that architecture for anything that matters to you, I
> guess you need to understand all those 113 issues. That needs a lot of
> help from others that understand the code and the issue, right?
>
> Collecting data is a good first step, but there is still a long way ahead.
>
> Lukas
--
Mohammed A Billoo
Founder
MAB Labs, LLC
www.mab-labs.com
201-338-2022
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-09-10 11:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-27 15:13 [PATCH] scripts: Report 'suspicious' comments Mohammed Billoo
2020-09-09 14:05 ` [linux-safety] " Lukas Bulwahn
2020-09-09 23:45 ` Mohammed Billoo
2020-09-10 6:42 ` Lukas Bulwahn
2020-09-10 11:55 ` Mohammed Billoo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).