linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] checkkconfigsymbols.sh: reimplementation in python
@ 2014-09-20 14:15 Valentin Rothberg
  2014-09-21 19:39 ` [PATCH v2] " Valentin Rothberg
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-20 14:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Valentin Rothberg, Stefan Hengelein

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
The script is also hard to read and understand, and is thereby difficult
to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report false positives
    (b) additionally detects broken references in Kconfig and Kbuild files
    (c) is up to 75 times faster than the shell script

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and detects 572 broken references in Linux v3.17-rc1;
49 additional references of which 17 are located in Kconfig files.

Moreover, we intentionally include references in C comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
 scripts/checkkconfigsymbols.py | 134 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 ------------------
 2 files changed, 134 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..0157426
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+import sys
+
+# REGEX EXPRESSIONS
+OPERATORS = r"[|\s|&|\(|\)|\||\!]"
+FEATURE = r"[\w]*[A-Z]+[\w]*"
+EXPR = OPERATORS + r"*([" + FEATURE + r"]*)*" + OPERATORS + r"*"
+IF_STMT = r"^\s*if\s+" + EXPR + r""
+SELECT_STMT = r"^\s*select\s+" + EXPR + r""
+DEPENDS_STMT = r"^\s*depends\s+on\s+" + EXPR + r""
+CONFIG_STMT = r"^\s*[menu]*config\s+(" + FEATURE + r")"
+
+# REGEX OBJECTS
+REGEX_EXPR = re.compile(EXPR)
+REGEX_KCONFIG = re.compile(r"Kconfig[\.\w+]*")
+REGEX_SOURCE = re.compile(r"\.[cSh]$")
+REGEX_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*")
+REGEX_KCONFIG_HELP = re.compile(r"[\s|-]*help[\s|-]*")
+REGEX_CONFIG_STMT = re.compile(CONFIG_STMT)
+REGEX_DEPENDS_STMT = re.compile(DEPENDS_STMT)
+REGEX_SELECT_STMT = re.compile(SELECT_STMT)
+REGEX_IF_STMT = re.compile(IF_STMT)
+REGEX_FEATURES = re.compile(r"[\W]+CONFIG_(" + FEATURE + r")[.]*")
+
+
+def main():
+    """Main function of this module."""
+    output = []
+    kconfig_files = []
+    source_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    for root, _, files in os.walk("."):
+        for fil in files:
+            if REGEX_KCONFIG.match(fil):
+                kconfig_files.append(os.path.join(root, fil))
+            elif REGEX_SOURCE.search(fil) or REGEX_MAKE.match(fil):
+                source_files.append(os.path.join(root, fil))
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    print "File list\tundefined symbol used"
+    for feature in referenced_features.keys():
+        if feature not in defined_features:
+            files = referenced_features.get(feature)
+            output.append("%s:\t%s" % (", ".join(files), feature))
+    for out in sorted(output):
+        print out
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_FEATURES.findall(line)
+        for feat in features:
+            if feat.endswith("_MODULE"):
+                feat = feat[:-len("_MODULE")]
+            paths = referenced_features.get(feat, set())
+            paths.add(sfile)
+            referenced_features[feat] = paths
+
+
+def get_items_in_line(line):
+    """Return mentioned kconfig items in @line."""
+    return REGEX_EXPR.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    skip = False
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore right side of comments
+
+        definition = REGEX_CONFIG_STMT.findall(line)
+        if definition:
+            defined_features.add(definition[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_DEPENDS_STMT.match(line) or \
+                REGEX_SELECT_STMT.match(line) or \
+                REGEX_IF_STMT.match(line):
+            items = get_items_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                items.extend(get_items_in_line(line))
+            for item in set(items):
+                paths = referenced_features.get(kfile, set())
+                paths.add(kfile)
+                referenced_features[item] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
@ 2014-09-21 19:39 ` Valentin Rothberg
  2014-09-21 19:53 ` [PATCH v3] " Valentin Rothberg
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-21 19:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: gregkh, valentinrothberg, stefan.hengelein

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
The script is also hard to read and understand, and is thereby difficult
to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report false positives
    (b) additionally detects broken references in Kconfig and Kbuild files
    (c) is up to 75 times faster than the shell script

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 570 broken references in Linux v3.17-rc1;
49 additional reports of which 16 are located in Kconfig files.

Moreover, we intentionally include references in C comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Changelog:
v2: Fix of regural expressions

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
 scripts/checkkconfigsymbols.py | 134 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 ------------------
 2 files changed, 134 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..0157426
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+import sys
+
+# REGEX EXPRESSIONS
+OPERATORS = r"[|\s|&|\(|\)|\||\!]"
+FEATURE = r"[\w]*[A-Z]+[\w]*"
+EXPR = OPERATORS + r"*([" + FEATURE + r"]*)*" + OPERATORS + r"*"
+IF_STMT = r"^\s*if\s+" + EXPR + r""
+SELECT_STMT = r"^\s*select\s+" + EXPR + r""
+DEPENDS_STMT = r"^\s*depends\s+on\s+" + EXPR + r""
+CONFIG_STMT = r"^\s*[menu]*config\s+(" + FEATURE + r")"
+
+# REGEX OBJECTS
+REGEX_EXPR = re.compile(EXPR)
+REGEX_KCONFIG = re.compile(r"Kconfig[\.\w+]*")
+REGEX_SOURCE = re.compile(r"\.[cSh]$")
+REGEX_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*")
+REGEX_KCONFIG_HELP = re.compile(r"[\s|-]*help[\s|-]*")
+REGEX_CONFIG_STMT = re.compile(CONFIG_STMT)
+REGEX_DEPENDS_STMT = re.compile(DEPENDS_STMT)
+REGEX_SELECT_STMT = re.compile(SELECT_STMT)
+REGEX_IF_STMT = re.compile(IF_STMT)
+REGEX_FEATURES = re.compile(r"[\W]+CONFIG_(" + FEATURE + r")[.]*")
+
+
+def main():
+    """Main function of this module."""
+    output = []
+    kconfig_files = []
+    source_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    for root, _, files in os.walk("."):
+        for fil in files:
+            if REGEX_KCONFIG.match(fil):
+                kconfig_files.append(os.path.join(root, fil))
+            elif REGEX_SOURCE.search(fil) or REGEX_MAKE.match(fil):
+                source_files.append(os.path.join(root, fil))
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    print "File list\tundefined symbol used"
+    for feature in referenced_features.keys():
+        if feature not in defined_features:
+            files = referenced_features.get(feature)
+            output.append("%s:\t%s" % (", ".join(files), feature))
+    for out in sorted(output):
+        print out
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_FEATURES.findall(line)
+        for feat in features:
+            if feat.endswith("_MODULE"):
+                feat = feat[:-len("_MODULE")]
+            paths = referenced_features.get(feat, set())
+            paths.add(sfile)
+            referenced_features[feat] = paths
+
+
+def get_items_in_line(line):
+    """Return mentioned kconfig items in @line."""
+    return REGEX_EXPR.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    skip = False
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore right side of comments
+
+        definition = REGEX_CONFIG_STMT.findall(line)
+        if definition:
+            defined_features.add(definition[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_DEPENDS_STMT.match(line) or \
+                REGEX_SELECT_STMT.match(line) or \
+                REGEX_IF_STMT.match(line):
+            items = get_items_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                items.extend(get_items_in_line(line))
+            for item in set(items):
+                paths = referenced_features.get(kfile, set())
+                paths.add(kfile)
+                referenced_features[item] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
  2014-09-21 19:39 ` [PATCH v2] " Valentin Rothberg
@ 2014-09-21 19:53 ` Valentin Rothberg
  2014-09-21 21:28   ` Paul Bolle
  2014-09-22 14:58 ` [PATCH v4] " Valentin Rothberg
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-21 19:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: gregkh, valentinrothberg, stefan.hengelein

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
The script is also hard to read and understand, and is thereby difficult
to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report false positives
    (b) additionally detects broken references in Kconfig and Kbuild files
    (c) is up to 75 times faster than the shell script

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 570 broken references in Linux v3.17-rc1;
49 additional reports of which 16 are located in Kconfig files.

Moreover, we intentionally include references in C comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
Changelog:
v2: Fix of regural expressions
v3: Changelog replacement, and add changes of v2
---
 scripts/checkkconfigsymbols.py | 131 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 -------------------
 2 files changed, 131 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..5c51dd1
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,131 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+
+
+# REGEX EXPRESSIONS
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"\w*[A-Z]{1}\w*"
+FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
+EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
+STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR
+
+# REGEX OBJECTS
+REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
+REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
+REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
+REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")
+
+
+def main():
+    """Main function of this module."""
+    output = []
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    for root, _, files in os.walk("."):
+        for fil in files:
+            if REGEX_FILE_KCONFIG.match(fil):
+                kconfig_files.append(os.path.join(root, fil))
+            elif REGEX_FILE_SOURCE.search(fil) or REGEX_FILE_MAKE.match(fil):
+                source_files.append(os.path.join(root, fil))
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    print "File list\tundefined symbol used"
+    for feature in referenced_features:
+        if feature not in defined_features:
+            files = referenced_features.get(feature)
+            output.append("%s:\t%s" % (", ".join(files), feature))
+    for out in sorted(output):
+        print out
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_CPP_FEATURE.findall(line)
+        for feature in features:
+            if feature.endswith("_MODULE"):
+                feature = feature[:-len("_MODULE")]
+            paths = referenced_features.get(feature, set())
+            paths.add(sfile)
+            referenced_features[feature] = paths
+
+
+def get_features_in_line(line):
+    """Return mentioned kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore right side of comments
+
+        if REGEX_FEATURE_DEF.match(line):
+            feature_def = REGEX_FEATURE.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
--
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-21 19:53 ` [PATCH v3] " Valentin Rothberg
@ 2014-09-21 21:28   ` Paul Bolle
  2014-09-22  7:43     ` Valentin Rothberg
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Bolle @ 2014-09-21 21:28 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: linux-kernel, gregkh, stefan.hengelein

Valentin Rothberg schreef op zo 21-09-2014 om 21:53 [+0200]:
> The scripts/checkkconfigsymbols.sh script searches Kconfig features
> in the source code that are not defined in Kconfig. Such identifiers
> always evaluate to false and are the source of various kinds of bugs.
> However, the shell script is slow and it does not detect such broken
> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
> Furthermore, it generates false positives (4 of 526 in v3.17-rc1).

Curiosity: what are those four false positives?

> The script is also hard to read and understand, and is thereby difficult
> to maintain.
> 
> This patch replaces the shell script with an implementation in Python,
> which:
>     (a) detects the same bugs, but does not report false positives

Depends a bit on the definition of false positives. Ie, the hit for
    ./arch/sh/kernel/head_64.S:	CACHE_

is caused by
     #error preprocessor flag CONFIG_CACHE_... not recognized!

Should that line, and similar lines, be changed?

>     (b) additionally detects broken references in Kconfig and Kbuild files
>     (c) is up to 75 times faster than the shell script
> 
> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> from 3m47s to 0m3s, and reports 570 broken references in Linux v3.17-rc1;
> 49 additional reports of which 16 are located in Kconfig files.
> 
> Moreover, we intentionally include references in C comments, which have been
> ignored until now. Such comments may be leftovers of features that have
> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
> These references can be misleading and should be removed or replaced.

Yes, that's a good idea.

> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
> ---
> Changelog:
> v2: Fix of regural expressions
> v3: Changelog replacement, and add changes of v2
> ---
>  scripts/checkkconfigsymbols.py | 131 +++++++++++++++++++++++++++++++++++++++++
>  scripts/checkkconfigsymbols.sh |  59 -------------------
>  2 files changed, 131 insertions(+), 59 deletions(-)
>  create mode 100644 scripts/checkkconfigsymbols.py
>  delete mode 100755 scripts/checkkconfigsymbols.sh
> 
> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> new file mode 100644
> index 0000000..5c51dd1
> --- /dev/null
> +++ b/scripts/checkkconfigsymbols.py
> @@ -0,0 +1,131 @@
> +#!/usr/bin/env python
> +
> +"""Find Kconfig identifieres that are referenced but not defined."""
> +
> +# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
> +# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> +# more details.
> +
> +
> +import os
> +import re
> +
> +
> +# REGEX EXPRESSIONS
> +OPERATORS = r"&|\(|\)|\||\!"
> +FEATURE = r"\w*[A-Z]{1}\w*"
> +FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
> +EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
> +STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR

                          "depends on" with multiple spaces?
> +
> +# REGEX OBJECTS
> +REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
> +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
> +REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
> +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> +REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
> +REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")

There are a few uses of "-DCONFIG_[...]" in Makefiles. This will miss
those, won't it? That's not bad, per se, but a comment why you're
skipping those might be nice. Or are those caught too, somewhere else?

> +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> +REGEX_KCONFIG_STMT = re.compile(STMT)
> +REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")

Won't "^\s\+(---help---|help)$" do? Might help catch creative variants
of the help statement (we had a few in the past).

> +
> +
> +def main():
> +    """Main function of this module."""
> +    output = []
> +    source_files = []
> +    kconfig_files = []
> +    defined_features = set()
> +    referenced_features = dict()
> +
> +    for root, _, files in os.walk("."):

Does this descend into the .git directory?

> +        for fil in files:
> +            if REGEX_FILE_KCONFIG.match(fil):
> +                kconfig_files.append(os.path.join(root, fil))
> +            elif REGEX_FILE_SOURCE.search(fil) or REGEX_FILE_MAKE.match(fil):
> +                source_files.append(os.path.join(root, fil))
> +
> +    for kfile in kconfig_files:
> +        parse_kconfig_file(kfile, defined_features, referenced_features)
> +
> +    for sfile in source_files:
> +        parse_source_file(sfile, referenced_features)
> +
> +    print "File list\tundefined symbol used"
> +    for feature in referenced_features:
> +        if feature not in defined_features:
> +            files = referenced_features.get(feature)
> +            output.append("%s:\t%s" % (", ".join(files), feature))
> +    for out in sorted(output):
> +        print out
> +
> +
> +def parse_source_file(sfile, referenced_features):
> +    """Parse @sfile for referenced Kconfig features."""
> +    lines = []
> +    with open(sfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for line in lines:
> +        if not "CONFIG_" in line:
> +            continue
> +        features = REGEX_CPP_FEATURE.findall(line)
> +        for feature in features:
> +            if feature.endswith("_MODULE"):
> +                feature = feature[:-len("_MODULE")]

Uses of CONFIG_FOO_MODULE are now reported as uses of CONFIG_FOO. That
might be confusing. 

> +            paths = referenced_features.get(feature, set())
> +            paths.add(sfile)
> +            referenced_features[feature] = paths
> +
> +
> +def get_features_in_line(line):
> +    """Return mentioned kconfig features in @line."""
> +    return REGEX_FEATURE.findall(line)
> +
> +
> +def parse_kconfig_file(kfile, defined_features, referenced_features):
> +    """Parse @kfile and update feature definitions and references."""
> +    lines = []
> +    skip = False
> +
> +    with open(kfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for i in range(len(lines)):
> +        line = lines[i]
> +        line = line.strip('\n')
> +        line = line.split("#")[0]  # Ignore right side of comments
> +
> +        if REGEX_FEATURE_DEF.match(line):
> +            feature_def = REGEX_FEATURE.findall(line)
> +            defined_features.add(feature_def[0])
> +            skip = False
> +        elif REGEX_KCONFIG_HELP.match(line):
> +            skip = True
> +        elif skip:
> +            # Ignore content of help messages
> +            pass
> +        elif REGEX_KCONFIG_STMT.match(line):
> +            features = get_features_in_line(line)
> +            # Multi-line statements
> +            while line.endswith("\\"):
> +                i += 1
> +                line = lines[i]
> +                line = line.strip('\n')
> +                features.extend(get_features_in_line(line))
> +            for feature in set(features):
> +                paths = referenced_features.get(feature, set())
> +                paths.add(kfile)
> +                referenced_features[feature] = paths
> +
> +
> +if __name__ == "__main__":
> +    main()
> diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
> deleted file mode 100755
> index ccb3391..0000000
> --- a/scripts/checkkconfigsymbols.sh
> +++ /dev/null
>[...]

Suggestion for future work: make this git aware. Ie, make things like
    python scripts/checkkconfigsymbols.py v3.17-rc5
    python scripts/checkkconfigsymbols.py next-20140919
    python scripts/checkkconfigsymbols.py $SHA1SUM

produce useful output.


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-21 21:28   ` Paul Bolle
@ 2014-09-22  7:43     ` Valentin Rothberg
  2014-09-22  8:24       ` Paul Bolle
  0 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-22  7:43 UTC (permalink / raw)
  To: Paul Bolle; +Cc: linux-kernel, gregkh, stefan.hengelein, Valentin Rothberg

On dim., 2014-09-21 at 23:28 +0200, Paul Bolle wrote:
> Valentin Rothberg schreef op zo 21-09-2014 om 21:53 [+0200]:
> > The scripts/checkkconfigsymbols.sh script searches Kconfig features
> > in the source code that are not defined in Kconfig. Such identifiers
> > always evaluate to false and are the source of various kinds of bugs.
> > However, the shell script is slow and it does not detect such broken
> > references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
> > Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
> 
> Curiosity: what are those four false positives?

1) /arch/cris/kernel/module.c: 	ETRAX_KMALLOCED_MODULES (defined in
arch/cris/Kconfig)

2) ./lib/Makefile: TEST_MODULE (defined in lib/Kconfig.debug)

3,4) ./include/linux/module.h, ./kernel/module.c: DEBUG_SET_MODULE_RONX
(defined in arch/{s390,arm,x86}/Kconfig.debug)

> > The script is also hard to read and understand, and is thereby difficult
> > to maintain.
> > 
> > This patch replaces the shell script with an implementation in Python,
> > which:
> >     (a) detects the same bugs, but does not report false positives
> 
> Depends a bit on the definition of false positives. Ie, the hit for
>     ./arch/sh/kernel/head_64.S:	CACHE_
> 
> is caused by
>      #error preprocessor flag CONFIG_CACHE_... not recognized!
> 
> Should that line, and similar lines, be changed?

I consider a false positive to actually be defined in Kconfig. The
feature in your example does not really apply to the naming convention
of Kconfig features ("..."), so that our regex does not match it.

However, the regex matches "CONFIG_X86_". I shall change the regex to
not accept strings ending with "_", so that such cases are not reported.

> >     (b) additionally detects broken references in Kconfig and Kbuild files
> >     (c) is up to 75 times faster than the shell script
> > 
> > The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> > from 3m47s to 0m3s, and reports 570 broken references in Linux v3.17-rc1;
> > 49 additional reports of which 16 are located in Kconfig files.
> > 
> > Moreover, we intentionally include references in C comments, which have been
> > ignored until now. Such comments may be leftovers of features that have
> > been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
> > These references can be misleading and should be removed or replaced.
> 
> Yes, that's a good idea.
> 
> > Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
> > Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
> > ---
> > Changelog:
> > v2: Fix of regural expressions
> > v3: Changelog replacement, and add changes of v2
> > ---
> >  scripts/checkkconfigsymbols.py | 131 +++++++++++++++++++++++++++++++++++++++++
> >  scripts/checkkconfigsymbols.sh |  59 -------------------
> >  2 files changed, 131 insertions(+), 59 deletions(-)
> >  create mode 100644 scripts/checkkconfigsymbols.py
> >  delete mode 100755 scripts/checkkconfigsymbols.sh
> > 
> > diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> > new file mode 100644
> > index 0000000..5c51dd1
> > --- /dev/null
> > +++ b/scripts/checkkconfigsymbols.py
> > @@ -0,0 +1,131 @@
> > +#!/usr/bin/env python
> > +
> > +"""Find Kconfig identifieres that are referenced but not defined."""
> > +
> > +# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
> > +# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
> > +#
> > +# This program is free software; you can redistribute it and/or modify it
> > +# under the terms and conditions of the GNU General Public License,
> > +# version 2, as published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope it will be useful, but WITHOUT
> > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> > +# more details.
> > +
> > +
> > +import os
> > +import re
> > +
> > +
> > +# REGEX EXPRESSIONS
> > +OPERATORS = r"&|\(|\)|\||\!"
> > +FEATURE = r"\w*[A-Z]{1}\w*"
> > +FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
> > +EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
> > +STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR
> 
>                           "depends on" with multiple spaces?
> > +
> > +# REGEX OBJECTS
> > +REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
> > +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
> > +REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
> > +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> > +REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
> > +REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")
> 
> There are a few uses of "-DCONFIG_[...]" in Makefiles. This will miss
> those, won't it? That's not bad, per se, but a comment why you're
> skipping those might be nice. Or are those caught too, somewhere else?

I was not aware of such uses, thanks. It seems important to cover them
too. Does this prefix has a certain purpose?


> > +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> > +REGEX_KCONFIG_STMT = re.compile(STMT)
> > +REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")
> 
> Won't "^\s\+(---help---|help)$" do? Might help catch creative variants
> of the help statement (we had a few in the past).

Yes, your regex is more robust. Thanks!

> > +
> > +
> > +def main():
> > +    """Main function of this module."""
> > +    output = []
> > +    source_files = []
> > +    kconfig_files = []
> > +    defined_features = set()
> > +    referenced_features = dict()
> > +
> > +    for root, _, files in os.walk("."):
> 
> Does this descend into the .git directory?

Yes, it does. I will exclude the .git repository in the next version of
this patch.

> > +        for fil in files:
> > +            if REGEX_FILE_KCONFIG.match(fil):
> > +                kconfig_files.append(os.path.join(root, fil))
> > +            elif REGEX_FILE_SOURCE.search(fil) or REGEX_FILE_MAKE.match(fil):
> > +                source_files.append(os.path.join(root, fil))
> > +
> > +    for kfile in kconfig_files:
> > +        parse_kconfig_file(kfile, defined_features, referenced_features)
> > +
> > +    for sfile in source_files:
> > +        parse_source_file(sfile, referenced_features)
> > +
> > +    print "File list\tundefined symbol used"
> > +    for feature in referenced_features:
> > +        if feature not in defined_features:
> > +            files = referenced_features.get(feature)
> > +            output.append("%s:\t%s" % (", ".join(files), feature))
> > +    for out in sorted(output):
> > +        print out
> > +
> > +
> > +def parse_source_file(sfile, referenced_features):
> > +    """Parse @sfile for referenced Kconfig features."""
> > +    lines = []
> > +    with open(sfile, "r") as stream:
> > +        lines = stream.readlines()
> > +
> > +    for line in lines:
> > +        if not "CONFIG_" in line:
> > +            continue
> > +        features = REGEX_CPP_FEATURE.findall(line)
> > +        for feature in features:
> > +            if feature.endswith("_MODULE"):
> > +                feature = feature[:-len("_MODULE")]
> 
> Uses of CONFIG_FOO_MODULE are now reported as uses of CONFIG_FOO. That
> might be confusing. 

OK. I can leave the suffix and only remove it to check if the feature is
defined. Then, the output should be more precise and less confusing.

> > +            paths = referenced_features.get(feature, set())
> > +            paths.add(sfile)
> > +            referenced_features[feature] = paths
> > +
> > +
> > +def get_features_in_line(line):
> > +    """Return mentioned kconfig features in @line."""
> > +    return REGEX_FEATURE.findall(line)
> > +
> > +
> > +def parse_kconfig_file(kfile, defined_features, referenced_features):
> > +    """Parse @kfile and update feature definitions and references."""
> > +    lines = []
> > +    skip = False
> > +
> > +    with open(kfile, "r") as stream:
> > +        lines = stream.readlines()
> > +
> > +    for i in range(len(lines)):
> > +        line = lines[i]
> > +        line = line.strip('\n')
> > +        line = line.split("#")[0]  # Ignore right side of comments
> > +
> > +        if REGEX_FEATURE_DEF.match(line):
> > +            feature_def = REGEX_FEATURE.findall(line)
> > +            defined_features.add(feature_def[0])
> > +            skip = False
> > +        elif REGEX_KCONFIG_HELP.match(line):
> > +            skip = True
> > +        elif skip:
> > +            # Ignore content of help messages
> > +            pass
> > +        elif REGEX_KCONFIG_STMT.match(line):
> > +            features = get_features_in_line(line)
> > +            # Multi-line statements
> > +            while line.endswith("\\"):
> > +                i += 1
> > +                line = lines[i]
> > +                line = line.strip('\n')
> > +                features.extend(get_features_in_line(line))
> > +            for feature in set(features):
> > +                paths = referenced_features.get(feature, set())
> > +                paths.add(kfile)
> > +                referenced_features[feature] = paths
> > +
> > +
> > +if __name__ == "__main__":
> > +    main()
> > diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
> > deleted file mode 100755
> > index ccb3391..0000000
> > --- a/scripts/checkkconfigsymbols.sh
> > +++ /dev/null
> >[...]
> 
> Suggestion for future work: make this git aware. Ie, make things like
>     python scripts/checkkconfigsymbols.py v3.17-rc5
>     python scripts/checkkconfigsymbols.py next-20140919
>     python scripts/checkkconfigsymbols.py $SHA1SUM
> 
> produce useful output.

I am happy about your idea, as I desire to do exactly that. In fact, I
am developing a tool that is checking Git commits for such symbolic
violations and logic violations (i.e., when the precondition of an
#ifdef block is contradictory). However, this tool cannot be integrated
into the kernel for various reasons.

Thank you for giving feedback.

Valentin Rothberg

> 
> 
> Paul Bolle
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-22  7:43     ` Valentin Rothberg
@ 2014-09-22  8:24       ` Paul Bolle
  2014-09-22  8:45         ` Valentin Rothberg
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Bolle @ 2014-09-22  8:24 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: linux-kernel, gregkh, stefan.hengelein

Hi Valentin,

On Mon, 2014-09-22 at 09:43 +0200, Valentin Rothberg wrote:
> On dim., 2014-09-21 at 23:28 +0200, Paul Bolle wrote:
> > Valentin Rothberg schreef op zo 21-09-2014 om 21:53 [+0200]:
> > > Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
> > 
> > Curiosity: what are those four false positives?
> 
> 1) /arch/cris/kernel/module.c: 	ETRAX_KMALLOCED_MODULES (defined in
> arch/cris/Kconfig)

This probably because
    symb_bare=`echo $symb | sed -e 's/_MODULE//'`

in the shell script you removed should read (something untested like):
    symb_bare=`echo $symb | sed -e 's/_MODULE$//'`

> 2) ./lib/Makefile: TEST_MODULE (defined in lib/Kconfig.debug)

TEST_MODULE is an awkward name for a Kconfig symbol. My local script has
it special cased.

> 3,4) ./include/linux/module.h, ./kernel/module.c: DEBUG_SET_MODULE_RONX
> (defined in arch/{s390,arm,x86}/Kconfig.debug)

See above.

> > > This patch replaces the shell script with an implementation in Python,
> > > which:
> > >     (a) detects the same bugs, but does not report false positives
> > 
> > Depends a bit on the definition of false positives. Ie, the hit for
> >     ./arch/sh/kernel/head_64.S:	CACHE_
> > 
> > is caused by
> >      #error preprocessor flag CONFIG_CACHE_... not recognized!
> > 
> > Should that line, and similar lines, be changed?
> 
> I consider a false positive to actually be defined in Kconfig. The
> feature in your example does not really apply to the naming convention
> of Kconfig features ("..."), so that our regex does not match it.

But your python script does report it, doesn't it?

> However, the regex matches "CONFIG_X86_". I shall change the regex to
> not accept strings ending with "_", so that such cases are not reported.

> > > +# REGEX EXPRESSIONS
> > > +OPERATORS = r"&|\(|\)|\||\!"
> > > +FEATURE = r"\w*[A-Z]{1}\w*"
> > > +FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
> > > +EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
> > > +STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR
> > 
> >                           "depends on" with multiple spaces?
> > > +
> > > +# REGEX OBJECTS
> > > +REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
> > > +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")

New observation: this causes the script to skip text files, shell
scripts, etc, doesn't it? 

> > > +REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
> > > +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> > > +REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
> > > +REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")
> > 
> > There are a few uses of "-DCONFIG_[...]" in Makefiles. This will miss
> > those, won't it? That's not bad, per se, but a comment why you're
> > skipping those might be nice. Or are those caught too, somewhere else?
> 
> I was not aware of such uses, thanks. It seems important to cover them
> too. Does this prefix has a certain purpose?

It is, in short, a way to define preprocessor macros from the GCC
command line (see info gcc).

> > > +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> > > +REGEX_KCONFIG_STMT = re.compile(STMT)
> > > +REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")
> > 
> > Won't "^\s\+(---help---|help)$" do? Might help catch creative variants
> > of the help statement (we had a few in the past).
> 
> Yes, your regex is more robust. Thanks!

But it seems I should not have escaped the plus. Please check.


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-22  8:24       ` Paul Bolle
@ 2014-09-22  8:45         ` Valentin Rothberg
  2014-09-22  9:06           ` Paul Bolle
  0 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-22  8:45 UTC (permalink / raw)
  To: Paul Bolle; +Cc: linux-kernel, gregkh, stefan.hengelein, Valentin Rothberg

On lun., 2014-09-22 at 10:24 +0200, Paul Bolle wrote:
> Hi Valentin,
> 
> On Mon, 2014-09-22 at 09:43 +0200, Valentin Rothberg wrote:
> > On dim., 2014-09-21 at 23:28 +0200, Paul Bolle wrote:
> > > Valentin Rothberg schreef op zo 21-09-2014 om 21:53 [+0200]:
> > > > Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
> > > 
> > > Curiosity: what are those four false positives?
> > 
> > 1) /arch/cris/kernel/module.c: 	ETRAX_KMALLOCED_MODULES (defined in
> > arch/cris/Kconfig)
> 
> This probably because
>     symb_bare=`echo $symb | sed -e 's/_MODULE//'`
> 
> in the shell script you removed should read (something untested like):
>     symb_bare=`echo $symb | sed -e 's/_MODULE$//'`
> 
> > 2) ./lib/Makefile: TEST_MODULE (defined in lib/Kconfig.debug)
> 
> TEST_MODULE is an awkward name for a Kconfig symbol. My local script has
> it special cased.

I plan to rename this feature in a future patch, since imho it violates
the _MODULE suffix for kernel modules.

> 
> > 3,4) ./include/linux/module.h, ./kernel/module.c: DEBUG_SET_MODULE_RONX
> > (defined in arch/{s390,arm,x86}/Kconfig.debug)
> 
> See above.
> 
> > > > This patch replaces the shell script with an implementation in Python,
> > > > which:
> > > >     (a) detects the same bugs, but does not report false positives
> > > 
> > > Depends a bit on the definition of false positives. Ie, the hit for
> > >     ./arch/sh/kernel/head_64.S:	CACHE_
> > > 
> > > is caused by
> > >      #error preprocessor flag CONFIG_CACHE_... not recognized!
> > > 
> > > Should that line, and similar lines, be changed?
> > 
> > I consider a false positive to actually be defined in Kconfig. The
> > feature in your example does not really apply to the naming convention
> > of Kconfig features ("..."), so that our regex does not match it.
> 
> But your python script does report it, doesn't it?

It does. Regexes are hell :) I will check that and fix it in the next
version of this patch.

> 
> > However, the regex matches "CONFIG_X86_". I shall change the regex to
> > not accept strings ending with "_", so that such cases are not reported.
> 
> > > > +# REGEX EXPRESSIONS
> > > > +OPERATORS = r"&|\(|\)|\||\!"
> > > > +FEATURE = r"\w*[A-Z]{1}\w*"
> > > > +FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
> > > > +EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
> > > > +STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR
> > > 
> > >                           "depends on" with multiple spaces?
> > > > +
> > > > +# REGEX OBJECTS
> > > > +REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
> > > > +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
> 
> New observation: this causes the script to skip text files, shell
> scripts, etc, doesn't it? 

Yes. Do you prefer to cover such files? I just grepped CONFIG_ in
Documentation and think that covering such could improve the quality
there too. I will put this into the next version of the patch.

> > > > +REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
> > > > +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> > > > +REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
> > > > +REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")
> > > 
> > > There are a few uses of "-DCONFIG_[...]" in Makefiles. This will miss
> > > those, won't it? That's not bad, per se, but a comment why you're
> > > skipping those might be nice. Or are those caught too, somewhere else?
> > 
> > I was not aware of such uses, thanks. It seems important to cover them
> > too. Does this prefix has a certain purpose?
> 
> It is, in short, a way to define preprocessor macros from the GCC
> command line (see info gcc).
> 
> > > > +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> > > > +REGEX_KCONFIG_STMT = re.compile(STMT)
> > > > +REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")
> > > 
> > > Won't "^\s\+(---help---|help)$" do? Might help catch creative variants
> > > of the help statement (we had a few in the past).
> > 
> > Yes, your regex is more robust. Thanks!
> 
> But it seems I should not have escaped the plus. Please check.

Thank you,
 Valentin

> 
> 
> Paul Bolle
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python
  2014-09-22  8:45         ` Valentin Rothberg
@ 2014-09-22  9:06           ` Paul Bolle
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Bolle @ 2014-09-22  9:06 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: linux-kernel, gregkh, stefan.hengelein

On Mon, 2014-09-22 at 10:45 +0200, Valentin Rothberg wrote:
> On lun., 2014-09-22 at 10:24 +0200, Paul Bolle wrote: 
> > On Mon, 2014-09-22 at 09:43 +0200, Valentin Rothberg wrote:
> > > On dim., 2014-09-21 at 23:28 +0200, Paul Bolle wrote:
> > > 2) ./lib/Makefile: TEST_MODULE (defined in lib/Kconfig.debug)
> > 
> > TEST_MODULE is an awkward name for a Kconfig symbol. My local script has
> > it special cased.
> 
> I plan to rename this feature in a future patch, since imho it violates
> the _MODULE suffix for kernel modules.

Probably a good idea. Could you please CC me if you actually do that?

> > > > > +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
> > 
> > New observation: this causes the script to skip text files, shell
> > scripts, etc, doesn't it? 
> 
> Yes. Do you prefer to cover such files?

It seems useful. By now I must have pointed out stale, misspelled, or
simply odd references to Kconfig macros in Documentation and other
non-code files dozens of times. Documentation seems often overlooked
when Kconfig symbols change.

> I just grepped CONFIG_ in
> Documentation and think that covering such could improve the quality
> there too. I will put this into the next version of the patch.

You'll have to think about what to do with defconfig files. (I ignore
them in my local script, but you may want to report oddities in those
too. They are almost all outdated in one way or another.)

Hope this helps,


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
  2014-09-21 19:39 ` [PATCH v2] " Valentin Rothberg
  2014-09-21 19:53 ` [PATCH v3] " Valentin Rothberg
@ 2014-09-22 14:58 ` Valentin Rothberg
  2014-09-23 16:45   ` Paul Bolle
  2014-09-27 12:57 ` [PATCH v5] " Valentin Rothberg
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-22 14:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: gregkh, valentinrothberg, stefan.hengelein, pebolle

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
The script is also hard to read and understand, and is thereby difficult
to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report previous false positives
    (b) additionally detects broken references in Kconfig and all
        non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
    (c) is up to 75 times faster than the shell script

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 939 broken references in Linux v3.17-rc1;
543 additional reports of which 16 are located in Kconfig files,
287 in defconfigs, 66 in ./Documentation, 1 in Kbuild.

Moreover, we intentionally include references in C comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Note that the output format changed from (file list <tab> feature) to
(feature <tab> file list) as it simplifies the detection of the Kconfig
feature for long file lists.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
Changelog:
v2: Fix of regular expressions
v3: Changelog replacement, and add changes of v2
v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
    - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
    - Changes of regular expressions
    - Increases additional reports from 49 to 229 compared to v3
    - Change of output format from (file list <tab> feature) to
      (feature <tab> file list)·
---

 scripts/checkkconfigsymbols.py | 136 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 ------------------
 2 files changed, 136 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..74c18b1
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,136 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+
+
+# REGEX EXPRESSIONS
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"\w*[A-Z]{1}\w*"
+CONFIG_DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
+EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
+STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
+
+# REGEX OBJECTS
+REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
+REGEX_KCONFIG_DEF = re.compile(CONFIG_DEF)
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
+REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
+
+
+def main():
+    """Main function of this module."""
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    for root, subdirs, files in os.walk("."):
+        if ".git" in subdirs:
+            subdirs.remove(".git")
+        for fil in files:
+            if REGEX_FILE_KCONFIG.match(fil):
+                kconfig_files.append(os.path.join(root, fil))
+            else:
+                # All non-Kconfig files are checked for consistency
+                source_files.append(os.path.join(root, fil))
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    print "Undefined symbol used\tFile list"
+    for feature in sorted(referenced_features):
+        if feature not in defined_features:
+            if feature.endswith("_MODULE"):
+                # Avoid false positives for kernel modules
+                if feature[:-len("_MODULE")] in defined_features:
+                    continue
+            if "FOO" in feature or "BAR" in feature:
+                continue
+            files = referenced_features.get(feature)
+            print "%s:\t%s" % (feature, ", ".join(files))
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_SOURCE_FEATURE.findall(line)
+        for feature in features:
+            if not REGEX_FILTER_FEATURES.search(feature):
+                continue
+            paths = referenced_features.get(feature, set())
+            paths.add(sfile)
+            referenced_features[feature] = paths
+
+
+def get_features_in_line(line):
+    """Return mentioned Kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore Kconfig comments
+
+        if REGEX_KCONFIG_DEF.match(line):
+            feature_def = REGEX_KCONFIG_DEF.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
--
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] checkkconfigsymbols.sh: reimplementation in python
  2014-09-22 14:58 ` [PATCH v4] " Valentin Rothberg
@ 2014-09-23 16:45   ` Paul Bolle
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Bolle @ 2014-09-23 16:45 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: linux-kernel, gregkh, stefan.hengelein

Hi Valentin,

On Mon, 2014-09-22 at 16:58 +0200, Valentin Rothberg wrote:
> +FEATURE = r"\w*[A-Z]{1}\w*"
> +CONFIG_DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"

If I read this correctly, this misidentifies these Kconfig symbols (all
found in powerpc, by the way):
    8260
    8272
    40x
    44x
    6xx
    8xx
    4xx

8260 and 8272 are rather unfortunate names. I'm all for renaming them.
But we're probably stuck with those other five symbols.


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v5] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
                   ` (2 preceding siblings ...)
  2014-09-22 14:58 ` [PATCH v4] " Valentin Rothberg
@ 2014-09-27 12:57 ` Valentin Rothberg
  2014-09-27 14:30 ` [PATCH v6] " Valentin Rothberg
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-27 12:57 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: gregkh, valentinrothberg, stefan.hengelein, pebolle

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives. The script is also hard to
read and understand, and is thereby difficult to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report previous false positives
    (b) additionally detects broken references in Kconfig and all
        non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
    (c) is up to 75 times faster than the shell script
    (d) only checks files under version control

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 933 broken references in Linux v3.17-rc1;
419 additional reports of which 16 are located in Kconfig files,
287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.

Moreover, we intentionally include references in comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Note that the output format changed from (file list <tab> feature) to
(feature <tab> file list) as it simplifies the detection of the Kconfig
feature for long file lists.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
Changelog:
v2: Fix of regular expressions
v3: Changelog replacement, and add changes of v2
v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
    - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
    - Changes of regular expressions
    - Increases additional reports from 49 to 229 compared to v3
    - Change of output format from (file list <tab> feature) to
      (feature <tab> file list)·
v5: Only analyze files under version control ('git ls-files')
---
 scripts/checkkconfigsymbols.py | 142 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 -----------------
 2 files changed, 142 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..f944089
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+from subprocess import Popen, PIPE, STDOUT
+
+# REGEX EXPRESSIONS
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"\w*[A-Z]{1}\w*"
+CONFIG_DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
+EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
+STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
+
+# REGEX OBJECTS
+REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
+REGEX_KCONFIG_DEF = re.compile(CONFIG_DEF)
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
+REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
+
+
+def main():
+    """Main function of this module."""
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    # use 'git ls-files' to get the worklist
+    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
+    (stdout, _) = pop.communicate()  # wait until finished
+    if len(stdout) > 0 and stdout[-1] == "\n":
+        stdout = stdout[:-1]
+
+    for gitfile in stdout.rsplit("\n"):
+        if ".git" in gitfile or "ChangeLog" in gitfile or \
+                os.path.isdir(gitfile):
+            continue
+        if REGEX_FILE_KCONFIG.match(gitfile):
+            kconfig_files.append(gitfile)
+        else:
+            # All non-Kconfig files are checked for consistency
+            source_files.append(gitfile)
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    print "Undefined symbol used\tFile list"
+    for feature in sorted(referenced_features):
+        if feature not in defined_features:
+            if feature.endswith("_MODULE"):
+                # Avoid false positives for kernel modules
+                if feature[:-len("_MODULE")] in defined_features:
+                    continue
+            if "FOO" in feature or "BAR" in feature:
+                continue
+            files = referenced_features.get(feature)
+            print "%s:\t%s" % (feature, ", ".join(files))
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_SOURCE_FEATURE.findall(line)
+        for feature in features:
+            if not REGEX_FILTER_FEATURES.search(feature):
+                continue
+            paths = referenced_features.get(feature, set())
+            paths.add(sfile)
+            referenced_features[feature] = paths
+
+
+def get_features_in_line(line):
+    """Return mentioned Kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore Kconfig comments
+
+        if REGEX_KCONFIG_DEF.match(line):
+            feature_def = REGEX_KCONFIG_DEF.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v6] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
                   ` (3 preceding siblings ...)
  2014-09-27 12:57 ` [PATCH v5] " Valentin Rothberg
@ 2014-09-27 14:30 ` Valentin Rothberg
  2014-09-28 15:55 ` [PATCH v7] " Valentin Rothberg
  2014-09-29 17:05 ` [PATCH v8] " Valentin Rothberg
  6 siblings, 0 replies; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-27 14:30 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: gregkh, valentinrothberg, stefan.hengelein, pebolle

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives. The script is also hard to
read and understand, and is thereby difficult to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report previous false positives
    (b) additionally detects broken references in Kconfig and all
        non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
    (c) is up to 75 times faster than the shell script
    (d) only checks files under version control

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 938 broken references in Linux v3.17-rc1;
419 additional reports of which 16 are located in Kconfig files,
287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.

Moreover, we intentionally include references in comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Note that the output format changed from (file list <tab> feature) to
(feature <tab> file list) as it simplifies the detection of the Kconfig
feature for long file lists.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
Changelog:
v2: Fix of regular expressions
v3: Changelog replacement, and add changes of v2
v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
    - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
    - Changes of regular expressions
    - Increases additional reports from 49 to 229 compared to v3
    - Change of output format from (file list <tab> feature) to
      (feature <tab> file list)·
v5: Only analyze files under version control ('git ls-files')
v6: Cover features with numbers and small letters (e.g., 4xx)
---

 scripts/checkkconfigsymbols.py | 142 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 -----------------
 2 files changed, 142 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..f944089
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifieres that are referenced but not defined."""
+
+# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+
+
+import os
+import re
+from subprocess import Popen, PIPE, STDOUT
+
+# REGEX EXPRESSIONS
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"\w*[A-Z]{1}\w*"
+CONFIG_DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
+EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
+STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
+
+# REGEX OBJECTS
+REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
+REGEX_KCONFIG_DEF = re.compile(CONFIG_DEF)
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
+REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
+
+
+def main():
+    """Main function of this module."""
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()
+
+    # use 'git ls-files' to get the worklist
+    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
+    (stdout, _) = pop.communicate()  # wait until finished
+    if len(stdout) > 0 and stdout[-1] == "\n":
+        stdout = stdout[:-1]
+
+    for gitfile in stdout.rsplit("\n"):
+        if ".git" in gitfile or "ChangeLog" in gitfile or \
+                os.path.isdir(gitfile):
+            continue
+        if REGEX_FILE_KCONFIG.match(gitfile):
+            kconfig_files.append(gitfile)
+        else:
+            # All non-Kconfig files are checked for consistency
+            source_files.append(gitfile)
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    print "Undefined symbol used\tFile list"
+    for feature in sorted(referenced_features):
+        if feature not in defined_features:
+            if feature.endswith("_MODULE"):
+                # Avoid false positives for kernel modules
+                if feature[:-len("_MODULE")] in defined_features:
+                    continue
+            if "FOO" in feature or "BAR" in feature:
+                continue
+            files = referenced_features.get(feature)
+            print "%s:\t%s" % (feature, ", ".join(files))
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_SOURCE_FEATURE.findall(line)
+        for feature in features:
+            if not REGEX_FILTER_FEATURES.search(feature):
+                continue
+            paths = referenced_features.get(feature, set())
+            paths.add(sfile)
+            referenced_features[feature] = paths
+
+
+def get_features_in_line(line):
+    """Return mentioned Kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # Ignore Kconfig comments
+
+        if REGEX_KCONFIG_DEF.match(line):
+            feature_def = REGEX_KCONFIG_DEF.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # Ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # Multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
                   ` (4 preceding siblings ...)
  2014-09-27 14:30 ` [PATCH v6] " Valentin Rothberg
@ 2014-09-28 15:55 ` Valentin Rothberg
  2014-09-29 10:28   ` Paul Bolle
  2014-09-29 17:05 ` [PATCH v8] " Valentin Rothberg
  6 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-28 15:55 UTC (permalink / raw)
  To: linux-kernel, akpm; +Cc: pebolle, Valentin Rothberg, Stefan Hengelein

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives. The script is also hard to
read and understand, and is thereby difficult to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report previous false positives
    (b) additionally detects broken references in Kconfig and all
        non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
    (c) is up to 75 times faster than the shell script
    (d) only checks files under version control ('git ls-files')

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 939 broken identifiers in Linux v3.17-rc1;
420 additional reports of which 16 are located in Kconfig files,
287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.

Moreover, we intentionally include references in comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Note that the output format changed from (file list <tab> feature) to
(feature <tab> file list) as it simplifies the detection of the Kconfig
feature for long file lists.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
---
Changelog:
v2: Fix of regular expressions
v3: Changelog replacement, and add changes of v2
v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
  - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
  - Changes of regular expressions
  - Increases additional reports from 49 to 229 compared to v3
  - Change of output format from (file list <tab> feature) to
        (feature <tab> file list)·
v5: Only analyze files under version control ('git ls-files')
v6: Cover features with numbers and small letters (e.g., 4xx)
v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
---
 scripts/checkkconfigsymbols.py | 138 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 ------------------
 2 files changed, 138 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..01bf9c4
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,138 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifiers that are referenced but not defined."""
+
+# (c) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# (c) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# Licensed under the terms of the GNU GPL License version 2
+
+
+import os
+import re
+from subprocess import Popen, PIPE, STDOUT
+
+
+# regex expressions
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
+DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
+EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
+STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
+
+# regex objects
+REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
+REGEX_KCONFIG_DEF = re.compile(DEF)
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
+REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
+
+
+def main():
+    """Main function of this module."""
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()  # {feature: [files]}
+
+    # use 'git ls-files' to get the worklist
+    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
+    (stdout, _) = pop.communicate()  # wait until finished
+    if len(stdout) > 0 and stdout[-1] == "\n":
+        stdout = stdout[:-1]
+
+    for gitfile in stdout.rsplit("\n"):
+        if ".git" in gitfile or "ChangeLog" in gitfile or \
+                os.path.isdir(gitfile):
+            continue
+        if REGEX_FILE_KCONFIG.match(gitfile):
+            kconfig_files.append(gitfile)
+        else:
+            # all non-Kconfig files are checked for consistency
+            source_files.append(gitfile)
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    print "Undefined symbol used\tFile list"
+    for feature in sorted(referenced_features):
+        # filter some false positives
+        if feature == "FOO" or feature == "BAR" or \
+                feature == "FOO_BAR":
+            continue
+        if feature not in defined_features:
+            if feature.endswith("_MODULE"):
+                # avoid false positives for kernel modules
+                if feature[:-len("_MODULE")] in defined_features:
+                    continue
+            files = referenced_features.get(feature)
+            print "%s\t%s" % (feature, ", ".join(files))
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_SOURCE_FEATURE.findall(line)
+        for feature in features:
+            if not REGEX_FILTER_FEATURES.search(feature):
+                continue
+            sfiles = referenced_features.get(feature, set())
+            sfiles.add(sfile)
+            referenced_features[feature] = sfiles
+
+
+def get_features_in_line(line):
+    """Return mentioned Kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # ignore comments
+
+        if REGEX_KCONFIG_DEF.match(line):
+            feature_def = REGEX_KCONFIG_DEF.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
--
1.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-28 15:55 ` [PATCH v7] " Valentin Rothberg
@ 2014-09-29 10:28   ` Paul Bolle
  2014-09-29 12:08     ` Valentin Rothberg
  2014-09-29 14:47     ` Michal Marek
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Bolle @ 2014-09-29 10:28 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: akpm, Stefan Hengelein, linux-kbuild, linux-kernel

[Added  linux-kbuild@vger.kernel.org.]

On Sun, 2014-09-28 at 17:55 +0200, Valentin Rothberg wrote:
> The scripts/checkkconfigsymbols.sh script searches Kconfig features
> in the source code that are not defined in Kconfig. Such identifiers
> always evaluate to false and are the source of various kinds of bugs.
> However, the shell script is slow and it does not detect such broken
> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
> Furthermore, it generates false positives. The script is also hard to
> read and understand, and is thereby difficult to maintain.
> 
> This patch replaces the shell script with an implementation in Python,
> which:
>     (a) detects the same bugs, but does not report previous false positives
>     (b) additionally detects broken references in Kconfig and all
>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>     (c) is up to 75 times faster than the shell script
>     (d) only checks files under version control ('git ls-files')

(The shell script is .git unaware. If you happen to have one or more
branches with "Kconfig" in their name, as I do, it generates a lot of
noise on stderr.)

> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> from 3m47s to 0m3s, and reports 939 broken identifiers in Linux v3.17-rc1;
> 420 additional reports of which 16 are located in Kconfig files,
> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
> 
> Moreover, we intentionally include references in comments, which have been
> ignored until now. Such comments may be leftovers of features that have
> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
> These references can be misleading and should be removed or replaced.
> 
> Note that the output format changed from (file list <tab> feature) to
> (feature <tab> file list) as it simplifies the detection of the Kconfig
> feature for long file lists.
> 
> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
> ---
> Changelog:
> v2: Fix of regular expressions
> v3: Changelog replacement, and add changes of v2
> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>   - Changes of regular expressions
>   - Increases additional reports from 49 to 229 compared to v3
>   - Change of output format from (file list <tab> feature) to
>         (feature <tab> file list)·
> v5: Only analyze files under version control ('git ls-files')
> v6: Cover features with numbers and small letters (e.g., 4xx)
> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
> ---
>  scripts/checkkconfigsymbols.py | 138 +++++++++++++++++++++++++++++++++++++++++
>  scripts/checkkconfigsymbols.sh |  59 ------------------
>  2 files changed, 138 insertions(+), 59 deletions(-)
>  create mode 100644 scripts/checkkconfigsymbols.py
>  delete mode 100755 scripts/checkkconfigsymbols.sh
> 
> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> new file mode 100644
> index 0000000..01bf9c4
> --- /dev/null
> +++ b/scripts/checkkconfigsymbols.py
> @@ -0,0 +1,138 @@
> +#!/usr/bin/env python
> +
> +"""Find Kconfig identifiers that are referenced but not defined."""
> +
> +# (c) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
> +# (c) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
> +#
> +# Licensed under the terms of the GNU GPL License version 2
> +
> +
> +import os
> +import re
> +from subprocess import Popen, PIPE, STDOUT
> +
> +
> +# regex expressions
> +OPERATORS = r"&|\(|\)|\||\!"
> +FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
> +DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
> +EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR

Could please make that "depends on"? Yes, it seems the yacc grammar
accepts any amount of whitespace, but that doesn't make it right to use
anything other than a single space. (Can the yacc grammar be tweaked to
see "depends on" as one, well, token?)

> +
> +# regex objects
> +REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
> +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> +REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")

That should be "-D", because now you get a hit on
    TPS6507X_ADCONFIG_CONVERT_TS

and friends. What does \W do, by the way?

> +REGEX_KCONFIG_DEF = re.compile(DEF)
> +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> +REGEX_KCONFIG_STMT = re.compile(STMT)
> +REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
> +REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
> +
> +
> +def main():
> +    """Main function of this module."""
> +    source_files = []
> +    kconfig_files = []
> +    defined_features = set()
> +    referenced_features = dict()  # {feature: [files]}
> +
> +    # use 'git ls-files' to get the worklist
> +    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
> +    (stdout, _) = pop.communicate()  # wait until finished
> +    if len(stdout) > 0 and stdout[-1] == "\n":
> +        stdout = stdout[:-1]
> +
> +    for gitfile in stdout.rsplit("\n"):
> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
> +                os.path.isdir(gitfile):

(The only directories in the git tree are a few symlinks:
    arch/arm/boot/dts/include/dt-bindings
    arch/metag/boot/dts/include/dt-bindings
    arch/mips/boot/dts/include/dt-bindings
    arch/powerpc/boot/dts/include/dt-bindings

Filtering out symlinks makes sense anyway. But that's probably not worth
the effort.)

You might also consider filtering out "Next/merge.log". It tends to have
references to Kconfig macros. But it's only a few hits anyway.

> +            continue
> +        if REGEX_FILE_KCONFIG.match(gitfile):
> +            kconfig_files.append(gitfile)
> +        else:
> +            # all non-Kconfig files are checked for consistency
> +            source_files.append(gitfile)
> +
> +    for sfile in source_files:
> +        parse_source_file(sfile, referenced_features)
> +
> +    for kfile in kconfig_files:
> +        parse_kconfig_file(kfile, defined_features, referenced_features)
> +
> +    print "Undefined symbol used\tFile list"
> +    for feature in sorted(referenced_features):
> +        # filter some false positives
> +        if feature == "FOO" or feature == "BAR" or \
> +                feature == "FOO_BAR":
> +            continue
> +        if feature not in defined_features:
> +            if feature.endswith("_MODULE"):
> +                # avoid false positives for kernel modules
> +                if feature[:-len("_MODULE")] in defined_features:
> +                    continue
> +            files = referenced_features.get(feature)
> +            print "%s\t%s" % (feature, ", ".join(files))
> +
> +
> +def parse_source_file(sfile, referenced_features):
> +    """Parse @sfile for referenced Kconfig features."""
> +    lines = []
> +    with open(sfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for line in lines:
> +        if not "CONFIG_" in line:
> +            continue
> +        features = REGEX_SOURCE_FEATURE.findall(line)
> +        for feature in features:
> +            if not REGEX_FILTER_FEATURES.search(feature):
> +                continue
> +            sfiles = referenced_features.get(feature, set())
> +            sfiles.add(sfile)
> +            referenced_features[feature] = sfiles
> +
> +
> +def get_features_in_line(line):
> +    """Return mentioned Kconfig features in @line."""
> +    return REGEX_FEATURE.findall(line)
> +
> +
> +def parse_kconfig_file(kfile, defined_features, referenced_features):
> +    """Parse @kfile and update feature definitions and references."""
> +    lines = []
> +    skip = False
> +
> +    with open(kfile, "r") as stream:
> +        lines = stream.readlines()
> +
> +    for i in range(len(lines)):
> +        line = lines[i]
> +        line = line.strip('\n')
> +        line = line.split("#")[0]  # ignore comments
> +
> +        if REGEX_KCONFIG_DEF.match(line):
> +            feature_def = REGEX_KCONFIG_DEF.findall(line)
> +            defined_features.add(feature_def[0])
> +            skip = False
> +        elif REGEX_KCONFIG_HELP.match(line):
> +            skip = True
> +        elif skip:
> +            # ignore content of help messages
> +            pass
> +        elif REGEX_KCONFIG_STMT.match(line):
> +            features = get_features_in_line(line)
> +            # multi-line statements
> +            while line.endswith("\\"):
> +                i += 1
> +                line = lines[i]
> +                line = line.strip('\n')
> +                features.extend(get_features_in_line(line))
> +            for feature in set(features):
> +                paths = referenced_features.get(feature, set())
> +                paths.add(kfile)
> +                referenced_features[feature] = paths
> +
> +
> +if __name__ == "__main__":
> +    main()
>[...]

This seems to find, roughly, what my local perl script finds. It does
skip references to Kconfig macros in the Kconfig help texts and
comments: grep for CONFIG_ITANIUM_ASTEP_SPECIFIC as an example. But
there are so few of those that it's probable not worth the trouble to
check for them too.

A few test show the speedup is especially nice with the entire tree in
cache: run time drops from over four minutes to just under five seconds
here. Provided you look into my comments, this is:

Acked-by: Paul Bolle <pebolle@tiscali.nl>

Thanks,


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-29 10:28   ` Paul Bolle
@ 2014-09-29 12:08     ` Valentin Rothberg
  2014-09-29 12:45       ` Paul Bolle
  2014-09-29 14:47     ` Michal Marek
  1 sibling, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-29 12:08 UTC (permalink / raw)
  To: Paul Bolle; +Cc: akpm, Stefan Hengelein, linux-kbuild, linux-kernel

On Mon, Sep 29, 2014 at 12:28 PM, Paul Bolle <pebolle@tiscali.nl> wrote:
> [Added  linux-kbuild@vger.kernel.org.]
>
> On Sun, 2014-09-28 at 17:55 +0200, Valentin Rothberg wrote:
>> The scripts/checkkconfigsymbols.sh script searches Kconfig features
>> in the source code that are not defined in Kconfig. Such identifiers
>> always evaluate to false and are the source of various kinds of bugs.
>> However, the shell script is slow and it does not detect such broken
>> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
>> Furthermore, it generates false positives. The script is also hard to
>> read and understand, and is thereby difficult to maintain.
>>
>> This patch replaces the shell script with an implementation in Python,
>> which:
>>     (a) detects the same bugs, but does not report previous false positives
>>     (b) additionally detects broken references in Kconfig and all
>>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>>     (c) is up to 75 times faster than the shell script
>>     (d) only checks files under version control ('git ls-files')
>
> (The shell script is .git unaware. If you happen to have one or more
> branches with "Kconfig" in their name, as I do, it generates a lot of
> noise on stderr.)

Do you prefer to use os.walk() then? I just don't want to assume that
the script is called in a clean tree. 'git ls-files' avoids to filter
files. How do you solve this issue in your script?

>> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
>> from 3m47s to 0m3s, and reports 939 broken identifiers in Linux v3.17-rc1;
>> 420 additional reports of which 16 are located in Kconfig files,
>> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
>>
>> Moreover, we intentionally include references in comments, which have been
>> ignored until now. Such comments may be leftovers of features that have
>> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
>> These references can be misleading and should be removed or replaced.
>>
>> Note that the output format changed from (file list <tab> feature) to
>> (feature <tab> file list) as it simplifies the detection of the Kconfig
>> feature for long file lists.
>>
>> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
>> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
>> ---
>> Changelog:
>> v2: Fix of regular expressions
>> v3: Changelog replacement, and add changes of v2
>> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>>   - Changes of regular expressions
>>   - Increases additional reports from 49 to 229 compared to v3
>>   - Change of output format from (file list <tab> feature) to
>>         (feature <tab> file list)·
>> v5: Only analyze files under version control ('git ls-files')
>> v6: Cover features with numbers and small letters (e.g., 4xx)
>> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
>> ---
>>  scripts/checkkconfigsymbols.py | 138 +++++++++++++++++++++++++++++++++++++++++
>>  scripts/checkkconfigsymbols.sh |  59 ------------------
>>  2 files changed, 138 insertions(+), 59 deletions(-)
>>  create mode 100644 scripts/checkkconfigsymbols.py
>>  delete mode 100755 scripts/checkkconfigsymbols.sh
>>
>> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
>> new file mode 100644
>> index 0000000..01bf9c4
>> --- /dev/null
>> +++ b/scripts/checkkconfigsymbols.py
>> @@ -0,0 +1,138 @@
>> +#!/usr/bin/env python
>> +
>> +"""Find Kconfig identifiers that are referenced but not defined."""
>> +
>> +# (c) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
>> +# (c) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
>> +#
>> +# Licensed under the terms of the GNU GPL License version 2
>> +
>> +
>> +import os
>> +import re
>> +from subprocess import Popen, PIPE, STDOUT
>> +
>> +
>> +# regex expressions
>> +OPERATORS = r"&|\(|\)|\||\!"
>> +FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
>> +DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
>> +EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
>> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
>
> Could please make that "depends on"? Yes, it seems the yacc grammar
> accepts any amount of whitespace, but that doesn't make it right to use
> anything other than a single space. (Can the yacc grammar be tweaked to
> see "depends on" as one, well, token?)

I don't know if yacc can do that. Usually the lexer trims whitespaces.
If I change the regex to "depends on", we will miss potential
statements as Kconfig accepts multiple spaces between the "depends"
and the "on".

>> +
>> +# regex objects
>> +REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
>> +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
>> +REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
>
> That should be "-D", because now you get a hit on
>     TPS6507X_ADCONFIG_CONVERT_TS
>
> and friends. What does \W do, by the way?

(?:\W|\b)+[D]{,1}CONFIG_... will do the trick. "-D" would fail in
getting DCONFIG_ in the build files. "\W" matches not "\w", which is
[A-Za-z0-9_].

>> +REGEX_KCONFIG_DEF = re.compile(DEF)
>> +REGEX_KCONFIG_EXPR = re.compile(EXPR)
>> +REGEX_KCONFIG_STMT = re.compile(STMT)
>> +REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
>> +REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
>> +
>> +
>> +def main():
>> +    """Main function of this module."""
>> +    source_files = []
>> +    kconfig_files = []
>> +    defined_features = set()
>> +    referenced_features = dict()  # {feature: [files]}
>> +
>> +    # use 'git ls-files' to get the worklist
>> +    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
>> +    (stdout, _) = pop.communicate()  # wait until finished
>> +    if len(stdout) > 0 and stdout[-1] == "\n":
>> +        stdout = stdout[:-1]
>> +
>> +    for gitfile in stdout.rsplit("\n"):
>> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
>> +                os.path.isdir(gitfile):
>
> (The only directories in the git tree are a few symlinks:
>     arch/arm/boot/dts/include/dt-bindings
>     arch/metag/boot/dts/include/dt-bindings
>     arch/mips/boot/dts/include/dt-bindings
>     arch/powerpc/boot/dts/include/dt-bindings
>
> Filtering out symlinks makes sense anyway. But that's probably not worth
> the effort.)
>
> You might also consider filtering out "Next/merge.log". It tends to have
> references to Kconfig macros. But it's only a few hits anyway.

Thanks. I will put ".log" in

>> +            continue
>> +        if REGEX_FILE_KCONFIG.match(gitfile):
>> +            kconfig_files.append(gitfile)
>> +        else:
>> +            # all non-Kconfig files are checked for consistency
>> +            source_files.append(gitfile)
>> +
>> +    for sfile in source_files:
>> +        parse_source_file(sfile, referenced_features)
>> +
>> +    for kfile in kconfig_files:
>> +        parse_kconfig_file(kfile, defined_features, referenced_features)
>> +
>> +    print "Undefined symbol used\tFile list"
>> +    for feature in sorted(referenced_features):
>> +        # filter some false positives
>> +        if feature == "FOO" or feature == "BAR" or \
>> +                feature == "FOO_BAR":
>> +            continue
>> +        if feature not in defined_features:
>> +            if feature.endswith("_MODULE"):
>> +                # avoid false positives for kernel modules
>> +                if feature[:-len("_MODULE")] in defined_features:
>> +                    continue
>> +            files = referenced_features.get(feature)
>> +            print "%s\t%s" % (feature, ", ".join(files))
>> +
>> +
>> +def parse_source_file(sfile, referenced_features):
>> +    """Parse @sfile for referenced Kconfig features."""
>> +    lines = []
>> +    with open(sfile, "r") as stream:
>> +        lines = stream.readlines()
>> +
>> +    for line in lines:
>> +        if not "CONFIG_" in line:
>> +            continue
>> +        features = REGEX_SOURCE_FEATURE.findall(line)
>> +        for feature in features:
>> +            if not REGEX_FILTER_FEATURES.search(feature):
>> +                continue
>> +            sfiles = referenced_features.get(feature, set())
>> +            sfiles.add(sfile)
>> +            referenced_features[feature] = sfiles
>> +
>> +
>> +def get_features_in_line(line):
>> +    """Return mentioned Kconfig features in @line."""
>> +    return REGEX_FEATURE.findall(line)
>> +
>> +
>> +def parse_kconfig_file(kfile, defined_features, referenced_features):
>> +    """Parse @kfile and update feature definitions and references."""
>> +    lines = []
>> +    skip = False
>> +
>> +    with open(kfile, "r") as stream:
>> +        lines = stream.readlines()
>> +
>> +    for i in range(len(lines)):
>> +        line = lines[i]
>> +        line = line.strip('\n')
>> +        line = line.split("#")[0]  # ignore comments
>> +
>> +        if REGEX_KCONFIG_DEF.match(line):
>> +            feature_def = REGEX_KCONFIG_DEF.findall(line)
>> +            defined_features.add(feature_def[0])
>> +            skip = False
>> +        elif REGEX_KCONFIG_HELP.match(line):
>> +            skip = True
>> +        elif skip:
>> +            # ignore content of help messages
>> +            pass
>> +        elif REGEX_KCONFIG_STMT.match(line):
>> +            features = get_features_in_line(line)
>> +            # multi-line statements
>> +            while line.endswith("\\"):
>> +                i += 1
>> +                line = lines[i]
>> +                line = line.strip('\n')
>> +                features.extend(get_features_in_line(line))
>> +            for feature in set(features):
>> +                paths = referenced_features.get(feature, set())
>> +                paths.add(kfile)
>> +                referenced_features[feature] = paths
>> +
>> +
>> +if __name__ == "__main__":
>> +    main()
>>[...]
>
> This seems to find, roughly, what my local perl script finds. It does
> skip references to Kconfig macros in the Kconfig help texts and
> comments: grep for CONFIG_ITANIUM_ASTEP_SPECIFIC as an example. But
> there are so few of those that it's probable not worth the trouble to
> check for them too.
>
> A few test show the speedup is especially nice with the entire tree in
> cache: run time drops from over four minutes to just under five seconds
> here. Provided you look into my comments, this is:
>
> Acked-by: Paul Bolle <pebolle@tiscali.nl>

Thank you,
 Valentin Rothberg

>
> Thanks,
>
>
> Paul Bolle
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-29 12:08     ` Valentin Rothberg
@ 2014-09-29 12:45       ` Paul Bolle
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Bolle @ 2014-09-29 12:45 UTC (permalink / raw)
  To: Valentin Rothberg; +Cc: akpm, Stefan Hengelein, linux-kbuild, linux-kernel

On Mon, 2014-09-29 at 14:08 +0200, Valentin Rothberg wrote:
> On Mon, Sep 29, 2014 at 12:28 PM, Paul Bolle <pebolle@tiscali.nl> wrote:
> > On Sun, 2014-09-28 at 17:55 +0200, Valentin Rothberg wrote:
> >>     (d) only checks files under version control ('git ls-files')
> >
> > (The shell script is .git unaware. If you happen to have one or more
> > branches with "Kconfig" in their name, as I do, it generates a lot of
> > noise on stderr.)
> 
> Do you prefer to use os.walk() then?

It seems I was unclear: this described a problem with the shell script
and not a problem with your python rewrite.

> I just don't want to assume that
> the script is called in a clean tree. 'git ls-files' avoids to filter
> files. How do you solve this issue in your script?

This is getting off topic, but I choose to not care about the state of
the tree on disk. My script basically does "git ls-tree -r $SOME_TAG",
filters out symlinks, and generates a long list of (blob, path) pairs to
work with.

(You may remember from https://lkml.org/lkml/2014/9/26/485 that my
script uses git notes for each blob it parses. Different approach,
different pros and cons.)

> >> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
> >
> > Could please make that "depends on"? Yes, it seems the yacc grammar
> > accepts any amount of whitespace, but that doesn't make it right to use
> > anything other than a single space. (Can the yacc grammar be tweaked to
> > see "depends on" as one, well, token?)
> 
> I don't know if yacc can do that. Usually the lexer trims whitespaces.
> If I change the regex to "depends on", we will miss potential
> statements as Kconfig accepts multiple spaces between the "depends"
> and the "on".

Your choice. (The expectation behind my request is that bogus results
from this script might help catch uses of this misfeature. But every use
of "depends on" is currently sane, anyway.)

> >> +REGEX_SOURCE_FEATURE = re.compile(r"(?:D|\W|\b)+CONFIG_(" + FEATURE + r")")
> >
> > That should be "-D", because now you get a hit on
> >     TPS6507X_ADCONFIG_CONVERT_TS
> >
> > and friends. What does \W do, by the way?
> 
> (?:\W|\b)+[D]{,1}CONFIG_... will do the trick. "-D" would fail in
> getting DCONFIG_ in the build files. "\W" matches not "\w", which is
> [A-Za-z0-9_].

The joy of regular expressions! You now know what wrong, and you'll
surely come up with something better than my idea to fix it.

Thanks,


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-29 10:28   ` Paul Bolle
  2014-09-29 12:08     ` Valentin Rothberg
@ 2014-09-29 14:47     ` Michal Marek
  2014-09-29 16:21       ` Paul Bolle
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Marek @ 2014-09-29 14:47 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Valentin Rothberg, akpm, Stefan Hengelein, linux-kbuild, linux-kernel

On 2014-09-29 12:28, Paul Bolle wrote:
>> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
> 
> Could please make that "depends on"? Yes, it seems the yacc grammar
> accepts any amount of whitespace, but that doesn't make it right to use
> anything other than a single space.

But then lines that violate coding style would not be checked for real
errors.


> (Can the yacc grammar be tweaked to
> see "depends on" as one, well, token?)

I don't think this is a good idea. This is a style issue, why make it a
grammar issue.

Michal

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7] checkkconfigsymbols.sh: reimplementation in python
  2014-09-29 14:47     ` Michal Marek
@ 2014-09-29 16:21       ` Paul Bolle
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Bolle @ 2014-09-29 16:21 UTC (permalink / raw)
  To: Michal Marek
  Cc: Valentin Rothberg, akpm, Stefan Hengelein, linux-kbuild, linux-kernel

Hi Michal,

[This point I already conceded to Valentin, so my remarks are moot for
Valentin's script.]

On Mon, 2014-09-29 at 16:47 +0200, Michal Marek wrote:
> On 2014-09-29 12:28, Paul Bolle wrote:
> >> +STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
> > 
> > Could please make that "depends on"? Yes, it seems the yacc grammar
> > accepts any amount of whitespace, but that doesn't make it right to use
> > anything other than a single space.
> 
> But then lines that violate coding style would not be checked for real
> errors.

Perhaps my dislike of this element of the grammar won here. There are
probably better ways to enforce proper use of "depends on".

> > (Can the yacc grammar be tweaked to
> > see "depends on" as one, well, token?)
> 
> I don't think this is a good idea. This is a style issue, why make it a
> grammar issue.

Well, a grammar that allows one of its keywords to be written in
different ways makes style and grammar issues overlap. Whatever. I guess
it's just me being annoyed with writing
    git grep -w "depends\s\+on\s\+FOO"

instead of
    git grep -w "depends on FOO"

Ditto for writing a parser for Kconfig files. (Real world examples are
more complicated, but you catch my drift.)


Paul Bolle


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v8] checkkconfigsymbols.sh: reimplementation in python
  2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
                   ` (5 preceding siblings ...)
  2014-09-28 15:55 ` [PATCH v7] " Valentin Rothberg
@ 2014-09-29 17:05 ` Valentin Rothberg
  2014-10-01 14:58   ` Michal Marek
  6 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-09-29 17:05 UTC (permalink / raw)
  To: linux-kernel, akpm, linux-kbuild
  Cc: pebolle, mmarek, Valentin Rothberg, Stefan Hengelein

The scripts/checkkconfigsymbols.sh script searches Kconfig features
in the source code that are not defined in Kconfig. Such identifiers
always evaluate to false and are the source of various kinds of bugs.
However, the shell script is slow and it does not detect such broken
references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
Furthermore, it generates false positives. The script is also hard to
read and understand, and is thereby difficult to maintain.

This patch replaces the shell script with an implementation in Python,
which:
    (a) detects the same bugs, but does not report previous false positives
    (b) additionally detects broken references in Kconfig and all
        non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
    (c) is up to 75 times faster than the shell script
    (d) only checks files under version control ('git ls-files')

The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
from 3m47s to 0m3s, and reports 912 broken references in Linux v3.17-rc1;
424 additional reports of which 16 are located in Kconfig files,
287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.

Moreover, we intentionally include references in comments, which have been
ignored until now. Such comments may be leftovers of features that have
been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
These references can be misleading and should be removed or replaced.

Note that the output format changed from (file list <tab> feature) to
(feature <tab> file list) as it simplifies the detection of the Kconfig
feature for long file lists.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
Acked-by: Paul Bolle <pebolle@tiscali.nl>
---
Changelog:
v2: Fix of regular expressions
v3: Changelog replacement, and add changes of v2
v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
  - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
  - Changes of regular expressions
  - Increases additional reports from 49 to 229 compared to v3
  - Change of output format from (file list <tab> feature) to
        (feature <tab> file list)
v5: Only analyze files under version control ('git ls-files')
v6: Cover features with numbers and small letters (e.g., 4xx)
v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
v8: Based on comments from Paul Bolle <pebolle@tiscali.nl>
  - Fix of [D]{,1}CONFIG_ regex to exclude false positives
  - Exclude ".log" files of analysis
  - Filter "XXX" feature
---
 scripts/checkkconfigsymbols.py | 139 +++++++++++++++++++++++++++++++++++++++++
 scripts/checkkconfigsymbols.sh |  59 -----------------
 2 files changed, 139 insertions(+), 59 deletions(-)
 create mode 100644 scripts/checkkconfigsymbols.py
 delete mode 100755 scripts/checkkconfigsymbols.sh

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
new file mode 100644
index 0000000..e9cc689
--- /dev/null
+++ b/scripts/checkkconfigsymbols.py
@@ -0,0 +1,139 @@
+#!/usr/bin/env python
+
+"""Find Kconfig identifiers that are referenced but not defined."""
+
+# (c) 2014 Valentin Rothberg <valentinrothberg@gmail.com>
+# (c) 2014 Stefan Hengelein <stefan.hengelein@fau.de>
+#
+# Licensed under the terms of the GNU GPL License version 2
+
+
+import os
+import re
+from subprocess import Popen, PIPE, STDOUT
+
+
+# regex expressions
+OPERATORS = r"&|\(|\)|\||\!"
+FEATURE = r"(?:\w*[A-Z0-9]\w*){2,}"
+DEF = r"^\s*(?:menu){,1}config\s+(" + FEATURE + r")\s*"
+EXPR = r"(?:" + OPERATORS + r"|\s|" + FEATURE + r")+"
+STMT = r"^\s*(?:if|select|depends\s+on)\s+" + EXPR
+SOURCE_FEATURE = r"(?:\W|\b)+[D]{,1}CONFIG_(" + FEATURE + r")"
+
+# regex objects
+REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
+REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
+REGEX_SOURCE_FEATURE = re.compile(SOURCE_FEATURE)
+REGEX_KCONFIG_DEF = re.compile(DEF)
+REGEX_KCONFIG_EXPR = re.compile(EXPR)
+REGEX_KCONFIG_STMT = re.compile(STMT)
+REGEX_KCONFIG_HELP = re.compile(r"^\s+(help|---help---)\s*$")
+REGEX_FILTER_FEATURES = re.compile(r"[A-Za-z0-9]$")
+
+
+def main():
+    """Main function of this module."""
+    source_files = []
+    kconfig_files = []
+    defined_features = set()
+    referenced_features = dict()  # {feature: [files]}
+
+    # use 'git ls-files' to get the worklist
+    pop = Popen("git ls-files", stdout=PIPE, stderr=STDOUT, shell=True)
+    (stdout, _) = pop.communicate()  # wait until finished
+    if len(stdout) > 0 and stdout[-1] == "\n":
+        stdout = stdout[:-1]
+
+    for gitfile in stdout.rsplit("\n"):
+        if ".git" in gitfile or "ChangeLog" in gitfile or \
+                ".log" in gitfile or os.path.isdir(gitfile):
+            continue
+        if REGEX_FILE_KCONFIG.match(gitfile):
+            kconfig_files.append(gitfile)
+        else:
+            # all non-Kconfig files are checked for consistency
+            source_files.append(gitfile)
+
+    for sfile in source_files:
+        parse_source_file(sfile, referenced_features)
+
+    for kfile in kconfig_files:
+        parse_kconfig_file(kfile, defined_features, referenced_features)
+
+    print "Undefined symbol used\tFile list"
+    for feature in sorted(referenced_features):
+        # filter some false positives
+        if feature == "FOO" or feature == "BAR" or \
+                feature == "FOO_BAR" or feature == "XXX":
+            continue
+        if feature not in defined_features:
+            if feature.endswith("_MODULE"):
+                # avoid false positives for kernel modules
+                if feature[:-len("_MODULE")] in defined_features:
+                    continue
+            files = referenced_features.get(feature)
+            print "%s\t%s" % (feature, ", ".join(files))
+
+
+def parse_source_file(sfile, referenced_features):
+    """Parse @sfile for referenced Kconfig features."""
+    lines = []
+    with open(sfile, "r") as stream:
+        lines = stream.readlines()
+
+    for line in lines:
+        if not "CONFIG_" in line:
+            continue
+        features = REGEX_SOURCE_FEATURE.findall(line)
+        for feature in features:
+            if not REGEX_FILTER_FEATURES.search(feature):
+                continue
+            sfiles = referenced_features.get(feature, set())
+            sfiles.add(sfile)
+            referenced_features[feature] = sfiles
+
+
+def get_features_in_line(line):
+    """Return mentioned Kconfig features in @line."""
+    return REGEX_FEATURE.findall(line)
+
+
+def parse_kconfig_file(kfile, defined_features, referenced_features):
+    """Parse @kfile and update feature definitions and references."""
+    lines = []
+    skip = False
+
+    with open(kfile, "r") as stream:
+        lines = stream.readlines()
+
+    for i in range(len(lines)):
+        line = lines[i]
+        line = line.strip('\n')
+        line = line.split("#")[0]  # ignore comments
+
+        if REGEX_KCONFIG_DEF.match(line):
+            feature_def = REGEX_KCONFIG_DEF.findall(line)
+            defined_features.add(feature_def[0])
+            skip = False
+        elif REGEX_KCONFIG_HELP.match(line):
+            skip = True
+        elif skip:
+            # ignore content of help messages
+            pass
+        elif REGEX_KCONFIG_STMT.match(line):
+            features = get_features_in_line(line)
+            # multi-line statements
+            while line.endswith("\\"):
+                i += 1
+                line = lines[i]
+                line = line.strip('\n')
+                features.extend(get_features_in_line(line))
+            for feature in set(features):
+                paths = referenced_features.get(feature, set())
+                paths.add(kfile)
+                referenced_features[feature] = paths
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
deleted file mode 100755
index ccb3391..0000000
--- a/scripts/checkkconfigsymbols.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/sh
-# Find Kconfig variables used in source code but never defined in Kconfig
-# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
-
-# Tested with dash.
-paths="$@"
-[ -z "$paths" ] && paths=.
-
-# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
-Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
-
-printf "File list \tundefined symbol used\n"
-find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
-do
-	# Output the bare Kconfig variable and the filename; the _MODULE part at
-	# the end is not removed here (would need perl an not-hungry regexp for that).
-	sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Za-z_]\+\).*!\2 '$i'!p' < $i
-done | \
-# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
-# files which use a given symbol
-awk '{map[$1, count[$1]++] = $2; }
-END {
-	for (combIdx in map) {
-		split(combIdx, separate, SUBSEP);
-		# The value may have been removed.
-		if (! ( (separate[1], separate[2]) in map ) )
-			continue;
-		symb=separate[1];
-		printf "%s ", symb;
-		#Use gawk extension to delete the names vector
-		delete names;
-		#Portably delete the names vector
-		#split("", names);
-		for (i=0; i < count[symb]; i++) {
-			names[map[symb, i]] = 1;
-			# Unfortunately, we may still encounter symb, i in the
-			# outside iteration.
-			delete map[symb, i];
-		}
-		i=0;
-		for (name in names) {
-			if (i > 0)
-				printf ", %s", name;
-			else
-				printf "%s", name;
-			i++;
-		}
-		printf "\n";
-	}
-}' |
-while read symb files; do
-	# Remove the _MODULE suffix when checking the variable name. This should
-	# be done only on tristate symbols, actually, but Kconfig parsing is
-	# beyond the purpose of this script.
-	symb_bare=`echo $symb | sed -e 's/_MODULE//'`
-	if ! grep -q "\<$symb_bare\>" $Kconfigs; then
-		printf "$files: \t$symb\n"
-	fi
-done|sort
--
1.9.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v8] checkkconfigsymbols.sh: reimplementation in python
  2014-09-29 17:05 ` [PATCH v8] " Valentin Rothberg
@ 2014-10-01 14:58   ` Michal Marek
  2014-10-04  9:29     ` Valentin Rothberg
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Marek @ 2014-10-01 14:58 UTC (permalink / raw)
  To: Valentin Rothberg
  Cc: linux-kernel, akpm, linux-kbuild, pebolle, Stefan Hengelein

On 2014-09-29 19:05, Valentin Rothberg wrote:
> The scripts/checkkconfigsymbols.sh script searches Kconfig features
> in the source code that are not defined in Kconfig. Such identifiers
> always evaluate to false and are the source of various kinds of bugs.
> However, the shell script is slow and it does not detect such broken
> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
> Furthermore, it generates false positives. The script is also hard to
> read and understand, and is thereby difficult to maintain.
> 
> This patch replaces the shell script with an implementation in Python,
> which:
>     (a) detects the same bugs, but does not report previous false positives
>     (b) additionally detects broken references in Kconfig and all
>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>     (c) is up to 75 times faster than the shell script
>     (d) only checks files under version control ('git ls-files')
> 
> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> from 3m47s to 0m3s, and reports 912 broken references in Linux v3.17-rc1;
> 424 additional reports of which 16 are located in Kconfig files,
> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
> 
> Moreover, we intentionally include references in comments, which have been
> ignored until now. Such comments may be leftovers of features that have
> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
> These references can be misleading and should be removed or replaced.
> 
> Note that the output format changed from (file list <tab> feature) to
> (feature <tab> file list) as it simplifies the detection of the Kconfig
> feature for long file lists.
> 
> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
> Acked-by: Paul Bolle <pebolle@tiscali.nl>
> ---
> Changelog:
> v2: Fix of regular expressions
> v3: Changelog replacement, and add changes of v2
> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>   - Changes of regular expressions
>   - Increases additional reports from 49 to 229 compared to v3
>   - Change of output format from (file list <tab> feature) to
>         (feature <tab> file list)
> v5: Only analyze files under version control ('git ls-files')
> v6: Cover features with numbers and small letters (e.g., 4xx)
> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
> v8: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>   - Fix of [D]{,1}CONFIG_ regex to exclude false positives
>   - Exclude ".log" files of analysis
>   - Filter "XXX" feature
> ---
>  scripts/checkkconfigsymbols.py | 139 +++++++++++++++++++++++++++++++++++++++++
>  scripts/checkkconfigsymbols.sh |  59 -----------------
>  2 files changed, 139 insertions(+), 59 deletions(-)
>  create mode 100644 scripts/checkkconfigsymbols.py
>  delete mode 100755 scripts/checkkconfigsymbols.sh

Please make the new file executable as well.


> +    for gitfile in stdout.rsplit("\n"):
> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
> +                ".log" in gitfile or os.path.isdir(gitfile):
> +            continue

Can you also skip arch/*/configs? A significant part of the output are
defconfig files, but there is little value in reporting them. The stale
options will go away automatically as soon as the defconfig is refreshed.

Michal

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v8] checkkconfigsymbols.sh: reimplementation in python
  2014-10-01 14:58   ` Michal Marek
@ 2014-10-04  9:29     ` Valentin Rothberg
  2014-10-08 13:39       ` Michal Marek
  0 siblings, 1 reply; 23+ messages in thread
From: Valentin Rothberg @ 2014-10-04  9:29 UTC (permalink / raw)
  To: Michal Marek
  Cc: linux-kernel, akpm, linux-kbuild, Paul Bolle, Stefan Hengelein

On Wed, Oct 1, 2014 at 4:58 PM, Michal Marek <mmarek@suse.cz> wrote:
> On 2014-09-29 19:05, Valentin Rothberg wrote:
>> The scripts/checkkconfigsymbols.sh script searches Kconfig features
>> in the source code that are not defined in Kconfig. Such identifiers
>> always evaluate to false and are the source of various kinds of bugs.
>> However, the shell script is slow and it does not detect such broken
>> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
>> Furthermore, it generates false positives. The script is also hard to
>> read and understand, and is thereby difficult to maintain.
>>
>> This patch replaces the shell script with an implementation in Python,
>> which:
>>     (a) detects the same bugs, but does not report previous false positives
>>     (b) additionally detects broken references in Kconfig and all
>>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>>     (c) is up to 75 times faster than the shell script
>>     (d) only checks files under version control ('git ls-files')
>>
>> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
>> from 3m47s to 0m3s, and reports 912 broken references in Linux v3.17-rc1;
>> 424 additional reports of which 16 are located in Kconfig files,
>> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
>>
>> Moreover, we intentionally include references in comments, which have been
>> ignored until now. Such comments may be leftovers of features that have
>> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
>> These references can be misleading and should be removed or replaced.
>>
>> Note that the output format changed from (file list <tab> feature) to
>> (feature <tab> file list) as it simplifies the detection of the Kconfig
>> feature for long file lists.
>>
>> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
>> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
>> Acked-by: Paul Bolle <pebolle@tiscali.nl>
>> ---
>> Changelog:
>> v2: Fix of regular expressions
>> v3: Changelog replacement, and add changes of v2
>> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>>   - Changes of regular expressions
>>   - Increases additional reports from 49 to 229 compared to v3
>>   - Change of output format from (file list <tab> feature) to
>>         (feature <tab> file list)
>> v5: Only analyze files under version control ('git ls-files')
>> v6: Cover features with numbers and small letters (e.g., 4xx)
>> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
>> v8: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>   - Fix of [D]{,1}CONFIG_ regex to exclude false positives
>>   - Exclude ".log" files of analysis
>>   - Filter "XXX" feature
>> ---
>>  scripts/checkkconfigsymbols.py | 139 +++++++++++++++++++++++++++++++++++++++++
>>  scripts/checkkconfigsymbols.sh |  59 -----------------
>>  2 files changed, 139 insertions(+), 59 deletions(-)
>>  create mode 100644 scripts/checkkconfigsymbols.py
>>  delete mode 100755 scripts/checkkconfigsymbols.sh
>
> Please make the new file executable as well.
>
>
>> +    for gitfile in stdout.rsplit("\n"):
>> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
>> +                ".log" in gitfile or os.path.isdir(gitfile):
>> +            continue
>
> Can you also skip arch/*/configs? A significant part of the output are
> defconfig files, but there is little value in reporting them. The stale
> options will go away automatically as soon as the defconfig is refreshed.

What do you mean by "refreshed"? And how often are they refreshed? I
think that reporting defconfigs helps to point to a problem, namely
that features are just not present anymore.

Thanks,
 Valentin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v8] checkkconfigsymbols.sh: reimplementation in python
  2014-10-04  9:29     ` Valentin Rothberg
@ 2014-10-08 13:39       ` Michal Marek
  2014-10-19 15:30         ` Valentin Rothberg
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Marek @ 2014-10-08 13:39 UTC (permalink / raw)
  To: Valentin Rothberg
  Cc: linux-kernel, akpm, linux-kbuild, Paul Bolle, Stefan Hengelein

On 2014-10-04 11:29, Valentin Rothberg wrote:
> On Wed, Oct 1, 2014 at 4:58 PM, Michal Marek <mmarek@suse.cz> wrote:
>> On 2014-09-29 19:05, Valentin Rothberg wrote:
>>> The scripts/checkkconfigsymbols.sh script searches Kconfig features
>>> in the source code that are not defined in Kconfig. Such identifiers
>>> always evaluate to false and are the source of various kinds of bugs.
>>> However, the shell script is slow and it does not detect such broken
>>> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
>>> Furthermore, it generates false positives. The script is also hard to
>>> read and understand, and is thereby difficult to maintain.
>>>
>>> This patch replaces the shell script with an implementation in Python,
>>> which:
>>>     (a) detects the same bugs, but does not report previous false positives
>>>     (b) additionally detects broken references in Kconfig and all
>>>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>>>     (c) is up to 75 times faster than the shell script
>>>     (d) only checks files under version control ('git ls-files')
>>>
>>> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
>>> from 3m47s to 0m3s, and reports 912 broken references in Linux v3.17-rc1;
>>> 424 additional reports of which 16 are located in Kconfig files,
>>> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
>>>
>>> Moreover, we intentionally include references in comments, which have been
>>> ignored until now. Such comments may be leftovers of features that have
>>> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
>>> These references can be misleading and should be removed or replaced.
>>>
>>> Note that the output format changed from (file list <tab> feature) to
>>> (feature <tab> file list) as it simplifies the detection of the Kconfig
>>> feature for long file lists.
>>>
>>> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
>>> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
>>> Acked-by: Paul Bolle <pebolle@tiscali.nl>
>>> ---
>>> Changelog:
>>> v2: Fix of regular expressions
>>> v3: Changelog replacement, and add changes of v2
>>> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>>>   - Changes of regular expressions
>>>   - Increases additional reports from 49 to 229 compared to v3
>>>   - Change of output format from (file list <tab> feature) to
>>>         (feature <tab> file list)
>>> v5: Only analyze files under version control ('git ls-files')
>>> v6: Cover features with numbers and small letters (e.g., 4xx)
>>> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
>>> v8: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>>   - Fix of [D]{,1}CONFIG_ regex to exclude false positives
>>>   - Exclude ".log" files of analysis
>>>   - Filter "XXX" feature
>>> ---
>>>  scripts/checkkconfigsymbols.py | 139 +++++++++++++++++++++++++++++++++++++++++
>>>  scripts/checkkconfigsymbols.sh |  59 -----------------
>>>  2 files changed, 139 insertions(+), 59 deletions(-)
>>>  create mode 100644 scripts/checkkconfigsymbols.py
>>>  delete mode 100755 scripts/checkkconfigsymbols.sh
>>
>> Please make the new file executable as well.
>>
>>
>>> +    for gitfile in stdout.rsplit("\n"):
>>> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
>>> +                ".log" in gitfile or os.path.isdir(gitfile):
>>> +            continue
>>
>> Can you also skip arch/*/configs? A significant part of the output are
>> defconfig files, but there is little value in reporting them. The stale
>> options will go away automatically as soon as the defconfig is refreshed.
> 
> What do you mean by "refreshed"?

Generated again with new Kconfig data.


> And how often are they refreshed? I
> think that reporting defconfigs helps to point to a problem, namely
> that features are just not present anymore.

Any given defconfig is outdated most of the time, so you could as well
list them all :). In most of the cases, it does not matter though,
thanks to the defaults in Kconfig files.

Michal

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v8] checkkconfigsymbols.sh: reimplementation in python
  2014-10-08 13:39       ` Michal Marek
@ 2014-10-19 15:30         ` Valentin Rothberg
  0 siblings, 0 replies; 23+ messages in thread
From: Valentin Rothberg @ 2014-10-19 15:30 UTC (permalink / raw)
  To: Michal Marek
  Cc: linux-kernel, akpm, linux-kbuild, Paul Bolle, Stefan Hengelein, gregkh

On Wed, Oct 8, 2014 at 3:39 PM, Michal Marek <mmarek@suse.cz> wrote:
> On 2014-10-04 11:29, Valentin Rothberg wrote:
>> On Wed, Oct 1, 2014 at 4:58 PM, Michal Marek <mmarek@suse.cz> wrote:
>>> On 2014-09-29 19:05, Valentin Rothberg wrote:
>>>> The scripts/checkkconfigsymbols.sh script searches Kconfig features
>>>> in the source code that are not defined in Kconfig. Such identifiers
>>>> always evaluate to false and are the source of various kinds of bugs.
>>>> However, the shell script is slow and it does not detect such broken
>>>> references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINED´´).
>>>> Furthermore, it generates false positives. The script is also hard to
>>>> read and understand, and is thereby difficult to maintain.
>>>>
>>>> This patch replaces the shell script with an implementation in Python,
>>>> which:
>>>>     (a) detects the same bugs, but does not report previous false positives
>>>>     (b) additionally detects broken references in Kconfig and all
>>>>         non-Kconfig files, such as Kbuild, .[cSh], .txt, .sh, defconfig, etc.
>>>>     (c) is up to 75 times faster than the shell script
>>>>     (d) only checks files under version control ('git ls-files')
>>>>
>>>> The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
>>>> from 3m47s to 0m3s, and reports 912 broken references in Linux v3.17-rc1;
>>>> 424 additional reports of which 16 are located in Kconfig files,
>>>> 287 in defconfigs, 63 in ./Documentation, 1 in Kbuild.
>>>>
>>>> Moreover, we intentionally include references in comments, which have been
>>>> ignored until now. Such comments may be leftovers of features that have
>>>> been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */´´).
>>>> These references can be misleading and should be removed or replaced.
>>>>
>>>> Note that the output format changed from (file list <tab> feature) to
>>>> (feature <tab> file list) as it simplifies the detection of the Kconfig
>>>> feature for long file lists.
>>>>
>>>> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
>>>> Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de>
>>>> Acked-by: Paul Bolle <pebolle@tiscali.nl>
>>>> ---
>>>> Changelog:
>>>> v2: Fix of regular expressions
>>>> v3: Changelog replacement, and add changes of v2
>>>> v4: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>>>   - Inclusion of all non-Kconfig files, such as .txt, .sh, etc.
>>>>   - Changes of regular expressions
>>>>   - Increases additional reports from 49 to 229 compared to v3
>>>>   - Change of output format from (file list <tab> feature) to
>>>>         (feature <tab> file list)
>>>> v5: Only analyze files under version control ('git ls-files')
>>>> v6: Cover features with numbers and small letters (e.g., 4xx)
>>>> v7: Add changes of v6 (lost 'git add') and filter FOO/BAR features
>>>> v8: Based on comments from Paul Bolle <pebolle@tiscali.nl>
>>>>   - Fix of [D]{,1}CONFIG_ regex to exclude false positives
>>>>   - Exclude ".log" files of analysis
>>>>   - Filter "XXX" feature
>>>> ---
>>>>  scripts/checkkconfigsymbols.py | 139 +++++++++++++++++++++++++++++++++++++++++
>>>>  scripts/checkkconfigsymbols.sh |  59 -----------------
>>>>  2 files changed, 139 insertions(+), 59 deletions(-)
>>>>  create mode 100644 scripts/checkkconfigsymbols.py
>>>>  delete mode 100755 scripts/checkkconfigsymbols.sh
>>>
>>> Please make the new file executable as well.
>>>
>>>
>>>> +    for gitfile in stdout.rsplit("\n"):
>>>> +        if ".git" in gitfile or "ChangeLog" in gitfile or \
>>>> +                ".log" in gitfile or os.path.isdir(gitfile):
>>>> +            continue
>>>
>>> Can you also skip arch/*/configs? A significant part of the output are
>>> defconfig files, but there is little value in reporting them. The stale
>>> options will go away automatically as soon as the defconfig is refreshed.
>>
>> What do you mean by "refreshed"?
>
> Generated again with new Kconfig data.
>
>
>> And how often are they refreshed? I
>> think that reporting defconfigs helps to point to a problem, namely
>> that features are just not present anymore.
>
> Any given defconfig is outdated most of the time, so you could as well
> list them all :). In most of the cases, it does not matter though,
> thanks to the defaults in Kconfig files.

I talked with Greg about this issue on Linux Plumbers last week in
Dusseldorf. It seems that there are people sending and accepting
patches which address exactly the problem of outdated Kconfig options
in defconfigs.

Michal, I understand your arguments, but I feel that reporting
defconfigs would help people doing this kind of work. As a
consequence, I suggest to keep the patch as it is and wait for
feedback. In case defconfigs are too annoying, we can still change it
later.

What do you think?

Valentin

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-10-19 15:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-20 14:15 [PATCH] checkkconfigsymbols.sh: reimplementation in python Valentin Rothberg
2014-09-21 19:39 ` [PATCH v2] " Valentin Rothberg
2014-09-21 19:53 ` [PATCH v3] " Valentin Rothberg
2014-09-21 21:28   ` Paul Bolle
2014-09-22  7:43     ` Valentin Rothberg
2014-09-22  8:24       ` Paul Bolle
2014-09-22  8:45         ` Valentin Rothberg
2014-09-22  9:06           ` Paul Bolle
2014-09-22 14:58 ` [PATCH v4] " Valentin Rothberg
2014-09-23 16:45   ` Paul Bolle
2014-09-27 12:57 ` [PATCH v5] " Valentin Rothberg
2014-09-27 14:30 ` [PATCH v6] " Valentin Rothberg
2014-09-28 15:55 ` [PATCH v7] " Valentin Rothberg
2014-09-29 10:28   ` Paul Bolle
2014-09-29 12:08     ` Valentin Rothberg
2014-09-29 12:45       ` Paul Bolle
2014-09-29 14:47     ` Michal Marek
2014-09-29 16:21       ` Paul Bolle
2014-09-29 17:05 ` [PATCH v8] " Valentin Rothberg
2014-10-01 14:58   ` Michal Marek
2014-10-04  9:29     ` Valentin Rothberg
2014-10-08 13:39       ` Michal Marek
2014-10-19 15:30         ` Valentin Rothberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).