linux-doc.vger.kernel.org archive mirror
* [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI  specs
@ 2021-09-18  9:52 Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 1/7] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

Hi Greg,

Add a new feature to get_abi.pl that optionally checks for existing
symbols under /sys that don't match any "What:" inside Documentation/ABI.

This feature is very useful for detecting missing ABI documentation.

This series brings a major speedup, and it fixes a few corner cases when
matching regexes that end with ".*" or "\d+".

Patch 1 changes the get_abi.pl logic to handle multiple What: lines, in
order to make the script more robust.

Patch 2 adds the basic logic. It runs really quickly (up to 2
seconds), but it doesn't follow sysfs symlinks.

Patch 3 adds support for parsing symlinks. It makes the script a
lot slower: it takes a couple of minutes to process the whole
sysfs tree. It could be optimized in the future by using a graph,
but, for now, let's keep it simple.

Patch 4 adds an optional parameter to allow filtering the results
using a regex given by the user. When this parameter is used
(which should be the normal use case), it will only try to find symlinks
if the sysfs node matches the regex.

Patch 5 improves the report by no longer ignoring What: entries that
end with a wildcard.

Patch 6 is a minor speedup. On a Dell Precision 5820, after patch 6,
the results are:

	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols

	real	2m35.563s
	user	2m34.346s
	sys	0m1.220s
	7595 undefined
	896 undefined_symbols

Patch 7 brings a *huge* speedup: it replaces an O(n^3) search for
links with logic that handles symlinks using a BFS. It also addresses
a corner case that caused the 'msi-irqs/\d+' regex to be misparsed.

After patch 7, it is 11 times faster:

	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols

	real	0m14.137s
	user	0m12.795s
	sys	0m1.348s
	7030 undefined
	794 undefined_symbols

(the difference in the number of undefined symbols is due to the fix
that makes it properly handle the 'msi-irqs/\d+' regex)

-

While this series is independent of the Documentation/ABI changes, it
works best when applied from this tree, which also contains ABI fixes
and a couple of additions for symbols frequently missing on my machine:

    https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_undefined_abi_v3

-

v3:
  - Fixed parse issues with 'msi-irqs/\d+' regex;
  - Added BFS graph logic to resolve symlinks in sysfs;

v2:
  - multiple What: for the same description are now properly handled;
  - some special cases are now better handled;
  - some bugs got fixed.

The full series, with the ABI changes and some ABI improvements, can be
found at:
	https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/commit/?h=get_undefined&id=1838d8fb149170f6c19feda0645d6c3157f46f4f



Mauro Carvalho Chehab (7):
  scripts: get_abi.pl: Better handle multiple What parameters
  scripts: get_abi.pl: Check for missing symbols at the ABI specs
  scripts: get_abi.pl: detect softlinks
  scripts: get_abi.pl: add an option to filter undefined results
  scripts: get_abi.pl: don't skip what that ends with wildcards
  scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier
  scripts: get_abi.pl: add a graph to speedup the undefined algorithm

 scripts/get_abi.pl | 327 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 320 insertions(+), 7 deletions(-)

-- 
2.31.1




* [PATCH v3 1/7] scripts: get_abi.pl: Better handle multiple What parameters
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 2/7] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Using a comma to separate multiple What: entries is problematic, as
some What: expressions may already contain a comma. So, use the \xac
character as the separator instead.
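
The problem with the comma can be reproduced in isolation. The snippet
below is only a sketch: the two What: entries are made up for
illustration, and it assumes that the \xac byte never appears inside a
What: line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical What: entries; the second one contains ", " itself.
my @whats = (
	"/sys/bus/foo/devices/<dev>/attr_a",
	"/sys/bus/foo/devices/<dev>/attr_{b, c}",
);

# Joining with ", " loses the boundaries between the entries...
my @by_comma = split /, /, join(", ", @whats);

# ...while a byte assumed to be absent from What: lines keeps them.
my @by_xac = split /\xac/, join("\xac", @whats);

printf "comma: %d entries, \\xac: %d entries\n",
       scalar @by_comma, scalar @by_xac;
```

With the comma, the second entry is cut in the middle of "{b, c}", so
three pieces come back instead of two.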

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d7aa82094296..cfc107df59f4 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -129,12 +129,12 @@ sub parse_abi {
 				push @{$symbols{$content}->{file}}, " $file:" . ($ln - 1);
 
 				if ($tag =~ m/what/) {
-					$what .= ", " . $content;
+					$what .= "\xac" . $content;
 				} else {
 					if ($what) {
 						parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
 
-						foreach my $w(split /, /, $what) {
+						foreach my $w(split /\xac/, $what) {
 							$symbols{$w}->{xref} = $what;
 						};
 					}
@@ -239,7 +239,7 @@ sub parse_abi {
 	if ($what) {
 		parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
 
-		foreach my $w(split /, /,$what) {
+		foreach my $w(split /\xac/,$what) {
 			$symbols{$w}->{xref} = $what;
 		};
 	}
@@ -328,7 +328,7 @@ sub output_rest {
 
 			printf ".. _%s:\n\n", $data{$what}->{label};
 
-			my @names = split /, /,$w;
+			my @names = split /\xac/,$w;
 			my $len = 0;
 
 			foreach my $name (@names) {
-- 
2.31.1



* [PATCH v3 2/7] scripts: get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 1/7] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 3/7] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Alexei Starovoitov,
	Andrii Nakryiko, Anton Vorontsov, Colin Cross, Daniel Borkmann,
	John Fastabend, KP Singh, Kees Cook, Martin KaFai Lau, Song Liu,
	Tony Luck, Yonghong Song, bpf, linux-kernel, netdev

Check for the symbols that exist under /sys but aren't
defined in Documentation/ABI.
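
The core idea, turning a What: entry into a regex and matching sysfs
paths against it, can be sketched as below. This is a simplified
illustration and not the logic from the patch: what_to_regex() is a
hypothetical helper, it relies on quotemeta() rather than a
hand-written escape table, and the paths are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a What: entry into an anchored regex.
sub what_to_regex {
	my $what = shift;

	# Mark every wildcard spelling with a placeholder byte
	$what =~ s/<[^>]+>/\xff/g;	# <name>
	$what =~ s/\{[^}]+\}/\xff/g;	# {name}
	$what =~ s,/\.\.\./,/\xff/,g;	# /.../
	$what =~ s/\*/\xff/g;		# *

	# Escape everything else, then turn the placeholder into ".*"
	$what = quotemeta($what);
	$what =~ s/\\?\xff/.*/g;

	return qr/^$what$/;
}

my $re = what_to_regex("/sys/class/foo/<name>/bar");

foreach my $file ("/sys/class/foo/x0/bar", "/sys/class/foo/x0/baz") {
	printf "%s: %s\n", $file, ($file =~ $re) ? "documented" : "not found";
}
```

Using a placeholder byte keeps the wildcard from being escaped together
with the literal parts of the path.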

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 90 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index cfc107df59f4..78364c4c4967 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -13,7 +13,9 @@ my $help = 0;
 my $man = 0;
 my $debug = 0;
 my $enable_lineno = 0;
+my $show_warnings = 1;
 my $prefix="Documentation/ABI";
+my $sysfs_prefix="/sys";
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -36,7 +38,7 @@ pod2usage(2) if (scalar @ARGV < 1 || @ARGV > 2);
 
 my ($cmd, $arg) = @ARGV;
 
-pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate");
+pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
 pod2usage(2) if ($cmd eq "search" && !$arg);
 
 require Data::Dumper if ($debug);
@@ -50,6 +52,8 @@ my %symbols;
 sub parse_error($$$$) {
 	my ($file, $ln, $msg, $data) = @_;
 
+	return if (!$show_warnings);
+
 	$data =~ s/\s+$/\n/;
 
 	print STDERR "Warning: file $file#$ln:\n\t$msg";
@@ -521,11 +525,88 @@ sub search_symbols {
 	}
 }
 
+# Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
+sub skip_debugfs {
+	if (($File::Find::dir =~ m,^/sys/kernel,)) {
+		return grep {!/(debug|tracing)/ } @_;
+	}
+
+	if (($File::Find::dir =~ m,^/sys/fs,)) {
+		return grep {!/(pstore|bpf|fuse)/ } @_;
+	}
+
+	return @_
+}
+
+my %leaf;
+
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+sub parse_existing_sysfs {
+	my $file = $File::Find::name;
+
+	my $mode = (stat($file))[2];
+	return if ($mode & S_IFDIR);
+
+	my $leave = $file;
+	$leave =~ s,.*/,,;
+
+	if (defined($leaf{$leave})) {
+		# FIXME: need to check if the path makes sense
+		my $what = $leaf{$leave};
+
+		$what =~ s/,/ /g;
+
+		$what =~ s/\<[^\>]+\>/.*/g;
+		$what =~ s/\{[^\}]+\}/.*/g;
+		$what =~ s/\[[^\]]+\]/.*/g;
+		$what =~ s,/\.\.\./,/.*/,g;
+		$what =~ s,/\*/,/.*/,g;
+
+		$what =~ s/\s+/ /g;
+
+		# Escape all other symbols
+		$what =~ s/$escape_symbols/\\$1/g;
+
+		foreach my $i (split / /,$what) {
+			if ($file =~ m#^$i$#) {
+#				print "$file: $i: OK!\n";
+				return;
+			}
+		}
+
+		print "$file: $leave is defined at $what\n";
+
+		return;
+	}
+
+	print "$file not found.\n";
+}
+
+sub undefined_symbols {
+	foreach my $w (sort keys %data) {
+		foreach my $what (split /\xac /,$w) {
+			my $leave = $what;
+			$leave =~ s,.*/,,;
+
+			if (defined($leaf{$leave})) {
+				$leaf{$leave} .= " " . $what;
+			} else {
+				$leaf{$leave} = $what;
+			}
+		}
+	}
+
+	find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+}
+
 # Ensure that the prefix will always end with a slash
 # While this is not needed for find, it makes the patch nicer
 # with --enable-lineno
 $prefix =~ s,/?$,/,;
 
+if ($cmd eq "undefined" || $cmd eq "search") {
+	$show_warnings = 0;
+}
 #
 # Parses all ABI files located at $prefix dir
 #
@@ -536,7 +617,9 @@ print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
 #
 # Handles the command
 #
-if ($cmd eq "search") {
+if ($cmd eq "undefined") {
+	undefined_symbols;
+} elsif ($cmd eq "search") {
 	search_symbols;
 } else {
 	if ($cmd eq "rest") {
@@ -575,6 +658,9 @@ B<rest>                  - output the ABI in ReST markup language
 
 B<validate>              - validate the ABI contents
 
+B<undefined>             - existing symbols at the system that aren't
+                           defined at Documentation/ABI
+
 =back
 
 =head1 OPTIONS
-- 
2.31.1



* [PATCH v3 3/7] scripts: get_abi.pl: detect softlinks
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 1/7] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 2/7] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 4/7] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The way sysfs works is that the same leaf may be present under
/sys/devices, /sys/bus, /sys/class, etc., linked via symlinks.

To make parsing harder, the ABI definition usually refers
to only one of those locations.

So, improve the logic in order to resolve the symlinks.
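
A minimal sketch of the idea follows. It assumes a small temporary tree
with made-up names standing in for /sys, so it can run anywhere;
collect() is a hypothetical callback, not the function from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp 'tempdir';
use Fcntl ':mode';
use Cwd 'abs_path';

# Build a tiny stand-in tree (hypothetical names) instead of using /sys
my $top = abs_path(tempdir(CLEANUP => 1));
mkdir "$top/devices" or die $!;
mkdir "$top/devices/dev0" or die $!;
mkdir "$top/class" or die $!;
open(my $fh, '>', "$top/devices/dev0/attr") or die $!;
close($fh);
symlink("$top/devices/dev0", "$top/class/dev0") or die $!;

my (%aliases, @files);

sub collect {
	my $file = $File::Find::name;
	my $mode = (lstat($file))[2];	# lstat: look at the link itself

	if (S_ISLNK($mode)) {
		$aliases{$file} = abs_path($file);	# link => real location
		return;
	}
	return if (S_ISDIR($mode));
	push @files, abs_path($file);
}

find({ wanted => \&collect, no_chdir => 1 }, $top);

# Rewrite each real path through every alias whose target prefixes it
foreach my $file (sort @files) {
	foreach my $link (sort keys %aliases) {
		my $new = $aliases{$link};
		next if (substr($file, 0, length($new)) ne $new);
		print "$file is also visible as ",
		      $link . substr($file, length($new)), "\n";
	}
}
```

Each file is compared against every alias, which is why this approach
gets slow on a real sysfs tree.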

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 207 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 165 insertions(+), 42 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 78364c4c4967..b0ca4f4e56f2 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -8,8 +8,10 @@ use Pod::Usage;
 use Getopt::Long;
 use File::Find;
 use Fcntl ':mode';
+use Cwd 'abs_path';
 
 my $help = 0;
+my $hint = 0;
 my $man = 0;
 my $debug = 0;
 my $enable_lineno = 0;
@@ -28,6 +30,7 @@ GetOptions(
 	"rst-source!" => \$description_is_rst,
 	"dir=s" => \$prefix,
 	'help|?' => \$help,
+	"show-hints" => \$hint,
 	man => \$man
 ) or pod2usage(2);
 
@@ -526,7 +529,7 @@ sub search_symbols {
 }
 
 # Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
-sub skip_debugfs {
+sub dont_parse_special_attributes {
 	if (($File::Find::dir =~ m,^/sys/kernel,)) {
 		return grep {!/(debug|tracing)/ } @_;
 	}
@@ -539,64 +542,178 @@ sub skip_debugfs {
 }
 
 my %leaf;
+my %aliases;
+my @files;
 
-my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
 	my $file = $File::Find::name;
+	my $mode = (lstat($file))[2];
+	my $abs_file = abs_path($file);
 
-	my $mode = (stat($file))[2];
-	return if ($mode & S_IFDIR);
-
-	my $leave = $file;
-	$leave =~ s,.*/,,;
-
-	if (defined($leaf{$leave})) {
-		# FIXME: need to check if the path makes sense
-		my $what = $leaf{$leave};
-
-		$what =~ s/,/ /g;
-
-		$what =~ s/\<[^\>]+\>/.*/g;
-		$what =~ s/\{[^\}]+\}/.*/g;
-		$what =~ s/\[[^\]]+\]/.*/g;
-		$what =~ s,/\.\.\./,/.*/,g;
-		$what =~ s,/\*/,/.*/,g;
-
-		$what =~ s/\s+/ /g;
-
-		# Escape all other symbols
-		$what =~ s/$escape_symbols/\\$1/g;
-
-		foreach my $i (split / /,$what) {
-			if ($file =~ m#^$i$#) {
-#				print "$file: $i: OK!\n";
-				return;
-			}
-		}
-
-		print "$file: $leave is defined at $what\n";
-
+	if (S_ISLNK($mode)) {
+		$aliases{$file} = $abs_file;
 		return;
 	}
 
-	print "$file not found.\n";
+	return if (S_ISDIR($mode));
+
+	# Trivial: file is defined exactly the same way at ABI What:
+	return if (defined($data{$file}));
+	return if (defined($data{$abs_file}));
+
+	push @files, $abs_file;
+}
+
+sub check_undefined_symbols {
+	foreach my $file (sort @files) {
+
+		# sysfs-module is special, as its definitions are inside
+		# a text. For now, just ignore them.
+		next if ($file =~ m#^/sys/module/#);
+
+		# Ignore cgroup and firmware
+		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
+		my $defined = 0;
+		my $exact = 0;
+		my $whats = "";
+
+		my $leave = $file;
+		$leave =~ s,.*/,,;
+
+		my $path = $file;
+		$path =~ s,(.*/).*,$1,;
+
+		if (defined($leaf{$leave})) {
+			my $what = $leaf{$leave};
+			$whats .= " $what" if (!($whats =~ m/$what/));
+
+			foreach my $w (split / /, $what) {
+				if ($file =~ m#^$w$#) {
+					$exact = 1;
+					last;
+				}
+			}
+			# Check for aliases
+			#
+			# TODO: this algorithm is O(w * n²). It can be
+			# improved in the future in order to handle it
+			# faster, by changing parse_existing_sysfs to
+			# store the sysfs inside a tree, at the expense
+			# on making the code less readable and/or using some
+			# additional perl library.
+			foreach my $a (keys %aliases) {
+				my $new = $aliases{$a};
+				my $len = length($new);
+
+				if (substr($file, 0, $len) eq $new) {
+					my $newf = $a . substr($file, $len);
+
+					foreach my $w (split / /, $what) {
+						if ($newf =~ m#^$w$#) {
+							$exact = 1;
+							last;
+						}
+					}
+				}
+			}
+
+			$defined++;
+		}
+		next if ($exact);
+
+		# Ignore some sysfs nodes
+		next if ($file =~ m#/(sections|notes)/#);
+
+		# Would need to check at
+		# Documentation/admin-guide/kernel-parameters.txt, but this
+		# is not easily parseable.
+		next if ($file =~ m#/parameters/#);
+
+		if ($hint && $defined) {
+			print "$leave at $path might be one of:$whats\n";
+			next;
+		}
+		print "$file not found.\n";
+	}
 }
 
 sub undefined_symbols {
+	find({
+		wanted =>\&parse_existing_sysfs,
+		preprocess =>\&dont_parse_special_attributes,
+		no_chdir => 1
+	     }, $sysfs_prefix);
+
 	foreach my $w (sort keys %data) {
 		foreach my $what (split /\xac /,$w) {
+			next if (!($what =~ m/^$sysfs_prefix/));
+
+			# Convert what into regular expressions
+
+			$what =~ s,/\.\.\./,/*/,g;
+			$what =~ s,\*,.*,g;
+
+			# Temporarily change [0-9]+ type of patterns
+			$what =~ s/\[0\-9\]\+/\xff/g;
+
+			# Temporarily change [\d+-\d+] type of patterns
+			$what =~ s/\[0\-\d+\]/\xff/g;
+			$what =~ s/\[(\d+)\]/\xf4$1\xf5/g;
+
+			# Temporarily change [0-9] type of patterns
+			$what =~ s/\[(\d)\-(\d)\]/\xf4$1-$2\xf5/g;
+
+			# Handle multiple option patterns
+			$what =~ s/[\{\<\[]([\w_]+)(?:[,|]+([\w_]+)){1,}[\}\>\]]/($1|$2)/g;
+
+			# Handle wildcards
+			$what =~ s/\<[^\>]+\>/.*/g;
+			$what =~ s/\{[^\}]+\}/.*/g;
+			$what =~ s/\[[^\]]+\]/.*/g;
+
+			$what =~ s/[XYZ]/.*/g;
+
+			# Recover [0-9] type of patterns
+			$what =~ s/\xf4/[/g;
+			$what =~ s/\xf5/]/g;
+
+			# Remove duplicated spaces
+			$what =~ s/\s+/ /g;
+
+			# Special case: this ABI has a parenthesis on it
+			$what =~ s/sqrt\(x^2\+y^2\+z^2\)/sqrt\(x^2\+y^2\+z^2\)/;
+
+			# Special case: drop comparition as in:
+			#	What: foo = <something>
+			# (this happens on a few IIO definitions)
+			$what =~ s,\s*\=.*$,,;
+
 			my $leave = $what;
 			$leave =~ s,.*/,,;
 
-			if (defined($leaf{$leave})) {
-				$leaf{$leave} .= " " . $what;
-			} else {
-				$leaf{$leave} = $what;
+			next if ($leave =~ m/^\.\*/ || $leave eq "");
+
+			# Escape all other symbols
+			$what =~ s/$escape_symbols/\\$1/g;
+			$what =~ s/\\\\/\\/g;
+			$what =~ s/\\([\[\]\(\)\|])/$1/g;
+			$what =~ s/(\d+)\\(-\d+)/$1$2/g;
+
+			$leave =~ s/[\(\)]//g;
+
+			foreach my $l (split /\|/, $leave) {
+				if (defined($leaf{$l})) {
+					next if ($leaf{$l} =~ m/$what/);
+					$leaf{$l} .= " " . $what;
+				} else {
+					$leaf{$l} = $what;
+				}
 			}
 		}
 	}
-
-	find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+	check_undefined_symbols;
 }
 
 # Ensure that the prefix will always end with a slash
@@ -646,7 +763,8 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 =head1 SYNOPSIS
 
 B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
-	       [--(no-)rst-source] [--dir=<dir>] <COMAND> [<ARGUMENT>]
+	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
+	       <COMAND> [<ARGUMENT>]
 
 Where <COMMAND> can be:
 
@@ -688,6 +806,11 @@ Enable output of #define LINENO lines.
 Put the script in verbose mode, useful for debugging. Can be called multiple
 times, to increase verbosity.
 
+=item B<--show-hints>
+
+Show hints about possible definitions for the missing ABI symbols.
+Used only when B<undefined>.
+
 =item B<--help>
 
 Prints a brief help message and exits.
-- 
2.31.1



* [PATCH v3 4/7] scripts: get_abi.pl: add an option to filter undefined results
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2021-09-18  9:52 ` [PATCH v3 3/7] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 5/7] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The output of this script can be too big. Add an option to
filter the results, in order to help find issues in the
ABI files.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index b0ca4f4e56f2..f5f2f664e336 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -18,6 +18,7 @@ my $enable_lineno = 0;
 my $show_warnings = 1;
 my $prefix="Documentation/ABI";
 my $sysfs_prefix="/sys";
+my $search_string;
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -31,6 +32,7 @@ GetOptions(
 	"dir=s" => \$prefix,
 	'help|?' => \$help,
 	"show-hints" => \$hint,
+	"search-string=s" => \$search_string,
 	man => \$man
 ) or pod2usage(2);
 
@@ -568,16 +570,13 @@ sub parse_existing_sysfs {
 sub check_undefined_symbols {
 	foreach my $file (sort @files) {
 
-		# sysfs-module is special, as its definitions are inside
-		# a text. For now, just ignore them.
-		next if ($file =~ m#^/sys/module/#);
-
 		# Ignore cgroup and firmware
 		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
 
 		my $defined = 0;
 		my $exact = 0;
 		my $whats = "";
+		my $found_string;
 
 		my $leave = $file;
 		$leave =~ s,.*/,,;
@@ -585,6 +584,12 @@ sub check_undefined_symbols {
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
 
+		if ($search_string) {
+			next if (!($file =~ m#$search_string#));
+			$found_string = 1;
+		}
+
+		print "--> $file\n" if ($found_string && $hint);
 		if (defined($leaf{$leave})) {
 			my $what = $leaf{$leave};
 			$whats .= " $what" if (!($whats =~ m/$what/));
@@ -610,6 +615,7 @@ sub check_undefined_symbols {
 				if (substr($file, 0, $len) eq $new) {
 					my $newf = $a . substr($file, $len);
 
+					print "    $newf\n" if ($found_string && $hint);
 					foreach my $w (split / /, $what) {
 						if ($newf =~ m#^$w$#) {
 							$exact = 1;
@@ -632,10 +638,10 @@ sub check_undefined_symbols {
 		next if ($file =~ m#/parameters/#);
 
 		if ($hint && $defined) {
-			print "$leave at $path might be one of:$whats\n";
+			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
 			next;
 		}
-		print "$file not found.\n";
+		print "$file not found.\n" if (!$search_string || $found_string);
 	}
 }
 
@@ -701,16 +707,29 @@ sub undefined_symbols {
 			$what =~ s/\\([\[\]\(\)\|])/$1/g;
 			$what =~ s/(\d+)\\(-\d+)/$1$2/g;
 
+			$what =~ s/\xff/\\d+/g;
+
+
+			# Special case: IIO ABI which a parenthesis.
+			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
+
 			$leave =~ s/[\(\)]//g;
 
+			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
 					next if ($leaf{$l} =~ m/$what/);
 					$leaf{$l} .= " " . $what;
+					$added = 1;
 				} else {
 					$leaf{$l} = $what;
+					$added = 1;
 				}
 			}
+			if ($search_string && $added) {
+				print "What: $what\n" if ($what =~ m#$search_string#);
+			}
+
 		}
 	}
 	check_undefined_symbols;
@@ -764,6 +783,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 
 B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
 	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
+	       [--search-string <regex>]
 	       <COMAND> [<ARGUMENT>]
 
 Where <COMMAND> can be:
@@ -811,6 +831,11 @@ times, to increase verbosity.
 Show hints about possible definitions for the missing ABI symbols.
 Used only when B<undefined>.
 
+=item B<--search-string> [regex string]
+
+Show only occurences that match a search string.
+Used only when B<undefined>.
+
 =item B<--help>
 
 Prints a brief help message and exits.
-- 
2.31.1



* [PATCH v3 5/7] scripts: get_abi.pl: don't skip what that ends with wildcards
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2021-09-18  9:52 ` [PATCH v3 4/7] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 6/7] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The search algorithm used inside check_undefined_symbols
has an optimization: it only looks at the What: entries that
have the same leaf name. This not only helps to speed up the
search, but also allows providing a hint about a partial match.

There's a drawback, however: when a What: ends with a
wildcard, the logic skips it, and the sysfs node is reported
as "not found".

Fix it by grouping the remaining cases together, and
disabling any hints for such cases.
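
The grouping can be pictured with the sketch below. It is a simplified
model under stated assumptions: the What: regexes are made up and
already converted, and is_documented() is a hypothetical helper rather
than code from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical What: entries, already converted into regexes
my @whats = (
	'/sys/class/foo/.*/bar',
	'/sys/devices/.*/msi_irqs/\d+',
	'/sys/kernel/baz/.*',
);

# Index the entries by leaf name; wildcard or numeric leaves share
# a single "others" bucket.
my %leaf;
foreach my $what (@whats) {
	(my $leave = $what) =~ s,.*/,,;
	$leave = "others"
		if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ m/^\\d\+$/);
	push @{$leaf{$leave}}, $what;
}

sub is_documented {
	my $file = shift;
	(my $leave = $file) =~ s,.*/,,;

	# Numeric or unknown leaves are checked against "others"
	$leave = "others" if ($leave =~ /^\d+$/ || !defined($leaf{$leave}));

	foreach my $w (@{$leaf{$leave} // []}) {
		return 1 if ($file =~ m#^$w$#);
	}
	return 0;
}

foreach my $file ("/sys/class/foo/x/bar", "/sys/devices/pci0/msi_irqs/42",
		  "/sys/kernel/baz/qux", "/sys/kernel/nope") {
	printf "%s: %s\n", $file, is_documented($file) ? "OK" : "not found";
}
```

Only the last path is reported as not found, as no entry in its bucket
matches it.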

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 74 +++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index f5f2f664e336..fe83f295600c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -589,44 +589,47 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
+		if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
+			$leave = "others";
+		}
+
 		print "--> $file\n" if ($found_string && $hint);
-		if (defined($leaf{$leave})) {
-			my $what = $leaf{$leave};
-			$whats .= " $what" if (!($whats =~ m/$what/));
+		my $what = $leaf{$leave};
+		$whats .= " $what" if (!($whats =~ m/$what/));
 
-			foreach my $w (split / /, $what) {
-				if ($file =~ m#^$w$#) {
-					$exact = 1;
-					last;
-				}
+		foreach my $w (split / /, $what) {
+			if ($file =~ m#^$w$#) {
+				$exact = 1;
+				last;
 			}
-			# Check for aliases
-			#
-			# TODO: this algorithm is O(w * n²). It can be
-			# improved in the future in order to handle it
-			# faster, by changing parse_existing_sysfs to
-			# store the sysfs inside a tree, at the expense
-			# on making the code less readable and/or using some
-			# additional perl library.
-			foreach my $a (keys %aliases) {
-				my $new = $aliases{$a};
-				my $len = length($new);
+		}
+		# Check for aliases
+		#
+		# TODO: this algorithm is O(w * n²). It can be
+		# improved in the future in order to handle it
+		# faster, by changing parse_existing_sysfs to
+		# store the sysfs inside a tree, at the expense
+		# on making the code less readable and/or using some
+		# additional perl library.
+		foreach my $a (keys %aliases) {
+			my $new = $aliases{$a};
+			my $len = length($new);
 
-				if (substr($file, 0, $len) eq $new) {
-					my $newf = $a . substr($file, $len);
+			if (substr($file, 0, $len) eq $new) {
+				my $newf = $a . substr($file, $len);
 
-					print "    $newf\n" if ($found_string && $hint);
-					foreach my $w (split / /, $what) {
-						if ($newf =~ m#^$w$#) {
-							$exact = 1;
-							last;
-						}
+				print "    $newf\n" if ($found_string && $hint);
+				foreach my $w (split / /, $what) {
+					if ($newf =~ m#^$w$#) {
+						$exact = 1;
+						last;
 					}
 				}
 			}
-
-			$defined++;
 		}
+
+		$defined++;
+
 		next if ($exact);
 
 		# Ignore some sysfs nodes
@@ -637,7 +640,7 @@ sub check_undefined_symbols {
 		# is not easily parseable.
 		next if ($file =~ m#/parameters/#);
 
-		if ($hint && $defined) {
+		if ($hint && $defined && $leave ne "others") {
 			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
 			next;
 		}
@@ -699,7 +702,16 @@ sub undefined_symbols {
 			my $leave = $what;
 			$leave =~ s,.*/,,;
 
-			next if ($leave =~ m/^\.\*/ || $leave eq "");
+			# $leave is used to improve search performance at
+			# check_undefined_symbols, as the algorithm there can seek
+			# for a small number of "what". It also allows giving a
+			# hint about a leave with the same name somewhere else.
+			# However, there are a few occurences where the leave is
+			# either a wildcard or a number. Just group such cases
+			# altogether.
+			if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+				$leave = "others" ;
+			}
 
 			# Escape all other symbols
 			$what =~ s/$escape_symbols/\\$1/g;
-- 
2.31.1



* [PATCH v3 6/7] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2021-09-18  9:52 ` [PATCH v3 5/7] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-18  9:52 ` [PATCH v3 7/7] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
  2021-09-21 16:52 ` [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Greg Kroah-Hartman
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

In order to speed up the parser and store less data, handle the
fs/cgroup exceptions a lot earlier.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index fe83f295600c..aa0a751563ba 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -550,6 +550,10 @@ my @files;
 my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
 	my $file = $File::Find::name;
+
+	# Ignore cgroup and firmware
+	return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
 	my $mode = (lstat($file))[2];
 	my $abs_file = abs_path($file);
 
@@ -570,9 +574,6 @@ sub parse_existing_sysfs {
 sub check_undefined_symbols {
 	foreach my $file (sort @files) {
 
-		# Ignore cgroup and firmware
-		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
-
 		my $defined = 0;
 		my $exact = 0;
 		my $whats = "";
-- 
2.31.1



* [PATCH v3 7/7] scripts: get_abi.pl: add a graph to speedup the undefined algorithm
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2021-09-18  9:52 ` [PATCH v3 6/7] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
@ 2021-09-18  9:52 ` Mauro Carvalho Chehab
  2021-09-21 16:52 ` [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Greg Kroah-Hartman
  7 siblings, 0 replies; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-18  9:52 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Searching for symlinks is an expensive operation with the current
logic, as it is on the order of O(n^3). In practice, the check
takes 2-3 minutes to go through all symbols.

Fix it by storing the directory tree in a graph, and using
a Breadth First Search (BFS) to find the links for each sysfs node.

With this improvement, it can now report issues in ~11 seconds
on my machine.

It comes at a price, though: more symbols are reported
as undefined after this change. I suspect this is due to some
sysfs circular loops that are dropped by the BFS. Despite this
increase, the reports now seem to be more coherent.
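
The approach can be sketched as below. This is a simplified,
self-contained model and not the code from this patch: add_file() and
add_link() are hypothetical helpers, the paths are made up, and the
tree is kept as nested hashes with a "__name" key that lists every
path a node is known by.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %root;

# Store a path in the tree, one hash level per path component
sub add_file {
	my $path = shift;
	my ($ref, $name) = (\%root, "");

	foreach my $edge (grep { $_ ne "" } split "/", $path) {
		$name .= "/$edge";
		$ref->{$edge} //= { "__name" => [ $name ] };
		$ref = $ref->{$edge};
	}
}

sub add_link {
	my ($target, $link) = @_;

	# Walk down to the node the link points to
	my $ref = \%root;
	foreach my $edge (grep { $_ ne "" } split "/", $target) {
		$ref = $ref->{$edge} or die "Missing node $target";
	}

	# BFS below that node, adding the alias name to each descendant
	my @queue = ($ref);
	my %seen = ($ref => 1);
	while (@queue) {
		my $v = shift @queue;
		foreach my $c (sort keys %$v) {
			next if ($c eq "__name");
			my $child = $v->{$c};
			next if ($seen{$child}++);

			(my $name = $child->{"__name"}[0]) =~ s#^\Q$target\E/#$link/#;
			push @{$child->{"__name"}}, $name;
			push @queue, $child;
		}
	}
}

add_file("/sys/devices/dev0/power/state");
add_file("/sys/devices/dev0/uevent");
add_link("/sys/devices/dev0", "/sys/class/foo/dev0");

my $node = $root{sys}{devices}{dev0}{power}{state};
print join(" ", @{$node->{"__name"}}), "\n";
```

Each link costs one traversal of the subtree it points to, and the
%seen hash keeps a node from being queued twice.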

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 188 ++++++++++++++++++++++++++++++---------------
 1 file changed, 127 insertions(+), 61 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index aa0a751563ba..c52a1cf0f49d 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -546,6 +546,73 @@ sub dont_parse_special_attributes {
 my %leaf;
 my %aliases;
 my @files;
+my %root;
+
+sub graph_add_file {
+	my $file = shift;
+	my $type = shift;
+
+	my $dir = $file;
+	$dir =~ s,^(.*/).*,$1,;
+	$file =~ s,.*/,,;
+
+	my $name;
+	my $file_ref = \%root;
+	foreach my $edge(split "/", $dir) {
+		$name .= "$edge/";
+		if (!defined ${$file_ref}{$edge}) {
+			${$file_ref}{$edge} = { };
+		}
+		$file_ref = \%{$$file_ref{$edge}};
+		${$file_ref}{"__name"} = [ $name ];
+	}
+	$name .= "$file";
+	${$file_ref}{$file} = {
+		"__name" => [ $name ]
+	};
+
+	return \%{$$file_ref{$file}};
+}
+
+sub graph_add_link {
+	my $file = shift;
+	my $link = shift;
+
+	# Traverse graph to find the reference
+	my $file_ref = \%root;
+	foreach my $edge(split "/", $file) {
+		$file_ref = \%{$$file_ref{$edge}} || die "Missing node!";
+	}
+
+	# do a BFS
+
+	my @queue;
+	my %seen;
+	my $base_name;
+	my $st;
+
+	push @queue, $file_ref;
+	$seen{$start}++;
+
+	while (@queue) {
+		my $v = shift @queue;
+		my @child = keys(%{$v});
+
+		foreach my $c(@child) {
+			next if $seen{$$v{$c}};
+			next if ($c eq "__name");
+
+			# Add new name
+			my $name = @{$$v{$c}{"__name"}}[0];
+			if ($name =~ s#^$file/#$link/#) {
+				push @{$$v{$c}{"__name"}}, $name;
+			}
+			# Add child to the queue and mark as seen
+			push @queue, $$v{$c};
+			$seen{$c}++;
+		}
+	}
+}
 
 my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
@@ -568,19 +635,50 @@ sub parse_existing_sysfs {
 	return if (defined($data{$file}));
 	return if (defined($data{$abs_file}));
 
-	push @files, $abs_file;
+	push @files, graph_add_file($abs_file, "file");
+}
+
+sub get_leave($)
+{
+	my $what = shift;
+	my $leave;
+
+	my $l = $what;
+	my $stop = 1;
+
+	$leave = $l;
+	$leave =~ s,/$,,;
+	$leave =~ s,.*/,,;
+	$leave =~ s/[\(\)]//g;
+
+	# $leave is used to improve search performance at
+	# check_undefined_symbols, as the algorithm there can seek
+	# for a small number of "what". It also allows giving a
+	# hint about a leave with the same name somewhere else.
+	# However, there are a few occurences where the leave is
+	# either a wildcard or a number. Just group such cases
+	# altogether.
+	if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+		$leave = "others";
+	}
+
+	return $leave;
 }
 
 sub check_undefined_symbols {
-	foreach my $file (sort @files) {
+	foreach my $file_ref (sort @files) {
+		my @names = @{$$file_ref{"__name"}};
+		my $file = $names[0];
 
 		my $defined = 0;
 		my $exact = 0;
-		my $whats = "";
 		my $found_string;
 
-		my $leave = $file;
-		$leave =~ s,.*/,,;
+		my $leave = get_leave($file);
+		if (!defined($leaf{$leave})) {
+			$leave = "others";
+		}
+		my $what = $leaf{$leave};
 
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
@@ -590,41 +688,12 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
-		if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
-			$leave = "others";
-		}
-
-		print "--> $file\n" if ($found_string && $hint);
-		my $what = $leaf{$leave};
-		$whats .= " $what" if (!($whats =~ m/$what/));
-
-		foreach my $w (split / /, $what) {
-			if ($file =~ m#^$w$#) {
-				$exact = 1;
-				last;
-			}
-		}
-		# Check for aliases
-		#
-		# TODO: this algorithm is O(w * n²). It can be
-		# improved in the future in order to handle it
-		# faster, by changing parse_existing_sysfs to
-		# store the sysfs inside a tree, at the expense
-		# on making the code less readable and/or using some
-		# additional perl library.
-		foreach my $a (keys %aliases) {
-			my $new = $aliases{$a};
-			my $len = length($new);
-
-			if (substr($file, 0, $len) eq $new) {
-				my $newf = $a . substr($file, $len);
-
-				print "    $newf\n" if ($found_string && $hint);
-				foreach my $w (split / /, $what) {
-					if ($newf =~ m#^$w$#) {
-						$exact = 1;
-						last;
-					}
+		foreach my $a (@names) {
+			print "--> $a\n" if ($found_string && $hint);
+			foreach my $w (split /\xac/, $what) {
+				if ($a =~ m#^$w$#) {
+					$exact = 1;
+					last;
 				}
 			}
 		}
@@ -641,8 +710,13 @@ sub check_undefined_symbols {
 		# is not easily parseable.
 		next if ($file =~ m#/parameters/#);
 
-		if ($hint && $defined && $leave ne "others") {
-			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
+		if ($hint && $defined && (!$search_string || $found_string)) {
+			$what =~ s/\xac/\n\t/g;
+			if ($leave ne "others") {
+				print "    more likely regexes:\n\t$what\n";
+			} else {
+				print "    tested regexes:\n\t$what\n";
+			}
 			next;
 		}
 		print "$file not found.\n" if (!$search_string || $found_string);
@@ -656,8 +730,10 @@ sub undefined_symbols {
 		no_chdir => 1
 	     }, $sysfs_prefix);
 
+	$leaf{"others"} = "";
+
 	foreach my $w (sort keys %data) {
-		foreach my $what (split /\xac /,$w) {
+		foreach my $what (split /\xac/,$w) {
 			next if (!($what =~ m/^$sysfs_prefix/));
 
 			# Convert what into regular expressions
@@ -700,19 +776,7 @@ sub undefined_symbols {
 			# (this happens on a few IIO definitions)
 			$what =~ s,\s*\=.*$,,;
 
-			my $leave = $what;
-			$leave =~ s,.*/,,;
-
-			# $leave is used to improve search performance at
-			# check_undefined_symbols, as the algorithm there can seek
-			# for a small number of "what". It also allows giving a
-			# hint about a leave with the same name somewhere else.
-			# However, there are a few occurences where the leave is
-			# either a wildcard or a number. Just group such cases
-			# altogether.
-			if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
-				$leave = "others" ;
-			}
+			my $leave = get_leave($what);
 
 			# Escape all other symbols
 			$what =~ s/$escape_symbols/\\$1/g;
@@ -722,17 +786,14 @@ sub undefined_symbols {
 
 			$what =~ s/\xff/\\d+/g;
 
-
 			# Special case: IIO ABI which a parenthesis.
 			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
 
-			$leave =~ s/[\(\)]//g;
-
 			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
-					next if ($leaf{$l} =~ m/$what/);
-					$leaf{$l} .= " " . $what;
+					next if ($leaf{$l} =~ m/\b$what\b/);
+					$leaf{$l} .= "\xac" . $what;
 					$added = 1;
 				} else {
 					$leaf{$l} = $what;
@@ -745,6 +806,11 @@ sub undefined_symbols {
 
 		}
 	}
+	# Take links into account
+	foreach my $link (keys %aliases) {
+		my $abs_file = $aliases{$link};
+		graph_add_link($abs_file, $link);
+	}
 	check_undefined_symbols;
 }
 
-- 
2.31.1



* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2021-09-18  9:52 ` [PATCH v3 7/7] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
@ 2021-09-21 16:52 ` Greg Kroah-Hartman
  2021-09-21 18:16   ` Mauro Carvalho Chehab
  7 siblings, 1 reply; 14+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-21 16:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

On Sat, Sep 18, 2021 at 11:52:10AM +0200, Mauro Carvalho Chehab wrote:
> Hi Greg,
> 
> Add a new feature at get_abi.pl to optionally check for existing symbols
> under /sys that won't match a "What:" inside Documentation/ABI.
> 
> Such feature is very useful to detect missing documentation for ABI.
> 
> This series brings a major speedup, plus it fixes a few border cases when
> matching regexes that end with a ".*" or \d+.
> 
> patch 1 changes get_abi.pl logic to handle multiple What: lines, in
> order to make the script more robust;
> 
> patch 2 adds the basic logic. It runs really quicky (up to 2
> seconds), but it doesn't use sysfs softlinks.
> 
> Patch 3 adds support for parsing softlinks. It makes the script a
> lot slower, making it take a couple of minutes to process the entire
> sysfs files. It could be optimized in the future by using a graph,
> but, for now, let's keep it simple.
> 
> Patch 4 adds an optional parameter to allow filtering the results
> using a regex given by the user. When this parameter is used
> (which should be the normal usecase), it will only try to find softlinks
> if the sysfs node matches a regex.
> 
> Patch 5 improves the report by avoiding it to ignore What: that
> ends with a wildcard.
> 
> Patch 6 is a minor speedup.  On a Dell Precision 5820, after patch 6, 
> results are:
> 
> 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> 
> 	real	2m35.563s
> 	user	2m34.346s
> 	sys	0m1.220s
> 	7595 undefined
> 	896 undefined_symbols
> 
> Patch 7 makes a *huge* speedup: it basically switches a linear O(n^3)
> search for links by a logic which handle symlinks using BFS. It
> also addresses a border case that was making 'msi-irqs/\d+' regex to
> be misparsed. 
> 
> After patch 7, it is 11 times faster:
> 
> 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> 
> 	real	0m14.137s
> 	user	0m12.795s
> 	sys	0m1.348s
> 	7030 undefined
> 	794 undefined_symbols
> 
> (the difference on the number of undefined symbols are due to the fix for
> it to properly handle 'msi-irqs/\d+' regex)
> 
> -
> 
> While this series is independent from Documentation/ABI changes, it
> works best when applied from this tree, which also contain ABI fixes
> and a couple of additions of frequent missed symbols on my machine:
> 
>     https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_undefined_abi_v3

I've taken all of these, but get_abi.pl seems to be stuck in an endless
loop or something.  I gave up and stopped it after 14 minutes.  It had
stopped printing out anything after finding all of the pci attributes
that are not documented :)

Anything I can do to help debug this?

thanks,

greg k-h


* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-21 16:52 ` [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Greg Kroah-Hartman
@ 2021-09-21 18:16   ` Mauro Carvalho Chehab
  2021-09-22  5:43     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-21 18:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

Em Tue, 21 Sep 2021 18:52:42 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:

> On Sat, Sep 18, 2021 at 11:52:10AM +0200, Mauro Carvalho Chehab wrote:
> > Hi Greg,
> > 
> > Add a new feature at get_abi.pl to optionally check for existing symbols
> > under /sys that won't match a "What:" inside Documentation/ABI.
> > 
> > Such feature is very useful to detect missing documentation for ABI.
> > 
> > This series brings a major speedup, plus it fixes a few border cases when
> > matching regexes that end with a ".*" or \d+.
> > 
> > patch 1 changes get_abi.pl logic to handle multiple What: lines, in
> > order to make the script more robust;
> > 
> > patch 2 adds the basic logic. It runs really quicky (up to 2
> > seconds), but it doesn't use sysfs softlinks.
> > 
> > Patch 3 adds support for parsing softlinks. It makes the script a
> > lot slower, making it take a couple of minutes to process the entire
> > sysfs files. It could be optimized in the future by using a graph,
> > but, for now, let's keep it simple.
> > 
> > Patch 4 adds an optional parameter to allow filtering the results
> > using a regex given by the user. When this parameter is used
> > (which should be the normal usecase), it will only try to find softlinks
> > if the sysfs node matches a regex.
> > 
> > Patch 5 improves the report by avoiding it to ignore What: that
> > ends with a wildcard.
> > 
> > Patch 6 is a minor speedup.  On a Dell Precision 5820, after patch 6, 
> > results are:
> > 
> > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > 
> > 	real	2m35.563s
> > 	user	2m34.346s
> > 	sys	0m1.220s
> > 	7595 undefined
> > 	896 undefined_symbols
> > 
> > Patch 7 makes a *huge* speedup: it basically switches a linear O(n^3)
> > search for links by a logic which handle symlinks using BFS. It
> > also addresses a border case that was making 'msi-irqs/\d+' regex to
> > be misparsed. 
> > 
> > After patch 7, it is 11 times faster:
> > 
> > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > 
> > 	real	0m14.137s
> > 	user	0m12.795s
> > 	sys	0m1.348s
> > 	7030 undefined
> > 	794 undefined_symbols
> > 
> > (the difference on the number of undefined symbols are due to the fix for
> > it to properly handle 'msi-irqs/\d+' regex)
> > 
> > -
> > 
> > While this series is independent from Documentation/ABI changes, it
> > works best when applied from this tree, which also contain ABI fixes
> > and a couple of additions of frequent missed symbols on my machine:
> > 
> >     https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_undefined_abi_v3  
> 
> I've taken all of these, but get_abi.pl seems to be stuck in an endless
> loop or something.  I gave up and stopped it after 14 minutes.  It had
> stopped printing out anything after finding all of the pci attributes
> that are not documented :)

It is probably not an endless loop: there are just too many vars to
check on your system, which can make it really slow.

The search algorithm works by reducing the number of regular
expressions that are checked for a given file entry at sysfs. It
does that by looking at the devnode name. For instance, when it checks
this file:

	/sys/bus/pci/drivers/iosf_mbi_pci/bind

The logic will seek only the "What:" expressions that end with "bind".
Currently, there are just two What expressions for it[1]:

	What: /sys/bus/fsl\-mc/drivers/.*/bind
	What: /sys/bus/pci/drivers/.*/bind

It will then run an O(n²) algorithm to seek:

	foreach my $a (@names) {
		foreach my $w (split /\xac/, $what) {
			if ($a =~ m#^$w$#) {
				$exact = 1;
				last;
			}
		}
	}

This runs quickly when there are only a few regexes to check. There are,
however, some What: expressions that end with a wildcard. Those are
harder to process. Right now, they're all grouped together, which
makes them slower. Most of the processing time is spent on those.
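To make the grouping more concrete, here is a minimal sketch of the
idea. It is not the actual get_abi.pl code: find_what() and the
/sys/class/foo entry are hypothetical, and the bucket is an array
instead of a \xac-separated string. Regexes are grouped by their last
path component, and anything that ends with a wildcard (or a number)
lands in "others":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example What: regexes. The last one is hypothetical and only there
# to show an expression that ends with a wildcard.
my @whats = (
	'/sys/bus/pci/drivers/.*/bind',
	'/sys/bus/fsl\-mc/drivers/.*/bind',
	'/sys/class/foo/.*',
);

my %leaf = ( "others" => [] );

foreach my $what (@whats) {
	my $name = $what;
	$name =~ s,.*/,,;
	if ($name eq "" || $name =~ m/^\.\*/ || $name =~ m/^\d+$/) {
		$name = "others";
	}
	push @{$leaf{$name}}, $what;
}

# Return the regex that matches $file, or undef when none does
sub find_what {
	my $file = shift;

	my $name = $file;
	$name =~ s,.*/,,;
	$name = "others" if (!defined($leaf{$name}));

	# Only the regexes in the selected bucket are tested
	foreach my $w (@{$leaf{$name}}) {
		return $w if ($file =~ m#^$w$#);
	}
	return undef;
}

foreach my $file ("/sys/bus/pci/drivers/iosf_mbi_pci/bind",
		  "/sys/bus/i2c/drivers/dummy/bind",
		  "/sys/class/foo/bar") {
	my $w = find_what($file);
	printf "%-40s %s\n", $file, defined($w) ? "matches $w" : "not found";
}
```

With this layout, a file named "bind" is tested against two regexes
only, while any file whose name has no bucket of its own is tested
against everything stored in "others".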

I'm working right now on some strategy to also speed up the search 
for them. Once I get something better, I'll send a patch series.

--

[1] On a side note, there are currently some problems with the What:
    definitions for bind/unbind, as they:

	- don't match all PCI devices;
	- don't match ACPI and other buses that also export
	  bind/unbind.

> 
> Anything I can do to help debug this?
>

There are two parameters that can help to identify the issue:

a) You can add a "--show-hints" parameter. This turns on some 
   prints that may help to identify what the script is doing.
   It is not really a debug option, but it helps to identify
   when some regexes are failing.

b) You can limit the What expressions that will be parsed with:
	   --search-string <something>

You can combine both. For instance, if you want to make it
a lot more verbose, you could run it as:

	./scripts/get_abi.pl undefined --search-string /sys --show-hints

The script will then print all regexes that will be checked, and when
actually checking for the missing vars, it will print all names for
a given entry at sysfs.

So, if you want to know how an i2c bind has been validated, you
could do:

	$ ./scripts/get_abi.pl undefined --search-string i2c/.*/bind --show-hints
	--> /sys/bus/i2c/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/driver/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/driver/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/dummy/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/dummy/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/axp20x-i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/axp20x-i2c/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/smbus_alert/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/smbus_alert/bind
	--> /sys/module/i2c_smbus/drivers/i2c:smbus_alert/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/ee1004/bind
	--> /sys/module/ee1004/drivers/i2c:ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/ee1004/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/driver/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/intel_soc_pmic_i2c/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/intel_soc_pmic_i2c/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/tps68470/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/tps68470/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind
	--> /sys/bus/i2c/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3/i2c-14/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-4/i2c-15/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0036/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2/i2c-13/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0050/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-10/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-5/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-3/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1/i2c-12/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/16-0037/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-8/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-9/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:15.2/i2c_designware.2/i2c-2/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-0/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:1f.4/i2c-16/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-7/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-6/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	--> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-11/subsystem/drivers/CHT Whiskey Cove PMIC/bind
	    more likely regexes:
		/sys/bus/fsl\-mc/drivers/.*/bind
		/sys/bus/pci/drivers/.*/bind

Btw, I already have a patch addressing the above example
(see enclosed). I intend to submit it in a newer patch series.

Thanks,
Mauro

[PATCH] ABI: sysfs-bus-pci: add alternative What fields

Some PCI ABI symbols aren't shown under:

	/sys/bus/pci/drivers/.../

because they're registered with a different class. That's
the case, for instance, of:

	/sys/bus/i2c/drivers/CHT Whiskey Cove PMIC/unbind

This one is not present under /sys/bus/pci:

	$ find /sys/bus/pci -name 'CHT Whiskey Cove PMIC'

Although clearly this is provided by a PCI driver:

	/sys/devices/pci0000:00/0000:00:02.0/i2c-4/subsystem/drivers/CHT Whiskey Cove PMIC/unbind

So, add an alternate What location in order to match bind/unbind
for such devices.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 1da4c8db3a9e..f4efbcb0b18c 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -1,4 +1,5 @@
 What:		/sys/bus/pci/drivers/.../bind
+What:		/sys/devices/pciX/.../bind
 Date:		December 2003
 Contact:	linux-pci@vger.kernel.org
 Description:
@@ -14,6 +15,7 @@ Description:
 		(Note: kernels before 2.6.28 may require echo -n).
 
 What:		/sys/bus/pci/drivers/.../unbind
+What:		/sys/devices/pciX/.../unbind
 Date:		December 2003
 Contact:	linux-pci@vger.kernel.org
 Description:
@@ -29,6 +31,7 @@ Description:
 		(Note: kernels before 2.6.28 may require echo -n).
 
 What:		/sys/bus/pci/drivers/.../new_id
+What:		/sys/devices/pciX/.../new_id
 Date:		December 2003
 Contact:	linux-pci@vger.kernel.org
 Description:
@@ -47,6 +50,7 @@ Description:
 		  # echo "8086 10f5" > /sys/bus/pci/drivers/foo/new_id
 
 What:		/sys/bus/pci/drivers/.../remove_id
+What:		/sys/devices/pciX/.../remove_id
 Date:		February 2009
 Contact:	Chris Wright <chrisw@sous-sol.org>
 Description:




* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-21 18:16   ` Mauro Carvalho Chehab
@ 2021-09-22  5:43     ` Greg Kroah-Hartman
       [not found]       ` <YUrLqdCQyGaCc1XJ@kroah.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-22  5:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

On Tue, Sep 21, 2021 at 08:16:33PM +0200, Mauro Carvalho Chehab wrote:
> Em Tue, 21 Sep 2021 18:52:42 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:
> 
> > On Sat, Sep 18, 2021 at 11:52:10AM +0200, Mauro Carvalho Chehab wrote:
> > > Hi Greg,
> > > 
> > > Add a new feature at get_abi.pl to optionally check for existing symbols
> > > under /sys that won't match a "What:" inside Documentation/ABI.
> > > 
> > > Such feature is very useful to detect missing documentation for ABI.
> > > 
> > > This series brings a major speedup, plus it fixes a few border cases when
> > > matching regexes that end with a ".*" or \d+.
> > > 
> > > patch 1 changes get_abi.pl logic to handle multiple What: lines, in
> > > order to make the script more robust;
> > > 
> > > patch 2 adds the basic logic. It runs really quicky (up to 2
> > > seconds), but it doesn't use sysfs softlinks.
> > > 
> > > Patch 3 adds support for parsing softlinks. It makes the script a
> > > lot slower, making it take a couple of minutes to process the entire
> > > sysfs files. It could be optimized in the future by using a graph,
> > > but, for now, let's keep it simple.
> > > 
> > > Patch 4 adds an optional parameter to allow filtering the results
> > > using a regex given by the user. When this parameter is used
> > > (which should be the normal usecase), it will only try to find softlinks
> > > if the sysfs node matches a regex.
> > > 
> > > Patch 5 improves the report by avoiding it to ignore What: that
> > > ends with a wildcard.
> > > 
> > > Patch 6 is a minor speedup.  On a Dell Precision 5820, after patch 6, 
> > > results are:
> > > 
> > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > 
> > > 	real	2m35.563s
> > > 	user	2m34.346s
> > > 	sys	0m1.220s
> > > 	7595 undefined
> > > 	896 undefined_symbols
> > > 
> > > Patch 7 makes a *huge* speedup: it basically switches a linear O(n^3)
> > > search for links by a logic which handle symlinks using BFS. It
> > > also addresses a border case that was making 'msi-irqs/\d+' regex to
> > > be misparsed. 
> > > 
> > > After patch 7, it is 11 times faster:
> > > 
> > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > 
> > > 	real	0m14.137s
> > > 	user	0m12.795s
> > > 	sys	0m1.348s
> > > 	7030 undefined
> > > 	794 undefined_symbols
> > > 
> > > (the difference on the number of undefined symbols are due to the fix for
> > > it to properly handle 'msi-irqs/\d+' regex)
> > > 
> > > -
> > > 
> > > While this series is independent from Documentation/ABI changes, it
> > > works best when applied from this tree, which also contain ABI fixes
> > > and a couple of additions of frequent missed symbols on my machine:
> > > 
> > >     https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_undefined_abi_v3  
> > 
> > I've taken all of these, but get_abi.pl seems to be stuck in an endless
> > loop or something.  I gave up and stopped it after 14 minutes.  It had
> > stopped printing out anything after finding all of the pci attributes
> > that are not documented :)
> 
> It is probably not an endless loop, just there are too many vars to
> check on your system, which could make it really slow.

Ah, yes, I ran it overnight and got the following:

$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols

real	29m39.503s
user	29m37.556s
sys	0m0.851s
26669 undefined
765 undefined_symbols

> The search algorithm works by reducing the number of regex
> expressions that will be checked for a given file entry in sysfs. It
> does that by looking at the devnode name. For instance, when it checks
> this file:
> 
> 	/sys/bus/pci/drivers/iosf_mbi_pci/bind
> 
> The logic will seek only the "What:" expressions that end with "bind".
> Currently, there are just two What expressions for it[1]:
> 
> 	What: /sys/bus/fsl\-mc/drivers/.*/bind
> 	What: /sys/bus/pci/drivers/.*/bind
> 
> It will then run an O(n²) algorithm to seek:
> 
> 		foreach my $a (@names) {
> 			foreach my $w (split /\xac/, $what) {
> 				if ($a =~ m#^$w$#) {
> 					$exact = 1;
> 					last;
> 				}
> 			}
> 		}
> 
> This runs quickly when there are few regexes to check. There are,
> however, some What: expressions that end with a wildcard. Those are
> harder to process. Right now, they're all grouped together, which
> makes them slower. Most of the processing time is spent on those.
> 
> I'm working right now on some strategy to also speed up the search 
> for them. Once I get something better, I'll send a patch series.
> 
> --
> 
> [1] On a side note, there are currently some problems with the What:
>     definitions for bind/unbind, as:
> 
> 	- it doesn't match all PCI devices;
> 	- it doesn't match ACPI and other buses that also export
> 	  bind/unbind.
> 
> > 
> > Anything I can do to help debug this?
> >
> 
> There are two parameters that can help to identify the issue:
> 
> a) You can add a "--show-hints" parameter. This turns on some 
>    prints that may help to identify what the script is doing.
>    It is not really a debug option, but it helps to identify
>    when some regexes are failing.
> 
> b) You can limit the What expressions that will be parsed with:
> 	   --search-string <something>
> 
> You can combine both. For instance, if you want to make it
> a lot more verbose, you could run it as:
> 
> 	./scripts/get_abi.pl undefined --search-string /sys --show-hints

Let me run this and time stamp it to see where it is getting hung up on.
Give it another 30 minutes :)
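
One way to time stamp the output is sketched below. This is only an
assumption about how such a log could be produced, not necessarily how
any log discussed in this thread was made. The stamp helper and the
abi.log file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix every line read from stdin with the
# current wall-clock time, e.g. "[07:52:44] some line".
stamp() {
	local line
	while IFS= read -r line; do
		printf '[%s] %s\n' "$(date +%T)" "$line"
	done
}

# Usage from the top of a kernel tree, with the flags quoted above:
#   ./scripts/get_abi.pl undefined --search-string /sys --show-hints 2>&1 | stamp >abi.log
```

A large gap between two consecutive stamps then shows which entry the
script was working on while it appeared to be stalled.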

thanks,

greg k-h


* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
       [not found]       ` <YUrLqdCQyGaCc1XJ@kroah.com>
@ 2021-09-22  7:36         ` Mauro Carvalho Chehab
  2021-09-22  8:11           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 14+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-22  7:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

Em Wed, 22 Sep 2021 08:22:33 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:

> Hm, that didn't make too much sense as to what it was stalled on.  I've
> attached the compressed file if you are curious.

Hmm...

	[07:52:44] --> /sys/devices/pci0000:40/0000:40:01.3/0000:4a:00.1/iommu/amd-iommu/cap
	[08:07:52] --> /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0/0000:42:05.0/iommu/amd-iommu/cap

It seems it took quite a while (about 15 minutes between the two lines
above) to handle the iommu cap entries, which is weird, as it should be
checking just 3 What expressions:

	[07:43:06] What: /sys/class/iommu/.*/amd\-iommu/cap
	[07:43:06] What: /sys/class/iommu/.*/intel\-iommu/cap
	[07:43:06] What: /sys/devices/pci.*.*.*.*\:.*.*/0000\:.*.*\:.*.*..*/dma/dma.*chan.*/quickdata/cap

Maybe there was memory starvation while running the script, causing
swapping. Still, it is weird that it would happen there, as the hashes
and arrays used by the script are all allocated before it starts the
search logic. Here, the allocation part takes ~2 seconds.
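
As a rough illustration of why only a handful of expressions should be
in play for a "cap" entry, the bash sketch below selects the What:
regexes whose last path component equals the last component of the
sysfs path. It is a simplified model under stated assumptions, not the
code from get_abi.pl: the candidates helper is made up, the list of
regexes is a hand-picked sample, and the backslash escapes are dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the What: regexes whose last path component
# equals the last component of the given sysfs path.
candidates() {
	local path=$1 leaf w
	shift
	leaf=${path##*/}
	for w in "$@"; do
		if [[ ${w##*/} == "$leaf" ]]; then
			printf '%s\n' "$w"
		fi
	done
}

# Hand-picked sample of What: regexes (not the full set).
whats=(
	'/sys/class/iommu/.*/amd-iommu/cap'
	'/sys/class/iommu/.*/intel-iommu/cap'
	'/sys/bus/pci/drivers/.*/bind'
)

path='/sys/devices/pci0000:40/0000:40:01.3/0000:4a:00.1/iommu/amd-iommu/cap'

# Only the two regexes ending in "cap" are printed.
candidates "$path" "${whats[@]}"
```

With so few candidates per entry, the regex matching itself should be
cheap, so the time is presumably being spent elsewhere.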

At least on my Dell Precision 5820 (12 CPU threads), the amount of memory it
uses is not huge:

    $ /usr/bin/time -v ./scripts/get_abi.pl undefined >/dev/null
	Command being timed: "./scripts/get_abi.pl undefined"
	User time (seconds): 12.68
	System time (seconds): 1.29
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.98
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 212608
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 52003
	Voluntary context switches: 1
	Involuntary context switches: 56
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Unfortunately, I don't have any AMD-based machine here, but I'll
try to run it later on a big ARM server and see how it behaves.

> Anyway, this is all in my tree now, and I'll gladly take patches to make
> it go faster :)

Ok!

Thanks,
Mauro


* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-22  7:36         ` Mauro Carvalho Chehab
@ 2021-09-22  8:11           ` Greg Kroah-Hartman
  2021-09-22  8:43             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 14+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-22  8:11 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

On Wed, Sep 22, 2021 at 09:36:09AM +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 22 Sep 2021 08:22:33 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:
> 
> > Hm, that didn't make too much sense as to what it was stalled on.  I've
> > attached the compressed file if you are curious.
> 
> Hmm...
> 
> 	[07:52:44] --> /sys/devices/pci0000:40/0000:40:01.3/0000:4a:00.1/iommu/amd-iommu/cap
> 	[08:07:52] --> /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0/0000:42:05.0/iommu/amd-iommu/cap
> 
> It sounds it took quite a while handling iommu cap, which sounds weird, as
> it should be looking just 3 What expressions:
> 
> 	[07:43:06] What: /sys/class/iommu/.*/amd\-iommu/cap
> 	[07:43:06] What: /sys/class/iommu/.*/intel\-iommu/cap
> 	[07:43:06] What: /sys/devices/pci.*.*.*.*\:.*.*/0000\:.*.*\:.*.*..*/dma/dma.*chan.*/quickdata/cap
> 
> Maybe there was a memory starvation while running the script, causing
> swaps. Still, it is weird that it would happen there, as the hashes
> and arrays used at the script are all allocated before it starts the
> search logic. Here, the allocation part takes ~2 seconds.

No memory starvation here, this thing is a beast:
	$ free -h
	               total        used        free      shared  buff/cache   available
	Mem:           251Gi        36Gi        13Gi       402Mi       202Gi       212Gi
	Swap:          4.0Gi       182Mi       3.8Gi

	$ nproc
	64


> At least on my Dell Precision 5820 (12 cpu threads), the amount of memory it
> uses is not huge:
> 
>     $ /usr/bin/time -v ./scripts/get_abi.pl undefined >/dev/null
> 	Command being timed: "./scripts/get_abi.pl undefined"
> 	User time (seconds): 12.68
> 	System time (seconds): 1.29
> 	Percent of CPU this job got: 99%
> 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.98
> 	Average shared text size (kbytes): 0
> 	Average unshared data size (kbytes): 0
> 	Average stack size (kbytes): 0
> 	Average total size (kbytes): 0
> 	Maximum resident set size (kbytes): 212608
> 	Average resident set size (kbytes): 0
> 	Major (requiring I/O) page faults: 0
> 	Minor (reclaiming a frame) page faults: 52003
> 	Voluntary context switches: 1
> 	Involuntary context switches: 56
> 	Swaps: 0
> 	File system inputs: 0
> 	File system outputs: 0
> 	Socket messages sent: 0
> 	Socket messages received: 0
> 	Signals delivered: 0
> 	Page size (bytes): 4096
> 	Exit status: 0
> 
> Unfortunately, I don't have any amd-based machine here, but I'll
> try to run it later on a big arm server and see how it behaves.

I'll run that and get back to you in 30 minutes :)

thanks,

greg k-h


* Re: [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-22  8:11           ` Greg Kroah-Hartman
@ 2021-09-22  8:43             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 14+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-22  8:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

On Wed, Sep 22, 2021 at 10:11:02AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Sep 22, 2021 at 09:36:09AM +0200, Mauro Carvalho Chehab wrote:
> > It sounds it took quite a while handling iommu cap, which sounds weird, as
> > it should be looking just 3 What expressions:
> > 
> > 	[07:43:06] What: /sys/class/iommu/.*/amd\-iommu/cap
> > 	[07:43:06] What: /sys/class/iommu/.*/intel\-iommu/cap
> > 	[07:43:06] What: /sys/devices/pci.*.*.*.*\:.*.*/0000\:.*.*\:.*.*..*/dma/dma.*chan.*/quickdata/cap
> > 
> > Maybe there was a memory starvation while running the script, causing
> > swaps. Still, it is weird that it would happen there, as the hashes
> > and arrays used at the script are all allocated before it starts the
> > search logic. Here, the allocation part takes ~2 seconds.
> 
> No memory starvation here, this thing is a beast:
> 	$ free -h
> 	               total        used        free      shared  buff/cache   available
> 	Mem:           251Gi        36Gi        13Gi       402Mi       202Gi       212Gi
> 	Swap:          4.0Gi       182Mi       3.8Gi
> 
> 	$ nproc
> 	64
> 
> 
> > At least on my Dell Precision 5820 (12 cpu threads), the amount of memory it
> > uses is not huge:
> > 
> >     $ /usr/bin/time -v ./scripts/get_abi.pl undefined >/dev/null
> > 	Command being timed: "./scripts/get_abi.pl undefined"
> > 	User time (seconds): 12.68
> > 	System time (seconds): 1.29
> > 	Percent of CPU this job got: 99%
> > 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.98
> > 	Average shared text size (kbytes): 0
> > 	Average unshared data size (kbytes): 0
> > 	Average stack size (kbytes): 0
> > 	Average total size (kbytes): 0
> > 	Maximum resident set size (kbytes): 212608
> > 	Average resident set size (kbytes): 0
> > 	Major (requiring I/O) page faults: 0
> > 	Minor (reclaiming a frame) page faults: 52003
> > 	Voluntary context switches: 1
> > 	Involuntary context switches: 56
> > 	Swaps: 0
> > 	File system inputs: 0
> > 	File system outputs: 0
> > 	Socket messages sent: 0
> > 	Socket messages received: 0
> > 	Signals delivered: 0
> > 	Page size (bytes): 4096
> > 	Exit status: 0
> > 
> > Unfortunately, I don't have any amd-based machine here, but I'll
> > try to run it later on a big arm server and see how it behaves.
> 
> I'll run that and get back to you in 30 minutes :)

$ /usr/bin/time -v ./scripts/get_abi.pl undefined > /dev/null
	Command being timed: "./scripts/get_abi.pl undefined"
	User time (seconds): 1756.94
	System time (seconds): 0.76
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 29:18.94
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 228116
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 55862
	Voluntary context switches: 1
	Involuntary context switches: 17205
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0



end of thread, other threads:[~2021-09-22  8:43 UTC | newest]

Thread overview: 14+ messages
2021-09-18  9:52 [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 1/7] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 2/7] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 3/7] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 4/7] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 5/7] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 6/7] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
2021-09-18  9:52 ` [PATCH v3 7/7] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
2021-09-21 16:52 ` [PATCH v3 0/7] get_abi.pl: Check for missing symbols at the ABI specs Greg Kroah-Hartman
2021-09-21 18:16   ` Mauro Carvalho Chehab
2021-09-22  5:43     ` Greg Kroah-Hartman
     [not found]       ` <YUrLqdCQyGaCc1XJ@kroah.com>
2021-09-22  7:36         ` Mauro Carvalho Chehab
2021-09-22  8:11           ` Greg Kroah-Hartman
2021-09-22  8:43             ` Greg Kroah-Hartman
