linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/13] get_abi.pl undefined: improve precision and performance
@ 2021-09-23 13:29 Mauro Carvalho Chehab
  2021-09-23 13:29 ` [PATCH 01/13] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:29 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

Hi Greg,

This is a series of improvements for get_abi.pl. It is based on top of next-20210923.

With these changes, on my development tree, the script takes about 6 seconds to run
on my desktop:

	$ !1076
	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols

	real	0m6,292s
	user	0m5,640s
	sys	0m0,634s
	  6838 undefined_after
	   808 undefined_symbols
	  7646 total

And 7 seconds on a Dell Precision 5820:

	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols

	real	0m7.162s
	user	0m5.836s
	sys	0m1.329s
	6548 undefined
	772 undefined_symbols

Both tests were done against this tree (based on today's linux-next):

	$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest

Note that, since my tree has several ABI fixes, the script likely runs faster
there than it will on your tree: there are fewer symbols to report, and the
algorithm is optimized to reduce the number of regexes evaluated once a
symbol is found.

Besides optimizing and improving the seek logic, this series also changes the
debug logic. It now receives a bitmap, where "8" means to print the regexes
that will be used by the "undefined" command:

	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
	real	0m17,189s
	user	0m13,940s
	sys	0m2,404s

	$ wc -l foo
	18421939 foo

	$ cat foo
	...
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
	...

In other words, on my desktop, the /sys match performs more than 18M regular
expression searches, which take 6.2 seconds (or 17.2 seconds if debug output
is enabled and written to my NVMe storage).
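
A minimal sketch of how a bitmap-style debug level can be tested. Only the
value 8 (print the regexes used by "undefined") comes from the text above;
the constant names and the other bit are hypothetical, and this is not
necessarily how the script itself implements the check.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Only the value 8 is taken from the cover letter; the names and the
# other bit are made up for illustration.
use constant {
	DBG_EXAMPLE_OTHER => 1,	# hypothetical
	DBG_WHAT_REGEX    => 8,	# print each regex tried by "undefined"
};

# Return true when every bit of $flag is set in the bitmap $debug.
sub dbg_enabled {
	my ($debug, $flag) = @_;
	return ($debug & $flag) == $flag;
}

my $debug = 8;	# as in: --debug 8
print "regex dump enabled\n" if dbg_enabled($debug, DBG_WHAT_REGEX);
print "other dump enabled\n" if dbg_enabled($debug, DBG_EXAMPLE_OTHER);
```

With a bitmap, levels can be combined: a value of 9 would enable both of
the bits defined in this sketch.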

Regards,
Mauro

---

Mauro Carvalho Chehab (13):
  scripts: get_abi.pl: Better handle multiple What parameters
  scripts: get_abi.pl: Check for missing symbols at the ABI specs
  scripts: get_abi.pl: detect softlinks
  scripts: get_abi.pl: add an option to filter undefined results
  scripts: get_abi.pl: don't skip what that ends with wildcards
  scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier
  scripts: get_abi.pl: add a graph to speedup the undefined algorithm
  scripts: get_abi.pl: improve debug logic
  scripts: get_abi.pl: Better handle leaves with wildcards
  scripts: get_abi.pl: ignore some sysfs nodes earlier
  scripts: get_abi.pl: stop check loop earlier when regex is found
  scripts: get_abi.pl: precompile what match regexes
  scripts: get_abi.pl: ensure that "others" regex will be parsed

 scripts/get_abi.pl | 388 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 372 insertions(+), 16 deletions(-)

-- 
2.31.1




* [PATCH 01/13] scripts: get_abi.pl: Better handle multiple What parameters
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
@ 2021-09-23 13:29 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 02/13] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:29 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Using a comma here is problematic, as some What: expressions
may already contain a comma. So, use the \xac character instead.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
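A small illustration of why the separator matters, assuming two made-up
What: entries (the paths below are hypothetical). Joining and re-splitting
on ", " yields a spurious extra entry, while "\xac" round-trips cleanly:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Two hypothetical What: entries; the first one contains ", " itself.
my @whats = (
	"/sys/example/foo_{a, b}",
	"/sys/example/bar",
);

# Return how many pieces we get back after joining with $sep and
# splitting on it again.
sub roundtrip_count {
	my ($sep, @items) = @_;
	my $joined = join($sep, @items);
	my @parts = split /\Q$sep\E/, $joined;
	return scalar @parts;
}

printf "comma separator: %d entries\n", roundtrip_count(", ", @whats);
printf "\\xac separator:  %d entries\n", roundtrip_count("\xac", @whats);
```

The comma variant reports 3 entries for 2 inputs; the "\xac" variant
reports 2.
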
 scripts/get_abi.pl | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d7aa82094296..48077feea89c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -129,12 +129,12 @@ sub parse_abi {
 				push @{$symbols{$content}->{file}}, " $file:" . ($ln - 1);
 
 				if ($tag =~ m/what/) {
-					$what .= ", " . $content;
+					$what .= "\xac" . $content;
 				} else {
 					if ($what) {
 						parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
 
-						foreach my $w(split /, /, $what) {
+						foreach my $w(split /\xac/, $what) {
 							$symbols{$w}->{xref} = $what;
 						};
 					}
@@ -239,7 +239,7 @@ sub parse_abi {
 	if ($what) {
 		parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
 
-		foreach my $w(split /, /,$what) {
+		foreach my $w(split /\xac/,$what) {
 			$symbols{$w}->{xref} = $what;
 		};
 	}
@@ -328,7 +328,7 @@ sub output_rest {
 
 			printf ".. _%s:\n\n", $data{$what}->{label};
 
-			my @names = split /, /,$w;
+			my @names = split /\xac/,$w;
 			my $len = 0;
 
 			foreach my $name (@names) {
@@ -492,6 +492,7 @@ sub search_symbols {
 
 		my $file = $data{$what}->{filepath};
 
+		$what =~ s/\xac/, /g;
 		my $bar = $what;
 		$bar =~ s/./-/g;
 
-- 
2.31.1



* [PATCH 02/13] scripts: get_abi.pl: Check for missing symbols at the ABI specs
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
  2021-09-23 13:29 ` [PATCH 01/13] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 03/13] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, Alexei Starovoitov,
	Andrii Nakryiko, Anton Vorontsov, Colin Cross, Daniel Borkmann,
	John Fastabend, KP Singh, Kees Cook, Martin KaFai Lau, Song Liu,
	Tony Luck, Yonghong Song, bpf, linux-kernel, netdev

Check for symbols that exist under /sys but aren't
defined under Documentation/ABI.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
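A rough sketch of the approach, run against a throw-away directory rather
than the real /sys. The tree layout and the %documented hash are
assumptions made for the example; the actual patch matches against regexes
derived from the What: entries instead of exact paths.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a throw-away tree instead of walking the real /sys.
my $root = tempdir(CLEANUP => 1);
make_path("$root/kernel/debug", "$root/kernel/mm");
for my $f ("kernel/debug/skipped", "kernel/mm/documented", "kernel/mm/missing") {
	open(my $fh, ">", "$root/$f") or die "$root/$f: $!";
	close($fh);
}

# Stand-in for the What: entries parsed from Documentation/ABI.
my %documented = ("$root/kernel/mm/documented" => 1);
my @undefined;

find({
	# Drop unwanted entries before File::Find descends into them.
	preprocess => sub { return grep { !/^(debug|tracing)$/ } @_ },
	wanted => sub {
		my $file = $File::Find::name;
		return if (-d $file);
		push @undefined, $file if (!$documented{$file});
	},
	no_chdir => 1,
}, $root);

print "$_ not found.\n" foreach (sort @undefined);
```

Filtering in the preprocess callback means the skipped directories are
never traversed at all, which is cheaper than discarding their files one by
one in the wanted callback.
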
 scripts/get_abi.pl | 90 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 48077feea89c..e714bf75f5c2 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -13,7 +13,9 @@ my $help = 0;
 my $man = 0;
 my $debug = 0;
 my $enable_lineno = 0;
+my $show_warnings = 1;
 my $prefix="Documentation/ABI";
+my $sysfs_prefix="/sys";
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -36,7 +38,7 @@ pod2usage(2) if (scalar @ARGV < 1 || @ARGV > 2);
 
 my ($cmd, $arg) = @ARGV;
 
-pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate");
+pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
 pod2usage(2) if ($cmd eq "search" && !$arg);
 
 require Data::Dumper if ($debug);
@@ -50,6 +52,8 @@ my %symbols;
 sub parse_error($$$$) {
 	my ($file, $ln, $msg, $data) = @_;
 
+	return if (!$show_warnings);
+
 	$data =~ s/\s+$/\n/;
 
 	print STDERR "Warning: file $file#$ln:\n\t$msg";
@@ -522,11 +526,88 @@ sub search_symbols {
 	}
 }
 
+# Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
+sub skip_debugfs {
+	if (($File::Find::dir =~ m,^/sys/kernel,)) {
+		return grep {!/(debug|tracing)/ } @_;
+	}
+
+	if (($File::Find::dir =~ m,^/sys/fs,)) {
+		return grep {!/(pstore|bpf|fuse)/ } @_;
+	}
+
+	return @_
+}
+
+my %leaf;
+
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+sub parse_existing_sysfs {
+	my $file = $File::Find::name;
+
+	my $mode = (stat($file))[2];
+	return if ($mode & S_IFDIR);
+
+	my $leave = $file;
+	$leave =~ s,.*/,,;
+
+	if (defined($leaf{$leave})) {
+		# FIXME: need to check if the path makes sense
+		my $what = $leaf{$leave};
+
+		$what =~ s/,/ /g;
+
+		$what =~ s/\<[^\>]+\>/.*/g;
+		$what =~ s/\{[^\}]+\}/.*/g;
+		$what =~ s/\[[^\]]+\]/.*/g;
+		$what =~ s,/\.\.\./,/.*/,g;
+		$what =~ s,/\*/,/.*/,g;
+
+		$what =~ s/\s+/ /g;
+
+		# Escape all other symbols
+		$what =~ s/$escape_symbols/\\$1/g;
+
+		foreach my $i (split / /,$what) {
+			if ($file =~ m#^$i$#) {
+#				print "$file: $i: OK!\n";
+				return;
+			}
+		}
+
+		print "$file: $leave is defined at $what\n";
+
+		return;
+	}
+
+	print "$file not found.\n";
+}
+
+sub undefined_symbols {
+	foreach my $w (sort keys %data) {
+		foreach my $what (split /\xac /,$w) {
+			my $leave = $what;
+			$leave =~ s,.*/,,;
+
+			if (defined($leaf{$leave})) {
+				$leaf{$leave} .= " " . $what;
+			} else {
+				$leaf{$leave} = $what;
+			}
+		}
+	}
+
+	find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+}
+
 # Ensure that the prefix will always end with a slash
 # While this is not needed for find, it makes the patch nicer
 # with --enable-lineno
 $prefix =~ s,/?$,/,;
 
+if ($cmd eq "undefined" || $cmd eq "search") {
+	$show_warnings = 0;
+}
 #
 # Parses all ABI files located at $prefix dir
 #
@@ -537,7 +618,9 @@ print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
 #
 # Handles the command
 #
-if ($cmd eq "search") {
+if ($cmd eq "undefined") {
+	undefined_symbols;
+} elsif ($cmd eq "search") {
 	search_symbols;
 } else {
 	if ($cmd eq "rest") {
@@ -576,6 +659,9 @@ B<rest>                  - output the ABI in ReST markup language
 
 B<validate>              - validate the ABI contents
 
+B<undefined>             - existing symbols at the system that aren't
+                           defined at Documentation/ABI
+
 =back
 
 =head1 OPTIONS
-- 
2.31.1



* [PATCH 03/13] scripts: get_abi.pl: detect softlinks
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
  2021-09-23 13:29 ` [PATCH 01/13] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 02/13] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 04/13] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The way sysfs works is that the same leaf may be present under
/sys/devices, /sys/bus, /sys/class, etc., linked via
symlinks.

To make it harder to parse, the ABI definition usually refers
to only one of those locations.

So, improve the logic in order to detect the symlinks.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
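A simplified sketch of the alias idea, using a temporary tree in place of
/sys. The directory names are hypothetical, and alias_names() is a helper
invented for this example: it rewrites a real path into the name the same
file has when reached through a symlink.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path);
use Fcntl qw(:mode);
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Throw-away tree mimicking /sys/class/<x> -> /sys/devices/<x>.
my $root = abs_path(tempdir(CLEANUP => 1));
make_path("$root/devices/dev0", "$root/class");
open(my $fh, ">", "$root/devices/dev0/attr") or die $!;
close($fh);
symlink("$root/devices/dev0", "$root/class/dev0") or die "symlink: $!";

# lstat() reports on the link itself; stat() would follow it.
my %aliases;
my $link = "$root/class/dev0";
my $mode = (lstat($link))[2];
$aliases{$link} = abs_path($link) if (S_ISLNK($mode));

# Rewrite a real path into the name it has through each alias.
sub alias_names {
	my $file = shift;
	my @names;
	foreach my $alias (sort keys %aliases) {
		my $target = $aliases{$alias};
		my $len = length($target);
		push @names, $alias . substr($file, $len)
			if (substr($file, 0, $len + 1) eq "$target/");
	}
	return @names;
}

print "$_\n" foreach (alias_names("$root/devices/dev0/attr"));
```

Unlike the loop in the patch, this sketch compares the prefix including the
trailing slash, so that a link to "dev0" is not applied to a sibling such
as "dev01".
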
 scripts/get_abi.pl | 207 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 165 insertions(+), 42 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index e714bf75f5c2..a7cb4be6886c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -8,8 +8,10 @@ use Pod::Usage;
 use Getopt::Long;
 use File::Find;
 use Fcntl ':mode';
+use Cwd 'abs_path';
 
 my $help = 0;
+my $hint = 0;
 my $man = 0;
 my $debug = 0;
 my $enable_lineno = 0;
@@ -28,6 +30,7 @@ GetOptions(
 	"rst-source!" => \$description_is_rst,
 	"dir=s" => \$prefix,
 	'help|?' => \$help,
+	"show-hints" => \$hint,
 	man => \$man
 ) or pod2usage(2);
 
@@ -527,7 +530,7 @@ sub search_symbols {
 }
 
 # Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
-sub skip_debugfs {
+sub dont_parse_special_attributes {
 	if (($File::Find::dir =~ m,^/sys/kernel,)) {
 		return grep {!/(debug|tracing)/ } @_;
 	}
@@ -540,64 +543,178 @@ sub skip_debugfs {
 }
 
 my %leaf;
+my %aliases;
+my @files;
 
-my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
 	my $file = $File::Find::name;
+	my $mode = (lstat($file))[2];
+	my $abs_file = abs_path($file);
 
-	my $mode = (stat($file))[2];
-	return if ($mode & S_IFDIR);
-
-	my $leave = $file;
-	$leave =~ s,.*/,,;
-
-	if (defined($leaf{$leave})) {
-		# FIXME: need to check if the path makes sense
-		my $what = $leaf{$leave};
-
-		$what =~ s/,/ /g;
-
-		$what =~ s/\<[^\>]+\>/.*/g;
-		$what =~ s/\{[^\}]+\}/.*/g;
-		$what =~ s/\[[^\]]+\]/.*/g;
-		$what =~ s,/\.\.\./,/.*/,g;
-		$what =~ s,/\*/,/.*/,g;
-
-		$what =~ s/\s+/ /g;
-
-		# Escape all other symbols
-		$what =~ s/$escape_symbols/\\$1/g;
-
-		foreach my $i (split / /,$what) {
-			if ($file =~ m#^$i$#) {
-#				print "$file: $i: OK!\n";
-				return;
-			}
-		}
-
-		print "$file: $leave is defined at $what\n";
-
+	if (S_ISLNK($mode)) {
+		$aliases{$file} = $abs_file;
 		return;
 	}
 
-	print "$file not found.\n";
+	return if (S_ISDIR($mode));
+
+	# Trivial: file is defined exactly the same way at ABI What:
+	return if (defined($data{$file}));
+	return if (defined($data{$abs_file}));
+
+	push @files, $abs_file;
+}
+
+sub check_undefined_symbols {
+	foreach my $file (sort @files) {
+
+		# sysfs-module is special, as its definitions are inside
+		# a text. For now, just ignore them.
+		next if ($file =~ m#^/sys/module/#);
+
+		# Ignore cgroup and firmware
+		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
+		my $defined = 0;
+		my $exact = 0;
+		my $whats = "";
+
+		my $leave = $file;
+		$leave =~ s,.*/,,;
+
+		my $path = $file;
+		$path =~ s,(.*/).*,$1,;
+
+		if (defined($leaf{$leave})) {
+			my $what = $leaf{$leave};
+			$whats .= " $what" if (!($whats =~ m/$what/));
+
+			foreach my $w (split / /, $what) {
+				if ($file =~ m#^$w$#) {
+					$exact = 1;
+					last;
+				}
+			}
+			# Check for aliases
+			#
+			# TODO: this algorithm is O(w * n²). It can be
+			# improved in the future in order to handle it
+			# faster, by changing parse_existing_sysfs to
+			# store the sysfs inside a tree, at the expense
+			# on making the code less readable and/or using some
+			# additional perl library.
+			foreach my $a (keys %aliases) {
+				my $new = $aliases{$a};
+				my $len = length($new);
+
+				if (substr($file, 0, $len) eq $new) {
+					my $newf = $a . substr($file, $len);
+
+					foreach my $w (split / /, $what) {
+						if ($newf =~ m#^$w$#) {
+							$exact = 1;
+							last;
+						}
+					}
+				}
+			}
+
+			$defined++;
+		}
+		next if ($exact);
+
+		# Ignore some sysfs nodes
+		next if ($file =~ m#/(sections|notes)/#);
+
+		# Would need to check at
+		# Documentation/admin-guide/kernel-parameters.txt, but this
+		# is not easily parseable.
+		next if ($file =~ m#/parameters/#);
+
+		if ($hint && $defined) {
+			print "$leave at $path might be one of:$whats\n";
+			next;
+		}
+		print "$file not found.\n";
+	}
 }
 
 sub undefined_symbols {
+	find({
+		wanted =>\&parse_existing_sysfs,
+		preprocess =>\&dont_parse_special_attributes,
+		no_chdir => 1
+	     }, $sysfs_prefix);
+
 	foreach my $w (sort keys %data) {
 		foreach my $what (split /\xac /,$w) {
+			next if (!($what =~ m/^$sysfs_prefix/));
+
+			# Convert what into regular expressions
+
+			$what =~ s,/\.\.\./,/*/,g;
+			$what =~ s,\*,.*,g;
+
+			# Temporarily change [0-9]+ type of patterns
+			$what =~ s/\[0\-9\]\+/\xff/g;
+
+			# Temporarily change [\d+-\d+] type of patterns
+			$what =~ s/\[0\-\d+\]/\xff/g;
+			$what =~ s/\[(\d+)\]/\xf4$1\xf5/g;
+
+			# Temporarily change [0-9] type of patterns
+			$what =~ s/\[(\d)\-(\d)\]/\xf4$1-$2\xf5/g;
+
+			# Handle multiple option patterns
+			$what =~ s/[\{\<\[]([\w_]+)(?:[,|]+([\w_]+)){1,}[\}\>\]]/($1|$2)/g;
+
+			# Handle wildcards
+			$what =~ s/\<[^\>]+\>/.*/g;
+			$what =~ s/\{[^\}]+\}/.*/g;
+			$what =~ s/\[[^\]]+\]/.*/g;
+
+			$what =~ s/[XYZ]/.*/g;
+
+			# Recover [0-9] type of patterns
+			$what =~ s/\xf4/[/g;
+			$what =~ s/\xf5/]/g;
+
+			# Remove duplicated spaces
+			$what =~ s/\s+/ /g;
+
+			# Special case: this ABI has a parenthesis on it
+			$what =~ s/sqrt\(x^2\+y^2\+z^2\)/sqrt\(x^2\+y^2\+z^2\)/;
+
+			# Special case: drop comparition as in:
+			#	What: foo = <something>
+			# (this happens on a few IIO definitions)
+			$what =~ s,\s*\=.*$,,;
+
 			my $leave = $what;
 			$leave =~ s,.*/,,;
 
-			if (defined($leaf{$leave})) {
-				$leaf{$leave} .= " " . $what;
-			} else {
-				$leaf{$leave} = $what;
+			next if ($leave =~ m/^\.\*/ || $leave eq "");
+
+			# Escape all other symbols
+			$what =~ s/$escape_symbols/\\$1/g;
+			$what =~ s/\\\\/\\/g;
+			$what =~ s/\\([\[\]\(\)\|])/$1/g;
+			$what =~ s/(\d+)\\(-\d+)/$1$2/g;
+
+			$leave =~ s/[\(\)]//g;
+
+			foreach my $l (split /\|/, $leave) {
+				if (defined($leaf{$l})) {
+					next if ($leaf{$l} =~ m/$what/);
+					$leaf{$l} .= " " . $what;
+				} else {
+					$leaf{$l} = $what;
+				}
 			}
 		}
 	}
-
-	find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+	check_undefined_symbols;
 }
 
 # Ensure that the prefix will always end with a slash
@@ -647,7 +764,8 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 =head1 SYNOPSIS
 
 B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
-	       [--(no-)rst-source] [--dir=<dir>] <COMAND> [<ARGUMENT>]
+	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
+	       <COMAND> [<ARGUMENT>]
 
 Where <COMMAND> can be:
 
@@ -689,6 +807,11 @@ Enable output of #define LINENO lines.
 Put the script in verbose mode, useful for debugging. Can be called multiple
 times, to increase verbosity.
 
+=item B<--show-hints>
+
+Show hints about possible definitions for the missing ABI symbols.
+Used only when B<undefined>.
+
 =item B<--help>
 
 Prints a brief help message and exits.
-- 
2.31.1



* [PATCH 04/13] scripts: get_abi.pl: add an option to filter undefined results
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 03/13] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 05/13] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The output of this script can be too big. Add an option to
filter the results, in order to help find issues in the
ABI files.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
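A minimal sketch of the filtering idea. The --search-string option name
comes from this patch; the file list and the filter_files() helper are
made up for the example, and GetOptionsFromArray() is used only so the
snippet does not depend on the real command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical sysfs paths standing in for the files being checked.
my @files = (
	"/sys/class/example/a",
	"/sys/devices/example/b",
	"/sys/devices/other/c",
);

# Parse --search-string from an argument list and return matching files.
sub filter_files {
	my ($argv, @candidates) = @_;
	my $search_string;

	GetOptionsFromArray($argv, "search-string=s" => \$search_string)
		or die "bad options\n";

	return @candidates if (!defined($search_string));
	return grep { m#$search_string# } @candidates;
}

print "$_ not found.\n"
	foreach (filter_files(["--search-string", "^/sys/devices/ex"], @files));
```

Since the string is used as a regular expression, anchors such as "^" can
narrow the report to a single subtree.
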
 scripts/get_abi.pl | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index a7cb4be6886c..40f10175bb98 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -18,6 +18,7 @@ my $enable_lineno = 0;
 my $show_warnings = 1;
 my $prefix="Documentation/ABI";
 my $sysfs_prefix="/sys";
+my $search_string;
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -31,6 +32,7 @@ GetOptions(
 	"dir=s" => \$prefix,
 	'help|?' => \$help,
 	"show-hints" => \$hint,
+	"search-string=s" => \$search_string,
 	man => \$man
 ) or pod2usage(2);
 
@@ -569,16 +571,13 @@ sub parse_existing_sysfs {
 sub check_undefined_symbols {
 	foreach my $file (sort @files) {
 
-		# sysfs-module is special, as its definitions are inside
-		# a text. For now, just ignore them.
-		next if ($file =~ m#^/sys/module/#);
-
 		# Ignore cgroup and firmware
 		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
 
 		my $defined = 0;
 		my $exact = 0;
 		my $whats = "";
+		my $found_string;
 
 		my $leave = $file;
 		$leave =~ s,.*/,,;
@@ -586,6 +585,12 @@ sub check_undefined_symbols {
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
 
+		if ($search_string) {
+			next if (!($file =~ m#$search_string#));
+			$found_string = 1;
+		}
+
+		print "--> $file\n" if ($found_string && $hint);
 		if (defined($leaf{$leave})) {
 			my $what = $leaf{$leave};
 			$whats .= " $what" if (!($whats =~ m/$what/));
@@ -611,6 +616,7 @@ sub check_undefined_symbols {
 				if (substr($file, 0, $len) eq $new) {
 					my $newf = $a . substr($file, $len);
 
+					print "    $newf\n" if ($found_string && $hint);
 					foreach my $w (split / /, $what) {
 						if ($newf =~ m#^$w$#) {
 							$exact = 1;
@@ -633,10 +639,10 @@ sub check_undefined_symbols {
 		next if ($file =~ m#/parameters/#);
 
 		if ($hint && $defined) {
-			print "$leave at $path might be one of:$whats\n";
+			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
 			next;
 		}
-		print "$file not found.\n";
+		print "$file not found.\n" if (!$search_string || $found_string);
 	}
 }
 
@@ -702,16 +708,29 @@ sub undefined_symbols {
 			$what =~ s/\\([\[\]\(\)\|])/$1/g;
 			$what =~ s/(\d+)\\(-\d+)/$1$2/g;
 
+			$what =~ s/\xff/\\d+/g;
+
+
+			# Special case: IIO ABI which a parenthesis.
+			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
+
 			$leave =~ s/[\(\)]//g;
 
+			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
 					next if ($leaf{$l} =~ m/$what/);
 					$leaf{$l} .= " " . $what;
+					$added = 1;
 				} else {
 					$leaf{$l} = $what;
+					$added = 1;
 				}
 			}
+			if ($search_string && $added) {
+				print "What: $what\n" if ($what =~ m#$search_string#);
+			}
+
 		}
 	}
 	check_undefined_symbols;
@@ -765,6 +784,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 
 B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
 	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
+	       [--search-string <regex>]
 	       <COMAND> [<ARGUMENT>]
 
 Where <COMMAND> can be:
@@ -812,6 +832,11 @@ times, to increase verbosity.
 Show hints about possible definitions for the missing ABI symbols.
 Used only when B<undefined>.
 
+=item B<--search-string> [regex string]
+
+Show only occurences that match a search string.
+Used only when B<undefined>.
+
 =item B<--help>
 
 Prints a brief help message and exits.
-- 
2.31.1



* [PATCH 05/13] scripts: get_abi.pl: don't skip what that ends with wildcards
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 04/13] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 06/13] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The search algorithm used inside check_undefined_symbols
has an optimization: it only checks the What: entries that have
the same leaf name. This not only speeds up the search, but
also allows providing a hint about a partial match.

There's a drawback, however: when a "What:" ends with a
wildcard, the logic skips it, and the symbol is reported as
"not found".

Fix it by grouping the remaining cases together, and
disabling hints for such cases.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
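A condensed sketch of the grouping, assuming two made-up What: patterns
that were already converted to regexes. For brevity it keeps each bucket
as an array, whereas the script stores space-separated strings; the
"others" bucket name is the one used by the patch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical What: patterns, already converted to regexes.
my @whats = (
	"/sys/example/.*/enable",
	"/sys/example/slots/.*",	# ends with a wildcard
);

# Index each pattern by its last path component; wildcard, empty and
# numeric leaves all share the "others" bucket.
my %leaf;
foreach my $what (@whats) {
	my $name = $what;
	$name =~ s,.*/,,;
	$name = "others" if ($name =~ m/^\.\*/ || $name eq "" || $name =~ /^\d+$/);
	push @{$leaf{$name}}, $what;
}

# Return true if $file matches a pattern in its leaf bucket.
sub is_defined {
	my $file = shift;
	my $name = $file;
	$name =~ s,.*/,,;
	$name = "others" if ($name =~ /^\d+$/ || !defined($leaf{$name}));
	foreach my $w (@{$leaf{$name} // []}) {
		return 1 if ($file =~ m#^$w$#);
	}
	return 0;
}

foreach my $file ("/sys/example/dev0/enable", "/sys/example/slots/3",
		  "/sys/example/slots/name", "/sys/unknown/file") {
	printf "%s: %s\n", $file, is_defined($file) ? "defined" : "not found";
}
```

Without the "others" bucket, the two files under slots/ would never be
compared against the wildcard pattern, since no What: has "3" or "name" as
its leaf.
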
 scripts/get_abi.pl | 74 +++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 40f10175bb98..8f69acec4ae5 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -590,44 +590,47 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
+		if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
+			$leave = "others";
+		}
+
 		print "--> $file\n" if ($found_string && $hint);
-		if (defined($leaf{$leave})) {
-			my $what = $leaf{$leave};
-			$whats .= " $what" if (!($whats =~ m/$what/));
+		my $what = $leaf{$leave};
+		$whats .= " $what" if (!($whats =~ m/$what/));
 
-			foreach my $w (split / /, $what) {
-				if ($file =~ m#^$w$#) {
-					$exact = 1;
-					last;
-				}
+		foreach my $w (split / /, $what) {
+			if ($file =~ m#^$w$#) {
+				$exact = 1;
+				last;
 			}
-			# Check for aliases
-			#
-			# TODO: this algorithm is O(w * n²). It can be
-			# improved in the future in order to handle it
-			# faster, by changing parse_existing_sysfs to
-			# store the sysfs inside a tree, at the expense
-			# on making the code less readable and/or using some
-			# additional perl library.
-			foreach my $a (keys %aliases) {
-				my $new = $aliases{$a};
-				my $len = length($new);
+		}
+		# Check for aliases
+		#
+		# TODO: this algorithm is O(w * n²). It can be
+		# improved in the future in order to handle it
+		# faster, by changing parse_existing_sysfs to
+		# store the sysfs inside a tree, at the expense
+		# on making the code less readable and/or using some
+		# additional perl library.
+		foreach my $a (keys %aliases) {
+			my $new = $aliases{$a};
+			my $len = length($new);
 
-				if (substr($file, 0, $len) eq $new) {
-					my $newf = $a . substr($file, $len);
+			if (substr($file, 0, $len) eq $new) {
+				my $newf = $a . substr($file, $len);
 
-					print "    $newf\n" if ($found_string && $hint);
-					foreach my $w (split / /, $what) {
-						if ($newf =~ m#^$w$#) {
-							$exact = 1;
-							last;
-						}
+				print "    $newf\n" if ($found_string && $hint);
+				foreach my $w (split / /, $what) {
+					if ($newf =~ m#^$w$#) {
+						$exact = 1;
+						last;
 					}
 				}
 			}
-
-			$defined++;
 		}
+
+		$defined++;
+
 		next if ($exact);
 
 		# Ignore some sysfs nodes
@@ -638,7 +641,7 @@ sub check_undefined_symbols {
 		# is not easily parseable.
 		next if ($file =~ m#/parameters/#);
 
-		if ($hint && $defined) {
+		if ($hint && $defined && $leave ne "others") {
 			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
 			next;
 		}
@@ -700,7 +703,16 @@ sub undefined_symbols {
 			my $leave = $what;
 			$leave =~ s,.*/,,;
 
-			next if ($leave =~ m/^\.\*/ || $leave eq "");
+			# $leave is used to improve search performance at
+			# check_undefined_symbols, as the algorithm there can seek
+			# for a small number of "what". It also allows giving a
+			# hint about a leave with the same name somewhere else.
+			# However, there are a few occurences where the leave is
+			# either a wildcard or a number. Just group such cases
+			# altogether.
+			if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+				$leave = "others" ;
+			}
 
 			# Escape all other symbols
 			$what =~ s/$escape_symbols/\\$1/g;
-- 
2.31.1



* [PATCH 06/13] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 05/13] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 07/13] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

In order to speed up the parser and store less data, handle
the fs/cgroup exceptions a lot earlier.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 8f69acec4ae5..41a49ae31c25 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -551,6 +551,10 @@ my @files;
 my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
 	my $file = $File::Find::name;
+
+	# Ignore cgroup and firmware
+	return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
 	my $mode = (lstat($file))[2];
 	my $abs_file = abs_path($file);
 
@@ -571,9 +575,6 @@ sub parse_existing_sysfs {
 sub check_undefined_symbols {
 	foreach my $file (sort @files) {
 
-		# Ignore cgroup and firmware
-		next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
-
 		my $defined = 0;
 		my $exact = 0;
 		my $whats = "";
-- 
2.31.1



* [PATCH 07/13] scripts: get_abi.pl: add a graph to speedup the undefined algorithm
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 06/13] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 08/13] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Searching for symlinks is an expensive operation with the current
logic, as it is on the order of O(n^3). In practice, the check
takes 2-3 minutes to go through all symbols.

Fix it by storing the directory tree in a graph, and using
a Breadth First Search (BFS) to find the links for each sysfs node.

With this improvement, it can now report issues in ~11 seconds
on my machine.

It comes with a price, though: more symbols are reported
as undefined after this change. I suspect this is due to some
sysfs circular loops that are dropped by the BFS. Despite this
increase, the reports now seem to be more coherent.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
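A stripped-down sketch of the data structure, not the code from this patch:
the directory tree is kept as nested hashes, each node carries a "__name"
list with every path it is known by, and adding a link walks the subtree
of the link target breadth-first to record the extra name. The add_path()
and add_link() helpers and the example paths are assumptions made for
illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %root;

# Insert $path into the nested-hash tree and return its node.
sub add_path {
	my $path = shift;
	my $node = \%root;
	my $name = "";
	foreach my $edge (grep { length } split m#/#, $path) {
		$name .= "/$edge";
		$node->{$edge} //= { __name => [ $name ] };
		$node = $node->{$edge};
	}
	return $node;
}

# Give every node below $target an extra name rooted at $link, visiting
# the subtree breadth-first.
sub add_link {
	my ($target, $link) = @_;
	my @queue = (add_path($target));
	my %seen = ($queue[0] => 1);

	while (@queue) {
		my $v = shift @queue;
		foreach my $c (sort keys %{$v}) {
			next if ($c eq "__name");
			my $child = $v->{$c};
			next if ($seen{$child}++);

			my $name = $child->{__name}[0];
			push @{$child->{__name}}, $name
				if ($name =~ s#^\Q$target\E/#$link/#);
			push @queue, $child;
		}
	}
}

add_path("/sys/devices/dev0/power/state");
add_link("/sys/devices/dev0", "/sys/class/example/dev0");

my $node = add_path("/sys/devices/dev0/power/state");
print "$_\n" foreach (@{$node->{__name}});
```

Each link costs one traversal of the target's subtree, and each file then
carries all of its names, so the check no longer needs to compare every
file against every alias.
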
 scripts/get_abi.pl | 189 ++++++++++++++++++++++++++++++---------------
 1 file changed, 127 insertions(+), 62 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 41a49ae31c25..9eb8a033d363 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -547,6 +547,73 @@ sub dont_parse_special_attributes {
 my %leaf;
 my %aliases;
 my @files;
+my %root;
+
+sub graph_add_file {
+	my $file = shift;
+	my $type = shift;
+
+	my $dir = $file;
+	$dir =~ s,^(.*/).*,$1,;
+	$file =~ s,.*/,,;
+
+	my $name;
+	my $file_ref = \%root;
+	foreach my $edge(split "/", $dir) {
+		$name .= "$edge/";
+		if (!defined ${$file_ref}{$edge}) {
+			${$file_ref}{$edge} = { };
+		}
+		$file_ref = \%{$$file_ref{$edge}};
+		${$file_ref}{"__name"} = [ $name ];
+	}
+	$name .= "$file";
+	${$file_ref}{$file} = {
+		"__name" => [ $name ]
+	};
+
+	return \%{$$file_ref{$file}};
+}
+
+sub graph_add_link {
+	my $file = shift;
+	my $link = shift;
+
+	# Traverse graph to find the reference
+	my $file_ref = \%root;
+	foreach my $edge(split "/", $file) {
+		$file_ref = \%{$$file_ref{$edge}} || die "Missing node!";
+	}
+
+	# do a BFS
+
+	my @queue;
+	my %seen;
+	my $base_name;
+	my $st;
+
+	push @queue, $file_ref;
+	$seen{$start}++;
+
+	while (@queue) {
+		my $v = shift @queue;
+		my @child = keys(%{$v});
+
+		foreach my $c(@child) {
+			next if $seen{$$v{$c}};
+			next if ($c eq "__name");
+
+			# Add new name
+			my $name = @{$$v{$c}{"__name"}}[0];
+			if ($name =~ s#^$file/#$link/#) {
+				push @{$$v{$c}{"__name"}}, $name;
+			}
+			# Add child to the queue and mark as seen
+			push @queue, $$v{$c};
+			$seen{$c}++;
+		}
+	}
+}
 
 my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
 sub parse_existing_sysfs {
@@ -569,19 +636,50 @@ sub parse_existing_sysfs {
 	return if (defined($data{$file}));
 	return if (defined($data{$abs_file}));
 
-	push @files, $abs_file;
+	push @files, graph_add_file($abs_file, "file");
+}
+
+sub get_leave($)
+{
+	my $what = shift;
+	my $leave;
+
+	my $l = $what;
+	my $stop = 1;
+
+	$leave = $l;
+	$leave =~ s,/$,,;
+	$leave =~ s,.*/,,;
+	$leave =~ s/[\(\)]//g;
+
+	# $leave is used to improve search performance at
+	# check_undefined_symbols, as the algorithm there can seek
+	# for a small number of "what". It also allows giving a
+	# hint about a leave with the same name somewhere else.
+	# However, there are a few occurences where the leave is
+	# either a wildcard or a number. Just group such cases
+	# altogether.
+	if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+		$leave = "others";
+	}
+
+	return $leave;
 }
 
 sub check_undefined_symbols {
-	foreach my $file (sort @files) {
+	foreach my $file_ref (sort @files) {
+		my @names = @{$$file_ref{"__name"}};
+		my $file = $names[0];
 
 		my $defined = 0;
 		my $exact = 0;
-		my $whats = "";
 		my $found_string;
 
-		my $leave = $file;
-		$leave =~ s,.*/,,;
+		my $leave = get_leave($file);
+		if (!defined($leaf{$leave})) {
+			$leave = "others";
+		}
+		my $what = $leaf{$leave};
 
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
@@ -591,41 +689,12 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
-		if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
-			$leave = "others";
-		}
-
-		print "--> $file\n" if ($found_string && $hint);
-		my $what = $leaf{$leave};
-		$whats .= " $what" if (!($whats =~ m/$what/));
-
-		foreach my $w (split / /, $what) {
-			if ($file =~ m#^$w$#) {
-				$exact = 1;
-				last;
-			}
-		}
-		# Check for aliases
-		#
-		# TODO: this algorithm is O(w * n²). It can be
-		# improved in the future in order to handle it
-		# faster, by changing parse_existing_sysfs to
-		# store the sysfs inside a tree, at the expense
-		# on making the code less readable and/or using some
-		# additional perl library.
-		foreach my $a (keys %aliases) {
-			my $new = $aliases{$a};
-			my $len = length($new);
-
-			if (substr($file, 0, $len) eq $new) {
-				my $newf = $a . substr($file, $len);
-
-				print "    $newf\n" if ($found_string && $hint);
-				foreach my $w (split / /, $what) {
-					if ($newf =~ m#^$w$#) {
-						$exact = 1;
-						last;
-					}
+		foreach my $a (@names) {
+			print "--> $a\n" if ($found_string && $hint);
+			foreach my $w (split /\xac/, $what) {
+				if ($a =~ m#^$w$#) {
+					$exact = 1;
+					last;
 				}
 			}
 		}
@@ -642,8 +711,13 @@ sub check_undefined_symbols {
 		# is not easily parseable.
 		next if ($file =~ m#/parameters/#);
 
-		if ($hint && $defined && $leave ne "others") {
-			print "$leave at $path might be one of:$whats\n"  if (!$search_string || $found_string);
+		if ($hint && $defined && (!$search_string || $found_string)) {
+			$what =~ s/\xac/\n\t/g;
+			if ($leave ne "others") {
+				print "    more likely regexes:\n\t$what\n";
+			} else {
+				print "    tested regexes:\n\t$what\n";
+			}
 			next;
 		}
 		print "$file not found.\n" if (!$search_string || $found_string);
@@ -657,8 +731,10 @@ sub undefined_symbols {
 		no_chdir => 1
 	     }, $sysfs_prefix);
 
+	$leaf{"others"} = "";
+
 	foreach my $w (sort keys %data) {
-		foreach my $what (split /\xac /,$w) {
+		foreach my $what (split /\xac/,$w) {
 			next if (!($what =~ m/^$sysfs_prefix/));
 
 			# Convert what into regular expressions
@@ -701,20 +777,6 @@ sub undefined_symbols {
 			# (this happens on a few IIO definitions)
 			$what =~ s,\s*\=.*$,,;
 
-			my $leave = $what;
-			$leave =~ s,.*/,,;
-
-			# $leave is used to improve search performance at
-			# check_undefined_symbols, as the algorithm there can seek
-			# for a small number of "what". It also allows giving a
-			# hint about a leave with the same name somewhere else.
-			# However, there are a few occurences where the leave is
-			# either a wildcard or a number. Just group such cases
-			# altogether.
-			if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
-				$leave = "others" ;
-			}
-
 			# Escape all other symbols
 			$what =~ s/$escape_symbols/\\$1/g;
 			$what =~ s/\\\\/\\/g;
@@ -723,17 +785,15 @@ sub undefined_symbols {
 
 			$what =~ s/\xff/\\d+/g;
 
-
 			# Special case: IIO ABI which a parenthesis.
 			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
 
-			$leave =~ s/[\(\)]//g;
-
+			my $leave = get_leave($what);
 			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
-					next if ($leaf{$l} =~ m/$what/);
-					$leaf{$l} .= " " . $what;
+					next if ($leaf{$l} =~ m/\b$what\b/);
+					$leaf{$l} .= "\xac" . $what;
 					$added = 1;
 				} else {
 					$leaf{$l} = $what;
@@ -746,6 +806,11 @@ sub undefined_symbols {
 
 		}
 	}
+	# Take links into account
+	foreach my $link (keys %aliases) {
+		my $abs_file = $aliases{$link};
+		graph_add_link($abs_file, $link);
+	}
 	check_undefined_symbols;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/13] scripts: get_abi.pl: improve debug logic
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 07/13] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 09/13] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Turn the debug option into a numeric level, handled as a bitmask,
so that it can be extended to debug other parts of the script.
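
As a rough sketch of how such a bitmask level can be tested (the
enabled_debug() helper is hypothetical and only exists for this
illustration; the three flag values are the ones introduced by the
patch), each flag is checked with the bitwise "&" operator, not with
the logical "&&":

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Debug options, one bit each
my $dbg_what_parsing = 1;
my $dbg_what_open = 2;
my $dbg_dump_abi_structs = 4;

# Hypothetical helper: returns the names of the enabled debug flags
sub enabled_debug {
	my @args = @_;
	my $debug = 0;

	GetOptionsFromArray(\@args, "debug=i" => \$debug)
		or die "bad options\n";

	my @on;
	push @on, "what_parsing" if ($debug & $dbg_what_parsing);
	push @on, "what_open" if ($debug & $dbg_what_open);
	push @on, "dump_abi_structs" if ($debug & $dbg_dump_abi_structs);

	return @on;
}

# 5 = 1 + 4
print join(",", enabled_debug("--debug", "5")), "\n";
```

Here, "--debug 5" enables the What parsing messages and the struct dump,
but not the "Opening ..." messages.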

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 9eb8a033d363..bb80303fea22 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -9,6 +9,7 @@ use Getopt::Long;
 use File::Find;
 use Fcntl ':mode';
 use Cwd 'abs_path';
+use Data::Dumper;
 
 my $help = 0;
 my $hint = 0;
@@ -20,13 +21,18 @@ my $prefix="Documentation/ABI";
 my $sysfs_prefix="/sys";
 my $search_string;
 
+# Debug options
+my $dbg_what_parsing = 1;
+my $dbg_what_open = 2;
+my $dbg_dump_abi_structs = 4;
+
 #
 # If true, assumes that the description is formatted with ReST
 #
 my $description_is_rst = 1;
 
 GetOptions(
-	"debug|d+" => \$debug,
+	"debug=i" => \$debug,
 	"enable-lineno" => \$enable_lineno,
 	"rst-source!" => \$description_is_rst,
 	"dir=s" => \$prefix,
@@ -46,7 +52,7 @@ my ($cmd, $arg) = @ARGV;
 pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
 pod2usage(2) if ($cmd eq "search" && !$arg);
 
-require Data::Dumper if ($debug);
+require Data::Dumper if ($debug & $dbg_dump_abi_structs);
 
 my %data;
 my %symbols;
@@ -106,7 +112,7 @@ sub parse_abi {
 	my @labels;
 	my $label = "";
 
-	print STDERR "Opening $file\n" if ($debug > 1);
+	print STDERR "Opening $file\n" if ($debug & $dbg_what_open);
 	open IN, $file;
 	while(<IN>) {
 		$ln++;
@@ -178,7 +184,7 @@ sub parse_abi {
 							$data{$what}->{filepath} .= " " . $file;
 						}
 					}
-					print STDERR "\twhat: $what\n" if ($debug > 1);
+					print STDERR "\twhat: $what\n" if ($debug & $dbg_what_parsing);
 					$data{$what}->{line_no} = $ln;
 				} else {
 					$data{$what}->{line_no} = $ln if (!defined($data{$what}->{line_no}));
@@ -827,7 +833,7 @@ if ($cmd eq "undefined" || $cmd eq "search") {
 #
 find({wanted =>\&parse_abi, no_chdir => 1}, $prefix);
 
-print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
+print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug & $dbg_dump_abi_structs);
 
 #
 # Handles the command
@@ -860,7 +866,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 
 =head1 SYNOPSIS
 
-B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
+B<abi_book.pl> [--debug <level>] [--enable-lineno] [--man] [--help]
 	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
 	       [--search-string <regex>]
 	       <COMAND> [<ARGUMENT>]
@@ -900,10 +906,14 @@ logic (--no-rst-source).
 
 Enable output of #define LINENO lines.
 
-=item B<--debug>
+=item B<--debug> I<debug level>
 
-Put the script in verbose mode, useful for debugging. Can be called multiple
-times, to increase verbosity.
+Print debug information according to the level, which is given by the
+following bitmask:
+
+    -  1: Debug parsing What entries from ABI files;
+    -  2: Shows what files are opened from ABI files;
+    -  4: Dump the structs used to store the contents of the ABI files.
 
 =item B<--show-hints>
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/13] scripts: get_abi.pl: Better handle leaves with wildcards
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 08/13] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 10/13] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

When the leaf of a regex ends with a wildcard, the speedup
algorithm to reduce the number of regexes to seek won't work.

So, when such leaves are found, place them in the "others" group.

That slows down the search from 0.14s to 1 minute on my
machine, but the results are a lot more consistent.
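
The grouping rule can be illustrated with a simplified variant of the
leaf extraction. The leaf_group() name is hypothetical, and the second
and third What expressions below are made-up inputs used only to
exercise the rule:

```perl
use strict;
use warnings;

# Simplified, illustrative variant of the leaf extraction
sub leaf_group {
	my $what = shift;

	my $leaf = $what;
	$leaf =~ s,/$,,;
	$leaf =~ s,.*/,,;

	# A leaf with a wildcard or a \d cannot be used as a lookup key
	if ($leaf eq "" || $leaf =~ m/\.\*/ || $leaf =~ m/\\d/) {
		return "others";
	}
	return $leaf;
}

foreach my $what ('/sys/bus/pci/drivers/.*/bind',
		  '/sys/class/net/.*/queues/rx\-\d+',
		  '/sys/.*/iio\:device.*/in_voltage.*_scale') {
	printf "%-45s -> %s\n", $what, leaf_group($what);
}
```

Only the first expression keeps its own group ("bind"); the other two
can match sysfs nodes with many different basenames, so they go to
"others".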

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index bb80303fea22..3c0063d0e05e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -665,7 +665,7 @@ sub get_leave($)
 	# However, there are a few occurences where the leave is
 	# either a wildcard or a number. Just group such cases
 	# altogether.
-	if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+	if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
 		$leave = "others";
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/13] scripts: get_abi.pl: ignore some sysfs nodes earlier
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 09/13] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 11/13] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

When checking for undefined symbols, some nodes are either hard to
check or don't make sense to check right now. Avoid allocating
memory for those, as they'll be ignored anyway.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 3c0063d0e05e..42eb16eb78e9 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -628,6 +628,14 @@ sub parse_existing_sysfs {
 	# Ignore cgroup and firmware
 	return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
 
+	# Ignore some sysfs nodes
+	return if ($file =~ m#/(sections|notes)/#);
+
+	# Would need to check at
+	# Documentation/admin-guide/kernel-parameters.txt, but this
+	# is not easily parseable.
+	return if ($file =~ m#/parameters/#);
+
 	my $mode = (lstat($file))[2];
 	my $abs_file = abs_path($file);
 
@@ -709,14 +717,6 @@ sub check_undefined_symbols {
 
 		next if ($exact);
 
-		# Ignore some sysfs nodes
-		next if ($file =~ m#/(sections|notes)/#);
-
-		# Would need to check at
-		# Documentation/admin-guide/kernel-parameters.txt, but this
-		# is not easily parseable.
-		next if ($file =~ m#/parameters/#);
-
 		if ($hint && $defined && (!$search_string || $found_string)) {
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/13] scripts: get_abi.pl: stop check loop earlier when regex is found
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 10/13] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 12/13] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Right now, there are two loops used to seek for a regex. Make
sure that both are exited as soon as a match is found.

While here, drop the unused $defined variable.
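
For reference, the same early exit can also be written with a loop
label. This is an alternative sketch, not what the patch does (the
patch keeps the $exact flag and adds a second "last"); the
first_match() helper and its inputs are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first matching name and the number
# of regex checks that were needed to find it
sub first_match {
	my ($names, $exprs) = @_;
	my $checked = 0;
	my $hit;

	NAME: foreach my $name (@$names) {
		foreach my $re (@$exprs) {
			$checked++;
			if ($name =~ $re) {
				$hit = $name;
				last NAME;	# leaves both loops at once
			}
		}
	}
	return ($hit, $checked);
}

my ($hit, $checked) = first_match([ "/sys/a/x", "/sys/b/y", "/sys/c/z" ],
				  [ qr{^/sys/b/}, qr{^/sys/a/} ]);
print "$hit after $checked checks\n";
```

Without the early exit, all six name/regex combinations would be
tested; with it, the search stops after the second one.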

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 42eb16eb78e9..d45e5ba56f9c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -685,7 +685,6 @@ sub check_undefined_symbols {
 		my @names = @{$$file_ref{"__name"}};
 		my $file = $names[0];
 
-		my $defined = 0;
 		my $exact = 0;
 		my $found_string;
 
@@ -711,13 +710,11 @@ sub check_undefined_symbols {
 					last;
 				}
 			}
+			last if ($exact);
 		}
-
-		$defined++;
-
 		next if ($exact);
 
-		if ($hint && $defined && (!$search_string || $found_string)) {
+		if ($hint && (!$search_string || $found_string)) {
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
 				print "    more likely regexes:\n\t$what\n";
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/13] scripts: get_abi.pl: precompile what match regexes
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 11/13] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:30 ` [PATCH 13/13] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
  2021-09-23 13:58 ` [PATCH 00/13] get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

In order to save some time during matches, pre-compile regexes.

Before this patch:
	$ time ./scripts/get_abi.pl undefined |wc -l
	6970

	real	0m54,751s
	user	0m54,022s
	sys	0m0,592s

Afterwards:

	$ time ./scripts/get_abi.pl undefined |wc -l
	6970

	real	0m5,888s
	user	0m5,310s
	sys	0m0,562s
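
A minimal sketch of the idea, assuming a "\xac"-separated list of What
expressions like the one the script keeps per leaf (the matches_any()
helper is hypothetical): each expression is compiled once with qr//,
and the compiled objects are reused for every sysfs name:

```perl
use strict;
use warnings;

# What expressions joined by the \xac separator
my $what = join("\xac", '/sys/bus/fsl\-mc/drivers/.*/bind',
			'/sys/bus/pci/drivers/.*/bind');

# Compile each expression once, anchored at both ends
my @expr;
foreach my $w (split /\xac/, $what) {
	push @expr, qr/^$w$/;
}

# Hypothetical helper: true if any compiled regex matches the name
sub matches_any {
	my $name = shift;

	foreach my $re (@expr) {
		return 1 if ($name =~ $re);
	}
	return 0;
}

my $found = matches_any("/sys/bus/pci/drivers/iosf_mbi_pci/bind");
print(($found ? "documented" : "not found") . "\n");
```

When a pattern is interpolated from a variable that changes on every
iteration, Perl has to compile it again each time it is used; a qr//
object is compiled only when it is created.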

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d45e5ba56f9c..f2b5efef9c30 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -25,6 +25,7 @@ my $search_string;
 my $dbg_what_parsing = 1;
 my $dbg_what_open = 2;
 my $dbg_dump_abi_structs = 4;
+my $dbg_undefined = 8;
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -692,7 +693,8 @@ sub check_undefined_symbols {
 		if (!defined($leaf{$leave})) {
 			$leave = "others";
 		}
-		my $what = $leaf{$leave};
+		my @expr = @{$leaf{$leave}->{expr}};
+		die ("missing rules for $leave") if (!defined($leaf{$leave}));
 
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
@@ -702,10 +704,17 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
-		foreach my $a (@names) {
-			print "--> $a\n" if ($found_string && $hint);
-			foreach my $w (split /\xac/, $what) {
-				if ($a =~ m#^$w$#) {
+		for (my $i = 0; $i < @names; $i++) {
+			if ($found_string && $hint) {
+				if (!$i) {
+					print "--> $names[$i]\n";
+				} else {
+					print "    $names[$i]\n";
+				}
+			}
+			foreach my $re (@expr) {
+				print "$names[$i] =~ /^$re\$/\n" if ($debug & $dbg_undefined);
+				if ($names[$i] =~ $re) {
 					$exact = 1;
 					last;
 				}
@@ -715,6 +724,7 @@ sub check_undefined_symbols {
 		next if ($exact);
 
 		if ($hint && (!$search_string || $found_string)) {
+			my $what = $leaf{$leave}->{what};
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
 				print "    more likely regexes:\n\t$what\n";
@@ -734,7 +744,7 @@ sub undefined_symbols {
 		no_chdir => 1
 	     }, $sysfs_prefix);
 
-	$leaf{"others"} = "";
+	$leaf{"others"}->{what} = "";
 
 	foreach my $w (sort keys %data) {
 		foreach my $what (split /\xac/,$w) {
@@ -792,14 +802,15 @@ sub undefined_symbols {
 			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
 
 			my $leave = get_leave($what);
+
 			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
-					next if ($leaf{$l} =~ m/\b$what\b/);
-					$leaf{$l} .= "\xac" . $what;
+					next if ($leaf{$l}->{what} =~ m/\b$what\b/);
+					$leaf{$l}->{what} .= "\xac" . $what;
 					$added = 1;
 				} else {
-					$leaf{$l} = $what;
+					$leaf{$l}->{what} = $what;
 					$added = 1;
 				}
 			}
@@ -809,6 +820,15 @@ sub undefined_symbols {
 
 		}
 	}
+	# Compile regexes
+	foreach my $l (keys %leaf) {
+		my @expr;
+		foreach my $w(split /\xac/, $leaf{$l}->{what}) {
+			push @expr, qr /^$w$/;
+		}
+		$leaf{$l}->{expr} = \@expr;
+	}
+
 	# Take links into account
 	foreach my $link (keys %aliases) {
 		my $abs_file = $aliases{$link};
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/13] scripts: get_abi.pl: ensure that "others" regex will be parsed
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (11 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 12/13] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
@ 2021-09-23 13:30 ` Mauro Carvalho Chehab
  2021-09-23 13:58 ` [PATCH 00/13] get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
  13 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 13:30 UTC (permalink / raw)
  To: Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The search algorithm works by reducing the number of regular
expressions that will be checked for a given file entry at sysfs. It
does that by looking at the devnode name. For instance, when it checks
this file:

	/sys/bus/pci/drivers/iosf_mbi_pci/bind

The logic will seek only the "What:" expressions that end with "bind".
Currently, there are just a couple of What expressions that match
it:

	What: /sys/bus/fsl\-mc/drivers/.*/bind
	What: /sys/bus/pci/drivers/.*/bind

It will then run an O(n²) algorithm to seek, which runs quickly
when there are few regexes to seek. There are, however, some What:
expressions that end with a wildcard. Those are harder to process.
Right now, they're all grouped together in the "others" group.

As those don't depend on the basename of the node, add an extra
loop to ensure that those will be processed at the end, if
not done yet.
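
The two-step lookup can be sketched as below. It is only an
illustration under assumptions: the %leaf table is hand-made (the
script builds it from Documentation/ABI), the "others" entry is a
made-up expression, and is_documented() is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical, hand-made table of compiled regexes per leaf
my %leaf = (
	"bind"   => [ qr{^/sys/bus/pci/drivers/.*/bind$} ],
	"others" => [ qr{^/sys/kernel/debug/.*$} ],
);

sub is_documented {
	my $file = shift;

	# The basename selects the small group of regexes to try first
	(my $name = $file) =~ s,.*/,,;

	# The "others" group is always tried last, if not done yet
	my @groups = defined($leaf{$name}) ? ($name, "others") : ("others");

	foreach my $group (@groups) {
		foreach my $re (@{$leaf{$group}}) {
			return 1 if ($file =~ $re);
		}
	}
	return 0;
}

foreach my $file ("/sys/bus/pci/drivers/iosf_mbi_pci/bind",
		  "/sys/kernel/debug/bind",
		  "/sys/kernel/kexec_crash_loaded") {
	printf "%-45s %s\n", $file, is_documented($file) ? "ok" : "not found";
}
```

The second file shows why the extra loop matters: its basename selects
the "bind" group, where nothing matches, and only the wildcard
expression from "others" covers it.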

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index f2b5efef9c30..f25c98b1971e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -723,6 +723,22 @@ sub check_undefined_symbols {
 		}
 		next if ($exact);
 
+		if ($leave ne "others") {
+			my @expr = @{$leaf{$leave}->{expr}};
+			for (my $i = 0; $i < @names; $i++) {
+				foreach my $re (@expr) {
+					print "$names[$i] =~ /^$re\$/\n" if ($debug & $dbg_undefined);
+					if ($names[$i] =~ $re) {
+						$exact = 1;
+						last;
+					}
+				}
+				last if ($exact);
+			}
+			last if ($exact);
+		}
+		next if ($exact);
+
 		if ($hint && (!$search_string || $found_string)) {
 			my $what = $leaf{$leave}->{what};
 			$what =~ s/\xac/\n\t/g;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/13] get_abi.pl undefined: improve precision and performance
  2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
                   ` (12 preceding siblings ...)
  2021-09-23 13:30 ` [PATCH 13/13] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
@ 2021-09-23 13:58 ` Greg Kroah-Hartman
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
  13 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-23 13:58 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet,
	Anton Vorontsov, Colin Cross, John Fastabend, KP Singh,
	Kees Cook, Martin KaFai Lau, Song Liu, Tony Luck, Yonghong Song,
	bpf, netdev

On Thu, Sep 23, 2021 at 03:29:58PM +0200, Mauro Carvalho Chehab wrote:
> Hi Greg,
> 
> It follows a series of improvements for get_abi.pl. it is on the top of next-20210923.

Hm, looks like I hadn't pushed my -testing tree out so that it will show
up in linux-next yet, so we got a bunch of conflicts here.

I've done so now, can you rebase against my tree and resend?  I think
only 4 patches are new here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
@ 2021-09-23 15:41   ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 1/8] scripts: get_abi.pl: Fix get_abi.pl search output Mauro Carvalho Chehab
                       ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, linux-kernel, Jonathan Corbet

Hi Greg,

As requested, these are exactly the same changes, rebased on top of
driver-core/driver-core-next.

-

This is a series of improvements for get_abi.pl. It is on top of driver-core/driver-core-next.

With these changes, on my development tree, the script takes 6 seconds to run
on my desktop:

	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols

	real	0m6,292s
	user	0m5,640s
	sys	0m0,634s
	  6838 undefined_after
	   808 undefined_symbols
	  7646 total

And 7 seconds on a Dell Precision 5820:

	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols

	real	0m7.162s
	user	0m5.836s
	sys	0m1.329s
	6548 undefined
	772 undefined_symbols

Both tests were done against this tree (based on today's linux-next):

	https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest

Note that, as my tree has several ABI fixes, the script will likely run
faster there than on your tree, as there are fewer symbols to be
reported, and the algorithm is optimized to reduce the number of regexes
checked when a symbol is found.

Besides optimizing and improving the seek logic, this series also changes the
debug logic. It now receives a bitmask, where "8" means to print the regexes
that will be used by the "undefined" command:

	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
	real	0m17,189s
	user	0m13,940s
	sys	0m2,404s

	$ wc -l foo
	18421939 foo

	$ cat foo
	...
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
	...

In other words, on my desktop, the /sys match performs >18M regular
expression searches, which takes 6.2 seconds (or 17.2 seconds, if debug is
enabled and the output is written to my NVMe storage).

Regards,
Mauro


Mauro Carvalho Chehab (8):
  scripts: get_abi.pl: Fix get_abi.pl search output
  scripts: get_abi.pl: call get_leave() a little late
  scripts: get_abi.pl: improve debug logic
  scripts: get_abi.pl: Better handle leaves with wildcards
  scripts: get_abi.pl: ignore some sysfs nodes earlier
  scripts: get_abi.pl: stop check loop earlier when regex is found
  scripts: get_abi.pl: precompile what match regexes
  scripts: get_abi.pl: ensure that "others" regex will be parsed

 scripts/get_abi.pl | 109 +++++++++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 33 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/8] scripts: get_abi.pl: Fix get_abi.pl search output
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 2/8] scripts: get_abi.pl: call get_leave() a little late Mauro Carvalho Chehab
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Currently, the get_abi.pl search output contains an invalid symbol
(the \xac character). Fix it.
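
A small sketch of the effect, using a made-up pair of What entries:
the internal "\xac" separator is replaced by ", " before the title and
its underline are built:

```perl
use strict;
use warnings;

# Two hypothetical What entries, joined by the internal separator
my $what = "/sys/bus/pci/drivers/.../bind\xac/sys/bus/pci/drivers/.../unbind";

# Make the separator printable before building the title
(my $title = $what) =~ s/\xac/, /g;

# Underline with the same length as the title
(my $bar = $title) =~ s/./-/g;

print "$title\n$bar\n";
```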

Fixes: ab9c14805b37 ("scripts: get_abi.pl: Better handle multiple What parameters")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index c52a1cf0f49d..65261f464e25 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -501,6 +501,7 @@ sub search_symbols {
 
 		my $file = $data{$what}->{filepath};
 
+		$what =~ s/\xac/, /g;
 		my $bar = $what;
 		$bar =~ s/./-/g;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/8] scripts: get_abi.pl: call get_leave() a little late
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 1/8] scripts: get_abi.pl: Fix get_abi.pl search output Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 3/8] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The $what conversions temporarily replace some characters, to avoid
breaking the regular expressions found in some What: entries.

The script should get the $leave devnode only after replacing
them back.

Fixes: ca8e055c2215 ("scripts: get_abi.pl: add a graph to speedup the undefined algorithm")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 65261f464e25..9eb8a033d363 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -777,8 +777,6 @@ sub undefined_symbols {
 			# (this happens on a few IIO definitions)
 			$what =~ s,\s*\=.*$,,;
 
-			my $leave = get_leave($what);
-
 			# Escape all other symbols
 			$what =~ s/$escape_symbols/\\$1/g;
 			$what =~ s/\\\\/\\/g;
@@ -790,6 +788,7 @@ sub undefined_symbols {
 			# Special case: IIO ABI which a parenthesis.
 			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
 
+			my $leave = get_leave($what);
 			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/8] scripts: get_abi.pl: improve debug logic
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 1/8] scripts: get_abi.pl: Fix get_abi.pl search output Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 2/8] scripts: get_abi.pl: call get_leave() a little late Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 4/8] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Turn the debug option into a numeric level, handled as a bitmask,
so that it can be extended to debug other parts of the script.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 9eb8a033d363..bb80303fea22 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -9,6 +9,7 @@ use Getopt::Long;
 use File::Find;
 use Fcntl ':mode';
 use Cwd 'abs_path';
+use Data::Dumper;
 
 my $help = 0;
 my $hint = 0;
@@ -20,13 +21,18 @@ my $prefix="Documentation/ABI";
 my $sysfs_prefix="/sys";
 my $search_string;
 
+# Debug options
+my $dbg_what_parsing = 1;
+my $dbg_what_open = 2;
+my $dbg_dump_abi_structs = 4;
+
 #
 # If true, assumes that the description is formatted with ReST
 #
 my $description_is_rst = 1;
 
 GetOptions(
-	"debug|d+" => \$debug,
+	"debug=i" => \$debug,
 	"enable-lineno" => \$enable_lineno,
 	"rst-source!" => \$description_is_rst,
 	"dir=s" => \$prefix,
@@ -46,7 +52,7 @@ my ($cmd, $arg) = @ARGV;
 pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
 pod2usage(2) if ($cmd eq "search" && !$arg);
 
-require Data::Dumper if ($debug);
+require Data::Dumper if ($debug & $dbg_dump_abi_structs);
 
 my %data;
 my %symbols;
@@ -106,7 +112,7 @@ sub parse_abi {
 	my @labels;
 	my $label = "";
 
-	print STDERR "Opening $file\n" if ($debug > 1);
+	print STDERR "Opening $file\n" if ($debug & $dbg_what_open);
 	open IN, $file;
 	while(<IN>) {
 		$ln++;
@@ -178,7 +184,7 @@ sub parse_abi {
 							$data{$what}->{filepath} .= " " . $file;
 						}
 					}
-					print STDERR "\twhat: $what\n" if ($debug > 1);
+					print STDERR "\twhat: $what\n" if ($debug & $dbg_what_parsing);
 					$data{$what}->{line_no} = $ln;
 				} else {
 					$data{$what}->{line_no} = $ln if (!defined($data{$what}->{line_no}));
@@ -827,7 +833,7 @@ if ($cmd eq "undefined" || $cmd eq "search") {
 #
 find({wanted =>\&parse_abi, no_chdir => 1}, $prefix);
 
-print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
+print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug & $dbg_dump_abi_structs);
 
 #
 # Handles the command
@@ -860,7 +866,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
 
 =head1 SYNOPSIS
 
-B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
+B<abi_book.pl> [--debug <level>] [--enable-lineno] [--man] [--help]
 	       [--(no-)rst-source] [--dir=<dir>] [--show-hints]
 	       [--search-string <regex>]
 	       <COMAND> [<ARGUMENT>]
@@ -900,10 +906,14 @@ logic (--no-rst-source).
 
 Enable output of #define LINENO lines.
 
-=item B<--debug>
+=item B<--debug> I<debug level>
 
-Put the script in verbose mode, useful for debugging. Can be called multiple
-times, to increase verbosity.
+Print debug information according with the level, which is given by the
+following bitmask:
+
+    -  1: Debug parsing What entries from ABI files;
+    -  2: Shows what files are opened from ABI files;
+    -  4: Dump the structs used to store the contents of the ABI files.
 
 =item B<--show-hints>
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/8] scripts: get_abi.pl: Better handle leaves with wildcards
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (2 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 3/8] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 5/8] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

When the leaf of a regex ends with a wildcard, the speedup
algorithm that reduces the number of regexes to seek won't work.

So, when such leaves are found, place them in the "others" group.

That slows down the search from 0.14s to 1 minute on my
machine, but the results are a lot more consistent.
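
A minimal sketch of the idea, assuming a much simplified leaf
extraction (classify_leaf() is a hypothetical stand-in for get_leave(),
which does more than this; only the final test is taken from this
patch):

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for get_leave(): take the text
# after the last "/" and apply the test introduced by this patch.
sub classify_leaf {
    my ($what) = @_;

    my $leave = $what;
    $leave =~ s,.*/,,;    # keep only the last path component

    # A leaf with a wildcard or a \d can't be compared against the
    # basename of a sysfs node, so it goes to the catch-all group.
    if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
        $leave = "others";
    }

    return $leave;
}

print classify_leaf('/sys/bus/pci/drivers/.*/bind'), "\n";
print classify_leaf('/sys/.*/iio\:device.*/in_voltage.*_scale_available'), "\n";
```

The first expression keeps "bind" as its leaf. The second one has a
wildcard in the middle of the leaf, so it would never be equal to the
basename of a real node, and is grouped under "others" instead.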

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index bb80303fea22..3c0063d0e05e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -665,7 +665,7 @@ sub get_leave($)
 	# However, there are a few occurences where the leave is
 	# either a wildcard or a number. Just group such cases
 	# altogether.
-	if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+	if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
 		$leave = "others";
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/8] scripts: get_abi.pl: ignore some sysfs nodes earlier
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (3 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 4/8] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 6/8] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

When checking for undefined symbols, some nodes aren't easy to
check, or it doesn't make sense to check them right now. Avoid
allocating memory for those, as they'll be ignored anyway.
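
As a sketch of the filtering done while walking sysfs (the helper
name want_sysfs_node() is hypothetical; the three regexes are the
ones used by the script):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 0 for sysfs nodes that would be
# ignored anyway, so that the caller can skip them before storing
# anything about them.
sub want_sysfs_node {
    my ($file) = @_;

    # Ignore cgroup and firmware
    return 0 if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);

    # Ignore some sysfs nodes
    return 0 if ($file =~ m#/(sections|notes)/#);

    # Module parameters are not checked for now
    return 0 if ($file =~ m#/parameters/#);

    return 1;
}

# The node names below are just examples.
foreach my $f ("/sys/module/foo/parameters/bar",
               "/sys/module/foo/sections/.text",
               "/sys/kernel/kexec_crash_loaded") {
    printf "%-35s %s\n", $f, want_sysfs_node($f) ? "keep" : "skip";
}
```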

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 3c0063d0e05e..42eb16eb78e9 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -628,6 +628,14 @@ sub parse_existing_sysfs {
 	# Ignore cgroup and firmware
 	return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
 
+	# Ignore some sysfs nodes
+	return if ($file =~ m#/(sections|notes)/#);
+
+	# Would need to check at
+	# Documentation/admin-guide/kernel-parameters.txt, but this
+	# is not easily parseable.
+	return if ($file =~ m#/parameters/#);
+
 	my $mode = (lstat($file))[2];
 	my $abs_file = abs_path($file);
 
@@ -709,14 +717,6 @@ sub check_undefined_symbols {
 
 		next if ($exact);
 
-		# Ignore some sysfs nodes
-		next if ($file =~ m#/(sections|notes)/#);
-
-		# Would need to check at
-		# Documentation/admin-guide/kernel-parameters.txt, but this
-		# is not easily parseable.
-		next if ($file =~ m#/parameters/#);
-
 		if ($hint && $defined && (!$search_string || $found_string)) {
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/8] scripts: get_abi.pl: stop check loop earlier when regex is found
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (4 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 5/8] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 7/8] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

Right now, there are two loops used to seek a regex. Make
sure that both are exited when a match is found.

While here, drop the unused $defined variable.
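
To illustrate the effect of leaving both loops (a minimal sketch with
made-up names and regexes; count_checks() is a hypothetical helper
that just counts how many comparisons are done):

```perl
use strict;
use warnings;

# Hypothetical helper: runs the nested search and returns whether a
# match was found, plus the number of regex comparisons performed.
sub count_checks {
    my ($names, $expr) = @_;
    my $checks = 0;
    my $exact = 0;

    foreach my $n (@$names) {
        foreach my $re (@$expr) {
            $checks++;
            if ($n =~ $re) {
                $exact = 1;
                last;             # leaves the inner loop
            }
        }
        last if ($exact);         # leaves the outer loop as well
    }

    return ($exact, $checks);
}

# Made-up data: two names for the same node, two candidate regexes.
my @names = ("/sys/a/bind", "/sys/b/bind");
my @expr  = (qr{^/sys/a/bind$}, qr{^/sys/b/bind$});

my ($exact, $checks) = count_checks(\@names, \@expr);
print "exact=$exact checks=$checks\n";
```

Here the first name matches the first regex, so a single comparison is
done. Without the second "last", the outer loop would also go through
the second name, doing extra comparisons for no benefit.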

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 42eb16eb78e9..d45e5ba56f9c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -685,7 +685,6 @@ sub check_undefined_symbols {
 		my @names = @{$$file_ref{"__name"}};
 		my $file = $names[0];
 
-		my $defined = 0;
 		my $exact = 0;
 		my $found_string;
 
@@ -711,13 +710,11 @@ sub check_undefined_symbols {
 					last;
 				}
 			}
+			last if ($exact);
 		}
-
-		$defined++;
-
 		next if ($exact);
 
-		if ($hint && $defined && (!$search_string || $found_string)) {
+		if ($hint && (!$search_string || $found_string)) {
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
 				print "    more likely regexes:\n\t$what\n";
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 7/8] scripts: get_abi.pl: precompile what match regexes
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (5 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 6/8] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 15:41     ` [PATCH 8/8] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
  2021-09-23 17:13     ` [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

In order to save some time during matches, pre-compile the regexes.

Before this patch:
	$ time ./scripts/get_abi.pl undefined |wc -l
	6970

	real	0m54,751s
	user	0m54,022s
	sys	0m0,592s

Afterwards:

	$ time ./scripts/get_abi.pl undefined |wc -l
	6970

	real	0m5,888s
	user	0m5,310s
	sys	0m0,562s
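
A minimal sketch of the technique, outside the script (matches_any()
is a hypothetical helper and the What expressions are just samples,
already in regex syntax):

```perl
use strict;
use warnings;

# Sample What expressions, already converted to regex syntax.
my @what = (
    '/sys/bus/fsl\-mc/drivers/.*/bind',
    '/sys/bus/pci/drivers/.*/bind',
);

# Compile each expression once, anchored at both ends, instead of
# interpolating the string into a new pattern on every comparison.
my @expr;
foreach my $w (@what) {
    push @expr, qr/^$w$/;
}

# Hypothetical helper: true if any precompiled regex matches $name.
sub matches_any {
    my ($name, @re) = @_;
    foreach my $re (@re) {
        return 1 if ($name =~ $re);
    }
    return 0;
}

print matches_any("/sys/bus/pci/drivers/iosf_mbi_pci/bind", @expr), "\n";
print matches_any("/sys/kernel/kexec_crash_loaded", @expr), "\n";
```

As the compiled objects are reused for every sysfs node, the cost of
building each pattern is paid only once per What expression.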

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d45e5ba56f9c..f2b5efef9c30 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -25,6 +25,7 @@ my $search_string;
 my $dbg_what_parsing = 1;
 my $dbg_what_open = 2;
 my $dbg_dump_abi_structs = 4;
+my $dbg_undefined = 8;
 
 #
 # If true, assumes that the description is formatted with ReST
@@ -692,7 +693,8 @@ sub check_undefined_symbols {
 		if (!defined($leaf{$leave})) {
 			$leave = "others";
 		}
-		my $what = $leaf{$leave};
+		my @expr = @{$leaf{$leave}->{expr}};
+		die ("missing rules for $leave") if (!defined($leaf{$leave}));
 
 		my $path = $file;
 		$path =~ s,(.*/).*,$1,;
@@ -702,10 +704,17 @@ sub check_undefined_symbols {
 			$found_string = 1;
 		}
 
-		foreach my $a (@names) {
-			print "--> $a\n" if ($found_string && $hint);
-			foreach my $w (split /\xac/, $what) {
-				if ($a =~ m#^$w$#) {
+		for (my $i = 0; $i < @names; $i++) {
+			if ($found_string && $hint) {
+				if (!$i) {
+					print "--> $names[$i]\n";
+				} else {
+					print "    $names[$i]\n";
+				}
+			}
+			foreach my $re (@expr) {
+				print "$names[$i] =~ /^$re\$/\n" if ($debug && $dbg_undefined);
+				if ($names[$i] =~ $re) {
 					$exact = 1;
 					last;
 				}
@@ -715,6 +724,7 @@ sub check_undefined_symbols {
 		next if ($exact);
 
 		if ($hint && (!$search_string || $found_string)) {
+			my $what = $leaf{$leave}->{what};
 			$what =~ s/\xac/\n\t/g;
 			if ($leave ne "others") {
 				print "    more likely regexes:\n\t$what\n";
@@ -734,7 +744,7 @@ sub undefined_symbols {
 		no_chdir => 1
 	     }, $sysfs_prefix);
 
-	$leaf{"others"} = "";
+	$leaf{"others"}->{what} = "";
 
 	foreach my $w (sort keys %data) {
 		foreach my $what (split /\xac/,$w) {
@@ -792,14 +802,15 @@ sub undefined_symbols {
 			$what =~ s/sqrt(.*)/sqrt\(.*\)/;
 
 			my $leave = get_leave($what);
+
 			my $added = 0;
 			foreach my $l (split /\|/, $leave) {
 				if (defined($leaf{$l})) {
-					next if ($leaf{$l} =~ m/\b$what\b/);
-					$leaf{$l} .= "\xac" . $what;
+					next if ($leaf{$l}->{what} =~ m/\b$what\b/);
+					$leaf{$l}->{what} .= "\xac" . $what;
 					$added = 1;
 				} else {
-					$leaf{$l} = $what;
+					$leaf{$l}->{what} = $what;
 					$added = 1;
 				}
 			}
@@ -809,6 +820,15 @@ sub undefined_symbols {
 
 		}
 	}
+	# Compile regexes
+	foreach my $l (keys %leaf) {
+		my @expr;
+		foreach my $w(split /\xac/, $leaf{$l}->{what}) {
+			push @expr, qr /^$w$/;
+		}
+		$leaf{$l}->{expr} = \@expr;
+	}
+
 	# Take links into account
 	foreach my $link (keys %aliases) {
 		my $abs_file = $aliases{$link};
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 8/8] scripts: get_abi.pl: ensure that "others" regex will be parsed
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (6 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 7/8] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
@ 2021-09-23 15:41     ` Mauro Carvalho Chehab
  2021-09-23 17:13     ` [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
  8 siblings, 0 replies; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-23 15:41 UTC (permalink / raw)
  To: #, YUyICHTRdfL8Ul7X, Linux Doc Mailing List, Greg Kroah-Hartman
  Cc: Mauro Carvalho Chehab, Jonathan Corbet, linux-kernel

The search algorithm works by reducing the number of regular
expressions that will be checked for a given file entry at sysfs. It
does that by looking at the devnode name. For instance, when it checks
this file:

	/sys/bus/pci/drivers/iosf_mbi_pci/bind

The logic will seek only the "What:" expressions that end with "bind".
Currently, there are just a couple of What expressions that match
it:

	What: /sys/bus/fsl\-mc/drivers/.*/bind
	What: /sys/bus/pci/drivers/.*/bind

It will then run an O(n²) algorithm to seek, which runs quickly
when there are few regexes to seek. There are, however, some What:
expressions that end with a wildcard. Those are harder to process.
Right now, they're all grouped together in the "others" group.

As those don't depend on the basename of the node, add an extra
loop to ensure that they will be processed at the end, if
not done yet.
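
The two-stage lookup can be sketched as follows (an illustration
only, under simplified assumptions: the %leaf layout is flattened,
is_documented() is a hypothetical helper and the "others" entry is a
made-up expression):

```perl
use strict;
use warnings;

# Hypothetical index: leaf name => precompiled regexes. The "others"
# entry holds expressions whose leaf is a wildcard.
my %leaf = (
    "bind"   => [ qr{^/sys/bus/pci/drivers/.*/bind$},
                  qr{^/sys/bus/fsl\-mc/drivers/.*/bind$} ],
    "others" => [ qr{^/sys/class/example/.*$} ],    # made-up entry
);

# Hypothetical helper: check the group selected by the basename
# first, then always fall back to the "others" group.
sub is_documented {
    my ($file) = @_;

    my $base = $file;
    $base =~ s,.*/,,;

    my @groups;
    push @groups, $base if (defined($leaf{$base}) && $base ne "others");
    push @groups, "others";

    foreach my $g (@groups) {
        foreach my $re (@{$leaf{$g}}) {
            return 1 if ($file =~ $re);
        }
    }
    return 0;
}

print is_documented("/sys/bus/pci/drivers/iosf_mbi_pci/bind"), "\n";
print is_documented("/sys/class/example/anything"), "\n";
print is_documented("/sys/kernel/kexec_crash_loaded"), "\n";
```

The second node has no group of its own, so it is only found because
the "others" group is checked at the end.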

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 scripts/get_abi.pl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index f2b5efef9c30..f25c98b1971e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -723,6 +723,22 @@ sub check_undefined_symbols {
 		}
 		next if ($exact);
 
+		if ($leave ne "others") {
+			my @expr = @{$leaf{$leave}->{expr}};
+			for (my $i = 0; $i < @names; $i++) {
+				foreach my $re (@expr) {
+					print "$names[$i] =~ /^$re\$/\n" if ($debug && $dbg_undefined);
+					if ($names[$i] =~ $re) {
+						$exact = 1;
+						last;
+					}
+				}
+				last if ($exact);
+			}
+			last if ($exact);
+		}
+		next if ($exact);
+
 		if ($hint && (!$search_string || $found_string)) {
 			my $what = $leaf{$leave}->{what};
 			$what =~ s/\xac/\n\t/g;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
                       ` (7 preceding siblings ...)
  2021-09-23 15:41     ` [PATCH 8/8] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
@ 2021-09-23 17:13     ` Greg Kroah-Hartman
  2021-09-27  8:55       ` Mauro Carvalho Chehab
  8 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-23 17:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> Hi Greg,
> 
> As requested, this is exactly the same changes, rebased on the top of
> driver-core/driver-core-next.
> 
> -
> 
> It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> 
> With such changes, on my development tree, the script is taking 6 seconds to run 
> on my desktop:
> 
> 	$ !1076
> 	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> 
> 	real	0m6,292s
> 	user	0m5,640s
> 	sys	0m0,634s
> 	  6838 undefined_after
> 	   808 undefined_symbols
> 	  7646 total
> 
> And 7 seconds on a Dell Precision 5820:
> 
> 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> 
> 	real	0m7.162s
> 	user	0m5.836s
> 	sys	0m1.329s
> 	6548 undefined
> 	772 undefined_symbols
> 
> Both tests were done against this tree (based on today's linux-next):
> 
> 	$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> 
> It should be noticed that, as my tree has several ABI fixes,  the time to run the
> script is likely less than if you run on your tree, as there will be less symbols to
> be reported, and the algorithm is optimized to reduce the number of regexes
> when a symbol is found.
> 
> Besides optimizing and improving the seek logic, this series also change the
> debug logic. It how receives a bitmap, where "8" means to print the regexes
> that will be used by "undefined" command:
> 
> 	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
> 	real	0m17,189s
> 	user	0m13,940s
> 	sys	0m2,404s
> 
> 	$wc -l foo
> 	18421939 foo
> 
> 	$ cat foo
> 	...
> 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> 	...
> 
> On other words, on my desktop, the /sys match is performing >18M regular 
> expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is 
> enabled and sent to an area on my nvme storage).

Better, it's down to 10 minutes on my machine now:

	real	10m39.218s
	user	10m37.742s
	sys	0m0.775s

thanks!

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-23 17:13     ` [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
@ 2021-09-27  8:55       ` Mauro Carvalho Chehab
  2021-09-27  9:23         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-27  8:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

Em Thu, 23 Sep 2021 19:13:04 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:

> On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> > [...]
> 
> Better, it's down to 10 minutes on my machine now:
> 
> 	real	10m39.218s
> 	user	10m37.742s
> 	sys	0m0.775s

A lot better, but not clear why it is still taking ~40x more than here...
It could well be due to the other ABI changes yet to be applied
(I'll submit it probably later today), but it could also be related to
something else. Could this be due to disk writes?

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-27  8:55       ` Mauro Carvalho Chehab
@ 2021-09-27  9:23         ` Greg Kroah-Hartman
  2021-09-27 13:39           ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-27  9:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 23 Sep 2021 19:13:04 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:
> 
> > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> > > [...]
> > 
> > Better, it's down to 10 minutes on my machine now:
> > 
> > 	real	10m39.218s
> > 	user	10m37.742s
> > 	sys	0m0.775s
> 
> A lot better, but not clear why it is still taking ~40x more than here...
> It could well be due to the other ABI changes yet to be applied
> (I'll submit it probably later today), but it could also be related to
> something else. Could this be due to disk writes?

Disk writes to where for what?  This is a very fast disk (nvme raid
array)  It's also a very "big" system, with lots of sysfs files:

	$ find /sys/devices/ -type f | wc -l
	44334

compared to my laptop that only has 17k entries in /sys/devices/

I'll run this updated script on my laptop later today and give you some
numbers.  And any Documentation/ABI/ updates you might have I'll gladly
take as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-27  9:23         ` Greg Kroah-Hartman
@ 2021-09-27 13:39           ` Mauro Carvalho Chehab
  2021-09-27 15:48             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-27 13:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

Em Mon, 27 Sep 2021 11:23:20 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:

> On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> > Em Thu, 23 Sep 2021 19:13:04 +0200
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:
> >   
> > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:  
> > > > [...]
> > > 
> > > Better, it's down to 10 minutes on my machine now:
> > > 
> > > 	real	10m39.218s
> > > 	user	10m37.742s
> > > 	sys	0m0.775s  
> > 
> > A lot better, but not clear why it is still taking ~40x more than here...
> > It could well be due to the other ABI changes yet to be applied
> > (I'll submit it probably later today), but it could also be related to
> > something else. Could this be due to disk writes?  
> 
> Disk writes to where for what?  This is a very fast disk (nvme raid
> array)  It's also a very "big" system, with lots of sysfs files:
> 
> 	$ find /sys/devices/ -type f | wc -l
> 	44334

Ok. Maybe that partially explains why it is taking so long, as the
number of regexes to compare will increase (not linearly).

> compared to my laptop that only has 17k entries in /sys/devices/
> 
> I'll run this updated script on my laptop later today and give you some
> numbers.

Ok, thanks!

> And any Documentation/ABI/ updates you might have I'll gladly
> take as well.

I'll be submitting them soon enough. Got sidetracked by a problem
with my INBOX caused by a fetchmail regression[1].

> thanks,
> 
> greg k-h

[1] https://gitlab.com/fetchmail/fetchmail/-/issues/39

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-27 13:39           ` Mauro Carvalho Chehab
@ 2021-09-27 15:48             ` Greg Kroah-Hartman
  2021-09-28 10:03               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-27 15:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> On Mon, 27 Sep 2021 11:23:20 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> > > On Thu, 23 Sep 2021 19:13:04 +0200
> > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > >   
> > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:  
> > > > > Hi Greg,
> > > > > 
> > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > driver-core/driver-core-next.
> > > > > 
> > > > > -
> > > > > 
> > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > > 
> > > > > With such changes, on my development tree, the script is taking 6 seconds to run 
> > > > > on my desktop:
> > > > > 
> > > > > 	$ !1076
> > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > > 
> > > > > 	real	0m6,292s
> > > > > 	user	0m5,640s
> > > > > 	sys	0m0,634s
> > > > > 	  6838 undefined_after
> > > > > 	   808 undefined_symbols
> > > > > 	  7646 total
> > > > > 
> > > > > And 7 seconds on a Dell Precision 5820:
> > > > > 
> > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > > 
> > > > > 	real	0m7.162s
> > > > > 	user	0m5.836s
> > > > > 	sys	0m1.329s
> > > > > 	6548 undefined
> > > > > 	772 undefined_symbols
> > > > > 
> > > > > Both tests were done against this tree (based on today's linux-next):
> > > > > 
> > > > > 	$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > > 
> > > > > It should be noticed that, as my tree has several ABI fixes,  the time to run the
> > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > when a symbol is found.
> > > > > 
> > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > that will be used by "undefined" command:
> > > > > 
> > > > > 	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > 	real	0m17,189s
> > > > > 	user	0m13,940s
> > > > > 	sys	0m2,404s
> > > > > 
> > > > > 	$wc -l foo
> > > > > 	18421939 foo
> > > > > 
> > > > > 	$ cat foo
> > > > > 	...
> > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > 	...
> > > > > 
> > > > > On other words, on my desktop, the /sys match is performing >18M regular 
> > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is 
> > > > > enabled and sent to an area on my nvme storage).    
> > > > 
> > > > Better, it's down to 10 minutes on my machine now:
> > > > 
> > > > 	real	10m39.218s
> > > > 	user	10m37.742s
> > > > 	sys	0m0.775s  
> > > 
> > > A lot better, but not clear why it is still taking ~40x more than here...
> > > It could well be due to the other ABI changes yet to be applied
> > > (I'll submit it probably later today), but it could also be related to
> > > something else. Could this be due to disk writes?  
> > 
> > Disk writes to where for what?  This is a very fast disk (nvme raid
> > array)  It's also a very "big" system, with lots of sysfs files:
> > 
> > 	$ find /sys/devices/ -type f | wc -l
> > 	44334
> 
> Ok. Maybe that partially explains why it is taking so long, as the
> number of regex to compare will increase (not linearly).

No idea.  I just ran it on my laptop and it took only 5 seconds.

Hm, you aren't reading the values of the sysfs files, right?

Anything I can run to help figure out where the script is taking
so long?

> > And any Documentation/ABI/ updates you might have I'll gladly
> > take as well.
> 
> I'll be submitting it soon enough. Got sidetracked by a regression
> on my INBOX due to a fetchmail regression[1].

Ick, fetchmail.  I recommend getmail instead, much more robust and a
sane maintainer :)

I'll take a look at those patches now.

thanks,

greg k-h


* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-27 15:48             ` Greg Kroah-Hartman
@ 2021-09-28 10:03               ` Mauro Carvalho Chehab
  2021-09-28 10:43                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 31+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-28 10:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

On Mon, 27 Sep 2021 17:48:05 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> > On Mon, 27 Sep 2021 11:23:20 +0200
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >   
> > > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:  
> > > > On Thu, 23 Sep 2021 19:13:04 +0200
> > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > >     
> > > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:    
> > > > > > Hi Greg,
> > > > > > 
> > > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > > driver-core/driver-core-next.
> > > > > > 
> > > > > > -
> > > > > > 
> > > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > > > 
> > > > > > With such changes, on my development tree, the script is taking 6 seconds to run 
> > > > > > on my desktop:
> > > > > > 
> > > > > > 	$ !1076
> > > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > > > 
> > > > > > 	real	0m6,292s
> > > > > > 	user	0m5,640s
> > > > > > 	sys	0m0,634s
> > > > > > 	  6838 undefined_after
> > > > > > 	   808 undefined_symbols
> > > > > > 	  7646 total
> > > > > > 
> > > > > > And 7 seconds on a Dell Precision 5820:
> > > > > > 
> > > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > > > 
> > > > > > 	real	0m7.162s
> > > > > > 	user	0m5.836s
> > > > > > 	sys	0m1.329s
> > > > > > 	6548 undefined
> > > > > > 	772 undefined_symbols
> > > > > > 
> > > > > > Both tests were done against this tree (based on today's linux-next):
> > > > > > 
> > > > > > 	$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > > > 
> > > > > > It should be noticed that, as my tree has several ABI fixes,  the time to run the
> > > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > > when a symbol is found.
> > > > > > 
> > > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > > that will be used by "undefined" command:
> > > > > > 
> > > > > > 	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > > 	real	0m17,189s
> > > > > > 	user	0m13,940s
> > > > > > 	sys	0m2,404s
> > > > > > 
> > > > > > 	$wc -l foo
> > > > > > 	18421939 foo
> > > > > > 
> > > > > > 	$ cat foo
> > > > > > 	...
> > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > > 	...
> > > > > > 
> > > > > > On other words, on my desktop, the /sys match is performing >18M regular 
> > > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is 
> > > > > > enabled and sent to an area on my nvme storage).      
> > > > > 
> > > > > Better, it's down to 10 minutes on my machine now:
> > > > > 
> > > > > 	real	10m39.218s
> > > > > 	user	10m37.742s
> > > > > 	sys	0m0.775s    
> > > > 
> > > > A lot better, but not clear why it is still taking ~40x more than here...
> > > > It could well be due to the other ABI changes yet to be applied
> > > > (I'll submit it probably later today), but it could also be related to
> > > > something else. Could this be due to disk writes?    
> > > 
> > > Disk writes to where for what?  This is a very fast disk (nvme raid
> > > array)  It's also a very "big" system, with lots of sysfs files:
> > > 
> > > 	$ find /sys/devices/ -type f | wc -l
> > > 	44334  
> > 
> > Ok. Maybe that partially explains why it is taking so long, as the
> > number of regex to compare will increase (not linearly).  
> 
> No idea.  I just ran it on my laptop and it took only 5 seconds.

Ok, 5 seconds is similar to what I got here on the machines I
tested so far. I'm waiting for a (shared) big machine to be available
in order to be able to do some tests on it.

> Hm, you aren't reading the values of the sysfs files, right?

No. Just retrieving the directory contents. That part is actually
fast: it takes less than 2 seconds here to read all ABI files and
traverse the sysfs directories. Also, judging from your past logs, the
time is spent later on, when it is handling the regexes. At that point,
the script is only evaluating the regexes and printing the results.
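
To illustrate what "just retrieving the directory contents" means, here
is a minimal shell sketch (not the script's actual code, which is Perl).
It walks a tree and counts the regular files by name only, in the same
way as the "find /sys/devices/ -type f | wc -l" command you ran, without
opening any file. The count_nodes helper is a hypothetical name used
only for this example:

```shell
# count_nodes DIR: hypothetical helper, not part of get_abi.pl.
# Counts regular files below DIR using directory reads only;
# no file is opened, so no sysfs attribute value is ever read.
# (File names containing newlines would be counted more than once.)
count_nodes() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage on a real system:
#   count_nodes /sys/devices
```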

> Anything I can do to run to help figure out where the script is taking
> so long?

Not sure if it is worth the effort. I mean, the relationship
between the number of processed sysfs nodes (n) and the number of
regexes to be tested (using big-O and big-Omega notation) should be
between Ω(n . log(n)) and O(n^2 . log(n)). There's not much room left
for optimizing it, I guess.

So, I would expect a big server to take a lot more time to
process it, due to the larger number of sysfs entries.
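
As a rough, back-of-the-envelope check of those bounds, the sketch below
plugs in the only two node counts mentioned in this thread: about 17k
entries in /sys/devices/ on your laptop and 44334 on the big machine.
It assumes that n can be approximated by those counts (ignoring the rest
of /sys and the number of ABI regexes). The growth helper is a name made
up for this example:

```shell
# growth SMALL BIG: hypothetical helper for a rough estimate only.
# Prints how much the n.log(n) and n^2.log(n) bounds grow when the
# number of sysfs nodes goes from SMALL to BIG.
growth() {
    # LC_ALL=C keeps "." as the decimal separator in the output.
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        r = b / a                 # ratio between the node counts
        l = log(b) / log(a)       # ratio between the log factors
        printf "n.log(n): %.1fx  n^2.log(n): %.1fx\n", r * l, r * r * l
    }'
}

growth 17000 44334
```

With those inputs, the two bounds grow by roughly 2.9x and 7.5x. For
comparison, the run times reported in this thread are about 5 seconds on
the laptop and 10m39s on the big machine, so the node count by itself
may not be the whole story.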

Also, if one wants to speed it up on a big machine, one could either
exclude some pattern, like:

	# Won't parse any PCI devices
	$ time ./scripts/get_abi.pl undefined --search-string '^(?!.*pci)' |wc -l
	8438

	real	0m3,494s
	user	0m2,829s
	sys	0m0,658s

or (more likely) just search for a specific part of the ABI:

	# Seek ABI only for PCI devices
	$ ./scripts/get_abi.pl undefined --search-string pci

---

After sleeping on it, I opted to implement some progress information.

That will help to identify any issues that might be causing the
script to take so long to finish.

I'll send the patches in a new series.

> 
> > > And any Documentation/ABI/ updates you might have I'll gladly
> > > take as well.  
> > 
> > I'll be submitting it soon enough. Got sidetracked by a regression
> > on my INBOX due to a fetchmail regression[1].  
> 
> Ick, fetchmail.  I recommend getmail instead, much more robust and a
> sane maintainer :)

Hmm... interesting. Never tried getmail. I guess I'll give it a
try. It is a shame that Fedora doesn't package it yet.

> 
> I'll take a look at those patches now.
> 
> thanks,
> 
> greg k-h



Thanks,
Mauro


* Re: [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance
  2021-09-28 10:03               ` Mauro Carvalho Chehab
@ 2021-09-28 10:43                 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 31+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-28 10:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, linux-kernel, Jonathan Corbet

On Tue, Sep 28, 2021 at 12:03:04PM +0200, Mauro Carvalho Chehab wrote:
> On Mon, 27 Sep 2021 17:48:05 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> > On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> > > On Mon, 27 Sep 2021 11:23:20 +0200
> > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > >   
> > > > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:  
> > > > > On Thu, 23 Sep 2021 19:13:04 +0200
> > > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > > >     
> > > > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:    
> > > > > > > Hi Greg,
> > > > > > > 
> > > > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > > > driver-core/driver-core-next.
> > > > > > > 
> > > > > > > -
> > > > > > > 
> > > > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > > > > 
> > > > > > > With such changes, on my development tree, the script is taking 6 seconds to run 
> > > > > > > on my desktop:
> > > > > > > 
> > > > > > > 	$ !1076
> > > > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > > > > 
> > > > > > > 	real	0m6,292s
> > > > > > > 	user	0m5,640s
> > > > > > > 	sys	0m0,634s
> > > > > > > 	  6838 undefined_after
> > > > > > > 	   808 undefined_symbols
> > > > > > > 	  7646 total
> > > > > > > 
> > > > > > > And 7 seconds on a Dell Precision 5820:
> > > > > > > 
> > > > > > > 	$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > > > > 
> > > > > > > 	real	0m7.162s
> > > > > > > 	user	0m5.836s
> > > > > > > 	sys	0m1.329s
> > > > > > > 	6548 undefined
> > > > > > > 	772 undefined_symbols
> > > > > > > 
> > > > > > > Both tests were done against this tree (based on today's linux-next):
> > > > > > > 
> > > > > > > 	$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > > > > 
> > > > > > > It should be noticed that, as my tree has several ABI fixes,  the time to run the
> > > > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > > > when a symbol is found.
> > > > > > > 
> > > > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > > > that will be used by "undefined" command:
> > > > > > > 
> > > > > > > 	$ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > > > 	real	0m17,189s
> > > > > > > 	user	0m13,940s
> > > > > > > 	sys	0m2,404s
> > > > > > > 
> > > > > > > 	$wc -l foo
> > > > > > > 	18421939 foo
> > > > > > > 
> > > > > > > 	$ cat foo
> > > > > > > 	...
> > > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > > > 	/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > > > 	...
> > > > > > > 
> > > > > > > On other words, on my desktop, the /sys match is performing >18M regular 
> > > > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is 
> > > > > > > enabled and sent to an area on my nvme storage).      
> > > > > > 
> > > > > > Better, it's down to 10 minutes on my machine now:
> > > > > > 
> > > > > > 	real	10m39.218s
> > > > > > 	user	10m37.742s
> > > > > > 	sys	0m0.775s    
> > > > > 
> > > > > A lot better, but not clear why it is still taking ~40x more than here...
> > > > > It could well be due to the other ABI changes yet to be applied
> > > > > (I'll submit it probably later today), but it could also be related to
> > > > > something else. Could this be due to disk writes?    
> > > > 
> > > > Disk writes to where for what?  This is a very fast disk (nvme raid
> > > > array)  It's also a very "big" system, with lots of sysfs files:
> > > > 
> > > > 	$ find /sys/devices/ -type f | wc -l
> > > > 	44334  
> > > 
> > > Ok. Maybe that partially explains why it is taking so long, as the
> > > number of regex to compare will increase (not linearly).  
> > 
> > No idea.  I just ran it on my laptop and it took only 5 seconds.
> 
> Ok, 5 seconds is similar to what I got here on the machines I
> tested so far. I'm waiting for a (shared) big machine to be available
> in order to be able to do some tests on it.
> 
> > Hm, you aren't reading the values of the sysfs files, right?
> 
> No. Just retrieving the directory contents. That part is actually
> fast: it takes less than 2 seconds here to read all ABI + traverse
> sysfs directories. Also, from your past logs, the time is spent
> later on, when it is handling the regex. On that time, there are
> just the regex parsing and printing the results. 
> 
> > Anything I can do to run to help figure out where the script is taking
> > so long?
> 
> Not sure if it is worth the efforts. I mean, the relationship
> between the number of processed sysfs nodes and the number of regex
> to be tested (using big-oh and big-omega notation) should be between
> Ω(n . log(n)) and O(n^2 . log(n)). There's not much space left for
> optimizing it, I guess.
> 
> So, I would expect that a big server would take a log more time to
> process, it, due to the larger number of sysfs entries.
> 
> Also, if one wants to speedup on a big machine, it could either
> exclude some pattern, like:
> 
> 	# Won't parse any PCI devices
> 	$time ./scripts/get_abi.pl undefined --search-string '^(?!.*pci)' |wc -l
> 	8438
> 
> 	real	0m3,494s
> 	user	0m2,829s
> 	sys	0m0,658s

That only takes 8 seconds on this box:
	$ time ./scripts/get_abi.pl undefined  --search-string '^(?!.*pci)' |wc -l
	18872

	real	0m8.026s
	user	0m7.300s
	sys	0m0.726s

> or (more likely) just search for an specific part of the ABI:
> 
> 	# Seek ABI only for PCI devices
>  	$ ./scripts/get_abi.pl undefined --search-string pci

This takes much longer, I didn't want to wait the 10 minutes :)

> ---
> 
> After sleeping on it, I opted to implement some progress information.
> 
> That will help to identify any issues that might be causing the
> script to take so long to finish.
> 
> I'll send the patches on a new series.

Thanks, I'll go try those now...

greg k-h


end of thread

Thread overview: 31+ messages
2021-09-23 13:29 [PATCH 00/13] get_abi.pl undefined: improve precision and performance Mauro Carvalho Chehab
2021-09-23 13:29 ` [PATCH 01/13] scripts: get_abi.pl: Better handle multiple What parameters Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 02/13] scripts: get_abi.pl: Check for missing symbols at the ABI specs Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 03/13] scripts: get_abi.pl: detect softlinks Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 04/13] scripts: get_abi.pl: add an option to filter undefined results Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 05/13] scripts: get_abi.pl: don't skip what that ends with wildcards Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 06/13] scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 07/13] scripts: get_abi.pl: add a graph to speedup the undefined algorithm Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 08/13] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 09/13] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 10/13] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 11/13] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 12/13] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
2021-09-23 13:30 ` [PATCH 13/13] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
2021-09-23 13:58 ` [PATCH 00/13] get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
2021-09-23 15:41   ` [PATCH 0/8] (REBASED) " Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 1/8] scripts: get_abi.pl: Fix get_abi.pl search output Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 2/8] scripts: get_abi.pl: call get_leave() a little late Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 3/8] scripts: get_abi.pl: improve debug logic Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 4/8] scripts: get_abi.pl: Better handle leaves with wildcards Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 5/8] scripts: get_abi.pl: ignore some sysfs nodes earlier Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 6/8] scripts: get_abi.pl: stop check loop earlier when regex is found Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 7/8] scripts: get_abi.pl: precompile what match regexes Mauro Carvalho Chehab
2021-09-23 15:41     ` [PATCH 8/8] scripts: get_abi.pl: ensure that "others" regex will be parsed Mauro Carvalho Chehab
2021-09-23 17:13     ` [PATCH 0/8] (REBASED) get_abi.pl undefined: improve precision and performance Greg Kroah-Hartman
2021-09-27  8:55       ` Mauro Carvalho Chehab
2021-09-27  9:23         ` Greg Kroah-Hartman
2021-09-27 13:39           ` Mauro Carvalho Chehab
2021-09-27 15:48             ` Greg Kroah-Hartman
2021-09-28 10:03               ` Mauro Carvalho Chehab
2021-09-28 10:43                 ` Greg Kroah-Hartman
