linux-modules.vger.kernel.org archive mirror
* [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols
@ 2022-09-20  7:13 Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

v3 --> v4:
1. Move the declaration of the function kallsyms_sym_address() to
   linux/kallsyms.h to fix a build warning.

v2 --> v3:
1. Improve test cases, perform complete functional tests on functions
   kallsyms_lookup_name(), kallsyms_on_each_symbol() and
   kallsyms_on_each_match_symbol().
2. Add patch [PATCH v3 2/8] scripts/kallsyms: ensure that all possible
   combinations are compressed.
3. The symbol type is not compressed regardless of whether
   CONFIG_KALLSYMS_ALL is set or not. The memory overhead is increased
   by less than 20KiB if CONFIG_KALLSYMS_ALL=n.
4. Discard [PATCH v2 3/8] kallsyms: Adjust the types of some local variables

v1 --> v2:
Add self-test facility

v1:
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. This is very slow.

In fact, we can first compress the name being looked up and then use
it for comparison when traversing 'kallsyms_names'.

This patch series optimizes the performance of the function
kallsyms_lookup_name(), and of the function klp_find_object_symbol() in
the livepatch module. Based on the test results, the time overhead drops
to about 5% of the original; that is, these functions become roughly 20
times faster.

To avoid increasing the kernel size in non-debug mode, the optimization
is applied only when CONFIG_KALLSYMS_ALL=y.


Zhen Lei (8):
  scripts/kallsyms: rename build_initial_tok_table()
  scripts/kallsyms: ensure that all possible combinations are compressed
  scripts/kallsyms: don't compress symbol types
  kallsyms: Improve the performance of kallsyms_lookup_name()
  kallsyms: Add helper kallsyms_on_each_match_symbol()
  livepatch: Use kallsyms_on_each_match_symbol() to improve performance
  livepatch: Improve the search performance of
    module_kallsyms_on_each_symbol()
  kallsyms: Add self-test facility

 include/linux/kallsyms.h   |   9 +
 init/Kconfig               |  13 ++
 kernel/Makefile            |   1 +
 kernel/kallsyms.c          | 106 +++++++++-
 kernel/kallsyms_selftest.c | 423 +++++++++++++++++++++++++++++++++++++
 kernel/livepatch/core.c    |  25 ++-
 kernel/module/kallsyms.c   |  13 +-
 scripts/kallsyms.c         |  24 ++-
 8 files changed, 598 insertions(+), 16 deletions(-)
 create mode 100644 kernel/kallsyms_selftest.c

-- 
2.25.1



* [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table()
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-21  7:47   ` Petr Mladek
  2022-09-20  7:13 ` [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed Zhen Lei
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Except for the function name build_initial_tok_table(), the abbreviation
"tok" is not used anywhere else; the code otherwise consistently spells
out "token":

$ cat scripts/kallsyms.c | grep tok | wc -l
33
$ cat scripts/kallsyms.c | grep token | wc -l
31

The only two extra "tok" matches are the definition and the single call
site of build_initial_tok_table(), so using the full name here is
clearer.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 scripts/kallsyms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index f18e6dfc68c5839..8caccc8f4a23703 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -525,7 +525,7 @@ static void forget_symbol(const unsigned char *symbol, int len)
 }
 
 /* do the initial token count */
-static void build_initial_tok_table(void)
+static void build_initial_token_table(void)
 {
 	unsigned int i;
 
@@ -650,7 +650,7 @@ static void insert_real_symbols_in_table(void)
 
 static void optimize_token_table(void)
 {
-	build_initial_tok_table();
+	build_initial_token_table();
 
 	insert_real_symbols_in_table();
 
-- 
2.25.1



* [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-21  8:00   ` Petr Mladek
  2022-09-20  7:13 ` [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types Zhen Lei
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

For a given symbol, there may be more than one place where tokens can be
merged. For example, nfs_fs_proc_net_init contains two "f"+"s_"
combinations, but at the moment only the first one gets compressed.
Let's compress all possible combinations.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 scripts/kallsyms.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 8caccc8f4a23703..3319d9f38d7a5f2 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
 	unsigned char *p1, *p2;
 
 	for (i = 0; i < table_cnt; i++) {
-
+retry:
 		len = table[i]->len;
 		p1 = table[i]->sym;
 
@@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
 
 		/* increase the counts for this symbol's new tokens */
 		learn_symbol(table[i]->sym, len);
+
+		/* May be more than one place that can be merged, try again */
+		goto retry;
 	}
 }
 
-- 
2.25.1



* [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-21  9:00   ` Petr Mladek
  2022-09-20  7:13 ` [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one and then use the expanded string for
comparison. This is necessary because we do not know the symbol type in
advance, and the symbol type may have been combined with the following
characters to form a token.

If we stop compressing the symbol type, we can instead compress the
searched symbol once and then make a quick comparison based on the
compressed length and content. In this way, entries with mismatched
lengths need no expansion or string comparison at all, and matching
entries need no expansion either. This saves a lot of time. According to
my test results, the average performance of kallsyms_lookup_name() can
be improved by 20 to 30 times.

Of course, because the symbol type is forcibly not compressed, the
compression rate also decreases. Here are the test results with
defconfig:

arm64: <<<<<<
        ---------------------------------------------------------------
       | ALL | nr_symbols | compressed size | original size | ratio(%) |
        -----|---------------------------------------------------------|
Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
        ---------------------------------------------------------------
The memory overhead is increased by:
  73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
  19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.

x86: <<<<<<<<
        ---------------------------------------------------------------
       | ALL | nr_symbols | compressed size | original size | ratio(%) |
        -----|---------------------------------------------------------|
Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
        ---------------------------------------------------------------
The memory overhead is increased by:
  49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
  16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.

I think this additional memory overhead is well worth the performance
improvement.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 scripts/kallsyms.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -61,6 +61,15 @@ static int all_symbols;
 static int absolute_percpu;
 static int base_relative;
 
+/*
+ * Each entry in the symbol table consists of the symbol type and the symbol
+ * itself. To optimize the performance of finding or traversing symbols in
+ * kernel, do not compress the symbol type. In this way, when looking for a
+ * symbol of unknown type, we can first compress the searched symbol and then
+ * make a quick comparison based on the compressed length and content.
+ */
+static int sym_start_idx = 1;
+
 static int token_profit[0x10000];
 
 /* the table that holds the result of the compression */
@@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)
 		token_profit[ symbol[i] + (symbol[i + 1] << 8) ]++;
 }
 
@@ -520,7 +529,7 @@ static void forget_symbol(const unsigned char *symbol, int len)
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)
 		token_profit[ symbol[i] + (symbol[i + 1] << 8) ]--;
 }
 
@@ -538,7 +547,7 @@ static unsigned char *find_token(unsigned char *str, int len,
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++) {
+	for (i = sym_start_idx; i < len - 1; i++) {
 		if (str[i] == token[0] && str[i+1] == token[1])
 			return &str[i];
 	}
-- 
2.25.1



* [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
                   ` (2 preceding siblings ...)
  2022-09-20  7:13 ` [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-21 15:25   ` Petr Mladek
  2022-09-20  7:13 ` [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one and then use the expanded string for
comparison. This process can be optimized.

Now that scripts/kallsyms no longer compresses the symbol types, each
symbol type always occupies exactly one byte. So we can first compress
the searched symbol and then make a quick comparison based on the
compressed length and content. In this way, entries with mismatched
lengths need no expansion or string comparison at all, and matching
entries need no expansion either. This saves a lot of time. According to
my test results, the average performance of kallsyms_lookup_name() can
be improved by 20 to 30 times.

The pseudo code of the test case is as follows:
static int stat_find_name(...)
{
	start = sched_clock();
	(void)kallsyms_lookup_name(name);
	end = sched_clock();
	//Update min, max, cnt, sum
}

/*
 * Traverse all symbols in sequence and collect statistics on the time
 * taken by kallsyms_lookup_name() to lookup each symbol.
 */
kallsyms_on_each_symbol(stat_find_name, NULL);

The test results of two runs are as follows:
After : min=5250, max=  726560, avg= 302132
After : min=5320, max=  726850, avg= 301978
Before: min=170,  max=15949190, avg=7553906
Before: min=160,  max=15877280, avg=7517784

The average time consumed is only 4.01% and the maximum time consumed is
only 4.57% of the time consumed before optimization.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
 	return off;
 }
 
+static int kallsyms_name_to_tokens(const char *name, char *buf)
+{
+	int i, j, k, n;
+	int len, token_len;
+	const char *token;
+	unsigned char token_idx[KSYM_NAME_LEN];
+	unsigned char token_bak[KSYM_NAME_LEN];
+
+	/*
+	 * n, number of tokens in the string name.
+	 * token_idx[i], the start index of the ith token.
+	 * token_idx[n] is used to calculate the length of the last token.
+	 */
+	n = strlen(name);
+	if (n >= KSYM_NAME_LEN) {
+		buf[0] = 0;
+		return 0;
+	}
+	for (i = 0; i <= n; i++)
+		token_idx[i] = (unsigned char)i;
+
+	/*
+	 * For tokens whose token_len >= 2, a larger index value indicates
+	 * a higher occurrence frequency. See scripts/kallsyms.c
+	 */
+	for (i = 255; i >= 0; i--) {
+		token = &kallsyms_token_table[kallsyms_token_index[i]];
+		token_len = strlen(token);
+		if (token_len <= 1)
+			continue;
+
+		/*
+		 * Find and merge two tokens into one.
+		 *
+		 *                |<-- new_token -->|
+		 *                | token1 | token2 |
+		 * token_idx[]:   j       j+1      j+2
+		 *
+		 */
+		for (j = 0; j < n - 1; j++) {
+			len = token_idx[j + 2] - token_idx[j];
+			if (len == token_len &&
+			    !strncmp(name + token_idx[j], token, len)) {
+				token_bak[token_idx[j]] = (unsigned char)i;
+				for (k = j + 1; k < n; k++)
+					token_idx[k] = token_idx[k + 1];
+				n--;
+			}
+		}
+	}
+
+	for (j = 0; j < n; j++) {
+		len = token_idx[j + 1] - token_idx[j];
+		if (len <= 1) {
+			buf[j] = name[token_idx[j]];
+			continue;
+		}
+
+		buf[j] = token_bak[token_idx[j]];
+	}
+	buf[n] = 0;
+
+	return n;
+}
+
 /*
  * Get symbol type information. This is encoded as a single char at the
  * beginning of the symbol name.
@@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
 	char namebuf[KSYM_NAME_LEN];
 	unsigned long i;
 	unsigned int off;
+	int len;
 
 	/* Skip the search for empty string. */
 	if (!*name)
 		return 0;
 
+	len = kallsyms_name_to_tokens(name, namebuf);
+	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
+		if (kallsyms_names[off] == len + 1 &&
+		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
+			return kallsyms_sym_address(i);
+
+		off += kallsyms_names[off] + 1;
+	}
+
 	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
 		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
 
-		if (strcmp(namebuf, name) == 0)
-			return kallsyms_sym_address(i);
-
 		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
 			return kallsyms_sym_address(i);
 	}
+
 	return module_kallsyms_lookup_name(name);
 }
 
-- 
2.25.1



* [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol()
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
                   ` (3 preceding siblings ...)
  2022-09-20  7:13 ` [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-21 15:30   ` Petr Mladek
  2022-09-20  7:13 ` [PATCH v4 6/8] livepatch: Use kallsyms_on_each_match_symbol() to improve performance Zhen Lei
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

The function kallsyms_on_each_symbol() traverses all symbols and submits
each of them to the hook 'fn' for judgment and processing. In some
cases, however, the hook only acts on the matched symbol; livepatch is
one such case.

For those cases, we can first compress the name being looked up and then
use it for comparison while traversing 'kallsyms_names'. This greatly
reduces the time consumed by the traversal.

The pseudo code of the test case is as follows:
static int tst_find(void *data, const char *name,
		    struct module *mod, unsigned long addr)
{
	if (strcmp(name, "vmap") == 0)
		*(unsigned long *)data = addr;
	return 0;
}

static int tst_match(void *data, unsigned long addr)
{
	*(unsigned long *)data = addr;
	return 0;
}

start = sched_clock();
kallsyms_on_each_match_symbol(tst_match, "vmap", &addr);
end = sched_clock();

start = sched_clock();
kallsyms_on_each_symbol(tst_find, &addr);
end = sched_clock();

The test results of two runs are as follows:
kallsyms_on_each_match_symbol:   557400,   583900
kallsyms_on_each_symbol      : 16659500, 16113950

kallsyms_on_each_match_symbol() consumes only 3.48% of
kallsyms_on_each_symbol()'s time.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kallsyms.h |  8 ++++++++
 kernel/kallsyms.c        | 25 +++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index ad39636e0c3f122..f9f2cc084cab16b 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -69,6 +69,8 @@ static inline void *dereference_symbol_descriptor(void *ptr)
 int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 				      unsigned long),
 			    void *data);
+extern int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
+					 const char *name, void *data);
 
 /* Lookup the address for a symbol. Returns 0 if not found. */
 unsigned long kallsyms_lookup_name(const char *name);
@@ -168,6 +170,12 @@ static inline int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct
 {
 	return -EOPNOTSUPP;
 }
+
+static inline int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
+						const char *name, void *data)
+{
+	return -EOPNOTSUPP;
+}
 #endif /*CONFIG_KALLSYMS*/
 
 static inline void print_ip_sym(const char *loglvl, unsigned long ip)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 2d76196cfe89f34..cbcc9c560f5c188 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -305,6 +305,31 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 	return 0;
 }
 
+int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
+				  const char *name, void *data)
+{
+	unsigned int i, off;
+	int len, ret;
+	char namebuf[KSYM_NAME_LEN];
+
+	len = kallsyms_name_to_tokens(name, namebuf);
+	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
+		if ((i & 0xfff) == 0)
+			cond_resched();
+
+		if ((kallsyms_names[off] == len + 1) &&
+		    !memcmp(&kallsyms_names[off + 2], namebuf, len)) {
+			ret = fn(data, kallsyms_sym_address(i));
+			if (ret != 0)
+				return ret;
+			cond_resched();
+		}
+		off += kallsyms_names[off] + 1;
+	}
+
+	return 0;
+}
+
 static unsigned long get_symbol_pos(unsigned long addr,
 				    unsigned long *symbolsize,
 				    unsigned long *offset)
-- 
2.25.1



* [PATCH v4 6/8] livepatch: Use kallsyms_on_each_match_symbol() to improve performance
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
                   ` (4 preceding siblings ...)
  2022-09-20  7:13 ` [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 7/8] livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 8/8] kallsyms: Add self-test facility Zhen Lei
  7 siblings, 0 replies; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Based on the test results of kallsyms_on_each_match_symbol() and
kallsyms_on_each_symbol(), the average performance can be improved by 20
to 30 times.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/livepatch/core.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 42f7e716d56bf72..31b57ccf908017e 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -153,6 +153,24 @@ static int klp_find_callback(void *data, const char *name,
 	return 0;
 }
 
+static int klp_match_callback(void *data, unsigned long addr)
+{
+	struct klp_find_arg *args = data;
+
+	args->addr = addr;
+	args->count++;
+
+	/*
+	 * Finish the search when the symbol is found for the desired position
+	 * or the position is not defined for a non-unique symbol.
+	 */
+	if ((args->pos && (args->count == args->pos)) ||
+	    (!args->pos && (args->count > 1)))
+		return 1;
+
+	return 0;
+}
+
 static int klp_find_object_symbol(const char *objname, const char *name,
 				  unsigned long sympos, unsigned long *addr)
 {
@@ -167,7 +185,7 @@ static int klp_find_object_symbol(const char *objname, const char *name,
 	if (objname)
 		module_kallsyms_on_each_symbol(klp_find_callback, &args);
 	else
-		kallsyms_on_each_symbol(klp_find_callback, &args);
+		kallsyms_on_each_match_symbol(klp_match_callback, name, &args);
 
 	/*
 	 * Ensure an address was found. If sympos is 0, ensure symbol is unique;
-- 
2.25.1



* [PATCH v4 7/8] livepatch: Improve the search performance of module_kallsyms_on_each_symbol()
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
                   ` (5 preceding siblings ...)
  2022-09-20  7:13 ` [PATCH v4 6/8] livepatch: Use kallsyms_on_each_match_symbol() to improve performance Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  2022-09-20  7:13 ` [PATCH v4 8/8] kallsyms: Add self-test facility Zhen Lei
  7 siblings, 0 replies; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Currently we traverse all symbols of all modules to find the specified
function in the specified module. In reality, we only need to find the
given module and then traverse all of its symbols.

To achieve this, the call to hook 'fn' is split into two phases:
1. Find the given module. A non-NULL 'mod' pointer is passed, and the
   hook directly returns the comparison result of the module name,
   without comparing function names.
2. Find the given function in that module. 'mod = NULL' is passed, and
   the hook skips the module name comparison and directly compares
   function names.

Phase1: mod1-->mod2..(subsequent modules do not need to be compared)
                |
Phase2:          -->f1-->f2-->f3

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/livepatch/core.c  |  7 ++-----
 kernel/module/kallsyms.c | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 31b57ccf908017e..98e23137e4133bc 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -130,15 +130,12 @@ static int klp_find_callback(void *data, const char *name,
 {
 	struct klp_find_arg *args = data;
 
-	if ((mod && !args->objname) || (!mod && args->objname))
-		return 0;
+	if (mod)
+		return strcmp(args->objname, mod->name);
 
 	if (strcmp(args->name, name))
 		return 0;
 
-	if (args->objname && strcmp(args->objname, mod->name))
-		return 0;
-
 	args->addr = addr;
 	args->count++;
 
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index f5c5c9175333df7..b033613e6c7e3bb 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -510,6 +510,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
 
+		/* check mod->name first */
+		ret = fn(data, NULL, mod, 0);
+		if (ret)
+			continue;
+
 		/* Use rcu_dereference_sched() to remain compliant with the sparse tool */
 		preempt_disable();
 		kallsyms = rcu_dereference_sched(mod->kallsyms);
@@ -522,10 +527,16 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
 				continue;
 
 			ret = fn(data, kallsyms_symbol_name(kallsyms, i),
-				 mod, kallsyms_symbol_value(sym));
+				 NULL, kallsyms_symbol_value(sym));
 			if (ret != 0)
 				goto out;
 		}
+
+		/*
+		 * The given module is found, the subsequent modules do not
+		 * need to be compared.
+		 */
+		break;
 	}
 out:
 	mutex_unlock(&module_mutex);
-- 
2.25.1



* [PATCH v4 8/8] kallsyms: Add self-test facility
  2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
                   ` (6 preceding siblings ...)
  2022-09-20  7:13 ` [PATCH v4 7/8] livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Zhen Lei
@ 2022-09-20  7:13 ` Zhen Lei
  7 siblings, 0 replies; 25+ messages in thread
From: Zhen Lei @ 2022-09-20  7:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, live-patching, linux-kernel, Masahiro Yamada,
	Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
	Luis Chamberlain, linux-modules
  Cc: Zhen Lei

Add test cases for the basic functions and performance of the functions
kallsyms_lookup_name(), kallsyms_on_each_symbol() and
kallsyms_on_each_match_symbol(). Also calculate the compression ratio of
the kallsyms compression algorithm for the current symbol set.

The basic function test first checks a set of symbols whose address
values are known. Then it traverses all symbol addresses and looks up
the corresponding symbol name for each address. It is impossible to know
in advance whether these addresses are correct, but the three functions
above can be used together with the addresses to cross-check each other.
Because the traversal in kallsyms_on_each_symbol() is slow and only
about 60 symbols can be tested per second, it is exercised on average
once every 128 symbols; the other two functions validate all symbols.

If the basic function test passes, only the performance test results are
printed. If it fails, the error information is printed and the
subsequent performance tests are skipped.

The self-test starts automatically after system startup if
CONFIG_KALLSYMS_SELFTEST=y.

Example of output content: (prefix 'kallsyms_selftest:' is omitted)
 start
  ---------------------------------------------------------
 | nr_symbols | compressed size | original size | ratio(%) |
 |---------------------------------------------------------|
 |     174099 |       1960154   |      3750756  |  52.26   |
  ---------------------------------------------------------
 kallsyms_lookup_name() looked up 174099 symbols
 The time spent on each symbol is (ns): min=5250, max=726560, avg=302132
 kallsyms_on_each_symbol() traverse all: 16659500 ns
 kallsyms_on_each_match_symbol() traverse all: 557400 ns
 finish

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 include/linux/kallsyms.h   |   1 +
 init/Kconfig               |  13 ++
 kernel/Makefile            |   1 +
 kernel/kallsyms.c          |   2 +-
 kernel/kallsyms_selftest.c | 423 +++++++++++++++++++++++++++++++++++++
 5 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100644 kernel/kallsyms_selftest.c

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index f9f2cc084cab16b..b46e1d8b0c18316 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -66,6 +66,7 @@ static inline void *dereference_symbol_descriptor(void *ptr)
 }
 
 #ifdef CONFIG_KALLSYMS
+unsigned long kallsyms_sym_address(int idx);
 int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 				      unsigned long),
 			    void *data);
diff --git a/init/Kconfig b/init/Kconfig
index 532362fcfe31fd3..60193fd185fb6e6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1716,6 +1716,19 @@ config KALLSYMS
 	  symbolic stack backtraces. This increases the size of the kernel
 	  somewhat, as all symbols have to be loaded into the kernel image.
 
+config KALLSYMS_SELFTEST
+	bool "Test the basic functions and performance of kallsyms"
+	depends on KALLSYMS
+	default n
+	help
+	  Test the basic functions and performance of some interfaces, such as
+	  kallsyms_lookup_name. It also calculates the compression rate of the
+	  kallsyms compression algorithm for the current symbol set.
+
+	  Start self-test automatically after system startup. Suggest executing
+	  "dmesg | grep kallsyms_selftest" to collect test results. "finish" is
+	  displayed in the last line, indicating that the test is complete.
+
 config KALLSYMS_ALL
 	bool "Include all symbols in kallsyms"
 	depends on DEBUG_KERNEL && KALLSYMS
diff --git a/kernel/Makefile b/kernel/Makefile
index 318789c728d3290..122a5fed457bd98 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -68,6 +68,7 @@ endif
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signature.o
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
+obj-$(CONFIG_KALLSYMS_SELFTEST) += kallsyms_selftest.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_CRASH_CORE) += crash_core.o
 obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index cbcc9c560f5c188..34e306eecbb60c8 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -193,7 +193,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
 	return name - kallsyms_names;
 }
 
-static unsigned long kallsyms_sym_address(int idx)
+unsigned long kallsyms_sym_address(int idx)
 {
 	if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
 		return kallsyms_addresses[idx];
diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
new file mode 100644
index 000000000000000..cf9751507e4337e
--- /dev/null
+++ b/kernel/kallsyms_selftest.c
@@ -0,0 +1,423 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Test the function and performance of kallsyms
+ *
+ * Copyright (C) Huawei Technologies Co., Ltd., 2022
+ *
+ * Authors: Zhen Lei <thunder.leizhen@huawei.com> Huawei
+ */
+
+#define pr_fmt(fmt) "kallsyms_selftest: " fmt
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kallsyms.h>
+#include <linux/random.h>
+#include <linux/sched/clock.h>
+#include <linux/kthread.h>
+#include <linux/vmalloc.h>
+
+#include "kallsyms_internal.h"
+
+
+#define MAX_NUM_OF_RECORDS		64
+
+struct test_stat {
+	int min;
+	int max;
+	int save_cnt;
+	int real_cnt;
+	u64 sum;
+	char *name;
+	unsigned long addr;
+	unsigned long addrs[MAX_NUM_OF_RECORDS];
+};
+
+struct test_item {
+	char *name;
+	unsigned long addr;
+};
+
+#define ITEM_FUNC(s)				\
+	{					\
+		.name = #s,			\
+		.addr = (unsigned long)s,	\
+	}
+
+#define ITEM_DATA(s)				\
+	{					\
+		.name = #s,			\
+		.addr = (unsigned long)&s,	\
+	}
+
+static int test_var_bss_static;
+static int test_var_data_static = 1;
+int test_var_bss;
+int test_var_data = 1;
+
+static int test_func_static(void)
+{
+	test_var_bss_static++;
+	test_var_data_static++;
+
+	return 0;
+}
+
+int test_func(void)
+{
+	return test_func_static();
+}
+
+__weak int test_func_weak(void)
+{
+	test_var_bss++;
+	test_var_data++;
+	return 0;
+}
+
+static struct test_item test_items[] = {
+	ITEM_FUNC(test_func_static),
+	ITEM_FUNC(test_func),
+	ITEM_FUNC(test_func_weak),
+	ITEM_FUNC(vmalloc),
+	ITEM_FUNC(vfree),
+#ifdef CONFIG_KALLSYMS_ALL
+	ITEM_DATA(test_var_bss_static),
+	ITEM_DATA(test_var_data_static),
+	ITEM_DATA(test_var_bss),
+	ITEM_DATA(test_var_data),
+	ITEM_DATA(vmap_area_list),
+#endif
+};
+
+static char stub_name[KSYM_NAME_LEN];
+
+static int stat_symbol_len(void *data, const char *name,
+			   struct module *mod, unsigned long addr)
+{
+	*(u32 *)data += strlen(name);
+
+	return 0;
+}
+
+static void test_kallsyms_compression_ratio(void)
+{
+	int i;
+	const u8 *name;
+	u32 pos;
+	u32 ratio, total_size, total_len = 0;
+
+	kallsyms_on_each_symbol(stat_symbol_len, &total_len);
+
+	/*
+	 * A symbol name cannot start with a number. This stub name helps us
+	 * traverse the entire symbol table without finding a match. It's used
+	 * for subsequent performance tests, and its length is the average
+	 * length of all symbol names.
+	 */
+	memset(stub_name, '4', sizeof(stub_name));
+	pos = total_len / kallsyms_num_syms;
+	stub_name[pos] = 0;
+
+	pos = kallsyms_num_syms - 1;
+	name = &kallsyms_names[kallsyms_markers[pos >> 8]];
+	for (i = 0; i <= (pos & 0xff); i++)
+		name = name + (*name) + 1;
+
+	/*
+	 * 1. The length fields are not counted.
+	 * 2. The memory occupied by the arrays kallsyms_token_table[] and
+	 *    kallsyms_token_index[] needs to be counted.
+	 */
+	total_size = (name - kallsyms_names) - kallsyms_num_syms;
+	pos = kallsyms_token_index[0xff];
+	total_size += pos + strlen(&kallsyms_token_table[pos]) + 1;
+	total_size += 0x100 * sizeof(u16);
+
+	pr_info(" ---------------------------------------------------------\n");
+	pr_info("| nr_symbols | compressed size | original size | ratio(%%) |\n");
+	pr_info("|---------------------------------------------------------|\n");
+	ratio = 10000ULL * total_size / total_len;
+	pr_info("| %10d |    %10d   |   %10d  |  %2d.%-2d   |\n",
+		kallsyms_num_syms, total_size, total_len, ratio / 100, ratio % 100);
+	pr_info(" ---------------------------------------------------------\n");
+}
+
+static int lookup_name(void *data, const char *name, struct module *mod, unsigned long addr)
+{
+	u64 t0, t1, t;
+	unsigned long flags;
+	struct test_stat *stat = (struct test_stat *)data;
+
+	local_irq_save(flags);
+	t0 = sched_clock();
+	(void)kallsyms_lookup_name(name);
+	t1 = sched_clock();
+	local_irq_restore(flags);
+
+	t = t1 - t0;
+	if (t < stat->min)
+		stat->min = t;
+
+	if (t > stat->max)
+		stat->max = t;
+
+	stat->real_cnt++;
+	stat->sum += t;
+
+	return 0;
+}
+
+static void test_perf_kallsyms_lookup_name(void)
+{
+	struct test_stat stat;
+
+	memset(&stat, 0, sizeof(stat));
+	stat.min = INT_MAX;
+	kallsyms_on_each_symbol(lookup_name, &stat);
+	pr_info("kallsyms_lookup_name() looked up %d symbols\n", stat.real_cnt);
+	pr_info("The time spent on each symbol is (ns): min=%d, max=%d, avg=%lld\n",
+		stat.min, stat.max, stat.sum / stat.real_cnt);
+}
+
+static int find_symbol(void *data, const char *name,
+		       struct module *mod, unsigned long addr)
+{
+	struct test_stat *stat = (struct test_stat *)data;
+
+	if (strcmp(name, stat->name) == 0) {
+		stat->real_cnt++;
+		stat->addr = addr;
+
+		if (stat->save_cnt < MAX_NUM_OF_RECORDS) {
+			stat->addrs[stat->save_cnt] = addr;
+			stat->save_cnt++;
+		}
+
+		if (stat->real_cnt == stat->max)
+			return 1;
+	}
+
+	return 0;
+}
+
+static void test_perf_kallsyms_on_each_symbol(void)
+{
+	u64 t0, t1;
+	unsigned long flags;
+	struct test_stat stat;
+
+	memset(&stat, 0, sizeof(stat));
+	stat.max = INT_MAX;
+	stat.name = stub_name;
+	local_irq_save(flags);
+	t0 = sched_clock();
+	kallsyms_on_each_symbol(find_symbol, &stat);
+	t1 = sched_clock();
+	local_irq_restore(flags);
+	pr_info("kallsyms_on_each_symbol() traverse all: %lld ns\n", t1 - t0);
+}
+
+static int match_symbol(void *data, unsigned long addr)
+{
+	struct test_stat *stat = (struct test_stat *)data;
+
+	stat->real_cnt++;
+	stat->addr = addr;
+
+	if (stat->save_cnt < MAX_NUM_OF_RECORDS) {
+		stat->addrs[stat->save_cnt] = addr;
+		stat->save_cnt++;
+	}
+
+	if (stat->real_cnt == stat->max)
+		return 1;
+
+	return 0;
+}
+
+static void test_perf_kallsyms_on_each_match_symbol(void)
+{
+	u64 t0, t1;
+	unsigned long flags;
+	struct test_stat stat;
+
+	memset(&stat, 0, sizeof(stat));
+	stat.max = INT_MAX;
+	stat.name = stub_name;
+	local_irq_save(flags);
+	t0 = sched_clock();
+	kallsyms_on_each_match_symbol(match_symbol, stat.name, &stat);
+	t1 = sched_clock();
+	local_irq_restore(flags);
+	pr_info("kallsyms_on_each_match_symbol() traverse all: %lld ns\n", t1 - t0);
+}
+
+static int test_kallsyms_basic_function(void)
+{
+	int i, j, ret;
+	int next = 0, nr_failed = 0;
+	char *prefix;
+	unsigned short rand;
+	unsigned long addr;
+	char namebuf[KSYM_NAME_LEN];
+	struct test_stat stat, stat1, stat2;
+
+	prefix = "kallsyms_lookup_name() for";
+	for (i = 0; i < ARRAY_SIZE(test_items); i++) {
+		addr = kallsyms_lookup_name(test_items[i].name);
+		if (addr != test_items[i].addr) {
+			nr_failed++;
+			pr_info("%s %s failed: addr=%lx, expect %lx\n",
+				prefix, test_items[i].name, addr, test_items[i].addr);
+		}
+	}
+
+	prefix = "kallsyms_on_each_symbol() for";
+	for (i = 0; i < ARRAY_SIZE(test_items); i++) {
+		memset(&stat, 0, sizeof(stat));
+		stat.max = INT_MAX;
+		stat.name = test_items[i].name;
+		kallsyms_on_each_symbol(find_symbol, &stat);
+		if (stat.addr != test_items[i].addr || stat.real_cnt != 1) {
+			nr_failed++;
+			pr_info("%s %s failed: count=%d, addr=%lx, expect %lx\n",
+				prefix, test_items[i].name,
+				stat.real_cnt, stat.addr, test_items[i].addr);
+		}
+	}
+
+	prefix = "kallsyms_on_each_match_symbol() for";
+	for (i = 0; i < ARRAY_SIZE(test_items); i++) {
+		memset(&stat, 0, sizeof(stat));
+		stat.max = INT_MAX;
+		stat.name = test_items[i].name;
+		kallsyms_on_each_match_symbol(match_symbol, test_items[i].name, &stat);
+		if (stat.addr != test_items[i].addr || stat.real_cnt != 1) {
+			nr_failed++;
+			pr_info("%s %s failed: count=%d, addr=%lx, expect %lx\n",
+				prefix, test_items[i].name,
+				stat.real_cnt, stat.addr, test_items[i].addr);
+		}
+	}
+
+	if (nr_failed)
+		return -EFAULT;
+
+	for (i = 0; i < kallsyms_num_syms; i++) {
+		addr = kallsyms_sym_address(i);
+		if (!is_ksym_addr(addr))
+			continue;
+
+		ret = lookup_symbol_name(addr, namebuf);
+		if (unlikely(ret)) {
+			namebuf[0] = 0;
+			goto failed;
+		}
+
+		stat.addr = kallsyms_lookup_name(namebuf);
+
+		memset(&stat1, 0, sizeof(stat1));
+		stat1.max = INT_MAX;
+		kallsyms_on_each_match_symbol(match_symbol, namebuf, &stat1);
+
+		/*
+		 * kallsyms_on_each_symbol() is too slow, so randomly select
+		 * some symbols for testing.
+		 */
+		if (i >= next) {
+			memset(&stat2, 0, sizeof(stat2));
+			stat2.max = INT_MAX;
+			stat2.name = namebuf;
+			kallsyms_on_each_symbol(find_symbol, &stat2);
+
+			/*
+			 * kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol()
+			 * must produce the same traversal result.
+			 */
+			if (stat1.addr != stat2.addr ||
+			    stat1.real_cnt != stat2.real_cnt ||
+			    memcmp(stat1.addrs, stat2.addrs,
+				   stat1.save_cnt * sizeof(stat1.addrs[0])))
+				goto failed;
+
+			/*
+			 * The random increments average 128, that is, one symbol
+			 * out of every 128 is tested on average.
+			 */
+			get_random_bytes(&rand, sizeof(rand));
+			next = i + (rand & 0xff) + 1;
+		}
+
+		/* The symbol must be found at least once */
+		if (!stat1.real_cnt)
+			goto failed;
+
+		/*
+		 * kallsyms_lookup_name() returns the address of the first
+		 * symbol found and cannot be NULL.
+		 */
+		if (!stat.addr || stat.addr != stat1.addrs[0])
+			goto failed;
+
+		/*
+		 * If the addresses of all matching symbols are recorded, the
+		 * target address must be among them.
+		 */
+		if (stat1.real_cnt <= MAX_NUM_OF_RECORDS) {
+			for (j = 0; j < stat1.save_cnt; j++) {
+				if (stat1.addrs[j] == addr)
+					break;
+			}
+
+			if (j == stat1.save_cnt)
+				goto failed;
+		}
+	}
+
+	return 0;
+
+failed:
+	pr_info("Test for %dth symbol failed: (%s) addr=%lx\n", i, namebuf, addr);
+	return -EFAULT;
+}
+
+static int test_entry(void *p)
+{
+	int ret;
+
+	do {
+		schedule_timeout(5 * HZ);
+	} while (system_state != SYSTEM_RUNNING);
+
+	pr_info("start\n");
+	ret = test_kallsyms_basic_function();
+	if (ret) {
+		pr_info("abort\n");
+		return ret;
+	}
+
+	test_kallsyms_compression_ratio();
+	test_perf_kallsyms_lookup_name();
+	test_perf_kallsyms_on_each_symbol();
+	test_perf_kallsyms_on_each_match_symbol();
+	pr_info("finish\n");
+
+	return 0;
+}
+
+static int __init kallsyms_test_init(void)
+{
+	struct task_struct *t;
+
+	t = kthread_create(test_entry, NULL, "kallsyms_test");
+	if (IS_ERR(t)) {
+		pr_info("Failed to create kallsyms selftest task\n");
+		return PTR_ERR(t);
+	}
+	kthread_bind(t, 0);
+	wake_up_process(t);
+
+	return 0;
+}
+late_initcall(kallsyms_test_init);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table()
  2022-09-20  7:13 ` [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
@ 2022-09-21  7:47   ` Petr Mladek
  0 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2022-09-21  7:47 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Tue 2022-09-20 15:13:10, Zhen Lei wrote:
> Except for the function build_initial_tok_table(), no token abbreviation
> is used elsewhere.
> 
> $ cat scripts/kallsyms.c | grep tok | wc -l
> 33
> $ cat scripts/kallsyms.c | grep token | wc -l
> 31
> 
> Here, it would be clearer to use the full name.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed
  2022-09-20  7:13 ` [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed Zhen Lei
@ 2022-09-21  8:00   ` Petr Mladek
  2022-09-21  8:31     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2022-09-21  8:00 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
> For a symbol, there may be more than one place that can be merged. For
> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
> And we're only compressing the first combination at the moment.

Really?

> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>  	unsigned char *p1, *p2;
>  
>  	for (i = 0; i < table_cnt; i++) {
> -
> +retry:
>  		len = table[i]->len;
>  		p1 = table[i]->sym;
>  
> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>  
>  		/* increase the counts for this symbol's new tokens */
>  		learn_symbol(table[i]->sym, len);
> +
> +		/* May be more than one place that can be merged, try again */
> +		goto retry;
>  	}
>  }

My understanding is that the code already tries to find the same
token several times. Here are the important parts of the existing
code:

static void compress_symbols(const unsigned char *str, int idx)
{

		p2 = find_token(p1, len, str);

		do {
			/* replace the found token with idx */
			*p2 = idx;
			[...]

			/* find the token on the symbol */
			p2 = find_token(p1, size, str);

		} while (p2);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed
  2022-09-21  8:00   ` Petr Mladek
@ 2022-09-21  8:31     ` Leizhen (ThunderTown)
  2022-09-21 12:46       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-21  8:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/21 16:00, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
>> For a symbol, there may be more than one place that can be merged. For
>> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
>> And we're only compressing the first combination at the moment.
> 
> Really?

Yes, there are about 200 such functions.

> 
>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
>> --- a/scripts/kallsyms.c
>> +++ b/scripts/kallsyms.c
>> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>>  	unsigned char *p1, *p2;
>>  
>>  	for (i = 0; i < table_cnt; i++) {
>> -
>> +retry:
>>  		len = table[i]->len;
>>  		p1 = table[i]->sym;
>>  
>> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>>  
>>  		/* increase the counts for this symbol's new tokens */
>>  		learn_symbol(table[i]->sym, len);
>> +
>> +		/* May be more than one place that can be merged, try again */
>> +		goto retry;
>>  	}
>>  }
> 
> My understanding is that the code already tries to find the same
> token several times. Here are the important parts of the existing
> code:
> 
> static void compress_symbols(const unsigned char *str, int idx)
> {
> 
> 		p2 = find_token(p1, len, str);
> 
> 		do {
> 			/* replace the found token with idx */
> 			*p2 = idx;
> 			[...]
> 
> 			/* find the token on the symbol */
> 			p2 = find_token(p1, size, str);

Oh, yes, it retries. Let me reanalyze it. However, the problem is
real, so there may be an issue somewhere in the loop.

> 
> 		} while (p2);
> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types
  2022-09-20  7:13 ` [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types Zhen Lei
@ 2022-09-21  9:00   ` Petr Mladek
  2022-09-21 13:13     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2022-09-21  9:00 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Tue 2022-09-20 15:13:12, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one, and then use the expanded string for
> comparison. Because we do not know the symbol type, and the symbol type
> may be combined with the following characters to form a token.
>
> So if we don't compress the symbol type, we can first compress the
> searched symbol and then make a quick comparison based on the compressed
> length and content. In this way, for entries with mismatched lengths,
> there is no need to expand and compare strings. And for those matching
> lengths, there's no need to expand the symbol. This saves a lot of time.
> According to my test results, the average performance of
> kallsyms_lookup_name() can be improved by 20 to 30 times.
>
> Of course, because the symbol type is forcibly not compressed, the
> compression rate also decreases. Here are the test results with
> defconfig:
> 
> arm64: <<<<<<
>         ---------------------------------------------------------------
>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>         -----|---------------------------------------------------------|
> Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
> After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
> Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
> After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
>         ---------------------------------------------------------------
> The memory overhead is increased by:
>   73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
>   19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
> 
> x86: <<<<<<<<
>        ---------------------------------------------------------------
>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>         -----|---------------------------------------------------------|
> Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
> After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
> Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
> After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
>         ---------------------------------------------------------------
> The memory overhead is increased by:
>   49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
>   16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
>
> This additional memory overhead is worth it compared to the performance
> improvement, I think.

I agree. The speedup mentioned in the followup patches looks big.
I just suggest to do this change a cleaner way, see below.

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  scripts/kallsyms.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -61,6 +61,15 @@ static int all_symbols;
>  static int absolute_percpu;
>  static int base_relative;
>  
> +/*
> + * Each entry in the symbol table consists of the symbol type and the symbol
> + * itself. To optimize the performance of finding or traversing symbols in
> + * kernel, do not compress the symbol type. In this way, when looking for a
> + * symbol of unknown type, we can first compress the searched symbol and then
> + * make a quick comparison based on the compressed length and content.
> + */
> +static int sym_start_idx = 1;
> +
>  static int token_profit[0x10000];
>  
>  /* the table that holds the result of the compression */
> @@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
>  {
>  	int i;
>  
> -	for (i = 0; i < len - 1; i++)
> +	for (i = sym_start_idx; i < len - 1; i++)

It creates yet another twist in scripts/kallsyms.c. read_symbol()
explicitly adds the type as the first character so that it can be
compressed. And this patch adds a hack to skip it.

Let's do it a clean way and store the type separately:

struct sym_entry {
	unsigned long long addr;
	unsigned int len;
	unsigned int start_pos;
	unsigned int percpu_absolute;
	unsigned char type;
	unsigned char name[];
};

static struct sym_entry *read_symbol(FILE *in)
{
[...]
	name_len = strlen(name);

	sym = malloc(sizeof(*sym) + name_len);
	if (!sym) {
		fprintf(stderr, "kallsyms failure: "
			"unable to allocate required amount of memory\n");
		exit(EXIT_FAILURE);
	}
	sym->addr = addr;
	sym->len = name_len;
	sym->type = type;
	strcpy(sym->name, name);
	sym->percpu_absolute = 0;
}

It would allow to remove the tricky:

	static char *sym_name(const struct sym_entry *s)
	{
		return (char *)s->sym + 1;
	}

and access s->name directly.

OK, the problem is how to store the type. The clean way would be
to put it into a separate section, for example:

static void write_src(void)
{
[...]
	output_label("kallsyms_types");
	off = 0;
	for (i = 0; i < table_cnt; i++) {
		printf("\t.byte 0x%02x\n", table[i]->type);
	}
	printf("\n");
[...]
}

It would probably increase the size even more. Another problem
is that it would need changes in the crash dump tools, see:

static int __init crash_save_vmcoreinfo_init(void)
{
[...]
	VMCOREINFO_SYMBOL(kallsyms_names);
[...]
}

A solution would be to store it the old way:

static void write_src(void)
{
[...]
	output_label("kallsyms_names");
	off = 0;
	for (i = 0; i < table_cnt; i++) {
		if ((i & 0xFF) == 0)
			markers[i >> 8] = off;

		/*
		 * Store the symbol type together with the symbol name.
		 * It helps to reduce the size.
		 */
		printf("\t.byte 0x%02x", table[i]->len + 1);
		printf(", 0x%02x", table[i]->type);
		for (k = 0; k < table[i]->len; k++)
			printf(", 0x%02x", table[i]->sym[k]);
		printf("\n");

		/* symbol name length + type + "\n" */
		off += table[i]->len + 2;
	}
	printf("\n");
[...]
}

The result would be the same as with your patch. But the code would be
even cleaner than before.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed
  2022-09-21  8:31     ` Leizhen (ThunderTown)
@ 2022-09-21 12:46       ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-21 12:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/21 16:31, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 16:00, Petr Mladek wrote:
>> On Tue 2022-09-20 15:13:11, Zhen Lei wrote:
>>> For a symbol, there may be more than one place that can be merged. For
>>> example: nfs_fs_proc_net_init, there are two "f"+"s_" combinations.
>>> And we're only compressing the first combination at the moment.
>>
>> Really?
> 
> Yes, there are about 200 such functions.
> 
>>
>>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>>> index 8caccc8f4a23703..3319d9f38d7a5f2 100644
>>> --- a/scripts/kallsyms.c
>>> +++ b/scripts/kallsyms.c
>>> @@ -553,7 +553,7 @@ static void compress_symbols(const unsigned char *str, int idx)
>>>  	unsigned char *p1, *p2;
>>>  
>>>  	for (i = 0; i < table_cnt; i++) {
>>> -
>>> +retry:
>>>  		len = table[i]->len;
>>>  		p1 = table[i]->sym;
>>>  
>>> @@ -585,6 +585,9 @@ static void compress_symbols(const unsigned char *str, int idx)
>>>  
>>>  		/* increase the counts for this symbol's new tokens */
>>>  		learn_symbol(table[i]->sym, len);
>>> +
>>> +		/* May be more than one place that can be merged, try again */
>>> +		goto retry;
>>>  	}
>>>  }
>>
>> My understanding is that the code already tries to find the same
>> token several times. Here are the important parts of the existing
>> code:
>>
>> static void compress_symbols(const unsigned char *str, int idx)
>> {
>>
>> 		p2 = find_token(p1, len, str);
>>
>> 		do {
>> 			/* replace the found token with idx */
>> 			*p2 = idx;
>> 			[...]
>>
>> 			/* find the token on the symbol */
>> 			p2 = find_token(p1, size, str);
> 
> Oh, yes, it retries. Let me reanalyze it. However, the problem is
> real, and there may be a problem somewhere in the loop.

Hi, Petr:
  Thanks. I found that it's my fault. The first round skips the type
character. But the next round incorrectly skips one more character,
so for nfs_fs_proc_net_init the next round starts from the second 's',
and using the proposed "unsigned char type" in your next reply should
solve the problem. Thank you very much.

-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)

> 
>>
>> 		} while (p2);
>>
>> Best Regards,
>> Petr
>> .
>>
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types
  2022-09-21  9:00   ` Petr Mladek
@ 2022-09-21 13:13     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-21 13:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/21 17:00, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:12, Zhen Lei wrote:
>> Currently, to search for a symbol, we need to expand the symbols in
>> 'kallsyms_names' one by one, and then use the expanded string for
>> comparison. Because we do not know the symbol type, and the symbol type
>> may be combined with the following characters to form a token.
>>
>> So if we don't compress the symbol type, we can first compress the
>> searched symbol and then make a quick comparison based on the compressed
>> length and content. In this way, for entries with mismatched lengths,
>> there is no need to expand and compare strings. And for those matching
>> lengths, there's no need to expand the symbol. This saves a lot of time.
>> According to my test results, the average performance of
>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>
>> Of course, because the symbol type is forcibly not compressed, the
>> compression rate also decreases. Here are the test results with
>> defconfig:
>>
>> arm64: <<<<<<
>>         ---------------------------------------------------------------
>>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>>         -----|---------------------------------------------------------|
>> Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
>> After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
>> Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
>> After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
>>         ---------------------------------------------------------------
>> The memory overhead is increased by:
>>   73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
>>   19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
>>
>> x86: <<<<<<<<
>>        ---------------------------------------------------------------
>>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>>         -----|---------------------------------------------------------|
>> Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
>> After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
>> Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
>> After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
>>         ---------------------------------------------------------------
>> The memory overhead is increased by:
>>   49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
>>   16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
>>
>> This additional memory overhead is worth it compared to the performance
>> improvement, I think.
> 
> I agree. The speedup mentioned in the followup patches looks big.
> I just suggest to do this change a cleaner way, see below.
> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  scripts/kallsyms.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
>> --- a/scripts/kallsyms.c
>> +++ b/scripts/kallsyms.c
>> @@ -61,6 +61,15 @@ static int all_symbols;
>>  static int absolute_percpu;
>>  static int base_relative;
>>  
>> +/*
>> + * Each entry in the symbol table consists of the symbol type and the symbol
>> + * itself. To optimize the performance of finding or traversing symbols in
>> + * kernel, do not compress the symbol type. In this way, when looking for a
>> + * symbol of unknown type, we can first compress the searched symbol and then
>> + * make a quick comparison based on the compressed length and content.
>> + */
>> +static int sym_start_idx = 1;
>> +
>>  static int token_profit[0x10000];
>>  
>>  /* the table that holds the result of the compression */
>> @@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
>>  {
>>  	int i;
>>  
>> -	for (i = 0; i < len - 1; i++)
>> +	for (i = sym_start_idx; i < len - 1; i++)
> 
> It creates yet another twist in scripts/kallsyms.c. read_symbol()
> explicitly adds the type as the first character so that it can be
> compressed. And this patch adds a hack to skip it.
> 
> Let's do it a clean way and store the type separately:
> 
> struct sym_entry {
> 	unsigned long long addr;
> 	unsigned int len;
> 	unsigned int start_pos;
> 	unsigned int percpu_absolute;
> 	unsigned char type;

Yes, it's very necessary. Thanks.

> 	unsigned char name[];

Yes, using "name[]" will be clearer than using "sym[]"

> };
> 
> static struct sym_entry *read_symbol(FILE *in)
> {
> [...]
> 	name_len = strlen(name);
> 
> 	sym = malloc(sizeof(*sym) + name_len);
> 	if (!sym) {
> 		fprintf(stderr, "kallsyms failure: "
> 			"unable to allocate required amount of memory\n");
> 		exit(EXIT_FAILURE);
> 	}
> 	sym->addr = addr;
> 	sym->len = name_len;
> 	sym->type = type;
> 	strcpy(sym->name, name);
> 	sym->percpu_absolute = 0;
> }
> 
> It would allow to remove the tricky:
> 
> 	static char *sym_name(const struct sym_entry *s)
> 	{
> 		return (char *)s->sym + 1;
> 	}
> 
> and access s->name directly.

OK

> 
> OK, the problem is how to store the type. The clean way would be
> to put it into a separate section, for example:
> 
> static void write_src(void)
> {
> [...]
> 	output_label("kallsyms_types");
> 	off = 0;
> 	for (i = 0; i < table_cnt; i++) {
> 		printf("\t.byte 0x%02x\n", table[i]->type);
> 	}
> 	printf("\n");
> [...]
> }
> 
> It would probably increase the size even more. Another problem
> is that it would need changes in the crash dump tools, see:
> 
> static int __init crash_save_vmcoreinfo_init(void)
> {
> [...]
> 	VMCOREINFO_SYMBOL(kallsyms_names);
> [...]
> }
> 
> A solution would be to store it the old way:

I'll use this compatibility mode first, because I'm going to compress
the type later.

> 
> static void write_src(void)
> {
> [...]
> 	output_label("kallsyms_names");
> 	off = 0;
> 	for (i = 0; i < table_cnt; i++) {
> 		if ((i & 0xFF) == 0)
> 			markers[i >> 8] = off;
> 
> 		/*
> 		 * Store the symbol type together with the symbol name.
> 		 * It helps to reduce the size.
> 		 */
> 		printf("\t.byte 0x%02x", table[i]->len + 1);
> 		printf(", 0x%02x", table[i]->type);
> 		for (k = 0; k < table[i]->len; k++)
> 			printf(", 0x%02x", table[i]->sym[k]);
> 		printf("\n");
> 
> 		/* symbol name length + type + length byte */
> 		off += table[i]->len + 2;
> 	}
> 	printf("\n");
> [...]
> }
> 
> The result would be the same as with your patch. But the code would be
> even cleaner than before.

Yes, thank you for your valuable comments.
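The entry layout agreed on above can be sketched with a toy decoder (plain C,
hypothetical function name; not the kernel's code): each entry in the
kallsyms_names stream is a length byte (1 + name length, covering the type),
the type byte, and then the compressed name bytes.

```c
/* Toy decoder for the entry layout discussed above:
 *   [length byte = 1 + name_len][type byte][name_len compressed bytes]
 * Fills type/name_len/name_off and returns the offset of the next entry. */
static unsigned int entry_decode(const unsigned char *names, unsigned int off,
				 unsigned char *type, unsigned int *name_len,
				 unsigned int *name_off)
{
	unsigned int len = names[off];	/* type byte + name bytes */

	*type = names[off + 1];
	*name_len = len - 1;
	*name_off = off + 2;
	return off + len + 1;		/* skip length byte + payload */
}
```

With this layout, advancing by `names[off] + 1` walks the whole table without
expanding any symbol.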

> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-20  7:13 ` [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
@ 2022-09-21 15:25   ` Petr Mladek
  2022-09-22  2:15     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2022-09-21 15:25 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one, and then use the expanded string for
> comparison. This process can be optimized.
> 
> And now scripts/kallsyms no longer compresses the symbol types, each
> symbol type always occupies one byte. So we can first compress the
> searched symbol and then make a quick comparison based on the compressed
> length and content. In this way, for entries with mismatched lengths,
> there is no need to expand and compare strings. And for those matching
> lengths, there's no need to expand the symbol. This saves a lot of time.
> According to my test results, the average performance of
> kallsyms_lookup_name() can be improved by 20 to 30 times.
> 
> The pseudo code of the test case is as follows:
> static int stat_find_name(...)
> {
> 	start = sched_clock();
> 	(void)kallsyms_lookup_name(name);
> 	end = sched_clock();
> 	//Update min, max, cnt, sum
> }
> 
> /*
>  * Traverse all symbols in sequence and collect statistics on the time
>  * taken by kallsyms_lookup_name() to lookup each symbol.
>  */
> kallsyms_on_each_symbol(stat_find_name, NULL);
> 
> The test results are as follows (twice):
> After : min=5250, max=  726560, avg= 302132
> After : min=5320, max=  726850, avg= 301978
> Before: min=170,  max=15949190, avg=7553906
> Before: min=160,  max=15877280, avg=7517784
> 
> The average time consumed is only 4.01% and the maximum time consumed is
> only 4.57% of the time consumed before optimization.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>  	return off;
>  }
>  
> +static int kallsyms_name_to_tokens(const char *name, char *buf)

This is not a safe API. The size of the buffer always needs to be
passed as well.

Also it should be called "compress". "token" is just an implementation
detail.

I would do:

static int kallsyms_compress_symbol_name(const char *name,
					 char *buf, size_t size)


> +{
> +	int i, j, k, n;
> +	int len, token_len;
> +	const char *token;
> +	unsigned char token_idx[KSYM_NAME_LEN];
> +	unsigned char token_bak[KSYM_NAME_LEN];

Why do we need two buffers? It should be possible to compress the name
in the same buffer, as is done in compress_symbols() in scripts/kallsyms.c.

> +
> +	/*
> +	 * n, number of tokens in the string name.
> +	 * token_idx[i], the start index of the ith token.
> +	 * token_idx[n] is used to calculate the length of the last token.
> +	 */
> +	n = strlen(name);
> +	if (n >= KSYM_NAME_LEN) {
> +		buf[0] = 0;
> +		return 0;
> +	}
> +	for (i = 0; i <= n; i++)
> +		token_idx[i] = (unsigned char)i;
> +
> +	/*
> +	 * For tokens whose token_len >= 2, a larger index value indicates
> +	 * a higher occurrence frequency. See scripts/kallsyms.c
> +	 */
> +	for (i = 255; i >= 0; i--) {
> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
> +		token_len = strlen(token);
> +		if (token_len <= 1)
> +			continue;
> +
> +		/*
> +		 * Find and merge two tokens into one.
> +		 *
> +		 *                |<-- new_token -->|
> +		 *                | token1 | token2 |
> +		 * token_idx[]:   j       j+1      j+2
> +		 */
> +		for (j = 0; j < n - 1; j++) {
> +			len = token_idx[j + 2] - token_idx[j];
> +			if (len == token_len &&
> +			    !strncmp(name + token_idx[j], token, len)) {
> +				token_bak[token_idx[j]] = (unsigned char)i;
> +				for (k = j + 1; k < n; k++)
> +					token_idx[k] = token_idx[k + 1];
> +				n--;
> +			}
> +		}
> +	}
> +
> +	for (j = 0; j < n; j++) {
> +		len = token_idx[j + 1] - token_idx[j];
> +		if (len <= 1) {
> +			buf[j] = name[token_idx[j]];
> +			continue;
> +		}
> +
> +		buf[j] = token_bak[token_idx[j]];

Maybe I do not understand the compression format correctly, but
this code looks too complicated. Honestly, I did not even try to
understand it.

My understanding is that we just need to find all the tokens and
replace them with their indexes.

It should be even easier than compress_symbols() in scripts/kallsyms.c.
The token_table already exists and we do not need to handle the token_profit...

The following looks more straightforward (not even compile tested):

	ssize_t len;

	len = strscpy(buf, symname, size);
	if (WARN_ON_ONCE(len < 0))
		return -EINVAL;

	/* the tokens with higher index are used first */
	for (idx = 255; idx >= 0; idx--) {
		token = &kallsyms_token_table[kallsyms_token_index[idx]];
		token_len = strlen(token);

		p1 = buf;
		/* length of the remaining symname including the trailing '\0' */
		remaining = len + 1;

		/* find the token in the symbol name */
		p2 = strstr(p1, token);

		while (p2) {
			/* replace token with index */
			*p2 = idx;
			remaining -= ((p2 - p1) + token_len);
			memmove(p2 + 1, p2 + token_len, remaining);
			len -= (token_len - 1);
			p1 = p2 + 1;

			/* find the token in the rest of the symbol name */
			p2 = strstr(p1, token);
		}
	}

	return len;

> +	}
> +	buf[n] = 0;
> +
> +	return n;
> +}
> +
>  /*
>   * Get symbol type information. This is encoded as a single char at the
>   * beginning of the symbol name.
> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>  	char namebuf[KSYM_NAME_LEN];
>  	unsigned long i;
>  	unsigned int off;
> +	int len;
>  
>  	/* Skip the search for empty string. */
>  	if (!*name)
>  		return 0;
>  
> +	len = kallsyms_name_to_tokens(name, namebuf);
> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> +		if (kallsyms_names[off] == len + 1 &&
> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
> +			return kallsyms_sym_address(i);
> +
> +		off += kallsyms_names[off] + 1;

These complicated checks are hard to review. The following looks much
more readable to me:

	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
		/* the stored symbol name is prefixed by symbol type */
		name_len = kallsyms_names[off] - 1;
		name = &kallsyms_names[off + 2];
		off += name_len + 2;

		if (name_len != len)
			continue;

		if (!memcmp(name, namebuf, len))
			return kallsyms_sym_address(i);
	}


> +	}
> +
>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>  
> -		if (strcmp(namebuf, name) == 0)
> -			return kallsyms_sym_address(i);
> -
>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>  			return kallsyms_sym_address(i);

Hmm, it means that the speedup is not usable when the kernel is compiled
with LLVM? It might actually slow down the search because we would need
to use both the fast and the slow search?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol()
  2022-09-20  7:13 ` [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
@ 2022-09-21 15:30   ` Petr Mladek
  2022-09-22  2:16     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 25+ messages in thread
From: Petr Mladek @ 2022-09-21 15:30 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Tue 2022-09-20 15:13:14, Zhen Lei wrote:
> Function kallsyms_on_each_symbol() traverses all symbols and submits each
> symbol to the hook 'fn' for judgment and processing. For some cases, the
> hook actually only handles the matched symbol, such as livepatch.
> 
> So we can first compress the name being looked up and then use
> it for comparison when traversing 'kallsyms_names'. This greatly reduces
> the traversal time.
> 
> The pseudo code of the test case is as follows:
> static int tst_find(void *data, const char *name,
> 		    struct module *mod, unsigned long addr)
> {
> 	if (strcmp(name, "vmap") == 0)
> 		*(unsigned long *)data = addr;
>         return 0;
> }
> 
> static int tst_match(void *data, unsigned long addr)
> {
>         *(unsigned long *)data = addr;
>         return 0;
> }
> 
> start = sched_clock();
> kallsyms_on_each_match_symbol(tst_match, "vmap", &addr);
> end = sched_clock();
> 
> start = sched_clock();
> kallsyms_on_each_symbol(tst_find, &addr);
> end = sched_clock();
> 
> The test results are as follows (twice):
> kallsyms_on_each_match_symbol:   557400,   583900
> kallsyms_on_each_symbol      : 16659500, 16113950
> 
> kallsyms_on_each_match_symbol() consumes only 3.48% of
> kallsyms_on_each_symbol()'s time.
> 
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -305,6 +305,31 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>  	return 0;
>  }
>  
> +int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
> +				  const char *name, void *data)
> +{
> +	unsigned int i, off;
> +	int len, ret;
> +	char namebuf[KSYM_NAME_LEN];
> +
> +	len = kallsyms_name_to_tokens(name, namebuf);
> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> +		if ((i & 0xfff) == 0)
> +			cond_resched();
> +
> +		if ((kallsyms_names[off] == len + 1) &&
> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len)) {
> +			ret = fn(data, kallsyms_sym_address(i));
> +			if (ret != 0)
> +				return ret;
> +			cond_resched();
> +		}
> +		off += kallsyms_names[off] + 1;

Similar tricky code is also used in kallsyms_lookup_name(). Please
avoid code duplication and put this into a helper function.

Best Regards,
Petr

> +	}
> +
> +	return 0;
> +}
> +
>  static unsigned long get_symbol_pos(unsigned long addr,
>  				    unsigned long *symbolsize,
>  				    unsigned long *offset)
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-21 15:25   ` Petr Mladek
@ 2022-09-22  2:15     ` Leizhen (ThunderTown)
  2022-09-22  7:02       ` Petr Mladek
  2022-09-22  7:14       ` Leizhen (ThunderTown)
  0 siblings, 2 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-22  2:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/21 23:25, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>> Currently, to search for a symbol, we need to expand the symbols in
>> 'kallsyms_names' one by one, and then use the expanded string for
>> comparison. This process can be optimized.
>>
>> And now scripts/kallsyms no longer compresses the symbol types, each
>> symbol type always occupies one byte. So we can first compress the
>> searched symbol and then make a quick comparison based on the compressed
>> length and content. In this way, for entries with mismatched lengths,
>> there is no need to expand and compare strings. And for those matching
>> lengths, there's no need to expand the symbol. This saves a lot of time.
>> According to my test results, the average performance of
>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>
>> The pseudo code of the test case is as follows:
>> static int stat_find_name(...)
>> {
>> 	start = sched_clock();
>> 	(void)kallsyms_lookup_name(name);
>> 	end = sched_clock();
>> 	//Update min, max, cnt, sum
>> }
>>
>> /*
>>  * Traverse all symbols in sequence and collect statistics on the time
>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>  */
>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>
>> The test results are as follows (twice):
>> After : min=5250, max=  726560, avg= 302132
>> After : min=5320, max=  726850, avg= 301978
>> Before: min=170,  max=15949190, avg=7553906
>> Before: min=160,  max=15877280, avg=7517784
>>
>> The average time consumed is only 4.01% and the maximum time consumed is
>> only 4.57% of the time consumed before optimization.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>> --- a/kernel/kallsyms.c
>> +++ b/kernel/kallsyms.c
>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>  	return off;
>>  }
>>  
>> +static int kallsyms_name_to_tokens(const char *name, char *buf)
> 
> This is not a safe API. The size of the buffer always needs to be
> passed as well.

OK

> 
> Also it should be called "compress". "token" is just an implementation
> detail.
> 
> I would do:
> 
> static int kallsyms_compress_symbol_name(const char *name,
> 					 char *buf, size_t size)

That's a wonderful name. Thanks.

> 
> 
>> +{
>> +	int i, j, k, n;
>> +	int len, token_len;
>> +	const char *token;
>> +	unsigned char token_idx[KSYM_NAME_LEN];
>> +	unsigned char token_bak[KSYM_NAME_LEN];
> 
> Why do we need two buffers? It should be possible to compress the name
> in the same buffer, as is done in compress_symbols() in scripts/kallsyms.c.

Because the performance would be a little better. Now this function takes
just over a microsecond. Currently, it takes about 250 microseconds on
average to lookup a symbol, so adding a little more time to this function
doesn't affect the overall picture. I'll modify and test it as you suggest
below.

> 
>> +
>> +	/*
>> +	 * n, number of tokens in the string name.
>> +	 * token_idx[i], the start index of the ith token.
>> +	 * token_idx[n] is used to calculate the length of the last token.
>> +	 */
>> +	n = strlen(name);
>> +	if (n >= KSYM_NAME_LEN) {
>> +		buf[0] = 0;
>> +		return 0;
>> +	}
>> +	for (i = 0; i <= n; i++)
>> +		token_idx[i] = (unsigned char)i;
>> +
>> +	/*
>> +	 * For tokens whose token_len >= 2, a larger index value indicates
>> +	 * a higher occurrence frequency. See scripts/kallsyms.c
>> +	 */
>> +	for (i = 255; i >= 0; i--) {
>> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
>> +		token_len = strlen(token);
>> +		if (token_len <= 1)
>> +			continue;
>> +
>> +		/*
>> +		 * Find and merge two tokens into one.
>> +		 *
>> +		 *                |<-- new_token -->|
>> +		 *                | token1 | token2 |
>> +		 * token_idx[]:   j       j+1      j+2
>> +		 */
>> +		for (j = 0; j < n - 1; j++) {
>> +			len = token_idx[j + 2] - token_idx[j];
>> +			if (len == token_len &&
>> +			    !strncmp(name + token_idx[j], token, len)) {
>> +				token_bak[token_idx[j]] = (unsigned char)i;
>> +				for (k = j + 1; k < n; k++)
>> +					token_idx[k] = token_idx[k + 1];
>> +				n--;
>> +			}
>> +		}
>> +	}
>> +
>> +	for (j = 0; j < n; j++) {
>> +		len = token_idx[j + 1] - token_idx[j];
>> +		if (len <= 1) {
>> +			buf[j] = name[token_idx[j]];
>> +			continue;
>> +		}
>> +
>> +		buf[j] = token_bak[token_idx[j]];
> 
> Maybe I do not understand the compression format correctly, but
> this code looks too complicated. Honestly, I did not even try to
> understand it.
> 
> My understanding is that we just need to find all the tokens and
> replace them with their indexes.
> 
> It should be even easier than compress_symbols() in scripts/kallsyms.c.
> The token_table already exists and we do not need to handle the token_profit...
> 
> The following looks more straightforward (not even compile tested):

OK, I will try this one. Or refer to compress_symbols() in scripts/kallsyms.c.
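For reference, the replace-token-with-index approach can be exercised in
userspace with a toy two-entry token table (hypothetical names `toy_tokens`
and `toy_compress`; the kernel uses the generated kallsyms_token_table and
kallsyms_token_index arrays, and token bytes there occupy indexes with the
high bit set, which is mimicked here with `0x80 + idx`):

```c
#include <string.h>

/* Toy token table: index 0 -> byte 0x80, index 1 -> byte 0x81. */
static const char *toy_tokens[] = { "ab", "cd" };

/* In-place compression sketch: replace every token occurrence with its
 * one-byte index and shift the tail of the string left. Returns the
 * compressed length. */
static int toy_compress(char *buf)
{
	int len = (int)strlen(buf);
	int idx;

	for (idx = 1; idx >= 0; idx--) {
		const char *token = toy_tokens[idx];
		int token_len = (int)strlen(token);
		char *p1 = buf;
		char *p2 = strstr(p1, token);

		while (p2) {
			/* bytes after the token, including the trailing '\0' */
			int remaining = (int)((buf + len + 1) - (p2 + token_len));

			*p2 = (char)(0x80 + idx);	/* token -> index byte */
			memmove(p2 + 1, p2 + token_len, remaining);
			len -= token_len - 1;
			p1 = p2 + 1;
			p2 = strstr(p1, token);
		}
	}
	return len;
}
```

Note the two details that are easy to get wrong in a sketch: strstr() takes
the haystack first, and after a replacement the scan must resume at `p2 + 1`,
one past the freshly written index byte.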

> 
> 	ssize_t len;
> 
> 	len = strscpy(buf, symname, size);
> 	if (WARN_ON_ONCE(len < 0))
> 		return -EINVAL;
> 
> 	/* the tokens with higher index are used first */
> 	for (idx = 255; idx >= 0; idx--) {
> 		token = &kallsyms_token_table[kallsyms_token_index[idx]];
> 		token_len = strlen(token);
> 
> 		p1 = buf;
> 		/* length of the remaining symname including the trailing '\0' */
> 		remaining = len + 1;
> 
> 		/* find the token in the symbol name */
> 		p2 = strstr(p1, token);
> 
> 		while (p2) {
> 			/* replace token with index */
> 			*p2 = idx;
> 			remaining -= ((p2 - p1) + token_len);
> 			memmove(p2 + 1, p2 + token_len, remaining);
> 			len -= (token_len - 1);
> 			p1 = p2 + 1;
> 
> 			/* find the token in the rest of the symbol name */
> 			p2 = strstr(p1, token);
> 		}
> 	}
> 
> 	return len;
> 
>> +	}
>> +	buf[n] = 0;
>> +
>> +	return n;
>> +}
>> +
>>  /*
>>   * Get symbol type information. This is encoded as a single char at the
>>   * beginning of the symbol name.
>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>  	char namebuf[KSYM_NAME_LEN];
>>  	unsigned long i;
>>  	unsigned int off;
>> +	int len;
>>  
>>  	/* Skip the search for empty string. */
>>  	if (!*name)
>>  		return 0;
>>  
>> +	len = kallsyms_name_to_tokens(name, namebuf);
>> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>> +		if (kallsyms_names[off] == len + 1 &&
>> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
>> +			return kallsyms_sym_address(i);
>> +
>> +		off += kallsyms_names[off] + 1;
> 
> These complicated checks are hard to review. The following looks much
> more readable to me:

Yes, it looks good.

> 
> 	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> 		/* the stored symbol name is prefixed by symbol type */
> 		name_len = kallsyms_names[off] - 1;
> 		name = &kallsyms_names[off + 2];
> 		off += name_len + 2;
> 
> 		if (name_len != len)
> 			continue;
> 
> 		if (!memcmp(name, namebuf, len))
> 			return kallsyms_sym_address(i);
> 	}
> 
> 
>> +	}
>> +
>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>  
>> -		if (strcmp(namebuf, name) == 0)
>> -			return kallsyms_sym_address(i);
>> -
>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>  			return kallsyms_sym_address(i);
> 
> Hmm, it means that the speedup is not usable when the kernel is compiled
> with LLVM? It might actually slow down the search because we would need
> to use both the fast and the slow search?

Theoretically, I don't think so. A string comparison was removed from the
slow search. "if (name_len != len)" is faster than
"if (strcmp(namebuf, name) == 0)". Even if they're equal,
kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
overall picture. The average lookup time before optimization is
millisecond-level. To allay your concerns, I can run a test.

Before: min=170,  max=15949190, avg=7553906

> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol()
  2022-09-21 15:30   ` Petr Mladek
@ 2022-09-22  2:16     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-22  2:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/21 23:30, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:14, Zhen Lei wrote:
>> Function kallsyms_on_each_symbol() traverses all symbols and submits each
>> symbol to the hook 'fn' for judgment and processing. For some cases, the
>> hook actually only handles the matched symbol, such as livepatch.
>>
>> So we can first compress the name being looked up and then use
>> it for comparison when traversing 'kallsyms_names'. This greatly reduces
>> the traversal time.
>>
>> The pseudo code of the test case is as follows:
>> static int tst_find(void *data, const char *name,
>> 		    struct module *mod, unsigned long addr)
>> {
>> 	if (strcmp(name, "vmap") == 0)
>> 		*(unsigned long *)data = addr;
>>         return 0;
>> }
>>
>> static int tst_match(void *data, unsigned long addr)
>> {
>>         *(unsigned long *)data = addr;
>>         return 0;
>> }
>>
>> start = sched_clock();
>> kallsyms_on_each_match_symbol(tst_match, "vmap", &addr);
>> end = sched_clock();
>>
>> start = sched_clock();
>> kallsyms_on_each_symbol(tst_find, &addr);
>> end = sched_clock();
>>
>> The test results are as follows (twice):
>> kallsyms_on_each_match_symbol:   557400,   583900
>> kallsyms_on_each_symbol      : 16659500, 16113950
>>
>> kallsyms_on_each_match_symbol() consumes only 3.48% of
>> kallsyms_on_each_symbol()'s time.
>>
>> --- a/kernel/kallsyms.c
>> +++ b/kernel/kallsyms.c
>> @@ -305,6 +305,31 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
>>  	return 0;
>>  }
>>  
>> +int kallsyms_on_each_match_symbol(int (*fn)(void *, unsigned long),
>> +				  const char *name, void *data)
>> +{
>> +	unsigned int i, off;
>> +	int len, ret;
>> +	char namebuf[KSYM_NAME_LEN];
>> +
>> +	len = kallsyms_name_to_tokens(name, namebuf);
>> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>> +		if ((i & 0xfff) == 0)
>> +			cond_resched();
>> +
>> +		if ((kallsyms_names[off] == len + 1) &&
>> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len)) {
>> +			ret = fn(data, kallsyms_sym_address(i));
>> +			if (ret != 0)
>> +				return ret;
>> +			cond_resched();
>> +		}
>> +		off += kallsyms_names[off] + 1;
> 
> Similar tricky code is also used in kallsyms_lookup_name(). Please
> avoid code duplication and put this into a helper function.

Okay, I'll try it.
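The kind of shared helper Petr asks for could look roughly like this
(hypothetical name `find_compressed`, userspace sketch over a
kallsyms_names-style byte stream; the in-kernel version would also need the
cond_resched() calls and would return addresses rather than indexes):

```c
#include <string.h>

/* Sketch of a shared lookup helper: walk a kallsyms_names-style stream
 * ([len][type][compressed name] per entry) and return the index of the
 * first entry whose compressed name matches, or -1 if there is none.
 * Both kallsyms_lookup_name() and kallsyms_on_each_match_symbol() could
 * be built on top of one such walker to avoid duplicating the tricky
 * offset arithmetic. */
static int find_compressed(const unsigned char *names, unsigned int num_syms,
			   const unsigned char *compressed, unsigned int len)
{
	unsigned int i, off;

	for (i = 0, off = 0; i < num_syms; i++) {
		/* the stored symbol name is prefixed by the symbol type */
		unsigned int name_len = names[off] - 1;
		const unsigned char *name = &names[off + 2];

		off += name_len + 2;

		if (name_len == len && !memcmp(name, compressed, len))
			return (int)i;
	}
	return -1;
}
```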

> 
> Best Regards,
> Petr
> 
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static unsigned long get_symbol_pos(unsigned long addr,
>>  				    unsigned long *symbolsize,
>>  				    unsigned long *offset)
>> -- 
>> 2.25.1
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-22  2:15     ` Leizhen (ThunderTown)
@ 2022-09-22  7:02       ` Petr Mladek
  2022-09-22  7:21         ` Leizhen (ThunderTown)
  2022-09-28  1:35         ` Leizhen (ThunderTown)
  2022-09-22  7:14       ` Leizhen (ThunderTown)
  1 sibling, 2 replies; 25+ messages in thread
From: Petr Mladek @ 2022-09-22  7:02 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 23:25, Petr Mladek wrote:
> > On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> >> Currently, to search for a symbol, we need to expand the symbols in
> >> 'kallsyms_names' one by one, and then use the expanded string for
> >> comparison. This process can be optimized.
> >>
> >> And now scripts/kallsyms no longer compresses the symbol types, each
> >> symbol type always occupies one byte. So we can first compress the
> >> searched symbol and then make a quick comparison based on the compressed
> >> length and content. In this way, for entries with mismatched lengths,
> >> there is no need to expand and compare strings. And for those matching
> >> lengths, there's no need to expand the symbol. This saves a lot of time.
> >> According to my test results, the average performance of
> >> kallsyms_lookup_name() can be improved by 20 to 30 times.
> >>
> >> The pseudo code of the test case is as follows:
> >> static int stat_find_name(...)
> >> {
> >> 	start = sched_clock();
> >> 	(void)kallsyms_lookup_name(name);
> >> 	end = sched_clock();
> >> 	//Update min, max, cnt, sum
> >> }
> >>
> >> /*
> >>  * Traverse all symbols in sequence and collect statistics on the time
> >>  * taken by kallsyms_lookup_name() to lookup each symbol.
> >>  */
> >> kallsyms_on_each_symbol(stat_find_name, NULL);
> >>
> >> The test results are as follows (twice):
> >> After : min=5250, max=  726560, avg= 302132
> >> After : min=5320, max=  726850, avg= 301978
> >> Before: min=170,  max=15949190, avg=7553906
> >> Before: min=160,  max=15877280, avg=7517784
> >>
> >> The average time consumed is only 4.01% and the maximum time consumed is
> >> only 4.57% of the time consumed before optimization.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 76 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> >> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> >> --- a/kernel/kallsyms.c
> >> +++ b/kernel/kallsyms.c
> >> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
> >> +{
> >> +	int i, j, k, n;
> >> +	int len, token_len;
> >> +	const char *token;
> >> +	unsigned char token_idx[KSYM_NAME_LEN];
> >> +	unsigned char token_bak[KSYM_NAME_LEN];
> > 
> > Why do we need two buffers? It should be possible to compress the name
> > in the same buffer, as is done in compress_symbols() in scripts/kallsyms.c.
> 
> Because the performance would be a little better. Now this function takes
> just over a microsecond. Currently, it takes about 250 microseconds on
> average to lookup a symbol, so adding a little more time to this function
> doesn't affect the overall picture. I'll modify and test it as you suggest
> below.

We need to be careful about a stack overflow. I have seen that
KSYM_NAME_LEN might need to be increased to 512 because of
Rust support, see
https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org

> >> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
> >>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
> >>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
> >>  
> >> -		if (strcmp(namebuf, name) == 0)
> >> -			return kallsyms_sym_address(i);
> >> -
> >>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
> >>  			return kallsyms_sym_address(i);
> > 
> > Hmm, it means that the speedup is not usable when the kernel is compiled
> > with LLVM? It might actually slow down the search because we would need
> > to use both the fast and the slow search?
> 
> Theoretically, I don't think so. A string comparison was removed from the
> slow search. "if (name_len != len)" is faster than
> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
> overall picture. The average lookup time before optimization is
> millisecond-level.
>
> Before: min=170,  max=15949190, avg=7553906

Good point! I agree that the potential extra overhead is negligible
when using the old code as a fallback.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-22  2:15     ` Leizhen (ThunderTown)
  2022-09-22  7:02       ` Petr Mladek
@ 2022-09-22  7:14       ` Leizhen (ThunderTown)
  1 sibling, 0 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-22  7:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/22 10:15, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 23:25, Petr Mladek wrote:
>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>> Currently, to search for a symbol, we need to expand the symbols in
>>> 'kallsyms_names' one by one, and then use the expanded string for
>>> comparison. This process can be optimized.
>>>
>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>> symbol type always occupies one byte. So we can first compress the
>>> searched symbol and then make a quick comparison based on the compressed
>>> length and content. In this way, for entries with mismatched lengths,
>>> there is no need to expand and compare strings. And for those matching
>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>> According to my test results, the average performance of
>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>
>>> The pseudo code of the test case is as follows:
>>> static int stat_find_name(...)
>>> {
>>> 	start = sched_clock();
>>> 	(void)kallsyms_lookup_name(name);
>>> 	end = sched_clock();
>>> 	//Update min, max, cnt, sum
>>> }
>>>
>>> /*
>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>  */
>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>
>>> The test results are as follows (twice):
>>> After : min=5250, max=  726560, avg= 302132
>>> After : min=5320, max=  726850, avg= 301978
>>> Before: min=170,  max=15949190, avg=7553906
>>> Before: min=160,  max=15877280, avg=7517784
>>>
>>> The average time consumed is only 4.01% and the maximum time consumed is
>>> only 4.57% of the time consumed before optimization.
>>>
>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>> ---
>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>> --- a/kernel/kallsyms.c
>>> +++ b/kernel/kallsyms.c
>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>  	return off;
>>>  }
>>>  
>>> +static int kallsyms_name_to_tokens(const char *name, char *buf)
>>
>> This is not a safe API. The size of the buffer always needs to be
>> passed as well.
> 
> OK
> 
>>
>> Also it should be called "compress". "token" is just an implementation
>> detail.
>>
>> I would do:
>>
>> static int kallsyms_compress_symbol_name(const char *name,
>> 					 char *buf, size_t size)
> 
> This's a wonderful name. Thanks.
> 
>>
>>
>>> +{
>>> +	int i, j, k, n;
>>> +	int len, token_len;
>>> +	const char *token;
>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>
>> Why do we need two buffers? It should be possible to compress the name
>> in the same buffer, as is done in compress_symbols() in scripts/kallsyms.c.
> 
> Because the performance would be a little better. Now this function takes
> just over a microsecond. Currently, it takes about 250 microseconds on
> average to lookup a symbol, so adding a little more time to this function
> doesn't affect the overall picture. I'll modify and test it as you suggest
> below.
> 
>>
>>> +
>>> +	/*
>>> +	 * n, number of tokens in the string name.
>>> +	 * token_idx[i], the start index of the ith token.
>>> +	 * token_idx[n] is used to calculate the length of the last token.
>>> +	 */
>>> +	n = strlen(name);
>>> +	if (n >= KSYM_NAME_LEN) {
>>> +		buf[0] = 0;
>>> +		return 0;
>>> +	}
>>> +	for (i = 0; i <= n; i++)
>>> +		token_idx[i] = (unsigned char)i;
>>> +
>>> +	/*
>>> +	 * For tokens whose token_len >= 2, a larger index value indicates
>>> +	 * a higher occurrence frequency. See scripts/kallsyms.c
>>> +	 */
>>> +	for (i = 255; i >= 0; i--) {
>>> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
>>> +		token_len = strlen(token);
>>> +		if (token_len <= 1)
>>> +			continue;
>>> +
>>> +		/*
>>> +		 * Find and merge two tokens into one.
>>> +		 *
>>> +		 *                |<-- new_token -->|
>>> +		 *                | token1 | token2 |
>>> +		 * token_idx[]:   j       j+1      j+2
>>> +		 */
>>> +		for (j = 0; j < n - 1; j++) {
>>> +			len = token_idx[j + 2] - token_idx[j];
>>> +			if (len == token_len &&
>>> +			    !strncmp(name + token_idx[j], token, len)) {
>>> +				token_bak[token_idx[j]] = (unsigned char)i;
>>> +				for (k = j + 1; k < n; k++)
>>> +					token_idx[k] = token_idx[k + 1];
>>> +				n--;
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	for (j = 0; j < n; j++) {
>>> +		len = token_idx[j + 1] - token_idx[j];
>>> +		if (len <= 1) {
>>> +			buf[j] = name[token_idx[j]];
>>> +			continue;
>>> +		}
>>> +
>>> +		buf[j] = token_bak[token_idx[j]];
>>
>> Maybe, I do not understand the compression format correctly but
>> this code looks too complicated. Honestly, I even did not try to
>> understand it.
>>
>> My understanding is the we just need to find all tokens and
>> replace them with index.
>>
>> It should be even easier than compress_symbols() in scripts/kallsyms.c.
>> The token_table already exists and we do not need to handle the token_profit...
>>
>> The following looks more straightforward (not even compile tested):
> 
> OK, I will try this one. Or refer to compress_symbols() in scripts/kallsyms.c.

This method won't work, because the tokens in kallsyms_token_table[] are
stored in expanded form. For example: name = "nfs_fs_proc_net_init"
kallsyms_token_table[0xf3] = "s_",  raw token = 73 5f
kallsyms_token_table[0x9f] = "fs_", raw token = 66 f3

After "s_" has been replaced with f3, there is no "fs_" substring left in
namebuf. That's why I wrote a new piece of code: I didn't want to add
another variable like kallsyms_token_table[].

Now I will add kallsyms_best_token_table[] and kallsyms_best_token_table_len.
kallsyms_best_token_table[] does not store tokens that contain only one
character, and index 0 holds the token with the highest frequency.


> 
>>
>> 	ssize_t len;
>>
>> 	len = strscpy(buf, symname, size);
>> 	if (WARN_ON_ONCE(len < 0))
>> 		return -EINVAL;
>>
>> 	/* the tokens with higher index are used first */
>> 	for (idx = 255; idx >= 0; idx--) {
>> 		token = &kallsyms_token_table[kallsyms_token_index[idx]];
>> 		token_len = strlen(token);
>>
>> 		p1 = buf;
>> 		/* length of the remaining symname including the trailing '\0' */
>> 		remaining = len + 1;
>>
>> 		/* find the token in the symbol name */
>> 		p2 = strstr(p1, token);
>>
>> 		while (p2) {
>> 			/* replace token with index */
>> 			*p2 = idx;
>> 			remaining -= ((p2 - p1) + token_len);
>> 			memmove(p2 + 1, p2 + token_len, remaining);
>> 			len -= (token_len - 1);
>> 			p1 = p2 + 1;
>>
>> 			/* find the token in the rest of the symbol name */
>> 			p2 = strstr(p1, token);
>> 		}
>> 	}
>>
>> 	return len;
>>
>>> +	}
>>> +	buf[n] = 0;
>>> +
>>> +	return n;
>>> +}
>>> +
>>>  /*
>>>   * Get symbol type information. This is encoded as a single char at the
>>>   * beginning of the symbol name.
>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>  	char namebuf[KSYM_NAME_LEN];
>>>  	unsigned long i;
>>>  	unsigned int off;
>>> +	int len;
>>>  
>>>  	/* Skip the search for empty string. */
>>>  	if (!*name)
>>>  		return 0;
>>>  
>>> +	len = kallsyms_name_to_tokens(name, namebuf);
>>> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>>> +		if (kallsyms_names[off] == len + 1 &&
>>> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
>>> +			return kallsyms_sym_address(i);
>>> +
>>> +		off += kallsyms_names[off] + 1;
>>
>> These complicated checks are hard to review. The following looks much
>> more readable to me:
> 
> Yes, it looks well.
> 
>>
>> 	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>> 		/* the stored symbol name is prefixed by symbol type */
>> 		name_len = kallsyms_names[off] - 1;
>> 		name = &kallsyms_names[off + 2];
>> 		off += name_len + 2;
>>
>> 		if (name_len != len)
>> 			continue;
>>
>> 		if (!memcmp(name, namebuf, len))
>> 			return kallsyms_sym_address(i);
>> 	}
>>
>>
>>> +	}
>>> +
>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>  
>>> -		if (strcmp(namebuf, name) == 0)
>>> -			return kallsyms_sym_address(i);
>>> -
>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>  			return kallsyms_sym_address(i);
>>
>> Hmm, it means that the speedup is not usable when the kernel is compiled with LLVM?
>> It might actually slow down the search because we would need to use
>> both fast and slow search?
> 
> Theoretically, I don't think so. A string comparison was removed from the
> slow search. "if (name_len != len)" is faster than
> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
> overall picture. The average lookup time before optimization is
> millisecond-level. To allay your concerns, I can run a test.
> 
> Before: min=170,  max=15949190, avg=7553906
> 
>>
>> Best Regards,
>> Petr
>> .
>>
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-22  7:02       ` Petr Mladek
@ 2022-09-22  7:21         ` Leizhen (ThunderTown)
  2022-09-22 13:17           ` Petr Mladek
  2022-09-28  1:35         ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-22  7:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/22 15:02, Petr Mladek wrote:
> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/9/21 23:25, Petr Mladek wrote:
>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>> comparison. This process can be optimized.
>>>>
>>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>>> symbol type always occupies one byte. So we can first compress the
>>>> searched symbol and then make a quick comparison based on the compressed
>>>> length and content. In this way, for entries with mismatched lengths,
>>>> there is no need to expand and compare strings. And for those matching
>>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>>> According to my test results, the average performance of
>>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>>
>>>> The pseudo code of the test case is as follows:
>>>> static int stat_find_name(...)
>>>> {
>>>> 	start = sched_clock();
>>>> 	(void)kallsyms_lookup_name(name);
>>>> 	end = sched_clock();
>>>> 	//Update min, max, cnt, sum
>>>> }
>>>>
>>>> /*
>>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>>  */
>>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>>
>>>> The test results are as follows (twice):
>>>> After : min=5250, max=  726560, avg= 302132
>>>> After : min=5320, max=  726850, avg= 301978
>>>> Before: min=170,  max=15949190, avg=7553906
>>>> Before: min=160,  max=15877280, avg=7517784
>>>>
>>>> The average time consumed is only 4.01% and the maximum time consumed is
>>>> only 4.57% of the time consumed before optimization.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>>> --- a/kernel/kallsyms.c
>>>> +++ b/kernel/kallsyms.c
>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>> +{
>>>> +	int i, j, k, n;
>>>> +	int len, token_len;
>>>> +	const char *token;
>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>
>>> Why do we need two buffers? It should be possible to compress the name
>>> in the same buffer as it is done in compress_symbols() in scripts/kallsyms.c.
>>
>> Because the performance would be a little better. Now this function takes
>> just over a microsecond. Currently, it takes about 250 microseconds on
>> average to lookup a symbol, so adding a little more time to this function
>> doesn't affect the overall picture. I'll modify and test it as you suggest
>> below.
> 
> We need to be careful about a stack overflow. I have seen that
> KSYM_NAME_LEN might need to be increased to 512 because of
> Rust support, see
> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org

OK. Thanks for the information. I decided to add kallsyms_best_token_table[]
and kallsyms_best_token_table_len, so that we only need one namebuf[], as in
kallsyms_expand_symbol().

> 
>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>  
>>>> -		if (strcmp(namebuf, name) == 0)
>>>> -			return kallsyms_sym_address(i);
>>>> -
>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>  			return kallsyms_sym_address(i);
>>>
>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>> It might actually slow down the search because we would need to use
>>> both fast and slow search?
>>
>> Theoretically, I don't think so. A string comparison was removed from the
>> slow search. "if (name_len != len)" is faster than
>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>> overall picture. The average lookup time before optimization is
>> millisecond-level.
>>
>> Before: min=170,  max=15949190, avg=7553906
> 
> Good point! I agree that the potential extra overhead is negligible
> when using the old code as a fallback.
> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-22  7:21         ` Leizhen (ThunderTown)
@ 2022-09-22 13:17           ` Petr Mladek
  0 siblings, 0 replies; 25+ messages in thread
From: Petr Mladek @ 2022-09-22 13:17 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules

On Thu 2022-09-22 15:21:57, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/22 15:02, Petr Mladek wrote:
> > On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/9/21 23:25, Petr Mladek wrote:
> >>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> >>>> Currently, to search for a symbol, we need to expand the symbols in
> >>>> 'kallsyms_names' one by one, and then use the expanded string for
> >>>> comparison. This process can be optimized.
> >>>>
> >>>> And now scripts/kallsyms no longer compresses the symbol types, each
> >>>> symbol type always occupies one byte. So we can first compress the
> >>>> searched symbol and then make a quick comparison based on the compressed
> >>>> length and content. In this way, for entries with mismatched lengths,
> >>>> there is no need to expand and compare strings. And for those matching
> >>>> lengths, there's no need to expand the symbol. This saves a lot of time.
> >>>> According to my test results, the average performance of
> >>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
> >>>>
> >>>> The pseudo code of the test case is as follows:
> >>>> static int stat_find_name(...)
> >>>> {
> >>>> 	start = sched_clock();
> >>>> 	(void)kallsyms_lookup_name(name);
> >>>> 	end = sched_clock();
> >>>> 	//Update min, max, cnt, sum
> >>>> }
> >>>>
> >>>> /*
> >>>>  * Traverse all symbols in sequence and collect statistics on the time
> >>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
> >>>>  */
> >>>> kallsyms_on_each_symbol(stat_find_name, NULL);
> >>>>
> >>>> The test results are as follows (twice):
> >>>> After : min=5250, max=  726560, avg= 302132
> >>>> After : min=5320, max=  726850, avg= 301978
> >>>> Before: min=170,  max=15949190, avg=7553906
> >>>> Before: min=160,  max=15877280, avg=7517784
> >>>>
> >>>> The average time consumed is only 4.01% and the maximum time consumed is
> >>>> only 4.57% of the time consumed before optimization.
> >>>>
> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >>>> ---
> >>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
> >>>>  1 file changed, 76 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> >>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> >>>> --- a/kernel/kallsyms.c
> >>>> +++ b/kernel/kallsyms.c
> >>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
> >>>> +{
> >>>> +	int i, j, k, n;
> >>>> +	int len, token_len;
> >>>> +	const char *token;
> >>>> +	unsigned char token_idx[KSYM_NAME_LEN];
> >>>> +	unsigned char token_bak[KSYM_NAME_LEN];
> >>>
> >>> Why do we need two buffers? It should be possible to compress the name
> >>> in the same buffer as it is done in compress_symbols() in scripts/kallsyms.c.
> >>
> >> Because the performance would be a little better. Now this function takes
> >> just over a microsecond. Currently, it takes about 250 microseconds on
> >> average to lookup a symbol, so adding a little more time to this function
> >> doesn't affect the overall picture. I'll modify and test it as you suggest
> >> below.
> > 
> > We need to be careful about a stack overflow. I have seen that
> > KSYM_NAME_LEN might need to be increased to 512 because of
> > Rust support, see
> > https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org
> 
> OK. Thanks for your information. I decided to add kallsyms_best_token_table[],
> kallsyms_best_token_table_len, so that we only need one namebuf[], like
> kallsyms_expand_symbol().

Thanks for the effort. Adding kallsyms_best_token_table[] sounds like
the right solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-22  7:02       ` Petr Mladek
  2022-09-22  7:21         ` Leizhen (ThunderTown)
@ 2022-09-28  1:35         ` Leizhen (ThunderTown)
  2022-09-30 11:37           ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-28  1:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/22 15:02, Petr Mladek wrote:
> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/9/21 23:25, Petr Mladek wrote:
>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>> [...]
>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>> +{
>>>> +	int i, j, k, n;
>>>> +	int len, token_len;
>>>> +	const char *token;
>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>
>>> Why do we need two buffers? It should be possible to compress the name
>>> in the same buffer as it is done in compress_symbols() in scripts/kallsyms.c.
>>
>> Because the performance would be a little better. Now this function takes
>> just over a microsecond. Currently, it takes about 250 microseconds on
>> average to lookup a symbol, so adding a little more time to this function
>> doesn't affect the overall picture. I'll modify and test it as you suggest
>> below.
> 
> We need to be careful about a stack overflow. I have seen that
> KSYM_NAME_LEN might need to be increased to 512 because of
> Rust support, see
> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org
> 
>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>  
>>>> -		if (strcmp(namebuf, name) == 0)
>>>> -			return kallsyms_sym_address(i);
>>>> -
>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>  			return kallsyms_sym_address(i);
>>>
>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>> It might actually slow down the search because we would need to use
>>> both fast and slow search?
>>
>> Theoretically, I don't think so. A string comparison was removed from the
>> slow search. "if (name_len != len)" is faster than
>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>> overall picture. The average lookup time before optimization is
>> millisecond-level.
>>
>> Before: min=170,  max=15949190, avg=7553906
> 
> Good point! I agree that the potential extra overhead is negligible
> when using the old code as a fallback.

I've been sleeping better these days. When I got up this morning, it occurred
to me that a kernel compiled with LLVM could also be optimized. The method is
simple: check whether the next token, after being expanded, starts with '.'
or '$'.

I will post v7 before the holidays.

> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()
  2022-09-28  1:35         ` Leizhen (ThunderTown)
@ 2022-09-30 11:37           ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 25+ messages in thread
From: Leizhen (ThunderTown) @ 2022-09-30 11:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	live-patching, linux-kernel, Masahiro Yamada, Alexei Starovoitov,
	Jiri Olsa, Kees Cook, Andrew Morton, Luis Chamberlain,
	linux-modules



On 2022/9/28 9:35, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/22 15:02, Petr Mladek wrote:
>> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/9/21 23:25, Petr Mladek wrote:
>>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>>> [...]
>>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>>  
>>>>> -		if (strcmp(namebuf, name) == 0)
>>>>> -			return kallsyms_sym_address(i);
>>>>> -
>>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>>  			return kallsyms_sym_address(i);
>>>>
>>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>>> It might actually slow down the search because we would need to use
>>>> both fast and slow search?
>>>
>>> Theoretically, I don't think so. A string comparison was removed from the
>>> slow search. "if (name_len != len)" is faster than
>>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>>> overall picture. The average lookup time before optimization is
>>> millisecond-level.
>>>
>>> Before: min=170,  max=15949190, avg=7553906
>>
>> Good point! I agree that the potential extra overhead is negligible
>> when using the old code as a fallback.
> 
> I've been sleeping better these days. When I got up this morning, it occurred
> to me that a kernel compiled with LLVM could also be optimized. The method is
> simple: check whether the next token, after being expanded, starts with '.'
> or '$'.
> 
> I will post v7 before the holidays.

Sorry, I'm going to break my promise. A lot of code needs to be modified on
the tool side to make sure that the first '.' or '$' does not end up in the
middle or at the end of an expanded substring. The lab is powered off, so I
can only post v7 after the holidays (one week).

> 
>>
>> Best Regards,
>> Petr
>> .
>>
> 

-- 
Regards,
  Zhen Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-09-30 11:44 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20  7:13 [PATCH v4 0/8] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
2022-09-20  7:13 ` [PATCH v4 1/8] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
2022-09-21  7:47   ` Petr Mladek
2022-09-20  7:13 ` [PATCH v4 2/8] scripts/kallsyms: ensure that all possible combinations are compressed Zhen Lei
2022-09-21  8:00   ` Petr Mladek
2022-09-21  8:31     ` Leizhen (ThunderTown)
2022-09-21 12:46       ` Leizhen (ThunderTown)
2022-09-20  7:13 ` [PATCH v4 3/8] scripts/kallsyms: don't compress symbol types Zhen Lei
2022-09-21  9:00   ` Petr Mladek
2022-09-21 13:13     ` Leizhen (ThunderTown)
2022-09-20  7:13 ` [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
2022-09-21 15:25   ` Petr Mladek
2022-09-22  2:15     ` Leizhen (ThunderTown)
2022-09-22  7:02       ` Petr Mladek
2022-09-22  7:21         ` Leizhen (ThunderTown)
2022-09-22 13:17           ` Petr Mladek
2022-09-28  1:35         ` Leizhen (ThunderTown)
2022-09-30 11:37           ` Leizhen (ThunderTown)
2022-09-22  7:14       ` Leizhen (ThunderTown)
2022-09-20  7:13 ` [PATCH v4 5/8] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
2022-09-21 15:30   ` Petr Mladek
2022-09-22  2:16     ` Leizhen (ThunderTown)
2022-09-20  7:13 ` [PATCH v4 6/8] livepatch: Use kallsyms_on_each_match_symbol() to improve performance Zhen Lei
2022-09-20  7:13 ` [PATCH v4 7/8] livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Zhen Lei
2022-09-20  7:13 ` [PATCH v4 8/8] kallsyms: Add self-test facility Zhen Lei
