All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
	Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
	Petr Mladek <pmladek@suse.com>,
	Joe Lawrence <joe.lawrence@redhat.com>,
	<live-patching@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	<linux-modules@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Ingo Molnar" <mingo@redhat.com>
Subject: Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols
Date: Mon, 31 Oct 2022 23:04:30 +0800	[thread overview]
Message-ID: <b30a540b-019d-4466-19d9-33900b1a89b1@huawei.com> (raw)
In-Reply-To: <b7215b83-11ab-6db6-bd7f-9729725eaaeb@huawei.com>



On 2022/10/31 12:55, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/10/29 16:10, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/10/27 14:27, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/10/27 11:26, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2022/10/27 3:03, Luis Chamberlain wrote:
>>>>> On Wed, Oct 26, 2022 at 02:44:36PM +0800, Leizhen (ThunderTown) wrote:
>>>>>> On 2022/10/26 1:53, Luis Chamberlain wrote:
>>>>>>> This answers how we don't use a hash table, the question was *should* we
>>>>>>> use one?
>>>>>>
>>>>>> I'm not the original author, and I can only answer now based on my understanding. Maybe
>>>>>> the original author didn't think of the hash method, or he has weighed it out.
>>>>>>
>>>>>> Hash is a good solution if only performance is required and memory overhead is not
>>>>>> considered. Using hash will increase the memory size by up to "4 * kallsyms_num_syms +
>>>>>> 4 * ARRAY_SIZE(hashtable)" bytes, kallsyms_num_syms is about 1-2 million.
>>>
>>> Sorry, 1-2 million ==> 0.1~0.2 million
>>>
>>>>>>
>>>>>> Because I don't know what hash algorithm will be used, the cost of generating the
>>>>>> hash value corresponding to the symbol name is unknown now. But I think it's gonna
>>>>>> be small. But it definitely needs a simpler algorithm, the tool needs to implement
>>>>>> the same hash algorithm.
>>>>>
>>>>> For instance, you can look at evaluating if alloc_large_system_hash() would help.
>>>>
>>
>> The following three hash algorithms are compared. The kernel is compiled by defconfig
>> on arm64.
>>
>> |---------------------------------------------------------------------------------------|
>> |                           |             hash &= HASH_TABLE_SIZE - 1                   |
>> |                           |             number of conflicts >= 1000                   |
>> |---------------------------------------------------------------------------------------|
>> |   ARRAY_SIZE(hash_table)  |       crc16        | jhash_one_at_a_time | string hash_32 |
>> |---------------------------------------------------------------------------------------|
>> |                           |     345b: 3905     |     0d40: 1013      |   4a4c: 6548   |
>> |                           |     35bb: 1016     |     35ce: 6549      |   883a: 1015   |
>> |        0x10000            |     385b: 6548     |     4440: 19126     |   d05f: 19129  |
>> |                           |     f0ba: 19127    |     7ebe: 3916      |   ecda: 3903   |
>> |---------------------------------------------------------------------------------------|
>> |                           |      0ba: 19168    |      440: 19165     |    05f: 19170  |
>> |                           |      45b: 3955     |      5ce: 6577      |    83a: 1066   |
>> |        0x1000             |      5bb: 1069     |      d40: 1052      |    a4c: 6609   |
>> |                           |      85b: 6582     |      ebe: 3938      |    cda: 3924   |
>> |---------------------------------------------------------------------------------------|
>>
>> Based on the above test results, I conclude that:
>> 1. For the worst-case scenario, the three algorithms are not much different. But the kernel
>>    only implements crc16 and string hash_32. The latter is not processed byte-by-byte, so
>>    it is coupled with byte order and sizeof(long). So crc16 is the best choice.
>> 2. For the worst-case scenario, there are almost 19K strings are mapped to the same hash
>>    value,just over 1/10 of the total. And with my current compression-then-comparison
>>    approach, it's 25-30 times faster. So there's still a need for my current approach, and
>>    they can be combined.
>>    if (nr_conflicts(key) >= CONST_N) {
>>        newname = compress(name);
>>        for_each_name_in_slot(key): compare(new_name)
>>    } else {
>>        for_each_name_in_slot(key): compare(name)
>>    }
>>
>>    Above CONST_N can be roughly calculated:
>>    time_of_compress(name) + N * time_of_compare(new_name) <= N * time_of_compare(name)
>> 3. For the worst-case scenario, there is no obvious difference between ARRAY_SIZE(hash_table)
>>    0x10000 and 0x1000. So ARRAY_SIZE(hash_table)=0x1000 is enough.
>>    Statistic information:
>>    |------------------------------------------------------|
>>    |   nr_conflicts(key)  |     ARRAY_SIZE(hash_table)    |
>>    |------------------------------------------------------|
>>    |         <= ?         |     0x1000    |    0x10000    |
>>    |------------------------------------------------------|
>>    |             0        |         0     |      7821     |
>>    |            20        |        19     |     57375     |
>>    |            40        |      2419     |       124     |
>>    |            60        |      1343     |        70     |
>>    |            80        |       149     |        73     |
>>    |           100        |        38     |        49     |
>>    |           200        |       108     |        16     |
>>    |           400        |        14     |         2     |
>>    |           600        |         2     |         2     |
>>    |           800        |         0     |         0     |
>>    |          1000        |         0     |         0     |
>>    |        100000        |         4     |         4     |
>>    |------------------------------------------------------|
>>
>>
>> Also, I re-calculated:
>> Using hash will increase the memory size by up to "6 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)"
>>                                                    |---- What I said earlier was 4
>> The increased size is close to 1 MB if CONFIG_KALLSYMS_ALL=y.
>>
>> Hi, Luis:
>>   For the reasons of the above-mentioned second conclusion. And except for patches 4-6,
>> even if only the hash method is used, other patches and option "--lto-clang" in patch 6/11
>> are also needed. If you don't mind, I'd like to use hash at the next stage. The current
>> patch set is already huge.
> 
> I just had an update in response to David Laight's email. The hash solution is like
> a centrist. It doesn't seem very feasible.
> 
> Now, we need to make a decision. Choose one of the two:
> 1. Continue with my current approach. Improve the average performance of
>    kallsyms_lookup_name() by 20 to 30 times. The memory overhead is increased by:
>    arm64 (defconfig):
>      73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
>      19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
>    x86 (defconfig):
>      49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
>      16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
> 2. Sort names, binary search (The static function causes duplicate names. Additional work is required)
>    2^18=262144, only up to 18 symbol expansions and comparisons are required.
>    The performance is definitely excellent, although I haven't tested it yet.
>    The memory overhead is increased by: 6 * kallsyms_num_syms
>    arm64 (defconfig):
>        1MiB if CONFIG_KALLSYMS_ALL=y.
>      362KiB if CONFIG_KALLSYMS_ALL=n.
>    x86 (defconfig):
>      770KiB if CONFIG_KALLSYMS_ALL=y.
>      356KiB if CONFIG_KALLSYMS_ALL=n.
> 

Preliminary Test Results: (On Qemu arm64)
[   73.049249] kallsyms_selftest: kallsyms_lookup_name() looked up 151880 symbols
[   73.049331] kallsyms_selftest: The time spent on each symbol is (ns): min=1088, max=46848, avg=6629

> 
> 
> 
>>
>>
>>>> OK, I found the right hash function. In this way, the tool does not need to consider
>>>> the byte order.
>>>
>>> https://en.wikipedia.org/wiki/Jenkins_hash_function
>>>
>>> Let's go with jenkins_one_at_a_time_hash(), which looks simpler and doesn't even
>>> have to think about sizeof(long). It seems to be closest to our current needs.
>>>
>>> uint32_t jenkins_one_at_a_time_hash(const uint8_t* key, size_t length) {
>>> 	size_t i = 0;
>>> 	uint32_t hash = 0;
>>>
>>> 	while (i != length) {
>>> 		hash += key[i++];
>>> 		hash += hash << 10;
>>> 		hash ^= hash >> 6;
>>> 	}
>>> 	hash += hash << 3;
>>> 	hash ^= hash >> 11;
>>> 	hash += hash << 15;
>>>
>>> 	return hash;
>>> }
>>>
>>>>
>>>> include/linux/stringhash.h
>>>>
>>>> /*
>>>>  * Version 1: one byte at a time.  Example of use:
>>>>  *
>>>>  * unsigned long hash = init_name_hash;
>>>>  * while (*p)
>>>>  *      hash = partial_name_hash(tolower(*p++), hash);
>>>>  * hash = end_name_hash(hash);
>>>>
>>>>
>>>>>
>>>>>   Luis
>>>>> .
>>>>>
>>>>
>>>
>>
> 

-- 
Regards,
  Zhen Lei

  reply	other threads:[~2022-10-31 15:04 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-17  6:49 [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
2022-10-17  6:49 ` [PATCH v7 01/11] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 02/11] scripts/kallsyms: don't compress symbol types Zhen Lei
2022-10-17  6:49 ` [PATCH v7 03/11] scripts/kallsyms: remove helper sym_name() and cleanup Zhen Lei
2022-10-17  6:49 ` [PATCH v7 04/11] kallsyms: Add helper kallsyms_compress_symbol_name() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 05/11] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 06/11] kallsyms: Improve the performance of kallsyms_lookup_name() when CONFIG_LTO_CLANG=y Zhen Lei
2022-10-17  6:49 ` [PATCH v7 07/11] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 08/11] livepatch: Use kallsyms_on_each_match_symbol() to improve performance Zhen Lei
2022-10-17  6:49 ` [PATCH v7 09/11] livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 10/11] kallsyms: Delete an unused parameter related to kallsyms_on_each_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 11/11] kallsyms: Add self-test facility Zhen Lei
2022-10-18  8:21   ` kernel test robot
2022-10-18  9:11     ` Leizhen (ThunderTown)
2022-10-18  9:11       ` Leizhen
2022-10-18  9:32   ` kernel test robot
2022-10-19  8:39     ` Leizhen (ThunderTown)
2022-10-19  8:39       ` Leizhen
2022-10-21  2:00   ` kernel test robot
2022-10-19 12:01 ` [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols Luis Chamberlain
2022-10-19 14:11   ` Leizhen (ThunderTown)
2022-10-25 17:53     ` Luis Chamberlain
2022-10-26  6:44       ` Leizhen (ThunderTown)
2022-10-26 19:03         ` Luis Chamberlain
2022-10-27  3:26           ` Leizhen (ThunderTown)
2022-10-27  6:27             ` Leizhen (ThunderTown)
2022-10-29  8:10               ` Leizhen (ThunderTown)
2022-10-29 12:49                 ` David Laight
2022-10-31  2:55                   ` Leizhen (ThunderTown)
2022-10-31  4:55                 ` Leizhen (ThunderTown)
2022-10-31 15:04                   ` Leizhen (ThunderTown) [this message]
2022-11-02  9:18                     ` Leizhen (ThunderTown)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b30a540b-019d-4466-19d9-33900b1a89b1@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=jikos@kernel.org \
    --cc=joe.lawrence@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=jpoimboe@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=live-patching@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=mbenes@suse.cz \
    --cc=mcgrof@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.