From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: David Laight <David.Laight@ACULAB.COM>,
	Luis Chamberlain <mcgrof@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
	Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
	Petr Mladek <pmladek@suse.com>,
	Joe Lawrence <joe.lawrence@redhat.com>,
	"live-patching@vger.kernel.org" <live-patching@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-modules@vger.kernel.org" <linux-modules@vger.kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols
Date: Mon, 31 Oct 2022 10:55:00 +0800
Message-ID: <842f626a-6d87-72c0-49ed-d66c1ad9534b@huawei.com>
In-Reply-To: <9e4892b540584b25aa5481cc40f1fb42@AcuMS.aculab.com>



On 2022/10/29 20:49, David Laight wrote:
>>>> On 2022/10/27 3:03, Luis Chamberlain wrote:
>>>>> On Wed, Oct 26, 2022 at 02:44:36PM +0800, Leizhen (ThunderTown) wrote:
>>>>>> On 2022/10/26 1:53, Luis Chamberlain wrote:
>>>>>>> This answers how we don't use a hash table, the question was *should* we
>>>>>>> use one?
> 
> (Probably brainfart) thought...
> 
> Is the current table (effectively) a sorted list of strings?
> So the lookup is a binary chop - so O(log(n)).

No. names[] currently follows address order, not name order, so a lookup by
name is a linear scan.

> 
> But your hashes are having 'trouble' stopping one chain
> being very long?
> So a linear search of that hash chain is slow.
> In fact that sort of hashed lookup is O(n).

Your analysis is right. The hash method is not as good as sorting the names
and then doing a binary search. I came to the same conclusion while working
on the design over the past few days.

Current Implementation:
---------------------------------------
| idx | addresses | markers |  names  |
---------------------------------------
|  0  |    addr0  |         |  name0  |
|  1  |    addr1  |         |  name1  |
| ... |    addrx  |   [0]   |  namex  |
| 255 |  addr255  |         |  name255|
---------------------------------------
| 256 |  addr256  |         |  name256|
| ... |    addrx  |   [1]   |  namex  |
| 511 |  addr511  |         |  name511|
---------------------------------------

markers[0] = offset_of(name0)
markers[1] = offset_of(name256)

1. Find name by address
   binary search addresses[], get idx, traverse names[] from markers[idx>>8]
   to markers[(idx>>8) + 1], return name

2. Find address by name
   traverse names[], get idx, return addresses[idx]
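To make the two lookups above concrete, here is a rough userspace sketch, not
the kernel code. It assumes uncompressed, length-prefixed names (the real
names[] is token-compressed), a generated symbol set "sym_<n>", and made-up
addresses. The helper names build_tables(), lookup_name_by_addr() and
lookup_addr_by_name() are mine and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSYMS        600
#define MARKER_SHIFT 8                      /* one marker per 256 symbols */

static unsigned long addresses[NSYMS];      /* sorted by address */
static unsigned char names[NSYMS * 16];     /* length-prefixed, address order */
static unsigned int  markers[(NSYMS >> MARKER_SHIFT) + 1];

/* Fill the tables with made-up symbols "sym_0" .. "sym_599". */
static void build_tables(void)
{
    unsigned int off = 0;

    for (unsigned int i = 0; i < NSYMS; i++) {
        char buf[16];
        unsigned int len = (unsigned int)snprintf(buf, sizeof(buf), "sym_%u", i);

        assert(off + 1 + len <= sizeof(names));
        addresses[i] = 0x1000UL + 0x10UL * i;
        if ((i & ((1u << MARKER_SHIFT) - 1)) == 0)
            markers[i >> MARKER_SHIFT] = off;   /* offset of name 0, 256, 512 */
        names[off] = (unsigned char)len;
        memcpy(&names[off + 1], buf, len);
        off += 1 + len;
    }
}

/* 1. Find name by address: binary search, then walk from the marker. */
static int lookup_name_by_addr(unsigned long addr, char *out, size_t outlen)
{
    unsigned int lo = 0, hi = NSYMS;

    if (addr < addresses[0])
        return -1;
    while (hi - lo > 1) {                   /* last idx with addresses[idx] <= addr */
        unsigned int mid = lo + (hi - lo) / 2;

        if (addresses[mid] <= addr)
            lo = mid;
        else
            hi = mid;
    }

    unsigned int off = markers[lo >> MARKER_SHIFT];

    for (unsigned int skip = lo & ((1u << MARKER_SHIFT) - 1); skip > 0; skip--)
        off += 1 + names[off];              /* at most 255 names are skipped */

    unsigned int len = names[off];

    if (len >= outlen)
        return -1;
    memcpy(out, &names[off + 1], len);
    out[len] = '\0';
    return 0;
}

/* 2. Find address by name: linear scan over all of names[]. */
static int lookup_addr_by_name(const char *name, unsigned long *addr)
{
    unsigned int want = (unsigned int)strlen(name);
    unsigned int off = 0;

    for (unsigned int i = 0; i < NSYMS; i++) {
        unsigned int len = names[off];

        if (len == want && memcmp(&names[off + 1], name, len) == 0) {
            *addr = addresses[i];
            return 0;
        }
        off += 1 + len;
    }
    return -1;
}
```

The address lookup touches at most 255 names after the binary search, while
the name lookup may visit every symbol, which is the cost this series is
trying to remove.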

Hash Implementation:
Add two new arrays: hash_table[] and names_offsets[]

-----------------------------------------------------------------
| key |      hash_table       |         names_offsets           |
|---------------------------------------------------------------|
|  0  |  names_offsets[key=0] | offsets of all names with key=0 |
|  1  |  names_offsets[key=1] | offsets of all names with key=1 |
| ... |          ...          | offsets of all names with key=k |
|---------------------------------------------------------------|

hash_table[0] = 0
hash_table[1] = hash_table[0] + sizeof(names_offsets[0]) * number_of_names(key=0)
hash_table[2] = hash_table[1] + sizeof(names_offsets[0]) * number_of_names(key=1)

1. Find address by name
   hash name, get key, traverse names_offsets[] from index=hash_table[key] to
   index=hash_table[key+1], get the offset of the matching name in names[].
   Binary search markers[], get index, then traverse names[] from
   markers[index] to markers[index + 1], counting symbols until the offset of
   the name matches; this gives idx. Return addresses[idx].
2. Find name by address
   No change.
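
A rough userspace sketch of the hash layout and its name lookup, under several
assumptions of mine: names are uncompressed and length-prefixed, hash_table[]
holds element indexes into names_offsets[] (not byte offsets), there are 64
buckets, and the hash is a djb2-style function. hash_name(), build_tables(),
offset_to_index() and lookup_addr_by_name() are hypothetical helpers, not
functions from the patch set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSYMS    600
#define NBUCKETS 64                          /* hypothetical bucket count */
#define STRIDE   256                         /* symbols per marker */
#define NMARKERS ((NSYMS + STRIDE - 1) / STRIDE)

static unsigned long addresses[NSYMS];
static unsigned char names[NSYMS * 16];      /* length-prefixed, address order */
static unsigned int  markers[NMARKERS];
static unsigned int  hash_table[NBUCKETS + 1]; /* bucket start in names_offsets[] */
static unsigned int  names_offsets[NSYMS];   /* name offsets grouped by key */

/* Hypothetical hash function; any reasonable string hash would do. */
static unsigned int hash_name(const unsigned char *s, unsigned int len)
{
    unsigned int h = 5381;

    while (len--)
        h = h * 33 + *s++;
    return h % NBUCKETS;
}

static void build_tables(void)
{
    unsigned int off = 0, fill[NBUCKETS] = {0};
    unsigned int sym_off[NSYMS];

    memset(hash_table, 0, sizeof(hash_table));
    for (unsigned int i = 0; i < NSYMS; i++) {
        char buf[16];
        unsigned int len = (unsigned int)snprintf(buf, sizeof(buf), "sym_%u", i);

        addresses[i] = 0x1000UL + 0x10UL * i;
        if (i % STRIDE == 0)
            markers[i / STRIDE] = off;
        names[off] = (unsigned char)len;
        memcpy(&names[off + 1], buf, len);
        sym_off[i] = off;
        hash_table[hash_name(&names[off + 1], len) + 1]++;  /* count per key */
        off += 1 + len;
    }
    for (unsigned int k = 0; k < NBUCKETS; k++)  /* counts -> start indexes */
        hash_table[k + 1] += hash_table[k];
    for (unsigned int i = 0; i < NSYMS; i++) {
        unsigned int k = hash_name(&names[sym_off[i] + 1], names[sym_off[i]]);

        names_offsets[hash_table[k] + fill[k]++] = sym_off[i];
    }
}

/* Map an offset in names[] back to a symbol index via markers[]. */
static int offset_to_index(unsigned int target)
{
    unsigned int lo = 0, hi = NMARKERS;

    while (hi - lo > 1) {                    /* last marker <= target */
        unsigned int mid = lo + (hi - lo) / 2;

        if (markers[mid] <= target)
            lo = mid;
        else
            hi = mid;
    }

    unsigned int off = markers[lo], idx = lo * STRIDE;

    while (off < target) {                   /* count names up to the target */
        off += 1 + names[off];
        idx++;
    }
    return off == target ? (int)idx : -1;
}

static int lookup_addr_by_name(const char *name, unsigned long *addr)
{
    unsigned int len = (unsigned int)strlen(name);
    unsigned int key = hash_name((const unsigned char *)name, len);

    for (unsigned int i = hash_table[key]; i < hash_table[key + 1]; i++) {
        unsigned int off = names_offsets[i];

        if (names[off] == len && memcmp(&names[off + 1], name, len) == 0) {
            int idx = offset_to_index(off);

            assert(idx >= 0);
            *addr = addresses[idx];
            return 0;
        }
    }
    return -1;
}
```

Each lookup walks one bucket linearly and then up to 255 names inside a marker
block, so a long chain still costs O(chain length), as David pointed out.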

Sorted names Implementation:
Add two new arrays: offsets_of_addr_to_name[] and offsets_of_name[]

offsets_of_addr_to_name[i] = offset of name i in names[]
offsets_of_name[i]         = offset of sorted name i in names[]

1. Find name by address
   binary search addresses[], get idx, return names[offsets_of_addr_to_name[idx]]

2. Find address by name
   binary search offsets_of_name[], get idx, return addresses[idx]
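
A rough userspace sketch of the sorted-names variant. The description above
does not say how a position in offsets_of_name[] maps back to an index into
addresses[], so the sketch assumes one extra array, sorted_to_addr_idx[],
which is purely hypothetical. Names are NUL-terminated and uncompressed here
for brevity, and the five symbols and their addresses are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NSYMS 5

/* Made-up symbols in address order, packed back to back. */
static const unsigned long addresses[NSYMS] = {
    0x1000, 0x1040, 0x1080, 0x10c0, 0x1100
};
static const char names[] = "start_kernel\0do_fork\0schedule\0kfree\0vfs_read";

static unsigned int offsets_of_addr_to_name[NSYMS]; /* indexed in address order */
static unsigned int offsets_of_name[NSYMS];         /* indexed in name order */
static unsigned int sorted_to_addr_idx[NSYMS];      /* hypothetical: name rank -> address idx */

static int cmp_by_name(const void *a, const void *b)
{
    unsigned int ia = *(const unsigned int *)a;
    unsigned int ib = *(const unsigned int *)b;

    return strcmp(&names[offsets_of_addr_to_name[ia]],
                  &names[offsets_of_addr_to_name[ib]]);
}

static void build_tables(void)
{
    unsigned int off = 0;

    for (unsigned int i = 0; i < NSYMS; i++) {
        assert(off < sizeof(names));
        offsets_of_addr_to_name[i] = off;
        sorted_to_addr_idx[i] = i;
        off += (unsigned int)strlen(&names[off]) + 1;
    }
    /* Sort symbol indexes by name, then record each sorted name's offset. */
    qsort(sorted_to_addr_idx, NSYMS, sizeof(sorted_to_addr_idx[0]), cmp_by_name);
    for (unsigned int i = 0; i < NSYMS; i++)
        offsets_of_name[i] = offsets_of_addr_to_name[sorted_to_addr_idx[i]];
}

/* 1. Find name by address: binary search, then one array access. */
static const char *lookup_name_by_addr(unsigned long addr)
{
    unsigned int lo = 0, hi = NSYMS;

    if (addr < addresses[0])
        return NULL;
    while (hi - lo > 1) {                    /* last idx with addresses[idx] <= addr */
        unsigned int mid = lo + (hi - lo) / 2;

        if (addresses[mid] <= addr)
            lo = mid;
        else
            hi = mid;
    }
    return &names[offsets_of_addr_to_name[lo]];
}

/* 2. Find address by name: binary search over the name-sorted offsets. */
static int lookup_addr_by_name(const char *name, unsigned long *addr)
{
    unsigned int lo = 0, hi = NSYMS;

    while (lo < hi) {
        unsigned int mid = lo + (hi - lo) / 2;
        int c = strcmp(name, &names[offsets_of_name[mid]]);

        if (c == 0) {
            *addr = addresses[sorted_to_addr_idx[mid]];
            return 0;
        }
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Both directions are O(log(n)) here, and the name-by-address path no longer
walks up to 255 names after the binary search. The price is the extra
per-symbol arrays.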

> 
> What if the symbols were sorted by hash then name?
> (Without putting the hash into each entry.)
> Then the code could do a binary chop search over
> the symbols with the same hash value.
> The additional data is then an array of symbol numbers
> indexed by the hash - 32 bits for each bucket.
> 
> If the hash table has 0x1000 entries it saves 12 compares.
> (All of which are likely to be data cache misses.)
> 
> If you add the hash to each table entry then you can do
> a binary chop search for the hash itself.
> While this is the same search as is done for the strings
> the comparison (just a number) will be faster than a
> string compare.
> 
> 	David
> 
> 

-- 
Regards,
  Zhen Lei
