netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hou Tao <houtao1@huawei.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>, Yonghong Song <yhs@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Song Liu <songliubraving@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Joanne Koong <joannekoong@fb.com>
Subject: Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
Date: Wed, 9 Mar 2022 19:47:27 +0800	[thread overview]
Message-ID: <daea3845-ab03-12df-ec3b-a645dcee5aaf@huawei.com> (raw)
In-Reply-To: <CAADnVQJUJp3YBcpESwR3Q1U6GS1mBM=Vp-qYuQX7eZOaoLjdUA@mail.gmail.com>

Hi,

On 2/27/2022 11:08 AM, Alexei Starovoitov wrote:
> On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> For now, our case is a write-once case, so only lookup is considered.
>> When data set is bigger than 128KB, hash table has better lookup performance as
>> show below:
>>
>> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
>> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
>> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
>> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
>> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |
> 
> Yeah. It's hard to beat hash lookup when it's hitting a good case of O(1),
> but what are the chances that it stays this way?
> Are you saying you can size up the table and populate to good % just once?
>
Yes. for our case the hash table is populated only once. During these test the
hash table is populated firstly by inserting all strings into the table and then
do the lookup. The strings for all tests come from the same string set.

> If so it's probably better to replace all strings with something
> like a good hash
A strong one like sha1sum and using the string as hash-table value just
as proposed in previous email ?

> 7M elements is not a lot. A hash producing 8 or 16 bytes will have close
> to zero false positives.
> And in case of "populate the table once" the hash seed can be
> precomputed and adjusted, so you can guarantee zero collisions
> for 7M strings. While lookup part can still have 0.001% chance
> of a false positive there could be a way to deal with it after lookup.
>
I can try the above method. But the lookup procedure will be slowed done by
calculating a good hash and the memory usage will not reduced.

>> Ternary search tree always has better memory usage:
>>
>> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
>> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
>> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
>> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
>> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>>
> 
> Ternary search tree looks amazing.
> Since you have a prototype can you wrap it into a new type of bpf map
> and post the patches?
Will do.

> I wonder what data structures look like to achieve such memory efficiency.
The lower memory usage partially is due to the string set for test is full file
paths and these paths share the same prefix. And ternary search tree reduces the
memory usage by sharing the common prefix.
> .
> 

      reply	other threads:[~2022-03-09 11:47 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 2/3] selftests/bpf: add a simple test for htab str key Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: add benchmark for string-key hash table Hou Tao
2022-02-17  3:50 ` [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Alexei Starovoitov
2022-02-18 13:53   ` Hou Tao
2022-02-19 18:44     ` Alexei Starovoitov
     [not found]       ` <ecc04a70-0b57-62ef-ab52-e7169845d789@huawei.com>
2022-02-27  3:08         ` Alexei Starovoitov
2022-03-09 11:47           ` Hou Tao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=daea3845-ab03-12df-ec3b-a645dcee5aaf@huawei.com \
    --to=houtao1@huawei.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joannekoong@fb.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).