From: Stefano Brivio <sbrivio@redhat.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org,
"Florian Westphal" <fw@strlen.de>,
"Kadlecsik József" <kadlec@blackhole.kfki.hu>,
"Eric Garver" <eric@garver.life>, "Phil Sutter" <phil@nwl.cc>
Subject: Re: [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges
Date: Mon, 10 Feb 2020 16:10:47 +0100 [thread overview]
Message-ID: <20200210161047.370582c5@redhat.com> (raw)
In-Reply-To: <20200207112308.sqtlvbluujlftqz2@salvia>
On Fri, 7 Feb 2020 12:23:08 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Stefano,
>
> A bit of late feedback on your new datastructure.
>
> Did you tests with 8-bits grouping instead of 4-bits?
Yes, at the very beginning, not with the final implementation. It was
somewhat faster (on x86_64, I don't remember how much) for small
numbers of rules, then I thought we would use too much memory, because:
> Assuming a concatenation of 12 bytes (each field 4 bytes, hence 3
> fields):
>
> * Using 4-bits groups: the number of buckets is 2^4 = 16 multiplied
> by the bucket word (assuming one long word, 8 bytes, 64 pipapo
> rules) is 16 * 8 = 128 bytes per group-row in the looking table. Then,
> the number of group-rows is 8 given that 32 bits, then 32 / 4 = 8
> group-rows.
>
> 8 * 128 bytes = 1024 bytes per lookup table.
>
> Assuming 3 fields, then this is 1024 * 3 = 3072 bytes.
>
> * Using 8-bits groups: 2^8 = 256, then 256 * 8 = 2048 bytes per
> group-row. Then, 32 / 8 = 4 group-rows in total.
>
> 4 * 2048 bytes = 8192 bytes per lookup table.
>
> Therefore, 3 * 8192 = 24576 bytes. Still rather small.
>
> This is missing the mapping table that links the lookup tables in the
> memory counting. And I'm assuming that the number of pipapo rules in
> the lookup table fits into 64-bits bucket long word.
...the (reasonable?) worst case I wanted to cover was two IPv6
addresses, one port, one MAC address (in ipset terms
"net,net,port,mac"), with 2 ^ 16 unique, non-overlapping entries each
(or ranges expanding to that amount of rules), because that's what
(single, non-concatenated) ipset "bitmap" types can do.
Also ignoring the mapping table (it's "small"), with 4-bit buckets:
- for the IPv6 addresses, we have 16 buckets, each 2 ^ 16
bits wide, and 32 groups (128 bits / 4 bits), that is, 8MiB in
total
- for the MAC address, 16 buckets, each 2 ^ 16 bits wide, and 12
groups, 1.5MiB
- for the port, 16 buckets, each 2 ^ 12 bits wide, 2 groups, 0.25MiB
that is, 9.75MiB.
With 8-bit buckets: we can just multiply everything by 8 (that is,
2 ^ 8 / 2 ^ 4 / 2, because we have 2 ^ (8 - 4) times the buckets, with
half the groups), 78MiB.
And that started feeling like "a lot". However, I'm probably overdoing
with the worst case -- this was just to explain what brought me to the
4-bit choice, now I start doubting about it.
> Anyway, my understanding is that the more bits you use for grouping,
> the larger the lookup table becomes.
>
> Still, both look very small in terms of memory consumption for these
> days.
>
> I'm just telling this because the C implementation can probably get
> better numbers at the cost of consuming more memory? Probably do this
> at some point?
Another topic is the additional amount of cachelines we would use. I
don't expect that effect to be visible, but I might be wrong.
So yes, I think it's definitely worth a try, thanks for the hint! I'll
try to look into this soon and test it on a few archs (something with
small cachelines, say MIPS r2k, would be worth checking, too).
We could even consider to dynamically adjust group size depending on
the set size, I don't know yet if that gets too convoluted.
> BTW, with a few more knobs it should be possible to integrate better
> this datastructure into the transaction infrastructure, this can be
> done incrementally.
>
> More questions below.
>
> On Wed, Jan 22, 2020 at 12:17:55AM +0100, Stefano Brivio wrote:
> [...]
> > diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> > new file mode 100644
> > index 000000000000..f0cb1e13af50
> > --- /dev/null
> > +++ b/net/netfilter/nft_set_pipapo.c
> [...]
> > + *
> > + * rule indices in current field: 0 1 2
> > + * map to rules in next field: 0 1 1
> > + *
> > + * the new result bitmap will be 0x02: rule 1 was set, and rule 1 will be
> > + * set.
> > + *
> > + * We can now extend this example to cover the second iteration of the step
> > + * above (lookup and AND bitmap): assuming the port field is
> > + * 2048 < 0 0 5 0 >, with starting result bitmap 0x2, and lookup table
> > + * for "port" field from pre-computation example:
> > + *
> > + * ::
> > + *
> > + * bucket
> > + * group 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > + * 0 0,1
> > + * 1 0,1
> > + * 2 0 1
> > + * 3 0,1
> > + *
> > + * operations are: 0x2 & 0x3 [bucket 0] & 0x3 [bucket 0] & 0x2 [bucket 5]
> > + * & 0x3 [bucket 0], resulting bitmap is 0x2.
> > + *
> > + * - if this is the last field in the set, look up the value from the mapping
> > + * array corresponding to the final result bitmap
> > + *
> > + * Example: 0x2 resulting bitmap from 192.168.1.5:2048, mapping array for
> > + * last field from insertion example:
> > + *
> > + * ::
> > + *
> > + * rule indices in last field: 0 1
> > + * map to elements: 0x42 0x66
>
> Should this be instead?
>
> rule indices in last field: 0 1
> map to elements: 0x66 0x42
>
> If the resulting bitmap is 0x2, then this is actually pointing to
> rule index 1 in this lookup table, that is the 2048.
Right! Good catch, thanks.
I swapped the values also in the "insertion" example above. For some
reason, this was correct in the equivalent example of the stand-alone
implementation:
https://pipapo.lameexcu.se/pipapo/tree/pipapo.c#n162
> > + * the matching element is at 0x42.
>
> Hence, the matching 0x42 element.
>
> Otherwise, I don't understand how to interpret the "result bitmap". I
> thought this contains the matching pipapo rule index that is expressed
> as a bitmask.
Yes, that's correct. Let me know if you want me to send a patch or if
you'd rather fix it.
> [...]
> > +static int pipapo_refill(unsigned long *map, int len, int rules,
> > + unsigned long *dst, union nft_pipapo_map_bucket *mt,
> > + bool match_only)
> > +{
> > + unsigned long bitset;
> > + int k, ret = -1;
> > +
> > + for (k = 0; k < len; k++) {
> > + bitset = map[k];
> > + while (bitset) {
> > + unsigned long t = bitset & -bitset;
> > + int r = __builtin_ctzl(bitset);
> > + int i = k * BITS_PER_LONG + r;
> > +
> > + if (unlikely(i >= rules)) {
> > + map[k] = 0;
> > + return -1;
> > + }
> > +
> > + if (unlikely(match_only)) {
>
> Not a big issue, but this branch holds true for the last field, my
> understanding for unlikely() is that it should be used for paths where
> 99.99...% is not going to happen. Not a big issue, just that when
> reading the code I got confused this is actually likely to happen.
You're right, I wanted to make sure we avoid branching for the "common"
case (this early return happens just once), but this is probably an
abuse. I'll look into a more acceptable way to achieve this.
> > + bitmap_clear(map, i, 1);
> > + return i;
> > + }
> > +
> > + ret = 0;
> > +
> > + bitmap_set(dst, mt[i].to, mt[i].n);
> > +
> > + bitset ^= t;
> > + }
> > + map[k] = 0;
> > + }
> > +
> > + return ret;
> > +}>
--
Stefano
next prev parent reply other threads:[~2020-02-10 15:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-21 23:17 [PATCH nf-next v4 0/9] nftables: Set implementation for arbitrary concatenation of ranges Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 1/9] netfilter: nf_tables: add nft_setelem_parse_key() Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 2/9] netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 3/9] netfilter: nf_tables: Support for sets with multiple ranged fields Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 4/9] bitmap: Introduce bitmap_cut(): cut bits and shift remaining Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges Stefano Brivio
2020-02-07 11:23 ` Pablo Neira Ayuso
2020-02-10 15:10 ` Stefano Brivio [this message]
2020-02-14 18:16 ` Pablo Neira Ayuso
2020-02-14 19:42 ` Stefano Brivio
2020-02-14 20:42 ` Pablo Neira Ayuso
2020-02-14 23:06 ` Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 6/9] selftests: netfilter: Introduce tests for sets with range concatenation Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 7/9] nft_set_pipapo: Prepare for vectorised implementation: alignment Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 8/9] nft_set_pipapo: Prepare for vectorised implementation: helpers Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 9/9] nft_set_pipapo: Introduce AVX2-based lookup implementation Stefano Brivio
2020-01-27 6:41 ` kbuild test robot
2020-01-27 8:20 ` [PATCH nf-next v4 0/9] nftables: Set implementation for arbitrary concatenation of ranges Pablo Neira Ayuso
2020-02-20 10:52 ` Phil Sutter
2020-02-20 11:04 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200210161047.370582c5@redhat.com \
--to=sbrivio@redhat.com \
--cc=eric@garver.life \
--cc=fw@strlen.de \
--cc=kadlec@blackhole.kfki.hu \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=phil@nwl.cc \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).