netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org,
	"Florian Westphal" <fw@strlen.de>,
	"Kadlecsik József" <kadlec@blackhole.kfki.hu>,
	"Eric Garver" <eric@garver.life>, "Phil Sutter" <phil@nwl.cc>
Subject: Re: [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges
Date: Mon, 10 Feb 2020 16:10:47 +0100	[thread overview]
Message-ID: <20200210161047.370582c5@redhat.com> (raw)
In-Reply-To: <20200207112308.sqtlvbluujlftqz2@salvia>

On Fri, 7 Feb 2020 12:23:08 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> Hi Stefano,
> 
> A bit of late feedback on your new datastructure.
> 
> Did you tests with 8-bits grouping instead of 4-bits?

Yes, at the very beginning, not with the final implementation. It was
somewhat faster (on x86_64, I don't remember how much) for small
numbers of rules, then I thought we would use too much memory, because:

> Assuming a concatenation of 12 bytes (each field 4 bytes, hence 3
> fields):
> 
> * Using 4-bits groups: the number of buckets is 2^4 = 16 multiplied
>   by the bucket word (assuming one long word, 8 bytes, 64 pipapo
>   rules) is 16 * 8 = 128 bytes per group-row in the looking table. Then,
>   the number of group-rows is 8 given that 32 bits, then 32 / 4 = 8
>   group-rows.
> 
>   8 * 128 bytes = 1024 bytes per lookup table.
> 
>   Assuming 3 fields, then this is 1024 * 3 = 3072 bytes.
> 
> * Using 8-bits groups: 2^8 = 256, then 256 * 8 = 2048 bytes per
>   group-row. Then, 32 / 8 = 4 group-rows in total.
> 
>   4 * 2048 bytes = 8192 bytes per lookup table.
> 
>   Therefore, 3 * 8192 = 24576 bytes. Still rather small.
> 
> This is missing the mapping table that links the lookup tables in the
> memory counting. And I'm assuming that the number of pipapo rules in
> the lookup table fits into 64-bits bucket long word.

...the (reasonable?) worst case I wanted to cover was two IPv6
addresses, one port, one MAC address (in ipset terms
"net,net,port,mac"), with 2 ^ 16 unique, non-overlapping entries each
(or ranges expanding to that amount of rules), because that's what
(single, non-concatenated) ipset "bitmap" types can do.

Also ignoring the mapping table (it's "small"), with 4-bit buckets:

- for the IPv6 addresses, we have 16 buckets, each 2 ^ 16
  bits wide, and 32 groups (128 bits / 4 bits), that is, 8MiB in
  total

- for the MAC address, 16 buckets, each 2 ^ 16 bits wide, and 12
  groups, 1.5MiB

- for the port, 16 buckets, each 2 ^ 12 bits wide, 2 groups, 0.25MiB

that is, 9.75MiB.

With 8-bit buckets: we can just multiply everything by 8 (that is,
2 ^ 8 / 2 ^ 4 / 2, because we have 2 ^ (8 - 4) times the buckets, with
half the groups), 78MiB.

And that started feeling like "a lot". However, I'm probably overdoing
with the worst case -- this was just to explain what brought me to the
4-bit choice, now I start doubting about it.

> Anyway, my understanding is that the more bits you use for grouping,
> the larger the lookup table becomes.
> 
> Still, both look very small in terms of memory consumption for these
> days.
> 
> I'm just telling this because the C implementation can probably get
> better numbers at the cost of consuming more memory? Probably do this
> at some point?

Another topic is the additional amount of cachelines we would use. I
don't expect that effect to be visible, but I might be wrong.

So yes, I think it's definitely worth a try, thanks for the hint! I'll
try to look into this soon and test it on a few archs (something with
small cachelines, say MIPS r2k, would be worth checking, too).

We could even consider to dynamically adjust group size depending on
the set size, I don't know yet if that gets too convoluted.

> BTW, with a few more knobs it should be possible to integrate better
> this datastructure into the transaction infrastructure, this can be
> done incrementally.
> 
> More questions below.
> 
> On Wed, Jan 22, 2020 at 12:17:55AM +0100, Stefano Brivio wrote:
> [...]
> > diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> > new file mode 100644
> > index 000000000000..f0cb1e13af50
> > --- /dev/null
> > +++ b/net/netfilter/nft_set_pipapo.c  
> [...]
> > + *
> > + *       rule indices in current field: 0    1    2
> > + *       map to rules in next field:    0    1    1
> > + *
> > + *      the new result bitmap will be 0x02: rule 1 was set, and rule 1 will be
> > + *      set.
> > + *
> > + *      We can now extend this example to cover the second iteration of the step
> > + *      above (lookup and AND bitmap): assuming the port field is
> > + *      2048 < 0  0  5  0 >, with starting result bitmap 0x2, and lookup table
> > + *      for "port" field from pre-computation example:
> > + *
> > + * ::
> > + *
> > + *                     bucket
> > + *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
> > + *        0   0,1
> > + *        1   0,1
> > + *        2                    0   1
> > + *        3   0,1
> > + *
> > + *       operations are: 0x2 & 0x3 [bucket 0] & 0x3 [bucket 0] & 0x2 [bucket 5]
> > + *       & 0x3 [bucket 0], resulting bitmap is 0x2.
> > + *
> > + *   - if this is the last field in the set, look up the value from the mapping
> > + *     array corresponding to the final result bitmap
> > + *
> > + *      Example: 0x2 resulting bitmap from 192.168.1.5:2048, mapping array for
> > + *      last field from insertion example:
> > + *
> > + * ::
> > + *
> > + *       rule indices in last field:    0    1
> > + *       map to elements:             0x42  0x66  
> 
> Should this be instead?
> 
>             rule indices in last field:    0    1
>             map to elements:             0x66  0x42
> 
> If the resulting bitmap is 0x2, then this is actually pointing to
> rule index 1 in this lookup table, that is the 2048.

Right! Good catch, thanks.

I swapped the values also in the "insertion" example above. For some
reason, this was correct in the equivalent example of the stand-alone
implementation:
	https://pipapo.lameexcu.se/pipapo/tree/pipapo.c#n162

> > + *      the matching element is at 0x42.  
> 
> Hence, the matching 0x42 element.
> 
> Otherwise, I don't understand how to interpret the "result bitmap". I
> thought this contains the matching pipapo rule index that is expressed
> as a bitmask.

Yes, that's correct. Let me know if you want me to send a patch or if
you'd rather fix it.

> [...]
> > +static int pipapo_refill(unsigned long *map, int len, int rules,
> > +			 unsigned long *dst, union nft_pipapo_map_bucket *mt,
> > +			 bool match_only)
> > +{
> > +	unsigned long bitset;
> > +	int k, ret = -1;
> > +
> > +	for (k = 0; k < len; k++) {
> > +		bitset = map[k];
> > +		while (bitset) {
> > +			unsigned long t = bitset & -bitset;
> > +			int r = __builtin_ctzl(bitset);
> > +			int i = k * BITS_PER_LONG + r;
> > +
> > +			if (unlikely(i >= rules)) {
> > +				map[k] = 0;
> > +				return -1;
> > +			}
> > +
> > +			if (unlikely(match_only)) {  
> 
> Not a big issue, but this branch holds true for the last field, my
> understanding for unlikely() is that it should be used for paths where
> 99.99...% is not going to happen. Not a big issue, just that when
> reading the code I got confused this is actually likely to happen.

You're right, I wanted to make sure we avoid branching for the "common"
case (this early return happens just once), but this is probably an
abuse. I'll look into a more acceptable way to achieve this.

> > +				bitmap_clear(map, i, 1);
> > +				return i;
> > +			}
> > +
> > +			ret = 0;
> > +
> > +			bitmap_set(dst, mt[i].to, mt[i].n);
> > +
> > +			bitset ^= t;
> > +		}
> > +		map[k] = 0;
> > +	}
> > +
> > +	return ret;  
> > +}>   

-- 
Stefano


  reply	other threads:[~2020-02-10 15:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-21 23:17 [PATCH nf-next v4 0/9] nftables: Set implementation for arbitrary concatenation of ranges Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 1/9] netfilter: nf_tables: add nft_setelem_parse_key() Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 2/9] netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 3/9] netfilter: nf_tables: Support for sets with multiple ranged fields Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 4/9] bitmap: Introduce bitmap_cut(): cut bits and shift remaining Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges Stefano Brivio
2020-02-07 11:23   ` Pablo Neira Ayuso
2020-02-10 15:10     ` Stefano Brivio [this message]
2020-02-14 18:16       ` Pablo Neira Ayuso
2020-02-14 19:42         ` Stefano Brivio
2020-02-14 20:42           ` Pablo Neira Ayuso
2020-02-14 23:06             ` Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 6/9] selftests: netfilter: Introduce tests for sets with range concatenation Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 7/9] nft_set_pipapo: Prepare for vectorised implementation: alignment Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 8/9] nft_set_pipapo: Prepare for vectorised implementation: helpers Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 9/9] nft_set_pipapo: Introduce AVX2-based lookup implementation Stefano Brivio
2020-01-27  6:41   ` kbuild test robot
2020-01-27  8:20 ` [PATCH nf-next v4 0/9] nftables: Set implementation for arbitrary concatenation of ranges Pablo Neira Ayuso
2020-02-20 10:52 ` Phil Sutter
2020-02-20 11:04   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200210161047.370582c5@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=eric@garver.life \
    --cc=fw@strlen.de \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=phil@nwl.cc \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).