Netfilter-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: netfilter-devel@vger.kernel.org,
	"Florian Westphal" <fw@strlen.de>,
	"Kadlecsik József" <kadlec@blackhole.kfki.hu>,
	"Eric Garver" <eric@garver.life>, "Phil Sutter" <phil@nwl.cc>
Subject: Re: [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges
Date: Fri, 14 Feb 2020 21:42:25 +0100
Message-ID: <20200214204225.dh3ubs67vorh2ail@salvia> (raw)
In-Reply-To: <20200214204213.50b54ed4@redhat.com>

On Fri, Feb 14, 2020 at 08:42:13PM +0100, Stefano Brivio wrote:
> On Fri, 14 Feb 2020 19:16:34 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> > You refer to a property that says that you can split a range into a
> > 2*n netmasks IIRC. Do you know what is the worst case when splitting
> > ranges?
> 
> I'm not sure I got your question: that is exactly the worst case, i.e.
> we can have _up to_ 2 * n netmasks (hence rules) given a range of n
> bits. There's an additional upper bound on this, given by the address
> space, but single fields in a concatenation can overlap.
> 
> For example, we can have up to 128 rules for an IPv6 range where at
> least 64 bits differ between the endpoints, and which would contain
> 2 ^ 64 addresses. Or, say, the IPv4 range 1.2.3.4 - 255.255.0.2 is
> expressed by 42 rules.
> 
> By the way, 0.0.0.1 - 255.255.255.254 takes 62 rules, so we can
> *probably* say it's 2 * n - 2, but I don't have a formal proof for that.

By "splitting" I was actually refering to "expanding", so you're
replying here to my worst-case range-to-rules expansion question.

> I have a couple of ways in mind to get that down to n / 2, but it's not
> straightforward and it will take me some time (assuming it makes
> sense). For the n bound, we can introduce negations (proof in
> literature), and I have some kind of ugly prototype. For the n / 2
> bound, I'd need some auxiliary data structure to keep insertion
> invertible.

OK, so there is room to improve the "rule expansion" logic. I didn't
spend much time on that front yet.

> In practice, the "average" case is much less, but to define it we would
> first need to agree on what are the actual components of the
> multivariate distribution... size and start? Is it a Poisson
> distribution then? After spending some time on this and disagreeing
> with myself I'd shyly recommend to skip the topic. :)

Yes, I agree to stick to something relatively simple and good is just
fine.

> > There is no ipset set like this, but I agree usecase might happen.
> 
> Actually, for ipset, a "net,port,net,port" type was proposed
> (netfilter-devel <20181216213039.399-1-oliver@uptheinter.net>), but when
> József enquired about the intended use case, none was given. So maybe
> this whole "net,net,port,mac" story makes even less sense.

Would it make sense to you to restrict pipapo to 3 fields until there
is someone with a usecase for this?

[...]
> > The per-cpu scratch index is only required if we cannot fit in the
> > "result bitmap" into the stack, right?
> 
> Right.
> 
> > Probably up to 256 bytes result bitmap in the stack is reasonable?
> > That makes 8192 pipapo rules. There will be no need to disable bh and
> > make use of the percpu scratchpad area in that case.
> 
> Right -- the question is whether that would mean yet another
> implementation for the lookup function.

This would need another lookup function that can be selected from
control plane path. The set size and the range-to-rule expansion
worst-case can tell us if it would fit into the stack. It's would be
just one extra lookup function for this case, ~80-100 LOC.

> > If adjusting the code to deal with variable length "pipapo word" size
> > is not too convoluted, then you could just deal with the variable word
> > size from the insert / delete / get (slow) path and register one
> > lookup function for the version that is optimized for this pipapo word
> > size.
> 
> Yes, I like this a lot -- we would also need one function to rebuild
> tables when the word size changes, but that sounds almost trivial.
> Changes for the slow path are actually rather simple.
>
> Still, I start doubting quite heavily that my original worst case is
> reasonable. If we stick to the one you mentioned, or even something in
> between, it makes no sense to keep 4-bit buckets.

OK, then moving to 8-bits will probably remove a bit of code which is
dealing with "nibbles".

> By the way, I went ahead and tried the 8-bit bucket version of the C
> implementation only, on my usual x86_64 box (one thread, AMD Epyc 7351).
> I think it's worth it:
> 
>                 4-bit       8-bit
> net,port
>  1000 entries   2304165pps  2901299pps
> port,net
>  100 entries    4131471pps  4751247pps
> net6,port
>  1000 entries   1092557pps  1651037pps
> port,proto
>  30000 entries   284147pps   449665pps
> net6,port,mac
>  10 entries     2082880pps  2762291pps
> net6,port,mac,proto
>  1000 entries    783810pps  1195823pps
> net,mac
>  1000 entries   1279122pps  1934003pps

Assuming the same concatenation type, larger bucket size makes pps
drop in the C implementation?

> I would now proceed extending this to the AVX2 implementation and (once
> I finish it) to the NEON one, I actually expect bigger gains there.

Good. BTW, probably you can add a new NFT_SET_CLASS_JIT class that
comes becomes NFT_SET_CLASS_O_1 to make the set routine that selects
the set pick the jit version instead.

> > Probably adding helper function to deal with pipapo words would help
> > to prepare for such update in the future. There is the ->estimate
> > function that allows to calculate for the best word size depending on
> > all the information this gets from the set definition.
> 
> Hm, I really think it should be kind of painless to make this dynamic
> on insertion/deletion.

OK, good. How would you like to proceed?

Thanks!

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-21 23:17 [PATCH nf-next v4 0/9] nftables: Set implementation " Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 1/9] netfilter: nf_tables: add nft_setelem_parse_key() Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 2/9] netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 3/9] netfilter: nf_tables: Support for sets with multiple ranged fields Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 4/9] bitmap: Introduce bitmap_cut(): cut bits and shift remaining Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 5/9] nf_tables: Add set type for arbitrary concatenation of ranges Stefano Brivio
2020-02-07 11:23   ` Pablo Neira Ayuso
2020-02-10 15:10     ` Stefano Brivio
2020-02-14 18:16       ` Pablo Neira Ayuso
2020-02-14 19:42         ` Stefano Brivio
2020-02-14 20:42           ` Pablo Neira Ayuso [this message]
2020-02-14 23:06             ` Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 6/9] selftests: netfilter: Introduce tests for sets with range concatenation Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 7/9] nft_set_pipapo: Prepare for vectorised implementation: alignment Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 8/9] nft_set_pipapo: Prepare for vectorised implementation: helpers Stefano Brivio
2020-01-21 23:17 ` [PATCH nf-next v4 9/9] nft_set_pipapo: Introduce AVX2-based lookup implementation Stefano Brivio
2020-01-27  6:41   ` kbuild test robot
2020-01-27  8:20 ` [PATCH nf-next v4 0/9] nftables: Set implementation for arbitrary concatenation of ranges Pablo Neira Ayuso
2020-02-20 10:52 ` Phil Sutter
2020-02-20 11:04   ` Stefano Brivio

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200214204225.dh3ubs67vorh2ail@salvia \
    --to=pablo@netfilter.org \
    --cc=eric@garver.life \
    --cc=fw@strlen.de \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=phil@nwl.cc \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netfilter-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netfilter-devel/0 netfilter-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netfilter-devel netfilter-devel/ https://lore.kernel.org/netfilter-devel \
		netfilter-devel@vger.kernel.org
	public-inbox-index netfilter-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netfilter-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git