Re: Numen with reference to vmap

From: "Serguei Bezverkhi (sbezverk)" <sbezverk@cisco.com>
To: Phil Sutter <phil@nwl.cc>,
	Arturo Borrero Gonzalez <arturo@netfilter.org>
Cc: "netfilter-devel@vger.kernel.org" <netfilter-devel@vger.kernel.org>
Subject: Re: Numen with reference to vmap
Date: Tue, 17 Dec 2019 00:51:07 +0000
Message-ID: <98A8233C-1A83-44A1-A122-6F80212D618F@cisco.com>
In-Reply-To: <20191204223215.GX14469@orbyte.nwl.cc>

Hi Phil,

In this Google doc: https://docs.google.com/document/d/128gllbr_o-40pD2i0D14zMNdtCwRYR7YM49T4L2Eyac/edit?usp=sharing

There is a question there about possible optimizations; could you comment on it? I also have one more question about set updates: given a set with 10k entries, how costly would an update of that set be?

Thanks a lot
Serguei

On 2019-12-04, 5:32 PM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

    Hi Arturo,

    On Wed, Dec 04, 2019 at 06:31:02PM +0100, Arturo Borrero Gonzalez wrote:
    > On 12/4/19 4:56 PM, Phil Sutter wrote:
    > > OK, static load-balancing between two services - no big deal. :)
    > > 
    > > What happens if config changes? I.e., if one of the endpoints goes down
    > > or a third one is added? (That's the thing we're discussing right now,
    > > aren't we?)
    > 
    > If a non-anonymous map were allowed for random numgen, only its elements
    > would need to be adjusted:
    > 
    > dnat numgen random mod 100 map { 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }
    > 
    > You could always use mod 100 (or 10000 if you want) and just adjust the map
    > probabilities by updating map elements. I think this is a valid use case.
    > The mod value could simply be the maximum number of endpoints allowed per
    > service in Kubernetes.
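A minimal sketch of the element bookkeeping described above, assuming equal weights per endpoint. It is plain Python and does not talk to nft; the names build_intervals, lookup and render are hypothetical. It only computes the "low-high : address" elements for a fixed modulus, to show that adding an endpoint changes the elements while the rule keeps saying "mod 100".

```python
from typing import List, Tuple

Element = Tuple[int, int, str]  # (low, high, address) == "low-high : address"


def build_intervals(endpoints: List[str], mod: int = 100) -> List[Element]:
    """Split [0, mod) into contiguous, near-equal ranges, one per endpoint."""
    if not 1 <= len(endpoints) <= mod:
        raise ValueError("need between 1 and mod endpoints")
    base, extra = divmod(mod, len(endpoints))
    elements: List[Element] = []
    low = 0
    for i, addr in enumerate(endpoints):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        elements.append((low, low + size - 1, addr))
        low += size
    return elements


def lookup(elements: List[Element], value: int) -> str:
    """Return the address a numgen result 'value' (0 <= value < mod) maps to."""
    for low, high, addr in elements:
        if low <= value <= high:
            return addr
    raise KeyError(value)


def render(elements: List[Element]) -> str:
    """Format the elements the way they are written in the dnat rule above."""
    return "{ " + ", ".join(f"{lo}-{hi} : {addr}" for lo, hi, addr in elements) + " }"


if __name__ == "__main__":
    # prints "{ 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }"
    print(render(build_intervals(["1.1.1.1", "2.2.2.2"])))
    # A third endpoint only changes the elements; 'mod 100' stays the same.
    # prints "{ 0-33 : 1.1.1.1, 34-66 : 2.2.2.2, 67-99 : 3.3.3.3 }"
    print(render(build_intervals(["1.1.1.1", "2.2.2.2", "3.3.3.3"])))
```

Unequal probabilities would simply mean unequal range sizes; the rule itself would not need to change.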
    > 
    > @Phil,
    > 
    > I'm not sure if the typeof() thingy will work in this case, since the integer
    > length would depend on the mod value used.
    > What about introducing something like an explicit u128 integer datatype?
    > Perhaps it's useful for other use cases too...

    Out of curiosity I implemented the bits to support the typeof keyword in
    the parser and scanner. It's a bit clumsy, but it works. I can do:

    | nft add map t m2 '{ type typeof numgen random mod 2 : verdict; }'

    (The 'random mod 2' part is ignored, but needed as otherwise it's not a
    primary_expr. :D)

    The output is:

    | table ip t {
    | 	map m2 {
    | 		type integer : verdict
    | 	}
    | }

    So the integer size information is lost, and this won't work when fed
    back. There are two options to solve this:

    A) Push expression info into kernel so we can correctly deserialize the
       original input.

    B) As you suggested, have something like 'int32' or maybe better 'int(32)'.

    I consider (B) to be way less ugly. And if we went that route, we could
    actually use the 'int32'/'int(32)' thing in the first place. All users
    have to know is how large the 'numgen' data type is. Or we could even be
    smart here, taking into account that such a map may be used with
    different inputs and masking the input to fit the map key size. IIRC, we
    may even have had this discussion in an inconveniently cold room in
    Malaga once. :)
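To illustrate option (B), here is a hypothetical Python model (not nft code) of a key type that keeps its width in the listing, so the listed type can be parsed back, plus the idea of masking an input to the map key size. The 'int(N)' spelling, the IntType class, the function names, and the assumption that numgen yields a 32-bit value are all illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical integer key type that remembers its width, printed as 'int(N)'."""
    bits: int

    def __str__(self) -> str:
        return f"int({self.bits})"


def parse_key_type(text: str) -> IntType:
    """Parse a listed key type back into an IntType (the 'fed back' step)."""
    m = re.fullmatch(r"int\((\d+)\)", text.strip())
    if m is None:
        # e.g. a bare 'integer': the width is gone, so the key size is unknown
        raise ValueError(f"cannot deduce key size from {text!r}")
    bits = int(m.group(1))
    if bits == 0:
        raise ValueError("key size must be positive")
    return IntType(bits)


def fit_to_key(value: int, key: IntType) -> int:
    """Mask an input value so it fits the map key size."""
    return value & ((1 << key.bits) - 1)


if __name__ == "__main__":
    key = IntType(32)  # assumption: numgen yields a 32-bit value
    assert parse_key_type(str(key)) == key  # 'int(32)' survives a round trip
    try:
        parse_key_type("integer")  # what the listing of map m2 shows
    except ValueError as err:
        print(err)
    print(fit_to_key(0x1_0000_0005, key))  # high bits dropped, prints 5
```

The point of the round trip is that whatever the listing prints must carry enough information to recreate the key; a bare "integer" does not.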

    > @Serguei,
    > 
    > Kubernetes implements a complex chain of mechanisms to deal with traffic. What
    > happens if endpoints for a given svc have different ports? I don't know if
    > that's supported or not, but then this approach wouldn't work either: you can't
    > use dnat numgen random mod 100 map { 0-49 : <ip>:<port> }.
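A tiny Python sketch of the port concern, assuming a hypothetical (address, port) representation for endpoints: an address-only map value such as "0-49 : 1.1.1.1" is sufficient only when every endpoint of the service listens on the same port. The names common_port and address_only_map_is_enough are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical endpoint representation: (IPv4 address, port)
Endpoint = Tuple[str, int]


def common_port(endpoints: List[Endpoint]) -> Optional[int]:
    """Return the port shared by all endpoints, or None if they differ."""
    ports = {port for _, port in endpoints}
    return next(iter(ports)) if len(ports) == 1 else None


def address_only_map_is_enough(endpoints: List[Endpoint]) -> bool:
    # A value such as "0-49 : 1.1.1.1" carries no port, so it can only
    # describe the service if every endpoint uses the same port.
    return common_port(endpoints) is not None


if __name__ == "__main__":
    same = [("1.1.1.1", 80), ("2.2.2.2", 80)]
    mixed = [("1.1.1.1", 80), ("2.2.2.2", 8080)]
    print(address_only_map_is_enough(same))   # True
    print(address_only_map_is_enough(mixed))  # False
```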
    > 
    > Also, we have the masquerade/drop handling going on too, which needs to be
    > dealt with and is currently done by yet another chain jump + packet mark.
    > 
    > I'm not sure what state of development you are in, but this is my
    > suggestion: try not to over-optimize in the first iteration. Just get a
    > working nft ruleset with the few optimizations that make sense and are easy
    > to use (and understand). For iteration #2 we can do better optimizations,
    > including patching in features we may be missing in nftables.
    > I really want a ruleset with very few rules, but we are still comparing with
    > the iptables ruleset. I suggest we leave the hard optimizations for a later
    > point, when we are comparing nft vs nft rulesets.

    +1 for not optimizing (yet). There's at least a certain chance that we'd
    spend a lot of effort optimizing a path which later turns out not to be
    the bottleneck.

    Cheers, Phil