Hello,

The nftables wiki gives this example for numgen:

nft add rule nat prerouting numgen random mod 2 vmap { 0 : jump mychain1, 1 : jump mychain2 }

I would like to use it but with a map reference, like this:

nft add rule nat prerouting numgen random mod 2 vmap @service1-endpoints

Could you please confirm if it is supported? If it is, what would be the type of the key in such a map? I thought it would be integer, but the command fails.

sudo nft --debug all add map ipv4table k8s-57XVOCFNTLTR3Q27-endpoints { type integer : verdict \; }
Error: unqualified key type integer specified in map definition
add map ipv4table k8s-57XVOCFNTLTR3Q27-endpoints { type integer : verdict ; }
^^^^^^^^^^^^^^^^^^^^^^^^^^

The ultimate goal is to update dynamically just the map with available endpoints and load-balance between them without touching the rule.

Thank you
Serguei
Hi Serguei,

On Wed, Dec 04, 2019 at 12:54:05AM +0000, Serguei Bezverkhi (sbezverk) wrote:
> The nftables wiki gives this example for numgen:
>
> nft add rule nat prerouting numgen random mod 2 vmap { 0 : jump mychain1, 1 : jump mychain2 }
>
> I would like to use it but with a map reference, like this:
>
> nft add rule nat prerouting numgen random mod 2 vmap @service1-endpoints
>
> Could you please confirm if it is supported? If it is, what would be the type of the key in such a map? I thought it would be integer, but the command fails.
>
> sudo nft --debug all add map ipv4table k8s-57XVOCFNTLTR3Q27-endpoints { type integer : verdict \; }
> Error: unqualified key type integer specified in map definition
> add map ipv4table k8s-57XVOCFNTLTR3Q27-endpoints { type integer : verdict ; }
> ^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, this is sadly not possible right now. The numgen type is a 32-bit integer, but we don't have a type definition matching that. Type 'integer' is unqualified regarding size and therefore unsuitable for use in map/set definitions.

This all works when using an anonymous set/map because the key type is deduced from the map LHS. We plan to support a 'typeof' keyword at some point to allow the same deduction from within named map/set declarations, but it needs further work: the type info is lost on the return path (when listing), so it would create a ruleset that can't be fed back.

> The ultimate goal is to update dynamically just the map with available endpoints and load-balance between them without touching the rule.

I don't quite understand why you need to dynamically change the load-balancing rule: the numgen modulus is fixed anyway, so the number of elements in the vmap is fixed. Maybe just jump to chains and dynamically update those instead?

Cheers, Phil
Hello Phil,

Thank you for your reply. It is very unfortunate indeed. Here is the scenario where I thought to use a non-anonymous vmap.

Each k8s service can have 0, 1 or more associated endpoints, i.e. backends (pods providing this service). The 0-endpoint case is already taken care of in the filter prerouting hook. When there are 1 or more, the proxy needs to load-balance incoming connections between endpoints. I thought to create one vmap per service with one rule per service. When an endpoint gets updated (added/deleted), which could happen at any time, only the vmap would get the corresponding update, and my hope was that load balancing would automagically adjust to the updated endpoint list.

With what you explained, I am not sure if dynamic load balancing is possible at all. If numgen works only with a static anonymous vmap and a fixed modulus, the only way to address the dynamic nature of endpoints is to recreate the service rule every time the number of endpoints changes (recalculating the modulus and the entries in the vmap). I suspect that is way less efficient.

What will happen to the dataplane and packets in transit when the rule is deleted and then recreated? I suspect it might result in dropped packets. Could you please comment on the possible impact?

If you could suggest a better approach for the described scenario, I would appreciate it if you shared it.

Thank you
Serguei
Hi,

On Wed, Dec 04, 2019 at 01:47:47PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Thank you for your reply. It is very unfortunate indeed. Here is the scenario where I thought to use a non-anonymous vmap.
>
> Each k8s service can have 0, 1 or more associated endpoints, i.e. backends (pods providing this service). The 0-endpoint case is already taken care of in the filter prerouting hook. When there are 1 or more, the proxy needs to load-balance incoming connections between endpoints. I thought to create one vmap per service with one rule per service. When an endpoint gets updated (added/deleted), which could happen at any time, only the vmap would get the corresponding update, and my hope was that load balancing would automagically adjust to the updated endpoint list.
>
> With what you explained, I am not sure if dynamic load balancing is possible at all. If numgen works only with a static anonymous vmap and a fixed modulus, the only way to address the dynamic nature of endpoints is to recreate the service rule every time the number of endpoints changes (recalculating the modulus and the entries in the vmap). I suspect that is way less efficient.

Well, if you have a modulus of, say, 5 and your vmap contains only entries 0 to 3, your setup is broken anyway. So I guess you will need to adjust the modulus along with the entries in the vmap at all times.

What is the iptables equivalent you want to replace? Maybe that serves as inspiration for how to solve it in nftables.

> What will happen to the dataplane and packets in transit when the rule is deleted and then recreated? I suspect it might result in dropped packets. Could you please comment on the possible impact?

Well, you could replace the rule in a single transaction; that would eliminate the timespan in which the rule doesn't exist. AFAICT, this is RCU-based, so packets will hit either the old or the new rule.

Cheers, Phil
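A minimal sketch of the single-transaction idea, assuming the commands are collected in one file and loaded with `nft -f`, which applies the whole file as one batch. The helper name, the rule handle 7, the chain name KUBE-SEP-NEWENDPOINT and the backend address 57.112.0.249:8080 are hypothetical placeholders, not values from this thread; the real handle would have to be looked up first (e.g. with `nft -a list chain ...`).

```python
def build_batch(table, svc_chain, rule_handle, new_sep_chain, new_backend, lb_rule):
    """Return the text of one nft batch file (hypothetical helper).

    Loaded with `nft -f <file>`, the commands are committed together,
    so packets see either the old rule or the new one.
    """
    lines = [
        # Create the new endpoint chain first so the vmap can reference it.
        f"add chain ip {table} {new_sep_chain}",
        f"add rule ip {table} {new_sep_chain} dnat to {new_backend}",
        # Swap the load-balancing rule in place, addressed by its handle.
        f"replace rule ip {table} {svc_chain} handle {rule_handle} {lb_rule}",
    ]
    return "\n".join(lines) + "\n"


batch = build_batch(
    table="ipv4table",
    svc_chain="KUBE-SVC-57XVOCFNTLTR3Q27",
    rule_handle=7,                          # assumption: looked up beforehand
    new_sep_chain="KUBE-SEP-NEWENDPOINT",   # assumption: hypothetical chain name
    new_backend="57.112.0.249:8080",        # assumption: hypothetical backend
    lb_rule="numgen random mod 3 vmap { 0 : jump KUBE-SEP-FS3FUULGZPVD4VYB, "
            "1 : jump KUBE-SEP-MMFZROQSLQ3DKOQA, 2 : jump KUBE-SEP-NEWENDPOINT }",
)
print(batch)
```

Writing `batch` to a file and running `nft -f` on it would be the usual way to submit it; the sketch only builds the text and does not touch the kernel.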
Hi Phil,

I can also minimize any impact by inserting a new rule in front of the old one and then deleting the old one. In this case there should not be any impact. Here are the iptables rules I am trying to mimic:

// -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
// -A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-MMFZROQSLQ3DKOQA
// !
// ! Endpoint 1 for KUBE-SVC-57XVOCFNTLTR3Q27
// !
// -A KUBE-SEP-FS3FUULGZPVD4VYB -s 57.112.0.247/32 -j KUBE-MARK-MASQ
// -A KUBE-SEP-FS3FUULGZPVD4VYB -p tcp -m tcp -j DNAT --to-destination 57.112.0.247:8080
// !
// ! Endpoint 2 for KUBE-SVC-57XVOCFNTLTR3Q27
// !
// -A KUBE-SEP-MMFZROQSLQ3DKOQA -s 57.112.0.248/32 -j KUBE-MARK-MASQ
// -A KUBE-SEP-MMFZROQSLQ3DKOQA -p tcp -m tcp -j DNAT --to-destination 57.112.0.248:8080

As you can see, the SVC chain KUBE-SVC-57XVOCFNTLTR3Q27 load-balances between 2 endpoints.

Thank you
Serguei
On Wed, Dec 04, 2019 at 03:42:00PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Hi Phil,
>
> I can also minimize any impact by inserting a new rule in front of the old one and then deleting the old one. In this case there should not be any impact. Here are the iptables rules I am trying to mimic:

Yes, that's more or less equivalent to doing it in a single transaction.

> // -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
> // -A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-MMFZROQSLQ3DKOQA
> // !
> // ! Endpoint 1 for KUBE-SVC-57XVOCFNTLTR3Q27
> // !
> // -A KUBE-SEP-FS3FUULGZPVD4VYB -s 57.112.0.247/32 -j KUBE-MARK-MASQ
> // -A KUBE-SEP-FS3FUULGZPVD4VYB -p tcp -m tcp -j DNAT --to-destination 57.112.0.247:8080
> // !
> // ! Endpoint 2 for KUBE-SVC-57XVOCFNTLTR3Q27
> // !
> // -A KUBE-SEP-MMFZROQSLQ3DKOQA -s 57.112.0.248/32 -j KUBE-MARK-MASQ
> // -A KUBE-SEP-MMFZROQSLQ3DKOQA -p tcp -m tcp -j DNAT --to-destination 57.112.0.248:8080
>
> As you can see, the SVC chain KUBE-SVC-57XVOCFNTLTR3Q27 load-balances between 2 endpoints.

OK, static load-balancing between two services - no big deal. :)

What happens if config changes? I.e., if one of the endpoints goes down or a third one is added? (That's the thing we're discussing right now, aren't we?)

Cheers, Phil
It is not static. The SVC chain jump rules are updated on every endpoint change; the dynamic nature is achieved by manipulating rules. It is doable with nftables, I understand that, but I was also looking for a more efficient way to do it. My concern is that if we use a 1-to-1 conversion, we will end up with the same scalability/performance limitations as iptables.

Here is how the rules look after a third and fourth endpoint get dynamically added to the service.

-A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
-A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-MMFZROQSLQ3DKOQA
-A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-TEWRTAGT3CD3D47Z
-A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-4WMWD734WJQW264U

Thank you
Serguei
Hi,
On Wed, Dec 04, 2019 at 04:13:45PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> It is not static, SVC chain jump rules will be updated on every endpoint change, the dynamic nature is achieved by manipulating rules. It is doable with nftables, I understand that, but I was also looking for a more efficient way to do it, my concern is if we use 1 to 1 conversion, we will end up with the same iptables scalability/performance limitations.
>
> Here is how the rules look after a third and fourth endpoint get dynamically added to the service.
>
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-MMFZROQSLQ3DKOQA
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-TEWRTAGT3CD3D47Z
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-4WMWD734WJQW264U
Ah, that's nice. The rules are updated in a way that with a single added
rule probabilities are equalized again. This is something I fear we
can't do with a map in nftables yet, I guess it would need a new object
type (or maybe a special set/map type or something). All you can do for
now is copy the above in nftables.
Cheers, Phil
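A small sketch of why the probabilities in the iptables rules above even out, assuming the rules are evaluated top to bottom and the last rule is unconditional (probability 1). The function names are made up for illustration. It uses exact fractions, so 1/3 stands in for the rounded 0.33332999982 shown in the rules.

```python
from fractions import Fraction


def cascade_probabilities(n):
    """Per-rule match probabilities for n endpoints: 1/n, 1/(n-1), ..., 1/2, 1."""
    return [Fraction(1, n - i) for i in range(n)]


def endpoint_shares(probs):
    """Overall share of traffic each rule receives when evaluated in order."""
    shares = []
    remaining = Fraction(1)  # fraction of packets that reached this rule
    for p in probs:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probs = cascade_probabilities(4)
print([str(p) for p in probs])                   # per-rule probabilities
print([str(s) for s in endpoint_shares(probs)])  # resulting traffic shares
```

Under these assumptions every endpoint ends up with the same overall share, and the probability sequence for n endpoints is the tail of the sequence for n + 1, which is what makes adding one rule at the top sufficient.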
On 12/4/19 4:56 PM, Phil Sutter wrote:
> OK, static load-balancing between two services - no big deal. :)
>
> What happens if config changes? I.e., if one of the endpoints goes down
> or a third one is added? (That's the thing we're discussing right now,
> aren't we?)
If the non-anon map for random numgen were allowed, then only elements would need
to be adjusted:
dnat numgen random mod 100 map { 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }
You could always use mod 100 (or 10000 if you want) and just play with the map
probabilities by updating map elements. This is a valid use case I think.
The mod number can just be the max number of allowed endpoints per service in
kubernetes.
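A rough sketch of how the ranges for such a fixed-modulus map could be computed from per-backend weights. This is only an illustration of the idea, not something nftables does itself; the function names and the weight dictionary are assumptions.

```python
def weight_ranges(weights, modulus=100):
    """Split 0..modulus-1 into contiguous ranges proportional to the weights.

    weights: dict mapping backend -> positive integer weight
             (insertion order decides the order of the ranges).
    Returns a list of (low, high, backend) tuples with inclusive bounds.
    """
    total = sum(weights.values())
    ranges = []
    start = 0
    accumulated = 0
    for backend, weight in weights.items():
        accumulated += weight
        end = accumulated * modulus // total  # exclusive upper bound
        if end > start:  # skip backends whose share rounds down to nothing
            ranges.append((start, end - 1, backend))
        start = end
    return ranges


def format_map(ranges):
    """Render the ranges in the '{ lo-hi : value, ... }' form used above."""
    parts = []
    for low, high, backend in ranges:
        key = str(low) if low == high else f"{low}-{high}"
        parts.append(f"{key} : {backend}")
    return "{ " + ", ".join(parts) + " }"


print(format_map(weight_ranges({"1.1.1.1": 1, "2.2.2.2": 1})))
```

With two equally weighted backends this reproduces the 0-49 / 50-99 split from the example; changing a weight only changes the map elements, while the modulus stays fixed.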
@Phil,
I'm not sure if the typeof() thingy will work in this case, since the integer
length would depend on the mod value used.
What about introducing something like an explicit u128 integer datatype. Perhaps
it's useful for other use cases too...
@Serguei,
kubernetes implements a complex chain of mechanisms to deal with traffic. What
happens if endpoints for a given svc have different ports? I don't know if
that's supported or not, but then this approach wouldn't work either: you can't
use dnat numgen random { 0-49 : <ip>:<port> }.
Also, we have the masquerade/drop thing going on too, which needs to be dealt
with and that currently is done by yet another chain jump + packet mark.
I'm not sure in which state of the development you are, but this is my
suggestion: Try not to over-optimize in the first iteration. Just get a
working nft ruleset with the few optimizations that make sense and are easy to
use (and understand). For iteration #2 we can do better optimizations, including
patching missing features we may have in nftables.
I really want a ruleset with very few rules, but we are still comparing with
the iptables ruleset. I suggest we leave the hard optimization for a later point
when we are comparing nft vs nft rulesets.
Hello

@Phil,

Just to confirm, if I do:

numgen random mod 3 vmap { 0 : jump endpoint1, 1 : jump endpoint2, 2 : jump endpoint3 }

and then, when a 4th endpoint appears, I replace the previous rule with:

numgen random mod 4 vmap { 0 : jump endpoint1, 1 : jump endpoint2, 2 : jump endpoint3, 3 : jump endpoint4 }

it should do the trick of load balancing, right?

@Arturo,

I am not planning to use "dnat numgen random { 0-49 : <ip>:<port> }". Each endpoint will have its own chain, and that chain will dnat to the endpoint's IP and its endpoint-specific target port. The load balancing will be done in the service chain, between multiple endpoint chains. See the example above. Does it make sense?

Thank you
Serguei
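A minimal sketch of regenerating the service-chain rule whenever the endpoint list changes, so that the modulus always matches the number of vmap entries. The function name is hypothetical, and falling back to a plain jump for a single endpoint is an assumption of this sketch.

```python
def lb_rule(endpoint_chains):
    """Build the service-chain rule text for the given endpoint chains.

    Keys run from 0 to n-1 and the modulus is n, so every value
    numgen can produce has a matching vmap entry.
    """
    n = len(endpoint_chains)
    if n == 0:
        raise ValueError("no endpoints; this case is handled elsewhere")
    if n == 1:
        # Assumption: with one endpoint no random choice is needed.
        return f"jump {endpoint_chains[0]}"
    entries = ", ".join(
        f"{i} : jump {chain}" for i, chain in enumerate(endpoint_chains)
    )
    return f"numgen random mod {n} vmap {{ {entries} }}"


endpoints = ["endpoint1", "endpoint2", "endpoint3"]
print(lb_rule(endpoints))
endpoints.append("endpoint4")  # a 4th endpoint appears
print(lb_rule(endpoints))
```

The two printed rules correspond to the mod 3 and mod 4 variants discussed in the thread; the new text would then replace the old rule.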
Here are the code-generated nftables rules for the nat portion of the k8s proxy. They probably do not cover all cases, but on a normal k8s cluster they would be sufficient. I would appreciate reviews and suggestions for optimization.

Thank you very much.
Serguei

table ip ipv4table {
	chain nat-preroutin {
		type nat hook prerouting priority filter; policy accept;
		jump k8s-nat-services
	}

	chain nat-output {
		type nat hook output priority filter; policy accept;
		jump k8s-nat-services
	}

	chain nat-postrouting {
		type nat hook postrouting priority filter; policy accept;
		jump k8s-nat-postrouting
	}

	chain k8s-nat-mark-drop {
		meta mark set 0x00008000
	}

	chain k8s-nat-services {
		ip saddr != 57.112.0.0/12 ip daddr 57.142.221.21 tcp dport 80 meta mark set 0x00004000
		ip daddr 57.142.221.21 tcp dport 80 jump KUBE-SVC-57XVOCFNTLTR3Q27
		ip saddr != 57.112.0.0/12 ip daddr 57.142.35.114 tcp dport 15443 meta mark set 0x00004000
		ip daddr 57.142.35.114 tcp dport 15443 jump KUBE-SVC-S4S242M2WNFIAT6Y
		ip daddr 57.131.151.19 tcp dport 8989 jump KUBE-SVC-MUPXPVK4XAZHSWAR
		ip daddr 192.168.80.104 tcp dport 8989 meta mark set 0x00004000
		fib saddr type != local ip daddr 192.168.80.104 tcp dport 8989 iifname != "bridge*" jump KUBE-SVC-MUPXPVK4XAZHSWAR
		fib daddr type local ip daddr 192.168.80.104 tcp dport 8989 jump KUBE-SVC-MUPXPVK4XAZHSWAR
	}

	chain k8s-nat-nodeports {
		tcp dport 30725 meta mark set 0x00004000 jump KUBE-SVC-S4S242M2WNFIAT6Y
	}

	chain k8s-nat-postrouting {
		meta mark 0x00004000 masquerade random,persistent
	}

	chain KUBE-SVC-S4S242M2WNFIAT6Y {
		jump KUBE-SEP-CUAZ6PSSTEDPJ43V
	}

	chain KUBE-SVC-57XVOCFNTLTR3Q27 {
		numgen random mod 2 vmap { 0 : jump KUBE-SEP-FS3FUULGZPVD4VYB, 1 : jump KUBE-SEP-MMFZROQSLQ3DKOQA }
	}

	chain KUBE-SVC-MUPXPVK4XAZHSWAR {
		jump KUBE-SEP-LO6TEVOI6GV524F3
	}

	chain KUBE-SEP-CUAZ6PSSTEDPJ43V {
		ip saddr 57.112.0.244 meta mark set 0x00004000
		dnat to 57.112.0.244:15443 fully-random
	}

	chain KUBE-SEP-FS3FUULGZPVD4VYB {
		ip saddr 57.112.0.247 meta mark set 0x00004000
		dnat to 57.112.0.247:8080 fully-random
	}

	chain KUBE-SEP-MMFZROQSLQ3DKOQA {
		ip saddr 57.112.0.248 meta mark set 0x00004000
		dnat to 57.112.0.248:8080 fully-random
	}

	chain KUBE-SEP-LO6TEVOI6GV524F3 {
		ip saddr 57.112.0.250 meta mark set 0x00004000
		dnat to 57.112.0.250:38989 fully-random
	}
}
Hi Arturo,

On Wed, Dec 04, 2019 at 06:31:02PM +0100, Arturo Borrero Gonzalez wrote:
> On 12/4/19 4:56 PM, Phil Sutter wrote:
> > OK, static load-balancing between two services - no big deal. :)
> >
> > What happens if config changes? I.e., if one of the endpoints goes down
> > or a third one is added? (That's the thing we're discussing right now,
> > aren't we?)
>
> If the non-anon map for random numgen were allowed, then only elements would need
> to be adjusted:
>
> dnat numgen random mod 100 map { 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }
>
> You could always use mod 100 (or 10000 if you want) and just play with the map
> probabilities by updating map elements. This is a valid use case I think.
> The mod number can just be the max number of allowed endpoints per service in
> kubernetes.
>
> @Phil,
>
> I'm not sure if the typeof() thingy will work in this case, since the integer
> length would depend on the mod value used.
> What about introducing something like an explicit u128 integer datatype. Perhaps
> it's useful for other use cases too...

Out of curiosity I implemented the bits to support the typeof keyword in parser and scanner. It's a bit clumsy, but it works. I can do:

| nft add map t m2 '{ type typeof numgen random mod 2 : verdict; }'

(The 'random mod 2' part is ignored, but needed as otherwise it's not a primary_expr. :D)

The output is:

| table ip t {
| 	map m2 {
| 		type integer : verdict
| 	}
| }

So integer size information is lost, and this won't work when fed back. There are two options to solve this:

A) Push expression info into the kernel so we can correctly deserialize the original input.

B) As you suggested, have something like 'int32' or maybe better 'int(32)'.

I consider (B) to be way less ugly. And if we went that route, we could actually use the 'int32'/'int(32)' thing in the first place. All users have to know is how large the 'numgen' data type is. Or we could even be smart here, taking into account that such a map may be used with different inputs, and mask the input to fit the map key size.

IIRC, we may even have had this discussion in an inconveniently cold room in Malaga once. :)

> @Serguei,
>
> kubernetes implements a complex chain of mechanisms to deal with traffic. What
> happens if endpoints for a given svc have different ports? I don't know if
> that's supported or not, but then this approach wouldn't work either: you can't
> use dnat numgen random { 0-49 : <ip>:<port> }.
>
> Also, we have the masquerade/drop thing going on too, which needs to be dealt
> with and that currently is done by yet another chain jump + packet mark.
>
> I'm not sure in which state of the development you are, but this is my
> suggestion: Try not to over-optimize in the first iteration. Just get a
> working nft ruleset with the few optimizations that make sense and are easy to
> use (and understand). For iteration #2 we can do better optimizations, including
> patching missing features we may have in nftables.
> I really want a ruleset with very few rules, but we are still comparing with
> the iptables ruleset. I suggest we leave the hard optimization for a later point
> when we are comparing nft vs nft rulesets.

+1 for optimize not (yet). At least there's a certain chance that we're putting much effort into optimizing a path which isn't even the bottleneck later.

Cheers, Phil
Hi Phil,

In this Google doc, see link:

https://docs.google.com/document/d/128gllbr_o-40pD2i0D14zMNdtCwRYR7YM49T4L2Eyac/edit?usp=sharing

there is a question about possible optimizations. I was wondering if you could comment/reply. I also have one more question, about updates of a set. Let's say there is a set with 10k entries: how costly would an update of such a set be?

Thanks a lot
Serguei
Hi Serguei,

On Tue, Dec 17, 2019 at 12:51:07AM +0000, Serguei Bezverkhi (sbezverk) wrote:
> In this google doc, see link: https://docs.google.com/document/d/128gllbr_o-40pD2i0D14zMNdtCwRYR7YM49T4L2Eyac/edit?usp=sharing

I avoid Google-Doc as far as possible. ;)

> There is a question about possible optimizations. I was wondering if you could comment/reply. Also I got one more question about updates of a set. Let's say there is a set with 10k entries, how costly would be the update of such set.

Regarding Rob's question: With iptables, for N balanced servers there are N rules. With equal probabilities a packet traverses N/2 rules on average (unless I'm mistaken). With nftables, there's a single rule which triggers the map lookup. In the kernel, that's a lookup in an rhashtable and therefore performs quite well.

Another aspect to Rob's question is jitter: With the iptables solution, a packet may traverse all N rules before it is dispatched. The nftables map lookup will happen in almost constant time.

I can't give you performance numbers, but it should be easy to measure. Given that you won't need the set content for insert or delete operations, while iptables fetches the whole table for each rule insert or delete command, I guess you can imagine what the numbers will look like. But feel free to verify, it's fun! :)

Cheers, Phil
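Phil's estimate of the average number of rules traversed can be checked with a short calculation. The sketch below assumes (this is not stated in the thread) that the N iptables rules are tried in order and that rule i matches with probability 1/(N-i+1) once reached, so that every endpoint is equally likely overall. The function name expected_rules_traversed is made up for this illustration.

```python
from fractions import Fraction


def expected_rules_traversed(n):
    """Average number of rules a packet evaluates before one of n rules matches.

    Assumes rule i (1-based) matches with probability 1/(n-i+1) when reached,
    which gives each of the n endpoints the same overall probability 1/n.
    """
    if n < 1:
        raise ValueError("need at least one rule")
    reach = Fraction(1)      # probability that the packet reaches rule i
    expected = Fraction(0)
    for i in range(1, n + 1):
        p_match = Fraction(1, n - i + 1)
        expected += i * reach * p_match
        reach *= 1 - p_match
    return expected


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, expected_rules_traversed(n))
```

Under that assumption the average works out to (N+1)/2, which is close to Phil's N/2 for large N, and the worst case is N rules. A single map lookup does not grow with N in the same way, which is the jitter argument made above.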
Hi Phil,

Thank you very much for your reply. Can I paste your reply into the doc with a reference to your name? If you would rather not, I will rephrase it and post it there.

I have one question,

chain KUBE-SVC-57XVOCFNTLTR3Q27 {
    numgen random mod 2 vmap { 0 : jump KUBE-SEP-FS3FUULGZPVD4VYB,
                               1 : jump KUBE-SEP-MMFZROQSLQ3DKOQA }
}

In this rule, as far as I understood you last time, there is no way to dynamically change the elements of the anonymous vmap. So if the service has a large number of dynamic (short-lived) endpoints, this rule will have to be reprogrammed for every change, which would be extremely inefficient. Is there any way to make it more dynamic, or are there plans to change the static behavior? That would be extremely important.

Thank you
Serguei
Hi Serguei,

On Tue, Dec 17, 2019 at 02:05:58PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Thank you very much for your reply. Can I paste your reply into the doc with reference to your name? If you do not wish. I will rephrase it and post it there.

Noo, don't tell anyone what I write in mails to public lists! ;)
Seriously, I don't care if you paste it there or just link to my reply in a public archive.

> I have one question,
>
> chain KUBE-SVC-57XVOCFNTLTR3Q27 {
>     numgen random mod 2 vmap { 0 : jump KUBE-SEP-FS3FUULGZPVD4VYB,
>                                1 : jump KUBE-SEP-MMFZROQSLQ3DKOQA }
> }
>
> In this rule, as far as I understood you last time, there is no way dynamically change elements of anonymous vmap. So if the service has large number of dynamic (short lived) endpoints, this rule will have to be reprogrammed for every change and it would be extremely inefficient. Is there any way to make it more dynamic or plans to change the static behavior? That would extremely important.

Consensus was that you should either copy the iptables solution for now (accepting the drawbacks I explained in my last mail) or go with replacing that rule for each added/removed node. You'll have to adjust both mapping contents and modulus value!

While it would be nice to have a better way of managing this load-balancing, I have no idea how one would ideally implement it. Feel free to file a ticket in netfilter bugzilla, but don't hold your breath for a quick solution.

Cheers, Phil
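Regenerating the rule on every endpoint change, as Phil suggests, means the modulus and the vmap contents have to be derived from the same list. A minimal sketch of that, with a hypothetical helper name build_lb_rule and the chain names taken from Serguei's example:

```python
def build_lb_rule(endpoint_chains):
    """Build the text of a numgen load-balancing rule for the given chains.

    The modulus always equals the number of chains, so the two can never
    drift apart when an endpoint is added or removed.
    """
    if not endpoint_chains:
        raise ValueError("need at least one endpoint chain")
    entries = ", ".join(
        f"{index} : jump {chain}" for index, chain in enumerate(endpoint_chains)
    )
    return f"numgen random mod {len(endpoint_chains)} vmap {{ {entries} }}"


if __name__ == "__main__":
    chains = ["KUBE-SEP-FS3FUULGZPVD4VYB", "KUBE-SEP-MMFZROQSLQ3DKOQA"]
    print(build_lb_rule(chains))
```

The resulting text would still have to be applied by replacing the existing rule in the service chain. How that is done (for example by rule handle) depends on the tooling around it and is left out of this sketch.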
Hello,

I came across a situation where I need to match against L4 proto (tcp/udp), L3 daddr and L4 port (port value) with a vmap.

The vmap looks like this:

map no-endpoints-services {
    type inet_proto . ipv4_addr . inet_service : verdict
}

I was wondering if somebody could come up with a single-line rule that references that vmap.

Thank you
Serguei
Hi Serguei,
On Wed, Dec 18, 2019 at 05:01:33PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> I came across a situation when I need to match against L4 proto (tcp/udp), L3 daddr and L4 port(port value) with vmap.
>
> Vmap looks like this:
>
> map no-endpoints-services {
> type inet_proto . ipv4_addr . inet_service : verdict
> }
>
> I was wondering if somebody could come up with a single line rule with reference to that vmap.
Should work using th header expression:
| ip protocol . ip daddr . th dport vmap @no-endpoints-services
Cheers, Phil
Error: syntax error, unexpected th

add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services
                                                                ^^

sbezverk@dev-ubuntu-1:mimic-filter$ sudo nft -v
nftables v0.9.1 (Headless Horseman)

Any clues? Am I using an old version?

Thank you
Serguei
On Wed, Dec 18, 2019 at 8:44 PM Serguei Bezverkhi (sbezverk)
<sbezverk@cisco.com> wrote:
>
> Error: syntax error, unexpected th
>
> add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services
> ^^
Try this:
... @th dport vmap ...
or
... @th,16,16 vmap ...
Still no luck.

@th,16,16:

Error: can not use variable sized data types (integer) in concat expressions
add rule ipv4table k8s-filter-services ip protocol . ip daddr . @th,16,16 vmap @no-endpoints-services
                                       ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^

Error: syntax error, unexpected dport, expecting comma
add rule ipv4table k8s-filter-services ip protocol . ip daddr . @th dport vmap @no-endpoints-services
                                                                    ^^^^^

In the first case the syntax looks OK, but I am not sure whether the vmap type needs to be changed:

type inet_proto . ipv6_addr . inet_service : verdict --> type inet_proto . ipv6_addr . integer : verdict ???

Thank you
Serguei
Hi,

On Wed, Dec 18, 2019 at 08:58:12PM +0100, Laura Garcia wrote:
> On Wed, Dec 18, 2019 at 8:44 PM Serguei Bezverkhi (sbezverk)
> <sbezverk@cisco.com> wrote:
> >
> > Error: syntax error, unexpected th
> >
> > add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services
> > ^^
>

The th header expression is available since v0.9.2, you'll have to update nftables to use it.

> Try this:
>
> ... @th dport vmap ...

Wrong syntax.

> or
>
> ... @th,16,16 vmap ...

This not working in concatenations was one of Florian's motivations to implement th expression, see a43a696443a15 ("proto: add pseudo th protocol to match d/sport in generic way") for details. :)

Cheers, Phil
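Since the availability of the th expression depends on the nft version, a tool that generates such rules may want to check the version string first. The sketch below parses output in the format shown earlier in this thread ("nftables v0.9.1 (Headless Horseman)"). The function names are hypothetical, and the default minimum of 0.9.2 is simply the version Phil names above; it is kept as a parameter because a given build or package may behave differently.

```python
import re


def parse_nft_version(output):
    """Extract the numeric version from a line like 'nftables v0.9.1 (...)'."""
    match = re.search(r"nftables v(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_th(output, minimum=(0, 9, 2)):
    """Return True if the reported version is at least `minimum`.

    The default minimum is an assumption taken from this thread, not a
    value verified against nftables release notes.
    """
    return parse_nft_version(output) >= minimum


if __name__ == "__main__":
    print(supports_th("nftables v0.9.1 (Headless Horseman)"))
    print(supports_th("nftables v0.9.2 (Scram)"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, for instance 0.9.10 versus 0.9.2.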
Hi Phil,

Not sure why, but even with 0.9.2 the "th" expression is not recognized.

error: syntax error, unexpected th
add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services
                                                                ^^
sbezverk@dev-ubuntu-1:mimic-filter$ sudo nft -version
nftables v0.9.2 (Scram)
sbezverk@dev-ubuntu-1:mimic-filter$

It seems 0.9.3 is out, but there is still no Debian package. Is it possible that it did not make it into 0.9.2?

Thank you
Serguei
Hi,
On Thu, Dec 19, 2019 at 02:59:01PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Not sure why, but even with 0.9.2 "th" expression is not recognized.
>
> error: syntax error, unexpected th
> add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services
> ^^
> sbezverk@dev-ubuntu-1:mimic-filter$ sudo nft -version
> nftables v0.9.2 (Scram)
> sbezverk@dev-ubuntu-1:mimic-filter$
>
> It seems 0.9.3 is out but still no Debian package. Is it possible it did not make it into 0.9.2?
Not sure what's missing on your end. I checked 0.9.2 tarball, at least
parser should understand the syntax.
Cheers, Phil
Hi Phil,

I built 0.9.3 and now it recognizes "th", but there is what I suspect is a cosmetic issue in the nft CLI output. Here is the command I used:

sudo nft --debug all add rule ipv4table k8s-filter-services ip protocol . ip daddr . th dport vmap @no-endpoints-services

It looks like it generates the expressions correctly:

ip ipv4table k8s-filter-services
  [ payload load 1b @ network header + 9 => reg 1 ]
  [ payload load 4b @ network header + 16 => reg 9 ]
  [ payload load 2b @ transport header + 2 => reg 10 ]
  [ lookup reg 1 set no-endpoints-services dreg 0 ]

But when I run "sudo nft list tables ipv4table", the rule is missing the third parameter:

table ip ipv4table {
    map no-endpoints-services {
        type inet_proto . ipv4_addr . inet_service : verdict
    }

    chain k8s-filter-services {
        ip protocol . ip daddr vmap @no-endpoints-services    <------------------- Missing " th dport"
    }
}

It seems to be just a cosmetic thing, but eventually it would be nice to have it fixed, if it has not been fixed already in the master branch. I am using the v0.9.3 branch.

Thank you
Serguei
While trying to add an element to the set, I am getting an error:

sudo nft --debug all add element ipv4table no-endpoints-services { tcp . 192.168.80.104 . 8989 : goto do_reject }

Error: Could not process rule: Invalid argument
add element ipv4table no-endpoints-services { tcp . 192.168.80.104 . 8989 : goto do_reject }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Am I doing anything wrong?

Thank you
Serguei
Hello,

Happy New Year!

I was wondering if there is a chance to fix the issue reported in my previous messages in this thread. With this type of vmap, kube-proxy's rules could be grouped together to make them more efficient.

Thank you
Serguei