* Operation not supported when adding jump command
@ 2019-11-25 18:55 Serguei Bezverkhi (sbezverk)
  2019-11-26 12:21 ` Florian Westphal
  2019-12-03 23:50 ` Duncan Roe
  0 siblings, 2 replies; 34+ messages in thread
From: Serguei Bezverkhi (sbezverk) @ 2019-11-25 18:55 UTC
To: Pablo Neira Ayuso, netfilter-devel

Hello Pablo,

Please see below table/chain/rules/sets I program, when I try to add jump from input-net, input-local to services it fails with "Operation not supported", I would appreciate if somebody could help to understand why:

sudo nft add rule ipv4table input-net jump services
Error: Could not process rule: Operation not supported
add rule ipv4table input-net jump services
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

table ip ipv4table {
    set no-endpoint-svc-ports {
        type inet_service
        elements = { 8080, 8989 }
    }

    set no-endpoint-svc-addrs {
        type ipv4_addr
        flags interval
        elements = { 10.1.1.1, 10.1.1.2 }
    }

    chain input-net {
        type nat hook prerouting priority filter; policy accept;
    }

    chain input-local {
        type nat hook output priority filter; policy accept;
    }

    chain services {
        ip daddr @no-endpoint-svc-addrs tcp dport @no-endpoint-svc-ports reject with tcp reset
        ip daddr @no-endpoint-svc-addrs udp dport @no-endpoint-svc-ports reject with icmp type net-unreachable
    }
}

Thank you
Serguei
* Re: Operation not supported when adding jump command
From: Florian Westphal @ 2019-11-26 12:21 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Pablo Neira Ayuso, netfilter-devel

Serguei Bezverkhi (sbezverk) <sbezverk@cisco.com> wrote:
> Hello Pablo,
>
> Please see below table/chain/rules/sets I program, when I try to add jump from input-net, input-local to services it fails with "Operation not supported", I would appreciate if somebody could help to understand why:
>
> sudo nft add rule ipv4table input-net jump services
> Error: Could not process rule: Operation not supported
> add rule ipv4table input-net jump services
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

iirc "reject" only works in input/forward/postrouting hooks.
* Re: Operation not supported when adding jump command
From: Serguei Bezverkhi (sbezverk) @ 2019-11-26 14:30 UTC
To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

Hello Florian,

Thank you very much for your reply. Once I changed to Input chain type, the rule worked. It seems iptables DO allow the same rule configuration see below:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-SERVICES -d 57.131.151.19/32 -p tcp -m comment --comment "default/portal:portal has no endpoints" -m tcp --dport 8989 -j REJECT --reject-with icmp-port-unreachable

This config is from working kubernetes cluster for the service which has no endpoints. Do you know if this change in behavior was a design decision or it is a bug?

Thank you
Serguei
* Re: Operation not supported when adding jump command
From: Florian Westphal @ 2019-11-26 14:52 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel

Serguei Bezverkhi (sbezverk) <sbezverk@cisco.com> wrote:
> Hello Florian,
>
> Thank you very much for your reply. Once I changed to Input chain type, the rule worked. It seems iptables DO allow the same rule configuration see below:
>
> -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
> -A KUBE-SERVICES -d 57.131.151.19/32 -p tcp -m comment --comment "default/portal:portal has no endpoints" -m tcp --dport 8989 -j REJECT --reject-with icmp-port-unreachable

No idea how this could work:

iptables -t nat -A PREROUTING -j REJECT
iptables: Invalid argument. Run `dmesg' for more information.

dmesg | tail -1
x_tables: ip_tables: REJECT target: only valid in filter

That check has been there since beginning of git history.
* Re: Operation not supported when adding jump command
From: Pablo Neira Ayuso @ 2019-11-26 15:38 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Florian Westphal, netfilter-devel

On Tue, Nov 26, 2019 at 02:30:02PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Hello Florian,
>
> Thank you very much for your reply. Once I changed to Input chain type, the rule worked. It seems iptables DO allow the same rule configuration see below:
>
> -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
> -A KUBE-SERVICES -d 57.131.151.19/32 -p tcp -m comment --comment "default/portal:portal has no endpoints" -m tcp --dport 8989 -j REJECT --reject-with icmp-port-unreachable

static struct xt_target reject_tg_reg __read_mostly = {
        .name           = "REJECT",
        .family         = NFPROTO_IPV4,
        .target         = reject_tg,
        .targetsize     = sizeof(struct ipt_reject_info),
        .table          = "filter",
        .hooks          = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD) |
                          (1 << NF_INET_LOCAL_OUT),
        .checkentry     = reject_tg_check,
        .me             = THIS_MODULE,
};
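
To make the .hooks field in the struct above concrete: it is a bitmask over the netfilter hook numbers, and the REJECT target is only accepted for hooks whose bit is set. The following is a minimal sketch of that test in Python, not the kernel's actual validation code. The hook numbers follow the kernel's enum nf_inet_hooks; the helper name hook_allowed is made up for illustration.

```python
# Hook numbers as defined by the kernel's enum nf_inet_hooks.
NF_INET_PRE_ROUTING = 0
NF_INET_LOCAL_IN = 1
NF_INET_FORWARD = 2
NF_INET_LOCAL_OUT = 3
NF_INET_POST_ROUTING = 4

# Same mask as the .hooks field of reject_tg_reg quoted above.
REJECT_HOOKS = (
    (1 << NF_INET_LOCAL_IN)
    | (1 << NF_INET_FORWARD)
    | (1 << NF_INET_LOCAL_OUT)
)


def hook_allowed(valid_hooks: int, hook: int) -> bool:
    """Hypothetical helper: True if bit `hook` is set in the bitmask `valid_hooks`."""
    return bool(valid_hooks & (1 << hook))


if __name__ == "__main__":
    hooks = [
        ("prerouting", NF_INET_PRE_ROUTING),
        ("input", NF_INET_LOCAL_IN),
        ("forward", NF_INET_FORWARD),
        ("output", NF_INET_LOCAL_OUT),
        ("postrouting", NF_INET_POST_ROUTING),
    ]
    for name, hook in hooks:
        print(f"{name}: {hook_allowed(REJECT_HOOKS, hook)}")
```

Under this simplified model, prerouting and postrouting fall outside the mask, which is consistent with the jump from a prerouting base chain being refused.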
* Re: Operation not supported when adding jump command
From: Serguei Bezverkhi (sbezverk) @ 2019-11-26 15:47 UTC
To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Hello,

I totally get it that it is not possible in theory, but the matter of fact is in kubernetes somehow it works, maybe in some cases this check is not enforced, I do not know. If you are interested to investigate it further, please let me know as I said I have a cluster with these 2 rules configured.

Thank you
Serguei
* Re: Operation not supported when adding jump command
From: Phil Sutter @ 2019-11-26 15:51 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hi Serguei,

On Tue, Nov 26, 2019 at 03:47:49PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> I totally get it that it is not possible in theory, but the matter of fact is in kubernetes somehow it works, maybe in some cases this check is not enforced, I do not know. If you are interested to investigate it further, please let me know as I said I have a cluster with these 2 rules configured.

In another case I noticed that user-defined chains are a way to circumvent these types of functional restrictions. If that's good or bad is up to you to decide. ;)

Regarding the desired functionality, I guess you're wandering the sinkhole-filled plains of undefined behaviour.

Cheers, Phil
* Re: Operation not supported when adding jump command
From: Serguei Bezverkhi (sbezverk) @ 2019-11-26 18:47 UTC
To: Phil Sutter; +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Ok, I guess I will work around by using input and output chain types, even though it will raise some brows in k8s networking community.

I have a second issue I am struggling to solve with nftables. Here is a service exposed for tcp port 80 which has 2 corresponding backends listening on a container port 8080.

!
! Backend 1
!
-A KUBE-SEP-FS3FUULGZPVD4VYB -s 57.112.0.247/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-FS3FUULGZPVD4VYB -p tcp -m tcp -j DNAT --to-destination 57.112.0.247:8080
!
! Backend 2
!
-A KUBE-SEP-MMFZROQSLQ3DKOQA -s 57.112.0.248/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-MMFZROQSLQ3DKOQA -p tcp -m tcp -j DNAT --to-destination 57.112.0.248:8080
!
! Service
!
-A KUBE-SERVICES -d 57.142.221.21/32 -p tcp -m comment --comment "default/app:http-web cluster IP" -m tcp --dport 80 -j KUBE-SVC-57XVOCFNTLTR3Q27
!
! Load balancing between 2 backends
!
-A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
-A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-MMFZROQSLQ3DKOQA

I am looking for nftables equivalent for the load balancing part and also in this case there are double dnat translation, destination port from 80 to 8080 and destination IP: 57.112.0.247 or 57.112.0.248.
Can it be expressed in a single nft dnat statement with vmaps or sets?

Thank you
Serguei
* Re: Operation not supported when adding jump command
From: Phil Sutter @ 2019-11-26 19:27 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hi,

On Tue, Nov 26, 2019 at 06:47:09PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Ok, I guess I will work around by using input and output chain types, even though it will raise some brows in k8s networking community.
>
> I have a second issue I am struggling to solve with nftables. Here is a service exposed for tcp port 80 which has 2 corresponding backends listening on a container port 8080.
>
> !
> ! Backend 1
> !
> -A KUBE-SEP-FS3FUULGZPVD4VYB -s 57.112.0.247/32 -j KUBE-MARK-MASQ
> -A KUBE-SEP-FS3FUULGZPVD4VYB -p tcp -m tcp -j DNAT --to-destination 57.112.0.247:8080
> !
> ! Backend 2
> !
> -A KUBE-SEP-MMFZROQSLQ3DKOQA -s 57.112.0.248/32 -j KUBE-MARK-MASQ
> -A KUBE-SEP-MMFZROQSLQ3DKOQA -p tcp -m tcp -j DNAT --to-destination 57.112.0.248:8080
> !
> ! Service
> !
> -A KUBE-SERVICES -d 57.142.221.21/32 -p tcp -m comment --comment "default/app:http-web cluster IP" -m tcp --dport 80 -j KUBE-SVC-57XVOCFNTLTR3Q27
> !
> ! Load balancing between 2 backends
> !
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-FS3FUULGZPVD4VYB
> -A KUBE-SVC-57XVOCFNTLTR3Q27 -j KUBE-SEP-MMFZROQSLQ3DKOQA
>
> I am looking for nftables equivalent for the load balancing part and also in this case there are double dnat translation, destination port from 80 to 8080 and destination IP: 57.112.0.247 or 57.112.0.248.
> Can it be expressed in a single nft dnat statement with vmaps or sets?

Regarding xt_statistic replacement, I once identified the equivalent of '-m statistic --mode random --probability 0.5' would be 'numgen random mod 0x2 < 0x1'.

Keeping both target address and port in a single map for *NAT statements is not possible AFAIK.

If I'm not mistaken, you might be able to hook up a vmap together with the numgen expression above like so:

| numgen random mod 0x2 vmap { \
|       0x0: jump KUBE-SEP-FS3FUULGZPVD4VYB, \
|       0x1: jump KUBE-SEP-MMFZROQSLQ3DKOQA }

Pure speculation, though. :)

Cheers, Phil
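
To see why the two approaches above balance the same way, here is a small sketch in Python with exact fractions. It compares a chain of sequential probabilistic rules (as in the KUBE-SVC chain, where the last rule is an unconditional fallthrough) with a single uniform pick over n keys, as 'numgen random mod n vmap' would do. The generalisation to n backends with per-rule probabilities 1/n, 1/(n-1), ..., 1 is an assumption for illustration; the thread only shows the two-backend case. The function names are hypothetical, and the numgen model assumes an idealised uniform generator.

```python
from fractions import Fraction
from typing import List


def sequential_probabilities(n: int) -> List[Fraction]:
    """Per-rule match probabilities 1/n, 1/(n-1), ..., 1 for n backends (assumed scheme)."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(rule_probs: List[Fraction]) -> List[Fraction]:
    """Overall share of traffic each rule receives when rules are tried in order."""
    shares = []
    remaining = Fraction(1)  # fraction of traffic that reaches the current rule
    for p in rule_probs:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


def numgen_mod_share(n: int) -> List[Fraction]:
    """Share per vmap key for a uniform pick over n keys (idealised numgen model)."""
    return [Fraction(1, n)] * n


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, effective_share(sequential_probabilities(n)), numgen_mod_share(n))
```

For the two-backend case in this thread, the first rule takes half of the traffic and the fallthrough rule takes the remaining half, which matches a vmap with keys 0 and 1.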
* Re: Operation not supported when adding jump command
From: Serguei Bezverkhi (sbezverk) @ 2019-11-26 21:20 UTC
To: Phil Sutter; +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hello Phil,

It almost worked. Check this out:

sudo nft list table ipv4table
table ip ipv4table {
    set no-endpoint-svc-ports {
        type inet_service
        elements = { 8080, 8989 }
    }

    set no-endpoint-svc-addrs {
        type ipv4_addr
        flags interval
        elements = { 10.1.1.1, 10.1.1.2 }
    }

    chain input-net {
        type nat hook input priority filter; policy accept;
        jump services
    }

    chain input-local {
        type nat hook output priority filter; policy accept;
        jump services
    }

    chain services {
        ip daddr @no-endpoint-svc-addrs tcp dport @no-endpoint-svc-ports reject with tcp reset
        ip daddr @no-endpoint-svc-addrs udp dport @no-endpoint-svc-ports reject with icmp type net-unreachable
    }

    chain svc1-endpoint-1 {
        ip protocol tcp dnat to 12.1.1.1:8080
    }

    chain svc1-endpoint-2 {
        ip protocol tcp dnat to 12.1.1.2:8080
    }

    chain svc2-endpoint-1 {
        ip protocol tcp dnat to 12.1.1.3:8090
    }

    chain svc2-endpoint-2 {
        ip protocol tcp dnat to 12.1.1.4:8090
    }

    chain svc1 {
    }

    chain svc2 {
    }

    chain prerouting {
        type nat hook prerouting priority filter; policy accept;
        ip daddr 1.1.1.1 tcp dport 88 numgen random mod 2 vmap { 0 : jump svc1-endpoint-1, 1 : jump svc1-endpoint-2 }
        ip daddr 2.2.2.2 tcp dport 99 numgen random mod 2 vmap { 0 : jump svc2-endpoint-1, 1 : jump svc2-endpoint-2 }
    }
}

Ideally I need to apply this rule "numgen random mod 2 vmap { 0 : jump svc1-endpoint-1, 1 : jump svc1-endpoint-2 }" to svc1 and svc2 chains to load balance between services' endpoints but when I do that it fails with Unsupported operation.
In contrast it let me apply this rule to prerouting chain.

This split support of reject in input/forward/output and numgen only in prerouting is not ideal as a packet for a client of a service without registered endpoint will need to go through all checks in prerouting chain before it reaches input chain and get its reject back.

Thank you very much for your help.
Serguei
* Re: Operation not supported when adding jump command
From: Phil Sutter @ 2019-11-26 22:15 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hi,

On Tue, Nov 26, 2019 at 09:20:20PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> It almost worked. Check this out:
> sudo nft list table ipv4table
> table ip ipv4table {
>     set no-endpoint-svc-ports {
>         type inet_service
>         elements = { 8080, 8989 }
>     }
>
>     set no-endpoint-svc-addrs {
>         type ipv4_addr
>         flags interval
>         elements = { 10.1.1.1, 10.1.1.2 }
>     }
>
>     chain input-net {
>         type nat hook input priority filter; policy accept;
>         jump services
>     }
>
>     chain input-local {
>         type nat hook output priority filter; policy accept;
>         jump services
>     }
>
>     chain services {
>         ip daddr @no-endpoint-svc-addrs tcp dport @no-endpoint-svc-ports reject with tcp reset
>         ip daddr @no-endpoint-svc-addrs udp dport @no-endpoint-svc-ports reject with icmp type net-unreachable
>     }
>
>     chain svc1-endpoint-1 {
>         ip protocol tcp dnat to 12.1.1.1:8080
>     }
>
>     chain svc1-endpoint-2 {
>         ip protocol tcp dnat to 12.1.1.2:8080
>     }
>
>     chain svc2-endpoint-1 {
>         ip protocol tcp dnat to 12.1.1.3:8090
>     }
>
>     chain svc2-endpoint-2 {
>         ip protocol tcp dnat to 12.1.1.4:8090
>     }
>
>     chain svc1 {
>     }
>
>     chain svc2 {
>     }
>
>     chain prerouting {
>         type nat hook prerouting priority filter; policy accept;
>         ip daddr 1.1.1.1 tcp dport 88 numgen random mod 2 vmap { 0 : jump svc1-endpoint-1, 1 : jump svc1-endpoint-2 }
>         ip daddr 2.2.2.2 tcp dport 99 numgen random mod 2 vmap { 0 : jump svc2-endpoint-1, 1 : jump svc2-endpoint-2 }
>     }
> }
>
> Ideally I need to apply this rule "numgen random mod 2 vmap { 0 : jump svc1-endpoint-1, 1 : jump svc1-endpoint-2 }" to svc1 and svc2 chains to load balance between services' endpoints but when I do that it fails with Unsupported operation.
> In contrast it let me apply this rule to prerouting chain.

I don't see where you jump to svc1/svc2 so this is a bit of guesswork. Anyway, please keep in mind that dnat is only supported from nat chains (and only in prerouting or output).

> This split support of reject in input/forward/output and numgen only in prerouting is not ideal as a packet for a client of a service without registered endpoint will need to go through all checks in prerouting chain before it reaches input chain and get its reject back.

As said, it is dnat which is limited to prerouting. Numgen itself works everywhere.

If there is a known criterion identifying a client without registered endpoint, you could match on that and 'accept' early in prerouting. This will make the packet go to input/forward directly without traversing the remaining prerouting rules.

Cheers, Phil
* Re: Operation not supported when adding jump command
From: Arturo Borrero Gonzalez @ 2019-11-27 10:11 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Phil Sutter, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

On 11/26/19 10:20 PM, Serguei Bezverkhi (sbezverk) wrote:
> On Tue, Nov 26, 2019 at 06:47:09PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> > Ok, I guess I will work around by using input and output chain types, even though it will raise some brows in k8s networking community.

@Sergei, thanks for reaching out about this topic.

I'm using k8s a lot lately and would be interested in knowing more about what you are trying to do with kubernetes and nftables.

In any case, if somebody in kubernetes is planning to introduce nft for kube-proxy or another component, I would suggest the generated ruleset is validated here to really benefit from nftables. This is what you are doing, right?

Recently I had the chance to attend a talk by @Laura (in CC) about the iptables ruleset generated by docker and kube-proxy. Such rulesets are the opposite of something meant to scale and perform well. Then people compare such rulesets with other networking setups... an unfair comparison.

Worth mentioning at this point this PoC too:

https://github.com/zevenet/kube-nftlb

Trying to mimic 1:1 what iptables was doing is a mistake from my point of view. I believe you are aware of this already :-)

> Keeping both target address and port in a single map for *NAT statements
> is not possible AFAIK.

@Phil, I think it is possible! Examples in the wiki:

https://wiki.nftables.org/wiki-nftables/index.php/Multiple_NATs_using_nftables_maps

It would be something like:

% nft add rule nat prerouting dnat \
      tcp dport map { 1000 : 1.1.1.1, 2000 : 2.2.2.2, 3000 : 3.3.3.3} \
      : tcp dport map { 1000 : 1234, 2000 : 2345, 3000 : 3456 }

> If I'm not mistaken, you might be able to hook up a vmap together with
> the numgen expression above like so:
>
> | numgen random mod 0x2 vmap { \
> |       0x0: jump KUBE-SEP-FS3FUULGZPVD4VYB, \
> |       0x1: jump KUBE-SEP-MMFZROQSLQ3DKOQA }
>
> Pure speculation, though. :)

This works indeed. Just added the example to the wiki:

https://wiki.nftables.org/wiki-nftables/index.php/Load_balancing#Round_Robin
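
As a rough mental model of the two-map dnat rule above, both maps are keyed on the same TCP destination port: one yields the new destination address, the other the new destination port. The sketch below models that with plain Python dictionaries, reusing the values from the example. The function name dnat_target is hypothetical, and returning None for a port missing from the maps is a modelling assumption meaning "the rule does not apply", not a statement about nft internals.

```python
from typing import Optional, Tuple

# Same key/value pairs as the two maps in the nft example above.
ADDR_MAP = {1000: "1.1.1.1", 2000: "2.2.2.2", 3000: "3.3.3.3"}
PORT_MAP = {1000: 1234, 2000: 2345, 3000: 3456}


def dnat_target(tcp_dport: int) -> Optional[Tuple[str, int]]:
    """Hypothetical model: look up the new (address, port) for a destination port.

    Returns None when the port is missing from either map, i.e. the rule
    is treated as not applying to that packet.
    """
    if tcp_dport not in ADDR_MAP or tcp_dport not in PORT_MAP:
        return None
    return ADDR_MAP[tcp_dport], PORT_MAP[tcp_dport]


if __name__ == "__main__":
    for dport in (1000, 2000, 3000, 4000):
        print(dport, dnat_target(dport))
```

The point of the model is only that address and port are resolved by two independent lookups on the same key, so a single dnat statement can rewrite both.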
* Re: Operation not supported when adding jump command
From: Phil Sutter @ 2019-11-27 11:57 UTC
To: Arturo Borrero Gonzalez; +Cc: Serguei Bezverkhi (sbezverk), Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Arturo,

On Wed, Nov 27, 2019 at 11:11:32AM +0100, Arturo Borrero Gonzalez wrote:
> On 11/26/19 10:20 PM, Serguei Bezverkhi (sbezverk) wrote:
> > On Tue, Nov 26, 2019 at 06:47:09PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> > > Ok, I guess I will work around by using input and output chain types, even though it will raise some brows in k8s networking community.
>
> @Sergei, thanks for reaching out about this topic.
>
> I'm using k8s a lot lately and would be interested in knowing more about what
> you are trying to do with kubernetes and nftables.
>
> In any case, if somebody in kubernetes is planning to introduce nft for
> kube-proxy or another component, I would suggest the generated ruleset is
> validated here to really benefit from nftables. This is what you are doing, right?
>
> Recently I had the chance to attend a talk by @Laura (in CC) about the iptables
> ruleset generated by docker and kube-proxy. Such rulesets are the opposite of
> something meant to scale and perform well. Then people compare such rulesets
> with other networking setups... an unfair comparison.
>
> Worth mentioning at this point this PoC too:
>
> https://github.com/zevenet/kube-nftlb
>
> Trying to mimic 1:1 what iptables was doing is a mistake from my point of view.
> I believe you are aware of this already :-)
>
> > Keeping both target address and port in a single map for *NAT statements
> > is not possible AFAIK.
>
> @Phil, I think it is possible! Examples in the wiki:
>
> https://wiki.nftables.org/wiki-nftables/index.php/Multiple_NATs_using_nftables_maps
>
> It would be something like:
>
> % nft add rule nat prerouting dnat \
>       tcp dport map { 1000 : 1.1.1.1, 2000 : 2.2.2.2, 3000 : 3.3.3.3} \
>       : tcp dport map { 1000 : 1234, 2000 : 2345, 3000 : 3456 }

Ah, thanks! Using two maps didn't come to mind.

> > If I'm not mistaken, you might be able to hook up a vmap together with
> > the numgen expression above like so:
> >
> > | numgen random mod 0x2 vmap { \
> > |       0x0: jump KUBE-SEP-FS3FUULGZPVD4VYB, \
> > |       0x1: jump KUBE-SEP-MMFZROQSLQ3DKOQA }
> >
> > Pure speculation, though. :)
>
> This works indeed. Just added the example to the wiki:
>
> https://wiki.nftables.org/wiki-nftables/index.php/Load_balancing#Round_Robin

Thanks, Phil
* Re: Operation not supported when adding jump command
From: Serguei Bezverkhi (sbezverk) @ 2019-11-27 14:36 UTC
To: Arturo Borrero Gonzalez; +Cc: Phil Sutter, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hello Arturo,

Thanks a lot for your reply, my ultimate goal is to develop kube-proxy which is building nftables rules instead of iptables, in addition the goal is to use direct API calls to netlink without any external dependencies and of course to try to leverage nftables' advanced features to achieve the best performance.

I am in the process of identifying gaps in functionality available in github.com/google/nftables and github.com/sbezverk/nftableslib libraries, for example, yesterday I found out that neither of these libraries supports "numgen", which would be a mandatory feature to support load balancing between service's multiple end points. I will have to add it to both to be able to move forward.
I use iptables from a working cluster and try to build a code which would program nftables the same way (with optimization). Once it is done, then it can be arranged into a controller listening for svc/endpoints and program into nftables accordingly.

I am looking for people interested in the same topic to be able to discuss different approaches, like it was done yesterday with Phil and select the best approach to make nftables shine.

Please let me know if you are interested in further discussions.

Thank you
Serguei
* Re: Operation not supported when adding jump command
From: Phil Sutter @ 2019-11-27 15:08 UTC
To: Serguei Bezverkhi (sbezverk); +Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Serguei,

On Wed, Nov 27, 2019 at 02:36:07PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Thanks a lot for your reply, my ultimate goal is to develop kube-proxy which is building nftables rules instead of iptables, in addition the goal is to use direct API calls to netlink without any external dependencies and of course to try to leverage nftables' advanced features to achieve the best performance.
>
> I am in the process of identifying gaps in functionality available in github.com/google/nftables and github.com/sbezverk/nftableslib libraries, for example, yesterday I found out that neither of these libraries supports "numgen", which would be a mandatory feature to support load balancing between service's multiple end points. I will have to add it to both to be able to move forward.
> I use iptables from a working cluster and try to build a code which would program nftables the same way (with optimization). Once it is done, then it can be arranged into a controller listening for svc/endpoints and program into nftables accordingly.
>
> I am looking for people interested in the same topic to be able to discuss different approaches, like it was done yesterday with Phil and select the best approach to make nftables shine.
>
> Please let me know if you are interested in further discussions.

Yes, we're definitely interested in further discussion/cooperation. You're using the JSON API for nftableslib, right?

Cheers, Phil
* Re: Operation not supported when adding jump command 2019-11-27 15:08 ` Phil Sutter @ 2019-11-27 15:35 ` Serguei Bezverkhi (sbezverk) 2019-11-27 16:06 ` Phil Sutter 0 siblings, 1 reply; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-27 15:35 UTC (permalink / raw) To: Phil Sutter Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Phil,

No, I do not; nftableslib talks directly to netlink.

nftableslib offers an API which allows creating tables/chains/rules and exposes an interface which looks similar to k8s client-go. If you check https://github.com/sbezverk/nftableslib/blob/master/cmd/e2e/e2e.go it will give you a good idea of how it operates.

The reason for going in this direction is performance. For a relatively static application like a firewall, the JSON approach is great, but for an application like kube-proxy, where hundreds or even thousands of service/endpoint events happen, I do not believe JSON is the right approach. When I talked to the API machinery folks I was given 5k events per second as a target.

Thank you
Serguei

On 2019-11-27, 10:09 AM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-27 15:35 ` Serguei Bezverkhi (sbezverk) @ 2019-11-27 16:06 ` Phil Sutter 2019-11-27 16:50 ` Serguei Bezverkhi (sbezverk) 0 siblings, 1 reply; 34+ messages in thread From: Phil Sutter @ 2019-11-27 16:06 UTC (permalink / raw) To: Serguei Bezverkhi (sbezverk) Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia Hi, On Wed, Nov 27, 2019 at 03:35:04PM +0000, Serguei Bezverkhi (sbezverk) wrote: > No, I do not, nftableslib talks directly talk to netlink connection. > > nftableslib offers an API which allows create tables/chains/rules and exposes an interface which looks similar to k8s client-go. If you check https://github.com/sbezverk/nftableslib/blob/master/cmd/e2e/e2e.go > > It will give you a good idea how it operates. > > The reason for going in this direction is performance, for a relatively static applications like a firewall, json approach is great, but for applications like a kube-proxy where hundreds or even thousands of service/endpoint events happen, I do not believe json is a right approach. When I talked to api machinery folks I was given 5k events per second as a target. So you're bypassing both libnftables and libnftnl. Those 5k events per second are a benchmark, not an expected load, right? While you're obviously searching for the most performance, the drawback is complexity. Using JSON (and thereby libnftables and libnftnl as backends) a task like utilizing numgen expression is relatively simple. A problem you won't get rid of with the move from iptables to nftables is concurrent use: The "let's insert our rules on top" approach to dealing with an existing ruleset or other users is obviously not the best one. 
I guess you're aiming at dedicated applications where this is not an issue, but for "general purpose" applications I guess a k8s backend communicating with firewalld would be a good way of customizing the host's firewall setup without stepping on others' toes. Back to topic: are you creating a static ruleset based on the iptables one you got, for simple comparison tests, or are you already past that? If not, I guess it would be a good basis for high-level ruleset optimization discussions. Cheers, Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-27 16:06 ` Phil Sutter @ 2019-11-27 16:50 ` Serguei Bezverkhi (sbezverk) 2019-11-27 17:22 ` Phil Sutter 0 siblings, 1 reply; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-27 16:50 UTC (permalink / raw) To: Phil Sutter Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi,

According to the API folks, kube-proxy must sustain a test at about 5k events per second, otherwise it will never see a production environment. Implementing the numgen expression is relatively simple thanks to "nft --debug all"; once it's done, a user can use it as easily as with JSON.

Regarding concurrent usage: since my primary goal is kube-proxy, I do not really care at this moment, as a k8s cluster is not an application you co-locate in production with other applications potentially altering the host's tables. I agree firewalld might be an interesting and more generic alternative, but seeing how quickly things are done in k8s, maybe it will be done by the end of the 21st century.

Once I get the filter chain portion into the code, I will share a link to the repo so you can review it.

Thanks a lot for this discussion, very useful.
Serguei

On 2019-11-27, 11:08 AM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-27 16:50 ` Serguei Bezverkhi (sbezverk) @ 2019-11-27 17:22 ` Phil Sutter 2019-11-28 1:22 ` Serguei Bezverkhi (sbezverk) 0 siblings, 1 reply; 34+ messages in thread From: Phil Sutter @ 2019-11-27 17:22 UTC (permalink / raw) To: Serguei Bezverkhi (sbezverk) Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia Hi, On Wed, Nov 27, 2019 at 04:50:56PM +0000, Serguei Bezverkhi (sbezverk) wrote: > According to api folks kube-proxy must sustain 5k or about test otherwise it will never see production environment. Implementing of numgen expression is relatively simple, thanks to "nft --debug all" once it's done, a user can use it as easily as with json __ > > Regarding concurrent usage, since my primary goal is kube-proxy I do not really care at this moment, as k8s cluster is not an application you co-locate in production with some other applications potentially altering host's tables. I agree firewalld might be interesting and more generic alternative, but seeing how quickly things are done in k8s, maybe it will be done by the end of 21st century __ I agree, in dedicated setup there's no need for compromises. I guess if you manage to reduce ruleset changes to mere set element modifications, you could outperform iptables in that regard. Run-time performance of the resulting ruleset will obviously benefit from set/map use as there are much fewer rules to traverse for each packet. > Once I get filter chain portion in the code I will share a link to repo so you could review. Thanks! I'm also interested in seeing whether there are any inconveniences due to nftables limitations. Maybe some problems are easier solved on kernel-side. Cheers, Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
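The idea of reducing ruleset changes to mere set element modifications can be sketched roughly as follows. This is a minimal Python model and not nftableslib code: the name `diff_elements` and the `(address, port)` tuple representation are assumptions made only for illustration. It computes which elements would have to be added and which deleted to move a map from its current content to the desired one, while the rules referencing the map stay untouched.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of one map key: (service IP, service port).
Element = Tuple[str, int]


def diff_elements(
    current: Iterable[Element], desired: Iterable[Element]
) -> Tuple[List[Element], List[Element]]:
    """Return (to_add, to_delete) so that applying both turns current into desired."""
    cur: Set[Element] = set(current)
    want: Set[Element] = set(desired)
    # Elements only in the desired state must be added,
    # elements only in the current state must be deleted.
    return sorted(want - cur), sorted(cur - want)


if __name__ == "__main__":
    current = [("192.168.80.104", 8989), ("57.131.151.19", 8989)]
    desired = [("192.168.80.104", 8989), ("57.131.151.19", 8080)]
    to_add, to_delete = diff_elements(current, desired)
    print("elements to add:   ", to_add)
    print("elements to delete:", to_delete)
```

With this shape, a service/endpoint event would translate into a handful of element additions and deletions rather than rule insertions and removals, which is the property discussed above.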
* Re: Operation not supported when adding jump command 2019-11-27 17:22 ` Phil Sutter @ 2019-11-28 1:22 ` Serguei Bezverkhi (sbezverk) 2019-11-28 9:10 ` Laura Garcia 2019-11-28 13:08 ` Phil Sutter 0 siblings, 2 replies; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-28 1:22 UTC (permalink / raw) To: Phil Sutter Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hello Phil,

Please see below the list of nftables rules the code generates to mimic only the filter chain portion of kube-proxy.

Here is the location of the code programming these rules:
https://github.com/sbezverk/nftableslib-samples/blob/master/proxy/mimic-filter/mimic-filter.go

Most of the rules are static and will be programmed just once when the proxy comes up, with the exception of 2 rules in the k8s-filter-services chain. The reference to the list of ports can change. Ideally it would be great to express these two rules with a single rule and a vmap, where the key must be the service's IP AND the service port, as it is possible to have a single service IP associated with several ports, where some of these ports have an endpoint and some do not. So far I could not figure it out. I appreciate your thoughts/suggestions/criticism. If you could file an issue for anything you feel needs to be discussed, that would be great.

sudo nft list table ipv4table
table ip ipv4table {
        set svc1-no-endpoints {
                type inet_service
                elements = { 8989 }
        }

        chain filter-input {
                type filter hook input priority filter; policy accept;
                ct state new jump k8s-filter-services
                jump k8s-filter-firewall
        }

        chain filter-output {
                type filter hook output priority filter; policy accept;
                ct state new jump k8s-filter-services
                jump k8s-filter-firewall
        }

        chain filter-forward {
                type filter hook forward priority filter; policy accept;
                jump k8s-filter-forward
                ct state new jump k8s-filter-services
        }

        chain k8s-filter-ext-services {
        }

        chain k8s-filter-firewall {
                meta mark 0x00008000 drop
        }

        chain k8s-filter-services {
                ip daddr 192.168.80.104 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
                ip daddr 57.131.151.19 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
        }

        chain k8s-filter-forward {
                ct state invalid drop
                meta mark 0x00004000 accept
                ip saddr 57.112.0.0/12 ct state established,related accept
                ip daddr 57.112.0.0/12 ct state established,related accept
        }
}

Thank you
Serguei

On 2019-11-27, 12:22 PM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-28 1:22 ` Serguei Bezverkhi (sbezverk) @ 2019-11-28 9:10 ` Laura Garcia 2019-11-28 11:58 ` Serguei Bezverkhi (sbezverk) 2019-11-28 13:08 ` Phil Sutter 1 sibling, 1 reply; 34+ messages in thread From: Laura Garcia @ 2019-11-28 9:10 UTC (permalink / raw) To: Serguei Bezverkhi (sbezverk) Cc: Phil Sutter, Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hi,

I guess we had a very similar conversation with the sig-network guys. Please see some comments below.

On Thu, Nov 28, 2019 at 2:22 AM Serguei Bezverkhi (sbezverk) <sbezverk@cisco.com> wrote:
>
> Hello Phil,
>
> Please see below the list of nftables rules the code generate to mimic only filter chain portion of kube proxy.
>
> Here is the location of code programming these rules.
> https://github.com/sbezverk/nftableslib-samples/blob/master/proxy/mimic-filter/mimic-filter.go
>
> Most of rules are static, will be programed just once when proxy comes up, with the exception is 2 rules in k8s-filter-services chain. The reference to the list of ports can change. Ideally it would be great to express these two rules with a single rule and a vmap, where the key must be service's ip AND service port, as it is possible to have a single service IP that can be associated with several ports and some of these ports might have an endpoint and some do not. So far I could not figure it out. Appreciate your thought/suggestions/critics. If you could file an issue for anything you feel needs to be discussed, that would be great.
>
>
> sudo nft list table ipv4table
> table ip ipv4table {
>         set svc1-no-endpoints {
>                 type inet_service
>                 elements = { 8989 }
>         }
>
>         chain filter-input {
>                 type filter hook input priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }
>
>         chain filter-output {
>                 type filter hook output priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }
>
>         chain filter-forward {
>                 type filter hook forward priority filter; policy accept;
>                 jump k8s-filter-forward
>                 ct state new jump k8s-filter-services
>         }
>
>         chain k8s-filter-ext-services {
>         }
>
>         chain k8s-filter-firewall {
>                 meta mark 0x00008000 drop
>         }
>
>         chain k8s-filter-services {
>                 ip daddr 192.168.80.104 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
>                 ip daddr 57.131.151.19 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
>         }
>

Here you're going to have the same problems as with iptables: lack of scalability and complexity during rule removal. In nftlb we create maps, so the rules stay the same and you only have to take care of inserting and removing elements in them. Some extensive examples are here: https://github.com/zevenet/nftlb/tree/master/tests

Regarding the ip : port natting, it is not possible to use 2 maps, because you need to generate a numgen for each one and they will come out with different numbers.

Cheers.

>         chain k8s-filter-forward {
>                 ct state invalid drop
>                 meta mark 0x00004000 accept
>                 ip saddr 57.112.0.0/12 ct state established,related accept
>                 ip daddr 57.112.0.0/12 ct state established,related accept
>         }
> }
>
> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
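The remark about two maps and numgen can be illustrated with a small model. This is only a sketch in Python, assuming the random mode of numgen; the endpoint list and the function names are hypothetical and nothing here reflects kernel internals. Drawing one number for the address lookup and a second, independent number for the port lookup can pair the address of one endpoint with the port of another, whereas a single draw that selects address and port together cannot.

```python
import random

# Hypothetical endpoints of one service: (address, port) pairs.
ENDPOINTS = [("10.0.0.1", 8080), ("10.0.0.2", 9090), ("10.0.0.3", 7070)]


def pick_with_two_draws(rng: random.Random):
    """Two lookups, each fed by its own random draw (like two numgen expressions)."""
    addr = ENDPOINTS[rng.randrange(len(ENDPOINTS))][0]
    port = ENDPOINTS[rng.randrange(len(ENDPOINTS))][1]
    return addr, port


def pick_with_one_draw(rng: random.Random):
    """A single draw that selects address and port together."""
    return ENDPOINTS[rng.randrange(len(ENDPOINTS))]


def count_mismatches(picker, rounds: int = 1000, seed: int = 1) -> int:
    """Count how often the picked (address, port) pair is not a real endpoint."""
    rng = random.Random(seed)
    return sum(picker(rng) not in ENDPOINTS for _ in range(rounds))


if __name__ == "__main__":
    print("mismatches with two draws:", count_mismatches(pick_with_two_draws))
    print("mismatches with one draw: ", count_mismatches(pick_with_one_draw))
```

In this model the single-draw variant never produces a pair that is not a configured endpoint, which is why address and port have to be selected by one number rather than by two.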
* Re: Operation not supported when adding jump command 2019-11-28 9:10 ` Laura Garcia @ 2019-11-28 11:58 ` Serguei Bezverkhi (sbezverk) 0 siblings, 0 replies; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-28 11:58 UTC (permalink / raw) To: Laura Garcia Cc: Phil Sutter, Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel

Hello Laura,

Thank you for your comments and the link. Maybe I have not reached that point yet, but I do not see the complexity in rule removal. Maybe it is because, in the JSON case, the rule's handle is not reported back to the caller? In my case, since a rule gets created individually and directly, I get back the uint64 rule handle, which can easily be associated with a service or endpoint; the same goes for set/map/vmap. For any further changes to a service, like removal or adding/deleting endpoints, the rule handle (or handles) will be available. I do not have the code for this part ready yet, but I have done a rule update by using the rule's handle for other things and it worked.

Thanks again for your feedback. Once I have the code for rule management, I will ask you to review it, if you do not mind.

Serguei

On 2019-11-28, 4:10 AM, "Laura Garcia" <nevola@gmail.com> wrote:

> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-28 1:22 ` Serguei Bezverkhi (sbezverk) 2019-11-28 9:10 ` Laura Garcia @ 2019-11-28 13:08 ` Phil Sutter 2019-11-28 13:34 ` Serguei Bezverkhi (sbezverk) 2019-11-28 14:51 ` Serguei Bezverkhi (sbezverk) 1 sibling, 2 replies; 34+ messages in thread From: Phil Sutter @ 2019-11-28 13:08 UTC (permalink / raw) To: Serguei Bezverkhi (sbezverk) Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Serguei,

On Thu, Nov 28, 2019 at 01:22:17AM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Please see below the list of nftables rules the code generate to mimic only filter chain portion of kube proxy.
>
> Here is the location of code programming these rules.
> https://github.com/sbezverk/nftableslib-samples/blob/master/proxy/mimic-filter/mimic-filter.go
>
> Most of rules are static, will be programed just once when proxy comes up, with the exception is 2 rules in k8s-filter-services chain. The reference to the list of ports can change. Ideally it would be great to express these two rules with a single rule and a vmap, where the key must be service's ip AND service port, as it is possible to have a single service IP that can be associated with several ports and some of these ports might have an endpoint and some do not. So far I could not figure it out. Appreciate your thought/suggestions/critics. If you could file an issue for anything you feel needs to be discussed, that would be great.

What about something like this:

| table ip t {
|         map m {
|                 type ipv4_addr . inet_service : verdict
|                 elements = { 192.168.80.104 . 8989 : goto do_reject }
|         }
|
|         chain c {
|                 ip daddr . tcp dport vmap @m
|         }
|
|         chain do_reject {
|                 reject with icmp type host-unreachable
|         }
| }

For unknown reasons reject statement can't be used directly in a verdict map, but the do_reject chain hack works.

> sudo nft list table ipv4table
> table ip ipv4table {
>         set svc1-no-endpoints {
>                 type inet_service
>                 elements = { 8989 }
>         }
>
>         chain filter-input {
>                 type filter hook input priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }
>
>         chain filter-output {
>                 type filter hook output priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }

Same ruleset for input and output? Seems weird given the daddr-based filtering in k8s-filter-services.

Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
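The verdict map suggested above can be pictured as a dictionary keyed by the concatenation of destination address and destination port. The following is a minimal Python model, not nftables code: the names `VerdictMap`, `no_endpoint_services` and `lookup_verdict` are assumptions for illustration, and "continue" is used here to model a lookup miss, where evaluation simply moves on to the next rule.

```python
from typing import Dict, Tuple

# Model of: map m { type ipv4_addr . inet_service : verdict }
VerdictMap = Dict[Tuple[str, int], str]

no_endpoint_services: VerdictMap = {
    ("192.168.80.104", 8989): "goto do_reject",
}


def lookup_verdict(vmap: VerdictMap, daddr: str, dport: int) -> str:
    """Model of `ip daddr . tcp dport vmap @m`: a single lookup on the pair."""
    return vmap.get((daddr, dport), "continue")


if __name__ == "__main__":
    # Same service IP, two ports: only the port without endpoints is rejected.
    print(lookup_verdict(no_endpoint_services, "192.168.80.104", 8989))
    print(lookup_verdict(no_endpoint_services, "192.168.80.104", 80))
    # A second service losing its endpoints is one more element, not one more rule.
    no_endpoint_services[("57.131.151.19", 8989)] = "goto do_reject"
    print(lookup_verdict(no_endpoint_services, "57.131.151.19", 8989))
```

Because the key carries both address and port, one service IP can have some ports in the map and others not, which is the case described in the quoted question.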
* Re: Operation not supported when adding jump command 2019-11-28 13:08 ` Phil Sutter @ 2019-11-28 13:34 ` Serguei Bezverkhi (sbezverk) 2019-11-28 14:51 ` Serguei Bezverkhi (sbezverk) 1 sibling, 0 replies; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-28 13:34 UTC (permalink / raw) To: Phil Sutter Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hello Phil,

Thanks a lot for your suggestions, I will refactor using this approach.

Best regards
Serguei

On 2019-11-28, 8:08 AM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

> Hi Serguei,
>
> On Thu, Nov 28, 2019 at 01:22:17AM +0000, Serguei Bezverkhi (sbezverk) wrote:
> > [...]
>
> What about something like this:
>
> | table ip t {
> |         map m {
> |                 type ipv4_addr . inet_service : verdict
> |                 elements = { 192.168.80.104 . 8989 : goto do_reject }
> |         }
> |
> |         chain c {
> |                 ip daddr . tcp dport vmap @m
> |         }
> |
> |         chain do_reject {
> |                 reject with icmp type host-unreachable
> |         }
> | }
>
> For unknown reasons reject statement can't be used directly in a verdict map, but the do_reject chain hack works.

This is exactly what I was looking for; I just never knew you could combine address and port in the key.

> > sudo nft list table ipv4table
> > [...]
>
> Same ruleset for input and output? Seems weird given the daddr-based filtering in k8s-filter-services.

I will review the k8s filter input/output chains one more time to confirm whether I got something wrong.

> Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command 2019-11-28 13:08 ` Phil Sutter 2019-11-28 13:34 ` Serguei Bezverkhi (sbezverk) @ 2019-11-28 14:51 ` Serguei Bezverkhi (sbezverk) 2019-11-28 15:15 ` Phil Sutter 1 sibling, 1 reply; 34+ messages in thread From: Serguei Bezverkhi (sbezverk) @ 2019-11-28 14:51 UTC (permalink / raw) To: Phil Sutter Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Phil,

Quick question: it appears that we do not yet support combining two types into a key, so I need to quickly add it; your help would be appreciated. Here is the sequence I get when creating such a map:

sudo nft --debug all add map ipv4table no-endpoint-services { type ipv4_addr . inet_service : verdict \; }

----------------    ------------------
|  02 00 00 00  |   |  extra header  |
|00014|--|00001|    |len |flags| type|
|  69 70 76 34  |   |      data      |   i p v 4
|  74 61 62 6c  |   |      data      |   t a b l
|  65 00 00 00  |   |      data      |   e
|00025|--|00002|    |len |flags| type|
|  6e 6f 2d 65  |   |      data      |   n o - e
|  6e 64 70 6f  |   |      data      |   n d p o
|  69 6e 74 2d  |   |      data      |   i n t -
|  73 65 72 76  |   |      data      |   s e r v
|  69 63 65 73  |   |      data      |   i c e s
|  00 00 00 00  |   |      data      |
|00008|--|00003|    |len |flags| type|   NFTA_SET_FLAGS
|  00 00 00 08  |   |      data      |   NFT_SET_MAP = 0x8
|00008|--|00004|    |len |flags| type|   NFTA_SET_KEY_TYPE = 0x4
|  00 00 01 cd  |   |      data      |
|00008|--|00005|    |len |flags| type|   NFTA_SET_KEY_LEN = 0x5
|  00 00 00 08  |   |      data      |
|00008|--|00006|    |len |flags| type|   NFTA_SET_DATA_TYPE = 0x6 Verdict
|  ff ff ff 00  |   |      data      |
|00008|--|00007|    |len |flags| type|   NFTA_SET_DATA_LEN = 0x7
|  00 00 00 00  |   |      data      |
|00008|--|00010|    |len |flags| type|   NFTA_SET_ID = 0xa
|  00 00 00 01  |   |      data      |
|00016|--|00013|    |len |flags| type|
|  00 04 00 00  |   |      data      |
|  00 00 01 04  |   |      data      |
|  00 00 00 00  |   |      data      |
----------------    ------------------

Almost all is clear except 2 points: how the key type value "00 00 01 cd" is generated, and why the key length is 8 and not 6.

Thanks a lot
Serguei

On 2019-11-28, 8:08 AM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

> [...]

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command
  2019-11-28 14:51 ` Serguei Bezverkhi (sbezverk)
@ 2019-11-28 15:15 ` Phil Sutter
  2019-11-29 20:13 ` Serguei Bezverkhi (sbezverk)
  0 siblings, 1 reply; 34+ messages in thread
From: Phil Sutter @ 2019-11-28 15:15 UTC (permalink / raw)
  To: Serguei Bezverkhi (sbezverk)
  Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi,

On Thu, Nov 28, 2019 at 02:51:36PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Quick question, it appears that we do not support yet combining of two types into a key, so I need to quickly add it, your help would be appreciated. Here is the sequence I get to create such map:
> sudo nft --debug all add map ipv4table no-endpoint-services { type ipv4_addr . inet_service : verdict \; }
> [...]
>
> Almost all is clear except 2 points; how set flag "00 00 01 cd " is generated and when key length is 8 and not 6.

I've been through that recently when implementing among match support in iptables-nft (which uses an anonymous set with concatenated elements internally). Please have a look at the relevant code here:

https://git.netfilter.org/iptables/tree/iptables/nft.c#n999

I guess this helps clarifying how set flags are created and how to pad element data.

Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
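For readers following the two open points (the "00 00 01 cd" value and the key length of 8), here is a minimal Python sketch of one way those numbers can be derived. It is not taken from the linked nft.c code. The constants TYPE_BITS, TYPE_IPADDR, TYPE_INET_SERVICE and REG_SIZE are assumptions about nftables' internal numbering and register width; they are chosen because they reproduce the values in the debug dump quoted in this thread. The helper names and the trailing-zero padding of each field are likewise illustrative assumptions.

```python
import socket
import struct

# Assumed values (not stated in this thread): each component datatype is
# packed into 6 bits, ipv4_addr is type 7 and inet_service is type 13.
TYPE_BITS = 6
TYPE_IPADDR = 7
TYPE_INET_SERVICE = 13
# Assumed register width: each concatenated field is padded to 4 bytes.
REG_SIZE = 4


def concat_key_type(*types):
    """Combine component datatypes into one concatenated key type."""
    key_type = 0
    for t in types:
        key_type = (key_type << TYPE_BITS) | t
    return key_type


def padded(size):
    """Round a field size up to the next multiple of REG_SIZE."""
    return (size + REG_SIZE - 1) // REG_SIZE * REG_SIZE


def concat_key_len(*sizes):
    """Total key length once every field is padded to a full register."""
    return sum(padded(s) for s in sizes)


def concat_element(addr, port):
    """Hypothetical key bytes for 'addr . port', each field zero-padded."""
    fields = [socket.inet_aton(addr), struct.pack("!H", port)]
    return b"".join(f.ljust(padded(len(f)), b"\x00") for f in fields)


print(hex(concat_key_type(TYPE_IPADDR, TYPE_INET_SERVICE)))  # 0x1cd
print(concat_key_len(4, 2))                                   # 8
print(concat_element("192.168.80.104", 8989).hex())           # c0a85068231d0000
```

Under these assumptions the 2-byte port occupies a full 4-byte slot, which would explain a key length of 8 rather than 6, and (7 << 6) | 13 gives the 0x1cd seen under NFTA_SET_KEY_TYPE.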
* Re: Operation not supported when adding jump command
  2019-11-28 15:15 ` Phil Sutter
@ 2019-11-29 20:13 ` Serguei Bezverkhi (sbezverk)
  2019-11-30 0:04 ` Phil Sutter
  0 siblings, 1 reply; 34+ messages in thread
From: Serguei Bezverkhi (sbezverk) @ 2019-11-29 20:13 UTC (permalink / raw)
  To: Phil Sutter
  Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hello,

@Phil, thanks so much for Concat suggestion. Any more points for optimization? If no, then I will move to nat portion of k8s iptables.

Here are rules generated with refactored code:

table ip ipv4table {
    map no-endpoints-services {
        type ipv4_addr . inet_service : verdict
        elements = { 57.131.151.19 . 8989 : jump k8s-filter-do-reject,
                     192.168.80.104 . 8989 : jump k8s-filter-do-reject }
    }

    chain filter-input {
        type filter hook input priority filter; policy accept;
        ct state new jump k8s-filter-services
        jump k8s-filter-firewall
    }

    chain filter-output {
        type filter hook output priority filter; policy accept;
        ct state new jump k8s-filter-services
        jump k8s-filter-firewall
    }

    chain filter-forward {
        type filter hook forward priority filter; policy accept;
        jump k8s-filter-forward
        ct state new jump k8s-filter-services
    }

    chain k8s-filter-firewall {
        meta mark 0x00008000 drop
    }

    chain k8s-filter-services {
        ip daddr . tcp dport vmap @no-endpoints-services
    }

    chain k8s-filter-forward {
        ct state invalid drop
        meta mark 0x00004000 accept
        ip saddr 57.112.0.0/12 ct state established,related accept
        ip daddr 57.112.0.0/12 ct state established,related accept
    }

    chain k8s-filter-do-reject {
        reject with icmp type host-unreachable
    }
}

Thank you
Serguei

On 2019-11-28, 10:15 AM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

    Hi,

    On Thu, Nov 28, 2019 at 02:51:36PM +0000, Serguei Bezverkhi (sbezverk) wrote:
    > Quick question, it appears that we do not support yet combining of two types into a key, so I need to quickly add it, your help would be appreciated. Here is the sequence I get to create such map:
    > sudo nft --debug all add map ipv4table no-endpoint-services { type ipv4_addr . inet_service : verdict \; }
    > [...]
    >
    > Almost all is clear except 2 points; how set flag "00 00 01 cd " is generated and when key length is 8 and not 6.

    I've been through that recently when implementing among match support in iptables-nft (which uses an anonymous set with concatenated elements internally). Please have a look at the relevant code here:

    https://git.netfilter.org/iptables/tree/iptables/nft.c#n999

    I guess this helps clarifying how set flags are created and how to pad element data.

    Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command
  2019-11-29 20:13 ` Serguei Bezverkhi (sbezverk)
@ 2019-11-30 0:04 ` Phil Sutter
  2019-12-03 18:43 ` Serguei Bezverkhi (sbezverk)
  0 siblings, 1 reply; 34+ messages in thread
From: Phil Sutter @ 2019-11-30 0:04 UTC (permalink / raw)
  To: Serguei Bezverkhi (sbezverk)
  Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi Serguei,

On Fri, Nov 29, 2019 at 08:13:21PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> @Phil, thanks so much for Concat suggestion. Any more points for optimization? If no, then I will move to nat portion of k8s iptables.

Looks fine to me. I don't like the mark-based verdicts, but to validate those we need to see where the marks are set.

Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command
  2019-11-30 0:04 ` Phil Sutter
@ 2019-12-03 18:43 ` Serguei Bezverkhi (sbezverk)
  2019-12-04 10:36 ` Phil Sutter
  0 siblings, 1 reply; 34+ messages in thread
From: Serguei Bezverkhi (sbezverk) @ 2019-12-03 18:43 UTC (permalink / raw)
  To: Phil Sutter
  Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hello Phil,

Started working on nat portion and here is iptables rule which is a bit concerning.

-A KUBE-SERVICES -d 192.168.80.104/32 -p tcp -m comment --comment "default/portal:portal external IP" -m tcp --dport 8989 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-MUPXPVK4XAZHSWAR

I can address " addrtype" with nftables "fib" and " iif type local" but I am not sure about "physdev", appreciate any suggestions.

Thank you
Serguei

On 2019-11-29, 7:04 PM, "n0-1@orbyte.nwl.cc on behalf of Phil Sutter" <n0-1@orbyte.nwl.cc on behalf of phil@nwl.cc> wrote:

    Hi Serguei,

    On Fri, Nov 29, 2019 at 08:13:21PM +0000, Serguei Bezverkhi (sbezverk) wrote:
    > @Phil, thanks so much for Concat suggestion. Any more points for optimization? If no, then I will move to nat portion of k8s iptables.

    Looks fine to me. I don't like the mark-based verdicts, but to validate those we need to see where the marks are set.

    Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command
  2019-12-03 18:43 ` Serguei Bezverkhi (sbezverk)
@ 2019-12-04 10:36 ` Phil Sutter
  0 siblings, 0 replies; 34+ messages in thread
From: Phil Sutter @ 2019-12-04 10:36 UTC (permalink / raw)
  To: Serguei Bezverkhi (sbezverk)
  Cc: Arturo Borrero Gonzalez, Pablo Neira Ayuso, Florian Westphal, netfilter-devel, Laura Garcia

Hi,

On Tue, Dec 03, 2019 at 06:43:19PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Started working on nat portion and here is iptables rule which is a bit concerning.
>
> -A KUBE-SERVICES -d 192.168.80.104/32 -p tcp -m comment --comment "default/portal:portal external IP" -m tcp --dport 8989 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-MUPXPVK4XAZHSWAR
>
> I can address " addrtype" with nftables "fib" and " iif type local" but I am not sure about "physdev", appreciate any suggestions.

I think you can use 'meta iiftype != "bridge"' in this case.

Cheers, Phil

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Operation not supported when adding jump command
  2019-11-25 18:55 Operation not supported when adding jump command Serguei Bezverkhi (sbezverk)
  2019-11-26 12:21 ` Florian Westphal
@ 2019-12-03 23:50 ` Duncan Roe
  2019-12-04 1:13 ` [PATCH nft] doc: Clarify conditions under which a reject verdict is permissible Duncan Roe
  2019-12-06 2:37 ` [PATCH nft v2] " Duncan Roe
  1 sibling, 2 replies; 34+ messages in thread
From: Duncan Roe @ 2019-12-03 23:50 UTC (permalink / raw)
  To: Serguei Bezverkhi (sbezverk); +Cc: Pablo Neira Ayuso, netfilter-devel

On Mon, Nov 25, 2019 at 06:55:41PM +0000, Serguei Bezverkhi (sbezverk) wrote:
> Hello Pablo,
>
> Please see below table/chain/rules/sets I program, when I try to add jump from input-net, input-local to services it fails with " Operation not supported" , I would appreciate if somebody could help to understand why:
>
> sudo nft add rule ipv4table input-net jump services
> Error: Could not process rule: Operation not supported
> add rule ipv4table input-net jump services
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>
> table ip ipv4table {
>     set no-endpoint-svc-ports {
>         type inet_service
>         elements = { 8080, 8989 }
>     }
>
>     set no-endpoint-svc-addrs {
>         type ipv4_addr
>         flags interval
>         elements = { 10.1.1.1, 10.1.1.2 }
>     }
>
>     chain input-net {
>         type nat hook prerouting priority filter; policy accept;
>     }
>
>     chain input-local {
>         type nat hook output priority filter; policy accept;
>     }
>
>     chain services {
>         ip daddr @no-endpoint-svc-addrs tcp dport @no-endpoint-svc-ports reject with tcp reset
>         ip daddr @no-endpoint-svc-addrs udp dport @no-endpoint-svc-ports reject with icmp type net-unreachable
>     }
> }
>
> Thank you
> Serguei
>

Hi Serguei,

The reason it fails is, from *man nft*:

> This statement [reject] is only valid in the input, forward and output chains,
> and user-defined chains which are only called from those chains.

(I inserted the bit in square brackets).
The wording could perhaps be clarified: what it really means to say is

    Reject is only valid in base chains using the input, forward or output hooks, and user-defined chains which are only called from those chains.

Put that way, you can see why your command is rejected.

Cheers ... Duncan.

^ permalink raw reply [flat|nested] 34+ messages in thread
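The rule as reworded above can be modelled in a few lines of Python. This is only an illustrative sketch of the documented condition, not the kernel's validation code; the function name reject_allowed and the two dictionaries are hypothetical, and the chain names are copied from the ruleset in the original report.

```python
# Hooks from which, per the man page wording discussed above, a reject
# statement may be reached.
REJECT_HOOKS = {"input", "forward", "output"}


def reject_allowed(chain, base_hooks, callers):
    """Return True if `chain` is only reachable from allowed hooks.

    base_hooks: maps each base chain name to its hook name.
    callers:    maps a chain name to the set of chains that jump/goto to it.
    """
    seen = set()
    stack = [chain]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in base_hooks:
            # A base chain: its hook decides whether reject is acceptable.
            if base_hooks[current] not in REJECT_HOOKS:
                return False
        else:
            # A user-defined chain: check every chain that calls it.
            stack.extend(callers.get(current, ()))
    return True


# Base chains from the original report.
base_hooks = {"input-net": "prerouting", "input-local": "output"}

# Before any jump exists, nothing calls "services", so nothing is violated.
print(reject_allowed("services", base_hooks, {}))  # True

# After "add rule ipv4table input-net jump services" the chain becomes
# reachable from the prerouting hook.
print(reject_allowed("services", base_hooks, {"services": {"input-net"}}))  # False
```

In this model the chain holding the reject rules is acceptable on its own, and it is the jump from a prerouting base chain that makes the ruleset invalid, which matches the error being reported on the add rule command rather than when the chain was created.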
* [PATCH nft] doc: Clarify conditions under which a reject verdict is permissible
  2019-12-03 23:50 ` Duncan Roe
@ 2019-12-04 1:13 ` Duncan Roe
  2019-12-06 2:37 ` [PATCH nft v2] " Duncan Roe
  1 sibling, 0 replies; 34+ messages in thread
From: Duncan Roe @ 2019-12-04 1:13 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, sbezverk

A phrase like "input chain" is a throwback to xtables documentation.
In nft, chains are containers for rules. They do have a type, but what's
important here is which hook each uses.

There may be other instances of this throwback elsewhere in the manual.

Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
---
 doc/statements.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/statements.txt b/doc/statements.txt
index 3b82436..4ff7d05 100644
--- a/doc/statements.txt
+++ b/doc/statements.txt
@@ -171,8 +171,9 @@ ____
 A reject statement is used to send back an error packet in response to the
 matched packet otherwise it is equivalent to drop so it is a terminating
-statement, ending rule traversal. This statement is only valid in the input,
-forward and output chains, and user-defined chains which are only called from
+statement, ending rule traversal. This statement is only valid in base chains
+using the input,
+forward or output hooks, and user-defined chains which are only called from
 those chains.
 
 .different ICMP reject variants are meant for use in different table families
--
2.14.5

^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH nft v2] doc: Clarify conditions under which a reject verdict is permissible
  2019-12-03 23:50 ` Duncan Roe
  2019-12-04 1:13 ` [PATCH nft] doc: Clarify conditions under which a reject verdict is permissible Duncan Roe
@ 2019-12-06 2:37 ` Duncan Roe
  2019-12-06 6:55 ` Florian Westphal
  1 sibling, 1 reply; 34+ messages in thread
From: Duncan Roe @ 2019-12-06 2:37 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, sbezverk

A phrase like "input chain" is a throwback to xtables documentation.
In nft, chains are containers for rules. They do have a type, but what's
important here is which hook each uses.

v2: Show hook names in bold

Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
---
 doc/statements.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/statements.txt b/doc/statements.txt
index 3b82436..ced311c 100644
--- a/doc/statements.txt
+++ b/doc/statements.txt
@@ -171,8 +171,9 @@ ____
 A reject statement is used to send back an error packet in response to the
 matched packet otherwise it is equivalent to drop so it is a terminating
-statement, ending rule traversal. This statement is only valid in the input,
-forward and output chains, and user-defined chains which are only called from
+statement, ending rule traversal. This statement is only valid in base chains
+using the *input*,
+*forward* or *output* hooks, and user-defined chains which are only called from
 those chains.
 
 .different ICMP reject variants are meant for use in different table families
--
2.14.5

^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH nft v2] doc: Clarify conditions under which a reject verdict is permissible
  2019-12-06 2:37 ` [PATCH nft v2] " Duncan Roe
@ 2019-12-06 6:55 ` Florian Westphal
  0 siblings, 0 replies; 34+ messages in thread
From: Florian Westphal @ 2019-12-06 6:55 UTC (permalink / raw)
  To: Duncan Roe; +Cc: pablo, netfilter-devel, sbezverk

Duncan Roe <duncan_roe@optusnet.com.au> wrote:
> A phrase like "input chain" is a throwback to xtables documentation.
> In nft, chains are containers for rules. They do have a type, but what's
> important here is which hook each uses.

Applied, thanks Duncan.

^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2019-12-06 6:55 UTC | newest]

Thread overview: 34+ messages
2019-11-25 18:55 Operation not supported when adding jump command Serguei Bezverkhi (sbezverk)
2019-11-26 12:21 ` Florian Westphal
2019-11-26 14:30 ` Serguei Bezverkhi (sbezverk)
2019-11-26 14:52 ` Florian Westphal
2019-11-26 15:38 ` Pablo Neira Ayuso
2019-11-26 15:47 ` Serguei Bezverkhi (sbezverk)
2019-11-26 15:51 ` Phil Sutter
2019-11-26 18:47 ` Serguei Bezverkhi (sbezverk)
2019-11-26 19:27 ` Phil Sutter
2019-11-26 21:20 ` Serguei Bezverkhi (sbezverk)
2019-11-26 22:15 ` Phil Sutter
2019-11-27 10:11 ` Arturo Borrero Gonzalez
2019-11-27 11:57 ` Phil Sutter
2019-11-27 14:36 ` Serguei Bezverkhi (sbezverk)
2019-11-27 15:08 ` Phil Sutter
2019-11-27 15:35 ` Serguei Bezverkhi (sbezverk)
2019-11-27 16:06 ` Phil Sutter
2019-11-27 16:50 ` Serguei Bezverkhi (sbezverk)
2019-11-27 17:22 ` Phil Sutter
2019-11-28 1:22 ` Serguei Bezverkhi (sbezverk)
2019-11-28 9:10 ` Laura Garcia
2019-11-28 11:58 ` Serguei Bezverkhi (sbezverk)
2019-11-28 13:08 ` Phil Sutter
2019-11-28 13:34 ` Serguei Bezverkhi (sbezverk)
2019-11-28 14:51 ` Serguei Bezverkhi (sbezverk)
2019-11-28 15:15 ` Phil Sutter
2019-11-29 20:13 ` Serguei Bezverkhi (sbezverk)
2019-11-30 0:04 ` Phil Sutter
2019-12-03 18:43 ` Serguei Bezverkhi (sbezverk)
2019-12-04 10:36 ` Phil Sutter
2019-12-03 23:50 ` Duncan Roe
2019-12-04 1:13 ` [PATCH nft] doc: Clarify conditions under which a reject verdict is permissible Duncan Roe
2019-12-06 2:37 ` [PATCH nft v2] " Duncan Roe
2019-12-06 6:55 ` Florian Westphal