netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] Netfilter/IPVS updates for net-next
@ 2020-07-08 17:45 Pablo Neira Ayuso
  2020-07-08 17:45 ` [PATCH 01/12] netfilter: introduce support for reject at prerouting stage Pablo Neira Ayuso
                   ` (12 more replies)
  0 siblings, 13 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Hi,

The following patchset contains Netfilter updates for net-next:

1) Support for rejecting packets from the prerouting chain, from
   Laura Garcia Liebana.

2) Remove useless assignment in pipapo, from Stefano Brivio.

3) On demand hook registration in IPVS, from Julian Anastasov.

4) Expire IPVS connection from process context to not overload
   timers, also from Julian.

5) Fallback to conntrack TCP tracker to handle connection reuse
   in IPVS, from Julian Anastasov.

6) Several patches to support for chain bindings.

7) Expose enum nft_chain_flags through UAPI.

8) Reject unsupported chain flags from the netlink control plane.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thank you.

----------------------------------------------------------------

The following changes since commit 5fb62372a0207f1514fa6052c51991198c46ffe2:

  Merge branch 'dpaa2-eth-send-a-scatter-gather-FD-instead-of-realloc-ing' (2020-06-29 17:42:48 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git HEAD

for you to fetch changes up to c1f79a2eefdcc0aef5d7a911c27a3f75f1936ecd:

  netfilter: nf_tables: reject unsupported chain flags (2020-07-04 02:51:28 +0200)

----------------------------------------------------------------
Julian Anastasov (3):
      ipvs: register hooks only with services
      ipvs: avoid expiring many connections from timer
      ipvs: allow connection reuse for unconfirmed conntrack

Laura Garcia Liebana (1):
      netfilter: introduce support for reject at prerouting stage

Pablo Neira Ayuso (7):
      netfilter: nf_tables: add NFTA_CHAIN_ID attribute
      netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute
      netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute
      netfilter: nf_tables: expose enum nft_chain_flags through UAPI
      netfilter: nf_tables: add nft_chain_add()
      netfilter: nf_tables: add NFT_CHAIN_BINDING
      netfilter: nf_tables: reject unsupported chain flags

Stefano Brivio (1):
      netfilter: nft_set_pipapo: Drop useless assignment of scratch  map index on insert

 include/net/ip_vs.h                      |  15 ++-
 include/net/netfilter/nf_tables.h        |  23 ++--
 include/uapi/linux/netfilter/nf_tables.h |  14 +++
 net/ipv4/netfilter/nf_reject_ipv4.c      |  21 ++++
 net/ipv6/netfilter/nf_reject_ipv6.c      |  26 +++++
 net/netfilter/ipvs/ip_vs_conn.c          |  53 ++++++---
 net/netfilter/ipvs/ip_vs_core.c          |  92 +++++++++++----
 net/netfilter/ipvs/ip_vs_ctl.c           |  29 ++++-
 net/netfilter/nf_tables_api.c            | 188 +++++++++++++++++++++++++------
 net/netfilter/nft_immediate.c            |  51 +++++++++
 net/netfilter/nft_reject.c               |   3 +-
 net/netfilter/nft_set_pipapo.c           |   2 -
 12 files changed, 428 insertions(+), 89 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 01/12] netfilter: introduce support for reject at prerouting stage
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
@ 2020-07-08 17:45 ` Pablo Neira Ayuso
  2020-07-08 17:45 ` [PATCH 02/12] netfilter: nft_set_pipapo: Drop useless assignment of scratch map index on insert Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

From: Laura Garcia Liebana <nevola@gmail.com>

REJECT statement can be only used in INPUT, FORWARD and OUTPUT
chains. This patch adds support of REJECT, both icmp and tcp
reset, at PREROUTING stage.

The need for this patch comes from the requirement of some
forwarding devices to reject traffic before the natting and
routing decisions.

The main use case is to be able to send a graceful termination
to legitimate clients that, under any circumstances, the NATed
endpoints are not available. This option allows clients to
decide either to perform a reconnection or manage the error in
their side, instead of just dropping the connection and let
them die due to timeout.

It is supported ipv4, ipv6 and inet families for nft
infrastructure.

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_reject_ipv4.c | 21 +++++++++++++++++++++
 net/ipv6/netfilter/nf_reject_ipv6.c | 26 ++++++++++++++++++++++++++
 net/netfilter/nft_reject.c          |  3 ++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 2361fdac2c43..9dcfa4e461b6 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -96,6 +96,21 @@ void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_put);
 
+static int nf_reject_fill_skb_dst(struct sk_buff *skb_in)
+{
+	struct dst_entry *dst = NULL;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(struct flowi));
+	fl.u.ip4.daddr = ip_hdr(skb_in)->saddr;
+	nf_ip_route(dev_net(skb_in->dev), &dst, &fl, false);
+	if (!dst)
+		return -1;
+
+	skb_dst_set(skb_in, dst);
+	return 0;
+}
+
 /* Send RST reply */
 void nf_send_reset(struct net *net, struct sk_buff *oldskb, int hook)
 {
@@ -109,6 +124,9 @@ void nf_send_reset(struct net *net, struct sk_buff *oldskb, int hook)
 	if (!oth)
 		return;
 
+	if (hook == NF_INET_PRE_ROUTING && nf_reject_fill_skb_dst(oldskb))
+		return;
+
 	if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
 		return;
 
@@ -175,6 +193,9 @@ void nf_send_unreach(struct sk_buff *skb_in, int code, int hook)
 	if (iph->frag_off & htons(IP_OFFSET))
 		return;
 
+	if (hook == NF_INET_PRE_ROUTING && nf_reject_fill_skb_dst(skb_in))
+		return;
+
 	if (skb_csum_unnecessary(skb_in) || !nf_reject_verify_csum(proto)) {
 		icmp_send(skb_in, ICMP_DEST_UNREACH, code, 0);
 		return;
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 5fae66f66671..4aef6baaa55e 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -126,6 +126,21 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_put);
 
+static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in)
+{
+	struct dst_entry *dst = NULL;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(struct flowi));
+	fl.u.ip6.daddr = ipv6_hdr(skb_in)->saddr;
+	nf_ip6_route(dev_net(skb_in->dev), &dst, &fl, false);
+	if (!dst)
+		return -1;
+
+	skb_dst_set(skb_in, dst);
+	return 0;
+}
+
 void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 {
 	struct net_device *br_indev __maybe_unused;
@@ -154,6 +169,14 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	fl6.daddr = oip6h->saddr;
 	fl6.fl6_sport = otcph->dest;
 	fl6.fl6_dport = otcph->source;
+
+	if (hook == NF_INET_PRE_ROUTING) {
+		nf_ip6_route(net, &dst, flowi6_to_flowi(&fl6), false);
+		if (!dst)
+			return;
+		skb_dst_set(oldskb, dst);
+	}
+
 	fl6.flowi6_oif = l3mdev_master_ifindex(skb_dst(oldskb)->dev);
 	fl6.flowi6_mark = IP6_REPLY_MARK(net, oldskb->mark);
 	security_skb_classify_flow(oldskb, flowi6_to_flowi(&fl6));
@@ -245,6 +268,9 @@ void nf_send_unreach6(struct net *net, struct sk_buff *skb_in,
 	if (hooknum == NF_INET_LOCAL_OUT && skb_in->dev == NULL)
 		skb_in->dev = net->loopback_dev;
 
+	if (hooknum == NF_INET_PRE_ROUTING && nf_reject6_fill_skb_dst(skb_in))
+		return;
+
 	icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0);
 }
 EXPORT_SYMBOL_GPL(nf_send_unreach6);
diff --git a/net/netfilter/nft_reject.c b/net/netfilter/nft_reject.c
index 86eafbb0fdd0..61fb7e8afbf0 100644
--- a/net/netfilter/nft_reject.c
+++ b/net/netfilter/nft_reject.c
@@ -30,7 +30,8 @@ int nft_reject_validate(const struct nft_ctx *ctx,
 	return nft_chain_validate_hooks(ctx->chain,
 					(1 << NF_INET_LOCAL_IN) |
 					(1 << NF_INET_FORWARD) |
-					(1 << NF_INET_LOCAL_OUT));
+					(1 << NF_INET_LOCAL_OUT) |
+					(1 << NF_INET_PRE_ROUTING));
 }
 EXPORT_SYMBOL_GPL(nft_reject_validate);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 02/12] netfilter: nft_set_pipapo: Drop useless assignment of scratch  map index on insert
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
  2020-07-08 17:45 ` [PATCH 01/12] netfilter: introduce support for reject at prerouting stage Pablo Neira Ayuso
@ 2020-07-08 17:45 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 03/12] ipvs: register hooks only with services Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

From: Stefano Brivio <sbrivio@redhat.com>

In nft_pipapo_insert(), we need to reallocate scratch maps that will
be used for matching by lookup functions, if they have never been
allocated or if the bucket size changes as a result of the insertion.

As pipapo_realloc_scratch() provides a pair of fresh, zeroed out
maps, there's no need to select a particular one after reallocation.

Other than being useless, the existing assignment was also troubled
by the fact that the index was set only on the CPU performing the
actual insertion, as spotted by Florian.

Simply drop the assignment.

Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 8c04388296b0..313de1d73168 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1249,8 +1249,6 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 		if (err)
 			return err;
 
-		this_cpu_write(nft_pipapo_scratch_index, false);
-
 		m->bsize_max = bsize_max;
 	} else {
 		put_cpu_ptr(m->scratch);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 03/12] ipvs: register hooks only with services
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
  2020-07-08 17:45 ` [PATCH 01/12] netfilter: introduce support for reject at prerouting stage Pablo Neira Ayuso
  2020-07-08 17:45 ` [PATCH 02/12] netfilter: nft_set_pipapo: Drop useless assignment of scratch map index on insert Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 04/12] ipvs: avoid expiring many connections from timer Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

From: Julian Anastasov <ja@ssi.bg>

Keep the IPVS hooks registered in Netfilter only
while there are configured virtual services. This
saves CPU cycles while IPVS is loaded but not used.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/ip_vs.h             |  5 +++
 net/netfilter/ipvs/ip_vs_core.c | 80 ++++++++++++++++++++++++++-------
 net/netfilter/ipvs/ip_vs_ctl.c  | 23 +++++++++-
 3 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 83be2d93b407..0c9881241323 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -874,6 +874,7 @@ struct netns_ipvs {
 	struct ip_vs_stats		tot_stats;  /* Statistics & est. */
 
 	int			num_services;    /* no of virtual services */
+	int			num_services6;   /* IPv6 virtual services */
 
 	/* Trash for destinations */
 	struct list_head	dest_trash;
@@ -960,6 +961,7 @@ struct netns_ipvs {
 	 * are not supported when synchronization is enabled.
 	 */
 	unsigned int		mixed_address_family_dests;
+	unsigned int		hooks_afmask;	/* &1=AF_INET, &2=AF_INET6 */
 };
 
 #define DEFAULT_SYNC_THRESHOLD	3
@@ -1670,6 +1672,9 @@ static inline void ip_vs_unregister_conntrack(struct ip_vs_service *svc)
 #endif
 }
 
+int ip_vs_register_hooks(struct netns_ipvs *ipvs, unsigned int af);
+void ip_vs_unregister_hooks(struct netns_ipvs *ipvs, unsigned int af);
+
 static inline int
 ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 {
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index aa6a603a2425..ca3670152565 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2256,7 +2256,7 @@ ip_vs_forward_icmp_v6(void *priv, struct sk_buff *skb,
 #endif
 
 
-static const struct nf_hook_ops ip_vs_ops[] = {
+static const struct nf_hook_ops ip_vs_ops4[] = {
 	/* After packet filtering, change source only for VS/NAT */
 	{
 		.hook		= ip_vs_reply4,
@@ -2302,7 +2302,10 @@ static const struct nf_hook_ops ip_vs_ops[] = {
 		.hooknum	= NF_INET_FORWARD,
 		.priority	= 100,
 	},
+};
+
 #ifdef CONFIG_IP_VS_IPV6
+static const struct nf_hook_ops ip_vs_ops6[] = {
 	/* After packet filtering, change source only for VS/NAT */
 	{
 		.hook		= ip_vs_reply6,
@@ -2348,8 +2351,64 @@ static const struct nf_hook_ops ip_vs_ops[] = {
 		.hooknum	= NF_INET_FORWARD,
 		.priority	= 100,
 	},
-#endif
 };
+#endif
+
+int ip_vs_register_hooks(struct netns_ipvs *ipvs, unsigned int af)
+{
+	const struct nf_hook_ops *ops;
+	unsigned int count;
+	unsigned int afmask;
+	int ret = 0;
+
+	if (af == AF_INET6) {
+#ifdef CONFIG_IP_VS_IPV6
+		ops = ip_vs_ops6;
+		count = ARRAY_SIZE(ip_vs_ops6);
+		afmask = 2;
+#else
+		return -EINVAL;
+#endif
+	} else {
+		ops = ip_vs_ops4;
+		count = ARRAY_SIZE(ip_vs_ops4);
+		afmask = 1;
+	}
+
+	if (!(ipvs->hooks_afmask & afmask)) {
+		ret = nf_register_net_hooks(ipvs->net, ops, count);
+		if (ret >= 0)
+			ipvs->hooks_afmask |= afmask;
+	}
+	return ret;
+}
+
+void ip_vs_unregister_hooks(struct netns_ipvs *ipvs, unsigned int af)
+{
+	const struct nf_hook_ops *ops;
+	unsigned int count;
+	unsigned int afmask;
+
+	if (af == AF_INET6) {
+#ifdef CONFIG_IP_VS_IPV6
+		ops = ip_vs_ops6;
+		count = ARRAY_SIZE(ip_vs_ops6);
+		afmask = 2;
+#else
+		return;
+#endif
+	} else {
+		ops = ip_vs_ops4;
+		count = ARRAY_SIZE(ip_vs_ops4);
+		afmask = 1;
+	}
+
+	if (ipvs->hooks_afmask & afmask) {
+		nf_unregister_net_hooks(ipvs->net, ops, count);
+		ipvs->hooks_afmask &= ~afmask;
+	}
+}
+
 /*
  *	Initialize IP Virtual Server netns mem.
  */
@@ -2425,19 +2484,6 @@ static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list)
 	}
 }
 
-static int __net_init __ip_vs_dev_init(struct net *net)
-{
-	int ret;
-
-	ret = nf_register_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
-	if (ret < 0)
-		goto hook_fail;
-	return 0;
-
-hook_fail:
-	return ret;
-}
-
 static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list)
 {
 	struct netns_ipvs *ipvs;
@@ -2446,7 +2492,8 @@ static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list)
 	EnterFunction(2);
 	list_for_each_entry(net, net_list, exit_list) {
 		ipvs = net_ipvs(net);
-		nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
+		ip_vs_unregister_hooks(ipvs, AF_INET);
+		ip_vs_unregister_hooks(ipvs, AF_INET6);
 		ipvs->enable = 0;	/* Disable packet reception */
 		smp_wmb();
 		ip_vs_sync_net_cleanup(ipvs);
@@ -2462,7 +2509,6 @@ static struct pernet_operations ipvs_core_ops = {
 };
 
 static struct pernet_operations ipvs_core_dev_ops = {
-	.init = __ip_vs_dev_init,
 	.exit_batch = __ip_vs_dev_cleanup_batch,
 };
 
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 412656c34f20..0eed388c960b 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1272,6 +1272,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 	struct ip_vs_scheduler *sched = NULL;
 	struct ip_vs_pe *pe = NULL;
 	struct ip_vs_service *svc = NULL;
+	int ret_hooks = -1;
 
 	/* increase the module use count */
 	if (!ip_vs_use_count_inc())
@@ -1313,6 +1314,14 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 	}
 #endif
 
+	if ((u->af == AF_INET && !ipvs->num_services) ||
+	    (u->af == AF_INET6 && !ipvs->num_services6)) {
+		ret = ip_vs_register_hooks(ipvs, u->af);
+		if (ret < 0)
+			goto out_err;
+		ret_hooks = ret;
+	}
+
 	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
 	if (svc == NULL) {
 		IP_VS_DBG(1, "%s(): no memory\n", __func__);
@@ -1374,6 +1383,8 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 	/* Count only IPv4 services for old get/setsockopt interface */
 	if (svc->af == AF_INET)
 		ipvs->num_services++;
+	else if (svc->af == AF_INET6)
+		ipvs->num_services6++;
 
 	/* Hash the service into the service table */
 	ip_vs_svc_hash(svc);
@@ -1385,6 +1396,8 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 
 
  out_err:
+	if (ret_hooks >= 0)
+		ip_vs_unregister_hooks(ipvs, u->af);
 	if (svc != NULL) {
 		ip_vs_unbind_scheduler(svc, sched);
 		ip_vs_service_free(svc);
@@ -1500,9 +1513,15 @@ static void __ip_vs_del_service(struct ip_vs_service *svc, bool cleanup)
 	struct ip_vs_pe *old_pe;
 	struct netns_ipvs *ipvs = svc->ipvs;
 
-	/* Count only IPv4 services for old get/setsockopt interface */
-	if (svc->af == AF_INET)
+	if (svc->af == AF_INET) {
 		ipvs->num_services--;
+		if (!ipvs->num_services)
+			ip_vs_unregister_hooks(ipvs, svc->af);
+	} else if (svc->af == AF_INET6) {
+		ipvs->num_services6--;
+		if (!ipvs->num_services6)
+			ip_vs_unregister_hooks(ipvs, svc->af);
+	}
 
 	ip_vs_stop_estimator(svc->ipvs, &svc->stats);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 04/12] ipvs: avoid expiring many connections from timer
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 03/12] ipvs: register hooks only with services Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 05/12] ipvs: allow connection reuse for unconfirmed conntrack Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

From: Julian Anastasov <ja@ssi.bg>

Add new functions ip_vs_conn_del() and ip_vs_conn_del_put()
to release many IPVS connections in process context.
They are suitable for connections found in table
when we do not want to overload the timers.

Currently, the change is useful for the dropentry delayed
work but it will be used also in following patch
when flushing connections to failed destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipvs/ip_vs_conn.c | 53 +++++++++++++++++++++++----------
 net/netfilter/ipvs/ip_vs_ctl.c  |  6 ++--
 2 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 02f2f636798d..b3921ae92740 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -807,6 +807,31 @@ static void ip_vs_conn_rcu_free(struct rcu_head *head)
 	kmem_cache_free(ip_vs_conn_cachep, cp);
 }
 
+/* Try to delete connection while not holding reference */
+static void ip_vs_conn_del(struct ip_vs_conn *cp)
+{
+	if (del_timer(&cp->timer)) {
+		/* Drop cp->control chain too */
+		if (cp->control)
+			cp->timeout = 0;
+		ip_vs_conn_expire(&cp->timer);
+	}
+}
+
+/* Try to delete connection while holding reference */
+static void ip_vs_conn_del_put(struct ip_vs_conn *cp)
+{
+	if (del_timer(&cp->timer)) {
+		/* Drop cp->control chain too */
+		if (cp->control)
+			cp->timeout = 0;
+		__ip_vs_conn_put(cp);
+		ip_vs_conn_expire(&cp->timer);
+	} else {
+		__ip_vs_conn_put(cp);
+	}
+}
+
 static void ip_vs_conn_expire(struct timer_list *t)
 {
 	struct ip_vs_conn *cp = from_timer(cp, t, timer);
@@ -827,14 +852,17 @@ static void ip_vs_conn_expire(struct timer_list *t)
 
 		/* does anybody control me? */
 		if (ct) {
+			bool has_ref = !cp->timeout && __ip_vs_conn_get(ct);
+
 			ip_vs_control_del(cp);
 			/* Drop CTL or non-assured TPL if not used anymore */
-			if (!cp->timeout && !atomic_read(&ct->n_control) &&
+			if (has_ref && !atomic_read(&ct->n_control) &&
 			    (!(ct->flags & IP_VS_CONN_F_TEMPLATE) ||
 			     !(ct->state & IP_VS_CTPL_S_ASSURED))) {
 				IP_VS_DBG(4, "drop controlling connection\n");
-				ct->timeout = 0;
-				ip_vs_conn_expire_now(ct);
+				ip_vs_conn_del_put(ct);
+			} else if (has_ref) {
+				__ip_vs_conn_put(ct);
 			}
 		}
 
@@ -1317,8 +1345,7 @@ void ip_vs_random_dropentry(struct netns_ipvs *ipvs)
 
 drop:
 			IP_VS_DBG(4, "drop connection\n");
-			cp->timeout = 0;
-			ip_vs_conn_expire_now(cp);
+			ip_vs_conn_del(cp);
 		}
 		cond_resched_rcu();
 	}
@@ -1341,19 +1368,15 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
 		hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
 			if (cp->ipvs != ipvs)
 				continue;
-			/* As timers are expired in LIFO order, restart
-			 * the timer of controlling connection first, so
-			 * that it is expired after us.
-			 */
+			if (atomic_read(&cp->n_control))
+				continue;
 			cp_c = cp->control;
-			/* cp->control is valid only with reference to cp */
-			if (cp_c && __ip_vs_conn_get(cp)) {
+			IP_VS_DBG(4, "del connection\n");
+			ip_vs_conn_del(cp);
+			if (cp_c && !atomic_read(&cp_c->n_control)) {
 				IP_VS_DBG(4, "del controlling connection\n");
-				ip_vs_conn_expire_now(cp_c);
-				__ip_vs_conn_put(cp);
+				ip_vs_conn_del(cp_c);
 			}
-			IP_VS_DBG(4, "del connection\n");
-			ip_vs_conn_expire_now(cp);
 		}
 		cond_resched_rcu();
 	}
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 0eed388c960b..4af83f466dfc 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -224,7 +224,8 @@ static void defense_work_handler(struct work_struct *work)
 	update_defense_level(ipvs);
 	if (atomic_read(&ipvs->dropentry))
 		ip_vs_random_dropentry(ipvs);
-	schedule_delayed_work(&ipvs->defense_work, DEFENSE_TIMER_PERIOD);
+	queue_delayed_work(system_long_wq, &ipvs->defense_work,
+			   DEFENSE_TIMER_PERIOD);
 }
 #endif
 
@@ -4082,7 +4083,8 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
 	ipvs->sysctl_tbl = tbl;
 	/* Schedule defense work */
 	INIT_DELAYED_WORK(&ipvs->defense_work, defense_work_handler);
-	schedule_delayed_work(&ipvs->defense_work, DEFENSE_TIMER_PERIOD);
+	queue_delayed_work(system_long_wq, &ipvs->defense_work,
+			   DEFENSE_TIMER_PERIOD);
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 05/12] ipvs: allow connection reuse for unconfirmed conntrack
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 04/12] ipvs: avoid expiring many connections from timer Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 06/12] netfilter: nf_tables: add NFTA_CHAIN_ID attribute Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

From: Julian Anastasov <ja@ssi.bg>

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
https://github.com/kubernetes/kubernetes/issues/70747

- Apache Bench can fill up ipvs service proxy in seconds #544
https://github.com/cloudnativelabs/kube-router/issues/544

- Additional 1s latency in `host -> service IP -> pod`
https://github.com/kubernetes/kubernetes/issues/90854

Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/ip_vs.h             | 10 ++++------
 net/netfilter/ipvs/ip_vs_core.c | 12 +++++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 0c9881241323..011f407b76fe 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1626,18 +1626,16 @@ static inline void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
 }
 #endif /* CONFIG_IP_VS_NFCT */
 
-/* Really using conntrack? */
-static inline bool ip_vs_conn_uses_conntrack(struct ip_vs_conn *cp,
-					     struct sk_buff *skb)
+/* Using old conntrack that can not be redirected to another real server? */
+static inline bool ip_vs_conn_uses_old_conntrack(struct ip_vs_conn *cp,
+						 struct sk_buff *skb)
 {
 #ifdef CONFIG_IP_VS_NFCT
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
 
-	if (!(cp->flags & IP_VS_CONN_F_NFCT))
-		return false;
 	ct = nf_ct_get(skb, &ctinfo);
-	if (ct)
+	if (ct && nf_ct_is_confirmed(ct))
 		return true;
 #endif
 	return false;
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index ca3670152565..b4a6b7662f3f 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2066,14 +2066,14 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
 
 	conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
 	if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
-		bool uses_ct = false, resched = false;
+		bool old_ct = false, resched = false;
 
 		if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
 		    unlikely(!atomic_read(&cp->dest->weight))) {
 			resched = true;
-			uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
 		} else if (is_new_conn_expected(cp, conn_reuse_mode)) {
-			uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+			old_ct = ip_vs_conn_uses_old_conntrack(cp, skb);
 			if (!atomic_read(&cp->n_control)) {
 				resched = true;
 			} else {
@@ -2081,15 +2081,17 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
 				 * that uses conntrack while it is still
 				 * referenced by controlled connection(s).
 				 */
-				resched = !uses_ct;
+				resched = !old_ct;
 			}
 		}
 
 		if (resched) {
+			if (!old_ct)
+				cp->flags &= ~IP_VS_CONN_F_NFCT;
 			if (!atomic_read(&cp->n_control))
 				ip_vs_conn_expire_now(cp);
 			__ip_vs_conn_put(cp);
-			if (uses_ct)
+			if (old_ct)
 				return NF_DROP;
 			cp = NULL;
 		}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 06/12] netfilter: nf_tables: add NFTA_CHAIN_ID attribute
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 05/12] ipvs: allow connection reuse for unconfirmed conntrack Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 07/12] netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This netlink attribute allows you to refer to chains inside a
transaction as an alternative to the name and the handle. The chain
binding support requires this new chain ID approach.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |  3 +++
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nf_tables_api.c            | 15 ++++++++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 6f0f6fca9ac3..3e5226684017 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1433,6 +1433,7 @@ struct nft_trans_chain {
 	char				*name;
 	struct nft_stats __percpu	*stats;
 	u8				policy;
+	u32				chain_id;
 };
 
 #define nft_trans_chain_update(trans)	\
@@ -1443,6 +1444,8 @@ struct nft_trans_chain {
 	(((struct nft_trans_chain *)trans->data)->stats)
 #define nft_trans_chain_policy(trans)	\
 	(((struct nft_trans_chain *)trans->data)->policy)
+#define nft_trans_chain_id(trans)	\
+	(((struct nft_trans_chain *)trans->data)->chain_id)
 
 struct nft_trans_table {
 	bool				update;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 4565456c0ef4..477779595b78 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -196,6 +196,7 @@ enum nft_table_attributes {
  * @NFTA_CHAIN_TYPE: type name of the string (NLA_NUL_STRING)
  * @NFTA_CHAIN_COUNTERS: counter specification of the chain (NLA_NESTED: nft_counter_attributes)
  * @NFTA_CHAIN_FLAGS: chain flags
+ * @NFTA_CHAIN_ID: uniquely identifies a chain in a transaction (NLA_U32)
  */
 enum nft_chain_attributes {
 	NFTA_CHAIN_UNSPEC,
@@ -209,6 +210,7 @@ enum nft_chain_attributes {
 	NFTA_CHAIN_COUNTERS,
 	NFTA_CHAIN_PAD,
 	NFTA_CHAIN_FLAGS,
+	NFTA_CHAIN_ID,
 	__NFTA_CHAIN_MAX
 };
 #define NFTA_CHAIN_MAX		(__NFTA_CHAIN_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 7647ecfa0d40..650ef0dd0773 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -280,9 +280,15 @@ static struct nft_trans *nft_trans_chain_add(struct nft_ctx *ctx, int msg_type)
 	if (trans == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	if (msg_type == NFT_MSG_NEWCHAIN)
+	if (msg_type == NFT_MSG_NEWCHAIN) {
 		nft_activate_next(ctx->net, ctx->chain);
 
+		if (ctx->nla[NFTA_CHAIN_ID]) {
+			nft_trans_chain_id(trans) =
+				ntohl(nla_get_be32(ctx->nla[NFTA_CHAIN_ID]));
+		}
+	}
+
 	list_add_tail(&trans->list, &ctx->net->nft.commit_list);
 	return trans;
 }
@@ -1274,6 +1280,7 @@ static const struct nla_policy nft_chain_policy[NFTA_CHAIN_MAX + 1] = {
 				    .len = NFT_MODULE_AUTOLOAD_LIMIT },
 	[NFTA_CHAIN_COUNTERS]	= { .type = NLA_NESTED },
 	[NFTA_CHAIN_FLAGS]	= { .type = NLA_U32 },
+	[NFTA_CHAIN_ID]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nft_hook_policy[NFTA_HOOK_MAX + 1] = {
@@ -2154,9 +2161,9 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 	const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
 	u8 genmask = nft_genmask_next(net);
 	int family = nfmsg->nfgen_family;
+	struct nft_chain *chain = NULL;
 	const struct nlattr *attr;
 	struct nft_table *table;
-	struct nft_chain *chain;
 	u8 policy = NF_ACCEPT;
 	struct nft_ctx ctx;
 	u64 handle = 0;
@@ -2181,7 +2188,7 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 			return PTR_ERR(chain);
 		}
 		attr = nla[NFTA_CHAIN_HANDLE];
-	} else {
+	} else if (nla[NFTA_CHAIN_NAME]) {
 		chain = nft_chain_lookup(net, table, attr, genmask);
 		if (IS_ERR(chain)) {
 			if (PTR_ERR(chain) != -ENOENT) {
@@ -2190,6 +2197,8 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 			}
 			chain = NULL;
 		}
+	} else if (!nla[NFTA_CHAIN_ID]) {
+		return -EINVAL;
 	}
 
 	if (nla[NFTA_CHAIN_POLICY]) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 07/12] netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 06/12] netfilter: nf_tables: add NFTA_CHAIN_ID attribute Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 08/12] netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This new netlink attribute allows you to add rules to chains by the
chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h |  1 +
 net/netfilter/nf_tables_api.c            | 36 +++++++++++++++++++++---
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 477779595b78..2304d1b7ba5e 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -240,6 +240,7 @@ enum nft_rule_attributes {
 	NFTA_RULE_PAD,
 	NFTA_RULE_ID,
 	NFTA_RULE_POSITION_ID,
+	NFTA_RULE_CHAIN_ID,
 	__NFTA_RULE_MAX
 };
 #define NFTA_RULE_MAX		(__NFTA_RULE_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 650ef0dd0773..fbe8f9209813 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2153,6 +2153,22 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy,
 	return err;
 }
 
+static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
+					       const struct nlattr *nla)
+{
+	u32 id = ntohl(nla_get_be32(nla));
+	struct nft_trans *trans;
+
+	list_for_each_entry(trans, &net->nft.commit_list, list) {
+		struct nft_chain *chain = trans->ctx.chain;
+
+		if (trans->msg_type == NFT_MSG_NEWCHAIN &&
+		    id == nft_trans_chain_id(trans))
+			return chain;
+	}
+	return ERR_PTR(-ENOENT);
+}
+
 static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 			      struct sk_buff *skb, const struct nlmsghdr *nlh,
 			      const struct nlattr * const nla[],
@@ -2633,6 +2649,7 @@ static const struct nla_policy nft_rule_policy[NFTA_RULE_MAX + 1] = {
 				    .len = NFT_USERDATA_MAXLEN },
 	[NFTA_RULE_ID]		= { .type = NLA_U32 },
 	[NFTA_RULE_POSITION_ID]	= { .type = NLA_U32 },
+	[NFTA_RULE_CHAIN_ID]	= { .type = NLA_U32 },
 };
 
 static int nf_tables_fill_rule_info(struct sk_buff *skb, struct net *net,
@@ -3039,10 +3056,21 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
 		return PTR_ERR(table);
 	}
 
-	chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask);
-	if (IS_ERR(chain)) {
-		NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
-		return PTR_ERR(chain);
+	if (nla[NFTA_RULE_CHAIN]) {
+		chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN],
+					 genmask);
+		if (IS_ERR(chain)) {
+			NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
+			return PTR_ERR(chain);
+		}
+	} else if (nla[NFTA_RULE_CHAIN_ID]) {
+		chain = nft_chain_lookup_byid(net, nla[NFTA_RULE_CHAIN_ID]);
+		if (IS_ERR(chain)) {
+			NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN_ID]);
+			return PTR_ERR(chain);
+		}
+	} else {
+		return -EINVAL;
 	}
 
 	if (nla[NFTA_RULE_HANDLE]) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/12] netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 07/12] netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 09/12] netfilter: nf_tables: expose enum nft_chain_flags through UAPI Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This netlink attribute allows you to identify the chain to jump/goto by
means of the chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nf_tables_api.c            | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 2304d1b7ba5e..683e75126d68 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -471,11 +471,13 @@ enum nft_data_attributes {
  *
  * @NFTA_VERDICT_CODE: nf_tables verdict (NLA_U32: enum nft_verdicts)
  * @NFTA_VERDICT_CHAIN: jump target chain name (NLA_STRING)
+ * @NFTA_VERDICT_CHAIN_ID: jump target chain ID (NLA_U32)
  */
 enum nft_verdict_attributes {
 	NFTA_VERDICT_UNSPEC,
 	NFTA_VERDICT_CODE,
 	NFTA_VERDICT_CHAIN,
+	NFTA_VERDICT_CHAIN_ID,
 	__NFTA_VERDICT_MAX
 };
 #define NFTA_VERDICT_MAX	(__NFTA_VERDICT_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fbe8f9209813..d86602797a69 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8242,6 +8242,7 @@ static const struct nla_policy nft_verdict_policy[NFTA_VERDICT_MAX + 1] = {
 	[NFTA_VERDICT_CODE]	= { .type = NLA_U32 },
 	[NFTA_VERDICT_CHAIN]	= { .type = NLA_STRING,
 				    .len = NFT_CHAIN_MAXNAMELEN - 1 },
+	[NFTA_VERDICT_CHAIN_ID]	= { .type = NLA_U32 },
 };
 
 static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
@@ -8278,10 +8279,19 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
 		break;
 	case NFT_JUMP:
 	case NFT_GOTO:
-		if (!tb[NFTA_VERDICT_CHAIN])
+		if (tb[NFTA_VERDICT_CHAIN]) {
+			chain = nft_chain_lookup(ctx->net, ctx->table,
+						 tb[NFTA_VERDICT_CHAIN],
+						 genmask);
+		} else if (tb[NFTA_VERDICT_CHAIN_ID]) {
+			chain = nft_chain_lookup_byid(ctx->net,
+						      tb[NFTA_VERDICT_CHAIN_ID]);
+			if (IS_ERR(chain))
+				return PTR_ERR(chain);
+		} else {
 			return -EINVAL;
-		chain = nft_chain_lookup(ctx->net, ctx->table,
-					 tb[NFTA_VERDICT_CHAIN], genmask);
+		}
+
 		if (IS_ERR(chain))
 			return PTR_ERR(chain);
 		if (nft_is_base_chain(chain))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/12] netfilter: nf_tables: expose enum nft_chain_flags through UAPI
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 08/12] netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 10/12] netfilter: nf_tables: add nft_chain_add() Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        | 7 +------
 include/uapi/linux/netfilter/nf_tables.h | 5 +++++
 net/netfilter/nf_tables_api.c            | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 3e5226684017..6d1e7da6e00a 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -921,11 +921,6 @@ static inline void nft_set_elem_update_expr(const struct nft_set_ext *ext,
 	     (expr) != (last); \
 	     (expr) = nft_expr_next(expr))
 
-enum nft_chain_flags {
-	NFT_BASE_CHAIN			= 0x1,
-	NFT_CHAIN_HW_OFFLOAD		= 0x2,
-};
-
 #define NFT_CHAIN_POLICY_UNSET		U8_MAX
 
 /**
@@ -1036,7 +1031,7 @@ static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chai
 
 static inline bool nft_is_base_chain(const struct nft_chain *chain)
 {
-	return chain->flags & NFT_BASE_CHAIN;
+	return chain->flags & NFT_CHAIN_BASE;
 }
 
 int __nft_release_basechain(struct nft_ctx *ctx);
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 683e75126d68..2cf7cc3b50c1 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -184,6 +184,11 @@ enum nft_table_attributes {
 };
 #define NFTA_TABLE_MAX		(__NFTA_TABLE_MAX - 1)
 
+enum nft_chain_flags {
+	NFT_CHAIN_BASE		= (1 << 0),
+	NFT_CHAIN_HW_OFFLOAD	= (1 << 1),
+};
+
 /**
  * enum nft_chain_attributes - nf_tables chain netlink attributes
  *
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d86602797a69..b7582a1c8dce 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1903,7 +1903,7 @@ static int nft_basechain_init(struct nft_base_chain *basechain, u8 family,
 		nft_basechain_hook_init(&basechain->ops, family, hook, chain);
 	}
 
-	chain->flags |= NFT_BASE_CHAIN | flags;
+	chain->flags |= NFT_CHAIN_BASE | flags;
 	basechain->policy = NF_ACCEPT;
 	if (chain->flags & NFT_CHAIN_HW_OFFLOAD &&
 	    nft_chain_offload_priority(basechain) < 0)
@@ -2255,7 +2255,7 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 		if (nlh->nlmsg_flags & NLM_F_REPLACE)
 			return -EOPNOTSUPP;
 
-		flags |= chain->flags & NFT_BASE_CHAIN;
+		flags |= chain->flags & NFT_CHAIN_BASE;
 		return nf_tables_updchain(&ctx, genmask, policy, flags);
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/12] netfilter: nf_tables: add nft_chain_add()
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 09/12] netfilter: nf_tables: expose enum nft_chain_flags through UAPI Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 11/12] netfilter: nf_tables: add NFT_CHAIN_BINDING Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This patch adds a helper function to add the chain to the hashtable and
the chain list.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b7582a1c8dce..a7cb9c07802b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1914,6 +1914,20 @@ static int nft_basechain_init(struct nft_base_chain *basechain, u8 family,
 	return 0;
 }
 
+static int nft_chain_add(struct nft_table *table, struct nft_chain *chain)
+{
+	int err;
+
+	err = rhltable_insert_key(&table->chains_ht, chain->name,
+				  &chain->rhlhead, nft_chain_ht_params);
+	if (err)
+		return err;
+
+	list_add_tail_rcu(&chain->list, &table->chains);
+
+	return 0;
+}
+
 static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 			      u8 policy, u32 flags)
 {
@@ -1991,16 +2005,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 	if (err < 0)
 		goto err1;
 
-	err = rhltable_insert_key(&table->chains_ht, chain->name,
-				  &chain->rhlhead, nft_chain_ht_params);
-	if (err)
-		goto err2;
-
 	trans = nft_trans_chain_add(ctx, NFT_MSG_NEWCHAIN);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
-		rhltable_remove(&table->chains_ht, &chain->rhlhead,
-				nft_chain_ht_params);
 		goto err2;
 	}
 
@@ -2008,8 +2015,13 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 	if (nft_is_base_chain(chain))
 		nft_trans_chain_policy(trans) = policy;
 
+	err = nft_chain_add(table, chain);
+	if (err < 0) {
+		nft_trans_destroy(trans);
+		goto err2;
+	}
+
 	table->use++;
-	list_add_tail_rcu(&chain->list, &table->chains);
 
 	return 0;
 err2:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 11/12] netfilter: nf_tables: add NFT_CHAIN_BINDING
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 10/12] netfilter: nf_tables: add nft_chain_add() Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 17:46 ` [PATCH 12/12] netfilter: nf_tables: reject unsupported chain flags Pablo Neira Ayuso
  2020-07-08 19:42 ` [PATCH 00/12] Netfilter/IPVS updates for net-next David Miller
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

This new chain flag specifies that:

* the kernel dynamically allocates the chain name, if no chain name
  is specified.

* If the immediate expression that refers to this chain is removed,
  then this bound chain (and its content) is destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        | 13 +++-
 include/uapi/linux/netfilter/nf_tables.h |  1 +
 net/netfilter/nf_tables_api.c            | 86 ++++++++++++++++++++----
 net/netfilter/nft_immediate.c            | 51 ++++++++++++++
 4 files changed, 138 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 6d1e7da6e00a..822c26766330 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -899,6 +899,8 @@ static inline struct nft_userdata *nft_userdata(const struct nft_rule *rule)
 	return (void *)&rule->data[rule->dlen];
 }
 
+void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule);
+
 static inline void nft_set_elem_update_expr(const struct nft_set_ext *ext,
 					    struct nft_regs *regs,
 					    const struct nft_pktinfo *pkt)
@@ -944,7 +946,8 @@ struct nft_chain {
 	struct nft_table		*table;
 	u64				handle;
 	u32				use;
-	u8				flags:6,
+	u8				flags:5,
+					bound:1,
 					genmask:2;
 	char				*name;
 
@@ -989,6 +992,14 @@ int nft_chain_validate_dependency(const struct nft_chain *chain,
 int nft_chain_validate_hooks(const struct nft_chain *chain,
                              unsigned int hook_flags);
 
+static inline bool nft_chain_is_bound(struct nft_chain *chain)
+{
+	return (chain->flags & NFT_CHAIN_BINDING) && chain->bound;
+}
+
+void nft_chain_del(struct nft_chain *chain);
+void nf_tables_chain_destroy(struct nft_ctx *ctx);
+
 struct nft_stats {
 	u64			bytes;
 	u64			pkts;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 2cf7cc3b50c1..e00b4ae6174e 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -187,6 +187,7 @@ enum nft_table_attributes {
 enum nft_chain_flags {
 	NFT_CHAIN_BASE		= (1 << 0),
 	NFT_CHAIN_HW_OFFLOAD	= (1 << 1),
+	NFT_CHAIN_BINDING	= (1 << 2),
 };
 
 /**
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index a7cb9c07802b..b8a970dad213 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1056,6 +1056,9 @@ static int nft_flush_table(struct nft_ctx *ctx)
 		if (!nft_is_active_next(ctx->net, chain))
 			continue;
 
+		if (nft_chain_is_bound(chain))
+			continue;
+
 		ctx->chain = chain;
 
 		err = nft_delrule_by_chain(ctx);
@@ -1098,6 +1101,9 @@ static int nft_flush_table(struct nft_ctx *ctx)
 		if (!nft_is_active_next(ctx->net, chain))
 			continue;
 
+		if (nft_chain_is_bound(chain))
+			continue;
+
 		ctx->chain = chain;
 
 		err = nft_delchain(ctx);
@@ -1413,13 +1419,12 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
 					      lockdep_commit_lock_is_held(net));
 		if (nft_dump_stats(skb, stats))
 			goto nla_put_failure;
-
-		if ((chain->flags & NFT_CHAIN_HW_OFFLOAD) &&
-		    nla_put_be32(skb, NFTA_CHAIN_FLAGS,
-				 htonl(NFT_CHAIN_HW_OFFLOAD)))
-			goto nla_put_failure;
 	}
 
+	if (chain->flags &&
+	    nla_put_be32(skb, NFTA_CHAIN_FLAGS, htonl(chain->flags)))
+		goto nla_put_failure;
+
 	if (nla_put_be32(skb, NFTA_CHAIN_USE, htonl(chain->use)))
 		goto nla_put_failure;
 
@@ -1621,7 +1626,7 @@ static void nf_tables_chain_free_chain_rules(struct nft_chain *chain)
 	kvfree(chain->rules_next);
 }
 
-static void nf_tables_chain_destroy(struct nft_ctx *ctx)
+void nf_tables_chain_destroy(struct nft_ctx *ctx)
 {
 	struct nft_chain *chain = ctx->chain;
 	struct nft_hook *hook, *next;
@@ -1928,6 +1933,8 @@ static int nft_chain_add(struct nft_table *table, struct nft_chain *chain)
 	return 0;
 }
 
+static u64 chain_id;
+
 static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 			      u8 policy, u32 flags)
 {
@@ -1936,6 +1943,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 	struct nft_base_chain *basechain;
 	struct nft_stats __percpu *stats;
 	struct net *net = ctx->net;
+	char name[NFT_NAME_MAXLEN];
 	struct nft_trans *trans;
 	struct nft_chain *chain;
 	struct nft_rule **rules;
@@ -1947,6 +1955,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 	if (nla[NFTA_CHAIN_HOOK]) {
 		struct nft_chain_hook hook;
 
+		if (flags & NFT_CHAIN_BINDING)
+			return -EOPNOTSUPP;
+
 		err = nft_chain_parse_hook(net, nla, &hook, family, true);
 		if (err < 0)
 			return err;
@@ -1976,16 +1987,33 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 			return err;
 		}
 	} else {
+		if (flags & NFT_CHAIN_BASE)
+			return -EINVAL;
+		if (flags & NFT_CHAIN_HW_OFFLOAD)
+			return -EOPNOTSUPP;
+
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 		if (chain == NULL)
 			return -ENOMEM;
+
+		chain->flags = flags;
 	}
 	ctx->chain = chain;
 
 	INIT_LIST_HEAD(&chain->rules);
 	chain->handle = nf_tables_alloc_handle(table);
 	chain->table = table;
-	chain->name = nla_strdup(nla[NFTA_CHAIN_NAME], GFP_KERNEL);
+
+	if (nla[NFTA_CHAIN_NAME]) {
+		chain->name = nla_strdup(nla[NFTA_CHAIN_NAME], GFP_KERNEL);
+	} else {
+		if (!(flags & NFT_CHAIN_BINDING))
+			return -EINVAL;
+
+		snprintf(name, sizeof(name), "__chain%llu", ++chain_id);
+		chain->name = kstrdup(name, GFP_KERNEL);
+	}
+
 	if (!chain->name) {
 		err = -ENOMEM;
 		goto err1;
@@ -2976,8 +3004,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
 	kfree(rule);
 }
 
-static void nf_tables_rule_release(const struct nft_ctx *ctx,
-				   struct nft_rule *rule)
+void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule)
 {
 	nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE);
 	nf_tables_rule_destroy(ctx, rule);
@@ -3075,6 +3102,9 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
 			NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
 			return PTR_ERR(chain);
 		}
+		if (nft_chain_is_bound(chain))
+			return -EOPNOTSUPP;
+
 	} else if (nla[NFTA_RULE_CHAIN_ID]) {
 		chain = nft_chain_lookup_byid(net, nla[NFTA_RULE_CHAIN_ID]);
 		if (IS_ERR(chain)) {
@@ -3294,6 +3324,8 @@ static int nf_tables_delrule(struct net *net, struct sock *nlsk,
 			NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
 			return PTR_ERR(chain);
 		}
+		if (nft_chain_is_bound(chain))
+			return -EOPNOTSUPP;
 	}
 
 	nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
@@ -5330,11 +5362,24 @@ static int nf_tables_newsetelem(struct net *net, struct sock *nlsk,
  */
 void nft_data_hold(const struct nft_data *data, enum nft_data_types type)
 {
+	struct nft_chain *chain;
+	struct nft_rule *rule;
+
 	if (type == NFT_DATA_VERDICT) {
 		switch (data->verdict.code) {
 		case NFT_JUMP:
 		case NFT_GOTO:
-			data->verdict.chain->use++;
+			chain = data->verdict.chain;
+			chain->use++;
+
+			if (!nft_chain_is_bound(chain))
+				break;
+
+			chain->table->use++;
+			list_for_each_entry(rule, &chain->rules, list)
+				chain->use++;
+
+			nft_chain_add(chain->table, chain);
 			break;
 		}
 	}
@@ -7474,7 +7519,7 @@ static void nft_obj_del(struct nft_object *obj)
 	list_del_rcu(&obj->list);
 }
 
-static void nft_chain_del(struct nft_chain *chain)
+void nft_chain_del(struct nft_chain *chain)
 {
 	struct nft_table *table = chain->table;
 
@@ -7825,6 +7870,10 @@ static int __nf_tables_abort(struct net *net, bool autoload)
 				kfree(nft_trans_chain_name(trans));
 				nft_trans_destroy(trans);
 			} else {
+				if (nft_chain_is_bound(trans->ctx.chain)) {
+					nft_trans_destroy(trans);
+					break;
+				}
 				trans->ctx.table->use--;
 				nft_chain_del(trans->ctx.chain);
 				nf_tables_unregister_hook(trans->ctx.net,
@@ -8321,10 +8370,23 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
 
 static void nft_verdict_uninit(const struct nft_data *data)
 {
+	struct nft_chain *chain;
+	struct nft_rule *rule;
+
 	switch (data->verdict.code) {
 	case NFT_JUMP:
 	case NFT_GOTO:
-		data->verdict.chain->use--;
+		chain = data->verdict.chain;
+		chain->use--;
+
+		if (!nft_chain_is_bound(chain))
+			break;
+
+		chain->table->use--;
+		list_for_each_entry(rule, &chain->rules, list)
+			chain->use--;
+
+		nft_chain_del(chain);
 		break;
 	}
 }
diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c
index c7f0ef73d939..9e556638bb32 100644
--- a/net/netfilter/nft_immediate.c
+++ b/net/netfilter/nft_immediate.c
@@ -54,6 +54,23 @@ static int nft_immediate_init(const struct nft_ctx *ctx,
 	if (err < 0)
 		goto err1;
 
+	if (priv->dreg == NFT_REG_VERDICT) {
+		struct nft_chain *chain = priv->data.verdict.chain;
+
+		switch (priv->data.verdict.code) {
+		case NFT_JUMP:
+		case NFT_GOTO:
+			if (nft_chain_is_bound(chain)) {
+				err = -EBUSY;
+				goto err1;
+			}
+			chain->bound = true;
+			break;
+		default:
+			break;
+		}
+	}
+
 	return 0;
 
 err1:
@@ -81,6 +98,39 @@ static void nft_immediate_deactivate(const struct nft_ctx *ctx,
 	return nft_data_release(&priv->data, nft_dreg_to_type(priv->dreg));
 }
 
+static void nft_immediate_destroy(const struct nft_ctx *ctx,
+				  const struct nft_expr *expr)
+{
+	const struct nft_immediate_expr *priv = nft_expr_priv(expr);
+	const struct nft_data *data = &priv->data;
+	struct nft_ctx chain_ctx;
+	struct nft_chain *chain;
+	struct nft_rule *rule;
+
+	if (priv->dreg != NFT_REG_VERDICT)
+		return;
+
+	switch (data->verdict.code) {
+	case NFT_JUMP:
+	case NFT_GOTO:
+		chain = data->verdict.chain;
+
+		if (!nft_chain_is_bound(chain))
+			break;
+
+		chain_ctx = *ctx;
+		chain_ctx.chain = chain;
+
+		list_for_each_entry(rule, &chain->rules, list)
+			nf_tables_rule_release(&chain_ctx, rule);
+
+		nf_tables_chain_destroy(&chain_ctx);
+		break;
+	default:
+		break;
+	}
+}
+
 static int nft_immediate_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_immediate_expr *priv = nft_expr_priv(expr);
@@ -170,6 +220,7 @@ static const struct nft_expr_ops nft_imm_ops = {
 	.init		= nft_immediate_init,
 	.activate	= nft_immediate_activate,
 	.deactivate	= nft_immediate_deactivate,
+	.destroy	= nft_immediate_destroy,
 	.dump		= nft_immediate_dump,
 	.validate	= nft_immediate_validate,
 	.offload	= nft_immediate_offload,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 12/12] netfilter: nf_tables: reject unsupported chain flags
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 11/12] netfilter: nf_tables: add NFT_CHAIN_BINDING Pablo Neira Ayuso
@ 2020-07-08 17:46 ` Pablo Neira Ayuso
  2020-07-08 19:42 ` [PATCH 00/12] Netfilter/IPVS updates for net-next David Miller
  12 siblings, 0 replies; 18+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 17:46 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Bail out if userspace sends unsupported chain flags.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 3 +++
 net/netfilter/nf_tables_api.c            | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index e00b4ae6174e..42f351c1f5c5 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -189,6 +189,9 @@ enum nft_chain_flags {
 	NFT_CHAIN_HW_OFFLOAD	= (1 << 1),
 	NFT_CHAIN_BINDING	= (1 << 2),
 };
+#define NFT_CHAIN_FLAGS		(NFT_CHAIN_BASE		| \
+				 NFT_CHAIN_HW_OFFLOAD	| \
+				 NFT_CHAIN_BINDING)
 
 /**
  * enum nft_chain_attributes - nf_tables chain netlink attributes
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b8a970dad213..f96785586f64 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2285,6 +2285,9 @@ static int nf_tables_newchain(struct net *net, struct sock *nlsk,
 	else if (chain)
 		flags = chain->flags;
 
+	if (flags & ~NFT_CHAIN_FLAGS)
+		return -EOPNOTSUPP;
+
 	nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
 
 	if (chain != NULL) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/12] Netfilter/IPVS updates for net-next
  2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2020-07-08 17:46 ` [PATCH 12/12] netfilter: nf_tables: reject unsupported chain flags Pablo Neira Ayuso
@ 2020-07-08 19:42 ` David Miller
  12 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2020-07-08 19:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, kuba

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed,  8 Jul 2020 19:45:57 +0200

> The following patchset contains Netfilter updates for net-next:
> 
> 1) Support for rejecting packets from the prerouting chain, from
>    Laura Garcia Liebana.
> 
> 2) Remove useless assignment in pipapo, from Stefano Brivio.
> 
> 3) On demand hook registration in IPVS, from Julian Anastasov.
> 
> 4) Expire IPVS connection from process context to not overload
>    timers, also from Julian.
> 
> 5) Fallback to conntrack TCP tracker to handle connection reuse
>    in IPVS, from Julian Anastasov.
> 
> 6) Several patches to support for chain bindings.
> 
> 7) Expose enum nft_chain_flags through UAPI.
> 
> 8) Reject unsupported chain flags from the netlink control plane.
> 
> Please, pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks Pablo.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/12] netfilter/IPVS updates for net-next
  2014-01-05 23:09 [PATCH 00/12] netfilter/IPVS " Pablo Neira Ayuso
@ 2014-01-06  1:20 ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2014-01-06  1:20 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon,  6 Jan 2014 00:09:26 +0100

> You can pull these changes from:
> 
> Daniel Borkmann (4):
 ...

I assume you meant to say:

	git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

as the URL rather than just a blank line :-)

Pulled, thanks a lot.  But please address the feedback you received
wrt. secrets protecting hashes.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 00/12] netfilter/IPVS updates for net-next
@ 2014-01-05 23:09 Pablo Neira Ayuso
  2014-01-06  1:20 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Pablo Neira Ayuso @ 2014-01-05 23:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem

Hi David,

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.

You can pull these changes from:

Daniel Borkmann (4):
  netfilter: nf_nat: add full port randomization support
  net: net_cls: move cgroupfs classid handling into core
  net: netprio: rename config to be more consistent with cgroup configs
  netfilter: x_tables: lightweight process control group matching

Eric Leblond (1):
  netfilter: xt_CT: fix error value in xt_ct_tg_check()

Florian Westphal (2):
  netfilter: avoid get_random_bytes calls
  netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark

Geert Uytterhoeven (1):
  ipvs: Remove unused variable ret from sync_thread_master()

Valentina Giusti (1):
  netfilter: nfnetlink_queue: enable UID/GID socket info retrieval

fan.du (1):
  netfilter: add IPv4/6 IPComp extension match support

stephen hemminger (2):
  netfilter: ipset: remove unused code
  netfilter: nf_conntrack: remove dead code

 Documentation/cgroups/net_cls.txt              |    5 +
 include/linux/cgroup_subsys.h                  |    4 +-
 include/linux/netdevice.h                      |    2 +-
 include/linux/netfilter/ipset/ip_set.h         |    1 -
 include/net/cls_cgroup.h                       |   40 +++-----
 include/net/netfilter/ipv4/nf_conntrack_ipv4.h |    2 -
 include/net/netfilter/nf_conntrack_l3proto.h   |    1 -
 include/net/netprio_cgroup.h                   |   18 ++--
 include/net/sock.h                             |    2 +-
 include/uapi/linux/netfilter/Kbuild            |    2 +
 include/uapi/linux/netfilter/nf_nat.h          |   12 ++-
 include/uapi/linux/netfilter/nfnetlink_queue.h |    5 +-
 include/uapi/linux/netfilter/xt_cgroup.h       |   11 +++
 include/uapi/linux/netfilter/xt_ipcomp.h       |   16 ++++
 net/Kconfig                                    |   11 ++-
 net/core/Makefile                              |    3 +-
 net/core/dev.c                                 |    2 +-
 net/core/netclassid_cgroup.c                   |  120 ++++++++++++++++++++++++
 net/core/sock.c                                |   14 +--
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    6 --
 net/netfilter/Kconfig                          |   19 ++++
 net/netfilter/Makefile                         |    2 +
 net/netfilter/ipset/ip_set_core.c              |   28 ------
 net/netfilter/ipvs/ip_vs_sync.c                |    5 +-
 net/netfilter/nf_conntrack_core.c              |   15 ---
 net/netfilter/nf_conntrack_netlink.c           |   12 ++-
 net/netfilter/nf_conntrack_proto.c             |    6 --
 net/netfilter/nf_nat_core.c                    |    4 +-
 net/netfilter/nf_nat_proto_common.c            |   10 +-
 net/netfilter/nfnetlink_log.c                  |    8 --
 net/netfilter/nfnetlink_queue_core.c           |   34 +++++++
 net/netfilter/nft_hash.c                       |    2 +-
 net/netfilter/xt_CT.c                          |    4 +-
 net/netfilter/xt_RATEEST.c                     |    2 +-
 net/netfilter/xt_cgroup.c                      |   71 ++++++++++++++
 net/netfilter/xt_connlimit.c                   |    2 +-
 net/netfilter/xt_hashlimit.c                   |    2 +-
 net/netfilter/xt_ipcomp.c                      |  111 ++++++++++++++++++++++
 net/netfilter/xt_recent.c                      |    2 +-
 net/sched/Kconfig                              |    1 +
 net/sched/cls_cgroup.c                         |  111 +---------------------
 41 files changed, 470 insertions(+), 258 deletions(-)
 create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
 create mode 100644 include/uapi/linux/netfilter/xt_ipcomp.h
 create mode 100644 net/core/netclassid_cgroup.c
 create mode 100644 net/netfilter/xt_cgroup.c
 create mode 100644 net/netfilter/xt_ipcomp.c

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/12] Netfilter/IPVS updates for net-next
  2013-06-05 20:40 [PATCH 00/12] Netfilter/IPVS " Pablo Neira Ayuso
@ 2013-06-06  9:03 ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2013-06-06  9:03 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed,  5 Jun 2013 22:40:21 +0200

> The following patchset contains the first batch of
> Netfilter/IPVS updates for your net-next tree, they are:
> 
> * Three patches with improvements and code refactorization
>   for nfnetlink_queue, from Florian Westphal.
> 
> * FTP helper now parses replies without brackets, as RFC1123
>   recommends, from Jeff Mahoney.
> 
> * Rise a warning to tell everyone about ULOG deprecation,
>   NFLOG has been already in the kernel tree for long time
>   and supersedes the old logging over netlink stub, from
>   myself.
> 
> * Don't panic if we fail to load netfilter core framework,
>   just bail out instead, from myself.
> 
> * Add cond_resched_rcu, used by IPVS to allow rescheduling
>   while walking over big hashtables, from Simon Horman.
> 
> * Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
>   possible overflow, from Zhang Yanfei.
> 
> * Use strlcpy instead of strncpy to skip zeroing of already
>   initialized area to write the extension names in ebtables,
>   from Chen Gang.
> 
> * Use already existing per-cpu notrack object from xt_CT,
>   from Eric Dumazet.
> 
> * Save explicit socket lookup in xt_socket now that we have
>   early demux, also from Eric Dumazet.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

Pulled, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 00/12] Netfilter/IPVS updates for net-next
@ 2013-06-05 20:40 Pablo Neira Ayuso
  2013-06-06  9:03 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Pablo Neira Ayuso @ 2013-06-05 20:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso <pablo@soleta.eu>

Hi David,

The following patchset contains the first batch of
Netfilter/IPVS updates for your net-next tree, they are:

* Three patches with improvements and code refactorization
  for nfnetlink_queue, from Florian Westphal.

* FTP helper now parses replies without brackets, as RFC1123
  recommends, from Jeff Mahoney.

* Rise a warning to tell everyone about ULOG deprecation,
  NFLOG has been already in the kernel tree for long time
  and supersedes the old logging over netlink stub, from
  myself.

* Don't panic if we fail to load netfilter core framework,
  just bail out instead, from myself.

* Add cond_resched_rcu, used by IPVS to allow rescheduling
  while walking over big hashtables, from Simon Horman.

* Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
  possible overflow, from Zhang Yanfei.

* Use strlcpy instead of strncpy to skip zeroing of already
  initialized area to write the extension names in ebtables,
  from Chen Gang.

* Use already existing per-cpu notrack object from xt_CT,
  from Eric Dumazet.

* Save explicit socket lookup in xt_socket now that we have
  early demux, also from Eric Dumazet.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

Thanks!

----------------------------------------------------------------

The following changes since commit 8892475386e819aa50856947948c546ccc964d96:

  ipv6: use ipv6_addr_scope() helper (2013-05-23 01:17:47 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to 7f87712c0152511a1842698ad8dca425fee2dc4f:

  netfilter: nfnetlink_queue: only add CAP_LEN attr when needed (2013-06-05 12:40:54 +0200)

----------------------------------------------------------------
Chen Gang (1):
      bridge: netfilter: using strlcpy() instead of strncpy()

Eric Dumazet (2):
      netfilter: xt_CT: optimize XT_CT_NOTRACK
      netfilter: xt_socket: use IP early demux

Florian Westphal (3):
      netfilter: nfnetlink_queue: avoid peer_portid test
      netfilter: nfnetlink_queue: cleanup copy_range usage
      netfilter: nfnetlink_queue: only add CAP_LEN attr when needed

Jeff Mahoney (1):
      netfilter: Implement RFC 1123 for FTP conntrack

Pablo Neira Ayuso (2):
      netfilter: don't panic on error while walking through the init path
      netfilter: {ipt,ebt}_ULOG: rise warning on deprecation

Simon Horman (2):
      sched: add cond_resched_rcu() helper
      ipvs: use cond_resched_rcu() helper when walking connections

Zhang Yanfei (1):
      ipvs: change type of netns_ipvs->sysctl_sync_qlen_max

 include/linux/netfilter.h            |    2 +-
 include/linux/sched.h                |    9 +++++
 include/net/ip_vs.h                  |    8 ++--
 include/net/netns/x_tables.h         |    6 +++
 net/bridge/netfilter/ebt_ulog.c      |    6 +++
 net/bridge/netfilter/ebtables.c      |    6 +--
 net/ipv4/netfilter/Kconfig           |    2 +-
 net/ipv4/netfilter/ipt_ULOG.c        |    6 +++
 net/netfilter/core.c                 |   21 +++++++---
 net/netfilter/ipvs/ip_vs_conn.c      |   23 ++++-------
 net/netfilter/ipvs/ip_vs_ctl.c       |    4 +-
 net/netfilter/nf_conntrack_ftp.c     |   73 +++++++++++++++++++++++++---------
 net/netfilter/nf_log.c               |    5 +--
 net/netfilter/nfnetlink_queue_core.c |   29 +++++++-------
 net/netfilter/xt_CT.c                |   10 +++--
 net/netfilter/xt_socket.c            |   26 +++++++-----
 net/socket.c                         |    4 +-
 17 files changed, 155 insertions(+), 85 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-07-08 19:43 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-08 17:45 [PATCH 00/12] Netfilter/IPVS updates for net-next Pablo Neira Ayuso
2020-07-08 17:45 ` [PATCH 01/12] netfilter: introduce support for reject at prerouting stage Pablo Neira Ayuso
2020-07-08 17:45 ` [PATCH 02/12] netfilter: nft_set_pipapo: Drop useless assignment of scratch map index on insert Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 03/12] ipvs: register hooks only with services Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 04/12] ipvs: avoid expiring many connections from timer Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 05/12] ipvs: allow connection reuse for unconfirmed conntrack Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 06/12] netfilter: nf_tables: add NFTA_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 07/12] netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 08/12] netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 09/12] netfilter: nf_tables: expose enum nft_chain_flags through UAPI Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 10/12] netfilter: nf_tables: add nft_chain_add() Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 11/12] netfilter: nf_tables: add NFT_CHAIN_BINDING Pablo Neira Ayuso
2020-07-08 17:46 ` [PATCH 12/12] netfilter: nf_tables: reject unsupported chain flags Pablo Neira Ayuso
2020-07-08 19:42 ` [PATCH 00/12] Netfilter/IPVS updates for net-next David Miller
  -- strict thread matches above, loose matches on Subject: below --
2014-01-05 23:09 [PATCH 00/12] netfilter/IPVS " Pablo Neira Ayuso
2014-01-06  1:20 ` David Miller
2013-06-05 20:40 [PATCH 00/12] Netfilter/IPVS " Pablo Neira Ayuso
2013-06-06  9:03 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).