All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
@ 2022-03-23 13:21 Florian Westphal
  2022-03-23 13:21 ` [PATCH nf-next v3 01/16] nfnetlink: handle already-released nl socket Florian Westphal
                   ` (16 more replies)
  0 siblings, 17 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This series removes the unconfirmed and dying percpu lists.

Dying list is replaced by pernet list, only used when reliable event
delivery mode was requested.

Unconfirmed list is replaced by a generation id for the conntrack
extesions, to detect when pointers to external objects (timeout policy,
helper, ...) has gone stale.

An alternative to the genid would be to always take references on
such external objects, let me know if that is the preferred solution.

Changes in v3:
- fix build bugs reported by kbuild robot
- add patch #16

Florian Westphal (16):
  nfnetlink: handle already-released nl socket
  netfilter: ctnetlink: make ecache event cb global again
  netfilter: ecache: move to separate structure
  netfilter: ecache: use dedicated list for event redelivery
  netfilter: conntrack: split inner loop of list dumping to own function
  netfilter: conntrack: include ecache dying list in dumps
  netfilter: conntrack: remove the percpu dying list
  netfilter: cttimeout: inc/dec module refcount per object, not per use
    refcount
  netfilter: nfnetlink_cttimeout: use rcu protection in
    cttimeout_get_timeout
  netfilter: cttimeout: decouple unlink and free on netns destruction
  netfilter: remove nf_ct_unconfirmed_destroy helper
  netfilter: extensions: introduce extension genid count
  netfilter: cttimeout: decouple unlink and free on netns destruction
  netfilter: conntrack: remove __nf_ct_unconfirmed_destroy
  netfilter: conntrack: remove unconfirmed list
  netfilter: conntrack: avoid unconditional local_bh_disable

 include/net/netfilter/nf_conntrack.h         |  13 +-
 include/net/netfilter/nf_conntrack_ecache.h  |  34 +--
 include/net/netfilter/nf_conntrack_extend.h  |  31 +--
 include/net/netfilter/nf_conntrack_labels.h  |  10 +-
 include/net/netfilter/nf_conntrack_timeout.h |   8 -
 include/net/netns/conntrack.h                |   8 -
 net/netfilter/nf_conntrack_core.c            | 230 ++++++++-----------
 net/netfilter/nf_conntrack_ecache.c          | 173 +++++++-------
 net/netfilter/nf_conntrack_extend.c          |  32 ++-
 net/netfilter/nf_conntrack_helper.c          |   5 -
 net/netfilter/nf_conntrack_netlink.c         | 177 +++++++-------
 net/netfilter/nfnetlink.c                    |  62 +++--
 net/netfilter/nfnetlink_cttimeout.c          |  88 ++++---
 13 files changed, 443 insertions(+), 428 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 01/16] nfnetlink: handle already-released nl socket
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
@ 2022-03-23 13:21 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 02/16] netfilter: ctnetlink: make ecache event cb global again Florian Westphal
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

At this time upper layer, e.g. nf_conntrack_event, has to make sure that
its pernet exit handler runs before the nfnetlink one, otherwise we get a
crash if kernel tries to send a conntrack event after the nfnetlink netns
exit handler did close the socket already.

In order to move nf_conntrack_ecache to global (not pernet) netns event
pointer again the nfnetlink apis need to survive attempts to send a netlink
message after the socket has been destroyed in nfnetlink netns exit
function.

Set the pernet socket to null in the pre_exit handler and close it in the
exit_batch handler via a 'stash' pointer.

All functions now check nlsk for NULL before using it.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink.c | 62 +++++++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 7e2c8dd01408..6105dc9b8f8e 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -46,6 +46,7 @@ static unsigned int nfnetlink_pernet_id __read_mostly;
 
 struct nfnl_net {
 	struct sock *nfnl;
+	struct sock *nfnl_stash;
 };
 
 static struct {
@@ -160,37 +161,56 @@ nfnetlink_find_client(u16 type, const struct nfnetlink_subsystem *ss)
 	return &ss->cb[cb_id];
 }
 
-int nfnetlink_has_listeners(struct net *net, unsigned int group)
+static struct sock *nfnl_pernet_sk(struct net *net)
 {
 	struct nfnl_net *nfnlnet = nfnl_pernet(net);
 
-	return netlink_has_listeners(nfnlnet->nfnl, group);
+	return READ_ONCE(nfnlnet->nfnl);
+}
+
+int nfnetlink_has_listeners(struct net *net, unsigned int group)
+{
+	struct sock *nlsk = nfnl_pernet_sk(net);
+
+	return nlsk ? netlink_has_listeners(nlsk, group) : 0;
 }
 EXPORT_SYMBOL_GPL(nfnetlink_has_listeners);
 
 int nfnetlink_send(struct sk_buff *skb, struct net *net, u32 portid,
 		   unsigned int group, int echo, gfp_t flags)
 {
-	struct nfnl_net *nfnlnet = nfnl_pernet(net);
+	struct sock *nlsk = nfnl_pernet_sk(net);
+
+	if (nlsk)
+		return nlmsg_notify(nlsk, skb, portid, group, echo, flags);
 
-	return nlmsg_notify(nfnlnet->nfnl, skb, portid, group, echo, flags);
+	/* nlsk already gone? This happens when .pre_exit was already called,
+	 * return 0, we can't retry.
+	 */
+	kfree_skb(skb);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nfnetlink_send);
 
 int nfnetlink_set_err(struct net *net, u32 portid, u32 group, int error)
 {
-	struct nfnl_net *nfnlnet = nfnl_pernet(net);
+	struct sock *nlsk = nfnl_pernet_sk(net);
 
-	return netlink_set_err(nfnlnet->nfnl, portid, group, error);
+	return nlsk ? netlink_set_err(nlsk, portid, group, error) : 0;
 }
 EXPORT_SYMBOL_GPL(nfnetlink_set_err);
 
 int nfnetlink_unicast(struct sk_buff *skb, struct net *net, u32 portid)
 {
-	struct nfnl_net *nfnlnet = nfnl_pernet(net);
+	struct sock *nlsk = nfnl_pernet_sk(net);
 	int err;
 
-	err = nlmsg_unicast(nfnlnet->nfnl, skb, portid);
+	if (!nlsk) {
+		kfree_skb(skb);
+		return 0;
+	}
+
+	err = nlmsg_unicast(nlsk, skb, portid);
 	if (err == -EAGAIN)
 		err = -ENOBUFS;
 
@@ -201,9 +221,12 @@ EXPORT_SYMBOL_GPL(nfnetlink_unicast);
 void nfnetlink_broadcast(struct net *net, struct sk_buff *skb, __u32 portid,
 			 __u32 group, gfp_t allocation)
 {
-	struct nfnl_net *nfnlnet = nfnl_pernet(net);
+	struct sock *nlsk = nfnl_pernet_sk(net);
 
-	netlink_broadcast(nfnlnet->nfnl, skb, portid, group, allocation);
+	if (nlsk)
+		netlink_broadcast(nlsk, skb, portid, group, allocation);
+	else
+		kfree_skb(skb);
 }
 EXPORT_SYMBOL_GPL(nfnetlink_broadcast);
 
@@ -247,7 +270,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	{
 		int min_len = nlmsg_total_size(sizeof(struct nfgenmsg));
-		struct nfnl_net *nfnlnet = nfnl_pernet(net);
+		struct sock *nlsk = nfnl_pernet_sk(net);
 		u8 cb_id = NFNL_MSG_TYPE(nlh->nlmsg_type);
 		struct nlattr *cda[NFNL_MAX_ATTR_COUNT + 1];
 		struct nlattr *attr = (void *)nlh + min_len;
@@ -255,7 +278,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 		__u8 subsys_id = NFNL_SUBSYS_ID(type);
 		struct nfnl_info info = {
 			.net	= net,
-			.sk	= nfnlnet->nfnl,
+			.sk	= nlsk,
 			.nlh	= nlh,
 			.nfmsg	= nlmsg_data(nlh),
 			.extack	= extack,
@@ -484,14 +507,14 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 		{
 			int min_len = nlmsg_total_size(sizeof(struct nfgenmsg));
-			struct nfnl_net *nfnlnet = nfnl_pernet(net);
+			struct sock *nlsk = nfnl_pernet_sk(net);
 			struct nlattr *cda[NFNL_MAX_ATTR_COUNT + 1];
 			struct nlattr *attr = (void *)nlh + min_len;
 			u8 cb_id = NFNL_MSG_TYPE(nlh->nlmsg_type);
 			int attrlen = nlh->nlmsg_len - min_len;
 			struct nfnl_info info = {
 				.net	= net,
-				.sk	= nfnlnet->nfnl,
+				.sk	= nlsk,
 				.nlh	= nlh,
 				.nfmsg	= nlmsg_data(nlh),
 				.extack	= &extack,
@@ -699,12 +722,21 @@ static void __net_exit nfnetlink_net_exit_batch(struct list_head *net_exit_list)
 	list_for_each_entry(net, net_exit_list, exit_list) {
 		nfnlnet = nfnl_pernet(net);
 
-		netlink_kernel_release(nfnlnet->nfnl);
+		netlink_kernel_release(nfnlnet->nfnl_stash);
 	}
 }
 
+static void __net_exit nfnetlink_net_pre_exit(struct net *net)
+{
+	struct nfnl_net *nfnlnet = nfnl_pernet(net);
+
+	nfnlnet->nfnl_stash = nfnlnet->nfnl;
+	WRITE_ONCE(nfnlnet->nfnl, NULL);
+}
+
 static struct pernet_operations nfnetlink_net_ops = {
 	.init		= nfnetlink_net_init,
+	.pre_exit	= nfnetlink_net_pre_exit,
 	.exit_batch	= nfnetlink_net_exit_batch,
 	.id		= &nfnetlink_pernet_id,
 	.size		= sizeof(struct nfnl_net),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 02/16] netfilter: ctnetlink: make ecache event cb global again
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
  2022-03-23 13:21 ` [PATCH nf-next v3 01/16] nfnetlink: handle already-released nl socket Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 03/16] netfilter: ecache: move to separate structure Florian Westphal
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This was pernet to make sure we do not trip over already closed nfnl sk.

After moving nfnl validity checks to nfnetlink core this can be global
again, it only needs to be set to NULL when the module is removed.

This also avoids the need for pernet ops in ctnetlink and register mutex.

Remove access_pointer() checks, ctnetlink module is loaded in most cases.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_ecache.h | 30 ++++-------------
 include/net/netns/conntrack.h               |  1 -
 net/netfilter/nf_conntrack_ecache.c         | 33 ++++++------------
 net/netfilter/nf_conntrack_netlink.c        | 37 +++++----------------
 4 files changed, 26 insertions(+), 75 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h
index 6c4c490a3e34..31e6a7572bb7 100644
--- a/include/net/netfilter/nf_conntrack_ecache.h
+++ b/include/net/netfilter/nf_conntrack_ecache.h
@@ -83,9 +83,8 @@ struct nf_ct_event_notifier {
 	int (*exp_event)(unsigned int events, const struct nf_exp_event *item);
 };
 
-void nf_conntrack_register_notifier(struct net *net,
-				   const struct nf_ct_event_notifier *nb);
-void nf_conntrack_unregister_notifier(struct net *net);
+void nf_conntrack_register_notifier(const struct nf_ct_event_notifier *nb);
+void nf_conntrack_unregister_notifier(void);
 
 void nf_ct_deliver_cached_events(struct nf_conn *ct);
 int nf_conntrack_eventmask_report(unsigned int eventmask, struct nf_conn *ct,
@@ -107,21 +106,16 @@ static inline int nf_conntrack_eventmask_report(unsigned int eventmask,
 
 #endif
 
+extern const struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb __read_mostly;
+
 static inline void
 nf_conntrack_event_cache(enum ip_conntrack_events event, struct nf_conn *ct)
 {
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
-	struct net *net = nf_ct_net(ct);
-	struct nf_conntrack_ecache *e;
-
-	if (!rcu_access_pointer(net->ct.nf_conntrack_event_cb))
-		return;
-
-	e = nf_ct_ecache_find(ct);
-	if (e == NULL)
-		return;
+	struct nf_conntrack_ecache *e = nf_ct_ecache_find(ct);
 
-	set_bit(event, &e->cache);
+	if (e)
+		set_bit(event, &e->cache);
 #endif
 }
 
@@ -130,11 +124,6 @@ nf_conntrack_event_report(enum ip_conntrack_events event, struct nf_conn *ct,
 			  u32 portid, int report)
 {
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
-	const struct net *net = nf_ct_net(ct);
-
-	if (!rcu_access_pointer(net->ct.nf_conntrack_event_cb))
-		return 0;
-
 	return nf_conntrack_eventmask_report(1 << event, ct, portid, report);
 #else
 	return 0;
@@ -145,11 +134,6 @@ static inline int
 nf_conntrack_event(enum ip_conntrack_events event, struct nf_conn *ct)
 {
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
-	const struct net *net = nf_ct_net(ct);
-
-	if (!rcu_access_pointer(net->ct.nf_conntrack_event_cb))
-		return 0;
-
 	return nf_conntrack_eventmask_report(1 << event, ct, 0, 0);
 #else
 	return 0;
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 0294f3d473af..3bb62e938fa9 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -112,7 +112,6 @@ struct netns_ct {
 
 	struct ct_pcpu __percpu *pcpu_lists;
 	struct ip_conntrack_stat __percpu *stat;
-	struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb;
 	struct nf_ip_net	nf_ct_proto;
 #if defined(CONFIG_NF_CONNTRACK_LABELS)
 	unsigned int		labels_used;
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 07e65b4e92f8..9ad501d14249 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -27,7 +27,8 @@
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_extend.h>
 
-static DEFINE_MUTEX(nf_ct_ecache_mutex);
+const struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb __read_mostly;
+EXPORT_SYMBOL_GPL(nf_conntrack_event_cb);
 
 #define ECACHE_RETRY_WAIT (HZ/10)
 #define ECACHE_STACK_ALLOC (256 / sizeof(void *))
@@ -135,8 +136,7 @@ static int __nf_conntrack_eventmask_report(struct nf_conntrack_ecache *e,
 					   const u32 missed,
 					   const struct nf_ct_event *item)
 {
-	struct net *net = nf_ct_net(item->ct);
-	struct nf_ct_event_notifier *notify;
+	const struct nf_ct_event_notifier *notify;
 	u32 old, want;
 	int ret;
 
@@ -145,7 +145,7 @@ static int __nf_conntrack_eventmask_report(struct nf_conntrack_ecache *e,
 
 	rcu_read_lock();
 
-	notify = rcu_dereference(net->ct.nf_conntrack_event_cb);
+	notify = rcu_dereference(nf_conntrack_event_cb);
 	if (!notify) {
 		rcu_read_unlock();
 		return 0;
@@ -240,12 +240,11 @@ void nf_ct_expect_event_report(enum ip_conntrack_expect_events event,
 			       u32 portid, int report)
 
 {
-	struct net *net = nf_ct_exp_net(exp);
-	struct nf_ct_event_notifier *notify;
+	const struct nf_ct_event_notifier *notify;
 	struct nf_conntrack_ecache *e;
 
 	rcu_read_lock();
-	notify = rcu_dereference(net->ct.nf_conntrack_event_cb);
+	notify = rcu_dereference(nf_conntrack_event_cb);
 	if (!notify)
 		goto out_unlock;
 
@@ -265,26 +264,16 @@ void nf_ct_expect_event_report(enum ip_conntrack_expect_events event,
 	rcu_read_unlock();
 }
 
-void nf_conntrack_register_notifier(struct net *net,
-				    const struct nf_ct_event_notifier *new)
+void nf_conntrack_register_notifier(const struct nf_ct_event_notifier *new)
 {
-	struct nf_ct_event_notifier *notify;
-
-	mutex_lock(&nf_ct_ecache_mutex);
-	notify = rcu_dereference_protected(net->ct.nf_conntrack_event_cb,
-					   lockdep_is_held(&nf_ct_ecache_mutex));
-	WARN_ON_ONCE(notify);
-	rcu_assign_pointer(net->ct.nf_conntrack_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
+	WARN_ON_ONCE(rcu_access_pointer(nf_conntrack_event_cb));
+	rcu_assign_pointer(nf_conntrack_event_cb, new);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_register_notifier);
 
-void nf_conntrack_unregister_notifier(struct net *net)
+void nf_conntrack_unregister_notifier(void)
 {
-	mutex_lock(&nf_ct_ecache_mutex);
-	RCU_INIT_POINTER(net->ct.nf_conntrack_event_cb, NULL);
-	mutex_unlock(&nf_ct_ecache_mutex);
-	/* synchronize_rcu() is called after netns pre_exit */
+	RCU_INIT_POINTER(nf_conntrack_event_cb, NULL);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 1ea2ad732d57..4a460565f275 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3772,7 +3772,7 @@ static int ctnetlink_stat_exp_cpu(struct sk_buff *skb,
 }
 
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
-static struct nf_ct_event_notifier ctnl_notifier = {
+static const struct nf_ct_event_notifier ctnl_notifier = {
 	.ct_event = ctnetlink_conntrack_event,
 	.exp_event = ctnetlink_expect_event,
 };
@@ -3864,26 +3864,6 @@ MODULE_ALIAS("ip_conntrack_netlink");
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTNETLINK);
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTNETLINK_EXP);
 
-static int __net_init ctnetlink_net_init(struct net *net)
-{
-#ifdef CONFIG_NF_CONNTRACK_EVENTS
-	nf_conntrack_register_notifier(net, &ctnl_notifier);
-#endif
-	return 0;
-}
-
-static void ctnetlink_net_pre_exit(struct net *net)
-{
-#ifdef CONFIG_NF_CONNTRACK_EVENTS
-	nf_conntrack_unregister_notifier(net);
-#endif
-}
-
-static struct pernet_operations ctnetlink_net_ops = {
-	.init		= ctnetlink_net_init,
-	.pre_exit	= ctnetlink_net_pre_exit,
-};
-
 static int __init ctnetlink_init(void)
 {
 	int ret;
@@ -3902,19 +3882,16 @@ static int __init ctnetlink_init(void)
 		goto err_unreg_subsys;
 	}
 
-	ret = register_pernet_subsys(&ctnetlink_net_ops);
-	if (ret < 0) {
-		pr_err("ctnetlink_init: cannot register pernet operations\n");
-		goto err_unreg_exp_subsys;
-	}
+#ifdef CONFIG_NF_CONNTRACK_EVENTS
+	nf_conntrack_register_notifier(&ctnl_notifier);
+#endif
+
 #ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
 	/* setup interaction between nf_queue and nf_conntrack_netlink. */
 	RCU_INIT_POINTER(nfnl_ct_hook, &ctnetlink_glue_hook);
 #endif
 	return 0;
 
-err_unreg_exp_subsys:
-	nfnetlink_subsys_unregister(&ctnl_exp_subsys);
 err_unreg_subsys:
 	nfnetlink_subsys_unregister(&ctnl_subsys);
 err_out:
@@ -3923,7 +3900,9 @@ static int __init ctnetlink_init(void)
 
 static void __exit ctnetlink_exit(void)
 {
-	unregister_pernet_subsys(&ctnetlink_net_ops);
+#ifdef CONFIG_NF_CONNTRACK_EVENTS
+	nf_conntrack_unregister_notifier();
+#endif
 	nfnetlink_subsys_unregister(&ctnl_exp_subsys);
 	nfnetlink_subsys_unregister(&ctnl_subsys);
 #ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 03/16] netfilter: ecache: move to separate structure
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
  2022-03-23 13:21 ` [PATCH nf-next v3 01/16] nfnetlink: handle already-released nl socket Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 02/16] netfilter: ctnetlink: make ecache event cb global again Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 04/16] netfilter: ecache: use dedicated list for event redelivery Florian Westphal
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This makes it easier for a followup patch to only expose ecache
related parts of nf_conntrack_net structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack.h |  8 ++++++--
 net/netfilter/nf_conntrack_ecache.c  | 19 ++++++++++---------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index b08b70989d2c..69e6c6a218be 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -43,6 +43,11 @@ union nf_conntrack_expect_proto {
 	/* insert expect proto private data here */
 };
 
+struct nf_conntrack_net_ecache {
+	struct delayed_work dwork;
+	struct netns_ct *ct_net;
+};
+
 struct nf_conntrack_net {
 	/* only used when new connection is allocated: */
 	atomic_t count;
@@ -58,8 +63,7 @@ struct nf_conntrack_net {
 	struct ctl_table_header	*sysctl_header;
 #endif
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
-	struct delayed_work ecache_dwork;
-	struct netns_ct *ct_net;
+	struct nf_conntrack_net_ecache ecache;
 #endif
 };
 
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 9ad501d14249..c9431fa9d51b 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -97,8 +97,8 @@ static enum retry_state ecache_work_evict_list(struct ct_pcpu *pcpu)
 
 static void ecache_work(struct work_struct *work)
 {
-	struct nf_conntrack_net *cnet = container_of(work, struct nf_conntrack_net, ecache_dwork.work);
-	struct netns_ct *ctnet = cnet->ct_net;
+	struct nf_conntrack_net *cnet = container_of(work, struct nf_conntrack_net, ecache.dwork.work);
+	struct netns_ct *ctnet = cnet->ecache.ct_net;
 	int cpu, delay = -1;
 	struct ct_pcpu *pcpu;
 
@@ -128,7 +128,7 @@ static void ecache_work(struct work_struct *work)
 
 	ctnet->ecache_dwork_pending = delay > 0;
 	if (delay >= 0)
-		schedule_delayed_work(&cnet->ecache_dwork, delay);
+		schedule_delayed_work(&cnet->ecache.dwork, delay);
 }
 
 static int __nf_conntrack_eventmask_report(struct nf_conntrack_ecache *e,
@@ -282,12 +282,12 @@ void nf_conntrack_ecache_work(struct net *net, enum nf_ct_ecache_state state)
 	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
 
 	if (state == NFCT_ECACHE_DESTROY_FAIL &&
-	    !delayed_work_pending(&cnet->ecache_dwork)) {
-		schedule_delayed_work(&cnet->ecache_dwork, HZ);
+	    !delayed_work_pending(&cnet->ecache.dwork)) {
+		schedule_delayed_work(&cnet->ecache.dwork, HZ);
 		net->ct.ecache_dwork_pending = true;
 	} else if (state == NFCT_ECACHE_DESTROY_SENT) {
 		net->ct.ecache_dwork_pending = false;
-		mod_delayed_work(system_wq, &cnet->ecache_dwork, 0);
+		mod_delayed_work(system_wq, &cnet->ecache.dwork, 0);
 	}
 }
 
@@ -299,8 +299,9 @@ void nf_conntrack_ecache_pernet_init(struct net *net)
 	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
 
 	net->ct.sysctl_events = nf_ct_events;
-	cnet->ct_net = &net->ct;
-	INIT_DELAYED_WORK(&cnet->ecache_dwork, ecache_work);
+
+	cnet->ecache.ct_net = &net->ct;
+	INIT_DELAYED_WORK(&cnet->ecache.dwork, ecache_work);
 
 	BUILD_BUG_ON(__IPCT_MAX >= 16);	/* e->ctmask is u16 */
 }
@@ -309,5 +310,5 @@ void nf_conntrack_ecache_pernet_fini(struct net *net)
 {
 	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
 
-	cancel_delayed_work_sync(&cnet->ecache_dwork);
+	cancel_delayed_work_sync(&cnet->ecache.dwork);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 04/16] netfilter: ecache: use dedicated list for event redelivery
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (2 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 03/16] netfilter: ecache: move to separate structure Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 05/16] netfilter: conntrack: split inner loop of list dumping to own function Florian Westphal
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This disentangles event redelivery and the percpu dying list.

Because entries are now stored on a dedicated list, all
entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries
still have confirmed bit set -- the reference count is at least 1.

The 'struct net' back-pointer can be removed as well.

The pcpu dying list will be removed eventually, it has no functionality.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack.h        |   3 +-
 include/net/netfilter/nf_conntrack_ecache.h |   2 -
 net/netfilter/nf_conntrack_core.c           |  33 +++++-
 net/netfilter/nf_conntrack_ecache.c         | 118 +++++++++-----------
 4 files changed, 83 insertions(+), 73 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 69e6c6a218be..28672a944499 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -45,7 +45,8 @@ union nf_conntrack_expect_proto {
 
 struct nf_conntrack_net_ecache {
 	struct delayed_work dwork;
-	struct netns_ct *ct_net;
+	spinlock_t dying_lock;
+	struct hlist_nulls_head dying_list;
 };
 
 struct nf_conntrack_net {
diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h
index 31e6a7572bb7..0364c4782ec3 100644
--- a/include/net/netfilter/nf_conntrack_ecache.h
+++ b/include/net/netfilter/nf_conntrack_ecache.h
@@ -14,7 +14,6 @@
 #include <net/netfilter/nf_conntrack_extend.h>
 
 enum nf_ct_ecache_state {
-	NFCT_ECACHE_UNKNOWN,		/* destroy event not sent */
 	NFCT_ECACHE_DESTROY_FAIL,	/* tried but failed to send destroy event */
 	NFCT_ECACHE_DESTROY_SENT,	/* sent destroy event after failure */
 };
@@ -23,7 +22,6 @@ struct nf_conntrack_ecache {
 	unsigned long cache;		/* bitops want long */
 	u16 ctmask;			/* bitmask of ct events to be delivered */
 	u16 expmask;			/* bitmask of expect events to be delivered */
-	enum nf_ct_ecache_state state:8;/* ecache state */
 	u32 missed;			/* missed events */
 	u32 portid;			/* netlink portid of destroyer */
 };
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 0164e5f522e8..ca1d1d105163 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -660,15 +660,12 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 }
 EXPORT_SYMBOL(nf_ct_destroy);
 
-static void nf_ct_delete_from_lists(struct nf_conn *ct)
+static void __nf_ct_delete_from_lists(struct nf_conn *ct)
 {
 	struct net *net = nf_ct_net(ct);
 	unsigned int hash, reply_hash;
 	unsigned int sequence;
 
-	nf_ct_helper_destroy(ct);
-
-	local_bh_disable();
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
 		hash = hash_conntrack(net,
@@ -681,12 +678,33 @@ static void nf_ct_delete_from_lists(struct nf_conn *ct)
 
 	clean_from_lists(ct);
 	nf_conntrack_double_unlock(hash, reply_hash);
+}
 
+static void nf_ct_delete_from_lists(struct nf_conn *ct)
+{
+	nf_ct_helper_destroy(ct);
+	local_bh_disable();
+
+	__nf_ct_delete_from_lists(ct);
 	nf_ct_add_to_dying_list(ct);
 
 	local_bh_enable();
 }
 
+static void nf_ct_add_to_ecache_list(struct nf_conn *ct)
+{
+#ifdef CONFIG_NF_CONNTRACK_EVENTS
+	struct nf_conntrack_net *cnet = nf_ct_pernet(nf_ct_net(ct));
+
+	spin_lock(&cnet->ecache.dying_lock);
+	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
+				 &cnet->ecache.dying_list);
+	spin_unlock(&cnet->ecache.dying_lock);
+#else
+	nf_ct_add_to_dying_list(ct);
+#endif
+}
+
 bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 {
 	struct nf_conn_tstamp *tstamp;
@@ -709,7 +727,12 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 		/* destroy event was not delivered. nf_ct_put will
 		 * be done by event cache worker on redelivery.
 		 */
-		nf_ct_delete_from_lists(ct);
+		nf_ct_helper_destroy(ct);
+		local_bh_disable();
+		__nf_ct_delete_from_lists(ct);
+		nf_ct_add_to_ecache_list(ct);
+		local_bh_enable();
+
 		nf_conntrack_ecache_work(nf_ct_net(ct), NFCT_ECACHE_DESTROY_FAIL);
 		return false;
 	}
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index c9431fa9d51b..b6680e108b40 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -16,7 +16,6 @@
 #include <linux/vmalloc.h>
 #include <linux/stddef.h>
 #include <linux/err.h>
-#include <linux/percpu.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/slab.h>
@@ -30,8 +29,9 @@
 const struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_event_cb);
 
-#define ECACHE_RETRY_WAIT (HZ/10)
-#define ECACHE_STACK_ALLOC (256 / sizeof(void *))
+#define DYING_NULLS_VAL			((1 << 30) + 1)
+#define ECACHE_MAX_JIFFIES		msecs_to_jiffies(10)
+#define ECACHE_RETRY_JIFFIES		msecs_to_jiffies(10)
 
 enum retry_state {
 	STATE_CONGESTED,
@@ -39,58 +39,59 @@ enum retry_state {
 	STATE_DONE,
 };
 
-static enum retry_state ecache_work_evict_list(struct ct_pcpu *pcpu)
+static enum retry_state ecache_work_evict_list(struct nf_conntrack_net *cnet)
 {
-	struct nf_conn *refs[ECACHE_STACK_ALLOC];
+	unsigned long stop = jiffies + ECACHE_MAX_JIFFIES;
+	struct hlist_nulls_head evicted_list;
 	enum retry_state ret = STATE_DONE;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
-	unsigned int evicted = 0;
+	unsigned int sent;
 
-	spin_lock(&pcpu->lock);
+	INIT_HLIST_NULLS_HEAD(&evicted_list, DYING_NULLS_VAL);
 
-	hlist_nulls_for_each_entry(h, n, &pcpu->dying, hnnode) {
+next:
+	sent = 0;
+	spin_lock_bh(&cnet->ecache.dying_lock);
+
+	hlist_nulls_for_each_entry_safe(h, n, &cnet->ecache.dying_list, hnnode) {
 		struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
-		struct nf_conntrack_ecache *e;
-
-		if (!nf_ct_is_confirmed(ct))
-			continue;
-
-		/* This ecache access is safe because the ct is on the
-		 * pcpu dying list and we hold the spinlock -- the entry
-		 * cannot be free'd until after the lock is released.
-		 *
-		 * This is true even if ct has a refcount of 0: the
-		 * cpu that is about to free the entry must remove it
-		 * from the dying list and needs the lock to do so.
-		 */
-		e = nf_ct_ecache_find(ct);
-		if (!e || e->state != NFCT_ECACHE_DESTROY_FAIL)
-			continue;
 
-		/* ct is in NFCT_ECACHE_DESTROY_FAIL state, this means
-		 * the worker owns this entry: the ct will remain valid
-		 * until the worker puts its ct reference.
+		/* The worker owns all entries, ct remains valid until nf_ct_put
+		 * in the loop below.
 		 */
 		if (nf_conntrack_event(IPCT_DESTROY, ct)) {
 			ret = STATE_CONGESTED;
 			break;
 		}
 
-		e->state = NFCT_ECACHE_DESTROY_SENT;
-		refs[evicted] = ct;
+		hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
+		hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode, &evicted_list);
 
-		if (++evicted >= ARRAY_SIZE(refs)) {
+		if (time_after(stop, jiffies)) {
 			ret = STATE_RESTART;
 			break;
 		}
+
+		if (sent++ > 16) {
+			spin_unlock_bh(&cnet->ecache.dying_lock);
+			cond_resched();
+			spin_lock_bh(&cnet->ecache.dying_lock);
+			goto next;
+		}
 	}
 
-	spin_unlock(&pcpu->lock);
+	spin_unlock_bh(&cnet->ecache.dying_lock);
 
-	/* can't _put while holding lock */
-	while (evicted)
-		nf_ct_put(refs[--evicted]);
+	hlist_nulls_for_each_entry_safe(h, n, &evicted_list, hnnode) {
+		struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+		hlist_nulls_add_fake(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
+		hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode);
+		nf_ct_put(ct);
+
+		cond_resched();
+	}
 
 	return ret;
 }
@@ -98,35 +99,20 @@ static enum retry_state ecache_work_evict_list(struct ct_pcpu *pcpu)
 static void ecache_work(struct work_struct *work)
 {
 	struct nf_conntrack_net *cnet = container_of(work, struct nf_conntrack_net, ecache.dwork.work);
-	struct netns_ct *ctnet = cnet->ecache.ct_net;
-	int cpu, delay = -1;
-	struct ct_pcpu *pcpu;
-
-	local_bh_disable();
-
-	for_each_possible_cpu(cpu) {
-		enum retry_state ret;
-
-		pcpu = per_cpu_ptr(ctnet->pcpu_lists, cpu);
-
-		ret = ecache_work_evict_list(pcpu);
-
-		switch (ret) {
-		case STATE_CONGESTED:
-			delay = ECACHE_RETRY_WAIT;
-			goto out;
-		case STATE_RESTART:
-			delay = 0;
-			break;
-		case STATE_DONE:
-			break;
-		}
+	int ret, delay = -1;
+
+	ret = ecache_work_evict_list(cnet);
+	switch (ret) {
+	case STATE_CONGESTED:
+		delay = ECACHE_RETRY_JIFFIES;
+		break;
+	case STATE_RESTART:
+		delay = 0;
+		break;
+	case STATE_DONE:
+		break;
 	}
 
- out:
-	local_bh_enable();
-
-	ctnet->ecache_dwork_pending = delay > 0;
 	if (delay >= 0)
 		schedule_delayed_work(&cnet->ecache.dwork, delay);
 }
@@ -199,7 +185,6 @@ int nf_conntrack_eventmask_report(unsigned int events, struct nf_conn *ct,
 		 */
 		if (e->portid == 0 && portid != 0)
 			e->portid = portid;
-		e->state = NFCT_ECACHE_DESTROY_FAIL;
 	}
 
 	return ret;
@@ -286,8 +271,10 @@ void nf_conntrack_ecache_work(struct net *net, enum nf_ct_ecache_state state)
 		schedule_delayed_work(&cnet->ecache.dwork, HZ);
 		net->ct.ecache_dwork_pending = true;
 	} else if (state == NFCT_ECACHE_DESTROY_SENT) {
-		net->ct.ecache_dwork_pending = false;
-		mod_delayed_work(system_wq, &cnet->ecache.dwork, 0);
+		if (!hlist_nulls_empty(&cnet->ecache.dying_list))
+			mod_delayed_work(system_wq, &cnet->ecache.dwork, 0);
+		else
+			net->ct.ecache_dwork_pending = false;
 	}
 }
 
@@ -300,8 +287,9 @@ void nf_conntrack_ecache_pernet_init(struct net *net)
 
 	net->ct.sysctl_events = nf_ct_events;
 
-	cnet->ecache.ct_net = &net->ct;
 	INIT_DELAYED_WORK(&cnet->ecache.dwork, ecache_work);
+	INIT_HLIST_NULLS_HEAD(&cnet->ecache.dying_list, DYING_NULLS_VAL);
+	spin_lock_init(&cnet->ecache.dying_lock);
 
 	BUILD_BUG_ON(__IPCT_MAX >= 16);	/* e->ctmask is u16 */
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 05/16] netfilter: conntrack: split inner loop of list dumping to own function
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (3 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 04/16] netfilter: ecache: use dedicated list for event redelivery Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 06/16] netfilter: conntrack: include ecache dying list in dumps Florian Westphal
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This allows code re-use in the followup patch.
No functional changes intended.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_netlink.c | 68 ++++++++++++++++++----------
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 4a460565f275..5f3d211a41e3 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1708,6 +1708,47 @@ static int ctnetlink_done_list(struct netlink_callback *cb)
 	return 0;
 }
 
+static int ctnetlink_dump_one_entry(struct sk_buff *skb,
+				    struct netlink_callback *cb,
+				    struct nf_conn *ct,
+				    bool dying)
+{
+	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
+	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
+	u8 l3proto = nfmsg->nfgen_family;
+	int res;
+
+	if (l3proto && nf_ct_l3num(ct) != l3proto)
+		return 0;
+
+	if (ctx->last) {
+		if (ct != ctx->last)
+			return 0;
+
+		ctx->last = NULL;
+	}
+
+	/* We can't dump extension info for the unconfirmed
+	 * list because unconfirmed conntracks can have
+	 * ct->ext reallocated (and thus freed).
+	 *
+	 * In the dying list case ct->ext can't be free'd
+	 * until after we drop pcpu->lock.
+	 */
+	res = ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
+				  cb->nlh->nlmsg_seq,
+				  NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
+				  ct, dying, 0);
+	if (res < 0) {
+		if (!refcount_inc_not_zero(&ct->ct_general.use))
+			return 0;
+
+		ctx->last = ct;
+	}
+
+	return res;
+}
+
 static int
 ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying)
 {
@@ -1715,12 +1756,9 @@ ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying
 	struct nf_conn *ct, *last;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
-	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
-	u_int8_t l3proto = nfmsg->nfgen_family;
-	int res;
-	int cpu;
 	struct hlist_nulls_head *list;
 	struct net *net = sock_net(skb->sk);
+	int res, cpu;
 
 	if (ctx->done)
 		return 0;
@@ -1739,30 +1777,10 @@ ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying
 restart:
 		hlist_nulls_for_each_entry(h, n, list, hnnode) {
 			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (l3proto && nf_ct_l3num(ct) != l3proto)
-				continue;
-			if (ctx->last) {
-				if (ct != last)
-					continue;
-				ctx->last = NULL;
-			}
 
-			/* We can't dump extension info for the unconfirmed
-			 * list because unconfirmed conntracks can have
-			 * ct->ext reallocated (and thus freed).
-			 *
-			 * In the dying list case ct->ext can't be free'd
-			 * until after we drop pcpu->lock.
-			 */
-			res = ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
-						  cb->nlh->nlmsg_seq,
-						  NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
-						  ct, dying, 0);
+			res = ctnetlink_dump_one_entry(skb, cb, ct, dying);
 			if (res < 0) {
-				if (!refcount_inc_not_zero(&ct->ct_general.use))
-					continue;
 				ctx->cpu = cpu;
-				ctx->last = ct;
 				spin_unlock_bh(&pcpu->lock);
 				goto out;
 			}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 06/16] netfilter: conntrack: include ecache dying list in dumps
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (4 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 05/16] netfilter: conntrack: split inner loop of list dumping to own function Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 07/16] netfilter: conntrack: remove the percpu dying list Florian Westphal
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

The new pernet dying list includes conntrack entries that await
delivery of the 'destroy' event via ctnetlink.

The old percpu dying list will be removed soon.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_ecache.h |  2 +
 net/netfilter/nf_conntrack_ecache.c         | 10 +++++
 net/netfilter/nf_conntrack_netlink.c        | 43 +++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h
index 0364c4782ec3..c8dd9cd8cb64 100644
--- a/include/net/netfilter/nf_conntrack_ecache.h
+++ b/include/net/netfilter/nf_conntrack_ecache.h
@@ -148,6 +148,8 @@ void nf_conntrack_ecache_work(struct net *net, enum nf_ct_ecache_state state);
 void nf_conntrack_ecache_pernet_init(struct net *net);
 void nf_conntrack_ecache_pernet_fini(struct net *net);
 
+struct nf_conntrack_net_ecache *nf_conn_pernet_ecache(const struct net *net);
+
 static inline bool nf_conntrack_ecache_dwork_pending(const struct net *net)
 {
 	return net->ct.ecache_dwork_pending;
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index b6680e108b40..d34939797638 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -39,6 +39,16 @@ enum retry_state {
 	STATE_DONE,
 };
 
+struct nf_conntrack_net_ecache *nf_conn_pernet_ecache(const struct net *net)
+{
+	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
+
+	return &cnet->ecache;
+}
+#if IS_MODULE(CONFIG_NF_CT_NETLINK)
+EXPORT_SYMBOL_GPL(nf_conn_pernet_ecache);
+#endif
+
 static enum retry_state ecache_work_evict_list(struct nf_conntrack_net *cnet)
 {
 	unsigned long stop = jiffies + ECACHE_MAX_JIFFIES;
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 5f3d211a41e3..3cd0d9d35b13 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -62,6 +62,7 @@ struct ctnetlink_list_dump_ctx {
 	struct nf_conn *last;
 	unsigned int cpu;
 	bool done;
+	bool retrans_done;
 };
 
 static int ctnetlink_dump_tuples_proto(struct sk_buff *skb,
@@ -1802,6 +1803,48 @@ ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying
 static int
 ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
+	struct nf_conn *last = ctx->last;
+#ifdef CONFIG_NF_CONNTRACK_EVENTS
+	const struct net *net = sock_net(skb->sk);
+	struct nf_conntrack_net_ecache *ecache_net;
+	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_node *n;
+#endif
+
+	if (ctx->retrans_done)
+		return ctnetlink_dump_list(skb, cb, true);
+
+	ctx->last = NULL;
+
+#ifdef CONFIG_NF_CONNTRACK_EVENTS
+	ecache_net = nf_conn_pernet_ecache(net);
+	spin_lock_bh(&ecache_net->dying_lock);
+
+	hlist_nulls_for_each_entry(h, n, &ecache_net->dying_list, hnnode) {
+		struct nf_conn *ct;
+		int res;
+
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		if (last && last != ct)
+			continue;
+
+		res = ctnetlink_dump_one_entry(skb, cb, ct, true);
+		if (res < 0) {
+			spin_unlock_bh(&ecache_net->dying_lock);
+			nf_ct_put(last);
+			return skb->len;
+		}
+
+		nf_ct_put(last);
+		last = NULL;
+	}
+
+	spin_unlock_bh(&ecache_net->dying_lock);
+#endif
+	nf_ct_put(last);
+	ctx->retrans_done = true;
+
 	return ctnetlink_dump_list(skb, cb, true);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 07/16] netfilter: conntrack: remove the percpu dying list
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (5 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 06/16] netfilter: conntrack: include ecache dying list in dumps Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 08/16] netfilter: cttimeout: inc/dec module refcount per object, not per use refcount Florian Westphal
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Its no longer needed. Entries that need event redelivery are placed
on the new pernet dying list.

The advantage is that there is no need to take additional spinlock on
conntrack removal unless event redelivery failed or the conntrack entry
was never added to the table in the first place (confirmed bit not set).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netns/conntrack.h        |  1 -
 net/netfilter/nf_conntrack_core.c    | 35 +++++-----------------------
 net/netfilter/nf_conntrack_ecache.c  |  1 -
 net/netfilter/nf_conntrack_netlink.c | 23 ++++++------------
 4 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 3bb62e938fa9..dd1d096b2ada 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -96,7 +96,6 @@ struct nf_ip_net {
 struct ct_pcpu {
 	spinlock_t		lock;
 	struct hlist_nulls_head unconfirmed;
-	struct hlist_nulls_head dying;
 };
 
 struct netns_ct {
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ca1d1d105163..9010b6e5a072 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -525,21 +525,6 @@ clean_from_lists(struct nf_conn *ct)
 	nf_ct_remove_expectations(ct);
 }
 
-/* must be called with local_bh_disable */
-static void nf_ct_add_to_dying_list(struct nf_conn *ct)
-{
-	struct ct_pcpu *pcpu;
-
-	/* add this conntrack to the (per cpu) dying list */
-	ct->cpu = smp_processor_id();
-	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
-
-	spin_lock(&pcpu->lock);
-	hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-			     &pcpu->dying);
-	spin_unlock(&pcpu->lock);
-}
-
 /* must be called with local_bh_disable */
 static void nf_ct_add_to_unconfirmed_list(struct nf_conn *ct)
 {
@@ -556,11 +541,11 @@ static void nf_ct_add_to_unconfirmed_list(struct nf_conn *ct)
 }
 
 /* must be called with local_bh_disable */
-static void nf_ct_del_from_dying_or_unconfirmed_list(struct nf_conn *ct)
+static void nf_ct_del_from_unconfirmed_list(struct nf_conn *ct)
 {
 	struct ct_pcpu *pcpu;
 
-	/* We overload first tuple to link into unconfirmed or dying list.*/
+	/* We overload first tuple to link into unconfirmed list.*/
 	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
 
 	spin_lock(&pcpu->lock);
@@ -648,7 +633,8 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	 */
 	nf_ct_remove_expectations(ct);
 
-	nf_ct_del_from_dying_or_unconfirmed_list(ct);
+	if (unlikely(!nf_ct_is_confirmed(ct)))
+		nf_ct_del_from_unconfirmed_list(ct);
 
 	local_bh_enable();
 
@@ -686,7 +672,6 @@ static void nf_ct_delete_from_lists(struct nf_conn *ct)
 	local_bh_disable();
 
 	__nf_ct_delete_from_lists(ct);
-	nf_ct_add_to_dying_list(ct);
 
 	local_bh_enable();
 }
@@ -700,8 +685,6 @@ static void nf_ct_add_to_ecache_list(struct nf_conn *ct)
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
 				 &cnet->ecache.dying_list);
 	spin_unlock(&cnet->ecache.dying_lock);
-#else
-	nf_ct_add_to_dying_list(ct);
 #endif
 }
 
@@ -995,7 +978,6 @@ static void __nf_conntrack_insert_prepare(struct nf_conn *ct)
 	struct nf_conn_tstamp *tstamp;
 
 	refcount_inc(&ct->ct_general.use);
-	ct->status |= IPS_CONFIRMED;
 
 	/* set conntrack timestamp, if enabled. */
 	tstamp = nf_conn_tstamp_find(ct);
@@ -1024,7 +1006,6 @@ static int __nf_ct_resolve_clash(struct sk_buff *skb,
 		nf_conntrack_get(&ct->ct_general);
 
 		nf_ct_acct_merge(ct, ctinfo, loser_ct);
-		nf_ct_add_to_dying_list(loser_ct);
 		nf_ct_put(loser_ct);
 		nf_ct_set(skb, ct, ctinfo);
 
@@ -1157,7 +1138,6 @@ nf_ct_resolve_clash(struct sk_buff *skb, struct nf_conntrack_tuple_hash *h,
 		return ret;
 
 drop:
-	nf_ct_add_to_dying_list(loser_ct);
 	NF_CT_STAT_INC(net, drop);
 	NF_CT_STAT_INC(net, insert_failed);
 	return NF_DROP;
@@ -1224,10 +1204,10 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 * user context, else we insert an already 'dead' hash, blocking
 	 * further use of that particular connection -JM.
 	 */
-	nf_ct_del_from_dying_or_unconfirmed_list(ct);
+	nf_ct_del_from_unconfirmed_list(ct);
+	ct->status |= IPS_CONFIRMED;
 
 	if (unlikely(nf_ct_is_dying(ct))) {
-		nf_ct_add_to_dying_list(ct);
 		NF_CT_STAT_INC(net, insert_failed);
 		goto dying;
 	}
@@ -1251,7 +1231,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 			goto out;
 		if (chainlen++ > max_chainlen) {
 chaintoolong:
-			nf_ct_add_to_dying_list(ct);
 			NF_CT_STAT_INC(net, chaintoolong);
 			NF_CT_STAT_INC(net, insert_failed);
 			ret = NF_DROP;
@@ -2800,7 +2779,6 @@ void nf_conntrack_init_end(void)
  * We need to use special "null" values, not used in hash table
  */
 #define UNCONFIRMED_NULLS_VAL	((1<<30)+0)
-#define DYING_NULLS_VAL		((1<<30)+1)
 
 int nf_conntrack_init_net(struct net *net)
 {
@@ -2821,7 +2799,6 @@ int nf_conntrack_init_net(struct net *net)
 
 		spin_lock_init(&pcpu->lock);
 		INIT_HLIST_NULLS_HEAD(&pcpu->unconfirmed, UNCONFIRMED_NULLS_VAL);
-		INIT_HLIST_NULLS_HEAD(&pcpu->dying, DYING_NULLS_VAL);
 	}
 
 	net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d34939797638..f060ea7e267f 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -96,7 +96,6 @@ static enum retry_state ecache_work_evict_list(struct nf_conntrack_net *cnet)
 	hlist_nulls_for_each_entry_safe(h, n, &evicted_list, hnnode) {
 		struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 
-		hlist_nulls_add_fake(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 		hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode);
 		nf_ct_put(ct);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 3cd0d9d35b13..51e387e8da85 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -62,7 +62,6 @@ struct ctnetlink_list_dump_ctx {
 	struct nf_conn *last;
 	unsigned int cpu;
 	bool done;
-	bool retrans_done;
 };
 
 static int ctnetlink_dump_tuples_proto(struct sk_buff *skb,
@@ -1751,13 +1750,12 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb,
 }
 
 static int
-ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying)
+ctnetlink_dump_unconfirmed(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
 	struct nf_conn *ct, *last;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
-	struct hlist_nulls_head *list;
 	struct net *net = sock_net(skb->sk);
 	int res, cpu;
 
@@ -1774,12 +1772,11 @@ ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying
 
 		pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
 		spin_lock_bh(&pcpu->lock);
-		list = dying ? &pcpu->dying : &pcpu->unconfirmed;
 restart:
-		hlist_nulls_for_each_entry(h, n, list, hnnode) {
+		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
 			ct = nf_ct_tuplehash_to_ctrack(h);
 
-			res = ctnetlink_dump_one_entry(skb, cb, ct, dying);
+			res = ctnetlink_dump_one_entry(skb, cb, ct, false);
 			if (res < 0) {
 				ctx->cpu = cpu;
 				spin_unlock_bh(&pcpu->lock);
@@ -1812,8 +1809,8 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 	struct hlist_nulls_node *n;
 #endif
 
-	if (ctx->retrans_done)
-		return ctnetlink_dump_list(skb, cb, true);
+	if (ctx->done)
+		return 0;
 
 	ctx->last = NULL;
 
@@ -1842,10 +1839,10 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 
 	spin_unlock_bh(&ecache_net->dying_lock);
 #endif
+	ctx->done = true;
 	nf_ct_put(last);
-	ctx->retrans_done = true;
 
-	return ctnetlink_dump_list(skb, cb, true);
+	return skb->len;
 }
 
 static int ctnetlink_get_ct_dying(struct sk_buff *skb,
@@ -1863,12 +1860,6 @@ static int ctnetlink_get_ct_dying(struct sk_buff *skb,
 	return -EOPNOTSUPP;
 }
 
-static int
-ctnetlink_dump_unconfirmed(struct sk_buff *skb, struct netlink_callback *cb)
-{
-	return ctnetlink_dump_list(skb, cb, false);
-}
-
 static int ctnetlink_get_ct_unconfirmed(struct sk_buff *skb,
 					const struct nfnl_info *info,
 					const struct nlattr * const cda[])
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 08/16] netfilter: cttimeout: inc/dec module refcount per object, not per use refcount
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (6 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 07/16] netfilter: conntrack: remove the percpu dying list Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout Florian Westphal
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

There is no need to increment the module refcount again, its enough to
obtain one reference per object, i.e. take a reference on object
creation and put it on object destruction.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink_cttimeout.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index b0d8888a539b..eea486f32971 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -158,6 +158,7 @@ static int cttimeout_new_timeout(struct sk_buff *skb,
 	timeout->timeout.l3num = l3num;
 	timeout->timeout.l4proto = l4proto;
 	refcount_set(&timeout->refcnt, 1);
+	__module_get(THIS_MODULE);
 	list_add_tail_rcu(&timeout->head, &pernet->nfct_timeout_list);
 
 	return 0;
@@ -506,13 +507,8 @@ static struct nf_ct_timeout *ctnl_timeout_find_get(struct net *net,
 		if (strncmp(timeout->name, name, CTNL_TIMEOUT_NAME_MAX) != 0)
 			continue;
 
-		if (!try_module_get(THIS_MODULE))
+		if (!refcount_inc_not_zero(&timeout->refcnt))
 			goto err;
-
-		if (!refcount_inc_not_zero(&timeout->refcnt)) {
-			module_put(THIS_MODULE);
-			goto err;
-		}
 		matching = timeout;
 		break;
 	}
@@ -525,10 +521,10 @@ static void ctnl_timeout_put(struct nf_ct_timeout *t)
 	struct ctnl_timeout *timeout =
 		container_of(t, struct ctnl_timeout, timeout);
 
-	if (refcount_dec_and_test(&timeout->refcnt))
+	if (refcount_dec_and_test(&timeout->refcnt)) {
 		kfree_rcu(timeout, rcu_head);
-
-	module_put(THIS_MODULE);
+		module_put(THIS_MODULE);
+	}
 }
 
 static const struct nfnl_callback cttimeout_cb[IPCTNL_MSG_TIMEOUT_MAX] = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (7 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 08/16] netfilter: cttimeout: inc/dec module refcount per object, not per use refcount Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-04-08  9:53   ` Pablo Neira Ayuso
  2022-03-23 13:22 ` [PATCH nf-next v3 10/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

I'd like to be able to switch lifetime management of ctnl_timeout
to free-on-zero-refcount.

This isn't possible at the moment because removal of the structures
from the pernet list requires the nfnl mutex and release may happen from
softirq.

Current solution is to prevent this by disallowing policy object removal
if the refcount is > 1 (i.e., policy is still referenced from the ruleset).

Switch traversal to rcu-read-lock as a first step to reduce reliance on
nfnl mutex protection: removal from softirq would require a extra list
spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink_cttimeout.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index eea486f32971..aef2547bb579 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -253,6 +253,7 @@ static int cttimeout_get_timeout(struct sk_buff *skb,
 				 const struct nlattr * const cda[])
 {
 	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(info->net);
+	struct sk_buff *skb2;
 	int ret = -ENOENT;
 	char *name;
 	struct ctnl_timeout *cur;
@@ -268,31 +269,31 @@ static int cttimeout_get_timeout(struct sk_buff *skb,
 		return -EINVAL;
 	name = nla_data(cda[CTA_TIMEOUT_NAME]);
 
-	list_for_each_entry(cur, &pernet->nfct_timeout_list, head) {
-		struct sk_buff *skb2;
+	skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb2)
+		return -ENOMEM;
+
+	rcu_read_lock();
 
+	list_for_each_entry_rcu(cur, &pernet->nfct_timeout_list, head) {
 		if (strncmp(cur->name, name, CTNL_TIMEOUT_NAME_MAX) != 0)
 			continue;
 
-		skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
-		if (skb2 == NULL) {
-			ret = -ENOMEM;
-			break;
-		}
-
 		ret = ctnl_timeout_fill_info(skb2, NETLINK_CB(skb).portid,
 					     info->nlh->nlmsg_seq,
 					     NFNL_MSG_TYPE(info->nlh->nlmsg_type),
 					     IPCTNL_MSG_TIMEOUT_NEW, cur);
-		if (ret <= 0) {
-			kfree_skb(skb2);
+		if (ret <= 0)
 			break;
-		}
 
-		ret = nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid);
-		break;
+		rcu_read_unlock();
+
+		return nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid);
 	}
 
+	rcu_read_unlock();
+	kfree_skb(skb2);
+
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 10/16] netfilter: cttimeout: decouple unlink and free on netns destruction
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (8 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 11/16] netfilter: remove nf_ct_unconfirmed_destroy helper Florian Westphal
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Make it so netns pre_exit unlinks the objects from the pernet list, so
they cannot be found anymore.

netns core issues a synchronize_rcu() before calling the exit hooks so
any the time the exit hooks run unconfirmed nf_conn entries have been
free'd or they have been committed to the hashtable.

The exit hook still tags unconfirmed entries as dying, this can
now be removed in a followup change.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_timeout.h |  8 ------
 net/netfilter/nfnetlink_cttimeout.c          | 30 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_timeout.h b/include/net/netfilter/nf_conntrack_timeout.h
index 3ea94f6f3844..fea258983d23 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -17,14 +17,6 @@ struct nf_ct_timeout {
 	char			data[];
 };
 
-struct ctnl_timeout {
-	struct list_head	head;
-	struct rcu_head		rcu_head;
-	refcount_t		refcnt;
-	char			name[CTNL_TIMEOUT_NAME_MAX];
-	struct nf_ct_timeout	timeout;
-};
-
 struct nf_conn_timeout {
 	struct nf_ct_timeout __rcu *timeout;
 };
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index aef2547bb579..45a87eaffebe 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -33,8 +33,19 @@
 
 static unsigned int nfct_timeout_id __read_mostly;
 
+struct ctnl_timeout {
+	struct list_head	head;
+	struct rcu_head		rcu_head;
+	refcount_t		refcnt;
+	char			name[CTNL_TIMEOUT_NAME_MAX];
+	struct nf_ct_timeout	timeout;
+
+	struct list_head	free_head;
+};
+
 struct nfct_timeout_pernet {
 	struct list_head	nfct_timeout_list;
+	struct list_head	nfct_timeout_freelist;
 };
 
 MODULE_LICENSE("GPL");
@@ -575,10 +586,24 @@ static int __net_init cttimeout_net_init(struct net *net)
 	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(net);
 
 	INIT_LIST_HEAD(&pernet->nfct_timeout_list);
+	INIT_LIST_HEAD(&pernet->nfct_timeout_freelist);
 
 	return 0;
 }
 
+static void __net_exit cttimeout_net_pre_exit(struct net *net)
+{
+	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(net);
+	struct ctnl_timeout *cur, *tmp;
+
+	list_for_each_entry_safe(cur, tmp, &pernet->nfct_timeout_list, head) {
+		list_del_rcu(&cur->head);
+		list_add(&cur->free_head, &pernet->nfct_timeout_freelist);
+	}
+
+	/* core calls synchronize_rcu() after this */
+}
+
 static void __net_exit cttimeout_net_exit(struct net *net)
 {
 	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(net);
@@ -587,8 +612,8 @@ static void __net_exit cttimeout_net_exit(struct net *net)
 	nf_ct_unconfirmed_destroy(net);
 	nf_ct_untimeout(net, NULL);
 
-	list_for_each_entry_safe(cur, tmp, &pernet->nfct_timeout_list, head) {
-		list_del_rcu(&cur->head);
+	list_for_each_entry_safe(cur, tmp, &pernet->nfct_timeout_freelist, head) {
+		list_del(&cur->free_head);
 
 		if (refcount_dec_and_test(&cur->refcnt))
 			kfree_rcu(cur, rcu_head);
@@ -597,6 +622,7 @@ static void __net_exit cttimeout_net_exit(struct net *net)
 
 static struct pernet_operations cttimeout_ops = {
 	.init	= cttimeout_net_init,
+	.pre_exit = cttimeout_net_pre_exit,
 	.exit	= cttimeout_net_exit,
 	.id     = &nfct_timeout_id,
 	.size   = sizeof(struct nfct_timeout_pernet),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 11/16] netfilter: remove nf_ct_unconfirmed_destroy helper
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (9 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 10/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 12/16] netfilter: extensions: introduce extension genid count Florian Westphal
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This helper tags connetions not yet in the conntrack table as
dying.  These nf_conn entries will be dropped instead when the
core attempts to insert them from the input or postrouting
'confirm' hook.

After the previous change, the entries get unlinked from the
list earlier, so that by the time the actual exit hook runs,
new connections no longer have a timeout policy assigned.

Its enough to walk the hashtable instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack.h |  3 ---
 net/netfilter/nf_conntrack_core.c    | 14 --------------
 net/netfilter/nfnetlink_cttimeout.c  |  4 +++-
 3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 28672a944499..f60212244b13 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -237,9 +237,6 @@ static inline bool nf_ct_kill(struct nf_conn *ct)
 	return nf_ct_delete(ct, 0, 0);
 }
 
-/* Set all unconfirmed conntrack as dying */
-void nf_ct_unconfirmed_destroy(struct net *);
-
 /* Iterate over all conntracks: if iter returns true, it's deleted. */
 void nf_ct_iterate_cleanup_net(struct net *net,
 			       int (*iter)(struct nf_conn *i, void *data),
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9010b6e5a072..b3cc318ceb45 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2431,20 +2431,6 @@ __nf_ct_unconfirmed_destroy(struct net *net)
 	}
 }
 
-void nf_ct_unconfirmed_destroy(struct net *net)
-{
-	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
-
-	might_sleep();
-
-	if (atomic_read(&cnet->count) > 0) {
-		__nf_ct_unconfirmed_destroy(net);
-		nf_queue_nf_hook_drop(net);
-		synchronize_net();
-	}
-}
-EXPORT_SYMBOL_GPL(nf_ct_unconfirmed_destroy);
-
 void nf_ct_iterate_cleanup_net(struct net *net,
 			       int (*iter)(struct nf_conn *i, void *data),
 			       void *data, u32 portid, int report)
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 45a87eaffebe..49fe2ec40af9 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -609,7 +609,9 @@ static void __net_exit cttimeout_net_exit(struct net *net)
 	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(net);
 	struct ctnl_timeout *cur, *tmp;
 
-	nf_ct_unconfirmed_destroy(net);
+	if (list_empty(&pernet->nfct_timeout_freelist))
+		return;
+
 	nf_ct_untimeout(net, NULL);
 
 	list_for_each_entry_safe(cur, tmp, &pernet->nfct_timeout_freelist, head) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 12/16] netfilter: extensions: introduce extension genid count
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (10 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 11/16] netfilter: remove nf_ct_unconfirmed_destroy helper Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 13/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Multiple netfilter extensions store pointers to external data
in their extension area struct.

Examples:
1. Timeout policies
2. Connection tracking helpers.

No references are taken for these.

When a helper or timeout policy is removed, the conntrack table gets
traversed and affected extensions are cleared.

Conntrack entries not yet in the hashtable are referenced via a special
list, the unconfirmed list.

On removal of a policy or connection tracking helper, the unconfirmed
list gets traversed an all entries are marked as dying, this prevents
them from getting committed to the table at insertion time: core checks
for dying bit, if set, the conntrack entry gets destroyed at confirm
time.

The disadvantage is that each new conntrack has to be added to the percpu
unconfirmed list, and each insertion needs to remove it from this list.
The list is only ever needed when a policy or helper is removed -- a rare
occurrence.

Add a generation ID count: Instead of adding to the list and then
traversing that list on policy/helper removal, increment a counter
that is stored in the extension area.

For unconfirmed conntracks, the extension has the genid valid at ct
allocation time.

Removal of a helper/policy etc. increments the counter.
At confirmation time, validate that ext->genid == global_id.

If the stored number is not the same, do not allow the conntrack
insertion, just like as if a confirmed-list traversal would have flagged
the entry as dying.

After insertion, the genid is no longer relevant (conntrack entries
are now reachable via the conntrack table iterators and is set to 0.

This allows removal of the percpu unconfirmed list.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_extend.h | 31 ++++++------
 include/net/netfilter/nf_conntrack_labels.h | 10 +++-
 net/netfilter/nf_conntrack_core.c           | 55 +++++++++++++++++++++
 net/netfilter/nf_conntrack_extend.c         | 32 +++++++++++-
 4 files changed, 111 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
index 96635ad2acc7..0b247248b032 100644
--- a/include/net/netfilter/nf_conntrack_extend.h
+++ b/include/net/netfilter/nf_conntrack_extend.h
@@ -34,21 +34,11 @@ enum nf_ct_ext_id {
 	NF_CT_EXT_NUM,
 };
 
-#define NF_CT_EXT_HELPER_TYPE struct nf_conn_help
-#define NF_CT_EXT_NAT_TYPE struct nf_conn_nat
-#define NF_CT_EXT_SEQADJ_TYPE struct nf_conn_seqadj
-#define NF_CT_EXT_ACCT_TYPE struct nf_conn_acct
-#define NF_CT_EXT_ECACHE_TYPE struct nf_conntrack_ecache
-#define NF_CT_EXT_TSTAMP_TYPE struct nf_conn_tstamp
-#define NF_CT_EXT_TIMEOUT_TYPE struct nf_conn_timeout
-#define NF_CT_EXT_LABELS_TYPE struct nf_conn_labels
-#define NF_CT_EXT_SYNPROXY_TYPE struct nf_conn_synproxy
-#define NF_CT_EXT_ACT_CT_TYPE struct nf_conn_act_ct_ext
-
 /* Extensions: optional stuff which isn't permanently in struct. */
 struct nf_ct_ext {
 	u8 offset[NF_CT_EXT_NUM];
 	u8 len;
+	unsigned int gen_id;
 	char data[] __aligned(8);
 };
 
@@ -62,17 +52,28 @@ static inline bool nf_ct_ext_exist(const struct nf_conn *ct, u8 id)
 	return (ct->ext && __nf_ct_ext_exist(ct->ext, id));
 }
 
-static inline void *__nf_ct_ext_find(const struct nf_conn *ct, u8 id)
+void *__nf_ct_ext_find(const struct nf_ct_ext *ext, u8 id);
+
+static inline void *nf_ct_ext_find(const struct nf_conn *ct, u8 id)
 {
-	if (!nf_ct_ext_exist(ct, id))
+	struct nf_ct_ext *ext = ct->ext;
+
+	if (!ext || !__nf_ct_ext_exist(ext, id))
 		return NULL;
 
+	if (unlikely(ext->gen_id))
+		return __nf_ct_ext_find(ext, id);
+
 	return (void *)ct->ext + ct->ext->offset[id];
 }
-#define nf_ct_ext_find(ext, id)	\
-	((id##_TYPE *)__nf_ct_ext_find((ext), (id)))
 
 /* Add this type, returns pointer to data or NULL. */
 void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp);
 
+/* ext genid.  if ext->id != ext_genid, extensions cannot be used
+ * anymore unless conntrack has CONFIRMED bit set.
+ */
+extern atomic_t nf_conntrack_ext_genid;
+void nf_ct_ext_bump_genid(void);
+
 #endif /* _NF_CONNTRACK_EXTEND_H */
diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h
index 3c23298e68ca..ff6afacefd6d 100644
--- a/include/net/netfilter/nf_conntrack_labels.h
+++ b/include/net/netfilter/nf_conntrack_labels.h
@@ -17,10 +17,18 @@ struct nf_conn_labels {
 	unsigned long bits[NF_CT_LABELS_MAX_SIZE / sizeof(long)];
 };
 
+/* Can't use nf_ct_ext_find(), flow dissector cann't use symbols
+ * exported by nf_conntrack module.
+ */
 static inline struct nf_conn_labels *nf_ct_labels_find(const struct nf_conn *ct)
 {
 #ifdef CONFIG_NF_CONNTRACK_LABELS
-	return nf_ct_ext_find(ct, NF_CT_EXT_LABELS);
+	struct nf_ct_ext *ext = ct->ext;
+
+	if (!ext || !__nf_ct_ext_exist(ext, NF_CT_EXT_LABELS))
+		return NULL;
+
+	return (void *)ct->ext + ct->ext->offset[NF_CT_EXT_LABELS];
 #else
 	return NULL;
 #endif
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index b3cc318ceb45..76310940cbd7 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -876,6 +876,33 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
 			   &nf_conntrack_hash[reply_hash]);
 }
 
+static bool nf_ct_ext_valid_pre(const struct nf_ct_ext *ext)
+{
+	/* if ext->gen_id is not equal to nf_conntrack_ext_genid, some extensions
+	 * may contain stale pointers to e.g. helper that has been removed.
+	 *
+	 * The helper can't clear this because the nf_conn object isn't in
+	 * any hash and synchronize_rcu() isn't enough because associated skb
+	 * might sit in a queue.
+	 */
+	return !ext || ext->gen_id == atomic_read(&nf_conntrack_ext_genid);
+}
+
+static bool nf_ct_ext_valid_post(struct nf_ct_ext *ext)
+{
+	if (!ext)
+		return true;
+
+	if (ext->gen_id != atomic_read(&nf_conntrack_ext_genid))
+		return false;
+
+	/* inserted into conntrack table, nf_ct_iterate_cleanup()
+	 * will find it.  Disable nf_ct_ext_find() id check.
+	 */
+	WRITE_ONCE(ext->gen_id, 0);
+	return true;
+}
+
 int
 nf_conntrack_hash_check_insert(struct nf_conn *ct)
 {
@@ -891,6 +918,11 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 
 	zone = nf_ct_zone(ct);
 
+	if (!nf_ct_ext_valid_pre(ct->ext)) {
+		NF_CT_STAT_INC(net, insert_failed);
+		return -ETIMEDOUT;
+	}
+
 	local_bh_disable();
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
@@ -931,6 +963,13 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert);
 	local_bh_enable();
+
+	if (!nf_ct_ext_valid_post(ct->ext)) {
+		nf_ct_kill(ct);
+		NF_CT_STAT_INC(net, drop);
+		return -ETIMEDOUT;
+	}
+
 	return 0;
 chaintoolong:
 	NF_CT_STAT_INC(net, chaintoolong);
@@ -1198,6 +1237,11 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		return NF_DROP;
 	}
 
+	if (!nf_ct_ext_valid_pre(ct->ext)) {
+		NF_CT_STAT_INC(net, insert_failed);
+		goto dying;
+	}
+
 	pr_debug("Confirming conntrack %p\n", ct);
 	/* We have to check the DYING flag after unlink to prevent
 	 * a race against nf_ct_get_next_corpse() possibly called from
@@ -1254,6 +1298,16 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	nf_conntrack_double_unlock(hash, reply_hash);
 	local_bh_enable();
 
+	/* ext area is still valid (rcu read lock is held,
+	 * but will go out of scope soon, we need to remove
+	 * this conntrack again.
+	 */
+	if (!nf_ct_ext_valid_post(ct->ext)) {
+		nf_ct_kill(ct);
+		NF_CT_STAT_INC(net, drop);
+		return NF_DROP;
+	}
+
 	help = nfct_help(ct);
 	if (help && help->helper)
 		nf_conntrack_event_cache(IPCT_HELPER, ct);
@@ -2491,6 +2545,7 @@ nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
 	 */
 	synchronize_net();
 
+	nf_ct_ext_bump_genid();
 	nf_ct_iterate_cleanup(iter, data, 0, 0);
 }
 EXPORT_SYMBOL_GPL(nf_ct_iterate_destroy);
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index 1296fda54ac6..0b513f7bf9f3 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -27,6 +27,8 @@
 
 #define NF_CT_EXT_PREALLOC	128u /* conntrack events are on by default */
 
+atomic_t nf_conntrack_ext_genid __read_mostly = ATOMIC_INIT(1);
+
 static const u8 nf_ct_ext_type_len[NF_CT_EXT_NUM] = {
 	[NF_CT_EXT_HELPER] = sizeof(struct nf_conn_help),
 #if IS_ENABLED(CONFIG_NF_NAT)
@@ -116,8 +118,10 @@ void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp)
 	if (!new)
 		return NULL;
 
-	if (!ct->ext)
+	if (!ct->ext) {
 		memset(new->offset, 0, sizeof(new->offset));
+		new->gen_id = atomic_read(&nf_conntrack_ext_genid);
+	}
 
 	new->offset[id] = newoff;
 	new->len = newlen;
@@ -127,3 +131,29 @@ void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp)
 	return (void *)new + newoff;
 }
 EXPORT_SYMBOL(nf_ct_ext_add);
+
+/* Use nf_ct_ext_find wrapper. This is only useful for unconfirmed entries. */
+void *__nf_ct_ext_find(const struct nf_ct_ext *ext, u8 id)
+{
+	unsigned int gen_id = atomic_read(&nf_conntrack_ext_genid);
+	unsigned int this_id = READ_ONCE(ext->gen_id);
+
+	if (!__nf_ct_ext_exist(ext, id))
+		return NULL;
+
+	if (this_id == 0 || ext->gen_id == gen_id)
+		return (void *)ext + ext->offset[id];
+
+	return NULL;
+}
+EXPORT_SYMBOL(__nf_ct_ext_find);
+
+void nf_ct_ext_bump_genid(void)
+{
+	unsigned int value = atomic_inc_return(&nf_conntrack_ext_genid);
+
+	if (value == UINT_MAX)
+		atomic_set(&nf_conntrack_ext_genid, 1);
+
+	msleep(HZ);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 13/16] netfilter: cttimeout: decouple unlink and free on netns destruction
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (11 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 12/16] netfilter: extensions: introduce extension genid count Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 14/16] netfilter: conntrack: remove __nf_ct_unconfirmed_destroy Florian Westphal
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Increment the extid on module removal; this makes sure that even
in extreme cases any old uncofirmed entry that happened to be kept
e.g. on nfnetlink_queue list will not trip over a stale timeout
reference.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink_cttimeout.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 49fe2ec40af9..5d49fbeabd86 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -657,12 +657,24 @@ static int __init cttimeout_init(void)
 	return ret;
 }
 
+static int untimeout(struct nf_conn *ct, void *timeout)
+{
+	struct nf_conn_timeout *timeout_ext = nf_ct_timeout_find(ct);
+
+	if (timeout_ext)
+		RCU_INIT_POINTER(timeout_ext->timeout, NULL);
+
+	return 0;
+}
+
 static void __exit cttimeout_exit(void)
 {
 	nfnetlink_subsys_unregister(&cttimeout_subsys);
 
 	unregister_pernet_subsys(&cttimeout_ops);
 	RCU_INIT_POINTER(nf_ct_timeout_hook, NULL);
+
+	nf_ct_iterate_destroy(untimeout, NULL);
 	synchronize_rcu();
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 14/16] netfilter: conntrack: remove __nf_ct_unconfirmed_destroy
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (12 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 13/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 15/16] netfilter: conntrack: remove unconfirmed list Florian Westphal
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Its not needed anymore:

A. If entry is totally new, then the rcu-protected resource
must already have been removed from global visibility before call
to nf_ct_iterate_destroy.

B. If entry was allocated before, but is not yet in the hash table
   (uncofirmed case), genid gets incremented and synchronize_rcu() call
   makes sure access has completed.

C. Next attempt to peek at extension area will fail for unconfirmed
  conntracks, because ext->genid != genid.

D. Conntracks in the hash are iterated as before.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c   | 46 ++++++++---------------------
 net/netfilter/nf_conntrack_helper.c |  5 ----
 net/netfilter/nfnetlink_cttimeout.c |  1 -
 3 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 76310940cbd7..7b4b3f5db959 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2457,34 +2457,6 @@ static int iter_net_only(struct nf_conn *i, void *data)
 	return d->iter(i, d->data);
 }
 
-static void
-__nf_ct_unconfirmed_destroy(struct net *net)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct nf_conntrack_tuple_hash *h;
-		struct hlist_nulls_node *n;
-		struct ct_pcpu *pcpu;
-
-		pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
-
-		spin_lock_bh(&pcpu->lock);
-		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
-			struct nf_conn *ct;
-
-			ct = nf_ct_tuplehash_to_ctrack(h);
-
-			/* we cannot call iter() on unconfirmed list, the
-			 * owning cpu can reallocate ct->ext at any time.
-			 */
-			set_bit(IPS_DYING_BIT, &ct->status);
-		}
-		spin_unlock_bh(&pcpu->lock);
-		cond_resched();
-	}
-}
-
 void nf_ct_iterate_cleanup_net(struct net *net,
 			       int (*iter)(struct nf_conn *i, void *data),
 			       void *data, u32 portid, int report)
@@ -2527,26 +2499,34 @@ nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
 
 		if (atomic_read(&cnet->count) == 0)
 			continue;
-		__nf_ct_unconfirmed_destroy(net);
 		nf_queue_nf_hook_drop(net);
 	}
 	up_read(&net_rwsem);
 
 	/* Need to wait for netns cleanup worker to finish, if its
 	 * running -- it might have deleted a net namespace from
-	 * the global list, so our __nf_ct_unconfirmed_destroy() might
-	 * not have affected all namespaces.
+	 * the global list, so hook drop above might not have
+	 * affected all namespaces.
 	 */
 	net_ns_barrier();
 
-	/* a conntrack could have been unlinked from unconfirmed list
-	 * before we grabbed pcpu lock in __nf_ct_unconfirmed_destroy().
+	/* a skb w. unconfirmed conntrack could have been reinjected just
+	 * before we called nf_queue_nf_hook_drop().
+	 *
 	 * This makes sure its inserted into conntrack table.
 	 */
 	synchronize_net();
 
 	nf_ct_ext_bump_genid();
 	nf_ct_iterate_cleanup(iter, data, 0, 0);
+
+	/* Another cpu might be in a rcu read section with
+	 * rcu protected pointer cleared in iter callback
+	 * or hidden via nf_ct_ext_bump_genid() above.
+	 *
+	 * Wait until those are done.
+	 */
+	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(nf_ct_iterate_destroy);
 
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 8dec42ec603e..c12a87ebc3ee 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -468,11 +468,6 @@ void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me)
 
 	nf_ct_expect_iterate_destroy(expect_iter_me, NULL);
 	nf_ct_iterate_destroy(unhelp, me);
-
-	/* Maybe someone has gotten the helper already when unhelp above.
-	 * So need to wait it.
-	 */
-	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_unregister);
 
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 5d49fbeabd86..17fe12bfb6b0 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -675,7 +675,6 @@ static void __exit cttimeout_exit(void)
 	RCU_INIT_POINTER(nf_ct_timeout_hook, NULL);
 
 	nf_ct_iterate_destroy(untimeout, NULL);
-	synchronize_rcu();
 }
 
 module_init(cttimeout_init);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 15/16] netfilter: conntrack: remove unconfirmed list
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (13 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 14/16] netfilter: conntrack: remove __nf_ct_unconfirmed_destroy Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-03-23 13:22 ` [PATCH nf-next v3 16/16] netfilter: conntrack: avoid unconditional local_bh_disable Florian Westphal
  2022-04-08  9:56 ` [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Pablo Neira Ayuso
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

It has no function anymore and can be removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack.h |  1 -
 include/net/netns/conntrack.h        |  6 ----
 net/netfilter/nf_conntrack_core.c    | 54 ++--------------------------
 net/netfilter/nf_conntrack_netlink.c | 44 +----------------------
 4 files changed, 3 insertions(+), 102 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index f60212244b13..3ce9a5b42fe5 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -101,7 +101,6 @@ struct nf_conn {
 	/* Have we seen traffic both ways yet? (bitset) */
 	unsigned long status;
 
-	u16		cpu;
 	possible_net_t ct_net;
 
 #if IS_ENABLED(CONFIG_NF_NAT)
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index dd1d096b2ada..22d8d1909a3b 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -93,11 +93,6 @@ struct nf_ip_net {
 #endif
 };
 
-struct ct_pcpu {
-	spinlock_t		lock;
-	struct hlist_nulls_head unconfirmed;
-};
-
 struct netns_ct {
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
 	bool ecache_dwork_pending;
@@ -109,7 +104,6 @@ struct netns_ct {
 	u8			sysctl_tstamp;
 	u8			sysctl_checksum;
 
-	struct ct_pcpu __percpu *pcpu_lists;
 	struct ip_conntrack_stat __percpu *stat;
 	struct nf_ip_net	nf_ct_proto;
 #if defined(CONFIG_NF_CONNTRACK_LABELS)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7b4b3f5db959..c8d58d6d5b87 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -525,35 +525,6 @@ clean_from_lists(struct nf_conn *ct)
 	nf_ct_remove_expectations(ct);
 }
 
-/* must be called with local_bh_disable */
-static void nf_ct_add_to_unconfirmed_list(struct nf_conn *ct)
-{
-	struct ct_pcpu *pcpu;
-
-	/* add this conntrack to the (per cpu) unconfirmed list */
-	ct->cpu = smp_processor_id();
-	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
-
-	spin_lock(&pcpu->lock);
-	hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-			     &pcpu->unconfirmed);
-	spin_unlock(&pcpu->lock);
-}
-
-/* must be called with local_bh_disable */
-static void nf_ct_del_from_unconfirmed_list(struct nf_conn *ct)
-{
-	struct ct_pcpu *pcpu;
-
-	/* We overload first tuple to link into unconfirmed list.*/
-	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
-
-	spin_lock(&pcpu->lock);
-	BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
-	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
-	spin_unlock(&pcpu->lock);
-}
-
 #define NFCT_ALIGN(len)	(((len) + NFCT_INFOMASK) & ~NFCT_INFOMASK)
 
 /* Released via nf_ct_destroy() */
@@ -633,9 +604,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	 */
 	nf_ct_remove_expectations(ct);
 
-	if (unlikely(!nf_ct_is_confirmed(ct)))
-		nf_ct_del_from_unconfirmed_list(ct);
-
 	local_bh_enable();
 
 	if (ct->master)
@@ -1248,7 +1216,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 * user context, else we insert an already 'dead' hash, blocking
 	 * further use of that particular connection -JM.
 	 */
-	nf_ct_del_from_unconfirmed_list(ct);
 	ct->status |= IPS_CONFIRMED;
 
 	if (unlikely(nf_ct_is_dying(ct))) {
@@ -1803,9 +1770,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	if (!exp)
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 
-	/* Now it is inserted into the unconfirmed list, set refcount to 1. */
+	/* Now it is going to be associated with an sk_buff, set refcount to 1. */
 	refcount_set(&ct->ct_general.use, 1);
-	nf_ct_add_to_unconfirmed_list(ct);
 
 	local_bh_enable();
 
@@ -2594,7 +2560,6 @@ void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list)
 		nf_conntrack_ecache_pernet_fini(net);
 		nf_conntrack_expect_pernet_fini(net);
 		free_percpu(net->ct.stat);
-		free_percpu(net->ct.pcpu_lists);
 	}
 }
 
@@ -2805,26 +2770,14 @@ int nf_conntrack_init_net(struct net *net)
 {
 	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
 	int ret = -ENOMEM;
-	int cpu;
 
 	BUILD_BUG_ON(IP_CT_UNTRACKED == IP_CT_NUMBER);
 	BUILD_BUG_ON_NOT_POWER_OF_2(CONNTRACK_LOCKS);
 	atomic_set(&cnet->count, 0);
 
-	net->ct.pcpu_lists = alloc_percpu(struct ct_pcpu);
-	if (!net->ct.pcpu_lists)
-		goto err_stat;
-
-	for_each_possible_cpu(cpu) {
-		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
-
-		spin_lock_init(&pcpu->lock);
-		INIT_HLIST_NULLS_HEAD(&pcpu->unconfirmed, UNCONFIRMED_NULLS_VAL);
-	}
-
 	net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
 	if (!net->ct.stat)
-		goto err_pcpu_lists;
+		return ret;
 
 	ret = nf_conntrack_expect_pernet_init(net);
 	if (ret < 0)
@@ -2840,8 +2793,5 @@ int nf_conntrack_init_net(struct net *net)
 
 err_expect:
 	free_percpu(net->ct.stat);
-err_pcpu_lists:
-	free_percpu(net->ct.pcpu_lists);
-err_stat:
 	return ret;
 }
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 51e387e8da85..fef48ee86d6d 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1752,49 +1752,7 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb,
 static int
 ctnetlink_dump_unconfirmed(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
-	struct nf_conn *ct, *last;
-	struct nf_conntrack_tuple_hash *h;
-	struct hlist_nulls_node *n;
-	struct net *net = sock_net(skb->sk);
-	int res, cpu;
-
-	if (ctx->done)
-		return 0;
-
-	last = ctx->last;
-
-	for (cpu = ctx->cpu; cpu < nr_cpu_ids; cpu++) {
-		struct ct_pcpu *pcpu;
-
-		if (!cpu_possible(cpu))
-			continue;
-
-		pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
-		spin_lock_bh(&pcpu->lock);
-restart:
-		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
-			ct = nf_ct_tuplehash_to_ctrack(h);
-
-			res = ctnetlink_dump_one_entry(skb, cb, ct, false);
-			if (res < 0) {
-				ctx->cpu = cpu;
-				spin_unlock_bh(&pcpu->lock);
-				goto out;
-			}
-		}
-		if (ctx->last) {
-			ctx->last = NULL;
-			goto restart;
-		}
-		spin_unlock_bh(&pcpu->lock);
-	}
-	ctx->done = true;
-out:
-	if (last)
-		nf_ct_put(last);
-
-	return skb->len;
+	return 0;
 }
 
 static int
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH nf-next v3 16/16] netfilter: conntrack: avoid unconditional local_bh_disable
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (14 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 15/16] netfilter: conntrack: remove unconfirmed list Florian Westphal
@ 2022-03-23 13:22 ` Florian Westphal
  2022-04-08  9:56 ` [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Pablo Neira Ayuso
  16 siblings, 0 replies; 23+ messages in thread
From: Florian Westphal @ 2022-03-23 13:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Now that the conntrack entry isn't placed on the pcpu list anymore the
bh only needs to be disabled in the 'expectation present' case.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c8d58d6d5b87..6e59a35a29b9 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1739,10 +1739,9 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 				 ecache ? ecache->expmask : 0,
 			     GFP_ATOMIC);
 
-	local_bh_disable();
 	cnet = nf_ct_pernet(net);
 	if (cnet->expect_count) {
-		spin_lock(&nf_conntrack_expect_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		exp = nf_ct_find_expectation(net, zone, tuple);
 		if (exp) {
 			pr_debug("expectation arrives ct=%p exp=%p\n",
@@ -1765,7 +1764,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 #endif
 			NF_CT_STAT_INC(net, expect_new);
 		}
-		spin_unlock(&nf_conntrack_expect_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 	}
 	if (!exp)
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
@@ -1773,8 +1772,6 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	/* Now it is going to be associated with an sk_buff, set refcount to 1. */
 	refcount_set(&ct->ct_general.use, 1);
 
-	local_bh_enable();
-
 	if (exp) {
 		if (exp->expectfn)
 			exp->expectfn(ct, exp);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout
  2022-03-23 13:22 ` [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout Florian Westphal
@ 2022-04-08  9:53   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08  9:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, Mar 23, 2022 at 02:22:07PM +0100, Florian Westphal wrote:
> I'd like to be able to switch lifetime management of ctnl_timeout
> to free-on-zero-refcount.
> 
> This isn't possible at the moment because removal of the structures
> from the pernet list requires the nfnl mutex and release may happen from
> softirq.
> 
> Current solution is to prevent this by disallowing policy object removal
> if the refcount is > 1 (i.e., policy is still referenced from the ruleset).
> 
> Switch traversal to rcu-read-lock as a first step to reduce reliance on
> nfnl mutex protection: removal from softirq would require a extra list
> spinlock.

Needs .type = NFNL_CB_RCU?

> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nfnetlink_cttimeout.c | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
> index eea486f32971..aef2547bb579 100644
> --- a/net/netfilter/nfnetlink_cttimeout.c
> +++ b/net/netfilter/nfnetlink_cttimeout.c
> @@ -253,6 +253,7 @@ static int cttimeout_get_timeout(struct sk_buff *skb,
>  				 const struct nlattr * const cda[])
>  {
>  	struct nfct_timeout_pernet *pernet = nfct_timeout_pernet(info->net);
> +	struct sk_buff *skb2;
>  	int ret = -ENOENT;
>  	char *name;
>  	struct ctnl_timeout *cur;
> @@ -268,31 +269,31 @@ static int cttimeout_get_timeout(struct sk_buff *skb,
>  		return -EINVAL;
>  	name = nla_data(cda[CTA_TIMEOUT_NAME]);
>  
> -	list_for_each_entry(cur, &pernet->nfct_timeout_list, head) {
> -		struct sk_buff *skb2;
> +	skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!skb2)
> +		return -ENOMEM;
> +
> +	rcu_read_lock();
>  
> +	list_for_each_entry_rcu(cur, &pernet->nfct_timeout_list, head) {
>  		if (strncmp(cur->name, name, CTNL_TIMEOUT_NAME_MAX) != 0)
>  			continue;
>  
> -		skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> -		if (skb2 == NULL) {
> -			ret = -ENOMEM;
> -			break;
> -		}
> -
>  		ret = ctnl_timeout_fill_info(skb2, NETLINK_CB(skb).portid,
>  					     info->nlh->nlmsg_seq,
>  					     NFNL_MSG_TYPE(info->nlh->nlmsg_type),
>  					     IPCTNL_MSG_TIMEOUT_NEW, cur);
> -		if (ret <= 0) {
> -			kfree_skb(skb2);
> +		if (ret <= 0)
>  			break;
> -		}
>  
> -		ret = nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid);
> -		break;
> +		rcu_read_unlock();
> +
> +		return nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid);
>  	}
>  
> +	rcu_read_unlock();
> +	kfree_skb(skb2);
> +
>  	return ret;
>  }
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
  2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
                   ` (15 preceding siblings ...)
  2022-03-23 13:22 ` [PATCH nf-next v3 16/16] netfilter: conntrack: avoid unconditional local_bh_disable Florian Westphal
@ 2022-04-08  9:56 ` Pablo Neira Ayuso
  2022-04-08  9:59   ` Pablo Neira Ayuso
  16 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08  9:56 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, Mar 23, 2022 at 02:21:58PM +0100, Florian Westphal wrote:
> This series removes the unconfirmed and dying percpu lists.
> 
> Dying list is replaced by pernet list, only used when reliable event
> delivery mode was requested.
> 
> Unconfirmed list is replaced by a generation id for the conntrack
> extesions, to detect when pointers to external objects (timeout policy,
> helper, ...) has gone stale.
> 
> An alternative to the genid would be to always take references on
> such external objects, let me know if that is the preferred solution.

Applied 1, 2, 3, 5, 6 and 8.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
  2022-04-08  9:56 ` [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Pablo Neira Ayuso
@ 2022-04-08  9:59   ` Pablo Neira Ayuso
  2022-04-08 10:05     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08  9:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 08, 2022 at 11:56:09AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Mar 23, 2022 at 02:21:58PM +0100, Florian Westphal wrote:
> > This series removes the unconfirmed and dying percpu lists.
> > 
> > Dying list is replaced by pernet list, only used when reliable event
> > delivery mode was requested.
> > 
> > Unconfirmed list is replaced by a generation id for the conntrack
> > extesions, to detect when pointers to external objects (timeout policy,
> > helper, ...) has gone stale.
> > 
> > An alternative to the genid would be to always take references on
> > such external objects, let me know if that is the preferred solution.
> 
> Applied 1, 2, 3, 5, 6 and 8.

Not 6 actually, since it depends on 4.

So I'm taking the preparation patches of this batch.

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
  2022-04-08  9:59   ` Pablo Neira Ayuso
@ 2022-04-08 10:05     ` Pablo Neira Ayuso
  2022-04-08 10:09       ` Pablo Neira Ayuso
  2022-04-08 10:11       ` Pablo Neira Ayuso
  0 siblings, 2 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08 10:05 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 08, 2022 at 11:59:59AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 08, 2022 at 11:56:09AM +0200, Pablo Neira Ayuso wrote:
> > On Wed, Mar 23, 2022 at 02:21:58PM +0100, Florian Westphal wrote:
> > > This series removes the unconfirmed and dying percpu lists.
> > > 
> > > Dying list is replaced by pernet list, only used when reliable event
> > > delivery mode was requested.
> > > 
> > > Unconfirmed list is replaced by a generation id for the conntrack
> > > extesions, to detect when pointers to external objects (timeout policy,
> > > helper, ...) has gone stale.
> > > 
> > > An alternative to the genid would be to always take references on
> > > such external objects, let me know if that is the preferred solution.
> > 
> > Applied 1, 2, 3, 5, 6 and 8.
> 
> Not 6 actually, since it depends on 4.
> 
> So I'm taking the preparation patches of this batch.

Wait. Can we possibly set a dummy event handler instead?

void nf_conntrack_unregister_notifier(void)
{
        rcu_assign_pointer(nf_conntrack_event_cb, nfct_event_null_handler);
}

which does nothing?

It also needs to be set on initially to this null event handler?

So we can avoid the stash trick in nfnetlink too?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
  2022-04-08 10:05     ` Pablo Neira Ayuso
@ 2022-04-08 10:09       ` Pablo Neira Ayuso
  2022-04-08 10:11       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08 10:09 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 08, 2022 at 12:05:38PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 08, 2022 at 11:59:59AM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Apr 08, 2022 at 11:56:09AM +0200, Pablo Neira Ayuso wrote:
> > > On Wed, Mar 23, 2022 at 02:21:58PM +0100, Florian Westphal wrote:
> > > > This series removes the unconfirmed and dying percpu lists.
> > > > 
> > > > Dying list is replaced by pernet list, only used when reliable event
> > > > delivery mode was requested.
> > > > 
> > > > Unconfirmed list is replaced by a generation id for the conntrack
> > > > extesions, to detect when pointers to external objects (timeout policy,
> > > > helper, ...) has gone stale.
> > > > 
> > > > An alternative to the genid would be to always take references on
> > > > such external objects, let me know if that is the preferred solution.
> > > 
> > > Applied 1, 2, 3, 5, 6 and 8.
> > 
> > Not 6 actually, since it depends on 4.
> > 
> > So I'm taking the preparation patches of this batch.
> 
> Wait. Can we possibly set a dummy event handler instead?
> 
> void nf_conntrack_unregister_notifier(void)
> {
>         rcu_assign_pointer(nf_conntrack_event_cb, nfct_event_null_handler);
> }
> 
> which does nothing?
> 
> It also needs to be set on initially to this null event handler?
> 
> So we can avoid the stash trick in nfnetlink too?

So I'm taking 3, 5 and 8 at this stage.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists
  2022-04-08 10:05     ` Pablo Neira Ayuso
  2022-04-08 10:09       ` Pablo Neira Ayuso
@ 2022-04-08 10:11       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-08 10:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 08, 2022 at 12:05:38PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 08, 2022 at 11:59:59AM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Apr 08, 2022 at 11:56:09AM +0200, Pablo Neira Ayuso wrote:
> > > On Wed, Mar 23, 2022 at 02:21:58PM +0100, Florian Westphal wrote:
> > > > This series removes the unconfirmed and dying percpu lists.
> > > > 
> > > > Dying list is replaced by pernet list, only used when reliable event
> > > > delivery mode was requested.
> > > > 
> > > > Unconfirmed list is replaced by a generation id for the conntrack
> > > > extesions, to detect when pointers to external objects (timeout policy,
> > > > helper, ...) has gone stale.
> > > > 
> > > > An alternative to the genid would be to always take references on
> > > > such external objects, let me know if that is the preferred solution.
> > > 
> > > Applied 1, 2, 3, 5, 6 and 8.
> > 
> > Not 6 actually, since it depends on 4.
> > 
> > So I'm taking the preparation patches of this batch.
> 
> Wait. Can we possibly set a dummy event handler instead?
> 
> void nf_conntrack_unregister_notifier(void)
> {
>         rcu_assign_pointer(nf_conntrack_event_cb, nfct_event_null_handler);
> }
> 
> which does nothing?
> 
> It also needs to be set on initially to this null event handler?
> 
> So we can avoid the stash trick in nfnetlink too?

Forget this idea, we can't, this event handler is again global.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-04-08 10:12 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23 13:21 [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Florian Westphal
2022-03-23 13:21 ` [PATCH nf-next v3 01/16] nfnetlink: handle already-released nl socket Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 02/16] netfilter: ctnetlink: make ecache event cb global again Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 03/16] netfilter: ecache: move to separate structure Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 04/16] netfilter: ecache: use dedicated list for event redelivery Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 05/16] netfilter: conntrack: split inner loop of list dumping to own function Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 06/16] netfilter: conntrack: include ecache dying list in dumps Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 07/16] netfilter: conntrack: remove the percpu dying list Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 08/16] netfilter: cttimeout: inc/dec module refcount per object, not per use refcount Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 09/16] netfilter: nfnetlink_cttimeout: use rcu protection in cttimeout_get_timeout Florian Westphal
2022-04-08  9:53   ` Pablo Neira Ayuso
2022-03-23 13:22 ` [PATCH nf-next v3 10/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 11/16] netfilter: remove nf_ct_unconfirmed_destroy helper Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 12/16] netfilter: extensions: introduce extension genid count Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 13/16] netfilter: cttimeout: decouple unlink and free on netns destruction Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 14/16] netfilter: conntrack: remove __nf_ct_unconfirmed_destroy Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 15/16] netfilter: conntrack: remove unconfirmed list Florian Westphal
2022-03-23 13:22 ` [PATCH nf-next v3 16/16] netfilter: conntrack: avoid unconditional local_bh_disable Florian Westphal
2022-04-08  9:56 ` [PATCH nf-next v3 00/16] netfilter: conntrack: remove percpu lists Pablo Neira Ayuso
2022-04-08  9:59   ` Pablo Neira Ayuso
2022-04-08 10:05     ` Pablo Neira Ayuso
2022-04-08 10:09       ` Pablo Neira Ayuso
2022-04-08 10:11       ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.