All of lore.kernel.org
 help / color / mirror / Atom feed
* rtnetlink: fix dump+module unload races, take 2
@ 2017-11-06 10:51 Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 1/8] rtnetlink: Revert "rtnetlink: add reference counting to prevent module unload while dump is in progress" Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz

Peter Zijlstra reported:
----------
  I just ran across commit:
  019a316992ee ("rtnetlink: add reference counting to prevent module unload while dump is in progress")
  And that commit is _completely_ broken.

  1) it not in fact a refcount, so using refcount_t is silly
  2) there is a distinct lack of memory barriers, so we can easily
     observe the decrement while the msg_handler is still in progress.
  3) waiting with a schedule()/yield() loop is complete crap and subject
     life-locks, imagine doing that rtnl_unregister_all() from a RT task.
----------

Moreover, the commit doesn't work.

Problem is that netlink keeps the dump function address in the control
block to resume dumps, i.e. when netlink_dump_start() returns then
we may not be done with the dump, it can be restarted on next recv.

netlink already has a mechanism to prevent module unload while
dumps are in progress: netlink_dump_control struct contains a pointer
to struct module, and netlink will take care of module_get/put as needed
if that is set.

To make use of this however we need to store pointer to module that
registered the particular dump function.  This series does that.

First, revert the broken commit to start from scratch.
Second patch adds new rtnl_register_module() that takes pointer to
the owning module.

Rest of patches convert affected modules.
IPv6 is not converted as ipv6 module cannot be unloaded.

One alternative to this series would be to decrement
rtnl_msg_handlers_ref count via netlink_dump_control->done(), but that
doesn't address any of Peters points.

I am submitting this for -next for 2 reasons:
1. This problem has existed for so long its evidently not a big deal
2. I think its a bad idea to cram this in at last second in current
development cycle.

ps: I will be travelling, replies will be delayed.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/8] rtnetlink: Revert "rtnetlink: add reference counting to prevent module unload while dump is in progress"
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 2/8] rtnetlink: add rtnl_register_module Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

This reverts commit 019a316992ee0d9832b1c480c899d6bdf2a0a77e.

Peter Zijlstra reported:
  1) it not in fact a refcount, so using refcount_t is silly
  2) there is a distinct lack of memory barriers, so we can easily
     observe the decrement while the msg_handler is still in progress.
  3) waiting with a schedule()/yield() loop is complete crap and subject
     life-locks, imagine doing that rtnl_unregister_all() from a RT task.
----------

Moreover, the commit doesn't even work.
Problem is that netlink keeps the dump function address in the control
block to resume dumps, i.e. when netlink_dump_start() returns then
we may not be done with the dump, it can be restarted on next recv.

First, revert the broken commit to start from scratch.
Next patch will allow to store a pointer the owning module so we can
let netlink infrastructure grab a refcount to module while dump function
address is in use.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/core/rtnetlink.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index dc5ad84ac096..c70f62137dd8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -128,7 +128,6 @@ EXPORT_SYMBOL(lockdep_rtnl_is_held);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 static struct rtnl_link __rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
-static refcount_t rtnl_msg_handlers_ref[RTNL_FAMILY_MAX + 1];
 
 static inline int rtm_msgindex(int msgtype)
 {
@@ -262,8 +261,6 @@ void rtnl_unregister_all(int protocol)
 
 	synchronize_net();
 
-	while (refcount_read(&rtnl_msg_handlers_ref[protocol]) > 1)
-		schedule();
 	kfree(handlers);
 }
 EXPORT_SYMBOL_GPL(rtnl_unregister_all);
@@ -4365,8 +4362,6 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 				goto err_unlock;
 		}
 
-		refcount_inc(&rtnl_msg_handlers_ref[family]);
-
 		if (type == RTM_GETLINK - RTM_BASE)
 			min_dump_alloc = rtnl_calcit(skb, nlh);
 
@@ -4380,7 +4375,6 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			};
 			err = netlink_dump_start(rtnl, skb, nlh, &c);
 		}
-		refcount_dec(&rtnl_msg_handlers_ref[family]);
 		return err;
 	}
 
@@ -4392,12 +4386,10 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	flags = READ_ONCE(handlers[type].flags);
 	if (flags & RTNL_FLAG_DOIT_UNLOCKED) {
-		refcount_inc(&rtnl_msg_handlers_ref[family]);
 		doit = READ_ONCE(handlers[type].doit);
 		rcu_read_unlock();
 		if (doit)
 			err = doit(skb, nlh, extack);
-		refcount_dec(&rtnl_msg_handlers_ref[family]);
 		return err;
 	}
 
@@ -4498,11 +4490,6 @@ static struct pernet_operations rtnetlink_net_ops = {
 
 void __init rtnetlink_init(void)
 {
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(rtnl_msg_handlers_ref); i++)
-		refcount_set(&rtnl_msg_handlers_ref[i], 1);
-
 	if (register_pernet_subsys(&rtnetlink_net_ops))
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 1/8] rtnetlink: Revert "rtnetlink: add reference counting to prevent module unload while dump is in progress" Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 12:44   ` Peter Zijlstra
  2017-11-06 10:51 ` [PATCH net-next 3/8] qtr: use rtnl_register_module Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Add yet another rtnl_register function.  It will be used by modules
that can be removed.

The passed module struct is used to take a reference while a netlink
dump is in progress to prevent module unload while netlink core can
invoke registered dumper function again.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/rtnetlink.h |   2 +
 net/core/rtnetlink.c    | 103 +++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 83 insertions(+), 22 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index ead018744ff5..e326b3f9eb5f 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -17,6 +17,8 @@ int __rtnl_register(int protocol, int msgtype,
 		    rtnl_doit_func, rtnl_dumpit_func, unsigned int flags);
 void rtnl_register(int protocol, int msgtype,
 		   rtnl_doit_func, rtnl_dumpit_func, unsigned int flags);
+int rtnl_register_module(struct module *owner, int protocol, int msgtype,
+			 rtnl_doit_func, rtnl_dumpit_func, unsigned int flags);
 int rtnl_unregister(int protocol, int msgtype);
 void rtnl_unregister_all(int protocol);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c70f62137dd8..50b483b24845 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -62,6 +62,7 @@
 struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
+	struct module		*owner;
 	unsigned int		flags;
 };
 
@@ -143,27 +144,10 @@ static inline int rtm_msgindex(int msgtype)
 	return msgindex;
 }
 
-/**
- * __rtnl_register - Register a rtnetlink message type
- * @protocol: Protocol family or PF_UNSPEC
- * @msgtype: rtnetlink message type
- * @doit: Function pointer called for each request message
- * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
- * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions
- *
- * Registers the specified function pointers (at least one of them has
- * to be non-NULL) to be called whenever a request message for the
- * specified protocol family and message type is received.
- *
- * The special protocol family PF_UNSPEC may be used to define fallback
- * function pointers for the case when no entry for the specific protocol
- * family exists.
- *
- * Returns 0 on success or a negative error code.
- */
-int __rtnl_register(int protocol, int msgtype,
-		    rtnl_doit_func doit, rtnl_dumpit_func dumpit,
-		    unsigned int flags)
+static int rtnl_register_internal(struct module *owner,
+				  int protocol, int msgtype,
+				  rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+				  unsigned int flags)
 {
 	struct rtnl_link *tab;
 	int msgindex;
@@ -180,6 +164,12 @@ int __rtnl_register(int protocol, int msgtype,
 		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
 	}
 
+	WARN_ON(tab[msgindex].owner && tab[msgindex].owner != owner);
+
+	tab[msgindex].owner = owner;
+	/* make sure owner is always visible first */
+	smp_wmb();
+
 	if (doit)
 		tab[msgindex].doit = doit;
 	if (dumpit)
@@ -188,6 +178,60 @@ int __rtnl_register(int protocol, int msgtype,
 
 	return 0;
 }
+
+/**
+ * rtnl_register_module - Register a rtnetlink message type
+ *
+ * @owner: module registering the hook (THIS_MODULE)
+ * @protocol: Protocol family or PF_UNSPEC
+ * @msgtype: rtnetlink message type
+ * @doit: Function pointer called for each request message
+ * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
+ * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions
+ *
+ * Like rtnl_register, but for use by removable modules.
+ */
+int rtnl_register_module(struct module *owner,
+			 int protocol, int msgtype,
+			 rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+			 unsigned int flags)
+{
+	int ret;
+
+	rtnl_lock();
+	ret = rtnl_register_internal(owner, protocol, msgtype,
+				     doit, dumpit, flags);
+	rtnl_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rtnl_register_module);
+
+/**
+ * __rtnl_register - Register a rtnetlink message type
+ * @protocol: Protocol family or PF_UNSPEC
+ * @msgtype: rtnetlink message type
+ * @doit: Function pointer called for each request message
+ * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
+ * @flags: rtnl_link_flags to modifiy behaviour of doit/dumpit functions
+ *
+ * Registers the specified function pointers (at least one of them has
+ * to be non-NULL) to be called whenever a request message for the
+ * specified protocol family and message type is received.
+ *
+ * The special protocol family PF_UNSPEC may be used to define fallback
+ * function pointers for the case when no entry for the specific protocol
+ * family exists.
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int __rtnl_register(int protocol, int msgtype,
+		    rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+		    unsigned int flags)
+{
+	return rtnl_register_internal(NULL, protocol, msgtype,
+				      doit, dumpit, flags);
+}
 EXPORT_SYMBOL_GPL(__rtnl_register);
 
 /**
@@ -235,6 +279,9 @@ int rtnl_unregister(int protocol, int msgtype)
 	handlers[msgindex].doit = NULL;
 	handlers[msgindex].dumpit = NULL;
 	handlers[msgindex].flags = 0;
+	/* make sure we clear owner last */
+	smp_wmb();
+	handlers[msgindex].owner = NULL;
 	rtnl_unlock();
 
 	return 0;
@@ -4346,6 +4393,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
+		struct module *owner = NULL;
 		struct sock *rtnl;
 		rtnl_dumpit_func dumpit;
 		u16 min_dump_alloc = 0;
@@ -4360,20 +4408,31 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			dumpit = READ_ONCE(handlers[type].dumpit);
 			if (!dumpit)
 				goto err_unlock;
+			owner = READ_ONCE(handlers[type].owner);
 		}
 
 		if (type == RTM_GETLINK - RTM_BASE)
 			min_dump_alloc = rtnl_calcit(skb, nlh);
 
+		err = 0;
+		/* need to do this before rcu_read_unlock() */
+		if (!try_module_get(owner))
+			err = -EPROTONOSUPPORT;
+
 		rcu_read_unlock();
 
 		rtnl = net->rtnl;
-		{
+		if (err == 0) {
 			struct netlink_dump_control c = {
 				.dump		= dumpit,
 				.min_dump_alloc	= min_dump_alloc,
+				.module		= owner,
 			};
 			err = netlink_dump_start(rtnl, skb, nlh, &c);
+			/* netlink_dump_start() will keep a reference on
+			 * module if dump is still in progress.
+			 */
+			module_put(owner);
 		}
 		return err;
 	}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/8] qtr: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 1/8] rtnetlink: Revert "rtnetlink: add reference counting to prevent module unload while dump is in progress" Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 2/8] rtnetlink: add rtnl_register_module Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 4/8] bridge: " Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/qrtr/qrtr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index e458ece96d3d..5098625469e0 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1116,9 +1116,13 @@ static int __init qrtr_proto_init(void)
 		return rc;
 	}
 
-	rtnl_register(PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, 0);
+	rc = rtnl_register_module(THIS_MODULE, PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, 0);
+	if (rc) {
+		sock_unregister(qrtr_family.family);
+		proto_unregister(&qrtr_proto);
+	}
 
-	return 0;
+	return rc;
 }
 module_init(qrtr_proto_init);
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 4/8] bridge: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
                   ` (2 preceding siblings ...)
  2017-11-06 10:51 ` [PATCH net-next 3/8] qtr: use rtnl_register_module Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 5/8] can: " Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_mdb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 31ddff22563e..f56eb480abb2 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -714,9 +714,9 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 void br_mdb_init(void)
 {
-	rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0);
-	rtnl_register(PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0);
-	rtnl_register(PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0);
+	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0);
 }
 
 void br_mdb_uninit(void)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 5/8] can: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
                   ` (3 preceding siblings ...)
  2017-11-06 10:51 ` [PATCH net-next 4/8] bridge: " Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 6/8] decnet: " Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/can/gw.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/can/gw.c b/net/can/gw.c
index 73a02af4b5d7..398dd0395ad9 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -1014,6 +1014,8 @@ static struct pernet_operations cangw_pernet_ops = {
 
 static __init int cgw_module_init(void)
 {
+	int ret;
+
 	/* sanitize given module parameter */
 	max_hops = clamp_t(unsigned int, max_hops, CGW_MIN_HOPS, CGW_MAX_HOPS);
 
@@ -1031,15 +1033,19 @@ static __init int cgw_module_init(void)
 	notifier.notifier_call = cgw_notifier;
 	register_netdevice_notifier(&notifier);
 
-	if (__rtnl_register(PF_CAN, RTM_GETROUTE, NULL, cgw_dump_jobs, 0)) {
+	ret = rtnl_register_module(THIS_MODULE, PF_CAN, RTM_GETROUTE,
+				   NULL, cgw_dump_jobs, 0);
+	if (ret) {
 		unregister_netdevice_notifier(&notifier);
 		kmem_cache_destroy(cgw_cache);
 		return -ENOBUFS;
 	}
 
-	/* Only the first call to __rtnl_register can fail */
-	__rtnl_register(PF_CAN, RTM_NEWROUTE, cgw_create_job, NULL, 0);
-	__rtnl_register(PF_CAN, RTM_DELROUTE, cgw_remove_job, NULL, 0);
+	/* Only the first call to rtnl_register_module can fail */
+	rtnl_register_module(THIS_MODULE, PF_CAN, RTM_NEWROUTE,
+			     cgw_create_job, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_CAN, RTM_DELROUTE,
+			     cgw_remove_job, NULL, 0);
 
 	return 0;
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 6/8] decnet: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
                   ` (4 preceding siblings ...)
  2017-11-06 10:51 ` [PATCH net-next 5/8] can: " Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 7/8] phonet: " Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 8/8] mpls: " Florian Westphal
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/decnet/dn_dev.c   | 9 ++++++---
 net/decnet/dn_fib.c   | 6 ++++--
 net/decnet/dn_route.c | 8 ++++----
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 9153247dad28..d1885cf59319 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -1418,9 +1418,12 @@ void __init dn_dev_init(void)
 
 	dn_dev_devices_on();
 
-	rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL, 0);
-	rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL, 0);
-	rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_NEWADDR,
+			     dn_nl_newaddr, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_DELADDR,
+			     dn_nl_deladdr, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETADDR,
+			     NULL, dn_nl_dump_ifaddr, 0);
 
 	proc_create("decnet_dev", S_IRUGO, init_net.proc_net, &dn_dev_seq_fops);
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index b37a1b833c77..fce94cbd4378 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -792,8 +792,10 @@ void __init dn_fib_init(void)
 
 	register_dnaddr_notifier(&dn_fib_dnaddr_notifier);
 
-	rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL, 0);
-	rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_NEWROUTE,
+			     dn_fib_rtm_newroute, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_DELROUTE,
+			     dn_fib_rtm_delroute, NULL, 0);
 }
 
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index bff5ab88cdbb..27bf6e5dc8d9 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1921,11 +1921,11 @@ void __init dn_route_init(void)
 		    &dn_rt_cache_seq_fops);
 
 #ifdef CONFIG_DECNET_ROUTER
-	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
-		      dn_fib_dump, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETROUTE,
+			     dn_cache_getroute, dn_fib_dump, 0);
 #else
-	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
-		      dn_cache_dump, 0);
+	rtnl_register_module(THIS_MODULE, PF_DECnet, RTM_GETROUTE,
+			     dn_cache_getroute, dn_cache_dump, 0);
 #endif
 }
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 7/8] phonet: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
                   ` (5 preceding siblings ...)
  2017-11-06 10:51 ` [PATCH net-next 6/8] decnet: " Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  2017-11-06 10:51 ` [PATCH net-next 8/8] mpls: " Florian Westphal
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/phonet/pn_netlink.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index da754fc926e7..871eaf2cb85e 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -299,16 +299,21 @@ static int route_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 
 int __init phonet_netlink_register(void)
 {
-	int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit,
-				  NULL, 0);
+	int err = rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_NEWADDR,
+				       addr_doit, NULL, 0);
 	if (err)
 		return err;
 
-	/* Further __rtnl_register() cannot fail */
-	__rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL, 0);
-	__rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit, 0);
-	__rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL, 0);
-	__rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL, 0);
-	__rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit, 0);
+	/* Further rtnl_register_module() cannot fail */
+	rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_DELADDR,
+			     addr_doit, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETADDR,
+			     NULL, getaddr_dumpit, 0);
+	rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_NEWROUTE,
+			     route_doit, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_DELROUTE,
+			     route_doit, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_PHONET, RTM_GETROUTE,
+			     NULL, route_dumpit, 0);
 	return 0;
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 8/8] mpls: use rtnl_register_module
  2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
                   ` (6 preceding siblings ...)
  2017-11-06 10:51 ` [PATCH net-next 7/8] phonet: " Florian Westphal
@ 2017-11-06 10:51 ` Florian Westphal
  7 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-06 10:51 UTC (permalink / raw)
  To: netdev; +Cc: peterz, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mpls/af_mpls.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8ca9915befc8..5dce8336d33f 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2510,12 +2510,15 @@ static int __init mpls_init(void)
 
 	rtnl_af_register(&mpls_af_ops);
 
-	rtnl_register(PF_MPLS, RTM_NEWROUTE, mpls_rtm_newroute, NULL, 0);
-	rtnl_register(PF_MPLS, RTM_DELROUTE, mpls_rtm_delroute, NULL, 0);
-	rtnl_register(PF_MPLS, RTM_GETROUTE, mpls_getroute, mpls_dump_routes,
-		      0);
-	rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
-		      mpls_netconf_dump_devconf, 0);
+	rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_NEWROUTE,
+			     mpls_rtm_newroute, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_DELROUTE,
+			     mpls_rtm_delroute, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETROUTE,
+			     mpls_getroute, mpls_dump_routes, 0);
+	rtnl_register_module(THIS_MODULE, PF_MPLS, RTM_GETNETCONF,
+			     mpls_netconf_get_devconf,
+			     mpls_netconf_dump_devconf, 0);
 	err = ipgre_tunnel_encap_add_mpls_ops();
 	if (err)
 		pr_err("Can't add mpls over gre tunnel ops\n");
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-06 10:51 ` [PATCH net-next 2/8] rtnetlink: add rtnl_register_module Florian Westphal
@ 2017-11-06 12:44   ` Peter Zijlstra
  2017-11-07  6:11     ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-06 12:44 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Mon, Nov 06, 2017 at 11:51:07AM +0100, Florian Westphal wrote:
> @@ -180,6 +164,12 @@ int __rtnl_register(int protocol, int msgtype,
>  		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
>  	}
>  
> +	WARN_ON(tab[msgindex].owner && tab[msgindex].owner != owner);
> +
> +	tab[msgindex].owner = owner;
> +	/* make sure owner is always visible first */
> +	smp_wmb();
> +
>  	if (doit)
>  		tab[msgindex].doit = doit;
>  	if (dumpit)

> @@ -235,6 +279,9 @@ int rtnl_unregister(int protocol, int msgtype)
>  	handlers[msgindex].doit = NULL;
>  	handlers[msgindex].dumpit = NULL;
>  	handlers[msgindex].flags = 0;
> +	/* make sure we clear owner last */
> +	smp_wmb();
> +	handlers[msgindex].owner = NULL;
>  	rtnl_unlock();
>  
>  	return 0;

These wmb()'s don't make sense; and the comments are incomplete. What do
they pair with? Who cares about this ordering?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-06 12:44   ` Peter Zijlstra
@ 2017-11-07  6:11     ` Florian Westphal
  2017-11-07  9:10       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2017-11-07  6:11 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Nov 06, 2017 at 11:51:07AM +0100, Florian Westphal wrote:
> > @@ -180,6 +164,12 @@ int __rtnl_register(int protocol, int msgtype,
> >  		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
> >  	}
> >  
> > +	WARN_ON(tab[msgindex].owner && tab[msgindex].owner != owner);
> > +
> > +	tab[msgindex].owner = owner;
> > +	/* make sure owner is always visible first */
> > +	smp_wmb();
> > +
> >  	if (doit)
> >  		tab[msgindex].doit = doit;
> >  	if (dumpit)
> 
> > @@ -235,6 +279,9 @@ int rtnl_unregister(int protocol, int msgtype)
> >  	handlers[msgindex].doit = NULL;
> >  	handlers[msgindex].dumpit = NULL;
> >  	handlers[msgindex].flags = 0;
> > +	/* make sure we clear owner last */
> > +	smp_wmb();
> > +	handlers[msgindex].owner = NULL;
> >  	rtnl_unlock();
> >  
> >  	return 0;
> 
> These wmb()'s don't make sense; and the comments are incomplete. What do
> they pair with? Who cares about this ordering?

rtnetlink_rcv_msg:

4406                         dumpit = READ_ONCE(handlers[type].dumpit);
4407                         if (!dumpit)
4408                                 goto err_unlock;
4409                         owner = READ_ONCE(handlers[type].owner);
4410                 }
..
4417                 if (!try_module_get(owner))
4418                         err = -EPROTONOSUPPORT;
4419 

I don't want dumpit function address to be visible before owner.
Does that make sense?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07  6:11     ` Florian Westphal
@ 2017-11-07  9:10       ` Peter Zijlstra
  2017-11-07  9:24         ` Peter Zijlstra
  2017-11-07  9:43         ` Florian Westphal
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-07  9:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Tue, Nov 07, 2017 at 07:11:56AM +0100, Florian Westphal wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Nov 06, 2017 at 11:51:07AM +0100, Florian Westphal wrote:
> > > @@ -180,6 +164,12 @@ int __rtnl_register(int protocol, int msgtype,
> > >  		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
> > >  	}
> > >  
> > > +	WARN_ON(tab[msgindex].owner && tab[msgindex].owner != owner);
> > > +
> > > +	tab[msgindex].owner = owner;
> > > +	/* make sure owner is always visible first */
> > > +	smp_wmb();
> > > +
> > >  	if (doit)
> > >  		tab[msgindex].doit = doit;
> > >  	if (dumpit)
> > 
> > > @@ -235,6 +279,9 @@ int rtnl_unregister(int protocol, int msgtype)
> > >  	handlers[msgindex].doit = NULL;
> > >  	handlers[msgindex].dumpit = NULL;
> > >  	handlers[msgindex].flags = 0;
> > > +	/* make sure we clear owner last */
> > > +	smp_wmb();
> > > +	handlers[msgindex].owner = NULL;
> > >  	rtnl_unlock();
> > >  
> > >  	return 0;
> > 
> > These wmb()'s don't make sense; and the comments are incomplete. What do
> > they pair with? Who cares about this ordering?
> 
> rtnetlink_rcv_msg:
> 
> 4406                         dumpit = READ_ONCE(handlers[type].dumpit);
> 4407                         if (!dumpit)
> 4408                                 goto err_unlock;
> 4409                         owner = READ_ONCE(handlers[type].owner);

So what stops the CPU from hoisting this load before the dumpit load?

> 4410                 }
> ..
> 4417                 if (!try_module_get(owner))
> 4418                         err = -EPROTONOSUPPORT;
> 4419 
> 
> I don't want dumpit function address to be visible before owner.
> Does that make sense?

And no. That's insane, how can it ever observe an incomplete tab in the
first place.

The problem is that __rtnl_register() and rtnl_unregister are broken.

__rtnl_register() publishes the tab before it initializes it; allowing
people to observe the thing incomplete.

Also, are we required to hold rtnl_lock() across __rtnl_register()? I'd
hope so, otherwise what stops concurrent allocations and leaking of tab?

Also, rtnl_register() doesn't seen to employ rtnl_lock() and panic()
WTF?!

rtnl_unregister() should then RCU free the tab.

None of that is happening, so what is that RCU stuff supposed to do?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07  9:10       ` Peter Zijlstra
@ 2017-11-07  9:24         ` Peter Zijlstra
  2017-11-07  9:47           ` Florian Westphal
  2017-11-07  9:43         ` Florian Westphal
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-07  9:24 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Tue, Nov 07, 2017 at 10:10:04AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 07, 2017 at 07:11:56AM +0100, Florian Westphal wrote:
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Mon, Nov 06, 2017 at 11:51:07AM +0100, Florian Westphal wrote:
> > > > @@ -180,6 +164,12 @@ int __rtnl_register(int protocol, int msgtype,
> > > >  		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
> > > >  	}
> > > >  
> > > > +	WARN_ON(tab[msgindex].owner && tab[msgindex].owner != owner);
> > > > +
> > > > +	tab[msgindex].owner = owner;
> > > > +	/* make sure owner is always visible first */
> > > > +	smp_wmb();
> > > > +
> > > >  	if (doit)
> > > >  		tab[msgindex].doit = doit;
> > > >  	if (dumpit)
> > > 
> > > > @@ -235,6 +279,9 @@ int rtnl_unregister(int protocol, int msgtype)
> > > >  	handlers[msgindex].doit = NULL;
> > > >  	handlers[msgindex].dumpit = NULL;
> > > >  	handlers[msgindex].flags = 0;
> > > > +	/* make sure we clear owner last */
> > > > +	smp_wmb();
> > > > +	handlers[msgindex].owner = NULL;
> > > >  	rtnl_unlock();
> > > >  
> > > >  	return 0;
> > > 
> > > These wmb()'s don't make sense; and the comments are incomplete. What do
> > > they pair with? Who cares about this ordering?
> > 
> > rtnetlink_rcv_msg:
> > 
> > 4406                         dumpit = READ_ONCE(handlers[type].dumpit);
> > 4407                         if (!dumpit)
> > 4408                                 goto err_unlock;
> > 4409                         owner = READ_ONCE(handlers[type].owner);
> 
> So what stops the CPU from hoisting this load before the dumpit load?
> 
> > 4410                 }
> > ..
> > 4417                 if (!try_module_get(owner))
> > 4418                         err = -EPROTONOSUPPORT;
> > 4419 
> > 
> > I don't want dumpit function address to be visible before owner.
> > Does that make sense?
> 
> And no. That's insane, how can it ever observe an incomplete tab in the
> first place.
> 
> The problem is that __rtnl_register() and rtnl_unregister are broken.
> 
> __rtnl_register() publishes the tab before it initializes it; allowing
> people to observe the thing incomplete.
> 
> Also, are we required to hold rtnl_lock() across __rtnl_register()? I'd
> hope so, otherwise what stops concurrent allocations and leaking of tab?
> 
> Also, rtnl_register() doesn't seen to employ rtnl_lock() and panic()
> WTF?!
> 
> rtnl_unregister() should then RCU free the tab.
> 
> None of that is happening, so what is that RCU stuff supposed to do?

Something like the below would go some way toward sanitizing this stuff;
rcu_assign_pointer() is a store-release, meaning it happens after
everything coming before.

Therefore, when you observe that tab (through rcu_dereference) you're
guaranteed to see the thing initialized. The memory ordering on the
consume side is through an address dependency; we need to have completed
the load of the tab pointer before we can compute the address of its
members and load from there, these are not things a CPU is allowed to
reorder (lets forget about Alpha).

Quite possibly, if rtnl_unregister() is called from module unload, this
is broken; in that case we'd need something like:

	rcu_assign_pointer(rtnl_msg_handler[protocol], NULL);
	/*
	 * Ensure nobody can still observe our old protocol handler
	 * before continuing to free the module that includes the
	 * functions called from it.
	 */
	synchronize_rcu();

---

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ace48926b19..25391c7b9c5d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -63,6 +63,7 @@ struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
 	unsigned int		flags;
+	struct rcu_head		rcu;
 };
 
 static DEFINE_MUTEX(rtnl_mutex);
@@ -172,14 +173,15 @@ int __rtnl_register(int protocol, int msgtype,
 	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 	msgindex = rtm_msgindex(msgtype);
 
-	tab = rcu_dereference_raw(rtnl_msg_handlers[protocol]);
-	if (tab == NULL) {
-		tab = kcalloc(RTM_NR_MSGTYPES, sizeof(*tab), GFP_KERNEL);
-		if (tab == NULL)
-			return -ENOBUFS;
+	if (WARN_ONCE(rtnl_msg_handler[protocol],
+		      "Double registration for protocol: %d\n", protcol))
+		return -EEXIST;
 
-		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
-	}
+	lockdep_assert_held(&rtnl_mutex);
+
+	tab = kcalloc(RTM_NR_MSGTYPES, sizeof(*tab), GFP_KERNEL);
+	if (tab == NULL)
+		return -ENOBUFS;
 
 	if (doit)
 		tab[msgindex].doit = doit;
@@ -187,6 +189,8 @@ int __rtnl_register(int protocol, int msgtype,
 		tab[msgindex].dumpit = dumpit;
 	tab[msgindex].flags |= flags;
 
+	rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__rtnl_register);
@@ -227,15 +231,13 @@ int rtnl_unregister(int protocol, int msgtype)
 	msgindex = rtm_msgindex(msgtype);
 
 	rtnl_lock();
-	handlers = rtnl_dereference(rtnl_msg_handlers[protocol]);
+	handlers = rtnl_msg_handlers[protocol];
 	if (!handlers) {
 		rtnl_unlock();
 		return -ENOENT;
 	}
-
-	handlers[msgindex].doit = NULL;
-	handlers[msgindex].dumpit = NULL;
-	handlers[msgindex].flags = 0;
+	rcu_assign_pointer(rtnl_msg_handler[protocol], NULL);
+	kfree_rcu(handlers, rcu);
 	rtnl_unlock();
 
 	return 0;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07  9:10       ` Peter Zijlstra
  2017-11-07  9:24         ` Peter Zijlstra
@ 2017-11-07  9:43         ` Florian Westphal
  1 sibling, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-07  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

Peter Zijlstra <peterz@infradead.org> wrote:
> > rtnetlink_rcv_msg:
> > 
> > 4406                         dumpit = READ_ONCE(handlers[type].dumpit);
> > 4407                         if (!dumpit)
> > 4408                                 goto err_unlock;
> > 4409                         owner = READ_ONCE(handlers[type].owner);
> 
> So what stops the CPU from hoisting this load before the dumpit load?

I was under impression READ_ONCE also includes rmb but I see i was
wrong.

> > I don't want dumpit function address to be visible before owner.
> > Does that make sense?
> 
> And no. That's insane, how can it ever observe an incomplete tab in the
> first place.
> 
> The problem is that __rtnl_register() and rtnl_unregister are broken.
> 
> __rtnl_register() publishes the tab before it initializes it; allowing
> people to observe the thing incomplete.
>
> Also, are we required to hold rtnl_lock() across __rtnl_register()? I'd
> hope so, otherwise what stops concurrent allocations and leaking of tab?

I don't think these ever acquired rtnl mutex.
Hostorically the rtnl callbacks were statically allocated and only ran
from initcalls.

Use of of kmalloc came later, and then use in modules.

> rtnl_unregister() should then RCU free the tab.

I do not think that will work since that will make it behave like
rtnl_unregister_all(), i.e. removes all callbacks of the family.

> None of that is happening, so what is that RCU stuff supposed to do?

Its supposed to delay rmmod until all places that are still executing a
registered callback are done.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07  9:24         ` Peter Zijlstra
@ 2017-11-07  9:47           ` Florian Westphal
  2017-11-07 14:57             ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2017-11-07  9:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

Peter Zijlstra <peterz@infradead.org> wrote:
> Something like the below would go some way toward sanitizing this stuff;
> rcu_assign_pointer() is a store-release, meaning it happens after
> everything coming before.
> 
> Therefore, when you observe that tab (through rcu_dereference) you're
> guaranteed to see the thing initialized. The memory ordering on the
> consume side is through an address dependency; we need to have completed
> the load of the tab pointer before we can compute the address of its
> members and load from there, these are not things a CPU is allowed to
> reorder (lets forget about Alpha).
> 
> Quite possibly, if rtnl_unregister() is called from module unload, this
> is broken; in that case we'd need something like:
> 
> 	rcu_assign_pointer(rtnl_msg_handler[protocol], NULL);
> 	/*
> 	 * Ensure nobody can still observe our old protocol handler
> 	 * before continuing to free the module that includes the
> 	 * functions called from it.
> 	 */
> 	synchronize_rcu();
> 
> ---
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 5ace48926b19..25391c7b9c5d 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -63,6 +63,7 @@ struct rtnl_link {
>  	rtnl_doit_func		doit;
>  	rtnl_dumpit_func	dumpit;
>  	unsigned int		flags;
> +	struct rcu_head		rcu;
>  };
>  
>  static DEFINE_MUTEX(rtnl_mutex);
> @@ -172,14 +173,15 @@ int __rtnl_register(int protocol, int msgtype,
>  	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
>  	msgindex = rtm_msgindex(msgtype);
>  
> -	tab = rcu_dereference_raw(rtnl_msg_handlers[protocol]);
> -	if (tab == NULL) {
> -		tab = kcalloc(RTM_NR_MSGTYPES, sizeof(*tab), GFP_KERNEL);
> -		if (tab == NULL)
> -			return -ENOBUFS;
> +	if (WARN_ONCE(rtnl_msg_handler[protocol],
> +		      "Double registration for protocol: %d\n", protcol))
> +		return -EEXIST;

I would expect this to trigger all the time, due to

rtnl_register(AF_INET, RTM_GETROUTE, ...
rtnl_register(AF_INET, RTM_GETADDR, ...

etc.

> @@ -227,15 +231,13 @@ int rtnl_unregister(int protocol, int msgtype)
>  	msgindex = rtm_msgindex(msgtype);
>  
>  	rtnl_lock();
> -	handlers = rtnl_dereference(rtnl_msg_handlers[protocol]);
> +	handlers = rtnl_msg_handlers[protocol];
>  	if (!handlers) {
>  		rtnl_unlock();
>  		return -ENOENT;
>  	}
> -
> -	handlers[msgindex].doit = NULL;
> -	handlers[msgindex].dumpit = NULL;
> -	handlers[msgindex].flags = 0;
> +	rcu_assign_pointer(rtnl_msg_handler[protocol], NULL);
> +	kfree_rcu(handlers, rcu);

This unregisters all handlers of "protocol" instead of
"protocol:msgtype".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07  9:47           ` Florian Westphal
@ 2017-11-07 14:57             ` Peter Zijlstra
  2017-11-08  1:27               ` Stephen Hemminger
  2017-11-13  7:21               ` Florian Westphal
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-07 14:57 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Tue, Nov 07, 2017 at 10:47:51AM +0100, Florian Westphal wrote:
> I would expect this to trigger all the time, due to
> 
> rtnl_register(AF_INET, RTM_GETROUTE, ...
> rtnl_register(AF_INET, RTM_GETADDR, ...

Ah, sure, then something like so then...

There's bound to be bugs there too, as I pretty much typed this without
thinking, but it should show the idea.

---
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ace48926b19..de1336775602 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -63,6 +63,7 @@ struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
 	unsigned int		flags;
+	struct rcu_head		rcu;
 };
 
 static DEFINE_MUTEX(rtnl_mutex);
@@ -127,7 +128,7 @@ bool lockdep_rtnl_is_held(void)
 EXPORT_SYMBOL(lockdep_rtnl_is_held);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
-static struct rtnl_link __rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
+static struct rtnl_link __rcu **rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
 static refcount_t rtnl_msg_handlers_ref[RTNL_FAMILY_MAX + 1];
 
 static inline int rtm_msgindex(int msgtype)
@@ -144,6 +145,23 @@ static inline int rtm_msgindex(int msgtype)
 	return msgindex;
 }
 
+static struct rtnl_link *rtnl_get_link(int protocol, int msgtype)
+{
+	struct rtnl_link **tab;
+
+	if (protocol >= ARRAY_SIZE(rtnl_msg_handlers))
+		protocol = PF_UNSPEC;
+
+	tab = rcu_dereference(rtnl_msg_handlers[protocol]);
+	if (!tab) {
+		tab = rcu_dereference(rtnl_msg_handlers[PF_UNSPECl]);
+		if (!tab)
+			return NULL;
+	}
+
+	return rcu_dereference(handlers[rtm_msgindex(msgtype)]);
+}
+
 /**
  * __rtnl_register - Register a rtnetlink message type
  * @protocol: Protocol family or PF_UNSPEC
@@ -166,28 +184,39 @@ int __rtnl_register(int protocol, int msgtype,
 		    rtnl_doit_func doit, rtnl_dumpit_func dumpit,
 		    unsigned int flags)
 {
-	struct rtnl_link *tab;
+	struct rtnl_link **tab, *link;
 	int msgindex;
+	int ret = -ENOBUFS;
 
 	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 	msgindex = rtm_msgindex(msgtype);
 
-	tab = rcu_dereference_raw(rtnl_msg_handlers[protocol]);
+	rtnl_lock();
+	tab = rtnl_msg_handlers[protocol];
 	if (tab == NULL) {
-		tab = kcalloc(RTM_NR_MSGTYPES, sizeof(*tab), GFP_KERNEL);
-		if (tab == NULL)
-			return -ENOBUFS;
+		tab = kcalloc(RTM_NR_MSGTYPES, sizeof(void *), GFP_KERNEL);
+		if (!tab)
+			goto unlock;
 
+		/* ensures we see the 0 stores */
 		rcu_assign_pointer(rtnl_msg_handlers[protocol], tab);
 	}
 
-	if (doit)
-		tab[msgindex].doit = doit;
-	if (dumpit)
-		tab[msgindex].dumpit = dumpit;
-	tab[msgindex].flags |= flags;
+	WARN_ONCE(tab[msgindex], "Double allocated rtnl(%d:%d)\n", protocol, msgtype);
+	link = kzalloc(sizeof(struct rtnl_link), GFP_KERNEL);
+	if (!link)
+		goto unlock;
 
-	return 0;
+	link->doit = doit;
+	link->dumpit = dumpit;
+	link->flags |= flags;
+	/* publish protocol:msgtype */
+	rcu_assign_pointer(tab[msgindex], link);
+	ret = 0;
+unlock:
+	rtnl_unlock();
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__rtnl_register);
 
@@ -220,22 +249,22 @@ EXPORT_SYMBOL_GPL(rtnl_register);
  */
 int rtnl_unregister(int protocol, int msgtype)
 {
-	struct rtnl_link *handlers;
+	struct rtnl_link **tab, *link;
 	int msgindex;
 
 	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 	msgindex = rtm_msgindex(msgtype);
 
 	rtnl_lock();
-	handlers = rtnl_dereference(rtnl_msg_handlers[protocol]);
-	if (!handlers) {
+	tab = rtnl_dereference(rtnl_msg_handlers[protocol]);
+	if (!tab) {
 		rtnl_unlock();
 		return -ENOENT;
 	}
 
-	handlers[msgindex].doit = NULL;
-	handlers[msgindex].dumpit = NULL;
-	handlers[msgindex].flags = 0;
+	link = tab[msgindex];
+	rcu_assign_pointer(tab[msgindex], NULL);
+	kfree_rcu(link, rcu);
 	rtnl_unlock();
 
 	return 0;
@@ -251,13 +280,22 @@ EXPORT_SYMBOL_GPL(rtnl_unregister);
  */
 void rtnl_unregister_all(int protocol)
 {
-	struct rtnl_link *handlers;
+	struct rtnl_link **tab, *link;
+	int msgindex;
 
 	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 
 	rtnl_lock();
-	handlers = rtnl_dereference(rtnl_msg_handlers[protocol]);
+	tab = rtnl_msg_handlers[protocol];
 	RCU_INIT_POINTER(rtnl_msg_handlers[protocol], NULL);
+	for (msgindex = 0; msgindex < RTM_NR_MSGTYPES; msgindex++) {
+		link = tab[msgindex];
+		if (!link)
+			continue;
+
+		rcu_assign_pointer(tab[msgindex], NULL);
+		kfree_rcu(link, rcu);
+	}
 	rtnl_unlock();
 
 	synchronize_net();
@@ -2830,17 +2868,17 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 
 	for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
 		int type = cb->nlh->nlmsg_type-RTM_BASE;
-		struct rtnl_link *handlers;
+		struct rtnl_link *link;
 		rtnl_dumpit_func dumpit;
 
 		if (idx < s_idx || idx == PF_PACKET)
 			continue;
 
-		handlers = rtnl_dereference(rtnl_msg_handlers[idx]);
-		if (!handlers)
+		link = rtnl_get_link(idx, type);
+		if (!link)
 			continue;
 
-		dumpit = READ_ONCE(handlers[type].dumpit);
+		dumpit = link->dumpit;
 		if (!dumpit)
 			continue;
 
@@ -4155,7 +4193,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			     struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
-	struct rtnl_link *handlers;
+	struct rtnl_link *link;
 	int err = -EOPNOTSUPP;
 	rtnl_doit_func doit;
 	unsigned int flags;
@@ -4179,32 +4217,19 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (kind != 2 && !netlink_net_capable(skb, CAP_NET_ADMIN))
 		return -EPERM;
 
-	if (family >= ARRAY_SIZE(rtnl_msg_handlers))
-		family = PF_UNSPEC;
-
 	rcu_read_lock();
-	handlers = rcu_dereference(rtnl_msg_handlers[family]);
-	if (!handlers) {
-		family = PF_UNSPEC;
-		handlers = rcu_dereference(rtnl_msg_handlers[family]);
-	}
-
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
 		struct sock *rtnl;
 		rtnl_dumpit_func dumpit;
 		u16 min_dump_alloc = 0;
 
-		dumpit = READ_ONCE(handlers[type].dumpit);
-		if (!dumpit) {
-			family = PF_UNSPEC;
-			handlers = rcu_dereference(rtnl_msg_handlers[PF_UNSPEC]);
-			if (!handlers)
-				goto err_unlock;
-
-			dumpit = READ_ONCE(handlers[type].dumpit);
-			if (!dumpit)
+		link = rtnl_get_link(family, type);
+		if (!link || !link->dumpit) {
+			link = rtnl_get_link(PF_UNSPEC, type);
+			if (!link || !link->dumpit)
 				goto err_unlock;
 		}
+		dumpit = link->dumpit;
 
 		refcount_inc(&rtnl_msg_handlers_ref[family]);
 
@@ -4225,33 +4250,36 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return err;
 	}
 
-	doit = READ_ONCE(handlers[type].doit);
-	if (!doit) {
+	link = rtnl_get_link(family, type);
+	if (!link || !link->doit) {
 		family = PF_UNSPEC;
-		handlers = rcu_dereference(rtnl_msg_handlers[family]);
+		link = rtnl_get_link(PF_UNSPEC, type);
+		if (!link || !link->doit)
+			goto out_unlock;
 	}
 
-	flags = READ_ONCE(handlers[type].flags);
+	flags = link->flags;
 	if (flags & RTNL_FLAG_DOIT_UNLOCKED) {
 		refcount_inc(&rtnl_msg_handlers_ref[family]);
-		doit = READ_ONCE(handlers[type].doit);
+		doit = link->oit;
 		rcu_read_unlock();
 		if (doit)
 			err = doit(skb, nlh, extack);
 		refcount_dec(&rtnl_msg_handlers_ref[family]);
 		return err;
 	}
-
 	rcu_read_unlock();
 
 	rtnl_lock();
-	handlers = rtnl_dereference(rtnl_msg_handlers[family]);
-	if (handlers) {
-		doit = READ_ONCE(handlers[type].doit);
-		if (doit)
-			err = doit(skb, nlh, extack);
-	}
+	link = rtnl_get_link(family, type);
+	if (link && link->doit)
+		err = link->doit(skb, nlh, extack);
 	rtnl_unlock();
+
+	return err;
+
+out_unlock:
+	rcu_read_unlock();
 	return err;
 
 err_unlock:

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07 14:57             ` Peter Zijlstra
@ 2017-11-08  1:27               ` Stephen Hemminger
  2017-11-13  7:21               ` Florian Westphal
  1 sibling, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2017-11-08  1:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

On Tue, 7 Nov 2017 15:57:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> +	link = tab[msgindex];
> +	rcu_assign_pointer(tab[msgindex], NULL);

It is not necessary (or useful) to use rcu_assign_pointer with NULL.
The rcu_assign_pointer used to special case NULL but doesn't in recent
years.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-07 14:57             ` Peter Zijlstra
  2017-11-08  1:27               ` Stephen Hemminger
@ 2017-11-13  7:21               ` Florian Westphal
  2017-11-13  7:55                 ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2017-11-13  7:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Nov 07, 2017 at 10:47:51AM +0100, Florian Westphal wrote:
> > I would expect this to trigger all the time, due to
> > 
> > rtnl_register(AF_INET, RTM_GETROUTE, ...
> > rtnl_register(AF_INET, RTM_GETADDR, ...
> 
> Ah, sure, then something like so then...
> 
> There's bound to be bugs there too, as I pretty much typed this without
> thinking, but it should show the idea.

Just o let you know, I am backlogged at the moment so I Will not have
time to work on this for the time being.

> ---
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 5ace48926b19..de1336775602 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -63,6 +63,7 @@ struct rtnl_link {
>  	rtnl_doit_func		doit;
>  	rtnl_dumpit_func	dumpit;
>  	unsigned int		flags;
> +	struct rcu_head		rcu;
>  };

This will need to be split:

struct rtnl_link {
	rtnl_doit_func          doit;
	unsigned int		flags;
	struct rcu_head		rcu;
};
struct rtnl_link_dump {
  	rtnl_dumpit_func	dumpit;
	struct rcu_head		rcu;
};

> -static struct rtnl_link __rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
> +static struct rtnl_link __rcu **rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];

So this will need to be two arrays.

Reason is that some places do this:

rtnl_register(pf, RTM_FOO, doit, NULL, 0);
rtnl_register(pf, RTM_FOO, NULL, dumpit, 0);

(from different call sites in the stack).
> -	if (doit)
> -		tab[msgindex].doit = doit;
> -	if (dumpit)
> -		tab[msgindex].dumpit = dumpit;

Which is the reason for these if () tests.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-13  7:21               ` Florian Westphal
@ 2017-11-13  7:55                 ` Peter Zijlstra
  2017-11-13  7:59                   ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-13  7:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Mon, Nov 13, 2017 at 08:21:59AM +0100, Florian Westphal wrote:
> Reason is that some places do this:
> 
> rtnl_register(pf, RTM_FOO, doit, NULL, 0);
> rtnl_register(pf, RTM_FOO, NULL, dumpit, 0);

Sure, however,

> (from different call sites in the stack).
> > -	if (doit)
> > -		tab[msgindex].doit = doit;
> > -	if (dumpit)
> > -		tab[msgindex].dumpit = dumpit;
> 
> Which is the reason for these if () tests.

then we assign NULL, which is fine, no?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/8] rtnetlink: add rtnl_register_module
  2017-11-13  7:55                 ` Peter Zijlstra
@ 2017-11-13  7:59                   ` Florian Westphal
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2017-11-13  7:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Florian Westphal, netdev

Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Nov 13, 2017 at 08:21:59AM +0100, Florian Westphal wrote:
> > Reason is that some places do this:
> > 
> > rtnl_register(pf, RTM_FOO, doit, NULL, 0);
> > rtnl_register(pf, RTM_FOO, NULL, dumpit, 0);
> 
> Sure, however,
> 
> > (from different call sites in the stack).
> > > -	if (doit)
> > > -		tab[msgindex].doit = doit;
> > > -	if (dumpit)
> > > -		tab[msgindex].dumpit = dumpit;
> > 
> > Which is the reason for these if () tests.
> 
> then we assign NULL, which is fine, no?

I meant that
1) rtnl_register(pf, RTM_FOO, doit, NULL, 0);
2) rtnl_register(pf, RTM_FOO, NULL, dumpit, 0);

2) overwrites doit() back to NULL.

(it doesn't at the moment due to if() guard quoted above).

We could not do this split, and keep the if () around.

But then we change a member of the link array after it has
been published via rcu_assign_pointer.

AFAIU this is exactly what you want to avoid with this patch.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-11-13  8:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-06 10:51 rtnetlink: fix dump+module unload races, take 2 Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 1/8] rtnetlink: Revert "rtnetlink: add reference counting to prevent module unload while dump is in progress" Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 2/8] rtnetlink: add rtnl_register_module Florian Westphal
2017-11-06 12:44   ` Peter Zijlstra
2017-11-07  6:11     ` Florian Westphal
2017-11-07  9:10       ` Peter Zijlstra
2017-11-07  9:24         ` Peter Zijlstra
2017-11-07  9:47           ` Florian Westphal
2017-11-07 14:57             ` Peter Zijlstra
2017-11-08  1:27               ` Stephen Hemminger
2017-11-13  7:21               ` Florian Westphal
2017-11-13  7:55                 ` Peter Zijlstra
2017-11-13  7:59                   ` Florian Westphal
2017-11-07  9:43         ` Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 3/8] qtr: use rtnl_register_module Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 4/8] bridge: " Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 5/8] can: " Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 6/8] decnet: " Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 7/8] phonet: " Florian Westphal
2017-11-06 10:51 ` [PATCH net-next 8/8] mpls: " Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.