All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
@ 2023-05-09 10:09 Jiri Pirko
  2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
  To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

From: Jiri Pirko <jiri@nvidia.com>

The commit 565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be global
which fixed the issue of non-receiving events of netdev uninit if that
moved to a different namespace. That worked fine in -net tree.

However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.

Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)

Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1

The first two patches are just dependencies of the last one, which is
actually fixing the issue. It converts the notifier to per per-net
again. But this time also per-devlink_port and setting to follow
the netdev to different namespace.

Jiri Pirko (3):
  net: allow to ask per-net netdevice notifier to follow netdev
    dynamically
  devlink: make netdev notifier per-port
  devlink: change port event netdev notifier to be per-net and following
    netdev

 include/linux/netdevice.h   |  6 +++++
 include/net/devlink.h       |  2 ++
 net/core/dev.c              | 34 +++++++++++++++++++++++-----
 net/devlink/core.c          |  9 --------
 net/devlink/devl_internal.h |  4 ----
 net/devlink/leftover.c      | 45 ++++++++++++++++++++++++++-----------
 6 files changed, 69 insertions(+), 31 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically
  2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
  2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
  To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

From: Jiri Pirko <jiri@nvidia.com>

Currently, it is possible to register netdev notifier to get only
events related to a selected namespace. This could be done by:
register_netdevice_notifier_net()

Another extension which currently exists is to register netdev notifier
that receives events related to a namespace, where a netdev is. The
notifier moves from namespace to namespace with the selected netdev.
This could be done by:
register_netdevice_notifier_dev_net()

Devlink has a usecase to monitor a namespace and whenever certain netdev
appears in this namespace, it needs to get notifications even in case
netdev moves to a different namespace. It's basically a combination of
the two described above.

Introduce a pair of functions netdev_net_notifier_follow() and
netdev_net_notifier_unfollow() to be called on previously registered
per-net notifier asking to follow the given netdev.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..63376dad8464 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2890,6 +2890,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb);
 int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb);
 int unregister_netdevice_notifier_net(struct net *net,
 				      struct notifier_block *nb);
+void netdev_net_notifier_follow(struct net_device *dev,
+				struct notifier_block *nb,
+				struct netdev_net_notifier *nn);
+void netdev_net_notifier_unfollow(struct net_device *dev,
+				  struct netdev_net_notifier *nn,
+				  struct net *net);
 int register_netdevice_notifier_dev_net(struct net_device *dev,
 					struct notifier_block *nb,
 					struct netdev_net_notifier *nn);
diff --git a/net/core/dev.c b/net/core/dev.c
index 735096d42c1d..3458ed8f98f2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1868,6 +1868,32 @@ static void __move_netdevice_notifier_net(struct net *src_net,
 	__register_netdevice_notifier_net(dst_net, nb, true);
 }
 
+void netdev_net_notifier_follow(struct net_device *dev,
+				struct notifier_block *nb,
+				struct netdev_net_notifier *nn)
+{
+	ASSERT_RTNL();
+	nn->nb = nb;
+	list_add(&nn->list, &dev->net_notifier_list);
+}
+EXPORT_SYMBOL(netdev_net_notifier_follow);
+
+static void __netdev_net_notifier_unfollow(struct netdev_net_notifier *nn)
+{
+	list_del(&nn->list);
+}
+
+void netdev_net_notifier_unfollow(struct net_device *dev,
+				  struct netdev_net_notifier *nn,
+				  struct net *net)
+{
+	ASSERT_RTNL();
+	__netdev_net_notifier_unfollow(nn);
+	if (!net_eq(dev_net(dev), net))
+		__move_netdevice_notifier_net(dev_net(dev), net, nn->nb);
+}
+EXPORT_SYMBOL(netdev_net_notifier_unfollow);
+
 int register_netdevice_notifier_dev_net(struct net_device *dev,
 					struct notifier_block *nb,
 					struct netdev_net_notifier *nn)
@@ -1876,10 +1902,8 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
 
 	rtnl_lock();
 	err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
-	if (!err) {
-		nn->nb = nb;
-		list_add(&nn->list, &dev->net_notifier_list);
-	}
+	if (!err)
+		netdev_net_notifier_follow(dev, nb, nn);
 	rtnl_unlock();
 	return err;
 }
@@ -1892,7 +1916,7 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
 	int err;
 
 	rtnl_lock();
-	list_del(&nn->list);
+	__netdev_net_notifier_unfollow(nn);
 	err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
 	rtnl_unlock();
 	return err;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [patch net 2/3] devlink: make netdev notifier per-port
  2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
  2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
  2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
  To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

From: Jiri Pirko <jiri@nvidia.com>

The netdev notifier is used to track changes of a netdev related to a
certain devlink port. In preparation for the next patch,
change the netdev notifier to register per devlink port instance.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h       |  1 +
 net/devlink/core.c          |  9 ---------
 net/devlink/devl_internal.h |  4 ----
 net/devlink/leftover.c      | 31 ++++++++++++++++++++++---------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6a942e70e451..d0a0d1ce7db4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -149,6 +149,7 @@ struct devlink_port {
 
 	struct devlink_rate *devlink_rate;
 	struct devlink_linecard *linecard;
+	struct notifier_block netdevice_nb;
 };
 
 struct devlink_port_new_attrs {
diff --git a/net/devlink/core.c b/net/devlink/core.c
index 777b091ef74d..018e547bdb98 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	if (ret < 0)
 		goto err_xa_alloc;
 
-	devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
-	ret = register_netdevice_notifier(&devlink->netdevice_nb);
-	if (ret)
-		goto err_register_netdevice_notifier;
-
 	devlink->dev = dev;
 	devlink->ops = ops;
 	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
@@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 
 	return devlink;
 
-err_register_netdevice_notifier:
-	xa_erase(&devlinks, devlink->index);
 err_xa_alloc:
 	kfree(devlink);
 	return NULL;
@@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
 	xa_destroy(&devlink->params);
 	xa_destroy(&devlink->ports);
 
-	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
-
 	xa_erase(&devlinks, devlink->index);
 
 	devlink_put(devlink);
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index e133f423294a..e595c5dcff45 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -50,7 +50,6 @@ struct devlink {
 	u8 reload_failed:1;
 	refcount_t refcount;
 	struct rcu_work rwork;
-	struct notifier_block netdevice_nb;
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
@@ -171,9 +170,6 @@ extern const struct devlink_cmd devl_cmd_selftests_get;
 void devlink_notify(struct devlink *devlink, enum devlink_command cmd);
 
 /* Ports */
-int devlink_port_netdevice_event(struct notifier_block *nb,
-				 unsigned long event, void *ptr);
-
 struct devlink_port *
 devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info);
 struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink,
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index dffca2f9bfa7..4b1627cb2b83 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -6841,6 +6841,9 @@ void devlink_port_fini(struct devlink_port *devlink_port)
 }
 EXPORT_SYMBOL_GPL(devlink_port_fini);
 
+static int devlink_port_netdevice_event(struct notifier_block *nb,
+					unsigned long event, void *ptr);
+
 /**
  * devl_port_register() - Register devlink port
  *
@@ -6869,14 +6872,24 @@ int devl_port_register(struct devlink *devlink,
 	devlink_port->index = port_index;
 	spin_lock_init(&devlink_port->type_lock);
 	INIT_LIST_HEAD(&devlink_port->reporter_list);
-	err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL);
+
+	devlink_port->netdevice_nb.notifier_call = devlink_port_netdevice_event;
+	err = register_netdevice_notifier(&devlink_port->netdevice_nb);
 	if (err)
 		return err;
 
+	err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL);
+	if (err)
+		goto err_xa_insert;
+
 	INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn);
 	devlink_port_type_warn_schedule(devlink_port);
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 	return 0;
+
+err_xa_insert:
+	unregister_netdevice_notifier(&devlink_port->netdevice_nb);
+	return err;
 }
 EXPORT_SYMBOL_GPL(devl_port_register);
 
@@ -6921,6 +6934,7 @@ void devl_port_unregister(struct devlink_port *devlink_port)
 	devlink_port_type_warn_cancel(devlink_port);
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
 	xa_erase(&devlink_port->devlink->ports, devlink_port->index);
+	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink_port->netdevice_nb));
 	WARN_ON(!list_empty(&devlink_port->reporter_list));
 	devlink_port->registered = false;
 }
@@ -7066,16 +7080,15 @@ void devlink_port_type_clear(struct devlink_port *devlink_port)
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_clear);
 
-int devlink_port_netdevice_event(struct notifier_block *nb,
-				 unsigned long event, void *ptr)
+static int devlink_port_netdevice_event(struct notifier_block *nb,
+					unsigned long event, void *ptr)
 {
 	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
-	struct devlink_port *devlink_port = netdev->devlink_port;
-	struct devlink *devlink;
+	struct devlink_port *devlink_port;
 
-	devlink = container_of(nb, struct devlink, netdevice_nb);
+	devlink_port = container_of(nb, struct devlink_port, netdevice_nb);
 
-	if (!devlink_port || devlink_port->devlink != devlink)
+	if (netdev->devlink_port != devlink_port)
 		return NOTIFY_OK;
 
 	switch (event) {
@@ -7089,7 +7102,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
 		break;
 	case NETDEV_REGISTER:
 	case NETDEV_CHANGENAME:
-		if (devlink_net(devlink) != dev_net(netdev))
+		if (devlink_net(devlink_port->devlink) != dev_net(netdev))
 			return NOTIFY_OK;
 		/* Set the netdev on top of previously set type. Note this
 		 * event happens also during net namespace change so here
@@ -7100,7 +7113,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
 					netdev);
 		break;
 	case NETDEV_UNREGISTER:
-		if (devlink_net(devlink) != dev_net(netdev))
+		if (devlink_net(devlink_port->devlink) != dev_net(netdev))
 			return NOTIFY_OK;
 		/* Clear netdev pointer, but not the type. This event happens
 		 * also during net namespace change so we need to clear
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev
  2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
  2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
  2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
  2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
  2023-05-10  3:24 ` Jakub Kicinski
  4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
  To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

From: Jiri Pirko <jiri@nvidia.com>

The commit 565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be global
which fixed the issue of non-receiving events of netdev uninit if that
moved to a different namespace. That worked fine in -net tree.

However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.

Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)

Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1

Resolve this by converting the notifier to per per-net again.
But this time also per-devlink_port and setting to follow the netdev
to different namespace when spotted, ensured by
netdev_net_notifier_follow().

Note what a tree needs this fix only in case all of the cited fixes
commits are present.

Reported-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h  |  1 +
 net/devlink/leftover.c | 24 +++++++++++++++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index d0a0d1ce7db4..f252da446264 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -150,6 +150,7 @@ struct devlink_port {
 	struct devlink_rate *devlink_rate;
 	struct devlink_linecard *linecard;
 	struct notifier_block netdevice_nb;
+	struct netdev_net_notifier netdevice_nn;
 };
 
 struct devlink_port_new_attrs {
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 4b1627cb2b83..9e11db6528ce 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -6874,7 +6874,8 @@ int devl_port_register(struct devlink *devlink,
 	INIT_LIST_HEAD(&devlink_port->reporter_list);
 
 	devlink_port->netdevice_nb.notifier_call = devlink_port_netdevice_event;
-	err = register_netdevice_notifier(&devlink_port->netdevice_nb);
+	err = register_netdevice_notifier_net(devlink_net(devlink),
+					      &devlink_port->netdevice_nb);
 	if (err)
 		return err;
 
@@ -6888,7 +6889,8 @@ int devl_port_register(struct devlink *devlink,
 	return 0;
 
 err_xa_insert:
-	unregister_netdevice_notifier(&devlink_port->netdevice_nb);
+	unregister_netdevice_notifier_net(devlink_net(devlink),
+					  &devlink_port->netdevice_nb);
 	return err;
 }
 EXPORT_SYMBOL_GPL(devl_port_register);
@@ -6928,13 +6930,16 @@ EXPORT_SYMBOL_GPL(devlink_port_register);
  */
 void devl_port_unregister(struct devlink_port *devlink_port)
 {
-	lockdep_assert_held(&devlink_port->devlink->lock);
+	struct devlink *devlink = devlink_port->devlink;
+
+	lockdep_assert_held(&devlink->lock);
 	WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);
 
 	devlink_port_type_warn_cancel(devlink_port);
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
-	xa_erase(&devlink_port->devlink->ports, devlink_port->index);
-	WARN_ON_ONCE(unregister_netdevice_notifier(&devlink_port->netdevice_nb));
+	xa_erase(&devlink->ports, devlink_port->index);
+	WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink),
+						       &devlink_port->netdevice_nb));
 	WARN_ON(!list_empty(&devlink_port->reporter_list));
 	devlink_port->registered = false;
 }
@@ -7099,11 +7104,11 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
 		 */
 		__devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH,
 					NULL);
+		netdev_net_notifier_follow(netdev, nb,
+					   &devlink_port->netdevice_nn);
 		break;
 	case NETDEV_REGISTER:
 	case NETDEV_CHANGENAME:
-		if (devlink_net(devlink_port->devlink) != dev_net(netdev))
-			return NOTIFY_OK;
 		/* Set the netdev on top of previously set type. Note this
 		 * event happens also during net namespace change so here
 		 * we take into account netdev pointer appearing in this
@@ -7113,8 +7118,6 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
 					netdev);
 		break;
 	case NETDEV_UNREGISTER:
-		if (devlink_net(devlink_port->devlink) != dev_net(netdev))
-			return NOTIFY_OK;
 		/* Clear netdev pointer, but not the type. This event happens
 		 * also during net namespace change so we need to clear
 		 * pointer to netdev that is going to another net namespace.
@@ -7128,6 +7131,9 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
 		 */
 		__devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET,
 					NULL);
+		netdev_net_notifier_unfollow(netdev,
+					     &devlink_port->netdevice_nn,
+					     devlink_net(devlink_port->devlink));
 		break;
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
  2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
                   ` (2 preceding siblings ...)
  2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
@ 2023-05-09 20:13 ` Jacob Keller
  2023-05-10  3:24 ` Jakub Kicinski
  4 siblings, 0 replies; 8+ messages in thread
From: Jacob Keller @ 2023-05-09 20:13 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: kuba, pabeni, davem, edumazet, saeedm, moshe



On 5/9/2023 3:09 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be global
> which fixed the issue of non-receiving events of netdev uninit if that
> moved to a different namespace. That worked fine in -net tree.
> 
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
> 
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
> 
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
> 
> The first two patches are just dependencies of the last one, which is
> actually fixing the issue. It converts the notifier to per per-net
> again. But this time also per-devlink_port and setting to follow
> the netdev to different namespace.
> 
> Jiri Pirko (3):
>   net: allow to ask per-net netdevice notifier to follow netdev
>     dynamically
>   devlink: make netdev notifier per-port
>   devlink: change port event netdev notifier to be per-net and following
>     netdev
> 
>  include/linux/netdevice.h   |  6 +++++
>  include/net/devlink.h       |  2 ++
>  net/core/dev.c              | 34 +++++++++++++++++++++++-----
>  net/devlink/core.c          |  9 --------
>  net/devlink/devl_internal.h |  4 ----
>  net/devlink/leftover.c      | 45 ++++++++++++++++++++++++++-----------
>  6 files changed, 69 insertions(+), 31 deletions(-)
> 

Not sure of the patchworks is confused by 3/3 instead of 0/3 on the
cover letter, but the series looks good to me:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
  2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
                   ` (3 preceding siblings ...)
  2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
@ 2023-05-10  3:24 ` Jakub Kicinski
  2023-05-10  6:11   ` Jiri Pirko
  2023-05-10 18:27   ` Jacob Keller
  4 siblings, 2 replies; 8+ messages in thread
From: Jakub Kicinski @ 2023-05-10  3:24 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

On Tue,  9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be global
> which fixed the issue of non-receiving events of netdev uninit if that
> moved to a different namespace. That worked fine in -net tree.
> 
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
> 
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)

Why don't we have a single, static notifier for all of devlink?
Why the per device/per port notifiers?

We have the devlink port pointer in struct net_device, resolving from
a global event to the correct devlink instance is trivial.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
  2023-05-10  3:24 ` Jakub Kicinski
@ 2023-05-10  6:11   ` Jiri Pirko
  2023-05-10 18:27   ` Jacob Keller
  1 sibling, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-10  6:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe

Wed, May 10, 2023 at 05:24:44AM CEST, kuba@kernel.org wrote:
>On Tue,  9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be global
>> which fixed the issue of non-receiving events of netdev uninit if that
>> moved to a different namespace. That worked fine in -net tree.
>> 
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>> 
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
>Why don't we have a single, static notifier for all of devlink?
>Why the per device/per port notifiers?
>
>We have the devlink port pointer in struct net_device, resolving from
>a global event to the correct devlink instance is trivial.

Okay, that might work. Let me explore that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
  2023-05-10  3:24 ` Jakub Kicinski
  2023-05-10  6:11   ` Jiri Pirko
@ 2023-05-10 18:27   ` Jacob Keller
  1 sibling, 0 replies; 8+ messages in thread
From: Jacob Keller @ 2023-05-10 18:27 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, saeedm, moshe



On 5/9/2023 8:24 PM, Jakub Kicinski wrote:
> On Tue,  9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be global
>> which fixed the issue of non-receiving events of netdev uninit if that
>> moved to a different namespace. That worked fine in -net tree.
>>
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>>
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
> 
> Why don't we have a single, static notifier for all of devlink?
> Why the per device/per port notifiers?
> 
> We have the devlink port pointer in struct net_device, resolving from
> a global event to the correct devlink instance is trivial.

Ok, so if I think through all the possibilities:

1. Originally we had a namespace specific notifier for each struct devlink.

2. Then we added a global notifier for all namespaces, but still had one
for each devlink.

3. Then Jiri's proposal here was a per-namespace notifier per port, but
we then we follow the namespace when the netdev changes namespaces.



But its simpler to just have a single global notifier that is setup once
for all devlinks. Then, when we get a notification, instead of looking
up the devlink instance from the notifier using container_of we look it
up through the netdev->devlink_port connection.

Ya that seems a lot simpler and requires only one notifier instead, and
wouldn't require namespace following code.

I think that makes a lot of sense.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-05-10 18:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
2023-05-10  3:24 ` Jakub Kicinski
2023-05-10  6:11   ` Jiri Pirko
2023-05-10 18:27   ` Jacob Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.