* [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
@ 2023-05-09 10:09 Jiri Pirko
2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
From: Jiri Pirko <jiri@nvidia.com>
The commit 565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be global
which fixed the issue of non-receiving events of netdev uninit if that
moved to a different namespace. That worked fine in -net tree.
However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.
Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1
The first two patches are just dependencies of the last one, which is
actually fixing the issue. It converts the notifier to per per-net
again. But this time also per-devlink_port and setting to follow
the netdev to different namespace.
Jiri Pirko (3):
net: allow to ask per-net netdevice notifier to follow netdev
dynamically
devlink: make netdev notifier per-port
devlink: change port event netdev notifier to be per-net and following
netdev
include/linux/netdevice.h | 6 +++++
include/net/devlink.h | 2 ++
net/core/dev.c | 34 +++++++++++++++++++++++-----
net/devlink/core.c | 9 --------
net/devlink/devl_internal.h | 4 ----
net/devlink/leftover.c | 45 ++++++++++++++++++++++++++-----------
6 files changed, 69 insertions(+), 31 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
From: Jiri Pirko <jiri@nvidia.com>
Currently, it is possible to register netdev notifier to get only
events related to a selected namespace. This could be done by:
register_netdevice_notifier_net()
Another extension which currently exists is to register netdev notifier
that receives events related to a namespace, where a netdev is. The
notifier moves from namespace to namespace with the selected netdev.
This could be done by:
register_netdevice_notifier_dev_net()
Devlink has a usecase to monitor a namespace and whenever certain netdev
appears in this namespace, it needs to get notifications even in case
netdev moves to a different namespace. It's basically a combination of
the two described above.
Introduce a pair of functions netdev_net_notifier_follow() and
netdev_net_notifier_unfollow() to be called on previously registered
per-net notifier asking to follow the given netdev.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
include/linux/netdevice.h | 6 ++++++
net/core/dev.c | 34 +++++++++++++++++++++++++++++-----
2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..63376dad8464 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2890,6 +2890,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb);
int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb);
int unregister_netdevice_notifier_net(struct net *net,
struct notifier_block *nb);
+void netdev_net_notifier_follow(struct net_device *dev,
+ struct notifier_block *nb,
+ struct netdev_net_notifier *nn);
+void netdev_net_notifier_unfollow(struct net_device *dev,
+ struct netdev_net_notifier *nn,
+ struct net *net);
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn);
diff --git a/net/core/dev.c b/net/core/dev.c
index 735096d42c1d..3458ed8f98f2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1868,6 +1868,32 @@ static void __move_netdevice_notifier_net(struct net *src_net,
__register_netdevice_notifier_net(dst_net, nb, true);
}
+void netdev_net_notifier_follow(struct net_device *dev,
+ struct notifier_block *nb,
+ struct netdev_net_notifier *nn)
+{
+ ASSERT_RTNL();
+ nn->nb = nb;
+ list_add(&nn->list, &dev->net_notifier_list);
+}
+EXPORT_SYMBOL(netdev_net_notifier_follow);
+
+static void __netdev_net_notifier_unfollow(struct netdev_net_notifier *nn)
+{
+ list_del(&nn->list);
+}
+
+void netdev_net_notifier_unfollow(struct net_device *dev,
+ struct netdev_net_notifier *nn,
+ struct net *net)
+{
+ ASSERT_RTNL();
+ __netdev_net_notifier_unfollow(nn);
+ if (!net_eq(dev_net(dev), net))
+ __move_netdevice_notifier_net(dev_net(dev), net, nn->nb);
+}
+EXPORT_SYMBOL(netdev_net_notifier_unfollow);
+
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
@@ -1876,10 +1902,8 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
rtnl_lock();
err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
- if (!err) {
- nn->nb = nb;
- list_add(&nn->list, &dev->net_notifier_list);
- }
+ if (!err)
+ netdev_net_notifier_follow(dev, nb, nn);
rtnl_unlock();
return err;
}
@@ -1892,7 +1916,7 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
int err;
rtnl_lock();
- list_del(&nn->list);
+ __netdev_net_notifier_unfollow(nn);
err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
rtnl_unlock();
return err;
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [patch net 2/3] devlink: make netdev notifier per-port
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
From: Jiri Pirko <jiri@nvidia.com>
The netdev notifier is used to track changes of a netdev related to a
certain devlink port. In preparation for the next patch,
change the netdev notifier to register per devlink port instance.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
include/net/devlink.h | 1 +
net/devlink/core.c | 9 ---------
net/devlink/devl_internal.h | 4 ----
net/devlink/leftover.c | 31 ++++++++++++++++++++++---------
4 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6a942e70e451..d0a0d1ce7db4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -149,6 +149,7 @@ struct devlink_port {
struct devlink_rate *devlink_rate;
struct devlink_linecard *linecard;
+ struct notifier_block netdevice_nb;
};
struct devlink_port_new_attrs {
diff --git a/net/devlink/core.c b/net/devlink/core.c
index 777b091ef74d..018e547bdb98 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
if (ret < 0)
goto err_xa_alloc;
- devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
- ret = register_netdevice_notifier(&devlink->netdevice_nb);
- if (ret)
- goto err_register_netdevice_notifier;
-
devlink->dev = dev;
devlink->ops = ops;
xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
@@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
return devlink;
-err_register_netdevice_notifier:
- xa_erase(&devlinks, devlink->index);
err_xa_alloc:
kfree(devlink);
return NULL;
@@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
xa_destroy(&devlink->params);
xa_destroy(&devlink->ports);
- WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
-
xa_erase(&devlinks, devlink->index);
devlink_put(devlink);
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index e133f423294a..e595c5dcff45 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -50,7 +50,6 @@ struct devlink {
u8 reload_failed:1;
refcount_t refcount;
struct rcu_work rwork;
- struct notifier_block netdevice_nb;
char priv[] __aligned(NETDEV_ALIGN);
};
@@ -171,9 +170,6 @@ extern const struct devlink_cmd devl_cmd_selftests_get;
void devlink_notify(struct devlink *devlink, enum devlink_command cmd);
/* Ports */
-int devlink_port_netdevice_event(struct notifier_block *nb,
- unsigned long event, void *ptr);
-
struct devlink_port *
devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info);
struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink,
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index dffca2f9bfa7..4b1627cb2b83 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -6841,6 +6841,9 @@ void devlink_port_fini(struct devlink_port *devlink_port)
}
EXPORT_SYMBOL_GPL(devlink_port_fini);
+static int devlink_port_netdevice_event(struct notifier_block *nb,
+ unsigned long event, void *ptr);
+
/**
* devl_port_register() - Register devlink port
*
@@ -6869,14 +6872,24 @@ int devl_port_register(struct devlink *devlink,
devlink_port->index = port_index;
spin_lock_init(&devlink_port->type_lock);
INIT_LIST_HEAD(&devlink_port->reporter_list);
- err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL);
+
+ devlink_port->netdevice_nb.notifier_call = devlink_port_netdevice_event;
+ err = register_netdevice_notifier(&devlink_port->netdevice_nb);
if (err)
return err;
+ err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL);
+ if (err)
+ goto err_xa_insert;
+
INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn);
devlink_port_type_warn_schedule(devlink_port);
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
return 0;
+
+err_xa_insert:
+ unregister_netdevice_notifier(&devlink_port->netdevice_nb);
+ return err;
}
EXPORT_SYMBOL_GPL(devl_port_register);
@@ -6921,6 +6934,7 @@ void devl_port_unregister(struct devlink_port *devlink_port)
devlink_port_type_warn_cancel(devlink_port);
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
xa_erase(&devlink_port->devlink->ports, devlink_port->index);
+ WARN_ON_ONCE(unregister_netdevice_notifier(&devlink_port->netdevice_nb));
WARN_ON(!list_empty(&devlink_port->reporter_list));
devlink_port->registered = false;
}
@@ -7066,16 +7080,15 @@ void devlink_port_type_clear(struct devlink_port *devlink_port)
}
EXPORT_SYMBOL_GPL(devlink_port_type_clear);
-int devlink_port_netdevice_event(struct notifier_block *nb,
- unsigned long event, void *ptr)
+static int devlink_port_netdevice_event(struct notifier_block *nb,
+ unsigned long event, void *ptr)
{
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
- struct devlink_port *devlink_port = netdev->devlink_port;
- struct devlink *devlink;
+ struct devlink_port *devlink_port;
- devlink = container_of(nb, struct devlink, netdevice_nb);
+ devlink_port = container_of(nb, struct devlink_port, netdevice_nb);
- if (!devlink_port || devlink_port->devlink != devlink)
+ if (netdev->devlink_port != devlink_port)
return NOTIFY_OK;
switch (event) {
@@ -7089,7 +7102,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
break;
case NETDEV_REGISTER:
case NETDEV_CHANGENAME:
- if (devlink_net(devlink) != dev_net(netdev))
+ if (devlink_net(devlink_port->devlink) != dev_net(netdev))
return NOTIFY_OK;
/* Set the netdev on top of previously set type. Note this
* event happens also during net namespace change so here
@@ -7100,7 +7113,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
netdev);
break;
case NETDEV_UNREGISTER:
- if (devlink_net(devlink) != dev_net(netdev))
+ if (devlink_net(devlink_port->devlink) != dev_net(netdev))
return NOTIFY_OK;
/* Clear netdev pointer, but not the type. This event happens
* also during net namespace change so we need to clear
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
@ 2023-05-09 10:09 ` Jiri Pirko
2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
2023-05-10 3:24 ` Jakub Kicinski
4 siblings, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-09 10:09 UTC (permalink / raw)
To: netdev; +Cc: kuba, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
From: Jiri Pirko <jiri@nvidia.com>
The commit 565b4824c39f ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be global
which fixed the issue of non-receiving events of netdev uninit if that
moved to a different namespace. That worked fine in -net tree.
However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.
Here there is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1
Resolve this by converting the notifier to per per-net again.
But this time also per-devlink_port and setting to follow the netdev
to different namespace when spotted, ensured by
netdev_net_notifier_follow().
Note what a tree needs this fix only in case all of the cited fixes
commits are present.
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
include/net/devlink.h | 1 +
net/devlink/leftover.c | 24 +++++++++++++++---------
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index d0a0d1ce7db4..f252da446264 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -150,6 +150,7 @@ struct devlink_port {
struct devlink_rate *devlink_rate;
struct devlink_linecard *linecard;
struct notifier_block netdevice_nb;
+ struct netdev_net_notifier netdevice_nn;
};
struct devlink_port_new_attrs {
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 4b1627cb2b83..9e11db6528ce 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -6874,7 +6874,8 @@ int devl_port_register(struct devlink *devlink,
INIT_LIST_HEAD(&devlink_port->reporter_list);
devlink_port->netdevice_nb.notifier_call = devlink_port_netdevice_event;
- err = register_netdevice_notifier(&devlink_port->netdevice_nb);
+ err = register_netdevice_notifier_net(devlink_net(devlink),
+ &devlink_port->netdevice_nb);
if (err)
return err;
@@ -6888,7 +6889,8 @@ int devl_port_register(struct devlink *devlink,
return 0;
err_xa_insert:
- unregister_netdevice_notifier(&devlink_port->netdevice_nb);
+ unregister_netdevice_notifier_net(devlink_net(devlink),
+ &devlink_port->netdevice_nb);
return err;
}
EXPORT_SYMBOL_GPL(devl_port_register);
@@ -6928,13 +6930,16 @@ EXPORT_SYMBOL_GPL(devlink_port_register);
*/
void devl_port_unregister(struct devlink_port *devlink_port)
{
- lockdep_assert_held(&devlink_port->devlink->lock);
+ struct devlink *devlink = devlink_port->devlink;
+
+ lockdep_assert_held(&devlink->lock);
WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);
devlink_port_type_warn_cancel(devlink_port);
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
- xa_erase(&devlink_port->devlink->ports, devlink_port->index);
- WARN_ON_ONCE(unregister_netdevice_notifier(&devlink_port->netdevice_nb));
+ xa_erase(&devlink->ports, devlink_port->index);
+ WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink),
+ &devlink_port->netdevice_nb));
WARN_ON(!list_empty(&devlink_port->reporter_list));
devlink_port->registered = false;
}
@@ -7099,11 +7104,11 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
*/
__devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH,
NULL);
+ netdev_net_notifier_follow(netdev, nb,
+ &devlink_port->netdevice_nn);
break;
case NETDEV_REGISTER:
case NETDEV_CHANGENAME:
- if (devlink_net(devlink_port->devlink) != dev_net(netdev))
- return NOTIFY_OK;
/* Set the netdev on top of previously set type. Note this
* event happens also during net namespace change so here
* we take into account netdev pointer appearing in this
@@ -7113,8 +7118,6 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
netdev);
break;
case NETDEV_UNREGISTER:
- if (devlink_net(devlink_port->devlink) != dev_net(netdev))
- return NOTIFY_OK;
/* Clear netdev pointer, but not the type. This event happens
* also during net namespace change so we need to clear
* pointer to netdev that is going to another net namespace.
@@ -7128,6 +7131,9 @@ static int devlink_port_netdevice_event(struct notifier_block *nb,
*/
__devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET,
NULL);
+ netdev_net_notifier_unfollow(netdev,
+ &devlink_port->netdevice_nn,
+ devlink_net(devlink_port->devlink));
break;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
` (2 preceding siblings ...)
2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
@ 2023-05-09 20:13 ` Jacob Keller
2023-05-10 3:24 ` Jakub Kicinski
4 siblings, 0 replies; 8+ messages in thread
From: Jacob Keller @ 2023-05-09 20:13 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: kuba, pabeni, davem, edumazet, saeedm, moshe
On 5/9/2023 3:09 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
>
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be global
> which fixed the issue of non-receiving events of netdev uninit if that
> moved to a different namespace. That worked fine in -net tree.
>
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
>
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
>
> The first two patches are just dependencies of the last one, which is
> actually fixing the issue. It converts the notifier to per per-net
> again. But this time also per-devlink_port and setting to follow
> the netdev to different namespace.
>
> Jiri Pirko (3):
> net: allow to ask per-net netdevice notifier to follow netdev
> dynamically
> devlink: make netdev notifier per-port
> devlink: change port event netdev notifier to be per-net and following
> netdev
>
> include/linux/netdevice.h | 6 +++++
> include/net/devlink.h | 2 ++
> net/core/dev.c | 34 +++++++++++++++++++++++-----
> net/devlink/core.c | 9 --------
> net/devlink/devl_internal.h | 4 ----
> net/devlink/leftover.c | 45 ++++++++++++++++++++++++++-----------
> 6 files changed, 69 insertions(+), 31 deletions(-)
>
Not sure of the patchworks is confused by 3/3 instead of 0/3 on the
cover letter, but the series looks good to me:
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
` (3 preceding siblings ...)
2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
@ 2023-05-10 3:24 ` Jakub Kicinski
2023-05-10 6:11 ` Jiri Pirko
2023-05-10 18:27 ` Jacob Keller
4 siblings, 2 replies; 8+ messages in thread
From: Jakub Kicinski @ 2023-05-10 3:24 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
On Tue, 9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be global
> which fixed the issue of non-receiving events of netdev uninit if that
> moved to a different namespace. That worked fine in -net tree.
>
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
>
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
Why don't we have a single, static notifier for all of devlink?
Why the per device/per port notifiers?
We have the devlink port pointer in struct net_device, resolving from
a global event to the correct devlink instance is trivial.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
2023-05-10 3:24 ` Jakub Kicinski
@ 2023-05-10 6:11 ` Jiri Pirko
2023-05-10 18:27 ` Jacob Keller
1 sibling, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2023-05-10 6:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, pabeni, davem, edumazet, jacob.e.keller, saeedm, moshe
Wed, May 10, 2023 at 05:24:44AM CEST, kuba@kernel.org wrote:
>On Tue, 9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be global
>> which fixed the issue of non-receiving events of netdev uninit if that
>> moved to a different namespace. That worked fine in -net tree.
>>
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>>
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
>Why don't we have a single, static notifier for all of devlink?
>Why the per device/per port notifiers?
>
>We have the devlink port pointer in struct net_device, resolving from
>a global event to the correct devlink instance is trivial.
Okay, that might work. Let me explore that.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove
2023-05-10 3:24 ` Jakub Kicinski
2023-05-10 6:11 ` Jiri Pirko
@ 2023-05-10 18:27 ` Jacob Keller
1 sibling, 0 replies; 8+ messages in thread
From: Jacob Keller @ 2023-05-10 18:27 UTC (permalink / raw)
To: Jakub Kicinski, Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, saeedm, moshe
On 5/9/2023 8:24 PM, Jakub Kicinski wrote:
> On Tue, 9 May 2023 12:09:36 +0200 Jiri Pirko wrote:
>> The commit 565b4824c39f ("devlink: change port event netdev notifier
>> from per-net to global") changed original per-net notifier to be global
>> which fixed the issue of non-receiving events of netdev uninit if that
>> moved to a different namespace. That worked fine in -net tree.
>>
>> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
>> separate devlink instance for ethernet auxiliary device") and
>> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
>> case of PCI device suspend") were merged, a deadlock was introduced
>> when removing a namespace with devlink instance with another nested
>> instance.
>>
>> Here there is the bad flow example resulting in deadlock with mlx5:
>> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
>> devlink_pernet_pre_exit() -> devlink_reload() ->
>> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
>> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
>> mlx5e_destroy_devlink() -> devlink_free() ->
>> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
> Why don't we have a single, static notifier for all of devlink?
> Why the per device/per port notifiers?
>
> We have the devlink port pointer in struct net_device, resolving from
> a global event to the correct devlink instance is trivial.
Ok, so if I think through all the possibilities:
1. Originally we had a namespace specific notifier for each struct devlink.
2. Then we added a global notifier for all namespaces, but still had one
for each devlink.
3. Then Jiri's proposal here was a per-namespace notifier per port, but
we then we follow the namespace when the netdev changes namespaces.
But its simpler to just have a single global notifier that is setup once
for all devlinks. Then, when we get a notification, instead of looking
up the devlink instance from the notifier using container_of we look it
up through the netdev->devlink_port connection.
Ya that seems a lot simpler and requires only one notifier instead, and
wouldn't require namespace following code.
I think that makes a lot of sense.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-05-10 18:27 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-09 10:09 [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jiri Pirko
2023-05-09 10:09 ` [patch net 1/3] net: allow to ask per-net netdevice notifier to follow netdev dynamically Jiri Pirko
2023-05-09 10:09 ` [patch net 2/3] devlink: make netdev notifier per-port Jiri Pirko
2023-05-09 10:09 ` [patch net 3/3] devlink: change port event netdev notifier to be per-net and following netdev Jiri Pirko
2023-05-09 20:13 ` [patch net 3/3] devlink: fix a deadlock with nested instances during namespace remove Jacob Keller
2023-05-10 3:24 ` Jakub Kicinski
2023-05-10 6:11 ` Jiri Pirko
2023-05-10 18:27 ` Jacob Keller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.