Linux-RDMA Archive on lore.kernel.org
* [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
@ 2020-10-19  5:27 Leon Romanovsky
  2020-10-19 13:07 ` Jason Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-10-19  5:27 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Parav Pandit, Jakub Kicinski, Jiri Pirko, linux-rdma,
	Michael Guralnik, netdev, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

When a mlx5 core devlink instance is reloaded in different net
namespace, its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it
through the netdev notifier chain during load and unload sequence.
Below is the call graph of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.
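
To make the lock recursion concrete, here is a minimal single-threaded sketch in
userspace C. It is not kernel code: struct rwsem_model and the model_* helpers
are hypothetical stand-ins for pernet_ops_rwsem and down_read()/down_write(),
and they use trylock semantics so the demonstration reports the conflict
instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore (hypothetical, illustration only). */
struct rwsem_model {
	int readers;	/* number of current read holders */
	bool writer;	/* true while held for write */
};

static bool model_down_read_trylock(struct rwsem_model *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void model_up_read(struct rwsem_model *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* A writer needs exclusive access: no readers and no other writer. */
static bool model_down_write_trylock(struct rwsem_model *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void model_up_write(struct rwsem_model *sem)
{
	assert(sem->writer);
	sem->writer = false;
}

/*
 * Mirrors the call graph above: the namespace cleanup path holds the
 * semaphore for read, and the global notifier unregister further down
 * asks for the same semaphore for write.
 * Returns true if the nested write acquisition could succeed.
 */
static bool nested_write_would_succeed(void)
{
	struct rwsem_model pernet_ops = { 0, false };
	bool ok;

	if (!model_down_read_trylock(&pernet_ops))	/* cleanup_net() */
		return false;
	ok = model_down_write_trylock(&pernet_ops);	/* global unregister */
	if (ok)
		model_up_write(&pernet_ops);
	model_up_read(&pernet_ops);
	return ok;
}
```

In this model nested_write_would_succeed() returns false. With the blocking
kernel primitives the same situation means the writer waits forever on a read
hold owned by its own call chain.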

When the deadlock occurs, the devlink mutex is also held. This deadlocks not
only the mlx5 device under reload, but also every process that attempts to
access an unrelated devlink device.

Hence, fix this by having the mlx5 IB driver register a per-net netdev
notifier instead of the global one; the per-net notifier operates on the net
namespace without holding pernet_ops_rwsem.

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c                  | 6 ++++--
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
 include/linux/mlx5/driver.h                        | 5 +++++
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 944bb7691913..b1b3e563c15e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3323,7 +3323,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 	int err;

 	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
-	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
+	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+					      &dev->port[port_num].roce.nb);
 	if (err) {
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 		return err;
@@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	if (dev->port[port_num].roce.nb.notifier_call) {
-		unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
+		unregister_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+						  &dev->port[port_num].roce.nb);
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index d046db7bb047..3a9fa629503f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -90,9 +90,4 @@ int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 			       u32 key_type, u32 *p_key_id);
 void mlx5_destroy_encryption_key(struct mlx5_core_dev *mdev, u32 key_id);

-static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
-{
-	return devlink_net(priv_to_devlink(dev));
-}
-
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c484805d8a22..1c810911d367 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1210,4 +1210,9 @@ static inline bool mlx5_is_roce_enabled(struct mlx5_core_dev *dev)
 	return val.vbool;
 }

+static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
+{
+	return devlink_net(priv_to_devlink(dev));
+}
+
 #endif /* MLX5_DRIVER_H */
--
2.26.2



* Re: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19  5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
@ 2020-10-19 13:07 ` Jason Gunthorpe
  2020-10-19 13:23   ` Parav Pandit
  2020-10-26 13:38 ` Parav Pandit
  2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-10-19 13:07 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Parav Pandit, Jakub Kicinski, Jiri Pirko,
	linux-rdma, Michael Guralnik, netdev, Saeed Mahameed

On Mon, Oct 19, 2020 at 08:27:36AM +0300, Leon Romanovsky wrote:
> From: Parav Pandit <parav@nvidia.com>
> 
> When a mlx5 core devlink instance is reloaded in different net
> namespace, its associated IB device is deleted and recreated.
> 
> Example sequence is:
> $ ip netns add foo
> $ devlink dev reload pci/0000:00:08.0 netns foo
> $ ip netns del foo
> 
> mlx5 IB device needs to attach and detach the netdevice to it
> through the netdev notifier chain during load and unload sequence.
> Below is the call graph of the unload flow.
> 
> cleanup_net()
>    down_read(&pernet_ops_rwsem); <- first sem acquired
>      ops_pre_exit_list()
>        pre_exit()
>          devlink_pernet_pre_exit()
>            devlink_reload()
>              mlx5_devlink_reload_down()
>                mlx5_unload_one()
>                [...]
>                  mlx5_ib_remove()
>                    mlx5_ib_unbind_slave_port()
>                      mlx5_remove_netdev_notifier()
>                        unregister_netdevice_notifier()
>                          down_write(&pernet_ops_rwsem); <- recursive lock
> 
> Hence, when net namespace is deleted, mlx5 reload results in deadlock.
> 
> When the deadlock occurs, the devlink mutex is also held. This deadlocks not
> only the mlx5 device under reload, but also every process that attempts to
> access an unrelated devlink device.
>
> Hence, fix this by having the mlx5 IB driver register a per-net netdev
> notifier instead of the global one; the per-net notifier operates on the net
> namespace without holding pernet_ops_rwsem.
> 
> Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>  drivers/infiniband/hw/mlx5/main.c                  | 6 ++++--
>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
>  include/linux/mlx5/driver.h                        | 5 +++++
>  3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 944bb7691913..b1b3e563c15e 100644
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -3323,7 +3323,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
>  	int err;
> 
>  	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
> -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> +					      &dev->port[port_num].roce.nb);

This looks racy, what lock needs to be held to keep *mlx5_core_net()
stable?

>  	if (err) {
>  		dev->port[port_num].roce.nb.notifier_call = NULL;
>  		return err;
> @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
>  static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
>  {
>  	if (dev->port[port_num].roce.nb.notifier_call) {
> -		unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
> +		unregister_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> +						  &dev->port[port_num].roce.nb);

This seems dangerous too, what if the mlx5_core_net changed before we
get here?

What are the rules for when devlink_net() changes?

Jason


* RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19 13:07 ` Jason Gunthorpe
@ 2020-10-19 13:23   ` Parav Pandit
  2020-10-19 19:01     ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Parav Pandit @ 2020-10-19 13:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Doug Ledford, Jakub Kicinski, Jiri Pirko, linux-rdma,
	Michael Guralnik, netdev, Saeed Mahameed


> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, October 19, 2020 6:38 PM
> 
> On Mon, Oct 19, 2020 at 08:27:36AM +0300, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@nvidia.com>
> >
> > When a mlx5 core devlink instance is reloaded in different net
> > namespace, its associated IB device is deleted and recreated.
> >
> > Example sequence is:
> > $ ip netns add foo
> > $ devlink dev reload pci/0000:00:08.0 netns foo
> > $ ip netns del foo
> >
> > mlx5 IB device needs to attach and detach the netdevice to it through
> > the netdev notifier chain during load and unload sequence.
> > Below is the call graph of the unload flow.
> >
> > cleanup_net()
> >    down_read(&pernet_ops_rwsem); <- first sem acquired
> >      ops_pre_exit_list()
> >        pre_exit()
> >          devlink_pernet_pre_exit()
> >            devlink_reload()
> >              mlx5_devlink_reload_down()
> >                mlx5_unload_one()
> >                [...]
> >                  mlx5_ib_remove()
> >                    mlx5_ib_unbind_slave_port()
> >                      mlx5_remove_netdev_notifier()
> >                        unregister_netdevice_notifier()
> >                          down_write(&pernet_ops_rwsem);<- recurrsive
> > lock
> >
> > Hence, when net namespace is deleted, mlx5 reload results in deadlock.
> >
> > When the deadlock occurs, the devlink mutex is also held. This deadlocks
> > not only the mlx5 device under reload, but also every process that
> > attempts to access an unrelated devlink device.
> >
> > Hence, fix this by having the mlx5 IB driver register a per-net netdev
> > notifier instead of the global one; the per-net notifier operates on the
> > net namespace without holding pernet_ops_rwsem.
> >
> > Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >  drivers/infiniband/hw/mlx5/main.c                  | 6 ++++--
> >  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
> >  include/linux/mlx5/driver.h                        | 5 +++++
> >  3 files changed, 9 insertions(+), 7 deletions(-)
> >
> > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > > index 944bb7691913..b1b3e563c15e 100644
> > > +++ b/drivers/infiniband/hw/mlx5/main.c
> > > @@ -3323,7 +3323,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
> >  	int err;
> >
> >  	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
> > -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> > +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > +					      &dev->port[port_num].roce.nb);
> 
> This looks racy, what lock needs to be held to keep *mlx5_core_net() stable?

mlx5_core_net() cannot be accessed outside of mlx5 driver's load, unload, reload path.

When this is getting executed, devlink cannot be executing reload.
This is guarded by devlink_reload_enable/disable calls done by mlx5 core.
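
The guard described above can be pictured with a small userspace model in C.
This is only a sketch under stated assumptions: struct devlink_model and
model_reload_to_net() are hypothetical names, the -1 return is a generic
placeholder for whatever error devlink returns, and the model only captures
the idea that a reload request is refused while reload is disabled, so the
namespace cannot change underneath the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the devlink instance state that matters here. */
struct devlink_model {
	bool reload_enabled;	/* toggled by the enable/disable calls */
	int net_id;		/* stand-in for the instance's net namespace */
};

/* A reload request may move the instance to another namespace only
 * while reload is enabled; otherwise it is refused and nothing changes. */
static int model_reload_to_net(struct devlink_model *dl, int dest_net_id)
{
	if (!dl->reload_enabled)
		return -1;	/* placeholder error value */
	dl->net_id = dest_net_id;
	return 0;
}

/* Returns true if the namespace stays put while reload is disabled. */
static bool net_is_stable_while_reload_disabled(void)
{
	struct devlink_model dl = { false, 1 };
	int before = dl.net_id;

	if (model_reload_to_net(&dl, 2) == 0)
		return false;
	return dl.net_id == before;
}
```

Code that runs while reload is disabled can therefore read the namespace
without further locking, which is the property the notifier registration
relies on.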

> 
> >  	if (err) {
> >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> >  		return err;
> > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct
> > mlx5_ib_dev *dev, u8 port_num)  static void
> > mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)  {
> >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > -		unregister_netdevice_notifier(&dev-
> >port[port_num].roce.nb);
> > +		unregister_netdevice_notifier_net(mlx5_core_net(dev-
> >mdev),
> > +						  &dev-
> >port[port_num].roce.nb);
> 
> This seems dangerous too, what if the mlx5_core_net changed before we
> get here?
> 
From my inspection of the driver code, I am not aware of any code flow where this can
change before reaching here, because registration and unregistration are done only in the driver load, unload and reload paths.
Reload can happen only after devlink_reload_enable() is done.

> What are the rules for when devlink_net() changes?
> 
devlink_net() changes only after unload() callback is completed in driver.
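
The ordering rule stated above is what keeps registration and unregistration
paired on the same namespace. A minimal userspace sketch in C follows; it is
not driver code, and net_model, dev_model and the model_* functions are
hypothetical names used only to show the sequence unload, switch namespace,
load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace state: how many notifiers are registered. */
struct net_model {
	int notifiers;
};

struct dev_model {
	struct net_model *net;		/* stand-in for mlx5_core_net() */
	bool notifier_registered;
};

/* Load path: register the notifier on the device's current namespace. */
static void model_load(struct dev_model *dev)
{
	assert(!dev->notifier_registered);
	dev->net->notifiers++;
	dev->notifier_registered = true;
}

/* Unload path: unregister from the namespace used at registration. */
static void model_unload(struct dev_model *dev)
{
	assert(dev->notifier_registered);
	dev->net->notifiers--;
	dev->notifier_registered = false;
}

/* Reload into another namespace: the namespace pointer is switched
 * only after unload has finished, so unload still sees the old one. */
static void model_reload(struct dev_model *dev, struct net_model *dest)
{
	model_unload(dev);	/* reload_down */
	dev->net = dest;	/* namespace changes here, not earlier */
	model_load(dev);	/* reload_up */
}

/* Returns true if no notifier is left behind on the old namespace. */
static bool reload_keeps_counts_balanced(void)
{
	struct net_model init_ns = { 0 }, foo = { 0 };
	struct dev_model dev = { &init_ns, false };

	model_load(&dev);
	model_reload(&dev, &foo);
	return init_ns.notifiers == 0 && foo.notifiers == 1;
}
```

If the namespace pointer were switched before model_unload() ran, the
decrement would land on the wrong namespace, which is the hazard raised in
the review.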


* Re: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19 13:23   ` Parav Pandit
@ 2020-10-19 19:01     ` Jason Gunthorpe
  2020-10-19 19:26       ` Parav Pandit
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-10-19 19:01 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Jakub Kicinski, Jiri Pirko,
	linux-rdma, Michael Guralnik, netdev, Saeed Mahameed

On Mon, Oct 19, 2020 at 01:23:23PM +0000, Parav Pandit wrote:
> > > -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> > > +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > > +					      &dev->port[port_num].roce.nb);
> > 
> > This looks racy, what lock needs to be held to keep *mlx5_core_net() stable?
> 
> mlx5_core_net() cannot be accessed outside of mlx5 driver's load, unload, reload path.
> 
> When this is getting executed, devlink cannot be executing reload.
> This is guarded by devlink_reload_enable/disable calls done by mlx5 core.

A comment that devlink_reload_enable/disable() must be held would be
helpful
 
> > 
> > >  	if (err) {
> > >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> > >  		return err;
> > > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct
> > > mlx5_ib_dev *dev, u8 port_num)  static void
> > > mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)  {
> > >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > > -		unregister_netdevice_notifier(&dev-
> > >port[port_num].roce.nb);
> > > +		unregister_netdevice_notifier_net(mlx5_core_net(dev-
> > >mdev),
> > > +						  &dev-
> > >port[port_num].roce.nb);
> > 
> > This seems dangerous too, what if the mlx5_core_net changed before we
> > get here?
>
> From my inspection of the driver code, I am not aware of any code flow where
> this can change before reaching here, because registration and
> unregistration are done only in the driver load, unload and reload paths.
> Reload can happen only after devlink_reload_enable() is done.

But we enable reload right after init_one

> > What are the rules for when devlink_net() changes?
> > 
> devlink_net() changes only after unload() callback is completed in driver.

You mean mlx5_devlink_reload_down ?

That seems OK then

Jason


* RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19 19:01     ` Jason Gunthorpe
@ 2020-10-19 19:26       ` Parav Pandit
  2020-10-20 11:41         ` Parav Pandit
  0 siblings, 1 reply; 10+ messages in thread
From: Parav Pandit @ 2020-10-19 19:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Jakub Kicinski, Jiri Pirko,
	linux-rdma, Michael Guralnik, netdev, Saeed Mahameed



> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, October 20, 2020 12:31 AM
> 
> On Mon, Oct 19, 2020 at 01:23:23PM +0000, Parav Pandit wrote:
> > > > -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> > > > +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > > > +					      &dev->port[port_num].roce.nb);
> > >
> > > > This looks racy, what lock needs to be held to keep *mlx5_core_net()
> > > > stable?
> > >
> > > mlx5_core_net() cannot be accessed outside of mlx5 driver's load, unload,
> > > reload path.
> >
> > When this is getting executed, devlink cannot be executing reload.
> > This is guarded by devlink_reload_enable/disable calls done by mlx5 core.
> 
> A comment that devlink_reload_enable/disable() must be held would be
> helpful
> 
Yes. will add.

> > >
> > > >  	if (err) {
> > > >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> > > >  		return err;
> > > > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct
> > > >mlx5_ib_dev *dev, u8 port_num)  static void
> > > >mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
> {
> > > >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > > > -		unregister_netdevice_notifier(&dev-
> > > >port[port_num].roce.nb);
> > > > +		unregister_netdevice_notifier_net(mlx5_core_net(dev-
> > > >mdev),
> > > > +						  &dev-
> > > >port[port_num].roce.nb);
> > >
> > > This seems dangerous too, what if the mlx5_core_net changed before
> > > we get here?
> >
> > From my inspection of the driver code, I am not aware of any code flow where
> > this can change before reaching here, because registration and
> > unregistration are done only in the driver load, unload and reload paths.
> > Reload can happen only after devlink_reload_enable() is done.
> 
> But we enable reload right after init_one
> 
> > > What are the rules for when devlink_net() changes?
> > >
> > devlink_net() changes only after unload() callback is completed in driver.
> 
> You mean mlx5_devlink_reload_down ?
> 
Right.
> That seems OK then
Ok. will work with Leon to add the comment.
> 
> Jason


* RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19 19:26       ` Parav Pandit
@ 2020-10-20 11:41         ` Parav Pandit
  0 siblings, 0 replies; 10+ messages in thread
From: Parav Pandit @ 2020-10-20 11:41 UTC (permalink / raw)
  To: Parav Pandit, Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Jakub Kicinski, Jiri Pirko,
	linux-rdma, Michael Guralnik, netdev, Saeed Mahameed

Hi Jason,

> From: Parav Pandit <parav@nvidia.com>
> Sent: Tuesday, October 20, 2020 12:57 AM
> 
> 
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, October 20, 2020 12:31 AM
> >
> > On Mon, Oct 19, 2020 at 01:23:23PM +0000, Parav Pandit wrote:
> > > > > -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> > > > > +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > > > > +					      &dev->port[port_num].roce.nb);
> > > > >
> > > > > This looks racy, what lock needs to be held to keep *mlx5_core_net()
> > > > > stable?
> > > >
> > > > mlx5_core_net() cannot be accessed outside of mlx5 driver's load, unload,
> > > > reload path.
> > >
> > > When this is getting executed, devlink cannot be executing reload.
> > > This is guarded by devlink_reload_enable/disable calls done by mlx5 core.
> >
> > A comment that devlink_reload_enable/disable() must be held would be
> > helpful
> >
> Yes. will add.
> 
> > > >
> > > > >  	if (err) {
> > > > >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> > > > >  		return err;
> > > > > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct
> > > > >mlx5_ib_dev *dev, u8 port_num)  static void
> > > > >mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8
> port_num)
> > {
> > > > >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > > > > -		unregister_netdevice_notifier(&dev-
> > > > >port[port_num].roce.nb);
> > > > > +
> 	unregister_netdevice_notifier_net(mlx5_core_net(dev-
> > > > >mdev),
> > > > > +						  &dev-
> > > > >port[port_num].roce.nb);
> > > >
> > > > This seems dangerous too, what if the mlx5_core_net changed before
> > > > we get here?
> > >
> > > From my inspection of the driver code, I am not aware of any code flow where
> > > this can change before reaching here, because registration and
> > > unregistration are done only in the driver load, unload and reload paths.
> > > Reload can happen only after devlink_reload_enable() is done.
> >
> > But we enable reload right after init_one
> >
> > > > What are the rules for when devlink_net() changes?
> > > >
> > > devlink_net() changes only after unload() callback is completed in driver.
> >
> > You mean mlx5_devlink_reload_down ?
> >
> Right.
> > That seems OK then
> Ok. will work with Leon to add the comment.

Is the fixup below OK?

commit 33cf8a09e735849f622e8084a7b08d421f11a4e1 (HEAD -> netns-del-fix)
Author: Parav Pandit <parav@nvidia.com>
Date:   Tue Oct 20 12:26:08 2020 +0300

    fixup: for RDMA/mlx5: Fix devlink deadlock on net namespace deletion

    Changelog:
    v0->v1:
     - Added kdoc comment description for the API usage and allowed context

    issue: 2230150
    Change-Id: Ibd233f771682c27565f48c54cd48fd87b0a7790f
    Signed-off-by: Parav Pandit <parav@nvidia.com>

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 560b551d5ff8..3382855b7ef1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1209,6 +1209,19 @@ static inline bool mlx5_is_roce_enabled(struct mlx5_core_dev *dev)
        return val.vbool;
 }

+/**
+ * mlx5_core_net - Provide net namespace of the mlx5_core_dev
+ * @dev: mlx5 core device
+ *
+ * mlx5_core_net() returns the net namespace of mlx5 core device.
+ * This can be called only in below described limited context.
+ * (a) When a devlink instance for mlx5_core is registered and
+ *     when devlink reload operation is disabled.
+ *     or
+ * (b) during devlink reload reload_down() and reload_up callbacks
+ *     where it is ensured that devlink instance's net namespace is
+ *     stable.
+ */
 static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
 {
        return devlink_net(priv_to_devlink(dev));


* [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19  5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
  2020-10-19 13:07 ` Jason Gunthorpe
@ 2020-10-26 13:38 ` Parav Pandit
  2020-10-26 13:47   ` Parav Pandit
  2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
  2 siblings, 1 reply; 10+ messages in thread
From: Parav Pandit @ 2020-10-26 13:38 UTC (permalink / raw)
  To: dledford, jgg; +Cc: jiri, linux-rdma, michaelgur, netdev, saeedm, Parav Pandit

When a mlx5 core devlink instance is reloaded in different net
namespace, its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it
through the netdev notifier chain during load and unload sequence.
Below is the call graph of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When the deadlock occurs, the devlink mutex is also held. This deadlocks not
only the mlx5 device under reload, but also every process that attempts to
access an unrelated devlink device.

Hence, fix this by having the mlx5 IB driver register a per-net netdev
notifier instead of the global one; the per-net notifier operates on the net
namespace without holding pernet_ops_rwsem.

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
Changelog:
v0->v1:
 - updated comment for mlx5_core_net API to be used by multiple mlx5
   drivers
---
 drivers/infiniband/hw/mlx5/main.c              |  6 ++++--
 .../net/ethernet/mellanox/mlx5/core/lib/mlx5.h |  5 -----
 include/linux/mlx5/driver.h                    | 18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 89e04ca62ae0..246e3cbe0b2c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3305,7 +3305,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 	int err;
 
 	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
-	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
+	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+					      &dev->port[port_num].roce.nb);
 	if (err) {
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 		return err;
@@ -3317,7 +3318,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	if (dev->port[port_num].roce.nb.notifier_call) {
-		unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
+		unregister_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+						  &dev->port[port_num].roce.nb);
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index d046db7bb047..3a9fa629503f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -90,9 +90,4 @@ int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 			       u32 key_type, u32 *p_key_id);
 void mlx5_destroy_encryption_key(struct mlx5_core_dev *mdev, u32 key_id);
 
-static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
-{
-	return devlink_net(priv_to_devlink(dev));
-}
-
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c145de0473bc..3382855b7ef1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1209,4 +1209,22 @@ static inline bool mlx5_is_roce_enabled(struct mlx5_core_dev *dev)
 	return val.vbool;
 }
 
+/**
+ * mlx5_core_net - Provide net namespace of the mlx5_core_dev
+ * @dev: mlx5 core device
+ *
+ * mlx5_core_net() returns the net namespace of mlx5 core device.
+ * This can be called only in below described limited context.
+ * (a) When a devlink instance for mlx5_core is registered and
+ *     when devlink reload operation is disabled.
+ *     or
+ * (b) during devlink reload reload_down() and reload_up callbacks
+ *     where it is ensured that devlink instance's net namespace is
+ *     stable.
+ */
+static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
+{
+	return devlink_net(priv_to_devlink(dev));
+}
+
 #endif /* MLX5_DRIVER_H */
-- 
2.25.4



* [PATCH rdma-rc RESEND v1] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-19  5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
  2020-10-19 13:07 ` Jason Gunthorpe
  2020-10-26 13:38 ` Parav Pandit
@ 2020-10-26 13:43 ` Parav Pandit
  2020-10-26 22:25   ` Jason Gunthorpe
  2 siblings, 1 reply; 10+ messages in thread
From: Parav Pandit @ 2020-10-26 13:43 UTC (permalink / raw)
  To: dledford, jgg
  Cc: jiri, linux-rdma, michaelgur, netdev, saeedm, Parav Pandit,
	Leon Romanovsky

When a mlx5 core devlink instance is reloaded in different net
namespace, its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it
through the netdev notifier chain during load and unload sequence.
Below is the call graph of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When the deadlock occurs, the devlink mutex is also held. This deadlocks not
only the mlx5 device under reload, but also every process that attempts to
access an unrelated devlink device.

Hence, fix this by having the mlx5 IB driver register a per-net netdev
notifier instead of the global one; the per-net notifier operates on the net
namespace without holding pernet_ops_rwsem.

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Changelog:
v0->v1:
 - updated comment for mlx5_core_net API to be used by multiple mlx5
   drivers
---
 drivers/infiniband/hw/mlx5/main.c              |  6 ++++--
 .../net/ethernet/mellanox/mlx5/core/lib/mlx5.h |  5 -----
 include/linux/mlx5/driver.h                    | 18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 89e04ca62ae0..246e3cbe0b2c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3305,7 +3305,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 	int err;
 
 	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
-	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
+	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+					      &dev->port[port_num].roce.nb);
 	if (err) {
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 		return err;
@@ -3317,7 +3318,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	if (dev->port[port_num].roce.nb.notifier_call) {
-		unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
+		unregister_netdevice_notifier_net(mlx5_core_net(dev->mdev),
+						  &dev->port[port_num].roce.nb);
 		dev->port[port_num].roce.nb.notifier_call = NULL;
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index d046db7bb047..3a9fa629503f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -90,9 +90,4 @@ int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 			       u32 key_type, u32 *p_key_id);
 void mlx5_destroy_encryption_key(struct mlx5_core_dev *mdev, u32 key_id);
 
-static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
-{
-	return devlink_net(priv_to_devlink(dev));
-}
-
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c145de0473bc..3382855b7ef1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1209,4 +1209,22 @@ static inline bool mlx5_is_roce_enabled(struct mlx5_core_dev *dev)
 	return val.vbool;
 }
 
+/**
+ * mlx5_core_net - Provide net namespace of the mlx5_core_dev
+ * @dev: mlx5 core device
+ *
+ * mlx5_core_net() returns the net namespace of mlx5 core device.
+ * This can be called only in below described limited context.
+ * (a) When a devlink instance for mlx5_core is registered and
+ *     when devlink reload operation is disabled.
+ *     or
+ * (b) during devlink reload reload_down() and reload_up callbacks
+ *     where it is ensured that devlink instance's net namespace is
+ *     stable.
+ */
+static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
+{
+	return devlink_net(priv_to_devlink(dev));
+}
+
 #endif /* MLX5_DRIVER_H */
-- 
2.25.4
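
For readers following the diff, the effect of switching from the global notifier to a per-net one can be sketched with a small userspace model. This is only an illustration under stated assumptions: every `toy_*` name below is hypothetical and is not kernel API, and the model ignores locking entirely. It shows a notifier block registered on one namespace's chain, events from another namespace not reaching it, and the `notifier_call = NULL` reset after unregistering that the patch keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct notifier_block; not the kernel type. */
struct toy_notifier {
	int (*notifier_call)(struct toy_notifier *nb, unsigned long event);
	struct toy_notifier *next;
	int hits;	/* number of events delivered to this block */
};

/* Hypothetical stand-in for struct net: each namespace owns its own chain. */
struct toy_net {
	struct toy_notifier *chain;
};

/* Add nb to the chain of one namespace only. */
static int toy_register_notifier_net(struct toy_net *net,
				     struct toy_notifier *nb)
{
	if (!nb->notifier_call)
		return -1;
	nb->next = net->chain;
	net->chain = nb;
	return 0;
}

/* Remove nb from that namespace's chain; -1 if it was not registered. */
static int toy_unregister_notifier_net(struct toy_net *net,
				       struct toy_notifier *nb)
{
	struct toy_notifier **p;

	for (p = &net->chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Deliver an event to every block registered on this namespace. */
static void toy_raise_event(struct toy_net *net, unsigned long event)
{
	struct toy_notifier *nb;

	for (nb = net->chain; nb; nb = nb->next)
		nb->notifier_call(nb, event);
}

static int toy_count_event(struct toy_notifier *nb, unsigned long event)
{
	(void)event;
	nb->hits++;
	return 0;
}

/* Returns how many events reached a block registered only on "foo". */
static int toy_demo_hits(void)
{
	struct toy_net foo = { NULL }, other = { NULL };
	struct toy_notifier nb = { toy_count_event, NULL, 0 };
	int err;

	err = toy_register_notifier_net(&foo, &nb);
	assert(err == 0);
	(void)err;

	toy_raise_event(&foo, 1);	/* same namespace: delivered */
	toy_raise_event(&other, 1);	/* other namespace: not delivered */

	toy_unregister_notifier_net(&foo, &nb);
	nb.notifier_call = NULL;	/* same reset as in the patch */
	toy_raise_event(&foo, 1);	/* chain is empty: nothing to call */

	return nb.hits;
}
```

Calling `toy_demo_hits()` counts only the event raised on the namespace the block was registered with. The real `register_netdevice_notifier_net()` additionally takes the locks it needs; the commit message's point is that `pernet_ops_rwsem` is not among them.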



* RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-26 13:38 ` Parav Pandit
@ 2020-10-26 13:47   ` Parav Pandit
  0 siblings, 0 replies; 10+ messages in thread
From: Parav Pandit @ 2020-10-26 13:47 UTC (permalink / raw)
  To: Parav Pandit, dledford, Jason Gunthorpe
  Cc: jiri, linux-rdma, michaelgur, netdev, Saeed Mahameed


> From: Parav Pandit <parav@nvidia.com>
> Sent: Monday, October 26, 2020 7:09 PM

[..]

> 
> Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
I forgot to include Leon's Signed-off-by and to update the version here.

Resent with the correct header and sign-off at [1].

[1] https://lore.kernel.org/linux-rdma/20201026134359.23150-1-parav@nvidia.com/T/#u

> ---
> Changelog:
> v0->v1:
>  - updated comment for mlx5_core_net API to be used by multiple mlx5
>    drivers


* Re: [PATCH rdma-rc RESEND v1] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
  2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
@ 2020-10-26 22:25   ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-10-26 22:25 UTC (permalink / raw)
  To: Parav Pandit
  Cc: dledford, jiri, linux-rdma, michaelgur, netdev, saeedm, Leon Romanovsky

On Mon, Oct 26, 2020 at 03:43:59PM +0200, Parav Pandit wrote:
> When a mlx5 core devlink instance is reloaded in a different net
> namespace, its associated IB device is deleted and recreated.
> 
> Example sequence is:
> $ ip netns add foo
> $ devlink dev reload pci/0000:00:08.0 netns foo
> $ ip netns del foo
> 
> The mlx5 IB device needs to attach and detach the netdevice to it
> through the netdev notifier chain during the load and unload sequences.
> Below is the call graph of the unload flow.
> 
> cleanup_net()
>    down_read(&pernet_ops_rwsem); <- first sem acquired
>      ops_pre_exit_list()
>        pre_exit()
>          devlink_pernet_pre_exit()
>            devlink_reload()
>              mlx5_devlink_reload_down()
>                mlx5_unload_one()
>                [...]
>                  mlx5_ib_remove()
>                    mlx5_ib_unbind_slave_port()
>                      mlx5_remove_netdev_notifier()
>                        unregister_netdevice_notifier()
>                          down_write(&pernet_ops_rwsem);<- recursive lock
> 
> Hence, when a net namespace is deleted, the mlx5 reload results in a deadlock.
> 
> When the deadlock occurs, the devlink mutex is also held. This deadlocks not
> only the mlx5 device under reload, but also every process that attempts to
> access unrelated devlink devices.
> 
> Hence, fix this by making the mlx5 IB driver register a per-net netdev
> notifier instead of the global one; the per-net variant operates on the net
> namespace without holding the pernet_ops_rwsem.
> 
> Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Changelog:
> v0->v1:
>  - updated comment for mlx5_core_net API to be used by multiple mlx5
>    drivers
> ---
>  drivers/infiniband/hw/mlx5/main.c              |  6 ++++--
>  .../net/ethernet/mellanox/mlx5/core/lib/mlx5.h |  5 -----
>  include/linux/mlx5/driver.h                    | 18 ++++++++++++++++++
>  3 files changed, 22 insertions(+), 7 deletions(-)

Applied to for-rc, thanks

Jason
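
The recursive acquisition shown in the quoted call graph can be illustrated with a toy userspace model of a reader/writer semaphore. This is a sketch under stated assumptions, not the kernel's `struct rw_semaphore`: all `toy_*` names and `nested_write_would_deadlock()` are hypothetical, and trylock-style helpers are used so that the example reports the conflict instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore; NOT the kernel's struct rw_semaphore. */
struct toy_rwsem {
	int readers;	/* number of current read holders */
	bool writer;	/* true while a writer holds the semaphore */
};

/* Readers may share the semaphore as long as no writer holds it. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

/* A writer needs exclusive access: no readers and no other writer. */
static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/*
 * Mirrors the call graph: the read side is taken first (cleanup_net()),
 * then the same context asks for the write side
 * (unregister_netdevice_notifier()). Returns true if the write request
 * cannot be granted, which for a blocking down_write() means waiting on
 * a reader that is the caller itself.
 */
static bool nested_write_would_deadlock(void)
{
	struct toy_rwsem pernet_ops = { 0, false };
	bool blocked;

	toy_down_read_trylock(&pernet_ops);
	blocked = !toy_down_write_trylock(&pernet_ops);
	toy_up_read(&pernet_ops);

	return blocked;
}
```

In this model the write request is refused while the caller's own read hold is outstanding. With a blocking primitive, that refusal becomes an indefinite wait, which is the deadlock the patch avoids by not taking the write side on this path.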



Thread overview: 10+ messages
2020-10-19  5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
2020-10-19 13:07 ` Jason Gunthorpe
2020-10-19 13:23   ` Parav Pandit
2020-10-19 19:01     ` Jason Gunthorpe
2020-10-19 19:26       ` Parav Pandit
2020-10-20 11:41         ` Parav Pandit
2020-10-26 13:38 ` Parav Pandit
2020-10-26 13:47   ` Parav Pandit
2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
2020-10-26 22:25   ` Jason Gunthorpe
