All of lore.kernel.org
 help / color / mirror / Atom feed
From: Parav Pandit <parav@nvidia.com>
To: Parav Pandit <parav@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>, Jiri Pirko <jiri@mellanox.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Michael Guralnik <michaelgur@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>
Subject: RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
Date: Tue, 20 Oct 2020 11:41:50 +0000	[thread overview]
Message-ID: <BY5PR12MB432287BF4E79E6F453881735DC1F0@BY5PR12MB4322.namprd12.prod.outlook.com> (raw)
In-Reply-To: <BY5PR12MB43223A907782D48862D82246DC1E0@BY5PR12MB4322.namprd12.prod.outlook.com>

Hi Jason,

> From: Parav Pandit <parav@nvidia.com>
> Sent: Tuesday, October 20, 2020 12:57 AM
> 
> 
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, October 20, 2020 12:31 AM
> >
> > On Mon, Oct 19, 2020 at 01:23:23PM +0000, Parav Pandit wrote:
> > > > > -	err = register_netdevice_notifier(&dev-
> >port[port_num].roce.nb);
> > > > > +	err = register_netdevice_notifier_net(mlx5_core_net(dev-
> >mdev),
> > > > > +					      &dev-
> >port[port_num].roce.nb);
> > > >
> > > > This looks racy, what lock needs to be held to keep
> > > > *mlx5_core_net()
> > stable?
> > >
> > > mlx5_core_net() cannot be accessed outside of mlx5 driver's load,
> > > unload,
> > reload path.
> > >
> > > When this is getting executed, devlink cannot be executing reload.
> > > This is guarded by devlink_reload_enable/disable calls done by mlx5 core.
> >
> > A comment that devlink_reload_enable/disable() must be held would be
> > helpful
> >
> Yes. will add.
> 
> > > >
> > > > >  	if (err) {
> > > > >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> > > > >  		return err;
> > > > > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct
> > > > >mlx5_ib_dev *dev, u8 port_num)  static void
> > > > >mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8
> port_num)
> > {
> > > > >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > > > > -		unregister_netdevice_notifier(&dev-
> > > > >port[port_num].roce.nb);
> > > > > +
> 	unregister_netdevice_notifier_net(mlx5_core_net(dev-
> > > > >mdev),
> > > > > +						  &dev-
> > > > >port[port_num].roce.nb);
> > > >
> > > > This seems dangerous too, what if the mlx5_core_net changed before
> > > > we get here?
> > >
> > > When I inspected driver, code, I am not aware of any code flow where
> > > this can change before reaching here, because registration and
> > > unregistration is done only in driver load, unload and reload path.
> > > Reload can happen only after devlink_reload_enable() is done.
> >
> > But we enable reload right after init_one
> >
> > > > What are the rules for when devlink_net() changes?
> > > >
> > > devlink_net() changes only after unload() callback is completed in driver.
> >
> > You mean mlx5_devlink_reload_down ?
> >
> Right.
> > That seems OK then
> Ok. will work with Leon to add the comment.

Is below fix up ok?

commit 33cf8a09e735849f622e8084a7b08d421f11a4e1 (HEAD -> netns-del-fix)
Author: Parav Pandit <parav@nvidia.com>
Date:   Tue Oct 20 12:26:08 2020 +0300

    fixup: for RDMA/mlx5: Fix devlink deadlock on net namespace deletion

    Changelog:
    v0->v1:
     - Added kdoc comment description for the API usage and allowed context

    issue: 2230150
    Change-Id: Ibd233f771682c27565f48c54cd48fd87b0a7790f
    Signed-off-by: Parav Pandit <parav@nvidia.com>

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 560b551d5ff8..3382855b7ef1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1209,6 +1209,19 @@ static inline bool mlx5_is_roce_enabled(struct mlx5_core_dev *dev)
        return val.vbool;
 }

+/**
+ * mlx5_core_net - Provide net namespace of the mlx5_core_dev
+ * @dev: mlx5 core device
+ *
+ * mlx5_core_net() returns the net namespace of mlx5 core device.
+ * This can be called only in below described limited context.
+ * (a) When a devlink instance for mlx5_core is registered and
+ *     when devlink reload operation is disabled.
+ *     or
+ * (b) during devlink reload reload_down() and reload_up callbacks
+ *     where it is ensured that devlink instance's net namespace is
+ *     stable.
+ */
 static inline struct net *mlx5_core_net(struct mlx5_core_dev *dev)
 {
        return devlink_net(priv_to_devlink(dev));

  reply	other threads:[~2020-10-20 11:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19  5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
2020-10-19 13:07 ` Jason Gunthorpe
2020-10-19 13:23   ` Parav Pandit
2020-10-19 19:01     ` Jason Gunthorpe
2020-10-19 19:26       ` Parav Pandit
2020-10-20 11:41         ` Parav Pandit [this message]
2020-10-26 13:38 ` Parav Pandit
2020-10-26 13:47   ` Parav Pandit
2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
2020-10-26 22:25   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BY5PR12MB432287BF4E79E6F453881735DC1F0@BY5PR12MB4322.namprd12.prod.outlook.com \
    --to=parav@nvidia.com \
    --cc=dledford@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=jiri@mellanox.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=michaelgur@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.