linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rdma-next 00/10] IPoIB uninit
@ 2018-07-29  8:34 Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 01/10] IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN Leon Romanovsky
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

IP link was broken due to the changes in IPoIB for the rdma_netdev
support after commit cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks").

This patchset restores IPoIB pkey creation and removal using rtnetlink.

It is completely rewritten variant of
https://marc.info/?l=linux-rdma&m=151553425830918&w=2 patch series.

Thanks

Erez Shitrit (2):
  IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
  IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work

Jason Gunthorpe (8):
  IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN
  IB/ipoib: Move all uninit code into ndo_uninit
  IB/ipoib: Move init code to ndo_init
  RDMA/netdev: Use priv_destructor for netdev cleanup
  IB/ipoib: Get rid of the sysfs_mutex
  IB/ipoib: Do not remove child devices from within the ndo_uninit
  IB/ipoib: Maintain the child_intfs list from ndo_init/uninit
  IB/ipoib: Consolidate checking of the proposed child interface

 drivers/infiniband/hw/mlx5/main.c                  |  10 -
 drivers/infiniband/ulp/ipoib/ipoib.h               |  16 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c            |  14 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 419 ++++++++++++---------
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c       |  23 --
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c          | 259 +++++++------
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  37 +-
 include/linux/mlx5/driver.h                        |   3 -
 include/rdma/ib_verbs.h                            |   6 +-
 9 files changed, 428 insertions(+), 359 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 01/10] IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 02/10] IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task Leon Romanovsky
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

This essentially duplicates the netdev's reg_state, so just use that
directly. The reg_state is updated under the rntl_lock, and all places
using GOING_DOWN already acquire the rtnl_lock so checking is safe.

Since the only place we use GOING_DOWN is for the parent device this
does not fix any bugs, but it is a step to tidy up the unregister flow
so that after later patches the flow is uniform and sane.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 -
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |  9 ++++++---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  3 ---
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 18 ++++++++++++------
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index e255a7e5a4c3..9eebb705d994 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -95,7 +95,6 @@ enum {
 	IPOIB_NEIGH_TBL_FLUSH	  = 12,
 	IPOIB_FLAG_DEV_ADDR_SET	  = 13,
 	IPOIB_FLAG_DEV_ADDR_CTRL  = 14,
-	IPOIB_FLAG_GOING_DOWN	  = 15,
 
 	IPOIB_MAX_BACKOFF_SECONDS = 16,
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8b44f33c7ae0..83fa402e5d03 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1516,9 +1516,6 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 	int ret;
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
-	if (test_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags))
-		return -EPERM;
-
 	if (!mutex_trylock(&priv->sysfs_mutex))
 		return restart_syscall();
 
@@ -1527,6 +1524,12 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 		return restart_syscall();
 	}
 
+	if (dev->reg_state != NETREG_REGISTERED) {
+		rtnl_unlock();
+		mutex_unlock(&priv->sysfs_mutex);
+		return -EPERM;
+	}
+
 	ret = ipoib_set_mode(dev, buf);
 
 	/* The assumption is that the function ipoib_set_mode returned
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 82f0e3869b04..7ca9013bf05c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2406,9 +2406,6 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_workqueue(ipoib_workqueue);
 
-		/* mark interface in the middle of destruction */
-		set_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags);
-
 		rtnl_lock();
 		dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP);
 		rtnl_unlock();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index b067ad5e4c7e..1b7bfd500893 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -127,9 +127,6 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
 	ppriv = ipoib_priv(pdev);
 
-	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
-		return -EPERM;
-
 	snprintf(intf_name, sizeof(intf_name), "%s.%04x",
 		 ppriv->dev->name, pkey);
 
@@ -141,6 +138,12 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		return restart_syscall();
 	}
 
+	if (pdev->reg_state != NETREG_REGISTERED) {
+		rtnl_unlock();
+		mutex_unlock(&ppriv->sysfs_mutex);
+		return -EPERM;
+	}
+
 	if (!down_write_trylock(&ppriv->vlan_rwsem)) {
 		rtnl_unlock();
 		mutex_unlock(&ppriv->sysfs_mutex);
@@ -199,9 +202,6 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 
 	ppriv = ipoib_priv(pdev);
 
-	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
-		return -EPERM;
-
 	if (!mutex_trylock(&ppriv->sysfs_mutex))
 		return restart_syscall();
 
@@ -210,6 +210,12 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 		return restart_syscall();
 	}
 
+	if (pdev->reg_state != NETREG_REGISTERED) {
+		rtnl_unlock();
+		mutex_unlock(&ppriv->sysfs_mutex);
+		return -EPERM;
+	}
+
 	if (!down_write_trylock(&ppriv->vlan_rwsem)) {
 		rtnl_unlock();
 		mutex_unlock(&ppriv->sysfs_mutex);
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 02/10] IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 01/10] IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 03/10] IB/ipoib: Move all uninit code into ndo_uninit Leon Romanovsky
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Erez Shitrit <erezsh@mellanox.com>

The neigh_reap_task is self restarting, but so long as we call
cancel_delayed_work_sync() it will be guaranteed to not be running and
never start again. Thus we don't need to have the racy
IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes
calling flush_workqueue after the cancel.

This fixes a situation where the GC work could have been left running
in some rare situations.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7ca9013bf05c..7cd42619deb2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1661,7 +1661,7 @@ static void ipoib_neigh_hash_uninit(struct net_device *dev)
 	/* Stop GC if called at init fail need to cancel work */
 	stopped = test_and_set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
 	if (!stopped)
-		cancel_delayed_work(&priv->neigh_reap_task);
+		cancel_delayed_work_sync(&priv->neigh_reap_task);
 
 	ipoib_flush_neighs(priv);
 
@@ -1837,7 +1837,7 @@ void ipoib_dev_cleanup(struct net_device *dev)
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
 		/* Stop GC on child */
 		set_bit(IPOIB_STOP_NEIGH_GC, &cpriv->flags);
-		cancel_delayed_work(&cpriv->neigh_reap_task);
+		cancel_delayed_work_sync(&cpriv->neigh_reap_task);
 		unregister_netdevice_queue(cpriv->dev, &head);
 	}
 	unregister_netdevice_many(&head);
@@ -2346,7 +2346,7 @@ static struct net_device *ipoib_add_port(const char *format,
 	flush_workqueue(ipoib_workqueue);
 	/* Stop GC if started before flush */
 	set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-	cancel_delayed_work(&priv->neigh_reap_task);
+	cancel_delayed_work_sync(&priv->neigh_reap_task);
 	flush_workqueue(priv->wq);
 	ipoib_dev_cleanup(priv->dev);
 
@@ -2412,7 +2412,7 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 
 		/* Stop GC */
 		set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-		cancel_delayed_work(&priv->neigh_reap_task);
+		cancel_delayed_work_sync(&priv->neigh_reap_task);
 		flush_workqueue(priv->wq);
 
 		/* Wrap rtnl_lock/unlock with mutex to protect sysfs calls */
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 03/10] IB/ipoib: Move all uninit code into ndo_uninit
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 01/10] IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 02/10] IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 04/10] IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work Leon Romanovsky
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

Currently uninit is sometimes done twice in error flows, and is sprinkled
a bit all over the place.

Improve the clarity of the design by moving all uninit only into
ndo_uinit.

Some duplication is removed:
 - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but
   this duplicates the process in ipoib_neigh_hash_init
 - Flushing priv->wq was sometimes done before unregister,
   but that duplicates what has been done in ndo_uninit

Uniniting the IB event queue must remain before unregister_netdev as it
requires the RTNL lock to be dropped, this is moved to a helper to make
that flow really clear and remove some duplication in error flows.

If register_netdev fails (and ndo_init is NULL) then it almost always
calls ndo_uninit, which lets us remove all the extra code from the error
unwinds. The next patch in the series will close the 'almost always' hole
by pairing a proper ndo_init with ndo_uninit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 -
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 62 ++++++++++++++++---------------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |  5 +--
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 9eebb705d994..48e9ea50ca87 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -510,7 +510,6 @@ int ipoib_ib_dev_stop_default(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
-void ipoib_dev_cleanup(struct net_device *dev);
 
 void ipoib_mcast_join_task(struct work_struct *work);
 void ipoib_mcast_carrier_on_task(struct work_struct *work);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7cd42619deb2..4eec2781e83f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -215,11 +215,6 @@ static int ipoib_stop(struct net_device *dev)
 	return 0;
 }
 
-static void ipoib_uninit(struct net_device *dev)
-{
-	ipoib_dev_cleanup(dev);
-}
-
 static netdev_features_t ipoib_fix_features(struct net_device *dev, netdev_features_t features)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1826,7 +1821,33 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	return ret;
 }
 
-void ipoib_dev_cleanup(struct net_device *dev)
+/*
+ * This must be called before doing an unregister_netdev on a parent device to
+ * shutdown the IB event handler.
+ */
+static void ipoib_parent_unregister_pre(struct net_device *ndev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
+
+	/*
+	 * ipoib_set_mac checks netif_running before pushing work, clearing
+	 * running ensures the it will not add more work.
+	 */
+	rtnl_lock();
+	dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP);
+	rtnl_unlock();
+
+	/* ipoib_event() cannot be running once this returns */
+	ib_unregister_event_handler(&priv->event_handler);
+
+	/*
+	 * Work on the queue grabs the rtnl lock, so this cannot be done while
+	 * also holding it.
+	 */
+	flush_workqueue(ipoib_workqueue);
+}
+
+static void ipoib_ndo_uninit(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev), *cpriv, *tcpriv;
 	LIST_HEAD(head);
@@ -1899,7 +1920,7 @@ static const struct header_ops ipoib_header_ops = {
 };
 
 static const struct net_device_ops ipoib_netdev_ops_pf = {
-	.ndo_uninit		 = ipoib_uninit,
+	.ndo_uninit		 = ipoib_ndo_uninit,
 	.ndo_open		 = ipoib_open,
 	.ndo_stop		 = ipoib_stop,
 	.ndo_change_mtu		 = ipoib_change_mtu,
@@ -1918,7 +1939,7 @@ static const struct net_device_ops ipoib_netdev_ops_pf = {
 };
 
 static const struct net_device_ops ipoib_netdev_ops_vf = {
-	.ndo_uninit		 = ipoib_uninit,
+	.ndo_uninit		 = ipoib_ndo_uninit,
 	.ndo_open		 = ipoib_open,
 	.ndo_stop		 = ipoib_stop,
 	.ndo_change_mtu		 = ipoib_change_mtu,
@@ -2321,7 +2342,8 @@ static struct net_device *ipoib_add_port(const char *format,
 	if (result) {
 		pr_warn("%s: couldn't register ipoib port %d; error %d\n",
 			hca->name, port, result);
-		goto register_failed;
+		ipoib_parent_unregister_pre(priv->dev);
+		goto device_init_failed;
 	}
 
 	result = -ENOMEM;
@@ -2339,17 +2361,9 @@ static struct net_device *ipoib_add_port(const char *format,
 	return priv->dev;
 
 sysfs_failed:
+	ipoib_parent_unregister_pre(priv->dev);
 	unregister_netdev(priv->dev);
 
-register_failed:
-	ib_unregister_event_handler(&priv->event_handler);
-	flush_workqueue(ipoib_workqueue);
-	/* Stop GC if started before flush */
-	set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-	cancel_delayed_work_sync(&priv->neigh_reap_task);
-	flush_workqueue(priv->wq);
-	ipoib_dev_cleanup(priv->dev);
-
 device_init_failed:
 	rn = netdev_priv(priv->dev);
 	rn->free_rdma_netdev(priv->dev);
@@ -2403,17 +2417,7 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
 		struct rdma_netdev *parent_rn = netdev_priv(priv->dev);
 
-		ib_unregister_event_handler(&priv->event_handler);
-		flush_workqueue(ipoib_workqueue);
-
-		rtnl_lock();
-		dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP);
-		rtnl_unlock();
-
-		/* Stop GC */
-		set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-		cancel_delayed_work_sync(&priv->neigh_reap_task);
-		flush_workqueue(priv->wq);
+		ipoib_parent_unregister_pre(priv->dev);
 
 		/* Wrap rtnl_lock/unlock with mutex to protect sysfs calls */
 		mutex_lock(&priv->sysfs_mutex);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 1b7bfd500893..b62ab85c8ead 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -83,7 +83,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 	result = register_netdevice(priv->dev);
 	if (result) {
 		ipoib_warn(priv, "failed to initialize; error %i", result);
-		goto register_failed;
+		goto err;
 	}
 
 	/* RTNL childs don't need proprietary sysfs entries */
@@ -108,9 +108,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 	result = -ENOMEM;
 	unregister_netdevice(priv->dev);
 
-register_failed:
-	ipoib_dev_cleanup(priv->dev);
-
 err:
 	return result;
 }
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 04/10] IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (2 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 03/10] IB/ipoib: Move all uninit code into ndo_uninit Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 05/10] IB/ipoib: Move init code to ndo_init Leon Romanovsky
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Erez Shitrit <erezsh@mellanox.com>

The neigh_reap_task is self restarting, but so long as we call
cancel_delayed_work_sync() it will be guaranteed to not be running and
never start again. Thus there is no longer any purpose to
IPOIB_STOP_NEIGH_GC (which was racy/buggy anyhow), so remove it too.

This fixes a situation where the GC work could have been running

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 -
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 +++++++------------------
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 48e9ea50ca87..04fc5ad1b69f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -91,7 +91,6 @@ enum {
 	IPOIB_STOP_REAPER	  = 7,
 	IPOIB_FLAG_ADMIN_CM	  = 9,
 	IPOIB_FLAG_UMCAST	  = 10,
-	IPOIB_STOP_NEIGH_GC	  = 11,
 	IPOIB_NEIGH_TBL_FLUSH	  = 12,
 	IPOIB_FLAG_DEV_ADDR_SET	  = 13,
 	IPOIB_FLAG_DEV_ADDR_CTRL  = 14,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 4eec2781e83f..d4e9951dc539 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1305,9 +1305,6 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv *priv)
 	int i;
 	LIST_HEAD(remove_list);
 
-	if (test_bit(IPOIB_STOP_NEIGH_GC, &priv->flags))
-		return;
-
 	spin_lock_irqsave(&priv->lock, flags);
 
 	htbl = rcu_dereference_protected(ntbl->htbl,
@@ -1319,9 +1316,6 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv *priv)
 	/* neigh is obsolete if it was idle for two GC periods */
 	dt = 2 * arp_tbl.gc_interval;
 	neigh_obsolete = jiffies - dt;
-	/* handle possible race condition */
-	if (test_bit(IPOIB_STOP_NEIGH_GC, &priv->flags))
-		goto out_unlock;
 
 	for (i = 0; i < htbl->size; i++) {
 		struct ipoib_neigh *neigh;
@@ -1359,9 +1353,8 @@ static void ipoib_reap_neigh(struct work_struct *work)
 
 	__ipoib_reap_neigh(priv);
 
-	if (!test_bit(IPOIB_STOP_NEIGH_GC, &priv->flags))
-		queue_delayed_work(priv->wq, &priv->neigh_reap_task,
-				   arp_tbl.gc_interval);
+	queue_delayed_work(priv->wq, &priv->neigh_reap_task,
+			   arp_tbl.gc_interval);
 }
 
 
@@ -1523,7 +1516,6 @@ static int ipoib_neigh_hash_init(struct ipoib_dev_priv *priv)
 	htbl = kzalloc(sizeof(*htbl), GFP_KERNEL);
 	if (!htbl)
 		return -ENOMEM;
-	set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
 	size = roundup_pow_of_two(arp_tbl.gc_thresh3);
 	buckets = kvcalloc(size, sizeof(*buckets), GFP_KERNEL);
 	if (!buckets) {
@@ -1538,7 +1530,6 @@ static int ipoib_neigh_hash_init(struct ipoib_dev_priv *priv)
 	atomic_set(&ntbl->entries, 0);
 
 	/* start garbage collection */
-	clear_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
 	queue_delayed_work(priv->wq, &priv->neigh_reap_task,
 			   arp_tbl.gc_interval);
 
@@ -1648,15 +1639,11 @@ static void ipoib_flush_neighs(struct ipoib_dev_priv *priv)
 static void ipoib_neigh_hash_uninit(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
-	int stopped;
 
 	ipoib_dbg(priv, "ipoib_neigh_hash_uninit\n");
 	init_completion(&priv->ntbl.deleted);
 
-	/* Stop GC if called at init fail need to cancel work */
-	stopped = test_and_set_bit(IPOIB_STOP_NEIGH_GC, &priv->flags);
-	if (!stopped)
-		cancel_delayed_work_sync(&priv->neigh_reap_task);
+	cancel_delayed_work_sync(&priv->neigh_reap_task);
 
 	ipoib_flush_neighs(priv);
 
@@ -1796,12 +1783,15 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		if (ipoib_ib_dev_open(dev)) {
 			pr_warn("%s failed to open device\n", dev->name);
 			ret = -ENODEV;
-			goto out_dev_uninit;
+			goto out_hash_uninit;
 		}
 	}
 
 	return 0;
 
+out_hash_uninit:
+	ipoib_neigh_hash_uninit(dev);
+
 out_dev_uninit:
 	ipoib_ib_dev_cleanup(dev);
 
@@ -1857,7 +1847,6 @@ static void ipoib_ndo_uninit(struct net_device *dev)
 	/* Delete any child interfaces first */
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
 		/* Stop GC on child */
-		set_bit(IPOIB_STOP_NEIGH_GC, &cpriv->flags);
 		cancel_delayed_work_sync(&cpriv->neigh_reap_task);
 		unregister_netdevice_queue(cpriv->dev, &head);
 	}
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 05/10] IB/ipoib: Move init code to ndo_init
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (3 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 04/10] IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 06/10] RDMA/netdev: Use priv_destructor for netdev cleanup Leon Romanovsky
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

Now that we have a proper ndo_uninit, move code that naturally pairs
with the ndo_uninit into ndo_init. This allows the netdev core to natually
handle ordering.

This fixes the situation where register_netdev can fail before calling
ndo_init, in which case it wouldn't call ndo_uninit either.

Also move a bunch of duplicated init code that is shared between child
and parent for clarity. Now the child and parent register functions look
very similar.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h         |   3 -
 drivers/infiniband/ulp/ipoib/ipoib_main.c    | 193 +++++++++++++++------------
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |   6 -
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    |  31 +----
 4 files changed, 114 insertions(+), 119 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 04fc5ad1b69f..02ad1a60dc80 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -508,8 +508,6 @@ void ipoib_ib_dev_down(struct net_device *dev);
 int ipoib_ib_dev_stop_default(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
-int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
-
 void ipoib_mcast_join_task(struct work_struct *work);
 void ipoib_mcast_carrier_on_task(struct work_struct *work);
 void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb);
@@ -597,7 +595,6 @@ void ipoib_pkey_open(struct ipoib_dev_priv *priv);
 void ipoib_drain_cq(struct net_device *dev);
 
 void ipoib_set_ethtool_ops(struct net_device *dev);
-void ipoib_set_dev_features(struct ipoib_dev_priv *priv, struct ib_device *hca);
 
 #define IPOIB_FLAGS_RC		0x80
 #define IPOIB_FLAGS_UC		0x40
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index d4e9951dc539..67ab52eec3e9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1741,13 +1741,11 @@ static int ipoib_ioctl(struct net_device *dev, struct ifreq *ifr,
 	return priv->rn_ops->ndo_do_ioctl(dev, ifr, cmd);
 }
 
-int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+static int ipoib_dev_init(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret = -ENOMEM;
 
-	priv->ca = ca;
-	priv->port = port;
 	priv->qp = NULL;
 
 	/*
@@ -1763,7 +1761,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	/* create pd, which used both for control and datapath*/
 	priv->pd = ib_alloc_pd(priv->ca, 0);
 	if (IS_ERR(priv->pd)) {
-		pr_warn("%s: failed to allocate PD\n", ca->name);
+		pr_warn("%s: failed to allocate PD\n", priv->ca->name);
 		goto clean_wq;
 	}
 
@@ -1837,6 +1835,108 @@ static void ipoib_parent_unregister_pre(struct net_device *ndev)
 	flush_workqueue(ipoib_workqueue);
 }
 
+static void ipoib_set_dev_features(struct ipoib_dev_priv *priv)
+{
+	priv->hca_caps = priv->ca->attrs.device_cap_flags;
+
+	if (priv->hca_caps & IB_DEVICE_UD_IP_CSUM) {
+		priv->dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
+
+		if (priv->hca_caps & IB_DEVICE_UD_TSO)
+			priv->dev->hw_features |= NETIF_F_TSO;
+
+		priv->dev->features |= priv->dev->hw_features;
+	}
+}
+
+static int ipoib_parent_init(struct net_device *ndev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
+	struct ib_port_attr attr;
+	int result;
+
+	result = ib_query_port(priv->ca, priv->port, &attr);
+	if (result) {
+		pr_warn("%s: ib_query_port %d failed\n", priv->ca->name,
+			priv->port);
+		return result;
+	}
+	priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);
+
+	result = ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey);
+	if (result) {
+		pr_warn("%s: ib_query_pkey port %d failed (ret = %d)\n",
+			priv->ca->name, priv->port, result);
+		return result;
+	}
+
+	result = rdma_query_gid(priv->ca, priv->port, 0, &priv->local_gid);
+	if (result) {
+		pr_warn("%s: rdma_query_gid port %d failed (ret = %d)\n",
+			priv->ca->name, priv->port, result);
+		return result;
+	}
+	memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw,
+	       sizeof(union ib_gid));
+
+	SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
+	priv->dev->dev_id = priv->port - 1;
+
+	return 0;
+}
+
+static void ipoib_child_init(struct net_device *ndev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
+	struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
+
+	priv->max_ib_mtu = ppriv->max_ib_mtu;
+	set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
+	memcpy(priv->dev->dev_addr, ppriv->dev->dev_addr, INFINIBAND_ALEN);
+	memcpy(&priv->local_gid, &ppriv->local_gid, sizeof(priv->local_gid));
+}
+
+static int ipoib_ndo_init(struct net_device *ndev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
+	int rc;
+
+	if (priv->parent) {
+		ipoib_child_init(ndev);
+	} else {
+		rc = ipoib_parent_init(ndev);
+		if (rc)
+			return rc;
+	}
+
+	/* MTU will be reset when mcast join happens */
+	ndev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu);
+	priv->mcast_mtu = priv->admin_mtu = ndev->mtu;
+	ndev->max_mtu = IPOIB_CM_MTU;
+
+	ndev->neigh_priv_len = sizeof(struct ipoib_neigh);
+
+	/*
+	 * Set the full membership bit, so that we join the right
+	 * broadcast group, etc.
+	 */
+	priv->pkey |= 0x8000;
+
+	ndev->broadcast[8] = priv->pkey >> 8;
+	ndev->broadcast[9] = priv->pkey & 0xff;
+	set_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags);
+
+	ipoib_set_dev_features(priv);
+
+	rc = ipoib_dev_init(ndev);
+	if (rc) {
+		pr_warn("%s: failed to initialize device: %s port %d (ret = %d)\n",
+			priv->ca->name, priv->dev->name, priv->port, rc);
+	}
+
+	return 0;
+}
+
 static void ipoib_ndo_uninit(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev), *cpriv, *tcpriv;
@@ -1909,6 +2009,7 @@ static const struct header_ops ipoib_header_ops = {
 };
 
 static const struct net_device_ops ipoib_netdev_ops_pf = {
+	.ndo_init		 = ipoib_ndo_init,
 	.ndo_uninit		 = ipoib_ndo_uninit,
 	.ndo_open		 = ipoib_open,
 	.ndo_stop		 = ipoib_stop,
@@ -1928,6 +2029,7 @@ static const struct net_device_ops ipoib_netdev_ops_pf = {
 };
 
 static const struct net_device_ops ipoib_netdev_ops_vf = {
+	.ndo_init		 = ipoib_ndo_init,
 	.ndo_uninit		 = ipoib_ndo_uninit,
 	.ndo_open		 = ipoib_open,
 	.ndo_stop		 = ipoib_stop,
@@ -2054,6 +2156,9 @@ struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
 	if (!priv)
 		return NULL;
 
+	priv->ca = hca;
+	priv->port = port;
+
 	dev = ipoib_get_netdev(hca, port, name);
 	if (!dev)
 		goto free_priv;
@@ -2201,12 +2306,6 @@ static ssize_t create_child(struct device *dev,
 	if (pkey <= 0 || pkey > 0xffff || pkey == 0x8000)
 		return -EINVAL;
 
-	/*
-	 * Set the full membership bit, so that we join the right
-	 * broadcast group, etc.
-	 */
-	pkey |= 0x8000;
-
 	ret = ipoib_vlan_add(to_net_dev(dev), pkey);
 
 	return ret ? ret : count;
@@ -2238,86 +2337,17 @@ int ipoib_add_pkey_attr(struct net_device *dev)
 	return device_create_file(&dev->dev, &dev_attr_pkey);
 }
 
-void ipoib_set_dev_features(struct ipoib_dev_priv *priv, struct ib_device *hca)
-{
-	priv->hca_caps = hca->attrs.device_cap_flags;
-
-	if (priv->hca_caps & IB_DEVICE_UD_IP_CSUM) {
-		priv->dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
-
-		if (priv->hca_caps & IB_DEVICE_UD_TSO)
-			priv->dev->hw_features |= NETIF_F_TSO;
-
-		priv->dev->features |= priv->dev->hw_features;
-	}
-}
-
 static struct net_device *ipoib_add_port(const char *format,
 					 struct ib_device *hca, u8 port)
 {
 	struct ipoib_dev_priv *priv;
-	struct ib_port_attr attr;
 	struct rdma_netdev *rn;
-	int result = -ENOMEM;
+	int result;
 
 	priv = ipoib_intf_alloc(hca, port, format);
 	if (!priv) {
 		pr_warn("%s, %d: ipoib_intf_alloc failed\n", hca->name, port);
-		goto alloc_mem_failed;
-	}
-
-	SET_NETDEV_DEV(priv->dev, hca->dev.parent);
-	priv->dev->dev_id = port - 1;
-
-	result = ib_query_port(hca, port, &attr);
-	if (result) {
-		pr_warn("%s: ib_query_port %d failed\n", hca->name, port);
-		goto device_init_failed;
-	}
-
-	priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);
-
-	/* MTU will be reset when mcast join happens */
-	priv->dev->mtu  = IPOIB_UD_MTU(priv->max_ib_mtu);
-	priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
-	priv->dev->max_mtu = IPOIB_CM_MTU;
-
-	priv->dev->neigh_priv_len = sizeof(struct ipoib_neigh);
-
-	result = ib_query_pkey(hca, port, 0, &priv->pkey);
-	if (result) {
-		pr_warn("%s: ib_query_pkey port %d failed (ret = %d)\n",
-			hca->name, port, result);
-		goto device_init_failed;
-	}
-
-	ipoib_set_dev_features(priv, hca);
-
-	/*
-	 * Set the full membership bit, so that we join the right
-	 * broadcast group, etc.
-	 */
-	priv->pkey |= 0x8000;
-
-	priv->dev->broadcast[8] = priv->pkey >> 8;
-	priv->dev->broadcast[9] = priv->pkey & 0xff;
-
-	result = rdma_query_gid(hca, port, 0, &priv->local_gid);
-	if (result) {
-		pr_warn("%s: rdma_query_gid port %d failed (ret = %d)\n",
-			hca->name, port, result);
-		goto device_init_failed;
-	}
-
-	memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw,
-	       sizeof(union ib_gid));
-	set_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags);
-
-	result = ipoib_dev_init(priv->dev, hca, port);
-	if (result) {
-		pr_warn("%s: failed to initialize port %d (ret = %d)\n",
-			hca->name, port, result);
-		goto device_init_failed;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	INIT_IB_EVENT_HANDLER(&priv->event_handler,
@@ -2358,7 +2388,6 @@ static struct net_device *ipoib_add_port(const char *format,
 	rn->free_rdma_netdev(priv->dev);
 	kfree(priv);
 
-alloc_mem_failed:
 	return ERR_PTR(result);
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 3e44087935ae..a86928a80c08 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -125,12 +125,6 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	if (child_pkey == 0 || child_pkey == 0x8000)
 		return -EINVAL;
 
-	/*
-	 * Set the full membership bit, so that we join the right
-	 * broadcast group, etc.
-	 */
-	child_pkey |= 0x8000;
-
 	err = __ipoib_vlan_add(ppriv, ipoib_priv(dev),
 			       child_pkey, IPOIB_RTNL_CHILD);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index b62ab85c8ead..3103729a73fd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -55,35 +55,14 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 {
 	int result;
 
-	priv->max_ib_mtu = ppriv->max_ib_mtu;
-	/* MTU will be reset when mcast join happens */
-	priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
-	priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
 	priv->parent = ppriv->dev;
-	set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
-
-	ipoib_set_dev_features(priv, ppriv->ca);
-
 	priv->pkey = pkey;
-
-	memcpy(priv->dev->dev_addr, ppriv->dev->dev_addr, INFINIBAND_ALEN);
-	memcpy(&priv->local_gid, &ppriv->local_gid, sizeof(priv->local_gid));
-	set_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags);
-	priv->dev->broadcast[8] = pkey >> 8;
-	priv->dev->broadcast[9] = pkey & 0xff;
-
-	result = ipoib_dev_init(priv->dev, ppriv->ca, ppriv->port);
-	if (result < 0) {
-		ipoib_warn(ppriv, "failed to initialize subinterface: "
-			   "device %s, port %d",
-			   ppriv->ca->name, ppriv->port);
-		goto err;
-	}
+	priv->child_type = type;
 
 	result = register_netdevice(priv->dev);
 	if (result) {
 		ipoib_warn(priv, "failed to initialize; error %i", result);
-		goto err;
+		return result;
 	}
 
 	/* RTNL childs don't need proprietary sysfs entries */
@@ -99,17 +78,13 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 			goto sysfs_failed;
 	}
 
-	priv->child_type  = type;
 	list_add_tail(&priv->list, &ppriv->child_intfs);
 
 	return 0;
 
 sysfs_failed:
-	result = -ENOMEM;
 	unregister_netdevice(priv->dev);
-
-err:
-	return result;
+	return -ENOMEM;
 }
 
 int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 06/10] RDMA/netdev: Use priv_destructor for netdev cleanup
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (4 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 05/10] IB/ipoib: Move init code to ndo_init Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 07/10] IB/ipoib: Get rid of the sysfs_mutex Leon Romanovsky
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                  |  10 --
 drivers/infiniband/ulp/ipoib/ipoib.h               |   2 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 101 +++++++++++++--------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c          |  68 ++++++++------
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  37 ++++----
 include/linux/mlx5/driver.h                        |   3 -
 include/rdma/ib_verbs.h                            |   6 +-
 7 files changed, 129 insertions(+), 98 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a93ef6367d38..9011187ca081 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5157,11 +5157,6 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 	return num_counters;
 }

-static void mlx5_ib_free_rdma_netdev(struct net_device *netdev)
-{
-	return mlx5_rdma_netdev_free(netdev);
-}
-
 static struct net_device*
 mlx5_ib_alloc_rdma_netdev(struct ib_device *hca,
 			  u8 port_num,
@@ -5171,17 +5166,12 @@ mlx5_ib_alloc_rdma_netdev(struct ib_device *hca,
 			  void (*setup)(struct net_device *))
 {
 	struct net_device *netdev;
-	struct rdma_netdev *rn;

 	if (type != RDMA_NETDEV_IPOIB)
 		return ERR_PTR(-EOPNOTSUPP);

 	netdev = mlx5_rdma_netdev_alloc(to_mdev(hca)->mdev, hca,
 					name, setup);
-	if (likely(!IS_ERR_OR_NULL(netdev))) {
-		rn = netdev_priv(netdev);
-		rn->free_rdma_netdev = mlx5_ib_free_rdma_netdev;
-	}
 	return netdev;
 }

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 02ad1a60dc80..d2cb0a8500e3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -323,6 +323,7 @@ struct ipoib_dev_priv {
 	spinlock_t lock;

 	struct net_device *dev;
+	void (*next_priv_destructor)(struct net_device *dev);

 	struct napi_struct send_napi;
 	struct napi_struct recv_napi;
@@ -481,6 +482,7 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 	kref_put(&ah->ref, ipoib_free_ah);
 }
 int ipoib_open(struct net_device *dev);
+void ipoib_intf_free(struct net_device *dev);
 int ipoib_add_pkey_attr(struct net_device *dev);
 int ipoib_add_umcast_attr(struct net_device *dev);

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 67ab52eec3e9..73d917d57f93 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2062,6 +2062,13 @@ void ipoib_setup_common(struct net_device *dev)
 	netif_keep_dst(dev);

 	memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN);
+
+	/*
+	 * unregister_netdev always frees the netdev, we use this mode
+	 * consistently to unify all the various unregister paths, including
+	 * those connected to rtnl_link_ops which require it.
+	 */
+	dev->needs_free_netdev = true;
 }

 static void ipoib_build_priv(struct net_device *dev)
@@ -2116,9 +2123,7 @@ static struct net_device
 	rn->send = ipoib_send;
 	rn->attach_mcast = ipoib_mcast_attach;
 	rn->detach_mcast = ipoib_mcast_detach;
-	rn->free_rdma_netdev = free_netdev;
 	rn->hca = hca;
-
 	dev->netdev_ops = &ipoib_netdev_default_pf;

 	return dev;
@@ -2173,6 +2178,15 @@ struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,

 	rn = netdev_priv(dev);
 	rn->clnt_priv = priv;
+
+	/*
+	 * Only the child register_netdev flows can handle priv_destructor
+	 * being set, so we force it to NULL here and handle manually until it
+	 * is safe to turn on.
+	 */
+	priv->next_priv_destructor = dev->priv_destructor;
+	dev->priv_destructor = NULL;
+
 	ipoib_build_priv(dev);

 	return priv;
@@ -2181,6 +2195,27 @@ struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
 	return NULL;
 }

+void ipoib_intf_free(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
+
+	dev->priv_destructor = priv->next_priv_destructor;
+	if (dev->priv_destructor)
+		dev->priv_destructor(dev);
+
+	/*
+	 * There are some error flows around register_netdev failing that may
+	 * attempt to call priv_destructor twice, prevent that from happening.
+	 */
+	dev->priv_destructor = NULL;
+
+	/* unregister/destroy is very complicated. Make bugs more obvious. */
+	rn->clnt_priv = NULL;
+
+	kfree(priv);
+}
+
 static ssize_t show_pkey(struct device *dev,
 			 struct device_attribute *attr, char *buf)
 {
@@ -2341,7 +2376,7 @@ static struct net_device *ipoib_add_port(const char *format,
 					 struct ib_device *hca, u8 port)
 {
 	struct ipoib_dev_priv *priv;
-	struct rdma_netdev *rn;
+	struct net_device *ndev;
 	int result;

 	priv = ipoib_intf_alloc(hca, port, format);
@@ -2349,6 +2384,7 @@ static struct net_device *ipoib_add_port(const char *format,
 		pr_warn("%s, %d: ipoib_intf_alloc failed\n", hca->name, port);
 		return ERR_PTR(-ENOMEM);
 	}
+	ndev = priv->dev;

 	INIT_IB_EVENT_HANDLER(&priv->event_handler,
 			      priv->ca, ipoib_event);
@@ -2357,38 +2393,43 @@ static struct net_device *ipoib_add_port(const char *format,
 	/* call event handler to ensure pkey in sync */
 	queue_work(ipoib_workqueue, &priv->flush_heavy);

-	result = register_netdev(priv->dev);
+	result = register_netdev(ndev);
 	if (result) {
 		pr_warn("%s: couldn't register ipoib port %d; error %d\n",
 			hca->name, port, result);
-		ipoib_parent_unregister_pre(priv->dev);
-		goto device_init_failed;
+
+		ipoib_parent_unregister_pre(ndev);
+		ipoib_intf_free(ndev);
+		free_netdev(ndev);
+
+		return ERR_PTR(result);
 	}

-	result = -ENOMEM;
-	if (ipoib_cm_add_mode_attr(priv->dev))
+	/*
+	 * We cannot set priv_destructor before register_netdev because we
+	 * need priv to be always valid during the error flow to execute
+	 * ipoib_parent_unregister_pre(). Instead handle it manually and only
+	 * enter priv_destructor mode once we are completely registered.
+	 */
+	ndev->priv_destructor = ipoib_intf_free;
+
+	if (ipoib_cm_add_mode_attr(ndev))
 		goto sysfs_failed;
-	if (ipoib_add_pkey_attr(priv->dev))
+	if (ipoib_add_pkey_attr(ndev))
 		goto sysfs_failed;
-	if (ipoib_add_umcast_attr(priv->dev))
+	if (ipoib_add_umcast_attr(ndev))
 		goto sysfs_failed;
-	if (device_create_file(&priv->dev->dev, &dev_attr_create_child))
+	if (device_create_file(&ndev->dev, &dev_attr_create_child))
 		goto sysfs_failed;
-	if (device_create_file(&priv->dev->dev, &dev_attr_delete_child))
+	if (device_create_file(&ndev->dev, &dev_attr_delete_child))
 		goto sysfs_failed;

-	return priv->dev;
+	return ndev;

 sysfs_failed:
-	ipoib_parent_unregister_pre(priv->dev);
-	unregister_netdev(priv->dev);
-
-device_init_failed:
-	rn = netdev_priv(priv->dev);
-	rn->free_rdma_netdev(priv->dev);
-	kfree(priv);
-
-	return ERR_PTR(result);
+	ipoib_parent_unregister_pre(ndev);
+	unregister_netdev(ndev);
+	return ERR_PTR(-ENOMEM);
 }

 static void ipoib_add_one(struct ib_device *device)
@@ -2426,33 +2467,19 @@ static void ipoib_add_one(struct ib_device *device)

 static void ipoib_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ipoib_dev_priv *priv, *tmp, *cpriv, *tcpriv;
+	struct ipoib_dev_priv *priv, *tmp;
 	struct list_head *dev_list = client_data;

 	if (!dev_list)
 		return;

 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
-		struct rdma_netdev *parent_rn = netdev_priv(priv->dev);
-
 		ipoib_parent_unregister_pre(priv->dev);

 		/* Wrap rtnl_lock/unlock with mutex to protect sysfs calls */
 		mutex_lock(&priv->sysfs_mutex);
 		unregister_netdev(priv->dev);
 		mutex_unlock(&priv->sysfs_mutex);
-
-		parent_rn->free_rdma_netdev(priv->dev);
-
-		list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
-			struct rdma_netdev *child_rn;
-
-			child_rn = netdev_priv(cpriv->dev);
-			child_rn->free_rdma_netdev(cpriv->dev);
-			kfree(cpriv);
-		}
-
-		kfree(priv);
 	}

 	kfree(dev_list);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 3103729a73fd..7776334cf8c5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -50,31 +50,53 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(parent, S_IRUGO, show_parent, NULL);

+/*
+ * NOTE: If this function fails then the priv->dev will remain valid, however
+ * priv can have been freed and must not be touched by caller in the error
+ * case.
+ *
+ * If (ndev->reg_state == NETREG_UNINITIALIZED) then it is up to the caller to
+ * free the net_device (just as rtnl_newlink does) otherwise the net_device
+ * will be freed when the rtnl is unlocked.
+ */
 int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 		     u16 pkey, int type)
 {
+	struct net_device *ndev = priv->dev;
 	int result;

+	ASSERT_RTNL();
+
 	priv->parent = ppriv->dev;
 	priv->pkey = pkey;
 	priv->child_type = type;

-	result = register_netdevice(priv->dev);
+	/* We do not need to touch priv if register_netdevice fails */
+	ndev->priv_destructor = ipoib_intf_free;
+
+	result = register_netdevice(ndev);
 	if (result) {
 		ipoib_warn(priv, "failed to initialize; error %i", result);
+
+		/*
+		 * register_netdevice sometimes calls priv_destructor,
+		 * sometimes not. Make sure it was done.
+		 */
+		if (ndev->priv_destructor)
+			ndev->priv_destructor(ndev);
 		return result;
 	}

 	/* RTNL childs don't need proprietary sysfs entries */
 	if (type == IPOIB_LEGACY_CHILD) {
-		if (ipoib_cm_add_mode_attr(priv->dev))
+		if (ipoib_cm_add_mode_attr(ndev))
 			goto sysfs_failed;
-		if (ipoib_add_pkey_attr(priv->dev))
+		if (ipoib_add_pkey_attr(ndev))
 			goto sysfs_failed;
-		if (ipoib_add_umcast_attr(priv->dev))
+		if (ipoib_add_umcast_attr(ndev))
 			goto sysfs_failed;

-		if (device_create_file(&priv->dev->dev, &dev_attr_parent))
+		if (device_create_file(&ndev->dev, &dev_attr_parent))
 			goto sysfs_failed;
 	}

@@ -91,6 +113,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 {
 	struct ipoib_dev_priv *ppriv, *priv;
 	char intf_name[IFNAMSIZ];
+	struct net_device *ndev;
 	struct ipoib_dev_priv *tpriv;
 	int result;

@@ -122,12 +145,6 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		return restart_syscall();
 	}

-	priv = ipoib_intf_alloc(ppriv->ca, ppriv->port, intf_name);
-	if (!priv) {
-		result = -ENOMEM;
-		goto out;
-	}
-
 	/*
 	 * First ensure this isn't a duplicate. We check the parent device and
 	 * then all of the legacy child interfaces to make sure the Pkey
@@ -146,21 +163,23 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		}
 	}

+	priv = ipoib_intf_alloc(ppriv->ca, ppriv->port, intf_name);
+	if (!priv) {
+		result = -ENOMEM;
+		goto out;
+	}
+	ndev = priv->dev;
+
 	result = __ipoib_vlan_add(ppriv, priv, pkey, IPOIB_LEGACY_CHILD);

+	if (result && ndev->reg_state == NETREG_UNINITIALIZED)
+		free_netdev(ndev);
+
 out:
 	up_write(&ppriv->vlan_rwsem);
 	rtnl_unlock();
 	mutex_unlock(&ppriv->sysfs_mutex);

-	if (result && priv) {
-		struct rdma_netdev *rn;
-
-		rn = netdev_priv(priv->dev);
-		rn->free_rdma_netdev(priv->dev);
-		kfree(priv);
-	}
-
 	return result;
 }

@@ -212,14 +231,5 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 	rtnl_unlock();
 	mutex_unlock(&ppriv->sysfs_mutex);

-	if (dev) {
-		struct rdma_netdev *rn;
-
-		rn = netdev_priv(dev);
-		rn->free_rdma_netdev(priv->dev);
-		kfree(priv);
-		return 0;
-	}
-
-	return -ENODEV;
+	return (dev) ? 0 : -ENODEV;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index af3bb2f7a504..b8d150d2fd72 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -580,6 +580,22 @@ static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 	return 0;
 }

+static void mlx5_rdma_netdev_free(struct net_device *netdev)
+{
+	struct mlx5e_priv *priv = mlx5i_epriv(netdev);
+	struct mlx5i_priv *ipriv = priv->ppriv;
+	const struct mlx5e_profile *profile = priv->profile;
+
+	mlx5e_detach_netdev(priv);
+	profile->cleanup(priv);
+	destroy_workqueue(priv->wq);
+
+	if (!ipriv->sub_interface) {
+		mlx5i_pkey_qpn_ht_cleanup(netdev);
+		mlx5e_destroy_mdev_resources(priv->mdev);
+	}
+}
+
 struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 					  struct ib_device *ibdev,
 					  const char *name,
@@ -653,6 +669,9 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 	rn->detach_mcast = mlx5i_detach_mcast;
 	rn->set_id = mlx5i_set_pkey_index;

+	netdev->priv_destructor = mlx5_rdma_netdev_free;
+	netdev->needs_free_netdev = 1;
+
 	return netdev;

 destroy_ht:
@@ -665,21 +684,3 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 	return NULL;
 }
 EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
-
-void mlx5_rdma_netdev_free(struct net_device *netdev)
-{
-	struct mlx5e_priv *priv = mlx5i_epriv(netdev);
-	struct mlx5i_priv *ipriv = priv->ppriv;
-	const struct mlx5e_profile *profile = priv->profile;
-
-	mlx5e_detach_netdev(priv);
-	profile->cleanup(priv);
-	destroy_workqueue(priv->wq);
-
-	if (!ipriv->sub_interface) {
-		mlx5i_pkey_qpn_ht_cleanup(netdev);
-		mlx5e_destroy_mdev_resources(priv->mdev);
-	}
-	free_netdev(netdev);
-}
-EXPORT_SYMBOL(mlx5_rdma_netdev_free);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0c1beb14837b..789489e3aa76 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1218,14 +1218,11 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
-
-static inline void mlx5_rdma_netdev_free(struct net_device *netdev) {}
 #else
 struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 					  struct ib_device *ibdev,
 					  const char *name,
 					  void (*setup)(struct net_device *));
-void mlx5_rdma_netdev_free(struct net_device *netdev);
 #endif /* CONFIG_MLX5_CORE_IPOIB */

 struct mlx5_profile {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 42cbf8eabe9d..73c780c99482 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2202,7 +2202,11 @@ struct rdma_netdev {
 	struct ib_device  *hca;
 	u8                 port_num;

-	/* cleanup function must be specified */
+	/*
+	 * cleanup function must be specified.
+	 * FIXME: This is only used for OPA_VNIC and that usage should be
+	 * removed too.
+	 */
 	void (*free_rdma_netdev)(struct net_device *netdev);

 	/* control functions */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 07/10] IB/ipoib: Get rid of the sysfs_mutex
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (5 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 06/10] RDMA/netdev: Use priv_destructor for netdev cleanup Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 08/10] IB/ipoib: Do not remove child devices from within the ndo_uninit Leon Romanovsky
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

This mutex was introduced to deal with the deadlock formed by calling
unregister_netdev from within the sysfs callback of a netdev.

Now that we have priv_destructor and needs_free_netdev we can switch
to the more targeted solution of running the unregister from a
work queue. This avoids the deadlock and gets rid of the mutex.

The next patch in the series needs this mutex eliminated to create
atomicity of unregisteration.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 -
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |  7 ---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  7 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 98 ++++++++++++++++++++-----------
 4 files changed, 65 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d2cb0a8500e3..804cb4bee57d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -332,7 +332,6 @@ struct ipoib_dev_priv {
 
 	struct rw_semaphore vlan_rwsem;
 	struct mutex mcast_mutex;
-	struct mutex sysfs_mutex;
 
 	struct rb_root  path_tree;
 	struct list_head path_list;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 83fa402e5d03..04785d3f8195 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1514,19 +1514,13 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 {
 	struct net_device *dev = to_net_dev(d);
 	int ret;
-	struct ipoib_dev_priv *priv = ipoib_priv(dev);
-
-	if (!mutex_trylock(&priv->sysfs_mutex))
-		return restart_syscall();
 
 	if (!rtnl_trylock()) {
-		mutex_unlock(&priv->sysfs_mutex);
 		return restart_syscall();
 	}
 
 	if (dev->reg_state != NETREG_REGISTERED) {
 		rtnl_unlock();
-		mutex_unlock(&priv->sysfs_mutex);
 		return -EPERM;
 	}
 
@@ -1538,7 +1532,6 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 	 */
 	if (ret != -EBUSY)
 		rtnl_unlock();
-	mutex_unlock(&priv->sysfs_mutex);
 
 	return (!ret || ret == -EBUSY) ? count : ret;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 73d917d57f93..e9f4f261fe20 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2079,7 +2079,6 @@ static void ipoib_build_priv(struct net_device *dev)
 	spin_lock_init(&priv->lock);
 	init_rwsem(&priv->vlan_rwsem);
 	mutex_init(&priv->mcast_mutex);
-	mutex_init(&priv->sysfs_mutex);
 
 	INIT_LIST_HEAD(&priv->path_list);
 	INIT_LIST_HEAD(&priv->child_intfs);
@@ -2476,10 +2475,7 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
 		ipoib_parent_unregister_pre(priv->dev);
 
-		/* Wrap rtnl_lock/unlock with mutex to protect sysfs calls */
-		mutex_lock(&priv->sysfs_mutex);
 		unregister_netdev(priv->dev);
-		mutex_unlock(&priv->sysfs_mutex);
 	}
 
 	kfree(dev_list);
@@ -2527,8 +2523,7 @@ static int __init ipoib_init_module(void)
 	 * its private workqueue, and we only queue up flush events
 	 * on our global flush workqueue.  This avoids the deadlocks.
 	 */
-	ipoib_workqueue = alloc_ordered_workqueue("ipoib_flush",
-						  WQ_MEM_RECLAIM);
+	ipoib_workqueue = alloc_ordered_workqueue("ipoib_flush", 0);
 	if (!ipoib_workqueue) {
 		ret = -ENOMEM;
 		goto err_fs;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 7776334cf8c5..891c5b40018a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -125,23 +125,16 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	snprintf(intf_name, sizeof(intf_name), "%s.%04x",
 		 ppriv->dev->name, pkey);
 
-	if (!mutex_trylock(&ppriv->sysfs_mutex))
+	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (!rtnl_trylock()) {
-		mutex_unlock(&ppriv->sysfs_mutex);
-		return restart_syscall();
-	}
-
 	if (pdev->reg_state != NETREG_REGISTERED) {
 		rtnl_unlock();
-		mutex_unlock(&ppriv->sysfs_mutex);
 		return -EPERM;
 	}
 
 	if (!down_write_trylock(&ppriv->vlan_rwsem)) {
 		rtnl_unlock();
-		mutex_unlock(&ppriv->sysfs_mutex);
 		return restart_syscall();
 	}
 
@@ -178,58 +171,95 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 out:
 	up_write(&ppriv->vlan_rwsem);
 	rtnl_unlock();
-	mutex_unlock(&ppriv->sysfs_mutex);
 
 	return result;
 }
 
+struct ipoib_vlan_delete_work {
+	struct work_struct work;
+	struct net_device *dev;
+};
+
+/*
+ * sysfs callbacks of a netdevice cannot obtain the rtnl lock as
+ * unregister_netdev ultimately deletes the sysfs files while holding the rtnl
+ * lock. This deadlocks the system.
+ *
+ * A callback can use rtnl_trylock to avoid the deadlock but it cannot call
+ * unregister_netdev as that internally takes and releases the rtnl_lock.  So
+ * instead we find the netdev to unregister and then do the actual unregister
+ * from the global work queue where we can obtain the rtnl_lock safely.
+ */
+static void ipoib_vlan_delete_task(struct work_struct *work)
+{
+	struct ipoib_vlan_delete_work *pwork =
+		container_of(work, struct ipoib_vlan_delete_work, work);
+	struct net_device *dev = pwork->dev;
+
+	rtnl_lock();
+
+	/* Unregistering tasks can race with another task or parent removal */
+	if (dev->reg_state == NETREG_REGISTERED) {
+		struct ipoib_dev_priv *priv = ipoib_priv(dev);
+		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
+
+		down_write(&ppriv->vlan_rwsem);
+		list_del(&priv->list);
+		up_write(&ppriv->vlan_rwsem);
+
+		ipoib_dbg(ppriv, "delete child vlan %s\n", dev->name);
+		unregister_netdevice(dev);
+	}
+
+	rtnl_unlock();
+
+	kfree(pwork);
+}
+
 int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 {
 	struct ipoib_dev_priv *ppriv, *priv, *tpriv;
-	struct net_device *dev = NULL;
+	int rc;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
-	ppriv = ipoib_priv(pdev);
-
-	if (!mutex_trylock(&ppriv->sysfs_mutex))
+	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (!rtnl_trylock()) {
-		mutex_unlock(&ppriv->sysfs_mutex);
-		return restart_syscall();
-	}
-
 	if (pdev->reg_state != NETREG_REGISTERED) {
 		rtnl_unlock();
-		mutex_unlock(&ppriv->sysfs_mutex);
 		return -EPERM;
 	}
 
-	if (!down_write_trylock(&ppriv->vlan_rwsem)) {
-		rtnl_unlock();
-		mutex_unlock(&ppriv->sysfs_mutex);
-		return restart_syscall();
-	}
+	ppriv = ipoib_priv(pdev);
 
+	rc = -ENODEV;
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
 		if (priv->pkey == pkey &&
 		    priv->child_type == IPOIB_LEGACY_CHILD) {
-			list_del(&priv->list);
-			dev = priv->dev;
+			struct ipoib_vlan_delete_work *work;
+
+			work = kmalloc(sizeof(*work), GFP_KERNEL);
+			if (!work) {
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			down_write(&ppriv->vlan_rwsem);
+			list_del_init(&priv->list);
+			up_write(&ppriv->vlan_rwsem);
+			work->dev = priv->dev;
+			INIT_WORK(&work->work, ipoib_vlan_delete_task);
+			queue_work(ipoib_workqueue, &work->work);
+
+			rc = 0;
 			break;
 		}
 	}
-	up_write(&ppriv->vlan_rwsem);
-
-	if (dev) {
-		ipoib_dbg(ppriv, "delete child vlan %s\n", dev->name);
-		unregister_netdevice(dev);
-	}
 
+out:
 	rtnl_unlock();
-	mutex_unlock(&ppriv->sysfs_mutex);
 
-	return (dev) ? 0 : -ENODEV;
+	return rc;
 }
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 08/10] IB/ipoib: Do not remove child devices from within the ndo_uninit
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (6 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 07/10] IB/ipoib: Get rid of the sysfs_mutex Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:34 ` [PATCH rdma-next 09/10] IB/ipoib: Maintain the child_intfs list from ndo_init/uninit Leon Romanovsky
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

Switching to priv_destructor and needs_free_netdev created a subtle
ordering problem in ipoib_remove_one.

Now that unregister_netdev frees the netdev and priv we must ensure that
the children are unregistered before trying to unregister the parent,
or child unregister will use after free.

The solution is to unregister the children, then parent, in the same batch
all while holding the rtnl_lock. This closes all the races where a new
child could have been added and ensures proper ordering.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  7 +++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 28 +++++++++++++++++-----------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |  6 ++++++
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 804cb4bee57d..1abe3c62f106 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -330,6 +330,13 @@ struct ipoib_dev_priv {
 
 	unsigned long flags;
 
+	/*
+	 * This protects access to the child_intfs list.
+	 * To READ from child_intfs the RTNL or vlan_rwsem read side must be
+	 * held.  To WRITE RTNL and the vlan_rwsem write side must be held (in
+	 * that order) This lock exists because we have a few contexts where
+	 * we need the child_intfs, but do not want to grab the RTNL.
+	 */
 	struct rw_semaphore vlan_rwsem;
 	struct mutex mcast_mutex;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index e9f4f261fe20..b2fe23d60103 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1939,18 +1939,15 @@ static int ipoib_ndo_init(struct net_device *ndev)
 
 static void ipoib_ndo_uninit(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = ipoib_priv(dev), *cpriv, *tcpriv;
-	LIST_HEAD(head);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ASSERT_RTNL();
 
-	/* Delete any child interfaces first */
-	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
-		/* Stop GC on child */
-		cancel_delayed_work_sync(&cpriv->neigh_reap_task);
-		unregister_netdevice_queue(cpriv->dev, &head);
-	}
-	unregister_netdevice_many(&head);
+	/*
+	 * ipoib_remove_one guarantees the children are removed before the
+	 * parent, and that is the only place where a parent can be removed.
+	 */
+	WARN_ON(!list_empty(&priv->child_intfs));
 
 	ipoib_neigh_hash_uninit(dev);
 
@@ -2466,16 +2463,25 @@ static void ipoib_add_one(struct ib_device *device)
 
 static void ipoib_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ipoib_dev_priv *priv, *tmp;
+	struct ipoib_dev_priv *priv, *tmp, *cpriv, *tcpriv;
 	struct list_head *dev_list = client_data;
 
 	if (!dev_list)
 		return;
 
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
+		LIST_HEAD(head);
 		ipoib_parent_unregister_pre(priv->dev);
 
-		unregister_netdev(priv->dev);
+		rtnl_lock();
+
+		list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs,
+					 list)
+			unregister_netdevice_queue(cpriv->dev, &head);
+		unregister_netdevice_queue(priv->dev, &head);
+		unregister_netdevice_many(&head);
+
+		rtnl_unlock();
 	}
 
 	kfree(dev_list);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 891c5b40018a..fa4dfcee2644 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -67,6 +67,12 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 
 	ASSERT_RTNL();
 
+	/*
+	 * Racing with unregister of the parent must be prevented by the
+	 * caller.
+	 */
+	WARN_ON(ppriv->dev->reg_state != NETREG_REGISTERED);
+
 	priv->parent = ppriv->dev;
 	priv->pkey = pkey;
 	priv->child_type = type;
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 09/10] IB/ipoib: Maintain the child_intfs list from ndo_init/uninit
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (7 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 08/10] IB/ipoib: Do not remove child devices from within the ndo_uninit Leon Romanovsky
@ 2018-07-29  8:34 ` Leon Romanovsky
  2018-07-29  8:35 ` [PATCH rdma-next 10/10] IB/ipoib: Consolidate checking of the proposed child interface Leon Romanovsky
  2018-08-03  2:31 ` [PATCH rdma-next 00/10] IPoIB uninit Jason Gunthorpe
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

This fixes a bug in the netlink path where the vlan_rwsem was not
held around __ipoib_vlan_add causing the child_intfs to be manipulated
unsafely.

In the process this greatly simplifies the vlan_rwsem write side locking
to only cover a single non-sleeping statement.

This also further increases the safety of the removal ordering by holding
the netdev of the parent while the child is active to ensure most bugs
become either an oops on a NULL priv or a deadlock on the netdev refcount.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c    | 16 ++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 14 --------------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    | 12 ------------
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b2fe23d60103..e3d28f9ad9c0 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1890,6 +1890,12 @@ static void ipoib_child_init(struct net_device *ndev)
 	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
 	struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
 
+	dev_hold(priv->parent);
+
+	down_write(&ppriv->vlan_rwsem);
+	list_add_tail(&priv->list, &ppriv->child_intfs);
+	up_write(&ppriv->vlan_rwsem);
+
 	priv->max_ib_mtu = ppriv->max_ib_mtu;
 	set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
 	memcpy(priv->dev->dev_addr, ppriv->dev->dev_addr, INFINIBAND_ALEN);
@@ -1959,6 +1965,16 @@ static void ipoib_ndo_uninit(struct net_device *dev)
 		destroy_workqueue(priv->wq);
 		priv->wq = NULL;
 	}
+
+	if (priv->parent) {
+		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
+
+		down_write(&ppriv->vlan_rwsem);
+		list_del(&priv->list);
+		up_write(&ppriv->vlan_rwsem);
+
+		dev_put(priv->parent);
+	}
 }
 
 static int ipoib_set_vf_link_state(struct net_device *dev, int vf, int link_state)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index a86928a80c08..7e093b7aad8f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -133,19 +133,6 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	return err;
 }
 
-static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head *head)
-{
-	struct ipoib_dev_priv *priv, *ppriv;
-
-	priv = ipoib_priv(dev);
-	ppriv = ipoib_priv(priv->parent);
-
-	down_write(&ppriv->vlan_rwsem);
-	unregister_netdevice_queue(dev, head);
-	list_del(&priv->list);
-	up_write(&ppriv->vlan_rwsem);
-}
-
 static size_t ipoib_get_size(const struct net_device *dev)
 {
 	return nla_total_size(2) +	/* IFLA_IPOIB_PKEY   */
@@ -161,7 +148,6 @@ static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
 	.setup		= ipoib_setup_common,
 	.newlink	= ipoib_new_child_link,
 	.changelink	= ipoib_changelink,
-	.dellink	= ipoib_unregister_child_dev,
 	.get_size	= ipoib_get_size,
 	.fill_info	= ipoib_fill_info,
 };
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index fa4dfcee2644..ca3a7f6c0998 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -106,8 +106,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 			goto sysfs_failed;
 	}
 
-	list_add_tail(&priv->list, &ppriv->child_intfs);
-
 	return 0;
 
 sysfs_failed:
@@ -139,11 +137,6 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		return -EPERM;
 	}
 
-	if (!down_write_trylock(&ppriv->vlan_rwsem)) {
-		rtnl_unlock();
-		return restart_syscall();
-	}
-
 	/*
 	 * First ensure this isn't a duplicate. We check the parent device and
 	 * then all of the legacy child interfaces to make sure the Pkey
@@ -175,7 +168,6 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		free_netdev(ndev);
 
 out:
-	up_write(&ppriv->vlan_rwsem);
 	rtnl_unlock();
 
 	return result;
@@ -209,10 +201,6 @@ static void ipoib_vlan_delete_task(struct work_struct *work)
 		struct ipoib_dev_priv *priv = ipoib_priv(dev);
 		struct ipoib_dev_priv *ppriv = ipoib_priv(priv->parent);
 
-		down_write(&ppriv->vlan_rwsem);
-		list_del(&priv->list);
-		up_write(&ppriv->vlan_rwsem);
-
 		ipoib_dbg(ppriv, "delete child vlan %s\n", dev->name);
 		unregister_netdevice(dev);
 	}
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next 10/10] IB/ipoib: Consolidate checking of the proposed child interface
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (8 preceding siblings ...)
  2018-07-29  8:34 ` [PATCH rdma-next 09/10] IB/ipoib: Maintain the child_intfs list from ndo_init/uninit Leon Romanovsky
@ 2018-07-29  8:35 ` Leon Romanovsky
  2018-08-03  2:31 ` [PATCH rdma-next 00/10] IPoIB uninit Jason Gunthorpe
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2018-07-29  8:35 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Denis Drozdov, Erez Shitrit,
	Saeed Mahameed, linux-netdev

From: Jason Gunthorpe <jgg@mellanox.com>

Move all the checking for pkey and other validity to the __ipoib_vlan_add
function. This removes the last difference from the control flow
of the __ipoib_vlan_add to make the overall design simpler to
understand.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |  3 --
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    | 77 +++++++++++++++++++---------
 2 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 7e093b7aad8f..d4d553a51fa9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -122,9 +122,6 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	} else
 		child_pkey  = nla_get_u16(data[IFLA_IPOIB_PKEY]);
 
-	if (child_pkey == 0 || child_pkey == 0x8000)
-		return -EINVAL;
-
 	err = __ipoib_vlan_add(ppriv, ipoib_priv(dev),
 			       child_pkey, IPOIB_RTNL_CHILD);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index ca3a7f6c0998..341753fbda54 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -50,6 +50,39 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(parent, S_IRUGO, show_parent, NULL);
 
+static bool is_child_unique(struct ipoib_dev_priv *ppriv,
+			    struct ipoib_dev_priv *priv)
+{
+	struct ipoib_dev_priv *tpriv;
+
+	ASSERT_RTNL();
+
+	/*
+	 * Since the legacy sysfs interface uses pkey for deletion it cannot
+	 * support more than one interface with the same pkey, it creates
+	 * ambiguity.  The RTNL interface deletes using the netdev so it does
+	 * not have a problem to support duplicated pkeys.
+	 */
+	if (priv->child_type != IPOIB_LEGACY_CHILD)
+		return true;
+
+	/*
+	 * First ensure this isn't a duplicate. We check the parent device and
+	 * then all of the legacy child interfaces to make sure the Pkey
+	 * doesn't match.
+	 */
+	if (ppriv->pkey == priv->pkey)
+		return false;
+
+	list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
+		if (tpriv->pkey == priv->pkey &&
+		    tpriv->child_type == IPOIB_LEGACY_CHILD)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * NOTE: If this function fails then the priv->dev will remain valid, however
  * priv can have been freed and must not be touched by caller in the error
@@ -73,10 +106,20 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 	 */
 	WARN_ON(ppriv->dev->reg_state != NETREG_REGISTERED);
 
+	if (pkey == 0 || pkey == 0x8000) {
+		result = -EINVAL;
+		goto out_early;
+	}
+
 	priv->parent = ppriv->dev;
 	priv->pkey = pkey;
 	priv->child_type = type;
 
+	if (!is_child_unique(ppriv, priv)) {
+		result = -ENOTUNIQ;
+		goto out_early;
+	}
+
 	/* We do not need to touch priv if register_netdevice fails */
 	ndev->priv_destructor = ipoib_intf_free;
 
@@ -88,9 +131,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 		 * register_netdevice sometimes calls priv_destructor,
 		 * sometimes not. Make sure it was done.
 		 */
-		if (ndev->priv_destructor)
-			ndev->priv_destructor(ndev);
-		return result;
+		goto out_early;
 	}
 
 	/* RTNL childs don't need proprietary sysfs entries */
@@ -111,6 +152,11 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 sysfs_failed:
 	unregister_netdevice(priv->dev);
 	return -ENOMEM;
+
+out_early:
+	if (ndev->priv_destructor)
+		ndev->priv_destructor(ndev);
+	return result;
 }
 
 int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
@@ -118,17 +164,11 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	struct ipoib_dev_priv *ppriv, *priv;
 	char intf_name[IFNAMSIZ];
 	struct net_device *ndev;
-	struct ipoib_dev_priv *tpriv;
 	int result;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
-	ppriv = ipoib_priv(pdev);
-
-	snprintf(intf_name, sizeof(intf_name), "%s.%04x",
-		 ppriv->dev->name, pkey);
-
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -137,23 +177,10 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		return -EPERM;
 	}
 
-	/*
-	 * First ensure this isn't a duplicate. We check the parent device and
-	 * then all of the legacy child interfaces to make sure the Pkey
-	 * doesn't match.
-	 */
-	if (ppriv->pkey == pkey) {
-		result = -ENOTUNIQ;
-		goto out;
-	}
+	ppriv = ipoib_priv(pdev);
 
-	list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
-		if (tpriv->pkey == pkey &&
-		    tpriv->child_type == IPOIB_LEGACY_CHILD) {
-			result = -ENOTUNIQ;
-			goto out;
-		}
-	}
+	snprintf(intf_name, sizeof(intf_name), "%s.%04x",
+		 ppriv->dev->name, pkey);
 
 	priv = ipoib_intf_alloc(ppriv->ca, ppriv->port, intf_name);
 	if (!priv) {
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH rdma-next 00/10] IPoIB uninit
  2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
                   ` (9 preceding siblings ...)
  2018-07-29  8:35 ` [PATCH rdma-next 10/10] IB/ipoib: Consolidate checking of the proposed child interface Leon Romanovsky
@ 2018-08-03  2:31 ` Jason Gunthorpe
  10 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2018-08-03  2:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Denis Drozdov,
	Erez Shitrit, Saeed Mahameed, linux-netdev

On Sun, Jul 29, 2018 at 11:34:50AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> IP link was broken due to the changes in IPoIB for the rdma_netdev
> support after commit cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks").
> 
> This patchset restores IPoIB pkey creation and removal using rtnetlink.
> 
> It is completely rewritten variant of
> https://marc.info/?l=linux-rdma&m=151553425830918&w=2 patch series.
> 
> Thanks
> 
> Erez Shitrit (2):
>   IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
>   IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work
> 
> Jason Gunthorpe (8):
>   IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN
>   IB/ipoib: Move all uninit code into ndo_uninit
>   IB/ipoib: Move init code to ndo_init
>   RDMA/netdev: Use priv_destructor for netdev cleanup
>   IB/ipoib: Get rid of the sysfs_mutex
>   IB/ipoib: Do not remove child devices from within the ndo_uninit
>   IB/ipoib: Maintain the child_intfs list from ndo_init/uninit
>   IB/ipoib: Consolidate checking of the proposed child interface

Applied to for-next, finally this bug is fixed..

I squashed these two patches though:

   IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
   IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work

Thanks,
Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-03  2:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29  8:34 [PATCH rdma-next 00/10] IPoIB uninit Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 01/10] IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 02/10] IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 03/10] IB/ipoib: Move all uninit code into ndo_uninit Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 04/10] IB/ipoib: Make ipoib_neigh_hash_uninit fully fence its work Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 05/10] IB/ipoib: Move init code to ndo_init Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 06/10] RDMA/netdev: Use priv_destructor for netdev cleanup Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 07/10] IB/ipoib: Get rid of the sysfs_mutex Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 08/10] IB/ipoib: Do not remove child devices from within the ndo_uninit Leon Romanovsky
2018-07-29  8:34 ` [PATCH rdma-next 09/10] IB/ipoib: Maintain the child_intfs list from ndo_init/uninit Leon Romanovsky
2018-07-29  8:35 ` [PATCH rdma-next 10/10] IB/ipoib: Consolidate checking of the proposed child interface Leon Romanovsky
2018-08-03  2:31 ` [PATCH rdma-next 00/10] IPoIB uninit Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).