linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-next V2 0/4]  RoCE GID management fixes
@ 2015-08-06 17:13 Matan Barak
       [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Matan Barak @ 2015-08-06 17:13 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Matan Barak, Haggai Eran, Somnath Kotur

This patch-set mainly fixes teardown and error flow fixes that Jason and
you have found in the recent submitted patchset. As suggested by
Jason, I've split the cache removal process to cleanup and release.
Although I don't expect the HW vendor drivers to use the cache after
it was cleaned up, releasing it in the put function could prevent
possible use-after-free errors. Saying that, we could alternatively
review the vendors' usage more carefully and probably put the release
function as part of the unregister device flow, but asynchronous contexts
make it harder to review.

The third patch in this series was written by Jason.
The current implementation of ib_register_device could have caused
the device to be released if an error occurs. By then, a consumer
would call  ib_dealloc_device and get a double-free bug.
This patch makes the following schema mandatory for all flows:
ib_alloc_device
ib_register_device
ib_unregister_device
ib_dealloc_device

In addition, this patch-set also fixes a small error flow issue that
was found by Dan Carpenter's kbuild system and a possible dead-lock.

Except for patch 0003, these patches could be squashed into
"IB/core: Add RoCE GID table management". If you choose to do so,
patch 0003 should come before this series. I can of course re-send
the RoCE GID table management series if needed.

Thanks,
Matan

Jason Gunthorpe (1):
  IB/core: Make ib_alloc_device initialize the kobject

Matan Barak (3):
  IB/core: Access to one past end of array in _gid_table_setup_one
  IB/core: Fix possible deadlock in write_gid
  IB/core: RoCE GID management separate cleanup and release

 drivers/infiniband/core/cache.c     | 109 +++++++++++++++++++++++++-----------
 drivers/infiniband/core/core_priv.h |   4 +-
 drivers/infiniband/core/device.c    | 109 +++++++++++++++++++++++-------------
 drivers/infiniband/core/sysfs.c     |  52 ++---------------
 4 files changed, 151 insertions(+), 123 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-next V2 1/4] IB/core: Access to one past end of array in _gid_table_setup_one
       [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-06 17:13   ` Matan Barak
  2015-08-06 17:13   ` [PATCH for-next V2 2/4] IB/core: Fix possible deadlock in write_gid Matan Barak
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Matan Barak @ 2015-08-06 17:13 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Matan Barak, Haggai Eran, Somnath Kotur

The error flow of _gid_table_setup_one accessed table[port] when
port was 1-based instead of 0-based.

Squash-into: 'IB/core: Add RoCE GID table management'
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index a9d5c70..01b8792 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -582,8 +582,9 @@ static int _gid_table_setup_one(struct ib_device *ib_dev)
 	return 0;
 
 rollback_table_setup:
-	for (port = 1; port <= ib_dev->phys_port_cnt; port++)
-		free_gid_table(ib_dev, port, table[port]);
+	for (port = 0; port < ib_dev->phys_port_cnt; port++)
+		free_gid_table(ib_dev, port + rdma_start_port(ib_dev),
+			       table[port]);
 
 	kfree(table);
 	return err;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-next V2 2/4] IB/core: Fix possible deadlock in write_gid
       [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-08-06 17:13   ` [PATCH for-next V2 1/4] IB/core: Access to one past end of array in _gid_table_setup_one Matan Barak
@ 2015-08-06 17:13   ` Matan Barak
  2015-08-06 17:13   ` [PATCH for-next V2 3/4] IB/core: Make ib_alloc_device initialize the kobject Matan Barak
  2015-08-06 17:14   ` [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release Matan Barak
  3 siblings, 0 replies; 11+ messages in thread
From: Matan Barak @ 2015-08-06 17:13 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Matan Barak, Haggai Eran, Somnath Kotur

Fixing the possible deadlock in write_gid:
write_gid -> write_lock()
<interrupt>
	find_gid -> read_lock_irqsave()
		    ** DEAD LOCK **LOCK

Fixing it by changing the write_lock to write_lock_irqsave.

Squash-into: 'IB/core: Add RoCE GID table management'
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 01b8792..21cf94b 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -121,15 +121,16 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 {
 	int ret = 0;
 	struct net_device *old_net_dev;
+	unsigned long flags;
 
 	/* in rdma_cap_roce_gid_table, this funciton should be protected by a
 	 * sleep-able lock.
 	 */
-	write_lock(&table->data_vec[ix].lock);
+	write_lock_irqsave(&table->data_vec[ix].lock, flags);
 
 	if (rdma_cap_roce_gid_table(ib_dev, port)) {
 		table->data_vec[ix].props |= GID_TABLE_ENTRY_INVALID;
-		write_unlock(&table->data_vec[ix].lock);
+		write_unlock_irqrestore(&table->data_vec[ix].lock, flags);
 		/* GID_TABLE_WRITE_ACTION_MODIFY currently isn't supported by
 		 * RoCE providers and thus only updates the cache.
 		 */
@@ -139,7 +140,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 		else if (action == GID_TABLE_WRITE_ACTION_DEL)
 			ret = ib_dev->del_gid(ib_dev, port, ix,
 					      &table->data_vec[ix].context);
-		write_lock(&table->data_vec[ix].lock);
+		write_lock_irqsave(&table->data_vec[ix].lock, flags);
 	}
 
 	old_net_dev = table->data_vec[ix].attr.ndev;
@@ -161,7 +162,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 
 	table->data_vec[ix].props &= ~GID_TABLE_ENTRY_INVALID;
 
-	write_unlock(&table->data_vec[ix].lock);
+	write_unlock_irqrestore(&table->data_vec[ix].lock, flags);
 
 	if (!ret && rdma_cap_roce_gid_table(ib_dev, port)) {
 		struct ib_event event;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-next V2 3/4] IB/core: Make ib_alloc_device initialize the kobject
       [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-08-06 17:13   ` [PATCH for-next V2 1/4] IB/core: Access to one past end of array in _gid_table_setup_one Matan Barak
  2015-08-06 17:13   ` [PATCH for-next V2 2/4] IB/core: Fix possible deadlock in write_gid Matan Barak
@ 2015-08-06 17:13   ` Matan Barak
  2015-08-06 17:14   ` [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release Matan Barak
  3 siblings, 0 replies; 11+ messages in thread
From: Matan Barak @ 2015-08-06 17:13 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Matan Barak, Haggai Eran, Somnath Kotur

From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we can't duplicate its kfrees on error paths.
Choose to just drop kfree(port_immutable)

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h |  3 --
 drivers/infiniband/core/device.c    | 92 ++++++++++++++++++++++++-------------
 drivers/infiniband/core/sysfs.c     | 52 ++-------------------
 3 files changed, 65 insertions(+), 82 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 210ddd2..6c4880e 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -43,9 +43,6 @@ int  ib_device_register_sysfs(struct ib_device *device,
 						   u8, struct kobject *));
 void ib_device_unregister_sysfs(struct ib_device *device);
 
-int  ib_sysfs_setup(void);
-void ib_sysfs_cleanup(void);
-
 void  ib_cache_setup(void);
 void ib_cache_cleanup(void);
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 14ea709..25c9301 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -161,6 +161,35 @@ static int alloc_name(char *name)
 	return 0;
 }
 
+static void ib_device_release(struct device *device)
+{
+	struct ib_device *dev = dev_get_drvdata(device);
+
+	kfree(dev->port_immutable);
+	kfree(dev);
+}
+
+static int ib_device_uevent(struct device *device,
+			    struct kobj_uevent_env *env)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+
+	if (add_uevent_var(env, "NAME=%s", dev->name))
+		return -ENOMEM;
+
+	/*
+	 * It would be nice to pass the node GUID with the event...
+	 */
+
+	return 0;
+}
+
+static struct class ib_class = {
+	.name    = "infiniband",
+	.dev_release = ib_device_release,
+	.dev_uevent = ib_device_uevent,
+};
+
 /**
  * ib_alloc_device - allocate an IB device struct
  * @size:size of structure to allocate
@@ -173,9 +202,27 @@ static int alloc_name(char *name)
  */
 struct ib_device *ib_alloc_device(size_t size)
 {
-	BUG_ON(size < sizeof (struct ib_device));
+	struct ib_device *device;
+
+	if (WARN_ON(size < sizeof(struct ib_device)))
+		return NULL;
+
+	device = kzalloc(size, GFP_KERNEL);
+	if (!device)
+		return NULL;
 
-	return kzalloc(size, GFP_KERNEL);
+	device->dev.class = &ib_class;
+	device_initialize(&device->dev);
+
+	dev_set_drvdata(&device->dev, device);
+
+	INIT_LIST_HEAD(&device->event_handler_list);
+	spin_lock_init(&device->event_handler_lock);
+	spin_lock_init(&device->client_data_lock);
+	INIT_LIST_HEAD(&device->client_data_list);
+	INIT_LIST_HEAD(&device->port_list);
+
+	return device;
 }
 EXPORT_SYMBOL(ib_alloc_device);
 
@@ -187,13 +234,8 @@ EXPORT_SYMBOL(ib_alloc_device);
  */
 void ib_dealloc_device(struct ib_device *device)
 {
-	if (device->reg_state == IB_DEV_UNINITIALIZED) {
-		kfree(device);
-		return;
-	}
-
-	BUG_ON(device->reg_state != IB_DEV_UNREGISTERED);
-
+	WARN_ON(device->reg_state != IB_DEV_UNREGISTERED &&
+		device->reg_state != IB_DEV_UNINITIALIZED);
 	kobject_put(&device->dev.kobj);
 }
 EXPORT_SYMBOL(ib_dealloc_device);
@@ -228,7 +270,7 @@ static int verify_immutable(const struct ib_device *dev, u8 port)
 
 static int read_port_immutable(struct ib_device *device)
 {
-	int ret = -ENOMEM;
+	int ret;
 	u8 start_port = rdma_start_port(device);
 	u8 end_port = rdma_end_port(device);
 	u8 port;
@@ -244,26 +286,18 @@ static int read_port_immutable(struct ib_device *device)
 					 * (end_port + 1),
 					 GFP_KERNEL);
 	if (!device->port_immutable)
-		goto err;
+		return -ENOMEM;
 
 	for (port = start_port; port <= end_port; ++port) {
 		ret = device->get_port_immutable(device, port,
 						 &device->port_immutable[port]);
 		if (ret)
-			goto err;
+			return ret;
 
-		if (verify_immutable(device, port)) {
-			ret = -EINVAL;
-			goto err;
-		}
+		if (verify_immutable(device, port))
+			return -EINVAL;
 	}
-
-	ret = 0;
-	goto out;
-err:
-	kfree(device->port_immutable);
-out:
-	return ret;
+	return 0;
 }
 
 /**
@@ -294,11 +328,6 @@ int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&device->event_handler_list);
-	INIT_LIST_HEAD(&device->client_data_list);
-	spin_lock_init(&device->event_handler_lock);
-	spin_lock_init(&device->client_data_lock);
-
 	ret = read_port_immutable(device);
 	if (ret) {
 		printk(KERN_WARNING "Couldn't create per port immutable data %s\n",
@@ -310,7 +339,6 @@ int ib_register_device(struct ib_device *device,
 	if (ret) {
 		printk(KERN_WARNING "Couldn't register device %s with driver model\n",
 		       device->name);
-		kfree(device->port_immutable);
 		goto out;
 	}
 
@@ -840,7 +868,7 @@ static int __init ib_core_init(void)
 	if (!ib_wq)
 		return -ENOMEM;
 
-	ret = ib_sysfs_setup();
+	ret = class_register(&ib_class);
 	if (ret) {
 		printk(KERN_WARNING "Couldn't create InfiniBand device class\n");
 		goto err;
@@ -857,7 +885,7 @@ static int __init ib_core_init(void)
 	return 0;
 
 err_sysfs:
-	ib_sysfs_cleanup();
+	class_unregister(&ib_class);
 
 err:
 	destroy_workqueue(ib_wq);
@@ -868,7 +896,7 @@ static void __exit ib_core_cleanup(void)
 {
 	ib_cache_cleanup();
 	ibnl_cleanup();
-	ib_sysfs_cleanup();
+	class_unregister(&ib_class);
 	/* Make sure that any pending umem accounting work is done. */
 	destroy_workqueue(ib_wq);
 }
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index c10c6d3..34cdd74 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -457,30 +457,6 @@ static struct kobj_type port_type = {
 	.default_attrs = port_default_attrs
 };
 
-static void ib_device_release(struct device *device)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-
-	ib_cache_cleanup_one(dev);
-	kfree(dev->port_immutable);
-	kfree(dev);
-}
-
-static int ib_device_uevent(struct device *device,
-			    struct kobj_uevent_env *env)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-
-	if (add_uevent_var(env, "NAME=%s", dev->name))
-		return -ENOMEM;
-
-	/*
-	 * It would be nice to pass the node GUID with the event...
-	 */
-
-	return 0;
-}
-
 static struct attribute **
 alloc_group_attrs(ssize_t (*show)(struct ib_port *,
 				  struct port_attribute *, char *buf),
@@ -703,12 +679,6 @@ static struct device_attribute *ib_class_attributes[] = {
 	&dev_attr_node_desc
 };
 
-static struct class ib_class = {
-	.name    = "infiniband",
-	.dev_release = ib_device_release,
-	.dev_uevent = ib_device_uevent,
-};
-
 /* Show a given an attribute in the statistics group */
 static ssize_t show_protocol_stat(const struct device *device,
 			    struct device_attribute *attr, char *buf,
@@ -847,14 +817,12 @@ int ib_device_register_sysfs(struct ib_device *device,
 	int ret;
 	int i;
 
-	class_dev->class      = &ib_class;
-	class_dev->parent     = device->dma_device;
-	dev_set_name(class_dev, "%s", device->name);
-	dev_set_drvdata(class_dev, device);
-
-	INIT_LIST_HEAD(&device->port_list);
+	device->dev.parent = device->dma_device;
+	ret = dev_set_name(class_dev, "%s", device->name);
+	if (ret)
+		return ret;
 
-	ret = device_register(class_dev);
+	ret = device_add(class_dev);
 	if (ret)
 		goto err;
 
@@ -917,13 +885,3 @@ void ib_device_unregister_sysfs(struct ib_device *device)
 
 	device_unregister(&device->dev);
 }
-
-int ib_sysfs_setup(void)
-{
-	return class_register(&ib_class);
-}
-
-void ib_sysfs_cleanup(void)
-{
-	class_unregister(&ib_class);
-}
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-08-06 17:13   ` [PATCH for-next V2 3/4] IB/core: Make ib_alloc_device initialize the kobject Matan Barak
@ 2015-08-06 17:14   ` Matan Barak
       [not found]     ` <1438881240-22790-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  3 siblings, 1 reply; 11+ messages in thread
From: Matan Barak @ 2015-08-06 17:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Matan Barak, Haggai Eran, Somnath Kotur

Separate the cleanup and release IB cache functions.
The cleanup function delete all GIDs and unregister the event channel,
while the release function frees all memory. The cleanup function is
called after all clients were removed (in the device un-registration
process).

The release function is called after the device was put.
This is in order to avoid use-after-free errors if ther vendor
driver's teardown code uses IB cache.

Squash-into: 'IB/core: Add RoCE GID table management'
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c     | 101 +++++++++++++++++++++++++-----------
 drivers/infiniband/core/core_priv.h |   1 +
 drivers/infiniband/core/device.c    |  17 +++---
 3 files changed, 81 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 21cf94b..ba3720c 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -465,8 +465,16 @@ err_free_table:
 	return NULL;
 }
 
-static void free_gid_table(struct ib_device *ib_dev, u8 port,
-			   struct ib_gid_table *table)
+static void release_gid_table(struct ib_gid_table *table)
+{
+	if (table) {
+		kfree(table->data_vec);
+		kfree(table);
+	}
+}
+
+static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
+				   struct ib_gid_table *table)
 {
 	int i;
 
@@ -480,8 +488,6 @@ static void free_gid_table(struct ib_device *ib_dev, u8 port,
 				table->data_vec[i].props &
 				GID_ATTR_FIND_MASK_DEFAULT);
 	}
-	kfree(table->data_vec);
-	kfree(table);
 }
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
@@ -583,15 +589,17 @@ static int _gid_table_setup_one(struct ib_device *ib_dev)
 	return 0;
 
 rollback_table_setup:
-	for (port = 0; port < ib_dev->phys_port_cnt; port++)
-		free_gid_table(ib_dev, port + rdma_start_port(ib_dev),
-			       table[port]);
+	for (port = 0; port < ib_dev->phys_port_cnt; port++) {
+		cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
+				       table[port]);
+		release_gid_table(table[port]);
+	}
 
 	kfree(table);
 	return err;
 }
 
-static void gid_table_cleanup_one(struct ib_device *ib_dev)
+static void gid_table_release_one(struct ib_device *ib_dev)
 {
 	struct ib_gid_table **table = ib_dev->cache.gid_cache;
 	u8 port;
@@ -600,10 +608,23 @@ static void gid_table_cleanup_one(struct ib_device *ib_dev)
 		return;
 
 	for (port = 0; port < ib_dev->phys_port_cnt; port++)
-		free_gid_table(ib_dev, port + rdma_start_port(ib_dev),
-			       table[port]);
+		release_gid_table(table[port]);
 
 	kfree(table);
+	ib_dev->cache.gid_cache = NULL;
+}
+
+static void gid_table_cleanup_one(struct ib_device *ib_dev)
+{
+	struct ib_gid_table **table = ib_dev->cache.gid_cache;
+	u8 port;
+
+	if (!table)
+		return;
+
+	for (port = 0; port < ib_dev->phys_port_cnt; port++)
+		cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
+				       table[port]);
 }
 
 static int gid_table_setup_one(struct ib_device *ib_dev)
@@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
 					  (rdma_end_port(device) -
 					   rdma_start_port(device) + 1),
 					  GFP_KERNEL);
-	err = gid_table_setup_one(device);
-
-	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
+	if (!device->cache.pkey_cache ||
 	    !device->cache.lmc_cache) {
 		printk(KERN_WARNING "Couldn't allocate cache "
 		       "for %s\n", device->name);
-		err = -ENOMEM;
-		goto err;
+		/* pkey_cache is freed here because we later access it's
+		 * elements.
+		 */
+		kfree(device->cache.pkey_cache);
+		device->cache.pkey_cache = NULL;
+		return -ENOMEM;
 	}
 
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
+	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
 		device->cache.pkey_cache[p] = NULL;
+
+	err = gid_table_setup_one(device);
+	if (err)
+		/* Allocated memory will be cleaned in the release funciton */
+		return err;
+
+	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
 		ib_cache_update(device, p + rdma_start_port(device));
-	}
 
 	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
 			      device, ib_cache_event);
@@ -927,32 +956,44 @@ int ib_cache_setup_one(struct ib_device *device)
 	return 0;
 
 err_cache:
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
-		kfree(device->cache.pkey_cache[p]);
-
-err:
-	kfree(device->cache.pkey_cache);
 	gid_table_cleanup_one(device);
-	kfree(device->cache.lmc_cache);
 
 	return err;
 }
 
-void ib_cache_cleanup_one(struct ib_device *device)
+void ib_cache_release_one(struct ib_device *device)
 {
 	int p;
 
-	ib_unregister_event_handler(&device->cache.event_handler);
-	flush_workqueue(ib_wq);
-
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
-		kfree(device->cache.pkey_cache[p]);
+	/* The release function frees all the cache elements.
+	 * This function should be called as part of freeing
+	 * all the device's resources when the cache could no
+	 * longer be accessed.
+	 */
+	if (device->cache.pkey_cache) {
+		for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
+			kfree(device->cache.pkey_cache[p]);
+	}
 
+	gid_table_release_one(device);
 	kfree(device->cache.pkey_cache);
-	gid_table_cleanup_one(device);
 	kfree(device->cache.lmc_cache);
 }
 
+void ib_cache_cleanup_one(struct ib_device *device)
+{
+	/* The cleanup function unregisters the event handler,
+	 * waits for all in-progress workqueue elements and cleans
+	 * up the GID cache. This function should be called after
+	 * the device was removed from the devices list and all
+	 * clients were removed, so the cache exists but is
+	 * non-functional and shouldn't be updated anymore.
+	 */
+	ib_unregister_event_handler(&device->cache.event_handler);
+	flush_workqueue(ib_wq);
+	gid_table_cleanup_one(device);
+}
+
 void __init ib_cache_setup(void)
 {
 	roce_gid_mgmt_init();
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6c4880e..0ebcd29 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -95,5 +95,6 @@ int roce_rescan_device(struct ib_device *ib_dev);
 
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
+void ib_cache_release_one(struct ib_device *device);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 25c9301..853e3d2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -165,6 +165,7 @@ static void ib_device_release(struct device *device)
 {
 	struct ib_device *dev = dev_get_drvdata(device);
 
+	ib_cache_release_one(dev);
 	kfree(dev->port_immutable);
 	kfree(dev);
 }
@@ -335,8 +336,15 @@ int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
+	ret = ib_cache_setup_one(device);
+	if (ret) {
+		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
+		goto out;
+	}
+
 	ret = ib_device_register_sysfs(device, port_callback);
 	if (ret) {
+		ib_cache_cleanup_one(device);
 		printk(KERN_WARNING "Couldn't register device %s with driver model\n",
 		       device->name);
 		goto out;
@@ -344,14 +352,6 @@ int ib_register_device(struct ib_device *device,
 
 	device->reg_state = IB_DEV_REGISTERED;
 
-	ret = ib_cache_setup_one(device);
-	if (ret) {
-		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
-		ib_device_unregister_sysfs(device);
-		kfree(device->port_immutable);
-		goto out;
-	}
-
 	{
 		struct ib_client *client;
 
@@ -395,6 +395,7 @@ void ib_unregister_device(struct ib_device *device)
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
+	ib_cache_cleanup_one(device);
 
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]     ` <1438881240-22790-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-11  6:02       ` Jason Gunthorpe
  2015-08-15 14:14       ` Doug Ledford
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2015-08-11  6:02 UTC (permalink / raw)
  To: Matan Barak
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Haggai Eran, Somnath Kotur

On Thu, Aug 06, 2015 at 08:14:00PM +0300, Matan Barak wrote:
> Separate the cleanup and release IB cache functions.
> The cleanup function delete all GIDs and unregister the event channel,
> while the release function frees all memory. The cleanup function is
> called after all clients were removed (in the device un-registration
> process).

I'm finding this patches ontop of patches rebase stuff to be hard to
follow, but this looks like the right general idea. Hopefully Doug can
check the net result carefully..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]     ` <1438881240-22790-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-08-11  6:02       ` Jason Gunthorpe
@ 2015-08-15 14:14       ` Doug Ledford
       [not found]         ` <55CF495C.6010708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Doug Ledford @ 2015-08-15 14:14 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Haggai Eran, Somnath Kotur

[-- Attachment #1: Type: text/plain, Size: 2402 bytes --]

On 08/06/2015 01:14 PM, Matan Barak wrote:
> Separate the cleanup and release IB cache functions.
> The cleanup function delete all GIDs and unregister the event channel,
> while the release function frees all memory. The cleanup function is
> called after all clients were removed (in the device un-registration
> process).
> 
> The release function is called after the device was put.
> This is in order to avoid use-after-free errors if ther vendor
> driver's teardown code uses IB cache.
> 
> Squash-into: 'IB/core: Add RoCE GID table management'
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

I had originally used the v1 of this patch instead of v2.  In order to
make sure I didn't miss anything, I hand compared the final results of
the v1 patch to what this patch would have created if I had used it.  I
found a few things worth nothing, comments inline below.

> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>  					  (rdma_end_port(device) -
>  					   rdma_start_port(device) + 1),
>  					  GFP_KERNEL);
> -	err = gid_table_setup_one(device);
> -
> -	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
> +	if (!device->cache.pkey_cache ||
>  	    !device->cache.lmc_cache) {
>  		printk(KERN_WARNING "Couldn't allocate cache "
>  		       "for %s\n", device->name);
> -		err = -ENOMEM;
> -		goto err;
> +		/* pkey_cache is freed here because we later access it's
> +		 * elements.
> +		 */
> +		kfree(device->cache.pkey_cache);
> +		device->cache.pkey_cache = NULL;
> +		return -ENOMEM;

This function is unnecessarily convoluted IMO.  A simple change to using
kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
result, I fixed this function up to be different than either patch.
Please look over my results and make sure you agree with my changes.

>  	}
>  
> -	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
> +	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
>  		device->cache.pkey_cache[p] = NULL;

For instance this goes away entirely by using kzalloc().

The rest looked ok.  I have to fixup the ordering changes between sysfs
and cache to make the v1 patch match the v2 patch.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]         ` <55CF495C.6010708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-08-16  8:19           ` Matan Barak
       [not found]             ` <55D04785.9070007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Matan Barak @ 2015-08-16  8:19 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Haggai Eran, Somnath Kotur



On 8/15/2015 5:14 PM, Doug Ledford wrote:
> On 08/06/2015 01:14 PM, Matan Barak wrote:
>> Separate the cleanup and release IB cache functions.
>> The cleanup function delete all GIDs and unregister the event channel,
>> while the release function frees all memory. The cleanup function is
>> called after all clients were removed (in the device un-registration
>> process).
>>
>> The release function is called after the device was put.
>> This is in order to avoid use-after-free errors if ther vendor
>> driver's teardown code uses IB cache.
>>
>> Squash-into: 'IB/core: Add RoCE GID table management'
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> I had originally used the v1 of this patch instead of v2.  In order to
> make sure I didn't miss anything, I hand compared the final results of
> the v1 patch to what this patch would have created if I had used it.  I
> found a few things worth nothing, comments inline below.
>
>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>   					  (rdma_end_port(device) -
>>   					   rdma_start_port(device) + 1),
>>   					  GFP_KERNEL);
>> -	err = gid_table_setup_one(device);
>> -
>> -	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>> +	if (!device->cache.pkey_cache ||
>>   	    !device->cache.lmc_cache) {
>>   		printk(KERN_WARNING "Couldn't allocate cache "
>>   		       "for %s\n", device->name);
>> -		err = -ENOMEM;
>> -		goto err;
>> +		/* pkey_cache is freed here because we later access it's
>> +		 * elements.
>> +		 */
>> +		kfree(device->cache.pkey_cache);
>> +		device->cache.pkey_cache = NULL;
>> +		return -ENOMEM;
>
> This function is unnecessarily convoluted IMO.  A simple change to using
> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
> result, I fixed this function up to be different than either patch.
> Please look over my results and make sure you agree with my changes.
>
>>   	}
>>
>> -	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
>> +	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
>>   		device->cache.pkey_cache[p] = NULL;
>
> For instance this goes away entirely by using kzalloc().
>
> The rest looked ok.  I have to fixup the ordering changes between sysfs
> and cache to make the v1 patch match the v2 patch.
>
>

Thanks for the squashing and re-ordering these patches.
I think you're missing if (device->cache.pkey_cache) over the for pkey 
free loop in ib_cache_release_one. Otherwise, the allocation of 
device->cache.pkey_cache might has failed and you dereference an invalid 
address device->cache.pkey_cache[p].

In addition, ib_device_release calls ib_cache_release_one with device 
instead of dev (which has the wrong type).
ib_register_device calls ib_cache_clean_one that I can't really find :)


Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]             ` <55D04785.9070007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-16 16:24               ` Doug Ledford
       [not found]                 ` <55D0B925.8000504-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Doug Ledford @ 2015-08-16 16:24 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Haggai Eran, Somnath Kotur

[-- Attachment #1: Type: text/plain, Size: 3857 bytes --]

On 08/16/2015 04:19 AM, Matan Barak wrote:
> 
> 
> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>> Separate the cleanup and release IB cache functions.
>>> The cleanup function delete all GIDs and unregister the event channel,
>>> while the release function frees all memory. The cleanup function is
>>> called after all clients were removed (in the device un-registration
>>> process).
>>>
>>> The release function is called after the device was put.
>>> This is in order to avoid use-after-free errors if ther vendor
>>> driver's teardown code uses IB cache.
>>>
>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> I had originally used the v1 of this patch instead of v2.  In order to
>> make sure I didn't miss anything, I hand compared the final results of
>> the v1 patch to what this patch would have created if I had used it.  I
>> found a few things worth nothing, comments inline below.
>>
>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>                         (rdma_end_port(device) -
>>>                          rdma_start_port(device) + 1),
>>>                         GFP_KERNEL);
>>> -    err = gid_table_setup_one(device);
>>> -
>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>> +    if (!device->cache.pkey_cache ||
>>>           !device->cache.lmc_cache) {
>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>                  "for %s\n", device->name);
>>> -        err = -ENOMEM;
>>> -        goto err;
>>> +        /* pkey_cache is freed here because we later access it's
>>> +         * elements.
>>> +         */
>>> +        kfree(device->cache.pkey_cache);
>>> +        device->cache.pkey_cache = NULL;
>>> +        return -ENOMEM;
>>
>> This function is unnecessarily convoluted IMO.  A simple change to using
>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>> result, I fixed this function up to be different than either patch.
>> Please look over my results and make sure you agree with my changes.
>>
>>>       }
>>>
>>> -    for (p = 0; p <= rdma_end_port(device) -
>>> rdma_start_port(device); ++p) {
>>> +    for (p = 0; p <= rdma_end_port(device) -
>>> rdma_start_port(device); ++p)
>>>           device->cache.pkey_cache[p] = NULL;
>>
>> For instance this goes away entirely by using kzalloc().
>>
>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>> and cache to make the v1 patch match the v2 patch.
>>
>>
> 
> Thanks for the squashing and re-ordering these patches.
> I think you're missing if (device->cache.pkey_cache) over the for pkey
> free loop in ib_cache_release_one.

I think you're right, but I think that particular issue exists in both
versions of the code (mine and yours).  In particular, a failure during
ib_register_device should result in a call to ib_dealloc_device which
will result in a call to ib_cache_release_one, which will happen whether
ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
must check pkey_cache before releasing the ports because it can't assume
the original setup succeeded.

> Otherwise, the allocation of
> device->cache.pkey_cache might has failed and you dereference an invalid
> address device->cache.pkey_cache[p].
> 
> In addition, ib_device_release calls ib_cache_release_one with device
> instead of dev (which has the wrong type).
> ib_register_device calls ib_cache_clean_one that I can't really find :)

Both of these turned up in compile tests.  I've since resolved them.
Hand merge issues ;-)

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]                 ` <55D0B925.8000504-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-08-16 16:48                   ` Doug Ledford
       [not found]                     ` <55D0BEE3.3040000-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Doug Ledford @ 2015-08-16 16:48 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe, Or Gerlitz,
	Haggai Eran, Somnath Kotur

[-- Attachment #1: Type: text/plain, Size: 4088 bytes --]

On 08/16/2015 12:24 PM, Doug Ledford wrote:
> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>
>>
>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>>> Separate the cleanup and release IB cache functions.
>>>> The cleanup function delete all GIDs and unregister the event channel,
>>>> while the release function frees all memory. The cleanup function is
>>>> called after all clients were removed (in the device un-registration
>>>> process).
>>>>
>>>> The release function is called after the device was put.
>>>> This is in order to avoid use-after-free errors if ther vendor
>>>> driver's teardown code uses IB cache.
>>>>
>>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>
>>> I had originally used the v1 of this patch instead of v2.  In order to
>>> make sure I didn't miss anything, I hand compared the final results of
>>> the v1 patch to what this patch would have created if I had used it.  I
>>> found a few things worth nothing, comments inline below.
>>>
>>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>>                         (rdma_end_port(device) -
>>>>                          rdma_start_port(device) + 1),
>>>>                         GFP_KERNEL);
>>>> -    err = gid_table_setup_one(device);
>>>> -
>>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>>> +    if (!device->cache.pkey_cache ||
>>>>           !device->cache.lmc_cache) {
>>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>>                  "for %s\n", device->name);
>>>> -        err = -ENOMEM;
>>>> -        goto err;
>>>> +        /* pkey_cache is freed here because we later access it's
>>>> +         * elements.
>>>> +         */
>>>> +        kfree(device->cache.pkey_cache);
>>>> +        device->cache.pkey_cache = NULL;
>>>> +        return -ENOMEM;
>>>
>>> This function is unnecessarily convoluted IMO.  A simple change to using
>>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>>> result, I fixed this function up to be different than either patch.
>>> Please look over my results and make sure you agree with my changes.
>>>
>>>>       }
>>>>
>>>> -    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p) {
>>>> +    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p)
>>>>           device->cache.pkey_cache[p] = NULL;
>>>
>>> For instance this goes away entirely by using kzalloc().
>>>
>>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>>> and cache to make the v1 patch match the v2 patch.
>>>
>>>
>>
>> Thanks for the squashing and re-ordering these patches.
>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>> free loop in ib_cache_release_one.
> 
> I think you're right, but I think that particular issue exists in both
> versions of the code (mine and yours).  In particular, a failure during
> ib_register_device should result in a call to ib_dealloc_device which
> will result in a call to ib_cache_release_one, which will happen whether
> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
> must check pkey_cache before releasing the ports because it can't assume
> the original setup succeeded.
> 
>> Otherwise, the allocation of
>> device->cache.pkey_cache might has failed and you dereference an invalid
>> address device->cache.pkey_cache[p].
>>
>> In addition, ib_device_release calls ib_cache_release_one with device
>> instead of dev (which has the wrong type).
>> ib_register_device calls ib_cache_clean_one that I can't really find :)
> 
> Both of these turned up in compile tests.  I've since resolved them.
> Hand merge issues ;-)
> 

I've pushed what I think is a fixed version to github.  Please feel free
to review.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release
       [not found]                     ` <55D0BEE3.3040000-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-08-19 13:16                       ` Matan Barak
  0 siblings, 0 replies; 11+ messages in thread
From: Matan Barak @ 2015-08-19 13:16 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe,
	Or Gerlitz, Haggai Eran, Somnath Kotur

On Sun, Aug 16, 2015 at 7:48 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 08/16/2015 12:24 PM, Doug Ledford wrote:
>> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>>
>>>
>>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>>>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>>>> Separate the cleanup and release IB cache functions.
>>>>> The cleanup function delete all GIDs and unregister the event channel,
>>>>> while the release function frees all memory. The cleanup function is
>>>>> called after all clients were removed (in the device un-registration
>>>>> process).
>>>>>
>>>>> The release function is called after the device was put.
>>>>> This is in order to avoid use-after-free errors if ther vendor
>>>>> driver's teardown code uses IB cache.
>>>>>
>>>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>
>>>> I had originally used the v1 of this patch instead of v2.  In order to
>>>> make sure I didn't miss anything, I hand compared the final results of
>>>> the v1 patch to what this patch would have created if I had used it.  I
>>>> found a few things worth nothing, comments inline below.
>>>>
>>>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>>>                         (rdma_end_port(device) -
>>>>>                          rdma_start_port(device) + 1),
>>>>>                         GFP_KERNEL);
>>>>> -    err = gid_table_setup_one(device);
>>>>> -
>>>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>>>> +    if (!device->cache.pkey_cache ||
>>>>>           !device->cache.lmc_cache) {
>>>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>>>                  "for %s\n", device->name);
>>>>> -        err = -ENOMEM;
>>>>> -        goto err;
>>>>> +        /* pkey_cache is freed here because we later access it's
>>>>> +         * elements.
>>>>> +         */
>>>>> +        kfree(device->cache.pkey_cache);
>>>>> +        device->cache.pkey_cache = NULL;
>>>>> +        return -ENOMEM;
>>>>
>>>> This function is unnecessarily convoluted IMO.  A simple change to using
>>>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>>>> result, I fixed this function up to be different than either patch.
>>>> Please look over my results and make sure you agree with my changes.
>>>>
>>>>>       }
>>>>>
>>>>> -    for (p = 0; p <= rdma_end_port(device) -
>>>>> rdma_start_port(device); ++p) {
>>>>> +    for (p = 0; p <= rdma_end_port(device) -
>>>>> rdma_start_port(device); ++p)
>>>>>           device->cache.pkey_cache[p] = NULL;
>>>>
>>>> For instance this goes away entirely by using kzalloc().
>>>>
>>>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>>>> and cache to make the v1 patch match the v2 patch.
>>>>
>>>>
>>>
>>> Thanks for the squashing and re-ordering these patches.
>>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>>> free loop in ib_cache_release_one.
>>
>> I think you're right, but I think that particular issue exists in both
>> versions of the code (mine and yours).  In particular, a failure during
>> ib_register_device should result in a call to ib_dealloc_device which
>> will result in a call to ib_cache_release_one, which will happen whether
>> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
>> must check pkey_cache before releasing the ports because it can't assume
>> the original setup succeeded.
>>
>>> Otherwise, the allocation of
>>> device->cache.pkey_cache might has failed and you dereference an invalid
>>> address device->cache.pkey_cache[p].
>>>
>>> In addition, ib_device_release calls ib_cache_release_one with device
>>> instead of dev (which has the wrong type).
>>> ib_register_device calls ib_cache_clean_one that I can't really find :)
>>
>> Both of these turned up in compile tests.  I've since resolved them.
>> Hand merge issues ;-)
>>
>
> I've pushed what I think is a fixed version to github.  Please feel free
> to review.
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>               GPG KeyID: 0E572FDD
>
>

I think that after applying Dan Carpenter's "[patch] IB/core: missing
curly braces in ib_find_gid()" it's fine :-)
Thanks.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-08-19 13:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-06 17:13 [PATCH for-next V2 0/4] RoCE GID management fixes Matan Barak
     [not found] ` <1438881240-22790-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-06 17:13   ` [PATCH for-next V2 1/4] IB/core: Access to one past end of array in _gid_table_setup_one Matan Barak
2015-08-06 17:13   ` [PATCH for-next V2 2/4] IB/core: Fix possible deadlock in write_gid Matan Barak
2015-08-06 17:13   ` [PATCH for-next V2 3/4] IB/core: Make ib_alloc_device initialize the kobject Matan Barak
2015-08-06 17:14   ` [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release Matan Barak
     [not found]     ` <1438881240-22790-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-11  6:02       ` Jason Gunthorpe
2015-08-15 14:14       ` Doug Ledford
     [not found]         ` <55CF495C.6010708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-08-16  8:19           ` Matan Barak
     [not found]             ` <55D04785.9070007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-16 16:24               ` Doug Ledford
     [not found]                 ` <55D0B925.8000504-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-08-16 16:48                   ` Doug Ledford
     [not found]                     ` <55D0BEE3.3040000-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-08-19 13:16                       ` Matan Barak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).