* [PATCH rdma-next 00/14] Cleanup locking and events in ucma
@ 2020-08-18 12:05 Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 01/14] RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Leon Romanovsky
                   ` (14 more replies)
  0 siblings, 15 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Leon Romanovsky, linux-kernel, linux-rdma,
	Roland Dreier, Sean Hefty

From: Leon Romanovsky <leonro@nvidia.com>

From Jason:

Rework how the uevents for new connections are handled so all the locking
ends up simpler and a work queue can be removed. This should also speed up
destruction of ucma_context's as a flush_workqueue() was replaced with
cancel_work_sync().

The simpler locking comes from narrowing what file->mut covers and moving
other data to other locks, particularly by injecting the handler_mutex
from the RDMA CM core as a construct available to ULPs. The handler_mutex
directly prevents handlers from running without creating any ABBA locking
problems.

Fix various error cases and data races caused by missing locking.

Thanks

Jason Gunthorpe (14):
  RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()
  RDMA/ucma: Remove unnecessary locking of file->ctx_list in close
  RDMA/ucma: Consolidate the two destroy flows
  RDMA/ucma: Fix error cases around ucma_alloc_ctx()
  RDMA/ucma: Remove mc_list and rely on xarray
  RDMA/cma: Add missing locking to rdma_accept()
  RDMA/ucma: Do not use file->mut to lock destroying
  RDMA/ucma: Fix the locking of ctx->file
  RDMA/ucma: Fix locking for ctx->events_reported
  RDMA/ucma: Add missing locking around rdma_leave_multicast()
  RDMA/ucma: Change backlog into an atomic
  RDMA/ucma: Narrow file->mut in ucma_event_handler()
  RDMA/ucma: Rework how new connections are passed through event
    delivery
  RDMA/ucma: Remove closing and the close_wq

 drivers/infiniband/core/cma.c  |  25 +-
 drivers/infiniband/core/ucma.c | 444 +++++++++++++++------------------
 include/rdma/rdma_cm.h         |   5 +
 3 files changed, 226 insertions(+), 248 deletions(-)

--
2.26.2



* [PATCH rdma-next 01/14] RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 02/14] RDMA/ucma: Remove unnecessary locking of file->ctx_list in close Leon Romanovsky
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a
wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming
that the refcount can only go to zero from ucma_destroy_id() which also
removes it from the xarray.

Use refcount_inc_not_zero() instead.
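
As a rough userspace illustration of the "increment only if not already
zero" idea (a sketch using C11 atomics, not the kernel's refcount_t
implementation; struct ref, ref_inc_not_zero() and ref_put() are
hypothetical names, and unlike refcount_t there is no saturation on
overflow):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's refcount_t. */
struct ref {
	atomic_uint count;
};

/* Take a reference only if the object is still live (count != 0). */
static bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	/* The count already hit zero: the object is being torn down. */
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(struct ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

A lookup path that gets false back treats the object as gone, which is
what the -ENXIO return in the diff below expresses.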

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index d03dacaef788..625168563443 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -153,8 +153,8 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
 	if (!IS_ERR(ctx)) {
 		if (ctx->closing)
 			ctx = ERR_PTR(-EIO);
-		else
-			refcount_inc(&ctx->ref);
+		else if (!refcount_inc_not_zero(&ctx->ref))
+			ctx = ERR_PTR(-ENXIO);
 	}
 	xa_unlock(&ctx_table);
 	return ctx;
-- 
2.26.2



* [PATCH rdma-next 02/14] RDMA/ucma: Remove unnecessary locking of file->ctx_list in close
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 01/14] RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 03/14] RDMA/ucma: Consolidate the two destroy flows Leon Romanovsky
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

By the time the file_operations release function runs, write() can no
longer be running concurrently, so remove the extra locking around the
ctx_list.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 625168563443..9b019f31743d 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1824,12 +1824,17 @@ static int ucma_close(struct inode *inode, struct file *filp)
 	struct ucma_file *file = filp->private_data;
 	struct ucma_context *ctx, *tmp;
 
-	mutex_lock(&file->mut);
+	/*
+	 * ctx_list can only be mutated under the write(), which is no longer
+	 * possible, so no locking needed.
+	 */
 	list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
+		xa_erase(&ctx_table, ctx->id);
+
+		mutex_lock(&file->mut);
 		ctx->destroying = 1;
 		mutex_unlock(&file->mut);
 
-		xa_erase(&ctx_table, ctx->id);
 		flush_workqueue(file->close_wq);
 		/* At that step once ctx was marked as destroying and workqueue
 		 * was flushed we are safe from any inflights handlers that
@@ -1849,9 +1854,7 @@ static int ucma_close(struct inode *inode, struct file *filp)
 		}
 
 		ucma_free_ctx(ctx);
-		mutex_lock(&file->mut);
 	}
-	mutex_unlock(&file->mut);
 	destroy_workqueue(file->close_wq);
 	kfree(file);
 	return 0;
-- 
2.26.2



* [PATCH rdma-next 03/14] RDMA/ucma: Consolidate the two destroy flows
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 01/14] RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 02/14] RDMA/ucma: Remove unnecessary locking of file->ctx_list in close Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 04/14] RDMA/ucma: Fix error cases around ucma_alloc_ctx() Leon Romanovsky
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

ucma_close() open codes the tail end of ucma_destroy_id(). Consolidate
the duplicated code into a single function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 64 ++++++++++++----------------------
 1 file changed, 22 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9b019f31743d..878cbb94065f 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -601,6 +601,26 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 	return events_reported;
 }
 
+static int __destroy_id(struct ucma_context *ctx)
+{
+	mutex_lock(&ctx->file->mut);
+	ctx->destroying = 1;
+	mutex_unlock(&ctx->file->mut);
+
+	flush_workqueue(ctx->file->close_wq);
+	/* At this point it's guaranteed that there is no inflight closing task */
+	xa_lock(&ctx_table);
+	if (!ctx->closing) {
+		xa_unlock(&ctx_table);
+		ucma_put_ctx(ctx);
+		wait_for_completion(&ctx->comp);
+		rdma_destroy_id(ctx->cm_id);
+	} else {
+		xa_unlock(&ctx_table);
+	}
+	return ucma_free_ctx(ctx);
+}
+
 static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
 			       int in_len, int out_len)
 {
@@ -624,24 +644,7 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
-	mutex_lock(&ctx->file->mut);
-	ctx->destroying = 1;
-	mutex_unlock(&ctx->file->mut);
-
-	flush_workqueue(ctx->file->close_wq);
-	/* At this point it's guaranteed that there is no inflight
-	 * closing task */
-	xa_lock(&ctx_table);
-	if (!ctx->closing) {
-		xa_unlock(&ctx_table);
-		ucma_put_ctx(ctx);
-		wait_for_completion(&ctx->comp);
-		rdma_destroy_id(ctx->cm_id);
-	} else {
-		xa_unlock(&ctx_table);
-	}
-
-	resp.events_reported = ucma_free_ctx(ctx);
+	resp.events_reported = __destroy_id(ctx);
 	if (copy_to_user(u64_to_user_ptr(cmd.response),
 			 &resp, sizeof(resp)))
 		ret = -EFAULT;
@@ -1830,30 +1833,7 @@ static int ucma_close(struct inode *inode, struct file *filp)
 	 */
 	list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
 		xa_erase(&ctx_table, ctx->id);
-
-		mutex_lock(&file->mut);
-		ctx->destroying = 1;
-		mutex_unlock(&file->mut);
-
-		flush_workqueue(file->close_wq);
-		/* At that step once ctx was marked as destroying and workqueue
-		 * was flushed we are safe from any inflights handlers that
-		 * might put other closing task.
-		 */
-		xa_lock(&ctx_table);
-		if (!ctx->closing) {
-			xa_unlock(&ctx_table);
-			ucma_put_ctx(ctx);
-			wait_for_completion(&ctx->comp);
-			/* rdma_destroy_id ensures that no event handlers are
-			 * inflight for that id before releasing it.
-			 */
-			rdma_destroy_id(ctx->cm_id);
-		} else {
-			xa_unlock(&ctx_table);
-		}
-
-		ucma_free_ctx(ctx);
+		__destroy_id(ctx);
 	}
 	destroy_workqueue(file->close_wq);
 	kfree(file);
-- 
2.26.2



* [PATCH rdma-next 04/14] RDMA/ucma: Fix error cases around ucma_alloc_ctx()
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 03/14] RDMA/ucma: Consolidate the two destroy flows Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 05/14] RDMA/ucma: Remove mc_list and rely on xarray Leon Romanovsky
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

The store to ctx->cm_id was based on the idea that _ucma_find_context()
would not return the ctx until it was fully set up.

Without locking this doesn't work properly.

Split things so that the xarray entry is allocated with NULL to reserve
the ID and, once everything is final, the cm_id is set and the ctx is
stored.

Along the way this shows that the error unwind in ucma_get_event() is
wrong when a new ctx is created, so fix it up.
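
The reserve-then-publish split can be sketched in plain C as follows.
This is a single-threaded illustration with a fixed-size array standing
in for the xarray; the table_*() helpers are hypothetical names and no
locking is shown, whereas the kernel xarray brings its own:

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct slot {
	int reserved;	/* the ID is taken */
	void *entry;	/* NULL until the object is fully set up */
};

static struct slot table[TABLE_SIZE];

/* Reserve an ID without making anything visible to lookups. */
static int table_reserve(void)
{
	for (int id = 0; id < TABLE_SIZE; id++) {
		if (!table[id].reserved) {
			table[id].reserved = 1;
			table[id].entry = NULL;
			return id;
		}
	}
	return -1;
}

/* Publish the fully initialised object under its reserved ID. */
static void table_publish(int id, void *obj)
{
	table[id].entry = obj;
}

/* Error unwind: give the ID back, whether or not it was published. */
static void table_release(int id)
{
	table[id].entry = NULL;
	table[id].reserved = 0;
}

/* Lookups see NULL for IDs that are reserved but not yet published. */
static void *table_lookup(int id)
{
	if (id < 0 || id >= TABLE_SIZE)
		return NULL;
	return table[id].entry;
}
```

Because a lookup of a reserved ID returns NULL, the lookup path no longer
needs a separate "is cm_id set yet" check.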

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 68 +++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 878cbb94065f..7416a5a6aa69 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -130,6 +130,7 @@ static DEFINE_XARRAY_ALLOC(ctx_table);
 static DEFINE_XARRAY_ALLOC(multicast_table);
 
 static const struct file_operations ucma_fops;
+static int __destroy_id(struct ucma_context *ctx);
 
 static inline struct ucma_context *_ucma_find_context(int id,
 						      struct ucma_file *file)
@@ -139,7 +140,7 @@ static inline struct ucma_context *_ucma_find_context(int id,
 	ctx = xa_load(&ctx_table, id);
 	if (!ctx)
 		ctx = ERR_PTR(-ENOENT);
-	else if (ctx->file != file || !ctx->cm_id)
+	else if (ctx->file != file)
 		ctx = ERR_PTR(-EINVAL);
 	return ctx;
 }
@@ -217,18 +218,23 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
 	refcount_set(&ctx->ref, 1);
 	init_completion(&ctx->comp);
 	INIT_LIST_HEAD(&ctx->mc_list);
+	/* So list_del() will work if we don't do ucma_finish_ctx() */
+	INIT_LIST_HEAD(&ctx->list);
 	ctx->file = file;
 	mutex_init(&ctx->mutex);
 
-	if (xa_alloc(&ctx_table, &ctx->id, ctx, xa_limit_32b, GFP_KERNEL))
-		goto error;
-
-	list_add_tail(&ctx->list, &file->ctx_list);
+	if (xa_alloc(&ctx_table, &ctx->id, NULL, xa_limit_32b, GFP_KERNEL)) {
+		kfree(ctx);
+		return NULL;
+	}
 	return ctx;
+}
 
-error:
-	kfree(ctx);
-	return NULL;
+static void ucma_finish_ctx(struct ucma_context *ctx)
+{
+	lockdep_assert_held(&ctx->file->mut);
+	list_add_tail(&ctx->list, &ctx->file->ctx_list);
+	xa_store(&ctx_table, ctx->id, ctx, GFP_KERNEL);
 }
 
 static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx)
@@ -399,7 +405,7 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
 			      int in_len, int out_len)
 {
-	struct ucma_context *ctx;
+	struct ucma_context *ctx = NULL;
 	struct rdma_ucm_get_event cmd;
 	struct ucma_event *uevent;
 	int ret = 0;
@@ -429,33 +435,46 @@ static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
 		mutex_lock(&file->mut);
 	}
 
-	uevent = list_entry(file->event_list.next, struct ucma_event, list);
+	uevent = list_first_entry(&file->event_list, struct ucma_event, list);
 
 	if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
 		ctx = ucma_alloc_ctx(file);
 		if (!ctx) {
 			ret = -ENOMEM;
-			goto done;
+			goto err_unlock;
 		}
-		uevent->ctx->backlog++;
-		ctx->cm_id = uevent->cm_id;
-		ctx->cm_id->context = ctx;
 		uevent->resp.id = ctx->id;
+		ctx->cm_id = uevent->cm_id;
 	}
 
 	if (copy_to_user(u64_to_user_ptr(cmd.response),
 			 &uevent->resp,
 			 min_t(size_t, out_len, sizeof(uevent->resp)))) {
 		ret = -EFAULT;
-		goto done;
+		goto err_ctx;
+	}
+
+	if (ctx) {
+		uevent->ctx->backlog++;
+		uevent->cm_id->context = ctx;
+		ucma_finish_ctx(ctx);
 	}
 
 	list_del(&uevent->list);
 	uevent->ctx->events_reported++;
 	if (uevent->mc)
 		uevent->mc->events_reported++;
+	mutex_unlock(&file->mut);
+
 	kfree(uevent);
-done:
+	return 0;
+
+err_ctx:
+	if (ctx) {
+		xa_erase(&ctx_table, ctx->id);
+		kfree(ctx);
+	}
+err_unlock:
 	mutex_unlock(&file->mut);
 	return ret;
 }
@@ -498,9 +517,7 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 	if (ret)
 		return ret;
 
-	mutex_lock(&file->mut);
 	ctx = ucma_alloc_ctx(file);
-	mutex_unlock(&file->mut);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -511,24 +528,23 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 		ret = PTR_ERR(cm_id);
 		goto err1;
 	}
+	ctx->cm_id = cm_id;
 
 	resp.id = ctx->id;
 	if (copy_to_user(u64_to_user_ptr(cmd.response),
 			 &resp, sizeof(resp))) {
-		ret = -EFAULT;
-		goto err2;
+		xa_erase(&ctx_table, ctx->id);
+		__destroy_id(ctx);
+		return -EFAULT;
 	}
 
-	ctx->cm_id = cm_id;
+	mutex_lock(&file->mut);
+	ucma_finish_ctx(ctx);
+	mutex_unlock(&file->mut);
 	return 0;
 
-err2:
-	rdma_destroy_id(cm_id);
 err1:
 	xa_erase(&ctx_table, ctx->id);
-	mutex_lock(&file->mut);
-	list_del(&ctx->list);
-	mutex_unlock(&file->mut);
 	kfree(ctx);
 	return ret;
 }
-- 
2.26.2



* [PATCH rdma-next 05/14] RDMA/ucma: Remove mc_list and rely on xarray
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 04/14] RDMA/ucma: Fix error cases around ucma_alloc_ctx() Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept() Leon Romanovsky
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

It is not really necessary to keep a linked list of mcs associated with
each context when we can just scan the xarray to find the right things.

This removes another overloading of file->mut by relying on the xarray
locking for mc instead.
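
A minimal userspace sketch of "scan the shared table and filter by owner"
in place of a per-owner linked list (single-threaded; a fixed array
stands in for the xarray, and struct owner, struct member, add_member()
and cleanup_owner() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MC_TABLE_SIZE 8

struct owner {
	int unused;
};

struct member {
	struct owner *owner;	/* back-pointer used as the filter key */
};

static struct member *mc_table[MC_TABLE_SIZE];

/* Store a new member for owner o; returns its index or -1 on failure. */
static int add_member(struct owner *o)
{
	for (int i = 0; i < MC_TABLE_SIZE; i++) {
		if (!mc_table[i]) {
			struct member *m = malloc(sizeof(*m));

			if (!m)
				return -1;
			m->owner = o;
			mc_table[i] = m;
			return i;
		}
	}
	return -1;
}

/* Free every member belonging to o by scanning the whole table. */
static int cleanup_owner(struct owner *o)
{
	int freed = 0;

	for (int i = 0; i < MC_TABLE_SIZE; i++) {
		struct member *m = mc_table[i];

		if (!m || m->owner != o)
			continue;
		mc_table[i] = NULL;
		free(m);
		freed++;
	}
	return freed;
}
```

The scan costs time proportional to the table size rather than to the
number of members of one owner. In exchange there is no second data
structure to keep consistent.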

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 59 +++++++++++++---------------------
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 7416a5a6aa69..dd12931f3038 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -96,7 +96,6 @@ struct ucma_context {
 	u64			uid;
 
 	struct list_head	list;
-	struct list_head	mc_list;
 	/* mark that device is in process of destroying the internal HW
 	 * resources, protected by the ctx_table lock
 	 */
@@ -113,7 +112,6 @@ struct ucma_multicast {
 
 	u64			uid;
 	u8			join_state;
-	struct list_head	list;
 	struct sockaddr_storage	addr;
 };
 
@@ -217,7 +215,6 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
 	INIT_WORK(&ctx->close_work, ucma_close_id);
 	refcount_set(&ctx->ref, 1);
 	init_completion(&ctx->comp);
-	INIT_LIST_HEAD(&ctx->mc_list);
 	/* So list_del() will work if we don't do ucma_finish_ctx() */
 	INIT_LIST_HEAD(&ctx->list);
 	ctx->file = file;
@@ -237,26 +234,6 @@ static void ucma_finish_ctx(struct ucma_context *ctx)
 	xa_store(&ctx_table, ctx->id, ctx, GFP_KERNEL);
 }
 
-static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx)
-{
-	struct ucma_multicast *mc;
-
-	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
-	if (!mc)
-		return NULL;
-
-	mc->ctx = ctx;
-	if (xa_alloc(&multicast_table, &mc->id, NULL, xa_limit_32b, GFP_KERNEL))
-		goto error;
-
-	list_add_tail(&mc->list, &ctx->mc_list);
-	return mc;
-
-error:
-	kfree(mc);
-	return NULL;
-}
-
 static void ucma_copy_conn_event(struct rdma_ucm_conn_param *dst,
 				 struct rdma_conn_param *src)
 {
@@ -551,21 +528,26 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 
 static void ucma_cleanup_multicast(struct ucma_context *ctx)
 {
-	struct ucma_multicast *mc, *tmp;
+	struct ucma_multicast *mc;
+	unsigned long index;
 
-	mutex_lock(&ctx->file->mut);
-	list_for_each_entry_safe(mc, tmp, &ctx->mc_list, list) {
-		list_del(&mc->list);
-		xa_erase(&multicast_table, mc->id);
+	xa_for_each(&multicast_table, index, mc) {
+		if (mc->ctx != ctx)
+			continue;
+		/*
+		 * At this point mc->ctx->ref is 0 so the mc cannot leave the
+		 * lock on the reader and this is enough serialization
+		 */
+		xa_erase(&multicast_table, index);
 		kfree(mc);
 	}
-	mutex_unlock(&ctx->file->mut);
 }
 
 static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
 {
 	struct ucma_event *uevent, *tmp;
 
+	mutex_lock(&mc->ctx->file->mut);
 	list_for_each_entry_safe(uevent, tmp, &mc->ctx->file->event_list, list) {
 		if (uevent->mc != mc)
 			continue;
@@ -573,6 +555,7 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
 		list_del(&uevent->list);
 		kfree(uevent);
 	}
+	mutex_unlock(&mc->ctx->file->mut);
 }
 
 /*
@@ -1501,15 +1484,23 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
-	mutex_lock(&file->mut);
-	mc = ucma_alloc_multicast(ctx);
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
 	if (!mc) {
 		ret = -ENOMEM;
 		goto err1;
 	}
+
+	mc->ctx = ctx;
 	mc->join_state = join_state;
 	mc->uid = cmd->uid;
 	memcpy(&mc->addr, addr, cmd->addr_size);
+
+	if (xa_alloc(&multicast_table, &mc->id, NULL, xa_limit_32b,
+		     GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto err1;
+	}
+
 	mutex_lock(&ctx->mutex);
 	ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *)&mc->addr,
 				  join_state, mc);
@@ -1526,7 +1517,6 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 
 	xa_store(&multicast_table, mc->id, mc, 0);
 
-	mutex_unlock(&file->mut);
 	ucma_put_ctx(ctx);
 	return 0;
 
@@ -1535,10 +1525,8 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 	ucma_cleanup_mc_events(mc);
 err2:
 	xa_erase(&multicast_table, mc->id);
-	list_del(&mc->list);
 	kfree(mc);
 err1:
-	mutex_unlock(&file->mut);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1617,10 +1605,7 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
 	rdma_leave_multicast(mc->ctx->cm_id, (struct sockaddr *) &mc->addr);
 	mutex_unlock(&mc->ctx->mutex);
 
-	mutex_lock(&mc->ctx->file->mut);
 	ucma_cleanup_mc_events(mc);
-	list_del(&mc->list);
-	mutex_unlock(&mc->ctx->file->mut);
 
 	ucma_put_ctx(mc->ctx);
 	resp.events_reported = mc->events_reported;
-- 
2.26.2



* [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept()
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 05/14] RDMA/ucma: Remove mc_list and rely on xarray Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2021-02-09 14:46   ` Chuck Lever
  2020-08-18 12:05 ` [PATCH rdma-next 07/14] RDMA/ucma: Do not use file->mut to lock destroying Leon Romanovsky
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

In almost all cases rdma_accept() is called under the handler_mutex by
ULPs from their handler callbacks. The one exception was ucma which did
not get the handler_mutex.

To make the locking scheme easier to understand, obtain the mutex for
ucma as well.

This improves how ucma works by allowing it to directly use handler_mutex
for some of its internal locking against the handler callbacks instead of
the global file->mut lock.

There does not seem to be a serious bug here, other than that a
DISCONNECT event can be delivered concurrently with accept succeeding.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c  | 25 ++++++++++++++++++++++---
 drivers/infiniband/core/ucma.c | 12 ++++++++----
 include/rdma/rdma_cm.h         |  5 +++++
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 26de0dab60bb..78641858abe2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4154,14 +4154,15 @@ static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
 int __rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
 		  const char *caller)
 {
-	struct rdma_id_private *id_priv;
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
 	int ret;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
+	lockdep_assert_held(&id_priv->handler_mutex);
 
 	rdma_restrack_set_task(&id_priv->res, caller);
 
-	if (!cma_comp(id_priv, RDMA_CM_CONNECT))
+	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
 		return -EINVAL;
 
 	if (!id->qp && conn_param) {
@@ -4214,6 +4215,24 @@ int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
 }
 EXPORT_SYMBOL(__rdma_accept_ece);
 
+void rdma_lock_handler(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
+
+	mutex_lock(&id_priv->handler_mutex);
+}
+EXPORT_SYMBOL(rdma_lock_handler);
+
+void rdma_unlock_handler(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
+
+	mutex_unlock(&id_priv->handler_mutex);
+}
+EXPORT_SYMBOL(rdma_unlock_handler);
+
 int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event)
 {
 	struct rdma_id_private *id_priv;
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index dd12931f3038..add1ece38739 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1162,16 +1162,20 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
 
 	if (cmd.conn_param.valid) {
 		ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
-		mutex_lock(&file->mut);
 		mutex_lock(&ctx->mutex);
+		rdma_lock_handler(ctx->cm_id);
 		ret = __rdma_accept_ece(ctx->cm_id, &conn_param, NULL, &ece);
-		mutex_unlock(&ctx->mutex);
-		if (!ret)
+		if (!ret) {
+			/* The uid must be set atomically with the handler */
 			ctx->uid = cmd.uid;
-		mutex_unlock(&file->mut);
+		}
+		rdma_unlock_handler(ctx->cm_id);
+		mutex_unlock(&ctx->mutex);
 	} else {
 		mutex_lock(&ctx->mutex);
+		rdma_lock_handler(ctx->cm_id);
 		ret = __rdma_accept_ece(ctx->cm_id, NULL, NULL, &ece);
+		rdma_unlock_handler(ctx->cm_id);
 		mutex_unlock(&ctx->mutex);
 	}
 	ucma_put_ctx(ctx);
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index cf5da2ae49bf..c1334c9a7aa8 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -253,6 +253,8 @@ int rdma_listen(struct rdma_cm_id *id, int backlog);
 int __rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
 		  const char *caller);
 
+void rdma_lock_handler(struct rdma_cm_id *id);
+void rdma_unlock_handler(struct rdma_cm_id *id);
 int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
 		      const char *caller, struct rdma_ucm_ece *ece);
 
@@ -270,6 +272,9 @@ int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
  * In the case of error, a reject message is sent to the remote side and the
  * state of the qp associated with the id is modified to error, such that any
  * previously posted receive buffers would be flushed.
+ *
+ * This function is for use by kernel ULPs and must be called from under the
+ * handler callback.
  */
 #define rdma_accept(id, conn_param) \
 	__rdma_accept((id), (conn_param),  KBUILD_MODNAME)
-- 
2.26.2



* [PATCH rdma-next 07/14] RDMA/ucma: Do not use file->mut to lock destroying
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (5 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept() Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 08/14] RDMA/ucma: Fix the locking of ctx->file Leon Romanovsky
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

The only reader of destroying is inside a handler under the handler_mutex,
so directly use the handler_mutex when setting it instead of the larger
file->mut.

As the refcount could be zero here, and the cm_id already freed, an
additional refcount grab around the locking is required to touch the
cm_id.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index add1ece38739..18285941aec3 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -602,9 +602,17 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 
 static int __destroy_id(struct ucma_context *ctx)
 {
-	mutex_lock(&ctx->file->mut);
-	ctx->destroying = 1;
-	mutex_unlock(&ctx->file->mut);
+	/*
+	 * If the refcount is already 0 then ucma_close_id() has already
+	 * destroyed the cm_id, otherwise holding the refcount keeps cm_id
+	 * valid. Prevent queue_work() from being called.
+	 */
+	if (refcount_inc_not_zero(&ctx->ref)) {
+		rdma_lock_handler(ctx->cm_id);
+		ctx->destroying = 1;
+		rdma_unlock_handler(ctx->cm_id);
+		ucma_put_ctx(ctx);
+	}
 
 	flush_workqueue(ctx->file->close_wq);
 	/* At this point it's guaranteed that there is no inflight closing task */
-- 
2.26.2



* [PATCH rdma-next 08/14] RDMA/ucma: Fix the locking of ctx->file
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (6 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 07/14] RDMA/ucma: Do not use file->mut to lock destroying Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 09/14] RDMA/ucma: Fix locking for ctx->events_reported Leon Romanovsky
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

ctx->file is changed under the file->mut lock by ucma_migrate_id(), which
is impossible to lock correctly. Instead change ctx->file under the
handler_mutex and ctx_table lock and revise all places touching ctx->file
to use this locking when reading ctx->file.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 18285941aec3..f7ec71225e87 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -547,6 +547,7 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
 {
 	struct ucma_event *uevent, *tmp;
 
+	rdma_lock_handler(mc->ctx->cm_id);
 	mutex_lock(&mc->ctx->file->mut);
 	list_for_each_entry_safe(uevent, tmp, &mc->ctx->file->event_list, list) {
 		if (uevent->mc != mc)
@@ -556,6 +557,7 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
 		kfree(uevent);
 	}
 	mutex_unlock(&mc->ctx->file->mut);
+	rdma_unlock_handler(mc->ctx->cm_id);
 }
 
 /*
@@ -1600,7 +1602,7 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
 	mc = xa_load(&multicast_table, cmd.id);
 	if (!mc)
 		mc = ERR_PTR(-ENOENT);
-	else if (mc->ctx->file != file)
+	else if (READ_ONCE(mc->ctx->file) != file)
 		mc = ERR_PTR(-EINVAL);
 	else if (!refcount_inc_not_zero(&mc->ctx->ref))
 		mc = ERR_PTR(-ENXIO);
@@ -1692,6 +1694,7 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 		goto file_put;
 	}
 
+	rdma_lock_handler(ctx->cm_id);
 	cur_file = ctx->file;
 	if (cur_file == new_file) {
 		resp.events_reported = ctx->events_reported;
@@ -1718,6 +1721,7 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 			 &resp, sizeof(resp)))
 		ret = -EFAULT;
 
+	rdma_unlock_handler(ctx->cm_id);
 	ucma_put_ctx(ctx);
 file_put:
 	fdput(f);
-- 
2.26.2



* [PATCH rdma-next 09/14] RDMA/ucma: Fix locking for ctx->events_reported
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (7 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 08/14] RDMA/ucma: Fix the locking of ctx->file Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 10/14] RDMA/ucma: Add missing locking around rdma_leave_multicast() Leon Romanovsky
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, Roland Dreier, Sean Hefty

From: Jason Gunthorpe <jgg@nvidia.com>

This value is protected by file->mut, so ensure the mutex is held whenever
it is touched.

The case in ucma_migrate_id() is a race. In ucma_free_ctx() the write side
can no longer run, so moving the read there is just for clarity.

Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index f7ec71225e87..ca5c44cac48c 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -587,6 +587,7 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 			list_move_tail(&uevent->list, &list);
 	}
 	list_del(&ctx->list);
+	events_reported = ctx->events_reported;
 	mutex_unlock(&ctx->file->mut);
 
 	list_for_each_entry_safe(uevent, tmp, &list, list) {
@@ -596,7 +597,6 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 		kfree(uevent);
 	}
 
-	events_reported = ctx->events_reported;
 	mutex_destroy(&ctx->mutex);
 	kfree(ctx);
 	return events_reported;
@@ -1697,7 +1697,9 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 	rdma_lock_handler(ctx->cm_id);
 	cur_file = ctx->file;
 	if (cur_file == new_file) {
+		mutex_lock(&cur_file->mut);
 		resp.events_reported = ctx->events_reported;
+		mutex_unlock(&cur_file->mut);
 		goto response;
 	}
 
-- 
2.26.2
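
The ucma_free_ctx() half of patch 09/14 follows a common pattern: copy the
protected field into a local while the mutex is still held, then use only
the local once the object is gone. The following is a minimal userspace
sketch of that idea, assuming a pthread mutex in place of file->mut. The
names struct file_state, struct ctx and ctx_free() are hypothetical and are
not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ucma_file: only the mutex matters here. */
struct file_state {
	pthread_mutex_t mut;
};

/* Hypothetical stand-in for struct ucma_context. */
struct ctx {
	struct file_state *file;
	int events_reported;	/* protected by file->mut */
};

/* Free the context and return its final event count. */
static int ctx_free(struct ctx *ctx)
{
	int events_reported;

	assert(ctx && ctx->file);

	/* Snapshot the protected field while the lock is held. */
	pthread_mutex_lock(&ctx->file->mut);
	events_reported = ctx->events_reported;
	pthread_mutex_unlock(&ctx->file->mut);

	/* After this point only the local copy may be used. */
	free(ctx);
	return events_reported;
}
```

Taking the snapshot inside the critical section keeps the rule "this field
is only touched under the mutex" true everywhere, even on a path where no
writer can run any more.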



* [PATCH rdma-next 10/14] RDMA/ucma: Add missing locking around rdma_leave_multicast()
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (8 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 09/14] RDMA/ucma: Fix locking for ctx->events_reported Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 11/14] RDMA/ucma: Change backlog into an atomic Leon Romanovsky
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

All entry points to the rdma_cm from a ULP must be single threaded,
even this error unwind. Add the missing locking.

Fixes: 7c11910783a1 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index ca5c44cac48c..ad78b05de656 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1535,7 +1535,9 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 	return 0;
 
 err3:
+	mutex_lock(&ctx->mutex);
 	rdma_leave_multicast(ctx->cm_id, (struct sockaddr *) &mc->addr);
+	mutex_unlock(&ctx->mutex);
 	ucma_cleanup_mc_events(mc);
 err2:
 	xa_erase(&multicast_table, mc->id);
-- 
2.26.2



* [PATCH rdma-next 11/14] RDMA/ucma: Change backlog into an atomic
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (9 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 10/14] RDMA/ucma: Add missing locking around rdma_leave_multicast() Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 12/14] RDMA/ucma: Narrow file->mut in ucma_event_handler() Leon Romanovsky
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

There is no reason to grab the file->mut just to do this inc/dec work. Use
an atomic.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index ad78b05de656..8be8ff14ab62 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -88,7 +88,7 @@ struct ucma_context {
 	struct completion	comp;
 	refcount_t		ref;
 	int			events_reported;
-	int			backlog;
+	atomic_t		backlog;
 
 	struct ucma_file	*file;
 	struct rdma_cm_id	*cm_id;
@@ -348,12 +348,11 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 	uevent->resp.ece.attr_mod = event->ece.attr_mod;
 
 	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
-		if (!ctx->backlog) {
+		if (!atomic_add_unless(&ctx->backlog, -1, 0)) {
 			ret = -ENOMEM;
 			kfree(uevent);
 			goto out;
 		}
-		ctx->backlog--;
 	} else if (!ctx->uid || ctx->cm_id != cm_id) {
 		/*
 		 * We ignore events for new connections until userspace has set
@@ -432,7 +431,7 @@ static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
 	}
 
 	if (ctx) {
-		uevent->ctx->backlog++;
+		atomic_inc(&uevent->ctx->backlog);
 		uevent->cm_id->context = ctx;
 		ucma_finish_ctx(ctx);
 	}
@@ -1136,10 +1135,12 @@ static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
-	ctx->backlog = cmd.backlog > 0 && cmd.backlog < max_backlog ?
-		       cmd.backlog : max_backlog;
+	if (cmd.backlog <= 0 || cmd.backlog > max_backlog)
+		cmd.backlog = max_backlog;
+	atomic_set(&ctx->backlog, cmd.backlog);
+
 	mutex_lock(&ctx->mutex);
-	ret = rdma_listen(ctx->cm_id, ctx->backlog);
+	ret = rdma_listen(ctx->cm_id, cmd.backlog);
 	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
-- 
2.26.2
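
Patch 11/14 replaces a "check, then decrement" pair done under file->mut
with a single atomic_add_unless(&ctx->backlog, -1, 0). The following is a
userspace sketch of that operation using C11 atomics. add_unless() is an
assumed equivalent of the kernel helper, written as a compare-and-swap loop,
and backlog_init(), backlog_take() and backlog_give() are hypothetical
wrappers that mirror ucma_listen(), the event handler and ucma_get_event()
respectively.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the addition was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	do {
		if (cur == u)
			return false;
		/* On failure the CAS reloads 'cur' and the check repeats. */
	} while (!atomic_compare_exchange_weak(v, &cur, cur + a));
	return true;
}

/* Clamp the requested backlog the same way the patched ucma_listen() does. */
static void backlog_init(atomic_int *backlog, int requested, int max)
{
	assert(max > 0);
	if (requested <= 0 || requested > max)
		requested = max;
	atomic_store(backlog, requested);
}

/* Claim one slot for a new connection request; false means "full". */
static bool backlog_take(atomic_int *backlog)
{
	return add_unless(backlog, -1, 0);
}

/* Return a slot once the request has been handed to the consumer. */
static void backlog_give(atomic_int *backlog)
{
	atomic_fetch_add(backlog, 1);
}
```

The point of the combined operation is that the zero test and the decrement
cannot be separated by another thread, so the counter never goes negative
without any lock being held.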



* [PATCH rdma-next 12/14] RDMA/ucma: Narrow file->mut in ucma_event_handler()
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (10 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 11/14] RDMA/ucma: Change backlog into an atomic Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 13/14] RDMA/ucma: Rework how new connections are passed through event delivery Leon Romanovsky
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

Since the backlog is now an atomic, file->mut only protects the event_list
and ctx_list. Narrow its scope to make that clear.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 8be8ff14ab62..32e82bcffccd 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -283,7 +283,6 @@ static void ucma_set_event_context(struct ucma_context *ctx,
 	}
 }
 
-/* Called with file->mut locked for the relevant context. */
 static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
 {
 	struct ucma_context *ctx = cm_id->context;
@@ -307,6 +306,7 @@ static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
 		return;
 	}
 
+	mutex_lock(&ctx->file->mut);
 	list_for_each_entry(con_req_eve, &ctx->file->event_list, list) {
 		if (con_req_eve->cm_id == cm_id &&
 		    con_req_eve->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
@@ -317,6 +317,7 @@ static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
 			break;
 		}
 	}
+	mutex_unlock(&ctx->file->mut);
 	if (!event_found)
 		pr_err("ucma_removal_event_handler: warning: connect request event wasn't found\n");
 }
@@ -326,13 +327,11 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 {
 	struct ucma_event *uevent;
 	struct ucma_context *ctx = cm_id->context;
-	int ret = 0;
 
 	uevent = kzalloc(sizeof(*uevent), GFP_KERNEL);
 	if (!uevent)
 		return event->event == RDMA_CM_EVENT_CONNECT_REQUEST;
 
-	mutex_lock(&ctx->file->mut);
 	uevent->cm_id = cm_id;
 	ucma_set_event_context(ctx, event, uevent);
 	uevent->resp.event = event->event;
@@ -349,9 +348,8 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 
 	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
 		if (!atomic_add_unless(&ctx->backlog, -1, 0)) {
-			ret = -ENOMEM;
 			kfree(uevent);
-			goto out;
+			return -ENOMEM;
 		}
 	} else if (!ctx->uid || ctx->cm_id != cm_id) {
 		/*
@@ -366,16 +364,16 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 			ucma_removal_event_handler(cm_id);
 
 		kfree(uevent);
-		goto out;
+		return 0;
 	}
 
+	mutex_lock(&ctx->file->mut);
 	list_add_tail(&uevent->list, &ctx->file->event_list);
+	mutex_unlock(&ctx->file->mut);
 	wake_up_interruptible(&ctx->file->poll_wait);
 	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
 		ucma_removal_event_handler(cm_id);
-out:
-	mutex_unlock(&ctx->file->mut);
-	return ret;
+	return 0;
 }
 
 static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
-- 
2.26.2
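
The narrowing in patch 12/14 amounts to building the event with no lock
held and taking the mutex only for the list insertion. The following is a
rough userspace sketch of that shape, assuming a pthread mutex and a simple
singly linked tail queue in place of file->mut and the kernel list_head. All
type and function names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct event {
	int type;
	struct event *next;
};

struct event_queue {
	pthread_mutex_t mut;	/* protects head, tail and len only */
	struct event *head;
	struct event **tail;
	size_t len;
};

static void queue_init(struct event_queue *q)
{
	pthread_mutex_init(&q->mut, NULL);
	q->head = NULL;
	q->tail = &q->head;
	q->len = 0;
}

/* Returns 0 on success, -1 if the event could not be allocated. */
static int queue_event(struct event_queue *q, int type)
{
	struct event *ev;

	assert(q);

	/* Allocation and setup need no lock: nobody else can see 'ev'. */
	ev = calloc(1, sizeof(*ev));
	if (!ev)
		return -1;
	ev->type = type;

	/* The critical section covers only the shared list. */
	pthread_mutex_lock(&q->mut);
	*q->tail = ev;
	q->tail = &ev->next;
	q->len++;
	pthread_mutex_unlock(&q->mut);
	return 0;
}
```

Because every early exit now happens before the lock is taken, the error
paths can return directly, which is why the patch can drop the "out:" label.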



* [PATCH rdma-next 13/14] RDMA/ucma: Rework how new connections are passed through event delivery
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (11 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 12/14] RDMA/ucma: Narrow file->mut in ucma_event_handler() Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-18 12:05 ` [PATCH rdma-next 14/14] RDMA/ucma: Remove closing and the close_wq Leon Romanovsky
  2020-08-27 11:39 ` [PATCH rdma-next 00/14] Cleanup locking and events in ucma Jason Gunthorpe
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

When a new connection is established the RDMA CM creates a new cm_id and
passes it through to the event handler. However inside the UCMA the new ID
is not assigned a ucma_context until the user retrieves the event from a
syscall.

This creates a weird edge condition where a cm_id's context can continue
to point at the listening_id that created it, and a number of additional
edge conditions on event list clean up related to destroying half created
IDs.

There is also a race condition in ucma_get_event() where the
cm_id->context is being assigned without holding the handler_mutex.

Simplify all of this by creating the ucma_context inside the event handler
itself and eliminating the edge case of a half created cm_id. All cm_id's
can be uniformly destroyed via __destroy_id() or via the close_work.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 222 ++++++++++++++-------------------
 1 file changed, 96 insertions(+), 126 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 32e82bcffccd..40539c9625d6 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -117,11 +117,10 @@ struct ucma_multicast {
 
 struct ucma_event {
 	struct ucma_context	*ctx;
+	struct ucma_context	*listen_ctx;
 	struct ucma_multicast	*mc;
 	struct list_head	list;
-	struct rdma_cm_id	*cm_id;
 	struct rdma_ucm_event_resp resp;
-	struct work_struct	close_work;
 };
 
 static DEFINE_XARRAY_ALLOC(ctx_table);
@@ -182,14 +181,6 @@ static struct ucma_context *ucma_get_ctx_dev(struct ucma_file *file, int id)
 	return ctx;
 }
 
-static void ucma_close_event_id(struct work_struct *work)
-{
-	struct ucma_event *uevent_close =  container_of(work, struct ucma_event, close_work);
-
-	rdma_destroy_id(uevent_close->cm_id);
-	kfree(uevent_close);
-}
-
 static void ucma_close_id(struct work_struct *work)
 {
 	struct ucma_context *ctx =  container_of(work, struct ucma_context, close_work);
@@ -263,10 +254,15 @@ static void ucma_copy_ud_event(struct ib_device *device,
 	dst->qkey = src->qkey;
 }
 
-static void ucma_set_event_context(struct ucma_context *ctx,
-				   struct rdma_cm_event *event,
-				   struct ucma_event *uevent)
+static struct ucma_event *ucma_create_uevent(struct ucma_context *ctx,
+					     struct rdma_cm_event *event)
 {
+	struct ucma_event *uevent;
+
+	uevent = kzalloc(sizeof(*uevent), GFP_KERNEL);
+	if (!uevent)
+		return NULL;
+
 	uevent->ctx = ctx;
 	switch (event->event) {
 	case RDMA_CM_EVENT_MULTICAST_JOIN:
@@ -281,45 +277,56 @@ static void ucma_set_event_context(struct ucma_context *ctx,
 		uevent->resp.id = ctx->id;
 		break;
 	}
+	uevent->resp.event = event->event;
+	uevent->resp.status = event->status;
+	if (ctx->cm_id->qp_type == IB_QPT_UD)
+		ucma_copy_ud_event(ctx->cm_id->device, &uevent->resp.param.ud,
+				   &event->param.ud);
+	else
+		ucma_copy_conn_event(&uevent->resp.param.conn,
+				     &event->param.conn);
+
+	uevent->resp.ece.vendor_id = event->ece.vendor_id;
+	uevent->resp.ece.attr_mod = event->ece.attr_mod;
+	return uevent;
 }
 
-static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
+static int ucma_connect_event_handler(struct rdma_cm_id *cm_id,
+				      struct rdma_cm_event *event)
 {
-	struct ucma_context *ctx = cm_id->context;
-	struct ucma_event *con_req_eve;
-	int event_found = 0;
+	struct ucma_context *listen_ctx = cm_id->context;
+	struct ucma_context *ctx;
+	struct ucma_event *uevent;
 
-	if (ctx->destroying)
-		return;
+	if (!atomic_add_unless(&listen_ctx->backlog, -1, 0))
+		return -ENOMEM;
+	ctx = ucma_alloc_ctx(listen_ctx->file);
+	if (!ctx)
+		goto err_backlog;
+	ctx->cm_id = cm_id;
 
-	/* only if context is pointing to cm_id that it owns it and can be
-	 * queued to be closed, otherwise that cm_id is an inflight one that
-	 * is part of that context event list pending to be detached and
-	 * reattached to its new context as part of ucma_get_event,
-	 * handled separately below.
-	 */
-	if (ctx->cm_id == cm_id) {
-		xa_lock(&ctx_table);
-		ctx->closing = 1;
-		xa_unlock(&ctx_table);
-		queue_work(ctx->file->close_wq, &ctx->close_work);
-		return;
-	}
+	uevent = ucma_create_uevent(listen_ctx, event);
+	if (!uevent)
+		goto err_alloc;
+	uevent->listen_ctx = listen_ctx;
+	uevent->resp.id = ctx->id;
+
+	ctx->cm_id->context = ctx;
 
 	mutex_lock(&ctx->file->mut);
-	list_for_each_entry(con_req_eve, &ctx->file->event_list, list) {
-		if (con_req_eve->cm_id == cm_id &&
-		    con_req_eve->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
-			list_del(&con_req_eve->list);
-			INIT_WORK(&con_req_eve->close_work, ucma_close_event_id);
-			queue_work(ctx->file->close_wq, &con_req_eve->close_work);
-			event_found = 1;
-			break;
-		}
-	}
+	ucma_finish_ctx(ctx);
+	list_add_tail(&uevent->list, &ctx->file->event_list);
 	mutex_unlock(&ctx->file->mut);
-	if (!event_found)
-		pr_err("ucma_removal_event_handler: warning: connect request event wasn't found\n");
+	wake_up_interruptible(&ctx->file->poll_wait);
+	return 0;
+
+err_alloc:
+	xa_erase(&ctx_table, ctx->id);
+	kfree(ctx);
+err_backlog:
+	atomic_inc(&listen_ctx->backlog);
+	/* Returning error causes the new ID to be destroyed */
+	return -ENOMEM;
 }
 
 static int ucma_event_handler(struct rdma_cm_id *cm_id,
@@ -328,61 +335,41 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 	struct ucma_event *uevent;
 	struct ucma_context *ctx = cm_id->context;
 
-	uevent = kzalloc(sizeof(*uevent), GFP_KERNEL);
-	if (!uevent)
-		return event->event == RDMA_CM_EVENT_CONNECT_REQUEST;
+	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST)
+		return ucma_connect_event_handler(cm_id, event);
 
-	uevent->cm_id = cm_id;
-	ucma_set_event_context(ctx, event, uevent);
-	uevent->resp.event = event->event;
-	uevent->resp.status = event->status;
-	if (cm_id->qp_type == IB_QPT_UD)
-		ucma_copy_ud_event(cm_id->device, &uevent->resp.param.ud,
-				   &event->param.ud);
-	else
-		ucma_copy_conn_event(&uevent->resp.param.conn,
-				     &event->param.conn);
-
-	uevent->resp.ece.vendor_id = event->ece.vendor_id;
-	uevent->resp.ece.attr_mod = event->ece.attr_mod;
-
-	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
-		if (!atomic_add_unless(&ctx->backlog, -1, 0)) {
-			kfree(uevent);
-			return -ENOMEM;
-		}
-	} else if (!ctx->uid || ctx->cm_id != cm_id) {
-		/*
-		 * We ignore events for new connections until userspace has set
-		 * their context.  This can only happen if an error occurs on a
-		 * new connection before the user accepts it.  This is okay,
-		 * since the accept will just fail later. However, we do need
-		 * to release the underlying HW resources in case of a device
-		 * removal event.
-		 */
-		if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
-			ucma_removal_event_handler(cm_id);
-
-		kfree(uevent);
-		return 0;
+	/*
+	 * We ignore events for new connections until userspace has set their
+	 * context.  This can only happen if an error occurs on a new connection
+	 * before the user accepts it.  This is okay, since the accept will just
+	 * fail later. However, we do need to release the underlying HW
+	 * resources in case of a device removal event.
+	 */
+	if (ctx->uid) {
+		uevent = ucma_create_uevent(ctx, event);
+		if (!uevent)
+			return 0;
+
+		mutex_lock(&ctx->file->mut);
+		list_add_tail(&uevent->list, &ctx->file->event_list);
+		mutex_unlock(&ctx->file->mut);
+		wake_up_interruptible(&ctx->file->poll_wait);
 	}
 
-	mutex_lock(&ctx->file->mut);
-	list_add_tail(&uevent->list, &ctx->file->event_list);
-	mutex_unlock(&ctx->file->mut);
-	wake_up_interruptible(&ctx->file->poll_wait);
-	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
-		ucma_removal_event_handler(cm_id);
+	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL && !ctx->destroying) {
+		xa_lock(&ctx_table);
+		ctx->closing = 1;
+		xa_unlock(&ctx_table);
+		queue_work(ctx->file->close_wq, &ctx->close_work);
+	}
 	return 0;
 }
 
 static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
 			      int in_len, int out_len)
 {
-	struct ucma_context *ctx = NULL;
 	struct rdma_ucm_get_event cmd;
 	struct ucma_event *uevent;
-	int ret = 0;
 
 	/*
 	 * Old 32 bit user space does not send the 4 byte padding in the
@@ -411,46 +398,23 @@ static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
 
 	uevent = list_first_entry(&file->event_list, struct ucma_event, list);
 
-	if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
-		ctx = ucma_alloc_ctx(file);
-		if (!ctx) {
-			ret = -ENOMEM;
-			goto err_unlock;
-		}
-		uevent->resp.id = ctx->id;
-		ctx->cm_id = uevent->cm_id;
-	}
-
 	if (copy_to_user(u64_to_user_ptr(cmd.response),
 			 &uevent->resp,
 			 min_t(size_t, out_len, sizeof(uevent->resp)))) {
-		ret = -EFAULT;
-		goto err_ctx;
-	}
-
-	if (ctx) {
-		atomic_inc(&uevent->ctx->backlog);
-		uevent->cm_id->context = ctx;
-		ucma_finish_ctx(ctx);
+		mutex_unlock(&file->mut);
+		return -EFAULT;
 	}
 
 	list_del(&uevent->list);
 	uevent->ctx->events_reported++;
 	if (uevent->mc)
 		uevent->mc->events_reported++;
+	if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST)
+		atomic_inc(&uevent->ctx->backlog);
 	mutex_unlock(&file->mut);
 
 	kfree(uevent);
 	return 0;
-
-err_ctx:
-	if (ctx) {
-		xa_erase(&ctx_table, ctx->id);
-		kfree(ctx);
-	}
-err_unlock:
-	mutex_unlock(&file->mut);
-	return ret;
 }
 
 static int ucma_get_qp_type(struct rdma_ucm_create_id *cmd, enum ib_qp_type *qp_type)
@@ -562,10 +526,6 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
  * this point, no new events will be reported from the hardware. However, we
  * still need to cleanup the UCMA context for this ID. Specifically, there
  * might be events that have not yet been consumed by the user space software.
- * These might include pending connect requests which we have not completed
- * processing.  We cannot call rdma_destroy_id while holding the lock of the
- * context (file->mut), as it might cause a deadlock. We therefore extract all
- * relevant events from the context pending events list while holding the
  * mutex. After that we release them as needed.
  */
 static int ucma_free_ctx(struct ucma_context *ctx)
@@ -574,23 +534,27 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 	struct ucma_event *uevent, *tmp;
 	LIST_HEAD(list);
 
-
 	ucma_cleanup_multicast(ctx);
 
 	/* Cleanup events not yet reported to the user. */
 	mutex_lock(&ctx->file->mut);
 	list_for_each_entry_safe(uevent, tmp, &ctx->file->event_list, list) {
-		if (uevent->ctx == ctx)
+		if (uevent->ctx == ctx || uevent->listen_ctx == ctx)
 			list_move_tail(&uevent->list, &list);
 	}
 	list_del(&ctx->list);
 	events_reported = ctx->events_reported;
 	mutex_unlock(&ctx->file->mut);
 
+	/*
+	 * If this was a listening ID then any connections spawned from it
+	 * that have not been delivered to userspace are cleaned up too.
+	 * Must be done outside any locks.
+	 */
 	list_for_each_entry_safe(uevent, tmp, &list, list) {
 		list_del(&uevent->list);
 		if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST)
-			rdma_destroy_id(uevent->cm_id);
+			__destroy_id(uevent->ctx);
 		kfree(uevent);
 	}
 
@@ -1845,13 +1809,19 @@ static int ucma_open(struct inode *inode, struct file *filp)
 static int ucma_close(struct inode *inode, struct file *filp)
 {
 	struct ucma_file *file = filp->private_data;
-	struct ucma_context *ctx, *tmp;
 
 	/*
-	 * ctx_list can only be mutated under the write(), which is no longer
-	 * possible, so no locking needed.
+	 * All paths that touch ctx_list or ctx_list starting from write() are
+	 * prevented by this being a FD release function. The list_add_tail() in
+	 * ucma_connect_event_handler() can run concurrently, however it only
+	 * adds to the list *after* a listening ID. By only reading the first of
+	 * the list, and relying on __destroy_id() to block
+	 * ucma_connect_event_handler(), no additional locking is needed.
 	 */
-	list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
+	while (!list_empty(&file->ctx_list)) {
+		struct ucma_context *ctx = list_first_entry(
+			&file->ctx_list, struct ucma_context, list);
+
 		xa_erase(&ctx_table, ctx->id);
 		__destroy_id(ctx);
 	}
-- 
2.26.2
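
The new ucma_connect_event_handler() in patch 13/14 claims a backlog slot,
allocates the context, allocates the event, and unwinds in reverse order on
failure. The following is a simplified userspace sketch of that control
flow. It is single threaded, uses a plain int for the backlog, and injects
allocation failures through a hypothetical fail_after hook. None of the
names below are the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct listener {
	int backlog;		/* free slots for pending connections */
};

struct conn {
	int id;
};

struct event {
	struct conn *conn;		/* the new connection */
	struct listener *listen;	/* the listener that spawned it */
};

/* Test hook: -1 never fails, N >= 0 fails after N successful allocations. */
static int fail_after = -1;

static void *try_alloc(size_t n)
{
	if (fail_after == 0)
		return NULL;
	if (fail_after > 0)
		fail_after--;
	return calloc(1, n);
}

/* Returns 0 and stores the event in *out, or -ENOMEM with *out untouched. */
static int handle_connect(struct listener *l, struct event **out)
{
	struct conn *c;
	struct event *ev;

	assert(l && out);

	if (l->backlog == 0)
		return -ENOMEM;
	l->backlog--;

	c = try_alloc(sizeof(*c));
	if (!c)
		goto err_backlog;

	ev = try_alloc(sizeof(*ev));
	if (!ev)
		goto err_alloc;

	ev->conn = c;
	ev->listen = l;
	*out = ev;
	return 0;

err_alloc:
	free(c);
err_backlog:
	l->backlog++;	/* give the slot back, as the patch does */
	return -ENOMEM;
}
```

Doing all of the allocation in the handler means there is never a queued
event whose connection lacks a context, which is the half-created state the
commit message sets out to remove.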



* [PATCH rdma-next 14/14] RDMA/ucma: Remove closing and the close_wq
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (12 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 13/14] RDMA/ucma: Rework how new connections are passed through event delivery Leon Romanovsky
@ 2020-08-18 12:05 ` Leon Romanovsky
  2020-08-27 11:39 ` [PATCH rdma-next 00/14] Cleanup locking and events in ucma Jason Gunthorpe
  14 siblings, 0 replies; 18+ messages in thread
From: Leon Romanovsky @ 2020-08-18 12:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

Use cancel_work_sync() to ensure that the close work is not running and
simply assign NULL to ctx->cm_id to indicate whether the work ran. Delete
the close_wq since flush_workqueue() is no longer needed.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/ucma.c | 49 +++++++++++-----------------------
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 40539c9625d6..46d37e470e98 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -80,7 +80,6 @@ struct ucma_file {
 	struct list_head	ctx_list;
 	struct list_head	event_list;
 	wait_queue_head_t	poll_wait;
-	struct workqueue_struct	*close_wq;
 };
 
 struct ucma_context {
@@ -96,10 +95,6 @@ struct ucma_context {
 	u64			uid;
 
 	struct list_head	list;
-	/* mark that device is in process of destroying the internal HW
-	 * resources, protected by the ctx_table lock
-	 */
-	int			closing;
 	/* sync between removal event and id destroy, protected by file mut */
 	int			destroying;
 	struct work_struct	close_work;
@@ -148,12 +143,9 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
 
 	xa_lock(&ctx_table);
 	ctx = _ucma_find_context(id, file);
-	if (!IS_ERR(ctx)) {
-		if (ctx->closing)
-			ctx = ERR_PTR(-EIO);
-		else if (!refcount_inc_not_zero(&ctx->ref))
+	if (!IS_ERR(ctx))
+		if (!refcount_inc_not_zero(&ctx->ref))
 			ctx = ERR_PTR(-ENXIO);
-	}
 	xa_unlock(&ctx_table);
 	return ctx;
 }
@@ -193,6 +185,14 @@ static void ucma_close_id(struct work_struct *work)
 	wait_for_completion(&ctx->comp);
 	/* No new events will be generated after destroying the id. */
 	rdma_destroy_id(ctx->cm_id);
+
+	/*
+	 * At this point ctx->ref is zero so the only place the ctx can be is in
+	 * a uevent or in __destroy_id(). Since the former doesn't touch
+	 * ctx->cm_id and the latter sync cancels this, there is no races with
+	 * this store.
+	 */
+	ctx->cm_id = NULL;
 }
 
 static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
@@ -356,12 +356,8 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 		wake_up_interruptible(&ctx->file->poll_wait);
 	}
 
-	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL && !ctx->destroying) {
-		xa_lock(&ctx_table);
-		ctx->closing = 1;
-		xa_unlock(&ctx_table);
-		queue_work(ctx->file->close_wq, &ctx->close_work);
-	}
+	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL && !ctx->destroying)
+		queue_work(system_unbound_wq, &ctx->close_work);
 	return 0;
 }
 
@@ -577,17 +573,10 @@ static int __destroy_id(struct ucma_context *ctx)
 		ucma_put_ctx(ctx);
 	}
 
-	flush_workqueue(ctx->file->close_wq);
+	cancel_work_sync(&ctx->close_work);
 	/* At this point it's guaranteed that there is no inflight closing task */
-	xa_lock(&ctx_table);
-	if (!ctx->closing) {
-		xa_unlock(&ctx_table);
-		ucma_put_ctx(ctx);
-		wait_for_completion(&ctx->comp);
-		rdma_destroy_id(ctx->cm_id);
-	} else {
-		xa_unlock(&ctx_table);
-	}
+	if (ctx->cm_id)
+		ucma_close_id(&ctx->close_work);
 	return ucma_free_ctx(ctx);
 }
 
@@ -1788,13 +1777,6 @@ static int ucma_open(struct inode *inode, struct file *filp)
 	if (!file)
 		return -ENOMEM;
 
-	file->close_wq = alloc_ordered_workqueue("ucma_close_id",
-						 WQ_MEM_RECLAIM);
-	if (!file->close_wq) {
-		kfree(file);
-		return -ENOMEM;
-	}
-
 	INIT_LIST_HEAD(&file->event_list);
 	INIT_LIST_HEAD(&file->ctx_list);
 	init_waitqueue_head(&file->poll_wait);
@@ -1825,7 +1807,6 @@ static int ucma_close(struct inode *inode, struct file *filp)
 		xa_erase(&ctx_table, ctx->id);
 		__destroy_id(ctx);
 	}
-	destroy_workqueue(file->close_wq);
 	kfree(file);
 	return 0;
 }
-- 
2.26.2



* Re: [PATCH rdma-next 00/14] Cleanup locking and events in ucma
  2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
                   ` (13 preceding siblings ...)
  2020-08-18 12:05 ` [PATCH rdma-next 14/14] RDMA/ucma: Remove closing and the close_wq Leon Romanovsky
@ 2020-08-27 11:39 ` Jason Gunthorpe
  14 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2020-08-27 11:39 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Leon Romanovsky, linux-kernel,
	linux-rdma, Roland Dreier, Sean Hefty

On Tue, Aug 18, 2020 at 03:05:12PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> From Jason:
> 
> Rework how the uevents for new connections are handled so all the locking
> ends up simpler and a work queue can be removed. This should also speed up
> destruction of ucma_context's as a flush_workqueue() was replaced with
> cancel_work_sync().
> 
> The simpler locking comes from narrowing what file->mut covers and moving
> other data to other locks, particularly by injecting the handler_mutex
> from the RDMA CM core as a construct available to ULPs. The handler_mutex
> directly prevents handlers from running without creating any ABBA locking
> problems.
> 
> Fix various error cases and data races caused by missing locking.
> 
> Thanks
> 
> Jason Gunthorpe (14):
>   RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()
>   RDMA/ucma: Remove unnecessary locking of file->ctx_list in close
>   RDMA/ucma: Consolidate the two destroy flows
>   RDMA/ucma: Fix error cases around ucma_alloc_ctx()
>   RDMA/ucma: Remove mc_list and rely on xarray
>   RDMA/cma: Add missing locking to rdma_accept()
>   RDMA/ucma: Do not use file->mut to lock destroying
>   RDMA/ucma: Fix the locking of ctx->file
>   RDMA/ucma: Fix locking for ctx->events_reported
>   RDMA/ucma: Add missing locking around rdma_leave_multicast()
>   RDMA/ucma: Change backlog into an atomic
>   RDMA/ucma: Narrow file->mut in ucma_event_handler()
>   RDMA/ucma: Rework how new connections are passed through event
>     delivery
>   RDMA/ucma: Remove closing and the close_wq

Applied to for-next

Jason


* Re: [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept()
  2020-08-18 12:05 ` [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept() Leon Romanovsky
@ 2021-02-09 14:46   ` Chuck Lever
  2021-02-09 15:40     ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2021-02-09 14:46 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

Howdy-

> On Aug 18, 2020, at 8:05 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> In almost all cases rdma_accept() is called under the handler_mutex by
> ULPs from their handler callbacks. The one exception was ucma which did
> not get the handler_mutex.

It turns out that the RPC/RDMA server also does not invoke rdma_accept()
from its CM event handler.

See net/sunrpc/xprtrdma/svc_rdma_transport.c:svc_rdma_accept()

When lock debugging is enabled, the lockdep assertion in rdma_accept()
fires on every RPC/RDMA connection.

I'm not quite sure what to do about this.


> To improve the understandability of the locking scheme obtain the mutex
> for ucma as well.
> 
> This improves how ucma works by allowing it to directly use handler_mutex
> for some of its internal locking against the handler callbacks instead of
> the global file->mut lock.
> 
> There does not seem to be a serious bug here, other than a DISCONNECT event
> can be delivered concurrently with accept succeeding.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> drivers/infiniband/core/cma.c  | 25 ++++++++++++++++++++++---
> drivers/infiniband/core/ucma.c | 12 ++++++++----
> include/rdma/rdma_cm.h         |  5 +++++
> 3 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 26de0dab60bb..78641858abe2 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -4154,14 +4154,15 @@ static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
> int __rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
> 		  const char *caller)
> {
> -	struct rdma_id_private *id_priv;
> +	struct rdma_id_private *id_priv =
> +		container_of(id, struct rdma_id_private, id);
> 	int ret;
> 
> -	id_priv = container_of(id, struct rdma_id_private, id);
> +	lockdep_assert_held(&id_priv->handler_mutex);
> 
> 	rdma_restrack_set_task(&id_priv->res, caller);
> 
> -	if (!cma_comp(id_priv, RDMA_CM_CONNECT))
> +	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
> 		return -EINVAL;
> 
> 	if (!id->qp && conn_param) {
> @@ -4214,6 +4215,24 @@ int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
> }
> EXPORT_SYMBOL(__rdma_accept_ece);
> 
> +void rdma_lock_handler(struct rdma_cm_id *id)
> +{
> +	struct rdma_id_private *id_priv =
> +		container_of(id, struct rdma_id_private, id);
> +
> +	mutex_lock(&id_priv->handler_mutex);
> +}
> +EXPORT_SYMBOL(rdma_lock_handler);
> +
> +void rdma_unlock_handler(struct rdma_cm_id *id)
> +{
> +	struct rdma_id_private *id_priv =
> +		container_of(id, struct rdma_id_private, id);
> +
> +	mutex_unlock(&id_priv->handler_mutex);
> +}
> +EXPORT_SYMBOL(rdma_unlock_handler);
> +
> int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event)
> {
> 	struct rdma_id_private *id_priv;
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index dd12931f3038..add1ece38739 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -1162,16 +1162,20 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
> 
> 	if (cmd.conn_param.valid) {
> 		ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
> -		mutex_lock(&file->mut);
> 		mutex_lock(&ctx->mutex);
> +		rdma_lock_handler(ctx->cm_id);
> 		ret = __rdma_accept_ece(ctx->cm_id, &conn_param, NULL, &ece);
> -		mutex_unlock(&ctx->mutex);
> -		if (!ret)
> +		if (!ret) {
> +			/* The uid must be set atomically with the handler */
> 			ctx->uid = cmd.uid;
> -		mutex_unlock(&file->mut);
> +		}
> +		rdma_unlock_handler(ctx->cm_id);
> +		mutex_unlock(&ctx->mutex);
> 	} else {
> 		mutex_lock(&ctx->mutex);
> +		rdma_lock_handler(ctx->cm_id);
> 		ret = __rdma_accept_ece(ctx->cm_id, NULL, NULL, &ece);
> +		rdma_unlock_handler(ctx->cm_id);
> 		mutex_unlock(&ctx->mutex);
> 	}
> 	ucma_put_ctx(ctx);
> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> index cf5da2ae49bf..c1334c9a7aa8 100644
> --- a/include/rdma/rdma_cm.h
> +++ b/include/rdma/rdma_cm.h
> @@ -253,6 +253,8 @@ int rdma_listen(struct rdma_cm_id *id, int backlog);
> int __rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
> 		  const char *caller);
> 
> +void rdma_lock_handler(struct rdma_cm_id *id);
> +void rdma_unlock_handler(struct rdma_cm_id *id);
> int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
> 		      const char *caller, struct rdma_ucm_ece *ece);
> 
> @@ -270,6 +272,9 @@ int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
>  * In the case of error, a reject message is sent to the remote side and the
>  * state of the qp associated with the id is modified to error, such that any
>  * previously posted receive buffers would be flushed.
> + *
> + * This function is for use by kernel ULPs and must be called from under the
> + * handler callback.
>  */
> #define rdma_accept(id, conn_param) \
> 	__rdma_accept((id), (conn_param),  KBUILD_MODNAME)
> -- 
> 2.26.2
> 

--
Chuck Lever





* Re: [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept()
  2021-02-09 14:46   ` Chuck Lever
@ 2021-02-09 15:40     ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2021-02-09 15:40 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Leon Romanovsky, Doug Ledford, linux-rdma

On Tue, Feb 09, 2021 at 02:46:48PM +0000, Chuck Lever wrote:
> Howdy-
> 
> > On Aug 18, 2020, at 8:05 AM, Leon Romanovsky <leon@kernel.org> wrote:
> > 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > In almost all cases rdma_accept() is called under the handler_mutex by
> > ULPs from their handler callbacks. The one exception was ucma which did
> > not get the handler_mutex.
> 
> It turns out that the RPC/RDMA server also does not invoke rdma_accept()
> from its CM event handler.
> 
> See net/sunrpc/xprtrdma/svc_rdma_transport.c:svc_rdma_accept()
> 
> When lock debugging is enabled, the lockdep assertion in rdma_accept()
> fires on every RPC/RDMA connection.
> 
> I'm not quite sure what to do about this.

Add the manual handler mutex calls like ucma did:

> > +void rdma_lock_handler(struct rdma_cm_id *id)
> > +{
> > +	struct rdma_id_private *id_priv =
> > +		container_of(id, struct rdma_id_private, id);
> > +
> > +	mutex_lock(&id_priv->handler_mutex);
> > +}
> > +EXPORT_SYMBOL(rdma_lock_handler);
> > +
> > +void rdma_unlock_handler(struct rdma_cm_id *id)
> > +{
> > +	struct rdma_id_private *id_priv =
> > +		container_of(id, struct rdma_id_private, id);
> > +
> > +	mutex_unlock(&id_priv->handler_mutex);
> > +}
> > +EXPORT_SYMBOL(rdma_unlock_handler);

But you need to audit carefully that this doesn't have messed-up
concurrency. IIRC this means being careful that no event that could
be delivered before you get to accepting could have done something
it shouldn't, like free the cm_id for instance.

Jason
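
For readers following along, here is a minimal userspace sketch of the
pattern suggested above: a caller that is not running inside the CM event
handler takes the handler lock around the accept call. It is only a model,
not kernel code and not the actual svc_rdma_accept() change. Every model_*
name is hypothetical, a pthread mutex stands in for handler_mutex, a plain
flag stands in for lockdep's held-lock tracking, and -EPERM stands in for
the lockdep warning.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection state kept in rdma_id_private. */
enum model_state { MODEL_CONNECT, MODEL_DESTROYING };

/* Hypothetical stand-in for the cm_id plus its private handler_mutex. */
struct model_id {
	pthread_mutex_t handler_mutex;
	bool handler_held;	/* crude, single-threaded stand-in for lockdep */
	enum model_state state;
};

static void model_id_init(struct model_id *id)
{
	pthread_mutex_init(&id->handler_mutex, NULL);
	id->handler_held = false;
	id->state = MODEL_CONNECT;
}

/* Models rdma_lock_handler(): take the per-id handler lock. */
static void model_lock_handler(struct model_id *id)
{
	pthread_mutex_lock(&id->handler_mutex);
	id->handler_held = true;
}

/* Models rdma_unlock_handler(): drop the per-id handler lock. */
static void model_unlock_handler(struct model_id *id)
{
	id->handler_held = false;
	pthread_mutex_unlock(&id->handler_mutex);
}

/*
 * Models the checks at the top of __rdma_accept(). The kernel uses
 * lockdep_assert_held() and only warns; this model returns -EPERM so
 * the missing lock is visible to the caller.
 */
static int model_accept(struct model_id *id)
{
	if (!id->handler_held)
		return -EPERM;
	if (id->state != MODEL_CONNECT)
		return -EINVAL;
	return 0;
}

/*
 * A caller outside the handler callback: bracket the accept with the
 * manual lock/unlock pair, as ucma does in the patch.
 */
static int model_accept_outside_handler(struct model_id *id)
{
	int ret;

	model_lock_handler(id);
	ret = model_accept(id);
	model_unlock_handler(id);
	return ret;
}
```

The state check inside the locked region is where the audit mentioned above
matters: an event delivered before the lock is taken may already have moved
the id out of the connect state, and in the real kernel the caller must also
be sure the cm_id itself still exists before locking it.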


end of thread, other threads:[~2021-02-09 15:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2020-08-18 12:05 [PATCH rdma-next 00/14] Cleanup locking and events in ucma Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 01/14] RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 02/14] RDMA/ucma: Remove unnecessary locking of file->ctx_list in close Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 03/14] RDMA/ucma: Consolidate the two destroy flows Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 04/14] RDMA/ucma: Fix error cases around ucma_alloc_ctx() Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 05/14] RDMA/ucma: Remove mc_list and rely on xarray Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 06/14] RDMA/cma: Add missing locking to rdma_accept() Leon Romanovsky
2021-02-09 14:46   ` Chuck Lever
2021-02-09 15:40     ` Jason Gunthorpe
2020-08-18 12:05 ` [PATCH rdma-next 07/14] RDMA/ucma: Do not use file->mut to lock destroying Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 08/14] RDMA/ucma: Fix the locking of ctx->file Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 09/14] RDMA/ucma: Fix locking for ctx->events_reported Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 10/14] RDMA/ucma: Add missing locking around rdma_leave_multicast() Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 11/14] RDMA/ucma: Change backlog into an atomic Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 12/14] RDMA/ucma: Narrow file->mut in ucma_event_handler() Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 13/14] RDMA/ucma: Rework how new connections are passed through event delivery Leon Romanovsky
2020-08-18 12:05 ` [PATCH rdma-next 14/14] RDMA/ucma: Remove closing and the close_wq Leon Romanovsky
2020-08-27 11:39 ` [PATCH rdma-next 00/14] Cleanup locking and events in ucma Jason Gunthorpe
