* [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Daniel Jurgens, Haggai Eran, linux-kernel,
	linux-rdma, Sean Hefty

From: Leon Romanovsky <leonro@mellanox.com>

From Jason:

cm_id.state is a non-atomic value that must always be read and written
under lock, or while the thread has the only pointer to the cm_id.

Critically, during MAD handling the cm_id.state is used to control when
MAD handlers can run, and in turn what data they can touch. Without
locking, an assignment to state can immediately allow concurrent MAD
handlers to execute, potentially creating a mess.

Several of these cases only risk load/store tearing, but create very
confusing code. For instance, changing the state from IB_CM_IDLE to
IB_CM_LISTEN doesn't allow any MAD handlers to run in either state, but a
superficial audit would suggest that it is not locked properly.

This loose methodology has allowed two bugs to creep in. After creating an
ID the code did not lock the state transition, apparently mistakenly
assuming that the new ID could not be used concurrently. However, the ID
is immediately placed in the xarray and so a carefully crafted network
sequence could trigger races with the unlocked stores.

The main solution to many of these problems is to use the xarray to create
a two-stage add - the first stage reserves the ID and the second publishes
the pointer. The second stage is either omitted entirely or moved to after
the newly created ID is set up.
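
For reference, the two-stage pattern looks roughly like this (an outline
of what patch 01 ends up doing, not the verbatim diff):

	/* Stage 1: reserve the ID; concurrent xa_load() only sees NULL */
	ret = xa_alloc_cyclic_irq(&cm.local_id_table, &id, NULL, xa_limit_32b,
				  &cm.local_id_next, GFP_KERNEL);

	/* ... fully initialize the cm_id, including refcount_set() ... */

	/* Stage 2: publish the now-valid pointer */
	xa_store_irq(&cm.local_id_table, cm_local_id(cm_id_priv->id.local_id),
		     cm_id_priv, GFP_KERNEL);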

Where it is trivial to do so, other places directly put the state
manipulation under the lock, or add an assertion that it is, in fact, under
the lock.

This also removes a number of places where the state is read under lock,
then the lock dropped, reacquired, and the state tested again.
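
For instance, a (sketched) sequence like the following is replaced by
doing the work while the lock is continuously held:

	spin_lock_irq(&cm_id_priv->lock);
	/* read cm_id_priv->id.state and decide what to do ... */
	spin_unlock_irq(&cm_id_priv->lock);

	/* window: a concurrent MAD handler may change the state here */

	spin_lock_irq(&cm_id_priv->lock);
	/* ... so the state has to be tested all over again */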

There remain other issues related to missing locking on cm_id data.

Thanks

------------------------------------------------------------------------
It is based on rdma-next + rdma-rc patch c14dfddbd869
("RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()")

Jason Gunthorpe (15):
  RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
  RDMA/cm: Fix checking for allowed duplicate listens
  RDMA/cm: Remove a race freeing timewait_info
  RDMA/cm: Make the destroy_id flow more robust
  RDMA/cm: Simplify establishing a listen cm_id
  RDMA/cm: Read id.state under lock when doing pr_debug()
  RDMA/cm: Make it clear that there is no concurrency in
    cm_sidr_req_handler()
  RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
  RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
  RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
  RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
  RDMA/cm: Allow ib_send_cm_drep() to be done under lock
  RDMA/cm: Allow ib_send_cm_rej() to be done under lock
  RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
  RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy

 drivers/infiniband/core/cm.c | 732 ++++++++++++++++++++---------------
 1 file changed, 420 insertions(+), 312 deletions(-)

--
2.24.1



* [PATCH rdma-next 01/15] RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma, Sean Hefty

From: Jason Gunthorpe <jgg@mellanox.com>

xa_alloc_cyclic() is an SMP release that must be paired with a later
acquire during xa_load() as part of cm_acquire_id().

As such, xa_alloc_cyclic() must be done after the cm_id is fully
initialized; in particular, it absolutely must come after the
refcount_set(), otherwise the refcount_inc() in cm_acquire_id() may not
see the set.

As there are several cases where a reader will be able to use the
id.local_id after cm_acquire_id() in the IB_CM_IDLE state, there needs to
be an unfortunate split into a NULL allocation and a finalizing
xa_store().
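
For context, the reader side that this release pairs with looks roughly
like this (cm_acquire_id() simplified, not the exact code):

	/* xa_load() provides the paired acquire */
	cm_id_priv = xa_load(&cm.local_id_table, cm_local_id(local_id));
	if (cm_id_priv)
		/* without the release ordering this increment could run
		   before the refcount_set() at creation becomes visible */
		refcount_inc(&cm_id_priv->refcount);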

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c9bd9589bd5a..b1fccbf6ebd8 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -575,18 +575,6 @@ static int cm_init_av_by_path(struct sa_path_rec *path,
 	return 0;
 }

-static int cm_alloc_id(struct cm_id_private *cm_id_priv)
-{
-	int err;
-	u32 id;
-
-	err = xa_alloc_cyclic_irq(&cm.local_id_table, &id, cm_id_priv,
-			xa_limit_32b, &cm.local_id_next, GFP_KERNEL);
-
-	cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
-	return err;
-}
-
 static u32 cm_local_id(__be32 local_id)
 {
 	return (__force u32) (local_id ^ cm.random_id_operand);
@@ -828,6 +816,7 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 void *context)
 {
 	struct cm_id_private *cm_id_priv;
+	u32 id;
 	int ret;

 	cm_id_priv = kzalloc(sizeof *cm_id_priv, GFP_KERNEL);
@@ -839,9 +828,6 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	cm_id_priv->id.cm_handler = cm_handler;
 	cm_id_priv->id.context = context;
 	cm_id_priv->id.remote_cm_qpn = 1;
-	ret = cm_alloc_id(cm_id_priv);
-	if (ret)
-		goto error;

 	spin_lock_init(&cm_id_priv->lock);
 	init_completion(&cm_id_priv->comp);
@@ -850,11 +836,20 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	INIT_LIST_HEAD(&cm_id_priv->altr_list);
 	atomic_set(&cm_id_priv->work_count, -1);
 	refcount_set(&cm_id_priv->refcount, 1);
+
+	ret = xa_alloc_cyclic_irq(&cm.local_id_table, &id, NULL, xa_limit_32b,
+				  &cm.local_id_next, GFP_KERNEL);
+	if (ret)
+		goto error;
+	cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
+	xa_store_irq(&cm.local_id_table, cm_local_id(cm_id_priv->id.local_id),
+		     cm_id_priv, GFP_KERNEL);
+
 	return &cm_id_priv->id;

 error:
 	kfree(cm_id_priv);
-	return ERR_PTR(-ENOMEM);
+	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(ib_create_cm_id);

--
2.24.1



* [PATCH rdma-next 02/15] RDMA/cm: Fix checking for allowed duplicate listens
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: Haggai Eran, linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The test here used the wrong cm_id_priv: it checked the one that was
freshly allocated. By definition the allocated one has the matching
cm_handler and a zero context, so the condition was always true.

Instead check that the existing listening ID is compatible with the
proposed handler so that it can be shared, as was originally intended.

Fixes: 067b171b8679 ("IB/cm: Share listening CM IDs")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index b1fccbf6ebd8..67b36b8b34ba 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1185,7 +1185,8 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 	/* Find an existing ID */
 	cm_id_priv = cm_find_listen(device, service_id);
 	if (cm_id_priv) {
-		if (cm_id->cm_handler != cm_handler || cm_id->context) {
+		if (cm_id_priv->id.cm_handler != cm_handler ||
+		    cm_id_priv->id.context) {
 			/* Sharing an ib_cm_id with different handlers is not
 			 * supported */
 			spin_unlock_irqrestore(&cm.lock, flags);
-- 
2.24.1



* [PATCH rdma-next 03/15] RDMA/cm: Remove a race freeing timewait_info
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma, Sean Hefty

From: Jason Gunthorpe <jgg@mellanox.com>

When creating a cm_id during REQ handling, the id immediately becomes
visible to the other MAD handlers, and shortly after the state is moved to
IB_CM_REQ_RCVD.

This allows cm_rej_handler() to run concurrently and free the
timewait_info:

        CPU 0                                CPU1
 cm_req_handler()
  ib_create_cm_id()
  cm_match_req()
    id_priv->state = IB_CM_REQ_RCVD
                                       cm_rej_handler()
                                         cm_acquire_id()
                                         spin_lock(&id_priv->lock)
                                         switch (id_priv->state)
  					   case IB_CM_REQ_RCVD:
                                            cm_reset_to_idle()
                                             kfree(id_priv->timewait_info);
   goto destroy
  destroy:
    kfree(id_priv->timewait_info);
                                             id_priv->timewait_info = NULL

Causing a double free or worse.

Do not free the timewait_info without also holding the
id_priv->lock. Simplify this entire flow by making the free unconditional
during cm_destroy_id() and removing the confusing special case error
unwind during creation of the timewait_info.

This also fixes a leak of the timewait_info if cm_destroy_id() is called
in IB_CM_ESTABLISHED with an XRC TGT QP. The state machine would be left
in ESTABLISHED even though it needed to transition through IB_CM_TIMEWAIT
to release the timewait_info pointer.

Also fix a leak of the timewait_info if the caller misuses the API and
calls ib_send_cm_req() more than once.

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 67b36b8b34ba..0e7d2363df88 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1058,14 +1058,22 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 		break;
 	}

-	spin_lock_irq(&cm.lock);
+	spin_lock_irq(&cm_id_priv->lock);
+	spin_lock(&cm.lock);
+	/* Required for cleanup paths related cm_req_handler() */
+	if (cm_id_priv->timewait_info) {
+		cm_cleanup_timewait(cm_id_priv->timewait_info);
+		kfree(cm_id_priv->timewait_info);
+		cm_id_priv->timewait_info = NULL;
+	}
 	if (!list_empty(&cm_id_priv->altr_list) &&
 	    (!cm_id_priv->altr_send_port_not_ready))
 		list_del(&cm_id_priv->altr_list);
 	if (!list_empty(&cm_id_priv->prim_list) &&
 	    (!cm_id_priv->prim_send_port_not_ready))
 		list_del(&cm_id_priv->prim_list);
-	spin_unlock_irq(&cm.lock);
+	spin_unlock(&cm.lock);
+	spin_unlock_irq(&cm_id_priv->lock);

 	cm_free_id(cm_id->local_id);
 	cm_deref_id(cm_id_priv);
@@ -1422,7 +1430,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 	/* Verify that we're not in timewait. */
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	if (cm_id->state != IB_CM_IDLE) {
+	if (cm_id->state != IB_CM_IDLE || WARN_ON(cm_id_priv->timewait_info)) {
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		ret = -EINVAL;
 		goto out;
@@ -1440,12 +1448,12 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 				 param->ppath_sgid_attr, &cm_id_priv->av,
 				 cm_id_priv);
 	if (ret)
-		goto error1;
+		goto out;
 	if (param->alternate_path) {
 		ret = cm_init_av_by_path(param->alternate_path, NULL,
 					 &cm_id_priv->alt_av, cm_id_priv);
 		if (ret)
-			goto error1;
+			goto out;
 	}
 	cm_id->service_id = param->service_id;
 	cm_id->service_mask = ~cpu_to_be64(0);
@@ -1463,7 +1471,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,

 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
 	if (ret)
-		goto error1;
+		goto out;

 	req_msg = (struct cm_req_msg *) cm_id_priv->msg->mad;
 	cm_format_req(req_msg, cm_id_priv, param);
@@ -1486,7 +1494,6 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 	return 0;

 error2:	cm_free_msg(cm_id_priv->msg);
-error1:	kfree(cm_id_priv->timewait_info);
 out:	return ret;
 }
 EXPORT_SYMBOL(ib_send_cm_req);
@@ -2018,7 +2025,7 @@ static int cm_req_handler(struct cm_work *work)
 		pr_debug("%s: local_id %d, no listen_cm_id_priv\n", __func__,
 			 be32_to_cpu(cm_id->local_id));
 		ret = -EINVAL;
-		goto free_timeinfo;
+		goto destroy;
 	}

 	cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
@@ -2108,8 +2115,6 @@ static int cm_req_handler(struct cm_work *work)
 rejected:
 	refcount_dec(&cm_id_priv->refcount);
 	cm_deref_id(listen_cm_id_priv);
-free_timeinfo:
-	kfree(cm_id_priv->timewait_info);
 destroy:
 	ib_destroy_cm_id(cm_id);
 	return ret;
--
2.24.1



* [PATCH rdma-next 04/15] RDMA/cm: Make the destroy_id flow more robust
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

Too much of the destruction flow is delicately sensitive to the state and
various other conditions. Move more code to the unconditional path and add
several WARN_ONs to check consistency.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0e7d2363df88..51e6cebb606e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -829,6 +829,8 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	cm_id_priv->id.context = context;
 	cm_id_priv->id.remote_cm_qpn = 1;
 
+	RB_CLEAR_NODE(&cm_id_priv->service_node);
+	RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
 	spin_lock_init(&cm_id_priv->lock);
 	init_completion(&cm_id_priv->comp);
 	INIT_LIST_HEAD(&cm_id_priv->work_list);
@@ -986,11 +988,13 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 		spin_lock_irq(&cm.lock);
 		if (--cm_id_priv->listen_sharecount > 0) {
 			/* The id is still shared. */
+			WARN_ON(refcount_read(&cm_id_priv->refcount) == 1);
 			cm_deref_id(cm_id_priv);
 			spin_unlock_irq(&cm.lock);
 			return;
 		}
 		rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
+		RB_CLEAR_NODE(&cm_id_priv->service_node);
 		spin_unlock_irq(&cm.lock);
 		break;
 	case IB_CM_SIDR_REQ_SENT:
@@ -1001,11 +1005,6 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	case IB_CM_SIDR_REQ_RCVD:
 		spin_unlock_irq(&cm_id_priv->lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
-		spin_lock_irq(&cm.lock);
-		if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
-			rb_erase(&cm_id_priv->sidr_id_node,
-				 &cm.remote_sidr_table);
-		spin_unlock_irq(&cm.lock);
 		break;
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
@@ -1072,6 +1071,10 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	if (!list_empty(&cm_id_priv->prim_list) &&
 	    (!cm_id_priv->prim_send_port_not_ready))
 		list_del(&cm_id_priv->prim_list);
+	WARN_ON(cm_id_priv->listen_sharecount);
+	WARN_ON(!RB_EMPTY_NODE(&cm_id_priv->service_node));
+	if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
+		rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
 	spin_unlock(&cm.lock);
 	spin_unlock_irq(&cm_id_priv->lock);
 
-- 
2.24.1



* [PATCH rdma-next 05/15] RDMA/cm: Simplify establishing a listen cm_id
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

Any manipulation of cm_id->state must be done under the cm_id_priv->lock.
The two routines that added listens did not follow this rule because they
never participate in any concurrent access around the state.

However, since this exception makes the code hard to understand, simplify
the flow so that it can be fully locked:
 - Move manipulation of listen_sharecount into cm_insert_listen() so it is
   trivially under the cm.lock without having to expose the cm.lock to the
   caller.
 - Push the cm.lock down into cm_insert_listen() and have the function
   increment the reference count before returning an existing pointer.
 - Split ib_cm_listen() into a cm_init_listen() helper and do not call
   ib_cm_listen() from ib_cm_insert_listen()
 - Make both ib_cm_listen() and ib_cm_insert_listen() directly call
   cm_insert_listen() under their cm_id_priv->lock, which does both the
   collision detection and, if needed, the insert (atomically)
 - Enclose all state manipulation within the cm_id_priv->lock; note this
   set can be done safely after cm_insert_listen() as no reader is allowed
   to read the state without holding the lock.
 - Do not set the listen cm_id in the xarray, as it is never correct to
   look it up. This makes the concurrency simpler to understand.

Many needless error unwinds are removed in the process.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 199 ++++++++++++++++++++---------------
 1 file changed, 116 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 51e6cebb606e..051a546b8e7b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -624,22 +624,44 @@ static int be64_gt(__be64 a, __be64 b)
 	return (__force u64) a > (__force u64) b;
 }
 
-static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
+/*
+ * Inserts a new cm_id_priv into the listen_service_table. Returns cm_id_priv
+ * if the new ID was inserted, NULL if it could not be inserted due to a
+ * collision, or the existing cm_id_priv ready for shared usage.
+ */
+static struct cm_id_private *cm_insert_listen(struct cm_id_private *cm_id_priv,
+					      ib_cm_handler shared_handler)
 {
 	struct rb_node **link = &cm.listen_service_table.rb_node;
 	struct rb_node *parent = NULL;
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
+	unsigned long flags;
 
+	spin_lock_irqsave(&cm.lock, flags);
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device)) {
+			/*
+			 * Sharing an ib_cm_id with different handlers is not
+			 * supported
+			 */
+			if (cur_cm_id_priv->id.cm_handler != shared_handler ||
+			    cur_cm_id_priv->id.context ||
+			    WARN_ON(!cur_cm_id_priv->id.cm_handler)) {
+				spin_unlock_irqrestore(&cm.lock, flags);
+				return NULL;
+			}
+			refcount_inc(&cur_cm_id_priv->refcount);
+			cur_cm_id_priv->listen_sharecount++;
+			spin_unlock_irqrestore(&cm.lock, flags);
 			return cur_cm_id_priv;
+		}
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
 			link = &(*link)->rb_left;
@@ -652,9 +674,11 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 		else
 			link = &(*link)->rb_right;
 	}
+	cm_id_priv->listen_sharecount++;
 	rb_link_node(&cm_id_priv->service_node, parent, link);
 	rb_insert_color(&cm_id_priv->service_node, &cm.listen_service_table);
-	return NULL;
+	spin_unlock_irqrestore(&cm.lock, flags);
+	return cm_id_priv;
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
@@ -811,9 +835,9 @@ static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
 	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
 }
 
-struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
-				 ib_cm_handler cm_handler,
-				 void *context)
+static struct cm_id_private *cm_alloc_id_priv(struct ib_device *device,
+					      ib_cm_handler cm_handler,
+					      void *context)
 {
 	struct cm_id_private *cm_id_priv;
 	u32 id;
@@ -844,15 +868,37 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	if (ret)
 		goto error;
 	cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
-	xa_store_irq(&cm.local_id_table, cm_local_id(cm_id_priv->id.local_id),
-		     cm_id_priv, GFP_KERNEL);
 
-	return &cm_id_priv->id;
+	return cm_id_priv;
 
 error:
 	kfree(cm_id_priv);
 	return ERR_PTR(ret);
 }
+
+/*
+ * Make the ID visible to the MAD handlers and other threads that use the
+ * xarray.
+ */
+static void cm_finalize_id(struct cm_id_private *cm_id_priv)
+{
+	xa_store_irq(&cm.local_id_table, cm_local_id(cm_id_priv->id.local_id),
+		     cm_id_priv, GFP_KERNEL);
+}
+
+struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
+				 ib_cm_handler cm_handler,
+				 void *context)
+{
+	struct cm_id_private *cm_id_priv;
+
+	cm_id_priv = cm_alloc_id_priv(device, cm_handler, context);
+	if (IS_ERR(cm_id_priv))
+		return ERR_CAST(cm_id_priv);
+
+	cm_finalize_id(cm_id_priv);
+	return &cm_id_priv->id;
+}
 EXPORT_SYMBOL(ib_create_cm_id);
 
 static struct cm_work * cm_dequeue_work(struct cm_id_private *cm_id_priv)
@@ -1096,8 +1142,27 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
+static int cm_init_listen(struct cm_id_private *cm_id_priv, __be64 service_id,
+			  __be64 service_mask)
+{
+	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
+	service_id &= service_mask;
+	if ((service_id & IB_SERVICE_ID_AGN_MASK) == IB_CM_ASSIGN_SERVICE_ID &&
+	    (service_id != IB_CM_ASSIGN_SERVICE_ID))
+		return -EINVAL;
+
+	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
+		cm_id_priv->id.service_id = cpu_to_be64(cm.listen_service_id++);
+		cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+	} else {
+		cm_id_priv->id.service_id = service_id;
+		cm_id_priv->id.service_mask = service_mask;
+	}
+	return 0;
+}
+
 /**
- * __ib_cm_listen - Initiates listening on the specified service ID for
+ * ib_cm_listen - Initiates listening on the specified service ID for
  *   connection and service ID resolution requests.
  * @cm_id: Connection identifier associated with the listen request.
  * @service_id: Service identifier matched against incoming connection
@@ -1109,51 +1174,33 @@ EXPORT_SYMBOL(ib_destroy_cm_id);
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
  */
-static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
-			  __be64 service_mask)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
 {
-	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-	int ret = 0;
-
-	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
-	service_id &= service_mask;
-	if ((service_id & IB_SERVICE_ID_AGN_MASK) == IB_CM_ASSIGN_SERVICE_ID &&
-	    (service_id != IB_CM_ASSIGN_SERVICE_ID))
-		return -EINVAL;
-
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	if (cm_id->state != IB_CM_IDLE)
-		return -EINVAL;
-
-	cm_id->state = IB_CM_LISTEN;
-	++cm_id_priv->listen_sharecount;
+	struct cm_id_private *cm_id_priv =
+		container_of(cm_id, struct cm_id_private, id);
+	unsigned long flags;
+	int ret;
 
-	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
-		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
-		cm_id->service_mask = ~cpu_to_be64(0);
-	} else {
-		cm_id->service_id = service_id;
-		cm_id->service_mask = service_mask;
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	if (cm_id_priv->id.state != IB_CM_IDLE) {
+		ret = -EINVAL;
+		goto out;
 	}
-	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
 
-	if (cur_cm_id_priv) {
-		cm_id->state = IB_CM_IDLE;
-		--cm_id_priv->listen_sharecount;
+	ret = cm_init_listen(cm_id_priv, service_id, service_mask);
+	if (ret)
+		goto out;
+
+	if (!cm_insert_listen(cm_id_priv, NULL)) {
 		ret = -EBUSY;
+		goto out;
 	}
-	return ret;
-}
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
-{
-	unsigned long flags;
-	int ret;
-
-	spin_lock_irqsave(&cm.lock, flags);
-	ret = __ib_cm_listen(cm_id, service_id, service_mask);
-	spin_unlock_irqrestore(&cm.lock, flags);
+	cm_id_priv->id.state = IB_CM_LISTEN;
+	ret = 0;
 
+out:
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(ib_cm_listen);
@@ -1178,52 +1225,38 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 				     ib_cm_handler cm_handler,
 				     __be64 service_id)
 {
+	struct cm_id_private *listen_id_priv;
 	struct cm_id_private *cm_id_priv;
-	struct ib_cm_id *cm_id;
-	unsigned long flags;
 	int err = 0;
 
 	/* Create an ID in advance, since the creation may sleep */
-	cm_id = ib_create_cm_id(device, cm_handler, NULL);
-	if (IS_ERR(cm_id))
-		return cm_id;
-
-	spin_lock_irqsave(&cm.lock, flags);
+	cm_id_priv = cm_alloc_id_priv(device, cm_handler, NULL);
+	if (IS_ERR(cm_id_priv))
+		return ERR_CAST(cm_id_priv);
 
-	if (service_id == IB_CM_ASSIGN_SERVICE_ID)
-		goto new_id;
+	err = cm_init_listen(cm_id_priv, service_id, 0);
+	if (err)
+		return ERR_PTR(err);
 
-	/* Find an existing ID */
-	cm_id_priv = cm_find_listen(device, service_id);
-	if (cm_id_priv) {
-		if (cm_id_priv->id.cm_handler != cm_handler ||
-		    cm_id_priv->id.context) {
-			/* Sharing an ib_cm_id with different handlers is not
-			 * supported */
-			spin_unlock_irqrestore(&cm.lock, flags);
-			ib_destroy_cm_id(cm_id);
+	spin_lock_irq(&cm_id_priv->lock);
+	listen_id_priv = cm_insert_listen(cm_id_priv, cm_handler);
+	if (listen_id_priv != cm_id_priv) {
+		spin_unlock_irq(&cm_id_priv->lock);
+		ib_destroy_cm_id(&cm_id_priv->id);
+		if (!listen_id_priv)
 			return ERR_PTR(-EINVAL);
-		}
-		refcount_inc(&cm_id_priv->refcount);
-		++cm_id_priv->listen_sharecount;
-		spin_unlock_irqrestore(&cm.lock, flags);
-
-		ib_destroy_cm_id(cm_id);
-		cm_id = &cm_id_priv->id;
-		return cm_id;
+		return &listen_id_priv->id;
 	}
+	cm_id_priv->id.state = IB_CM_LISTEN;
+	spin_unlock_irq(&cm_id_priv->lock);
 
-new_id:
-	/* Use newly created ID */
-	err = __ib_cm_listen(cm_id, service_id, 0);
-
-	spin_unlock_irqrestore(&cm.lock, flags);
+	/*
+	 * A listen ID does not need to be in the xarray since it does not
+	 * receive mads, is not placed in the remote_id or remote_qpn rbtree,
+	 * and does not enter timewait.
+	 */
 
-	if (err) {
-		ib_destroy_cm_id(cm_id);
-		return ERR_PTR(err);
-	}
-	return cm_id;
+	return &cm_id_priv->id;
 }
 EXPORT_SYMBOL(ib_cm_insert_listen);
 
-- 
2.24.1



* [PATCH rdma-next 06/15] RDMA/cm: Read id.state under lock when doing pr_debug()
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: Daniel Jurgens, linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The lock should not be dropped before doing the pr_debug() print as it is
accessing data protected by the lock, such as id.state.

Fixes: 119bf81793ea ("IB/cm: Add debug prints to ib_cm")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 051a546b8e7b..f50b56302500 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2421,13 +2421,13 @@ static int cm_rep_handler(struct cm_work *work)
 	case IB_CM_MRA_REQ_RCVD:
 		break;
 	default:
-		spin_unlock_irq(&cm_id_priv->lock);
 		ret = -EINVAL;
 		pr_debug(
 			"%s: cm_id_priv->id.state: %d, local_comm_id %d, remote_comm_id %d\n",
 			__func__, cm_id_priv->id.state,
 			IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
 			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
+		spin_unlock_irq(&cm_id_priv->lock);
 		goto error;
 	}
 
@@ -2693,10 +2693,10 @@ int ib_send_cm_drep(struct ib_cm_id *cm_id,
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	if (cm_id->state != IB_CM_DREQ_RCVD) {
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-		kfree(data);
 		pr_debug("%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
 			 __func__, be32_to_cpu(cm_id->local_id), cm_id->state);
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		kfree(data);
 		return -EINVAL;
 	}
 
@@ -3032,10 +3032,10 @@ static int cm_rej_handler(struct cm_work *work)
 		}
 		/* fall through */
 	default:
-		spin_unlock_irq(&cm_id_priv->lock);
 		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
 			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
 			 cm_id_priv->id.state);
+		spin_unlock_irq(&cm_id_priv->lock);
 		ret = -EINVAL;
 		goto out;
 	}
-- 
2.24.1



* [PATCH rdma-next 07/15] RDMA/cm: Make it clear that there is no concurrency in cm_sidr_req_handler()
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

ib_create_cm_id() immediately places the id in the xarray, so it is visible
to network traffic.

The state is initially set to IB_CM_IDLE and all the MAD handlers will
test this state under lock and refuse to advance from IDLE, so adding to
the xarray is harmless.

Further, the set to IB_CM_SIDR_REQ_RCVD also excludes all MAD handlers.

However, the local_id isn't even used for SIDR mode, and there will be no
input MADs related to the newly created ID.

So, make the whole flow simpler so it can be understood:
 - Do not put the SIDR cm_id in the xarray. This directly shows that there
   is no concurrency.
 - Delete the confusing work_count and pending_list manipulations. This
   mechanism is only used by MAD handlers and timewait, neither of which
   apply to SIDR.
 - Add a few comments and rename 'cur_cm_id_priv' to 'listen_cm_id_priv'
 - Move other loose sets up to immediately after cm_id creation so that
   the cm_id is fully configured right away. This fixes an oversight where
   the service_id was not returned back on an IB_SIDR_UNSUPPORTED
   reject.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 64 +++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f50b56302500..2eb6ece9b783 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3562,20 +3562,27 @@ static void cm_format_sidr_req_event(struct cm_work *work,
 
 static int cm_sidr_req_handler(struct cm_work *work)
 {
-	struct ib_cm_id *cm_id;
-	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
+	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_sidr_req_msg *sidr_req_msg;
 	struct ib_wc *wc;
 	int ret;
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
+	cm_id_priv =
+		cm_alloc_id_priv(work->port->cm_dev->ib_device, NULL, NULL);
+	if (IS_ERR(cm_id_priv))
+		return PTR_ERR(cm_id_priv);
 
 	/* Record SGID/SLID and request ID for lookup. */
 	sidr_req_msg = (struct cm_sidr_req_msg *)
 				work->mad_recv_wc->recv_buf.mad;
+
+	cm_id_priv->id.remote_id =
+		cpu_to_be32(IBA_GET(CM_SIDR_REQ_REQUESTID, sidr_req_msg));
+	cm_id_priv->id.service_id =
+		cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg));
+	cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+	cm_id_priv->tid = sidr_req_msg->hdr.tid;
+
 	wc = work->mad_recv_wc->wc;
 	cm_id_priv->av.dgid.global.subnet_prefix = cpu_to_be64(wc->slid);
 	cm_id_priv->av.dgid.global.interface_id = 0;
@@ -3585,41 +3592,44 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	if (ret)
 		goto out;
 
-	cm_id_priv->id.remote_id =
-		cpu_to_be32(IBA_GET(CM_SIDR_REQ_REQUESTID, sidr_req_msg));
-	cm_id_priv->tid = sidr_req_msg->hdr.tid;
-	atomic_inc(&cm_id_priv->work_count);
-
 	spin_lock_irq(&cm.lock);
-	cur_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
-	if (cur_cm_id_priv) {
+	listen_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
+	if (listen_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_SIDR_REQ_COUNTER]);
 		goto out; /* Duplicate message. */
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
-	cur_cm_id_priv = cm_find_listen(
-		cm_id->device,
-		cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg)));
-	if (!cur_cm_id_priv) {
+	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
+					   cm_id_priv->id.service_id);
+	if (!listen_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
 		goto out; /* No match. */
 	}
-	refcount_inc(&cur_cm_id_priv->refcount);
-	refcount_inc(&cm_id_priv->refcount);
+	refcount_inc(&listen_cm_id_priv->refcount);
 	spin_unlock_irq(&cm.lock);
 
-	cm_id_priv->id.cm_handler = cur_cm_id_priv->id.cm_handler;
-	cm_id_priv->id.context = cur_cm_id_priv->id.context;
-	cm_id_priv->id.service_id =
-		cpu_to_be64(IBA_GET(CM_SIDR_REQ_SERVICEID, sidr_req_msg));
-	cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+	cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
+	cm_id_priv->id.context = listen_cm_id_priv->id.context;
 
-	cm_format_sidr_req_event(work, cm_id_priv, &cur_cm_id_priv->id);
-	cm_process_work(cm_id_priv, work);
-	cm_deref_id(cur_cm_id_priv);
+	/*
+	 * A SIDR ID does not need to be in the xarray since it does not receive
+	 * mads, is not placed in the remote_id or remote_qpn rbtree, and does
+	 * not enter timewait.
+	 */
+
+	cm_format_sidr_req_event(work, cm_id_priv, &listen_cm_id_priv->id);
+	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->cm_event);
+	cm_free_work(work);
+	/*
+	 * A pointer to the listen_cm_id is held in the event, so this deref
+	 * must be after the event is delivered above.
+	 */
+	cm_deref_id(listen_cm_id_priv);
+	if (ret)
+		cm_destroy_id(&cm_id_priv->id, ret);
 	return 0;
 out:
 	ib_destroy_cm_id(&cm_id_priv->id);
-- 
2.24.1



* [PATCH rdma-next 08/15] RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

ib_create_cm_id() immediately places the id in the xarray, and publishes
it into the remote_id and remote_qpn rbtrees. This makes it visible to
other threads before it is fully set up.

It appears the thinking here was that the states IB_CM_IDLE and
IB_CM_REQ_RCVD do not allow any MAD handler or lookup in the remote_id and
remote_qpn rbtrees to advance.

However, cm_rej_handler() does take an action on IB_CM_REQ_RCVD, which is
not really expected by the design.

Make the whole thing clearer:
 - Keep the new cm_id out of the xarray until it is completely set up.
   This directly prevents MAD handlers and all rbtree lookups from seeing
   the pointer.
 - Move all the trivial setup right to the top so it is obviously done
   before any concurrency begins
 - Move the mutation of the cm_id_priv out of cm_match_id() and into the
   caller so the state transition is obvious
 - Place the manipulation of the work_list at the end, under lock, after
   the cm_id is placed in the xarray. The work_count cannot change on an
   ID outside the xarray.
 - Add some comments

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 99 +++++++++++++++++++++---------------
 1 file changed, 57 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 2eb6ece9b783..d00eb4751ae5 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1973,14 +1973,10 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 		cm_issue_rej(work->port, work->mad_recv_wc,
 			     IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ,
 			     NULL, 0);
-		goto out;
+		return NULL;
 	}
 	refcount_inc(&listen_cm_id_priv->refcount);
-	refcount_inc(&cm_id_priv->refcount);
-	cm_id_priv->id.state = IB_CM_REQ_RCVD;
-	atomic_inc(&cm_id_priv->work_count);
 	spin_unlock_irq(&cm.lock);
-out:
 	return listen_cm_id_priv;
 }
 
@@ -2022,7 +2018,6 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
 
 static int cm_req_handler(struct cm_work *work)
 {
-	struct ib_cm_id *cm_id;
 	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_req_msg *req_msg;
 	const struct ib_global_route *grh;
@@ -2031,13 +2026,33 @@ static int cm_req_handler(struct cm_work *work)
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
+	cm_id_priv =
+		cm_alloc_id_priv(work->port->cm_dev->ib_device, NULL, NULL);
+	if (IS_ERR(cm_id_priv))
+		return PTR_ERR(cm_id_priv);
 
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	cm_id_priv->id.remote_id =
 		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_COMM_ID, req_msg));
+	cm_id_priv->id.service_id =
+		cpu_to_be64(IBA_GET(CM_REQ_SERVICE_ID, req_msg));
+	cm_id_priv->id.service_mask = ~cpu_to_be64(0);
+	cm_id_priv->tid = req_msg->hdr.tid;
+	cm_id_priv->timeout_ms = cm_convert_to_ms(
+		IBA_GET(CM_REQ_LOCAL_CM_RESPONSE_TIMEOUT, req_msg));
+	cm_id_priv->max_cm_retries = IBA_GET(CM_REQ_MAX_CM_RETRIES, req_msg);
+	cm_id_priv->remote_qpn =
+		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
+	cm_id_priv->initiator_depth =
+		IBA_GET(CM_REQ_RESPONDER_RESOURCES, req_msg);
+	cm_id_priv->responder_resources =
+		IBA_GET(CM_REQ_INITIATOR_DEPTH, req_msg);
+	cm_id_priv->path_mtu = IBA_GET(CM_REQ_PATH_PACKET_PAYLOAD_MTU, req_msg);
+	cm_id_priv->pkey = cpu_to_be16(IBA_GET(CM_REQ_PARTITION_KEY, req_msg));
+	cm_id_priv->sq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
+	cm_id_priv->retry_count = IBA_GET(CM_REQ_RETRY_COUNT, req_msg);
+	cm_id_priv->rnr_retry_count = IBA_GET(CM_REQ_RNR_RETRY_COUNT, req_msg);
+	cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
+
 	ret = cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				      work->mad_recv_wc->recv_buf.grh,
 				      &cm_id_priv->av);
@@ -2049,27 +2064,26 @@ static int cm_req_handler(struct cm_work *work)
 		ret = PTR_ERR(cm_id_priv->timewait_info);
 		goto destroy;
 	}
-	cm_id_priv->timewait_info->work.remote_id =
-		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_COMM_ID, req_msg));
+	cm_id_priv->timewait_info->work.remote_id = cm_id_priv->id.remote_id;
 	cm_id_priv->timewait_info->remote_ca_guid =
 		cpu_to_be64(IBA_GET(CM_REQ_LOCAL_CA_GUID, req_msg));
-	cm_id_priv->timewait_info->remote_qpn =
-		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
+	cm_id_priv->timewait_info->remote_qpn = cm_id_priv->remote_qpn;
+
+	/*
+	 * Note that the ID pointer is not in the xarray at this point,
+	 * so this set is only visible to the local thread.
+	 */
+	cm_id_priv->id.state = IB_CM_REQ_RCVD;
 
 	listen_cm_id_priv = cm_match_req(work, cm_id_priv);
 	if (!listen_cm_id_priv) {
 		pr_debug("%s: local_id %d, no listen_cm_id_priv\n", __func__,
-			 be32_to_cpu(cm_id->local_id));
+			 be32_to_cpu(cm_id_priv->id.local_id));
+		cm_id_priv->id.state = IB_CM_IDLE;
 		ret = -EINVAL;
 		goto destroy;
 	}
 
-	cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
-	cm_id_priv->id.context = listen_cm_id_priv->id.context;
-	cm_id_priv->id.service_id =
-		cpu_to_be64(IBA_GET(CM_REQ_SERVICE_ID, req_msg));
-	cm_id_priv->id.service_mask = ~cpu_to_be64(0);
-
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 
 	memset(&work->path[0], 0, sizeof(work->path[0]));
@@ -2107,10 +2121,10 @@ static int cm_req_handler(struct cm_work *work)
 				     work->port->port_num, 0,
 				     &work->path[0].sgid);
 		if (err)
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+			ib_send_cm_rej(&cm_id_priv->id, IB_CM_REJ_INVALID_GID,
 				       NULL, 0, NULL, 0);
 		else
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+			ib_send_cm_rej(&cm_id_priv->id, IB_CM_REJ_INVALID_GID,
 				       &work->path[0].sgid,
 				       sizeof(work->path[0].sgid),
 				       NULL, 0);
@@ -2120,39 +2134,40 @@ static int cm_req_handler(struct cm_work *work)
 		ret = cm_init_av_by_path(&work->path[1], NULL,
 					 &cm_id_priv->alt_av, cm_id_priv);
 		if (ret) {
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
+			ib_send_cm_rej(&cm_id_priv->id,
+				       IB_CM_REJ_INVALID_ALT_GID,
 				       &work->path[0].sgid,
 				       sizeof(work->path[0].sgid), NULL, 0);
 			goto rejected;
 		}
 	}
-	cm_id_priv->tid = req_msg->hdr.tid;
-	cm_id_priv->timeout_ms = cm_convert_to_ms(
-		IBA_GET(CM_REQ_LOCAL_CM_RESPONSE_TIMEOUT, req_msg));
-	cm_id_priv->max_cm_retries = IBA_GET(CM_REQ_MAX_CM_RETRIES, req_msg);
-	cm_id_priv->remote_qpn =
-		cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
-	cm_id_priv->initiator_depth =
-		IBA_GET(CM_REQ_RESPONDER_RESOURCES, req_msg);
-	cm_id_priv->responder_resources =
-		IBA_GET(CM_REQ_INITIATOR_DEPTH, req_msg);
-	cm_id_priv->path_mtu = IBA_GET(CM_REQ_PATH_PACKET_PAYLOAD_MTU, req_msg);
-	cm_id_priv->pkey = cpu_to_be16(IBA_GET(CM_REQ_PARTITION_KEY, req_msg));
-	cm_id_priv->sq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
-	cm_id_priv->retry_count = IBA_GET(CM_REQ_RETRY_COUNT, req_msg);
-	cm_id_priv->rnr_retry_count = IBA_GET(CM_REQ_RNR_RETRY_COUNT, req_msg);
-	cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
 
+	cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
+	cm_id_priv->id.context = listen_cm_id_priv->id.context;
 	cm_format_req_event(work, cm_id_priv, &listen_cm_id_priv->id);
+
+	/* Now MAD handlers can see the new ID */
+	spin_lock_irq(&cm_id_priv->lock);
+	cm_finalize_id(cm_id_priv);
+
+	/* Refcount belongs to the event, pairs with cm_process_work() */
+	refcount_inc(&cm_id_priv->refcount);
+	atomic_inc(&cm_id_priv->work_count);
+	spin_unlock_irq(&cm_id_priv->lock);
 	cm_process_work(cm_id_priv, work);
+	/*
+	 * Since this ID was just created and was not made visible to other MAD
+	 * handlers until the cm_finalize_id() above we know that the
+	 * cm_process_work() will deliver the event and the listen_cm_id
+	 * embedded in the event can be derefed here.
+	 */
 	cm_deref_id(listen_cm_id_priv);
 	return 0;
 
 rejected:
-	refcount_dec(&cm_id_priv->refcount);
 	cm_deref_id(listen_cm_id_priv);
 destroy:
-	ib_destroy_cm_id(cm_id);
+	ib_destroy_cm_id(&cm_id_priv->id);
 	return ret;
 }
 
-- 
2.24.1



* [PATCH rdma-next 09/15] RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma, Sean Hefty

From: Jason Gunthorpe <jgg@mellanox.com>

All accesses to id.state must be done under the spinlock.

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d00eb4751ae5..d002b34bd5a3 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1887,8 +1887,12 @@ static void cm_dup_req_handler(struct cm_work *work,
 			counter[CM_REQ_COUNTER]);

 	/* Quick state check to discard duplicate REQs. */
-	if (cm_id_priv->id.state == IB_CM_REQ_RCVD)
+	spin_lock_irq(&cm_id_priv->lock);
+	if (cm_id_priv->id.state == IB_CM_REQ_RCVD) {
+		spin_unlock_irq(&cm_id_priv->lock);
 		return;
+	}
+	spin_unlock_irq(&cm_id_priv->lock);

 	ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg);
 	if (ret)
--
2.24.1



* [PATCH rdma-next 10/15] RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

These functions all touch state, so must be called under the lock.
Inspection shows this is currently true.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d002b34bd5a3..84679f16a923 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -978,6 +978,8 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
 	unsigned long flags;
 	struct cm_device *cm_dev;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	cm_dev = ib_get_client_data(cm_id_priv->id.device, &cm_client);
 	if (!cm_dev)
 		return;
@@ -1009,6 +1011,8 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
 {
 	unsigned long flags;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	cm_id_priv->id.state = IB_CM_IDLE;
 	if (cm_id_priv->timewait_info) {
 		spin_lock_irqsave(&cm.lock, flags);
@@ -1838,6 +1842,8 @@ static void cm_format_rej(struct cm_rej_msg *rej_msg,
 			  const void *private_data,
 			  u8 private_data_len)
 {
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	cm_format_mad_hdr(&rej_msg->hdr, CM_REJ_ATTR_ID, cm_id_priv->tid);
 	IBA_SET(CM_REJ_REMOTE_COMM_ID, rej_msg,
 		be32_to_cpu(cm_id_priv->id.remote_id));
-- 
2.24.1



* [PATCH rdma-next 11/15] RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The first thing ib_send_cm_dreq() does is obtain the lock, so use the
usual 'unlocked wrapper, locked actor' pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
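
Schematically, the pattern used here and in the following patches is
(cm_send_foo is a stand-in name, not a real function):

	static int cm_send_foo_locked(struct cm_id_private *cm_id_priv, ...)
	{
		lockdep_assert_held(&cm_id_priv->lock);
		/* test id.state and do the transition under the lock */
	}

	int ib_send_cm_foo(struct ib_cm_id *cm_id, ...)
	{
		struct cm_id_private *cm_id_priv =
			container_of(cm_id, struct cm_id_private, id);
		unsigned long flags;
		int ret;

		spin_lock_irqsave(&cm_id_priv->lock, flags);
		ret = cm_send_foo_locked(cm_id_priv, ...);
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		return ret;
	}

This lets a caller that already holds the lock, such as cm_destroy_id(),
invoke the locked actor directly without ever dropping it.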

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 54 +++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 84679f16a923..dd99387d0839 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -82,8 +82,11 @@ const char *__attribute_const__ ibcm_reject_msg(int reason)
 }
 EXPORT_SYMBOL(ibcm_reject_msg);
 
+struct cm_id_private;
 static void cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device, void *client_data);
+static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
+			       const void *private_data, u8 private_data_len);
 
 static struct ib_client cm_client = {
 	.name   = "cm",
@@ -1088,10 +1091,12 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 			       NULL, 0, NULL, 0);
 		break;
 	case IB_CM_ESTABLISHED:
-		spin_unlock_irq(&cm_id_priv->lock);
-		if (cm_id_priv->qp_type == IB_QPT_XRC_TGT)
+		if (cm_id_priv->qp_type == IB_QPT_XRC_TGT) {
+			spin_unlock_irq(&cm_id_priv->lock);
 			break;
-		ib_send_cm_dreq(cm_id, NULL, 0);
+		}
+		cm_send_dreq_locked(cm_id_priv, NULL, 0);
+		spin_unlock_irq(&cm_id_priv->lock);
 		goto retest;
 	case IB_CM_DREQ_SENT:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
@@ -2631,35 +2636,32 @@ static void cm_format_dreq(struct cm_dreq_msg *dreq_msg,
 			    private_data_len);
 }
 
-int ib_send_cm_dreq(struct ib_cm_id *cm_id,
-		    const void *private_data,
-		    u8 private_data_len)
+static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
+			       const void *private_data, u8 private_data_len)
 {
-	struct cm_id_private *cm_id_priv;
 	struct ib_mad_send_buf *msg;
-	unsigned long flags;
 	int ret;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	if (private_data && private_data_len > IB_CM_DREQ_PRIVATE_DATA_SIZE)
 		return -EINVAL;
 
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	if (cm_id->state != IB_CM_ESTABLISHED) {
+	if (cm_id_priv->id.state != IB_CM_ESTABLISHED) {
 		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id->local_id), cm_id->state);
-		ret = -EINVAL;
-		goto out;
+			 be32_to_cpu(cm_id_priv->id.local_id),
+			 cm_id_priv->id.state);
+		return -EINVAL;
 	}
 
-	if (cm_id->lap_state == IB_CM_LAP_SENT ||
-	    cm_id->lap_state == IB_CM_MRA_LAP_RCVD)
+	if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT ||
+	    cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD)
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 
 	ret = cm_alloc_msg(cm_id_priv, &msg);
 	if (ret) {
 		cm_enter_timewait(cm_id_priv);
-		goto out;
+		return ret;
 	}
 
 	cm_format_dreq((struct cm_dreq_msg *) msg->mad, cm_id_priv,
@@ -2670,14 +2672,26 @@ int ib_send_cm_dreq(struct ib_cm_id *cm_id,
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		cm_enter_timewait(cm_id_priv);
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		cm_free_msg(msg);
 		return ret;
 	}
 
-	cm_id->state = IB_CM_DREQ_SENT;
+	cm_id_priv->id.state = IB_CM_DREQ_SENT;
 	cm_id_priv->msg = msg;
-out:	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	return 0;
+}
+
+int ib_send_cm_dreq(struct ib_cm_id *cm_id, const void *private_data,
+		    u8 private_data_len)
+{
+	struct cm_id_private *cm_id_priv =
+		container_of(cm_id, struct cm_id_private, id);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	ret = cm_send_dreq_locked(cm_id_priv, private_data, private_data_len);
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(ib_send_cm_dreq);
-- 
2.24.1



* [PATCH rdma-next 12/15] RDMA/cm: Allow ib_send_cm_drep() to be done under lock
From: Leon Romanovsky @ 2020-03-10  9:25 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The first thing ib_send_cm_drep() does is obtain the lock, so use the
usual 'unlocked wrapper, locked actor' pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
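
For reference, a minimal sketch of the "unlocked wrapper, locked actor"
shape; struct foo, do_thing() and do_thing_locked() are hypothetical
names for illustration, not the actual ib_cm code:

        #include <linux/spinlock.h>
        #include <linux/lockdep.h>

        struct foo {
                spinlock_t lock;
                int state;
        };

        /* Locked actor: the caller must already hold foo->lock. */
        static int do_thing_locked(struct foo *foo)
        {
                lockdep_assert_held(&foo->lock);
                /* All state checks and transitions happen under the lock. */
                return 0;
        }

        /* Unlocked wrapper: the public entry point takes the lock itself. */
        int do_thing(struct foo *foo)
        {
                unsigned long flags;
                int ret;

                spin_lock_irqsave(&foo->lock, flags);
                ret = do_thing_locked(foo);
                spin_unlock_irqrestore(&foo->lock, flags);
                return ret;
        }

Callers that already hold the lock, such as the cm_destroy_id() arms in
this series, invoke the *_locked() actor directly, so the state cannot
change between checking it and acting on it.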

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 55 +++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index dd99387d0839..2d3b661153a4 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -87,6 +87,8 @@ static void cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device, void *client_data);
 static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
 			       const void *private_data, u8 private_data_len);
+static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
+			       void *private_data, u8 private_data_len);
 
 static struct ib_client cm_client = {
 	.name   = "cm",
@@ -1104,8 +1106,8 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 		spin_unlock_irq(&cm_id_priv->lock);
 		break;
 	case IB_CM_DREQ_RCVD:
+		cm_send_drep_locked(cm_id_priv, NULL, 0);
 		spin_unlock_irq(&cm_id_priv->lock);
-		ib_send_cm_drep(cm_id, NULL, 0);
 		break;
 	default:
 		spin_unlock_irq(&cm_id_priv->lock);
@@ -2712,51 +2714,60 @@ static void cm_format_drep(struct cm_drep_msg *drep_msg,
 			    private_data_len);
 }
 
-int ib_send_cm_drep(struct ib_cm_id *cm_id,
-		    const void *private_data,
-		    u8 private_data_len)
+static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
+			       void *private_data, u8 private_data_len)
 {
-	struct cm_id_private *cm_id_priv;
 	struct ib_mad_send_buf *msg;
-	unsigned long flags;
-	void *data;
 	int ret;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	if (private_data && private_data_len > IB_CM_DREP_PRIVATE_DATA_SIZE)
 		return -EINVAL;
 
-	data = cm_copy_private_data(private_data, private_data_len);
-	if (IS_ERR(data))
-		return PTR_ERR(data);
-
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	if (cm_id->state != IB_CM_DREQ_RCVD) {
-		pr_debug("%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
-			 __func__, be32_to_cpu(cm_id->local_id), cm_id->state);
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-		kfree(data);
+	if (cm_id_priv->id.state != IB_CM_DREQ_RCVD) {
+		pr_debug(
+			"%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
+			__func__, be32_to_cpu(cm_id_priv->id.local_id),
+			cm_id_priv->id.state);
+		kfree(private_data);
 		return -EINVAL;
 	}
 
-	cm_set_private_data(cm_id_priv, data, private_data_len);
+	cm_set_private_data(cm_id_priv, private_data, private_data_len);
 	cm_enter_timewait(cm_id_priv);
 
 	ret = cm_alloc_msg(cm_id_priv, &msg);
 	if (ret)
-		goto out;
+		return ret;
 
 	cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
 		       private_data, private_data_len);
 
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		cm_free_msg(msg);
 		return ret;
 	}
+	return 0;
+}
 
-out:	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+int ib_send_cm_drep(struct ib_cm_id *cm_id, const void *private_data,
+		    u8 private_data_len)
+{
+	struct cm_id_private *cm_id_priv =
+		container_of(cm_id, struct cm_id_private, id);
+	unsigned long flags;
+	void *data;
+	int ret;
+
+	data = cm_copy_private_data(private_data, private_data_len);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	ret = cm_send_drep_locked(cm_id_priv, data, private_data_len);
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(ib_send_cm_drep);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH rdma-next 13/15] RDMA/cm: Allow ib_send_cm_rej() to be done under lock
  2020-03-10  9:25 [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Leon Romanovsky
                   ` (11 preceding siblings ...)
  2020-03-10  9:25 ` [PATCH rdma-next 12/15] RDMA/cm: Allow ib_send_cm_drep() " Leon Romanovsky
@ 2020-03-10  9:25 ` Leon Romanovsky
  2020-03-10  9:25 ` [PATCH rdma-next 14/15] RDMA/cm: Allow ib_send_cm_sidr_rep() " Leon Romanovsky
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2020-03-10  9:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The first thing ib_send_cm_rej() does is obtain the lock, so use the
usual "unlocked wrapper, locked actor" pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().

While here simplify some of the logic in the implementation.
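
Concretely, each state arm in the diff below now performs its state
transition (cm_reset_to_idle() or cm_enter_timewait()) before
allocating the MAD, so failures can return directly instead of
funneling through a shared out: label. A sketch of the resulting flow,
with hypothetical stand-in names (struct conn, alloc_msg() and friends
are illustrations, not the real helpers):

        struct conn;
        struct msg;

        void reset_to_idle(struct conn *c);
        int alloc_msg(struct conn *c, struct msg **m);
        void format_rej(struct msg *m, struct conn *c);
        int post_send(struct msg *m);
        void free_msg(struct msg *m);

        static int send_rej_locked(struct conn *c)
        {
                struct msg *m;
                int ret;

                reset_to_idle(c);       /* state transition first */
                ret = alloc_msg(c, &m);
                if (ret)
                        return ret;     /* fail fast, no out: label */
                format_rej(m, c);
                ret = post_send(m);
                if (ret) {
                        free_msg(m);
                        return ret;
                }
                return 0;
        }

Since the transition to idle/timewait is the side effect destroy relies
on, doing it before the fallible allocation means even the error paths
leave the cm_id in a state that blocks the MAD handlers.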

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 92 ++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 2d3b661153a4..cf7f1f7958c8 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -89,6 +89,10 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
 			       const void *private_data, u8 private_data_len);
 static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
 			       void *private_data, u8 private_data_len);
+static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
+			      enum ib_cm_rej_reason reason, void *ari,
+			      u8 ari_length, const void *private_data,
+			      u8 private_data_len);
 
 static struct ib_client cm_client = {
 	.name   = "cm",
@@ -1064,11 +1068,11 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
+		cm_send_rej_locked(cm_id_priv, IB_CM_REJ_TIMEOUT,
+				   &cm_id_priv->id.device->node_guid,
+				   sizeof(cm_id_priv->id.device->node_guid),
+				   NULL, 0);
 		spin_unlock_irq(&cm_id_priv->lock);
-		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-			       &cm_id_priv->id.device->node_guid,
-			       sizeof cm_id_priv->id.device->node_guid,
-			       NULL, 0);
 		break;
 	case IB_CM_REQ_RCVD:
 		if (err == -ENOMEM) {
@@ -1076,9 +1080,10 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 			cm_reset_to_idle(cm_id_priv);
 			spin_unlock_irq(&cm_id_priv->lock);
 		} else {
+			cm_send_rej_locked(cm_id_priv,
+					   IB_CM_REJ_CONSUMER_DEFINED, NULL, 0,
+					   NULL, 0);
 			spin_unlock_irq(&cm_id_priv->lock);
-			ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
-				       NULL, 0, NULL, 0);
 		}
 		break;
 	case IB_CM_REP_SENT:
@@ -1088,9 +1093,9 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	case IB_CM_MRA_REQ_SENT:
 	case IB_CM_REP_RCVD:
 	case IB_CM_MRA_REP_SENT:
+		cm_send_rej_locked(cm_id_priv, IB_CM_REJ_CONSUMER_DEFINED, NULL,
+				   0, NULL, 0);
 		spin_unlock_irq(&cm_id_priv->lock);
-		ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
-			       NULL, 0, NULL, 0);
 		break;
 	case IB_CM_ESTABLISHED:
 		if (cm_id_priv->qp_type == IB_QPT_XRC_TGT) {
@@ -2926,65 +2931,72 @@ static int cm_drep_handler(struct cm_work *work)
 	return -EINVAL;
 }
 
-int ib_send_cm_rej(struct ib_cm_id *cm_id,
-		   enum ib_cm_rej_reason reason,
-		   void *ari,
-		   u8 ari_length,
-		   const void *private_data,
-		   u8 private_data_len)
+static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
+			      enum ib_cm_rej_reason reason, void *ari,
+			      u8 ari_length, const void *private_data,
+			      u8 private_data_len)
 {
-	struct cm_id_private *cm_id_priv;
 	struct ib_mad_send_buf *msg;
-	unsigned long flags;
 	int ret;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	if ((private_data && private_data_len > IB_CM_REJ_PRIVATE_DATA_SIZE) ||
 	    (ari && ari_length > IB_CM_REJ_ARI_LENGTH))
 		return -EINVAL;
 
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-
-	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	switch (cm_id->state) {
+	switch (cm_id_priv->id.state) {
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
 	case IB_CM_REQ_RCVD:
 	case IB_CM_MRA_REQ_SENT:
 	case IB_CM_REP_RCVD:
 	case IB_CM_MRA_REP_SENT:
-		ret = cm_alloc_msg(cm_id_priv, &msg);
-		if (!ret)
-			cm_format_rej((struct cm_rej_msg *) msg->mad,
-				      cm_id_priv, reason, ari, ari_length,
-				      private_data, private_data_len);
-
 		cm_reset_to_idle(cm_id_priv);
+		ret = cm_alloc_msg(cm_id_priv, &msg);
+		if (ret)
+			return ret;
+		cm_format_rej((struct cm_rej_msg *)msg->mad, cm_id_priv, reason,
+			      ari, ari_length, private_data, private_data_len);
 		break;
 	case IB_CM_REP_SENT:
 	case IB_CM_MRA_REP_RCVD:
-		ret = cm_alloc_msg(cm_id_priv, &msg);
-		if (!ret)
-			cm_format_rej((struct cm_rej_msg *) msg->mad,
-				      cm_id_priv, reason, ari, ari_length,
-				      private_data, private_data_len);
-
 		cm_enter_timewait(cm_id_priv);
+		ret = cm_alloc_msg(cm_id_priv, &msg);
+		if (ret)
+			return ret;
+		cm_format_rej((struct cm_rej_msg *)msg->mad, cm_id_priv, reason,
+			      ari, ari_length, private_data, private_data_len);
 		break;
 	default:
 		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id_priv->id.local_id), cm_id->state);
-		ret = -EINVAL;
-		goto out;
+			 be32_to_cpu(cm_id_priv->id.local_id),
+			 cm_id_priv->id.state);
+		return -EINVAL;
 	}
 
-	if (ret)
-		goto out;
-
 	ret = ib_post_send_mad(msg, NULL);
-	if (ret)
+	if (ret) {
 		cm_free_msg(msg);
+		return ret;
+	}
 
-out:	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	return 0;
+}
+
+int ib_send_cm_rej(struct ib_cm_id *cm_id, enum ib_cm_rej_reason reason,
+		   void *ari, u8 ari_length, const void *private_data,
+		   u8 private_data_len)
+{
+	struct cm_id_private *cm_id_priv =
+		container_of(cm_id, struct cm_id_private, id);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	ret = cm_send_rej_locked(cm_id_priv, reason, ari, ari_length,
+				 private_data, private_data_len);
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(ib_send_cm_rej);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH rdma-next 14/15] RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
  2020-03-10  9:25 [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Leon Romanovsky
                   ` (12 preceding siblings ...)
  2020-03-10  9:25 ` [PATCH rdma-next 13/15] RDMA/cm: Allow ib_send_cm_rej() " Leon Romanovsky
@ 2020-03-10  9:25 ` Leon Romanovsky
  2020-03-10  9:25 ` [PATCH rdma-next 15/15] RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy Leon Romanovsky
  2020-03-17 20:15 ` [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Jason Gunthorpe
  15 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2020-03-10  9:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The first thing ib_send_cm_sidr_rep() does is obtain the lock, so use
the usual "unlocked wrapper, locked actor" pattern here.

Get rid of the cm_reject_sidr_req() wrapper so each call site can call the
locked or unlocked version as required.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
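
With the wrapper gone, each call site builds the parameter inline with
a C99 compound literal, which zero-initializes every field that is not
named explicitly. Condensed from the hunks below, old shape versus new:

        /* Old: the removed cm_reject_sidr_req() wrapper */
        struct ib_cm_sidr_rep_param param;

        memset(&param, 0, sizeof param);
        param.status = status;
        ib_send_cm_sidr_rep(&cm_id_priv->id, &param);

        /* New: inline at each call site, locked or unlocked as needed */
        ib_send_cm_sidr_rep(&cm_id_priv->id,
                            &(struct ib_cm_sidr_rep_param){
                                    .status = IB_SIDR_UNSUPPORTED });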

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 58 +++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index cf7f1f7958c8..f536e20e5394 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -85,6 +85,8 @@ EXPORT_SYMBOL(ibcm_reject_msg);
 struct cm_id_private;
 static void cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device, void *client_data);
+static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
+				   struct ib_cm_sidr_rep_param *param);
 static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
 			       const void *private_data, u8 private_data_len);
 static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
@@ -834,16 +836,6 @@ static struct cm_id_private * cm_insert_remote_sidr(struct cm_id_private
 	return NULL;
 }
 
-static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
-			       enum ib_cm_sidr_status status)
-{
-	struct ib_cm_sidr_rep_param param;
-
-	memset(&param, 0, sizeof param);
-	param.status = status;
-	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
-}
-
 static struct cm_id_private *cm_alloc_id_priv(struct ib_device *device,
 					      ib_cm_handler cm_handler,
 					      void *context)
@@ -1062,8 +1054,10 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 		spin_unlock_irq(&cm_id_priv->lock);
 		break;
 	case IB_CM_SIDR_REQ_RCVD:
+		cm_send_sidr_rep_locked(cm_id_priv,
+					&(struct ib_cm_sidr_rep_param){
+						.status = IB_SIDR_REJECT });
 		spin_unlock_irq(&cm_id_priv->lock);
-		cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
 		break;
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
@@ -3667,7 +3661,9 @@ static int cm_sidr_req_handler(struct cm_work *work)
 					   cm_id_priv->id.service_id);
 	if (!listen_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
-		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
+		ib_send_cm_sidr_rep(&cm_id_priv->id,
+				    &(struct ib_cm_sidr_rep_param){
+					    .status = IB_SIDR_UNSUPPORTED });
 		goto out; /* No match. */
 	}
 	refcount_inc(&listen_cm_id_priv->refcount);
@@ -3725,50 +3721,52 @@ static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
 			    param->private_data, param->private_data_len);
 }
 
-int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
-			struct ib_cm_sidr_rep_param *param)
+static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
+				   struct ib_cm_sidr_rep_param *param)
 {
-	struct cm_id_private *cm_id_priv;
 	struct ib_mad_send_buf *msg;
-	unsigned long flags;
 	int ret;
 
+	lockdep_assert_held(&cm_id_priv->lock);
+
 	if ((param->info && param->info_length > IB_CM_SIDR_REP_INFO_LENGTH) ||
 	    (param->private_data &&
 	     param->private_data_len > IB_CM_SIDR_REP_PRIVATE_DATA_SIZE))
 		return -EINVAL;
 
-	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	if (cm_id->state != IB_CM_SIDR_REQ_RCVD) {
-		ret = -EINVAL;
-		goto error;
-	}
+	if (cm_id_priv->id.state != IB_CM_SIDR_REQ_RCVD)
+		return -EINVAL;
 
 	ret = cm_alloc_msg(cm_id_priv, &msg);
 	if (ret)
-		goto error;
+		return ret;
 
 	cm_format_sidr_rep((struct cm_sidr_rep_msg *) msg->mad, cm_id_priv,
 			   param);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		cm_free_msg(msg);
 		return ret;
 	}
-	cm_id->state = IB_CM_IDLE;
-	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-
-	spin_lock_irqsave(&cm.lock, flags);
+	cm_id_priv->id.state = IB_CM_IDLE;
 	if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) {
 		rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
 		RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
 	}
-	spin_unlock_irqrestore(&cm.lock, flags);
 	return 0;
+}
 
-error:	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
+			struct ib_cm_sidr_rep_param *param)
+{
+	struct cm_id_private *cm_id_priv =
+		container_of(cm_id, struct cm_id_private, id);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	ret = cm_send_sidr_rep_locked(cm_id_priv, param);
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(ib_send_cm_sidr_rep);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH rdma-next 15/15] RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy
  2020-03-10  9:25 [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Leon Romanovsky
                   ` (13 preceding siblings ...)
  2020-03-10  9:25 ` [PATCH rdma-next 14/15] RDMA/cm: Allow ib_send_cm_sidr_rep() " Leon Romanovsky
@ 2020-03-10  9:25 ` Leon Romanovsky
  2020-03-17 20:15 ` [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Jason Gunthorpe
  15 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2020-03-10  9:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@mellanox.com>

The first switch statement in cm_destroy_id() tries to move the ID to
either IB_CM_IDLE or IB_CM_TIMEWAIT. Both states will block concurrent
MAD handlers from progressing.

Previous patches removed the unreliable lock/unlock sequences in this
flow; this patch removes the extra locking steps and adds the missing
parts to guarantee that destroy reaches IB_CM_IDLE. There is no point in
leaving the ID in the IB_CM_TIMEWAIT state when the memory is about to
be kfreed.

Rework things to hold the lock across all the state transitions and
directly assert when done that it ended up in IB_CM_IDLE as expected.

This was accompanied by a careful audit of all the state transitions here,
which generally did end up in IDLE on their success and non-racy paths.
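
Condensed from the hunks below (several arms omitted), the destroy flow
now holds the lock across the whole switch, loops via goto retest for
arms that only make forward progress, and asserts convergence at the
end:

        spin_lock_irq(&cm_id_priv->lock);
        retest:
        switch (cm_id->state) {
        case IB_CM_ESTABLISHED:
                cm_send_dreq_locked(cm_id_priv, NULL, 0);
                goto retest;    /* now DREQ_SENT or TIMEWAIT */
        case IB_CM_DREQ_SENT:
                ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
                cm_enter_timewait(cm_id_priv);
                goto retest;
        case IB_CM_TIMEWAIT:
                cm_id->state = IB_CM_IDLE;
                break;
        case IB_CM_IDLE:
                break;
        /* ... remaining arms all funnel toward IB_CM_IDLE ... */
        }
        WARN_ON(cm_id->state != IB_CM_IDLE);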

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c | 41 ++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f536e20e5394..bbbfa77dbce7 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1030,34 +1030,34 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	struct cm_work *work;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-retest:
 	spin_lock_irq(&cm_id_priv->lock);
+retest:
 	switch (cm_id->state) {
 	case IB_CM_LISTEN:
-		spin_unlock_irq(&cm_id_priv->lock);
-
-		spin_lock_irq(&cm.lock);
+		spin_lock(&cm.lock);
 		if (--cm_id_priv->listen_sharecount > 0) {
 			/* The id is still shared. */
 			WARN_ON(refcount_read(&cm_id_priv->refcount) == 1);
+			spin_unlock(&cm.lock);
+			spin_unlock_irq(&cm_id_priv->lock);
 			cm_deref_id(cm_id_priv);
-			spin_unlock_irq(&cm.lock);
 			return;
 		}
+		cm_id->state = IB_CM_IDLE;
 		rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
 		RB_CLEAR_NODE(&cm_id_priv->service_node);
-		spin_unlock_irq(&cm.lock);
+		spin_unlock(&cm.lock);
 		break;
 	case IB_CM_SIDR_REQ_SENT:
 		cm_id->state = IB_CM_IDLE;
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
-		spin_unlock_irq(&cm_id_priv->lock);
 		break;
 	case IB_CM_SIDR_REQ_RCVD:
 		cm_send_sidr_rep_locked(cm_id_priv,
 					&(struct ib_cm_sidr_rep_param){
 						.status = IB_SIDR_REJECT });
-		spin_unlock_irq(&cm_id_priv->lock);
+		/* cm_send_sidr_rep_locked will not move to IDLE if it fails */
+		cm_id->state = IB_CM_IDLE;
 		break;
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
@@ -1066,18 +1066,15 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 				   &cm_id_priv->id.device->node_guid,
 				   sizeof(cm_id_priv->id.device->node_guid),
 				   NULL, 0);
-		spin_unlock_irq(&cm_id_priv->lock);
 		break;
 	case IB_CM_REQ_RCVD:
 		if (err == -ENOMEM) {
 			/* Do not reject to allow future retries. */
 			cm_reset_to_idle(cm_id_priv);
-			spin_unlock_irq(&cm_id_priv->lock);
 		} else {
 			cm_send_rej_locked(cm_id_priv,
 					   IB_CM_REJ_CONSUMER_DEFINED, NULL, 0,
 					   NULL, 0);
-			spin_unlock_irq(&cm_id_priv->lock);
 		}
 		break;
 	case IB_CM_REP_SENT:
@@ -1089,31 +1086,35 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 	case IB_CM_MRA_REP_SENT:
 		cm_send_rej_locked(cm_id_priv, IB_CM_REJ_CONSUMER_DEFINED, NULL,
 				   0, NULL, 0);
-		spin_unlock_irq(&cm_id_priv->lock);
 		break;
 	case IB_CM_ESTABLISHED:
 		if (cm_id_priv->qp_type == IB_QPT_XRC_TGT) {
-			spin_unlock_irq(&cm_id_priv->lock);
+			cm_id->state = IB_CM_IDLE;
 			break;
 		}
 		cm_send_dreq_locked(cm_id_priv, NULL, 0);
-		spin_unlock_irq(&cm_id_priv->lock);
 		goto retest;
 	case IB_CM_DREQ_SENT:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		cm_enter_timewait(cm_id_priv);
-		spin_unlock_irq(&cm_id_priv->lock);
-		break;
+		goto retest;
 	case IB_CM_DREQ_RCVD:
 		cm_send_drep_locked(cm_id_priv, NULL, 0);
-		spin_unlock_irq(&cm_id_priv->lock);
+		WARN_ON(cm_id->state != IB_CM_TIMEWAIT);
+		goto retest;
+	case IB_CM_TIMEWAIT:
+		/*
+		 * The cm_acquire_id in cm_timewait_handler will stop working
+		 * once we do cm_free_id() below, so just move to idle here for
+		 * consistency.
+		 */
+		cm_id->state = IB_CM_IDLE;
 		break;
-	default:
-		spin_unlock_irq(&cm_id_priv->lock);
+	case IB_CM_IDLE:
 		break;
 	}
+	WARN_ON(cm_id->state != IB_CM_IDLE);
 
-	spin_lock_irq(&cm_id_priv->lock);
 	spin_lock(&cm.lock);
 	/* Required for cleanup paths related cm_req_handler() */
 	if (cm_id_priv->timewait_info) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm
  2020-03-10  9:25 [PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm Leon Romanovsky
                   ` (14 preceding siblings ...)
  2020-03-10  9:25 ` [PATCH rdma-next 15/15] RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy Leon Romanovsky
@ 2020-03-17 20:15 ` Jason Gunthorpe
  15 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2020-03-17 20:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Daniel Jurgens, Haggai Eran,
	linux-kernel, linux-rdma, Sean Hefty

On Tue, Mar 10, 2020 at 11:25:30AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> [...]

Applied to for-next. 

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread
