* [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine
@ 2020-09-02  8:11 Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 1/8] RDMA/cma: Fix locking for the RDMA_CM_CONNECT state Leon Romanovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Eli Cohen, linux-kernel, linux-rdma, Roland Dreier

From: Leon Romanovsky <leonro@nvidia.com>

From Jason:

The RDMA CMA continues to attract syzkaller bugs due to the somewhat loose
operation of its FSM. Audit and scrub the whole thing to follow modern
expectations.

Overall the design elements are broadly:

- The ULP entry points MUST NOT run in parallel with each other. The ULP
  is solely responsible for preventing this.

- If the ULP returns !0 from its event callback it MUST guarantee that no
  other ULP threads are touching the cm_id or calling into any RDMA CM
  entry point.

- ULP entry points can sometimes run concurrently with handler callbacks,
  although it is tricky because there are many entry points that exist
  in the flow before the handler is registered.

- Some ULP entry points are called from the ULP event handler callback,
  under the handler_mutex. (however ucma never does this)

- The state uses a weird double locking scheme; in most cases one should
  hold the handler_mutex. (It is somewhat unclear what exactly the spinlock
  is for.)

- Reading the state without holding the spinlock should use READ_ONCE,
  even if the handler_mutex is held.

- There are certain states which are 'stable' under the handler_mutex; exit
  from such a state requires also holding the handler_mutex. This explains
  why testing the state under only the handler_mutex makes sense (see the
  sketch after this list).
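
Condensed from the patches that follow, the resulting pattern is roughly:

	/* Read side: a 'stable' state cannot change while handler_mutex is
	 * held, so a plain READ_ONCE of the state is enough. */
	mutex_lock(&id_priv->handler_mutex);
	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
		goto out;

	/* Write side (cma_comp_exch()): transitions take the spinlock, and a
	 * transition away from a handler_mutex protected state is only made
	 * while also holding the handler_mutex. */
	spin_lock_irqsave(&id_priv->lock, flags);
	if (id_priv->state == comp)
		id_priv->state = exch;
	spin_unlock_irqrestore(&id_priv->lock, flags);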

Thanks

Jason Gunthorpe (8):
  RDMA/cma: Fix locking for the RDMA_CM_CONNECT state
  RDMA/cma: Make the locking for automatic state transition more clear
  RDMA/cma: Fix locking for the RDMA_CM_LISTEN state
  RDMA/cma: Remove cma_comp()
  RDMA/cma: Combine cma_ndev_work with cma_work
  RDMA/cma: Remove dead code for kernel rdmacm multicast
  RDMA/cma: Consolidate the destruction of a cma_multicast in one place
  RDMA/cma: Fix use after free race in roce multicast join

 drivers/infiniband/core/cma.c | 466 ++++++++++++++++------------------
 1 file changed, 218 insertions(+), 248 deletions(-)

--
2.26.2



* [PATCH rdma-next 1/8] RDMA/cma: Fix locking for the RDMA_CM_CONNECT state
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 2/8] RDMA/cma: Make the locking for automatic state transition more clear Leon Romanovsky
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

It is currently a bit confusing, but the design is that if the handler_mutex
is held and the state is RDMA_CM_CONNECT, then the state cannot leave
RDMA_CM_CONNECT without also serializing with the handler_mutex.

Make this clearer by adding a direct assertion, fixing the usage in
rdma_connect and generally using READ_ONCE to read the state value.
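
In rough outline (simplified from the diff below):

	/* cma_comp_exch() now documents and asserts the invariant ... */
	if (comp == RDMA_CM_CONNECT)
		lockdep_assert_held(&id_priv->handler_mutex);

	/* ... and rdma_connect() supplies it by holding handler_mutex across
	 * the ROUTE_RESOLVED -> CONNECT transition, serializing it against
	 * any running handler: */
	mutex_lock(&id_priv->handler_mutex);
	if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT)) {
		ret = -EINVAL;
		goto err_unlock;
	}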

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 44 ++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f1c45f67e2eb..7ac9306ab5b3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -421,6 +421,15 @@ static int cma_comp_exch(struct rdma_id_private *id_priv,
 	unsigned long flags;
 	int ret;
 
+	/*
+	 * The FSM uses a funny double locking where state is protected by both
+	 * the handler_mutex and the spinlock. State is not allowed to change
+	 * away from a handler_mutex protected value without also holding
+	 * handler_mutex.
+	 */
+	if (comp == RDMA_CM_CONNECT)
+		lockdep_assert_held(&id_priv->handler_mutex);
+
 	spin_lock_irqsave(&id_priv->lock, flags);
 	if ((ret = (id_priv->state == comp)))
 		id_priv->state = exch;
@@ -1969,13 +1978,15 @@ static int cma_ib_handler(struct ib_cm_id *cm_id,
 {
 	struct rdma_id_private *id_priv = cm_id->context;
 	struct rdma_cm_event event = {};
+	enum rdma_cm_state state;
 	int ret;
 
 	mutex_lock(&id_priv->handler_mutex);
+	state = READ_ONCE(id_priv->state);
 	if ((ib_event->event != IB_CM_TIMEWAIT_EXIT &&
-	     id_priv->state != RDMA_CM_CONNECT) ||
+	     state != RDMA_CM_CONNECT) ||
 	    (ib_event->event == IB_CM_TIMEWAIT_EXIT &&
-	     id_priv->state != RDMA_CM_DISCONNECT))
+	     state != RDMA_CM_DISCONNECT))
 		goto out;
 
 	switch (ib_event->event) {
@@ -1985,7 +1996,7 @@ static int cma_ib_handler(struct ib_cm_id *cm_id,
 		event.status = -ETIMEDOUT;
 		break;
 	case IB_CM_REP_RECEIVED:
-		if (cma_comp(id_priv, RDMA_CM_CONNECT) &&
+		if (state == RDMA_CM_CONNECT &&
 		    (id_priv->id.qp_type != IB_QPT_UD)) {
 			trace_cm_send_mra(id_priv);
 			ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);
@@ -2246,8 +2257,8 @@ static int cma_ib_req_handler(struct ib_cm_id *cm_id,
 		goto net_dev_put;
 	}
 
-	if (cma_comp(conn_id, RDMA_CM_CONNECT) &&
-	    (conn_id->id.qp_type != IB_QPT_UD)) {
+	if (READ_ONCE(conn_id->state) == RDMA_CM_CONNECT &&
+	    conn_id->id.qp_type != IB_QPT_UD) {
 		trace_cm_send_mra(cm_id->context);
 		ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);
 	}
@@ -2308,7 +2319,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
 	struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr;
 
 	mutex_lock(&id_priv->handler_mutex);
-	if (id_priv->state != RDMA_CM_CONNECT)
+	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
 		goto out;
 
 	switch (iw_event->event) {
@@ -3807,7 +3818,7 @@ static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
 	int ret;
 
 	mutex_lock(&id_priv->handler_mutex);
-	if (id_priv->state != RDMA_CM_CONNECT)
+	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
 		goto out;
 
 	switch (ib_event->event) {
@@ -4043,12 +4054,15 @@ static int cma_connect_iw(struct rdma_id_private *id_priv,
 
 int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 {
-	struct rdma_id_private *id_priv;
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
 	int ret;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
-	if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT))
-		return -EINVAL;
+	mutex_lock(&id_priv->handler_mutex);
+	if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT)) {
+		ret = -EINVAL;
+		goto err_unlock;
+	}
 
 	if (!id->qp) {
 		id_priv->qp_num = conn_param->qp_num;
@@ -4065,11 +4079,13 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 	else
 		ret = -ENOSYS;
 	if (ret)
-		goto err;
-
+		goto err_state;
+	mutex_unlock(&id_priv->handler_mutex);
 	return 0;
-err:
+err_state:
 	cma_comp_exch(id_priv, RDMA_CM_CONNECT, RDMA_CM_ROUTE_RESOLVED);
+err_unlock:
+	mutex_unlock(&id_priv->handler_mutex);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_connect);
-- 
2.26.2



* [PATCH rdma-next 2/8] RDMA/cma: Make the locking for automatic state transition more clear
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 1/8] RDMA/cma: Fix locking for the RDMA_CM_CONNECT state Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 3/8] RDMA/cma: Fix locking for the RDMA_CM_LISTEN state Leon Romanovsky
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

Re-organize things so the state variable is not read unlocked. The first
attempt to go directly from ADDR_BOUND immediately tells us if the ID is
already bound; if we can't do that, then the attempt inside
rdma_bind_addr() to go from IDLE to ADDR_BOUND confirms the ID needs
binding.
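
Both resolve_prepare_src() and rdma_listen() end up with the same shape; the
rdma_listen() side, condensed from the diff below, is:

	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN)) {
		/* For a well behaved ULP state will be RDMA_CM_IDLE */
		id->route.addr.src_addr.ss_family = AF_INET;
		ret = rdma_bind_addr(id, cma_src_addr(id_priv));
		if (ret)
			return ret;
		/* the bind must have moved the ID from IDLE to ADDR_BOUND */
		if (WARN_ON(!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND,
					   RDMA_CM_LISTEN)))
			return -EINVAL;
	}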

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 67 +++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7ac9306ab5b3..901d1bd35603 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3278,32 +3278,54 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	return rdma_bind_addr(id, src_addr);
 }
 
-int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
-		      const struct sockaddr *dst_addr, unsigned long timeout_ms)
+/*
+ * If required, resolve the source address for bind and leave the id_priv in
+ * state RDMA_CM_ADDR_BOUND. This oddly uses the state to determine the prior
+ * calls made by ULP, a previously bound ID will not be re-bound and src_addr is
+ * ignored.
+ */
+static int resolve_prepare_src(struct rdma_id_private *id_priv,
+			       struct sockaddr *src_addr,
+			       const struct sockaddr *dst_addr)
 {
-	struct rdma_id_private *id_priv;
 	int ret;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
 	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
-	if (id_priv->state == RDMA_CM_IDLE) {
-		ret = cma_bind_addr(id, src_addr, dst_addr);
-		if (ret) {
-			memset(cma_dst_addr(id_priv), 0,
-			       rdma_addr_size(dst_addr));
-			return ret;
+	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
+		/* For a well behaved ULP state will be RDMA_CM_IDLE */
+		ret = cma_bind_addr(&id_priv->id, src_addr, dst_addr);
+		if (ret)
+			goto err_dst;
+		if (WARN_ON(!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND,
+					   RDMA_CM_ADDR_QUERY))) {
+			ret = -EINVAL;
+			goto err_dst;
 		}
 	}
 
 	if (cma_family(id_priv) != dst_addr->sa_family) {
-		memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err_state;
 	}
+	return 0;
 
-	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
-		memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
-		return -EINVAL;
-	}
+err_state:
+	cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND);
+err_dst:
+	memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
+	return ret;
+}
+
+int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
+		      const struct sockaddr *dst_addr, unsigned long timeout_ms)
+{
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
+	int ret;
+
+	ret = resolve_prepare_src(id_priv, src_addr, dst_addr);
+	if (ret)
+		return ret;
 
 	if (cma_any_addr(dst_addr)) {
 		ret = cma_resolve_loopback(id_priv);
@@ -3676,20 +3698,21 @@ static int cma_check_linklocal(struct rdma_dev_addr *dev_addr,
 
 int rdma_listen(struct rdma_cm_id *id, int backlog)
 {
-	struct rdma_id_private *id_priv;
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
 	int ret;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
-	if (id_priv->state == RDMA_CM_IDLE) {
+	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN)) {
+		/* For a well behaved ULP state will be RDMA_CM_IDLE */
 		id->route.addr.src_addr.ss_family = AF_INET;
 		ret = rdma_bind_addr(id, cma_src_addr(id_priv));
 		if (ret)
 			return ret;
+		if (WARN_ON(!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND,
+					   RDMA_CM_LISTEN)))
+			return -EINVAL;
 	}
 
-	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN))
-		return -EINVAL;
-
 	if (id_priv->reuseaddr) {
 		ret = cma_bind_listen(id_priv);
 		if (ret)
-- 
2.26.2



* [PATCH rdma-next 3/8] RDMA/cma: Fix locking for the RDMA_CM_LISTEN state
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 1/8] RDMA/cma: Fix locking for the RDMA_CM_CONNECT state Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 2/8] RDMA/cma: Make the locking for automatic state transition more clear Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 4/8] RDMA/cma: Remove cma_comp() Leon Romanovsky
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

There is a strange unlocked read of the ID state when checking for
reuseaddr. This is because an ID cannot be reusable once it becomes a
listening ID. Instead of using the state to exclude reuse, just clear
reuseaddr as part of rdma_listen()'s flow to convert reusable into not
reusable.

Once an ID goes to listen there is no way back out, and the only use of
reusable is on the bind_list check.

Finally, update the checks under handler_mutex to use READ_ONCE and audit
that once RDMA_CM_LISTEN is observed in a req callback it is stable under
the handler_mutex.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 36 +++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 901d1bd35603..1ab779e63762 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2215,7 +2215,7 @@ static int cma_ib_req_handler(struct ib_cm_id *cm_id,
 	}
 
 	mutex_lock(&listen_id->handler_mutex);
-	if (listen_id->state != RDMA_CM_LISTEN) {
+	if (READ_ONCE(listen_id->state) != RDMA_CM_LISTEN) {
 		ret = -ECONNABORTED;
 		goto err_unlock;
 	}
@@ -2393,7 +2393,7 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	listen_id = cm_id->context;
 
 	mutex_lock(&listen_id->handler_mutex);
-	if (listen_id->state != RDMA_CM_LISTEN)
+	if (READ_ONCE(listen_id->state) != RDMA_CM_LISTEN)
 		goto out;
 
 	/* Create a new RDMA id for the new IW CM ID */
@@ -3357,7 +3357,8 @@ int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse)
 
 	id_priv = container_of(id, struct rdma_id_private, id);
 	spin_lock_irqsave(&id_priv->lock, flags);
-	if (reuse || id_priv->state == RDMA_CM_IDLE) {
+	if ((reuse && id_priv->state != RDMA_CM_LISTEN) ||
+	    id_priv->state == RDMA_CM_IDLE) {
 		id_priv->reuseaddr = reuse;
 		ret = 0;
 	} else {
@@ -3551,8 +3552,7 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
 		if (id_priv == cur_id)
 			continue;
 
-		if ((cur_id->state != RDMA_CM_LISTEN) && reuseaddr &&
-		    cur_id->reuseaddr)
+		if (reuseaddr && cur_id->reuseaddr)
 			continue;
 
 		cur_addr = cma_src_addr(cur_id);
@@ -3593,18 +3593,6 @@ static int cma_use_port(enum rdma_ucm_port_space ps,
 	return ret;
 }
 
-static int cma_bind_listen(struct rdma_id_private *id_priv)
-{
-	struct rdma_bind_list *bind_list = id_priv->bind_list;
-	int ret = 0;
-
-	mutex_lock(&lock);
-	if (bind_list->owners.first->next)
-		ret = cma_check_port(bind_list, id_priv, 0);
-	mutex_unlock(&lock);
-	return ret;
-}
-
 static enum rdma_ucm_port_space
 cma_select_inet_ps(struct rdma_id_private *id_priv)
 {
@@ -3713,8 +3701,16 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 			return -EINVAL;
 	}
 
+	/*
+	 * Once the ID reaches RDMA_CM_LISTEN it is not allowed to be reusable
+	 * any more, and has to be unique in the bind list.
+	 */
 	if (id_priv->reuseaddr) {
-		ret = cma_bind_listen(id_priv);
+		mutex_lock(&lock);
+		ret = cma_check_port(id_priv->bind_list, id_priv, 0);
+		if (!ret)
+			id_priv->reuseaddr = 0;
+		mutex_unlock(&lock);
 		if (ret)
 			goto err;
 	}
@@ -3739,6 +3735,10 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 	return 0;
 err:
 	id_priv->backlog = 0;
+	/*
+	 * All the failure paths that lead here will not allow the req_handler's
+	 * to have run.
+	 */
 	cma_comp_exch(id_priv, RDMA_CM_LISTEN, RDMA_CM_ADDR_BOUND);
 	return ret;
 }
-- 
2.26.2



* [PATCH rdma-next 4/8] RDMA/cma: Remove cma_comp()
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 3/8] RDMA/cma: Fix locking for the RDMA_CM_LISTEN state Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 5/8] RDMA/cma: Combine cma_ndev_work with cma_work Leon Romanovsky
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

The only place that still uses it is rdma_join_multicast(), which is only
doing a sanity check that the caller hasn't done something wrong and
doesn't need the spinlock.

At least in the case of rdma_join_multicast(), once the ID enters these
states the information it needs will remain valid until the ID is
destroyed. Similarly, there is no reason to check for these specific
states in the handler callback; instead use the usual check for a
destroyed ID under the handler_mutex.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1ab779e63762..1d5781f507e9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -404,17 +404,6 @@ struct cma_req_info {
 	u16 pkey;
 };
 
-static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
-{
-	unsigned long flags;
-	int ret;
-
-	spin_lock_irqsave(&id_priv->lock, flags);
-	ret = (id_priv->state == comp);
-	spin_unlock_irqrestore(&id_priv->lock, flags);
-	return ret;
-}
-
 static int cma_comp_exch(struct rdma_id_private *id_priv,
 			 enum rdma_cm_state comp, enum rdma_cm_state exch)
 {
@@ -4392,8 +4381,8 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 
 	id_priv = mc->id_priv;
 	mutex_lock(&id_priv->handler_mutex);
-	if (id_priv->state != RDMA_CM_ADDR_BOUND &&
-	    id_priv->state != RDMA_CM_ADDR_RESOLVED)
+	if (READ_ONCE(id_priv->state) == RDMA_CM_DEVICE_REMOVAL ||
+	    READ_ONCE(id_priv->state) == RDMA_CM_DESTROYING)
 		goto out;
 
 	if (!status)
@@ -4659,16 +4648,14 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 			u8 join_state, void *context)
 {
-	struct rdma_id_private *id_priv;
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
 	struct cma_multicast *mc;
 	int ret;
 
-	if (!id->device)
-		return -EINVAL;
-
-	id_priv = container_of(id, struct rdma_id_private, id);
-	if (!cma_comp(id_priv, RDMA_CM_ADDR_BOUND) &&
-	    !cma_comp(id_priv, RDMA_CM_ADDR_RESOLVED))
+	/* ULP is calling this wrong. */
+	if (!id->device || (READ_ONCE(id_priv->state) != RDMA_CM_ADDR_BOUND &&
+			    READ_ONCE(id_priv->state) != RDMA_CM_ADDR_RESOLVED))
 		return -EINVAL;
 
 	mc = kmalloc(sizeof *mc, GFP_KERNEL);
-- 
2.26.2



* [PATCH rdma-next 5/8] RDMA/cma: Combine cma_ndev_work with cma_work
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 4/8] RDMA/cma: Remove cma_comp() Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 6/8] RDMA/cma: Remove dead code for kernel rdmacm multicast Leon Romanovsky
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed past setting the state.
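
A work item with no state transition is then queued roughly like this (based
on cma_netdev_change() in the diff below; assuming the work struct is
zero-allocated so old_state/new_state stay 0, and that it goes onto the
usual cma_wq):

	work = kzalloc(sizeof(*work), GFP_KERNEL);
	if (!work)
		return -ENOMEM;

	INIT_WORK(&work->work, cma_work_handler);
	work->id = id_priv;
	work->event.event = RDMA_CM_EVENT_ADDR_CHANGE;
	/* old_state == new_state == 0, so cma_work_handler() skips the
	 * cma_comp_exch() and only bails if the ID is being destroyed */
	cma_id_get(id_priv);
	queue_work(cma_wq, &work->work);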

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 38 +++++++----------------------------
 1 file changed, 7 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1d5781f507e9..088981db10f7 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -363,12 +363,6 @@ struct cma_work {
 	struct rdma_cm_event	event;
 };
 
-struct cma_ndev_work {
-	struct work_struct	work;
-	struct rdma_id_private	*id;
-	struct rdma_cm_event	event;
-};
-
 struct iboe_mcast_work {
 	struct work_struct	 work;
 	struct rdma_id_private	*id;
@@ -2675,32 +2669,14 @@ static void cma_work_handler(struct work_struct *_work)
 	struct rdma_id_private *id_priv = work->id;
 
 	mutex_lock(&id_priv->handler_mutex);
-	if (!cma_comp_exch(id_priv, work->old_state, work->new_state))
+	if (READ_ONCE(id_priv->state) == RDMA_CM_DESTROYING ||
+	    READ_ONCE(id_priv->state) == RDMA_CM_DEVICE_REMOVAL)
 		goto out_unlock;
-
-	if (cma_cm_event_handler(id_priv, &work->event)) {
-		cma_id_put(id_priv);
-		destroy_id_handler_unlock(id_priv);
-		goto out_free;
+	if (work->old_state != 0 || work->new_state != 0) {
+		if (!cma_comp_exch(id_priv, work->old_state, work->new_state))
+			goto out_unlock;
 	}
 
-out_unlock:
-	mutex_unlock(&id_priv->handler_mutex);
-	cma_id_put(id_priv);
-out_free:
-	kfree(work);
-}
-
-static void cma_ndev_work_handler(struct work_struct *_work)
-{
-	struct cma_ndev_work *work = container_of(_work, struct cma_ndev_work, work);
-	struct rdma_id_private *id_priv = work->id;
-
-	mutex_lock(&id_priv->handler_mutex);
-	if (id_priv->state == RDMA_CM_DESTROYING ||
-	    id_priv->state == RDMA_CM_DEVICE_REMOVAL)
-		goto out_unlock;
-
 	if (cma_cm_event_handler(id_priv, &work->event)) {
 		cma_id_put(id_priv);
 		destroy_id_handler_unlock(id_priv);
@@ -4727,7 +4703,7 @@ EXPORT_SYMBOL(rdma_leave_multicast);
 static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id_priv)
 {
 	struct rdma_dev_addr *dev_addr;
-	struct cma_ndev_work *work;
+	struct cma_work *work;
 
 	dev_addr = &id_priv->id.route.addr.dev_addr;
 
@@ -4740,7 +4716,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
 		if (!work)
 			return -ENOMEM;
 
-		INIT_WORK(&work->work, cma_ndev_work_handler);
+		INIT_WORK(&work->work, cma_work_handler);
 		work->id = id_priv;
 		work->event.event = RDMA_CM_EVENT_ADDR_CHANGE;
 		cma_id_get(id_priv);
-- 
2.26.2



* [PATCH rdma-next 6/8] RDMA/cma: Remove dead code for kernel rdmacm multicast
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 5/8] RDMA/cma: Combine cma_ndev_work with cma_work Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 7/8] RDMA/cma: Consolidate the destruction of a cma_multicast in one place Leon Romanovsky
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 088981db10f7..b9d794ba9f6a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4366,16 +4366,6 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 	else
 		pr_debug_ratelimited("RDMA CM: MULTICAST_ERROR: failed to join multicast. status %d\n",
 				     status);
-	mutex_lock(&id_priv->qp_mutex);
-	if (!status && id_priv->id.qp) {
-		status = ib_attach_mcast(id_priv->id.qp, &multicast->rec.mgid,
-					 be16_to_cpu(multicast->rec.mlid));
-		if (status)
-			pr_debug_ratelimited("RDMA CM: MULTICAST_ERROR: failed to attach QP. status %d\n",
-					     status);
-	}
-	mutex_unlock(&id_priv->qp_mutex);
-
 	event.status = status;
 	event.param.ud.private_data = mc->context;
 	if (!status) {
@@ -4629,6 +4619,10 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	struct cma_multicast *mc;
 	int ret;
 
+	/* Not supported for kernel QPs */
+	if (WARN_ON(id->qp))
+		return -EINVAL;
+
 	/* ULP is calling this wrong. */
 	if (!id->device || (READ_ONCE(id_priv->state) != RDMA_CM_ADDR_BOUND &&
 			    READ_ONCE(id_priv->state) != RDMA_CM_ADDR_RESOLVED))
@@ -4680,11 +4674,6 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 			list_del(&mc->list);
 			spin_unlock_irq(&id_priv->lock);
 
-			if (id->qp)
-				ib_detach_mcast(id->qp,
-						&mc->multicast.ib->rec.mgid,
-						be16_to_cpu(mc->multicast.ib->rec.mlid));
-
 			BUG_ON(id_priv->cma_dev->device != id->device);
 
 			if (rdma_cap_ib_mcast(id->device, id->port_num)) {
-- 
2.26.2



* [PATCH rdma-next 7/8] RDMA/cma: Consolidate the destruction of a cma_multicast in one place
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (5 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 6/8] RDMA/cma: Remove dead code for kernel rdmacm multicast Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-02  8:11 ` [PATCH rdma-next 8/8] RDMA/cma: Fix use after free race in roce multicast join Leon Romanovsky
  2020-09-17 12:33 ` [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Jason Gunthorpe
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma

From: Jason Gunthorpe <jgg@nvidia.com>

Two places were open coding this sequence; also pull in
cma_leave_roce_mc_group(), which was called only once.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 63 +++++++++++++++++------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b9d794ba9f6a..794b00056f59 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1795,19 +1795,30 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_unlock(&lock);
 }
 
-static void cma_leave_roce_mc_group(struct rdma_id_private *id_priv,
-				    struct cma_multicast *mc)
+static void destroy_mc(struct rdma_id_private *id_priv,
+		       struct cma_multicast *mc)
 {
-	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	struct net_device *ndev = NULL;
+	if (rdma_cap_ib_mcast(id_priv->id.device, id_priv->id.port_num)) {
+		ib_sa_free_multicast(mc->multicast.ib);
+		kfree(mc);
+		return;
+	}
 
-	if (dev_addr->bound_dev_if)
-		ndev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
-	if (ndev) {
-		cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid, false);
-		dev_put(ndev);
+	if (rdma_protocol_roce(id_priv->id.device,
+				      id_priv->id.port_num)) {
+		struct rdma_dev_addr *dev_addr =
+			&id_priv->id.route.addr.dev_addr;
+		struct net_device *ndev = NULL;
+
+		if (dev_addr->bound_dev_if)
+			ndev = dev_get_by_index(dev_addr->net,
+						dev_addr->bound_dev_if);
+		if (ndev) {
+			cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid, false);
+			dev_put(ndev);
+		}
+		kref_put(&mc->mcref, release_mc);
 	}
-	kref_put(&mc->mcref, release_mc);
 }
 
 static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
@@ -1815,16 +1826,10 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 	struct cma_multicast *mc;
 
 	while (!list_empty(&id_priv->mc_list)) {
-		mc = container_of(id_priv->mc_list.next,
-				  struct cma_multicast, list);
+		mc = list_first_entry(&id_priv->mc_list, struct cma_multicast,
+				      list);
 		list_del(&mc->list);
-		if (rdma_cap_ib_mcast(id_priv->cma_dev->device,
-				      id_priv->id.port_num)) {
-			ib_sa_free_multicast(mc->multicast.ib);
-			kfree(mc);
-		} else {
-			cma_leave_roce_mc_group(id_priv, mc);
-		}
+		destroy_mc(id_priv, mc);
 	}
 }
 
@@ -4670,20 +4675,14 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 	id_priv = container_of(id, struct rdma_id_private, id);
 	spin_lock_irq(&id_priv->lock);
 	list_for_each_entry(mc, &id_priv->mc_list, list) {
-		if (!memcmp(&mc->addr, addr, rdma_addr_size(addr))) {
-			list_del(&mc->list);
-			spin_unlock_irq(&id_priv->lock);
-
-			BUG_ON(id_priv->cma_dev->device != id->device);
+		if (memcmp(&mc->addr, addr, rdma_addr_size(addr)) != 0)
+			continue;
+		list_del(&mc->list);
+		spin_unlock_irq(&id_priv->lock);
 
-			if (rdma_cap_ib_mcast(id->device, id->port_num)) {
-				ib_sa_free_multicast(mc->multicast.ib);
-				kfree(mc);
-			} else if (rdma_protocol_roce(id->device, id->port_num)) {
-				cma_leave_roce_mc_group(id_priv, mc);
-			}
-			return;
-		}
+		WARN_ON(id_priv->cma_dev->device != id->device);
+		destroy_mc(id_priv, mc);
+		return;
 	}
 	spin_unlock_irq(&id_priv->lock);
 }
-- 
2.26.2



* [PATCH rdma-next 8/8] RDMA/cma: Fix use after free race in roce multicast join
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (6 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 7/8] RDMA/cma: Consolidate the destruction of a cma_multicast in one place Leon Romanovsky
@ 2020-09-02  8:11 ` Leon Romanovsky
  2020-09-17 12:33 ` [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Jason Gunthorpe
  8 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2020-09-02  8:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Eli Cohen, linux-rdma, Roland Dreier

From: Jason Gunthorpe <jgg@nvidia.com>

The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Further, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window between the queue_work() and the work starting
and obtaining the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
  requires generating the event synchronously inside rdma_join_multicast(),
  which in turn means the dummy struct ib_sa_multicast can become a simple
  stack variable (see the sketch after this list).

- cma_work_handler() used the id_priv kref, so we can entirely eliminate
  the kref inside struct cma_multicast. Since the cma_multicast never
  leaks into an unprotected work queue the kfree can be done at the same
  time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
  few places to recompute the mgid.
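
Condensed from cma_iboe_join_multicast() in the diff below, the roce join
path then becomes roughly:

	struct ib_sa_multicast ib;	/* dummy record, now a stack variable */
	struct cma_work *work;

	work = kzalloc(sizeof(*work), GFP_KERNEL);
	if (!work)
		return -ENOMEM;
	/* ... fill in ib.rec (mgid, pkey, qkey, rate, mtu, hop_limit) ... */

	work->id = id_priv;
	INIT_WORK(&work->work, cma_work_handler);
	cma_make_mc_event(0, id_priv, &ib, &work->event, mc);
	/* Balances with cma_id_put() in cma_work_handler */
	cma_id_get(id_priv);
	queue_work(cma_wq, &work->work);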

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 196 +++++++++++++++-------------------
 1 file changed, 88 insertions(+), 108 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 794b00056f59..82338099ba09 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -68,6 +68,9 @@ static const char * const cma_events[] = {
 	[RDMA_CM_EVENT_TIMEWAIT_EXIT]	 = "timewait exit",
 };
 
+static void cma_set_mgid(struct rdma_id_private *id_priv, struct sockaddr *addr,
+			 union ib_gid *mgid);
+
 const char *__attribute_const__ rdma_event_msg(enum rdma_cm_event_type event)
 {
 	size_t index = event;
@@ -345,13 +348,10 @@ struct ib_device *cma_get_ib_dev(struct cma_device *cma_dev)
 
 struct cma_multicast {
 	struct rdma_id_private *id_priv;
-	union {
-		struct ib_sa_multicast *ib;
-	} multicast;
+	struct ib_sa_multicast *sa_mc;
 	struct list_head	list;
 	void			*context;
 	struct sockaddr_storage	addr;
-	struct kref		mcref;
 	u8			join_state;
 };
 
@@ -363,12 +363,6 @@ struct cma_work {
 	struct rdma_cm_event	event;
 };
 
-struct iboe_mcast_work {
-	struct work_struct	 work;
-	struct rdma_id_private	*id;
-	struct cma_multicast	*mc;
-};
-
 union cma_ip_addr {
 	struct in6_addr ip6;
 	struct {
@@ -492,14 +486,6 @@ static int cma_attach_to_dev(struct rdma_id_private *id_priv,
 	return 0;
 }
 
-static inline void release_mc(struct kref *kref)
-{
-	struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref);
-
-	kfree(mc->multicast.ib);
-	kfree(mc);
-}
-
 static void cma_release_dev(struct rdma_id_private *id_priv)
 {
 	mutex_lock(&lock);
@@ -1798,14 +1784,10 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 static void destroy_mc(struct rdma_id_private *id_priv,
 		       struct cma_multicast *mc)
 {
-	if (rdma_cap_ib_mcast(id_priv->id.device, id_priv->id.port_num)) {
-		ib_sa_free_multicast(mc->multicast.ib);
-		kfree(mc);
-		return;
-	}
+	if (rdma_cap_ib_mcast(id_priv->id.device, id_priv->id.port_num))
+		ib_sa_free_multicast(mc->sa_mc);
 
-	if (rdma_protocol_roce(id_priv->id.device,
-				      id_priv->id.port_num)) {
+	if (rdma_protocol_roce(id_priv->id.device, id_priv->id.port_num)) {
 		struct rdma_dev_addr *dev_addr =
 			&id_priv->id.route.addr.dev_addr;
 		struct net_device *ndev = NULL;
@@ -1814,11 +1796,15 @@ static void destroy_mc(struct rdma_id_private *id_priv,
 			ndev = dev_get_by_index(dev_addr->net,
 						dev_addr->bound_dev_if);
 		if (ndev) {
-			cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid, false);
+			union ib_gid mgid;
+
+			cma_set_mgid(id_priv, (struct sockaddr *)&mc->addr,
+				     &mgid);
+			cma_igmp_send(ndev, &mgid, false);
 			dev_put(ndev);
 		}
-		kref_put(&mc->mcref, release_mc);
 	}
+	kfree(mc);
 }
 
 static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
@@ -2692,6 +2678,8 @@ static void cma_work_handler(struct work_struct *_work)
 	mutex_unlock(&id_priv->handler_mutex);
 	cma_id_put(id_priv);
 out_free:
+	if (work->event.event == RDMA_CM_EVENT_MULTICAST_JOIN)
+		rdma_destroy_ah_attr(&work->event.param.ud.ah_attr);
 	kfree(work);
 }
 
@@ -4353,53 +4341,66 @@ int rdma_disconnect(struct rdma_cm_id *id)
 }
 EXPORT_SYMBOL(rdma_disconnect);
 
+static void cma_make_mc_event(int status, struct rdma_id_private *id_priv,
+			      struct ib_sa_multicast *multicast,
+			      struct rdma_cm_event *event,
+			      struct cma_multicast *mc)
+{
+	struct rdma_dev_addr *dev_addr;
+	enum ib_gid_type gid_type;
+	struct net_device *ndev;
+
+	if (!status)
+		status = cma_set_qkey(id_priv, be32_to_cpu(multicast->rec.qkey));
+	else
+		pr_debug_ratelimited("RDMA CM: MULTICAST_ERROR: failed to join multicast. status %d\n",
+				     status);
+
+	event->status = status;
+	event->param.ud.private_data = mc->context;
+	if (status) {
+		event->event = RDMA_CM_EVENT_MULTICAST_ERROR;
+		return;
+	}
+
+	dev_addr = &id_priv->id.route.addr.dev_addr;
+	ndev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
+	gid_type =
+		id_priv->cma_dev
+			->default_gid_type[id_priv->id.port_num -
+					   rdma_start_port(
+						   id_priv->cma_dev->device)];
+
+	event->event = RDMA_CM_EVENT_MULTICAST_JOIN;
+	if (ib_init_ah_from_mcmember(id_priv->id.device, id_priv->id.port_num,
+				     &multicast->rec, ndev, gid_type,
+				     &event->param.ud.ah_attr)) {
+		event->event = RDMA_CM_EVENT_MULTICAST_ERROR;
+		goto out;
+	}
+
+	event->param.ud.qp_num = 0xFFFFFF;
+	event->param.ud.qkey = be32_to_cpu(multicast->rec.qkey);
+
+out:
+	if (ndev)
+		dev_put(ndev);
+}
+
 static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 {
-	struct rdma_id_private *id_priv;
 	struct cma_multicast *mc = multicast->context;
+	struct rdma_id_private *id_priv = mc->id_priv;
 	struct rdma_cm_event event = {};
 	int ret = 0;
 
-	id_priv = mc->id_priv;
 	mutex_lock(&id_priv->handler_mutex);
 	if (READ_ONCE(id_priv->state) == RDMA_CM_DEVICE_REMOVAL ||
 	    READ_ONCE(id_priv->state) == RDMA_CM_DESTROYING)
 		goto out;
 
-	if (!status)
-		status = cma_set_qkey(id_priv, be32_to_cpu(multicast->rec.qkey));
-	else
-		pr_debug_ratelimited("RDMA CM: MULTICAST_ERROR: failed to join multicast. status %d\n",
-				     status);
-	event.status = status;
-	event.param.ud.private_data = mc->context;
-	if (!status) {
-		struct rdma_dev_addr *dev_addr =
-			&id_priv->id.route.addr.dev_addr;
-		struct net_device *ndev =
-			dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
-		enum ib_gid_type gid_type =
-			id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
-			rdma_start_port(id_priv->cma_dev->device)];
-
-		event.event = RDMA_CM_EVENT_MULTICAST_JOIN;
-		ret = ib_init_ah_from_mcmember(id_priv->id.device,
-					       id_priv->id.port_num,
-					       &multicast->rec,
-					       ndev, gid_type,
-					       &event.param.ud.ah_attr);
-		if (ret)
-			event.event = RDMA_CM_EVENT_MULTICAST_ERROR;
-
-		event.param.ud.qp_num = 0xFFFFFF;
-		event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey);
-		if (ndev)
-			dev_put(ndev);
-	} else
-		event.event = RDMA_CM_EVENT_MULTICAST_ERROR;
-
+	cma_make_mc_event(status, id_priv, multicast, &event, mc);
 	ret = cma_cm_event_handler(id_priv, &event);
-
 	rdma_destroy_ah_attr(&event.param.ud.ah_attr);
 	if (ret) {
 		destroy_id_handler_unlock(id_priv);
@@ -4489,23 +4490,10 @@ static int cma_join_ib_multicast(struct rdma_id_private *id_priv,
 			     IB_SA_MCMEMBER_REC_MTU |
 			     IB_SA_MCMEMBER_REC_HOP_LIMIT;
 
-	mc->multicast.ib = ib_sa_join_multicast(&sa_client, id_priv->id.device,
-						id_priv->id.port_num, &rec,
-						comp_mask, GFP_KERNEL,
-						cma_ib_mc_handler, mc);
-	return PTR_ERR_OR_ZERO(mc->multicast.ib);
-}
-
-static void iboe_mcast_work_handler(struct work_struct *work)
-{
-	struct iboe_mcast_work *mw = container_of(work, struct iboe_mcast_work, work);
-	struct cma_multicast *mc = mw->mc;
-	struct ib_sa_multicast *m = mc->multicast.ib;
-
-	mc->multicast.ib->context = mc;
-	cma_ib_mc_handler(0, m);
-	kref_put(&mc->mcref, release_mc);
-	kfree(mw);
+	mc->sa_mc = ib_sa_join_multicast(&sa_client, id_priv->id.device,
+					 id_priv->id.port_num, &rec, comp_mask,
+					 GFP_KERNEL, cma_ib_mc_handler, mc);
+	return PTR_ERR_OR_ZERO(mc->sa_mc);
 }
 
 static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid,
@@ -4540,52 +4528,47 @@ static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid,
 static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 				   struct cma_multicast *mc)
 {
-	struct iboe_mcast_work *work;
+	struct cma_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
 	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;
+	struct ib_sa_multicast ib;
 	enum ib_gid_type gid_type;
 	bool send_only;
 
 	send_only = mc->join_state == BIT(SENDONLY_FULLMEMBER_JOIN);
 
-	if (cma_zero_addr((struct sockaddr *)&mc->addr))
+	if (cma_zero_addr(addr))
 		return -EINVAL;
 
 	work = kzalloc(sizeof *work, GFP_KERNEL);
 	if (!work)
 		return -ENOMEM;
 
-	mc->multicast.ib = kzalloc(sizeof(struct ib_sa_multicast), GFP_KERNEL);
-	if (!mc->multicast.ib) {
-		err = -ENOMEM;
-		goto out1;
-	}
-
 	gid_type = id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
 		   rdma_start_port(id_priv->cma_dev->device)];
-	cma_iboe_set_mgid(addr, &mc->multicast.ib->rec.mgid, gid_type);
+	cma_iboe_set_mgid(addr, &ib.rec.mgid, gid_type);
 
-	mc->multicast.ib->rec.pkey = cpu_to_be16(0xffff);
+	ib.rec.pkey = cpu_to_be16(0xffff);
 	if (id_priv->id.ps == RDMA_PS_UDP)
-		mc->multicast.ib->rec.qkey = cpu_to_be32(RDMA_UDP_QKEY);
+		ib.rec.qkey = cpu_to_be32(RDMA_UDP_QKEY);
 
 	if (dev_addr->bound_dev_if)
 		ndev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
 	if (!ndev) {
 		err = -ENODEV;
-		goto out2;
+		goto err_free;
 	}
-	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
-	mc->multicast.ib->rec.hop_limit = 1;
-	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+	ib.rec.rate = iboe_get_rate(ndev);
+	ib.rec.hop_limit = 1;
+	ib.rec.mtu = iboe_get_mtu(ndev->mtu);
 
 	if (addr->sa_family == AF_INET) {
 		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
-			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+			ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
 			if (!send_only) {
-				err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+				err = cma_igmp_send(ndev, &ib.rec.mgid,
 						    true);
 			}
 		}
@@ -4594,24 +4577,22 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 			err = -ENOTSUPP;
 	}
 	dev_put(ndev);
-	if (err || !mc->multicast.ib->rec.mtu) {
+	if (err || !ib.rec.mtu) {
 		if (!err)
 			err = -EINVAL;
-		goto out2;
+		goto err_free;
 	}
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
-		    &mc->multicast.ib->rec.port_gid);
+		    &ib.rec.port_gid);
 	work->id = id_priv;
-	work->mc = mc;
-	INIT_WORK(&work->work, iboe_mcast_work_handler);
-	kref_get(&mc->mcref);
+	INIT_WORK(&work->work, cma_work_handler);
+	cma_make_mc_event(0, id_priv, &ib, &work->event, mc);
+	/* Balances with cma_id_put() in cma_work_handler */
+	cma_id_get(id_priv);
 	queue_work(cma_wq, &work->work);
-
 	return 0;
 
-out2:
-	kfree(mc->multicast.ib);
-out1:
+err_free:
 	kfree(work);
 	return err;
 }
@@ -4633,7 +4614,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 			    READ_ONCE(id_priv->state) != RDMA_CM_ADDR_RESOLVED))
 		return -EINVAL;
 
-	mc = kmalloc(sizeof *mc, GFP_KERNEL);
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
 	if (!mc)
 		return -ENOMEM;
 
@@ -4643,7 +4624,6 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	mc->join_state = join_state;
 
 	if (rdma_protocol_roce(id->device, id->port_num)) {
-		kref_init(&mc->mcref);
 		ret = cma_iboe_join_multicast(id_priv, mc);
 		if (ret)
 			goto out_err;
-- 
2.26.2



* Re: [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine
  2020-09-02  8:11 [PATCH rdma-next 0/8] Cleanup and fix the CMA state machine Leon Romanovsky
                   ` (7 preceding siblings ...)
  2020-09-02  8:11 ` [PATCH rdma-next 8/8] RDMA/cma: Fix use after free race in roce multicast join Leon Romanovsky
@ 2020-09-17 12:33 ` Jason Gunthorpe
  8 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-09-17 12:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Eli Cohen, linux-kernel,
	linux-rdma, Roland Dreier

On Wed, Sep 02, 2020 at 11:11:14AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> From Jason:
> 
> The RDMA CMA continues to attract syzkaller bugs due to the somewhat loose
> operation of its FSM. Audit and scrub the whole thing to follow modern
> expectations.
> 
> Overall the design elements are broadly:
> 
> - The ULP entry points MUST NOT run in parallel with each other. The ULP
>   is solely responsible for preventing this.
> 
> - If the ULP returns !0 from its event callback it MUST guarantee that no
>   other ULP threads are touching the cm_id or calling into any RDMA CM
>   entry point.
> 
> - ULP entry points can sometimes run concurrently with handler callbacks,
>   although it is tricky because there are many entry points that exist
>   in the flow before the handler is registered.
> 
> - Some ULP entry points are called from the ULP event handler callback,
>   under the handler_mutex. (however ucma never does this)
> 
> - The state uses a weird double locking scheme; in most cases one should
>   hold the handler_mutex. (It is somewhat unclear what exactly the spinlock
>   is for.)
> 
> - Reading the state without holding the spinlock should use READ_ONCE,
>   even if the handler_mutex is held.
> 
> - There are certain states which are 'stable' under the handler_mutex;
>   exit from such a state requires also holding the handler_mutex. This
>   explains why testing the state under only the handler_mutex makes sense.
> 
> Thanks
> 
> Jason Gunthorpe (8):
>   RDMA/cma: Fix locking for the RDMA_CM_CONNECT state
>   RDMA/cma: Make the locking for automatic state transition more clear
>   RDMA/cma: Fix locking for the RDMA_CM_LISTEN state
>   RDMA/cma: Remove cma_comp()
>   RDMA/cma: Combine cma_ndev_work with cma_work
>   RDMA/cma: Remove dead code for kernel rdmacm multicast
>   RDMA/cma: Consolidate the destruction of a cma_multicast in one place
>   RDMA/cma: Fix use after free race in roce multicast join

Applied to for-next

Jason

