linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I)
@ 2020-11-04 14:40 Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 1/5] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v4:
 * Rebased on latest for-upstream, all that time the patches were in
 our regression and didn't introduce any issues.
 * Took first five patches that hadn't any comments
v3: https://lore.kernel.org/lkml/20200926101938.2964394-1-leon@kernel.org
 * Rebased on already accepted patches.
 * Added mlx4 special QPs to the list of not-tracked QPs (dropped previous mlx4 special QP patch).
 * Separated to two patches change in return value of cma_listen_* routines.
 * Changed commit messages and added Fixes as Jason requested.
v2: https://lore.kernel.org/linux-rdma/20200907122156.478360-1-leon@kernel.org/
 * Added new patch to fix mlx4 failure on SR-IOV, it didn't have port set.
 * Changed "RDMA/cma: Delete from restrack DB after successful destroy" patch.
v1: https://lore.kernel.org/lkml/20200830101436.108487-1-leon@kernel.org
 * Fixed rebase error, deleted second assignment of qp_type.
 * Rebased code on latests rdma-next, the changes in cma.c caused to change
   in patch "RDMA/cma: Delete from restrack DB after successful destroy".
 * Dropped patch of port assignment, it is already done as part of this
   series.
 * I didn't add @calller description, regular users should not use _named() funciton.
v0: https://lore.kernel.org/lkml/20200824104415.1090901-1-leon@kernel.org

----------------------------------------------------------------------------------

Simple resend of already posted series.
https://lore.kernel.org/lkml/20200926101938.2964394-1-leon@kernel.org

Thanks

Leon Romanovsky (5):
  RDMA/core: Allow drivers to disable restrack DB
  RDMA/counter: Combine allocation and bind logic
  RDMA/restrack: Store all special QPs in restrack DB
  RDMA/cma: Add missing error handling of listen_id
  RDMA/cma: Be strict with attaching to CMA device

 drivers/infiniband/core/cma.c       | 217 +++++++++++++++++-----------
 drivers/infiniband/core/core_priv.h |   2 +
 drivers/infiniband/core/counters.c  | 132 ++++++++---------
 drivers/infiniband/core/restrack.c  |  23 ++-
 drivers/infiniband/hw/mlx4/qp.c     |   5 +
 drivers/infiniband/hw/mlx5/qp.c     |   2 +-
 include/rdma/restrack.h             |  24 +++
 7 files changed, 239 insertions(+), 166 deletions(-)

--
2.28.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH rdma-next v4 1/5] RDMA/core: Allow drivers to disable restrack DB
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
@ 2020-11-04 14:40 ` Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 2/5] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes
to the situation where DC QP (mlx5) is always added to the same xarray
index zero.

This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.

Fixes: 52e0a118a203 ("RDMA/restrack: Track driver QP types in resource tracker")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c |  2 +-
 drivers/infiniband/core/restrack.c | 12 ++++++++++--
 drivers/infiniband/hw/mlx4/qp.c    |  5 +++++
 drivers/infiniband/hw/mlx5/qp.c    |  2 +-
 include/rdma/restrack.h            | 24 ++++++++++++++++++++++++
 5 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index e4ff0d3328b6..fa1a4a318fd7 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -275,7 +275,7 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
 	struct rdma_counter *counter;
 	int ret;

-	if (!qp->res.valid || rdma_is_kernel_res(&qp->res))
+	if (!rdma_restrack_is_tracked(&qp->res) || rdma_is_kernel_res(&qp->res))
 		return 0;

 	if (!rdma_is_port_valid(dev, port))
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 4aeeaaed0f17..e26a3213f500 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -221,11 +221,14 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 {
 	struct ib_device *dev = res_to_dev(res);
 	struct rdma_restrack_root *rt;
-	int ret;
+	int ret = 0;

 	if (!dev)
 		return;

+	if (res->no_track)
+		goto out;
+
 	rt = &dev->res[res->type];

 	if (res->type == RDMA_RESTRACK_QP) {
@@ -246,6 +249,7 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 				      &rt->next_id, GFP_KERNEL);
 	}

+out:
 	if (!ret)
 		res->valid = true;
 }
@@ -318,6 +322,9 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 		return;
 	}

+	if (res->no_track)
+		goto out;
+
 	dev = res_to_dev(res);
 	if (WARN_ON(!dev))
 		return;
@@ -328,8 +335,9 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
 		return;
 	WARN_ON(old != res);
-	res->valid = false;

+out:
+	res->valid = false;
 	rdma_restrack_put(res);
 	wait_for_completion(&res->comp);
 }
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 47b9ed5599b3..44534a12819a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1561,6 +1561,11 @@ static int _mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp,
 		if (err)
 			return err;

+		if (init_attr->create_flags &
+		    (MLX4_IB_SRIOV_SQP | MLX4_IB_SRIOV_TUNNEL_QP))
+			/* Internal QP created with ib_create_qp */
+			rdma_restrack_no_track(&qp->ibqp.res);
+
 		qp->port	= init_attr->port_num;
 		qp->ibqp.qp_num = init_attr->qp_type == IB_QPT_SMI ? 0 :
 			init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI ? sqpn : 1;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index cfabcd4d46b6..9a04a5949d50 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2435,7 +2435,7 @@ static int create_dct(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	}

 	qp->state = IB_QPS_RESET;
-
+	rdma_restrack_no_track(&qp->ibqp.res);
 	return 0;
 }

diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index d3a1cc5be7bc..05e18839eaff 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -68,6 +68,14 @@ struct rdma_restrack_entry {
 	 * As an example for that, see mlx5 QPs with type MLX5_IB_QPT_HW_GSI
 	 */
 	bool			valid;
+	/**
+	 * @no_track: don't add this entry to restrack DB
+	 *
+	 * This field is used to mark an entry that doesn't need to be added to
+	 * internal restrack DB and presented later to the users at the nldev
+	 * query stage.
+	 */
+	u8			no_track : 1;
 	/*
 	 * @kref: Protect destroy of the resource
 	 */
@@ -145,4 +153,20 @@ int rdma_nl_stat_hwcounter_entry(struct sk_buff *msg, const char *name,
 struct rdma_restrack_entry *rdma_restrack_get_byid(struct ib_device *dev,
 						   enum rdma_restrack_type type,
 						   u32 id);
+
+/**
+ * rdma_restrack_no_track() - don't add resource to the DB
+ * @res: resource entry
+ *
+ * Every user of thie API should be cross examined.
+ * Probaby you don't need to use this function.
+ */
+static inline void rdma_restrack_no_track(struct rdma_restrack_entry *res)
+{
+	res->no_track = true;
+}
+static inline bool rdma_restrack_is_tracked(struct rdma_restrack_entry *res)
+{
+	return !res->no_track;
+}
 #endif /* _RDMA_RESTRACK_H_ */
--
2.28.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH rdma-next v4 2/5] RDMA/counter: Combine allocation and bind logic
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 1/5] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
@ 2020-11-04 14:40 ` Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 3/5] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@nvidia.com>

RDMA counters are allocated and bounded to QP immediately after that.
Only after this two step process they are really usable. By combining
the logic, we are ensuring that once counter is returned to the caller,
it will have everything set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/counters.c | 130 +++++++++++++----------------
 1 file changed, 58 insertions(+), 72 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index fa1a4a318fd7..e208cc6d3cb0 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -64,8 +64,40 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 	return ret;
 }
 
-static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
-					       enum rdma_nl_counter_mode mode)
+static void auto_mode_init_counter(struct rdma_counter *counter,
+				   const struct ib_qp *qp,
+				   enum rdma_nl_counter_mask new_mask)
+{
+	struct auto_mode_param *param = &counter->mode.param;
+
+	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
+	counter->mode.mask = new_mask;
+
+	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
+		param->qp_type = qp->qp_type;
+}
+
+static int __rdma_counter_bind_qp(struct rdma_counter *counter,
+				  struct ib_qp *qp)
+{
+	int ret;
+
+	if (qp->counter)
+		return -EINVAL;
+
+	if (!qp->device->ops.counter_bind_qp)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&counter->lock);
+	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
+static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u8 port,
+					   struct ib_qp *qp,
+					   enum rdma_nl_counter_mode mode)
 {
 	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
@@ -88,11 +120,19 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 
 	port_counter = &dev->port_data[port].port_counter;
 	mutex_lock(&port_counter->lock);
-	if (mode == RDMA_COUNTER_MODE_MANUAL) {
+	switch (mode) {
+	case RDMA_COUNTER_MODE_MANUAL:
 		ret = __counter_set_mode(&port_counter->mode,
 					 RDMA_COUNTER_MODE_MANUAL, 0);
 		if (ret)
 			goto err_mode;
+		break;
+	case RDMA_COUNTER_MODE_AUTO:
+		auto_mode_init_counter(counter, qp, port_counter->mode.mask);
+		break;
+	default:
+		ret = -EOPNOTSUPP;
+		goto err_mode;
 	}
 
 	port_counter->num_counters++;
@@ -102,6 +142,12 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);
 
+	ret = __rdma_counter_bind_qp(counter, qp);
+	if (ret)
+		goto err_mode;
+
+	rdma_restrack_parent_name(&counter->res, &qp->res);
+	rdma_restrack_add(&counter->res);
 	return counter;
 
 err_mode:
@@ -132,19 +178,6 @@ static void rdma_counter_free(struct rdma_counter *counter)
 	kfree(counter);
 }
 
-static void auto_mode_init_counter(struct rdma_counter *counter,
-				   const struct ib_qp *qp,
-				   enum rdma_nl_counter_mask new_mask)
-{
-	struct auto_mode_param *param = &counter->mode.param;
-
-	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
-	counter->mode.mask = new_mask;
-
-	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
-		param->qp_type = qp->qp_type;
-}
-
 static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 			    enum rdma_nl_counter_mask auto_mask)
 {
@@ -161,24 +194,6 @@ static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 	return match;
 }
 
-static int __rdma_counter_bind_qp(struct rdma_counter *counter,
-				  struct ib_qp *qp)
-{
-	int ret;
-
-	if (qp->counter)
-		return -EINVAL;
-
-	if (!qp->device->ops.counter_bind_qp)
-		return -EOPNOTSUPP;
-
-	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_bind_qp(counter, qp);
-	mutex_unlock(&counter->lock);
-
-	return ret;
-}
-
 static int __rdma_counter_unbind_qp(struct ib_qp *qp)
 {
 	struct rdma_counter *counter = qp->counter;
@@ -247,13 +262,6 @@ static struct rdma_counter *rdma_get_counter_auto_mode(struct ib_qp *qp,
 	return counter;
 }
 
-static void rdma_counter_res_add(struct rdma_counter *counter,
-				 struct ib_qp *qp)
-{
-	rdma_restrack_parent_name(&counter->res, &qp->res);
-	rdma_restrack_add(&counter->res);
-}
-
 static void counter_release(struct kref *kref)
 {
 	struct rdma_counter *counter;
@@ -293,19 +301,9 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
 			return ret;
 		}
 	} else {
-		counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_AUTO);
+		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO);
 		if (!counter)
 			return -ENOMEM;
-
-		auto_mode_init_counter(counter, qp, port_counter->mode.mask);
-
-		ret = __rdma_counter_bind_qp(counter, qp);
-		if (ret) {
-			rdma_counter_free(counter);
-			return ret;
-		}
-
-		rdma_counter_res_add(counter, qp);
 	}
 
 	return 0;
@@ -419,15 +417,6 @@ static struct ib_qp *rdma_counter_get_qp(struct ib_device *dev, u32 qp_num)
 	return NULL;
 }
 
-static int rdma_counter_bind_qp_manual(struct rdma_counter *counter,
-				       struct ib_qp *qp)
-{
-	if ((counter->device != qp->device) || (counter->port != qp->port))
-		return -EINVAL;
-
-	return __rdma_counter_bind_qp(counter, qp);
-}
-
 static struct rdma_counter *rdma_get_counter_by_id(struct ib_device *dev,
 						   u32 counter_id)
 {
@@ -475,7 +464,12 @@ int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
 		goto err_task;
 	}
 
-	ret = rdma_counter_bind_qp_manual(counter, qp);
+	if ((counter->device != qp->device) || (counter->port != qp->port)) {
+		ret = -EINVAL;
+		goto err_task;
+	}
+
+	ret = __rdma_counter_bind_qp(counter, qp);
 	if (ret)
 		goto err_task;
 
@@ -520,26 +514,18 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
 		goto err;
 	}
 
-	counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_MANUAL);
+	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL);
 	if (!counter) {
 		ret = -ENOMEM;
 		goto err;
 	}
 
-	ret = rdma_counter_bind_qp_manual(counter, qp);
-	if (ret)
-		goto err_bind;
-
 	if (counter_id)
 		*counter_id = counter->id;
 
-	rdma_counter_res_add(counter, qp);
-
 	rdma_restrack_put(&qp->res);
-	return ret;
+	return 0;
 
-err_bind:
-	rdma_counter_free(counter);
 err:
 	rdma_restrack_put(&qp->res);
 	return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH rdma-next v4 3/5] RDMA/restrack: Store all special QPs in restrack DB
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 1/5] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 2/5] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
@ 2020-11-04 14:40 ` Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 4/5] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

Special QPs (SMI and GSI) have different rules in regards of their QP
numbers. While all other QP numbers are unique per-device, the QP0 and QP1
are created per-port as requested by IBTA.

In multiple port devices, the number of SMI and GSI QPs with be equal
to the number ports.

[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.4.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@vm ~]$ rdma link
0/1: ibp0s9/1: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP
0/2: ibp0s9/2: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state UNKNOWN physical_state UNKNOWN

Before:
[leonro@mtl-leonro-l-vm ~]$ rdma res show qp type SMI,GSI
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]

After:
[leonro@vm ~]$ rdma res show qp type SMI,GSI
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/2 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/2 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/core_priv.h |  2 ++
 drivers/infiniband/core/restrack.c  | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index e84b0fedaacb..7c4752c47f80 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -347,6 +347,8 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	qp->srq = attr->srq;
 	qp->rwq_ind_tbl = attr->rwq_ind_tbl;
 	qp->event_handler = attr->event_handler;
+	qp->qp_type = attr->qp_type;
+	qp->port = attr->port_num;

 	atomic_set(&qp->usecnt, 0);
 	spin_lock_init(&qp->mr_lock);
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index e26a3213f500..e0a41c867002 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -235,8 +235,15 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 		/* Special case to ensure that LQPN points to right QP */
 		struct ib_qp *qp = container_of(res, struct ib_qp, res);

-		ret = xa_insert(&rt->xa, qp->qp_num, res, GFP_KERNEL);
-		res->id = ret ? 0 : qp->qp_num;
+		WARN_ONCE(qp->qp_num >> 24 || qp->port >> 8,
+			  "QP number 0x%0X and port 0x%0X", qp->qp_num,
+			  qp->port);
+		res->id = qp->qp_num;
+		if (qp->qp_type == IB_QPT_SMI || qp->qp_type == IB_QPT_GSI)
+			res->id |= qp->port << 24;
+		ret = xa_insert(&rt->xa, res->id, res, GFP_KERNEL);
+		if (ret)
+			res->id = 0;
 	} else if (res->type == RDMA_RESTRACK_COUNTER) {
 		/* Special case to ensure that cntn points to right counter */
 		struct rdma_counter *counter;
--
2.28.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH rdma-next v4 4/5] RDMA/cma: Add missing error handling of listen_id
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-11-04 14:40 ` [PATCH rdma-next v4 3/5] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
@ 2020-11-04 14:40 ` Leon Romanovsky
  2020-11-04 14:40 ` [PATCH rdma-next v4 5/5] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
  2020-11-12 18:59 ` [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Jason Gunthorpe
  5 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@nvidia.com>

Don't silently continue if rdma_listen() fails but destroy previously
created CM_ID and return an error to the caller.

Fixes: d02d1f5359e7 ("RDMA/cma: Fix deadlock destroying listen requests")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 175 ++++++++++++++++++++--------------
 1 file changed, 101 insertions(+), 74 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7c2ab1f2fbea..89cc4b0a48ec 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2495,8 +2495,8 @@ static int cma_listen_handler(struct rdma_cm_id *id,
 	return id_priv->id.event_handler(id, event);
 }

-static void cma_listen_on_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static int cma_listen_on_dev(struct rdma_id_private *id_priv,
+			     struct cma_device *cma_dev)
 {
 	struct rdma_id_private *dev_id_priv;
 	struct net *net = id_priv->id.route.addr.dev_addr.net;
@@ -2505,13 +2505,13 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	lockdep_assert_held(&lock);

 	if (cma_family(id_priv) == AF_IB && !rdma_cap_ib_cm(cma_dev->device, 1))
-		return;
+		return 0;

 	dev_id_priv =
 		__rdma_create_id(net, cma_listen_handler, id_priv,
 				 id_priv->id.ps, id_priv->id.qp_type, id_priv);
 	if (IS_ERR(dev_id_priv))
-		return;
+		return PTR_ERR(dev_id_priv);

 	dev_id_priv->state = RDMA_CM_ADDR_BOUND;
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
@@ -2527,19 +2527,34 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,

 	ret = rdma_listen(&dev_id_priv->id, id_priv->backlog);
 	if (ret)
-		dev_warn(&cma_dev->device->dev,
-			 "RDMA CMA: cma_listen_on_dev, error %d\n", ret);
+		goto err_listen;
+	return 0;
+err_listen:
+	list_del(&id_priv->listen_list);
+	dev_warn(&cma_dev->device->dev, "RDMA CMA: %s, error %d\n", __func__, ret);
+	rdma_destroy_id(&dev_id_priv->id);
+	return ret;
 }

-static void cma_listen_on_all(struct rdma_id_private *id_priv)
+static int cma_listen_on_all(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
+	int ret;

 	mutex_lock(&lock);
 	list_add_tail(&id_priv->list, &listen_any_list);
-	list_for_each_entry(cma_dev, &dev_list, list)
-		cma_listen_on_dev(id_priv, cma_dev);
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		ret = cma_listen_on_dev(id_priv, cma_dev);
+		if (ret)
+			goto err_listen;
+	}
+	mutex_unlock(&lock);
+	return 0;
+
+err_listen:
+	list_del(&id_priv->list);
 	mutex_unlock(&lock);
+	return ret;
 }

 void rdma_set_service_type(struct rdma_cm_id *id, int tos)
@@ -3692,8 +3707,11 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 			ret = -ENOSYS;
 			goto err;
 		}
-	} else
-		cma_listen_on_all(id_priv);
+	} else {
+		ret = cma_listen_on_all(id_priv);
+		if (ret)
+			goto err;
+	}

 	return 0;
 err:
@@ -4745,69 +4763,6 @@ static struct notifier_block cma_nb = {
 	.notifier_call = cma_netdev_callback
 };

-static int cma_add_one(struct ib_device *device)
-{
-	struct cma_device *cma_dev;
-	struct rdma_id_private *id_priv;
-	unsigned int i;
-	unsigned long supported_gids = 0;
-	int ret;
-
-	cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
-	if (!cma_dev)
-		return -ENOMEM;
-
-	cma_dev->device = device;
-	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
-					    sizeof(*cma_dev->default_gid_type),
-					    GFP_KERNEL);
-	if (!cma_dev->default_gid_type) {
-		ret = -ENOMEM;
-		goto free_cma_dev;
-	}
-
-	cma_dev->default_roce_tos = kcalloc(device->phys_port_cnt,
-					    sizeof(*cma_dev->default_roce_tos),
-					    GFP_KERNEL);
-	if (!cma_dev->default_roce_tos) {
-		ret = -ENOMEM;
-		goto free_gid_type;
-	}
-
-	rdma_for_each_port (device, i) {
-		supported_gids = roce_gid_type_mask_support(device, i);
-		WARN_ON(!supported_gids);
-		if (supported_gids & (1 << CMA_PREFERRED_ROCE_GID_TYPE))
-			cma_dev->default_gid_type[i - rdma_start_port(device)] =
-				CMA_PREFERRED_ROCE_GID_TYPE;
-		else
-			cma_dev->default_gid_type[i - rdma_start_port(device)] =
-				find_first_bit(&supported_gids, BITS_PER_LONG);
-		cma_dev->default_roce_tos[i - rdma_start_port(device)] = 0;
-	}
-
-	init_completion(&cma_dev->comp);
-	refcount_set(&cma_dev->refcount, 1);
-	INIT_LIST_HEAD(&cma_dev->id_list);
-	ib_set_client_data(device, &cma_client, cma_dev);
-
-	mutex_lock(&lock);
-	list_add_tail(&cma_dev->list, &dev_list);
-	list_for_each_entry(id_priv, &listen_any_list, list)
-		cma_listen_on_dev(id_priv, cma_dev);
-	mutex_unlock(&lock);
-
-	trace_cm_add_one(device);
-	return 0;
-
-free_gid_type:
-	kfree(cma_dev->default_gid_type);
-
-free_cma_dev:
-	kfree(cma_dev);
-	return ret;
-}
-
 static void cma_send_device_removal_put(struct rdma_id_private *id_priv)
 {
 	struct rdma_cm_event event = { .event = RDMA_CM_EVENT_DEVICE_REMOVAL };
@@ -4870,6 +4825,78 @@ static void cma_process_remove(struct cma_device *cma_dev)
 	wait_for_completion(&cma_dev->comp);
 }

+static int cma_add_one(struct ib_device *device)
+{
+	struct cma_device *cma_dev;
+	struct rdma_id_private *id_priv;
+	unsigned int i;
+	unsigned long supported_gids = 0;
+	int ret;
+
+	cma_dev = kmalloc(sizeof(*cma_dev), GFP_KERNEL);
+	if (!cma_dev)
+		return -ENOMEM;
+
+	cma_dev->device = device;
+	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_gid_type),
+					    GFP_KERNEL);
+	if (!cma_dev->default_gid_type) {
+		ret = -ENOMEM;
+		goto free_cma_dev;
+	}
+
+	cma_dev->default_roce_tos = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_roce_tos),
+					    GFP_KERNEL);
+	if (!cma_dev->default_roce_tos) {
+		ret = -ENOMEM;
+		goto free_gid_type;
+	}
+
+	rdma_for_each_port (device, i) {
+		supported_gids = roce_gid_type_mask_support(device, i);
+		WARN_ON(!supported_gids);
+		if (supported_gids & (1 << CMA_PREFERRED_ROCE_GID_TYPE))
+			cma_dev->default_gid_type[i - rdma_start_port(device)] =
+				CMA_PREFERRED_ROCE_GID_TYPE;
+		else
+			cma_dev->default_gid_type[i - rdma_start_port(device)] =
+				find_first_bit(&supported_gids, BITS_PER_LONG);
+		cma_dev->default_roce_tos[i - rdma_start_port(device)] = 0;
+	}
+
+	init_completion(&cma_dev->comp);
+	refcount_set(&cma_dev->refcount, 1);
+	INIT_LIST_HEAD(&cma_dev->id_list);
+	ib_set_client_data(device, &cma_client, cma_dev);
+
+	mutex_lock(&lock);
+	list_add_tail(&cma_dev->list, &dev_list);
+	list_for_each_entry(id_priv, &listen_any_list, list) {
+		ret = cma_listen_on_dev(id_priv, cma_dev);
+		if (ret)
+			goto free_listen;
+	}
+	mutex_unlock(&lock);
+
+	trace_cm_add_one(device);
+	return 0;
+
+free_listen:
+	list_del(&cma_dev->list);
+	mutex_unlock(&lock);
+
+	cma_process_remove(cma_dev);
+	kfree(cma_dev->default_roce_tos);
+free_gid_type:
+	kfree(cma_dev->default_gid_type);
+
+free_cma_dev:
+	kfree(cma_dev);
+	return ret;
+}
+
 static void cma_remove_one(struct ib_device *device, void *client_data)
 {
 	struct cma_device *cma_dev = client_data;
--
2.28.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH rdma-next v4 5/5] RDMA/cma: Be strict with attaching to CMA device
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-11-04 14:40 ` [PATCH rdma-next v4 4/5] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
@ 2020-11-04 14:40 ` Leon Romanovsky
  2020-11-12 18:59 ` [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Jason Gunthorpe
  5 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-04 14:40 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

The RDMA-CM code wasn't consistent in flows that attached to cma_dev,
this caused to situations where a failure during attach to listen on such
device leaves RDMA-CM in a non-consistent state.

Change the _cma_attach_to_dev to return an error in case of failure,
so listen/attach flows will deal correctly with the failures.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c | 42 +++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 89cc4b0a48ec..290624150c20 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -448,8 +448,8 @@ static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
 	return (in_dev) ? 0 : -ENODEV;
 }

-static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
-			       struct cma_device *cma_dev)
+static int _cma_attach_to_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
 {
 	cma_dev_get(cma_dev);
 	id_priv->cma_dev = cma_dev;
@@ -460,15 +460,22 @@ static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
 	rdma_restrack_add(&id_priv->res);

 	trace_cm_id_attach(id_priv, cma_dev->device);
+	return 0;
 }

-static void cma_attach_to_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static int cma_attach_to_dev(struct rdma_id_private *id_priv,
+			     struct cma_device *cma_dev)
 {
-	_cma_attach_to_dev(id_priv, cma_dev);
+	int ret;
+
+	ret = _cma_attach_to_dev(id_priv, cma_dev);
+	if (ret)
+		return ret;
+
 	id_priv->gid_type =
 		cma_dev->default_gid_type[id_priv->id.port_num -
 					  rdma_start_port(cma_dev->device)];
+	return 0;
 }

 static void cma_release_dev(struct rdma_id_private *id_priv)
@@ -633,8 +640,7 @@ static int cma_acquire_dev_by_src_ip(struct rdma_id_private *id_priv)
 			if (!IS_ERR(sgid_attr)) {
 				id_priv->id.port_num = port;
 				cma_bind_sgid_attr(id_priv, sgid_attr);
-				cma_attach_to_dev(id_priv, cma_dev);
-				ret = 0;
+				ret = cma_attach_to_dev(id_priv, cma_dev);
 				goto out;
 			}
 		}
@@ -663,6 +669,7 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	const struct ib_gid_attr *sgid_attr;
 	enum ib_gid_type gid_type;
 	union ib_gid gid;
+	int ret;

 	if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
 	    id_priv->id.ps == RDMA_PS_IPOIB)
@@ -688,9 +695,9 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	 * cma_process_remove().
 	 */
 	mutex_lock(&lock);
-	cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
+	ret = cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
 	mutex_unlock(&lock);
-	return 0;
+	return ret;
 }

 static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
@@ -745,7 +752,7 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,

 out:
 	if (!ret)
-		cma_attach_to_dev(id_priv, cma_dev);
+		ret = cma_attach_to_dev(id_priv, cma_dev);

 	mutex_unlock(&lock);
 	return ret;
@@ -762,7 +769,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	unsigned int p;
 	u16 pkey, index;
 	enum ib_port_state port_state;
-	int i;
+	int i, ret;

 	cma_dev = NULL;
 	addr = (struct sockaddr_ib *) cma_dst_addr(id_priv);
@@ -805,8 +812,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	return -ENODEV;

 found:
-	cma_attach_to_dev(id_priv, cma_dev);
+	ret = cma_attach_to_dev(id_priv, cma_dev);
 	mutex_unlock(&lock);
+	if (ret)
+		return ret;
 	addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
 	memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
 	cma_translate_ib(addr, &id_priv->id.route.addr.dev_addr);
@@ -2517,7 +2526,9 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv,
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
 	       rdma_addr_size(cma_src_addr(id_priv)));

-	_cma_attach_to_dev(dev_id_priv, cma_dev);
+	ret = _cma_attach_to_dev(dev_id_priv, cma_dev);
+	if (ret)
+		goto err_attach;
 	list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
 	cma_id_get(id_priv);
 	dev_id_priv->internal_id = 1;
@@ -2531,6 +2542,7 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv,
 	return 0;
 err_listen:
 	list_del(&id_priv->listen_list);
+err_attach:
 	dev_warn(&cma_dev->device->dev, "RDMA CMA: %s, error %d\n", __func__, ret);
 	rdma_destroy_id(&dev_id_priv->id);
 	return ret;
@@ -3128,7 +3140,9 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
-	cma_attach_to_dev(id_priv, cma_dev);
+	ret = cma_attach_to_dev(id_priv, cma_dev);
+	if (ret)
+		goto out;
 	cma_set_loopback(cma_src_addr(id_priv));
 out:
 	mutex_unlock(&lock);
--
2.28.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I)
  2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-11-04 14:40 ` [PATCH rdma-next v4 5/5] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
@ 2020-11-12 18:59 ` Jason Gunthorpe
  2020-11-12 19:23   ` Leon Romanovsky
  5 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 18:59 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Leon Romanovsky, linux-rdma

On Wed, Nov 04, 2020 at 04:40:03PM +0200, Leon Romanovsky wrote:

> Leon Romanovsky (5):
>   RDMA/core: Allow drivers to disable restrack DB

This stuff is never used

>   RDMA/cma: Be strict with attaching to CMA device

This adds a return 0 which is pointless..

>   RDMA/counter: Combine allocation and bind logic
>   RDMA/restrack: Store all special QPs in restrack DB
>   RDMA/cma: Add missing error handling of listen_id

Took these three to for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I)
  2020-11-12 18:59 ` [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Jason Gunthorpe
@ 2020-11-12 19:23   ` Leon Romanovsky
  2020-11-12 19:48     ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2020-11-12 19:23 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On Thu, Nov 12, 2020 at 02:59:51PM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 04, 2020 at 04:40:03PM +0200, Leon Romanovsky wrote:
>
> > Leon Romanovsky (5):
> >   RDMA/core: Allow drivers to disable restrack DB
>
> This stuff is never used

It is in use in mlx4/mlx5 QPs.
https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/
https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/#Z30drivers:infiniband:hw:mlx4:qp.c
https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/#Z30drivers:infiniband:hw:mlx5:qp.c

>
> >   RDMA/cma: Be strict with attaching to CMA device
>
> This adds a return 0 which is pointless..

It will be used after I will resubmit patch "RDMA/restrack: Add error
handling while adding restrack object" and I'm working on it right now.

Can you please add this patch anyway and save me need to do constant
rebases?

Thanks

>
> >   RDMA/counter: Combine allocation and bind logic
> >   RDMA/restrack: Store all special QPs in restrack DB
> >   RDMA/cma: Add missing error handling of listen_id
>
> Took these three to for-next
>
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I)
  2020-11-12 19:23   ` Leon Romanovsky
@ 2020-11-12 19:48     ` Jason Gunthorpe
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 19:48 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, linux-rdma

On Thu, Nov 12, 2020 at 09:23:46PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 12, 2020 at 02:59:51PM -0400, Jason Gunthorpe wrote:
> > On Wed, Nov 04, 2020 at 04:40:03PM +0200, Leon Romanovsky wrote:
> >
> > > Leon Romanovsky (5):
> > >   RDMA/core: Allow drivers to disable restrack DB
> >
> > This stuff is never used
> 
> It is in use in mlx4/mlx5 QPs.
> https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/
> https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/#Z30drivers:infiniband:hw:mlx4:qp.c
> https://lore.kernel.org/linux-rdma/20201104144008.3808124-2-leon@kernel.org/#Z30drivers:infiniband:hw:mlx5:qp.c

Well send it with that series

> > >   RDMA/cma: Be strict with attaching to CMA device
> >
> > This adds a return 0 which is pointless..
> 
> It will be used after I will resubmit patch "RDMA/restrack: Add error
> handling while adding restrack object" and I'm working on it right now.

I think we will end up with add not being able to fail, so I want to
see this with the series that adds a failure

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-12 19:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-04 14:40 [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Leon Romanovsky
2020-11-04 14:40 ` [PATCH rdma-next v4 1/5] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
2020-11-04 14:40 ` [PATCH rdma-next v4 2/5] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
2020-11-04 14:40 ` [PATCH rdma-next v4 3/5] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
2020-11-04 14:40 ` [PATCH rdma-next v4 4/5] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
2020-11-04 14:40 ` [PATCH rdma-next v4 5/5] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
2020-11-12 18:59 ` [PATCH rdma-next v4 0/5] Track memory allocation with restrack DB help (Part I) Jason Gunthorpe
2020-11-12 19:23   ` Leon Romanovsky
2020-11-12 19:48     ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).