All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help
@ 2020-09-26 10:19 Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 1/9] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Ariel Levkovich, Gal Pressman, Leon Romanovsky,
	linux-kernel, linux-rdma, Mark Zhang, Sean Hefty, Yishai Hadas

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v3:
 * Rebased on already accepted patches.
 * Added mlx4 special QPs to the list of not-tracked QPs (dropped previous mlx4 special QP patch).
 * Separated to two patches change in return value of cma_listen_* routines.
 * Changed commit messages and added Fixes as Jason requested.
v2: https://lore.kernel.org/linux-rdma/20200907122156.478360-1-leon@kernel.org/
 * Added new patch to fix mlx4 failure on SR-IOV, it didn't have port set.
 * Changed "RDMA/cma: Delete from restrack DB after successful destroy" patch.
v1: https://lore.kernel.org/lkml/20200830101436.108487-1-leon@kernel.org
 * Fixed rebase error, deleted second assignment of qp_type.
 * Rebased code on latests rdma-next, the changes in cma.c caused to change
   in patch "RDMA/cma: Delete from restrack DB after successful destroy".
 * Dropped patch of port assignment, it is already done as part of this
   series.
 * I didn't add @calller description, regular users should not use _named() funciton.
v0: https://lore.kernel.org/lkml/20200824104415.1090901-1-leon@kernel.org

----------------------------------------------------------------------------------
Hi,

The resource tracker has built-in kref counter to synchronize object
release. It makes restrack perfect choice to be responsible for the
memory lifetime of any object in which restrack entry is embedded.

In order to make it, the restrack was changed to be mandatory and all
callers of rdma_restrack_add() started to rely on result returned from
that call. Being mandatory means that all objects specific to restrack
type must be tracked.

Before this series, the restrack and rdmatool were aid tools in debug
session of user space applications, this caused to some of the
functionality to be left behind, like support XRC QPs, device memory MRs
and QP0/QP1 in multi-port devices.

This series fixes all mentioned above without extending rdmatool at all.

Thanks

Leon Romanovsky (9):
  RDMA/core: Allow drivers to disable restrack DB
  RDMA/counter: Combine allocation and bind logic
  RDMA/restrack: Store all special QPs in restrack DB
  RDMA/cma: Add missing error handling of listen_id
  RDMA/cma: Be strict with attaching to CMA device
  RDMA/restrack: Add error handling while adding restrack object
  RDMA/restrack: Support all QP types
  RDMA/core: Track device memory MRs
  RDMA/restrack: Drop valid restrack field as source of ambiguity

 drivers/infiniband/core/cma.c                 | 226 +++++++++++-------
 drivers/infiniband/core/core_priv.h           |  32 ++-
 drivers/infiniband/core/counters.c            | 167 ++++++-------
 drivers/infiniband/core/cq.c                  |  17 +-
 drivers/infiniband/core/rdma_core.c           |   3 +-
 drivers/infiniband/core/restrack.c            |  64 ++---
 drivers/infiniband/core/restrack.h            |   2 +-
 drivers/infiniband/core/uverbs_cmd.c          |  31 ++-
 drivers/infiniband/core/uverbs_std_types_cq.c |   6 +-
 drivers/infiniband/core/uverbs_std_types_mr.c |  10 +
 drivers/infiniband/core/uverbs_std_types_qp.c |   4 +-
 drivers/infiniband/core/verbs.c               |  74 ++++--
 drivers/infiniband/hw/mlx4/qp.c               |   5 +
 drivers/infiniband/hw/mlx5/qp.c               |   2 +-
 include/rdma/ib_verbs.h                       |  10 +-
 include/rdma/restrack.h                       |  27 ++-
 16 files changed, 416 insertions(+), 264 deletions(-)

--
2.26.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 1/9] RDMA/core: Allow drivers to disable restrack DB
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 2/9] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Gal Pressman, linux-rdma, Yishai Hadas

From: Leon Romanovsky <leonro@mellanox.com>

Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes
to the situation where DC QP (mlx5) is always added to the same xarray
index zero.

This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.

Fixes: 52e0a118a203 ("RDMA/restrack: Track driver QP types in resource tracker")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/counters.c |  2 +-
 drivers/infiniband/core/restrack.c | 12 ++++++++++--
 drivers/infiniband/hw/mlx4/qp.c    |  5 +++++
 drivers/infiniband/hw/mlx5/qp.c    |  2 +-
 include/rdma/restrack.h            | 24 ++++++++++++++++++++++++
 5 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index e4ff0d3328b6..fa1a4a318fd7 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -275,7 +275,7 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
 	struct rdma_counter *counter;
 	int ret;
 
-	if (!qp->res.valid || rdma_is_kernel_res(&qp->res))
+	if (!rdma_restrack_is_tracked(&qp->res) || rdma_is_kernel_res(&qp->res))
 		return 0;
 
 	if (!rdma_is_port_valid(dev, port))
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 4aeeaaed0f17..e26a3213f500 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -221,11 +221,14 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 {
 	struct ib_device *dev = res_to_dev(res);
 	struct rdma_restrack_root *rt;
-	int ret;
+	int ret = 0;
 
 	if (!dev)
 		return;
 
+	if (res->no_track)
+		goto out;
+
 	rt = &dev->res[res->type];
 
 	if (res->type == RDMA_RESTRACK_QP) {
@@ -246,6 +249,7 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 				      &rt->next_id, GFP_KERNEL);
 	}
 
+out:
 	if (!ret)
 		res->valid = true;
 }
@@ -318,6 +322,9 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 		return;
 	}
 
+	if (res->no_track)
+		goto out;
+
 	dev = res_to_dev(res);
 	if (WARN_ON(!dev))
 		return;
@@ -328,8 +335,9 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
 		return;
 	WARN_ON(old != res);
-	res->valid = false;
 
+out:
+	res->valid = false;
 	rdma_restrack_put(res);
 	wait_for_completion(&res->comp);
 }
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 0cb1a6e2fed4..a52d1e9e97d9 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1627,6 +1627,11 @@ static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
 		if (err)
 			return ERR_PTR(err);
 
+		if (init_attr->create_flags &
+		    (MLX4_IB_SRIOV_SQP | MLX4_IB_SRIOV_TUNNEL_QP))
+			/* Internal QP created with ib_create_qp */
+			rdma_restrack_no_track(&qp->ibqp.res);
+
 		qp->port	= init_attr->port_num;
 		qp->ibqp.qp_num = init_attr->qp_type == IB_QPT_SMI ? 0 :
 			init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI ? sqpn : 1;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 55e278750b59..e8dd32d6a7ab 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2436,7 +2436,7 @@ static int create_dct(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	}
 
 	qp->state = IB_QPS_RESET;
-
+	rdma_restrack_no_track(&qp->ibqp.res);
 	return 0;
 }
 
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index d3a1cc5be7bc..05e18839eaff 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -68,6 +68,14 @@ struct rdma_restrack_entry {
 	 * As an example for that, see mlx5 QPs with type MLX5_IB_QPT_HW_GSI
 	 */
 	bool			valid;
+	/**
+	 * @no_track: don't add this entry to restrack DB
+	 *
+	 * This field is used to mark an entry that doesn't need to be added to
+	 * internal restrack DB and presented later to the users at the nldev
+	 * query stage.
+	 */
+	u8			no_track : 1;
 	/*
 	 * @kref: Protect destroy of the resource
 	 */
@@ -145,4 +153,20 @@ int rdma_nl_stat_hwcounter_entry(struct sk_buff *msg, const char *name,
 struct rdma_restrack_entry *rdma_restrack_get_byid(struct ib_device *dev,
 						   enum rdma_restrack_type type,
 						   u32 id);
+
+/**
+ * rdma_restrack_no_track() - don't add resource to the DB
+ * @res: resource entry
+ *
+ * Every user of thie API should be cross examined.
+ * Probaby you don't need to use this function.
+ */
+static inline void rdma_restrack_no_track(struct rdma_restrack_entry *res)
+{
+	res->no_track = true;
+}
+static inline bool rdma_restrack_is_tracked(struct rdma_restrack_entry *res)
+{
+	return !res->no_track;
+}
 #endif /* _RDMA_RESTRACK_H_ */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 2/9] RDMA/counter: Combine allocation and bind logic
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 1/9] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 3/9] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@nvidia.com>

RDMA counters are allocated and bounded to QP immediately after that.
Only after this two step process they are really usable. By combining
the logic, we are ensuring that once counter is returned to the caller,
it will have everything set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/counters.c | 130 +++++++++++++----------------
 1 file changed, 58 insertions(+), 72 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index fa1a4a318fd7..e208cc6d3cb0 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -64,8 +64,40 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
 	return ret;
 }
 
-static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
-					       enum rdma_nl_counter_mode mode)
+static void auto_mode_init_counter(struct rdma_counter *counter,
+				   const struct ib_qp *qp,
+				   enum rdma_nl_counter_mask new_mask)
+{
+	struct auto_mode_param *param = &counter->mode.param;
+
+	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
+	counter->mode.mask = new_mask;
+
+	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
+		param->qp_type = qp->qp_type;
+}
+
+static int __rdma_counter_bind_qp(struct rdma_counter *counter,
+				  struct ib_qp *qp)
+{
+	int ret;
+
+	if (qp->counter)
+		return -EINVAL;
+
+	if (!qp->device->ops.counter_bind_qp)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&counter->lock);
+	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
+static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u8 port,
+					   struct ib_qp *qp,
+					   enum rdma_nl_counter_mode mode)
 {
 	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
@@ -88,11 +120,19 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 
 	port_counter = &dev->port_data[port].port_counter;
 	mutex_lock(&port_counter->lock);
-	if (mode == RDMA_COUNTER_MODE_MANUAL) {
+	switch (mode) {
+	case RDMA_COUNTER_MODE_MANUAL:
 		ret = __counter_set_mode(&port_counter->mode,
 					 RDMA_COUNTER_MODE_MANUAL, 0);
 		if (ret)
 			goto err_mode;
+		break;
+	case RDMA_COUNTER_MODE_AUTO:
+		auto_mode_init_counter(counter, qp, port_counter->mode.mask);
+		break;
+	default:
+		ret = -EOPNOTSUPP;
+		goto err_mode;
 	}
 
 	port_counter->num_counters++;
@@ -102,6 +142,12 @@ static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);
 
+	ret = __rdma_counter_bind_qp(counter, qp);
+	if (ret)
+		goto err_mode;
+
+	rdma_restrack_parent_name(&counter->res, &qp->res);
+	rdma_restrack_add(&counter->res);
 	return counter;
 
 err_mode:
@@ -132,19 +178,6 @@ static void rdma_counter_free(struct rdma_counter *counter)
 	kfree(counter);
 }
 
-static void auto_mode_init_counter(struct rdma_counter *counter,
-				   const struct ib_qp *qp,
-				   enum rdma_nl_counter_mask new_mask)
-{
-	struct auto_mode_param *param = &counter->mode.param;
-
-	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
-	counter->mode.mask = new_mask;
-
-	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
-		param->qp_type = qp->qp_type;
-}
-
 static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 			    enum rdma_nl_counter_mask auto_mask)
 {
@@ -161,24 +194,6 @@ static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 	return match;
 }
 
-static int __rdma_counter_bind_qp(struct rdma_counter *counter,
-				  struct ib_qp *qp)
-{
-	int ret;
-
-	if (qp->counter)
-		return -EINVAL;
-
-	if (!qp->device->ops.counter_bind_qp)
-		return -EOPNOTSUPP;
-
-	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_bind_qp(counter, qp);
-	mutex_unlock(&counter->lock);
-
-	return ret;
-}
-
 static int __rdma_counter_unbind_qp(struct ib_qp *qp)
 {
 	struct rdma_counter *counter = qp->counter;
@@ -247,13 +262,6 @@ static struct rdma_counter *rdma_get_counter_auto_mode(struct ib_qp *qp,
 	return counter;
 }
 
-static void rdma_counter_res_add(struct rdma_counter *counter,
-				 struct ib_qp *qp)
-{
-	rdma_restrack_parent_name(&counter->res, &qp->res);
-	rdma_restrack_add(&counter->res);
-}
-
 static void counter_release(struct kref *kref)
 {
 	struct rdma_counter *counter;
@@ -293,19 +301,9 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u8 port)
 			return ret;
 		}
 	} else {
-		counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_AUTO);
+		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO);
 		if (!counter)
 			return -ENOMEM;
-
-		auto_mode_init_counter(counter, qp, port_counter->mode.mask);
-
-		ret = __rdma_counter_bind_qp(counter, qp);
-		if (ret) {
-			rdma_counter_free(counter);
-			return ret;
-		}
-
-		rdma_counter_res_add(counter, qp);
 	}
 
 	return 0;
@@ -419,15 +417,6 @@ static struct ib_qp *rdma_counter_get_qp(struct ib_device *dev, u32 qp_num)
 	return NULL;
 }
 
-static int rdma_counter_bind_qp_manual(struct rdma_counter *counter,
-				       struct ib_qp *qp)
-{
-	if ((counter->device != qp->device) || (counter->port != qp->port))
-		return -EINVAL;
-
-	return __rdma_counter_bind_qp(counter, qp);
-}
-
 static struct rdma_counter *rdma_get_counter_by_id(struct ib_device *dev,
 						   u32 counter_id)
 {
@@ -475,7 +464,12 @@ int rdma_counter_bind_qpn(struct ib_device *dev, u8 port,
 		goto err_task;
 	}
 
-	ret = rdma_counter_bind_qp_manual(counter, qp);
+	if ((counter->device != qp->device) || (counter->port != qp->port)) {
+		ret = -EINVAL;
+		goto err_task;
+	}
+
+	ret = __rdma_counter_bind_qp(counter, qp);
 	if (ret)
 		goto err_task;
 
@@ -520,26 +514,18 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u8 port,
 		goto err;
 	}
 
-	counter = rdma_counter_alloc(dev, port, RDMA_COUNTER_MODE_MANUAL);
+	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL);
 	if (!counter) {
 		ret = -ENOMEM;
 		goto err;
 	}
 
-	ret = rdma_counter_bind_qp_manual(counter, qp);
-	if (ret)
-		goto err_bind;
-
 	if (counter_id)
 		*counter_id = counter->id;
 
-	rdma_counter_res_add(counter, qp);
-
 	rdma_restrack_put(&qp->res);
-	return ret;
+	return 0;
 
-err_bind:
-	rdma_counter_free(counter);
 err:
 	rdma_restrack_put(&qp->res);
 	return ret;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 3/9] RDMA/restrack: Store all special QPs in restrack DB
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 1/9] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 2/9] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 4/9] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

Special QPs (SMI and GSI) have different rules in regards of their QP
numbers. While all other QP numbers are unique per-device, the QP0 and QP1
are created per-port as requested by IBTA.

In multiple port devices, the number of SMI and GSI QPs with be equal
to the number ports.

[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.4.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@vm ~]$ rdma link
0/1: ibp0s9/1: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP
0/2: ibp0s9/2: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state UNKNOWN physical_state UNKNOWN

Before:
[leonro@mtl-leonro-l-vm ~]$ rdma res show qp type SMI,GSI
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]

After:
[leonro@vm ~]$ rdma res show qp type SMI,GSI
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/2 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/2 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/core_priv.h |  2 ++
 drivers/infiniband/core/restrack.c  | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index e84b0fedaacb..7c4752c47f80 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -347,6 +347,8 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	qp->srq = attr->srq;
 	qp->rwq_ind_tbl = attr->rwq_ind_tbl;
 	qp->event_handler = attr->event_handler;
+	qp->qp_type = attr->qp_type;
+	qp->port = attr->port_num;
 
 	atomic_set(&qp->usecnt, 0);
 	spin_lock_init(&qp->mr_lock);
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index e26a3213f500..e0a41c867002 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -235,8 +235,15 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 		/* Special case to ensure that LQPN points to right QP */
 		struct ib_qp *qp = container_of(res, struct ib_qp, res);
 
-		ret = xa_insert(&rt->xa, qp->qp_num, res, GFP_KERNEL);
-		res->id = ret ? 0 : qp->qp_num;
+		WARN_ONCE(qp->qp_num >> 24 || qp->port >> 8,
+			  "QP number 0x%0X and port 0x%0X", qp->qp_num,
+			  qp->port);
+		res->id = qp->qp_num;
+		if (qp->qp_type == IB_QPT_SMI || qp->qp_type == IB_QPT_GSI)
+			res->id |= qp->port << 24;
+		ret = xa_insert(&rt->xa, res->id, res, GFP_KERNEL);
+		if (ret)
+			res->id = 0;
 	} else if (res->type == RDMA_RESTRACK_COUNTER) {
 		/* Special case to ensure that cntn points to right counter */
 		struct rdma_counter *counter;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 4/9] RDMA/cma: Add missing error handling of listen_id
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 3/9] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 5/9] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, Sean Hefty

From: Leon Romanovsky <leonro@nvidia.com>

Don't silently continue if rdma_listen() fails but destroy previously
created CM_ID and return an error to the caller.

Fixes: d02d1f5359e7 ("RDMA/cma: Fix deadlock destroying listen requests")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 175 ++++++++++++++++++++--------------
 1 file changed, 101 insertions(+), 74 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6419b798cd2e..84c7f2f60a6a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2491,8 +2491,8 @@ static int cma_listen_handler(struct rdma_cm_id *id,
 	return id_priv->id.event_handler(id, event);
 }

-static void cma_listen_on_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static int cma_listen_on_dev(struct rdma_id_private *id_priv,
+			     struct cma_device *cma_dev)
 {
 	struct rdma_id_private *dev_id_priv;
 	struct net *net = id_priv->id.route.addr.dev_addr.net;
@@ -2501,13 +2501,13 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	lockdep_assert_held(&lock);

 	if (cma_family(id_priv) == AF_IB && !rdma_cap_ib_cm(cma_dev->device, 1))
-		return;
+		return 0;

 	dev_id_priv =
 		__rdma_create_id(net, cma_listen_handler, id_priv,
 				 id_priv->id.ps, id_priv->id.qp_type, id_priv);
 	if (IS_ERR(dev_id_priv))
-		return;
+		return PTR_ERR(dev_id_priv);

 	dev_id_priv->state = RDMA_CM_ADDR_BOUND;
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
@@ -2523,19 +2523,34 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,

 	ret = rdma_listen(&dev_id_priv->id, id_priv->backlog);
 	if (ret)
-		dev_warn(&cma_dev->device->dev,
-			 "RDMA CMA: cma_listen_on_dev, error %d\n", ret);
+		goto err_listen;
+	return 0;
+err_listen:
+	list_del(&id_priv->listen_list);
+	dev_warn(&cma_dev->device->dev, "RDMA CMA: %s, error %d\n", __func__, ret);
+	rdma_destroy_id(&dev_id_priv->id);
+	return ret;
 }

-static void cma_listen_on_all(struct rdma_id_private *id_priv)
+static int cma_listen_on_all(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
+	int ret;

 	mutex_lock(&lock);
 	list_add_tail(&id_priv->list, &listen_any_list);
-	list_for_each_entry(cma_dev, &dev_list, list)
-		cma_listen_on_dev(id_priv, cma_dev);
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		ret = cma_listen_on_dev(id_priv, cma_dev);
+		if (ret)
+			goto err_listen;
+	}
+	mutex_unlock(&lock);
+	return 0;
+
+err_listen:
+	list_del(&id_priv->list);
 	mutex_unlock(&lock);
+	return ret;
 }

 void rdma_set_service_type(struct rdma_cm_id *id, int tos)
@@ -3685,8 +3700,11 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 			ret = -ENOSYS;
 			goto err;
 		}
-	} else
-		cma_listen_on_all(id_priv);
+	} else {
+		ret = cma_listen_on_all(id_priv);
+		if (ret)
+			goto err;
+	}

 	return 0;
 err:
@@ -4738,69 +4756,6 @@ static struct notifier_block cma_nb = {
 	.notifier_call = cma_netdev_callback
 };

-static int cma_add_one(struct ib_device *device)
-{
-	struct cma_device *cma_dev;
-	struct rdma_id_private *id_priv;
-	unsigned int i;
-	unsigned long supported_gids = 0;
-	int ret;
-
-	cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
-	if (!cma_dev)
-		return -ENOMEM;
-
-	cma_dev->device = device;
-	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
-					    sizeof(*cma_dev->default_gid_type),
-					    GFP_KERNEL);
-	if (!cma_dev->default_gid_type) {
-		ret = -ENOMEM;
-		goto free_cma_dev;
-	}
-
-	cma_dev->default_roce_tos = kcalloc(device->phys_port_cnt,
-					    sizeof(*cma_dev->default_roce_tos),
-					    GFP_KERNEL);
-	if (!cma_dev->default_roce_tos) {
-		ret = -ENOMEM;
-		goto free_gid_type;
-	}
-
-	rdma_for_each_port (device, i) {
-		supported_gids = roce_gid_type_mask_support(device, i);
-		WARN_ON(!supported_gids);
-		if (supported_gids & (1 << CMA_PREFERRED_ROCE_GID_TYPE))
-			cma_dev->default_gid_type[i - rdma_start_port(device)] =
-				CMA_PREFERRED_ROCE_GID_TYPE;
-		else
-			cma_dev->default_gid_type[i - rdma_start_port(device)] =
-				find_first_bit(&supported_gids, BITS_PER_LONG);
-		cma_dev->default_roce_tos[i - rdma_start_port(device)] = 0;
-	}
-
-	init_completion(&cma_dev->comp);
-	refcount_set(&cma_dev->refcount, 1);
-	INIT_LIST_HEAD(&cma_dev->id_list);
-	ib_set_client_data(device, &cma_client, cma_dev);
-
-	mutex_lock(&lock);
-	list_add_tail(&cma_dev->list, &dev_list);
-	list_for_each_entry(id_priv, &listen_any_list, list)
-		cma_listen_on_dev(id_priv, cma_dev);
-	mutex_unlock(&lock);
-
-	trace_cm_add_one(device);
-	return 0;
-
-free_gid_type:
-	kfree(cma_dev->default_gid_type);
-
-free_cma_dev:
-	kfree(cma_dev);
-	return ret;
-}
-
 static void cma_send_device_removal_put(struct rdma_id_private *id_priv)
 {
 	struct rdma_cm_event event = { .event = RDMA_CM_EVENT_DEVICE_REMOVAL };
@@ -4863,6 +4818,78 @@ static void cma_process_remove(struct cma_device *cma_dev)
 	wait_for_completion(&cma_dev->comp);
 }

+static int cma_add_one(struct ib_device *device)
+{
+	struct cma_device *cma_dev;
+	struct rdma_id_private *id_priv;
+	unsigned int i;
+	unsigned long supported_gids = 0;
+	int ret;
+
+	cma_dev = kmalloc(sizeof(*cma_dev), GFP_KERNEL);
+	if (!cma_dev)
+		return -ENOMEM;
+
+	cma_dev->device = device;
+	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_gid_type),
+					    GFP_KERNEL);
+	if (!cma_dev->default_gid_type) {
+		ret = -ENOMEM;
+		goto free_cma_dev;
+	}
+
+	cma_dev->default_roce_tos = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_roce_tos),
+					    GFP_KERNEL);
+	if (!cma_dev->default_roce_tos) {
+		ret = -ENOMEM;
+		goto free_gid_type;
+	}
+
+	rdma_for_each_port (device, i) {
+		supported_gids = roce_gid_type_mask_support(device, i);
+		WARN_ON(!supported_gids);
+		if (supported_gids & (1 << CMA_PREFERRED_ROCE_GID_TYPE))
+			cma_dev->default_gid_type[i - rdma_start_port(device)] =
+				CMA_PREFERRED_ROCE_GID_TYPE;
+		else
+			cma_dev->default_gid_type[i - rdma_start_port(device)] =
+				find_first_bit(&supported_gids, BITS_PER_LONG);
+		cma_dev->default_roce_tos[i - rdma_start_port(device)] = 0;
+	}
+
+	init_completion(&cma_dev->comp);
+	refcount_set(&cma_dev->refcount, 1);
+	INIT_LIST_HEAD(&cma_dev->id_list);
+	ib_set_client_data(device, &cma_client, cma_dev);
+
+	mutex_lock(&lock);
+	list_add_tail(&cma_dev->list, &dev_list);
+	list_for_each_entry(id_priv, &listen_any_list, list) {
+		ret = cma_listen_on_dev(id_priv, cma_dev);
+		if (ret)
+			goto free_listen;
+	}
+	mutex_unlock(&lock);
+
+	trace_cm_add_one(device);
+	return 0;
+
+free_listen:
+	list_del(&cma_dev->list);
+	mutex_unlock(&lock);
+
+	cma_process_remove(cma_dev);
+	kfree(cma_dev->default_roce_tos);
+free_gid_type:
+	kfree(cma_dev->default_gid_type);
+
+free_cma_dev:
+	kfree(cma_dev);
+	return ret;
+}
+
 static void cma_remove_one(struct ib_device *device, void *client_data)
 {
 	struct cma_device *cma_dev = client_data;
--
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 5/9] RDMA/cma: Be strict with attaching to CMA device
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 4/9] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object Leon Romanovsky
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

The RDMA-CM code wasn't consistent in flows that attached to cma_dev,
this caused to situations where a failure during attach to listen on such
device leaves RDMA-CM in a non-consistent state.

Change the _cma_attach_to_dev to return an error in case of failure,
so listen/attach flows will deal correctly with the failures.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c | 42 +++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 84c7f2f60a6a..9d32572c514a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -444,8 +444,8 @@ static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
 	return (in_dev) ? 0 : -ENODEV;
 }
 
-static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
-			       struct cma_device *cma_dev)
+static int _cma_attach_to_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
 {
 	cma_dev_get(cma_dev);
 	id_priv->cma_dev = cma_dev;
@@ -456,15 +456,22 @@ static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
 	rdma_restrack_add(&id_priv->res);
 
 	trace_cm_id_attach(id_priv, cma_dev->device);
+	return 0;
 }
 
-static void cma_attach_to_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static int cma_attach_to_dev(struct rdma_id_private *id_priv,
+			     struct cma_device *cma_dev)
 {
-	_cma_attach_to_dev(id_priv, cma_dev);
+	int ret;
+
+	ret = _cma_attach_to_dev(id_priv, cma_dev);
+	if (ret)
+		return ret;
+
 	id_priv->gid_type =
 		cma_dev->default_gid_type[id_priv->id.port_num -
 					  rdma_start_port(cma_dev->device)];
+	return 0;
 }
 
 static void cma_release_dev(struct rdma_id_private *id_priv)
@@ -629,8 +636,7 @@ static int cma_acquire_dev_by_src_ip(struct rdma_id_private *id_priv)
 			if (!IS_ERR(sgid_attr)) {
 				id_priv->id.port_num = port;
 				cma_bind_sgid_attr(id_priv, sgid_attr);
-				cma_attach_to_dev(id_priv, cma_dev);
-				ret = 0;
+				ret = cma_attach_to_dev(id_priv, cma_dev);
 				goto out;
 			}
 		}
@@ -659,6 +665,7 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	const struct ib_gid_attr *sgid_attr;
 	enum ib_gid_type gid_type;
 	union ib_gid gid;
+	int ret;
 
 	if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
 	    id_priv->id.ps == RDMA_PS_IPOIB)
@@ -684,9 +691,9 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	 * cma_process_remove().
 	 */
 	mutex_lock(&lock);
-	cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
+	ret = cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
 	mutex_unlock(&lock);
-	return 0;
+	return ret;
 }
 
 static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
@@ -741,7 +748,7 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
 
 out:
 	if (!ret)
-		cma_attach_to_dev(id_priv, cma_dev);
+		ret = cma_attach_to_dev(id_priv, cma_dev);
 
 	mutex_unlock(&lock);
 	return ret;
@@ -758,7 +765,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	unsigned int p;
 	u16 pkey, index;
 	enum ib_port_state port_state;
-	int i;
+	int i, ret;
 
 	cma_dev = NULL;
 	addr = (struct sockaddr_ib *) cma_dst_addr(id_priv);
@@ -801,8 +808,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	return -ENODEV;
 
 found:
-	cma_attach_to_dev(id_priv, cma_dev);
+	ret = cma_attach_to_dev(id_priv, cma_dev);
 	mutex_unlock(&lock);
+	if (ret)
+		return ret;
 	addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
 	memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
 	cma_translate_ib(addr, &id_priv->id.route.addr.dev_addr);
@@ -2513,7 +2522,9 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv,
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
 	       rdma_addr_size(cma_src_addr(id_priv)));
 
-	_cma_attach_to_dev(dev_id_priv, cma_dev);
+	ret = _cma_attach_to_dev(dev_id_priv, cma_dev);
+	if (ret)
+		goto err_attach;
 	list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
 	cma_id_get(id_priv);
 	dev_id_priv->internal_id = 1;
@@ -2527,6 +2538,7 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv,
 	return 0;
 err_listen:
 	list_del(&id_priv->listen_list);
+err_attach:
 	dev_warn(&cma_dev->device->dev, "RDMA CMA: %s, error %d\n", __func__, ret);
 	rdma_destroy_id(&dev_id_priv->id);
 	return ret;
@@ -3121,7 +3133,9 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
-	cma_attach_to_dev(id_priv, cma_dev);
+	ret = cma_attach_to_dev(id_priv, cma_dev);
+	if (ret)
+		goto out;
 	cma_set_loopback(cma_src_addr(id_priv));
 out:
 	mutex_unlock(&lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 5/9] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-10-02 12:42   ` Jason Gunthorpe
  2020-10-02 13:13   ` Jason Gunthorpe
  2020-09-26 10:19 ` [PATCH rdma-next v3 7/9] RDMA/restrack: Support all QP types Leon Romanovsky
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, Mark Zhang

From: Leon Romanovsky <leonro@mellanox.com>

Restrack stores the IB objects in the internal xarray and insertion
where can fail. As long as restrack was helper tool for the
debuggability, such failure were ignored.

In the following patches, the ib_core will be changed to manage allocated
IB objects in restrack DB for the proper memory lifetime. It requires to
ensure that insertion errors are not ignored, because the ib_core will
add only valid entries to that DB.

Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c                 |  9 ++-
 drivers/infiniband/core/core_priv.h           |  9 ++-
 drivers/infiniband/core/counters.c            | 37 ++++++-----
 drivers/infiniband/core/cq.c                  | 17 ++++-
 drivers/infiniband/core/rdma_core.c           |  3 +-
 drivers/infiniband/core/restrack.c            | 26 +++++---
 drivers/infiniband/core/restrack.h            |  2 +-
 drivers/infiniband/core/uverbs_cmd.c          | 27 ++++++--
 drivers/infiniband/core/uverbs_std_types_cq.c |  6 +-
 drivers/infiniband/core/verbs.c               | 63 ++++++++++++++-----
 10 files changed, 147 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9d32572c514a..4649fb95c12e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -447,13 +447,20 @@ static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
 static int _cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
+	int ret;
+
 	cma_dev_get(cma_dev);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
 		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
-	rdma_restrack_add(&id_priv->res);
+	ret = rdma_restrack_add(&id_priv->res);
+	if (ret) {
+		list_del(&cma_dev->id_list);
+		cma_dev_put(cma_dev);
+		return ret;
+	}
 
 	trace_cm_id_attach(id_priv, cma_dev->device);
 	return 0;
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 7c4752c47f80..7f4bb09efab8 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -327,6 +327,7 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	enum ib_qp_type qp_type = attr->qp_type;
 	struct ib_qp *qp;
 	bool is_xrc;
+	int ret;
 
 	if (!dev->ops.create_qp)
 		return ERR_PTR(-EOPNOTSUPP);
@@ -364,9 +365,15 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	is_xrc = qp_type == IB_QPT_XRC_INI || qp_type == IB_QPT_XRC_TGT;
 	if ((qp_type < IB_QPT_MAX && !is_xrc) || qp_type == IB_QPT_DRIVER) {
 		rdma_restrack_parent_name(&qp->res, &pd->res);
-		rdma_restrack_add(&qp->res);
+		ret = rdma_restrack_add(&qp->res);
+		if (ret)
+			goto err;
 	}
 	return qp;
+err:
+	rdma_restrack_put(&qp->res);
+	dev->ops.destroy_qp(qp, udata);
+	return ERR_PTR(ret);
 }
 
 struct rdma_dev_addr;
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index e208cc6d3cb0..4a0607872e54 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -95,6 +95,21 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
 	return ret;
 }
 
+static int __rdma_counter_unbind_qp(struct ib_qp *qp)
+{
+	struct rdma_counter *counter = qp->counter;
+	int ret;
+
+	if (!qp->device->ops.counter_unbind_qp)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&counter->lock);
+	ret = qp->device->ops.counter_unbind_qp(qp);
+	mutex_unlock(&counter->lock);
+
+	return ret;
+}
+
 static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u8 port,
 					   struct ib_qp *qp,
 					   enum rdma_nl_counter_mode mode)
@@ -147,9 +162,14 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u8 port,
 		goto err_mode;
 
 	rdma_restrack_parent_name(&counter->res, &qp->res);
-	rdma_restrack_add(&counter->res);
+	ret = rdma_restrack_add(&counter->res);
+	if (ret)
+		goto err_restrack;
+
 	return counter;
 
+err_restrack:
+	__rdma_counter_unbind_qp(qp);
 err_mode:
 	mutex_unlock(&port_counter->lock);
 	kfree(counter->stats);
@@ -194,21 +214,6 @@ static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 	return match;
 }
 
-static int __rdma_counter_unbind_qp(struct ib_qp *qp)
-{
-	struct rdma_counter *counter = qp->counter;
-	int ret;
-
-	if (!qp->device->ops.counter_unbind_qp)
-		return -EOPNOTSUPP;
-
-	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_unbind_qp(qp);
-	mutex_unlock(&counter->lock);
-
-	return ret;
-}
-
 static void counter_history_stat_update(struct rdma_counter *counter)
 {
 	struct ib_device *dev = counter->device;
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 12ebacf52958..1abcb01d362f 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
 		goto out_destroy_cq;
 	}
 
-	rdma_restrack_add(&cq->res);
+	ret = rdma_restrack_add(&cq->res);
+	if (ret)
+		goto out_poll_cq;
+
 	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
 	return cq;
 
+out_poll_cq:
+	switch (cq->poll_ctx) {
+	case IB_POLL_SOFTIRQ:
+		irq_poll_disable(&cq->iop);
+		break;
+	case IB_POLL_WORKQUEUE:
+	case IB_POLL_UNBOUND_WORKQUEUE:
+		cancel_work_sync(&cq->work);
+		break;
+	default:
+		break;
+	}
 out_destroy_cq:
 	rdma_dim_destroy(cq);
 	cq->device->ops.destroy_cq(cq, NULL);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index d2b5417a4d51..492ee941ecdf 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -831,10 +831,9 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev,
 			   RDMACG_RESOURCE_HCA_HANDLE);
 
-	rdma_restrack_del(&ucontext->res);
-
 	ib_dev->ops.dealloc_ucontext(ucontext);
 	WARN_ON(!xa_empty(&ucontext->mmap_xa));
+	rdma_restrack_del(&ucontext->res);
 	kfree(ucontext);
 
 	ufile->ucontext = NULL;
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index e0a41c867002..593af32d86a0 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -217,21 +217,22 @@ EXPORT_SYMBOL(rdma_restrack_new);
  * rdma_restrack_add() - add object to the reource tracking database
  * @res:  resource entry
  */
-void rdma_restrack_add(struct rdma_restrack_entry *res)
+int __must_check rdma_restrack_add(struct rdma_restrack_entry *res)
 {
 	struct ib_device *dev = res_to_dev(res);
 	struct rdma_restrack_root *rt;
 	int ret = 0;
 
 	if (!dev)
-		return;
+		return -ENODEV;
 
 	if (res->no_track)
 		goto out;
 
 	rt = &dev->res[res->type];
 
-	if (res->type == RDMA_RESTRACK_QP) {
+	switch (res->type) {
+	case RDMA_RESTRACK_QP: {
 		/* Special case to ensure that LQPN points to right QP */
 		struct ib_qp *qp = container_of(res, struct ib_qp, res);
 
@@ -244,21 +245,26 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
 		ret = xa_insert(&rt->xa, res->id, res, GFP_KERNEL);
 		if (ret)
 			res->id = 0;
-	} else if (res->type == RDMA_RESTRACK_COUNTER) {
+	} break;
+	case RDMA_RESTRACK_COUNTER: {
 		/* Special case to ensure that cntn points to right counter */
-		struct rdma_counter *counter;
+		struct rdma_counter *counter = container_of(res, struct rdma_counter, res);
 
-		counter = container_of(res, struct rdma_counter, res);
 		ret = xa_insert(&rt->xa, counter->id, res, GFP_KERNEL);
-		res->id = ret ? 0 : counter->id;
-	} else {
+		if (ret)
+			break;
+		res->id = counter->id;
+	} break;
+	default:
 		ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
 				      &rt->next_id, GFP_KERNEL);
 	}
 
 out:
-	if (!ret)
-		res->valid = true;
+	if (ret)
+		return ret;
+	res->valid = true;
+	return 0;
 }
 EXPORT_SYMBOL(rdma_restrack_add);
 
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index 6a04fc41f738..d450f5d51022 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -25,7 +25,7 @@ struct rdma_restrack_root {
 
 int rdma_restrack_init(struct ib_device *dev);
 void rdma_restrack_clean(struct ib_device *dev);
-void rdma_restrack_add(struct rdma_restrack_entry *res);
+int __must_check rdma_restrack_add(struct rdma_restrack_entry *res);
 void rdma_restrack_del(struct rdma_restrack_entry *res);
 void rdma_restrack_new(struct rdma_restrack_entry *res,
 		       enum rdma_restrack_type type);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7b8d6b3409d5..1ba961034514 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -252,7 +252,9 @@ int ib_init_ucontext(struct uverbs_attr_bundle *attrs)
 	if (ret)
 		goto err_uncharge;
 
-	rdma_restrack_add(&ucontext->res);
+	ret = rdma_restrack_add(&ucontext->res);
+	if (ret)
+		goto err_restrack;
 
 	/*
 	 * Make sure that ib_uverbs_get_ucontext() sees the pointer update
@@ -264,6 +266,8 @@ int ib_init_ucontext(struct uverbs_attr_bundle *attrs)
 	up_read(&file->hw_destroy_rwsem);
 	return 0;
 
+err_restrack:
+	ucontext->device->ops.dealloc_ucontext(ucontext);
 err_uncharge:
 	ib_rdmacg_uncharge(&ucontext->cg_obj, ucontext->device,
 			   RDMACG_RESOURCE_HCA_HANDLE);
@@ -449,7 +453,10 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	ret = ib_dev->ops.alloc_pd(pd, &attrs->driver_udata);
 	if (ret)
 		goto err_alloc;
-	rdma_restrack_add(&pd->res);
+
+	ret = rdma_restrack_add(&pd->res);
+	if (ret)
+		goto err_restrack;
 
 	uobj->object = pd;
 	uobj_finalize_uobj_create(uobj, attrs);
@@ -457,6 +464,9 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
 	resp.pd_handle = uobj->id;
 	return uverbs_response(attrs, &resp, sizeof(resp));
 
+err_restrack:
+	ib_dev->ops.dealloc_pd(pd, &attrs->driver_udata);
+
 err_alloc:
 	rdma_restrack_put(&pd->res);
 	kfree(pd);
@@ -752,7 +762,9 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 
 	rdma_restrack_new(&mr->res, RDMA_RESTRACK_MR);
 	rdma_restrack_set_name(&mr->res, NULL);
-	rdma_restrack_add(&mr->res);
+	ret = rdma_restrack_add(&mr->res);
+	if (ret)
+		goto err_restrack;
 
 	uobj->object = mr;
 	uobj_put_obj_read(pd);
@@ -763,6 +775,8 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	resp.mr_handle = uobj->id;
 	return uverbs_response(attrs, &resp, sizeof(resp));
 
+err_restrack:
+	pd->device->ops.dereg_mr(mr, &attrs->driver_udata);
 err_put:
 	uobj_put_obj_read(pd);
 err_free:
@@ -1017,7 +1031,10 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	ret = ib_dev->ops.create_cq(cq, &attr, &attrs->driver_udata);
 	if (ret)
 		goto err_free;
-	rdma_restrack_add(&cq->res);
+
+	ret = rdma_restrack_add(&cq->res);
+	if (ret)
+		goto err_restrack;
 
 	obj->uevent.uobject.object = cq;
 	obj->uevent.event_file = READ_ONCE(attrs->ufile->default_async_file);
@@ -1030,6 +1047,8 @@ static int create_cq(struct uverbs_attr_bundle *attrs,
 	resp.response_length = uverbs_response_length(attrs, sizeof(resp));
 	return uverbs_response(attrs, &resp, sizeof(resp));
 
+err_restrack:
+	ib_dev->ops.destroy_cq(cq, &attrs->driver_udata);
 err_free:
 	rdma_restrack_put(&cq->res);
 	kfree(cq);
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index 8dabd05988b2..838201f5e921 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -131,16 +131,20 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
 	ret = ib_dev->ops.create_cq(cq, &attr, &attrs->driver_udata);
 	if (ret)
 		goto err_free;
+	ret = rdma_restrack_add(&cq->res);
+	if (ret)
+		goto err_restrack;
 
 	obj->uevent.uobject.object = cq;
 	obj->uevent.uobject.user_handle = user_handle;
-	rdma_restrack_add(&cq->res);
 	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_CQ_HANDLE);
 
 	ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_CQ_RESP_CQE, &cq->cqe,
 			     sizeof(cq->cqe));
 	return ret;
 
+err_restrack:
+	ib_dev->ops.destroy_cq(cq, &attrs->driver_udata);
 err_free:
 	rdma_restrack_put(&cq->res);
 	kfree(cq);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 53dd8284260a..3b9003941e59 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -276,12 +276,12 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 	rdma_restrack_set_name(&pd->res, caller);
 
 	ret = device->ops.alloc_pd(pd, NULL);
-	if (ret) {
-		rdma_restrack_put(&pd->res);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
-	rdma_restrack_add(&pd->res);
+	if (ret)
+		goto err_alloc;
+
+	ret = rdma_restrack_add(&pd->res);
+	if (ret)
+		goto err_restrack;
 
 	if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
 		pd->local_dma_lkey = device->local_dma_lkey;
@@ -318,6 +318,13 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 	}
 
 	return pd;
+
+err_restrack:
+	device->ops.dealloc_pd(pd, NULL);
+err_alloc:
+	rdma_restrack_put(&pd->res);
+	kfree(pd);
+	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(__ib_alloc_pd);
 
@@ -2002,14 +2009,21 @@ struct ib_cq *__ib_create_cq(struct ib_device *device,
 	rdma_restrack_set_name(&cq->res, caller);
 
 	ret = device->ops.create_cq(cq, cq_attr, NULL);
-	if (ret) {
-		rdma_restrack_put(&cq->res);
-		kfree(cq);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto err_create;
+
+	ret = rdma_restrack_add(&cq->res);
+	if (ret)
+		goto err_restrack;
 
-	rdma_restrack_add(&cq->res);
 	return cq;
+
+err_restrack:
+	device->ops.destroy_cq(cq, NULL);
+err_create:
+	rdma_restrack_put(&cq->res);
+	kfree(cq);
+	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(__ib_create_cq);
 
@@ -2060,6 +2074,7 @@ struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			     u64 virt_addr, int access_flags)
 {
 	struct ib_mr *mr;
+	int ret;
 
 	if (access_flags & IB_ACCESS_ON_DEMAND) {
 		if (!(pd->device->attrs.device_cap_flags &
@@ -2082,7 +2097,12 @@ struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	rdma_restrack_new(&mr->res, RDMA_RESTRACK_MR);
 	rdma_restrack_parent_name(&mr->res, &pd->res);
-	rdma_restrack_add(&mr->res);
+	ret = rdma_restrack_add(&mr->res);
+	if (ret) {
+		rdma_restrack_put(&mr->res);
+		pd->device->ops.dereg_mr(mr, NULL);
+		mr = ERR_PTR(ret);
+	}
 
 	return mr;
 }
@@ -2139,6 +2159,7 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
 			  u32 max_num_sg)
 {
 	struct ib_mr *mr;
+	int ret;
 
 	if (!pd->device->ops.alloc_mr) {
 		mr = ERR_PTR(-EOPNOTSUPP);
@@ -2166,7 +2187,13 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
 
 	rdma_restrack_new(&mr->res, RDMA_RESTRACK_MR);
 	rdma_restrack_parent_name(&mr->res, &pd->res);
-	rdma_restrack_add(&mr->res);
+	ret = rdma_restrack_add(&mr->res);
+	if (ret) {
+		rdma_restrack_put(&mr->res);
+		pd->device->ops.dereg_mr(mr, NULL);
+		mr = ERR_PTR(ret);
+	}
+
 out:
 	trace_mr_alloc(pd, mr_type, max_num_sg, mr);
 	return mr;
@@ -2191,6 +2218,7 @@ struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
 {
 	struct ib_mr *mr;
 	struct ib_sig_attrs *sig_attrs;
+	int ret;
 
 	if (!pd->device->ops.alloc_mr_integrity ||
 	    !pd->device->ops.map_mr_sg_pi) {
@@ -2227,7 +2255,12 @@ struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
 
 	rdma_restrack_new(&mr->res, RDMA_RESTRACK_MR);
 	rdma_restrack_parent_name(&mr->res, &pd->res);
-	rdma_restrack_add(&mr->res);
+	ret = rdma_restrack_add(&mr->res);
+	if (ret) {
+		rdma_restrack_put(&mr->res);
+		pd->device->ops.dereg_mr(mr, NULL);
+		mr = ERR_PTR(ret);
+	}
 out:
 	trace_mr_integ_alloc(pd, max_num_data_sg, max_num_meta_sg, mr);
 	return mr;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 7/9] RDMA/restrack: Support all QP types
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (5 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 8/9] RDMA/core: Track device memory MRs Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity Leon Romanovsky
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma, Mark Zhang

From: Leon Romanovsky <leonro@mellanox.com>

The latest changes in restrack name handling allowed to simplify
the QP creation code to support all types of QPs.

For example XRC QP are presented with inbox rdmatool.

[leonro@vm ~]$ ibv_xsrq_pingpong &
[leonro@vm ~]$ rdma res show qp
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs]
link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong

Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/core_priv.h           | 29 ++++++-------------
 drivers/infiniband/core/uverbs_cmd.c          |  4 +--
 drivers/infiniband/core/uverbs_std_types_qp.c |  4 +--
 drivers/infiniband/core/verbs.c               | 11 +++----
 include/rdma/ib_verbs.h                       | 10 +++++--
 5 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 7f4bb09efab8..3962a54f9178 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -318,15 +318,12 @@ struct ib_device *ib_device_get_by_index(const struct net *net, u32 index);
 void nldev_init(void);
 void nldev_exit(void);
 
-static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
-					  struct ib_pd *pd,
-					  struct ib_qp_init_attr *attr,
-					  struct ib_udata *udata,
-					  struct ib_uqp_object *uobj)
+static inline struct ib_qp *
+_ib_create_qp(struct ib_device *dev, struct ib_pd *pd,
+	      struct ib_qp_init_attr *attr, struct ib_udata *udata,
+	      struct ib_uqp_object *uobj, const char *caller)
 {
-	enum ib_qp_type qp_type = attr->qp_type;
 	struct ib_qp *qp;
-	bool is_xrc;
 	int ret;
 
 	if (!dev->ops.create_qp)
@@ -348,7 +345,6 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	qp->srq = attr->srq;
 	qp->rwq_ind_tbl = attr->rwq_ind_tbl;
 	qp->event_handler = attr->event_handler;
-	qp->qp_type = attr->qp_type;
 	qp->port = attr->port_num;
 
 	atomic_set(&qp->usecnt, 0);
@@ -357,18 +353,11 @@ static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	INIT_LIST_HEAD(&qp->sig_mrs);
 
 	rdma_restrack_new(&qp->res, RDMA_RESTRACK_QP);
-	/*
-	 * We don't track XRC QPs for now, because they don't have PD
-	 * and more importantly they are created internaly by driver,
-	 * see mlx5 create_dev_resources() as an example.
-	 */
-	is_xrc = qp_type == IB_QPT_XRC_INI || qp_type == IB_QPT_XRC_TGT;
-	if ((qp_type < IB_QPT_MAX && !is_xrc) || qp_type == IB_QPT_DRIVER) {
-		rdma_restrack_parent_name(&qp->res, &pd->res);
-		ret = rdma_restrack_add(&qp->res);
-		if (ret)
-			goto err;
-	}
+	WARN_ONCE(!udata && !caller, "Missing kernel QP owner");
+	rdma_restrack_set_name(&qp->res, udata ? NULL : caller);
+	ret = rdma_restrack_add(&qp->res);
+	if (ret)
+		goto err;
 	return qp;
 err:
 	rdma_restrack_put(&qp->res);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1ba961034514..408a1a4b67f6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1407,8 +1407,8 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
 	if (cmd->qp_type == IB_QPT_XRC_TGT)
 		qp = ib_create_qp(pd, &attr);
 	else
-		qp = _ib_create_qp(device, pd, &attr, &attrs->driver_udata,
-				   obj);
+		qp = _ib_create_qp(device, pd, &attr, &attrs->driver_udata, obj,
+				   NULL);
 
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
diff --git a/drivers/infiniband/core/uverbs_std_types_qp.c b/drivers/infiniband/core/uverbs_std_types_qp.c
index 3bf8dcdfe7eb..198c3cd6f8b9 100644
--- a/drivers/infiniband/core/uverbs_std_types_qp.c
+++ b/drivers/infiniband/core/uverbs_std_types_qp.c
@@ -251,8 +251,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
 	if (attr.qp_type == IB_QPT_XRC_TGT)
 		qp = ib_create_qp(pd, &attr);
 	else
-		qp = _ib_create_qp(device, pd, &attr, &attrs->driver_udata,
-				   obj);
+		qp = _ib_create_qp(device, pd, &attr, &attrs->driver_udata, obj,
+				   NULL);
 
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 3b9003941e59..f21353f3957d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1195,7 +1195,7 @@ static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
 }
 
 /**
- * ib_create_qp - Creates a kernel QP associated with the specified protection
+ * ib_create_named_qp - Creates a kernel QP associated with the specified protection
  *   domain.
  * @pd: The protection domain associated with the QP.
  * @qp_init_attr: A list of initial attributes required to create the
@@ -1204,8 +1204,9 @@ static struct ib_qp *create_xrc_qp_user(struct ib_qp *qp,
  *
  * NOTE: for user qp use ib_create_qp_user with valid udata!
  */
-struct ib_qp *ib_create_qp(struct ib_pd *pd,
-			   struct ib_qp_init_attr *qp_init_attr)
+struct ib_qp *ib_create_named_qp(struct ib_pd *pd,
+				 struct ib_qp_init_attr *qp_init_attr,
+				 const char *caller)
 {
 	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
 	struct ib_qp *qp;
@@ -1230,7 +1231,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	if (qp_init_attr->cap.max_rdma_ctxs)
 		rdma_rw_init_qp(device, qp_init_attr);
 
-	qp = _ib_create_qp(device, pd, qp_init_attr, NULL, NULL);
+	qp = _ib_create_qp(device, pd, qp_init_attr, NULL, NULL, caller);
 	if (IS_ERR(qp))
 		return qp;
 
@@ -1296,7 +1297,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	return ERR_PTR(ret);
 
 }
-EXPORT_SYMBOL(ib_create_qp);
+EXPORT_SYMBOL(ib_create_named_qp);
 
 static const struct {
 	int			valid;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f18502984e6f..0396c6b979a1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3657,8 +3657,14 @@ static inline int ib_post_srq_recv(struct ib_srq *srq,
 					      bad_recv_wr ? : &dummy);
 }
 
-struct ib_qp *ib_create_qp(struct ib_pd *pd,
-			   struct ib_qp_init_attr *qp_init_attr);
+struct ib_qp *ib_create_named_qp(struct ib_pd *pd,
+				 struct ib_qp_init_attr *qp_init_attr,
+				 const char *caller);
+static inline struct ib_qp *ib_create_qp(struct ib_pd *pd,
+					 struct ib_qp_init_attr *init_attr)
+{
+	return ib_create_named_qp(pd, init_attr, KBUILD_MODNAME);
+}
 
 /**
  * ib_modify_qp_with_udata - Modifies the attributes for the specified QP.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 8/9] RDMA/core: Track device memory MRs
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (6 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 7/9] RDMA/restrack: Support all QP types Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-09-26 10:19 ` [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity Leon Romanovsky
  8 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Ariel Levkovich, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

Device memory (DM) are registered as MR during initialization flow,
these MRs were not tracked by resource tracker and had res->valid set
as a false. Update the code to manage them too.

Before this change:
[leonro@vm ~]$ ibv_rc_pingpong -j &
[leonro@vm ~]$ rdma res show mr <-- shows nothing

After this change:
[leonro@vm ~]$ ibv_rc_pingpong -j &
[leonro@vm ~]$ rdma res show mr
dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong

Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 9b22bb553e8b..9ec5ea49e06e 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -31,6 +31,7 @@
  */
 
 #include "rdma_core.h"
+#include "restrack.h"
 #include "uverbs.h"
 #include <rdma/uverbs_std_types.h>
 
@@ -134,6 +135,15 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	atomic_inc(&pd->usecnt);
 	atomic_inc(&dm->usecnt);
 
+	rdma_restrack_new(&mr->res, RDMA_RESTRACK_MR);
+	rdma_restrack_set_name(&mr->res, NULL);
+	ret = rdma_restrack_add(&mr->res);
+	if (ret) {
+		rdma_restrack_put(&mr->res);
+		pd->device->ops.dereg_mr(mr, &attrs->driver_udata);
+		return ret;
+	}
+
 	uobj->object = mr;
 
 	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DM_MR_HANDLE);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity
  2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
                   ` (7 preceding siblings ...)
  2020-09-26 10:19 ` [PATCH rdma-next v3 8/9] RDMA/core: Track device memory MRs Leon Romanovsky
@ 2020-09-26 10:19 ` Leon Romanovsky
  2020-10-02 12:55   ` Jason Gunthorpe
  8 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-26 10:19 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

The valid field was needed to distinguish between supported/not
supported QPs, after the create_qp was changed to support all types,
that field can be dropped in a favor of no_track field.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/restrack.c | 29 ++++++++---------------------
 include/rdma/restrack.h            |  9 ---------
 2 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 593af32d86a0..6ca3e6f3adb5 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -143,7 +143,7 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
 		return container_of(res, struct rdma_counter, res)->device;
 	default:
 		WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
-		return NULL;
+		return ERR_PTR(-EINVAL);
 	}
 }
 
@@ -223,7 +223,7 @@ int __must_check rdma_restrack_add(struct rdma_restrack_entry *res)
 	struct rdma_restrack_root *rt;
 	int ret = 0;
 
-	if (!dev)
+	if (IS_ERR_OR_NULL(dev))
 		return -ENODEV;
 
 	if (res->no_track)
@@ -261,10 +261,7 @@ int __must_check rdma_restrack_add(struct rdma_restrack_entry *res)
 	}
 
 out:
-	if (ret)
-		return ret;
-	res->valid = true;
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(rdma_restrack_add);
 
@@ -323,25 +320,16 @@ EXPORT_SYMBOL(rdma_restrack_put);
  */
 void rdma_restrack_del(struct rdma_restrack_entry *res)
 {
+	struct ib_device *dev = res_to_dev(res);
 	struct rdma_restrack_entry *old;
 	struct rdma_restrack_root *rt;
-	struct ib_device *dev;
 
-	if (!res->valid) {
-		if (res->task) {
-			put_task_struct(res->task);
-			res->task = NULL;
-		}
-		return;
-	}
-
-	if (res->no_track)
+	WARN_ONCE(!dev && res->type != RDMA_RESTRACK_CM_ID,
+		  "IB device should be set for restrack type %s",
+		  type2str(res->type));
+	if (res->no_track || IS_ERR_OR_NULL(dev))
 		goto out;
 
-	dev = res_to_dev(res);
-	if (WARN_ON(!dev))
-		return;
-
 	rt = &dev->res[res->type];
 
 	old = xa_erase(&rt->xa, res->id);
@@ -350,7 +338,6 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	WARN_ON(old != res);
 
 out:
-	res->valid = false;
 	rdma_restrack_put(res);
 	wait_for_completion(&res->comp);
 }
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index 05e18839eaff..a4140c6e2c81 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -59,15 +59,6 @@ enum rdma_restrack_type {
  * struct rdma_restrack_entry - metadata per-entry
  */
 struct rdma_restrack_entry {
-	/**
-	 * @valid: validity indicator
-	 *
-	 * The entries are filled during rdma_restrack_add,
-	 * can be attempted to be free during rdma_restrack_del.
-	 *
-	 * As an example for that, see mlx5 QPs with type MLX5_IB_QPT_HW_GSI
-	 */
-	bool			valid;
 	/**
 	 * @no_track: don't add this entry to restrack DB
 	 *
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-09-26 10:19 ` [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object Leon Romanovsky
@ 2020-10-02 12:42   ` Jason Gunthorpe
  2020-10-02 12:57     ` Leon Romanovsky
  2020-10-02 13:13   ` Jason Gunthorpe
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-02 12:42 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Leon Romanovsky, linux-rdma, Mark Zhang

On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index 12ebacf52958..1abcb01d362f 100644
> +++ b/drivers/infiniband/core/cq.c
> @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
>  		goto out_destroy_cq;
>  	}
>  
> -	rdma_restrack_add(&cq->res);
> +	ret = rdma_restrack_add(&cq->res);
> +	if (ret)
> +		goto out_poll_cq;
> +
>  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
>  	return cq;
>  
> +out_poll_cq:
> +	switch (cq->poll_ctx) {
> +	case IB_POLL_SOFTIRQ:
> +		irq_poll_disable(&cq->iop);
> +		break;
> +	case IB_POLL_WORKQUEUE:
> +	case IB_POLL_UNBOUND_WORKQUEUE:
> +		cancel_work_sync(&cq->work);

This error unwind is *technically* in the wrong order, it is wrong in
ib_free_cq too which is an actual bug.

The cq->comp_handler should be set before calling create_cq and undone
after calling destroy_wq. We can do this right now that the
allocations have been reworked.

Otherwise there is no assurance the ib_cq_completion_workqueue() won't
be called after this cancel == use after free

Also, you need to check all the rdma_restrack_del()'s, they should
always be *before* destroying the HW object, eg ib_free_cq() has it
too late. Similarly the add should always be after the HW object is
allocated.

For instance fill_res_cq_entry() calls 

  dev->ops.fill_res_cq_entry(msg, cq) 

on an already free'd HW object with this arrangment.

These are pre-existing things so lets fix them seperately please

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity
  2020-09-26 10:19 ` [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity Leon Romanovsky
@ 2020-10-02 12:55   ` Jason Gunthorpe
  2020-10-02 13:05     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-02 12:55 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Leon Romanovsky, linux-rdma

On Sat, Sep 26, 2020 at 01:19:38PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> The valid field was needed to distinguish between supported/not
> supported QPs, after the create_qp was changed to support all types,
> that field can be dropped in a favor of no_track field.
> 
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/restrack.c | 29 ++++++++---------------------
>  include/rdma/restrack.h            |  9 ---------
>  2 files changed, 8 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> index 593af32d86a0..6ca3e6f3adb5 100644
> +++ b/drivers/infiniband/core/restrack.c
> @@ -143,7 +143,7 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
>  		return container_of(res, struct rdma_counter, res)->device;
>  	default:
>  		WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
> -		return NULL;
> +		return ERR_PTR(-EINVAL);
>  	}
>  }
>  
> @@ -223,7 +223,7 @@ int __must_check rdma_restrack_add(struct rdma_restrack_entry *res)
>  	struct rdma_restrack_root *rt;
>  	int ret = 0;
>  
> -	if (!dev)
> +	if (IS_ERR_OR_NULL(dev))
>  		return -ENODEV;

dev can't be NULL

Not sure why this was changed? The error code is always thrown away,
what was wrong with keeping it as NULL? 

Now that all callers check the return code this should be a WARN_ON as
calling restrack_add in a way that is guarenteed to fail us a ULP
error.

> +	WARN_ONCE(!dev && res->type != RDMA_RESTRACK_CM_ID,
> +		  "IB device should be set for restrack type %s",
> +		  type2str(res->type));
> +	if (res->no_track || IS_ERR_OR_NULL(dev))
>  		goto out;

dev is never NULL so that WARN_ONCE doesn't work

Why does this exclude CM_ID? I thought all the fixing in the cm was so
restrack_add and _del were prefectly paired and a device must be
present?

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-02 12:42   ` Jason Gunthorpe
@ 2020-10-02 12:57     ` Leon Romanovsky
  2020-10-02 13:16       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-10-02 12:57 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > index 12ebacf52958..1abcb01d362f 100644
> > +++ b/drivers/infiniband/core/cq.c
> > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> >  		goto out_destroy_cq;
> >  	}
> >
> > -	rdma_restrack_add(&cq->res);
> > +	ret = rdma_restrack_add(&cq->res);
> > +	if (ret)
> > +		goto out_poll_cq;
> > +
> >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> >  	return cq;
> >
> > +out_poll_cq:
> > +	switch (cq->poll_ctx) {
> > +	case IB_POLL_SOFTIRQ:
> > +		irq_poll_disable(&cq->iop);
> > +		break;
> > +	case IB_POLL_WORKQUEUE:
> > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > +		cancel_work_sync(&cq->work);
>
> This error unwind is *technically* in the wrong order, it is wrong in
> ib_free_cq too which is an actual bug.
>
> The cq->comp_handler should be set before calling create_cq and undone
> after calling destroy_wq. We can do this right now that the
> allocations have been reworked.
>
> Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> be called after this cancel == use after free
>
> Also, you need to check all the rdma_restrack_del()'s, they should
> always be *before* destroying the HW object, eg ib_free_cq() has it
> too late. Similarly the add should always be after the HW object is
> allocated.

It is true to not converted object (QP and MR), everything that was
converted has two steps: rdma_restrack_put() before creation,
rdma_restrack_add() right after creation and rdma_restrack_del() after
successful destroy.

>
> For instance fill_res_cq_entry() calls
>
>   dev->ops.fill_res_cq_entry(msg, cq)
>
> on an already free'd HW object with this arrangment.
>
> These are pre-existing things so lets fix them seperately please

I'll fix later.

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity
  2020-10-02 12:55   ` Jason Gunthorpe
@ 2020-10-02 13:05     ` Leon Romanovsky
  0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-10-02 13:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On Fri, Oct 02, 2020 at 09:55:35AM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 26, 2020 at 01:19:38PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > The valid field was needed to distinguish between supported/not
> > supported QPs, after the create_qp was changed to support all types,
> > that field can be dropped in a favor of no_track field.
> >
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/restrack.c | 29 ++++++++---------------------
> >  include/rdma/restrack.h            |  9 ---------
> >  2 files changed, 8 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> > index 593af32d86a0..6ca3e6f3adb5 100644
> > +++ b/drivers/infiniband/core/restrack.c
> > @@ -143,7 +143,7 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
> >  		return container_of(res, struct rdma_counter, res)->device;
> >  	default:
> >  		WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
> > -		return NULL;
> > +		return ERR_PTR(-EINVAL);
> >  	}
> >  }
> >
> > @@ -223,7 +223,7 @@ int __must_check rdma_restrack_add(struct rdma_restrack_entry *res)
> >  	struct rdma_restrack_root *rt;
> >  	int ret = 0;
> >
> > -	if (!dev)
> > +	if (IS_ERR_OR_NULL(dev))
> >  		return -ENODEV;
>
> dev can't be NULL
>
> Not sure why this was changed? The error code is always thrown away,
> what was wrong with keeping it as NULL?
>
> Now that all callers check the return code this should be a WARN_ON as
> calling restrack_add in a way that is guarenteed to fail us a ULP
> error.
>
> > +	WARN_ONCE(!dev && res->type != RDMA_RESTRACK_CM_ID,
> > +		  "IB device should be set for restrack type %s",
> > +		  type2str(res->type));
> > +	if (res->no_track || IS_ERR_OR_NULL(dev))
> >  		goto out;
>
> dev is never NULL so that WARN_ONCE doesn't work
>
> Why does this exclude CM_ID? I thought all the fixing in the cm was so
> restrack_add and _del were prefectly paired and a device must be
> present?

Both NULL and RDMA_RESTRACK_CM_ID are not needed, this is how I
discovered the need to clean the cma.c, but left it here.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-09-26 10:19 ` [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object Leon Romanovsky
  2020-10-02 12:42   ` Jason Gunthorpe
@ 2020-10-02 13:13   ` Jason Gunthorpe
  2020-10-04  6:04     ` Leon Romanovsky
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-02 13:13 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Leon Romanovsky, linux-rdma, Mark Zhang

On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> @@ -449,7 +453,10 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
>  	ret = ib_dev->ops.alloc_pd(pd, &attrs->driver_udata);
>  	if (ret)
>  		goto err_alloc;
> -	rdma_restrack_add(&pd->res);
> +
> +	ret = rdma_restrack_add(&pd->res);
> +	if (ret)
> +		goto err_restrack;
>  
>  	uobj->object = pd;
>  	uobj_finalize_uobj_create(uobj, attrs);
> @@ -457,6 +464,9 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
>  	resp.pd_handle = uobj->id;
>  	return uverbs_response(attrs, &resp, sizeof(resp));
>  
> +err_restrack:
> +	ib_dev->ops.dealloc_pd(pd, &attrs->driver_udata);

There can be no failure between allocating the HW object and calling
uobj_finalize_uobj_create(). That was the whole point of that scheme.
It is really important that be kept.

Now that destroys are allowed to fail we aboslutely cannot have any
open coded destroys like this *anywhere* in the uobject system.

I think you need to go back to a model where the xarray allocation can
fail but we can still call del, otherwise the error unwind is a
complete nightmare.

This also has problems like this:

int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
{
	rdma_restrack_del(&mr->res);
	ret = mr->device->ops.dereg_mr(mr, udata);
	if (!ret) {

With the uobject destroy retry scheme restrack_del will be called
multiple times.

I think this is pretty simple to solve, write del as this:

void rdma_restrack_del(struct rdma_restrack_entry *res)
{
	struct ib_device *dev = res_to_dev(res);
	struct rdma_restrack_entry *old;
	struct rdma_restrack_root *rt;

	if (WARN_ON(!dev))
		return

	rt = &dev->res[res->type];
	if (xa_cmpxchg(&rt->xa, res->id, res, NULL, GFP_KERNEL) != res)
		return;
	rdma_restrack_put(res);
	wait_for_completion(&res->comp);
}

There is no need to do the wait_for_completion if we didn't change the
xarray.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-02 12:57     ` Leon Romanovsky
@ 2020-10-02 13:16       ` Jason Gunthorpe
  2020-10-04  6:48         ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-02 13:16 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > index 12ebacf52958..1abcb01d362f 100644
> > > +++ b/drivers/infiniband/core/cq.c
> > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > >  		goto out_destroy_cq;
> > >  	}
> > >
> > > -	rdma_restrack_add(&cq->res);
> > > +	ret = rdma_restrack_add(&cq->res);
> > > +	if (ret)
> > > +		goto out_poll_cq;
> > > +
> > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > >  	return cq;
> > >
> > > +out_poll_cq:
> > > +	switch (cq->poll_ctx) {
> > > +	case IB_POLL_SOFTIRQ:
> > > +		irq_poll_disable(&cq->iop);
> > > +		break;
> > > +	case IB_POLL_WORKQUEUE:
> > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > +		cancel_work_sync(&cq->work);
> >
> > This error unwind is *technically* in the wrong order, it is wrong in
> > ib_free_cq too which is an actual bug.
> >
> > The cq->comp_handler should be set before calling create_cq and undone
> > after calling destroy_wq. We can do this right now that the
> > allocations have been reworked.
> >
> > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > be called after this cancel == use after free
> >
> > Also, you need to check all the rdma_restrack_del()'s, they should
> > always be *before* destroying the HW object, eg ib_free_cq() has it
> > too late. Similarly the add should always be after the HW object is
> > allocated.
> 
> It is true to not converted object (QP and MR), everything that was
> converted has two steps: rdma_restrack_put() before creation,
> rdma_restrack_add() right after creation and rdma_restrack_del() after
> successful destroy.

It must be before destroy not after.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-02 13:13   ` Jason Gunthorpe
@ 2020-10-04  6:04     ` Leon Romanovsky
  0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-10-04  6:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Fri, Oct 02, 2020 at 10:13:18AM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > @@ -449,7 +453,10 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
> >  	ret = ib_dev->ops.alloc_pd(pd, &attrs->driver_udata);
> >  	if (ret)
> >  		goto err_alloc;
> > -	rdma_restrack_add(&pd->res);
> > +
> > +	ret = rdma_restrack_add(&pd->res);
> > +	if (ret)
> > +		goto err_restrack;
> >
> >  	uobj->object = pd;
> >  	uobj_finalize_uobj_create(uobj, attrs);
> > @@ -457,6 +464,9 @@ static int ib_uverbs_alloc_pd(struct uverbs_attr_bundle *attrs)
> >  	resp.pd_handle = uobj->id;
> >  	return uverbs_response(attrs, &resp, sizeof(resp));
> >
> > +err_restrack:
> > +	ib_dev->ops.dealloc_pd(pd, &attrs->driver_udata);
>
> There can be no failure between allocating the HW object and calling
> uobj_finalize_uobj_create(). That was the whole point of that scheme.
> It is really important that be kept.
>
> Now that destroys are allowed to fail we aboslutely cannot have any
> open coded destroys like this *anywhere* in the uobject system.
>
> I think you need to go back to a model where the xarray allocation can
> fail but we can still call del, otherwise the error unwind is a
> complete nightmare.
>
> This also has problems like this:
>
> int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
> {
> 	rdma_restrack_del(&mr->res);
> 	ret = mr->device->ops.dereg_mr(mr, udata);
> 	if (!ret) {
>
> With the uobject destroy retry scheme restrack_del will be called
> multiple times.
>
> I think this is pretty simple to solve, write del as this:
>
> void rdma_restrack_del(struct rdma_restrack_entry *res)
> {
> 	struct ib_device *dev = res_to_dev(res);
> 	struct rdma_restrack_entry *old;
> 	struct rdma_restrack_root *rt;
>
> 	if (WARN_ON(!dev))
> 		return
>
> 	rt = &dev->res[res->type];
> 	if (xa_cmpxchg(&rt->xa, res->id, res, NULL, GFP_KERNEL) != res)
> 		return;
> 	rdma_restrack_put(res);
> 	wait_for_completion(&res->comp);
> }
>
> There is no need to do the wait_for_completion if we didn't change the
> xarray.

We still need to do it, because restrack does two things at the same
time: manages object lifetime (rdma_restrack_new and rdma_restrack_put)
and stores HW IDs (rdma_restrack_add).

For the objects that marked as no_track we still want to ensure that
their allocations are correct.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-02 13:16       ` Jason Gunthorpe
@ 2020-10-04  6:48         ` Leon Romanovsky
  2020-10-04 12:32           ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-10-04  6:48 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > index 12ebacf52958..1abcb01d362f 100644
> > > > +++ b/drivers/infiniband/core/cq.c
> > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > >  		goto out_destroy_cq;
> > > >  	}
> > > >
> > > > -	rdma_restrack_add(&cq->res);
> > > > +	ret = rdma_restrack_add(&cq->res);
> > > > +	if (ret)
> > > > +		goto out_poll_cq;
> > > > +
> > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > >  	return cq;
> > > >
> > > > +out_poll_cq:
> > > > +	switch (cq->poll_ctx) {
> > > > +	case IB_POLL_SOFTIRQ:
> > > > +		irq_poll_disable(&cq->iop);
> > > > +		break;
> > > > +	case IB_POLL_WORKQUEUE:
> > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > +		cancel_work_sync(&cq->work);
> > >
> > > This error unwind is *technically* in the wrong order, it is wrong in
> > > ib_free_cq too which is an actual bug.
> > >
> > > The cq->comp_handler should be set before calling create_cq and undone
> > > after calling destroy_wq. We can do this right now that the
> > > allocations have been reworked.
> > >
> > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > be called after this cancel == use after free
> > >
> > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > too late. Similarly the add should always be after the HW object is
> > > allocated.
> >
> > It is true to not converted object (QP and MR), everything that was
> > converted has two steps: rdma_restrack_put() before creation,
> > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > successful destroy.
>
> It must be before destroy not after.

We need rdma_restrack_put() after destroy to release memory.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-04  6:48         ` Leon Romanovsky
@ 2020-10-04 12:32           ` Jason Gunthorpe
  2020-10-04 12:49             ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-04 12:32 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Sun, Oct 04, 2020 at 09:48:18AM +0300, Leon Romanovsky wrote:
> On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > > index 12ebacf52958..1abcb01d362f 100644
> > > > > +++ b/drivers/infiniband/core/cq.c
> > > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > > >  		goto out_destroy_cq;
> > > > >  	}
> > > > >
> > > > > -	rdma_restrack_add(&cq->res);
> > > > > +	ret = rdma_restrack_add(&cq->res);
> > > > > +	if (ret)
> > > > > +		goto out_poll_cq;
> > > > > +
> > > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > > >  	return cq;
> > > > >
> > > > > +out_poll_cq:
> > > > > +	switch (cq->poll_ctx) {
> > > > > +	case IB_POLL_SOFTIRQ:
> > > > > +		irq_poll_disable(&cq->iop);
> > > > > +		break;
> > > > > +	case IB_POLL_WORKQUEUE:
> > > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > > +		cancel_work_sync(&cq->work);
> > > >
> > > > This error unwind is *technically* in the wrong order, it is wrong in
> > > > ib_free_cq too which is an actual bug.
> > > >
> > > > The cq->comp_handler should be set before calling create_cq and undone
> > > > after calling destroy_wq. We can do this right now that the
> > > > allocations have been reworked.
> > > >
> > > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > > be called after this cancel == use after free
> > > >
> > > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > > too late. Similarly the add should always be after the HW object is
> > > > allocated.
> > >
> > > It is true to not converted object (QP and MR), everything that was
> > > converted has two steps: rdma_restrack_put() before creation,
> > > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > > successful destroy.
> >
> > It must be before destroy not after.
> 
> We need rdma_restrack_put() after destroy to release memory.

The netlink ops must be blocked before ops->destory and the memory
freed after ops->destroy success.

It must work like that since the fill stuff was added as ops - no
choice.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-04 12:32           ` Jason Gunthorpe
@ 2020-10-04 12:49             ` Leon Romanovsky
  2020-10-04 13:03               ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-10-04 12:49 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Sun, Oct 04, 2020 at 09:32:26AM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 04, 2020 at 09:48:18AM +0300, Leon Romanovsky wrote:
> > On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > > > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > > > index 12ebacf52958..1abcb01d362f 100644
> > > > > > +++ b/drivers/infiniband/core/cq.c
> > > > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > > > >  		goto out_destroy_cq;
> > > > > >  	}
> > > > > >
> > > > > > -	rdma_restrack_add(&cq->res);
> > > > > > +	ret = rdma_restrack_add(&cq->res);
> > > > > > +	if (ret)
> > > > > > +		goto out_poll_cq;
> > > > > > +
> > > > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > > > >  	return cq;
> > > > > >
> > > > > > +out_poll_cq:
> > > > > > +	switch (cq->poll_ctx) {
> > > > > > +	case IB_POLL_SOFTIRQ:
> > > > > > +		irq_poll_disable(&cq->iop);
> > > > > > +		break;
> > > > > > +	case IB_POLL_WORKQUEUE:
> > > > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > > > +		cancel_work_sync(&cq->work);
> > > > >
> > > > > This error unwind is *technically* in the wrong order, it is wrong in
> > > > > ib_free_cq too which is an actual bug.
> > > > >
> > > > > The cq->comp_handler should be set before calling create_cq and undone
> > > > > after calling destroy_wq. We can do this right now that the
> > > > > allocations have been reworked.
> > > > >
> > > > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > > > be called after this cancel == use after free
> > > > >
> > > > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > > > too late. Similarly the add should always be after the HW object is
> > > > > allocated.
> > > >
> > > > It is true to not converted object (QP and MR), everything that was
> > > > converted has two steps: rdma_restrack_put() before creation,
> > > > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > > > successful destroy.
> > >
> > > It must be before destroy not after.
> >
> > We need rdma_restrack_put() after destroy to release memory.
>
> The netlink ops must be blocked before ops->destory and the memory
> freed after ops->destroy success.
>
> It must work like that since the fill stuff was added as ops - no
> choice.

So I will need to separate _del() to two calls, one is real _del() and
another _put().

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object
  2020-10-04 12:49             ` Leon Romanovsky
@ 2020-10-04 13:03               ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2020-10-04 13:03 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, linux-rdma, Mark Zhang

On Sun, Oct 04, 2020 at 03:49:20PM +0300, Leon Romanovsky wrote:
> On Sun, Oct 04, 2020 at 09:32:26AM -0300, Jason Gunthorpe wrote:
> > On Sun, Oct 04, 2020 at 09:48:18AM +0300, Leon Romanovsky wrote:
> > > On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > > > > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > > > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > > > > index 12ebacf52958..1abcb01d362f 100644
> > > > > > > +++ b/drivers/infiniband/core/cq.c
> > > > > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > > > > >  		goto out_destroy_cq;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	rdma_restrack_add(&cq->res);
> > > > > > > +	ret = rdma_restrack_add(&cq->res);
> > > > > > > +	if (ret)
> > > > > > > +		goto out_poll_cq;
> > > > > > > +
> > > > > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > > > > >  	return cq;
> > > > > > >
> > > > > > > +out_poll_cq:
> > > > > > > +	switch (cq->poll_ctx) {
> > > > > > > +	case IB_POLL_SOFTIRQ:
> > > > > > > +		irq_poll_disable(&cq->iop);
> > > > > > > +		break;
> > > > > > > +	case IB_POLL_WORKQUEUE:
> > > > > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > > > > +		cancel_work_sync(&cq->work);
> > > > > >
> > > > > > This error unwind is *technically* in the wrong order, it is wrong in
> > > > > > ib_free_cq too which is an actual bug.
> > > > > >
> > > > > > The cq->comp_handler should be set before calling create_cq and undone
> > > > > > after calling destroy_wq. We can do this right now that the
> > > > > > allocations have been reworked.
> > > > > >
> > > > > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > > > > be called after this cancel == use after free
> > > > > >
> > > > > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > > > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > > > > too late. Similarly the add should always be after the HW object is
> > > > > > allocated.
> > > > >
> > > > > It is true to not converted object (QP and MR), everything that was
> > > > > converted has two steps: rdma_restrack_put() before creation,
> > > > > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > > > > successful destroy.
> > > >
> > > > It must be before destroy not after.
> > >
> > > We need rdma_restrack_put() after destroy to release memory.
> >
> > The netlink ops must be blocked before ops->destory and the memory
> > freed after ops->destroy success.
> >
> > It must work like that since the fill stuff was added as ops - no
> > choice.
> 
> So I will need to separate _del() to two calls, one is real _del() and
> another _put().

I think you end up with destroy being like

restrack_remove_from_xarray_and_stop_nl()
rc = ops->destroy()
if (rc)
   restrack_return_to_xarray()
   return rc

restrack_put_for_freeing_memory()

It ends up with *two* refcounts, an kref for the memory lifetime and a
refcount_t for the HW object lifetime (basically HW object destroy
rwlock)

To solve the immediate problems I'd suggest something like

static inline int __rdma_destroy_hw_obj(struct restrack *res, int destroy_rc);

#define rdma_destroy_hw_obj(restrack, op, args ..) \
  ({ __rdma_destroy_hw_ob_pre(restrack); __rdma_destroy_hw_obj(restrack, op(args, ## __VA_ARGS__));})

Which does the sequencing above

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-10-04 13:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-26 10:19 [PATCH rdma-next v3 0/9] Track memory allocation with restrack DB help Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 1/9] RDMA/core: Allow drivers to disable restrack DB Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 2/9] RDMA/counter: Combine allocation and bind logic Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 3/9] RDMA/restrack: Store all special QPs in restrack DB Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 4/9] RDMA/cma: Add missing error handling of listen_id Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 5/9] RDMA/cma: Be strict with attaching to CMA device Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object Leon Romanovsky
2020-10-02 12:42   ` Jason Gunthorpe
2020-10-02 12:57     ` Leon Romanovsky
2020-10-02 13:16       ` Jason Gunthorpe
2020-10-04  6:48         ` Leon Romanovsky
2020-10-04 12:32           ` Jason Gunthorpe
2020-10-04 12:49             ` Leon Romanovsky
2020-10-04 13:03               ` Jason Gunthorpe
2020-10-02 13:13   ` Jason Gunthorpe
2020-10-04  6:04     ` Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 7/9] RDMA/restrack: Support all QP types Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 8/9] RDMA/core: Track device memory MRs Leon Romanovsky
2020-09-26 10:19 ` [PATCH rdma-next v3 9/9] RDMA/restrack: Drop valid restrack field as source of ambiguity Leon Romanovsky
2020-10-02 12:55   ` Jason Gunthorpe
2020-10-02 13:05     ` Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.