* [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c
@ 2022-02-08 21:16 Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe Bob Pearson
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

This patch series completes the separation of the mc_grp and mc_elem
object pools from rxe_pool.c and moves their code into rxe_mcast.c.
This makes sense because these two pools differ from all the others:
they are the only ones whose objects are not shared with rdma-core,
and the only ones that are looked up by key rather than by index.
This change will enable a significant simplification of the normal
object pools.
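
For context, the two lookup styles side by side (a simplified sketch
using the existing rxe pool API; the variable names are illustrative):

	/* indexed pools: objects shared with rdma-core, found by a
	 * small integer such as the QP number
	 */
	qp = rxe_pool_get_index(&rxe->qp_pool, qpn);

	/* keyed pools: multicast objects found by comparing a key,
	 * here the multicast GID
	 */
	grp = rxe_pool_get_key(&rxe->mc_grp_pool, &mgid);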

This patch series applies cleanly to the current for-next branch at
commit 0d9c00117b8a57a361b27f7bd94284c94155f039 (for-next).

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v11
  Restructured the patch series to simplify it and make each
  individual patch simpler. Addressed additional issues raised by
  Jason Gunthorpe in his review of the last version.
v10
  Corrected issues reported by Jason Gunthorpe
  Separated patches 01-17 from the remaining patches; they will be
  submitted later.
v9
  Corrected issues reported by Jason Gunthorpe
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed a type in __rxe_fini() in rxe_pool.c.
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (11):
  RDMA/rxe: Move mcg_lock to rxe
  RDMA/rxe: Use kzmalloc/kfree for mca
  RDMA/rxe: Replace grp by mcg, mce by mca
  RDMA/rxe: Replace int num_qp by atomic_t qp_num
  RDMA/rxe: Replace pool key by rxe->mcg_tree
  RDMA/rxe: Remove key'ed object support
  RDMA/rxe: Remove mcg from rxe pools
  RDMA/rxe: Add code to cleanup mcast memory
  RDMA/rxe: Finish cleanup of rxe_mcast.c
  RDMA/rxe: For mcast copy qp list to temp array
  RDMA/rxe: Convert mca read locking to RCU

 drivers/infiniband/sw/rxe/rxe.c       |  22 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   |   4 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 713 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_pool.c  | 137 -----
 drivers/infiniband/sw/rxe/rxe_pool.h  |  42 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  | 113 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  16 +-
 7 files changed, 612 insertions(+), 435 deletions(-)
 rewrite drivers/infiniband/sw/rxe/rxe_mcast.c (67%)

-- 
2.32.0



* [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-11 19:00   ` Jason Gunthorpe
  2022-02-08 21:16 ` [PATCH for-next v11 02/11] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace mcg->mcg_lock and mc_grp_pool->pool_lock with a single
per-device lock, rxe->mcg_lock. This is the first of several steps
intended to decouple the mc_grp and mc_elem objects from the rxe
pool code.
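
In outline, the change swaps the per-object lock for the per-device
one (a simplified sketch; the real changes are in the diff below):

	/* before: each mcast group carried its own lock */
	spin_lock_bh(&grp->mcg_lock);
	list_add(&elem->qp_list, &grp->qp_list);
	spin_unlock_bh(&grp->mcg_lock);

	/* after: one per-device lock covers all multicast state */
	spin_lock_bh(&rxe->mcg_lock);
	list_add(&elem->qp_list, &grp->qp_list);
	spin_unlock_bh(&rxe->mcg_lock);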

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_mcast.c | 19 +++++++++----------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  3 ++-
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index fab291245366..e74c4216b314 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -211,6 +211,8 @@ static int rxe_init(struct rxe_dev *rxe)
 	spin_lock_init(&rxe->pending_lock);
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
+	spin_lock_init(&rxe->mcg_lock);
+
 	mutex_init(&rxe->usdev_lock);
 
 	return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 9336295c4ee2..4828274efbd4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,7 +25,7 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold mc_grp_pool->pool_lock */
+/* caller should hold rxe->mcg_lock */
 static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
 				     union ib_gid *mgid)
@@ -38,7 +38,6 @@ static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(&grp->qp_list);
-	spin_lock_init(&grp->mcg_lock);
 	grp->rxe = rxe;
 	rxe_add_key_locked(grp, mgid);
 
@@ -62,7 +61,7 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (rxe->attr.max_mcast_qp_attach == 0)
 		return -EINVAL;
 
-	write_lock_bh(&pool->pool_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	grp = rxe_pool_get_key_locked(pool, mgid);
 	if (grp)
@@ -70,13 +69,13 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	grp = create_grp(rxe, pool, mgid);
 	if (IS_ERR(grp)) {
-		write_unlock_bh(&pool->pool_lock);
+		spin_unlock_bh(&rxe->mcg_lock);
 		err = PTR_ERR(grp);
 		return err;
 	}
 
 done:
-	write_unlock_bh(&pool->pool_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	*grp_p = grp;
 	return 0;
 }
@@ -88,7 +87,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *elem;
 
 	/* check to see of the qp is already a member of the group */
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry(elem, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
 			err = 0;
@@ -118,7 +117,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	err = 0;
 out:
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	return err;
 }
 
@@ -132,7 +131,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	if (!grp)
 		goto err1;
 
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
@@ -140,7 +139,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			grp->num_qp--;
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&grp->mcg_lock);
+			spin_unlock_bh(&rxe->mcg_lock);
 			rxe_drop_ref(elem);
 			rxe_drop_ref(grp);	/* ref held by QP */
 			rxe_drop_ref(grp);	/* ref from get_key */
@@ -148,7 +147,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		}
 	}
 
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	rxe_drop_ref(grp);			/* ref from get_key */
 err1:
 	return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7ff6b53555f4..a084b5d69937 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -250,7 +250,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	/* this is unreliable datagram service so we let
 	 * failures to deliver a multicast packet to a
@@ -298,7 +298,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		}
 	}
 
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 
 	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 55f8ed2bc621..9940c69cbb63 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,7 +353,6 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
-	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	union ib_gid		mgid;
@@ -399,6 +398,8 @@ struct rxe_dev {
 	struct rxe_pool		mc_grp_pool;
 	struct rxe_pool		mc_elem_pool;
 
+	spinlock_t		mcg_lock;
+
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
 
-- 
2.32.0



* [PATCH for-next v11 02/11] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 03/11] RDMA/rxe: Replace grp by mcg, mce by mca Bob Pearson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Remove rxe_mca (formerly rxe_mc_elem) from the rxe pools and use
kzalloc and kfree to allocate and free mca objects in rxe_mcast.c.
Call kzalloc outside of the spinlocks to avoid having to use
GFP_ATOMIC.
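
The allocation pattern, in simplified form (a sketch of the approach
taken in the diff below, without the full error handling):

	/* allocate outside the lock so GFP_KERNEL can be used */
	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
	if (!mca)
		return -ENOMEM;

	spin_lock_bh(&rxe->mcg_lock);
	/* re-check under the lock in case another thread attached
	 * the qp while the lock was dropped
	 */
	list_for_each_entry(tmp, &grp->qp_list, qp_list) {
		if (tmp->qp == qp) {
			kfree(mca);	/* lost the race, discard the spare */
			goto out;
		}
	}
	mca->qp = qp;
	list_add(&mca->qp_list, &grp->qp_list);
out:
	spin_unlock_bh(&rxe->mcg_lock);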

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   8 --
 drivers/infiniband/sw/rxe/rxe_mcast.c | 191 ++++++++++++++++----------
 drivers/infiniband/sw/rxe/rxe_pool.c  |   5 -
 drivers/infiniband/sw/rxe/rxe_pool.h  |   3 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   2 -
 5 files changed, 120 insertions(+), 89 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index e74c4216b314..7386a51b953d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 	rxe_pool_cleanup(&rxe->mc_grp_pool);
-	rxe_pool_cleanup(&rxe->mc_elem_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err9;
 
-	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
-			    rxe->attr.max_total_mcast_qp_attach);
-	if (err)
-		goto err10;
-
 	return 0;
 
-err10:
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 err9:
 	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 4828274efbd4..3c06b0590c82 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,94 +26,102 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold rxe->mcg_lock */
-static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
-				     struct rxe_pool *pool,
-				     union ib_gid *mgid)
+static struct rxe_mcg *__rxe_create_grp(struct rxe_dev *rxe,
+					struct rxe_pool *pool,
+					union ib_gid *mgid)
 {
-	int err;
 	struct rxe_mcg *grp;
+	int err;
 
-	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
+	grp = rxe_alloc_locked(pool);
 	if (!grp)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&grp->qp_list);
-	grp->rxe = rxe;
-	rxe_add_key_locked(grp, mgid);
-
 	err = rxe_mcast_add(rxe, mgid);
 	if (unlikely(err)) {
-		rxe_drop_key_locked(grp);
 		rxe_drop_ref(grp);
 		return ERR_PTR(err);
 	}
 
+	INIT_LIST_HEAD(&grp->qp_list);
+	grp->rxe = rxe;
+
+	/* rxe_alloc_locked takes a ref on grp but that will be
+	 * dropped when grp goes out of scope. We need to take a ref
+	 * on the pointer that will be saved in the red-black tree
+	 * by rxe_add_key and used to lookup grp from mgid later.
+	 * Adding key makes object visible to outside so this should
+	 * be done last after the object is ready.
+	 */
+	rxe_add_ref(grp);
+	rxe_add_key_locked(grp, mgid);
+
 	return grp;
 }
 
-static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mcg **grp_p)
+static struct rxe_mcg *rxe_mcast_get_grp(struct rxe_dev *rxe,
+					 union ib_gid *mgid)
 {
-	int err;
 	struct rxe_mcg *grp;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	spin_lock_bh(&rxe->mcg_lock);
-
 	grp = rxe_pool_get_key_locked(pool, mgid);
-	if (grp)
-		goto done;
-
-	grp = create_grp(rxe, pool, mgid);
-	if (IS_ERR(grp)) {
-		spin_unlock_bh(&rxe->mcg_lock);
-		err = PTR_ERR(grp);
-		return err;
-	}
-
-done:
+	if (!grp)
+		grp = __rxe_create_grp(rxe, pool, mgid);
 	spin_unlock_bh(&rxe->mcg_lock);
-	*grp_p = grp;
-	return 0;
+
+	return grp;
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mcg *grp)
+				  struct rxe_mcg *grp)
 {
+	struct rxe_mca *mca, *tmp;
 	int err;
-	struct rxe_mca *elem;
 
-	/* check to see of the qp is already a member of the group */
+	/* check to see if the qp is already a member of the group */
+	spin_lock_bh(&rxe->mcg_lock);
+	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			spin_unlock_bh(&rxe->mcg_lock);
+			return 0;
+		}
+	}
+	spin_unlock_bh(&rxe->mcg_lock);
+
+	/* speculative alloc new mca without using GFP_ATOMIC */
+	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!mca)
+		return -ENOMEM;
+
 	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(elem, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
+	/* re-check to see if someone else just attached qp */
+	list_for_each_entry(tmp, &grp->qp_list, qp_list) {
+		if (tmp->qp == qp) {
+			kfree(mca);
 			err = 0;
 			goto out;
 		}
 	}
 
+	/* check limits after checking if already attached */
 	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+		kfree(mca);
 		err = -ENOMEM;
 		goto out;
 	}
 
-	elem = rxe_alloc_locked(&rxe->mc_elem_pool);
-	if (!elem) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	/* each qp holds a ref on the grp */
-	rxe_add_ref(grp);
+	/* protect pointer to qp in mca */
+	rxe_add_ref(qp);
+	mca->qp = qp;
 
-	grp->num_qp++;
-	elem->qp = qp;
 	atomic_inc(&qp->mcg_num);
-
-	list_add(&elem->qp_list, &grp->qp_list);
+	grp->num_qp++;
+	list_add(&mca->qp_list, &grp->qp_list);
 
 	err = 0;
 out:
@@ -121,45 +129,78 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
+/* caller should be holding rxe->mcg_lock */
+static void __rxe_destroy_grp(struct rxe_mcg *grp)
+{
+	/* first remove grp from red-black tree then drop ref */
+	rxe_drop_key_locked(grp);
+	rxe_drop_ref(grp);
+
+	rxe_mcast_delete(grp->rxe, &grp->mgid);
+}
+
+static void rxe_destroy_grp(struct rxe_mcg *grp)
+{
+	struct rxe_dev *rxe = grp->rxe;
+
+	spin_lock_bh(&rxe->mcg_lock);
+	__rxe_destroy_grp(grp);
+	spin_unlock_bh(&rxe->mcg_lock);
+}
+
+void rxe_mc_cleanup(struct rxe_pool_elem *elem)
+{
+	/* nothing left to do for now */
+}
+
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
 	struct rxe_mcg *grp;
-	struct rxe_mca *elem, *tmp;
-
-	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
-	if (!grp)
-		goto err1;
+	struct rxe_mca *mca, *tmp;
+	int err;
 
 	spin_lock_bh(&rxe->mcg_lock);
+	grp = rxe_pool_get_key_locked(&rxe->mc_grp_pool, mgid);
+	if (!grp) {
+		/* we didn't find the mcast group for mgid */
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
+	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			list_del(&mca->qp_list);
 
-	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
-			list_del(&elem->qp_list);
+			/* if the number of qp's attached to the
+			 * mcast group falls to zero go ahead and
+			 * tear it down. This will not free the
+			 * object since we are still holding a ref
+			 * from the get key above.
+			 */
 			grp->num_qp--;
+			if (grp->num_qp <= 0)
+				__rxe_destroy_grp(grp);
+
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&rxe->mcg_lock);
-			rxe_drop_ref(elem);
-			rxe_drop_ref(grp);	/* ref held by QP */
-			rxe_drop_ref(grp);	/* ref from get_key */
-			return 0;
+			/* drop the ref from get key. This will free the
+			 * object if num_qp is zero.
+			 */
+			rxe_drop_ref(grp);
+			kfree(mca);
+			err = 0;
+			goto out_unlock;
 		}
 	}
 
-	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(grp);			/* ref from get_key */
-err1:
-	return -EINVAL;
-}
-
-void rxe_mc_cleanup(struct rxe_pool_elem *elem)
-{
-	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
-	struct rxe_dev *rxe = grp->rxe;
+	/* we didn't find the qp on the list */
+	rxe_drop_ref(grp);
+	err = -EINVAL;
 
-	rxe_drop_key(grp);
-	rxe_mcast_delete(rxe, &grp->mgid);
+out_unlock:
+	spin_unlock_bh(&rxe->mcg_lock);
+	return err;
 }
 
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
@@ -170,12 +211,16 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_mcg *grp;
 
 	/* takes a ref on grp if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
-	if (err)
-		return err;
+	grp = rxe_mcast_get_grp(rxe, mgid);
+	if (IS_ERR(grp))
+		return PTR_ERR(grp);
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
 
+	/* if we failed to attach the first qp to grp tear it down */
+	if (grp->num_qp == 0)
+		rxe_destroy_grp(grp);
+
 	rxe_drop_ref(grp);
 	return err;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 63c594173565..a6756aa93e2b 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -90,11 +90,6 @@ static const struct rxe_type_info {
 		.key_offset	= offsetof(struct rxe_mcg, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
-	[RXE_TYPE_MC_ELEM] = {
-		.name		= "rxe-mc_elem",
-		.size		= sizeof(struct rxe_mca),
-		.elem_offset	= offsetof(struct rxe_mca, elem),
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 214279310f4d..9201bb6b8b07 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -23,7 +23,6 @@ enum rxe_elem_type {
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
 	RXE_TYPE_MC_GRP,
-	RXE_TYPE_MC_ELEM,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
@@ -156,4 +155,6 @@ void rxe_elem_release(struct kref *kref);
 /* drop a reference on an object */
 #define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
 
+#define rxe_read_ref(obj) kref_read(&(obj)->elem.ref_cnt)
+
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 9940c69cbb63..1b0f40881895 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -362,7 +362,6 @@ struct rxe_mcg {
 };
 
 struct rxe_mca {
-	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct rxe_qp		*qp;
 };
@@ -396,7 +395,6 @@ struct rxe_dev {
 	struct rxe_pool		mr_pool;
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
-	struct rxe_pool		mc_elem_pool;
 
 	spinlock_t		mcg_lock;
 
-- 
2.32.0



* [PATCH for-next v11 03/11] RDMA/rxe: Replace grp by mcg, mce by mca
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 02/11] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 04/11] RDMA/rxe: Replace int num_qp by atomic_t qp_num Bob Pearson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace 'grp' with 'mcg' and 'mce' with 'mca', and shorten the
subroutine names in rxe_mcast.c. These names are more consistent
with the object names used elsewhere in the driver.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 110 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 2 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 3c06b0590c82..96dc11a892a4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,66 +26,66 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold rxe->mcg_lock */
-static struct rxe_mcg *__rxe_create_grp(struct rxe_dev *rxe,
+static struct rxe_mcg *__rxe_create_mcg(struct rxe_dev *rxe,
 					struct rxe_pool *pool,
 					union ib_gid *mgid)
 {
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	int err;
 
-	grp = rxe_alloc_locked(pool);
-	if (!grp)
+	mcg = rxe_alloc_locked(pool);
+	if (!mcg)
 		return ERR_PTR(-ENOMEM);
 
 	err = rxe_mcast_add(rxe, mgid);
 	if (unlikely(err)) {
-		rxe_drop_ref(grp);
+		rxe_drop_ref(mcg);
 		return ERR_PTR(err);
 	}
 
-	INIT_LIST_HEAD(&grp->qp_list);
-	grp->rxe = rxe;
+	INIT_LIST_HEAD(&mcg->qp_list);
+	mcg->rxe = rxe;
 
-	/* rxe_alloc_locked takes a ref on grp but that will be
-	 * dropped when grp goes out of scope. We need to take a ref
+	/* rxe_alloc_locked takes a ref on mcg but that will be
+	 * dropped when mcg goes out of scope. We need to take a ref
 	 * on the pointer that will be saved in the red-black tree
-	 * by rxe_add_key and used to lookup grp from mgid later.
+	 * by rxe_add_key and used to lookup mcg from mgid later.
 	 * Adding key makes object visible to outside so this should
 	 * be done last after the object is ready.
 	 */
-	rxe_add_ref(grp);
-	rxe_add_key_locked(grp, mgid);
+	rxe_add_ref(mcg);
+	rxe_add_key_locked(mcg, mgid);
 
-	return grp;
+	return mcg;
 }
 
-static struct rxe_mcg *rxe_mcast_get_grp(struct rxe_dev *rxe,
+static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe,
 					 union ib_gid *mgid)
 {
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
 		return ERR_PTR(-EINVAL);
 
 	spin_lock_bh(&rxe->mcg_lock);
-	grp = rxe_pool_get_key_locked(pool, mgid);
-	if (!grp)
-		grp = __rxe_create_grp(rxe, pool, mgid);
+	mcg = rxe_pool_get_key_locked(pool, mgid);
+	if (!mcg)
+		mcg = __rxe_create_mcg(rxe, pool, mgid);
 	spin_unlock_bh(&rxe->mcg_lock);
 
-	return grp;
+	return mcg;
 }
 
-static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-				  struct rxe_mcg *grp)
+static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
+				  struct rxe_mcg *mcg)
 {
 	struct rxe_mca *mca, *tmp;
 	int err;
 
 	/* check to see if the qp is already a member of the group */
 	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			spin_unlock_bh(&rxe->mcg_lock);
 			return 0;
@@ -100,7 +100,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	spin_lock_bh(&rxe->mcg_lock);
 	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(tmp, &grp->qp_list, qp_list) {
+	list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
 		if (tmp->qp == qp) {
 			kfree(mca);
 			err = 0;
@@ -109,7 +109,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	/* check limits after checking if already attached */
-	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
 		kfree(mca);
 		err = -ENOMEM;
 		goto out;
@@ -120,8 +120,8 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	mca->qp = qp;
 
 	atomic_inc(&qp->mcg_num);
-	grp->num_qp++;
-	list_add(&mca->qp_list, &grp->qp_list);
+	mcg->num_qp++;
+	list_add(&mca->qp_list, &mcg->qp_list);
 
 	err = 0;
 out:
@@ -130,21 +130,21 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 }
 
 /* caller should be holding rxe->mcg_lock */
-static void __rxe_destroy_grp(struct rxe_mcg *grp)
+static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
-	/* first remove grp from red-black tree then drop ref */
-	rxe_drop_key_locked(grp);
-	rxe_drop_ref(grp);
+	/* first remove mcg from red-black tree then drop ref */
+	rxe_drop_key_locked(mcg);
+	rxe_drop_ref(mcg);
 
-	rxe_mcast_delete(grp->rxe, &grp->mgid);
+	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
 }
 
-static void rxe_destroy_grp(struct rxe_mcg *grp)
+static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
-	struct rxe_dev *rxe = grp->rxe;
+	struct rxe_dev *rxe = mcg->rxe;
 
 	spin_lock_bh(&rxe->mcg_lock);
-	__rxe_destroy_grp(grp);
+	__rxe_destroy_mcg(mcg);
 	spin_unlock_bh(&rxe->mcg_lock);
 }
 
@@ -153,22 +153,22 @@ void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 	/* nothing left to do for now */
 }
 
-static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_mca *mca, *tmp;
 	int err;
 
 	spin_lock_bh(&rxe->mcg_lock);
-	grp = rxe_pool_get_key_locked(&rxe->mc_grp_pool, mgid);
-	if (!grp) {
+	mcg = rxe_pool_get_key_locked(&rxe->mc_grp_pool, mgid);
+	if (!mcg) {
 		/* we didn't find the mcast group for mgid */
 		err = -EINVAL;
 		goto out_unlock;
 	}
 
-	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
 
@@ -178,16 +178,16 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			 * object since we are still holding a ref
 			 * from the get key above.
 			 */
-			grp->num_qp--;
-			if (grp->num_qp <= 0)
-				__rxe_destroy_grp(grp);
+			mcg->num_qp--;
+			if (mcg->num_qp <= 0)
+				__rxe_destroy_mcg(mcg);
 
 			atomic_dec(&qp->mcg_num);
 
 			/* drop the ref from get key. This will free the
 			 * object if num_qp is zero.
 			 */
-			rxe_drop_ref(grp);
+			rxe_drop_ref(mcg);
 			kfree(mca);
 			err = 0;
 			goto out_unlock;
@@ -195,7 +195,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	/* we didn't find the qp on the list */
-	rxe_drop_ref(grp);
+	rxe_drop_ref(mcg);
 	err = -EINVAL;
 
 out_unlock:
@@ -208,20 +208,20 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 
-	/* takes a ref on grp if successful */
-	grp = rxe_mcast_get_grp(rxe, mgid);
-	if (IS_ERR(grp))
-		return PTR_ERR(grp);
+	/* takes a ref on mcg if successful */
+	mcg = rxe_get_mcg(rxe, mgid);
+	if (IS_ERR(mcg))
+		return PTR_ERR(mcg);
 
-	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+	err = rxe_attach_mcg(rxe, qp, mcg);
 
-	/* if we failed to attach the first qp to grp tear it down */
-	if (grp->num_qp == 0)
-		rxe_destroy_grp(grp);
+	/* if we failed to attach the first qp to mcg tear it down */
+	if (mcg->num_qp == 0)
+		rxe_destroy_mcg(mcg);
 
-	rxe_drop_ref(grp);
+	rxe_drop_ref(mcg);
 	return err;
 }
 
@@ -230,5 +230,5 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
 
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+	return rxe_detach_mcg(rxe, qp, mgid);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index a084b5d69937..d91c6660e83c 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -234,7 +234,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
 	struct rxe_mcg *mcg;
-	struct rxe_mca *mce;
+	struct rxe_mca *mca;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
 	int err;
@@ -257,8 +257,8 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	 * single QP happen and just move on and try
 	 * the rest of them on the list
 	 */
-	list_for_each_entry(mce, &mcg->qp_list, qp_list) {
-		qp = mce->qp;
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		qp = mca->qp;
 
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
@@ -273,7 +273,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		 * skb and pass to the QP. Pass the original skb to
 		 * the last QP in the list.
 		 */
-		if (mce->qp_list.next != &mcg->qp_list) {
+		if (mca->qp_list.next != &mcg->qp_list) {
 			struct sk_buff *cskb;
 			struct rxe_pkt_info *cpkt;
 
-- 
2.32.0



* [PATCH for-next v11 04/11] RDMA/rxe: Replace int num_qp by atomic_t qp_num
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (2 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 03/11] RDMA/rxe: Replace grp by mcg, mce by mca Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 05/11] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace int num_qp in struct rxe_mcg by atomic_t qp_num.
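
The attach limit check then becomes an optimistic increment that is
backed out on failure, roughly (a sketch of the pattern used in the
diff below):

	if (atomic_inc_return(&mcg->qp_num) > rxe->attr.max_mcast_qp_attach) {
		/* undo the speculative increment and fail the attach */
		atomic_dec(&mcg->qp_num);
		kfree(mca);
		err = -ENOMEM;
		goto out;
	}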

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 9 ++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 96dc11a892a4..2c6cb2eb5ac1 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -109,7 +109,8 @@ static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	/* check limits after checking if already attached */
-	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (atomic_inc_return(&mcg->qp_num) > rxe->attr.max_mcast_qp_attach) {
+		atomic_dec(&mcg->qp_num);
 		kfree(mca);
 		err = -ENOMEM;
 		goto out;
@@ -120,7 +121,6 @@ static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 	mca->qp = qp;
 
 	atomic_inc(&qp->mcg_num);
-	mcg->num_qp++;
 	list_add(&mca->qp_list, &mcg->qp_list);
 
 	err = 0;
@@ -178,8 +178,7 @@ static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 			 * object since we are still holding a ref
 			 * from the get key above.
 			 */
-			mcg->num_qp--;
-			if (mcg->num_qp <= 0)
+			if (atomic_dec_return(&mcg->qp_num) <= 0)
 				__rxe_destroy_mcg(mcg);
 
 			atomic_dec(&qp->mcg_num);
@@ -218,7 +217,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	err = rxe_attach_mcg(rxe, qp, mcg);
 
 	/* if we failed to attach the first qp to mcg tear it down */
-	if (mcg->num_qp == 0)
+	if (atomic_read(&mcg->qp_num) == 0)
 		rxe_destroy_mcg(mcg);
 
 	rxe_drop_ref(mcg);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 1b0f40881895..3790163bb265 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -356,7 +356,7 @@ struct rxe_mcg {
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	union ib_gid		mgid;
-	int			num_qp;
+	atomic_t		qp_num;
 	u32			qkey;
 	u16			pkey;
 };
-- 
2.32.0



* [PATCH for-next v11 05/11] RDMA/rxe: Replace pool key by rxe->mcg_tree
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (3 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 04/11] RDMA/rxe: Replace int num_qp by atomic_t qp_num Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 06/11] RDMA/rxe: Remove key'ed object support Bob Pearson
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Continue decoupling mcg from the rxe pools. Add red-black tree code
in rxe_mcast.c that indexes mcg objects by mgid, and replace the
pool key calls with calls to the local red-black tree routines.
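
A typical caller then looks like this (a sketch based on the
rxe_recv.c hunk below):

	/* look up the mcast group for the packet's destination gid;
	 * returns the mcg with a reference held, or NULL
	 */
	mcg = rxe_lookup_mcg(rxe, &dgid);
	if (!mcg)
		goto drop;	/* mcast group not registered */

	/* ... deliver the packet to each qp on mcg->qp_list ... */

	rxe_drop_ref(mcg);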

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   2 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |   3 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 258 ++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   6 +-
 5 files changed, 210 insertions(+), 63 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 7386a51b953d..dc36148272dd 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -203,7 +203,9 @@ static int rxe_init(struct rxe_dev *rxe)
 	spin_lock_init(&rxe->pending_lock);
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
+	/* init multicast support */
 	spin_lock_init(&rxe->mcg_lock);
+	rxe->mcg_tree = RB_ROOT;
 
 	mutex_init(&rxe->usdev_lock);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index af40e3c212fb..d41831878fa6 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,9 +40,10 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
+void rxe_mc_cleanup(struct rxe_pool_elem *elem);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 2c6cb2eb5ac1..78d696cd40d5 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,56 +25,225 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold rxe->mcg_lock */
-static struct rxe_mcg *__rxe_create_mcg(struct rxe_dev *rxe,
-					struct rxe_pool *pool,
+/**
+ * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
+ * @mcg: mcg object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * is responsible to avoid adding the same mcg twice to the tree.
+ */
+static void __rxe_insert_mcg(struct rxe_mcg *mcg)
+{
+	struct rb_root *tree = &mcg->rxe->mcg_tree;
+	struct rb_node **link = &tree->rb_node;
+	struct rb_node *node = NULL;
+	struct rxe_mcg *tmp;
+	int cmp;
+
+	while (*link) {
+		node = *link;
+		tmp = rb_entry(node, struct rxe_mcg, node);
+
+		cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid));
+		if (cmp > 0)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&mcg->node, node, link);
+	rb_insert_color(&mcg->node, tree);
+}
+
+/**
+ * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
+ * @mcg: mcast group object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_remove_mcg(struct rxe_mcg *mcg)
+{
+	rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
+}
+
+/**
+ * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Context: caller must hold rxe->mcg_lock
+ * Returns: mcg on success and takes a ref to mcg else NULL
+ */
+static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 					union ib_gid *mgid)
 {
+	struct rb_root *tree = &rxe->mcg_tree;
 	struct rxe_mcg *mcg;
-	int err;
+	struct rb_node *node;
+	int cmp;
 
-	mcg = rxe_alloc_locked(pool);
-	if (!mcg)
-		return ERR_PTR(-ENOMEM);
+	node = tree->rb_node;
 
-	err = rxe_mcast_add(rxe, mgid);
-	if (unlikely(err)) {
-		rxe_drop_ref(mcg);
-		return ERR_PTR(err);
+	while (node) {
+		mcg = rb_entry(node, struct rxe_mcg, node);
+
+		cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));
+
+		if (cmp > 0)
+			node = node->rb_left;
+		else if (cmp < 0)
+			node = node->rb_right;
+		else
+			break;
 	}
 
+	if (node) {
+		rxe_add_ref(mcg);
+		return mcg;
+	}
+
+	return NULL;
+}
+
+/**
+ * rxe_lookup_mcg - lookup up mcg in red-back tree
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Returns: mcg if found else NULL
+ */
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	struct rxe_mcg *mcg;
+
+	spin_lock_bh(&rxe->mcg_lock);
+	mcg = __rxe_lookup_mcg(rxe, mgid);
+	spin_unlock_bh(&rxe->mcg_lock);
+
+	return mcg;
+}
+
+/**
+ * __rxe_init_mcg - initialize a new mcg
+ * @rxe: rxe device
+ * @mgid: multicast address as a gid
+ * @mcg: new mcg object
+ *
+ * Context: caller should hold rxe->mcg lock
+ * Returns: 0 on success else an error
+ */
+static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
+			  struct rxe_mcg *mcg)
+{
+	int err;
+
+	err = rxe_mcast_add(rxe, mgid);
+	if (unlikely(err))
+		return err;
+
+	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
 
-	/* rxe_alloc_locked takes a ref on mcg but that will be
+	/* caller holds a ref on mcg but that will be
 	 * dropped when mcg goes out of scope. We need to take a ref
 	 * on the pointer that will be saved in the red-black tree
-	 * by rxe_add_key and used to lookup mcg from mgid later.
-	 * Adding key makes object visible to outside so this should
+	 * by __rxe_insert_mcg and used to lookup mcg from mgid later.
+	 * Inserting mcg makes it visible to outside so this should
 	 * be done last after the object is ready.
 	 */
 	rxe_add_ref(mcg);
-	rxe_add_key_locked(mcg, mgid);
+	__rxe_insert_mcg(mcg);
 
-	return mcg;
+	return 0;
 }
 
-static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe,
-					 union ib_gid *mgid)
+/**
+ * rxe_get_mcg - lookup or allocate a mcg
+ * @rxe: rxe device object
+ * @mgid: multicast IP address as a gid
+ *
+ * Returns: mcg on success else ERR_PTR(error)
+ */
+static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-	struct rxe_mcg *mcg;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
+	struct rxe_mcg *mcg, *tmp;
+	int err;
 
-	if (rxe->attr.max_mcast_qp_attach == 0)
+	if (rxe->attr.max_mcast_grp == 0)
 		return ERR_PTR(-EINVAL);
 
-	spin_lock_bh(&rxe->mcg_lock);
-	mcg = rxe_pool_get_key_locked(pool, mgid);
+	/* check to see if mcg already exists */
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (mcg)
+		return mcg;
+
+	/* speculative alloc of new mcg */
+	mcg = rxe_alloc(pool);
 	if (!mcg)
-		mcg = __rxe_create_mcg(rxe, pool, mgid);
-	spin_unlock_bh(&rxe->mcg_lock);
+		return ERR_PTR(-ENOMEM);
 
+	spin_lock_bh(&rxe->mcg_lock);
+	/* re-check to see if someone else just added it */
+	tmp = __rxe_lookup_mcg(rxe, mgid);
+	if (tmp) {
+		rxe_drop_ref(mcg);
+		mcg = tmp;
+		goto out;
+	}
+
+	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
+		err = -ENOMEM;
+		goto err_dec;
+	}
+
+	err = __rxe_init_mcg(rxe, mgid, mcg);
+	if (err)
+		goto err_dec;
+out:
+	spin_unlock_bh(&rxe->mcg_lock);
 	return mcg;
+
+err_dec:
+	atomic_dec(&rxe->mcg_num);
+	spin_unlock_bh(&rxe->mcg_lock);
+	rxe_drop_ref(mcg);
+	return ERR_PTR(err);
+}
+
+/**
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * @mcg: the mcg object
+ *
+ * Context: caller is holding rxe->mcg_lock
+ * no qp's are attached to mcg
+ */
+static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	/* remove mcg from red-black tree then drop ref */
+	__rxe_remove_mcg(mcg);
+	rxe_drop_ref(mcg);
+
+	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
+}
+
+/**
+ * rxe_destroy_mcg - destroy mcg object
+ * @mcg: the mcg object
+ *
+ * Context: no qp's are attached to mcg
+ */
+static void rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	spin_lock_bh(&mcg->rxe->mcg_lock);
+	__rxe_destroy_mcg(mcg);
+	spin_unlock_bh(&mcg->rxe->mcg_lock);
+}
+
+void rxe_mc_cleanup(struct rxe_pool_elem *elem)
+{
+	/* nothing left to do for now */
 }
 
 static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
@@ -129,30 +298,6 @@ static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
-/* caller should be holding rxe->mcg_lock */
-static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	/* first remove mcg from red-black tree then drop ref */
-	rxe_drop_key_locked(mcg);
-	rxe_drop_ref(mcg);
-
-	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
-}
-
-static void rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	struct rxe_dev *rxe = mcg->rxe;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	__rxe_destroy_mcg(mcg);
-	spin_unlock_bh(&rxe->mcg_lock);
-}
-
-void rxe_mc_cleanup(struct rxe_pool_elem *elem)
-{
-	/* nothing left to do for now */
-}
-
 static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
@@ -160,17 +305,16 @@ static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *mca, *tmp;
 	int err;
 
-	spin_lock_bh(&rxe->mcg_lock);
-	mcg = rxe_pool_get_key_locked(&rxe->mc_grp_pool, mgid);
-	if (!mcg) {
-		/* we didn't find the mcast group for mgid */
-		err = -EINVAL;
-		goto out_unlock;
-	}
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (!mcg)
+		return -EINVAL;
 
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
+			atomic_dec(&qp->mcg_num);
+			rxe_drop_ref(qp);
 
 			/* if the number of qp's attached to the
 			 * mcast group falls to zero go ahead and
@@ -181,10 +325,8 @@ static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 			if (atomic_dec_return(&mcg->qp_num) <= 0)
 				__rxe_destroy_mcg(mcg);
 
-			atomic_dec(&qp->mcg_num);
-
 			/* drop the ref from get key. This will free the
-			 * object if num_qp is zero.
+			 * object if qp_num is zero.
 			 */
 			rxe_drop_ref(mcg);
 			kfree(mca);
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index d91c6660e83c..fb265902f7e3 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -246,7 +246,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid));
 
 	/* lookup mcast group corresponding to mgid, takes a ref */
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
+	mcg = rxe_lookup_mcg(rxe, &dgid);
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
@@ -300,7 +300,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 
 	spin_unlock_bh(&rxe->mcg_lock);
 
-	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
+	rxe_drop_ref(mcg);
 
 	if (likely(!skb))
 		return;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 3790163bb265..caa5b1b05019 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,6 +353,7 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
+	struct rb_node		node;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	union ib_gid		mgid;
@@ -396,7 +397,10 @@ struct rxe_dev {
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
 
+	/* multicast support */
 	spinlock_t		mcg_lock;
+	struct rb_root		mcg_tree;
+	atomic_t		mcg_num;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
@@ -477,6 +481,4 @@ static inline struct rxe_pd *rxe_mw_pd(struct rxe_mw *mw)
 
 int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name);
 
-void rxe_mc_cleanup(struct rxe_pool_elem *elem);
-
 #endif /* RXE_VERBS_H */
-- 
2.32.0



* [PATCH for-next v11 06/11] RDMA/rxe: Remove key'ed object support
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (4 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 05/11] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 07/11] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Now that rxe_mcast.c has its own red-black tree support there is no
longer any need for keyed objects in the rxe pools. This patch
removes the key APIs and related code.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 123 ---------------------------
 drivers/infiniband/sw/rxe/rxe_pool.h |  38 ---------
 2 files changed, 161 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index a6756aa93e2b..49a25f8ceae1 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -16,8 +16,6 @@ static const struct rxe_type_info {
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
-	size_t key_offset;
-	size_t key_size;
 } rxe_type_info[RXE_NUM_TYPES] = {
 	[RXE_TYPE_UC] = {
 		.name		= "rxe-uc",
@@ -147,12 +145,6 @@ int rxe_pool_init(
 			goto out;
 	}
 
-	if (pool->flags & RXE_POOL_KEY) {
-		pool->key.tree = RB_ROOT;
-		pool->key.key_offset = info->key_offset;
-		pool->key.key_size = info->key_size;
-	}
-
 out:
 	return err;
 }
@@ -209,77 +201,6 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 	return 0;
 }
 
-static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->key.tree.rb_node;
-	struct rb_node *parent = NULL;
-	struct rxe_pool_elem *elem;
-	int cmp;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     (u8 *)new + pool->key.key_offset,
-			     pool->key.key_size);
-
-		if (cmp == 0) {
-			pr_warn("key already exists!\n");
-			return -EINVAL;
-		}
-
-		if (cmp > 0)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->key_node, parent, link);
-	rb_insert_color(&new->key_node, &pool->key.tree);
-
-	return 0;
-}
-
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size);
-	err = rxe_insert_key(pool, elem);
-
-	return err;
-}
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_key_locked(elem, key);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	rb_erase(&elem->key_node, &pool->key.tree);
-}
-
-void __rxe_drop_key(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_key_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-}
-
 int __rxe_add_index_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
@@ -443,47 +364,3 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 
 	return obj;
 }
-
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
-{
-	struct rb_node *node;
-	struct rxe_pool_elem *elem;
-	void *obj;
-	int cmp;
-
-	node = pool->key.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     key, pool->key.key_size);
-
-		if (cmp > 0)
-			node = node->rb_left;
-		else if (cmp < 0)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
-
-	return obj;
-}
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
-{
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_key_locked(pool, key);
-	read_unlock_bh(&pool->pool_lock);
-
-	return obj;
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 9201bb6b8b07..2db9d9872cd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -9,7 +9,6 @@
 
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_KEY		= BIT(2),
 	RXE_POOL_NO_ALLOC	= BIT(4),
 };
 
@@ -32,9 +31,6 @@ struct rxe_pool_elem {
 	struct kref		ref_cnt;
 	struct list_head	list;
 
-	/* only used if keyed */
-	struct rb_node		key_node;
-
 	/* only used if indexed */
 	struct rb_node		index_node;
 	u32			index;
@@ -61,13 +57,6 @@ struct rxe_pool {
 		u32			max_index;
 		u32			min_index;
 	} index;
-
-	/* only used if keyed */
-	struct {
-		struct rb_root		tree;
-		size_t			key_offset;
-		size_t			key_size;
-	} key;
 };
 
 /* initialize a pool of objects with given limit on
@@ -112,26 +101,6 @@ void __rxe_drop_index(struct rxe_pool_elem *elem);
 
 #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
-/* assign a key to a keyed object and insert object into
- * pool's rb tree holding and not holding pool_lock
- */
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key)
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key)
-
-/* remove elem from rb tree holding and not holding the pool_lock */
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->elem)
-
-void __rxe_drop_key(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
-
 /* lookup an indexed object from index holding and not holding the pool_lock.
  * takes a reference on object
  */
@@ -139,13 +108,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
 
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* lookup keyed object from key holding and not holding the pool_lock.
- * takes a reference on the objecti
- */
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key);
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
-
 /* cleanup an object when all references are dropped */
 void rxe_elem_release(struct kref *kref);
 
-- 
2.32.0



* [PATCH for-next v11 07/11] RDMA/rxe: Remove mcg from rxe pools
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (5 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 06/11] RDMA/rxe: Remove key'ed object support Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Finish removing mcg from the rxe pools. Replace the rxe pool
reference counting with krefs and replace rxe_alloc with kzalloc.
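
The resulting object lifetime, in outline (a sketch of the kref usage
in the diff below):

	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);  /* no longer pool allocated */
	kref_init(&mcg->ref_cnt);	/* reference held by the caller */
	kref_get(&mcg->ref_cnt);	/* extra reference held by the rb-tree entry */

	/* ... use mcg ... */

	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);  /* last put frees the mcg */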

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  8 ------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 39 ++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_pool.c  |  9 -------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  1 -
 drivers/infiniband/sw/rxe/rxe_recv.c  |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 7 files changed, 27 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index dc36148272dd..3520eb2db685 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -28,7 +28,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->cq_pool);
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -157,15 +156,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err8;
 
-	err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
-			    rxe->attr.max_mcast_grp);
-	if (err)
-		goto err9;
-
 	return 0;
 
-err9:
-	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
 	rxe_pool_cleanup(&rxe->mr_pool);
 err7:
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index d41831878fa6..409efeecd581 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -43,7 +43,7 @@ void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
-void rxe_mc_cleanup(struct rxe_pool_elem *elem);
+void rxe_cleanup_mcg(struct kref *kref);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 78d696cd40d5..07c218788c59 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -98,7 +98,7 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	}
 
 	if (node) {
-		rxe_add_ref(mcg);
+		kref_get(&mcg->ref_cnt);
 		return mcg;
 	}
 
@@ -141,6 +141,7 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (unlikely(err))
 		return err;
 
+	kref_init(&mcg->ref_cnt);
 	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
@@ -152,7 +153,7 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	 * Inserting mcg makes it visible to outside so this should
 	 * be done last after the object is ready.
 	 */
-	rxe_add_ref(mcg);
+	kref_get(&mcg->ref_cnt);
 	__rxe_insert_mcg(mcg);
 
 	return 0;
@@ -167,7 +168,6 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
  */
 static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-	struct rxe_pool *pool = &rxe->mc_grp_pool;
 	struct rxe_mcg *mcg, *tmp;
 	int err;
 
@@ -180,7 +180,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 		return mcg;
 
 	/* speculative alloc of new mcg */
-	mcg = rxe_alloc(pool);
+	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
 	if (!mcg)
 		return ERR_PTR(-ENOMEM);
 
@@ -188,7 +188,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	/* re-check to see if someone else just added it */
 	tmp = __rxe_lookup_mcg(rxe, mgid);
 	if (tmp) {
-		rxe_drop_ref(mcg);
+		kfree(mcg);
 		mcg = tmp;
 		goto out;
 	}
@@ -208,10 +208,21 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 err_dec:
 	atomic_dec(&rxe->mcg_num);
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);
+	kfree(mcg);
 	return ERR_PTR(err);
 }
 
+/**
+ * rxe_cleanup_mcg - cleanup mcg for kref_put
+ * @kref:
+ */
+void rxe_cleanup_mcg(struct kref *kref)
+{
+	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+
+	kfree(mcg);
+}
+
 /**
  * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
  * @mcg: the mcg object
@@ -221,11 +232,14 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
  */
 static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
+	struct rxe_dev *rxe = mcg->rxe;
+
 	/* remove mcg from red-black tree then drop ref */
 	__rxe_remove_mcg(mcg);
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
 	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
+	atomic_dec(&rxe->mcg_num);
 }
 
 /**
@@ -241,11 +255,6 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 	spin_unlock_bh(&mcg->rxe->mcg_lock);
 }
 
-void rxe_mc_cleanup(struct rxe_pool_elem *elem)
-{
-	/* nothing left to do for now */
-}
-
 static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 				  struct rxe_mcg *mcg)
 {
@@ -328,7 +337,7 @@ static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 			/* drop the ref from get key. This will free the
 			 * object if qp_num is zero.
 			 */
-			rxe_drop_ref(mcg);
+			kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 			kfree(mca);
 			err = 0;
 			goto out_unlock;
@@ -336,7 +345,7 @@ static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	/* we didn't find the qp on the list */
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 	err = -EINVAL;
 
 out_unlock:
@@ -362,7 +371,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	if (atomic_read(&mcg->qp_num) == 0)
 		rxe_destroy_mcg(mcg);
 
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 	return err;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 49a25f8ceae1..b6fe7c93aaab 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -79,15 +79,6 @@ static const struct rxe_type_info {
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
-	[RXE_TYPE_MC_GRP] = {
-		.name		= "rxe-mc_grp",
-		.size		= sizeof(struct rxe_mcg),
-		.elem_offset	= offsetof(struct rxe_mcg, elem),
-		.cleanup	= rxe_mc_cleanup,
-		.flags		= RXE_POOL_KEY,
-		.key_offset	= offsetof(struct rxe_mcg, mgid),
-		.key_size	= sizeof(union ib_gid),
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 2db9d9872cd1..8fc95c6b7b9b 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -21,7 +21,6 @@ enum rxe_elem_type {
 	RXE_TYPE_CQ,
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
-	RXE_TYPE_MC_GRP,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index fb265902f7e3..53924453abef 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -300,7 +300,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 
 	spin_unlock_bh(&rxe->mcg_lock);
 
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
 	if (likely(!skb))
 		return;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index caa5b1b05019..20fe3ee6589d 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,8 +352,8 @@ struct rxe_mw {
 };
 
 struct rxe_mcg {
-	struct rxe_pool_elem	elem;
 	struct rb_node		node;
+	struct kref		ref_cnt;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	union ib_gid		mgid;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (6 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 07/11] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-11 18:43   ` Jason Gunthorpe
  2022-02-08 21:16 ` [PATCH for-next v11 09/11] RDMA/rxe: Finish cleanup of rxe_mcast.c Bob Pearson
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Well behaved applications will free all memory allocated by multicast
but programs which do not clean up properly can leave behind allocated
memory when the rxe driver is unloaded. This patch walks the red-black
tree holding multicast group elements and then walks the list of attached
qp's freeing the mca's and finally the mcg's.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mcast.c | 31 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 3520eb2db685..603b0156f889 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,6 +29,8 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 
+	rxe_cleanup_mcast(rxe);
+
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 409efeecd581..0bc1b7e2877c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -44,6 +44,7 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 void rxe_cleanup_mcg(struct kref *kref);
+void rxe_cleanup_mcast(struct rxe_dev *rxe);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 07c218788c59..846147878607 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -382,3 +382,34 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	return rxe_detach_mcg(rxe, qp, mgid);
 }
+
+/**
+ * rxe_cleanup_mcast - cleanup all resources held by mcast
+ * @rxe: rxe object
+ *
+ * Called when rxe device is unloaded. Walk red-black tree to
+ * find all mcg's and then walk mcg->qp_list to find all mca's and
+ * free them. These should have been freed already if apps are
+ * well behaved.
+ */
+void rxe_cleanup_mcast(struct rxe_dev *rxe)
+{
+	struct rb_root *root = &rxe->mcg_tree;
+	struct rb_node *node, *next;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mca, *tmp;
+
+	for (node = rb_first(root); node; node = next) {
+		next = rb_next(node);
+		mcg = rb_entry(node, typeof(*mcg), node);
+
+		spin_lock_bh(&rxe->mcg_lock);
+		list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list)
+			kfree(mca);
+
+		__rxe_remove_mcg(mcg);
+		spin_unlock_bh(&rxe->mcg_lock);
+
+		kfree(mcg);
+	}
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v11 09/11] RDMA/rxe: Finish cleanup of rxe_mcast.c
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (7 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 10/11] RDMA/rxe: For mcast copy qp list to temp array Bob Pearson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Clean up the rxe_mcast.c code: minor changes and completed comments.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 189 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 2 files changed, 144 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 846147878607..b66839276aa6 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -1,12 +1,33 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
+ * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved.
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
+/*
+ * rxe_mcast.c implements driver support for multicast transport.
+ * It is based on two data structures struct rxe_mcg ('mcg') and
+ * struct rxe_mca ('mca'). An mcg is allocated each time a qp is
+ * attached to a new mgid for the first time. These are indexed by
+ * a red-black tree using the mgid. This data structure is searched
+ * for the mcg when a multicast packet is received and when another
+ * qp is attached to the same mgid. It is cleaned up when the last qp
+ * is detached from the mcg. Each time a qp is attached to an mcg an
+ * mca is created. It holds a pointer to the qp and is added to a list
+ * of qp's that are attached to the mcg. The qp_list is used to replicate
+ * mcast packets in the rxe receive path.
+ */
+
 #include "rxe.h"
-#include "rxe_loc.h"
 
+/**
+ * rxe_mcast_add - add multicast address to rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -16,6 +37,13 @@ static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_add(rxe->ndev, ll_addr);
 }
 
+/**
+ * rxe_mcast_delete - delete multicast address from rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -214,7 +242,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 
 /**
  * rxe_cleanup_mcg - cleanup mcg for kref_put
- * @kref:
+ * @kref: struct kref embedded in mcg
  */
 void rxe_cleanup_mcg(struct kref *kref)
 {
@@ -255,9 +283,57 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 	spin_unlock_bh(&mcg->rxe->mcg_lock);
 }
 
-static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
-				  struct rxe_mcg *mcg)
+/**
+ * __rxe_init_mca - initialize a new mca holding lock
+ * @qp: qp object
+ * @mcg: mcg object
+ * @mca: empty space for new mca
+ *
+ * Context: caller must hold references on qp and mcg, rxe->mcg_lock
+ * and pass memory for new mca
+ *
+ * Returns: 0 on success else an error
+ */
+static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
+			  struct rxe_mca *mca)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	int n;
+
+	n = atomic_inc_return(&rxe->mcg_attach);
+	if (n > rxe->attr.max_total_mcast_qp_attach) {
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	n = atomic_inc_return(&mcg->qp_num);
+	if (n > rxe->attr.max_mcast_qp_attach) {
+		atomic_dec(&mcg->qp_num);
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	atomic_inc(&qp->mcg_num);
+
+	rxe_add_ref(qp);
+	mca->qp = qp;
+
+	list_add_tail(&mca->qp_list, &mcg->qp_list);
+
+	return 0;
+}
+
+/**
+ * rxe_attach_mcg - attach qp to mcg if not already attached
+ * @qp: qp object
+ * @mcg: mcg object
+ *
+ * Context: caller must hold reference on qp and mcg.
+ * Returns: 0 on success else an error
+ */
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
+	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca, *tmp;
 	int err;
 
@@ -286,73 +362,77 @@ static int rxe_attach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
 		}
 	}
 
-	/* check limits after checking if already attached */
-	if (atomic_inc_return(&mcg->qp_num) > rxe->attr.max_mcast_qp_attach) {
-		atomic_dec(&mcg->qp_num);
+	err = __rxe_init_mca(qp, mcg, mca);
+	if (err)
 		kfree(mca);
-		err = -ENOMEM;
-		goto out;
-	}
-
-	/* protect pointer to qp in mca */
-	rxe_add_ref(qp);
-	mca->qp = qp;
-
-	atomic_inc(&qp->mcg_num);
-	list_add(&mca->qp_list, &mcg->qp_list);
-
-	err = 0;
 out:
 	spin_unlock_bh(&rxe->mcg_lock);
 	return err;
 }
 
-static int rxe_detach_mcg(struct rxe_dev *rxe, struct rxe_qp *qp,
-				   union ib_gid *mgid)
+/**
+ * __rxe_cleanup_mca - cleanup mca object holding lock
+ * @mca: mca object
+ * @mcg: mcg object
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
 {
-	struct rxe_mcg *mcg;
-	struct rxe_mca *mca, *tmp;
-	int err;
+	list_del(&mca->qp_list);
 
-	mcg = rxe_lookup_mcg(rxe, mgid);
-	if (!mcg)
-		return -EINVAL;
+	atomic_dec(&mcg->qp_num);
+	atomic_dec(&mcg->rxe->mcg_attach);
+	atomic_dec(&mca->qp->mcg_num);
+	rxe_drop_ref(mca->qp);
+
+	kfree(mca);
+}
+
+/**
+ * rxe_detach_mcg - detach qp from mcg
+ * @mcg: mcg object
+ * @qp: qp object
+ *
+ * Returns: 0 on success else an error if qp is not attached.
+ */
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+{
+	struct rxe_dev *rxe = mcg->rxe;
+	struct rxe_mca *mca, *tmp;
 
 	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			list_del(&mca->qp_list);
-			atomic_dec(&qp->mcg_num);
-			rxe_drop_ref(qp);
+			__rxe_cleanup_mca(mca, mcg);
 
 			/* if the number of qp's attached to the
 			 * mcast group falls to zero go ahead and
 			 * tear it down. This will not free the
 			 * object since we are still holding a ref
-			 * from the get key above.
+			 * from the caller
 			 */
-			if (atomic_dec_return(&mcg->qp_num) <= 0)
+			if (atomic_read(&mcg->qp_num) <= 0)
 				__rxe_destroy_mcg(mcg);
 
-			/* drop the ref from get key. This will free the
-			 * object if qp_num is zero.
-			 */
-			kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-			kfree(mca);
-			err = 0;
-			goto out_unlock;
+			spin_unlock_bh(&rxe->mcg_lock);
+			return 0;
 		}
 	}
 
 	/* we didn't find the qp on the list */
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-	err = -EINVAL;
-
-out_unlock:
 	spin_unlock_bh(&rxe->mcg_lock);
-	return err;
+	return -EINVAL;
 }
 
+/**
+ * rxe_attach_mcast - attach qp to multicast group (see IBA-11.3.1)
+ * @ibqp: (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	int err;
@@ -365,7 +445,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	if (IS_ERR(mcg))
 		return PTR_ERR(mcg);
 
-	err = rxe_attach_mcg(rxe, qp, mcg);
+	err = rxe_attach_mcg(mcg, qp);
 
 	/* if we failed to attach the first qp to mcg tear it down */
 	if (atomic_read(&mcg->qp_num) == 0)
@@ -375,12 +455,29 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	return err;
 }
 
+/**
+ * rxe_detach_mcast - detach qp from multicast group (see IBA-11.3.2)
+ * @ibqp: address of (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mcg *mcg;
+	int err;
+
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (!mcg)
+		return -EINVAL;
 
-	return rxe_detach_mcg(rxe, qp, mgid);
+	err = rxe_detach_mcg(mcg, qp);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
+	return err;
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 20fe3ee6589d..6b15251ff67a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -401,6 +401,7 @@ struct rxe_dev {
 	spinlock_t		mcg_lock;
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
+	atomic_t		mcg_attach;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v11 10/11] RDMA/rxe: For mcast copy qp list to temp array
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (8 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 09/11] RDMA/rxe: Finish cleanup of rxe_mcast.c Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-08 21:16 ` [PATCH for-next v11 11/11] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
  2022-02-11 20:04 ` [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Jason Gunthorpe
  11 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe_rcv_mcast_pkt() performs most of its work under
rxe->mcg_lock and calls into rxe_rcv(), which queues the packets to
the responder and completer tasklets while still holding the lock;
that is a very bad idea. This patch walks the qp_list in the mcg and
copies the qp addresses to a temporary array under the lock, but does
the rest of the work without holding it. The critical section is now
very small.
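
In outline, the pattern looks like this (a sketch only; nmax, qp_array
and the other names follow the diff below, declarations and error
handling omitted):

	/* take a snapshot of the attached qp's under the lock */
	nmax = atomic_read(&mcg->qp_num) + 4;
	qp_array = kmalloc_array(nmax, sizeof(*qp_array), GFP_KERNEL);

	spin_lock_bh(&rxe->mcg_lock);
	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
		rxe_add_ref(mca->qp);	/* keep qp alive after unlock */
		qp_array[n++] = mca->qp;
		if (n == nmax)
			break;
	}
	spin_unlock_bh(&rxe->mcg_lock);

	/* the expensive part (cloning skbs, calling rxe_rcv_pkt()) now
	 * runs for each qp_array[i] with no lock held, and drops the qp
	 * reference when done
	 */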

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_recv.c | 103 +++++++++++++++++----------
 1 file changed, 64 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 53924453abef..9b21cbb22602 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -232,11 +232,15 @@ static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 
 static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
+	struct sk_buff *skb_copy;
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_pkt_info *pkt_copy;
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
+	struct rxe_qp **qp_array;
 	union ib_gid dgid;
+	int n, nmax;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -248,68 +252,89 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	/* lookup mcast group corresponding to mgid, takes a ref */
 	mcg = rxe_lookup_mcg(rxe, &dgid);
 	if (!mcg)
-		goto drop;	/* mcast group not registered */
+		goto err_drop;	/* mcast group not registered */
+
+	/* this is the current number of qp's attached to mcg plus a
+	 * little room in case new qp's are attached between here
+	 * and when we finish walking the qp list. If someone can
+	 * attach more than 4 new qp's we will miss forwarding
+	 * packets to those qp's. This is actually OK since UD is
+	 * a unreliable service.
+	 */
+	nmax = atomic_read(&mcg->qp_num) + 4;
+	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
+	n = 0;
 
 	spin_lock_bh(&rxe->mcg_lock);
-
-	/* this is unreliable datagram service so we let
-	 * failures to deliver a multicast packet to a
-	 * single QP happen and just move on and try
-	 * the rest of them on the list
-	 */
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
-		qp = mca->qp;
+		/* protect the qp pointers in the list */
+		rxe_add_ref(mca->qp);
+		qp_array[n++] = mca->qp;
+		if (n == nmax)
+			break;
+	}
+	spin_unlock_bh(&rxe->mcg_lock);
+	nmax = n;
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
-		/* validate qp for incoming packet */
+	for (n = 0; n < nmax; n++) {
+		qp = qp_array[n];
+
+		/* since this is an unreliable transport if
+		 * one of the qp's fails to pass these checks
+		 * just don't forward a packet and continue
+		 * on to the other qp's. If there aren't any
+		 * drop the skb
+		 */
 		err = check_type_state(rxe, pkt, qp);
-		if (err)
+		if (err) {
+			rxe_drop_ref(qp);
+			if (n == nmax - 1)
+				goto err_free;
 			continue;
+		}
 
 		err = check_keys(rxe, pkt, bth_qpn(pkt), qp);
-		if (err)
+		if (err) {
+			rxe_drop_ref(qp);
+			if (n == nmax - 1)
+				goto err_free;
 			continue;
+		}
 
-		/* for all but the last QP create a new clone of the
-		 * skb and pass to the QP. Pass the original skb to
-		 * the last QP in the list.
+		/* for all but the last qp create a new copy(clone)
+		 * of the skb and pass to the qp. Pass the original
+		 * skb to the last qp in the list unless it failed
+		 * checks above
 		 */
-		if (mca->qp_list.next != &mcg->qp_list) {
-			struct sk_buff *cskb;
-			struct rxe_pkt_info *cpkt;
-
-			cskb = skb_clone(skb, GFP_ATOMIC);
-			if (unlikely(!cskb))
+		if (n < nmax - 1) {
+			skb_copy = skb_clone(skb, GFP_KERNEL);
+			if (unlikely(!skb_copy)) {
+				rxe_drop_ref(qp);
 				continue;
+			}
 
 			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
-				kfree_skb(cskb);
-				break;
+				kfree_skb(skb_copy);
+				rxe_drop_ref(qp);
+				continue;
 			}
 
-			cpkt = SKB_TO_PKT(cskb);
-			cpkt->qp = qp;
-			rxe_add_ref(qp);
-			rxe_rcv_pkt(cpkt, cskb);
+			pkt_copy = SKB_TO_PKT(skb_copy);
+			pkt_copy->qp = qp;
+			rxe_rcv_pkt(pkt_copy, skb_copy);
 		} else {
 			pkt->qp = qp;
-			rxe_add_ref(qp);
 			rxe_rcv_pkt(pkt, skb);
-			skb = NULL;	/* mark consumed */
 		}
 	}
 
-	spin_unlock_bh(&rxe->mcg_lock);
-
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-
-	if (likely(!skb))
-		return;
-
-	/* This only occurs if one of the checks fails on the last
-	 * QP in the list above
-	 */
+	kfree(qp_array);
+	return;
 
-drop:
+err_free:
+	kfree(qp_array);
+err_drop:
 	kfree_skb(skb);
 	ib_device_put(&rxe->ib_dev);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v11 11/11] RDMA/rxe: Convert mca read locking to RCU
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (9 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 10/11] RDMA/rxe: For mcast copy qp list to temp array Bob Pearson
@ 2022-02-08 21:16 ` Bob Pearson
  2022-02-11 20:09   ` Jason Gunthorpe
  2022-02-11 20:04 ` [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Jason Gunthorpe
  11 siblings, 1 reply; 21+ messages in thread
From: Bob Pearson @ 2022-02-08 21:16 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace the spinlock with RCU read locks for read-side operations on
mca in rxe_recv.c and rxe_mcast.c. Use the RCU list extensions for
write-side operations and keep the spinlock to serialize writers.
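
The resulting locking scheme, reduced to a sketch (assuming struct
rxe_mca carries a struct rcu_head named rcu for the free):

	/* reader (packet fan-out): no spinlock, only an RCU read lock */
	rcu_read_lock();
	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
		/* mca fields may be read here */
	}
	rcu_read_unlock();

	/* writers stay serialized by rxe->mcg_lock */
	spin_lock_bh(&rxe->mcg_lock);
	list_add_tail_rcu(&mca->qp_list, &mcg->qp_list);	/* attach */
	spin_unlock_bh(&rxe->mcg_lock);

	spin_lock_bh(&rxe->mcg_lock);
	list_del_rcu(&mca->qp_list);				/* detach */
	spin_unlock_bh(&rxe->mcg_lock);
	kfree_rcu(mca, rcu);	/* free only after a grace period */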

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 73 +++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  6 +--
 2 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index b66839276aa6..19cfa06c62ec 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -17,6 +17,12 @@
  * mca is created. It holds a pointer to the qp and is added to a list
  * of qp's that are attached to the mcg. The qp_list is used to replicate
  * mcast packets in the rxe receive path.
+ *
+ * The highest performance operations are mca list traversal when
+ * processing incoming multicast packets which need to be fanned out
+ * to the attached qp's. This list is protected by RCU locking for read
+ * operations and a spinlock in the rxe_dev struct for write operations.
+ * The red-black tree is protected by the same spinlock.
  */
 
 #include "rxe.h"
@@ -284,7 +290,7 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 }
 
 /**
- * __rxe_init_mca - initialize a new mca holding lock
+ * __rxe_init_mca_rcu - initialize a new mca holding lock
  * @qp: qp object
  * @mcg: mcg object
  * @mca: empty space for new mca
@@ -294,8 +300,8 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
  *
  * Returns: 0 on success else an error
  */
-static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
-			  struct rxe_mca *mca)
+static int __rxe_init_mca_rcu(struct rxe_qp *qp, struct rxe_mcg *mcg,
+			      struct rxe_mca *mca)
 {
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	int n;
@@ -318,7 +324,7 @@ static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
 	rxe_add_ref(qp);
 	mca->qp = qp;
 
-	list_add_tail(&mca->qp_list, &mcg->qp_list);
+	list_add_tail_rcu(&mca->qp_list, &mcg->qp_list);
 
 	return 0;
 }
@@ -338,14 +344,14 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 	int err;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&rxe->mcg_lock);
+			rcu_read_unlock();
 			return 0;
 		}
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	rcu_read_unlock();
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
@@ -362,7 +368,7 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 		}
 	}
 
-	err = __rxe_init_mca(qp, mcg, mca);
+	err = __rxe_init_mca_rcu(qp, mcg, mca);
 	if (err)
 		kfree(mca);
 out:
@@ -371,22 +377,22 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 }
 
 /**
- * __rxe_cleanup_mca - cleanup mca object holding lock
+ * __rxe_cleanup_mca_rcu - cleanup mca object holding lock
  * @mca: mca object
  * @mcg: mcg object
  *
  * Context: caller must hold a reference to mcg and rxe->mcg_lock
  */
-static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
+static void __rxe_cleanup_mca_rcu(struct rxe_mca *mca, struct rxe_mcg *mcg)
 {
-	list_del(&mca->qp_list);
+	list_del_rcu(&mca->qp_list);
 
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
 	rxe_drop_ref(mca->qp);
 
-	kfree(mca);
+	kfree_rcu(mca);
 }
 
 /**
@@ -399,30 +405,35 @@ static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
 static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
 	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
+	struct rxe_mca *mca;
+	int ret;
 
 	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
-		if (mca->qp == qp) {
-			__rxe_cleanup_mca(mca, mcg);
-
-			/* if the number of qp's attached to the
-			 * mcast group falls to zero go ahead and
-			 * tear it down. This will not free the
-			 * object since we are still holding a ref
-			 * from the caller
-			 */
-			if (atomic_read(&mcg->qp_num) <= 0)
-				__rxe_destroy_mcg(mcg);
-
-			spin_unlock_bh(&rxe->mcg_lock);
-			return 0;
-		}
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
+		if (mca->qp == qp)
+			goto found;
 	}
 
 	/* we didn't find the qp on the list */
+	ret = -EINVAL;
+	goto done;
+
+found:
+	__rxe_cleanup_mca_rcu(mca, mcg);
+
+	/* if the number of qp's attached to the
+	 * mcast group falls to zero go ahead and
+	 * tear it down. This will not free the
+	 * object since we are still holding a ref
+	 * from the caller
+	 */
+	if (atomic_read(&mcg->qp_num) <= 0)
+		__rxe_destroy_mcg(mcg);
+
+	ret = 0;
+done:
 	spin_unlock_bh(&rxe->mcg_lock);
-	return -EINVAL;
+	return ret;
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 9b21cbb22602..c2cab85c6576 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -265,15 +265,15 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
 	n = 0;
 
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		/* protect the qp pointers in the list */
 		rxe_add_ref(mca->qp);
 		qp_array[n++] = mca->qp;
 		if (n == nmax)
 			break;
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	rcu_read_unlock();
 	nmax = n;
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-08 21:16 ` [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
@ 2022-02-11 18:43   ` Jason Gunthorpe
  2022-02-11 19:36     ` Bob Pearson
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-11 18:43 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
> Well behaved applications will free all memory allocated by multicast
> but programs which do not clean up properly can leave behind allocated
> memory when the rxe driver is unloaded. This patch walks the red-black
> tree holding multicast group elements and then walks the list of attached
> qp's freeing the mca's and finally the mcg's.

How does this happen? The ib core ensures that all uobjects are
destroyed, so if something is still in the rb tree here it means that
an earlier uobject destruction leaked it.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe
  2022-02-08 21:16 ` [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe Bob Pearson
@ 2022-02-11 19:00   ` Jason Gunthorpe
  2022-02-11 19:27     ` Bob Pearson
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-11 19:00 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Feb 08, 2022 at 03:16:35PM -0600, Bob Pearson wrote:

> @@ -62,7 +61,7 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>  	if (rxe->attr.max_mcast_qp_attach == 0)
>  		return -EINVAL;
>  
> -	write_lock_bh(&pool->pool_lock);
> +	spin_lock_bh(&rxe->mcg_lock);

>  	grp = rxe_pool_get_key_locked(pool, mgid);

Now this calls rxe_pool_get_key_locked() without the lock :\

This is all fixed up a few patches later, so it only hurts
bisect-ability. Do you want to fix it?

I looked up to 'Remove mcg from rxe pools' and it seems fine to me
otherwise

Thanks,
Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 01/11] RDMA/rxe: Move mcg_lock to rxe
  2022-02-11 19:00   ` Jason Gunthorpe
@ 2022-02-11 19:27     ` Bob Pearson
  0 siblings, 0 replies; 21+ messages in thread
From: Bob Pearson @ 2022-02-11 19:27 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 2/11/22 13:00, Jason Gunthorpe wrote:
> On Tue, Feb 08, 2022 at 03:16:35PM -0600, Bob Pearson wrote:
> 
>> @@ -62,7 +61,7 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>>  	if (rxe->attr.max_mcast_qp_attach == 0)
>>  		return -EINVAL;
>>  
>> -	write_lock_bh(&pool->pool_lock);
>> +	spin_lock_bh(&rxe->mcg_lock);
> 
>>  	grp = rxe_pool_get_key_locked(pool, mgid);
> 
> Now this calls rxe_pool_get_key_locked() without the lock :\
> 
> This is all fixed up a few patches later, so it only hurts
> bisect-ability, do you want to fix it?
> 
> I looked up to 'Remove mcg from rxe pools' and it seems fine to me
> otherwise
> 
> Thanks,
> Jason
It is locked, just with a different lock. All uses of the red-black
tree in rxe_pool for this object pool use the same lock, and as you
say we are heading towards replacing it all. Just leave it.

Bob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-11 18:43   ` Jason Gunthorpe
@ 2022-02-11 19:36     ` Bob Pearson
  2022-02-11 19:56       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Bob Pearson @ 2022-02-11 19:36 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 2/11/22 12:43, Jason Gunthorpe wrote:
> On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
>> Well behaved applications will free all memory allocated by multicast
>> but programs which do not clean up properly can leave behind allocated
>> memory when the rxe driver is unloaded. This patch walks the red-black
>> tree holding multicast group elements and then walks the list of attached
>> qp's freeing the mca's and finally the mcg's.
> 
> How does this happen? the ib core ensures that all uobjects are
> destroyed, so if something is still in the rb tree here it means that
> an earlier uobject destruction leaked it
> 
> Jason

The mc_grp and mc_elem objects are not rdma-core uobjects, so their
memory is allocated by the rxe driver. They are created by
ib_attach_mcast and destroyed by ib_detach_mcast. If an application
crashes without calling a matching ib_detach_mcast for each
attachment, the driver would have leaked the memory. This patch fixes
that.

Bob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-11 19:36     ` Bob Pearson
@ 2022-02-11 19:56       ` Jason Gunthorpe
  2022-02-12  0:37         ` Bob Pearson
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-11 19:56 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Fri, Feb 11, 2022 at 01:36:06PM -0600, Bob Pearson wrote:
> On 2/11/22 12:43, Jason Gunthorpe wrote:
> > On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
> >> Well behaved applications will free all memory allocated by multicast
> >> but programs which do not clean up properly can leave behind allocated
> >> memory when the rxe driver is unloaded. This patch walks the red-black
> >> tree holding multicast group elements and then walks the list of attached
> >> qp's freeing the mca's and finally the mcg's.
> > 
> > How does this happen? the ib core ensures that all uobjects are
> > destroyed, so if something is still in the rb tree here it means that
> > an earlier uobject destruction leaked it
> > 
> > Jason
> 
> The mc_grp and mc_elem objects are not rdma-core uobjects. So their memory
> is allocated by the rxe driver. They get created by ib_attach_mcast and destroyed
> by ib_detach_mcast. If an application crashes without calling a matching
> ib_detach_mcast for each attachment the driver would have leaked the memory.
> This patch fixes that.

The mcast attachment is affiliated with a QP. When all the QPs, which
are rdma-core objects, are destroyed, all attachments should be freed
as well. That should have happened by the time things get here.

I would expect a simple WARN_ON that the rb tree is empty in destroy
to catch any bugs.
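
A minimal form of that check, placed in rxe_dealloc() next to the
existing pool cleanup calls (a sketch, not a posted patch), would be:

	/* every QP is gone by now, so every attachment should already
	 * have been detached; anything left in the tree is a leak
	 */
	WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));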

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c
  2022-02-08 21:16 [PATCH for-next v11 00/11] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (10 preceding siblings ...)
  2022-02-08 21:16 ` [PATCH for-next v11 11/11] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
@ 2022-02-11 20:04 ` Jason Gunthorpe
  11 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-11 20:04 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Feb 08, 2022 at 03:16:34PM -0600, Bob Pearson wrote:
> Bob Pearson (11):
>   RDMA/rxe: Move mcg_lock to rxe
>   RDMA/rxe: Use kzmalloc/kfree for mca
>   RDMA/rxe: Replace grp by mcg, mce by mca
>   RDMA/rxe: Replace int num_qp by atomic_t qp_num
>   RDMA/rxe: Replace pool key by rxe->mcg_tree
>   RDMA/rxe: Remove key'ed object support
>   RDMA/rxe: Remove mcg from rxe pools

I took these ones, thanks

Lets keep looking at the other ones

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 11/11] RDMA/rxe: Convert mca read locking to RCU
  2022-02-08 21:16 ` [PATCH for-next v11 11/11] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
@ 2022-02-11 20:09   ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-11 20:09 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Feb 08, 2022 at 03:16:45PM -0600, Bob Pearson wrote:
>  	/* check to see if the qp is already a member of the group */
> -	spin_lock_bh(&rxe->mcg_lock);
> -	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
>  		if (mca->qp == qp) {

The use of mca->qp protected by RCU isn't Ok..

Look at the free path:

> +static void __rxe_cleanup_mca_rcu(struct rxe_mca *mca, struct rxe_mcg *mcg)
>  {
> +	list_del_rcu(&mca->qp_list);
>  
>  	atomic_dec(&mcg->qp_num);
>  	atomic_dec(&mcg->rxe->mcg_attach);
>  	atomic_dec(&mca->qp->mcg_num);
>  	rxe_drop_ref(mca->qp);
>  
> +	kfree_rcu(mca);

So the mca dereference won't segfault, but mca->qp is garbage now since
it might have been freed and reallocated due to the rxe_drop_ref().

It looks easy enough to fix: use call_rcu() to free the thing instead
of kfree_rcu() and do the rxe_drop_ref(), and maybe others, inside the
call_rcu function.
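
A sketch of that fix, assuming struct rxe_mca grows a struct rcu_head
named rcu (the other names follow the patch):

	static void rxe_mca_free_rcu(struct rcu_head *head)
	{
		struct rxe_mca *mca = container_of(head, typeof(*mca), rcu);

		/* drop the qp ref only after any RCU readers that could
		 * still see this mca have finished
		 */
		rxe_drop_ref(mca->qp);
		kfree(mca);
	}

	static void __rxe_cleanup_mca_rcu(struct rxe_mca *mca, struct rxe_mcg *mcg)
	{
		list_del_rcu(&mca->qp_list);

		atomic_dec(&mcg->qp_num);
		atomic_dec(&mcg->rxe->mcg_attach);
		atomic_dec(&mca->qp->mcg_num);

		call_rcu(&mca->rcu, rxe_mca_free_rcu);
	}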

> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 9b21cbb22602..c2cab85c6576 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -265,15 +265,15 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>  	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
>  	n = 0;
>  
> -	spin_lock_bh(&rxe->mcg_lock);
> -	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
>  		/* protect the qp pointers in the list */
>  		rxe_add_ref(mca->qp);

And this one could use after free qp

If mca->qp is guaranteed to have a valid refcount, as above, then it
is OK.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-11 19:56       ` Jason Gunthorpe
@ 2022-02-12  0:37         ` Bob Pearson
  2022-02-14 17:41           ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Bob Pearson @ 2022-02-12  0:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 2/11/22 13:56, Jason Gunthorpe wrote:
> On Fri, Feb 11, 2022 at 01:36:06PM -0600, Bob Pearson wrote:
>> On 2/11/22 12:43, Jason Gunthorpe wrote:
>>> On Tue, Feb 08, 2022 at 03:16:42PM -0600, Bob Pearson wrote:
>>>> Well behaved applications will free all memory allocated by multicast
>>>> but programs which do not clean up properly can leave behind allocated
>>>> memory when the rxe driver is unloaded. This patch walks the red-black
>>>> tree holding multicast group elements and then walks the list of attached
>>>> qp's freeing the mca's and finally the mcg's.
>>>
>>> How does this happen? the ib core ensures that all uobjects are
>>> destroyed, so if something is still in the rb tree here it means that
>>> an earlier uobject destruction leaked it
>>>
>>> Jason
>>
>> The mc_grp and mc_elem objects are not rdma-core uobjects. So their memory
>> is allocated by the rxe driver. They get created by ib_attach_mcast and destroyed
>> by ib_detach_mcast. If an application crashes without calling a matching
>> ib_detach_mcast for each attachment the driver would have leaked the memory.
>> This patch fixes that.
> 
> The mcast attachment is affiliated with a QP, when all the QPs are
> destroyed, which are rdma-core objects, then all attachments should be
> free'd as well. That should have happened by the time things get here
> 
> I would expect a simple WARN_ON that the rb tree is empty in destroy
> to catch any bugs.
> 
> Jason
I wrote a simple user test case that calls ibv_attach_mcast() and then
sleeps. If I type ^C the qp is detached, as you predicted, so someone
is counting them somewhere; it isn't obvious in rdma-core. I am not
sure what you are suggesting above. The detach code already checks
that the attachment is *not* present. The only time it is clear that
there shouldn't be any more attachments present is when the device is
closed, which is what the cleanup code is doing. I can add a warning
if anything is left there, but it will likely never get called since
someone above me is actively cleaning them up from failed apps. Or we
can just drop this patch and assume it isn't needed.
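
For reference, the shape of that test is roughly the following (a
libibverbs sketch; qp is an already-created UD qp and the mgid comes
from the test setup, error handling omitted):

	union ibv_gid mgid = { /* multicast GID used by the test */ };

	/* mlid is ignored for RoCE */
	if (ibv_attach_mcast(qp, &mgid, 0))
		perror("ibv_attach_mcast");

	/* never call ibv_detach_mcast(); killing the process here is
	 * what exercises the kernel-side cleanup
	 */
	pause();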

Bob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v11 08/11] RDMA/rxe: Add code to cleanup mcast memory
  2022-02-12  0:37         ` Bob Pearson
@ 2022-02-14 17:41           ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2022-02-14 17:41 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Fri, Feb 11, 2022 at 06:37:50PM -0600, Bob Pearson wrote:
> > The mcast attachment is affiliated with a QP, when all the QPs are
> > destroyed, which are rdma-core objects, then all attachments should be
> > free'd as well. That should have happened by the time things get here
> > 
> > I would expect a simple WARN_ON that the rb tree is empty in destroy
> > to catch any bugs.
> > 
> > Jason
>
> I wrote a simple user test case that calls ibv_attach_mcast() and
> then sleeps.  If I type ^C the qp is detached as you predicted so
> someone is counting them somewhere. It isn't obvious in rdma-core.

QP destroy is done inside the kernel in all the uverbs stuff; detach
from multicast during QP destroy should have been done inside RXE's qp
destroy function.

> never get called since someone above me is actively cleaning them up
> from failed apps. Or we can just drop this patch and assume it isn't
> needed.

I mean drop this patch and just put a WARN_ON to assert the rb tree is
empty before destroying the memory that holds it, just in case someone
adds a bug someday.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread
